The project will generate three main types of laboratory data: microscopy data, observational data, and physical data objects such as Western blot films. Imaging data from microscopes, in GCT, CZI, and OME-TIFF file formats, are captured and saved on the instrument control device, then transferred to two locations. Some microscopy image files will be saved to the project storage space in Microsoft Teams, but the files are too numerous and too large for Teams to serve as the primary storage location. Each month, the microscopy files will be transferred from the instrument control computers to external hard drives and laptops, then moved to a data server that the departmental IT team maintains in alignment with the IU Information Security Program.
We will also generate observational data and scans of physical data objects. Physical objects are digitally scanned as JPG or TIFF files and saved to the instrument control machines, with a backup to the departmental server. Measurements derived from these objects are saved in Excel spreadsheets, which are stored in the project notebook in LabArchives. The project will occasionally produce DNA sequencing and qPCR data, which are generated by the IUSM Genomics Core. Sequence files are saved in the project storage space in Microsoft Teams, with copies backed up to the departmental server. We estimate that the project will generate approximately 250 GB of data.
Each Excel spreadsheet will contain a data dictionary that defines abbreviations, fields, and valid parameters. Genomic sequence data will be documented in LabArchives pages that describe the source organism, isolate, sequence, phenotype, and project information. Microscopy image files are documented using the REMBI schema in a database operating on the IU Research Database Complex. Data will be discussed at weekly project meetings, with corrections recorded in meeting notes stored in the project lab notebook on LabArchives.
Gene expression data will be deposited in GEO within 6 months of creation. Selected data will be described, packaged into TAR files, and archived for long-term storage on the Scholarly Data Archive. These data will be retained for 10 years after the end of the project period.
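The packaging step for long-term archiving could be sketched as follows. This is a minimal illustration, not the project's actual procedure: the function names, the `MANIFEST.sha256` file name, and the choice to include SHA-256 checksums are all assumptions added here so that archived files can be verified after retrieval.

```python
import hashlib
import tarfile
from pathlib import Path

MANIFEST_NAME = "MANIFEST.sha256"  # hypothetical file name, not from the plan


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_dataset(source_dir: Path, archive_path: Path) -> list[str]:
    """Write a checksum manifest into source_dir, then tar the directory.

    Returns the manifest lines so the caller can log or inspect them.
    """
    files = sorted(
        p for p in source_dir.rglob("*")
        if p.is_file() and p.name != MANIFEST_NAME
    )
    lines = [
        f"{sha256_of(p)}  {p.relative_to(source_dir).as_posix()}"
        for p in files
    ]
    manifest = source_dir / MANIFEST_NAME
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # Uncompressed TAR; image formats such as CZI or TIFF rarely compress well.
    with tarfile.open(archive_path, "w") as tar:
        tar.add(source_dir, arcname=source_dir.name)
    return lines
```

A call such as `package_dataset(Path("experiment_01"), Path("experiment_01.tar"))` (hypothetical paths) would produce one TAR file whose contents can later be checked against the embedded manifest.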
Participants (n = 1000) will provide self-reported medical history information during interviews with study personnel; these data are entered into the REDCap project database. Study personnel also collect a family medical history from each participant, which is recorded on paper, scanned to PDF, and uploaded into REDCap. Data pertinent to sample collection are recorded on de-identified data sheets until study personnel upload the information to REDCap. Data from CT scans are entered into an Excel spreadsheet (50 columns by 1000 rows) and are combined with the REDCap database prior to analysis. All downstream laboratory assays of samples use only de-identified study participant IDs. Elements from the final database will be harmonized and deposited into the NIMH Data Archive.
Data Source, Content, Format | Number of Files | Storage Required |
Medical history interviews/pedigrees, PDF | 1000 | 5GB |
Medical history interviews/pedigrees, REDCap database | 1 | <1GB |
Clinical data, REDCap database | 3 | <1GB |
CT scan images, DICOM | 3000 | 100GB |
CT scan measurements, Excel spreadsheet | 1 | <1GB |
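Combining the CT measurement spreadsheet with the REDCap data before analysis amounts to a join on the de-identified participant ID. The sketch below assumes both sources have been exported to CSV; the column names (`participant_id`, `age`, `ct_measure_1`) and the sample values are hypothetical placeholders, not fields from the study.

```python
import csv
import io


def merge_by_participant(redcap_rows, ct_rows, key="participant_id"):
    """Left-join CT measurement rows onto REDCap rows by participant ID.

    Returns (merged_rows, unmatched_ct_ids) so that CT rows without a
    matching REDCap record are reported rather than silently dropped.
    """
    ct_by_id = {}
    for row in ct_rows:
        pid = row[key]
        if pid in ct_by_id:
            raise ValueError(f"duplicate CT row for participant {pid}")
        ct_by_id[pid] = row

    merged = []
    seen = set()
    for row in redcap_rows:
        pid = row[key]
        combined = dict(row)
        combined.update(ct_by_id.get(pid, {}))
        merged.append(combined)
        seen.add(pid)

    unmatched = sorted(set(ct_by_id) - seen)
    return merged, unmatched


# Hypothetical exports, inlined as strings so the example is self-contained.
REDCAP_CSV = "participant_id,age\nP001,54\nP002,61\n"
CT_CSV = "participant_id,ct_measure_1\nP001,3.2\nP003,1.1\n"

redcap_rows = list(csv.DictReader(io.StringIO(REDCAP_CSV)))
ct_rows = list(csv.DictReader(io.StringIO(CT_CSV)))
merged, unmatched = merge_by_participant(redcap_rows, ct_rows)
```

Returning the unmatched IDs separately is a design choice made for this sketch: it makes data-entry mismatches between the spreadsheet and REDCap visible before analysis begins.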
Sample 1
Participants may have genetic testing or other molecular assays performed on a research basis using the biospecimens collected as part of the study. The clinical data are entered into IU REDCap. Family data are recorded as pedigrees, scanned, and stored in the Microsoft Secure Storage space created for the project. The results of radiological studies are recorded in an electronic spreadsheet, which is also stored in Microsoft Secure Storage.
Sample 2
This project involves four data streams:
- Participants will provide self-reported data that will be entered into the REDCap database.
- Participants will be asked interview questions, and the interviewer will enter their responses into the REDCap database.
- Field data will be recorded on paper (stored in study binders) and entered into an IU system approved for use with critical data including PHI.
- Information will be collected from the Electronic Medical Record (EMR) and entered into an IU system approved for use with critical data including PHI.
Data and specimens are collected specifically for research purposes and include a recorded history and physical examination, photographs and recordings of skin lesions and clinical data, surface cultures, skin biopsies, and blood. Data include subject demographics, contact information, and the results of HIV serology and pregnancy tests. The history, physical exam, laboratory, and daily visit data are recorded on a paper chart, which is where the study number can be linked to subject identifiers. All laboratory specimens are coded with the subject number and carry no other identifiers. Data concerning each subject (age, gender, ethnicity, trial, date of infection, date of biopsy, days infected, outcome of each infected site, hypertrophic scar formation, specimens that are stored) are stored in an IU system approved for use with critical data including PHI. The database does not contain patient identifiers. For the blood drawing protocol, specimens are coded with a participant number and carry no other identifiers; data concerning each subject (age, gender, ethnicity, date of donation, amount donated) are recorded in REDCap.
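The separation described above, where coded records carry only a subject number and the link to identifiers is kept elsewhere, can be illustrated with a short sketch. In the study the linkage is kept on a paper chart; this example only shows the principle. The field names in `IDENTIFIER_FIELDS` and the sample record are hypothetical and do not come from the study protocol.

```python
# Hypothetical set of direct-identifier field names; a real project would
# take this list from its approved protocol.
IDENTIFIER_FIELDS = frozenset({"name", "address", "phone", "email"})


def split_record(record, subject_number, identifier_fields=IDENTIFIER_FIELDS):
    """Split one record into a coded row and a linkage row.

    The coded row keeps only non-identifying fields plus the subject number.
    The linkage row keeps the identifiers plus the subject number and is
    meant to be stored separately from the coded data.
    """
    coded = {"subject_number": subject_number}
    linkage = {"subject_number": subject_number}
    for field, value in record.items():
        if field in identifier_fields:
            linkage[field] = value
        else:
            coded[field] = value
    return coded, linkage


# Hypothetical example record.
raw = {"name": "Example Person", "phone": "555-0100", "age": 34, "outcome": "resolved"}
coded_row, linkage_row = split_record(raw, "S-017")
```

Only `coded_row` would be eligible for the research database in this sketch; `linkage_row` corresponds to the information that stays with the chart.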
There are two data components to this project. The first will use three sources of existing data: [location redacted] County Birth Certificates, outpatient EHR data from [Health System redacted] via the [name redacted] data repository, and geographical data from [name redacted]. The second component will use human-centered design techniques to develop a communication strategy for parents and caregivers regarding their child's weight and obesity risk. Participants will be engaged in focus group activities that can include discussion, collage, card sorting, and other activities such as cognitive interviews. Focus group sessions will be recorded (audio and video), and photographs may also be captured. Data will include participant demographics as well as information shared during the focus group session, which may include participant-provided health information about themselves or their child. Data will be stored in REDCap and Qualtrics. Audio files will be securely transmitted to a transcription service that has received university approval for use with HIPAA-protected information (i.e., a Business Associate Agreement is in place). The transcription service will upload completed transcripts to a designated area within the project's Microsoft Secure Storage space.
Data Source, Content, Format | Number of Files | Storage Required |
Clinical data, REDCap database (200 records x 300 fields) | 3 | 2GB |
Photographs | 600 | 10GB |
Audio recordings | 30 | 5GB |
Video recordings | 30 | 100GB |
Transcriptions (.docx files) | 30 | <1GB |
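As a rough check on the table above, the per-row estimates can be summed to give an overall storage figure. The sketch below treats the "<1GB" entry as an upper bound of 1 GB, which is an assumption made here for the arithmetic; the row labels are shortened copies of the table rows.

```python
import re


def upper_bound_gb(size):
    """Parse sizes such as '10GB' or '<1GB' and return the number in GB.

    A leading '<' is ignored, so '<1GB' counts as 1 GB (an upper bound).
    """
    match = re.fullmatch(r"\s*<?\s*(\d+(?:\.\d+)?)\s*GB\s*", size)
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    return float(match.group(1))


storage_estimates = {
    "Clinical data (REDCap)": "2GB",
    "Photographs": "10GB",
    "Audio recordings": "5GB",
    "Video recordings": "100GB",
    "Transcriptions": "<1GB",
}

total_gb = sum(upper_bound_gb(size) for size in storage_estimates.values())
print(total_gb)  # 118.0
```

The video recordings account for most of the total, so any change in the number or length of recorded sessions would dominate a revised estimate.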