Pilot Projects Enhancing Utility and Usage of Common Fund Data Sets (R03 Clinical Trial Not Allowed)
Description
Many valuable and widely available data sets have been generated by multiple Common Fund programs. The purpose of this NOFO is to announce the availability of funding to demonstrate and enhance the utility of selected Common Fund data sets, including generating hypotheses and catalyzing discoveries. Award recipients are also asked to provide feedback on the utility of the Common Fund data resources.
The NIH Common Fund (https://commonfund.nih.gov/) has supported many transformative research programs that generate new technologies, methods, and data. Many of these programs generate rich public data sets containing multi-dimensional molecular and phenotypic data from several organisms, including mice and humans. To maximize the impact of these data, engage a broader community of end-users for wider adoption of these data sets, and obtain feedback from successful applicants to enhance the data resources, the Common Fund plans to support small research projects (R03) encouraging the use of Common Fund data sets. Awards are intended to enable novel and compelling biological questions to be formulated and addressed, and/or to generate cross-cutting hypotheses for future research.
The goals of this NOFO are to 1) promote the use of Common Fund data sets by supporting pilot studies based on analyses across two or more Common Fund data sets; 2) enhance the utility of existing Common Fund data sets by developing workflows and analytic and simulation tools that enable simultaneous analysis of multiple Common Fund data sets; and 3) demonstrate the added value of integrating multiple Common Fund data resources in addressing biomedical research questions.
Investigators are encouraged to use various approaches including, but not limited to, systems approaches; artificial intelligence (including generative AI), machine learning, and deep learning methods; advanced data science methods for data set integration and harmonization; and computational modeling to bring together high-throughput genotype and phenotype data sets. Because information about the user experience could help NIH improve its data resources, recipients will provide feedback on the findability, usability, and utility of data sets and public data portals, both during a virtual Common Fund Data Ecosystem (CFDE) R03 awardee meeting and in their close-out reports.
The established Common Fund data sets listed below are well-poised for increased community use. Applicants must propose using TWO or more Common Fund program data sets from the following list, and they can propose using other data sets (including additional NIH Common Fund data sets not listed, other NIH data sets, and non-NIH data sets that are publicly available).
4D Nucleome (4DN) (https://www.4dnucleome.org/): Reference nucleomics and imaging data sets, including an expanding tool set for open data processing and visualization
Acute to Chronic Pain Signatures (A2CPS) (https://a2cps.org/): Imaging, high-throughput omics, sensory testing, and psychosocial assessment data from patients who either transition to or are resilient to chronic pain
Bridge to Artificial Intelligence (Bridge2AI) (https://bridge2ai.org/): Ethically sourced, trustworthy, and well-defined flagship biomedical and behavioral data sets on salutogenesis, clinical care, functional genomics, and voice as a biomarker. This program is in its early phase, and data generation is ongoing.
Cellular Senescence Network (SenNet) (https://commonfund.nih.gov/senescence): Atlases and data sets of senescent cells and their secretomes. This program is in its early phase, and data generation is ongoing.
Extracellular RNA Communication (exRNA) (https://exrna.org/): Catalog of exRNA molecules found in human biofluids like plasma, saliva, and urine; and potential exRNA biomarkers for diseases
Gabriella Miller Kids First (KF) (https://kidsfirstdrc.org/): Data from whole-genome sequencing of cohorts with structural birth defects and/or susceptibility to childhood cancer, with associated phenotypic and clinical data
Genotype-Tissue Expression (GTEx) (https://www.gtexportal.org/home/): Whole-genome and RNA sequence data from multiple human tissues, including tissue samples, to study tissue-specific gene expression and regulation
Glycoscience (GL) (https://glygen.org/): A data integration and dissemination project for carbohydrate and glycoconjugate related data
Human BioMolecular Atlas Program (HuBMAP) (https://hubmapconsortium.org/): An open and global platform to map healthy cells in the human body to determine how the relationships between cells can affect the health of an individual
H3Africa (https://h3abionet.org/): Genomic and phenotypic research data generated by the Human Heredity and Health in Africa program, consisting of 51 projects across 30 countries. Includes population-based genomic studies of common, non-communicable disorders (e.g., heart and renal disease), as well as communicable diseases (e.g., tuberculosis).
Human Microbiome Project (HMP) (https://commonfund.nih.gov/hmp): Characterization of the microbiomes from healthy human participants at five major body sites using 16S and metagenomic shotgun sequencing, as well as characterization of the microbiome and human host from three cohorts of microbiome-associated conditions
Illuminating the Druggable Genome (IDG) (https://druggablegenome.net/): Data on understudied druggable proteins, including mRNA and protein expression data, phenotype associations, bioactivity data, drug target interactions, disease links, and functional information
Integrated Human Microbiome Project (iHMP) (https://hmpdacc.org/ihmp/): Microbiome, epigenomic, metabolomic, and phenotypic data for three cohorts
Knockout Mouse Phenotyping Program (KOMP2) (http://www.mousephenotype.org/): Data from broad, standardized phenotyping of a genome-wide collection of mouse knockouts
Library of Integrated Network-based Cellular Signatures (LINCS) (http://lincsproject.org/): Molecular signatures that describe how different types of cells respond to a variety of agents that disrupt normal cellular function
Metabolomics Workbench (https://www.metabolomicsworkbench.org/): Metabolomics data and metadata from studies on cells, tissues, and organisms
Molecular Transducers of Physical Activity in Humans (MoTrPAC) (https://motrpac-data.org/data-access): Data contain assay-specific results, associated metadata, quality control reports, and animal phenotype data related to molecular transducers that underlie the effects of physical activity
Somatic Mosaicism Across Human Tissues (SmaHT) (https://commonfund.nih.gov/smaht): Data on DNA sequence variants within personal genomes in tissues from human donors. This program is in its early phase, and data generation is ongoing.
Stimulating Peripheral Activity to Relieve Conditions (SPARC) (https://sparc.science/): Maps and tools to identify and influence therapeutic targets that exist within the neural circuitry of a wide range of organs and tissues
Undiagnosed Diseases Network (UDN) (https://undiagnosed.hms.harvard.edu): Clinical, multiomics, and model organism data intended to help provide answers for patients and families affected by undiagnosed conditions
This NOFO accepts different types of projects with the intent of generating preliminary and/or validation data for subsequent funding, including, but not limited to, the following:
Building synthetic cohorts by combining and comparing data sets;
Creating synthetic data from extant CF data to enable the use of AI or advanced modeling methods;
Leveraging existing data across humans and model organisms for novel discovery;
Developing research methods or analytic tools to support data visualization, harmonization, and integration;
Developing workflows and tools to automate data integration and interoperability;
Applying new artificial intelligence (AI)/machine learning/deep learning approaches for metadata harmonization to aid in data integration;
Developing new approaches and tools for simultaneous analysis of data available on multiple platforms (e.g., data sets residing in two separate cloud platforms);
Investigating gene expression, genome topology, protein expression, and/or epigenetic patterns across several disease conditions, phases of the lifespan, or in the analysis of sexual dimorphism;
Identifying biomarkers (metabolites, genetic variants, DNA methylation and/or histone marks, etc.) associated with various diseases and risk factors;
Developing new approaches for integrating and analyzing single cell data;
Network analysis across genetic variation, expression profiling, and/or GWAS data to reveal pathways associated with various diseases;
Incorporating machine learning and computational approaches to imaging data for data harmonization;
Enhancing information in the data resources by developing analytic tools, curating and annotating existing data, or adding phenotypic or clinical information.
Applications are expected to utilize the existing data in the above listed Common Fund data sets. Generation of new data should be limited to testing/validation of predictions. As noted in the R&R or Modular Budget section, there is a budget limit for such activities if experimental studies are proposed.
If a website or online tool is proposed for users, the submission is expected to discuss a sustainability plan addressing maintenance of the tool; plans that allow up to two additional years of maintenance/utility are encouraged. Where applicable, applicants should describe how they plan to share any tools, pipelines, or workflows used or created through open access channels (e.g., public GitHub links).
Applicants should describe the anticipated timeline, formats, and methods of providing the data generated or annotated, and other products used or created under this NOFO, to the relevant Data Coordinating Center or public repository.
For applications that aim to co-analyze Common Fund data with other genomic data sets that are currently accessible through an NIH-approved repository (e.g., dbGaP) or some other public controlled access database (e.g., European Genome-phenome Archive), applicants must describe the database through which the proposed data are accessible to the research community and the details of the data sets, including any data use limitations based on the associated consent form, and must discuss the applicability of the data sets to diverse human populations.
For applications that aim to co-analyze Common Fund data with genomic data sets that are not currently accessible through an NIH-approved repository (e.g., dbGaP) or some other public controlled access database (e.g., European Genome-phenome Archive), applicants must describe their ability and willingness to submit the individual-level sequence data to an NIH-approved repository (e.g., dbGaP) and provide an associated Institutional Certification using the current NIH template (https://osp.od.nih.gov/-sharing/institutional-certifications/). If the Institutional Certification is not available, provide a Provisional Certification and describe the anticipated data use limitations and associated modifiers separately. If submitting a Provisional Certification with the application, please note that a completed Institutional Certification may be required before award.
Frequently Asked Questions regarding this NOFO will be posted on the Common Fund Data Ecosystem Frequently Asked Questions (FAQs) (https://commonfund.nih.gov/dataecosystem/faqs) website. Applicants are encouraged to review the FAQs before submitting their applications.