Session 5 Chair: Annie M. Jarabek
National Center for Environmental Assessment (NCEA)
Senior Science Advisor
Realizing the promise of emerging biomedical resources requires coherent application of the comprehensive and diverse data that reflect rapid advances in novel measurement technologies and computational modeling. Scientific inquiry and research productivity in this era have been described as relying on observations characterized by the “3 V’s”: “Volume” (big), “Velocity” (fast, with respect to both measurement and computing), and “Variety” (multiple sources). Recent attention has also focused on “rigor and reproducibility” (https://www.ncbi.nlm.nih.gov/pubmed/26811418). Further, data sharing that enables interdisciplinary collaboration among experimental biologists, statisticians, and computational modelers is key to making effective use of such data. Thus, good data management is foundational to supporting discovery, innovation, knowledge, and subsequent reuse and translation to actionable information. Beyond proper collection, quality control, curation, annotation, and archiving, it is now necessary to extend reporting principles, such as those of the Standard for Exchange of Nonclinical Data (SEND) and the Minimum Information About a Microarray Experiment (MIAME) standard, to the characterization of metadata that describe the components of the “cyberinfrastructure” and elucidate the “what and how” of the data or models: how they were designed, created, and used in analysis. Guiding principles (FAIR: Findable, Accessible, Interoperable, and Reusable) that support both manual and automated discovery and exploration have been proposed and are evolving to support good data stewardship (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175/pdf/sdata201618.pdf). Data formats and descriptions of the data pipeline and workflows are recognized as means to implement these principles, increase the utility of data, and sustain databases and repositories.
This session provides examples of such workflows in several specific application areas to illustrate the needed components as well as the advantages of adopting data sharing approaches.