S5: Leveraging and Integrating Standardized Language for Predictive Toxicology

Leveraging and Integrating Standardized Language for Predictive Toxicology, OpenTox 2018 USA

Carolyn J. Mattingly


NC State University


Associate Professor


The Comparative Toxicogenomics Database (CTD; http://ctdbase.org) is a publicly available scientific resource that aims to inform understanding of the molecular mechanisms by which environmental exposures affect human health. CTD’s content is literature-based and derived by manually curating data modules, including toxicogenomic (for chemical-gene interactions), disease (for chemical-disease and gene-disease associations), phenotype, and exposure (relating environmental stressors, populations, events, and outcomes) cores. All data points and relationships are captured using standardized vocabularies and ontologies, some of which we developed or modified (e.g., ExO and MEDIC, respectively), in order to integrate across data types and facilitate identification of connections that are otherwise difficult to discern in the published literature. We have manually curated over 1.7 million interactions involving over 12,000 unique chemicals and 45,000 unique genes across 582 organisms; 37,000 gene-disease associations; 20,000 chemical-disease associations; and 175,000 phenotype-based interactions. Standardization and integration of these data are enabling new opportunities to develop predictive models of chemical exposure and toxicity at multiple levels of disease progression. This presentation will discuss our fit-for-purpose approach to standards development and new initiatives to computationally build predictive toxicity models such as Adverse Outcome Pathways.