OpenTox Virtual Conference 2021 Session 15
Enhancing the Interoperability of the Ecotoxicology (ECOTOX) Knowledgebase via Mapping to Existing Controlled Vocabularies and Ontologies
Authors: Jennifer H. Olker*, Kellie Fay†, Carlie LaLone*, Rong-Lin Wang*, Colleen Elonen*, Michael Skopinski§, Travis Karschnik§, Anne Pilli§, and Dale Hoff*
* U.S. EPA, Office of Research and Development, Center for Computational Toxicology and Exposure
† U.S. EPA, Office of Chemical Safety and Pollution Prevention, Office of Pollution Prevention and Toxics
§ General Dynamics Information Technology
The ECOTOXicology Knowledgebase (ECOTOX) has been locating and curating ecologically relevant toxicity data for over 30 years and is now a nationally and internationally recognized source of single-chemical toxicity test results for aquatic and terrestrial organisms. Through decades of reviewing and curating data from the ecotoxicological literature, ECOTOX has established a standard format and controlled vocabularies for extracting pertinent study and toxicity effects information. ECOTOX data extraction captures author-reported information about species, chemicals, test methods and conditions, and toxicity results using 90 different study entities (e.g., species scientific and common name, chemical name and CASRN, media type, type of control, endpoint, statistical significance), each with a defined data type and set of acceptable terms. These controlled vocabularies currently include over 7,000 terms across all entities. This presentation will describe ongoing efforts to harmonize the ECOTOX-specific terminologies with existing controlled vocabularies and ontologies in order to support integration and exchange with other data resources. Nearly all of the 12,326 chemicals with data in ECOTOX were successfully mapped to DSSTox Substance IDs (based on the chemical name and CASRN already in ECOTOX). NCBI TaxIDs were added for ~80% of the 13,610 biological species based on scientific name, with updates occurring annually. Initial mapping of ECOTOX terms to ontological classes was then completed using a Java-based lookup tool for the ontology browser BioPortal (https://bioportal.bioontology.org/). This semi-automated mapping identified potential ontological class identifiers for the majority of the ECOTOX terms. However, manual review was required to ensure proper context for the mapped classes, and some highly relevant ecotoxicological terms (e.g., ‘LC50’, ‘concurrent control’, gene names) remained unmapped because they are absent from existing ontologies and/or are too complex to match.
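The semi-automated mapping step described above can be illustrated with a minimal Python sketch. It is not the Java-based lookup tool used by the authors and does not query BioPortal; it assumes a small local dictionary of ontology class labels and matches normalized ECOTOX terms against it by exact label. The dictionary contents, the "EX:" identifiers, and the names normalize, map_terms, and CandidateMapping are all hypothetical and exist only for illustration. Matched terms are marked as needing manual review, and terms with no match are reported as unmapped.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical stand-in for ontology class labels and identifiers.
# Real mappings would come from an ontology service, not this dictionary.
HYPOTHETICAL_LABELS: Dict[str, str] = {
    "mortality": "EX:0000001",
    "fresh water": "EX:0000002",
    "growth": "EX:0000003",
}


@dataclass(frozen=True)
class CandidateMapping:
    term: str                 # controlled-vocabulary term as given
    class_id: Optional[str]   # candidate ontology class, or None
    status: str               # "needs_review" or "unmapped"


def normalize(term: str) -> str:
    """Lowercase the term and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()


def map_terms(terms: List[str], labels: Dict[str, str]) -> List[CandidateMapping]:
    """Match each term against the label dictionary by exact normalized label."""
    results = []
    for term in terms:
        class_id = labels.get(normalize(term))
        # A match is only a candidate: context still has to be checked by a curator.
        status = "needs_review" if class_id else "unmapped"
        results.append(CandidateMapping(term, class_id, status))
    return results


if __name__ == "__main__":
    for mapping in map_terms(
        ["Mortality", "Fresh-Water", "LC50", "concurrent control"],
        HYPOTHETICAL_LABELS,
    ):
        print(mapping)
```

Exact-label matching is deliberately conservative in this sketch: it leaves terms such as ‘LC50’ unmapped rather than guessing, which mirrors the need for manual review and follow-up annotation noted in the abstract.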
These results led to current efforts to complete annotation of ECOTOX terms (e.g., mapping gene names to NCBI, simplifying complex terms) and planning for the development of an Application Ontology as the foundation to semanticize ECOTOX. In combination, the mappings to OBO ontologies and the incorporation of unique identifiers from public domains will enable linking to a variety of chemical, toxicological, and biological databases, increase the accessibility of data in ECOTOX, support advanced querying capabilities, and, ultimately, expand the ECOTOX functionalities and applications. The views expressed in this abstract are solely those of the authors and do not represent the policies of EPA. The mention of trade names or commercial products should not be interpreted as an endorsement by EPA.