S2: DSSTox: The Open Environmental Chemistry Data
DSSTox: The Open Environmental Chemistry Data underlying the CompTox Chemical Dashboard
The DSSTox project has long been dedicated to making chemical information linked to environmental or toxicological data of interest freely available. Since its first posting of data files in 2004, DSSTox database development has consistently focused on enabling data interoperability at the chemical structure level, particularly by linking valuable data sources to chemical structures to facilitate structure-activity relationship (SAR) modeling. A commitment to quality has resulted in about 20,000 clearly defined chemical substance mappings (CASRN, name, and structure records associated with high-interest lists) verified through careful manual curation. Over the past two years, this core set of ~20,000 chemical substances has been used to enable a significant expansion of the DSSTox database to better cover chemicals of interest to the environmental science community, while assessing the degree of conflicting data within data sources as substance records are auto-loaded. DSSTox currently contains about 750,000 substances, each assigned a quality curation score. This score lets users map their data to our chemistry while factoring the potential uncertainty of chemical identifier mappings into their decisions. The clarity and accuracy of unique chemical identifier mappings set DSSTox content apart from other public chemical databases and enable the resolution of chemical entities to support accurate aggregation of data from a variety of sources. We have recently developed and launched the EPA CompTox Chemical Dashboard, a user-friendly interface for accessing DSSTox chemical substances and a compendium of associated data and models. Information available includes in vivo toxicity data, in vitro screening data, chemical use information, exposure predictions, toxicokinetic data, and predicted physicochemical properties.
The Dashboard also allows batch searching and direct download of the underlying data in multiple formats, thereby facilitating the mapping of data to a common set of substances to support data interoperability.
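As a rough illustration of mapping user data to a common set of substances while taking curation quality into account, the following Python sketch joins a list of CASRNs against a small in-memory table. Everything in it is an assumption made for illustration: the `SubstanceRecord` fields, the `SUB-…` identifiers, the integer `curation_rank` scale (lower meaning more confident), and the threshold are hypothetical and do not reflect the actual DSSTox schema, scoring scheme, or download format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstanceRecord:
    """One row of a hypothetical, locally downloaded substance table."""
    substance_id: str   # hypothetical identifier, not a real DSSTox ID
    casrn: str
    name: str
    curation_rank: int  # assumed scale: 1 = most confident, larger = less


def map_to_substances(user_casrns, records, max_rank):
    """Split user CASRNs into confident, uncertain, and unmatched groups.

    A CASRN is 'mapped' when its record has curation_rank <= max_rank,
    'uncertain' when a record exists but ranks worse than max_rank,
    and 'unmatched' when no record is found.
    """
    index = {rec.casrn: rec for rec in records}
    mapped, uncertain, unmatched = {}, {}, []
    for raw in user_casrns:
        casrn = raw.strip()  # tolerate stray whitespace in user input
        rec = index.get(casrn)
        if rec is None:
            unmatched.append(casrn)
        elif rec.curation_rank <= max_rank:
            mapped[casrn] = rec
        else:
            uncertain[casrn] = rec
    return mapped, uncertain, unmatched


# Toy table: the CASRN/name pairs are real, the IDs and ranks are made up.
records = [
    SubstanceRecord("SUB-0001", "50-00-0", "Formaldehyde", 1),
    SubstanceRecord("SUB-0002", "71-43-2", "Benzene", 2),
    SubstanceRecord("SUB-0003", "80-05-7", "Bisphenol A", 5),
]

# "999-99-9" is a placeholder that is deliberately absent from the table.
user_list = ["50-00-0", " 71-43-2 ", "80-05-7", "999-99-9"]

mapped, uncertain, unmatched = map_to_substances(user_list, records, max_rank=2)

for casrn, rec in mapped.items():
    print(f"{casrn} -> {rec.substance_id} ({rec.name}), rank {rec.curation_rank}")
print("needs review:", sorted(uncertain))
print("not found:", unmatched)
```

Keeping the uncertain matches in a separate group, rather than discarding them, leaves the decision about how much identifier uncertainty to tolerate with the user.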