S3: Development of QSAR Models for Systemic Toxicity Points of Departure with Variability in Experimental Data

Development of QSAR Models, OpenTox USA 2018

Prachi Pradeep


National Center for Computational Toxicology (NCCT)


Postdoctoral Research Scientist (ORISE Fellow)


Prachi Pradeep, Richard Judson

1 ORISE, Oak Ridge, TN, United States
2 NCCT, ORD, US EPA, NC, United States

Human health risk assessment for environmental chemical exposure is limited by the tens of thousands of chemicals that have little or no experimental in vivo toxicity data. Data gap filling techniques, such as quantitative structure-activity relationship (QSAR) models based on chemical structure information, are commonly used to predict hazard in the absence of experimental data. However, variability in the experimental data leads to uncertainty in QSAR model predictions and affects model quality estimates. This study presents two sets of QSAR models developed for systemic toxicity in vivo points of departure (POD, the point on the dose-response curve that marks the beginning of a low-dose extrapolation). The in vivo data are taken from the EPA’s ToxValDB, a compilation of information on ~3000 chemicals from a variety of public data sources. The first set of QSAR models was developed and evaluated to predict point estimates of POD values using structural and physicochemical descriptors and four machine learning algorithms. The random forest algorithm produced the best model, with an external test set root-mean-squared error of 0.86 and a coefficient of determination of 0.36. The second set of models was developed to account for the known lab-to-lab variability in the POD values. To do this, a POD distribution was constructed for each chemical, with its mean set to the median experimental POD value and its standard deviation set to 0.5 log units, based on the typical lab-to-lab variability. Bootstrap models were built by randomly sampling values from the pre-generated POD distributions to derive point estimates of POD values and confidence intervals for each prediction. These models illustrate the effect of variability in experimental data on uncertainty in QSAR model predictions. The models were also evaluated for their ability to accurately predict the PODs of the most potent chemicals (enrichment analysis) to aid chemical screening and prioritization efforts.
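The bootstrap step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-descriptor least-squares line stands in for the study's machine learning algorithms, the toy data are synthetic rather than drawn from ToxValDB, and the names `fit_line`, `bootstrap_pod`, `descriptor`, and `median_pod` are hypothetical. Only the 0.5 log-unit standard deviation and the idea of sampling each chemical's POD from a distribution centered on its median come from the abstract; the 95% percentile interval and the number of bootstrap rounds are assumptions.

```python
import random
import statistics

LAB_SD = 0.5  # lab-to-lab standard deviation in log units, as stated in the abstract


def fit_line(xs, ys):
    """Least-squares line through (xs, ys); a simple stand-in for the study's learners."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_pod(xs, median_pods, x_new, n_boot=500, seed=0):
    """Return the mean predicted POD at x_new and a 95% percentile interval."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        # Draw one plausible POD per chemical from N(median POD, LAB_SD).
        sampled = [rng.gauss(m, LAB_SD) for m in median_pods]
        slope, intercept = fit_line(xs, sampled)
        preds.append(slope * x_new + intercept)
    preds.sort()
    lower = preds[int(0.025 * n_boot)]
    upper = preds[int(0.975 * n_boot) - 1]
    return statistics.fmean(preds), (lower, upper)


# Synthetic toy data (not from ToxValDB): one made-up descriptor per chemical.
descriptor = [0.5 * i for i in range(10)]
median_pod = [2.0 - 0.8 * x for x in descriptor]

mean_pod, (lower, upper) = bootstrap_pod(descriptor, median_pod, x_new=2.0)
print(f"predicted POD = {mean_pod:.2f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

Each bootstrap round refits the model on a freshly sampled set of POD values, so the spread of the resulting predictions reflects how much the assumed experimental variability alone can move a prediction.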

This abstract does not necessarily represent U.S. EPA policy.