Pathway analysis and mode-of-action prediction based on computational modeling of high-throughput toxicogenomics


Saad Haider, Kamel Mansouri, Michael B. Black, Patrick D. McMullen


The utility of transcriptomic data for interpreting cellular modes of action (MOA) after exposure to chemicals has been established for over a decade. Such interpretation conventionally relies on whole-transcriptome gene expression technologies such as microarrays and next-generation sequencing (RNA-seq). However, for the toxicity assessment and testing prioritization of tens of thousands of chemicals (i.e., the TSCA inventory), there is a pressing need for efficient alternatives such as high-throughput transcriptomics (HTT). HTT systems combine a relatively small set of gene expression measurements with mathematical predictions to estimate the genome-wide gene expression response. Available HTT models are trained and validated using pharmaceutical agents, and it is unclear whether this existing training set is fit for toxicity testing applications. The main motivation of this study is therefore to develop suitable models for evaluating environmental, industrial, and agricultural compounds, which represent a more diverse pool of cellular modes of action.

In this study, we started from Genometry’s pioneering L1000 platform as a proof of concept to demonstrate that computational models can describe statistical relationships between genes and infer whole-genome transcriptional profiles from a sample of representative genes. We then developed a qualitative, MOA-oriented strategy to predict relevant cellular pathway information and applied it to Affymetrix array data from three cell lines (HepaRG, MCF7, and A673) exposed to three agrichemicals (imazalil, fenbuconazole, and 2,4-dichlorophenoxyacetic acid) over nine concentrations. To avoid pitfalls commonly encountered when comparing different technologies and to increase prediction accuracy, we focused on building qualitative models that predict three classes of probes (up-regulated, down-regulated, and unchanged), which are essential to determining the cellular MOA.
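
As a minimal sketch of what a three-class probe labeling could look like, the snippet below discretizes log2 fold changes into up-regulated, down-regulated, and unchanged. The function name `classify_probes` and the cutoff of 1.0 (a two-fold change) are assumptions made for illustration; the abstract does not state how the classes were actually defined.

```python
import numpy as np

def classify_probes(log2_fc, threshold=1.0):
    """Label each probe as up-regulated (1), down-regulated (-1), or unchanged (0).

    `threshold` is a hypothetical cutoff on the absolute log2 fold change;
    it is not taken from the study.
    """
    log2_fc = np.asarray(log2_fc, dtype=float)
    labels = np.zeros(log2_fc.shape, dtype=int)   # default class: unchanged
    labels[log2_fc >= threshold] = 1              # up-regulated
    labels[log2_fc <= -threshold] = -1            # down-regulated
    return labels

# Four illustrative (made-up) log2 fold changes, treated vs. control.
example_labels = classify_probes([2.3, -0.2, -1.7, 0.9])
```

Predicting these discrete labels rather than continuous expression values is one way to make results less sensitive to platform-specific differences in scale.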

This work was then extended to predict gene expression changes resulting from exposure to heterogeneous chemicals from a wide range of classes. Predictive models were developed using the Open TG-GATEs toxicogenomics database, which contains gene expression data from multiple cell lines for over 900 samples covering ~170 compounds. A greedy algorithm based on sequential forward search, combined with different fitting approaches and machine learning techniques, was used to find a set of landmark genes that can predict differential expression changes of the remaining genome in TG-GATEs. We also compared the results obtained with the landmark genes from our greedy algorithm against those obtained with the landmark genes provided by L1000 and S1500. Such models can be used for MOA prediction and pathway analysis to prioritize large libraries of chemicals starting from gene expression data of a small set of genes.
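
The sequential forward search described above can be sketched roughly as follows. This is an illustrative outline, not the study's implementation: ordinary least squares stands in for the unspecified fitting approaches, the error is measured in-sample, the data are synthetic, and the names `reconstruction_error` and `greedy_landmarks` are hypothetical.

```python
import numpy as np

def reconstruction_error(X, landmark_idx):
    """Mean squared error when every gene is predicted from the landmark genes.

    X is a samples-by-genes matrix. Ordinary least squares (with an intercept)
    is an assumed stand-in for the fitting method; the error is in-sample.
    """
    design = np.column_stack([np.ones(X.shape[0]), X[:, landmark_idx]])
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    return float(np.mean((X - design @ coef) ** 2))

def greedy_landmarks(X, n_landmarks):
    """Sequential forward search: add the gene that lowers the error most."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(n_landmarks):
        best_gene = min(
            remaining,
            key=lambda g: reconstruction_error(X, selected + [g]),
        )
        selected.append(best_gene)
        remaining.remove(best_gene)
    return selected

# Synthetic data: 2 driver genes plus 8 genes that are noisy linear mixtures of them.
rng = np.random.default_rng(0)
drivers = rng.normal(size=(40, 2))
weights = rng.normal(size=(2, 8))
X = np.hstack([drivers, drivers @ weights + 0.01 * rng.normal(size=(40, 8))])

landmarks = greedy_landmarks(X, n_landmarks=2)
```

In practice the selection criterion would more plausibly be a held-out or cross-validated error, since in-sample error can only decrease as landmarks are added and therefore cannot indicate when to stop.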