S7: Data integration

Data integration with identifiers and ontologies - Why are names and graphs not enough?
Egon Willighagen, Maastricht University

Egon Willighagen


Department of Bioinformatics, Maastricht University


Assistant Professor


Predictive toxicology starts with the premise that the biological effects caused by a chemical are related to its chemical structure. At the same time, we assume that the same biological processes are characterized by similar biological changes, such as those measured in transcriptomics experiments. To study the details of this relationship, we chemically and biologically characterize the chemicals and the biological responses, and integrate these data. However, because this relationship is often not fully understood, careful communication of the chemical and biological details is key to useful predictive toxicology.

This talk will present several practical applications in the analysis of biological data, such as the pathway analysis of transcriptomics data from, for example, bronchial epithelial cells exposed to multiwalled carbon nanotubes. Here, gene and protein names are replaced by identifiers, and the BridgeDb platform ensures that experimental data are linked correctly to the WikiPathways knowledge base. For metabolomics data, even identifiers and chemical graphs are not enough to make experimental data interoperable with knowledge bases; here, cheminformatics ontologies show a clear advantage, making the integration four to five times as successful. Other examples will further demonstrate the use of the QUDT ontology, Bioclipse, the Open PHACTS platform, and Wikidata in replacing names with identifiers and ontologies. These will also show that while the latter two cannot solve all problems, they at least make the issues clearer.