Session 3: Chemoinformatics

Big Data In Chemistry Horizon2020 Marie Skłodowska-Curie Innovative Training Network European Industrial Doctorates
Session 3: big data in chemistry + informatics = chemoinformatics, OpenTox Euro 2016

Igor V. Tetko


Helmholtz Zentrum München


Chemoinformatics group leader at the Institute of Structural Biology

  [1] Tetko IV, Engkvist O, Koch U, Reymond JL, Chen H: BIGCHEM: Challenges and Opportunities for Big Data Analysis in Chemistry. Mol. Inform. 2016, doi: 10.1002/minf.201600073.
  [2] Tetko IV, Engkvist O, Chen H: Does “Big Data” exist in medicinal chemistry and, if so, how can it be harnessed? Future Med. Chem. 2016, 8(15):1801-1806.

Big data in chemistry + informatics = chemoinformatics

The increasing volume of biomedical data in chemistry and the life sciences requires the development of new methods and approaches for its analysis. This growth contributes to the expanding market for big data, which is currently growing six times faster than the overall IT sector, itself the driving force of the Information Age. However, specialized educational programs in this area are currently limited and fragmented, which restrains its development.

BIGCHEM is a Horizon 2020 Innovative Training Network that started in January 2016. It will provide state-of-the-art education in the analysis of large chemical data sets. The research program will be implemented together with its targeted industry users, large pharma companies and Small and Medium Enterprises (SMEs), which generate and analyze large chemical data sets. It will also promote technology transfer from academia to industrial applications.

The project will train ten Early Stage Researchers (ESRs). Each ESR will be employed for 36 months in total and will spend at least 50% of that time with industrial partners.

The presentation gives an overview of the research goals and areas that will be addressed during the project (see also refs [1,2]). The topics include the visualization of millions of compounds by combining chemical and biological data and the development of novel methods for mining “Big Data” using machine-learning algorithms. Topics such as polypharmacology prediction, de novo design, and secure information sharing without disclosing chemical structures will also be addressed. Last but not least, filters for predicting frequent hitters will be developed to increase the quality of high-throughput screening (HTS) data.

The project leading to this article has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 676434, “Big Data in Chemistry” (“BIGCHEM”). The abstract reflects only the author’s view, and neither the European Commission nor the Research Executive Agency is responsible for any use that may be made of the information it contains.