Contact Info
Hyun Kil Shin
Toxicoinformatics, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, South Korea

OpenTox Summer School 2022

Preparation, Conversion, and Use of Chemical Structures supporting Search and Modelling in Python

Python has arguably become the de facto standard language of data science. It has significantly lowered the barrier to entry for beginners, yet many people still find programming skills hard to acquire. Because many useful libraries have been developed in Python, understanding the language is a great help in handling data and building models. In this lecture we will read Python code for data search and modelling together, so that participants can later modify the code for their own purposes. The session activities are summarized below.

  1. Install Python: Anaconda (https://www.anaconda.com/)
  2. Python programming basics (variables, for loops, if statements, functions)
  3. Where to ask when stuck (Stack Overflow) and debugging
  4. Dataset: DILIrank (https://www.fda.gov/science-research/liver-toxicity-knowledge-base-ltkb/drug-induced-liver-injury-rank-dilirank-dataset)
  5. Structure download: PubChemPy (https://pubchempy.readthedocs.io/en/latest/)
  6. Handling CSV or Excel files: Pandas (https://pandas.pydata.org/)
  7. Molecular structure file format (SDF, MOL, SMILES)
  8. Cheminformatics library: RDKit (https://www.rdkit.org/docs/GettingStartedInPython.html)
  9. Visualization library: Matplotlib (https://matplotlib.org/)
  10. Machine learning library: Scikit-learn (https://scikit-learn.org/stable/)
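
As a flavour of the programming basics in item 2, the following is a minimal sketch using only the Python standard library. It is not taken from the lecture material: the record list, the `has_structure` function, and the placeholder entry are hypothetical names chosen for illustration, and the records are toy examples rather than DILIrank data. It shows a variable, a function, a for loop, and an if statement applied to a small list of SMILES strings.

```python
# Toy records for illustration only (hypothetical, not taken from DILIrank).
compounds = [
    {"name": "ethanol", "smiles": "CCO"},
    {"name": "benzene", "smiles": "c1ccccc1"},
    {"name": "placeholder_without_structure", "smiles": ""},
]


def has_structure(record):
    """Return True if the record carries a non-empty SMILES string."""
    return bool(record["smiles"].strip())


# Collect the names of compounds that can be passed on to later steps.
usable = []
for record in compounds:
    if has_structure(record):
        usable.append(record["name"])
    else:
        print("Skipping", record["name"], "(no SMILES)")

print(usable)
```

A check like this is a plausible first step before handing structures to a cheminformatics library, since entries without a structure cannot be converted or modelled.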