Wojtek Plonka is a Senior Scientist in the Life Sciences group at FQS Poland (Fujitsu Group). He joined Fujitsu shortly after receiving a degree in Computational Chemistry from the Technical University of Lodz, Poland. He has over 20 years of experience in the design, development and use of simulation software, particularly in chemistry and drug design. Recently his scientific interests have shifted to Machine Learning in drug design and chemical safety, pursued in joint projects between Fujitsu and the University of Hamburg, where he is a PhD student. He is a designer and coauthor of several software packages, including CAChe, SCIGRESS, ADMEWORKS and others, and also provides consulting and training for both academia and the pharmaceutical industry.
Effects of hyperparameters on the performance of Random Forest Models - a case study
Random Forest models are a well-established Machine Learning technique frequently used for in silico prediction of the chemical properties of compounds, including those related to pharmacology and toxicology. They are fast and robust. Implementations of the algorithm require specifying a number of “hyperparameters”: parameters related to the size of the forest, the number of features and experimental points used at each decision, etc. The search for the optimal set of hyperparameters for a given model is usually done by brute force, using grid optimization. Because the number of combinations grows multiplicatively with every added hyperparameter, the search becomes a large computational task suitable only for High Performance Computers. Findings that suggest methods of narrowing the search space, drawn from a search employing nearly 200000 Random Forest models of inhibition of the main human CYP isoforms, will be presented.
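To give a feel for how quickly a grid search grows, the following is a minimal sketch in Python. It is not the setup used in the study: the hyperparameter names are borrowed from scikit-learn's RandomForestClassifier, and the grid values, the number of endpoints and the number of cross-validation folds are all illustrative assumptions.

```python
from itertools import product
from math import prod

# Hypothetical grid. Names follow scikit-learn's RandomForestClassifier;
# the values are illustrative and not taken from the study.
grid = {
    "n_estimators": [100, 250, 500, 1000],        # size of the forest
    "max_features": ["sqrt", "log2", 0.1, 0.3, 0.5],  # features tried per split
    "max_samples": [0.25, 0.5, 0.75, 1.0],        # fraction of points per tree
    "min_samples_leaf": [1, 2, 5, 10],
    "max_depth": [None, 10, 20, 40],
}


def iter_grid(grid):
    """Yield one dict per hyperparameter combination in the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# The grid size is the product of the number of values per hyperparameter.
n_combinations = prod(len(values) for values in grid.values())

n_endpoints = 5  # assumed number of modelled endpoints (hypothetical)
n_folds = 5      # assumed number of cross-validation folds (hypothetical)

# Every combination is fitted once per endpoint and per fold.
total_fits = n_combinations * n_endpoints * n_folds

print(f"{n_combinations} combinations, {total_fits} model fits")
```

Adding a sixth hyperparameter with four candidate values would multiply both counts by four, which is why narrowing even one dimension of the grid can save a large share of the computation.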