OpenTox Virtual Conference 2021 Session 11
Using a combination of quantitative and censored data for QSAR modeling of hERG inhibitory constants
Kiril Lanevskij¹*, Remigijus Didziapetris¹, Andrius Sazonovas¹, Karim Kassam²
¹ VšĮ “Aukštieji algoritmai”, A. Mickevičiaus 29, LT-08117 Vilnius, Lithuania
² ACD/Labs, Inc., 8 King Street East, Suite 107, Toronto, Ontario, M5C 1B5, Canada
Recently, we published a large (>6600 entries) compilation of literature data from patch-clamp and radioligand displacement measurements of the hERG inhibitory potential of drug-like compounds, and used these data to build a probabilistic classification model based on a minimal set of readily interpretable physicochemical descriptors (LogP, pKa, molecular size, and topology). However, this model, like most other hERG inhibition models published so far, has an inherent limitation in its qualitative nature: it can only classify compounds as inhibitors or non-inhibitors and provides no insight into their quantitative inhibition characteristics. A binary endpoint is chosen mostly because of the lack of quantitative data directly suitable for modeling: many experimental studies determine IC50 or Ki values precisely only within a limited activity range near the practical classification threshold (around 10 µM), and report other results semi-quantitatively as left- or right-censored data points (i.e., open-ended intervals such as IC50 > 30 µM).
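As an illustration of how such mixed data might be encoded, the sketch below converts IC50 values given in µM to pIC50 (the negative log10 of the molar IC50) and represents every record as an interval on the pIC50 scale. Note that a right-censored IC50 (e.g. IC50 > 30 µM) becomes a left-censored pIC50 (pIC50 < 4.52), because the log transform reverses the direction. The qualifier strings ("=", ">", "<"), the `PIC50Interval` container, and the function names are hypothetical; the record format of the actual compilation is not described here.

```python
import math
from dataclasses import dataclass


@dataclass
class PIC50Interval:
    """Hypothetical label container: the true pIC50 lies in [lower, upper]."""
    lower: float  # -inf when there is no lower bound
    upper: float  # +inf when there is no upper bound


def to_pic50(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    return 6.0 - math.log10(ic50_um)


def parse_record(qualifier: str, ic50_um: float) -> PIC50Interval:
    """Turn an (assumed) qualifier/value pair into a pIC50 interval."""
    p = to_pic50(ic50_um)
    if qualifier == "=":
        # Exact measurement: a degenerate interval.
        return PIC50Interval(p, p)
    if qualifier == ">":
        # IC50 above the reported value means weaker inhibition, so pIC50 < p.
        return PIC50Interval(-math.inf, p)
    if qualifier == "<":
        # IC50 below the reported value means stronger inhibition, so pIC50 > p.
        return PIC50Interval(p, math.inf)
    raise ValueError(f"unknown qualifier: {qualifier!r}")


# Example: "IC50 > 30 µM" becomes pIC50 in (-inf, ~4.52].
print(parse_record(">", 30.0))
```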
Here we present preliminary results of follow-up work that addresses this issue by using both fully quantitative and censored data for modeling. The new models were developed with the same Gradient Boosting Machine (GBM) statistical method as before, but with pIC50 as the target endpoint and a custom optimization function adapted to a censored regression objective. When the resulting pIC50 predictions are converted to binary classes at the 10 µM threshold, they classify the external validation set compounds with >75% overall accuracy, consistently outperforming the analogous logistic models by a few percentage points and achieving a better balance between sensitivity and specificity. Most importantly, the new models output not just a probability of significant inhibition but a quantitative estimate of the actual IC50 value, allowing the user not only to discern potential hERG inhibitors from non-inhibitors but also to rank compounds by their inhibitory potential.
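The custom optimization function itself is not specified here. One common way to adapt a boosting objective to interval labels, shown purely as an assumed illustration and not as the authors' method, is a squared-error loss that is zero whenever the prediction falls inside the label interval and grows quadratically with the distance to the violated bound. Boosting implementations that use second-order (Newton) updates need the gradient and Hessian of this loss with respect to the prediction. The helper `classify_inhibitor` applies the 10 µM cut-off (pIC50 = 5); treating exactly 10 µM as an inhibitor is an assumption, and both function names are hypothetical.

```python
import numpy as np


def censored_squared_loss_grad_hess(pred, lower, upper):
    """Gradient and Hessian of an (assumed) censored squared-error loss.

    Each label is an interval [lower, upper] on the pIC50 scale:
    lower == upper for exact values, -inf/+inf for open-ended bounds.
    Loss per sample: (pred - lower)^2 if pred < lower,
                     (pred - upper)^2 if pred > upper, else 0.
    """
    pred = np.asarray(pred, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    below = pred < lower  # never true when lower is -inf
    above = pred > upper  # never true when upper is +inf

    grad = np.zeros_like(pred)
    # Small Hessian floor for zero-loss samples keeps Newton leaf weights finite.
    hess = np.full_like(pred, 1e-6)
    grad[below] = 2.0 * (pred[below] - lower[below])
    grad[above] = 2.0 * (pred[above] - upper[above])
    hess[below | above] = 2.0
    return grad, hess


def classify_inhibitor(pred_pic50, threshold_um=10.0):
    """Binary call: predicted IC50 <= threshold_um (pIC50 >= cut-off) -> inhibitor."""
    threshold_pic50 = 6.0 - np.log10(threshold_um)
    return np.asarray(pred_pic50, dtype=float) >= threshold_pic50


# Exact label 5.5, upper-bounded label (pIC50 < 4.52), lower-bounded label (pIC50 > 5).
g, h = censored_squared_loss_grad_hess(
    [5.0, 4.0, 6.0], [5.5, -np.inf, 5.0], [5.5, 4.52, np.inf]
)
print(g, h)
print(classify_inhibitor([5.2, 4.8]))
```

Only the first sample is penalized here: the other two predictions already satisfy their open-ended bounds, so censored records steer the model only when it predicts on the wrong side of the reported limit. A function like this could be wrapped as a custom objective for a library such as XGBoost, whose `xgb.train` accepts an `obj` callable that returns the gradient and Hessian for the current predictions.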
 Didziapetris R & Lanevskij K. J Comput Aided Mol Des. 2016 Dec;30(12):1175-1188.