Oxide nanomaterials

Prediction model for the hazard assessment of oxide nanomaterials using a PChem score-based data screening approach

Kieu My Ha, Xuan Tung Trinh and Tae Hyun Yoon


Kieu My Ha


Laboratory of Nanoscale Characterization and Environmental Chemistry, Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul, Republic of Korea


Engineered nanomaterials (ENMs) are increasingly exploited by various industries because of the unique properties they exhibit at the nanoscale¹. Recent studies have suggested a connection between specific features of ENMs and adverse impacts on the environment and human health². Knowledge of the relationship between the characteristics of ENMs and their toxicity is therefore critical for hazard assessment. One way to investigate this relationship is to develop nano-SAR (nano-structure-activity relationship) models. However, the data used for qualitative classification models or quantitative regression models have mostly been generated in a single study or a small number of studies rather than collected from the wider published literature³. This motivated us to develop a predictive model that describes the relationship between the physicochemical properties and the cytotoxicity of metal oxide ENMs, using data gathered from a comprehensive literature search to provide more representative evidence. An extensive search identified 352 articles that satisfied all selection criteria. Each article was manually examined, and the physicochemical characteristics of the metal oxide nanoparticles, the experimental conditions and the cytotoxicity data were extracted, yielding 9,515 data samples with 15 attributes.

One of the main challenges in meta-analysis is the heterogeneity of the compiled literature data. Standardized nanomaterial characterization protocols are lacking across laboratories, which makes it difficult to compare their data and assess its quality⁴. Moreover, researchers often reported only the nanomaterial properties they believed to be important for their research question, leaving a substantial number of missing values for material characteristics in the dataset. In this study, we therefore propose a scoring system, the PChem score, that evaluates the quality of the physicochemical data in terms of their suitability for developing nano-SAR models. The data were then screened by PChem score and used to develop random forest classification models.
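The screening step can be pictured with a minimal sketch. This abstract does not define how the PChem score is computed, so the attribute names, the one-point-per-reported-attribute rule and the threshold below are all hypothetical stand-ins chosen only to show how a completeness-style score could be used to filter records; the toy values are illustrative and are not data from the study.

```python
from typing import Dict, List, Optional, Sequence

# HYPOTHETICAL attribute list: the abstract does not state which physicochemical
# attributes (or weights) the actual PChem score uses.
PCHEM_ATTRIBUTES = ("core_size", "hydrodynamic_size", "surface_charge", "surface_area")

Record = Dict[str, Optional[float]]


def pchem_score(record: Record, attributes: Sequence[str] = PCHEM_ATTRIBUTES) -> int:
    """Illustrative score (assumption): one point per reported, non-missing attribute."""
    return sum(1 for name in attributes if record.get(name) is not None)


def screen_by_pchem(records: List[Record], threshold: int) -> List[Record]:
    """Keep only records whose illustrative score meets or exceeds the threshold."""
    return [r for r in records if pchem_score(r) >= threshold]


if __name__ == "__main__":
    # Toy records with made-up values; None marks a property the source did not report.
    data = [
        {"core_size": 20.0, "hydrodynamic_size": 150.0, "surface_charge": -25.0, "surface_area": 45.0},
        {"core_size": 15.0, "hydrodynamic_size": None, "surface_charge": None, "surface_area": None},
        {"core_size": 30.0, "hydrodynamic_size": 210.0, "surface_charge": None, "surface_area": 30.0},
    ]
    print([pchem_score(r) for r in data])           # [4, 1, 3]
    print(len(screen_by_pchem(data, threshold=3)))  # 2
```

A real score would likely also weigh how each property was obtained, not just whether it was reported; the simple count here only illustrates the screening mechanics.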

To examine the effects of PChem score screening and to compare two approaches for handling missing values, we applied several combinations of preprocessing steps to the data. Some datasets were left unscreened, while others contained only data selected on the basis of their PChem score. In addition, missing values were handled either by mean imputation or by complete case analysis.
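The combinations of preprocessing steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the feature names, the precomputed `pchem_score` field and the single screening threshold are hypothetical, and the two missing-value strategies are the generic textbook versions of mean imputation and complete case analysis.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]

# HYPOTHETICAL feature names; the abstract does not list the 15 extracted attributes.
FEATURES = ("core_size", "hydrodynamic_size", "surface_charge")


def complete_case(records: List[Record], features: Sequence[str] = FEATURES) -> List[Record]:
    """Complete case analysis: drop every record with any missing feature value."""
    return [r for r in records if all(r.get(f) is not None for f in features)]


def mean_impute(records: List[Record], features: Sequence[str] = FEATURES) -> List[Record]:
    """Mean imputation: fill each missing value with the mean of the observed values."""
    means: Dict[str, Optional[float]] = {}
    for f in features:
        observed = [r[f] for r in records if r.get(f) is not None]
        means[f] = mean(observed) if observed else None
    # Build new dicts so the input records are left unchanged.
    return [{**r, **{f: means[f] for f in features if r.get(f) is None}} for r in records]


def build_datasets(records: List[Record], pchem_threshold: float) -> Dict[str, List[Record]]:
    """Build one dataset per (screening, missing-value handling) combination."""
    screenings: Dict[str, Callable[[List[Record]], List[Record]]] = {
        "unscreened": lambda rs: list(rs),
        # Assumes each record already carries a hypothetical precomputed "pchem_score".
        "pchem_screened": lambda rs: [r for r in rs if (r.get("pchem_score") or 0) >= pchem_threshold],
    }
    handlers = {"mean_imputation": mean_impute, "complete_case": complete_case}
    datasets: Dict[str, List[Record]] = {}
    for (s_name, screen), (h_name, handle) in product(screenings.items(), handlers.items()):
        datasets[f"{s_name}/{h_name}"] = handle(screen(records))
    return datasets
```

Here screening is applied before imputation, so column means come only from the retained records; the abstract does not say which order was used. Each resulting dataset could then be passed to a classifier such as a random forest so that model performance can be compared across combinations.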

Through this research, we aimed to correlate the cytotoxicity of metal oxide ENMs with their physicochemical properties using published evidence, and to study the effects of PChem score-based data screening on the performance of the resulting nano-SAR models.

1 H. J. Johnston, G. Hutchison, F. M. Christensen, S. Peters, S. Hankin and V. Stone, Critical Reviews in Toxicology, 40, 328 (2010).
2 V. L. Colvin, Nature Biotechnology, 21, 1166 (2003).
3 E. Oh, R. Liu, A. Nel, K. B. Gemmill, M. Bilal, Y. Cohen and I. L. Medintz, Nature Nanotechnology (2016).
4 L. Lubinski, P. Urbaszek, A. Gajewicz, M. T. D. Cronin, S. J. Enoch, J. C. Madden, D. Leszczynska, J. Leszczynski and T. Puzyn, SAR and QSAR in Environmental Research, 24, 995 (2013).