Drug-induced liver injury (DILI) is a major concern in drug development because it causes clinical trial failures and market withdrawals. However, determining the DILI potential of a compound remains challenging because DILI occurs rarely and clinical outcomes often contradict the results of animal studies.
There have been various efforts to construct prediction models that identify compounds causing liver toxicity. However, the discriminative power of previous models is not yet sufficient. In addition, studies often rely on experimental data for better accuracy, which limits data access and makes the work time-consuming. Therefore, some researchers have started to consider compound properties and molecular structures instead.
In this study, we developed a classification model using liver-toxicity-related compounds as the training set. We acquired data on 192 toxins and 187 DILI-negative drugs. The toxin data related to liver toxicity were obtained from two databases: the Hazardous Substances Data Bank (HSDB) and the Toxin and Toxin Target Database (T3DB) [2; 4]. For the negative set, we used data from three previous studies that include DILI labels.
We used 18 compound properties together with molecular structure information as features of the classification model. We collected the property information using the admetSAR website and the CDK descriptor tool. In addition, we used the Pybel API provided by Open Babel to retrieve fingerprints as structure information.
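As an illustration of the fingerprint step, the following is a minimal sketch of how a SMILES string might be turned into a binary feature vector with Pybel. The fingerprint type (`FP2`), the 1024-bit length, and the helper names `bits_to_vector` and `fingerprint_vector` are assumptions made for this example; the study does not state which fingerprint type or encoding was used.

```python
def bits_to_vector(set_bits, n_bits=1024):
    """Convert a list of 1-based set-bit positions into a 0/1 list of length n_bits."""
    vector = [0] * n_bits
    for position in set_bits:
        if not 1 <= position <= n_bits:
            raise ValueError(f"bit position {position} is outside 1..{n_bits}")
        vector[position - 1] = 1  # shift from 1-based position to 0-based index
    return vector


def fingerprint_vector(smiles, fp_type="FP2", n_bits=1024):
    """Compute an Open Babel fingerprint for a SMILES string (hypothetical helper).

    Assumes Open Babel 3.x, where Pybel is imported as `from openbabel import pybel`
    (Open Babel 2.x used `import pybel`). FP2 is an assumed choice, not the
    study's documented setting.
    """
    # Imported lazily so bits_to_vector stays usable without Open Babel installed.
    from openbabel import pybel

    mol = pybel.readstring("smi", smiles)
    fp = mol.calcfp(fp_type)
    # fp.bits lists the positions of the set bits, treated here as 1-based.
    return bits_to_vector(fp.bits, n_bits)


# Pure-Python demonstration of the encoding with made-up bit positions.
example = bits_to_vector([1, 5, 1024])
print(sum(example), len(example))
```

With Open Babel installed, `fingerprint_vector("CC(=O)Nc1ccc(O)cc1")` would return a vector of this form for a single compound, which can then be concatenated with the property features.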
To construct the classification model, two machine learning algorithms, support vector machine (SVM) and random forest, were used with physicochemical properties and structure information as features. The classifiers were developed with 10-fold cross-validation and achieved accuracies of 73% and 80% for the SVM and random forest models, respectively.
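The evaluation setup can be sketched as follows, assuming scikit-learn as the machine learning library (the study does not name its implementation). The feature matrix here is random placeholder data that only mirrors the dataset dimensions (192 positives, 187 negatives, 18 properties plus an assumed 64 fingerprint bits), so the printed accuracies are meaningless and will not reproduce the reported 73% and 80%. The hyperparameters, the feature scaling step, and the function name `cross_validated_accuracy` are likewise assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def cross_validated_accuracy(X, y, n_splits=10, seed=0):
    """Return the mean k-fold accuracy of an SVM and a random forest."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        # Scaling is fitted inside each fold, so no test-fold statistics leak in.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in models.items()
    }


# Placeholder data only: random values standing in for the real features.
rng = np.random.default_rng(0)
n_pos, n_neg = 192, 187
properties = rng.normal(size=(n_pos + n_neg, 18))           # 18 compound properties
fingerprints = rng.integers(0, 2, size=(n_pos + n_neg, 64))  # assumed 64 fingerprint bits
X = np.hstack([properties, fingerprints])
y = np.array([1] * n_pos + [0] * n_neg)                      # 1 = toxin, 0 = DILI-negative

scores = cross_validated_accuracy(X, y)
print(scores)
```

Stratified folds are used in this sketch so that each fold keeps roughly the same ratio of toxins to DILI-negative drugs as the full set.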