Multivariate Modeling of Cytochrome P450 Enzymes for 4-Aminoquinoline Antimalarial Analogues Using Genetic Algorithm-Multiple Linear Regression

Purpose: To develop QSAR models of the inhibition of cytochrome P450s (CYPs) by chloroquine and a new series of 4-aminoquinoline derivatives, in order to obtain a set of predictive in silico models using genetic algorithm-multiple linear regression (GA-MLR). Methods: The Austin Model 1 (AM1) semi-empirical quantum chemical method was used to find the optimum 3D geometry of the studied molecules. Relevant molecular descriptors were selected by the GA-MLR approach, and in silico predictive models were generated for inhibition of the CYP2B6, 2C9, 2C19, 2D6, and 3A4 isoforms. Results: The models are capable of predicting the potential of new drug candidates to inhibit multiple CYP isoforms. A cross-validated Q² test and external validation showed that the models were robust. Inspection of R²pred and the test-set RMSE shows that predictive ability varies considerably among the CYP models. Conclusion: Apart from providing insight into the molecular properties important for CYP inhibition, the findings may guide further investigation of novel drug candidates that are unlikely to inhibit multiple CYP subtypes.


INTRODUCTION
Malaria is one of the most serious parasitic diseases throughout tropical and subtropical regions, and it remains a major health problem in developing parts of the world [1]. Chloroquine (CQ), a low-cost drug, is widely used as an antimalarial agent. However, the emergence of CQ-resistant malarial parasite strains has prompted the search for alternative strategies to combat the disease.
Application of predictive methods such as quantitative structure-activity relationships (QSAR) and structure-based design to absorption, distribution, metabolism, elimination, and toxicity (ADMET) has become a very active area of research. Among the ADMET properties, drug metabolism is a key determinant of several important processes in vivo, such as metabolic stability, drug-drug interactions, and drug toxicity [2]. Cytochrome P450 enzymes (CYPs) are an extremely important class of enzymes involved in the Phase I oxidative metabolism of structurally diverse chemicals. The human genome contains about 60 P450s, but more than 90% of all therapeutic drugs are metabolized by five main CYP isoforms: CYP2B6, CYP2C9, CYP2C19, CYP2D6, and CYP3A4 [3]. A considerable number of QSAR models have been generated for CYP inhibitors [4][5][6][7][8][9][10][11]. The objective of this study was to demonstrate the possibility of obtaining a set of predictive in silico models for CYP2B6, 2C9, 2C19, 2D6, and 3A4 inhibition, using relatively interpretable descriptors in conjunction with genetic algorithm-based MLR.

EXPERIMENTAL
Ensemble ADME data and molecular descriptors
We used a series of 4-aminoquinoline antimalarial compounds with experimentally determined ADME properties [12]. By varying the chemical substitutions around the heterocyclic ring and the basic amine side chain of the widely used antimalarial drug chloroquine, this research group [12] has developed antimalarial compounds that are effective against drug-resistant strains of P. falciparum [13,14]. Several of these novel compounds have been screened for improved leads on the basis of their ADMET properties [12]. Figure 1 depicts the structures of the compounds used in this study. The panel includes a small number of CQ analogues with altered substitutions on the quinoline ring, although most of the compounds carry different alkyl groups on the basic nitrogen of the aminoalkyl side chain.
The inhibitory activity of the test compounds against CYP2B6, CYP2C9, CYP2C19, CYP2D6, and CYP3A4 was tested at two concentrations, 1 and 10 μM, in pooled human liver microsomes (HLMs). In this assay, HLMs were incubated with a test compound and a cocktail of specific P450 substrates for each enzyme. The known major metabolites of the substrates were then quantified by LC/MS/MS, and the percentage inhibition due to the test compound was computed relative to non-drug-treated controls. As a rule of thumb, enzyme activity below 70% of the level observed in the untreated controls was considered significant inhibition. Most of the compounds inhibited CYP2D6. Table 1 shows the percent inhibition data for the 21 chloroquine analogues.
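The percent-inhibition calculation and the 70% rule of thumb can be sketched as follows. This is a minimal illustration, assuming that metabolite formation in each incubation is summarized by a single LC/MS/MS signal (for example, a peak area); the function names and the example numbers are hypothetical and are not taken from Table 1.

```python
# Rule of thumb from the text: residual activity below 70 % of the untreated
# control is counted as significant inhibition.
ACTIVITY_CUTOFF_PERCENT = 70.0


def residual_activity(signal_treated: float, signal_control: float) -> float:
    """Metabolite formation with the test compound, as % of the control incubation."""
    if signal_control <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * signal_treated / signal_control


def percent_inhibition(signal_treated: float, signal_control: float) -> float:
    """Percent inhibition = 100 - residual activity."""
    return 100.0 - residual_activity(signal_treated, signal_control)


def is_significant(signal_treated: float, signal_control: float) -> bool:
    """True when residual activity falls below the 70 % cut-off."""
    return residual_activity(signal_treated, signal_control) < ACTIVITY_CUTOFF_PERCENT


# Hypothetical peak areas for one substrate metabolite (treated, control):
print(percent_inhibition(550.0, 1000.0), is_significant(550.0, 1000.0))  # prints 45.0 True
```

Expressed as inhibition, the same rule flags compounds that reduce metabolite formation by more than 30%.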
The molecular structures of all the chloroquine derivatives were built with HyperChem (Version 7, HyperCube, Inc.). The AM1 semi-empirical method was used to optimize the 3D geometry of the molecules, with the Polak-Ribière algorithm and a root-mean-square gradient convergence criterion of 0.1 kcal/(Å·mol). Using DRAGON [15], we calculated a total of 1481 1D, 2D, and 3D molecular descriptors from the 3D structure of each compound.
The list and meaning of the molecular descriptors are provided with the DRAGON package, and their calculation procedures are explained in detail, with related literature references, in the Handbook of Molecular Descriptors [16].
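The geometry optimization step can be illustrated with a generic Polak-Ribière conjugate-gradient minimizer that stops once the root-mean-square gradient falls below a threshold, analogous to the RMS gradient criterion of 0.1 used here. This is a sketch under stated assumptions, not HyperChem's implementation: the energy function is a toy two-variable quadratic rather than an AM1 energy, the line search is a simple Armijo backtracking search, and the β coefficient is clipped at zero (the common "PR+" variant), with a restart along steepest descent whenever the search direction is not downhill.

```python
import numpy as np


def rms_gradient(g):
    """Root-mean-square of the gradient components (the stopping criterion)."""
    return float(np.sqrt(np.mean(g * g)))


def armijo_step(f, x, fx, g, d, step=1.0, shrink=0.5, c=1e-4):
    """Backtracking line search: shrink the step until f decreases sufficiently."""
    slope = float(g @ d)  # directional derivative, negative for a descent direction
    while f(x + step * d) > fx + c * step * slope and step > 1e-12:
        step *= shrink
    return step


def polak_ribiere_cg(f, grad, x0, rms_tol=0.1, max_iter=1000):
    """Nonlinear conjugate-gradient minimization with the Polak-Ribiere update."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if rms_gradient(g) < rms_tol:
            break
        if g @ d >= 0:  # not a descent direction: restart along steepest descent
            d = -g
        x = x + armijo_step(f, x, f(x), g, d) * d
        g_new = grad(x)
        # Polak-Ribiere beta, clipped at zero ("PR+"), which also acts as a restart
        beta = max(0.0, float(g_new @ (g_new - g)) / float(g @ g))
        d = -g_new + beta * d
        g = g_new
    return x


# Toy "energy surface" (assumption): an anisotropic quadratic with minimum at (1, 0.2)
A = np.diag([1.0, 10.0])
b = np.array([1.0, 2.0])
x_min = polak_ribiere_cg(lambda v: 0.5 * v @ A @ v - b @ v,
                         lambda v: A @ v - b, np.zeros(2), rms_tol=1e-6)
```

In a real AM1 optimization, `f` and `grad` would be the semi-empirical energy and its Cartesian gradient, and `rms_tol` would carry energy-per-length units.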

MLR modeling procedure
Multiple linear regression (MLR), which is easy to implement and yields readily interpretable equations, was the statistical method of choice for building the QSAR models. The forward-stepping variant of MLR was used, starting with the single variable that contributes most to the model, as judged by the highest F-statistic or lowest p-value. At each step, the model from the previous step is extended by adding a predictor variable, and the search terminates when a statistically significant model has been obtained [17,18]. A genetic algorithm (GA) was used to search the space of candidate MLR models; the GA was the same as that used previously [19,20].
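The forward-stepping procedure can be sketched as below. This is an illustrative implementation, not the authors' code: it assumes a partial F-to-enter statistic, F = (RSS_old - RSS_new) / (RSS_new / (n - p - 1)), where p is the number of descriptors in the enlarged model, and a hypothetical entry threshold of 4.0. The search stops when no remaining descriptor passes that threshold, which is one common way to operationalize the stopping rule; a p-value criterion would be an equivalent alternative.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, f_to_enter=4.0, max_vars=None):
    """Greedy forward selection using the partial F statistic of each candidate."""
    n, m = X.shape
    max_vars = m if max_vars is None else max_vars
    selected = []
    current_rss = float(np.sum((y - y.mean()) ** 2))  # intercept-only model
    while len(selected) < max_vars:
        best_j, best_f, best_rss = None, -np.inf, None
        for j in range(m):
            if j in selected:
                continue
            cols = selected + [j]
            dof = n - len(cols) - 1          # residual degrees of freedom
            if dof <= 0:
                continue
            new_rss = rss(X[:, cols], y)
            f = np.inf if new_rss == 0 else (current_rss - new_rss) / (new_rss / dof)
            if f > best_f:
                best_j, best_f, best_rss = j, f, new_rss
        if best_j is None or best_f < f_to_enter:
            break                            # no candidate adds significantly
        selected.append(best_j)
        current_rss = best_rss
    return selected
```

With a descriptor matrix `X` (compounds by descriptors) and inhibition values `y`, `forward_stepwise(X, y)` returns descriptor column indices in the order they entered the model.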

The Selected Descriptors
Most of the descriptors selected in our GA-MLR modeling are composite descriptors, which can be divided into five groups: GETAWAY, 3D-MoRSE, RDF, WHIM, and 2D autocorrelation descriptors. Tables 2(a) and 2(b) list the names and meanings of the molecular descriptors used in this work.

Validation of the models
Goodness of fit was assessed using the squared correlation coefficient (R²), the adjusted coefficient of determination (R²adj), the standard deviation (s), the root-mean-square error (RMSE), Fisher's statistic (F), and the number of variables. The robustness and predictive ability of the models were evaluated by Q² from leave-one-out (LOO) cross-validation. In this procedure, one data point is removed from the training set, the model is constructed from the remaining training data only, and the removed point is then predicted; this is repeated for every training compound. For a more realistic assessment of predictive power, external validation was also performed. For that purpose, six chloroquine derivatives (3, 6, 8, 15, 18, and 19) were selected at random from the 21 compounds to form the external test set, and the remaining 15 derivatives formed the training set used to calibrate the QSAR models.
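The fit statistics and the LOO Q² can be computed for an ordinary least-squares model with intercept as sketched below. The sketch assumes the common definitions R²adj = 1 - (1 - R²)(n - 1)/(n - p - 1), s = sqrt(RSS/(n - p - 1)), RMSE = sqrt(RSS/n), F = (R²/p)/((1 - R²)/(n - p - 1)), and Q² = 1 - PRESS/Σ(yᵢ - ȳ)², where n is the number of training compounds and p the number of descriptors; the software used in this work may differ in minor conventions.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] of y = b0 + X b."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def training_statistics(X, y):
    """R2, R2adj, s, RMSE and F for an MLR model with p descriptors."""
    n, p = X.shape
    resid = y - predict(fit_ols(X, y), X)
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    dof = n - p - 1
    return {
        "R2": r2,
        "R2adj": 1.0 - (1.0 - r2) * (n - 1) / dof,
        "s": float(np.sqrt(ss_res / dof)),
        "RMSE": float(np.sqrt(ss_res / n)),
        "F": (r2 / p) / ((1.0 - r2) / dof),
    }


def q2_loo(X, y):
    """Leave-one-out Q2 = 1 - PRESS / total sum of squares."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i           # drop compound i
        coef = fit_ols(X[keep], y[keep])   # refit on the remaining n - 1
        press += float((y[i] - predict(coef, X[i:i + 1])[0]) ** 2)
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))
```

For the 15-compound training set, `X` would be a 15 × p descriptor matrix. The explicit loop mirrors the procedure described in the text; for OLS it gives the same PRESS as the closed-form hat-matrix shortcut, PRESS = Σ(eᵢ/(1 - hᵢᵢ))².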

QSAR models for human cytochrome P450 (CYP) inhibitors
Inhibition of CYPs can lead to drug-drug interactions, so it is important to evaluate potential drug candidates for CYP-inhibitory activity. Percent inhibition of CYP activity by the chloroquine analogues was calculated from the ratio of the activity of inhibited samples to that of control samples. Incubation conditions (enzyme concentration and substrates) for each inhibition assay are summarized in Table 1. This section describes the QSAR models constructed for the various P450 isoforms. A genetic algorithm was used to remove descriptors irrelevant to the prediction of CYP inhibition, and the retained descriptors were used to represent the compounds studied in this work. Summaries of the datasets used to generate the QSARs relating the molecular descriptors to the CYP-inhibitory potencies of the chloroquine analogues are shown in Tables 3(a) and 3(b). The predictive power of each model was determined by LOO cross-validation and by a test set of 6 structurally and biologically diverse chloroquine analogues excluded from model building. The resulting cross-validated Q² served as a quantitative measure of the predictive ability of the final QSAR models; it indicates how well a model predicts the activity of compounds left out of model formation. The training and test sets and the statistical parameters for each CYP model are presented in Table 4. The quality of fit to the training set was measured by R². However, prediction quality is the more important measure: the R²pred and RMSE of the test set give a more realistic guide to the predictive power of the CYP models (Table 4). The performance of each model in fitting and predicting the CYP inhibition data is also shown graphically in Figure 2.
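The GA of refs. [19,20] is not specified in this section, so the following is only a hypothetical sketch of GA-based descriptor selection for MLR. It assumes a fixed number k of descriptors per model, the LOO Q² of the resulting MLR model as the fitness (computed with the hat-matrix shortcut), tournament selection, a crossover that draws k descriptors from the union of two parents, a swap mutation, and elitism; all of these operators and their parameter values are assumptions.

```python
import numpy as np


def q2_loo_fast(X, y):
    """LOO Q2 of an OLS model with intercept, via the hat-matrix shortcut."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)
    leverage = np.diag(H)
    if np.any(leverage > 1.0 - 1e-10):
        return -np.inf  # a point fully determines its own fit: reject subset
    e = y - H @ y
    press = float(np.sum((e / (1.0 - leverage)) ** 2))
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))


def ga_select(X, y, k=3, pop_size=30, generations=40, p_mut=0.2, seed=0):
    """Genetic search for the k-descriptor MLR model with the highest LOO Q2."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    cache = {}

    def score(subset):
        if subset not in cache:
            cache[subset] = q2_loo_fast(X[:, list(subset)], y)
        return cache[subset]

    def random_subset():
        return tuple(sorted(int(j) for j in rng.choice(m, size=k, replace=False)))

    def tournament(pop):
        # pick three random members and keep the fittest
        idx = rng.choice(len(pop), size=3, replace=False)
        return max((pop[i] for i in idx), key=score)

    def crossover(p1, p2):
        # child draws k descriptors from the union of both parents
        pool = sorted(set(p1) | set(p2))
        return tuple(sorted(int(j) for j in rng.choice(pool, size=k, replace=False)))

    def mutate(subset):
        # with probability p_mut, swap one descriptor for an unused one
        s = list(subset)
        if rng.random() < p_mut:
            unused = [j for j in range(m) if j not in s]
            s[int(rng.integers(k))] = int(rng.choice(unused))
        return tuple(sorted(s))

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_gen = population[:2]  # elitism: carry over the two best subsets
        while len(next_gen) < pop_size:
            next_gen.append(mutate(crossover(tournament(population), tournament(population))))
        population = next_gen
    best = max(population, key=score)
    return list(best), score(best)
```

Called as `ga_select(X_train, y_train, k=3)`, the sketch returns the column indices of the best subset found and its Q². Because a GA is stochastic, repeating the search with different seeds is a simple check that the selected descriptors are stable.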

DISCUSSION
The GETAWAY (Geometry, Topology, and Atom-Weights AssemblY) descriptors try to match the 3D molecular geometry, provided by the molecular influence matrix, and atom relatedness by topology with chemical information, using various atomic weighting schemes (unit weights, mass, polarizability, electronegativity). 3D-MoRSE descriptors are representations of the 3D structure of a molecule that encode features such as molecular weight, van der Waals volume, electronegativity, and polarizability. The radial distribution function (RDF) descriptors are based on the distribution of interatomic distances in the molecule. The RDF of a molecule of n atoms can be interpreted as the probability distribution of finding an atom in a spherical volume of radius R. RDF descriptors provide information about bond lengths, ring types, planar and nonplanar systems, atom types, and molecular weight, and have been used in pharmacokinetic studies. WHIM descriptors are based on statistical indices calculated on the projections of the atoms along principal axes; they aim to capture 3D information on size, shape, symmetry, and atom distribution with respect to invariant reference frames. 2D autocorrelation descriptors describe how a given property is distributed along the topological structure. Three types of spatial autocorrelation vectors, unweighted and weighted Moran, Geary, and Broto-Moreau autocorrelations, were calculated, using atomic masses (m), atomic van der Waals volumes (v), atomic Sanderson electronegativities (e), and atomic polarizabilities (p) as weighting properties [16].
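Several of these descriptor families have compact closed forms. The sketch below computes RDF, 3D-MoRSE, and the Broto-Moreau, Moran, and Geary autocorrelations for a toy molecule, in the forms usually given in the descriptor literature (cf. [16]). The smoothing parameter β = 100 Å⁻², the scaling factor f = 1, the toy geometry, and the atomic weights are assumptions, and DRAGON's exact scaling and normalization conventions may differ from this sketch.

```python
import numpy as np
from collections import deque


def _pairs(coords, w):
    """Unique atom pairs i < j: their distances r_ij and weight products w_i*w_j."""
    coords, w = np.asarray(coords, float), np.asarray(w, float)
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(w), k=1)
    return r[i, j], w[i] * w[j]


def rdf(coords, w, radii, beta=100.0, f=1.0):
    """RDF(R) = f * sum_{i<j} w_i w_j exp(-beta (R - r_ij)^2)."""
    rij, wij = _pairs(coords, w)
    return np.array([f * np.sum(wij * np.exp(-beta * (R - rij) ** 2)) for R in radii])


def morse(coords, w, s_values):
    """3D-MoRSE I(s) = sum_{i<j} w_i w_j sin(s r_ij) / (s r_ij)."""
    rij, wij = _pairs(coords, w)
    # np.sinc(x / pi) equals sin(x) / x and correctly gives 1 at s = 0
    return np.array([np.sum(wij * np.sinc(s * rij / np.pi)) for s in s_values])


def topological_distances(adj):
    """Shortest bond-path distances between atoms (breadth-first search)."""
    adj = np.asarray(adj)
    n = len(adj)
    D = np.full((n, n), -1, dtype=int)
    for src in range(n):
        D[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if D[src, v] < 0:
                    D[src, v] = D[src, u] + 1
                    queue.append(v)
    return D


def autocorrelations(adj, w, d):
    """Broto-Moreau ATS(d), Moran I(d) and Geary c(d) at topological lag d >= 1.

    Moran and Geary are undefined when all weights are equal (zero variance).
    """
    w = np.asarray(w, float)
    delta = topological_distances(adj) == d   # ordered pairs (i, j) at lag d
    n_pairs = delta.sum()
    if n_pairs == 0:
        raise ValueError(f"no atom pairs at topological distance {d}")
    dev = w - w.mean()
    A = len(w)
    ats = float(np.sum(np.triu(delta, k=1) * np.outer(w, w)))  # each pair once
    moran = (np.sum(delta * np.outer(dev, dev)) / n_pairs) / (np.sum(dev ** 2) / A)
    geary = (np.sum(delta * (w[:, None] - w[None, :]) ** 2) / (2 * n_pairs)) / (
        np.sum(dev ** 2) / (A - 1))
    return ats, float(moran), float(geary)


# Hypothetical three-atom chain, 1.5 Angstrom bonds, arbitrary atomic weights
coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0]]
w = [1.0, 2.0, 3.0]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

In DRAGON-style descriptor vectors, `rdf` would be evaluated over a grid of radii and `morse` over a grid of scattering parameters s, with each grid point becoming one descriptor.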
A cross-validated Q² test showed that the models were robust (Tables 3(a) and 3(b)). External validation also yielded statistically significant and accurate predictions of pIC50 values for most of the CYP isoforms. Inspection of R²pred and the test-set RMSE shows that the predictive ability of the different CYP models varies considerably. A weak correlation (R²pred = 0.39) was found between experimental and predicted CYP2B6 data at 10 μM (Table 4). However, excluding one outlier (compound 8) gave a fairly good correlation (R²pred = 0.79). Although the 2B6 model has a low RMSE (0.03), suggesting that it predicts with low error, this reflects the fact that its test-set observations have the smallest standard deviation. In fact, the RMSE of the 2B6 model approaches the standard deviation of the observed data, i.e., the error of an uninformative prediction of the mean. The nature of most of the selected descriptors points to an important role of molecular size, shape, and flexibility, and of terms weighted by atomic van der Waals volumes and atomic masses, in ligand-P450 isoenzyme interaction.
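The caveat about the 2B6 model can be made concrete: predicting the test-set mean for every compound gives an RMSE equal to the (population) standard deviation of the test observations, so a small RMSE is only informative relative to that spread. The sketch below assumes the common definition R²pred = 1 - Σ(yᵢ - ŷᵢ)²/Σ(yᵢ - ȳtrain)² over the test compounds, with ȳtrain the training-set mean; other definitions of external R² exist, and the text does not state which was used.

```python
import numpy as np


def rmse(y_obs, y_pred):
    """Root-mean-square error between observed and predicted values."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))


def r2_pred(y_test, y_pred, y_train_mean):
    """External R2: 1 - PRESS / sum of squares about the training-set mean."""
    y_test, y_pred = np.asarray(y_test, float), np.asarray(y_pred, float)
    press = np.sum((y_test - y_pred) ** 2)
    return float(1.0 - press / np.sum((y_test - y_train_mean) ** 2))


def rmse_to_sd_ratio(y_test, y_pred):
    """RMSE relative to the population SD (ddof=0) of the test observations.

    Predicting the test-set mean for every compound gives a ratio of 1,
    so values near 1 indicate little predictive skill, however small the RMSE.
    """
    return rmse(y_test, y_pred) / float(np.std(y_test))
```

For instance, with hypothetical test values of 10, 20, 30, and 40, predicting 25 for every compound gives an RMSE of about 11.18, identical to the standard deviation, and hence a ratio of 1.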

CONCLUSION
A quantitative structure-activity relationship (QSAR) study was applied to a series of 4-aminoquinoline antimalarial compounds. For each CYP isoform, statistically significant models were obtained using the GA-based MLR method. These models may be used as mathematical equations for predicting the CYP inhibition of compounds structurally similar to those used in this study. In silico models of CYP2B6, 2C9, 2C19, 2D6, and 3A4 inhibition were built using multiple linear regression and a set of molecular descriptors. The CYP models range from moderately to highly predictive and could therefore prove useful in assessing the P450 liability of molecules for a particular isoform.