In-silico design of novel 4-aminoquinolinyl analogs as potential anti-malaria agents using quantitative structure-activity relationships and ADMET approach

Purpose: To design and screen for potential anti-malaria agents based on a series of 4-aminoquinolinyl analogues. Methods: Molecular fingerprint analysis was used to partition the molecules into training and test sets. After good alignment was achieved, the training sets were used to construct CoMFA and CoMSIA models. Partial least squares analysis combined with external validation was used for model evaluation. The resulting contour maps were analyzed in detail to summarize the substituent property requirements for further rational molecular design. Using the chosen models, activity prediction and subsequent ADMET investigation were performed to identify newly designed compounds with the desired properties. Results: Three different set partitions for model establishment were obtained using fingerprint-based selection. Partition 02 offered an optimal CoMFA model (r² = 0.964, q² = 0.605 and r²pred = 0.6362) and the best CoMSIA model (r² = 0.955, q² = 0.585 and r²pred = 0.6403). Based on contour map analysis, a series of compounds was designed for activity prediction. Two of the compounds (wmx09, wmx25) were chosen for their favorable predicted biological activities. Subsequent ADMET investigation indicated that these compounds have acceptable drug-like characteristics. Conclusion: The screening reveals that compounds wmx09 and wmx25 have strong potential as anti-malaria agents.


INTRODUCTION
Malaria is widely distributed from latitude 60 degrees north to 30 degrees south. It is generally recognized as a fatal parasitic disease that threatens approximately 3.2 billion people in more than 90 countries across Africa, Southeast Asia, South Asia, the Arabian Peninsula, and Central and South America. Malaria kills approximately 400,000 people each year, and children under the age of 5 years account for a significant proportion of these deaths [1]. Plasmodium falciparum and Plasmodium vivax are associated with most malaria epidemics worldwide. However, most infections are caused by Plasmodium falciparum, which is responsible for more than 95% of reported malaria-related cases [2].
Because progress on malaria vaccine development has been insufficient, chemotherapy is the only option for malaria treatment [3]. Owing to its efficacy, safety and accessibility, chloroquine has been the most widely used malaria therapy since its first clinical application in 1944 [4]. However, the increasingly serious problem of chloroquine resistance has gradually become a primary reason for failures of malaria prevention and control. Hence, structural modification of chloroquine analogs to obtain new anti-malarial candidates has considerable scientific value and has received significant research attention in recent years [5,6].
Chemoinformatics-based computational approaches (e.g., molecular docking, dynamics, and quantitative structure-activity relationships [QSARs]) have resulted in numerous successful examples of novel candidate drug discovery [7,8]. Using appropriate statistical methods, quantitative structure-activity relationship analysis has been found to be an effective approach to generate physicochemical, structural, steric and electrostatic information for rational molecular design based on a series of analogs. Widely used during the last two decades, three-dimensional QSAR analyzes the relationship between the structural features of compounds and their target properties in three-dimensional coordinates. Using this approach, researchers obtain visual interaction contour images and predict outcomes [9,10].
In the present study, we performed a carefully designed QSAR study based on a series of 4-aminoquinolinyl analogs. We aimed to discover potential anti-Plasmodium falciparum agents and new candidates for further molecular design for malaria therapy based on the chloroquine substructure.

EXPERIMENTAL

Datasets
A total of 48 different 4-aminoquinolinyl analogs were taken from previous studies performed by the Prem M. S. Chauhan research team [11][12][13]. To simplify the data format, all reported biological activities (IC50 values) were converted to negative logarithmic form (pIC50 = -log10 IC50) and added to an attribute spreadsheet (Tables 1-4).
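The pIC50 conversion described above can be sketched in a few lines. This is a minimal illustration, assuming the IC50 values are expressed in mol/L before the logarithm is taken (the unit is not stated in this section); the function name `pic50` and the example values are hypothetical and are not taken from Tables 1-4.

```python
import math


def pic50(ic50_molar: float) -> float:
    """Return pIC50 = -log10(IC50), with IC50 assumed to be in mol/L."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


# Hypothetical IC50 values in nM, converted to mol/L before the transform.
for ic50_nm in (10.0, 250.0):
    print(ic50_nm, "nM ->", round(pic50(ic50_nm * 1e-9), 3))
```

A lower IC50 gives a higher pIC50, so more active compounds receive larger values in the spreadsheet.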

Molecule preparation
All molecules were sketched using ChemDraw Professional 15.0 (CambridgeSoft Corporation, USA; www.cambridgesoft.com). Each molecule was saved as a separate MDL Molfile. Discovery Studio 4.5 software (Biovea Inc., USA; www.biovea.com) was used to generate three-dimensional molecular structures. The "Minimize ligands" protocol in Discovery Studio 4.5 was used for energy minimization. The "Smart Minimizer" algorithm was used to perform 1,000 steps of steepest descent with a root mean square (RMS) gradient tolerance of 3, followed by conjugate gradient minimization [14]. "Max steps" was set to 2,000, "RMS Gradient" was set to 0.001 kcal/(mol·Å), and the Merck Molecular Force Field was selected as the input force field. All resulting molecular conformations were saved as Sybyl MOL2 files for further study.

Clustering analysis
To develop more robust QSAR models, a cluster analysis based on molecular fingerprints was performed [15]. The "MDL public keys" fingerprint was used as the basis for dividing all 48 molecules into seven clusters [16]. One molecule was selected from each cluster for each test set (i.e., seven molecules, or 15% of the total molecules). By selecting from each cluster according to the principle of sufficient structural diversity and graded biological activity, three different test sets were built.
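To make the one-molecule-per-cluster idea concrete, the sketch below shows a Tanimoto similarity on key-based fingerprints and a simple rule for drawing one test molecule from each cluster. It is only an illustration: the selection in this study was made manually, the Tanimoto coefficient is assumed here as the similarity measure, and the function names, the `rank` rule and the toy data are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pick_one_per_cluster(clusters: dict, rank: int) -> list:
    """Pick one molecule from every cluster.

    `clusters` maps a cluster id to a list of (name, pIC50) pairs. Members are
    sorted by activity and the one at position `rank` (wrapped around the
    cluster size) is chosen, so different ranks give different test sets.
    """
    chosen = []
    for cluster_id in sorted(clusters):
        members = sorted(clusters[cluster_id], key=lambda m: m[1])
        chosen.append(members[rank % len(members)][0])
    return chosen


# Toy fingerprints and clusters (hypothetical, not the paper's data).
print(tanimoto({1, 5, 9, 12}, {1, 5, 10, 12, 20}))
toy_clusters = {1: [("m1", 5.2), ("m2", 6.1)], 2: [("m3", 4.8)]}
print(pick_one_per_cluster(toy_clusters, rank=0))
print(pick_one_per_cluster(toy_clusters, rank=1))
```

Changing `rank` is one simple way to obtain several alternative test sets from the same clustering while keeping every cluster represented.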

Molecule alignment
All minimized molecular conformations were imported into Sybyl-X 2.1 (Tripos Inc., USA) for CoMFA and CoMSIA QSAR studies. Following the cluster analysis results, three training-test divisions were made manually and saved as Sybyl databases. Because it had the highest reported biological activity, molecule 35 was selected as the reference for molecular alignment. Each set was aligned using the Sybyl "Align Database" function following the maximum common substructure method [17].

CoMFA and CoMSIA field calculation
Sybyl was used to calculate CoMFA and CoMSIA fields for each aligned training set. An sp3 carbon atom with a charge of +1 was used as the probe for the steric and electrostatic fields; the grid region extended 4 Å beyond each molecule in every direction, and a region file was created. For the CoMSIA calculations, an additional hydrophobicity property of +1 and a hydrogen bond property of +1 were assigned to the probe atom to calculate the hydrophobic, hydrogen bond acceptor and hydrogen bond donor field energies. We used 30.0 kcal/mol as the steric and electrostatic cutoffs for the CoMFA field calculations and 0.3 as the attenuation factor for the CoMSIA field calculations [18]. The biological activity values were recorded in a text file and then merged into the spreadsheets.
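For orientation, the role of the attenuation factor can be seen in the form in which the CoMSIA similarity index is commonly written. The expression below is the standard formulation from the CoMSIA literature, not one given in this paper, and the symbols are introduced here only for illustration: $A_{F,k}^{q}(j)$ is the similarity index of molecule $j$ at grid point $q$ for property $k$, $w_{\mathrm{probe},k}$ is the probe value for that property (+1 here), $w_{ik}$ is the value of property $k$ on atom $i$, $r_{iq}$ is the distance between atom $i$ and the probe at grid point $q$, and $\alpha$ is the attenuation factor (0.3 in this study).

```latex
A_{F,k}^{q}(j) = -\sum_{i} w_{\mathrm{probe},k}\, w_{ik}\, e^{-\alpha r_{iq}^{2}}
```

Because the distance dependence is Gaussian, the index remains finite at every grid point, which is why CoMSIA fields need no energy cutoffs of the kind used for CoMFA.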

Partial least squares analysis
Partial least squares regression analysis was performed to correlate the molecular activities with the calculated CoMFA and CoMSIA fields. The statistical analysis followed a classical two-stage scheme. The first stage was a leave-one-out cross-validation analysis, in which each molecule in turn was left out and predicted by a model built from the remaining molecules. From the leave-one-out results, the squared cross-validation coefficient (q²) and the optimum number of components (N) were obtained. Using the optimum number of components, the second stage, a non-cross-validated analysis, was performed. This analysis yielded the squared correlation coefficient (r²), the standard error of estimate (SEE) and the F value, which are important indicators for model evaluation. Based on the results for the different training sets, every possible CoMFA and CoMSIA model was built and evaluated [19,20].
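The leave-one-out stage can be illustrated with a small sketch. As a loud assumption, the code below substitutes an ordinary single-descriptor least-squares line for the PLS model used in Sybyl, and computes q² as 1 - PRESS/SS with SS taken around the overall mean activity; the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (stand-in for PLS)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out q2 = 1 - PRESS / SS for the stand-in model."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without molecule i, then predict the left-out molecule.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss


# Hypothetical descriptor values and pIC50 values.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [5.1, 5.9, 7.2, 7.8, 9.1]
print(round(loo_q2(descriptor, activity), 3))
```

In the actual workflow the same loop would be repeated for each candidate number of PLS components, and the component count giving the best q² would be carried into the non-cross-validated fit.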

External validation analysis
Because predictive ability largely determines a QSAR model's validity, external validation must be performed [21]. The predictive r² (r²pred) was used as the indicator for external validation:

$$r^2_{\mathrm{pred}} = \frac{SD - PRESS}{SD}$$

where SD is the sum of the squared deviations between the mean activity of the training set compounds and the reported activities of the test set compounds, and PRESS is the sum of squared deviations between the reported and predicted activities of the test set compounds [22]. Test set molecules were submitted for external validation after alignment with molecule 35.
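Using the definitions of SD and PRESS given above, the predictive r² is conventionally computed as (SD - PRESS)/SD. The sketch below shows that arithmetic; the function name `r2_pred` and the numbers are hypothetical and do not come from the study's tables.

```python
def r2_pred(train_obs, test_obs, test_pred):
    """Predictive r2 = (SD - PRESS) / SD for an external test set."""
    mean_train = sum(train_obs) / len(train_obs)
    # SD: squared deviations of test activities from the training-set mean.
    sd = sum((y - mean_train) ** 2 for y in test_obs)
    # PRESS: squared deviations between reported and predicted test activities.
    press = sum((y - p) ** 2 for y, p in zip(test_obs, test_pred))
    return (sd - press) / sd


# Hypothetical pIC50 values.
train_activities = [5.0, 6.0, 7.0]
test_reported = [5.0, 7.0]
test_predicted = [5.5, 6.5]
print(r2_pred(train_activities, test_reported, test_predicted))
```

A value close to 1 means the test-set predictions are much better than simply guessing the training-set mean; a value at or below 0 means they are no better.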

Molecule design
Contour maps are three-dimensional images generated from QSAR models that display the correlations between molecular structural features and a given field. Based on the resulting contour maps, we performed a detailed analysis to summarize the structural requirements for molecule design and to assemble an in-house library of rationally designed compounds.

Applicability domain analysis
Because the inherent "closed system" character of every QSAR model limits its applicability, the applicability domains of the created models should be determined [23]. We performed optimum prediction space analysis to define the applicability domains of the models. The "optimum prediction space" function in the Discovery Studio software was used to determine automatically, based on Mahalanobis distance, whether the designed molecules were located inside the applicability domains.
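The idea of a Mahalanobis-distance check can be sketched as follows. This is not the Discovery Studio implementation, whose internals are not described here: the example assumes only two descriptors per molecule, and the cutoff (the largest distance seen within the training set) is a hypothetical choice made for illustration.

```python
import math


def mahalanobis_2d(train, query):
    """Mahalanobis distance of `query` from the training set (two descriptors)."""
    n = len(train)
    mean_x = sum(p[0] for p in train) / n
    mean_y = sum(p[1] for p in train) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in train) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in train) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in train) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = query[0] - mean_x, query[1] - mean_y
    # Quadratic form d^T S^-1 d using the closed-form 2x2 inverse.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


def inside_domain(train, query):
    """Hypothetical rule: inside if no farther than the farthest training point."""
    cutoff = max(mahalanobis_2d(train, p) for p in train)
    return mahalanobis_2d(train, query) <= cutoff


# Toy descriptor pairs (hypothetical).
training = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(inside_domain(training, (1.5, 1.0)))
print(inside_domain(training, (3.0, 1.0)))
```

Unlike a plain Euclidean distance, the Mahalanobis distance accounts for the spread and correlation of the training descriptors, so a designed molecule is judged relative to the region the model was actually trained on.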

Activity prediction
Each designed molecule was optimized according to the method described in the Molecule preparation section. Before prediction, each molecule was aligned in the same way, using molecule 35 as the template. Molecules with better predicted biological activities were used for further study.

ADMET prediction
ADMET prediction studies were performed for the screened molecules using the "ADMET Descriptors" and "Toxicity Prediction" functions in the Discovery Studio software. All of these data were considered together to select the most promising compounds [24,25].

RESULTS

Clustering analysis
The cluster partitions are presented in Table 5. Following the principle of sufficient structural diversity and graded biological activity, we selected molecules for three different test sets.

Statistical data
All molecules from the datasets were well aligned when molecule 35 was used as the reference (Figure 1). Three overlapping training databases were then submitted for statistical analysis (Table 6). Because the CoMSIA_EHA model comprised the electrostatic, hydrophobic, and hydrogen bond acceptor descriptor fields, it may also provide more information for further study. All molecules were aligned and their activities predicted by both selected models. As presented in Table 7, Table 8, and Figure 2, the close agreement between predicted and reported activities supported the quality of the selected models.
Consequently, the CoMFA and CoMSIA_EHA models generated using dataset 02 were chosen for further study.

Contour map analysis
To support rational molecular design, we performed a detailed analysis of the resulting contour maps. Molecule 35 was overlaid on each contour map in three-dimensional coordinates to display the relationships between the most active compound and each target property. Figure 3A presents the overlay for the electrostatic contour map. Blue regions (positive electrostatic favored), located around the nitrogen atom of the arylamine group, indicated that a nitrogen atom is required at this position. Several red regions (negative electrostatic favored) were associated with the substituent groups at position two and position three on the aromatic ring. Placing electron-withdrawing groups at these positions may enhance the molecular activity. The terminal of the aliphatic chain of the 4-ethylpiperazine group was also associated with red regions. This result suggested that electron-withdrawing groups are required at these positions.
The overlay for the steric contour map is presented in Figure 3B. Position two on the aromatic ring of the anilino group is surrounded by yellow regions (negative steric favored), while green regions (positive steric favored) were located near position three. This result suggested that any bulky substituent should be placed at position three. The piperazine group was enveloped by green regions, which indicated that placing a six-membered ring at this site is a rational choice. Figure 3C depicts the results for the hydrophobic contour map. Hydrophilic-favored regions (white regions) covered the arylamine group. This result indicated that increasing the hydrophilicity of substituent groups is beneficial at this location. Figure 3D presents the results for the hydrogen bond acceptor contour map. It suggests that groups with hydrogen bonding ability can be added to the piperazine ring, because a large purple region (hydrogen bond acceptor favored) was located nearby. The aliphatic chain terminal can also form hydrogen bonds.

Molecule design, applicability domain analysis, and activity prediction
Based on the findings from the QSAR studies, we obtained robust and highly predictive models and summarized the overall substituent property requirements for rational molecular design: (1) the arylamine group on the triazine should be retained, and any substitution on the aromatic ring should be made at position three or position four; (2) it is beneficial to keep the piperazine group, although potential benefits may be realized if piperazine is replaced with a bulky ring group capable of forming hydrogen bonds; (3) the aliphatic chain on the piperazine should be retained, and adding a chain terminal that is negative electrostatic favored and able to form hydrogen bonds may help achieve the drug development objectives.
Development of the models and molecular design requirements was followed by examination of 89 designed molecules to discover potential anti-malaria agents. Each designed molecule was processed using the same structural optimization approach used for the dataset compounds. Optimum prediction space analysis (Discovery Studio software) was performed to build the applicability domain for the created models. Thirty-seven of the designed compounds were evaluated as being unreliable for use with the created models. The remaining molecules were aligned with the molecule 35 reference for activity prediction, and compounds wmx09 and wmx25 were predicted to have better activity than molecule 35 (Table 9). We then examined these two compounds. A superimposed mapping analysis was performed to reveal the correlations between the contour maps and the designed molecules and to verify whether the two compounds were rationally predicted. As presented in Figure 4A and Figure 4B, the introduced cyano group of compound wmx09 is oriented toward the mixed green and red region. Its bulky and electron-withdrawing properties met the requirements. The results presented in Figure 4E and Figure 4F indicate that the amide substituent on the phenylamino group of compound wmx25 was located in a region of interlaced red and green contours. The electron-withdrawing oxygen atom is oriented directly toward a negative electrostatic favored region, indicating that position four on the aromatic ring was a rational placement for the amide group. The amide modification at the aliphatic chain terminal satisfactorily met the negative electrostatic favored and hydrogen bond forming requirements (red and purple regions) (Figure 4B, D, F and H).

Table 10: ADMET prediction outcomes for compounds wmx09 and wmx25
Compound  Absorption level  Solubility level  BBB level  CYP2D6 inhibition  Hepatotoxicity
wmx09     2                 2                 3          True               True
wmx25     1                 2                 3          False              True

ADMET prediction
ADMET investigation was performed for further molecular verification using "ADMET Descriptors" in the Discovery Studio software. The prediction outcomes for compounds wmx09 and wmx25 are presented in Table 10. Both compounds were predicted to have acceptable solubility at level 2 (-6.0 < log(Sw) < -4.1) and weak blood-brain barrier penetration. Compound wmx25 was predicted to have relatively better outcomes for CYP2D6 inhibition and intestinal absorption. However, both compounds were predicted to be potentially hepatotoxic, so additional structural changes should be made to reduce this toxicity. The "Toxicity Prediction" function in the Discovery Studio software was used for the molecular toxicity investigation. The results indicated that both compounds had acceptable toxicity characteristics. Compound wmx09 was predicted to be carcinogenic in female rodents based on National Toxicology Program criteria. Compound wmx25 was predicted to be a skin sensitizer. In general, acceptable ADMET results were obtained for both designed compounds. However, more work is necessary to improve their drug-like characteristics before they can qualify as lead compounds for further study.

DISCUSSION
The present study was designed and performed to screen for potential anti-malaria agents based on 4-aminoquinolinyl analogs. After a systematic model selection approach, the CoMFA and CoMSIA_EHA QSAR models built using molecular partition 02 were chosen for their relatively optimal statistical values. Careful analysis of the resulting contour maps provided informative clues leading to the overall requirements for molecular design. Subsequent activity prediction based on the chosen models helped us to identify two potential compounds (i.e., wmx09 and wmx25). The results of the superimposed mapping analysis reinforced the prediction outcomes by displaying several favorable interactions between the designed compounds and different contour regions. The ADMET evaluation also yielded acceptable results for compounds wmx09 and wmx25.

CONCLUSION
Based on a series of 4-aminoquinolinyl analogs, we built robust QSAR predictive models. The results of subsequent molecular screening studies indicated that compounds wmx09 and wmx25 have high potential as anti-malaria agents. Further research on these two compounds would have considerable scientific value.
