Application of Group-Based QSAR and Molecular Docking in the Design of Insulin-Like Growth Factor Antagonists

Purpose: To identify the structural requirements for designing a lead key for insulin-like growth factor (IGF-1R) inhibition using group-based quantitative structure activity relationship (GQSAR) and molecular docking. Methods: GQSAR method requires fragmentation of molecules. The molecules in the current dataset were fragmented into three (R1, R2 and R3) by applying common fragmentation pattern, and fragment-based 2D descriptors were then calculated. GQSAR models were derived by applying various methods including multiple linear regressions and partial least square or k-nearest neighbour. Results: Four generated GQSAR models were selected based on the statistical significance of the model. It was found that the presence of flexible and non-aromatic groups on fragment R1 was conducive for inhibition. Additionally, the existence of amino groups as hydrogen bond donors at fragments R2 and R3 was fruitful for inhibition. Docking studies revealed the binding orientation adopted by the active compounds at several amino acid residues, including Met 1126, Arg, 1128, Met 1052, GLU 1050, Met 1112, Leu 1051, Met 1049, Val 1033, and Val 983 at ATP binding sites of IGF-1R kinase domain. Conclusion: The generated models provide a site-specific insight into the structural requirements for IGF-1R inhibition which can be used to design and develop potent inhibitors.


INTRODUCTION
Insulin-like growth factor-1 receptor (IGF-1R) is a tyrosine kinase of the insulin receptor family with hetero-tetrameric α 2 β 2 subunits.Alpha subunit is located extracellularly and contains the binding domain, while the β subunit is mainly intracellular with a small portion spanning across the membranes.The latter subunits are responsible for intracellular signal transduction due to tyrosine kinase domain and associated motifs [1].IGF-1 receptors are located in all brain tissues such as median eminence, hippocampus, and cortex [2].
Physiological activation of IGF-1R, which activates PI3K/Akt/mTOR, and Ras/Raf/MEK/ ERK pathways only occurs upon binding of IGF-1 or IGF-2 ligands to the receptor.Conversely, ligand-dependent activation of IGF-1R is bypassed in many solid and hematological cancers leading to continuous activation of PI3K/Akt/mTOR and Ras/Raf/MEK/ERK pathways that promote proliferation and inhibition of apoptosis [3].IGF-1R has been widely targeted to decrease cellular proliferation and promote apoptosis in cancers [4,5].Crystallographic structures of IGF-1R cocrystallized with various ligands have been identified [3,[6][7][8].The biological activity of these ligands depends on the interactions with certain residues in the ATP binding sites of IGF-1R kinase domain, which serve as hydrogen bond donors, acceptors, or hydrophobic pockets.Despite the success in the design of IGF-1R inhibitors, there is a need to optimize the lead compounds to efficient drug compounds.
Fragment-based lead discovery (FBLD) has shown promise in current drug discovery and lead optimization.The main concept of this method is to drive new molecules by combining fragments that are determined based on ligandtarget interaction information [9].Recently, successful application of group-based quantitative structure activity relationship (GQSAR) for lead optimization have been reported [10].In GQSAR, molecules are fragmented, based on specific molecular sites, bonds, and rings or even interactions with the target, and then the 2D fragment-based descriptors are calculated.The models generated by GQSAR provide hints on the impact of fragment variation on activity [11].
In this study, GQSAR was performed using various compounds from different literature sources with the aim of designing potent IGF-1R inhibitors.To further elucidate the binding modes of the generated compounds, high-throughput virtual screening (HTVS) and extra-precision docking (XP) of compounds from the developed GQSAR models were carried out using the Glide module of Schrodinger LLC.

Group-based quantitative structure activity relationship (GQSAR)
The Hansch method of QSAR assumes that groups at different substitution sites of congeneric series do not interact with each other; on the contrary, conventional molecular descriptors currently in use capture this interaction by calculating them for the whole molecule.
However, for this method, interpretation of generated QSAR equation poses significant challenges.In view of this, the new G-QSAR approach, which uses established 2-D/3-D descriptors for the fragments and also includes interaction of fragments using cross terms, can provide promising results.Cross terms added are descriptors evaluated as a product of vectors of fragment descriptors.G-QSAR method was very recently proposed, [11] and differs from conventional fragment based QSAR methods in two ways: 1. Fragmentation of each molecule in the dataset is done with a set of predefined rules.2. G-QSAR method considers cross/interaction terms as descriptors to account for the fragment interactions.

Software
GQSAR modeling was executed using the Molecular Design Suite (VLifeMDS software package, version 4.1, from Vlife Sciences Technologies Pvt Ltd, India) on a Microsoft Windows 7 operating system.

Dataset collection and fragmentation
A dataset of 164 IGF-1R inhibitors with their IC 50 values were collected from a Binding-DB database [8,12,13].The biological activity values (IC 50 ) were converted into the negative logarithmic pIC (-logIC 50 +6).The structures were energy minimized using Merck Molecular Force Field (MMFF) using convergence criterion (RMS gradient) of 0.01 kcal/mol and maximum number of cycles of 100 [14].
All molecules were then fragmented into three different fragments as shown in Figure 1.

Fragment R2:
It is formed by aromatic heterocyclic ring (chemical scaffold) structure after Fragment R1.

Fragment-based molecular descriptors calculation
Individual physicochemical descriptors like molecular weight, hydrogen bond donors and acceptors, retention index (chi), path count, estate numbers, atomic valence connectivity index (chiv), polar surface area, etc were calculated for all fragment-based descriptors.The important descriptors found fruitful according to the developed model can be defined as follows: 1. chi1: The retention index (first order) derived directly from gradient retention times.2. chi3Cluster: Third order cluster chi index.3. chiV6chain: Atomic valence connectivity index for six membered ring.

chiV4pathCluster:
Valence molecular connectivity index of 3rd order path cluster.

Dividing the dataset into training and test sets
Data was divided into training and test sets using sphere exclusion algorithms with a dissimilarity level of (+1) [15].To ensure the goodness of data distribution across the two sets, uni-column statistics parameters of each set (maximum value, minimum value, average and standard deviation) were evaluated.
The model was considered to have a significant predictivity when the squared correlation coefficient (r 2 ) between descriptors and activity (pIC) was more than 0.6.Similarly, the models were considered to possess significant internal and external predictivity when the cross-validated correlation coefficient of the leave-one-out method (q 2 ) > 0.6 and the correlation coefficient of the training set (pred_r 2 ) > 0.5 [10].

Molecular docking of the GQSAR-generated compounds
In addition to GQSAR analysis of the generated compounds, we explored the use of High throughput virtual screening (HTVS) and extraprecision (XP) docking to compare the binding mode as well as docking scores of the highly active BMS1112 and poorly active compounds BMS1605 in comparison to the native ligand BMS_754805 in the 3I81 crystal structure.

Protein preparation and minimization
The coordinate of 3I81 crystal structure was obtained from the RCSB Data Bank (PDB) [8] and prepared using Protein Preparation Wizard, which is a part of the Maestro software package (Maestro, v9.3, Schrodinger, LLC, New York, NY, 2012).Briefly, all formal charges and bond orders were added for heteroatoms.Hydrogen atoms were also added to all atoms of the protein crystal structure.All water molecules were removed and the protein was then minimized using the refinement module in Prime (Prime, v3.1, Schrodinger, LLC, New York, NY, 2012).

Ligands preparation, refinement, and docking
Using LigPrep (LigPrep, v2.5, Schrödinger, LLC, New York, NY, 2012), all ligands were converted into low energy 3D structures by adding hydrogen atoms, neutralizing charged groups, and generating ionization and tautomeric states.A rectangular box was used to define the binding site of the ligands in 3I81 receptor and glide energy grid was generated using Glide Receptor Grid Generation workflow (Glide, v5.8 Schrödinger, LLC, New York, NY, 2012).High throughput virtual screening (HTVS) and extraprecision (XP) docking were carried out using the default Glide settings.The virtual screening workflow allows screening of a large number of ligands in three stages.The grid-based ligand docking utilized here allows the approximation of positions, orientations, and conformations of the ligand in the 3I81-binding pocket through many hierarchical filters.The scored docking parameters include total GlideScore (GScore): sum of XP terms excluding Epik state penalties, total docking score (Dockscore): sum of XP terms, lipophilicEvdW and Hydrogen bonding (HBond): ChemScore H-bond pair term.The docking mode of the most active compounds BMS1112 and an inactive compound BMS1605 in the active site of 3I81 were studied in terms of functional groups interactions with amino acid residues in the ATP binding site.The docking result of these compounds was compared to the docking results of the original co-crystalized ligand BMS_754807 in 3I81.

Selection of training and test sets
Fragment-based molecular descriptor calculations resulted in a pool of 325 different two-dimensional descriptors divided into 110, 100, and 115 descriptors for fragment R1, R2, and R3, respectively.The sphere exclusion method with a dissimilarity value of +1 resulted in a training set of 117 compounds and a test set of 47 compounds.The statistical parameters of both sets show that activity was evenly distributed within both sets; i.e. the average pIC 50 values for the training and test sets are 4.25 and 4.73, respectively.The test set (maximum and minimum pIC 50 values are 6.39 and 2.23) represents the whole training set (maximum and minimum pIC 50 values are 6.52 and 1.86).Multiple models were built using various modelbuilding and variable-selection methods.Table 1 shows the statistical parameters for each model.
The simulated annealing variable selection method coupled with multiple linear regression using 117/47 compounds as the training/test set resulted in a statistically significant model having a predictive activity of r 2 = 72.80%.Internal and external validation revealed a significant predictive power of 61.70 % and 81.24 %, respectively.The regression plot of predicted versus actual activity and the percentage contribution plot of descriptors are displayed in Figure 2A.Table 2 shows the percentage contribution of each 2D descriptor found in model 1.

Model 4 (SA/kNN)
The classification method of k-nearest neighbour was applied in combination with simulated annealing algorithms for generating a new model for IGF-1R inhibitors.Distance-based weighted average method was used for the prediction of biological activities of the molecules.This study led to a statistically significant GQSAR model with eight descriptors using the five nearest neighbours.The k-NN-generated model provided ranges (maximum and minimum) for eight different descriptors, as presented in Table 3.

Docking studies of best active compound and moderately active compound
The docking results showed two GQSAR generated compounds 608 and 1112 with docking scores of -10.10 and -9.72 respectively while the native ligand BMS_754807 scored -9.29 which is slightly lower score.Other compounds such as BMS_1126 and BMS_1148 scored -9.22 and 8.72 respectively.These scores are comparable to the native ligand's score.In order to understand the binding mode of the most active compound (BMS_1112) in comparison to the native ligand (BMS_754807) and to obtain the binding information for further structural optimization, we conducted molecular docking using Glide.We utilized Glide XP extra precision mode which allows flexible docking of ligands.As shown in Figure 3, the docking poses of compound 1112 at the 3I81 receptor active site (IGF-1R) shows interaction between fragment R1 and MET 1126 and ARG 1128 and between fragment R3 and MET 1052 and GLU 1050 on the hinge.The amide group of fragment R1 donates hydrogen to Met 1126 and Arg 1128.No hydrophobic interactions were observed between R1 and the active site of IGF-1R.Binding mode of moderately active compound BMS-1605 shows hydrophobic interactions between the phenyl ring of fragment R1 with VAL 983, ME1049, and VAL 1033 on the IGF-1R.Lastly the docking pose of the co-crystallize BMS-754807 ligand forms hydrogen bonds to GLU 1050 and MET 1052 at the hinge.No hydrogen bonds are formed with the other residues in the active site.

DISCUSSION
Group-based quantitative structure activity relationship (GQSAR)

Model 1 (SA/MLR)
As shown in Figure 2A and Table 2, model 1 indicates that structural modifications on fragment R2 are critical for generating potent IGF-1R inhibitors.Negatively contributing features like oxygen count (-35 %), N-oxides (-18 %) and single bonded sulfur atoms are detrimental for activity.
Remarkably, all of the above structures (O, S and N oxides) can be considered as hydrogen bond acceptors since they have lone pair electrons.Thus, we can deduce that the actual role of the scaffolds in ligand-target interaction is most likely different from being hydrogen bond acceptors.
On the other hand, the current model attributes about a quarter of the variation in biological activity to structural variation on fragment R3.Number of carbon atoms connected to one hydrogen atom and two aromatic bonds (SaaCHcount) are detrimental to the activity leading to the conclusion that aromatic hydrocarbons in fragment R3 are harmful to the inhibitory activity.Therefore, multi-nitrogencontaining heteroaromatic rings like triazine are expected to provide better inhibition than aromatic rings such as benzene, pyridine and pyrimidine.Electro-topological state indices of NH 2 connected to one single bond is conducive to bioactivity leading to speculate that primary amines are fruitful for inhibition, most likely via hydrogen bonding, either by donating or accepting hydrogen atoms, provided that the nitrogen atom is connected to electropositive groups.Third, information-based descriptors (IdwAverage) have a positive contribution of 8.5 % reflecting the effect of entropic interaction fields of fragment R3 to the free energy of drugtarget binding, which depends on the hydrophobic and hydrogen bonding properties of both.
Accordingly, we can surmise that fragment R3 of the ligand is crucial for binding to IGF-1R.The XlogP (-7 %) and NH group connected to two single bonds (7.5 %) of fragment R1 suggest that hydrophobic interactions, along with hydrogen donor of R1, are not favorable for inhibition.Consequently, aromatic rings containing nitrogen atoms in fragment R1 are more preferable for activity than their saturated counterparts.

Model 2 (SA/MLR)
As shown in Figure 2B and The positive contribution of this descriptor states that nitrogen-containing five-membered heteroaromatic rings (pyrrole, imidazole and triazole) are preferable as scaffolds.The terms found to be contributing to fragment R1, SaaCHE-index (~-11.5 %) and SdsCHcount (~-12.5 %), are unfavorable contributors while Chi1 (7.5 %) is a positive contributor towards activity.Collectively, the descriptors indicate double bonded CH group is not recommended in this fragment.

Model 4 (SA/kNN)
As shown in Table 2, the molecular descriptors suggest a key role of aromatic carbon (SaaCHcount and SaasCE-index) at fragment R1 and R3 along with terms like K3alpha at fragment R1.In addition, nitrogen count, volume, and chiV6 chain at R2, and chi3cluster and double bonded carbon count at R3 were classified as contributors.The ranges could be of a critical value when searching in the fragment library for designing new inhibitors.A fitness plot of the actual and predicted activities by this model is displayed in Figure 2D.

Molecular docking of the GQSAR-generated compounds
The docking results show some GQSAR generated compounds including BMS_608 and BMS_1112 scoring better than the native ligand BMS-754807.BMS_1112 formed three hydrogen bonds with the hinge in addition to forming three hydrogen bonds with ARG 1054, MET 1126 and ARG 1128, while the co-crystallize BMS-754807 ligand formed three hydrogen bonds to GLU 1050 and MET 1052 at the hinge without hydrogen bonding with other residues.This result revealed the importance of the BMS_1112's two hydroxyl groups, which donated two hydrogens to GLU1050 and accepted hydrogen from MET1052.The BMS_1112 also donated hydrogen atoms in the formation of non-hinge hydrogen bonds with ARG1128, MET1126 and ARG 1054 highlighting the availability of two amides in donating hydrogens to form many hydrogen bonds.
The co-crystallized BMS-754807 ligand lacks the non-hinge hydrogen bonds.Furthermore as shown in figure 3A, the docking analysis of fragment R1 of the BMS_1112 at the IGF-1R active site shows the amide group of fragment R1 donates hydrogen to Met 1126 and Arg 1128.No hydrophobic interactions were observed between R1 and the active site of IGF-1R.This is in agreement with the results of GQSAR models.
Furthermore, the analysis of fragment R3 of BMS_1112 shows interactions between hydroxyl groups on R3 with Met 1052 and GLU 1050, which are strategically located on the hinge region.There were also hydrophobic interactions between R3 and Met 1112, Leu 1051, Ala 1001, Met 1052, Met 1049, Val 1033, and Val 983.The large hydrophobic surfaces and three hydrogen bonding can increase the entropic interaction field of R3 that is related to information based descriptors (IdwAverage) mentioned in the GQSAR models.

CONCLUSION
All generated GQSAR models were statistically significant, and the consensus of all models provide site-specific clues for design of new IGF-1R inhibitors.Non-hydrophobic substituents with branching and low NH count at fragment R1 are found to be essential for better activity.Scaffolds of the molecules that form the fragment R2 can be modified by decreasing oxygen atoms and increasing hydrogen bond donors.GQSAR studies suggest that the major contribution towards activity is from substituents at R3 that have less aromatic carbon count and more NH 2 groups.
In searching for novel and potent compounds, fragments that fit within the requirements of the GQSAR models can be selected and grown to generate a library of compounds which can be synthetically evaluated for IGF-1R inhibition.Molecular docking results of the GQSARgenerated compounds showed that the molecules of a high predicted activity possess interaction terms in good agreements with the developed models (hydrogen bonding at R1 and hydrophobic interaction and hydrogen bonding at R3).Some of the generated molecules rewarded high docking scores and efficient interaction features with the hinge, which can be strong candidates for future investigation.

Figure 1 :
Figure 1: Fragmentation pattern.The dataset was grouped into different chemical structures based on common chemical parts that were mainly aromatic heterocyclic ring named as the scaffold or fragment R2 (red circle).The 3D structures were fragmented into three fragments R1 (blue circle), R2 (red circle), R3 (yellow circle)

Figure 3 :
Figure 3: Binding mode of the best active compound BMS-1112 compared to the native ligand BMS_754807 in the ATP binding site of IGF-1R (PDB: 3I81) crystal structure.All the ligands were docked into IGF-1R receptor using Glide XP function.(A) The docking poses of compound 1112.(B) Docking pose of co-crystallize BMS-754807 ligand.The ligand interaction diagrams (LID) of the ligands are shown highlighting the hydrophobic interactions, residues charge states and polarity

Table 1 :
Statistical parameters for the developed GQSAR models 2 : squared correlation coefficient; q 2 : cross-validated correlation coefficient of leave-one-out method (internal validation); pred_r 2 : correlation coefficient of the training set (external validation); se: standard error; SA: simulated annealing; MLR: multiple linear regression; PLSR: partial least square regression; kNN: k nearest neighbor

Table 2 :
The percentage contribution of important 2D descriptors found in the developed G-QSAR models

Table 3 :
Contribution range for fragment descriptors found in model 4