Quantitative Structure-Activity Relationship Analysis of the Anticonvulsant Activity of Some Benzylacetamides Based on Genetic Algorithm-Based Multiple Linear Regression

Purpose: To develop a quantitative structure-activity relationship (QSAR) for predicting the anticonvulsant activity of α-substituted acetamido-N-benzylacetamide derivatives. Methods: The AM1 semiempirical quantum chemical method was used to find the optimum 3D geometry of the studied molecules. Two types of molecular descriptors, the 2D autocorrelation and GETAWAY descriptors, were used to derive a quantitative relation between anticonvulsant activity and structural properties. The relevant molecular descriptors were selected by a genetic algorithm-based multiple linear regression (GA-MLR) approach. Results: The high value of the correlation coefficient, R² (0.900), indicates that the model was satisfactory. Conclusion: The proposed model has good stability, robustness and predictability when verified by internal and external validation.


INTRODUCTION
Epilepsy, a common neurological disorder characterized by recurrent spontaneous seizures arising from excessive electrical activity in some portion of the brain, is a worldwide public health problem that affects approximately 1% of the population [1]. Over the years, epilepsy has received a great deal of attention from investigators hoping to discover new drugs that are more effective and have minimal adverse effects. Although several new anticonvulsants have been introduced, some types of epilepsy are still not adequately controlled by current therapy. Adverse reactions and lack of efficacy for certain types of epilepsy are among the limitations of existing medications [2]. Antiepileptic drugs exert their action by different mechanisms, including enhancement of GABA-ergic neurotransmission and effects on neuronal voltage-gated sodium and/or calcium channels [3]. Quantitative structure-activity relationships (QSAR), a major tool in drug design, are mathematical equations relating chemical structures to their biological activity [4]. Anticonvulsant agents have been the subject of many SAR and QSAR studies [5][6][7][8][9][10][11][12][13][14][15][16]. Palludotto et al [5] synthesized a series of 2-aryl-2,5-dihydropyridazino[4,3-b]indol-3-one derivatives and tested them as central benzodiazepine receptor ligands. These workers applied 2D and 3D QSAR to these molecules and observed that the molar refractivity (MR) of the substituents was the major factor controlling the binding of the ligands to their receptors. Marone and coworkers [6] obtained a correlation between the theoretical descriptors of tricyclic neuro-active drugs and their biological mode of action.
Three-dimensional QSAR analyses of the anticonvulsant activity of a series of cinnamamides, using comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA), have been reported by Hou et al. [7]. These investigators found that the interaction of these compounds with receptors is achieved by electrostatic and hydrophobic forces. Marder et al [8] reported molecular modeling and QSAR analysis of flavone derivatives interacting with the benzodiazepine binding site. The electronic properties of the ligands were found to be the major factor affecting ligand-receptor binding [9].
In the present study, we aimed to develop QSAR equations for the anticonvulsant activity of a series of α-substituted acetamido-N-benzylacetamide drugs. We therefore used two types of molecular descriptors, 2D autocorrelation and GETAWAY parameters, to derive a quantitative relation between the anticonvulsant activity and structural descriptors by genetic algorithm-based multiple linear regression (GA-MLR).

Activity data
A data set of α-substituted acetamido-N-benzylacetamide drugs was used in this study. The ED50 (mg/kg) values for anticonvulsant activity, evaluated in the maximal electroshock seizure (MES) test by Kohn et al [17], were taken from the paper of Jin et al. [11]. The proposed benzylacetamide anticonvulsant pharmacophore consists of a vicinal diamine linkage; an oxygen atom on the ethylene chain bridging two amino groups; and an aromatic ring one carbon removed from an amide group.
The chemical structures and anticonvulsant activities of the studied molecules are given in Figure 1 and Table 1.

2D Autocorrelation approach
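Three spatial 2D autocorrelation (2DAUTO) vectors were employed for modeling [18].

Broto-Moreau's autocorrelation coefficients:

$$ATS_l = \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij} \, w_i \, w_j$$

Moran's indices:

$$I_l = \frac{\dfrac{1}{\Delta} \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij} \, (w_i - \bar{w})(w_j - \bar{w})}{\dfrac{1}{A} \sum_{i=1}^{A} (w_i - \bar{w})^2}$$

Geary's coefficients:

$$c_l = \frac{\dfrac{1}{2\Delta} \sum_{i=1}^{A} \sum_{j=1}^{A} \delta_{ij} \, (w_i - w_j)^2}{\dfrac{1}{A-1} \sum_{i=1}^{A} (w_i - \bar{w})^2}$$

where d_ij is the topological distance or spatial lag between atoms i and j, δ_ij equals 1 if d_ij = l and 0 otherwise, w_i is the weighting property of atom i and w̄ its average over the molecule, A is the number of atoms, and Δ is the sum of the δ_ij, i.e. the number of atom pairs at lag l.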
The 2D autocorrelation descriptors describe how the considered property is distributed along the topological structure. Autocorrelation vectors were calculated at spatial lags l ranging from 1 to 8. The physicochemical property was considered under four different weighting schemes: atomic masses (m), atomic van der Waals volumes (v), atomic Sanderson electronegativities (e), and atomic polarizabilities (p). The autocorrelation descriptors are denoted by the scheme: type of descriptor, spatial lag, weighting property; for instance, GATS5p is the Geary autocorrelation of lag 5 weighted by atomic polarizabilities.
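The three autocorrelation vectors can be illustrated with a minimal sketch. It assumes the standard double-sum forms with each atom pair counted in both orders, a hydrogen-free toy graph, and arbitrary weights; the function names are hypothetical, and packages such as Dragon may apply additional scaling or transformations, so the numbers are not expected to match Dragon output.

```python
from collections import deque

def topological_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) by breadth-first search."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[start][v] = d
    return dist

def autocorrelations(adjacency, w, lag):
    """Return (Broto-Moreau ATS, Moran I, Geary c) at one topological lag."""
    n = len(w)
    dist = topological_distances(adjacency)
    mean = sum(w) / n
    var_sum = sum((wi - mean) ** 2 for wi in w)
    # Ordered pairs (i, j), i != j, separated by exactly `lag` bonds.
    pairs = [(i, j) for i in range(n) for j in range(n)
             if i != j and dist[i][j] == lag]
    delta = len(pairs)
    ats = sum(w[i] * w[j] for i, j in pairs)
    if delta == 0 or var_sum == 0:
        # Assumed convention: undefined ratios are reported as zero.
        return ats, 0.0, 0.0
    moran = (sum((w[i] - mean) * (w[j] - mean) for i, j in pairs) / delta) / (var_sum / n)
    geary = (sum((w[i] - w[j]) ** 2 for i, j in pairs) / (2 * delta)) / (var_sum / (n - 1))
    return ats, moran, geary

# Toy 4-atom chain 0-1-2-3 with made-up atomic weights.
chain = [[1], [0, 2], [1, 3], [2]]
weights = [1.0, 2.0, 3.0, 4.0]
ats1, moran1, geary1 = autocorrelations(chain, weights, lag=1)
```

Moran and Geary values do not depend on whether pairs are counted once or twice, because the pair count Δ scales with the sum; the Broto-Moreau value does, which is why the counting convention is stated explicitly.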

GETAWAY approach
The GETAWAY descriptors [19] are recently proposed molecular descriptors derived from a new representation of molecular structure, the molecular influence matrix (MIM), denoted by H and defined as follows:

$$H = M \, (M^{T} M)^{-1} M^{T}$$

where M is the molecular matrix constituted by the centered Cartesian coordinates x, y, z of the molecule atoms (hydrogens included) in a chosen conformation, and the superscript T refers to the transposed matrix.
On the other hand, matrix R, a symmetrical matrix whose elements resemble the single terms in the sums of the gravitational indices, is defined as

$$[R]_{ij} = \frac{\sqrt{h_{ii} \, h_{jj}}}{r_{ij}}, \qquad i \neq j$$

where h_ii and h_jj are the leverages of the two considered atoms and r_ij is their geometric distance. The diagonal elements of matrix R are zero, and the largest values of its off-diagonal elements derive from atoms that are both among the most external (i.e., with high leverages) and next to each other in the molecular space (i.e., with a small interatomic distance).
Finally, note that in many of these H and R descriptors the molecule atoms are weighted so as to account for atomic mass, polarizability, van der Waals volume, and electronegativity, with the aim of incorporating relevant chemical information. Two sets of theoretically closely related molecular descriptors have been devised: H-GETAWAY descriptors are calculated from the MIM H, while R-GETAWAY descriptors are calculated from the influence/distance matrix R, in which the elements of the MIM are combined with those of the geometry matrix.
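The construction of the two matrices can be sketched as follows. This is a minimal illustration, assuming unweighted atoms and made-up coordinates; the function names are hypothetical, and the actual GETAWAY descriptors are weighted quantities derived from these matrices, which the sketch does not attempt to reproduce.

```python
import math

def centered(coords):
    """Subtract the coordinate means so the geometry is centered at the origin."""
    n = len(coords)
    means = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - means[k] for k in range(3)] for p in coords]

def inverse_3x3(a):
    """Inverse of a 3x3 matrix via the adjugate; raises if singular."""
    (a00, a01, a02), (a10, a11, a12), (a20, a21, a22) = a
    c00 = a11 * a22 - a12 * a21
    c01 = a12 * a20 - a10 * a22
    c02 = a10 * a21 - a11 * a20
    det = a00 * c00 + a01 * c01 + a02 * c02
    if abs(det) < 1e-12:
        raise ValueError("M^T M is singular (planar or linear geometry)")
    return [
        [c00 / det, (a02 * a21 - a01 * a22) / det, (a01 * a12 - a02 * a11) / det],
        [c01 / det, (a00 * a22 - a02 * a20) / det, (a02 * a10 - a00 * a12) / det],
        [c02 / det, (a01 * a20 - a00 * a21) / det, (a00 * a11 - a01 * a10) / det],
    ]

def influence_matrix(coords):
    """Molecular influence matrix H = M (M^T M)^-1 M^T."""
    m = centered(coords)
    n = len(m)
    mtm = [[sum(m[a][i] * m[a][j] for a in range(n)) for j in range(3)]
           for i in range(3)]
    inv = inverse_3x3(mtm)
    def entry(a, b):
        return sum(m[a][i] * inv[i][j] * m[b][j]
                   for i in range(3) for j in range(3))
    return [[entry(a, b) for b in range(n)] for a in range(n)]

def influence_distance_matrix(coords, h):
    """R_ij = sqrt(h_ii * h_jj) / r_ij for i != j, with a zero diagonal."""
    n = len(coords)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                lev = max(h[i][i], 0.0) * max(h[j][j], 0.0)  # guard round-off
                r[i][j] = math.sqrt(lev) / math.dist(coords[i], coords[j])
    return r

# Made-up tetrahedral arrangement: one central atom and four outer atoms.
atoms = [(0, 0, 0), (1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
H = influence_matrix(atoms)
R = influence_distance_matrix(atoms, H)
```

In this toy geometry the central atom has zero leverage and the four outer atoms share the rest equally, which matches the remark that high leverages belong to the most external atoms.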

Model development
HyperChem software was used to draw the chemical structures of the molecules. AM1 semi-empirical quantum-chemical calculation was used to optimize the 3D geometry of the molecules. Geometry optimization proceeded with the Polak-Ribière algorithm until the root mean square gradient reached 0.01. Dragon [20] computer software was employed to calculate the 2DAUTO and GETAWAY molecular descriptors. The calculated descriptors were gathered in a data matrix. First, the descriptors were checked for constant or near-constant values, and those detected were discarded from the original data matrix.
Then, the descriptors were correlated with each other and with the activity data. Among collinear descriptors, the one with the lowest correlation with anticonvulsant activity was removed from the data matrix. Multiple linear regression (MLR) was used to derive the QSAR equation, and feature selection was performed with a genetic algorithm (GA).
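The two pre-filtering steps can be sketched as below. The variance tolerance `var_tol`, the collinearity cutoff `r_max`, the function names and the toy data are all assumptions made for illustration; the source does not state the thresholds that were actually used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def prune_descriptors(columns, activity, var_tol=1e-8, r_max=0.9):
    """columns maps descriptor name -> list of values; returns surviving names."""
    # Step 1: discard constant or near-constant descriptors.
    kept = {}
    for name, vals in columns.items():
        mean = sum(vals) / len(vals)
        if sum((v - mean) ** 2 for v in vals) / len(vals) > var_tol:
            kept[name] = vals
    # Step 2: rank by |r| with activity, then accept a descriptor only if it
    # is not collinear with one already accepted. Within a collinear pair,
    # the member less correlated with activity is therefore the one removed.
    r_act = {name: abs(pearson(vals, activity)) for name, vals in kept.items()}
    ranked = sorted(kept, key=lambda name: r_act[name], reverse=True)
    selected = []
    for name in ranked:
        if all(abs(pearson(kept[name], kept[s])) < r_max for s in selected):
            selected.append(name)
    return selected

# Toy data: C is constant, B is nearly a multiple of A.
toy_columns = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.5],
    "C": [7.0, 7.0, 7.0, 7.0, 7.0],
    "D": [5.0, 1.0, 4.0, 2.0, 3.0],
}
toy_activity = [1.0, 2.0, 3.0, 4.0, 5.0]
survivors = prune_descriptors(toy_columns, toy_activity)
```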
A genetic algorithm is a simple optimization method based on the simulation of natural genetics and evolution, and it has been applied effectively to many types of optimization problems in various scientific fields. The genetic algorithm used was the same as that used previously [21,22]. Each individual of the population was defined by a chromosome of binary values representing a subset of descriptors. A gene took a value of 1 if its corresponding descriptor was included in the subset; otherwise it took a value of zero. The number of genes in each chromosome was equal to the number of descriptors. The initial population was created randomly. The population size was varied between 50 and 250 for different GA runs. The resulting models were validated by leave-one-out (LOO) cross-validation to check their predictability and robustness. Table 2 presents the notation and a short description of the molecular descriptors used to generate the QSAR model.
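The binary-chromosome encoding can be illustrated with a minimal sketch. The operators chosen here (tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter values, the descriptor labels and the toy fitness are all assumptions; the GA of refs [21,22] is not described in this section and may differ. In a GA-MLR setting the fitness would be a regression statistic of the MLR model built on the selected descriptors.

```python
import random

def run_ga(n_genes, fitness, pop_size=50, generations=40, p_mut=0.05, seed=0):
    """Maximize `fitness` over binary chromosomes of length n_genes (>= 2)."""
    rng = random.Random(seed)
    # Random initial population: gene = 1 means the descriptor is included.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the current best always survives
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_genes)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Hypothetical descriptor labels and a toy fitness that rewards two of them
# and penalizes every other selected descriptor.
names = ["D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8"]
useful = {1, 4}

def toy_fitness(chrom):
    hits = sum(chrom[i] for i in useful)
    extras = sum(g for i, g in enumerate(chrom) if i not in useful)
    return hits - 0.5 * extras

best = run_ga(len(names), toy_fitness)
selected = [name for name, gene in zip(names, best) if gene]
```

Decoding the best chromosome back to descriptor names, as in the last line, is what turns a GA result into a candidate regression model.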

RESULTS
By using the genetic algorithm-based multiple linear regression (GA-MLR) method, correlations performed for the whole set provided the optimal equations for different numbers of descriptors in the range of 1-6. Figure 1 shows the plots of R², Q² and s² (squared standard deviation) as a function of the number of variables in the regression model. The statistical parameters of the model, including the standard error of estimation (SE), root-mean-square error (RMS), Fisher statistic ratio (F) and LOO cross-validation coefficient (Q²), are given in Eq. 8. As can be seen, the MLR model has good statistical quality with low prediction error. The activities predicted by the GA-MLR regression method are listed in Table 1.
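How R² and the LOO Q² are obtained from an MLR fit can be sketched as follows. The sketch assumes ordinary least squares solved through the normal equations and the common definition Q² = 1 - PRESS/SS_tot; the function names and the toy data are hypothetical and unrelated to the compounds in Table 1.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, ...] via normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def r_squared(X, y):
    """Fitted R^2 = 1 - SS_res / SS_tot on the data used for fitting."""
    coef = fit_mlr(X, y)
    mean = sum(y) / len(y)
    ss_res = sum((yi - predict(coef, r)) ** 2 for r, yi in zip(X, y))
    return 1 - ss_res / sum((yi - mean) ** 2 for yi in y)

def q2_loo(X, y):
    """Leave-one-out Q^2: each sample is predicted by a model fitted without it."""
    mean = sum(y) / len(y)
    press = 0.0
    for k in range(len(y)):
        coef = fit_mlr(X[:k] + X[k + 1:], y[:k] + y[k + 1:])
        press += (y[k] - predict(coef, X[k])) ** 2
    return 1 - press / sum((yi - mean) ** 2 for yi in y)

# Toy data with two descriptors; y_noisy perturbs y = 2 + 3*x1 - x2.
X_toy = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 7], [6, 4]]
y_noisy = [3.2, 6.7, 6.1, 11.4, 9.8, 16.1]
r2, q2 = r_squared(X_toy, y_noisy), q2_loo(X_toy, y_noisy)
```

With these definitions Q² can never exceed R² for the same model, so a small gap between the two is what indicates a robust fit.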
The robustness of the model and its predictive ability for anticonvulsant activity were evaluated by both LOO cross-validation and external validation procedures. To estimate the predictive power of the GA-MLR model, an external validation test was performed by splitting the data into two sub-samples, one used to fit the model (training set) and the other to test it (test set). Models are generated from the training set compounds, and their predictive capacity is judged by the predictive R² values. Selection of the training set compounds is very important in QSAR analysis. One of the most widely used methods for dividing a data set into training and test sets is simple random selection.
In the external validation procedure, 10 analogues were randomly selected and removed from the data set as unknown test samples. The model was then fitted on the training set of the remaining 25 samples, and log(ED50) of the removed samples was predicted using the MLR model. The test set is marked with superscript a in Table 1.
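The random 25/10 split and the predictive R² can be sketched as below. The seed, the function names, and the definition of predictive R² relative to the training-set mean are assumptions; the source does not give the exact formula or the random selection that was used.

```python
import random

def random_split(n_samples, n_test, seed=0):
    """Randomly assign sample indices to a training set and a test set."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def predictive_r2(y_test, y_pred, y_train_mean):
    """1 - sum((y - y_pred)^2) / sum((y - mean of training y)^2) on the test set."""
    ss_res = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    ss_tot = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1 - ss_res / ss_tot

# 35 compounds in total: 25 for fitting, 10 held out as external test samples.
train_idx, test_idx = random_split(35, 10)
```

The model coefficients are estimated from `train_idx` only; the held-out values in `test_idx` enter solely through `predictive_r2`.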

DISCUSSION
The proposed QSAR model, owing to its high predictive ability, can therefore act as a useful aid to the costly and time-consuming experiments for determining anticonvulsant activity in the maximal electroshock seizure (MES) test. We first tried to identify descriptor trends that lead to anticonvulsant activity based on the proposed QSAR equation. The QSAR model of α-substituted acetamido-N-benzylacetamide derivatives (Table 1) contains six descriptors from the 2DAUTO and GETAWAY classes (Table 2). As mentioned before, Moran autocorrelation of lag 6 weighted by Sanderson electronegativities (MATS6e) is the most important variable for predicting the anticonvulsant activity. The remaining five descriptors involve summations of different functions corresponding to different fragment lengths, with polarizability (p), electronegativity (e), volume (v) and mass (m) as the weighting parameters (Table 2).

CONCLUSION
In summary, multivariate linear QSAR models were obtained by the MLR method combined with a genetic algorithm for variable selection (GA-MLR). We have shown that the 2DAUTO and GETAWAY descriptors are able to describe the anticonvulsant activity of different α-substituted acetamido-N-benzylacetamide derivatives. The proposed models have good stability, robustness and predictability when verified by internal validation (LOO-CV) and by external validation.