A comprehensive computational mutation structure- function approach for determining potential drug target sites in poliovirus 2A protease

Purpose: To investigate a computational approach for analysing the structure-function relationship of poliovirus 2A protease using various bioinformatics tools. Methods: The three-dimensional structure of 2Apro was modelled and analyzed using the crystal structure of coxsakievirus B4 as a template to understand the function of this protein. Structural validation programs, VADAR and QMEAN, were used to verify the 2Apro model. Analysis of protein stability changes in poliovirus 2A protease-mutated sequences using various servers was also performed. Furthermore, mutation pattern, intrinsic disorder regions (IDRs), hydrophobic regions, drug binding sites (DBS) and subcellular localization were identified. Results: Hydrophobicity results confirmed the suitability and reliability of 2A protease as a potential drug target. Less IDRs were observed in the protein. Moreover, the results showed the presence of various important drug binding targets among conserved regions of the protease. The predicted drug binding sites indicate their suitability for the inhibition and development of anti-viral drugs against poliovirus 2A protease. Conclusion: The current study resulted in the detection of important ligand interactions with respect to the binding site of the targeted protein. Thus, these compounds may be potent drug candidates and their potency may be increased against poliovirus 2A protease with relatively simple structural changes.


INTRODUCTION
Viral proteins involved in the cellular functions and manipulation of cellular processes by viruses have gained significant attention from the pharmaceutical and biotechnology industries as a potential source of drug targets [1]. Poliovirus proteinases are excellent substrates for drug development due to their key roles in the replication cycle.
Poliovirus (PV) of the Picornaviridae, has been categorized into three stable subtypes (1, 2 and 3). The PV genome comprises +ss RNA of approximately 7,440 nucleotides. The genome is packaged in a non-enveloped icosahedral protein capsid [6]. PV replication is carried out through a single open reading frame termed a polyprotein. Two cysteine proteases (2A and 3C) are involved in proteolytic processing of polyprotein, which are crucial for viral life cycle [7]. The poliomyelitis epidemic resulted in the discovery of two important vaccines, OPV and IPV. No drug is available for treating poliomyelitis.

PV 2A
pro plays a leading role in autocatalytic cleavage process of poliovirus polyprotein. 2A pro is involved in cleavage of eIF4G (eukaryotic translation initiation factor 4G [8]. Studies have demonstrated the role of poliovirus 2A pro in activating the apoptotic process in PV-infected cells, and it also exhibits anti-apoptotic activity [9]. The poliovirus 2A pro crystal structure is unknown; however, it is possibly very close to the known structure of HRV2 2A pro [10]. Research on 2A pro inhibitors has been limited so far. However, a number of peptide-based inhibitors have been developed against 2A pro . Anti-2A inhibitors are expected to show the additional advantage of suppressing genetic reversion along with negative effect son capsid formation [11]. Poliovirus proteases display minimal similarity to known mammalian enzymes, making them excellent drug targets [12]. The aim of this study was to explore poliovirus 2A protease as a probable drug target through computational approaches. Sequences in which mutations were already reported were used [13]. Analysis was performed to investigate drug target sites of 2Apro. We also investigated the effect of mutations on protein stability. The degree of conservation was mapped on the 3D structure of the proteins and compared with the novel predicted intrinsic disordered regions (IDRs) of poliovirus 2A protease (Figure 1).

EXPERIMENTAL Sequence retrieval
The 2A protease was amplified from blood samples (001 and 002) (from polio patients in the KP district of Pakistan with the help of Frontier Development Organization) then cloned into a TA cloning vector (PCR 2.1) and sent for sequencing (Eurofin, USA). The sequences were analyzed and submitted to Genbank (KF193869 and KF193870 for 001 and 002 respectively) [13]. Total of four mutations in the poliovirus 2A protease were observed after sequencing. Alanine residue (at position 12) was mutated into glycine, while wild type aspartic acid residue at position 36 was mutated into asparagine and normal serine was mutated to threonine at position 134 and isoleucine mutated to valine at amino acid position 136.

Homology modelling and model reputation
A PDB file of 001 and 002 samples of 2A protease was generated by the Swiss-Model server [14]. The structure of coxsackievirus B4 (1z8r) model was selected as a template to obtain the 2A model. The backbone conformation of the modelled structure of 001 and 002 proteases was calculated by investigating the torsion angles via Ramachandran plot version 2.0 [15]. Further analysis of the predicted model was done by QMEAN [16] and ProSA [17]. VADAR was performed to determine the volume area dihedral angle for fractional accessible surface area [18].

Protein structure prediction
I-Tasser [19] server was used to predict protein structure by giving precise structural and functional predictions by means of algorithms [20]. A total of five models were obtained based on the five largest structure clusters. Confidence scores (C-score) were calculated based on the significance of threading template alignments. Cscores lied in the range of [-5, 2], where a higher value indicates a model with a high confidence and vice-versa. A TM-score (template modelling score) of > 0.5 predicts a model of correct topology while a TM-score < 0.17 specifies a random similarity. 2A pro sequences containing mutations were subjected to structure prediction by I-Tasser.

Protein stability prediction
I-Mutant 2.0 [21] network predicts protein stability changes upon single point mutation from protein sequence/structure [22]. Output is obtained in the form of protein stability change and Gibbs-free energy change (DDG) using Eq 1. DDG = DG (mutant Protein) -DG (wild-type, kcal/mol) ..… (1) Furthermore, STRUM was used to predict fold stability change (ΔΔG) of protein molecules upon single-point mutations [23].

EASE-MM
sequence-based prediction of mutation-induced stability changes [24], was also used to affirm the stability of the protein upon mutations. Results are displayed in the form of an increase or decrease in stability upon mutation induction.

Protein intrinsic disorder region (IDR) analysis
Variations in IDR structural properties can play a regulatory role in protein activity. The IDRs of the 001 and 002 poliovirus 2A protease, were estimated using four software packages: MetaPrDOS [25], DisEMBL [26], PSIPRED [27] and IUPred [28]. A disorder score (0-1) for each amino acid position was calculated. Position of an amino acid was considered IDR if its disorder score was observed above the cut-off value of 0.5.

Prediction of hydrophobicity
Prediction of overall hydrophobic residues of 001 and 002 poliovirus 2A protease was performed using the TM Finder tool [29]. The TM-Finder predicts trans-membrane proteins. The regions are predicted to be trans-membrane based on the hydrophobicity and helicity of the adjacent amino acid sequences. The threshold value for hydrophobicity is 0.4 and values exceeding the threshold level predict the transmembrane nature of the protein [29]. Hydrophobicity results of the protein were also confirmed using the PEPTIDE.2 server.

Prediction protein-ligand-drug binding site
Tertiary structures of 001 and 002 poliovirus 2A protease were analyzed to predict the number of drug binding sites. The online servers COACH [19,30] I-Tasser were used to determine the number of binding sites and their positions. COACH is a meta-server approach for the prediction for ligand binding targets through two comparative methods TM-SITE [31] and S-SITE. The BioLiP protein function database is used in these methods to recognize ligand binding sites. These predictions were later combined with the results obtained from other methods including ConCavity [32], FINDSITE [33] and COFACTOR [20]. To predict the final ligand binding site of the protein, PDB structures of 001 and 002 poliovirus 2A protease were given to the server for analysis. Complete results were given in the form of the position of the potential drug binding site.

Subcellular localization analysis
To understand a protein's function and its suitability as a vaccine/target, subcellular localization analysis is performed. Cytoplasmic proteins tend to have more potential as drug targets compared to surface membrane proteins [34]. The CELLO v.2.5 [35] server was employed to predict subcellular localization of poliovirus 2A protease. The results were further validated with predictions obtained from Virus-mPLoc [36], TOPCONS [37] and TMHMM [38].

Structures of 001 and 002 PV2A protease models
The Swiss-Model predicts the 3D structure of both the models on the basis of known crystal structure of the protein (Figure 2). The maximum scoring and validated model for PV2A protease exhibiting the highest sequence identity with the crystal structure was coxsackievirus B4 (PDB ID: 1Z8R.A).
The stereo-chemical attributes of the predicted protein models were investigated through QMEAN (Table 1) and evaluated by Ramachandran plots (Figure 3).
The QMEAN4 and QMEAN6 score [39] of the model was close to the value of 0, showing the reliability of the model (model reliability in between 0 and 1; Table 1). Predicted structure analysis delivered solid evidence of the good quality of predicted 3D structure of poliovirus 2A protease.
Protein model quality and accessible surface area (ASA) was predicted through VADAR. Accessible surface area to water molecules was measured as fractional ASA range (0 to 1). Results of ASA showed that the major portion of the protein is hydrophobic, having ASA scores less than 0.8 signifying tight folding of the protein (Figure 4).

Predicted drug binding sites (DBS)
DBS analysis predicted functional ligand binding sites. All of these binding site residues of both models were also confirmed when predicted  Figure 6C and 6D).

Intrinsic disorder region prediction
To understand protein folding and function, precise prediction of a protein's tendency to have intrinsic disordered regions is an essential step. The current investigation used four software packages to identify disordered positions based on the majority voting by these software tools. In this prediction server, a residue is considered as disordered if its score is above 0.5. Based on the real valued disorder propensities generated by MetaPrDOS [25], DisEMBL [26], PSIPRED [27] and IUPred [28] analysis, disordered regions are defined as regions with scores above the threshold line (0.5); however, scores below this line specify a degree of flexibility. In the case of 001 and 002, regions GFGH (1-4) and EAMEQ (145-149) were found in ID regions.

Subcellular localization
Subcellular localization prediction of PV2A protease specifies its localization in the cytoplasmic membrane with no signal peptide and transmembrane helix.

Protein stability with mutations
The protein stability change upon mutation was assessed by using I-Mutant 2.0, STRUM and EASE-MM web-servers. The results showed the effect of mutants on stability of protein with reliability index at pH 7.0 and temperature 25 °C. Mutations (A12G, D36N, S134T, I136V) used in this study indicated a decrease in stability of the protein.

Hydrophobicity
Hydrophobic interactions play an important role in drug design involving biophysical events for instance protein-ligand or protein-protein binding.
Hydrophobic regions of the selected 001 and 002 proteins were determined with PEPTIDE 2 and TM-Finder. These software predicted the calculated hydrophobicity and helicity values, along with the predicted TM-segments of the proteins (Figure 8). In case of both proteins most residues were found to be hydrophobic (34.9 %) as compared to hydrophilic residues (11.4 %).

DISCUSSION
Poliovirus 2A protease is involved in many physiological processes, making it an important target in the cysteine protease superfamily of proteins. However, the absence of a high resolution crystal structure of poliovirus 2A protease is a limitation in understanding atomic details [10]. In this study, we took advantage of the PV2A pro primary structural similarity to other viral proteases, specifically that from Coxsackievirus B4, and performed an in silico analysis of its structure. Poliovirus 2A protease sequences (001 and 002) were taken from our previous work where we have reported the mutations in this protein.
No previous studies have reported mutations detected in the poliovirus 2A protease. In this study, we performed a computational analysis on these mutations to determine their impact on protease structure, function and stability. Intrinsic disordered region prediction results proved it to be suitable for drug binding sites, with a low number of residues lying in the disordered regions. Moreover, hydrophobicity results predicted that these proteases have more hydrophobic residues compared to hydrophilic residues. Important drug binding sites were predicted using the COACH server for these mutation containing proteases. These important residues include His20, Leu21, Asp38, Cys55, Cys57, Met83, Tyr88, Tyr89, Asp108, Cys109, Cyss115, His117, Ile123, Thr124, Ala125 and Gly126.
Previous studies have reported that His20, Cys109, Cys55, Cys57, Cys115 and His117 are involved in structural maintenance and catalytic activity of 2A pro [40]. Proteinase 2A of human rhinovirus 2 (HRV2 2A) Cys residues and His114 are important residues involved in coordinating Zn +2 . Moreover, Gly123, Gly124, Thr121 and Cys101 are suggested to form part of the construction of the substrate binding pocket and to offer a suitable setting for the active site of His18, Asp35 and Cys106 [41].
Previously reported mutations (A12G, D36N, S134T, I136V) in poliovirus 2A protease were subjected to predict single nucleotide polymorphisms along with their impact on protease stability. All mutations result in a decrease of protein stability, thus destabilizing the protein structure. Defects in cis cleavage with the minute plaque phenotype have been observed in poliovirus point mutants (G60R, S66F, L39F, N18K, D135N, V119M, A22T, C55Y, and S134L) when the poliovirus 2A gene was expressed in yeast [42]. Previously it was reported that single amino acid mutations at the His20 and Cysl09 residues led to complete loss of the proteolytic activity of 2Apro [43].  Important ligand-protein interactions have been identified following docking investigations in this study, providing potential candidates for drug targeting, the potency of which may be increased following relatively simple structural alterations.

CONCLUSION
A structure-function investigation of the poliovirus 2A protease is presented in the study. By using the crystal structure of coxsackievirus B4 as a template, the three-dimensional structure of the poliovirus 2A protease was predicted. I-TASSER and COACH results identified potential binding sites in the protein structure. Use of computational approaches to identify more potential drug binding sites in the conserved regions will lead to better drug design methods in the years ahead. This computational approach may be combined with experimental methods, thus contributing to functional interpretation of protein structures.