Modelling functional and structural impact of non-synonymous single nucleotide polymorphisms of the DQA1 gene of three Nigerian goat breeds

The DQA1 gene is a member of the highly polymorphic MHC class II locus that is responsible for the differences among individuals in immune response to infectious agents. In this study, the authors performed a comprehensive computational analysis of the functional and structural impact of non-synonymous or amino acid-changing single nucleotide polymorphisms (SNPs) (nsSNPs) that are deleterious to the DQA1 protein in Nigerian goats. A 310-bp fragment of exon 2 of the DQA1 gene was amplified and sequenced in 27 unrelated animals that are representative of three major Nigerian goat breeds (nine each of West African Dwarf, Red Sokoto, and Sahel of both sexes) using genomic DNA. Forty-two nsSNPs were identified from the alignment of the deduced amino acid sequences. Based on the PANTHER, PROVEAN and PolyPhen-2 algorithms, there was consensus in identifying the mutants I26D, E114V and V115F as being deleterious. Further, differences between the native and the mutant proteins in the subsequent molecular trajectory analysis (stabilizing and flexible residue composition, total grid energy, solvation energy, coulombic energy, solvent accessibility, and protein-protein interaction properties) revealed E114V and V115F to be highly deleterious. Combined mutational analysis comparing the amutant (I26D, E114V and V115F mutations collectively) with the native protein also showed changes that could affect protein function and structure. Further wet-lab confirmatory analysis in a pathological association study involving a larger population of goats is required at the DQA1 locus. This would lay a sound foundation for breeding disease-resistant individuals in the future. ______________________________________________________________________________________


Introduction
Single nucleotide polymorphisms (SNPs) are a common form of genetic variation among individuals (Li et al., 2009).They are amenable to high-throughput automated genotyping, and therefore are a preferred choice as genetic markers in research on diseases and their corresponding drugs (Doss et al., 2008).Nonsynonymous SNPs (nsSNPs) alter the transcribed amino acid residues and result in functional diversity of the encoded proteins (Yates & Sternberg, 2013).One of the major purposes of genetics studies is to distinguish functionally neutral mutations from those that contribute to differences in phenotypes (Alshatwi et al., 2012).An abundance of diverse biological data from various sources constitutes a rich fund of knowledge, which has the power to advance our understanding of organisms.Computational methods allow the integration and effective use of these data to elucidate local and genome-wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins (Gray et al., 2012).Next generation sequencing techniques generate large volumes of data related to SNPs, but evaluation of biologically functional SNPs using in vitro studies is tedious, time consuming and expensive.However, the consequences of mutations and their effects on biological pathways can be predicted in silico (Patel et al., 2015;Chandramohan et al., 2015;AbdulAzeez & Borgio, 2016).
The major histocompatibility complex (MHC) is a large genomic region or gene family, which is found in most vertebrates and is highly polymorphic (Amills et al., 2005;IIhan et al., 2016).Molecules encoded by the MHC play an important role in the immune system and autoimmunity (Amills et al., 1998).
The DQA1 gene is a member of the MHC (Zhou & Hickford, 2004).Sequencing of the DQA1 complementary DNA (cDNA) revealed a single 768 bp open reading frame consisting of four exons and encoding a 255 amino acid protein (Amills et al., 2005;Plasil et al., 2016).Among MHC genes, DRB and DQA are most polymorphic (Subramani et al., 2016).This extensive structural polymorphism is responsible for the differences among individuals in immune response to infectious agents (Vandre et al., 2014).Most of the functionally important polymorphisms in DQA1 are concentrated in exon 2, and this diversity is correlated with pathogen richness (Wegner et al., 2003).
Although previous studies have examined the sequence variability of the MHC DQB1 and DRB genes (Yakubu et al., 2013;Yakubu et al., 2017), there is a dearth of information on nsSNPs of the MHC DQA1 gene of Nigerian goats.Such information could help to unravel the possible genotypes that affect livestock diseases in future association studies.Therefore, the present investigation aimed at identifying and screening deleterious nsSNPs of the DQA1 gene of West African Dwarf, Red Sokoto and Sahel goats of Nigeria, which are likely to affect the structure and function of the transcribed protein.

Materials and Methods
A 310-bp fragment of exon 2 of the MHC Class II DQA1 gene was amplified in 27 unrelated animals from three major Nigerian goat breeds (nine each of West African Dwarf (WAD), Red Sokoto (RS) and Sahel (SH)] using genomic DNA.Primers were designed using data from Amills et al. (2005).Primer sequences were DQA/FW, 5´GAAGCCCACAATGTTTGATAGTCA-3´ and DQA/REV, 5´-GGGGAAGAACAACAAAGAGAGGCAG-3´.Primer-BLAST of NCBI (National Center for Biotechnology Information) was employed.The organism used was Capra hircus (taxid: 9925).The length of the forward primer was 24 bp with Tm (melting temperature) and GC (guanine-cytosine) values of 59.54 o C and 41.67%, respectively, while the corresponding length, Tm, and GC contents for the reverse primer were 25 bp, 63.73 o C and 52.00%.Primer dilution was done by adding 302.6 µL of 1 x TE buffer to DQA1 FW and 109 µL of 1 x TE buffer to DQA1 REV.Polymerase chain reaction (PCR) amplifications were carried out in a C1000 TM thermal cycler (Bio-Rad, USA) in a total reaction volume of 20 µL containing about 20 ng DNA, 10 pmol of each primer in 10 µL Syd Lab PCR Premix (Syd Labs, Inc., Malden, USA) containing Taq DNA polymerase, dNTPs, MgCl 2 , reaction buffer, PCR stabilizer and enhancer at optimal concentrations.The thermal profile for amplifying the DQA1 exon 2 involved 35 cycles of initial denaturation at 94 °C for 4 min, denaturation at 94 °C for 30 s, annealing at 62 °C for 30 s, extension at 72 °C for 30 s, and elongation at 72 °C for 10 min.Every other protocol, including sequencing of the gene, is as described in Yakubu (2015).The ethical guidelines of the International Council for Laboratory Animal Science and Cornell University, Ithaca, NY, USA, were followed strictly.For nsSNP (missense variant) identification, sequence alignment, translations, and comparisons were carried out with the ClustalW option of MEGA 5 (Tamura et al., 2011), using the Capra_hircus_DQA1_AY464652.1 complete coding sequence (CDS) from the GenBank as the reference.
Functional effects of nsSNPs were predicted in silico using the PANTHER, PROVEAN, and PolyPhen-2 algorithms.PANTHER (Thomas et al., 2003) estimates the likelihood of a particular non-synonymous amino acid change having a functional impact on the protein.The web-based tool PROVEAN predicts the effects of an amino acid substitution or indel on the biological function of a protein, based on sequence clustering and alignment-based scoring.Variants with scores less than -2.5 were considered deleterious (Choi & Chan, 2015).PolyPhen-2 predicts the potential impact of an amino acid substitution on the function and structure of a protein (Adzhubei et al., 2010;Adzhubei et al., 2013).Where there was consensus by the three algorithms on the deleterious effect of a nsSNP, further confirmatory analyses were carried out.Additionally, a combined mutational analysis was conducted, incorporating all of the deleterious nsSNPs simultaneously (the term 'amutant' was coined to refer collectively to the nsSNPs) to reduce the possibility of false positives and exploit the effect of correlated mutations because these may enhance or diminish the functional properties of a protein.
Effects of nsSNP on protein stability were assessed with I-Mutant2.0(Capriotti et al., 2005) and MuStab (Teng et al., 2010).I-Mutant2.0predicts whether a single site mutation stabilizes or destabilizes the protein in 80% of the cases in which the three-dimensional structure is known and 77% of the cases in which only the protein sequence is available.MuStab, on the other hand, encodes the input sequence with the optimal feature subset, and then calls the svm_classify program of SVM-Light software package to classify the protein stability changes after the amino acid substitution (Teng et al., 2010).
Structural models for the native protein and mutant proteins I26D, E114V, V115F, and amutant were generated with Phyre2 (Kelley et al., 2015).This suite of tools aligns an input target with pre-existing templates to generate a series of predicted models.Structural similarity of alternative protein models was quantified by template modelling (Tm) scores and root mean square deviation (RMSD) in Angstroms (Å) using Tm-Align.A Tm-score <0.2 is equivalent to a random structure from the PDB (Protein Data Bank) and a Tm-score of 0.5 or greater indicates that the proteins have a high probability of being in the same SCOP/CATH (structural classification of proteins/class architecture topology homology) fold (Zhang & Skolnick, 2005).An RMSD ≥2.0 has a negative effect on the stability and function of the protein (Han et al., 2006).Total energy after energy minimization was calculated for the native and each altered model using the Groningen Molecular Simulation (GROMOS) computer program package implementation of DeepView v 4.1 (Guex & Peitsch, 1997).
The proposed protein structures were validated with ERRAT and ProSA (Sippl, 1995;Wiederstein & Sippl, 2007).ERRAT is a program for verifying protein structures determined by crystallography (Colovos & Yeates, 1993).Error values are plotted as a function of the position of a sliding 9-residue window.The error function is based on the statistics of non-bonded atom-atom interactions in the reported structure compared with a database of reliable high-resolution structures.ProSA returns a z-score that indicates overall model quality based on the Cα positions in 3-D space.
Stabilizing residues of the native and mutant DQA1 proteins of Nigerian goats were identified with SRide (Gromiha et al., 2004).The option selected was the PSSM-based encoding from the PDB file.The analysis was based on parameters such as surrounding hydrophobicity, long range order (LRO), stabilization centre and conservation score.
Residue positions that could change the conformation of the DQA1 protein were predicted with FlexPred.The software uses solvent accessibility of a protein sequence to identify the positions and thus change kinetic energy and potentially cause pathogenic disorders (Kuznetsov, 2008;Kuznetsov & McDuffie, 2008).
The molecular dynamic simulation was performed to calculate the total energy difference in solvated condition using the Poisson-Boltzmann equation (PBE) solver online tool (Smith et al., 2012;Sarkar et al., 2013).The simulation was performed by placing the protein under these conditions: i) interior dielectric constant: 4.00, ii) exterior dielectric constant: 80.00, iii) percentage fill: 80, grid scale: 2.00, iv) salt concentration: 0.10, v) probe radius: 1.40, and vi) boundary conditions: 2.00.
Solvent accessibility of residues of alternative amino acid sequences of the DQA1 protein were predicted with Weighted Ensemble Solvent Accessibility (WESA) (Shan et al., 2001;Chen & Zhou, 2005).The software predicts a residue as being buried or exposed (defined as having a surface area greater than 20% of the maximum area for that type of amino acid) with an expected accuracy of 80%.
Potential protein-protein interaction effects were predicted with InterProSurf (Zhou & Shan, 2001;Negi et al., 2006;Negi & Braun, 2007;Negi et al., 2007) and consensus Protein-Protein Interaction Site Predictor (cons-PPISP) (Chen & Zhou, 2005).InterProSurf predicts interacting amino acid residues in proteins that are most likely to interact with other proteins, given the 3D structures of subunits of a protein complex.The prediction method is based on the solvent accessible surface area of residues in the isolated subunits, a propensity scale for interface residues and a clustering algorithm to identify surface regions with residues of high interface propensities.cons-PPISP is a consensus neural network method that is trained to predict whether a surface residue is in the interaction site.Given the structure of a protein, cons-PPISP will predict the residues that would probably form the binding site for another protein.

Results
Forty-two nsSNPs of the caprine DQA1 alleles were obtained from the alignment of the deduced amino acid sequences of Nigerian goats (Table 1).Of these, five, seven and thirteen were predicted to be deleterious by PANTHER, PROVEAN and PolyPhen2, respectively (Table 1).All three algorithms identified I26D, E114V and V115F as being harmful, indicating that these amino acid substitutions negatively affect the function of the DQA1 protein.Two of the three nsSNPs that were predicted to be collectively deleterious were found to decrease protein stability, with the amino acid substitution E114V being the exception.Although the stability of E114V increased, its reliability index of 1 and 24.64% prediction confidence were very low.In contrast the reliability indexes were 8 and 9, and the prediction confidence was 94% and 88% for I26D and V115F, respectively.
There were 181 residues (71% of DQA1 sequence) modelled with 100.0%confidence by the single highest scoring template for native, I26D, E114V, V115F amino acid substitutions and the amutant protein, respectively (Figure 1).That is, the percentage identity between the DQA1 sequence of the native protein and the mutations (above) in Nigerian goats and the template used to generate the structures is high, conferring accuracy to the 3D structures.This means all the five DQA1 structures were correctly predicted.For all of the amino acid substitutions that were deleterious, including amutant, RMSD = 0.0 and TM = 1.00.However, energy of minimization varied from one amino acid substitution to another.The values recorded for V115F and amutant (-7046.844 and -7048.283kJ/mol, respectively) were greater than for the native protein (-7208.369kJ/mol) and I26D (-7208.369kJ/mol) and E114V (-7209.886kJ/mol).
Seven stabilizing residues were predicted for the native and mutant I26D, six residues for E114V, five for V115F, and eight for amutant (Table 2).Variation in flexibility was also observed in the native and the variants.Residues 110 and 155 were unique to the native protein.The amino acid substitution E114V has a unique flexible residue at position 244, while flexible residues at positions 121 and 125 were found only in V115F (Table 3).It was not possible to calculate flexible residues for the amutant.

Discussion
The use of polymorphisms within genes is fast gaining attention as a complement to the current methods of selection because of their association with traits of interest in animals, especially those that affect livestock diseases (Miyasaka et al., 2011;Yakubu, 2015).However, the effects of nsSNPs in the DQA1 gene in goats have not been predicted to date in silico.Amino acid substitutions in DQA1 may induce structural and functional damage.Normal protein function can be altered by deleterious nsSNPs, through geometric constraint changes (Gromiha & Ponnuswamy, 1995), hydrophobic changes (Rose & Wolfenden, 1993), and disruption of salt bridges or hydrogen bonds (Shirley et al., 1992;Michels et al., 2011).These may have pathological phenotypic consequences (Kumar et al., 2012;Yates & Sternberg, 2014).The differences in prediction capabilities of the three algorithms used in the present study may be connected with their differing alignment procedures.The difference in the results of computational tools may be due to differences in features utilized by the methods and therefore dissimilar predictions might be expected (de Alencar & Lopes, 2010).
A structural model of a protein is important in understanding biological processes at molecular level (Wiederstein & Sippl, 2007).The statistic RMSD is commonly used to evaluate the similarity of protein structures to their templates and to determine the accuracy of the alignment of the residues of two structures (Kufareva & Abagyan, 2012).In the present study, the structures of the native and mutant proteins, which were transcribed from the DQA1 gene, appeared similar, based on RMSD values.However, energy of minimization differed between the native protein and the mutants.It implies that the higher energy of minimization of V115F and amutant might impact the protein negatively because high energy configurations may lead to physical perturbation and instability.This difference was also observed in the overall structure quality of V115F and amutant as revealed by ERRAT, where the model quality of the native protein was higher.The different energy profiles of the native and the mutants, as found in ProSA, indicate varying structural quality of the DQA1 protein of the present study.This is because the z-score of ProSA indicates overall model quality and measures the deviation of the total energy of a structure with respect to an energy distribution derived from random conformations (Sippl, 1995;Zhang & Skolnick, 1998).The more negative the z-score, the better the model quality (Saha et al., 2013).The energy separation between the native fold and the average of an ensemble of miss-folds also indicated that the conformational energy landscape of the E114V, V115F and amutant proteins differed from that of the native protein.The observed variation in the total grid energy, solvation energy, and coulombic energy of the native and the mutants could influence the structure and biological functions of the proteins.This is consistent with the submission of Smaoui et al. (2016) that mutations that increase the stability of the structure have the lowest energies, while mutations that destabilize the structure increase its energy.The varying composition of the stabilizing residues in the native and especially variants E114V, V115F and combined I26D, E114V, and V115F amino acid substitutions could influence the structure of the protein, thereby affecting its stability and function.Some of the flexible residues were found around the point of mutation in the present study.These variations may contribute to differences in folding of the protein at these positions.An inherent property of macromolecular structure is conformation flexibility (Ramesh et al., 2013).Changes in protein folding may be involved in various biological functions, such as signal transduction, catalysis, macromolecular recognition, locomotion, and many pathogenic disorders (Kuznetsov & McDuffie, 2008).
Hydrophobic core residues are probable sites of deleterious mutations (Zhangab & Wanga, 2014;David & Sternberg, 2015;Sahoo et al., 2015).Replacing a hydrophobic residue with another hydrophobic residue, as observed in mutant V115F, may induce a volume change (Eriksson et al., 1992) and thereby decrease the stability of the protein.This was supported by the submission of Ohkuri & Yamagishi (2003) that residues with larger Van der Waals volume may introduce stress in the conformation of side chains at the subunit interface.The mutant E114V is a change from a polar/hydrophilic, H-bonding, acidic residue to a non-polar/hydrophobic residue.It is possible that this change could result in loss of hydrogen bonds and, consequently, disturb correct folding (Doss & NagaSundaram, 2012).Doss et al. (2008) reported that a change from exposed to buried state could be considered functionally significant in a mutant protein.Accordingly, change in amino acid may affect polar-polar interactions within the protein molecule itself, alter the energy of stabilization, and further destabilize the protein (Peng et al., 2005).The difference in charge between hydrophobic and acidic residues, as observed in the present study, would probably disturb the ionic interaction in the native protein.
The prediction of residue solvent accessibility could assist in a better understanding of the relationship between structure and sequence (Alanazi et al., 2011).The cost of burying a hydroxyl group depends on solvent accessibility (Blaber et al., 1993).If the residue is exposed, the mutant is destabilized by <0.5 kcal/mol.If the residue is fully buried, the mutant is destabilized by ~ 1-3 kcal/mol (Blaber et al., 1993).Mutants E114V and V115F were predicted to be buried in this study, and therefore may disturb interactions with other molecules or other parts of the protein.

Conclusion
Identified structural and functional changes in the DQA1 protein supported predictions of the nsSNPs that resulted in I26D, E114V and V115F being deleterious.The combined mutational analysis carried out with them exposed the way in which amutant is different from other individual mutations compared with the native.The difference between results of the trajectory analysis and the RMSD values may be connected with the drawback of RMSD as a positional distance-based measure compared with measurements that are contact-based.Protein-protein interaction (PPI) properties of variants E114V and V115F and amutant differed from those of the native protein.Such PPI can be strengthened or weakened by nsSNPs, which may cause loss of salt bridges, steric clashes and later post-translational modifications, among other effects.Changes in the interactions among proteins can lead to rewiring (i.e.acquisition or loss of co-complex memberships) of the PPI network and thus alter phenotype.These results could contribute to explaining susceptibility to disease in Nigerian goats, pending their confirmation in future wet lab and pathogenic population-based association studies.

1Figure 1
Figure 1 Predicted 3D structures of native and mutant DQA1 proteins of goats a: Native DQA1 protein b: DQA1 protein of Nigerian goats as a result of change from isoleucine to aspartic acid at position 26 (I26D) c: DQA1 protein of Nigerian goats as a result of change from glutamic acid to valine at position 114 (E114V) d: DQA1 protein of Nigerian goats as a result of change from valine to phenylalanine at position 115 (V115F) e: DQA1 protein of Nigerian goats as a result of combined I26D, E114V and V115F amino acid substitutions Colours indicate amino acid residues

Table 1
Functional effect of amino acid substitutions in the protein sequence coded by DQA1 gene of Nigerian goats predicted using PANTHER, PROVEAN and PolyPhen-2

Table 2
Stabilizing residue variations between native and deleterious non-synonymous single nucleotide polymorphisms of DQA1 protein

Table 3
Flexible residue position variations between native and deleterious non-synonymous single nucleotide polymorphisms of DQA1 protein

Table 4
Energy calculation through molecular dynamic simulation of native and mutants

Table 5
Solvent accessibility prediction using Weighted Ensemble Solvent Accessibility

Table 6
InterProSurf protein-protein interaction prediction