Computational molecular analysis of deleterious mutations in serum amyloid A3 gene in goats and cattle

Serum amyloid A3 (SAA3) protein, found within caprine and bovine mammary epithelial cells, is said to be important in disease conditions and tissue remodeling. The present investigation aimed at identifying deleterious non-synonymous single nucleotide polymorphisms (nsSNPs) in the SAA3 gene of goats and cattle using an in silico assay. Amino acid sequence data of the protein of goats and SNPs of cattle were retrieved from the National Center for Biotechnology Information (NCBI) database. The bioinformatics prediction tools used for the detection of deleterious nsSNPs were PROVEAN, SIFT, PolyPhen-2 and PANTHER. A total of eleven nsSNPs were obtained from the aligned sequences of goats, out of which two variants (R123G and G126D) were predicted to be deleterious by three out of the four algorithms. In cattle, four out of the eleven nsSNPs were found to be harmful to the encoded protein. The two mutants in goats and R114Q in cattle were also found to decrease protein stability. Further confirmatory analysis, however, revealed that variant R123G was highly deleterious, as there were marked differences between it and the native protein in terms of total free energy, stabilizing residues, ordered and disordered regions of the protein and secondary structure prediction. Similarly, Cmutant (a combination of R123G and G126D mutations) in goats and Dmutant (a combination of S77R, Q84K, S103W and R114Q mutations) in cattle also appeared to distort the SAA3 protein structural landscape and function. The present deleterious nsSNPs, when validated using wet lab experimental protocols, could be important biological markers for disease detection and therapy in goats and cattle.


Introduction
Mastitis, an inflammation of the mammary gland, is the most costly infectious disease of dairy ruminants worldwide (Brenaut et al., 2012), affecting milk yield and quality with a concomitant drastic reduction in farm profit. Milk contains a range of minor immune-related proteins that collectively form a significant first line of defence against pathogens, acting both within the mammary gland itself and in the digestive tract of the suckling neonate (Wheeler et al., 2012). Serum amyloid A (SAA) is, like C-reactive protein (CRP), an acute phase protein which, according to De Buck et al. (2016), can be utilized as a diagnostic, prognostic or therapy follow-up marker for many diseases. The mammary serum amyloid A3 (M-SAA3) mRNA is localized to restricted caprine mammary epithelial cells (MECs). It has been reported to be expressed at a moderate level in late pregnancy; at a low level through lactation; induced early in milk stasis and expressed at high levels in most MECs of ruminants during mid to late involution including inflammation/mastitis (Molenaar et al., 2009; Domènech et al., 2012; Domènech, 2013).

The molecular mechanisms of the processes that cause disease conditions or regulate the innate and adaptive immune responses in livestock species may be revealed through in-depth knowledge of the structures and functions of proteins. Non-synonymous single nucleotide polymorphisms (nsSNPs) may potentially affect the function and phenotype of the encoded proteins (Samadian et al., 2017). The rapid growth in the identification of SNPs makes it extremely difficult to evaluate the biological significance of each SNP through experimental protocols. However, first-hand information could be obtained through computational, theoretical approaches to identify and investigate whether the effects of nsSNPs are deleterious to the structure and function of proteins before embarking on wet lab experiments (Nagasundaram et al., 2015). This knowledge may subsequently be exploited in the design of better drugs to treat specific diseases (De Buck et al., 2016), especially those associated with the mammary glands, such as mastitis.

There is a dearth of information on nsSNPs of the SAA3 gene and their potential effects on the encoded proteins in ruminants. Therefore, the present study was embarked upon to screen for deleterious amino acid substitutions of the SAA3 gene in goats and cattle which may alter the conformational and functional features of the SAA3 protein.

Materials and Methods

Sequence retrieval
The amino acid sequence data on the goat SAA3 gene were retrieved from the website of the National Center for Biotechnology Information (NCBI). A total of four (4) goat sequences were obtained (Table 1). The amino acid sequence alignment of the species was carried out using the ClustalW algorithm of the Molecular Evolutionary Genetics Analysis (MEGA 6.0) software (Tamura et al., 2013) to obtain non-synonymous amino acid substitutions. For cattle, apart from the amino acid sequences, fifteen single nucleotide polymorphisms (SNPs) comprising four synonymous and eleven non-synonymous amino acid substitutions were equally obtained from the NCBI dbSNP database (http://www.ncbi.nlm.nih.gov/snp) (Table 2).

The functional effects of the nsSNPs of the SAA3 gene in goats and cattle were predicted computationally using PROVEAN, PolyPhen-2, SIFT and PANTHER. The models were applied as described in an earlier study (Yakubu et al., 2017), while the procedure for SIFT was highlighted in Kumar et al. (2009). SIFT uses a sequence homology based algorithm to classify the effect of amino acid substitutions on protein function. PANTHER calculates the length of time (in millions of years) a given amino acid has been preserved in the lineage leading to the protein of interest (Tang and Thomas, 2016). The PANTHER thresholds used in this analysis were: "probably damaging" (time > 450my, corresponding to a false positive rate of ~0.2 as tested on HumVar), "possibly damaging" (450my > time > 200my, corresponding to a false positive rate of ~0.4) and "probably benign" (time < 200my). Where the majority of the algorithms agreed on the deleterious nature of an nsSNP, further analyses were carried out to confirm or refute such a claim. Also, the deleterious nsSNPs were combined (the term 'Cmutant' was used to represent these combined deleterious nsSNPs) to explore the effect of correlated mutations as described in Yakubu et al. (2017).

Protein stability prediction
The prediction of the effects of nsSNPs on the stability of the SAA3 protein of goats and cattle was done using I-Mutant2.0 (Capriotti et al., 2005). I-Mutant2.0 predicts the free energy change (DDG) between the mutant and native proteins.

Tertiary structure prediction
The structural models of SAA3 of goats and cattle were constructed using the Phyre2 server (Kelly et al., 2015), which uses the alignment of hidden Markov models via HHsearch (Soding, 2005) to improve the accuracy of alignment and rate of detection. Structural similarities of alternative protein models of both species were quantified by the template used in the prediction. The models were viewed using PyMOL (DeLano, 2006), which was equally used to indicate the position of ...

Energy minimization and root mean square deviation (RMSD)
Energy minimization of the modeled mutant proteins was carried out using 3Drefine (Bhattacharya and Cheng, 2013). The protocol refines the initial protein structures by optimizing the hydrogen bond network and by atomic-level energy minimization using a combination of physics and knowledge based force fields. This force field is said to permit the evaluation of the energy of the modeled structure as well as the repair of distorted geometries through energy minimization. The mutant proteins were superimposed onto the SAA3 native protein and the corresponding root mean square deviation (RMSD) values were generated using SuperPose ver 1.0 (Maiti et al., 2004).

Validation of protein structures
The proposed SAA3 protein structures of goats and cattle were validated with the ERRAT and ProSA statistical software (Sippl, 1995; Wiederstein and ...). ERRAT uses characteristic atomic interactions to distinguish between regions of protein structures that are correctly and incorrectly determined (Colovos and Yeates, 1993). ProSA uses Z-scores to indicate the quality of models: the higher the Z-score, the lower the quality of the model.

Molecular dynamic simulation
Generalized Born (GB) models, executed through the Bluues server, were used to find electrostatic differences in structures between the native and the mutant alleles of goats and cattle. The server employs the program Bluues to execute electrostatic calculations for single-atomic structures and provides options for point mutations (Walsh et al., 2012).

Identification of stabilizing residues
Stabilizing residues of the native and mutant SAA3 proteins of goats and cattle were identified with SRide. The SRide parameters were based on hydrophobicity, long-range interactions, and sequence conservation (Gromiha et al., 2004).
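The PANTHER preservation-time thresholds and the majority rule described in the functional prediction step above can be sketched as follows. This is a minimal illustration, not the PANTHER implementation: the function names `panther_class` and `is_consensus_deleterious` are hypothetical, the handling of values exactly equal to 450 or 200 my is an assumption (the text does not state it), and the cut-off of three agreeing tools is taken from the abstract.

```python
from typing import Dict


def panther_class(preservation_my: float) -> str:
    """Map a PANTHER preservation time (millions of years) to the labels in the text.

    Assumption: a value of exactly 450 is treated as "possibly damaging" and a
    value of exactly 200 as "probably benign"; the text leaves both unspecified.
    """
    if preservation_my > 450:
        return "probably damaging"
    if preservation_my > 200:
        return "possibly damaging"
    return "probably benign"


def is_consensus_deleterious(calls: Dict[str, bool], min_votes: int = 3) -> bool:
    """Return True when at least `min_votes` tools call the variant deleterious."""
    return sum(1 for is_deleterious in calls.values() if is_deleterious) >= min_votes


# Illustrative calls only: three tools flag the variant, one does not.
example_calls = {"PROVEAN": True, "PolyPhen-2": True, "PANTHER": True, "SIFT": False}
print(panther_class(500), is_consensus_deleterious(example_calls))
```

Keeping the vote threshold as a parameter makes it easy to see how the set of retained variants would change under a stricter rule (for example, all four tools agreeing).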

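The combined mutants used in this study (several substitutions applied to one sequence) can be illustrated with a short sketch. The helper names `parse_substitution` and `apply_substitutions` and the five-residue toy sequence are hypothetical; the toy sequence is not the SAA3 sequence, and the substitution codes follow the wild-type/position/mutant notation used in the text (for example R123G).

```python
import re
from typing import List, Tuple

# One-letter wild-type residue, 1-based position, one-letter mutant residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code: str) -> Tuple[str, int, str]:
    """Split a code such as 'R123G' into ('R', 123, 'G')."""
    match = _SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence: str, codes: List[str]) -> str:
    """Apply each substitution to the sequence, checking the wild-type residue."""
    residues = list(sequence)
    for code in codes:
        wild, position, mutant = parse_substitution(code)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        if residues[position - 1] != wild:
            raise ValueError(f"{code}: expected {wild} at position {position}")
        residues[position - 1] = mutant
    return "".join(residues)


# Toy example with a made-up sequence, not real SAA3 data.
print(apply_substitutions("MKRAG", ["R3G", "G5D"]))  # prints MKGAD
```

Checking the wild-type residue before substituting guards against numbering mismatches between the aligned sequences and the variant list.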
Prediction of ordered and disordered amino acid residues
The RaptorX web server (Kallberg et al., 2012) was used to predict whether the amino acid variants are in ordered or disordered regions of the proteins. This was done for both species.

Protein-Protein Interaction
In order to predict the solvent accessibility and secondary structures in the 3D structure (Porollo and Meller, 2007) ...

Results
There was a consensus among all the algorithms, with the exception of SIFT (which predicted only R123G to be harmful), in the prediction of variants R123G and G126D as being deleterious (Table 3). These two variants were therefore collectively referred to as 'Cmutant' for further confirmatory analysis. In cattle, four variants, namely S77R, Q84K, S103W and R114Q, were also predicted to be deleterious by three out of the four algorithms. These four variants were collectively referred to as 'Dmutant'.

The values of RMSD of the mutants of both caprine and bovine species obtained from SuperPose are shown in Table 4. The RMSD value of 0.19 for variants R123G and Cmutant, though less than the threshold of ≥ 2.0 for a variant to effect structural change, is higher than the value of 0.05 recorded for the native and G126D. In cattle, only substitution R114Q recorded an RMSD value of less than 0.1 ... However, the ProSA Z-scores of R123G and Cmutant (-2.57 and -2.58) were less negative than those of the native and G126D proteins (-...

The variants in both species differed from the native proteins in Born self energy, Coulomb energy, electrostatic solvation energy and total energy (Table 5).

Table 5: Energy differences between the native protein and mutants

There was a single stabilizing amino acid residue (Alanine) at position 50 for the SAA3 native protein and mutant G126D (Table 6). However, no stabilizing residue was obtained in the case of the substitutions R123G and Cmutant. In cattle, the native protein and all the mutants have Alanine at position 50.

In goats, the mutant R123G was found to be a change from an ordered (R) to a disordered (G) amino acid residue. However, the mutant G126D was a change from a disordered (G) to an ordered (D) residue. In cattle, all the mutants (S77R, Q84K, S103W and R114Q) were found to be ordered residues. All predictions were done at a false positive rate threshold of 5% in both species.

The native and the mutants varied in the number of H-Helix and E-Beta Strand residues. While substitution R123G has a loss of one residue in H-Helix, it has a gain of one residue in E-Beta Strand compared to the native protein (Table 7). The Cmutant H-Helix was higher by a single residue, but its E-Beta Strand was lower by a single residue. The native and the mutants all have an equal number of C-Coil residues. In cattle, variant Q84K and Dmutant were higher by two residues in C-Coil and one residue in E-Beta Strand, and lower by three residues in H-Helix, compared with the native protein. Apart from variant S77R, every other mutant differed from the native protein in terms of secondary structure configuration. Substitutions R123G and G126D (goats) and S77R and S103W (cattle) were found to be non-interfacial residues in the soluble domain. However, Q84K and R114Q were exposed (interfacial residues in the soluble domain).
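For readers unfamiliar with the RMSD values reported above, the quantity can be sketched as below. This is a minimal illustration under stated assumptions: the two coordinate lists are taken to be already superimposed and matched atom by atom (the superposition step that SuperPose performs is not reproduced here), and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(native: Sequence[Point], mutant: Sequence[Point]) -> float:
    """Root mean square deviation between two equally long, pre-aligned atom lists."""
    if len(native) != len(mutant) or len(native) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(native, mutant)
    )
    return math.sqrt(squared / len(native))


# Toy coordinates (not taken from the SAA3 models): one atom displaced by 5 units.
print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # prints 5.0
```

A value computed this way could then be compared against the threshold of 2.0 quoted in the text for a variant to effect structural change.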

Discussion
It has been reported that in non-human mammals, SAA3 is the main SAA form being expressed extrahepatically (Upragarin et al., 2005), with evidence in the colostrum of ruminant animals (McDonald et al., 2001). This gene may play a role in the protection of the mammary gland during remodelling and infection (Molenaar et al., 2009). The high number of polymorphic sites predicted to be neutral at the SAA3 locus of cattle and goats in the present study could be an indication of high conservation. This is in line with the submission of Domènech Guitart (2013) that the protein sequences of SAA are highly conserved, with a wide range of mammals exhibiting high homology. This conservation has been maintained through the evolution of eutherian mammals (Uhlar and Whitehead, 1999), indicating that the SAA3 locus is likely to have biological functions of importance. The consensus in the prediction of R123G and G126D (goats) and of S77R, Q84K, S103W and R114Q (cattle) as being deleterious may be a pointer to their pathological phenotypic consequences. The high reliability indexes obtained in the present study when the substitutions R123G, G126D and R114Q were subjected to the stability test further indicate that the three variants could be disease-related mutations.

Variants R123G and Cmutant (goats) as well as Dmutant (cattle) appear to have more propensity to alter the protein structural landscape than the other mutants, though they were below the RMSD threshold. Based on the fact that they have less negative Z-scores, it seems that variants R123G and Cmutant, as well as R114Q and Dmutant (-2.71), are more disposed to affecting the protein structure. This is consistent with the report of Saha et al. (2013) that the model quality is better the more negative the Z-score is.

There is every tendency that the R123G, Q84K, R114Q, Cmutant and Dmutant changes could affect protein conformation and biological roles by virtue of their less negative total free energy. In a related study, Alberts et al. (2002) reported that the native structure or ... of a protein is generally enhanced by free energy minimization. However, the denatured state of proteins, as observed in the mutants of the present study, is characterized by high conformational entropy, which could be attributed to loss of interactions by the natives. This may destabilize the protein molecules and their subsequent functions.

The stabilizing residues identified could be used as potential candidates for studying protein folding and stability. This is because a single incorrectly predicted mutation, apart from jeopardizing the stability of the entire protein, can counteract a larger number of stabilizing mutations (... et al., 2017). The conversion of R123G from an ordered to a disordered residue in the mutant may have a structural and functional effect on the SAA3 protein. Arginine (R) is a basic, polar amino acid while glycine (G) is non-polar. This change from a charged, hydrophilic amino acid to a non-polar, neutral one could disturb the ionic interactions in the native protein, thereby affecting the structural configuration. This is congruous with the submission of Yakubu et al. (2017). Protein fragments that are not well ordered in the crystal have been reported to be simply not visible in electron density, as a result of which they are not built into the final model (... et al., 2007). Disordered regions of proteins could have specific structural and amino acid affinities (Kumari et al., 2015) compared to the ordered regions. Characterization of the physico-chemical properties of protein-protein interactions is very useful because key biological processes such as antigen-antibody recognition, hormone-receptor binding and signal transduction are said to be regulated through the association and dissociation of proteins (Zen et al., 2010).

The varying configurations of the native and the SAA3 protein variants R123G, Q84K, Cmutant and Dmutant could disturb protein folding and hence its structure and function. This is because the 20 types of amino acids in the three distinct secondary structures (helix, beta strand, and loop) have been reported to provide important information on the interaction preferences of amino acids in the folding of proteins (... et al., 2010). The non-interfacial disposition of the mutants R123G, S77R and S103W could also affect their level of interaction, which could possibly exert a disruptive effect on dimerization. According to Zhan and Lazaridis (2009), non-interfacial ionizable residues can influence dimer association by inducing large conformational changes.

... and Stelwagen, K. (2012). Host-defence-related proteins in cows' milk. Animal 6
Nagasundaram, N., Zhu, H., Liu, J., Karthick, V.,