Sequence analysis of Maturase K (matK): A chloroplast-encoded gene in some selected pulses
Sequence data have proved highly informative for characterizing crop species and resolving their phylogenetic relationships. This study used bioinformatics tools to characterize the matK gene in selected legumes, with the pigeon pea [Cajanus cajan (L.) Millsp.] matK sequence as the query sequence. Nucleotide and amino acid sequences of the matK gene of 10 legumes were retrieved from the NCBI database and analysed for homology, physicochemical properties, motifs, GC content and phylogenetic relationships. The nucleotide and amino acid sequence lengths of this gene differed among the selected legumes: nucleotide length varied between 631 and 1580 bp, while amino acid sequence length varied between 21 and 509 residues. The P. tetragonolobus and C. cajan matK sequences had a percentage identity of 88%, while V. sativa had the lowest percentage identity (70%). The G. tomentella and P. tetragonolobus matK sequences shared the same percentage similarity (91%) with C. cajan, while V. sativa had the lowest (78%). The predicted motifs were a tyrosine kinase phosphorylation site, N-myristoylation site, N-glycosylation site, protein kinase phosphorylation site, casein kinase II phosphorylation site, and cAMP- and cGMP-dependent protein kinase phosphorylation site. However, a microbodies C-terminal targeting site was predicted only in the matK amino acid sequences of P. sativum and C. cajan. Phylogenetically, two major clades were revealed, with the matK gene sequences of P. sativum, V. sativa and C. arietinum in clade A and those of P. tetragonolobus, C. cajan, G. tomentella, P. vulgaris, V. unguiculata, V. angularis and V. radiata in clade B. Clade A diverged from the ancestral legume approximately 39 MYA, while the legume sequences in clade B diverged from the ancestor about 57 MYA. The GC content of the matK nucleotide sequence was highest in V. sativa (31.37%), with the range across the selected legumes being 7.29%-31.37%. The predicted secondary structure of the matK amino acid sequences comprised alpha helix (34.14%-41.27%), extended strand (11.56%-20.99%) and random coil (39.48%-51.76%). The major domain architectures found in the amino acid sequences were of single and double types. Although the maturase K gene sequences in the selected legumes differ in length, physicochemical properties, GC content and motifs, the results of this study revealed that the C. cajan matK gene sequence is closely related to that of P. tetragonolobus but distant from those of V. unguiculata and P. vulgaris.
Keywords: Maturase K (matK) gene, bioinformatics, phylogenetics, selected legumes, breeding
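As an illustration of two of the quantities reported in the abstract, GC content and percentage identity, the following is a minimal Python sketch. It is not the software used in the study. The function names `gc_content` and `percent_identity` are hypothetical, and the identity calculation assumes the two sequences have already been aligned and uses one common convention (columns containing a gap are ignored). Alignment tools differ in how they treat gap columns, so values may not match those reported by a given program.

```python
def gc_content(seq: str) -> float:
    """Return GC content as a percentage of the unambiguous bases (A, C, G, T)."""
    # Ambiguity codes such as N are excluded from the denominator (an assumption).
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue (no '-').
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)
```

For example, `gc_content("ATGC")` gives 50.0, and `percent_identity("ATGC", "ATGA")` gives 75.0 because three of the four aligned positions match.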