In silico single nucleotide polymorphism (SNP) mining of the Sorghum bicolor genome
Single nucleotide polymorphisms (SNPs) may be considered the ultimate genetic markers because they represent the finest resolution of a DNA sequence (a single nucleotide), are generally abundant in populations, and have a low mutation rate. SNPs are important tools for studying complex genetic traits and genome evolution. SNP mining can be done by experimental or computational methods. Computational strategies for SNP discovery make use of the large number of sequences available in public databases [in most cases as expressed sequence tags (ESTs)] and are considered faster and more cost-effective than experimental procedures. A major challenge in computational SNP discovery is distinguishing allelic variation from sequence variation between paralogous sequences, in addition to recognizing sequencing errors. For the majority of public EST sequences, trace or quality files are lacking, which makes the detection of reliable SNPs even more difficult because it has to rely on sequence comparisons alone. In the present study, the online SNP and allele detection tool HaploSNPer (based on the QualitySNP pipeline) and the Sorghum bicolor genome were used. As a result, 77094 potential SNPs and 40589 reliable SNPs were detected in S. bicolor. Among the 77094 potential SNPs, transitions, transversions and indels numbered 34398, 35871 and 6825, respectively. Among the 40589 reliable SNPs, transitions, transversions and indels numbered 17042, 20500 and 3047, respectively.
Key words: Single nucleotide polymorphisms (SNP), expressed sequence tags (EST), HaploSNPer.
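The three variant classes reported above can be illustrated with a minimal Python sketch. It is not part of HaploSNPer or QualitySNP. The names `classify_variant`, `summarize` and `REPORTED` are hypothetical and introduced only for illustration, and the use of "-" to mark a gap is an assumption about how alleles might be encoded. The sketch classifies a pair of alleles as a transition (purine to purine or pyrimidine to pyrimidine), a transversion (purine to pyrimidine or the reverse) or an indel. It then checks that the reported class counts add up to the reported totals.

```python
# Illustrative sketch only; not the HaploSNPer/QualitySNP implementation.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_variant(ref: str, alt: str) -> str:
    """Classify an allele pair as 'transition', 'transversion' or 'indel'.

    Assumption: a gap is written as '-' and any length difference
    between the two alleles is treated as an indel.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == "-" or alt == "-" or len(ref) != len(alt):
        return "indel"
    if len(ref) != 1 or ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref!r} -> {alt!r}")
    # Both bases in the same chemical class means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Counts copied from the abstract.
REPORTED = {
    "potential": {"transition": 34398, "transversion": 35871,
                  "indel": 6825, "total": 77094},
    "reliable": {"transition": 17042, "transversion": 20500,
                 "indel": 3047, "total": 40589},
}


def summarize(counts: dict) -> dict:
    """Check the class counts against the total and derive simple ratios."""
    class_sum = counts["transition"] + counts["transversion"] + counts["indel"]
    return {
        "class_sum": class_sum,
        "matches_total": class_sum == counts["total"],
        "ts_tv_ratio": counts["transition"] / counts["transversion"],
        "indel_fraction": counts["indel"] / class_sum,
    }


if __name__ == "__main__":
    for label, counts in REPORTED.items():
        print(label, summarize(counts))
```

With the reported counts, the class sums match both totals, and the transition to transversion ratio comes out at roughly 0.96 for the potential SNPs and roughly 0.83 for the reliable SNPs.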