Mining olive genome through library sequencing and bioinformatics: Novel sequences and new microsatellites
Abstract
As one of the initial steps of olive (Olea europaea L.) genome analysis, a small-insert genomic DNA library was constructed by digesting olive genomic DNA with SmaI and cloning the digestion products into the pUC19 vector, and 83 randomly picked colonies were sequenced. Analysis of the insert sequences revealed 12 clones with no matches to previously characterized or confirmed sequence records, and 5 insert sequences that are completely new, with no match in any available nucleotide database. The remaining sequences showed homology to previously described protein-coding genes (13%), ribosomal RNAs/tRNAs (24%), phage DNA (1%) and non-functional sequences (22%), that is, records such as “chloroplast DNA”, “Lotus chromosome 3” or “Arabidopsis chromosome 2” that have been confirmed for accuracy but have not been assigned a function. Analysis of the insert sequences with multiple bioinformatics tools, including secondary structure prediction, revealed potential features such as coding regions, regulatory sequences and microsatellites. These features helped extract more information, especially about insert sequences with no hits to any sequence record of described function. Our results and analyses also suggested that olive di-nucleotide microsatellites with a repeat number of three [(XY)3] could be informative and therefore should not be excluded from studies involving microsatellite analysis. Common insights from the multiple bioinformatics analyses suggest that these tools can be useful for mining genomic sequences.
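The (XY)3 criterion can be made concrete with a short scan for perfect di-nucleotide repeats. The sketch below is a minimal illustration, not the authors' pipeline, because the abstract does not name the microsatellite search tool or its settings. It assumes perfect, uninterrupted repeats and ignores imperfect or compound microsatellites. The function name `find_dinucleotide_ssrs` is hypothetical. With `min_repeats=3`, it reports loci as short as (XY)3, that is, 6 bp.

```python
import re
from typing import List, Tuple


def find_dinucleotide_ssrs(seq: str, min_repeats: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, unit, repeat_count) for perfect di-nucleotide repeats.

    Hypothetical helper for illustration only. `start` is a 0-based index.
    Units made of two identical bases (e.g. "AA") are rejected so that
    mononucleotide runs are not reported. Matches are non-overlapping and
    extended greedily as far as the repeat continues.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    # Group 1 is the two-base unit; group 2 is its first base, and the
    # lookahead (?!\2) forces the second base to differ from the first.
    # \1{n,} then requires at least n further copies of the same unit.
    pattern = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        # Every match is a whole number of 2-bp units.
        hits.append((m.start(), m.group(1), len(m.group(0)) // 2))
    return hits


if __name__ == "__main__":
    print(find_dinucleotide_ssrs("ttgagagatt"))                 # [(2, 'GA', 3)]
    print(find_dinucleotide_ssrs("ttgagatt"))                   # []
    print(find_dinucleotide_ssrs("ttgagatt", min_repeats=2))    # [(2, 'GA', 2)]
```

Raising `min_repeats` to a stricter cut-off would silently drop the short (XY)3 loci that the abstract argues should be kept. Scanning only one strand is sufficient for this simple case. The reverse complement of a perfect (XY)n repeat is itself a perfect di-nucleotide repeat of the same length.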