A novel stepwise support vector machine (SVM) method based on optimal feature combination for predicting miRNA precursors
MicroRNAs (miRNAs) are a class of non-coding RNAs produced from miRNA precursors (pre-miRNAs) that have a stem-loop structure. Developing computational approaches for pre-miRNA identification remains a challenging task, in which feature selection plays a major role. Here, we first extracted feature subsets from 124 sequence and secondary structure features using a hybrid of a genetic algorithm (GA) and a support vector machine (SVM). Next, based on the high-frequency features in these subsets, we proposed a novel stepwise SVM method to identify the optimal feature combinations. We observed a cooperative effect among different features. Finally, we obtained 10 feature combinations with a strong combined effect and high classification performance for predicting pre-miRNAs. In external validation, each of the 10 combinations correctly predicted more than 13 of the 16 newly confirmed human pre-miRNAs in miRBase 14.0. The best combination correctly predicted 15 (93.75%), which significantly outperformed triplet-SVM (13, 81.25%).
Key words: MicroRNA precursor, feature selection, genetic algorithm, support vector machine.
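The stepwise idea summarized in the abstract can be illustrated with a minimal sketch of greedy forward selection. This is not the authors' implementation: the abstract does not specify the stepwise procedure in detail, so the stopping rule (`min_gain`), the function `stepwise_select`, the feature names (`mfe`, `gc_content`, `loop_len`), and the toy scoring function are all hypothetical. In a real setting the scorer would be something like the cross-validated accuracy of an SVM trained on the candidate feature subset; here a hand-written function stands in for it so that the cooperative effect between two features is easy to see.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# A scorer maps a feature subset to a quality value (higher is better).
# Assumption: in practice this would be SVM cross-validation accuracy.
Scorer = Callable[[FrozenSet[str]], float]


def stepwise_select(
    features: Sequence[str], score: Scorer, min_gain: float = 1e-9
) -> Tuple[List[str], float]:
    """Greedy forward selection: add the single best feature at each step."""
    selected: List[str] = []
    best_score = score(frozenset())
    remaining = list(features)
    while remaining:
        step_feature = None
        step_score = best_score
        # Try each remaining feature together with the already selected ones.
        for feature in remaining:
            candidate = score(frozenset(selected + [feature]))
            if candidate > step_score:
                step_feature, step_score = feature, candidate
        # Stop when no feature improves the score by more than min_gain.
        if step_feature is None or step_score - best_score <= min_gain:
            break
        selected.append(step_feature)
        remaining.remove(step_feature)
        best_score = step_score
    return selected, best_score


def toy_score(subset: FrozenSet[str]) -> float:
    """Hypothetical stand-in for SVM accuracy, with one cooperative pair."""
    accuracy = 0.50  # chance level for a balanced two-class problem
    if "mfe" in subset:
        accuracy += 0.20
    if "gc_content" in subset:
        accuracy += 0.05
    if "mfe" in subset and "gc_content" in subset:
        accuracy += 0.10  # extra gain only when both features are present
    return accuracy  # "loop_len" contributes nothing in this toy example


selected, accuracy = stepwise_select(["loop_len", "gc_content", "mfe"], toy_score)
print(selected, round(accuracy, 2))
```

On this toy scorer the procedure picks `mfe` first, then `gc_content`, and stops because `loop_len` adds nothing. The combined score exceeds the sum of the two individual gains, which is the kind of cooperative effect the abstract describes.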