A novel ensemble and composite approach for classifying proteins based on Chou’s pseudo amino acid composition
Because the location of a protein provides clues about its function when that function is uncertain, protein classification is regarded as an important task in biological data mining. The success of the Human Genome Project has led to an explosion of protein sequences, so there is a great need for computational methods that predict protein locations quickly and reliably from primary sequences. In this paper, we use a composite classifier system formed from a set of k-nearest neighbor (K-NN) classifiers, each defined in a different pseudo amino acid composition vector space. In each such space, a protein is represented by its pseudo amino acid composition. The location of a query protein is determined by the outcome of the choice made among these constituent individual classifiers. The results show that the composite classifier outperforms the single classifiers widely used in the biological literature. The composite classifier can therefore be employed as a robust method for predicting protein location in biological data mining.
Key words: Composite classifier system, biological data mining, atomic classifiers, pseudo amino acid composition.
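The composite scheme summarized in the abstract can be illustrated with a minimal Python sketch. It assumes that each protein has already been converted into several pseudo amino acid composition vectors (one per constituent classifier), that distances are Euclidean, and that the "choice" among classifiers is a simple majority vote; these details, the function names, and the toy numbers are illustrative assumptions, not the settings used in the paper.

```python
from collections import Counter
import math


def knn_predict(train_vectors, train_labels, query, k=1):
    """Label a query by majority vote among its k nearest training vectors."""
    # Euclidean distance is an assumption; the abstract does not name a metric.
    distances = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def composite_predict(train_views, train_labels, query_views, k=1):
    """Run one K-NN classifier per feature space and combine them by vote."""
    predictions = [
        knn_predict(train_vecs, train_labels, query_vec, k)
        for train_vecs, query_vec in zip(train_views, query_views)
    ]
    # Majority vote over the constituent classifiers (assumed combination rule).
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data: four training proteins seen in three feature spaces.
train_labels = ["membrane", "nucleus", "membrane", "nucleus"]
train_views = [
    [[0.1, 0.2], [0.8, 0.9], [0.2, 0.1], [0.9, 0.8]],
    [[0.1, 0.1, 0.3], [0.7, 0.8, 0.6], [0.2, 0.2, 0.2], [0.9, 0.7, 0.8]],
    [[0.5, 0.5], [0.4, 0.6], [0.6, 0.4], [0.5, 0.6]],
]
# The same query protein, encoded once per feature space.
query_views = [[0.85, 0.85], [0.8, 0.75, 0.7], [0.58, 0.42]]

predicted_location = composite_predict(train_views, train_labels, query_views)
print(predicted_location)
```

In this toy case the third feature space separates the classes poorly and its classifier disagrees with the other two, but the vote still returns the label favored by the majority, which is the kind of robustness the composite approach is meant to provide.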