An enhanced model for automatically extracting topic phrases from web document snippets for cluster labels
Keyphrases are a subset of the words or multi-word phrases in a document that can describe the meaning of the document. Manually assigning high-quality documents to similar topics by keyphrase is expensive, time-consuming, and error-prone. Therefore, various unsupervised ranking methods based on importance scores have been proposed for keyphrase extraction. There are two approaches to keyphrase-based categorization: manual and automatic. In the manual approach, a human expert performs the classification task; in the automatic approach, supervised classifiers are used to classify resources. In supervised classification, manual interaction is still required to create training data before the automatic classification task can take place. We propose automatic classification of documents through semantic keyphrases, together with a new model for generating keyphrases as web document topic labels. In this way we reduce human participation by combining the knowledge of a given classification with the knowledge extracted from the data. The main focus of this paper is the automatic classification of documents into machine-generated, phrase-based cluster labels. The key benefit foreseen from this automatic document classification extends beyond search engines to many other fields, such as document organization, text filtering, and semantic index management.
Key words: Keyphrase extraction, machine learning, search engine snippet, document classification, topic tracking