Extension of the K-Means Algorithm for Clustering Mixed Data
In this work, a new hybrid method is proposed that extends the K-means algorithm to categorical and mixed-type attributes. A new dissimilarity measure is also proposed that uses a relative cumulative frequency-based method to cluster objects with mixed values. The resulting dissimilarity model could serve as a predictive tool for identifying attributes of objects in mixed datasets. The method has been implemented in Java and MATLAB. Experiments on real-world datasets show that the new hybrid algorithm is more efficient and more robust than existing algorithms in terms of accuracy and time complexity. The tool can be used in a variety of applications, such as agro-based industries, clinical datasets, and general information retrieval systems (IRS). The new method has been applied to agro-based datasets (soybean and yeast) to form clusters that could help farmers manage crop pests.
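The abstract does not give the exact formula of the relative cumulative frequency-based dissimilarity. Purely as an illustration of how a K-means-style algorithm can score an object against a cluster when attributes are mixed, the sketch below combines squared Euclidean distance on the numeric attributes with a frequency-based categorical term (1 minus the relative frequency of the object's value among the cluster's members), weighted by a hypothetical parameter `gamma`. The function names, the categorical term, and the weighting are all assumptions for illustration; this is not the authors' Mixk-meansXFon implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def numeric_dist(x: Sequence[float], center: Sequence[float]) -> float:
    """Squared Euclidean distance over the numeric attributes."""
    return sum((a - b) ** 2 for a, b in zip(x, center))


def categorical_dist(x: Sequence[str], members: Sequence[Sequence[str]]) -> float:
    """Assumed frequency-based term: for each categorical attribute, add
    1 minus the relative frequency of x's value among the cluster members."""
    if not members:
        raise ValueError("cluster must have at least one member")
    n = len(members)
    total = 0.0
    for j, value in enumerate(x):
        counts = Counter(m[j] for m in members)
        total += 1.0 - counts[value] / n
    return total


def mixed_dissimilarity(x_num, x_cat, center_num, members_cat, gamma=1.0) -> float:
    """Hypothetical mixed score: numeric part plus gamma times categorical part."""
    return numeric_dist(x_num, center_num) + gamma * categorical_dist(x_cat, members_cat)


def assign(x_num, x_cat, clusters: List[Tuple[Sequence[float], Sequence[Sequence[str]]]],
           gamma=1.0) -> int:
    """K-means-style assignment step: index of the cluster with the lowest score.
    Each cluster is given as (numeric center, list of categorical member rows)."""
    scores = [mixed_dissimilarity(x_num, x_cat, c, m, gamma) for c, m in clusters]
    return min(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    members = [["red", "small"], ["red", "large"], ["blue", "small"], ["blue", "large"]]
    print(mixed_dissimilarity([1.0, 2.0], ["red", "small"], [0.0, 0.0], members, gamma=0.5))
    clusters = [([0.0, 0.0], members), ([1.0, 2.0], [["red", "small"]])]
    print(assign([1.0, 2.0], ["red", "small"], clusters))
```

In the usage lines, the object is 5.0 away from the first cluster's numeric center and has value frequencies of 0.5 on each categorical attribute, so with `gamma=0.5` its score is 5.5; it matches the second cluster exactly and is assigned there. The `gamma` weight is a design choice that balances the two attribute types; how the paper balances them is not stated in the abstract.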
Keywords: Mixk-meansXFon, Clustering, Mixed data.