APPLICATION OF SINGLE-LINKAGE CLUSTERING METHOD IN THE ANALYSIS OF GROWTH RATE OF GROSS DOMESTIC PRODUCT (GDP) AT 1990 CONSTANT BASIC PRICES (MILLION NAIRA)

Single-linkage is one of the methods of cluster analysis used for determining natural groupings in multivariate data. Given a data set with one or more characteristics, the single-linkage method classifies the data into clusters so that items are as similar as possible within each cluster and as different as possible between clusters. The objective is to show the closeness or similarity in the growth rate of GDP. Using the MINITAB software, the similarity of the growth rate of GDP and the similarity in the years of production are shown.


INTRODUCTION
The importance of clustering lies in reducing the amount of data by categorizing or grouping similar data items together. The aim is to establish a set of clusters such that cases within a cluster are more similar to each other than they are to cases in other clusters. Such grouping is pervasive in the way humans process information, and one of the motivations for using clustering algorithms is to provide automated tools to help in constructing categories or taxonomies (Jardine and Sibson, 1971; Sneath and Sokal, 1973). The methods may also be used to minimize the effects of human factors in the process.

Clustering methods (Anderberg, 1973; Hartigan, 1975; Jain and Dubes, 1988; Tryon and Bailey, 1973) can be divided into two basic types: hierarchical and partitional clustering. Within each type there exists a wealth of subtypes and different algorithms for finding the clusters. In this paper, single-linkage hierarchical clustering is used in the data analysis.

Hierarchical clustering proceeds successively by either merging smaller clusters into larger ones or splitting larger clusters. It is a procedure for transforming a proximity matrix into a sequence of nested partitions. The methods differ in the rule that decides which two small clusters are merged or which large cluster is split. The end result of the algorithm is a tree of clusters called a dendrogram, which shows how the clusters are related. By cutting the dendrogram at a desired level, a clustering of the data items into disjoint groups is obtained.

Single-linkage clustering is a hierarchical method that takes the distance between one cluster and another to be the shortest distance from any member of one cluster to any member of the other. If the data consist of similarities, the similarity between one cluster and another is taken to be the greatest similarity from any member of one cluster to any member of the other (Stephen P. Borgatti, "How to Explain Hierarchical Clustering", http://www.Analytictech.Com/networks/hiclus.htpm).

M. C. DIKE
The underlying mathematics of most of these methods is relatively simple, but large numbers of calculations are needed, which can put a heavy demand on the computer. The relationship between objects is represented in a proximity matrix in which rows and columns correspond to objects (Maria, 1999). The single-linkage method, otherwise known as the minimum or nearest-neighbour method, is employed in the analysis of the growth rate of Gross Domestic Product (GDP) for the years 1994 to 2003 at 1990 basic prices (million naira); the aim is to maximize the minimum distance between clusters. The issues that often need to be considered when using clustering in practice include how to scale the variables before calculating the distance matrix, which particular method of cluster analysis to use, and how to decide on the appropriate number of groups in the data (Everitt et al., 2001).
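The scaling issue mentioned above can be illustrated with a minimal sketch. It assumes z-score standardization and Euclidean distance, which are one common choice and not necessarily the one used in this study; the function names and the toy values are hypothetical.

```python
import math
from statistics import mean, stdev

def standardize(rows):
    """Scale each column to zero mean and unit sample standard deviation.

    Assumes every column has non-zero variance (otherwise division by zero).
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def distance_matrix(rows):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

# Hypothetical toy data: three items measured on two variables
# whose raw scales differ by two orders of magnitude.
toy = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
scaled = standardize(toy)
D = distance_matrix(scaled)
```

Without scaling, the second variable would dominate every distance; after standardization both variables contribute on a comparable footing.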

The calculation of distance and similarity coefficients for pairs of items
Pairs of items are often compared on the basis of the presence or absence of some characteristics. Similar items have more characteristics in common than dissimilar items. The presence or absence of a characteristic can be described mathematically by introducing a binary variable, which assumes value 1 if the characteristic is present and value 0 if the characteristic is absent. For example, for 5 binary variables, the variable scores for two items i and k might be such that there are two 1-1 matches, one 0-0 match, and two mismatches.
Let $x_{ij}$ be the score (1 or 0) of the jth binary variable on the ith item and $x_{kj}$ be the score (again, 1 or 0) of the jth variable on the kth item, j = 1, 2, ..., p. Consequently, the squared Euclidean distance $\sum_{j=1}^{p} (x_{ij} - x_{kj})^2$ provides a count of the number of mismatches, since each term is 0 when the two scores agree and 1 when they differ. A large distance corresponds to many mismatches, that is, dissimilar items (Richard and Dean, 1982). The different members of the class of hierarchical clustering techniques arise because of the variety of ways in which the distance between a cluster containing several observations and a single observation, or between two clusters, can be defined.
The frequencies of matches and mismatches for items i and k are arranged in the form of a contingency table as in Table 2.1, where "a" represents the frequency of 1-1 matches, "b" the frequency of 1-0 mismatches, "c" the frequency of 0-1 mismatches and "d" the frequency of 0-0 matches. As earlier stated, the data employed in this study were obtained from the Central Bank of Nigeria Statistical Bulletin and are given in Table 2.3, and using the data the dendrogram for the study was produced as in Figure 2.1.
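The counts a, b, c and d and the mismatch distance can be sketched in a few lines. The two score vectors below are assumed values, chosen only so that they give two 1-1 matches, one 0-0 match and two mismatches; they are not taken from the study data, and the function names are hypothetical.

```python
def match_counts(x_i, x_k):
    """Return (a, b, c, d): frequencies of 1-1, 1-0, 0-1 and 0-0 pairs."""
    pairs = list(zip(x_i, x_k))
    a = sum(1 for u, v in pairs if u == 1 and v == 1)
    b = sum(1 for u, v in pairs if u == 1 and v == 0)
    c = sum(1 for u, v in pairs if u == 0 and v == 1)
    d = sum(1 for u, v in pairs if u == 0 and v == 0)
    return a, b, c, d

def squared_euclidean(x_i, x_k):
    """Sum of squared score differences; for 0/1 scores this counts mismatches."""
    return sum((u - v) ** 2 for u, v in zip(x_i, x_k))

# Assumed illustrative scores on p = 5 binary variables.
item_i = [1, 0, 0, 1, 1]
item_k = [1, 1, 0, 1, 0]

a, b, c, d = match_counts(item_i, item_k)
mismatches = squared_euclidean(item_i, item_k)
```

For binary scores the squared distance always equals b + c, which is why it can serve as a dissimilarity measure between items.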

The Algorithm
Given a set of N items to be clustered, and an N*N distance (or similarity) matrix, the basic process of hierarchical clustering defined by Johnson (1967) is this:
1. Start by assigning each item to a cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters be the same as the distances (similarities) between the items they contain.
2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one cluster less.
3. Compute distances (similarities) between the new cluster and each of the old clusters.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N. (*)
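The four steps can be sketched as follows. This is a minimal illustration, not an efficient implementation: it recomputes cluster distances from the item distances on every pass, and the `linkage` parameter, the function name and the toy matrix are assumptions introduced here. Passing `min` gives single linkage; passing `max` gives complete linkage, which shows how the methods differ only in the merging rule.

```python
def agglomerate(dist, linkage=min):
    """Merge clusters until one remains.

    dist: symmetric N x N list of lists of item distances.
    Returns a list of (cluster_a, cluster_b, level) in merge order.
    """
    clusters = [[i] for i in range(len(dist))]  # step 1: one item per cluster
    merges = []
    while len(clusters) > 1:                    # step 4: repeat until one cluster
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # step 3, done on the fly: cluster distance from item distances
                d = linkage(dist[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best                          # step 2: closest pair
        merges.append((clusters[p], clusters[q], d))
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return merges

# Hypothetical distance matrix for four items.
toy = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
single = agglomerate(toy, linkage=min)
complete = agglomerate(toy, linkage=max)
```

On this toy matrix both rules merge the same pairs first and differ only in the level at which the last two clusters are joined.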

Single-Linkage Clustering: The Algorithm
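Let us now take a deeper look at how Johnson's algorithm works in the case of single-linkage clustering. The algorithm is an agglomerative scheme that erases rows and columns in the proximity matrix as old clusters are merged into new ones.
The N*N proximity matrix is D = [d(i,j)]. The clusterings are assigned sequence numbers 0, 1, ..., (N-1) and L(k) is the level of the kth clustering. A cluster with sequence number m is denoted (m) and the proximity between clusters (r) and (s) is denoted d[(r),(s)]. The algorithm is composed of the following steps:
1. Begin with the disjoint clustering having level L(0) = 0 and sequence number m = 0.
2. Find the least dissimilar pair of clusters in the current clustering, say pair (r), (s), according to d[(r),(s)] = min d[(i),(j)], where the minimum is over all pairs of clusters in the current clustering.
3. Increment the sequence number: m = m + 1. Merge clusters (r) and (s) into a single cluster to form the next clustering m. Set the level of this clustering to L(m) = d[(r),(s)].
4. Update the proximity matrix, D, by deleting the rows and columns corresponding to clusters (r) and (s) and adding a row and column corresponding to the newly formed cluster. The proximity between the new cluster, denoted (r,s), and old cluster (k) is defined in this way: d[(k),(r,s)] = min{d[(k),(r)], d[(k),(s)]}.
5. If all objects are in one cluster, stop. Else, go to step 2.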
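The matrix-updating scheme can be sketched directly. This is a minimal illustration under stated assumptions: the proximity matrix is held as a dictionary of rows keyed by cluster label (a tuple of item indices), ties are broken by iteration order, and the function name and toy distances are hypothetical.

```python
def single_linkage(d):
    """Single-linkage clustering by deleting and adding proximity-matrix rows.

    d: symmetric N x N list of lists of distances.
    Returns [(level, merged_cluster)] in merge order.
    """
    n = len(d)
    # Begin with the disjoint clustering: every object is its own cluster.
    prox = {(i,): {(j,): d[i][j] for j in range(n) if j != i} for i in range(n)}
    history = []
    while len(prox) > 1:
        # Find the least dissimilar pair of clusters (r), (s).
        r, s = min(
            ((a, b) for a in prox for b in prox[a]),
            key=lambda pair: prox[pair[0]][pair[1]],
        )
        level = prox[r][s]          # level of the new clustering
        new = tuple(sorted(r + s))  # label of the merged cluster (r,s)
        # Delete the rows for (r) and (s) ...
        row_r = prox.pop(r)
        row_s = prox.pop(s)
        new_row = {}
        for k in prox:
            # ... and the matching columns, then add the column for (r,s):
            # d[(k),(r,s)] = min(d[(k),(r)], d[(k),(s)])
            del prox[k][r]
            del prox[k][s]
            new_row[k] = min(row_r[k], row_s[k])
            prox[k][new] = new_row[k]
        prox[new] = new_row
        history.append((level, new))
    return history

# Hypothetical items lying on a line at positions 0, 1, 3 and 7.
toy = [
    [0, 1, 3, 7],
    [1, 0, 2, 6],
    [3, 2, 0, 4],
    [7, 6, 4, 0],
]
history = single_linkage(toy)
```

The toy example shows the chaining behaviour typical of the nearest-neighbour rule: each new item is absorbed into the growing cluster through its single closest member.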

DATA
As earlier stated, the data employed in this study were obtained from the Central Bank of Nigeria Statistical Bulletin and are given in Table 2.3, and using the data the dendrogram for the study was produced as in Figure 2.1. Fig. 2 in appendix 2 shows the similarity (dissimilarity) in years of the product. If the economy of a country is dependent on the agricultural produce in the table, the single-linkage method also shows the similarity in the economy of the country within the stipulated years. The similarities in years of the Gross Domestic Product are shown in the table below, starting with the years of highest similarity and proceeding to the least.

From Fig. 1 in appendix 1, it can be seen that between 1994 and 2003 the growth rates of GDP of items 8 and 26 were the most similar, so they form a new cluster 8. The item nearest to the new cluster 8 is item 14, which joins it to form a new cluster 8. At a higher linkage distance, items 15 and 29 are the most similar, so they form a new cluster 15. At the higher distance of 0.57544, items 1 and 27 are similar, so they form a new cluster 1. This new cluster is then compared with the surrounding items for similarity in growth rate; the most similar is item 16, which is joined to form a new cluster 1. This cluster is then joined to its nearest item, item 13, to form a new cluster 1. At a higher distance of 0.62161, the growth rate of item 3 is similar to that of item 30, so they are joined to produce a new cluster 3, which is then joined to cluster 15 because of their similarity to form a new cluster 3. At a distance level of 0.69794, item 1 is similar to item 2, and they are joined to form a new cluster 1. At a higher distance of 0.73413, items 17 and 28 are joined because of their similarity to form a new cluster 17. At a distance of 0.74461, cluster 1 is joined to its nearest cluster, cluster 3, to form a new cluster 1. Cluster 1 is joined to cluster 17 at a linkage distance of 0.77985, and then to cluster 8 at a linkage distance of 0.87618, in each case forming a new cluster 1.

The new cluster 1 is joined with item 20 to form a new cluster 1 with 15 observations. It is then joined to item 19 at the linkage distance of 1.05619, giving 16 observations; to its most similar item, item 11; and to item 12, giving a new cluster 1 with 18 observations. Items 24 and 25 are similar at the linkage distance of 1.25567, so they join to produce a new cluster 24. At a linkage distance of 1.39160, cluster 1 is joined to cluster 24 to form a new cluster 1, which is then joined to item 23 because of their similarity. Item 4 is joined to item 21 to form a new cluster 4.

The new cluster 1 is joined to item 7, and then to item 6 at the linkage distance of 4.04600, forming a new cluster 1 each time. This cluster is joined to item 22 to form a new cluster 1 with 24 observations. Item 5 is clustered with item 18 at the linkage distance of 4.34896 to form a new cluster 5. Cluster 1 is joined to cluster 4 to form a new cluster 1, which is then joined to item 10. This new cluster 1 is joined with cluster 5 to form a new cluster 1 with 29 observations, and finally with item 9 to form a cluster 1 with 30 observations. All of these clusters are formed on the basis of similarity.
From Fig. 2 in appendix 2, the growth rates of GDP in the years 1994 and 1995 are more similar than those of any other pair of years in the table, so 1994 and 1995 form a new cluster 1, labelled 1994. The years 1997 and 1999 follow in similarity and form a new cluster labelled 1997. Year 1996 is then found to be similar to the new cluster 1997, so a new cluster labelled 1996 is formed. This cluster is joined with year 1998 to produce a new cluster 1996, which is in turn joined with year 2000 to form a new cluster 1996. The cluster 1994 formed above is joined with year 2002 to form a new cluster 1994. This new cluster 1994 is joined to the cluster 1996 to form a new cluster 1994, which is joined to year 2001 and finally to year 2003, so that all the years form one cluster labelled 1994.
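A merge sequence of the kind traced above can be cut at a chosen linkage distance to obtain disjoint groups. The sketch below is only an illustration: the function name, the `(level, a, b)` record format and the toy merge list are hypothetical and are not the distances reported in this study.

```python
def cut(n_items, merges, height):
    """Return the disjoint groups obtained by cutting at a linkage distance.

    merges: list of (level, a, b), where a and b are any members of the
    two clusters joined at that level. Merges above `height` are ignored.
    """
    parent = list(range(n_items))

    def find(x):
        # Follow parent links to the cluster representative (with compression).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for level, a, b in merges:
        if level <= height:
            parent[find(b)] = find(a)

    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical merge record for four items.
toy_merges = [(1.0, 0, 1), (2.0, 1, 2), (4.0, 2, 3)]
groups_low = cut(4, toy_merges, 0.5)
groups_mid = cut(4, toy_merges, 2.5)
groups_high = cut(4, toy_merges, 4.0)
```

A low cut leaves every item in its own group, while a cut at or above the final linkage distance returns a single cluster; intermediate cuts give the groupings read off the dendrogram.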

CONCLUSION
The single-linkage method shows the similarity (dissimilarity) in the products within the years. This can be extended to the similarity (dissimilarity) in the economy within the years: the similarity of the growth rate of the Gross Domestic Product can also be used to judge the similarity (dissimilarity) in the economy within the corresponding years.

Table 4.1. Similarity levels of the growth rate of GDP