A first class of approaches to cluster formation is based on the sequential construction of a tree (dendrogram), which leads us to form clusters; Fig. 17.5 shows a simple dendrogram. The leaves of the tree correspond to individual objects, while branches correspond to sequential groupings of observations and clusters. Since a tree suggests a natural hierarchy, methods in this class are called hierarchical. Methods differ in the way clusters are built:
- In divisive methods, we start from one big cluster containing all observations and proceed to split it into smaller, disjoint clusters.
- In agglomerative methods, increasingly large clusters are built by merging smaller ones.
The second approach is more common and is best illustrated by a simple example.
Example 17.4 Consider four observations collected in the following matrix:
Remember that rows correspond to observations and columns to variables; so the first observation is X1 = [3, 6, 9, −1]^T. Using Euclidean distance, we find the following distance matrix:
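As a concrete illustration, such a distance matrix can be computed with standard tools. The following Python sketch uses SciPy with a placeholder data matrix: only X1 = [3, 6, 9, −1] is reproduced above, so the remaining rows (and hence most of the resulting distances) are illustrative assumptions, not the example's actual data.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Rows are observations, columns are variables. Only the first row (X1) is
# taken from the text; X2, X3, X4 are placeholder values for illustration
# (X3 is chosen so that d(X1, X3) = 5.0990, but the other distances will
# not match the example's matrix D).
X = np.array([
    [3.0,  6.0,  9.0, -1.0],   # X1 (from the text)
    [5.0,  4.0, 12.0,  9.0],   # X2 (placeholder)
    [6.0,  7.0,  5.0, -1.0],   # X3 (placeholder)
    [10.0, 12.0,  0.0,  7.0],  # X4 (placeholder)
])

# Pairwise Euclidean distances, arranged as a symmetric 4x4 matrix D.
D = squareform(pdist(X, metric="euclidean"))
print(np.round(D, 4))
```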
Now, let us build clusters using an agglomerative (bottom-up) strategy based on the centroid distance between clusters. The first iteration involves the distances between single observations, and we immediately see in matrix D that the two closest observations are X1 and X3, whose distance is 5.0990; merging them yields a cluster A. The centroid of these two observations is
Now we compute the distances between the centroid of cluster A and the remaining observations X2 and X4:
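Since the numerical values are not reproduced above, here is a minimal sketch of this centroid-distance step in Python, using the same placeholder data as before (only X1 comes from the text):

```python
import numpy as np

# Same placeholder data matrix as in the previous sketch (only X1 is from the text).
X = np.array([[3, 6, 9, -1],
              [5, 4, 12, 9],
              [6, 7, 5, -1],
              [10, 12, 0, 7]], dtype=float)

# Centroid of cluster A = {X1, X3}: the component-wise mean of the two observations.
c_A = X[[0, 2]].mean(axis=0)

# Centroid distances from cluster A to the remaining observations X2 and X4.
d_A_X2 = np.linalg.norm(c_A - X[1])
d_A_X4 = np.linalg.norm(c_A - X[3])
print(c_A, round(d_A_X2, 4), round(d_A_X4, 4))
```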
By comparing these distances with d(X2, X4) = 15.6525, we conclude that we should merge cluster A with X2, obtaining a new cluster B consisting of three observations; this is finally merged with the remaining observation X4 and the process stops, resulting in the dendrogram of Fig. 17.5. We observe that the centroid of cluster B is
and its distance from X4 is 17.5278, which is the height of the dendrogram. In fact, the height of each vertical bar in the dendrogram corresponds to the distance at which the two clusters it joins are merged.
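The whole agglomerative procedure, including the dendrogram, can be reproduced with SciPy's centroid linkage. The sketch below again relies on the placeholder data (only X1 is given in the text), so the merge heights will differ from those in the example.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Placeholder data matrix (only X1 is from the text).
X = np.array([[3, 6, 9, -1],
              [5, 4, 12, 9],
              [6, 7, 5, -1],
              [10, 12, 0, 7]], dtype=float)

# Centroid linkage: at each step, merge the two clusters whose centroids
# are closest in Euclidean distance, as in the example.
Z = linkage(X, method="centroid")
print(Z)  # each row: indices of the merged clusters, merge distance, new cluster size

# Plot the resulting dendrogram (the counterpart of Fig. 17.5).
dendrogram(Z, labels=["X1", "X2", "X3", "X4"])
plt.ylabel("merge distance")
plt.show()
```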
In the example, we always merge a cluster with a single observation, but this need not be the case in general, as we may well merge two clusters. We should note that, in this class of methods, we do not necessarily specify the number of clusters a priori. Given the tree, we may decide where to draw a horizontal line separating clusters, based on the merge distances. Also note that the approach is not iterative, but one-shot: In agglomerative methods we build the tree all the way up, without revising previous decisions.
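With SciPy, the horizontal cut corresponds to thresholding the linkage matrix. A minimal sketch follows; the threshold value and the placeholder data are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Placeholder data matrix as before (only X1 is from the text).
X = np.array([[3, 6, 9, -1],
              [5, 4, 12, 9],
              [6, 7, 5, -1],
              [10, 12, 0, 7]], dtype=float)
Z = linkage(X, method="centroid")

# "Drawing a horizontal line" at height t: all merges occurring above this
# distance are undone, and each remaining subtree becomes a cluster.
labels = fcluster(Z, t=10.0, criterion="distance")  # t = 10.0 is an illustrative threshold
print(labels)  # cluster label assigned to each observation
```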