Explain Hierarchical Clustering and its strengths.
Hierarchical clustering
Hierarchical clustering is an unsupervised learning procedure that builds successive clusters from previously formed clusters, grouping the data into a tree of clusters. It starts by treating each data point as an individual cluster. The endpoint is a set of clusters in which each cluster is distinct from the others and the objects within each cluster are broadly similar to one another.
There are two types of hierarchical clustering:
a) Agglomerative Hierarchical Clustering
b) Divisive Clustering
a) Agglomerative Hierarchical Clustering
Agglomerative clustering is one of the most common types of hierarchical clustering, used to group similar objects into clusters. It is also known as AGNES (Agglomerative Nesting). In agglomerative clustering, each data point initially acts as an individual cluster, and the data objects are grouped in a bottom-up manner: at each iteration, the most similar (closest) clusters are merged, and this continues until only one cluster is left.
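A minimal sketch of this bottom-up merging, using scikit-learn's AgglomerativeClustering (the library choice and the toy data are assumptions added for illustration, not part of the original answer):

```python
# AGNES-style bottom-up clustering with scikit-learn (illustrative).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy 2-D data: two loose groups (hypothetical).
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]])

# Each point starts as its own cluster; the closest pair of clusters
# is merged repeatedly until n_clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1] -- one label per point
```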
b) Divisive Clustering
Divisive hierarchical clustering (DIANA, Divisive Analysis) is the exact opposite of agglomerative hierarchical clustering. In divisive clustering, all the data points start out in one single cluster, and at every iteration the data points that are not similar are split off from their cluster; each separated group is then treated as an individual cluster. Finally, we are left with N clusters, one per data point.
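Plain DIANA is rarely found in common libraries, so the sketch below approximates the top-down idea by recursively bisecting the largest cluster with 2-means. The bisecting approach, the helper name divisive, and the data are assumptions for illustration:

```python
# Top-down sketch: recursively split clusters until k remain.
# Bisecting with 2-means is an approximation, not textbook DIANA.
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, k):
    clusters = [np.arange(len(X))]  # start with one all-inclusive cluster
    while len(clusters) < k:
        # Pick the largest cluster and split it into two.
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        idx = clusters.pop(i)
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters

rng = np.random.default_rng(0)
X = rng.random((20, 2))                  # hypothetical data
print([len(c) for c in divisive(X, 4)])  # sizes of the 4 clusters
```

Continuing the splits until every cluster holds a single point yields the N singleton clusters described above.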
Advantages of Hierarchical clustering
- It is simple to implement and, in some cases, gives very good results.
- It produces a hierarchy, a structure that conveys more information than a flat set of clusters.
- It does not need us to pre-specify the number of clusters.
Disadvantages of hierarchical clustering
- It tends to break large clusters.
- It has difficulty handling clusters of different sizes and convex shapes.
- It is sensitive to noise and outliers.
- Once a merge or split has been performed, it can never be undone.
OR,
Hierarchical Clustering
- Produces a set of nested clusters organized as a hierarchical tree
- Can be visualized as a dendrogram
– A tree-like diagram that records the sequences of merges or splits
- Uses a distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but it does need a termination condition (a sketch follows this list)
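A short sketch of the two points above, building the hierarchy from a pairwise-distance matrix with SciPy and drawing the dendrogram (the library choice and toy points are assumptions):

```python
# Build a hierarchy from a distance matrix and draw the dendrogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]], dtype=float)
D = pdist(X)                     # condensed pairwise-distance matrix
Z = linkage(D, method="single")  # records the sequence of merges
dendrogram(Z)                    # tree-like diagram of those merges
plt.show()
```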
Agglomerative (bottom-up)
1. Start with every point as its own cluster (a singleton).
2. Recursively merge the two (or more) most appropriate clusters.
3. Stop when the desired number of clusters k is reached.
OR,
1. Start with the points as individual clusters.
2. At each step, merge the closest pair of clusters until only one cluster (or k clusters) is left (see the sketch after this list).
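A from-scratch sketch of this merge loop, using single-linkage distance (the function name agglomerate and the toy data are illustrative assumptions):

```python
# Merge the closest pair of clusters until k clusters remain.
import numpy as np

def agglomerate(X, k):
    clusters = [[i] for i in range(len(X))]  # every point is a singleton
    while len(clusters) > k:
        # Find the closest pair of clusters; single linkage uses the
        # minimum point-to-point distance between the two clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)       # merge the pair and repeat
    return clusters

X = np.array([[0.0, 0.0], [0.1, 0.2], [4.0, 4.0], [4.1, 3.9]])
print(agglomerate(X, 2))  # e.g. [[0, 1], [2, 3]]
```

This brute-force version recomputes every cluster distance at each step; library implementations keep a cached distance matrix and update it after each merge.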
Divisive (top-down)
1. Start with a big cluster
2. Recursively divide into smaller clusters
3. Stop when the desired number of clusters k is reached.
OR,
Start with one, all-inclusive cluster
At each step, split a cluster until each cluster contains a point (or there are k clusters)
Strengths of Hierarchical Clustering
a) Do not have to assume any particular number of clusters
– Any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level (sketched after this list)
b) They may correspond to meaningful taxonomies
– Example in biological sciences (e.g., animal kingdom, phylogeny reconstruction, ...)
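A sketch of point (a): one fitted tree can be cut at different levels to obtain any desired number of clusters (the SciPy calls and toy data are assumptions):

```python
# Cut one dendrogram at different levels for different cluster counts.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]], dtype=float)
Z = linkage(X, method="average")  # build the full hierarchy once, no k

for k in (2, 3):
    labels = fcluster(Z, t=k, criterion="maxclust")  # cut to k clusters
    print(k, labels)              # cluster label for each point
```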