Explain Hierarchical Clustering and its strengths.
Hierarchical clustering
Hierarchical clustering is an unsupervised learning procedure that builds successive clusters from previously formed clusters, grouping the data into a tree of clusters. It starts by treating each data point as an individual cluster. The endpoint is a set of clusters in which each cluster is distinct from the others and the objects within each cluster are broadly similar to one another.
There are two types of hierarchical clustering:
a) Agglomerative Hierarchical Clustering
b) Divisive Clustering
a) Agglomerative Hierarchical Clustering
Agglomerative clustering is one of the most common types of hierarchical clustering, used to group similar objects into clusters. It is also known as AGNES (Agglomerative Nesting). In agglomerative clustering, each data point initially acts as an individual cluster, and the data objects are grouped in a bottom-up manner: at each iteration, the most similar (closest) clusters are merged, and this continues until only one cluster is left.
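A minimal sketch of this bottom-up merging, using scikit-learn's AgglomerativeClustering (the library choice and the toy data are assumptions added for illustration, not part of the original answer):

```python
# AGNES-style bottom-up clustering with scikit-learn (illustrative).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy 2-D data: two loose groups (hypothetical).
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]])

# Each point starts as its own cluster; the closest pair of clusters
# is merged repeatedly until n_clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1] -- one label per point
```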
b) Divisive Clustering
Divisive hierarchical clustering (DIANA, Divisive Analysis) is the exact opposite of agglomerative hierarchical clustering. In divisive clustering, all the data points start out in one single cluster, and at every iteration the data points that are not similar are split off from their cluster; each separated group is then treated as an individual cluster. Finally, we are left with N clusters, one per data point.
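Plain DIANA is rarely found in common libraries, so the sketch below approximates the top-down idea by recursively bisecting the largest cluster with 2-means. The bisecting approach, the helper name divisive, and the data are assumptions for illustration:

```python
# Top-down sketch: recursively split clusters until k remain.
# Bisecting with 2-means is an approximation, not textbook DIANA.
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, k):
    clusters = [np.arange(len(X))]  # start with one all-inclusive cluster
    while len(clusters) < k:
        # Pick the largest cluster and split it into two.
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        idx = clusters.pop(i)
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters

rng = np.random.default_rng(0)
X = rng.random((20, 2))                  # hypothetical data
print([len(c) for c in divisive(X, 4)])  # sizes of the 4 clusters
```

Continuing the splits until every cluster holds a single point yields the N singleton clusters described above.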
Advantages of Hierarchical clustering
- It is simple to implement and, in some cases, gives very good results.
- It produces a hierarchy, a structure that conveys more information than a flat set of clusters.
- It does not need us to pre-specify the number of clusters.
Disadvantages of hierarchical clustering
- It tends to break large clusters.
- It has difficulty handling clusters of different sizes and convex shapes.
- It is sensitive to noise and outliers.
- Once a merge or split has been performed, it can never be undone.
OR,
Hierarchical Clustering
- Produces a set of nested clusters organized as a hierarchical tree
- Can be visualized as a dendrogram
– A tree-like diagram that records the sequences of merges or splits
- Uses a distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but it does need a termination condition (a sketch follows this list)
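A short sketch of the two points above, building the hierarchy from a pairwise-distance matrix with SciPy and drawing the dendrogram (the library choice and toy points are assumptions):

```python
# Build a hierarchy from a distance matrix and draw the dendrogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]], dtype=float)
D = pdist(X)                     # condensed pairwise-distance matrix
Z = linkage(D, method="single")  # records the sequence of merges
dendrogram(Z)                    # tree-like diagram of those merges
plt.show()
```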
Agglomerative (bottom-up)
1. Start with every point as its own cluster (a singleton).
2. Recursively merge the two (or more) most appropriate clusters.
3. Stop when the desired number of clusters k is reached.
OR,
1. Start with the points as individual clusters.
2. At each step, merge the closest pair of clusters until only one cluster (or k clusters) is left (see the sketch after this list).
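A from-scratch sketch of this merge loop, using single-linkage distance (the function name agglomerate and the toy data are illustrative assumptions):

```python
# Merge the closest pair of clusters until k clusters remain.
import numpy as np

def agglomerate(X, k):
    clusters = [[i] for i in range(len(X))]  # every point is a singleton
    while len(clusters) > k:
        # Find the closest pair of clusters; single linkage uses the
        # minimum point-to-point distance between the two clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)       # merge the pair and repeat
    return clusters

X = np.array([[0.0, 0.0], [0.1, 0.2], [4.0, 4.0], [4.1, 3.9]])
print(agglomerate(X, 2))  # e.g. [[0, 1], [2, 3]]
```

This brute-force version recomputes every cluster distance at each step; library implementations keep a cached distance matrix and update it after each merge.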
Divisive (top-down)
1. Start with a big cluster
2. Recursively divide into smaller clusters
3. Stop when the desired number of clusters k is reached.
OR,
Start with one, all-inclusive cluster
At each step, split a cluster until each cluster contains a point (or there are k clusters)
Strengths of Hierarchical Clustering
a) Do not have to assume any particular number of clusters
– Any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level (sketched after this list)
b) They may correspond to meaningful taxonomies
– Example in biological sciences (e.g., animal kingdom, phylogeny reconstruction, ...)
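A sketch of point (a): one fitted tree can be cut at different levels to obtain any desired number of clusters (the SciPy calls and toy data are assumptions):

```python
# Cut one dendrogram at different levels for different cluster counts.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]], dtype=float)
Z = linkage(X, method="average")  # build the full hierarchy once, no k

for k in (2, 3):
    labels = fcluster(Z, t=k, criterion="maxclust")  # cut to k clusters
    print(k, labels)              # cluster label for each point
```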