Explain the various techniques used for clustering.

- December 29, 2021

Clustering techniques are used to identify groups of similar objects in a data set collected from various sources such as marketing, bio-medical, web, geospatial, etc. The clustering techniques are broadly divided into Hard clustering (data point belongs to only one group) and Soft Clustering (data points can belong to another group also). But there are also other various techniques of Clustering that exist. They are:

1. Partitioning approach: The partitioning method groups the data into k groups, which together satisfy the following requirements:

- Each group must contain at least one data point, and

- Each data point must belong to exactly one group.

. The partitioning method creates an initial partitioning. It then uses an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another. Improvement in partitioning is evaluated by some criterion, e.g., minimizing the sum of square errors. It is also known as the centroid-based method because it uses the centroid of a cluster, C to represent each cluster. The centroid of a cluster is its center points such as the mean, median, or mode of the data points assigned to the cluster. A typical example of algorithms based on the partitioning approach are k-means, K-means ++, mini-batch k means, k medoids, etc.

2. Hierarchical approach: Hierarchical method creates a hierarchical decomposition of the given set of data points using some criterion such as proximity measures. Hierarchical decomposition of data points results in a tree-like structure, which is also called a dendrogram.

a. Agglomerative Approach: This approach starts with each object forming a separate group. It keeps on merging the objects or groups that are close to one another. It keeps on doing so until all of the groups are merged into one or until the termination condition holds.

b. Divisive Approach: This approach starts with all of the objects in the same cluster. In

the continuous iteration, a cluster is split up into smaller clusters. It is down until each object in one cluster or the termination condition holds. This method is rigid, i.e., once a merging or splitting is done, it can never be undone. Typical examples of algorithms based on hierarchical approach are Diana, Agnes, BIRCH CAMELEON, etc.

3. Density-based approach: Density-based method is based on connectivity and density functions. That is, this method continues growing the given cluster as long as the density in the neighborhood exceeds some threshold, Le, for each data point within a given cluster, the radius of a given cluster has to contain at least a minimum number of points. Typical examples of algorithms based on density-based approach are DBSACN, OPTICS, DenClue, etc.

Figure 7.8: Density-based approach

4. Grid-based approach: Grid-based method is based on a multiple-level granularity structure. That is, the objects together form a grid. The object space is quantized into a finite number of cells that form a grid structure. This approach of clustering has a relatively faster processing time and it is dependent only on the number of cells in each dimension in the quantized space. Typical examples of algorithms based on grid-based approach are STING, Wave Cluster, CLIQUE, etc.

5. Model-based: In Model-based clustering method. a model is hypothesized for each of the clusters and tries to find the best fit data points of that model. This method also provides a way to automatically determine the number of clusters based on standard statistics, taking outlier or noise into account. Therefore, it yields robust clustering methods. The model is based on 3 general assumptions: We know the number of clusters before we start

Each observation in the data has a certain probability of belonging to each cluster. The observations within each cluster follow a normal distribution (with the appropriate dimension)Typical examples of algorithms based on model-based approach are Expectation-Maximization (EM), SOM, COBWEB, etc.

6. Constraint-based: Constraint-based Clustering method clusters the given data points by considering user-specified or application-specific constraints. So, it is also known as a user-guided clustering technique. Constraints for constraint-based clustering can be user expectations or the properties of desired clustering results. Typical examples of algorithms based on a hierarchical approach are COD (obstacles), constrained clustering, etc.

7. Link-based clustering: In link-based clustering methods massive links present in the data objects given in the input dataset is used to create the cluster of the input data object. Typical examples of algorithms based on link-based clustering methods are SimRank, LinkClus.

8. Fuzzy clustering: Fuzzy clustering is also known as the soft method. Standard clustering approaches produce partitions (K-means, PAM), in which each data point belongs to only one cluster. This is known as hard clustering. In Fuzzy clustering, items can be a member of more than one cluster. Each item has a set of membership coefficients corresponding to the degree of being in a given cluster. The Fuzzy c-means method is the most popular fuzzy clustering algorithm. In fuzzy c-means clustering, we find out the centroid of the data points and then calculate the distance of each data point from the given centroids until the clusters formed become constant.

Search This Blog

Notes for BSc CSIT

Explain the various techniques used for clustering.

Comments

Post a Comment

Popular posts from this blog

What are different steps used in JDBC? Write down a small program showing all steps.

Discuss classification or taxonomy of virtualization at different levels.

Pure Versus Partial EC