What are the categorization of Clustering Algorithms? And Objectives of Cluster Analysis.
Categorization of Clustering Algorithms
Many clustering algorithms exist in the literature. In general, the major clustering methods can be classified into the following categories.
- Partitioning methods: Given a database of n objects or data tuples, a partitioning method constructs k partitions of the data, where each partition represents a cluster and k <n. Given k, the number of partitions to construct, a partitioning method creates an initial partitioning. It then uses an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another.
- Hierarchical methods: A hierarchical method creates a hierarchical decomposition of the given set of data objects. A hierarchical method can be classified as being either agglomerative or divisive. The agglomerative approach, also called the bottom-up approach, starts with each object forming a separate group. It successively merges the objects or groups that are close to one another, until all of the groups are merged into one (the topmost level of the hierarchy), or until a termination condition holds. The divisive approach, also called the top-down approach, starts with all of the objects in the same cluster. In each successive iteration, a cluster is split up into smaller clusters, until eventually, each object is in one cluster, or until a termination condition holds.
- Density-based methods: Most partitioning methods cluster objects based on the distance between objects. Such methods can find only spherical-shaped clusters and encounter difficulty in discovering clusters of arbitrary shapes. Other clustering methods have been developed based on the notion of density. Their general idea is to continue growing the given cluster as long as the density (number of objects or data points) in the “neighborhood” exceeds some threshold.
- Model-based methods: Model-based methods hypothesize a model for each of the clusters and find the best fit of the data to the given model. EM is an algorithm that performs expectation-maximization analysis based on statistical modeling.
Types of Clusterings
a) Partitioning Clustering
– A division data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset
– Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of square errors
– Typical methods: k-means, k-medoids, CLARA (Clustering LARge Applications)
b) Hierarchical clustering
– A set of nested clusters organized as a hierarchical tree
– Create a hierarchical decomposition of the set of data (or objects) using some criterion
– Typical methods: DiAna (Divisive Analysis), AgNes (Agglomerative Nesting), BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), ROCK (RObust Clustering using linKs), CAMELEON
c) Density-based Clustering
– Based on connectivity and density functions
– Typical methods: DBSACN (Density Based Spatial Clustering of Applications with Noise), OPTICS (Ordering Points To Identify the Clustering Structure), DenClue (DENsity-based CLUstEring )
d) Grid-based Clustering
- based on a multiple-level granularity structure
- Typical methods: STING (STatistical INformation Grid ), WaveCluster, CLIQUE (Clustering In QUEst)
e) Model-based Clustering
- A model is hypothesized for each of the clusters and tries to find the best fit of that model to each other
- Typical methods: EM (Expectation-Maximization), SOM (Self-Organizing Map), COBWEB
f) Frequent pattern-based Clustering
- Based on the analysis of frequent patterns
- Typical methods: pCluster
g) User-guided or constraint-based Clustering
- Clustering by considering user-specified or application-specific constraints
- Typical methods: COD, constrained clustering
Comments
Post a Comment