Define Clustering and Cluster Analysis and Applications of Cluster Analysis.

Cluster analysis, or simply clustering, is the process of partitioning a set of data objects (or observations) into subsets. Each subset is a cluster, such that objects in a cluster are similar to one another, yet dissimilar to objects in other clusters. The set of clusters resulting from a cluster analysis can be referred to as a clustering. In this context, different clustering methods may generate different clusterings on the same data set. The partitioning is performed not by humans but by the clustering algorithm. Hence, clustering is useful in that it can lead to the discovery of previously unknown groups within the data.
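As an illustration of partitioning by similarity, here is a minimal sketch of k-means, one common clustering method among many. The sample points, the choice of two starting centroids, and the function name kmeans are assumptions made for this example only.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # nothing moved, so the partition is stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)     # cluster index assigned to each point
print(centroids)  # final cluster centers
```

No labels are supplied to the algorithm; the two groups emerge only from the distances between points. A different method, or different starting centroids, may produce a different clustering of the same data.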



Application of Cluster Analysis

Cluster analysis has been widely used in many applications, such as business intelligence, image pattern recognition, Web search, biology, and security.

a) In business intelligence, clustering can be used to organize a large number of customers into groups, where customers within a group share strongly similar characteristics. This facilitates the development of business strategies for enhanced customer relationship management. Moreover, consider a consulting company with a large number of projects. To improve project management, clustering can be applied to partition projects into categories based on similarity so that project auditing and diagnosis (to improve project delivery and outcomes) can be conducted effectively.

b) In image recognition, clustering can be used to discover clusters or "subclasses" in handwritten character recognition systems. Suppose we have a data set of handwritten digits, where each digit is labeled as 1, 2, 3, and so on. Note that there can be a large variance in the way people write the same digit. Take the number 2, for example. Some people write it with a small circle at the bottom left, while others do not. We can use clustering to determine subclasses for "2," each of which represents a variation in the way 2 can be written. Using multiple models based on the subclasses can improve overall recognition accuracy.

c) Clustering has also found many applications in Web search. For example, a keyword search may often return a very large number of hits (i.e., pages relevant to the search) due to the extremely large number of web pages. Clustering can be used to organize the search results into groups and present the results in a concise and easily accessible way. Moreover, clustering techniques have been developed to cluster documents into topics, which are commonly used in information retrieval practice.
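To make the idea of grouping search results concrete, here is a rough sketch that groups short result snippets by word overlap. It uses Jaccard similarity with a simple single-pass "leader" grouping, which is a deliberately simplified stand-in and not how any particular search engine works. The snippets, the 0.2 threshold, and the function names are all hypothetical.

```python
def tokens(text):
    """Lowercased set of words in a snippet."""
    return set(text.lower().split())

def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)

def group_results(snippets, threshold=0.2):
    """Single-pass grouping: join the first group whose leader is similar enough."""
    groups = []  # each group is a list of snippet indexes; index 0 is the leader
    for i, snippet in enumerate(snippets):
        words = tokens(snippet)
        for group in groups:
            leader_words = tokens(snippets[group[0]])
            if jaccard(words, leader_words) >= threshold:
                group.append(i)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([i])
    return groups

# Hypothetical hits for the ambiguous keyword "jaguar".
snippets = [
    "jaguar car price and review",
    "jaguar cat habitat in the rainforest",
    "new jaguar car model review",
    "rainforest habitat of the jaguar cat",
]
groups = group_results(snippets)
for group in groups:
    print([snippets[i] for i in group])
```

With these snippets, the car-related hits end up in one group and the animal-related hits in another, so the results can be presented by topic instead of as one long list.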

d) As a data mining function, cluster analysis can be used as a standalone tool to gain insight into the distribution of data, to observe the characteristics of each cluster, and to focus on a particular set of clusters for further analysis. Alternatively, it may serve as a preprocessing step for other algorithms, such as characterization, attribute subset selection, and classification, which would then operate on the detected clusters and the selected attributes or features.
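Observing the characteristics of each cluster can be as simple as summarizing its members once cluster labels are available. The sketch below assumes the labels have already been produced by some clustering algorithm; the customer records (age, yearly spend), the labels, and the function name summarize are made up for illustration.

```python
from collections import defaultdict

def summarize(points, labels):
    """Return the size and per-attribute mean of every cluster."""
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)
    return {
        label: {
            "size": len(group),
            # zip(*group) iterates over attributes (columns) of the cluster.
            "mean": tuple(sum(values) / len(group) for values in zip(*group)),
        }
        for label, group in members.items()
    }

# Hypothetical customers as (age, yearly spend) with assumed cluster labels.
customers = [(25, 300.0), (27, 340.0), (52, 1200.0), (48, 1100.0), (50, 1000.0)]
labels = [0, 0, 1, 1, 1]
profile = summarize(customers, labels)
for label, stats in sorted(profile.items()):
    print(label, stats)
```

A per-cluster profile like this can then guide which clusters deserve further analysis, or be passed on to a later step such as classification.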

Clustering and Cluster Analysis

  • Clustering refers to a technique of grouping objects so that objects with similar characteristics are placed together and objects with different characteristics are kept apart.
  • In other words, clustering is the process of partitioning a data set into a set of meaningful subclasses, known as clusters. Clustering is similar to classification in that data is grouped. Unlike classification, however, the groups are not defined in advance. Instead, the grouping is achieved by determining similarities between data objects according to characteristics found in the actual data.
  • A cluster is a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.
  • Clustering is “the process of organizing objects into groups whose members are similar in some way”.
  • “Cluster Analysis is a set of methods for constructing a (hopefully) sensible and informative classification of an initially unclassified set of data, using the variable values observed on each individual.”

                                              - B. S. Everitt (1998), “The Cambridge Dictionary of Statistics”


Applications of Cluster Analysis

  • Pattern Recognition
  • Spatial Data Analysis
      - Create thematic maps in GIS by clustering feature spaces
      - Detect spatial clusters or support other spatial mining tasks
  • Image Processing
  • Economic Science (especially market research)
  • WWW
      - Document classification
      - Cluster Weblog data to discover groups of similar access patterns
  • Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
  • Land use: Identification of areas of similar land use in an earth observation database
  • Insurance: Identifying groups of motor insurance policyholders with a high average claim cost
  • City planning: Identifying groups of houses according to their house type, value, and geographical location
  • Earthquake studies: Observed earthquake epicenters should be clustered along continental faults

