Explain the different functionalities of data mining.
Functionalities of Data Mining
Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks.
Given below is a list of data mining functionalities:
a) Class/Concept Description: Data can be associated with classes or concepts that can be described in summarized, concise, and yet precise terms. Such descriptions of a concept or class are called class/concept descriptions. These descriptions can be derived via:
- Data Characterization: Characterization is a summarization of the general characteristics or features of a target class of data which creates what is called a characteristic rule.
- Data Discrimination: Data discrimination is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.
b) Association analysis on frequent patterns: Frequent patterns are patterns that occur frequently in data. Association analysis aims to discover associations between items occurring together frequently.
E.g., buys(X, “computer”) => buys(X, “software”) [support = 1%, confidence = 50%], where X is a variable representing a customer. Support = 1% means that 1% of all the transactions under analysis contain both a computer and software. Confidence = 50% means that if a customer buys a computer, there is a 50% chance that she will buy software as well.
c) Classification and Prediction: Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts. This model is derived based on the analysis of a set of training data and used to predict the class label of objects for which the class label is unknown.
Prediction is used to predict missing or unavailable numeric data values rather than class labels. Regression analysis is a statistical methodology that is most often used for numeric prediction, although other methods exist as well.
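Numeric prediction can be illustrated with the simplest regression model. The following is a minimal Python sketch of simple linear regression fitted by ordinary least squares with a single predictor. The function name `fit_line` and the spend/sales figures are hypothetical and serve only as an illustration. In practice a statistics or machine-learning library would normally be used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x for one numeric predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through the point of means
    return a, b

# Hypothetical training data: advertising spend (x) vs. sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = fit_line(spend, sales)
predicted = a + b * 6.0  # predict a numeric value, not a class label
print(f"y = {a:.2f} + {b:.2f}x; prediction at x = 6: {predicted:.2f}")
```

On this toy data, the fitted line is y = 1 + 2x, so the predicted value at x = 6 is 13.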
d) Cluster Analysis / Clustering: Clustering analyzes data objects without consulting class labels. It can be used to generate class labels for groups of data when no such labels exist at the outset. The objects are clustered or grouped based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity. That is, clusters of objects are formed so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters.
e) Outlier Analysis: Outliers are objects that do not comply with the general behavior or model of the data. Most data mining methods discard outliers as noise or exceptions. However, in some applications these rare events are more interesting than the regular ones. The analysis of such outlier data is referred to as outlier analysis. E.g., fraud detection.
f) Evolution Analysis: Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time. This may include characterization, discrimination, association and correlation analysis, classification, prediction, or clustering of time-related data. Distinct features of such an analysis include time-series data analysis, sequence or periodicity pattern matching, and similarity-based data analysis.
Or, in more detail:
Data mining offers a number of functionalities built on organized, scientific methods. The major data mining functionalities are described as follows:
1. Class/Concept Descriptions: Characterization and Discrimination
Data can be associated with classes or concepts. It is helpful to describe individual classes and concepts in summarized, concise, and yet precise ways. For example, in a hardware shop, computers and printers are classes of items for sale. Such descriptions of a class or concept are referred to as class/concept descriptions.
• Data Characterization: Data characterization is a summary of the general characteristics or features of objects in a target class, which is the class under study. The result is called a characteristic rule. To obtain it, a user can run a database query that collects the data belonging to the user-specified class. Predefined modules then summarize that data at various levels of abstraction. For example, to study the characteristics of software products whose sales increased by 15% two years ago, one can collect the data related to such products by running SQL queries.
• Data Discrimination: Data discrimination compares the general features of the target class (the class under study) with the general features of one or more contrasting classes. The output of this process can be presented in many forms, e.g., bar charts, curves, and pie charts.
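Both bullets above can be sketched in Python. This is a minimal in-memory illustration rather than the SQL-based approach described for characterization. The product records, the `growth` field, and the 15% threshold that defines the target class are all hypothetical. Characterization summarizes the target class on its own. Discrimination computes the same summary for the contrasting class so the two can be compared.

```python
from collections import Counter
from statistics import mean

# Hypothetical product records; "growth" is the fractional increase in sales.
products = [
    {"name": "A", "category": "software", "price": 120, "growth": 0.18},
    {"name": "B", "category": "software", "price": 80, "growth": 0.22},
    {"name": "C", "category": "hardware", "price": 300, "growth": 0.05},
    {"name": "D", "category": "software", "price": 100, "growth": 0.16},
    {"name": "E", "category": "hardware", "price": 250, "growth": 0.15},
    {"name": "F", "category": "hardware", "price": 200, "growth": -0.02},
]

def summarize(records):
    """General features of a class: size, average price, share of each category."""
    counts = Counter(r["category"] for r in records)
    return {
        "count": len(records),
        "avg_price": mean(r["price"] for r in records),
        "category_share": {c: n / len(records) for c, n in counts.items()},
    }

def is_target(record):
    # Assumed target class: products whose sales grew by at least 15%.
    return record["growth"] >= 0.15

# Characterization: summarize the target class alone.
target_summary = summarize([r for r in products if is_target(r)])

# Discrimination: the same summary for the contrasting class, for side-by-side comparison.
contrast_summary = summarize([r for r in products if not is_target(r)])

for label, summary in (("target", target_summary), ("contrast", contrast_summary)):
    print(label, summary)
```

Here, the high-growth class turns out to be mostly software with a lower average price, while the contrasting class is entirely hardware. A chart of these shares would be one of the output forms mentioned above.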
2. Mining Frequent Patterns
Frequent patterns are patterns that occur frequently in data, that is, the things found to be most common in the data. Several kinds of frequent patterns can be observed in a dataset:
• Frequent itemset: A set of items that frequently appear together, e.g., milk and sugar.
• Frequent subsequence: A sequence of events that frequently occurs in order, such as purchasing a phone followed by a back cover.
• Frequent substructure: A structural form, such as a tree or a graph, that occurs frequently and may be combined with itemsets or subsequences.
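The notion of a frequent itemset can be made concrete with a brute-force count. The following is a minimal Python sketch that enumerates every itemset of up to `max_size` items in each basket and keeps those whose support (the fraction of baskets containing them) reaches a minimum threshold. The baskets, the function name, and the 60% threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"milk", "sugar", "bread"},
    {"milk", "sugar"},
    {"milk", "bread"},
    {"sugar", "tea"},
    {"milk", "sugar", "tea"},
]

def frequent_itemsets(baskets, min_support, max_size=2):
    """Brute force: count all itemsets up to max_size and keep those meeting min_support."""
    counts = Counter()
    for basket in baskets:
        for size in range(1, max_size + 1):
            # Sorting gives each itemset one canonical tuple form.
            for itemset in combinations(sorted(basket), size):
                counts[itemset] += 1
    n = len(baskets)
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}

result = frequent_itemsets(baskets, min_support=0.6)
for itemset, sup in sorted(result.items()):
    print(itemset, f"{sup:.0%}")
```

With these baskets, milk and sugar each appear in 80% of the baskets and together in 60%, so {milk, sugar} is the only frequent pair. Algorithms such as Apriori and FP-growth find the same itemsets without enumerating every combination. This brute-force version only shows what "frequent" means.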
3. Association Analysis
Association analysis uncovers relationships among data items and derives association rules from them. It is a way of discovering how various items are related to one another. For example, it can be used to find items that are frequently purchased together.
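An association rule is usually scored by its support and confidence. The following is a minimal Python sketch of both measures for a rule of the form buys(X, "computer") => buys(X, "software"). The transactions and helper names are hypothetical. Support is the fraction of all transactions containing every item in the rule. Confidence is support(A ∪ C) / support(A), an estimate of how often the consequent C appears given the antecedent A.

```python
# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"computer", "software", "mouse"},
    {"computer", "printer"},
    {"software"},
    {"computer", "software"},
    {"printer", "paper"},
    {"computer"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent | consequent) / support(antecedent)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(f'buys(X, "computer") => buys(X, "software") '
      f"[support={rule_support:.0%}, confidence={rule_confidence:.0%}]")
```

On this toy data, 2 of the 6 baskets contain both items (support of about 33%), and 2 of the 4 computer baskets also contain software (confidence 50%). A rule is typically reported only if both values meet user-specified minimum thresholds.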
4. Correlation Analysis
Correlation is a statistical technique that can show whether, and how strongly, pairs of attributes are related to each other. For example, taller people tend to weigh more.
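One widely used measure is the Pearson correlation coefficient r. It ranges from -1 to +1. Values near +1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little linear relationship. The following is a minimal Python sketch for the height/weight example. The measurements are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical measurements for five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 57, 68, 72, 85]

r = pearson(heights_cm, weights_kg)
print(f"r = {r:.3f}")
```

Here r comes out at roughly 0.99, which matches the "taller people tend to weigh more" pattern. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.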
5. Classification and Prediction
Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts. It is mainly used to predict the class of objects whose class label is unknown. The model is derived from the analysis of a set of training data, that is, data objects whose class labels are known. The derived model may be represented in various forms, such as classification (IF-THEN) rules, decision trees, mathematical formulae, or neural networks.
For example, a decision tree performs classification in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while the associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.
The following decision tree can be designed to declare a result, whether an applicant is eligible or not eligible to get the driving license.
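A decision tree for driving-license eligibility can be sketched in Python with an ID3-style learner. ID3 is one common way to build such trees: at each node, it splits on the attribute with the highest information gain (the largest drop in entropy). The attribute names (`age_18_plus`, `passed_written`, `passed_road`) and training rows are hypothetical, chosen only to illustrate how a tree is derived from labeled training data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attributes):
    """ID3-style learner: returns a label (leaf) or (attribute, {value: subtree})."""
    if len(set(labels)) == 1:   # pure node: all rows share one class
        return labels[0]
    if not attributes:          # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches)

def classify(tree, row):
    """Follow decision nodes down to a leaf (a value unseen in training raises KeyError)."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree

# Hypothetical training data: (age_18_plus, passed_written, passed_road, label).
ATTRS = ["age_18_plus", "passed_written", "passed_road"]
samples = [
    ("yes", "pass", "pass", "eligible"),
    ("yes", "pass", "fail", "not eligible"),
    ("yes", "fail", "pass", "not eligible"),
    ("no", "pass", "pass", "not eligible"),
    ("yes", "fail", "fail", "not eligible"),
    ("no", "fail", "fail", "not eligible"),
    ("yes", "pass", "pass", "eligible"),
    ("no", "pass", "fail", "not eligible"),
]
rows = [dict(zip(ATTRS, s[:3])) for s in samples]
labels = [s[3] for s in samples]

tree = build_tree(rows, labels, ATTRS)
applicant = {"age_18_plus": "yes", "passed_written": "pass", "passed_road": "pass"}
print(tree)
print(classify(tree, applicant))
```

On this data, the road test gives the largest information gain, so it becomes the root decision node. An applicant who is 18 or older and has passed both tests reaches the "eligible" leaf.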
6. Cluster Analysis
Unlike classification and prediction, which analyze class-labeled data objects, clustering analyzes data objects without consulting a known class label. Clustering can be used to generate such labels. The objects are clustered or grouped based on the principle of maximizing the intra-class similarity and minimizing the interclass similarity. That is, clusters of objects are formed so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters.
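The text does not name a specific clustering algorithm. k-means is one common partitioning method that follows the principle above: it assigns each object to its nearest cluster center and moves each center to the mean of its members, which reduces within-cluster distances. The following is a minimal Python sketch. The points and the naive initialization (the first k points become the starting centroids) are assumptions made for illustration. Real implementations usually use random or k-means++ initialization.

```python
from math import dist

def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm); naive init uses the first k points as centroids."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster (kept if empty).
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled 2-D points (no class labels are consulted).
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (0.5, 1.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The resulting clusters can serve as generated class labels, such as "cluster 0" and "cluster 1", for data that had none.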
7. Outlier Analysis
A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers. Most data mining methods discard outliers as noise or exceptions. However, in some applications such as fraud detection, the rare events can be more interesting than the more regularly occurring ones. Outliers may be detected using statistical tests that assume a distribution or probability model for the data, or using distance measures where objects that are a substantial distance from any cluster are considered outliers. For example, outlier analysis may uncover fraudulent usage of credit cards by detecting purchases of extremely large amounts for a given account number in comparison to the regular charges incurred by the same account.
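The credit-card example can be sketched with a simple statistical test. The following minimal Python sketch computes a z-score for each new charge against the same account's past charges and flags charges that lie more than `threshold` standard deviations from the historical mean. This assumes past charges are roughly normally distributed. The charge amounts are hypothetical, and the threshold of 3 is a common rule of thumb rather than a fixed standard.

```python
from statistics import mean, stdev

def flag_outliers(history, new_charges, threshold=3.0):
    """Flag charges whose z-score against the account's past charges exceeds threshold."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs >= 2 non-identical charges
    return [c for c in new_charges if abs(c - mu) / sigma > threshold]

# Hypothetical charge history for one account number.
history = [42.0, 55.5, 38.0, 61.0, 47.5, 52.0, 40.0, 58.0]
new_charges = [49.0, 65.0, 2500.0]

suspicious = flag_outliers(history, new_charges)
print(suspicious)  # [2500.0]
```

The 65.0 charge is higher than usual but stays within about two standard deviations, so only the 2500.0 purchase is flagged for review.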
8. Evolution Analysis
Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time. Distinct features of such an analysis include time-series data analysis, sequence or periodicity pattern matching, and similarity-based data analysis. For example, suppose you have the major stock market (time-series) data of the last several years available from the Nepal Stock Exchange, and you would like to invest in shares of high-tech industrial companies. A data mining study of stock exchange data may identify stock evolution regularities for overall stocks and for the stocks of particular companies. Such regularities may help predict future trends in stock market prices, contributing to your decision-making regarding stock investments.
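A very basic form of trend analysis on time-series data is smoothing with a moving average and comparing the smoothed values over time. The following minimal Python sketch does this. The prices are hypothetical and not real exchange data, and comparing only the first and last smoothed values is a deliberately crude trend rule chosen for illustration. Real stock-trend studies use far richer models.

```python
def moving_average(series, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(series))]

def trend_direction(series, window):
    """Label the overall trend by comparing the first and last smoothed values."""
    smoothed = moving_average(series, window)
    if smoothed[-1] > smoothed[0]:
        return "upward"
    if smoothed[-1] < smoothed[0]:
        return "downward"
    return "flat"

# Hypothetical daily closing prices for one stock.
prices = [100, 102, 101, 105, 107, 106, 110, 112]

sma = moving_average(prices, window=3)
print([round(v, 2) for v in sma])
print(trend_direction(prices, window=3))
```

Smoothing removes day-to-day noise, such as the dips at the third and sixth prices, so that the overall upward regularity in the series is easier to see.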