Explain Concept/Class Description with example. Also Concept Description vs. OLAP.
Concept/Class Description
- Concept description is the most basic form of descriptive data mining. It describes a given set of task-relevant data in a concise and summative manner, presenting interesting general properties of the data.
- Data can be associated with classes or concepts. For example, in the AllElectronics store, classes of items for sale include computers and printers, and concepts of customers include big spenders and budget spenders. It can be useful to describe individual classes and concepts in summarized, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions.
- These descriptions can be derived via
(1) Data characterization, by summarizing the data of the class under study (often called the target class) in general terms, or
(2) Data discrimination, by comparison of the target class with one or a set of comparative classes (often called the contrasting classes), or
(3) both data characterization and discrimination.
- The output of data characterization can be presented in various forms. Examples include pie charts, bar charts, curves, multidimensional data cubes, and multidimensional tables, including crosstabs. The resulting descriptions can also be presented as generalized relations or in rule form (called characteristic rules).
(1) Data characterization:
- Data characterization is a summarization of the general characteristics or features of a target class of data. The data corresponding to the user-specified class is typically collected by a query. For example, to study the characteristics of software products with sales that increased by 10% in the previous year, the data related to such products can be collected by executing an SQL query on the sales database.
- A data mining system should be able to produce a description summarizing the characteristics of a user-specified class, such as a particular group of customers.
- Example:
Data characterization. A customer relationship manager at AllElectronics may order the following data mining task: Summarize the characteristics of customers who spend more than $5000 a year at AllElectronics. The result is a general profile of these customers, such as that they are 40 to 50 years old, employed, and have excellent credit ratings. The data mining system should allow the customer relationship manager to drill down on any dimension, such as on occupation to view these customers according to their type of employment.
OR,
Summarize the characteristics of customers who spend more than $1000 a year at the XYZ store. The result can be a general profile of these customers in terms of attributes such as age, employment status, and credit rating.
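As a rough illustration of the characterization task above, the following Python sketch filters a target class (customers who spend more than $5000 a year), summarizes it, and drills down on occupation. All records, the field names (`annual_spend`, `credit`, `occupation`), and the helper functions (`age_band`, `characterize`) are made up for this example. A real system would collect the target class with a query on the sales database and would use richer generalization than a simple most-common-value summary.

```python
from collections import Counter

# Hypothetical customer records; all values are invented for illustration.
customers = [
    {"age": 45, "employed": True, "credit": "excellent", "occupation": "engineer", "annual_spend": 7200},
    {"age": 48, "employed": True, "credit": "excellent", "occupation": "manager", "annual_spend": 6100},
    {"age": 42, "employed": True, "credit": "excellent", "occupation": "engineer", "annual_spend": 5500},
    {"age": 33, "employed": True, "credit": "good", "occupation": "teacher", "annual_spend": 5200},
    {"age": 27, "employed": False, "credit": "fair", "occupation": "student", "annual_spend": 900},
    {"age": 61, "employed": True, "credit": "good", "occupation": "manager", "annual_spend": 3000},
]


def age_band(age):
    """Generalize an exact age to a 10-year band such as '40-49'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def characterize(records, attributes):
    """For each attribute, return its most common value and that value's share of the class."""
    profile = {}
    for attr in attributes:
        value, count = Counter(r[attr] for r in records).most_common(1)[0]
        profile[attr] = (value, count / len(records))
    return profile


# Step 1: collect the target class (this plays the role of the database query).
target = [dict(c, age_band=age_band(c["age"])) for c in customers if c["annual_spend"] > 5000]

# Step 2: summarize the target class in general terms.
profile = characterize(target, ["age_band", "employed", "credit"])

# Step 3: drill down on the occupation dimension.
by_occupation = Counter(c["occupation"] for c in target)

print(profile)
print(by_occupation)
```

With this toy data, the profile says that most high spenders fall in the 40-49 age band, are employed, and have excellent credit, which mirrors the kind of general profile described in the example.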
(2) Data discrimination:
- Data discrimination is a comparison of the general features of the target class data objects against the general features of objects from one or multiple contrasting classes. The target and contrasting classes can be specified by a user, and the corresponding data objects can be retrieved through database queries. For example, a user may want to compare the general features of software products with sales that increased by 10% last year against those with sales that decreased by at least 30% during the same period. The methods used for data discrimination are similar to those used for data characterization.
- "How are discrimination descriptions output?" The forms of the output presentation are similar to those for characteristic descriptions, although discrimination descriptions should include comparative measures that help to distinguish between the target and contrasting classes. Discrimination descriptions expressed in the form of rules are referred to as discriminant rules.
- In short, it compares the general features of target class data objects with the general features of objects from one or a set of contrasting classes. Users can specify the target and contrasting classes.
Example:
The user may want to compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by at least 30% during the same period.
OR,
Example: Data discrimination. A customer relationship manager at AllElectronics may want to compare two groups of customers: those who shop for computer products regularly (e.g., more than twice a month) and those who rarely shop for such products (e.g., fewer than three times a year). The resulting description provides a general comparative profile of these customers, such as that 80% of the customers who frequently purchase computer products are between 20 and 40 years old and have a university education, whereas 60% of the customers who infrequently buy such products are either seniors or youths and have no university degree. Drilling down on a dimension like occupation, or adding a new dimension like income_level, may help to find even more discriminative features between the two classes.
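The comparison in this example can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the shopper records, the field names (`purchases_per_year`, `age_group`, `university`), and the helpers `share` and `discriminate` are hypothetical, and the thresholds simply translate "more than twice a month" and "fewer than three times a year" into yearly counts. The comparative measure used here is the share of each class holding a given attribute value.

```python
from collections import Counter

# Hypothetical shopper records; all values are invented for illustration.
shoppers = [
    {"purchases_per_year": 30, "age_group": "20-40", "university": True},
    {"purchases_per_year": 36, "age_group": "20-40", "university": True},
    {"purchases_per_year": 48, "age_group": "20-40", "university": True},
    {"purchases_per_year": 26, "age_group": "20-40", "university": True},
    {"purchases_per_year": 25, "age_group": "41-60", "university": False},
    {"purchases_per_year": 10, "age_group": "20-40", "university": True},  # in neither class
    {"purchases_per_year": 1, "age_group": "senior", "university": False},
    {"purchases_per_year": 2, "age_group": "youth", "university": False},
    {"purchases_per_year": 0, "age_group": "senior", "university": False},
    {"purchases_per_year": 2, "age_group": "20-40", "university": True},
    {"purchases_per_year": 1, "age_group": "20-40", "university": False},
]


def share(records, attr):
    """Fraction of the class that holds each value of attr."""
    counts = Counter(r[attr] for r in records)
    return {value: n / len(records) for value, n in counts.items()}


def discriminate(target, contrast, attr):
    """For each value of attr, return (share in target class, share in contrasting class)."""
    t, c = share(target, attr), share(contrast, attr)
    return {v: (t.get(v, 0.0), c.get(v, 0.0)) for v in sorted(set(t) | set(c))}


# Target class: regular shoppers (more than twice a month, i.e. more than 24 a year).
frequent = [s for s in shoppers if s["purchases_per_year"] > 24]
# Contrasting class: rare shoppers (fewer than three times a year).
infrequent = [s for s in shoppers if s["purchases_per_year"] < 3]

by_age = discriminate(frequent, infrequent, "age_group")
by_education = discriminate(frequent, infrequent, "university")

print(by_age)
print(by_education)
```

In this toy data, 80% of the frequent shoppers are in the 20-40 age group against 40% of the infrequent ones, and 60% of the infrequent shoppers are seniors or youths. Side-by-side shares like these are the kind of comparative measure a discrimination description needs.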
Concept Description vs. OLAP
Concept Description
– can handle complex data types of the attributes and their aggregations
– a more automated process
OLAP
– restricted to a small number of dimension and measure types
– user-controlled process