Explain Concept/Class Description with an example. Also explain Concept Description vs. OLAP.

 Concept/Class Description

  • Concept description is the most basic form of descriptive data mining. It describes a given set of task-relevant data in a concise and summative manner, presenting interesting general properties of the data.
  • Data can be associated with classes or concepts. For example, in the AllElectronics store, classes of items for sale include computers and printers, and concepts of customers include big spenders and budget spenders. It can be useful to describe individual classes and concepts in summarized, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions.
  • These descriptions can be derived via

(1) Data characterization, by summarizing the data of the class under study (often called the target class) in general terms, or

(2) Data discrimination, by comparison of the target class with one or a set of comparative classes (often called the contrasting classes), or

(3) both data characterization and discrimination.

  • The output of data characterization can be presented in various forms. Examples include pie charts, bar charts, curves, multidimensional data cubes, and multidimensional tables, including crosstabs. The resulting descriptions can also be presented as generalized relations or in rule form (called characteristic rules).
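To make the "generalized relation" and "rule form" outputs more concrete, here is a minimal Python sketch. It takes a few hypothetical target-class tuples, generalizes the raw age into 10-year ranges, counts each resulting generalized tuple, and prints each one with its share of the target class in a rough, rule-like listing. The tuples, attribute names, and the 10-year binning are illustrative assumptions, not data or a method taken from the text above.

```python
from collections import Counter

# Hypothetical target-class tuples: (age, occupation, credit_rating)
customers = [
    (42, "engineer", "excellent"),
    (47, "engineer", "excellent"),
    (45, "manager", "excellent"),
    (33, "teacher", "fair"),
    (49, "manager", "excellent"),
]


def age_range(age: int) -> str:
    """Generalize a raw age to a 10-year range (assumed binning)."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def generalize(rows):
    """Build a generalized relation: generalized tuple -> count."""
    return Counter((age_range(age), occ, credit) for age, occ, credit in rows)


def characteristic_lines(relation):
    """List each generalized tuple with its count and share of the target class."""
    total = sum(relation.values())
    lines = []
    for (ages, occ, credit), count in relation.most_common():
        share = 100 * count / total
        lines.append(
            f"age={ages}, occupation={occ}, credit={credit}: {count} ({share:.0f}%)"
        )
    return lines


relation = generalize(customers)
for line in characteristic_lines(relation):
    print(line)
```

Generalizing age before counting is what turns five distinct tuples into three summary rows; the same counts could equally be laid out as a crosstab (e.g., age range against occupation).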

(1) Data characterization:
  • Data characterization is a summarization of the general characteristics or features of a target class of data. The data corresponding to the user-specified class is typically collected by a query. For example, to study the characteristics of software products with sales that increased by 10% in the previous year, the data related to such products can be collected by executing an SQL query on the sales database.
  • A data mining system should be able to produce a description summarizing the characteristics of the target class, for example, a particular group of customers.
  • Example:
Data characterization. A customer relationship manager at AllElectronics may order the following data mining task: Summarize the characteristics of customers who spend more than $5000 a year at AllElectronics. The result is a general profile of these customers, such as that they are 40 to 50 years old, employed, and have excellent credit ratings. The data mining system should allow the customer relationship manager to drill down on any dimension, such as on occupation to view these customers according to their type of employment.
OR,
Summarize the characteristics of customers who spend more than $1000 a year at XYZ store. The result can be a general profile in terms of age, employment status, and credit rating.
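The characterization steps above (collect the target class with a query, summarize it, then drill down on a dimension) can be sketched in Python as follows. An in-memory SQLite database stands in for the sales database; the `customer` table, its columns, and all rows are hypothetical, and the profile fields are only one possible way to summarize the class.

```python
import sqlite3
from collections import Counter

# Hypothetical schema and rows; the real AllElectronics database is not described.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (name TEXT, age INTEGER, occupation TEXT,"
    " employed INTEGER, credit_rating TEXT, annual_spend REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Ann", 44, "engineer", 1, "excellent", 7200.0),
        ("Bob", 48, "manager", 1, "excellent", 5600.0),
        ("Cai", 41, "engineer", 1, "excellent", 9100.0),
        ("Dee", 23, "student", 0, "fair", 800.0),
        ("Eli", 35, "teacher", 1, "good", 1500.0),
    ],
)

# Step 1: collect the target class (customers spending more than $5000 a year).
target = conn.execute(
    "SELECT age, occupation, employed, credit_rating FROM customer"
    " WHERE annual_spend > ?",
    (5000,),
).fetchall()

# Step 2: summarize the target class into a general profile.
ages = [row[0] for row in target]
profile = {
    "count": len(target),
    "age_range": (min(ages), max(ages)),
    "employed_share": sum(row[2] for row in target) / len(target),
    "top_credit_rating": Counter(row[3] for row in target).most_common(1)[0][0],
}
print(profile)

# Step 3: drill down on occupation to view the class by type of employment.
by_occupation = Counter(row[1] for row in target)
print(dict(by_occupation))
```

The drill-down here is done in Python for brevity; an equivalent `GROUP BY occupation` query would push that step into the database.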


(2) Data discrimination:
  • Data discrimination is a comparison of the general features of the target class data objects against the general features of objects from one or multiple contrasting classes. The target and contrasting classes can be specified by a user, and the corresponding data objects can be retrieved through database queries. For example, a user may want to compare the general features of software products with sales that increased by 10% last year against those with sales that decreased by at least 30% during the same period. The methods used for data discrimination are similar to those used for data characterization. 
  • "How are discrimination descriptions output?" The forms of the output presentation are similar to those for characteristic descriptions, although discrimination descriptions should include comparative measures that help to distinguish between the target and contrasting classes. Discrimination descriptions expressed in the form of rules are referred to as discriminant rules.
  • It is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes. Users can specify the target and contrasting classes.

Example:
The user may want to compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by at least 30% during the same period.
OR,

Example: Data discrimination. A customer relationship manager at AllElectronics may want to compare two groups of customers: those who shop for computer products regularly (e.g., more than twice a month) and those who rarely shop for such products (e.g., less than three times a year). The resulting description provides a general comparative profile of these customers, such as that 80% of the customers who frequently purchase computer products are between 20 and 40 years old and have a university education, whereas 60% of the customers who infrequently buy such products are either seniors or youths and have no university degree. Drilling down on a dimension like occupation, or adding a new dimension like income_level, may help to find even more discriminative features between the two classes.
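A minimal Python sketch of this discrimination task: split customers into the target class (frequent buyers) and the contrasting class (rare buyers), then report, for each candidate feature, the percentage of each class that has it, which is the kind of comparative measure a discrimination description should include. The records are hypothetical and were chosen only so the percentages echo the example; the readings "more than twice a month" = more than 24 purchases a year, and "senior" = 60+, "youth" = under 20, are assumptions.

```python
from typing import Callable, List, NamedTuple


class Customer(NamedTuple):
    age: int
    has_degree: bool
    purchases_per_year: int  # computer-product purchases (hypothetical attribute)


# Hypothetical records, chosen only to illustrate the comparison.
customers: List[Customer] = [
    Customer(25, True, 30), Customer(31, True, 26), Customer(38, True, 40),
    Customer(45, False, 28), Customer(29, True, 36),
    Customer(67, False, 1), Customer(17, False, 2), Customer(52, True, 0),
    Customer(71, False, 2), Customer(44, False, 1),
]

# Target class: "more than twice a month", read here as more than 24 purchases a year.
target = [c for c in customers if c.purchases_per_year > 24]
# Contrasting class: "less than three times a year".
contrast = [c for c in customers if c.purchases_per_year < 3]


def young_with_degree(c: Customer) -> bool:
    return 20 <= c.age <= 40 and c.has_degree


def senior_or_youth_without_degree(c: Customer) -> bool:
    # Age cut-offs for "senior" (60+) and "youth" (under 20) are assumptions.
    return (c.age >= 60 or c.age < 20) and not c.has_degree


def share(group: List[Customer], feature: Callable[[Customer], bool]) -> float:
    """Percentage of a class that has the feature (the comparative measure)."""
    return 100 * sum(1 for c in group if feature(c)) / len(group)


for feature in (young_with_degree, senior_or_youth_without_degree):
    print(
        f"{feature.__name__}: target {share(target, feature):.0f}% "
        f"vs contrasting {share(contrast, feature):.0f}%"
    )
```

Reporting each feature's share in both classes side by side is what distinguishes this output from a plain characterization, which would summarize the target class alone.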

Concept Description vs. OLAP

Concept Description
– can handle complex data types of the attributes and their aggregations
– a more automated process

OLAP
– restricted to a small number of dimension and measure types
– user-controlled process
