Explain Classification by Decision Tree Induction with its example.

CLASSIFICATION BY DECISION TREE INDUCTION

Decision Tree and Decision tree induction: Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label, as shown in the figure below.

Figure 6.5: Decision tree to classify whether a person is infected or not infected.

To use a decision tree for classification, the attribute values of a new tuple are tested against the tree, and a path is traced from the root to a leaf node that holds the class label for that tuple.

The decision tree classifies a new data tuple by checking attribute values from the root down to a leaf node. It first tests the root attribute, Breathing issue; its value in the new tuple is YES, so the tree follows the left branch from the root and reaches the next-level attribute node, Fever. The Fever value is also YES, so the tree again follows the left branch and reaches the leaf node labeled infected. Hence, the class label infected = YES is assigned to this new data tuple.
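To make this traversal concrete, below is a minimal Python sketch of the tree from Figure 6.5 and the root-to-leaf lookup. The branches for the NO outcomes are assumptions, since only the YES/YES path is described above.

    # A hypothetical reconstruction of the tree in Figure 6.5; the "NO"
    # branches are assumed, since only the YES/YES path is described above.
    tree = {
        "attribute": "Breathing issue",
        "branches": {
            "YES": {
                "attribute": "Fever",
                "branches": {
                    "YES": {"label": "infected"},      # leaf node
                    "NO":  {"label": "not infected"},  # assumed leaf
                },
            },
            "NO": {"label": "not infected"},           # assumed leaf
        },
    }

    def classify(node, new_tuple):
        """Trace a path from the root to a leaf by testing one attribute per level."""
        while "label" not in node:                # internal node: test an attribute
            value = new_tuple[node["attribute"]]
            node = node["branches"][value]        # follow the branch for that outcome
        return node["label"]                      # the leaf holds the class label

    print(classify(tree, {"Breathing issue": "YES", "Fever": "YES"}))  # -> infected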


                              OR,

Classification by Decision Tree Induction

Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure in which internal nodes (non-leaf nodes) denote tests on attributes, branches represent outcomes of the tests, leaf nodes (terminal nodes) hold class labels, and the root node is the topmost node.



Why are decision tree classifiers so popular?

  • The construction of a decision tree does not require any domain knowledge or parameter setting
  • They can handle high-dimensional data
  • Their intuitive representation is easily understood by humans
  • Learning and classification are simple and fast
  • They have good accuracy


Algorithm for Constructing Decision Trees

Constructing a decision tree uses a greedy algorithm: the tree is built in a top-down, recursive, divide-and-conquer manner. A simplified code sketch follows the numbered steps below.

1. At the start, all the training tuples are at the root

2. Tuples are partitioned recursively based on selected attributes

3. If all samples for a given node belong to the same class

- Label the class

4. If there are no remaining attributes for further partitioning

- Majority voting is employed for classifying the leaf

5. If there are no samples left

- Label the leaf with the majority class of the parent node and terminate

6. Else

- Go to step 2
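A simplified sketch of these steps in Python is given below. The numbered steps do not fix an attribute-selection measure, so information gain (entropy) is used here purely as an illustrative assumption.

    from collections import Counter
    from math import log2

    def entropy(labels):
        total = len(labels)
        return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

    def select_attribute(tuples, labels, attributes):
        """Greedy choice: the attribute whose partition yields the highest information gain."""
        base = entropy(labels)
        def gain(attr):
            parts = {}
            for t, c in zip(tuples, labels):
                parts.setdefault(t[attr], []).append(c)
            remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
            return base - remainder
        return max(attributes, key=gain)

    def build_tree(tuples, labels, attributes):
        # Step 3: all tuples at this node share one class -> label the leaf
        if len(set(labels)) == 1:
            return {"label": labels[0]}
        # Step 4: no remaining attributes -> majority voting labels the leaf
        if not attributes:
            return {"label": Counter(labels).most_common(1)[0][0]}
        # Step 2: partition on the selected attribute and recurse (step 6);
        # empty partitions never appear as branches here, covering step 5
        attr = select_attribute(tuples, labels, attributes)
        node = {"attribute": attr, "branches": {}}
        parts = {}
        for t, c in zip(tuples, labels):
            parts.setdefault(t[attr], ([], []))
            parts[t[attr]][0].append(t)
            parts[t[attr]][1].append(c)
        remaining = [a for a in attributes if a != attr]
        for value, (sub_tuples, sub_labels) in parts.items():
            node["branches"][value] = build_tree(sub_tuples, sub_labels, remaining)
        return node

Calling build_tree on all training tuples with the full attribute list reproduces the top-down, divide-and-conquer construction: each recursive call works on one partition and never revisits an earlier split, which is the greedy property.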


Overview of Decision Tree Induction

a) Basic algorithm (a greedy algorithm)

– Tree is constructed in a top-down recursive divide-and-conquer manner

– At start, all the training examples are at the root

– Examples are partitioned recursively to maximize purity


b) Conditions for stopping partitioning

– All samples belong to the same class

– The number of samples in a node falls below a specified threshold (see the sketch after this list)

– Tradeoff between complexity and generalizability
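As an illustration only (the text does not name any library), scikit-learn's DecisionTreeClassifier exposes these stopping thresholds directly as parameters:

    from sklearn.tree import DecisionTreeClassifier

    # min_samples_leaf stops partitioning when a leaf would fall below the
    # threshold; max_depth caps overall tree complexity. Larger values favour
    # generalizability, smaller values allow more complexity; the specific
    # numbers here are arbitrary.
    clf = DecisionTreeClassifier(min_samples_leaf=5, max_depth=4)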


c) Predictions for new data:

– Classification by majority voting: all new tuples that reach a leaf are assigned that leaf's majority class

– Class probabilities are estimated from the training data that ended up in that leaf

– These class-probability estimates can be reported alongside the predicted label (see the sketch below)
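A short sketch of both prediction modes, again using scikit-learn only for illustration and a tiny hypothetical encoded dataset:

    from sklearn.tree import DecisionTreeClassifier

    X = [[1, 1], [1, 0], [0, 1], [0, 0]]   # hypothetical encoded training tuples
    y = ["infected", "infected", "not infected", "not infected"]

    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([[1, 1]]))        # majority class of the leaf the tuple reaches
    print(clf.predict_proba([[1, 1]]))  # class-probability estimates from that leaf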





