Explain Decision Tree Learning Algorithm - ID3 (Iterative Dichotomiser).

Decision Tree Learning Algorithm - ID3

ID3 (Iterative Dichotomiser) is a simple decision tree learning algorithm developed by Ross Quinlan (1983). ID3 follows a non-backtracking approach in which decision trees are constructed in a top-down, recursive, "divide and conquer" manner, testing an attribute at each tree node. The approach starts with a training set of tuples and their associated class labels, and the training set is recursively partitioned into smaller subsets as the tree is being built.


ID3 Algorithm:

1. create a node N;
2. if the tuples in D are all of the same class C then
3.    return N as a leaf node labeled with the class C;
4. if attribute_list is empty then
5.    return N as a leaf node labeled with the majority class in D; // majority voting
6. apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion;
   label node N with splitting_criterion;
7. if splitting_attribute is discrete-valued and multiway splits are allowed then // not restricted to binary trees
      attribute_list ← attribute_list − splitting_attribute; // remove the splitting attribute
8. for each outcome j of splitting_criterion
9.    let Dj be the set of data tuples in D satisfying outcome j; // a partition
10.   if Dj is empty then
         attach a leaf labeled with the majority class in D to node N;
11.   else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
   endfor
   return N;

Explanation of Algorithm:

The above algorithm has three parameters: D, attribute_list, and Attribute_selection_method. D is a data partition; initially, it is the complete set of training tuples and their associated class labels. attribute_list is the list of attributes describing the tuples. Attribute_selection_method specifies a heuristic procedure (such as information gain) for selecting the attribute that best discriminates the tuples according to class.

The tree starts as a single node N, which represents the training tuples in D. If the tuples in D are all of the same class, then node N becomes a leaf and is labeled with that class; this occurs in steps 2 and 3. Steps 4 and 5 are also terminating conditions: if no attributes remain, N becomes a leaf labeled with the majority class in D. If none of the terminating conditions holds, the algorithm calls Attribute_selection_method to determine the splitting criterion, which indicates the best way to partition the tuples in D into individual classes (step 6). Step 7 checks whether the splitting attribute is discrete-valued with multiway splits allowed and, if so, removes it from attribute_list so it is not tested again further down that branch. In steps 8 to 11, the tuples in D are partitioned according to each outcome of the splitting criterion, and the algorithm is applied recursively to each non-empty partition.
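To make step 6 concrete, the following is a minimal Python sketch of what an Attribute_selection_method based on information gain might compute. The function names (entropy, information_gain) and the representation of tuples as dictionaries with a "class" key are assumptions made for illustration only, not part of the algorithm as stated above.

import math
from collections import Counter

def entropy(tuples, class_key="class"):
    # Entropy of the class distribution within a partition D.
    counts = Counter(t[class_key] for t in tuples)
    total = len(tuples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(tuples, attribute, class_key="class"):
    # Expected reduction in entropy obtained by splitting D on the given attribute.
    total = len(tuples)
    partitions = {}
    for t in tuples:
        # One partition Dj per outcome j (i.e., per distinct attribute value).
        partitions.setdefault(t[attribute], []).append(t)
    remainder = sum(len(p) / total * entropy(p, class_key) for p in partitions.values())
    return entropy(tuples, class_key) - remainder

Under these assumptions, the attribute with the highest information_gain value over D would be chosen as the splitting criterion in step 6.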


Advantages of using ID3

  • Understandable prediction rules are created from the training data.
  • Builds the tree quickly.
  • Builds a short tree.
  • Only needs to test enough attributes until all data is classified.
  • Finding leaf nodes enables test data to be pruned, reducing the number of tests.
  • The whole dataset is searched to create the tree.


Disadvantages of using ID3

  • Data may be over-fitted or over-classified if a small sample is tested.
  • Only one attribute at a time is tested for making a decision.
  • Classifying continuous data may be computationally expensive, as many trees must be generated to see where to break the continuum.

OR,

Iterative Dichotomiser 3 (ID3) Algorithm
ID3 stands for Iterative Dichotomiser 3 and is named as such because the algorithm iteratively (repeatedly) dichotomizes (divides) features into two or more groups at each step. Invented by Ross Quinlan, ID3 uses a top-down greedy approach to build a decision tree. In simple words, the top-down approach means that we start building the tree from the top, and the greedy approach means that at each iteration we select the best feature at the present moment to create a node. At the start, the ID3 algorithm places all the training tuples at the root. The tuples are then partitioned recursively based on selected attributes, called the test attributes. Test attributes are selected on the basis of a statistical measure (e.g., information gain). The ID3 algorithm stops partitioning when all samples for a given node belong to the same class, when there are no remaining attributes for further partitioning (in which case majority voting is employed to label the leaf), or when there are no samples left.
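For example, if a partition contains 14 tuples, 9 of one class and 5 of the other, its entropy is −(9/14)·log2(9/14) − (5/14)·log2(5/14) ≈ 0.940 bits. An attribute whose split yields purer partitions reduces this expected entropy, and the attribute with the largest reduction (the highest information gain) is chosen as the test attribute.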


Generate_Decision_Tree(Sample D, Attribute_List A)
1. Create a node N.
2. If all samples in D are of the same class C, then label N with C; terminate.
3. If A is empty, then label N with the most common class C in D (majority voting); terminate.
4. Select a ∈ A with the highest information gain; label N with a.
5. For each value v of a:
   a. Grow a branch from N with the condition a = v.
   b. Let Dv be the subset of samples in D with a = v.
   c. If Dv is empty, then attach a leaf labeled with the most common class in D.
   d. Else attach the node generated by Generate_Decision_Tree(Dv, A − {a}).
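A rough, minimal sketch of this recursion in Python is given below. It assumes samples are represented as dictionaries with a "class" key and that the tree is returned as nested dictionaries; these representation choices, and the compact helper names _entropy and _gain (re-defined here so the sketch is self-contained), are illustrative assumptions rather than part of the algorithm itself.

from collections import Counter
import math

def _entropy(samples):
    # Entropy of the class distribution in a set of samples (list of dicts).
    counts = Counter(s["class"] for s in samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def _gain(samples, attr):
    # Information gain of splitting the samples on attribute attr.
    n = len(samples)
    groups = {}
    for s in samples:
        groups.setdefault(s[attr], []).append(s)
    return _entropy(samples) - sum(len(g) / n * _entropy(g) for g in groups.values())

def generate_decision_tree(samples, attributes):
    classes = [s["class"] for s in samples]
    majority = Counter(classes).most_common(1)[0][0]
    if len(set(classes)) == 1:
        # Step 2: all samples belong to one class, so this node is a leaf.
        return classes[0]
    if not attributes:
        # Step 3: no attributes left, so label the leaf by majority voting.
        return majority
    # Step 4: choose the attribute with the highest information gain.
    best = max(attributes, key=lambda a: _gain(samples, a))
    node = {best: {}}
    remaining = [a for a in attributes if a != best]
    # Step 5: grow one branch per value of the chosen attribute. This sketch only
    # branches on values actually observed in the samples, so the empty-subset
    # case (step 5c) does not arise here; it would apply if branches were grown
    # for every value in the attribute's full domain.
    for v in set(s[best] for s in samples):
        subset = [s for s in samples if s[best] == v]
        node[best][v] = generate_decision_tree(subset, remaining)  # step 5d
    return node

A call such as generate_decision_tree(training_samples, ["outlook", "windy"]) would then return a nested dictionary whose keys alternate between attribute names and attribute values, with class labels at the leaves; the attribute names here are, again, purely illustrative.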

