Explain Decision Tree Learning Algorithm - ID3 (Iterative Dichotomiser).
Decision Tree Learning Algorithm - ID3
ID3 (Iterative Dichotomiser 3) is a simple decision tree learning algorithm developed by Ross Quinlan (1983). ID3 follows a greedy, non-backtracking approach in which decision trees are constructed in a top-down recursive "divide and conquer" manner, choosing one attribute to test at each tree node. The algorithm starts with a training set of tuples and their associated class labels, and recursively partitions the training set into smaller subsets as the tree is built.
ID3 Algorithm:
1. create a node N;
2. if tuples in D are all of the same class, C, then
3.    return N as a leaf node labeled with the class C;
4. if attribute_list is empty then
5.    return N as a leaf node labeled with the majority class in D; // majority voting
6. apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion;
      label node N with splitting_criterion;
7. if splitting_attribute is discrete-valued and
      multiway splits allowed then // not restricted to binary trees
      attribute_list ← attribute_list - splitting_attribute; // remove splitting_attribute
8. for each outcome j of splitting_criterion
9.    let Dj be the set of data tuples in D satisfying outcome j; // a partition
10.   if Dj is empty then
         attach a leaf labeled with the majority class in D to node N;
11.   else attach the node returned by
         Generate_decision_tree(Dj, attribute_list) to node N;
   endfor
   return N;
Explanation of the Algorithm:
The algorithm has three parameters: D, attribute_list, and Attribute_selection_method. D is a data partition; initially it is the complete set of training tuples and their associated class labels. attribute_list is the list of attributes describing the tuples. Attribute_selection_method specifies a heuristic for selecting the attribute that best discriminates the tuples according to class; ID3 uses information gain as this measure.
The tree starts as a single node N, which represents the training tuples in D. If the tuples in D are all of the same class, node N becomes a leaf labeled with that class; this happens in steps 2 and 3. Steps 4 and 5 are likewise terminating conditions: when no attributes remain, N becomes a leaf labeled with the majority class in D. If no terminating condition holds, the algorithm calls Attribute_selection_method to determine the splitting criterion, which indicates the best way to partition the tuples in D into individual classes; node N is labeled with this criterion, which serves as the test at the node (step 6). Step 7 removes a discrete-valued splitting attribute from attribute_list so that it is not considered again along the same path. In steps 8 through 11, the tuples in D are partitioned, one branch per outcome of the splitting criterion, and the algorithm recurses on each non-empty partition.
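To make the pseudocode concrete, below is a minimal Python sketch of the same recursion, assuming discrete-valued attributes with tuples stored as dictionaries, and using information gain (ID3's attribute selection measure) as the Attribute_selection_method. The function and variable names (id3, rows, the toy weather tuples) are illustrative, not part of the original algorithm.

```python
import math
from collections import Counter

def entropy(rows, target):
    # Info(D) = -sum over classes of p_i * log2(p_i)
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute, target):
    # Gain(A) = Info(D) - sum over values v of |Dv|/|D| * Info(Dv)
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def majority_class(rows, target):
    return Counter(row[target] for row in rows).most_common(1)[0][0]

def id3(rows, attribute_list, target):
    classes = {row[target] for row in rows}
    if len(classes) == 1:                      # steps 2-3: pure partition -> leaf
        return classes.pop()
    if not attribute_list:                     # steps 4-5: no attributes -> majority leaf
        return majority_class(rows, target)
    best = max(attribute_list,                 # step 6: the "best" splitting attribute
               key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attribute_list if a != best]   # step 7
    tree = {best: {}}
    for value in {row[best] for row in rows}:  # steps 8-11: one branch per outcome
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree

# Hypothetical toy training set D (attribute names and values are made up):
D = [
    {"outlook": "sunny",    "windy": "false", "play": "no"},
    {"outlook": "sunny",    "windy": "true",  "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rainy",    "windy": "false", "play": "yes"},
    {"outlook": "rainy",    "windy": "true",  "play": "no"},
]
print(id3(D, ["outlook", "windy"], target="play"))
# -> {'outlook': {'sunny': 'no', 'overcast': 'yes',
#                 'rainy': {'windy': {'false': 'yes', 'true': 'no'}}}}  (branch order may vary)
```

Note that iterating only over attribute values present in the current partition means the "empty Dj" branch of step 10 never fires in this sketch; a fuller implementation would branch on every value in the attribute's domain and attach a majority-class leaf for the missing ones.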
Advantages of using ID3
- Understandable prediction rules are created from the training data (see the rule-extraction sketch after this list).
- Builds the tree quickly.
- Tends to build a short tree, since information gain favors attributes that separate the classes early.
- Only needs to test enough attributes to classify all of the training data.
- Once a leaf node is reached, no further attribute tests are required, reducing the number of tests per classification.
- The whole training set is used when choosing each split, so every decision reflects all available data.
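The first advantage can be made tangible: the nested-dict tree from the sketch above flattens directly into IF-THEN prediction rules. The helper below is hypothetical, written against that sketch's tree representation.

```python
def to_rules(tree, conditions=()):
    """Yield one IF-THEN rule per root-to-leaf path of the sketch's tree."""
    if not isinstance(tree, dict):             # leaf: emit the accumulated rule
        yield "IF " + " AND ".join(conditions) + " THEN class = " + str(tree)
        return
    (attribute, branches), = tree.items()      # internal node tests one attribute
    for value, subtree in branches.items():
        yield from to_rules(subtree, conditions + (f"{attribute} = {value}",))

for rule in to_rules(id3(D, ["outlook", "windy"], target="play")):
    print(rule)
# IF outlook = sunny THEN class = no
# IF outlook = overcast THEN class = yes
# IF outlook = rainy AND windy = false THEN class = yes
# IF outlook = rainy AND windy = true THEN class = no
```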
Disadvantages of using ID3
- Data may be over-fitted or over-classified if only a small sample is used for training.
- Only one attribute at a time is tested when making a decision, so interactions between attributes are not captured directly.
- Classifying continuous data may be computationally expensive, as many candidate split points must be evaluated to decide where to break the continuum (see the sketch after this list).
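To see where this cost comes from, consider the standard midpoint trick for turning a numeric attribute into binary tests. ID3 proper assumes discrete attributes; successors such as C4.5 discretize on the fly roughly as sketched below (names are illustrative).

```python
def candidate_thresholds(rows, attribute):
    # Every midpoint between consecutive sorted values is a candidate
    # binary split (A <= t vs. A > t), so up to |D| - 1 splits must each
    # be scored with information gain -- the expense noted above.
    values = sorted({row[attribute] for row in rows})
    return [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]

# e.g. temperature values 64, 68, 70, 75 give candidates [66.0, 69.0, 72.5]
```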