Explain data discretization and concept hierarchy with an example, and explain the types of data discretization
Data Discretization
Discretization techniques reduce the number of values of a continuous attribute by dividing its range into intervals. Interval labels can then replace the actual data values, so numerous continuous values are replaced by a small number of interval labels. This leads to a concise, easy-to-use, knowledge-level representation of mining results. These methods are typically recursive, and a large amount of time is spent sorting the data at each step, so the smaller the number of distinct values to sort, the faster they should be. Many discretization techniques can be applied recursively to provide a hierarchical or multiresolution partitioning of the attribute values, known as a concept hierarchy.
Example: We have an attribute of age with the following values:
Age: 10, 11, 13, 14, 17, 19, 30, 31, 32, 38, 40, 42, 70, 72, 73, 75
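To make the example concrete, the ages above can be replaced by interval labels. The following is a minimal sketch, not a prescribed method: the cut points (20 and 45) and the label names are assumptions chosen only for illustration, and `discretize` is a hypothetical helper.

```python
from bisect import bisect_right

ages = [10, 11, 13, 14, 17, 19, 30, 31, 32, 38, 40, 42, 70, 72, 73, 75]

# Assumed cut points and labels, picked only to illustrate the idea.
cut_points = [20, 45]
labels = ["young", "middle-aged", "senior"]

def discretize(value, cuts, names):
    """Return the label of the interval that contains value."""
    # bisect_right counts how many cut points are <= value,
    # which is the index of the interval the value falls into.
    return names[bisect_right(cuts, value)]

age_labels = [discretize(a, cut_points, labels) for a in ages]

for age, label in zip(ages, age_labels):
    print(age, "->", label)
```

With these assumed cut points, the sixteen distinct ages collapse into three labels: 10 to 19 become young, 30 to 42 become middle-aged, and 70 to 75 become senior.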
Data Discretization can be categorized into the following two types:
Top-down discretization: If the process starts by first finding one or a few points (called split points or cut points) to split the entire attribute range, and then repeats this recursively on the resulting intervals, it is called top-down discretization or splitting.
Bottom-up discretization: If the process starts by considering all of the continuous values as potential split points, and then removes some by merging neighboring values to form intervals, it is called bottom-up discretization or merging. With either approach, the technique can be applied recursively to provide a hierarchical or multi-resolution partitioning of the attribute values, known as a concept hierarchy.
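The two directions can be contrasted on the age values from the example. This is a simplified sketch under a stated assumption: it uses the gap between neighboring values as the criterion for splitting or merging, whereas practical methods usually rely on measures such as entropy or a chi-square test. The function names and the `min_gap` threshold are hypothetical.

```python
def top_down(values, min_gap):
    """Splitting: cut the sorted values at the widest gap, then recurse."""
    if len(values) < 2:
        return [values]
    gaps = [b - a for a, b in zip(values, values[1:])]
    widest = max(gaps)
    if widest < min_gap:          # no gap is wide enough to justify a split
        return [values]
    cut = gaps.index(widest) + 1  # position of the split point
    return top_down(values[:cut], min_gap) + top_down(values[cut:], min_gap)

def bottom_up(values, min_gap):
    """Merging: start with one interval per value, merge the closest neighbors."""
    intervals = [[v] for v in values]
    while len(intervals) > 1:
        gaps = [nxt[0] - cur[-1] for cur, nxt in zip(intervals, intervals[1:])]
        narrowest = min(gaps)
        if narrowest >= min_gap:  # remaining intervals are far enough apart
            break
        i = gaps.index(narrowest)
        intervals[i:i + 2] = [intervals[i] + intervals[i + 1]]
    return intervals

ages = [10, 11, 13, 14, 17, 19, 30, 31, 32, 38, 40, 42, 70, 72, 73, 75]
split_result = top_down(ages, min_gap=10)
merge_result = bottom_up(ages, min_gap=10)
print(split_result)
print(merge_result)
```

On this data both directions arrive at the same three intervals, because the only gaps of at least 10 lie between 19 and 30 and between 42 and 70. On other data, or with other criteria, the two approaches need not agree.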
Concept hierarchy
A concept hierarchy for a given numeric attribute defines a discretization of the attribute. Concept hierarchies can be used to reduce the data by collecting low-level concepts (such as numeric values for the attribute age) and replacing them with higher-level concepts (such as young, middle-aged, or senior). Although detail is lost by such generalization, the generalized data becomes more meaningful and is easier to interpret.
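A concept hierarchy can be pictured as several levels of generalization over the same attribute. The sketch below assumes a three-level hierarchy for age (raw value, ten-year interval, named concept); the level boundaries and the helper `generalize` are illustrative assumptions, not part of any standard.

```python
from collections import Counter

def generalize(age, level):
    """Map an age to its concept at the requested hierarchy level.

    Level 0 = raw value, level 1 = ten-year interval, level 2 = named concept.
    The boundaries (20 and 65) are assumed for illustration.
    """
    if level == 0:
        return str(age)
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"
    if age < 20:
        return "young"
    if age < 65:
        return "middle-aged"
    return "senior"

ages = [10, 11, 13, 14, 17, 19, 30, 31, 32, 38, 40, 42, 70, 72, 73, 75]

# Count how many distinct concepts remain at each level of the hierarchy.
distinct_per_level = [len(Counter(generalize(a, lvl) for a in ages)) for lvl in range(3)]
print(distinct_per_level)
```

Moving up the hierarchy reduces the number of distinct values at each step, which is the data reduction described above, at the cost of the detail held in the raw ages.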