What is FP-growth? Discuss FP-growth approach for frequent pattern mining.
FP GROWTH
The main Bottlenecks of the Apriori approach are it uses Breadth-first (i.e., level-wise) search and it often generates and tests a huge number of candidates. One possible better alternative approach for the Apriori approach is the FP-Growth approach uses depth-first search and avoids explicit candidate generation. Its Major philosophy is "grow long patterns from short ones using local frequent items only". For example, if "abc" is a frequent pattern then get all transactions having "abc", and find that "d" is a local frequent item in transactions having "abe" then abcd is also a frequent pattern.
In FP-Growth there are mainly two steps involved first, build a compact data structure called FP-Tree and then, extract frequent itemsets directly from the FP-tree and then, extract frequent itemsets directly from the FP-tree.
1. FP-tree
Construction from a Transactional DB: FP-Tree is constructed using two passes over the data set:
Pass1:
- Scan DB to find frequent 1-itemsets as:
- Scan DB and find support count for each item.
- Discard infrequent items
- Sort items in descending order of their frequency (support count).
- Sort the items in each transaction in descending order of their frequency. Use this order when building the FP-tree, so common prefixes can be shared.
Pass2:
- Scan DB again, construct FP-tree
- FP-growth reads one transaction at a time and maps it to a path.
- Fixed order is used, so path can overlap when transaction share items.
- Pointers are maintained between nodes containing the same item(dotted line)
2. Mining Frequent Patterns Using FP-tree:
- Start from each frequent length-1 pattern (called suffix pattern)
- Construct its conditional pattern base (set of prefix paths in the FP-tree co-occurring with the suffix pattern)
- Then construct its conditional FP-tree.
- The pattern growth is achieved by the concatenation of the suffix pattern with the frequent patterns generated from a conditional FP-tree.
The steps used by the FP-growth approach for FP-tree construction can be expressed in the Flow chart as shown below:
Comments
Post a Comment