Write the limitations of the Apriori algorithm.

Limitations: The Apriori algorithm is easy to understand, and its join and prune steps are easy to implement on large itemsets in large databases. Along with these advantages, it has a number of limitations. These are:

1. A huge number of candidates: Candidate generation is the inherent cost of the Apriori algorithm, no matter what implementation technique is applied, and handling a huge number of candidate sets is costly. For example, if there are 10^4 frequent 1-itemsets, the algorithm needs to generate more than 10^7 candidate 2-itemsets. Moreover, to discover a frequent pattern of size 100, it must generate more than 2^100 (approximately 10^30) candidates in total. A sketch of this candidate-generation step appears after this list.

2. Multiple scans of the transaction database: The algorithm scans the entire database once for every candidate level, so it is not a good choice for mining long patterns from large data sets.

3. When the database is scanned to check C_k in order to create F_k, a large number of transactions are scanned even if they do not contain any k-itemset.
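To make the join-and-prune cost concrete, here is a minimal Python sketch of Apriori's candidate-generation step. The function name and the tiny toy input are illustrative assumptions, not part of any standard library:

```python
from itertools import combinations

def generate_candidates(frequent, k):
    """Join pairs of frequent (k-1)-itemsets, then prune any candidate
    that has an infrequent (k-1)-subset (the Apriori property)."""
    frequent = set(frequent)
    candidates = set()
    for a in frequent:
        for b in frequent:
            union = tuple(sorted(set(a) | set(b)))
            if len(union) == k and all(
                sub in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

# With n frequent 1-itemsets, the join step alone yields
# n*(n-1)/2 candidate 2-itemsets; for n = 10^4 that is ~5 * 10^7.
f1 = [(i,) for i in range(100)]          # 100 frequent 1-itemsets
print(len(generate_candidates(f1, 2)))   # 100*99/2 = 4950 candidates
```

Even at this toy scale, the quadratic growth of the join step is visible; at 10^4 frequent 1-itemsets the candidate set no longer fits comfortably in memory.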

                                        OR,

Limitations of Apriori Algorithm

The Apriori algorithm can be slow. The main limitation is the time required to hold a vast number of candidate sets when there are many frequent itemsets, a low minimum support threshold, or large itemsets; in other words, it is not an efficient approach for large datasets. For example, if there are 10^4 frequent 1-itemsets, it needs to generate more than 10^7 candidate 2-itemsets, which in turn must be tested and accumulated. Furthermore, to detect a frequent pattern of size 100, i.e. v1, v2, ..., v100, it has to generate 2^100 candidate itemsets, which makes candidate generation costly and time-consuming. The algorithm therefore checks many candidate itemsets and scans the database repeatedly to count their supports. Apriori becomes very slow and inefficient when memory capacity is limited and the number of transactions is large.
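As a quick sanity check on those figures, the counts can be computed directly (a hypothetical snippet using only the Python standard library):

```python
from math import comb

n = 10**4            # number of frequent 1-itemsets
print(comb(n, 2))    # 49,995,000 candidate 2-itemsets, i.e. more than 10^7
# A size-100 frequent pattern implies every non-empty subset is frequent:
print(2**100 - 1)    # ~1.27 * 10^30 candidate itemsets to enumerate
```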

                          OR,

LIMITATIONS OF THE APRIORI ALGORITHM

One of the biggest limitations of the Apriori algorithm is that it is slow. This is mainly because its running time is decided by:

A large number of itemsets in the dataset.

A low minimum support threshold in the dataset.

The time needed to hold a large number of candidate sets with many frequent itemsets.

Thus it is inefficient when used with large volumes of data.

As an example, assume there are 10^4 frequent 1-itemsets. The algorithm then needs to generate more than 10^7 candidate 2-itemsets, which are then tested and accumulated. To detect a frequent pattern of size 100 (containing v1, v2, ..., v100), it generates 2^100 possible candidate itemsets.

Hence, costs escalate and a lot of time is wasted in candidate generation. In addition, to check the many candidate itemsets, the algorithm scans the database many times, consuming expensive resources. This hurts most when system memory is insufficient and there is a large number of transactions. That is why the algorithm becomes inefficient and slow with large databases; the sketch below shows where the repeated scans occur.
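The repeated-scan cost can be seen in a sketch of the support-counting pass: each candidate level C_k requires one full pass over every transaction. This is a hypothetical illustration (the function name and toy database are assumptions), not a tuned implementation:

```python
def count_support(transactions, candidates):
    """One full database scan: count how many transactions
    contain each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for t in transactions:        # touches every transaction in the database
        items = set(t)
        for c in candidates:
            if set(c) <= items:   # candidate is contained in the transaction
                counts[c] += 1
    return counts

# Mining a longest frequent pattern of length L costs L such scans,
# one per candidate level C_1, C_2, ..., C_L.
db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b", "c", "d")]
print(count_support(db, [("a", "c"), ("b", "d")]))
# {('a', 'c'): 3, ('b', 'd'): 1}
```

When the database is disk-resident, these repeated full passes dominate the I/O cost, which is exactly why limited memory and a large transaction count make Apriori slow.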
