
Apriori Algorithm

What is the Apriori Algorithm?

Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
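
Two quantities recur throughout this article: the support of an itemset (how often it appears in the database) and the confidence of a rule (how often the rule's consequent appears in transactions that contain its antecedent). As a quick illustration, here is a minimal sketch in Python, assuming transactions are stored as sets of items (the helper names are ours, purely illustrative):

```python
def support(itemset, transactions):
    # Fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Support of the full rule divided by the support of its antecedent.
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))
```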

Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.

Using breadth-first search and a hash tree structure, Apriori counts candidate item sets efficiently. It generates candidate item sets of length k from frequent item sets of length k - 1, then prunes the candidates which have an infrequent subpattern. According to the downward closure lemma, the pruned candidate set still contains all frequent k-length item sets. After that, the algorithm scans the transaction database to determine which of the candidates are actually frequent.
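
The join and prune steps just described can be sketched compactly. The following is a minimal illustration (not a reference implementation from the article), assuming the frequent (k-1)-itemsets are stored as a set of sorted tuples:

```python
from itertools import combinations

def join(frequent_km1, k):
    # Join step: combine two frequent (k-1)-itemsets that agree on their
    # first k-2 items into one candidate k-itemset (sorted-tuple encoding).
    candidates = set()
    for a in frequent_km1:
        for b in frequent_km1:
            if a[:k - 2] == b[:k - 2] and a[k - 2] < b[k - 2]:
                candidates.add(a[:k - 2] + (a[k - 2], b[k - 2]))
    return candidates

def prune(candidates, frequent_km1, k):
    # Prune step (downward closure): a candidate survives only if every
    # one of its (k-1)-item subsets is already known to be frequent.
    return {c for c in candidates
            if all(s in frequent_km1 for s in combinations(c, k - 1))}
```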

What are the steps in the Apriori algorithm?

The following are the main steps of the algorithm:

  • Calculate the support of the item sets of size k = 1 in the transactional database (support is simply the frequency of occurrence of an itemset).
  • In the first iteration, each item is taken as a candidate 1-itemset and the algorithm counts the occurrences of each item. This is called generating the candidate set.
  • Let there be some minimum support, min_sup (e.g. 2). Only the candidates whose count is greater than or equal to min_sup are carried forward to the next iteration; the others are pruned. The survivors are the frequent 1-itemsets.
  • Next, the frequent 2-itemsets are discovered. In the join step, candidate 2-itemsets are generated by combining the frequent 1-itemsets with one another.
  • The candidate 2-itemsets are pruned using the min_sup threshold, so the table now holds only the frequent 2-itemsets.
  • The next iteration forms candidate 3-itemsets using the same join and prune steps. Pruning exploits the antimonotone (downward closure) property: every 2-itemset subset of a frequent 3-itemset must itself be frequent, so any candidate whose 2-itemset subsets are not all frequent is discarded without a database scan.
  • In general, the frequent k-itemsets are joined to form candidate (k + 1)-itemsets, and the join-prune-count cycle repeats until no further itemsets can be formed, i.e. when every candidate at the next level falls below the minimum support. A runnable end-to-end example follows this list.
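
To make the steps concrete, here is a hedged end-to-end example using the open-source mlxtend library (our implementation choice; the article itself does not name a library). With five toy baskets, min_support=0.4 corresponds to the min_sup = 2 used above:

```python
# pip install mlxtend pandas
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

# One-hot encode the baskets into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# min_support=0.4 means "appears in at least 2 of the 5 baskets".
frequent = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(frequent)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```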

What are the advantages of the Apriori algorithm?

The advantages of Apriori are as follows:

  • It is the simplest and easiest-to-understand algorithm among association rule learning algorithms
  • The resulting rules are intuitive and easy to communicate to an end user
  • It doesn't require labeled data as it is fully unsupervised; as a result, you can use it in many different situations because unlabeled data is often more accessible
  • Many extensions were proposed for different use cases based on this implementation—for example, there are association learning algorithms that take into account the ordering of items, their number, and associated timestamps
  • The algorithm is exhaustive, so it finds all the rules with the specified support and confidence

What are the disadvantages of the Apriori algorithm?

One of the biggest limitations of the Apriori algorithm is that it is slow. Its running cost is driven mainly by:

  • The large number of itemsets in the dataset.
  • A low minimum support threshold.
  • The time and memory needed to hold the large number of candidate sets produced when there are many frequent itemsets.

As a result, the algorithm is inefficient on large datasets.

As an example, suppose there are 10^4 frequent 1-itemsets. The algorithm then needs to generate more than 10^7 candidate 2-itemsets, each of which has to be counted against the database. And to detect a frequent pattern of size 100 (containing v1, v2, …, v100), it has to generate on the order of 2^100 candidate itemsets along the way.
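
The scale of this blow-up is easy to verify with a couple of lines of standard-library Python:

```python
import math

# Unordered pairs drawn from 10**4 frequent 1-itemsets:
print(math.comb(10_000, 2))  # 49995000, i.e. well over 10**7 candidates
# Candidate subsets implied by a single frequent pattern of size 100:
print(2 ** 100)              # about 1.27e30
```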

Hence, the costs escalate and a great deal of time is wasted on candidate generation; this is the time complexity bottleneck of the Apriori algorithm. In addition, to check the support of the many candidate itemsets, the algorithm scans the database many times, which is expensive. This hits hardest when system memory is insufficient and the number of frequent transactions is large. That is why the algorithm becomes inefficient and slow on large databases.

How can we improve Apriori's efficiency?

Many methods are available for improving the efficiency of the algorithm.

  • Hash-Based Technique: This method uses a hash-based structure called a hash table for generating the candidate k-itemsets and their corresponding counts. It uses a hash function for generating the table.
  • Transaction Reduction: This method reduces the number of transactions scanned in later iterations. Transactions which do not contain any frequent items are marked or removed (see the sketch after this list).
  • Partitioning: This method requires only two database scans to mine the frequent itemsets. It relies on the fact that for any itemset to be potentially frequent in the database, it must be frequent in at least one of the partitions of the database.
  • Sampling: This method picks a random sample S from database D and then searches for frequent itemsets in S. Some globally frequent itemsets may be missed; this risk can be reduced by lowering min_sup for the sample.
  • Dynamic Itemset Counting: This technique can add new candidate itemsets at any marked start point of the database during the scan, instead of waiting for a full pass to finish.
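
As one concrete illustration, the transaction reduction idea above can be sketched in a few lines (our sketch, assuming transactions are stored as Python sets and the frequent k-itemsets as tuples):

```python
def reduce_transactions(transactions, frequent_k):
    # A transaction that contains no frequent k-itemset cannot contain any
    # frequent (k+1)-itemset, so it can be dropped from later scans.
    return [t for t in transactions
            if any(set(itemset) <= t for itemset in frequent_k)]
```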
