Transcript of "Apriori algorithm"

1. Apriori Algorithm: Hash Based and Graph Based Modifications
2. Agenda
  - Data Mining
  - Association Rules
  - Apriori Algorithm
  - Hash Based Method
  - Graph Based Approach
  - Conclusion and Future Work
3. Data Mining
  - Data mining is the process of extracting patterns (knowledge) from data. The aim of data mining is to automate the process of finding interesting patterns and trends in a given data set. Modern businesses increasingly see it as a tool for transforming data into business intelligence, giving an informational advantage. It is currently used in a wide range of profiling practices, scientific discovery, and decision making.
4. Association Rules
  - Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases.
  - For example, a rule found in the sales data of a supermarket might indicate that a customer who buys onions and potatoes together is also likely to buy burgers. Such information can be used as the basis for decisions about marketing activities.
5. Problem Description
  - I = {A, B, C}
  - Possible itemsets (all non-empty subsets of I): {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}
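The candidate itemsets on this slide can be enumerated mechanically; the following is a minimal Python sketch, where only the item names A, B, C come from the slide and the rest is illustrative:

    from itertools import combinations

    items = ["A", "B", "C"]

    # Enumerate every non-empty subset of I, grouped by size k.
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            print(set(itemset))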
6. Support and Confidence
  - Support(A -> B) = (no. of transactions containing both A and B) / (total no. of transactions)
  - Confidence(A -> B) = (no. of transactions containing both A and B) / (no. of transactions containing A)
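As a quick illustration of the two measures, here is a small sketch that computes them over a toy transaction list; the transactions and the helper name support_confidence are invented for the example, not taken from the slides:

    # Toy illustration of support and confidence; the data is invented for the example.
    transactions = [
        {"onions", "potatoes", "burgers"},
        {"onions", "potatoes"},
        {"onions", "burgers"},
        {"milk"},
    ]

    def support_confidence(antecedent, consequent, transactions):
        both = sum(1 for t in transactions if antecedent | consequent <= t)
        ante = sum(1 for t in transactions if antecedent <= t)
        support = both / len(transactions)
        confidence = both / ante if ante else 0.0
        return support, confidence

    # Rule {onions, potatoes} -> {burgers}: support = 1/4, confidence = 1/2.
    print(support_confidence({"onions", "potatoes"}, {"burgers"}, transactions))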
7. Original Apriori Algorithm
8.
    L1 = {large 1-itemsets}
    for (k = 2; Lk ≠ ∅; k++) do begin
        Ck = apriori-gen(Lk-1)            // New candidates
        for all transactions t ∈ D do begin
            Ct = subset(Ck, t)            // Candidates contained in t
            for all candidates c ∈ Ct do
                c.count++
        end
        Lk = {c ∈ Ck | c.count >= minsup}
    end
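For readers who want to run the loop above end to end, below is a small, self-contained Python sketch of candidate generation plus support counting; the function and variable names are mine, and min_support is an absolute count threshold as in the pseudocode:

    from itertools import combinations

    def apriori(transactions, min_support):
        """Return {frozenset(itemset): count} for all frequent itemsets."""
        transactions = [frozenset(t) for t in transactions]

        # L1: frequent 1-itemsets.
        counts = {}
        for t in transactions:
            for item in t:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent = dict(current)

        k = 2
        while current:
            # apriori-gen: join pairs of frequent (k-1)-itemsets whose union has size k.
            prev = list(current)
            candidates = {a | b for i, a in enumerate(prev)
                          for b in prev[i + 1:] if len(a | b) == k}
            # Prune candidates that have an infrequent (k-1)-subset.
            candidates = {c for c in candidates
                          if all(frozenset(s) in current
                                 for s in combinations(c, k - 1))}

            # Count candidates contained in each transaction t.
            counts = {c: 0 for c in candidates}
            for t in transactions:
                for c in candidates:
                    if c <= t:
                        counts[c] += 1

            current = {s: c for s, c in counts.items() if c >= min_support}
            frequent.update(current)
            k += 1
        return frequent

    # Example using the transactions from the example slide (T1..T4).
    db = [{"b", "e"}, {"a", "b", "c", "e", "f", "g"},
          {"b", "c", "e", "f", "g"}, {"a", "c", "g"}]
    print(apriori(db, min_support=2))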
9. Hash Based Method
    Repeat                                 // for each transaction of the database
    {
        D = { set of all possible k-itemsets in the i-th transaction }
        For each element of D
        {
            Find a unique integer uniq_int using the hash function for the k-itemset
            Increment freq[uniq_int]
        }
        Increment trans_pos                // move the pointer to the next transaction
    } until end_of_file
    For (freq_ind = 0; freq_ind < length_of_the_array(freq[]); freq_ind++)
    {
        if (freq[freq_ind] >= required support)
            mark the corresponding k-itemset
    }
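A minimal Python rendering of this counting scheme might look as follows; the hash table here is a plain dictionary keyed by the itemset (a stand-in for the unique-integer hash discussed in the conclusion), and the parameter names are illustrative:

    from itertools import combinations

    def hash_based_frequent(transactions, k, min_support):
        """Count every k-itemset per transaction in a hash table, then mark the frequent ones."""
        freq = {}  # plays the role of freq[uniq_int] from the pseudocode
        for t in transactions:                              # for each transaction of the database
            for itemset in combinations(sorted(t), k):      # D = all k-itemsets in this transaction
                freq[itemset] = freq.get(itemset, 0) + 1    # increment freq[uniq_int]
        # Mark (return) the k-itemsets meeting the required support.
        return {itemset for itemset, count in freq.items() if count >= min_support}

    db = [{"b", "e"}, {"a", "b", "c", "e", "f", "g"},
          {"b", "c", "e", "f", "g"}, {"a", "c", "g"}]
    print(hash_based_frequent(db, k=2, min_support=2))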
10. Graph Based Approach
    Procedure FrequentItemGraph (Tree, F)
    {
        scan the DB once to collect the frequent 2-itemsets and their supports, in ascending order;
        add all items in the DB as the header nodes
        for each 2-itemset entry (in top-down order) in freq2list do
            if (first item = item in a header node) then
                create a link to the corresponding header node
        i = 3
        for each i-itemset entry in the tree do
            call buildsubtree(F)
    }
    Procedure buildsubtree (F)
    {
        if (first i-1 itemsets = itemsets in their respective header nodes) then
            create a link to the corresponding header node
        i = i + 1
        repeat buildsubtree(F)
    }
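The pseudocode above is fairly abstract; as one possible reading of its first phase only, the sketch below builds the header nodes and links each frequent 2-itemset under the header node of its first item. The data structures and names are my own interpretation, not taken from the slides, and buildsubtree is not covered:

    from itertools import combinations

    def frequent_item_graph(transactions, min_support):
        """Link each frequent 2-itemset under the header node of its first item."""
        # Scan the DB once to collect the frequent 2-itemsets and their supports.
        counts = {}
        for t in transactions:
            for pair in combinations(sorted(t), 2):
                counts[pair] = counts.get(pair, 0) + 1
        freq2list = sorted(
            (p for p, c in counts.items() if c >= min_support),
            key=lambda p: counts[p])                 # ascending support

        # Add all items in the DB as header nodes.
        headers = {item: [] for t in transactions for item in t}

        # Create a link from each frequent 2-itemset to the header node of its first item.
        for first, second in freq2list:
            headers[first].append(second)
        return headers

    db = [{"b", "e"}, {"a", "b", "c", "e", "f", "g"},
          {"b", "c", "e", "f", "g"}, {"a", "c", "g"}]
    print(frequent_item_graph(db, min_support=2))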
11. Example:
    Transaction    Items
    T1             b e
    T2             a b c e f g
    T3             b c e f g
    T4             a c g
12. Conclusion and Future Work
  - In order to continue with the hashing method, we need a perfect hash function h(e1, e2, ..., ek). Such a function can be obtained as follows:
  - h(e1, e2, ..., ek) = prm(1)^e1 + prm(2)^e2 + ... + prm(k)^ek
  - where prm is the set of prime numbers, prm = {2, 3, 5, 7, ...}.
  - Although this hash function guarantees a unique key for every itemset, it requires an impractical amount of memory. For example, consider an original item set X with only 10 items, hashed to T = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. The 4-itemset (1, 2, 3, 10) hashes to the value 282475385, so a very large memory space would be reserved without being used effectively.
  - Other perfect hash functions, used for hashing strings, are not applicable here, because their input variables are limited to 26, the number of letters in the alphabet, while the number of items in a database can be much larger than that.
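To make the memory argument concrete, here is a small check of the prime-power hash on the example from the slide; the function name is mine, while the primes and the 4-itemset (1, 2, 3, 10) come from the slide:

    # Prime-power hash from the slide: h(e1,...,ek) = 2^e1 + 3^e2 + 5^e3 + 7^e4 + ...
    PRIMES = [2, 3, 5, 7, 11, 13]

    def prime_power_hash(itemset):
        return sum(p ** e for p, e in zip(PRIMES, itemset))

    # The 4-itemset (1, 2, 3, 10) from a 10-item universe:
    print(prime_power_hash((1, 2, 3, 10)))   # 282475385
    # A direct-addressed frequency array indexed by this hash would need
    # hundreds of millions of slots, which is the memory problem noted above.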
13.
  - Use hashing techniques to efficiently find the frequent 2-itemsets, in order to reduce the time and memory required to build the graphical structure.