Implementation of Classifier tool in Twister (Iterative MapReduce)


Published in: Education, Technology

  1. Implementation of Classifier Tool in Twister. Magesh khanna Vadivelu, Shivaraman Janakiraman
  2. Apriori • Generating 1-itemset frequent patterns
  3. Apriori • Generating 2-itemset frequent patterns
  4. Apriori • Generating 3-itemset frequent patterns
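The step from k-itemsets to (k+1)-itemsets shown in the Apriori slides can be sketched in plain Java. This is a minimal illustration of the standard Apriori join step, not the authors' code: the class name `CandidateGen` and method `join` are hypothetical, itemsets are assumed to be comma-delimited strings with items in sorted order, and the subset-pruning step is left out (infrequent candidates are assumed to be dropped later by support counting).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: names are hypothetical, not taken from the slides.
final class CandidateGen {

    // Joins frequent k-itemsets that share their first k-1 items and returns
    // the resulting (k+1)-itemset candidates in sorted order, without duplicates.
    // Each itemset is a comma-delimited string whose items are already sorted.
    static List<String> join(List<String> frequent) {
        TreeSet<String> candidates = new TreeSet<>();
        for (int i = 0; i < frequent.size(); i++) {
            for (int j = i + 1; j < frequent.size(); j++) {
                String[] a = frequent.get(i).split(",");
                String[] b = frequent.get(j).split(",");
                int k = a.length;
                if (b.length != k) {
                    continue; // only itemsets of the same size can be joined
                }
                boolean samePrefix = Arrays.equals(
                        Arrays.copyOf(a, k - 1), Arrays.copyOf(b, k - 1));
                if (!samePrefix || a[k - 1].equals(b[k - 1])) {
                    continue;
                }
                // Append the last item of b to a, then restore sorted order.
                String[] merged = Arrays.copyOf(a, k + 1);
                merged[k] = b[k - 1];
                Arrays.sort(merged);
                candidates.add(String.join(",", merged));
            }
        }
        return new ArrayList<>(candidates);
    }
}
```

Calling `CandidateGen.join` on the frequent 1-itemsets `A`, `B`, `C` yields the 2-itemset candidates `A,B`, `A,C`, `B,C`; calling it again on those yields the single 3-itemset candidate `A,B,C`.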
  5. Twister • Iterative MapReduce • Configure once, use many times • Map -> Reduce -> Combine • Static data configured with a partition file and reused across iterations • Provides a fault-tolerant solution
  6. Twister
  7. Implementation • Candidate generation • Map • Reduce • Combine • Generate frequent items • Iterate
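One round of the Map, Reduce, Combine sequence listed on the Implementation slide might look like the following plain-Java simulation. It deliberately uses no Twister classes, so every name here (`AprioriRound`, `map`, `reduce`, `combine`, `round`) is hypothetical. It also assumes that transactions and itemsets are comma-delimited strings, that counts are kept in a `HashMap<String, Integer>`, and that the minimum-support filter is applied in the combine stage; the slides do not say which stage does the filtering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java simulation of one iteration; no Twister API is used or implied.
final class AprioriRound {

    // Map: for one data partition, count the transactions containing each candidate.
    static Map<String, Integer> map(List<String> partition, List<String> candidates) {
        Map<String, Integer> counts = new HashMap<>();
        for (String transaction : partition) {
            Set<String> items = new HashSet<>(Arrays.asList(transaction.split(",")));
            for (String candidate : candidates) {
                if (items.containsAll(Arrays.asList(candidate.split(",")))) {
                    counts.merge(candidate, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Reduce: sum the partial counts that the map tasks produced for each candidate.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((itemset, count) -> totals.merge(itemset, count, Integer::sum));
        }
        return totals;
    }

    // Combine (assumed role): keep only itemsets that meet the minimum support count.
    static Map<String, Integer> combine(Map<String, Integer> totals, int minSupport) {
        Map<String, Integer> frequent = new TreeMap<>();
        totals.forEach((itemset, count) -> {
            if (count >= minSupport) {
                frequent.put(itemset, count);
            }
        });
        return frequent;
    }

    // One full round: the partitions stay fixed, the candidates change per iteration.
    static Map<String, Integer> round(List<List<String>> partitions,
                                      List<String> candidates, int minSupport) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (List<String> partition : partitions) {
            partials.add(map(partition, candidates));
        }
        return combine(reduce(partials), minSupport);
    }
}
```

To iterate, the frequent itemsets returned by `round` would be turned into the next set of candidates and passed to `round` again with the same partitions, until no frequent itemsets remain.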
  8. Data Structures • Vector • String delimited by commas • StringValue • HashMap<String, Integer>
  9. Inputs • Configuration file – Number of items & transactions – Minimum support count % • Partition file – Split data – Number of items & transactions
  10. Inputs • Number of transactions • Number of items
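The two inputs can be illustrated with a short sketch: turning a minimum support percentage into an absolute transaction count, and splitting the transactions into partitions. Both the interpretation (the percentage is taken relative to the total number of transactions, rounded up) and the names `InputSketch`, `minSupportCount` and `split` are assumptions; the actual configuration and partition file formats are not shown in the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers; the real configuration and partition files are not modeled.
final class InputSketch {

    // Converts a minimum support percentage into a transaction count, rounding up.
    static int minSupportCount(int numTransactions, double minSupportPercent) {
        return (int) Math.ceil(numTransactions * minSupportPercent / 100.0);
    }

    // Splits the transactions into contiguous chunks of nearly equal size.
    // May return fewer than numPartitions chunks when there are few transactions.
    static List<List<String>> split(List<String> transactions, int numPartitions) {
        int chunk = (transactions.size() + numPartitions - 1) / numPartitions;
        List<List<String>> partitions = new ArrayList<>();
        for (int start = 0; start < transactions.size(); start += chunk) {
            int end = Math.min(start + chunk, transactions.size());
            partitions.add(new ArrayList<>(transactions.subList(start, end)));
        }
        return partitions;
    }
}
```

For example, with 7 transactions and a 50% minimum support, an itemset would need to appear in at least 4 transactions to count as frequent.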
  11. Challenges • Twister API – StringValue – Vector<String> – StringVector • toByte, fromByte
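The toByte/fromByte challenge concerns converting values such as a vector of strings to and from bytes so they can be sent between tasks. The sketch below shows one generic way to do such a round trip with the standard `java.io` data streams. It is not Twister's serialization code and does not implement any Twister interface; `StringListCodec`, `toBytes` and `fromBytes` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Generic byte round trip for a list of strings; not a Twister class.
final class StringListCodec {

    // Layout: a 4-byte element count, then each string in modified UTF-8.
    // writeUTF limits each encoded string to 65535 bytes.
    static byte[] toBytes(List<String> values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(values.size());
            for (String value : values) {
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads back exactly what toBytes wrote.
    static List<String> fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int size = in.readInt();
            List<String> values = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                values.add(in.readUTF());
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A list of itemsets such as `A,B` and `A,C` passed through `toBytes` and then `fromBytes` comes back unchanged, which is the property a broadcast value needs.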
  12. Challenges • runMapReduce() • runMapReduce(List<KeyValuePair>) • runMapReduceBCast(StringValue)
  13. Time vs. Transactions [Chart: execution time against number of transactions (10000, 20000, 30000)]
  14. Time vs. Itemsets [Chart: time in seconds against number of itemsets (25, 50, 75)]
  15. Time vs. Itemsets [Chart: time in seconds against number of itemsets (25, 50, 75), for 5 mappers and 20 mappers]
  16. Implementation of Classifier Tool in Twister. Magesh khanna Vadivelu, Shivaraman Janakiraman, shivjana@indiana.edu
      Motivation: Mining frequent item-sets from large-scale databases has emerged as an important problem in the data mining and knowledge discovery research community. To address this problem, we propose to implement the Apriori algorithm, a classification algorithm, in Twister, a distributed framework that makes use of MapReduce. We specify a map function that processes a key-value pair to generate a set of intermediate key-value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Our implementation of the Apriori algorithm runs on a large cluster of machines and is highly scalable. At the application level, this Apriori algorithm can be used to identify the patterns in which customers buy products in a supermarket.
      Architecture: Twister has several components. The client side drives MapReduce jobs. Daemons and workers, which live on the compute nodes, manage MapReduce tasks. Connections between components are based on SSH and messaging software. To drive MapReduce jobs, the client first needs to configure the job: it assigns the MapReduce methods to the job, prepares KeyValue pairs, and configures static data for the MapReduce tasks through a partition file if required. Messages are transmitted through a network of message brokers with a publish/subscribe mechanism.
      Results: Time vs. Itemsets. Time vs. Transactions. More transactions increase the execution time, but not as much as more itemsets do. This is because transactions are static data cached in memory for every map-reduce cycle, whereas itemsets are broadcast in each map-reduce cycle.
  17. Demo
  18. Output
  19. Thank you