Visuals for the Cascalog workshop on February 19th, 2011.

- 1. Cascalog Workshop
- 2. Example query
- 3. Execution
  1. Pre-aggregation
  2. Aggregation
  3. Post-aggregation
- 4. Variable dependencies
- 5. Pre-aggregation
  - Start from generator variables
  - Resolve as many variables as possible using:
    - Joins
    - Functions
  - Use as many filters as possible
  - Join all sources into one set of tuples
- 6. Aggregation
  - Group by resolved output variables
  - Apply all aggregators to each group
- 7. Post-aggregation
  - Resolve the rest of the variables
  - Apply rest of filters
- 8. Example query
- 9. Query planner: Start with generators
- 10. Query planner: [?person2 ?age2 ?double-age2] Add functions and filters until fixed point
- 11. Query planner: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] Do a join
- 12. Query planner: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] Add functions and filters until fixed point
- 13. Query planner: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Do a join
- 14. Query planner: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Add functions and filters until fixed point
- 15. Query planner: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Group by already satisfied output vars
- 16. Query planner: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count] [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Execute aggregators on each group
- 17. Query planner: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count] [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Add functions and filters until fixed point
- 18. Query planner: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count] [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Project fields to [?delta ?count]
- 19. Cascading pipes
  - Each: can occur in Map or Reduce
  - GroupBy: causes a Reduce step
  - Every: one or more follow GroupBy
  - CoGroup: join implementation, causes a Reduce step
- 20. To Cascading
- 21. To Cascading: Each [?person2 ?age2 ?double-age2]
- 22. To Cascading: [?person2 ?age2 ?double-age2] CoGroup [?person1 ?person2 ?age2 ?double-age2]
- 23. To Cascading: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] CoGroup [?person1 ?age1 ?person2 ?age2 ?double-age2]
- 24. To Cascading: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Each Each [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
- 25. To Cascading: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta GroupBy [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
- 26. To Cascading: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Every Group by ?delta [?delta ?count] [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Execute aggregators on each group
- 27. To Cascading: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count] Each [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
- 28. To Cascading: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count] [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Each Project fields to [?delta ?count]
- 29. To MapReduce: [?person2 ?age2 ?double-age2] Job 1 [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count] [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Project fields to [?delta ?count]
- 30. To MapReduce: [?person2 ?age2 ?double-age2] Job 2 [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count] [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Project fields to [?delta ?count]
- 31. To MapReduce: [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count] [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Job 3 Project fields to [?delta ?count]
- 32. defmapop: appends fields to tuple
  - [A1, B1, C1] → [A1, B1, C1, D1, E1]
  - [A2, B2, C2] → [A2, B2, C2, D2, E2]
  - [A3, B3, C3] → [A3, B3, C3, D3, E3]
- 33. deffilterop
  - [A1, B1, C1] → true
  - [A2, B2, C2] → false
  - [A3, B3, C3] → true
  - Output: [A1, B1, C1] [A3, B3, C3]
- 34. defmapcatop: Map, then Concat
  - Map: ["a red dog"] → [ ["a red dog", "a"] ["a red dog", "red"] ["a red dog", "dog"] ]
  - Map: [" "] → []
  - Map: ["hello"] → [ ["hello", "hello"] ]
  - Concat: ["a red dog", "a"] ["a red dog", "red"] ["a red dog", "dog"] ["hello", "hello"]
- 35. Aggregators: regular aggregators, all data goes to reducers
  - Map Task 1: ["key1", 1] ["key3", 3]
  - Map Task 2: ["key2", 3] ["key1", 2] ["key3", 1]
  - Reduce Task 1: ["key1", 1] ["key1", 2] → ["key1", 3]
  - Reduce Task 2: ["key2", 3] ["key3", 3] ["key3", 1] → ["key2", 3] ["key3", 4]
- 36. defparallelagg: parallel aggregators, partial aggregation done in mappers
  - Map Task 1 input: ["nathan"] ["alice"] ["nathan"]
  - Map Task 1 after Init: ["nathan", 1] ["alice", 1] ["nathan", 1]
  - Map Task 1 after Combine (Map): ["nathan", 2] ["alice", 1]
  - Map Task 2 input: ["nathan"] ["sally"]
  - Map Task 2 after Init: ["nathan", 1] ["sally", 1]
  - Map Task 2 after Combine (Map): ["nathan", 1] ["sally", 1]
  - Reduce Task 1 after Combine (Reduce): ["nathan", 3]
  - Reduce Task 2 after Combine (Reduce): ["alice", 1] ["sally", 1]
- 37. combine
  - Inputs: [1] [2] [3] and [3] [4] [5]
  - Output: [1] [2] [3] [3] [4] [5]
- 38. union
  - Inputs: [1] [2] [3] and [3] [4] [5]
  - Output: [1] [2] [3] [4] [5]
- 39. ElephantDB: generation of domain of data
  - Key/Value pairs
  - Pre-shard and index in MapReduce
  - Distributed Filesystem: Shard 0, Shard 1, Shard 2, Shard 3, Shard 4, Shard 5
- 40. ElephantDB: serving domain of data
  - DFS
  - ElephantDB Server: Shard 0, Shard 1, Shard 2
  - ElephantDB Server: Shard 3, Shard 4, Shard 5
  - ElephantDB Server
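
The three execution phases on slides 3 to 18 can be imitated on in-memory tuples. The following is a minimal Python sketch, not Cascalog's actual planner. The example query itself is not in this transcript, so the sketch assumes one that fits the variable names on the slides: an `AGE` generator, a `FOLLOWS` generator, `?double-age2 = 2 * ?age2`, `?delta = ?age2 - ?age1`, and a count aggregator. The data and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical generators (assumed for illustration, not from the slides).
AGE = [("alice", 28), ("bob", 33), ("chris", 38), ("david", 25)]
FOLLOWS = [("alice", "bob"), ("bob", "chris"), ("david", "bob"), ("chris", "david")]


def pre_aggregate(age, follows):
    """Resolve variables with functions and joins until one tuple set remains."""
    # Start with a generator and apply a function: [?person2 ?age2 ?double-age2]
    step1 = [{"?person2": p, "?age2": a, "?double-age2": 2 * a} for p, a in age]
    # Join with follows on ?person2: adds ?person1
    step2 = [{**t, "?person1": p1}
             for t in step1 for p1, p2 in follows if p2 == t["?person2"]]
    # Join with age on ?person1: adds ?age1
    step3 = [{**t, "?age1": a}
             for t in step2 for p, a in age if p == t["?person1"]]
    # Apply the remaining function (assumed definition of ?delta)
    return [{**t, "?delta": t["?age2"] - t["?age1"]} for t in step3]


def aggregate(tuples):
    """Group by the resolved output variable ?delta and count each group."""
    counts = defaultdict(int)
    for t in tuples:
        counts[t["?delta"]] += 1
    return [{"?delta": d, "?count": c} for d, c in sorted(counts.items())]


def post_aggregate(rows):
    """Nothing is left to resolve here, so only project to [?delta ?count]."""
    return [(r["?delta"], r["?count"]) for r in rows]


def run_query(age=AGE, follows=FOLLOWS):
    return post_aggregate(aggregate(pre_aggregate(age, follows)))


if __name__ == "__main__":
    print(run_query())
```

With the assumed data, two follower pairs share a delta of 5, so that group has a count of 2 while the others have a count of 1.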

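
The tuple transformations on slides 32 to 34 and 37 to 38 can be approximated in plain Python. This sketch only mirrors the behaviour drawn on the slides and is not the Cascalog API: `map_op`, `filter_op`, `mapcat_op`, `combine_sets` and `union_sets` are hypothetical helper names, and the first-seen output order of `union_sets` is an assumption made so the result is deterministic.

```python
def map_op(tuples, fn):
    """Like defmapop on slide 32: append the fields returned by fn to each tuple."""
    return [t + tuple(fn(*t)) for t in tuples]


def filter_op(tuples, pred):
    """Like deffilterop on slide 33: keep only tuples for which pred is true."""
    return [t for t in tuples if pred(*t)]


def mapcat_op(tuples, fn):
    """Like defmapcatop on slide 34: fn returns a sequence of values;
    emit one output tuple per value (map, then concatenate)."""
    return [t + (value,) for t in tuples for value in fn(*t)]


def combine_sets(a, b):
    """Like combine on slide 37: concatenate and keep duplicates."""
    return list(a) + list(b)


def union_sets(a, b):
    """Like union on slide 38: concatenate and drop duplicates
    (first-seen order is an assumption of this sketch)."""
    return list(dict.fromkeys(list(a) + list(b)))


if __name__ == "__main__":
    sentences = [("a red dog",), (" ",), ("hello",)]
    print(mapcat_op(sentences, lambda s: s.split()))
    print(combine_sets([(1,), (2,), (3,)], [(3,), (4,), (5,)]))
    print(union_sets([(1,), (2,), (3,)], [(3,), (4,), (5,)]))
```

The blank sentence produces no output tuples, which matches the empty list drawn on slide 34.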

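
Slide 36 contrasts parallel aggregators with the regular aggregators on slide 35: an init function turns each tuple into a partial value, and the same combine function merges partial values first inside each map task and then again in the reducers. The sketch below simulates that flow for a count, using the names from the slide as input. It is an illustration under stated assumptions, not Cascalog's implementation: `init_count`, `combine_counts`, `map_side` and `reduce_side` are hypothetical names, and the assignment of keys to reduce tasks is left out.

```python
def init_count(_name):
    """Init step: every input tuple starts as a partial count of 1."""
    return 1


def combine_counts(a, b):
    """Combine step: merge two partial counts. It must be associative and
    commutative, because it runs on both the map and the reduce side."""
    return a + b


def map_side(task_input):
    """Init each tuple, then combine partial values per key inside one map task."""
    partial = {}
    for (name,) in task_input:
        value = init_count(name)
        partial[name] = combine_counts(partial[name], value) if name in partial else value
    return partial


def reduce_side(partials):
    """Combine the per-map-task partial values into the final value per key."""
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = combine_counts(total[key], value) if key in total else value
    return total


# Inputs taken from slide 36.
MAP_TASK_1 = [("nathan",), ("alice",), ("nathan",)]
MAP_TASK_2 = [("nathan",), ("sally",)]

if __name__ == "__main__":
    partials = [map_side(MAP_TASK_1), map_side(MAP_TASK_2)]
    print(partials)
    print(reduce_side(partials))
```

Only four partial values cross from the mappers to the reducers here instead of five raw tuples, and the saving grows with the number of repeated keys per map task.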