Visuals for the Cascalog workshop on February 19th, 2011.

Published in:
Technology

- 1. Cascalog Workshop
- 2. Example query
- 3. Execution
  1. Pre-aggregation
  2. Aggregation
  3. Post-aggregation
- 4. Variable dependencies
- 5. Pre-aggregation
  - Start from generator variables
  - Resolve as many variables as possible using:
    - Joins
    - Functions
  - Use as many filters as possible
  - Join all sources into one set of tuples
- 6. Aggregation
  - Group by resolved output variables
  - Apply all aggregators to each group
- 7. Post-aggregation
  - Resolve the rest of the variables
  - Apply rest of filters
- 8. Example query
- 9. Query planner: start with generators
- 10. Query planner: add functions and filters until fixed point
  - [?person2 ?age2 ?double-age2]
- 11. Query planner: do a join
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2]
- 12. Query planner: add functions and filters until fixed point
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2]
- 13. Query planner: do a join
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2]
- 14. Query planner: add functions and filters until fixed point
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
- 15. Query planner: group by already satisfied output vars
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] -> Group by ?delta
- 16. Query planner: execute aggregators on each group
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] -> Group by ?delta -> [?delta ?count]
- 17. Query planner: add functions and filters until fixed point
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] -> Group by ?delta -> [?delta ?count]
- 18. Query planner: project fields to [?delta ?count]
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] -> Group by ?delta -> [?delta ?count]
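The planner walk-through above (pre-aggregation, aggregation, post-aggregation) can be mimicked in plain Python. This is only a sketch of the dataflow, not Cascalog code: the query itself is not in the transcript, so the `AGE` and `FOLLOWS` generators, the sample rows, the reading of `?double-age2` as twice `?age2`, and the reading of `?delta` as `?age2 - ?age1` are all assumptions inferred from the variable names. No filters are modeled because none are visible in the slides.

```python
from collections import Counter

# Hypothetical generators; the real ones are not shown in the slides.
AGE = [("alice", 28), ("bob", 33), ("chris", 40), ("dana", 33)]
FOLLOWS = [("alice", "bob"), ("bob", "chris"), ("dana", "chris"), ("chris", "alice")]


def run_query(age, follows):
    # Pre-aggregation: start from a generator and apply a function.
    # -> [?person2 ?age2 ?double-age2]  (double-age2 = 2 * age2 is a guess)
    step1 = [(p2, a2, 2 * a2) for p2, a2 in age]

    # Join with the follows generator on ?person2.
    # -> [?person1 ?person2 ?age2 ?double-age2]
    step2 = [(p1, p2, a2, d2)
             for p1, followed in follows
             for p2, a2, d2 in step1
             if followed == p2]

    # Join with the age generator again, this time on ?person1.
    # -> [?person1 ?age1 ?person2 ?age2 ?double-age2]
    step3 = [(p1, a1, p2, a2, d2)
             for p1, p2, a2, d2 in step2
             for person, a1 in age
             if person == p1]

    # Apply the remaining function (delta = age2 - age1 is a guess).
    # -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
    step4 = [row + (row[3] - row[1],) for row in step3]

    # Aggregation: group by ?delta and count each group.
    counts = Counter(row[-1] for row in step4)

    # Post-aggregation: project to [?delta ?count].
    return sorted(counts.items())


print(run_query(AGE, FOLLOWS))
```

Each intermediate list corresponds to one of the bracketed tuple sets in the slides, which makes it easy to see which fields exist at each stage.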
- 19. Cascading pipes
  - Each: can occur in Map or Reduce
  - GroupBy: causes a Reduce step
  - Every: one or more follow GroupBy
  - CoGroup: join implementation, causes Reduce step
- 20. To Cascading
- 21. To Cascading (Each)
  - [?person2 ?age2 ?double-age2]
- 22. To Cascading (CoGroup)
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2]
- 23. To Cascading (CoGroup)
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2]
- 24. To Cascading (Each, Each)
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
- 25. To Cascading (GroupBy)
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] -> Group by ?delta
- 26. To Cascading (Every): execute aggregators on each group
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] -> Group by ?delta -> [?delta ?count]
- 27. To Cascading (Each)
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] -> Group by ?delta -> [?delta ?count]
- 28. To Cascading (Each): project fields to [?delta ?count]
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] -> Group by ?delta -> [?delta ?count]
- 29. To MapReduce (Job 1)
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] -> Group by ?delta -> [?delta ?count] -> Project fields to [?delta ?count]
- 30. To MapReduce (Job 2)
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] -> Group by ?delta -> [?delta ?count] -> Project fields to [?delta ?count]
- 31. To MapReduce (Job 3)
  - [?person2 ?age2 ?double-age2] -> [?person1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2] -> [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] -> Group by ?delta -> [?delta ?count] -> Project fields to [?delta ?count]
- 32. defmapop: appends fields to tuple
  - [A1, B1, C1] -> [A1, B1, C1, D1, E1]
  - [A2, B2, C2] -> [A2, B2, C2, D2, E2]
  - [A3, B3, C3] -> [A3, B3, C3, D3, E3]
- 33. deffilterop
  - [A1, B1, C1] -> true -> [A1, B1, C1]
  - [A2, B2, C2] -> false -> (dropped)
  - [A3, B3, C3] -> true -> [A3, B3, C3]
- 34. defmapcatop
  - Map
    - ["a red dog"] -> [ ["a red dog", "a"] ["a red dog", "red"] ["a red dog", "dog"] ]
    - [" "] -> []
    - ["hello"] -> [ ["hello", "hello"] ]
  - Concat
    - ["a red dog", "a"]
    - ["a red dog", "red"]
    - ["a red dog", "dog"]
    - ["hello", "hello"]
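The three operation types in slides 32 to 34 differ only in how many tuples they emit per input tuple. The sketch below models that behavior in plain Python; `map_op`, `filter_op`, and `mapcat_op` are hypothetical helper names chosen for illustration and are not the Cascalog macros themselves.

```python
def map_op(tuples, fn):
    """Like the defmapop slide: exactly one output per input, with new fields appended.

    `fn` receives the input fields and returns a tuple of new fields.
    """
    return [t + tuple(fn(*t)) for t in tuples]


def filter_op(tuples, pred):
    """Like the deffilterop slide: keep a tuple unchanged only if `pred` is true."""
    return [t for t in tuples if pred(*t)]


def mapcat_op(tuples, fn):
    """Like the defmapcatop slide: `fn` returns zero or more values per input.

    Each value becomes its own output tuple (map), and the per-input
    results are flattened into one stream (concat).
    """
    return [t + (value,) for t in tuples for value in fn(*t)]


sentences = [("a red dog",), (" ",), ("hello",)]
words = mapcat_op(sentences, lambda s: s.split())
print(words)
```

With the slide's inputs, the blank sentence produces no output tuples at all, which is the case that separates a mapcat operation from a plain map operation.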
- 35. Aggregators: regular aggregators, all data goes to reducers
  - Map Task 1 emits ["key1", 1], ["key3", 3]
  - Map Task 2 emits ["key2", 3], ["key1", 2], ["key3", 1]
  - Reduce Task 1 receives ["key1", 1], ["key1", 2] and outputs ["key1", 3]
  - Reduce Task 2 receives ["key2", 3], ["key3", 3], ["key3", 1] and outputs ["key2", 3], ["key3", 4]
- 36. defparallelagg: parallel aggregators, partial aggregation done in mappers
  - Map Task 1
    - Input: ["nathan"], ["alice"], ["nathan"]
    - Init: ["nathan", 1], ["alice", 1], ["nathan", 1]
    - Combine (Map): ["nathan", 2], ["alice", 1]
  - Map Task 2
    - Input: ["nathan"], ["sally"]
    - Init: ["nathan", 1], ["sally", 1]
    - Combine (Map): ["nathan", 1], ["sally", 1]
  - Reduce Task 1, Combine (Reduce): ["nathan", 3]
  - Reduce Task 2, Combine (Reduce): ["sally", 1], ["alice", 1]
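The difference between slides 35 and 36 is where the combining happens. A rough Python model, assuming a count aggregator as in the slide: `init` turns each record into a partial value, and `combine` merges two partial values, first inside each map task and then again on the reduce side. The function name `parallel_agg` and its return shape are hypothetical, and the assignment of keys to particular reducers is not modeled.

```python
def parallel_agg(map_task_inputs, init, combine):
    """Return (final totals per key, number of records sent to reducers)."""
    shuffled = []
    for records in map_task_inputs:
        # Map side: Init each record, then Combine partials within the task.
        partial = {}
        for key in records:
            value = init(key)
            partial[key] = combine(partial[key], value) if key in partial else value
        shuffled.extend(partial.items())

    # Reduce side: Combine the partial values that arrive for each key.
    totals = {}
    for key, value in shuffled:
        totals[key] = combine(totals[key], value) if key in totals else value
    return totals, len(shuffled)


map_tasks = [["nathan", "alice", "nathan"], ["nathan", "sally"]]
totals, records_shuffled = parallel_agg(
    map_tasks,
    init=lambda _key: 1,           # every record starts as a count of 1
    combine=lambda a, b: a + b,    # merging two counts adds them
)
print(totals, records_shuffled)
```

Here five input records shrink to four before the shuffle, because the two "nathan" records in the first map task are merged locally. A regular aggregator would send all five to the reducers.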
- 37. combine
  - Inputs: [1] [2] [3] and [3] [4] [5]
  - Output: [1] [2] [3] [3] [4] [5]
- 38. union
  - Inputs: [1] [2] [3] and [3] [4] [5]
  - Output: [1] [2] [3] [4] [5]
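Going by the two slides above, the only visible difference is that combine keeps the duplicate [3] while union drops it. A minimal Python sketch of that distinction follows. It is not the Cascalog implementation, and the output order shown here (first-seen order) is an assumption for readability; a distributed run would not necessarily preserve it.

```python
def combine(*sources):
    """Concatenate all sources, keeping duplicate tuples."""
    return [t for source in sources for t in source]


def union(*sources):
    """Concatenate all sources, then drop duplicate tuples."""
    seen = set()
    result = []
    for t in combine(*sources):
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


left = [(1,), (2,), (3,)]
right = [(3,), (4,), (5,)]
print(combine(left, right))
print(union(left, right))
```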
- 39. ElephantDB: generation of domain of data
  - Key/Value pairs -> Pre-shard and index in MapReduce -> Shard 0 to Shard 5 on the Distributed Filesystem
- 40. ElephantDB: serving domain of data
  - DFS: Shard 0, Shard 1, Shard 2, Shard 3, Shard 4, Shard 5
  - Three ElephantDB Server nodes
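The pre-sharding step in slide 39 can be pictured as assigning every key to one of a fixed number of shards, so that a server holding a shard can later answer lookups for its keys. The sketch below is an illustration only: the slides do not say how keys are assigned to shards, so hashing the key modulo the shard count is an assumption, and `shard_for`, `pre_shard`, and `lookup` are hypothetical names rather than ElephantDB functions.

```python
import hashlib

NUM_SHARDS = 6  # Shard 0 to Shard 5, as in the slides


def shard_for(key, num_shards=NUM_SHARDS):
    """Pick a shard for a key (assumed scheme: stable hash modulo shard count)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def pre_shard(pairs, num_shards=NUM_SHARDS):
    """Split key/value pairs into per-shard indexes (plain dicts here)."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for(key, num_shards)][key] = value
    return shards


def lookup(shards, key):
    """Serve a read by consulting only the shard that owns the key."""
    return shards[shard_for(key, len(shards))].get(key)


pairs = [(f"user{i}", i) for i in range(100)]
shards = pre_shard(pairs)
print(lookup(shards, "user42"))
```

Because the shard is computed from the key alone, a read never has to touch more than one shard.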
