Cascalog workshop

  • 1,969 views
Uploaded on

Visuals for the Cascalog workshop on February 19th, 2011.

Visuals for the Cascalog workshop on February 19th, 2011.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
1,969
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
38
Comments
0
Likes
6

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n

Transcript

  • 1. Cascalog Workshop
  • 2. Example query
  • 3. Execution1. Pre-aggregation2. Aggregation3. Post-aggregation
  • 4. Variable dependencies
  • 5. Pre-aggregation• Start from generator variables• Resolve as many variables as possible using: • Joins • Functions• Use as many filters as possible• Join all sources into one set of tuples
  • 6. Aggregation• Group by resolved output variables• Apply all aggregators to each group
  • 7. Post-aggregation• Resolve the rest of the variables• Apply rest of filters
  • 8. Example query
  • 9. Query planner Start with generators
  • 10. Query planner [?person2 ?age2 ?double-age2]Add functions and filters until fixed point
  • 11. Query planner [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] Do a join
  • 12. Query planner [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2]Add functions and filters until fixed point
  • 13. Query planner [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2][?person1 ?age1 ?person2 ?age2 ?double-age2] Do a join
  • 14. Query planner [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2][?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Add functions and filters until fixed point
  • 15. Query planner [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Group by already satisfied output vars
  • 16. Query planner [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count][?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Execute aggregators on each group
  • 17. Query planner [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count][?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Add functions and filters until fixed point
  • 18. Query planner [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count][?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Project fields to [?delta ?count]
  • 19. Cascading pipes• Each: can occur in Map or Reduce• GroupBy: Causes a Reduce step• Every: One or more follow GroupBy• CoGroup: Join implementation, causes Reduce step
  • 20. To Cascading
  • 21. To Cascading Each [?person2 ?age2 ?double-age2]
  • 22. To Cascading [?person2 ?age2 ?double-age2] CoGroup [?person1 ?person2 ?age2 ?double-age2]
  • 23. To Cascading [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] CoGroup[?person1 ?age1 ?person2 ?age2 ?double-age2]
  • 24. To Cascading [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Each Each[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
  • 25. To Cascading [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta GroupBy[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
  • 26. To Cascading [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Every Group by ?delta [?delta ?count][?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Execute aggregators on each group
  • 27. To Cascading [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count] Each[?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
  • 28. To Cascading [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count][?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Each Project fields to [?delta ?count]
  • 29. To MapReduce [?person2 ?age2 ?double-age2] Job 1 [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count][?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Project fields to [?delta ?count]
  • 30. To MapReduce [?person2 ?age2 ?double-age2] Job 2 [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count][?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Project fields to [?delta ?count]
  • 31. To MapReduce [?person2 ?age2 ?double-age2] [?person1 ?person2 ?age2 ?double-age2] [?person1 ?age1 ?person2 ?age2 ?double-age2] Group by ?delta [?delta ?count][?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] Job 3 Project fields to [?delta ?count]
  • 32. defmapop[A1, B1, C1] [A1, B1, C1, D1, E1][A2, B2, C2] [A2, B2, C2, D2, E2][A3, B3, C3] [A3, B3, C3, D3, E3] Appends fields to tuple
  • 33. deffilterop[A1, B1, C1] true [A1, B1, C1][A2, B2, C2] false [A3, B3, C3][A3, B3, C3] true
  • 34. defmapcatop [ [“a red dog”, “a”] [“a red dog”, “a”][“a red dog”] [“a red dog”, “red”] [“a red dog”, “dog”] ] [“a red dog”, “red”] [“ ”] [] [“a red dog”, “dog”] [“hello”, “hello”] [“hello”] [ [“hello”, “hello”] ] Map Concat
  • 35. Aggregators[“key1”, 1] [“key1”, 1] [“key1”, 3][“key3”, 3] [“key1”, 2]Map Task 1 Reduce Task 1[“key2”, 3] [“key2”, 3] [“key2”, 3][“key1”, 2] [“key3”, 3] [“key3”, 4][“key3”, 1] [“key3”, 1]Map Task 2 Reduce Task 2Regular aggregators - all data goes to reducers
  • 36. defparallelagg [“nathan”] [“nathan”, 1] [“nathan”, 2] [“alice”] [“alice”, 1] [“nathan”, 3] [“alice”, 1] [“nathan”] [“nathan”, 1] Map Task 1 Map Task 1 Map Task 1 Reduce Task 1 Combine Combine Init (Map) (Reduce) [“sally”, 1] [“nathan”] [“nathan”, 1] [“nathan”, 1] [“alice”, 1] [“sally”] [“sally”, 1] [“sally”, 1] Map Task 2 Map Task 2 Map Task 2 Reduce Task 2Parallel aggregators - partial aggregation done in mappers
  • 37. combine[1] [3][2] [4][3] [5] [1] [2] [3] [3] [4] [5]
  • 38. union[1] [3][2] [4][3] [5] [1] [2] [3] [4] [5]
  • 39. ElephantDB Shard 0 Shard 1 Shard 2 DistributedKey/Value pairs Shard 3 Filesystem Pre-shard Shard 4 and index in Shard 5 MapReduce Generation of domain of data
  • 40. ElephantDBDFS ElephantDB ServerShard 0Shard 1Shard 2 ElephantDB ServerShard 3Shard 4Shard 5 ElephantDB Server Serving domain of data