Cascalog workshop


Visuals for the Cascalog workshop on February 19th, 2011.


Published in: Technology
  • Transcript

    • 1. Cascalog Workshop
    • 2. Example query
    • 3. Execution: 1. Pre-aggregation, 2. Aggregation, 3. Post-aggregation
    • 4. Variable dependencies
    • 5. Pre-aggregation: start from generator variables; resolve as many variables as possible using joins and functions; use as many filters as possible; join all sources into one set of tuples
    • 6. Aggregation: group by resolved output variables; apply all aggregators to each group
    • 7. Post-aggregation: resolve the rest of the variables; apply the rest of the filters
    • 8. Example query
    • 9. Query planner: start with generators
    • 10. Query planner: add functions and filters until fixed point. Fields: [?person2 ?age2 ?double-age2]
    • 11. Query planner: do a join. Fields: [?person2 ?age2 ?double-age2] → [?person1 ?person2 ?age2 ?double-age2]
    • 12. Query planner: add functions and filters until fixed point. Fields: [?person2 ?age2 ?double-age2] → [?person1 ?person2 ?age2 ?double-age2]
    • 13. Query planner: do a join. Fields: [?person2 ?age2 ?double-age2] → [?person1 ?person2 ?age2 ?double-age2] → [?person1 ?age1 ?person2 ?age2 ?double-age2]
    • 14. Query planner: add functions and filters until fixed point. Fields: [?person2 ?age2 ?double-age2] → [?person1 ?person2 ?age2 ?double-age2] → [?person1 ?age1 ?person2 ?age2 ?double-age2] → [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
    • 15. Query planner: group by already satisfied output vars. Fields: [?person2 ?age2 ?double-age2] → [?person1 ?person2 ?age2 ?double-age2] → [?person1 ?age1 ?person2 ?age2 ?double-age2] → [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] → group by ?delta
    • 16. Query planner: execute aggregators on each group. Fields: [?person2 ?age2 ?double-age2] → [?person1 ?person2 ?age2 ?double-age2] → [?person1 ?age1 ?person2 ?age2 ?double-age2] → [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] → group by ?delta → [?delta ?count]
    • 17. Query planner: add functions and filters until fixed point. Fields: [?person2 ?age2 ?double-age2] → [?person1 ?person2 ?age2 ?double-age2] → [?person1 ?age1 ?person2 ?age2 ?double-age2] → [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] → group by ?delta → [?delta ?count]
    • 18. Query planner: project fields to [?delta ?count]. Fields: [?person2 ?age2 ?double-age2] → [?person1 ?person2 ?age2 ?double-age2] → [?person1 ?age1 ?person2 ?age2 ?double-age2] → [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta] → group by ?delta → [?delta ?count]
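The planner stages on slides 9 to 18 can be traced by hand in plain Python. This is only a sketch of the stage order, not Cascalog code. The sample `age` and `follows` data are invented, and the two functions (`?double-age2` as twice `?age2`, `?delta` as `?age1` minus `?double-age2`) are assumptions, since the slides give only the variable names.

```python
from collections import Counter

# Hypothetical generators; the slides name the variables but not the data.
age = [("alice", 30), ("bob", 10), ("carol", 15), ("dave", 30)]
follows = [("alice", "bob"), ("alice", "carol"), ("carol", "bob"), ("dave", "bob")]

# Start with a generator, then add functions until fixed point.
# ASSUMPTION: ?double-age2 = 2 * ?age2
# Fields: [?person2 ?age2 ?double-age2]
stage1 = [(p2, a2, 2 * a2) for (p2, a2) in age]

# Join with follows on ?person2.
# Fields: [?person1 ?person2 ?age2 ?double-age2]
stage2 = [(p1, p2, a2, d2)
          for (p1, followed) in follows
          for (p2, a2, d2) in stage1
          if followed == p2]

# Join with age on ?person1.
# Fields: [?person1 ?age1 ?person2 ?age2 ?double-age2]
stage3 = [(p1, a1, p2, a2, d2)
          for (p1, p2, a2, d2) in stage2
          for (person, a1) in age
          if person == p1]

# Add functions until fixed point.
# ASSUMPTION: ?delta = ?age1 - ?double-age2
# Fields: [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
stage4 = [row + (row[1] - row[4],) for row in stage3]

# Aggregation: group by ?delta and count each group.
counts = Counter(row[5] for row in stage4)

# Post-aggregation: project fields to [?delta ?count].
result = sorted(counts.items())
print(result)
```

Each intermediate list has exactly the fields shown on the corresponding slide, which makes it easy to see why the joins must happen before `?delta` can be resolved.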
    • 19. Cascading pipes: Each can occur in Map or Reduce; GroupBy causes a Reduce step; Every: one or more follow a GroupBy; CoGroup is the join implementation and causes a Reduce step
    • 20. To Cascading
    • 21. To Cascading: Each produces [?person2 ?age2 ?double-age2]
    • 22. To Cascading: CoGroup produces [?person1 ?person2 ?age2 ?double-age2]
    • 23. To Cascading: CoGroup produces [?person1 ?age1 ?person2 ?age2 ?double-age2]
    • 24. To Cascading: Each, Each produce [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
    • 25. To Cascading: GroupBy implements the group by ?delta
    • 26. To Cascading: Every executes the aggregators on each group, producing [?delta ?count]
    • 27. To Cascading: Each follows the aggregation
    • 28. To Cascading: Each projects fields to [?delta ?count]
    • 29. To MapReduce: Job 1 marked on the query plan
    • 30. To MapReduce: Job 2 marked on the query plan
    • 31. To MapReduce: Job 3 marked on the query plan
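The step from Cascading pipes to MapReduce jobs can be approximated with a simple rule: a job has at most one Reduce step, and GroupBy and CoGroup each force one. The sketch below is a simplified model under that assumption, not the actual Cascading planner, and `split_into_jobs` is a hypothetical helper. The pipe sequence is read off slides 21 to 28.

```python
# Pipes that force a Reduce step, per the "Cascading pipes" slide.
REDUCE_PIPES = {"GroupBy", "CoGroup"}

def split_into_jobs(pipes):
    """Split a linear pipe sequence into jobs with at most one Reduce step each.

    Simplified model: a new job starts whenever a reduce-causing pipe
    appears and the current job already contains one.
    """
    jobs = [[]]
    has_reduce = False
    for pipe in pipes:
        if pipe in REDUCE_PIPES and has_reduce:
            jobs.append([])
            has_reduce = False
        jobs[-1].append(pipe)
        if pipe in REDUCE_PIPES:
            has_reduce = True
    return jobs

# Pipe sequence taken from slides 21 to 28.
plan = ["Each", "CoGroup", "CoGroup", "Each", "Each",
        "GroupBy", "Every", "Each", "Each"]

jobs = split_into_jobs(plan)
for number, job in enumerate(jobs, start=1):
    print(f"Job {number}: {job}")
```

Under this model the two CoGroups and the GroupBy yield three jobs, matching the Job 1, Job 2, and Job 3 labels on the slides. The exact job boundaries on the slides may differ from the ones this model draws.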
    • 32. defmapop: appends fields to the tuple. [A1, B1, C1] → [A1, B1, C1, D1, E1]; [A2, B2, C2] → [A2, B2, C2, D2, E2]; [A3, B3, C3] → [A3, B3, C3, D3, E3]
    • 33. deffilterop: [A1, B1, C1] → true; [A2, B2, C2] → false; [A3, B3, C3] → true. Output: [A1, B1, C1], [A3, B3, C3]
    • 34. defmapcatop (map, then concat). Map: ["a red dog"] → [["a red dog", "a"] ["a red dog", "red"] ["a red dog", "dog"]]; [" "] → []; ["hello"] → [["hello", "hello"]]. Concat: ["a red dog", "a"], ["a red dog", "red"], ["a red dog", "dog"], ["hello", "hello"]
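The tuple-level behaviour of the three operation types on slides 32 to 34 can be modelled in a few lines of plain Python. These helpers (`apply_mapop`, `apply_filterop`, `apply_mapcatop`, `split_words`) are hypothetical names for illustration; they mimic what the slides show happening to tuples and are not the Cascalog macros themselves.

```python
def apply_mapop(tuples, fn):
    """Model of a map operation: append the fields fn returns to each tuple."""
    return [t + tuple(fn(*t)) for t in tuples]

def apply_filterop(tuples, pred):
    """Model of a filter operation: keep only tuples where pred is true."""
    return [t for t in tuples if pred(*t)]

def apply_mapcatop(tuples, fn):
    """Model of a mapcat operation: fn returns zero or more values per tuple;
    each value becomes its own output tuple, and the results are concatenated."""
    return [t + (value,) for t in tuples for value in fn(*t)]

def split_words(sentence):
    """Split on whitespace; a blank sentence yields no words."""
    return sentence.split()

# Map: two new fields appended to a three-field tuple.
print(apply_mapop([(1, 2, 3)], lambda a, b, c: (a + b, b * c)))

# Filter: the middle tuple fails the predicate and is dropped.
print(apply_filterop([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
                     lambda a, b, c: a % 2 == 1))

# Mapcat: the sentences from slide 34.
print(apply_mapcatop([("a red dog",), (" ",), ("hello",)], split_words))
```

The blank sentence produces no output tuples at all, which is the main difference from a map operation: a mapcat operation can emit zero, one, or many tuples per input.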
    • 35. Aggregators (regular aggregators: all data goes to reducers). Map Task 1 emits ["key1", 1], ["key3", 3]; Map Task 2 emits ["key2", 3], ["key1", 2], ["key3", 1]. Reduce Task 1 receives ["key1", 1], ["key1", 2] and outputs ["key1", 3]; Reduce Task 2 receives ["key2", 3], ["key3", 3], ["key3", 1] and outputs ["key2", 3], ["key3", 4]
    • 36. defparallelagg (parallel aggregators: partial aggregation done in mappers). Map Task 1: input ["nathan"], ["alice"], ["nathan"]; Init gives ["nathan", 1], ["alice", 1], ["nathan", 1]; Combine (Map) gives ["nathan", 2], ["alice", 1]. Map Task 2: input ["nathan"], ["sally"]; Init gives ["nathan", 1], ["sally", 1]; Combine (Map) gives ["nathan", 1], ["sally", 1]. Combine (Reduce): Reduce Task 1 outputs ["nathan", 3]; Reduce Task 2 outputs ["sally", 1], ["alice", 1]
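The difference between the two aggregator styles is how many tuples cross from mappers to reducers. The sketch below models a parallel count with an `init` function and a `combine` function, using the names from slide 36 as sample data. It is a plain-Python illustration of the idea; `map_side` and `reduce_side` are hypothetical helpers, not Cascalog or Hadoop API.

```python
# Input tuples per map task, as on the defparallelagg slide.
map_task_1 = ["nathan", "alice", "nathan"]
map_task_2 = ["nathan", "sally"]

def init(name):
    """Init: turn one input tuple into a partial aggregate (a count of 1)."""
    return 1

def combine(a, b):
    """Combine: merge two partial aggregates; must be associative and commutative."""
    return a + b

def map_side(names):
    """Partial aggregation inside one map task."""
    partial = {}
    for name in names:
        value = init(name)
        partial[name] = combine(partial[name], value) if name in partial else value
    return partial

def reduce_side(partials):
    """Final aggregation: combine the partial results from every map task."""
    totals = {}
    for partial in partials:
        for key, value in partial.items():
            totals[key] = combine(totals[key], value) if key in totals else value
    return totals

partials = [map_side(map_task_1), map_side(map_task_2)]
totals = reduce_side(partials)

# Tuples sent to reducers under each strategy.
shuffled_regular = len(map_task_1) + len(map_task_2)
shuffled_parallel = sum(len(p) for p in partials)

print(totals)
print(shuffled_regular, shuffled_parallel)
```

With a regular aggregator all five input tuples reach the reducers; with the parallel version only four partial counts do. The saving grows with the number of repeated keys per map task.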
    • 37. combine: [1] [2] [3] and [3] [4] [5] → [1] [2] [3] [3] [4] [5]
    • 38. union: [1] [2] [3] and [3] [4] [5] → [1] [2] [3] [4] [5]
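The two slides differ only in how the duplicate tuple [3] is treated: combine keeps both copies, union keeps one. A minimal plain-Python model of that behaviour, reusing the slide's names for two stand-in functions (these are not the Cascalog functions themselves):

```python
def combine(*sources):
    """Concatenate all sources, keeping duplicate tuples."""
    return [t for source in sources for t in source]

def union(*sources):
    """Concatenate all sources, keeping only the first copy of each tuple."""
    seen = set()
    result = []
    for t in combine(*sources):
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

left = [(1,), (2,), (3,)]
right = [(3,), (4,), (5,)]

print(combine(left, right))
print(union(left, right))
```

Because union has to detect duplicates across sources, it is reasonable to expect it to cost more than combine in a distributed setting, although the slides do not say so.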
    • 39. ElephantDB, generation of a domain of data: key/value pairs are pre-sharded and indexed in MapReduce, producing Shard 0 through Shard 5 on the distributed filesystem
    • 40. ElephantDB, serving a domain of data: Shard 0 through Shard 5 on the DFS are served by ElephantDB servers
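The generate-then-serve split on the last two slides can be illustrated with a toy key/value store. Everything specific here is an assumption for illustration: the CRC32-based `shard_for` function, the server names, and the assignment of two shards per server are invented, and the real ElephantDB sharding scheme may differ. Only the shard count of six comes from the slides.

```python
import zlib

NUM_SHARDS = 6  # Shard 0 through Shard 5, as on the slides

def shard_for(key):
    """ASSUMPTION: pick a shard with a deterministic hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

def build_domain(pairs):
    """Generation: pre-shard the key/value pairs (done in MapReduce on the slide)."""
    shards = [dict() for _ in range(NUM_SHARDS)]
    for key, value in pairs:
        shards[shard_for(key)][key] = value
    return shards

# HYPOTHETICAL assignment of shards to servers; the slides do not specify it.
SERVER_SHARDS = {
    "server-a": [0, 1],
    "server-b": [2, 3],
    "server-c": [4, 5],
}

def lookup(shards, key):
    """Serving: find the server that owns the key's shard, then read the value."""
    shard = shard_for(key)
    server = next(name for name, owned in SERVER_SHARDS.items() if shard in owned)
    return server, shards[shard].get(key)

pairs = [("nathan", 3), ("alice", 1), ("sally", 1)]
domain = build_domain(pairs)
print(lookup(domain, "nathan"))
```

The point of the design is that the same function routes a key at generation time and at read time, so a server only needs the shards it owns to answer a lookup.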
