Cascalog Workshop
Visuals for the Cascalog workshop on February 19th, 2011.

Published in: Technology
Transcript of "Cascalog workshop"

    1. Cascalog Workshop
    2. Example query
    3. Execution
       1. Pre-aggregation
       2. Aggregation
       3. Post-aggregation
    4. Variable dependencies
    5. Pre-aggregation
       • Start from generator variables
       • Resolve as many variables as possible using:
         • Joins
         • Functions
       • Use as many filters as possible
       • Join all sources into one set of tuples
    6. Aggregation
       • Group by resolved output variables
       • Apply all aggregators to each group
    7. Post-aggregation
       • Resolve the rest of the variables
       • Apply rest of filters
    8. Example query
    9. Query planner: Start with generators
    10. Query planner: Add functions and filters until fixed point
        [?person2 ?age2 ?double-age2]
    11. Query planner: Do a join
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
    12. Query planner: Add functions and filters until fixed point
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
    13. Query planner: Do a join
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
    14. Query planner: Add functions and filters until fixed point
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
    15. Query planner: Group by already satisfied output vars
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
        Group by ?delta
    16. Query planner: Execute aggregators on each group
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
        Group by ?delta
        [?delta ?count]
    17. Query planner: Add functions and filters until fixed point
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
        Group by ?delta
        [?delta ?count]
    18. Query planner: Project fields to [?delta ?count]
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
        Group by ?delta
        [?delta ?count]
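
The query planner walk-through above can be followed by hand with a small plain-Python model. This is only a sketch of the three phases (pre-aggregation, aggregation, post-aggregation), not Cascalog code. The example query itself is not in the transcript, so the `age` and `follows` data and the definition of `?delta` as `?double-age2` minus `?age1` are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical generators; neither the data nor the names come from the slides.
age = [("alice", 20), ("bob", 30), ("chris", 40), ("david", 20)]
follows = [("alice", "bob"), ("david", "bob"), ("bob", "chris"), ("chris", "alice")]

# --- Pre-aggregation ---
# Start with a generator and add functions until fixed point:
# [?person2 ?age2 ?double-age2]
stage1 = [(p2, a2, 2 * a2) for (p2, a2) in age]

# Do a join on ?person2: [?person1 ?person2 ?age2 ?double-age2]
stage2 = [(p1, p2, a2, d2)
          for (p1, target) in follows
          for (p2, a2, d2) in stage1
          if target == p2]

# Do a join on ?person1: [?person1 ?age1 ?person2 ?age2 ?double-age2]
stage3 = [(p1, a1, p2, a2, d2)
          for (p1, p2, a2, d2) in stage2
          for (person, a1) in age
          if person == p1]

# Add functions until fixed point. ASSUMPTION: ?delta = ?double-age2 - ?age1.
stage4 = [(p1, a1, p2, a2, d2, d2 - a1) for (p1, a1, p2, a2, d2) in stage3]

# --- Aggregation ---
# Group by the already satisfied output variable ?delta and count each group.
counts = Counter(row[5] for row in stage4)

# --- Post-aggregation ---
# Nothing is left to resolve here, so just project to [?delta ?count].
result = sorted(counts.items())
print(result)  # [(0, 1), (40, 2), (50, 1)]
```

Each `stage` list mirrors one tuple schema from the slides, so the growth of the field list can be checked stage by stage.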
    19. Cascading pipes
        • Each: can occur in Map or Reduce
        • GroupBy: Causes a Reduce step
        • Every: One or more follow GroupBy
        • CoGroup: Join implementation, causes Reduce step
    20. To Cascading
    21. To Cascading
        Each: [?person2 ?age2 ?double-age2]
    22. To Cascading
        [?person2 ?age2 ?double-age2]
        CoGroup: [?person1 ?person2 ?age2 ?double-age2]
    23. To Cascading
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        CoGroup: [?person1 ?age1 ?person2 ?age2 ?double-age2]
    24. To Cascading
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        Each, Each: [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
    25. To Cascading
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
        GroupBy: Group by ?delta
    26. To Cascading: Execute aggregators on each group
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
        Group by ?delta
        Every: [?delta ?count]
    27. To Cascading
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
        Group by ?delta
        [?delta ?count]
        Each
    28. To Cascading
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
        Group by ?delta
        [?delta ?count]
        Each: Project fields to [?delta ?count]
    29. To MapReduce: Job 1
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
        Group by ?delta
        [?delta ?count]
        Project fields to [?delta ?count]
    30. To MapReduce: Job 2
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
        Group by ?delta
        [?delta ?count]
        Project fields to [?delta ?count]
    31. To MapReduce: Job 3
        [?person2 ?age2 ?double-age2]
        [?person1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2]
        [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
        Group by ?delta
        [?delta ?count]
        Project fields to [?delta ?count]
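
The three jobs on the "To MapReduce" slides are consistent with the rule from the "Cascading pipes" slide that GroupBy and CoGroup each cause a Reduce step. The sketch below is a deliberately simplified Python model of that rule, not Cascading's actual planner: it assumes every reduce-causing pipe closes the previous job, and that Each and Every pipes following it run in the same reducer. The function name `split_into_jobs` is hypothetical.

```python
# Pipes that force a Reduce step, per the "Cascading pipes" slide.
REDUCE_PIPES = {"GroupBy", "CoGroup"}

def split_into_jobs(pipes):
    """Split a linear pipe sequence into MapReduce jobs (simplified model)."""
    jobs = []
    current = {"map": [], "reduce": []}
    in_reduce = False
    for pipe in pipes:
        if pipe in REDUCE_PIPES:
            if in_reduce:
                # A second reduce-causing pipe cannot share a job: start a new one.
                jobs.append(current)
                current = {"map": [], "reduce": []}
            current["reduce"].append(pipe)
            in_reduce = True
        elif in_reduce:
            # ASSUMPTION: pipes after the grouping stay in that job's reducer.
            current["reduce"].append(pipe)
        else:
            current["map"].append(pipe)
    if current["map"] or current["reduce"]:
        jobs.append(current)
    return jobs

# Pipe sequence read off the "To Cascading" slides.
pipes = ["Each", "CoGroup", "CoGroup", "Each", "Each",
         "GroupBy", "Every", "Each", "Each"]
jobs = split_into_jobs(pipes)
for number, job in enumerate(jobs, start=1):
    print(f"Job {number}: map={job['map']} reduce={job['reduce']}")
```

Under these assumptions the two CoGroup joins and the GroupBy each end up in their own job, which gives three jobs in total.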
    32. defmapop: Appends fields to tuple
        Input           Output
        [A1, B1, C1]    [A1, B1, C1, D1, E1]
        [A2, B2, C2]    [A2, B2, C2, D2, E2]
        [A3, B3, C3]    [A3, B3, C3, D3, E3]
    33. deffilterop
        Input           Result    Output
        [A1, B1, C1]    true      [A1, B1, C1]
        [A2, B2, C2]    false
        [A3, B3, C3]    true      [A3, B3, C3]
    34. defmapcatop
        Input            Map
        [“a red dog”]    [ [“a red dog”, “a”] [“a red dog”, “red”] [“a red dog”, “dog”] ]
        [“ ”]            []
        [“hello”]        [ [“hello”, “hello”] ]
        Concat: [“a red dog”, “a”] [“a red dog”, “red”] [“a red dog”, “dog”] [“hello”, “hello”]
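
The tuple-level behaviour of the three operation types on the slides above can be modelled in a few lines of plain Python. This is a sketch of the semantics only. It does not use or imitate the Cascalog macros, and the helper names `apply_map`, `apply_filter` and `apply_mapcat` are hypothetical.

```python
def apply_map(tuples, fn):
    """Model of a map operation: append the fields fn returns to each tuple."""
    return [t + tuple(fn(*t)) for t in tuples]

def apply_filter(tuples, predicate):
    """Model of a filter operation: keep only tuples where predicate is true."""
    return [t for t in tuples if predicate(*t)]

def apply_mapcat(tuples, fn):
    """Model of a mapcat operation: fn returns a sequence of values, and one
    output tuple is emitted per value (map, then concatenate)."""
    return [t + (value,) for t in tuples for value in fn(*t)]

rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Map: append two new fields (a sum and a product) to every tuple.
mapped = apply_map(rows, lambda a, b, c: (a + b + c, a * b * c))

# Filter: drop the middle tuple, as on the filter slide.
filtered = apply_filter(rows, lambda a, b, c: a != 4)

# Mapcat: split sentences into words, as on the mapcat slide.
sentences = [("a red dog",), (" ",), ("hello",)]
words = apply_mapcat(sentences, lambda sentence: sentence.split())
print(words)
```

Note that the blank sentence produces no output tuples at all, which matches the empty list shown for it on the mapcat slide.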
    35. Aggregators: Regular aggregators - all data goes to reducers
        Map Task 1:       [“key1”, 1] [“key3”, 3]
        Map Task 2:       [“key2”, 3] [“key1”, 2] [“key3”, 1]
        Reduce Task 1:    input [“key1”, 1] [“key1”, 2], output [“key1”, 3]
        Reduce Task 2:    input [“key2”, 3] [“key3”, 3] [“key3”, 1], output [“key2”, 3] [“key3”, 4]
    36. defparallelagg: Parallel aggregators - partial aggregation done in mappers
        Map Task 1
          Input:            [“nathan”] [“alice”] [“nathan”]
          Init:             [“nathan”, 1] [“alice”, 1] [“nathan”, 1]
          Combine (Map):    [“nathan”, 2] [“alice”, 1]
        Map Task 2
          Input:            [“nathan”] [“sally”]
          Init:             [“nathan”, 1] [“sally”, 1]
          Combine (Map):    [“nathan”, 1] [“sally”, 1]
        Combine (Reduce)
          Reduce Task 1:    [“nathan”, 3]
          Reduce Task 2:    [“sally”, 1] [“alice”, 1]
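
The parallel aggregator slide can be reproduced with a short Python model of a count built from an init step and a combine step. This is a sketch of the idea only, not the `defparallelagg` API, and the helper names are hypothetical. It relies on the combine function being associative and commutative, which is what makes it safe to apply partially in the mappers and again in the reducers.

```python
def init(_record):
    """Turn one input record into an initial partial result (count of 1)."""
    return 1

def combine(left, right):
    """Merge two partial results; must be associative and commutative."""
    return left + right

def combine_by_key(pairs):
    """Fold (key, partial) pairs into one partial result per key."""
    merged = {}
    for key, value in pairs:
        merged[key] = combine(merged[key], value) if key in merged else value
    return merged

def run_map_task(names):
    """Init every record, then combine map-side before anything is shuffled."""
    return combine_by_key((name, init(name)) for name in names)

# Inputs as shown on the slide.
map_task_1 = ["nathan", "alice", "nathan"]
map_task_2 = ["nathan", "sally"]

partials = [run_map_task(map_task_1), run_map_task(map_task_2)]

# Reduce side: only the partial results cross the network, then combine again.
shuffled = [(key, value) for partial in partials for key, value in partial.items()]
totals = combine_by_key(shuffled)
print(totals)  # {'nathan': 3, 'alice': 1, 'sally': 1}
```

In this model four pairs are shuffled instead of five. With a regular aggregator every input tuple would go to the reducers.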
    37. combine
        Inputs: [1] [2] [3] and [3] [4] [5]
        Output: [1] [2] [3] [3] [4] [5]
    38. union
        Inputs: [1] [2] [3] and [3] [4] [5]
        Output: [1] [2] [3] [4] [5]
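
The difference between the two slides above is whether the duplicate tuple [3] survives. A minimal Python sketch of that difference follows. The function names are hypothetical, and the output order of the real operations is not specified by the slides, so the ordering here is only an artifact of the model.

```python
def combine_sources(*sources):
    """Concatenate all sources, keeping duplicate tuples."""
    return [row for source in sources for row in source]

def union_sources(*sources):
    """Concatenate all sources, keeping only the first copy of each tuple."""
    seen = set()
    result = []
    for row in combine_sources(*sources):
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

left = [(1,), (2,), (3,)]
right = [(3,), (4,), (5,)]

combined = combine_sources(left, right)
unioned = union_sources(left, right)
print(combined)  # [(1,), (2,), (3,), (3,), (4,), (5,)]
print(unioned)   # [(1,), (2,), (3,), (4,), (5,)]
```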
    39. ElephantDB: Generation of domain of data
        Key/Value pairs
        Pre-shard and index in MapReduce
        Distributed Filesystem: Shard 0, Shard 1, Shard 2, Shard 3, Shard 4, Shard 5
    40. ElephantDB: Serving domain of data
        DFS
        ElephantDB Server
        Shard 0, Shard 1, Shard 2
        ElephantDB Server
        Shard 3, Shard 4, Shard 5
        ElephantDB Server
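
The two ElephantDB slides describe pre-sharding key/value pairs into a fixed number of shards and then serving those shards from several servers. The Python sketch below illustrates that general pattern only. It is not ElephantDB's API or its actual sharding scheme: the MD5-based hash, the round-robin assignment of shards to servers, and all function and server names are assumptions made for the example.

```python
import hashlib

NUM_SHARDS = 6  # the slides show Shard 0 through Shard 5

def shard_for(key, num_shards=NUM_SHARDS):
    """Pick a shard for a key. ASSUMPTION: a stable hash modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

def build_shards(pairs, num_shards=NUM_SHARDS):
    """'Generation' step: partition key/value pairs into per-shard indexes."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for(key, num_shards)][key] = value
    return shards

def assign_shards(num_shards, servers):
    """'Serving' step. ASSUMPTION: shards are spread round-robin over servers."""
    return {shard: servers[shard % len(servers)] for shard in range(num_shards)}

def lookup(key, shards, assignment):
    """Route a read to the server holding the key's shard and fetch the value."""
    shard = shard_for(key, len(shards))
    return assignment[shard], shards[shard].get(key)

pairs = [("nathan", 3), ("alice", 1), ("sally", 1)]
shards = build_shards(pairs)
assignment = assign_shards(NUM_SHARDS, ["server-a", "server-b", "server-c"])
server, value = lookup("nathan", shards, assignment)
print(server, value)
```

Because the shard for a key is computed the same way at generation time and at read time, a client can find the right server without scanning every shard.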