Cascalog Workshop
Example query
Execution

1. Pre-aggregation
2. Aggregation
3. Post-aggregation
Variable dependencies
Pre-aggregation
• Start from generator variables
• Resolve as many variables as possible using:
 • Joins
 • Functions
• Use as many filters as possible
• Join all sources into one set of tuples
Aggregation


• Group by resolved output variables
• Apply all aggregators to each group
Post-aggregation


• Resolve the rest of the variables
• Apply rest of filters
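The three execution phases above can be sketched in plain Python. This is an illustrative model rather than Cascalog code, and it is not the deck's example query: the FOLLOWS and AGE data, the delta function, and the min_count filter are all hypothetical names chosen for the sketch.

```python
from collections import defaultdict

# Hypothetical generators, invented for this sketch.
FOLLOWS = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
AGE = {"alice": 30, "bob": 25, "carol": 25}


def run_query(follows, age, min_count=1):
    # 1. Pre-aggregation: join sources, apply functions, apply filters.
    joined = []
    for person1, person2 in follows:
        if person1 not in age or person2 not in age:
            continue  # inner join: unmatched rows are dropped
        delta = age[person1] - age[person2]  # function resolving ?delta
        if delta >= 0:  # filter that needs only pre-aggregation variables
            joined.append((person1, person2, delta))

    # 2. Aggregation: group by the resolved output variable and count.
    counts = defaultdict(int)
    for _person1, _person2, delta in joined:
        counts[delta] += 1

    # 3. Post-aggregation: filters that depend on aggregator output.
    return sorted((d, c) for d, c in counts.items() if c >= min_count)


print(run_query(FOLLOWS, AGE))               # every (delta, count) pair
print(run_query(FOLLOWS, AGE, min_count=2))  # only groups with count >= 2
```

The min_count filter can only run in the last phase because it reads the aggregator's output, whereas the delta filter runs before grouping and so shrinks the data early.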
Example query
Query planner

1. Start with generators
2. Add functions and filters until fixed point: [?person2 ?age2 ?double-age2]
3. Do a join: [?person1 ?person2 ?age2 ?double-age2]
4. Add functions and filters until fixed point
5. Do a join: [?person1 ?age1 ?person2 ?age2 ?double-age2]
6. Add functions and filters until fixed point: [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
7. Group by already satisfied output vars: group by ?delta
8. Execute aggregators on each group: [?delta ?count]
9. Add functions and filters until fixed point
10. Project fields to [?delta ?count]
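The "add functions and filters until fixed point" step can be sketched as a loop that keeps applying any operation whose input variables are already resolved. This is a minimal Python model, not the planner's actual code. The operation names and their input/output variables are assumptions, since the deck does not say which operations produce ?double-age2 and ?delta.

```python
def plan(resolved_vars, operations):
    """Apply every operation whose inputs are resolved, until nothing changes.

    Each operation is (name, input_vars, output_vars); a filter has no outputs.
    Returns (applied names, resolved variables, names still pending).
    """
    resolved = set(resolved_vars)
    pending = list(operations)
    applied = []
    changed = True
    while changed:  # stop once a full pass applies nothing new
        changed = False
        for op in list(pending):
            name, inputs, outputs = op
            if set(inputs) <= resolved:
                resolved |= set(outputs)
                applied.append(name)
                pending.remove(op)
                changed = True
    return applied, resolved, [name for name, _, _ in pending]


# Hypothetical operations for the variables shown on the slides.
OPS = [
    ("delta", ["?age1", "?age2"], ["?delta"]),
    ("double", ["?age2"], ["?double-age2"]),
]

# Before any join only ?person2 and ?age2 are available.
applied, resolved, pending = plan(["?person2", "?age2"], OPS)
print(applied, pending)

# After the joins bring in ?person1 and ?age1, the remaining operation fits.
applied2, resolved2, pending2 = plan(resolved | {"?person1", "?age1"}, [OPS[0]])
print(applied2, pending2)
```

An operation that is still pending after a pass is what forces the next join: the planner needs another source to supply the missing variables.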
Cascading pipes

• Each: can occur in Map or Reduce
• GroupBy: Causes a Reduce step
• Every: One or more follow GroupBy
• CoGroup: Join implementation, causes
  Reduce step
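Since GroupBy and CoGroup each cause a Reduce step, the number of MapReduce jobs can be estimated from the pipe sequence. The sketch below is a simplification in Python and not Cascading's planner: it assumes a linear list of pipe names (real joins have several input branches) and at most one reduce step per job. The function name split_into_jobs is hypothetical.

```python
REDUCE_PIPES = {"GroupBy", "CoGroup"}  # pipes that cause a Reduce step


def split_into_jobs(pipes):
    """Split a linear pipe sequence so each job holds at most one reduce pipe."""
    jobs = [[]]
    for pipe in pipes:
        current_has_reduce = any(p in REDUCE_PIPES for p in jobs[-1])
        if pipe in REDUCE_PIPES and current_has_reduce:
            jobs.append([])  # a second reduce step needs a new job
        jobs[-1].append(pipe)
    return jobs


# Pipe sequence for the example plan: two joins, then a group-by.
PIPES = ["Each", "CoGroup", "CoGroup", "Each", "Each",
         "GroupBy", "Every", "Each", "Each"]

for number, job in enumerate(split_into_jobs(PIPES), start=1):
    print(f"Job {number}: {job}")
```

In this model an Each that follows a grouping pipe stays in that job's reduce phase, consistent with Each being able to run in either Map or Reduce.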
To Cascading

• Each: produces [?person2 ?age2 ?double-age2]
• CoGroup: join producing [?person1 ?person2 ?age2 ?double-age2]
• CoGroup: join producing [?person1 ?age1 ?person2 ?age2 ?double-age2]
• Each, Each: produce [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
• GroupBy: group by ?delta
• Every: execute aggregators on each group, producing [?delta ?count]
• Each: applied after aggregation to [?delta ?count]
• Each: project fields to [?delta ?count]
To MapReduce

• Job 1: marked at the join of [?person2 ?age2 ?double-age2] into [?person1 ?person2 ?age2 ?double-age2]
• Job 2: marked at the join into [?person1 ?age1 ?person2 ?age2 ?double-age2]
• Job 3: marked at the group by ?delta and the projection to [?delta ?count]
defmapop
[A1, B1, C1]  ->  [A1, B1, C1, D1, E1]
[A2, B2, C2]  ->  [A2, B2, C2, D2, E2]
[A3, B3, C3]  ->  [A3, B3, C3, D3, E3]

Appends fields to tuple
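The behaviour of a map operation, one output tuple per input tuple with new fields appended, can be modelled in Python as below. This only sketches the semantics and is not Cascalog's defmapop macro. The helper map_op and the sum_and_product function are hypothetical.

```python
def map_op(fn, tuples):
    """Call fn on each tuple's fields and append the returned fields."""
    return [list(t) + list(fn(*t)) for t in tuples]


def sum_and_product(a, b, c):
    # Hypothetical operation returning two new fields (the D and E columns).
    return [a + b + c, a * b * c]


rows = [[1, 2, 3], [4, 5, 6]]
extended = map_op(sum_and_product, rows)
print(extended)
```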
deffilterop
[A1, B1, C1]  ->  true   (kept)
[A2, B2, C2]  ->  false  (dropped)
[A3, B3, C3]  ->  true   (kept)

Output: [A1, B1, C1], [A3, B3, C3]
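A filter operation keeps a tuple unchanged when its predicate returns true and drops it otherwise. The Python sketch below models that behaviour only and is not Cascalog's deffilterop. The predicate first_is_odd is a hypothetical example.

```python
def filter_op(pred, tuples):
    """Keep only the tuples for which pred returns a truthy value."""
    return [t for t in tuples if pred(*t)]


def first_is_odd(a, b, c):
    # Hypothetical predicate: true for the first and third rows below.
    return a % 2 == 1


rows = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
kept = filter_op(first_is_odd, rows)
print(kept)
```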
defmapcatop
Map:
["a red dog"]  ->  [ ["a red dog", "a"], ["a red dog", "red"], ["a red dog", "dog"] ]
[" "]          ->  []
["hello"]      ->  [ ["hello", "hello"] ]

Concat:
["a red dog", "a"]
["a red dog", "red"]
["a red dog", "dog"]
["hello", "hello"]
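The map-then-concat behaviour on this slide can be reproduced in Python with a word splitter. This is a sketch of the semantics, not Cascalog's defmapcatop. The names mapcat_op and split_words are hypothetical.

```python
def mapcat_op(fn, tuples):
    """Emit one output tuple per value returned by fn, flattened into one list."""
    out = []
    for t in tuples:
        for produced in fn(*t):               # Map: zero or more values per input
            out.append(list(t) + [produced])  # Concat: flatten into one stream
    return out


def split_words(sentence):
    # str.split() with no arguments returns [] for whitespace-only input.
    return sentence.split()


rows = [["a red dog"], [" "], ["hello"]]
words = mapcat_op(split_words, rows)
for row in words:
    print(row)
```

An input that produces no values, such as the blank string here, contributes nothing to the output, so a mapcat operation can also act as a filter.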
Aggregators
Map Task 1: ["key1", 1], ["key3", 3]
Map Task 2: ["key2", 3], ["key1", 2], ["key3", 1]

Reduce Task 1: ["key1", 1], ["key1", 2]  ->  ["key1", 3]
Reduce Task 2: ["key2", 3], ["key3", 3], ["key3", 1]  ->  ["key2", 3], ["key3", 4]

Regular aggregators - all data goes to reducers
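The data flow for a regular aggregator can be sketched in Python: every map output tuple is sent to the reducer that owns its key, and only then summed. The fixed PARTITION table is an assumption made so the result matches the slide. A real job would normally partition by a hash of the key.

```python
from collections import defaultdict

# Hypothetical partitioning chosen to mirror the slide.
PARTITION = {"key1": 0, "key2": 1, "key3": 1}

MAP_OUTPUTS = [
    [("key1", 1), ("key3", 3)],               # Map Task 1
    [("key2", 3), ("key1", 2), ("key3", 1)],  # Map Task 2
]


def shuffle(map_outputs, num_reducers):
    """Send every tuple, unaggregated, to the reducer that owns its key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for task in map_outputs:
        for key, value in task:
            reducers[PARTITION[key]][key].append(value)
    return reducers


def reduce_sum(reducers):
    """Run the aggregator (here a sum) on each key group in each reducer."""
    return [{key: sum(values) for key, values in r.items()} for r in reducers]


shuffled = shuffle(MAP_OUTPUTS, num_reducers=2)
totals = reduce_sum(shuffled)
print(totals)
```

The number of values moved across the network equals the number of input tuples, which is the cost that parallel aggregators reduce.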
defparallelagg
Map Task 1:
  Input: ["nathan"], ["alice"], ["nathan"]
  Init: ["nathan", 1], ["alice", 1], ["nathan", 1]
  Combine (Map): ["nathan", 2], ["alice", 1]
Map Task 2:
  Input: ["nathan"], ["sally"]
  Init: ["nathan", 1], ["sally", 1]
  Combine (Map): ["nathan", 1], ["sally", 1]
Combine (Reduce):
  Reduce Task 1: ["nathan", 3]
  Reduce Task 2: ["sally", 1], ["alice", 1]

Parallel aggregators - partial aggregation done in mappers
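The init and combine stages of a parallel aggregator can be modelled in Python with a count. This is a sketch of the data flow on the slide, not Cascalog's defparallelagg macro, and the helper names are hypothetical. It assumes the combine function is associative and commutative, so partial results can be merged in any order.

```python
def init(_name):
    return 1  # Init: each input tuple starts as a count of one


def combine(a, b):
    return a + b  # Combine: merge two partial counts


def combine_partials(pairs):
    """Fold (key, value) pairs into one partial result per key."""
    acc = {}
    for key, value in pairs:
        acc[key] = combine(acc[key], value) if key in acc else value
    return acc


def parallel_count(map_tasks):
    partials = []
    for task in map_tasks:
        initialized = [(name, init(name)) for name in task]  # Init
        partials.append(combine_partials(initialized))       # Combine (Map)
    # Combine (Reduce): merge the per-mapper partial results.
    all_pairs = [pair for partial in partials for pair in partial.items()]
    return partials, combine_partials(all_pairs)


MAP_TASKS = [["nathan", "alice", "nathan"], ["nathan", "sally"]]
partials, merged = parallel_count(MAP_TASKS)
print(partials)
print(merged)
```

Only one value per key leaves each mapper, so the reducers receive four values here instead of the five original tuples. The saving grows with the number of repeated keys per mapper.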
combine
Inputs: [1] [2] [3] and [3] [4] [5]
Output: [1] [2] [3] [3] [4] [5]
union
Inputs: [1] [2] [3] and [3] [4] [5]
Output: [1] [2] [3] [4] [5]
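The difference between the combine and union slides is whether duplicate tuples survive the merge. A minimal Python model of both, using the inputs from the slides, might look like this. The functions only mirror the behaviour shown and are not the Cascalog implementations.

```python
def combine(*sources):
    """Concatenate the sources, keeping duplicate tuples."""
    return [t for source in sources for t in source]


def union(*sources):
    """Concatenate the sources, keeping the first copy of each tuple."""
    seen = set()
    out = []
    for t in combine(*sources):
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


A = [(1,), (2,), (3,)]
B = [(3,), (4,), (5,)]

print(combine(A, B))  # (3,) appears twice
print(union(A, B))    # (3,) appears once
```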
ElephantDB
Key/Value pairs  ->  Pre-shard and index in MapReduce  ->  Shards 0-5 (Distributed Filesystem)

Generation of domain of data
ElephantDB
DFS (Shards 0-5)  ->  three ElephantDB Servers

Serving domain of data
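The generate-then-serve flow on the two ElephantDB slides can be sketched in Python. Everything specific here is an assumption for illustration rather than ElephantDB's actual implementation: the SHA-256 hash-modulo sharding, the round-robin assignment of shards to servers, and all function names.

```python
import hashlib

NUM_SHARDS = 6


def shard_for(key, num_shards=NUM_SHARDS):
    """Pick a shard deterministically from the key (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def build_domain(pairs, num_shards=NUM_SHARDS):
    """Generation: pre-shard the key/value pairs, as the batch job would."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for(key, num_shards)][key] = value
    return shards


def assign_shards(num_shards, servers):
    """Serving: give each server a subset of shards (round-robin assumption)."""
    return {shard: servers[shard % len(servers)] for shard in range(num_shards)}


def lookup(key, shards, assignment):
    """Route a read to the server hosting the key's shard."""
    shard = shard_for(key, len(shards))
    return assignment[shard], shards[shard].get(key)


pairs = [("nathan", 3), ("alice", 1), ("sally", 1)]
shards = build_domain(pairs)
assignment = assign_shards(NUM_SHARDS, ["server-a", "server-b", "server-c"])
print(lookup("nathan", shards, assignment))
```

Because the same function chooses the shard at write time and at read time, a client can find the right server without consulting a central index.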

