SlideShare a Scribd company logo
1 of 35
Composing and Executing
Parallel Data-flow Graphs with
          Shell Pipes
      Edward Walker (TACC)
         Weijia Xu (TACC)
   Vinoth Chandar (Oracle Corp)
Agenda
• Motivation

• Shell language extensions

• Implementation

• Experimental evaluation

• Conclusions
Motivation
• Distributed memory clusters are becoming pervasive in
  industry and academia

• Shells are the default login environment on these
  systems

• Shell pipes are commonly used for composing extensible
  unix commands.

• There has been no change to the syntax/semantics of
  shell pipes since their invention over 30 years ago.

• Growing need to compose massively parallel jobs
  quickly, using existing software
Extending Shells for Parallel
            Computing
•   Build a simple, powerful coordination layer at the Shell


•   The coordination layer transparently manages the
    parallelism in the workflow


•   User specifies parallel computation as a dataflow graph
    using extensions to the Shell


•   Provides the ability to combine different tools and build
    interesting parallel programs quickly.
Shell pipe extensions
• Pipeline fork
     A | B on n procs
• Pipeline join
     A on n procs | B
• Pipeline cycles
     (++ n A)
• Pipeline key-value aggregation
     A | B on keys
Parallel shell tasks extensions
> function foo()
{
   echo “hello world”
}

> foo on all procs          # foo() on all CPUs

> foo on all nodes          # foo() on all nodes
                 stride
> foo on 10:2 procs # 10 tasks, 2 tasks on each node
                     span
> foo on 10:2:2 procs       # 10 tasks, 2 tasks on alternative node
Composing data-flow graphs
• Example 1:

 function B1() {}
                                             B1
 function B2() {}
                                         A        C
 function B()
 {                                           B2
   if (($_ASPECT_TASKID == 0 )) ; then
            B1
   else
       B2
   endif
 }

 A | B on 2 procs | C
Composing data-flow graphs
• Example 2:
  function map()
  {
                                                                              reduce
    emit_tuple –k key –v value                              map
  }
                                                                  Key-value
  function reduce()                                                 DHT
  {
    consume_tuple –k key –v value                           map               reduce

      num=${#value[@]}
      for ((i=0; i < $num; i++)) ; do
                   # process key=$key, value=${value[$i]}
      done
  }

  map on all procs | reduce on keys
BASH Implementation
Startup Overlay
• Script may have many instances requiring
  startup of parallel tasks
• Motivation for overlay:
  – Fast startup of parallel shell workers
  – Handles node failures gracefully
• Two level hierarchy: sectors and proxies
• Overlay node addressing: 7             0

            Compute node ID

                              Sector id   Proxy id
Fault-Tolerance

• Proxy nodes monitor peers within sector, and
  sector heads monitor peer sectors
• Node 0 maintains a list of available nodes in the
  overlay in a master_node file
                           Overlay sector 0                                         Overlay sector 1
                           Proxy Node 3           Proxy   Node 0                    Proxy   Node 6      Proxy   Node 7
                           exec                   exec                              exec                exec


                  Node 2              Node 1                               Node 4              Node 5
          Proxy                    Proxy                           Proxy                    Proxy
          exec                     exec                            exec                     exec



                                               master_node
Starting shell workers with
      startup overlay
1. Bash spawns agent.
2. Agent queries master_node and spawns
node I/O multiplexor
                                Overlay sector 0                                         Overlay sector 1
                                Proxy Node 3           Proxy   Node 0                    Proxy   Node 6      Proxy   Node 7
                                exec                   exec                              exec                exec


                       Node 2              Node 1                               Node 4              Node 5
               Proxy                    Proxy                           Proxy                    Proxy
               exec                     exec                            exec                     exec


                                                    master_node




                                 (2)


        (1)
 BASH          agent


              (2)


              Node I/O
               MUX
3. Agent Invokes overlay to spawn
CPU I/O multiplexor on node
                                Overlay sector 0                                       Overlay sector 1
                                Proxy Node 3        Proxy   Node 0                     Proxy   Node 6      Proxy   Node 7
                                exec                exec                               exec                exec


                       Node 2              Node 1                             Node 4              Node 5
               Proxy                    Proxy                         Proxy                    Proxy
               exec                     exec                          exec                     exec



              (3)



                                                                           (3)

        (1)
 BASH          agent



              (2)


              Node I/O                                               CPU I/O
               MUX                                                    MUX
4. CPU I/O multiplexor spawns a
shell worker per CPU on node
                                Overlay sector 0                                          Overlay sector 1
                                Proxy Node 3        Proxy   Node 0                        Proxy   Node 6      Proxy   Node 7
                                exec                exec                                  exec                exec


                       Node 2              Node 1                                Node 4              Node 5
               Proxy                    Proxy                           Proxy                     Proxy
               exec                     exec                            exec                      exec



              (3)




                                                                             (3)
        (1)
 BASH          agent



              (2)


              Node I/O                                                 CPU I/O
               MUX                                                      MUX

                                                                           (4)
                                                                     CPU CPUCPU
5. CPU I/O multiplexor calls back to
node I/O multiplexor
                                 Overlay sector 0                                                Overlay sector 1
                                 Proxy Node 3              Proxy   Node 0                        Proxy   Node 6      Proxy   Node 7
                                 exec                      exec                                  exec                exec


                        Node 2              Node 1                                      Node 4              Node 5
                Proxy                    Proxy                                 Proxy                     Proxy
                exec                     exec                                  exec                      exec



              (3)




        (1)
 BASH           agent
                                                                                   (3)


              (2)
                                                     (5)
              Node I/O                                                        CPU I/O
               MUX                                                             MUX

                                                                                  (4)

                                                                            CPU CPUCPU
Implementation of pipeline
          fork
1. Process B pipes stdin into stdin_file
                                                A | B on N procs

                                 stdin                 BASH
    stdout                pipe           (1)


                                           aspect-agent B

                                  stdin
   A                             reader
             stdin_file
2. Constructs command files for each
task
                                                  A | B on N procs

                                 stdin                   BASH
    stdout                pipe           (1)


                                        aspect-agent B
                                                      Cmd
                                  stdin            dispatcher
    A                            reader           (2)
             stdin_file




                                          Cmd
                                          files
                                                    B
                                          cat stdin_file | B
3. 4. and 5. Execute command files in shell
workers and marshal results back to shell

                                                         A | B on N procs

                                        stdin                   BASH
           stdout                pipe           (1)




                                                                                                    control


                                                                                                              stdout
                                               aspect-agent B
                                                             Cmd
                                         stdin            dispatcher
          A                             reader                                                         I/O
                                                         (2)               flusher
                                                                               flusher                MUX
                    stdin_file                                                      flusher
                                                                       (3)




                                                                  qu
                                                                     eue
                                                 Cmd                                          (5)
                                                 files                         Node
                                                           B                     Node
                                                                               MUX Node
                                                                                 MUX
                                                 cat stdin_file | B                 MUX

                                                               Compute node                (4)

                                                                   Shell              Shell
                                                                  worker             worker



                                                                      B                B
6. Replay command files on failure

                                                             A | B on N procs

                                            stdin                   BASH
        stdout                 pipe                 (1)




                                                                                                        control


                                                                                                                  stdout
                                                   aspect-agent B
                                                                 Cmd
                                             stdin            dispatcher
        A                                   reader                                                         I/O
                                                             (2)               flusher
                                                                                   flusher                MUX
                  stdin_file                                                            flusher
                                                replayer                   (3)
                                      (6)




                                                                      qu
                                                                         e
        Local compute node




                                                                          ue
                                                     Cmd                                          (5)
             Shell               Shell               files                         Node
            worker              worker                         B                     Node
                                                                                   MUX Node
                                                                                     MUX
                                                     cat stdin_file | B                 MUX

                                                                   Compute node                (4)
              B                   B
                                                                       Shell              Shell
                                                                      worker             worker



                                                                          B                B
Implementation of key-value
       aggregation
1. Agent inspects and hashes key

                              A | B on keys
                      pipe
                                   BASH
            control            control
                                    (1)
                      aspect-agent B
                                   Key
             A
                                dispatcher
2. Routes key-value to compute node based
on key hash, and stored in hash table

                                     A | B on keys
                             pipe
                                          BASH
                  control             control
                                           (1)
                             aspect-agent B
                                          Key
                    A
                                       dispatcher
                                                    (2)


                                                             Node
                                                             MUX


           Compute node                             Compute node
            Distributed Hash Table

                            Hash                                   Hash
             gdbm           table                   gdbm           table
3. Each node constructs command files to
pipe the key-value entry from its hash table
into process B
                                      A | B on keys
                              pipe
                                           BASH
                   control             control
                                            (1)
                              aspect-agent B
                                           Key
                     A
                                        dispatcher
                                                     (2)


                                                                  Node
                                                                  MUX


            Compute node                             Compute node
             Distributed Hash Table

                             Hash                                    Hash
              gdbm           table                   gdbm            table



              emit_tuple                             emit_tuple
                                                                         (3)
                                       B                                       B
4. Results from the command files
execution are marshaled back to the shell

                                      A | B on keys
                              pipe
                                           BASH
                   control             control
                                            (1)




                                                             stdout


                                                                         control
                              aspect-agent B
                                           Key              I/O MUX
                     A
                                        dispatcher
                                                     (2)


                                                                      Node
                                                                      MUX          (4)


            Compute node                             Compute node
            Distributed Hash Table

                             Hash                                        Hash
              gdbm           table                   gdbm                table



              emit_tuple                             emit_tuple
                                                                             (3)
                                       B                                            B
Experimental Evaluation
Startup overlay performance (when
compared to SSH default mechanism)
Syntactic benchmark I: performance of
             pipeline join
Syntactic benchmark II: performance of
        key-value aggregation
TeraSort benchmark:
           Parallel bucket sort
• Step 1: spawn the data generator in parallel on
  each compute node, partitioning data across N
  nodes for task T if the first 2 bytes fall in the
  range:
               16 T        16     T + 1
              2 ∗ N ,     2     ∗
                                    N 
                                       

• Step 2: perform sort on local data on each node

• Step 3: merge results onto global file system
TeraSort benchmark:
    Sorting rate
Related Work
• Ptolemy – embedded system design

• Yahoo Pipes – web content filtering

• Hadoop – Java implementation of
  MapReduce

• Dryad - distributed DAG data flow
  computation
Conclusion
• A debugger would be extremely helpful.
  Working on bashdb implementation.

• Run-time simulator would be helpful to
  predict performance based on
  characteristics of cluster.

• Still thinking about how to incorporate our
  extensions for named pipes (i.e. mkfifo).
Questions ?

More Related Content

What's hot

Essentials of Multithreaded System Programming in C++
Essentials of Multithreaded System Programming in C++Essentials of Multithreaded System Programming in C++
Essentials of Multithreaded System Programming in C++
Shuo Chen
 
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CanSecWest
 
Trends of SW Platforms for Heterogeneous Multi-core systems and Open Source ...
Trends of SW Platforms for Heterogeneous Multi-core systems and  Open Source ...Trends of SW Platforms for Heterogeneous Multi-core systems and  Open Source ...
Trends of SW Platforms for Heterogeneous Multi-core systems and Open Source ...
Seunghwa Song
 
Docker - container and lightweight virtualization
Docker - container and lightweight virtualization Docker - container and lightweight virtualization
Docker - container and lightweight virtualization
Sim Janghoon
 
ADB(Android Debug Bridge): How it works?
ADB(Android Debug Bridge): How it works?ADB(Android Debug Bridge): How it works?
ADB(Android Debug Bridge): How it works?
Tetsuyuki Kobayashi
 

What's hot (20)

For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
 
The pocl Kernel Compiler
The pocl Kernel CompilerThe pocl Kernel Compiler
The pocl Kernel Compiler
 
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
 
Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)
 
Inter-process communication of Android
Inter-process communication of AndroidInter-process communication of Android
Inter-process communication of Android
 
Take a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru Otsuka
Take a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru OtsukaTake a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru Otsuka
Take a Jailbreak -Stunning Guards for iOS Jailbreak- by Kaoru Otsuka
 
Essentials of Multithreaded System Programming in C++
Essentials of Multithreaded System Programming in C++Essentials of Multithreaded System Programming in C++
Essentials of Multithreaded System Programming in C++
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis
 
Dynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresDynamo: Not Just For Datastores
Dynamo: Not Just For Datastores
 
[COSCUP 2021] LLVM Project: The Good, The Bad, and The Ugly
[COSCUP 2021] LLVM Project: The Good, The Bad, and The Ugly[COSCUP 2021] LLVM Project: The Good, The Bad, and The Ugly
[COSCUP 2021] LLVM Project: The Good, The Bad, and The Ugly
 
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
CSW2017 Henry li how to find the vulnerability to bypass the control flow gua...
 
Detecting hardware virtualization rootkits
Detecting hardware virtualization rootkitsDetecting hardware virtualization rootkits
Detecting hardware virtualization rootkits
 
Xen Debugging
Xen DebuggingXen Debugging
Xen Debugging
 
Trends of SW Platforms for Heterogeneous Multi-core systems and Open Source ...
Trends of SW Platforms for Heterogeneous Multi-core systems and  Open Source ...Trends of SW Platforms for Heterogeneous Multi-core systems and  Open Source ...
Trends of SW Platforms for Heterogeneous Multi-core systems and Open Source ...
 
Docker - container and lightweight virtualization
Docker - container and lightweight virtualization Docker - container and lightweight virtualization
Docker - container and lightweight virtualization
 
syzbot and the tale of million kernel bugs
syzbot and the tale of million kernel bugssyzbot and the tale of million kernel bugs
syzbot and the tale of million kernel bugs
 
Ricon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak PipeRicon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak Pipe
 
The Silence of the Canaries
The Silence of the CanariesThe Silence of the Canaries
The Silence of the Canaries
 
Kernel Recipes 2019 - CVEs are dead, long live the CVE!
Kernel Recipes 2019 - CVEs are dead, long live the CVE!Kernel Recipes 2019 - CVEs are dead, long live the CVE!
Kernel Recipes 2019 - CVEs are dead, long live the CVE!
 
ADB(Android Debug Bridge): How it works?
ADB(Android Debug Bridge): How it works?ADB(Android Debug Bridge): How it works?
ADB(Android Debug Bridge): How it works?
 

Viewers also liked (13)

Voldemort : Prototype to Production
Voldemort : Prototype to ProductionVoldemort : Prototype to Production
Voldemort : Prototype to Production
 
Bluetube
BluetubeBluetube
Bluetube
 
Voldemort on Solid State Drives
Voldemort on Solid State DrivesVoldemort on Solid State Drives
Voldemort on Solid State Drives
 
Project Voldemort
Project VoldemortProject Voldemort
Project Voldemort
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
Vol1
Vol1Vol1
Vol1
 
Voldemort
VoldemortVoldemort
Voldemort
 
Lecture 47
Lecture 47Lecture 47
Lecture 47
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
 
Applications of paralleL processing
Applications of paralleL processingApplications of paralleL processing
Applications of paralleL processing
 
Introduction to parallel processing
Introduction to parallel processingIntroduction to parallel processing
Introduction to parallel processing
 
Introducción a Voldemort - Innova4j
Introducción a Voldemort - Innova4jIntroducción a Voldemort - Innova4j
Introducción a Voldemort - Innova4j
 

Similar to Composing and Executing Parallel Data Flow Graphs wth Shell Pipes

Ryu: network operating system
Ryu: network operating systemRyu: network operating system
Ryu: network operating system
Isaku Yamahata
 
第2回クラウドネットワーク研究会 「OpenFlowコントローラとその実装」
第2回クラウドネットワーク研究会 「OpenFlowコントローラとその実装」第2回クラウドネットワーク研究会 「OpenFlowコントローラとその実装」
第2回クラウドネットワーク研究会 「OpenFlowコントローラとその実装」
Sho Shimizu
 
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...
Antonio Garrote Hernández
 
OSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js TutorialOSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js Tutorial
Tom Croucher
 

Similar to Composing and Executing Parallel Data Flow Graphs wth Shell Pipes (20)

Ryu: network operating system
Ryu: network operating systemRyu: network operating system
Ryu: network operating system
 
Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」
Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」
Openflow勉強会 「OpenFlowコントローラを取り巻く状況とその実装」
 
Genode Compositions
Genode CompositionsGenode Compositions
Genode Compositions
 
Openstack Quantum + Devstack Tutorial
Openstack Quantum + Devstack TutorialOpenstack Quantum + Devstack Tutorial
Openstack Quantum + Devstack Tutorial
 
Intel DPDK Step by Step instructions
Intel DPDK Step by Step instructionsIntel DPDK Step by Step instructions
Intel DPDK Step by Step instructions
 
Node.js Explained
Node.js ExplainedNode.js Explained
Node.js Explained
 
第2回クラウドネットワーク研究会 「OpenFlowコントローラとその実装」
第2回クラウドネットワーク研究会 「OpenFlowコントローラとその実装」第2回クラウドネットワーク研究会 「OpenFlowコントローラとその実装」
第2回クラウドネットワーク研究会 「OpenFlowコントローラとその実装」
 
Harmonia open iris_basic_v0.1
Harmonia open iris_basic_v0.1Harmonia open iris_basic_v0.1
Harmonia open iris_basic_v0.1
 
Network Automation Tools
Network Automation ToolsNetwork Automation Tools
Network Automation Tools
 
OVS-NFV Tutorial
OVS-NFV TutorialOVS-NFV Tutorial
OVS-NFV Tutorial
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...
4th European Lisp Symposium: Jobim: an Actors Library for the Clojure Program...
 
Open Source Integration with WSO2 Enterprise Service Bus
Open Source Integration  with  WSO2 Enterprise Service BusOpen Source Integration  with  WSO2 Enterprise Service Bus
Open Source Integration with WSO2 Enterprise Service Bus
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop Microsoft
 
Ryu ods2012-spring
Ryu ods2012-springRyu ods2012-spring
Ryu ods2012-spring
 
Node.js at Joyent: Engineering for Production
Node.js at Joyent: Engineering for ProductionNode.js at Joyent: Engineering for Production
Node.js at Joyent: Engineering for Production
 
SDNDS.TW Mininet
SDNDS.TW MininetSDNDS.TW Mininet
SDNDS.TW Mininet
 
Nodejs Session01
Nodejs Session01Nodejs Session01
Nodejs Session01
 
Introduction to MapReduce using Disco
Introduction to MapReduce using DiscoIntroduction to MapReduce using Disco
Introduction to MapReduce using Disco
 
OSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js TutorialOSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js Tutorial
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Composing and Executing Parallel Data Flow Graphs wth Shell Pipes

  • 1. Composing and Executing Parallel Data-flow Graphs with Shell Pipes Edward Walker (TACC) Weijia Xu (TACC) Vinoth Chandar (Oracle Corp)
  • 2. Agenda • Motivation • Shell language extensions • Implementation • Experimental evaluation • Conclusions
  • 3. Motivation • Distributed memory clusters are becoming pervasive in industry and academia • Shells are the default login environment on these systems • Shell pipes are commonly used for composing extensible unix commands. • There has been no change to the syntax/semantics of shell pipes since their invention over 30 years ago. • Growing need to compose massively parallel jobs quickly, using existing software
  • 4. Extending Shells for Parallel Computing • Build a simple, powerful coordination layer at the Shell • The coordination layer transparently manages the parallelism in the workflow • User specifies parallel computation as a dataflow graph using extensions to the Shell • Provides the ability to combine different tools and build interesting parallel programs quickly.
  • 5. Shell pipe extensions • Pipeline fork A | B on n procs • Pipeline join A on n procs | B • Pipeline cycles (++ n A) • Pipeline key-value aggregation A | B on keys
  • 6. Parallel shell tasks extensions > function foo() { echo “hello world” } > foo on all procs # foo() on all CPUs > foo on all nodes # foo() on all nodes stride > foo on 10:2 procs # 10 tasks, 2 tasks on each node span > foo on 10:2:2 procs # 10 tasks, 2 tasks on alternative node
  • 7. Composing data-flow graphs • Example 1: function B1() {} B1 function B2() {} A C function B() { B2 if (($_ASPECT_TASKID == 0 )) ; then B1 else B2 endif } A | B on 2 procs | C
  • 8. Composing data-flow graphs • Example 2: function map() { reduce emit_tuple –k key –v value map } Key-value function reduce() DHT { consume_tuple –k key –v value map reduce num=${#value[@]} for ((i=0; i < $num; i++)) ; do # process key=$key, value=${value[$i]} done } map on all procs | reduce on keys
  • 10. Startup Overlay • Script may have many instances requiring startup of parallel tasks • Motivation for overlay: – Fast startup of parallel shell workers – Handles node failures gracefully • Two level hierarchy: sectors and proxies • Overlay node addressing: 7 0 Compute node ID Sector id Proxy id
  • 11. Fault-Tolerance • Proxy nodes monitor peers within sector, and sector heads monitor peer sectors • Node 0 maintains a list of available nodes in the overlay in a master_node file Overlay sector 0 Overlay sector 1 Proxy Node 3 Proxy Node 0 Proxy Node 6 Proxy Node 7 exec exec exec exec Node 2 Node 1 Node 4 Node 5 Proxy Proxy Proxy Proxy exec exec exec exec master_node
  • 12. Starting shell workers with startup overlay
  • 13. 1. Bash spawns agent. 2. Agent queries master_node and spawns node I/O multiplexor Overlay sector 0 Overlay sector 1 Proxy Node 3 Proxy Node 0 Proxy Node 6 Proxy Node 7 exec exec exec exec Node 2 Node 1 Node 4 Node 5 Proxy Proxy Proxy Proxy exec exec exec exec master_node (2) (1) BASH agent (2) Node I/O MUX
  • 14. 3. Agent Invokes overlay to spawn CPU I/O multiplexor on node Overlay sector 0 Overlay sector 1 Proxy Node 3 Proxy Node 0 Proxy Node 6 Proxy Node 7 exec exec exec exec Node 2 Node 1 Node 4 Node 5 Proxy Proxy Proxy Proxy exec exec exec exec (3) (3) (1) BASH agent (2) Node I/O CPU I/O MUX MUX
  • 15. 4. CPU I/O multiplexor spawns a shell worker per CPU on node Overlay sector 0 Overlay sector 1 Proxy Node 3 Proxy Node 0 Proxy Node 6 Proxy Node 7 exec exec exec exec Node 2 Node 1 Node 4 Node 5 Proxy Proxy Proxy Proxy exec exec exec exec (3) (3) (1) BASH agent (2) Node I/O CPU I/O MUX MUX (4) CPU CPUCPU
  • 16. 5. CPU I/O multiplexor calls back to node I/O multiplexor Overlay sector 0 Overlay sector 1 Proxy Node 3 Proxy Node 0 Proxy Node 6 Proxy Node 7 exec exec exec exec Node 2 Node 1 Node 4 Node 5 Proxy Proxy Proxy Proxy exec exec exec exec (3) (1) BASH agent (3) (2) (5) Node I/O CPU I/O MUX MUX (4) CPU CPUCPU
  • 18. 1. Process B pipes stdin into stdin_file A | B on N procs stdin BASH stdout pipe (1) aspect-agent B stdin A reader stdin_file
  • 19. 2. Constructs command files for each task A | B on N procs stdin BASH stdout pipe (1) aspect-agent B Cmd stdin dispatcher A reader (2) stdin_file Cmd files B cat stdin_file | B
  • 20. 3. 4. and 5. Execute command files in shell workers and marshal results back to shell A | B on N procs stdin BASH stdout pipe (1) control stdout aspect-agent B Cmd stdin dispatcher A reader I/O (2) flusher flusher MUX stdin_file flusher (3) qu eue Cmd (5) files Node B Node MUX Node MUX cat stdin_file | B MUX Compute node (4) Shell Shell worker worker B B
  • 21. 6. Replay command files on failure A | B on N procs stdin BASH stdout pipe (1) control stdout aspect-agent B Cmd stdin dispatcher A reader I/O (2) flusher flusher MUX stdin_file flusher replayer (3) (6) qu e Local compute node ue Cmd (5) Shell Shell files Node worker worker B Node MUX Node MUX cat stdin_file | B MUX Compute node (4) B B Shell Shell worker worker B B
  • 23. 1. Agent inspects and hashes key A | B on keys pipe BASH control control (1) aspect-agent B Key A dispatcher
  • 24. 2. Routes key-value to compute node based on key hash, and stored in hash table A | B on keys pipe BASH control control (1) aspect-agent B Key A dispatcher (2) Node MUX Compute node Compute node Distributed Hash Table Hash Hash gdbm table gdbm table
  • 25. 3. Each node constructs command files to pipe the key-value entry from its hash table into process B A | B on keys pipe BASH control control (1) aspect-agent B Key A dispatcher (2) Node MUX Compute node Compute node Distributed Hash Table Hash Hash gdbm table gdbm table emit_tuple emit_tuple (3) B B
  • 26. 4. Results from the command files execution are marshaled back to the shell A | B on keys pipe BASH control control (1) stdout control aspect-agent B Key I/O MUX A dispatcher (2) Node MUX (4) Compute node Compute node Distributed Hash Table Hash Hash gdbm table gdbm table emit_tuple emit_tuple (3) B B
  • 28. Startup overlay performance (when compared to SSH default mechanism)
  • 29. Syntactic benchmark I: performance of pipeline join
  • 30. Syntactic benchmark II: performance of key-value aggregation
  • 31. TeraSort benchmark: Parallel bucket sort • Step 1: spawn the data generator in parallel on each compute node, partitioning data across N nodes for task T if the first 2 bytes fall in the range:  16 T 16 T + 1 2 ∗ N , 2 ∗ N    • Step 2: perform sort on local data on each node • Step 3: merge results onto global file system
  • 32. TeraSort benchmark: Sorting rate
  • 33. Related Work • Ptolemy – embedded system design • Yahoo Pipes – web content filtering • Hadoop – Java implementation of MapReduce • Dryad - distributed DAG data flow computation
  • 34. Conclusion • A debugger would be extremely helpful. Working on bashdb implementation. • Run-time simulator would be helpful to predict performance based on characteristics of cluster. • Still thinking about how to incorporate our extensions for named pipes (i.e. mkfifo).