SlideShare a Scribd company logo
Push: a Dataflow Shell
  1. Observation
(Made by Streamline etc...)                       This
                                                                                   Noah Evans, Eric Van Hensbergen
                                               command...
           This...
                                               f1 |< f3 >| f5

                                                                                   2. If everything’s a pipe in Dataflow
                                                                                   programming, why not use a shell?

                                      ... transforms to this syntax tree ...

                                                          >|                               ... which becomes this dataflow
                                                                                                pipelined command set.
  ... is just a large                         cmd         |<          cmd

                                                                                                                            1
   combination of                                                                                              f3                pipe                                          f5
                                                                                                                                                                          0
         these:                         $           cmd   cmd   cmd         f5                         0                                 14
                                                                                                                                                           1     pipe
                                                                                             pipe
                                                                                                                                             10 f4
      f1                                irf          $    f1    f3
                                                                                                                                      pipe
                                                                                                  13                             1
               1                                                                                                                                       6
                                                                                                                    0       f3
                                                    orf                                             9 pipe
              pipe                                                                  0        f2
                                                                                                                                               pipe
                                                                       1    pipe                                                         1
                          0                                                                                5
                                                                f1
                                                                                                                        0
                                                                                                           pipe                  f3
                     f2


 3. How?                        4. Dataflow pipes                                                     5. Record handling in pipes
  •Shell should be orchestrator  cmd1 |< cmd2 >| cmd3                                                 •User Defined: Implicit or Explicit
  •Need a way to do Pipe →                                                                            •ORF (output record filter)
                                 ! |< Fanout: one to many
   Fork → Exec over a large                                                                            •default hashes 1 to many
   Number of machines            ! >| Fanin: many to one                                                •newline separated
  •Need a way of moving to       ! Must be paired                                                     •IRF (Input Record Filter)
   records from byte streams                                                                           •default merges buffers on newlines

     6. Research Challenges + Future Work                                        7. Conclusions
      •Exascale Pipe → Fork → Exec                                                •Systems level not language level
      •Graph optimization at XCPU3                                                •Easy to change record handling
      •Cloud Integration                                                          •Configurable degree of parallelism
      •Work-stealing                                                              •Cross Platform (Win32, Linux, OSX)
                                                                                  •Not Batch, Interactive

                          Job Distribution
                                                                                              See Also:
       laptop → GPUtask → Celltask → BG/Ptask                                      http://www.research.ibm.com/hare
                                                                                     http://code.google.com/p/push/
                                                                                   References
                                                                                   •Willem de Bruijn. Adaptive Operating System Design for High Throughput I/O.
                                                                                    PhD thesis, Vrije Universiteit Amsterdam, 2010.
                              XCPU3                                                •M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel
                                                                                    programs from sequential building blocks. In Proceedings of the 2007 conference on
                                                                                    EuroSys, pages 59–72. ACM Press New York, NY, USA, 2007.




                                                                                        This work has been supported by the Department of Energy Of Office of Science
    See    XCPU3 poster for more                                                        Operating and Runtime Systems For Extreme Scale Scientific Computation project under
    details on job distribution                                                         contract #DE-FG02-08ER25851

More Related Content

More from Eric Van Hensbergen

Scaling Arm from One to One Trillion
Scaling Arm from One to One TrillionScaling Arm from One to One Trillion
Scaling Arm from One to One Trillion
Eric Van Hensbergen
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Eric Van Hensbergen
 
ISC14 Embedded HPC BoF Panel Presentation
ISC14 Embedded HPC BoF Panel PresentationISC14 Embedded HPC BoF Panel Presentation
ISC14 Embedded HPC BoF Panel Presentation
Eric Van Hensbergen
 
Simulation Directed Co-Design from Smartphones to Supercomputers
Simulation Directed Co-Design from Smartphones to SupercomputersSimulation Directed Co-Design from Smartphones to Supercomputers
Simulation Directed Co-Design from Smartphones to Supercomputers
Eric Van Hensbergen
 
Brasil Ross 2011
Brasil Ross 2011Brasil Ross 2011
Brasil Ross 2011
Eric Van Hensbergen
 
Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)
Eric Van Hensbergen
 
Multipipes
MultipipesMultipipes
Multi-pipes
Multi-pipesMulti-pipes
Multi-pipes
Eric Van Hensbergen
 
VirtFS
VirtFSVirtFS
HARE 2010 Review
HARE 2010 ReviewHARE 2010 Review
HARE 2010 Review
Eric Van Hensbergen
 
9P Code Walkthrough
9P Code Walkthrough9P Code Walkthrough
9P Code Walkthrough
Eric Van Hensbergen
 
9P Overview
9P Overview9P Overview
9P Overview
Eric Van Hensbergen
 
Push Podc09
Push Podc09Push Podc09
Push Podc09
Eric Van Hensbergen
 
Libra: a Library OS for a JVM
Libra: a Library OS for a JVMLibra: a Library OS for a JVM
Libra: a Library OS for a JVM
Eric Van Hensbergen
 
Effect of Virtualization on OS Interference
Effect of Virtualization on OS InterferenceEffect of Virtualization on OS Interference
Effect of Virtualization on OS Interference
Eric Van Hensbergen
 
PROSE
PROSEPROSE
Libra Library OS
Libra Library OSLibra Library OS
Libra Library OS
Eric Van Hensbergen
 
Systems Support for Many Task Computing
Systems Support for Many Task ComputingSystems Support for Many Task Computing
Systems Support for Many Task Computing
Eric Van Hensbergen
 
Holistic Aggregate Resource Environment
Holistic Aggregate Resource EnvironmentHolistic Aggregate Resource Environment
Holistic Aggregate Resource Environment
Eric Van Hensbergen
 
Paravirtualized File Systems
Paravirtualized File SystemsParavirtualized File Systems
Paravirtualized File Systems
Eric Van Hensbergen
 

More from Eric Van Hensbergen (20)

Scaling Arm from One to One Trillion
Scaling Arm from One to One TrillionScaling Arm from One to One Trillion
Scaling Arm from One to One Trillion
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
ISC14 Embedded HPC BoF Panel Presentation
ISC14 Embedded HPC BoF Panel PresentationISC14 Embedded HPC BoF Panel Presentation
ISC14 Embedded HPC BoF Panel Presentation
 
Simulation Directed Co-Design from Smartphones to Supercomputers
Simulation Directed Co-Design from Smartphones to SupercomputersSimulation Directed Co-Design from Smartphones to Supercomputers
Simulation Directed Co-Design from Smartphones to Supercomputers
 
Brasil Ross 2011
Brasil Ross 2011Brasil Ross 2011
Brasil Ross 2011
 
Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)
 
Multipipes
MultipipesMultipipes
Multipipes
 
Multi-pipes
Multi-pipesMulti-pipes
Multi-pipes
 
VirtFS
VirtFSVirtFS
VirtFS
 
HARE 2010 Review
HARE 2010 ReviewHARE 2010 Review
HARE 2010 Review
 
9P Code Walkthrough
9P Code Walkthrough9P Code Walkthrough
9P Code Walkthrough
 
9P Overview
9P Overview9P Overview
9P Overview
 
Push Podc09
Push Podc09Push Podc09
Push Podc09
 
Libra: a Library OS for a JVM
Libra: a Library OS for a JVMLibra: a Library OS for a JVM
Libra: a Library OS for a JVM
 
Effect of Virtualization on OS Interference
Effect of Virtualization on OS InterferenceEffect of Virtualization on OS Interference
Effect of Virtualization on OS Interference
 
PROSE
PROSEPROSE
PROSE
 
Libra Library OS
Libra Library OSLibra Library OS
Libra Library OS
 
Systems Support for Many Task Computing
Systems Support for Many Task ComputingSystems Support for Many Task Computing
Systems Support for Many Task Computing
 
Holistic Aggregate Resource Environment
Holistic Aggregate Resource EnvironmentHolistic Aggregate Resource Environment
Holistic Aggregate Resource Environment
 
Paravirtualized File Systems
Paravirtualized File SystemsParavirtualized File Systems
Paravirtualized File Systems
 

Recently uploaded

Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 

Recently uploaded (20)

Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 

PUSH-- a Dataflow Shell

  • 1. Push: a Dataflow Shell 1. Observation (Made by Streamline etc...) This Noah Evans, Eric Van Hensbergen command... This... f1 |< f3 >| f5 2. If everything’s a pipe in Dataflow programming, why not use a shell? ... transforms to this syntax tree ... >| ... which becomes this dataflow pipelined command set. ... is just a large cmd |< cmd 1 combination of f3 pipe f5 0 these: $ cmd cmd cmd f5 0 14 1 pipe pipe 10 f4 f1 irf $ f1 f3 pipe 13 1 1 6 0 f3 orf 9 pipe pipe 0 f2 pipe 1 pipe 1 0 5 f1 0 pipe f3 f2 3. How? 4. Dataflow pipes 5. Record handling in pipes •Shell should be orchestrator cmd1 |< cmd2 >| cmd3 •User Defined: Implicit or Explicit •Need a way to do Pipe → •ORF (output record filter) ! |< Fanout: one to many Fork → Exec over a large •default hashes 1 to many Number of machines ! >| Fanin: many to one •newline separated •Need a way of moving to ! Must be paired •IRF (Input Record Filter) records from byte streams •default merges buffers on newlines 6. Research Challenges + Future Work 7. Conclusions •Exascale Pipe → Fork → Exec •Systems level not language level •Graph optimization at XCPU3 •Easy to change record handling •Cloud Integration •Configurable degree of parallelism •Work-stealing •Cross Platform (Win32, Linux, OSX) •Not Batch, Interactive Job Distribution See Also: laptop → GPUtask → Celltask → BG/Ptask http://www.research.ibm.com/hare http://code.google.com/p/push/ References •Willem de Bruijn. Adaptive Operating System Design for High Throughput I/O. PhD thesis, Vrije Universiteit Amsterdam, 2010. XCPU3 •M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In Proceedings of the 2007 conference on EuroSys, pages 59–72. ACM Press New York, NY, USA, 2007. This work has been supported by the Department of Energy Of Office of Science See XCPU3 poster for more Operating and Runtime Systems For Extreme Scale Scientific Computation project under details on job distribution contract #DE-FG02-08ER25851