Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Building workflows with Spotify's
Workshop for e-Infrastructures for Massively Parallel Sequencing
Uppsala, Jan 19-20, 201...
Luigi Short Facts
●
Batch (No streaming)
●
Both Command line and Hadoop execution
●
Does not replace Hive, Pig, Cascading ...
A Basic Luigi Task
A Basic Luigi Task
Calling tasks from the commandline
Defining workflows: Default way
Dependencies: Default way
TaskA()
TaskB()
TaskBA() TaskBB()
TaskBAA() TaskBBB()TaskBAB() TaskBBA()
Parameters: Default way
TaskA()
TaskB()
TaskBA() TaskBB()
TaskBAA() TaskBBB()TaskBAB() TaskBBA()
DependencyParameter
Growing list of parameters
Dependencies and parameters:
A better way
AWorkflowTask()
TaskA()
TaskB()
TaskBA() TaskBB()
TaskBAA() TaskBBB()TaskBAB() T...
Supporting separate network definition
Using this functionality in tasks
Using this new functionality in workflows, and wrapping whole workflows in Tasks
Unit testing Luigi tasks is easy
Lessons Learned (April 2014)
●
Only “pull” or “call/return” semantics, no “push”
or “auto triggering by inputs”
●
Not full...
Lessons Learned (Update Jan 2015)
●
Only “pull” or “call/return” semantics, no “push” or “auto triggering
by inputs”
●
Not...
Thank you!
Building Scientific Workflows with Spotify's Luigi
You’ve finished this document.
Download and read it offline.
Upcoming SlideShare
A Beginner's Guide to Building Data Pipelines with Luigi
Next
Upcoming SlideShare
A Beginner's Guide to Building Data Pipelines with Luigi
Next
Download to read offline and view in fullscreen.

Share

Building Scientific Workflows with Spotify's Luigi

Download to read offline

Experiences from Building Scientific (Bioinformatics) workflows with Spotify's Luigi tool at the Workshop for e-Infrastructures for Massively Parallel Sequencing
Uppsala, Jan 19-20, 2015 (uppnex.se/twiki/do/view/Courses/EinfraMPS2015)

Accompanying Video Available at: https://youtu.be/f26PqSXZdWM

Related Books

Free with a 30 day trial from Scribd

See all

Building Scientific Workflows with Spotify's Luigi

  1. 1. Building workflows with Spotify's Workshop for e-Infrastructures for Massively Parallel Sequencing Uppsala, Jan 19-20, 2015 Samuel Lampa UU/Dept of Pharm. Biosci. & BILS samuel.lampa@bils.se samuel.lampa.co
  2. 2. Luigi Short Facts ● Batch (No streaming) ● Both Command line and Hadoop execution ● Does not replace Hive, Pig, Cascading etc - instead used to stitch many such jobs together ● Powers 1000:s of jobs every day at Spotify ● Communication / synchronization between tasks via “targets” (Normal file, HDFS, database …) ● Dependencies hard coded in tasks ● Pull based (ask for task, and it figures out dependencies) ● Main features: Scheduling, Dependency Graph, Basic multi-node support (via central planner daemon) ● github.com/spotify/luigi ● Easy to install: pip install luigi [tornado]
  3. 3. A Basic Luigi Task
  4. 4. A Basic Luigi Task
  5. 5. Calling tasks from the commandline
  6. 6. Defining workflows: Default way
  7. 7. Dependencies: Default way TaskA() TaskB() TaskBA() TaskBB() TaskBAA() TaskBBB()TaskBAB() TaskBBA()
  8. 8. Parameters: Default way TaskA() TaskB() TaskBA() TaskBB() TaskBAA() TaskBBB()TaskBAB() TaskBBA() DependencyParameter
  9. 9. Growing list of parameters
  10. 10. Dependencies and parameters: A better way AWorkflowTask() TaskA() TaskB() TaskBA() TaskBB() TaskBAA() TaskBBB()TaskBAB() TaskBBA()
  11. 11. Supporting separate network definition
  12. 12. Using this functionality in tasks
  13. 13. Using this new functionality in workflows, and wrapping whole workflows in Tasks
  14. 14. Unit testing Luigi tasks is easy
  15. 15. Lessons Learned (April 2014) ● Only “pull” or “call/return” semantics, no “push” or “auto triggering by inputs” ● Not fully separate workflow ● Multiple inputs / outputs somewhat error-prone ● Dynamic typed language ● Check out for max processes limits ● No real distribution of compute or data (only common scheduling)
  16. 16. Lessons Learned (Update Jan 2015) ● Only “pull” or “call/return” semantics, no “push” or “auto triggering by inputs” ● Not fully separate workflow def. – We found a way around this. ● Multiple inputs / outputs somewhat error-prone – We're looking into solving this too. ● Dynamic typed language – We've found that unit testing is easy. ● Check out for max processes limits ● No real distribution of compute or data (only common scheduling) – We have found ourselves running everything through SLURM anyway
  17. 17. Thank you!
  • ElenaShylko

    Dec. 1, 2017
  • slw4qd

    Oct. 3, 2016
  • colspan

    Jul. 13, 2015

Experiences from Building Scientific (Bioinformatics) workflows with Spotify's Luigi tool at the Workshop for e-Infrastructures for Massively Parallel Sequencing Uppsala, Jan 19-20, 2015 (uppnex.se/twiki/do/view/Courses/EinfraMPS2015) Accompanying Video Available at: https://youtu.be/f26PqSXZdWM

Views

Total views

3,939

On Slideshare

0

From embeds

0

Number of embeds

20

Actions

Downloads

42

Shares

0

Comments

0

Likes

3

×