
Luigi future

Luigi is a workflow manager in Python that I've open sourced. These are slides about Luigi's future from a meetup on July 31.

Published in: Technology

  1. July 29, 2014. Luigi: the past, the present, the future
  2. The history
  3. The long story: builder (2009-2010). XML madness. Only used for one single project (my Master’s thesis).
  4. The long story: builder2 (2010-2011). Everything in Python, but insane amounts of boilerplate.
  5. Why Luigi? We wanted to do everything in Python, not XML.
  6. How do we use it at Spotify?
  7. Blah
  8. The things we got right
  9. Everything is a directed acyclic graph. Makefile style: tasks specify what they depend on, not what other things depend on them.
  10. Do everything in Python. Dependencies often involve algebra that is hard to express in XML.
  11. Centralized scheduler. Overview of everything that’s currently running/scheduled. [Diagram: Luigi worker 1, Luigi worker 2, task nodes A, B, C and A, C, F, Luigi central planner]
  12. Triggering jobs locally is trivial. If the only way is to run things remotely, debugging is super hard. Running things locally makes it a lot easier: no messing around with paths and configuration (this has a flip side, more on this later).
  13. It’s a library more than a framework. Avoid the “Hollywood principle” and make it easy to customize, etc.
  14. The hairy parts…
  15. Execution is tied to scheduling. You can’t run this task “in the cloud” and go away.
  16. Visualization is pretty rudimentary. See how nice Driven looks, for instance.
  17. Scheduling isn’t tied to triggering. Need to rely on crontab etc. Could borrow some of the nice parts of Chronos.
  18. What are some ideas for the future?
  19. Separate scheduling and execution. Schedule something to run later/somewhere else. A recent baby step towards this is a very simple fix for running modules dynamically:

        $ luigi --module MyModule MyTask --foo xyz --bar 123

      The next step would be to do something like

        $ luigi --module MyModule MyTask --foo xyz --bar 123 --execute-remotely

      A full implementation would include a bunch of command line options to probe status, kill tasks, etc.
  20. Separate scheduling and execution (2). [Diagram: Luigi central scheduler connected to Worker, Worker, Worker, Worker, ...]
  21. On-the-fly dependencies

        import luigi

        class MyTask(luigi.Task):
            def run(self):
                input = yield OtherTask()  # this could replace requires()

  22. Built-in crontab replacement

        import datetime
        import luigi

        @luigi.schedule  # proposed decorator
        class MyTask(luigi.Task):
            param = luigi.DateParameter(default=datetime.date.today())
            def run(self):
                …

      The @luigi.schedule decorator would then
        1. Register that my_module.MyTask should be scheduled (by telling the central planner?)
        2. Trigger it continuously from somewhere (central planner?)
  23. ETA for tasks. Using a persistent task history database, you could train a simple k-NN classifier to predict how long a task will run, then use this with the dependency graph to predict when any task will finish.
  24. More features in the central planner: kill a task, re-launch a task, launch a new task.
  25. Support for other languages. Luigi is written in Python, but the RPC is language agnostic.
  26. Happy plumbing!
  27. Questions?
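
The on-the-fly dependency idea from slide 21 can be illustrated without Luigi at all. The following is a minimal sketch, assuming a toy `Task` base class and an `execute` helper that are both hypothetical and not part of Luigi's API: a task's `run()` yields the task it needs, and the runner executes that dependency and sends its output back into the generator.

```python
import inspect


class Task:
    """Hypothetical minimal task: run() may yield other tasks it needs."""

    def run(self):
        raise NotImplementedError


class LoadNumbers(Task):
    def run(self):
        # A plain task with no dependencies.
        return [1, 2, 3]


class SumNumbers(Task):
    def run(self):
        # The dependency is discovered while running, not declared up front.
        numbers = yield LoadNumbers()
        return sum(numbers)


def execute(task):
    """Run a task, resolving every yielded dependency recursively."""
    result = task.run()
    if not inspect.isgenerator(result):
        # run() was an ordinary method and returned its value directly.
        return result
    try:
        dependency = next(result)
        while True:
            # Run the dependency, then resume the task with its output.
            dependency = result.send(execute(dependency))
    except StopIteration as stop:
        # The generator's return value is the task's output.
        return stop.value
```

Calling `execute(SumNumbers())` runs `LoadNumbers` first and then finishes `SumNumbers`. A real scheduler would also need to check whether a yielded task is already complete before running it, which this sketch leaves out.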

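The ETA idea from slide 23 can be sketched in a few lines. This is an illustration under stated assumptions, not a design from the slides: the history table, the single "input size" feature, and the function names are all made up, and it uses k-NN regression (averaging the nearest past run times) rather than a classifier, since run time is a continuous quantity. It also assumes every dependency can start as soon as its own dependencies finish, that is, there are always enough workers.

```python
from statistics import mean

# Hypothetical history rows: (input size in GB, observed run time in minutes).
HISTORY = {
    "A": [(1.0, 10.0), (2.0, 20.0), (4.0, 40.0)],
    "B": [(1.0, 5.0), (2.0, 7.0), (4.0, 9.0)],
    "C": [(1.0, 30.0), (2.0, 32.0), (4.0, 34.0)],
}

# Each task lists the tasks it depends on (Makefile style).
REQUIRES = {"A": [], "B": ["A"], "C": ["A", "B"]}


def predict_minutes(task, size_gb, k=2):
    """k-NN regression: average the run times of the k closest past runs."""
    nearest = sorted(HISTORY[task], key=lambda row: abs(row[0] - size_gb))[:k]
    return mean(minutes for _, minutes in nearest)


def eta_minutes(task, size_gb, cache=None):
    """Finish time = latest dependency finish time + own predicted run time."""
    cache = {} if cache is None else cache
    if task not in cache:
        start = max(
            (eta_minutes(dep, size_gb, cache) for dep in REQUIRES[task]),
            default=0.0,
        )
        cache[task] = start + predict_minutes(task, size_gb)
    return cache[task]
```

For a 2 GB input, `eta_minutes("C", 2.0)` adds the predicted run time of C to the finish time of its slowest dependency chain (A, then B). The cache keeps each task from being estimated more than once when the graph shares dependencies.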