Pluggable Pipelines


A slideshow showing the development of a pluggable pipeline system

Published in: Technology, Business

Pluggable Pipelines

  1. Pluggable Pipelines. Andy Brown, New Pipeline Development, Wellcome Trust Sanger Institute
  2. Pluggable Pipelines. Problem: the pipeline you are working on becomes too large, and projects that would normally follow it don't need all of its components.
  3. Possible solution: make the components of the pipeline pluggable, so that they can be inserted, dropped or reordered depending on the project requirements.
  4. The Flag Waver. The flag waver contains a number of functions/methods that it can run in any order.
  5. It just needs to know any variables (e.g. a run folder), the functions it is to run and the order in which to run them.
  6. Each function knows nothing apart from what it is to do and what it should return (typically something that tells the next function/script when to run).
     [Diagram: the Flag Waver connected to Function a, Function b, Function c, Function d and Function e]
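
The flag waver idea can be sketched in a few lines of Python. This is only an illustration, not the presenter's code: the `FlagWaver` class, the `function_a`/`function_b`/`function_c` methods and the `run_folder_placeholder` value are all hypothetical names chosen for the example.

```python
class FlagWaver:
    """Hypothetical sketch: runs its own functions in whatever order it is given."""

    def __init__(self, run_folder):
        self.run_folder = run_folder  # the only variable the functions share
        self.log = []                 # names of the functions run, in order

    # Each function knows only its own job and what it should return.
    def function_a(self, previous):
        return f"a done in {self.run_folder}"

    def function_b(self, previous):
        return f"b done after [{previous}]"

    def function_c(self, previous):
        return f"c done after [{previous}]"

    def run(self, order):
        previous = None
        for name in order:
            function = getattr(self, name)  # look the function up by name
            previous = function(previous)   # hand on the previous return value
            self.log.append(name)
        return previous


# The caller, not the flag waver, decides the order.
waver = FlagWaver("run_folder_placeholder")
result = waver.run(["function_a", "function_c", "function_b"])
```

Because the order is just a list of names, dropping or reordering a step means editing that list rather than the flag waver itself.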
  7. Responsibilities. When the user runs the flag waver, they specify the order in which the functions will run.
  8. Functions are only responsible for launching, not checking.
  9. It is up to the launched component to complete fully and properly, not the function.
  10. It is also up to the user to select an order in which any component that depends on the output of another is queued after it.
  11. Operation. The flag waver is told to use a particular order.
  12. It also has the required parameters (e.g. a run folder).
  13. The flag waver calls each function in turn, passing in anything that should delay the script the function will call (e.g. LSF job ids for dependency).
  14. The function calls the object that will launch the component script or do some other work, giving it any required parameters. (If something needs to be looked up, the object should do this, not the flag waver or the function.)
  15. Operation continued. The object does its work, most likely submitting the script to LSF. The object holds all the additional static parameters that the component process or script needs.
  16. The object returns any information the next function will require, typically LSF job ids again.
  17. The function passes the returned information back out if it needs to.
  18. The flag waver moves on to the next function, or ends.
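
The three layers described in the Operation slides (flag waver, function, object) might look roughly like this. It is a sketch under assumptions: the `Component` class, `make_function`, the `step_one`/`step_two` names and the script file names are all hypothetical, and job ids are faked with a counter rather than obtained from LSF.

```python
import itertools

# Stand-in for the job ids a real scheduler would issue (assumption for the sketch).
_fake_job_ids = itertools.count(1001)


class Component:
    """The 'object': holds static parameters and launches the component script."""

    def __init__(self, script, n_jobs=1):
        self.script = script
        self.n_jobs = n_jobs

    def launch(self, run_folder, wait_for):
        # A real object would submit to LSF here; this sketch only fakes job ids.
        return [next(_fake_job_ids) for _ in range(self.n_jobs)]


def make_function(component):
    """The 'function': launches its object and passes the result out, nothing more."""
    def function(run_folder, wait_for):
        return component.launch(run_folder, wait_for)
    return function


FUNCTIONS = {
    "step_one": make_function(Component("step_one.sh", n_jobs=2)),
    "step_two": make_function(Component("step_two.sh")),
}


def flag_waver(order, run_folder):
    """Calls each function in turn, handing on whatever the previous one returned."""
    wait_for = []
    history = []
    for name in order:
        returned = FUNCTIONS[name](run_folder, wait_for)
        history.append((name, wait_for, returned))
        wait_for = returned
    return history


history = flag_waver(["step_one", "step_two"], "run_folder_placeholder")
```

Here `history` records, for each step, which job ids it was told to wait on and which it returned, so the hand-over between steps is visible without the flag waver knowing what any step does.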
  19. Example: Archival File Creation. The flag waver has a number of functions that can help with file creation.
  20. We choose archive_dir: we need a specific archive directory.
  21. create_srf: srf format files have been a standard.
  22. create_fastq: another format for data.
  23. create_md5: md5sum the files for data integrity.
  24. qc_fastq: run some qc checks on the fastq files.
  25. What happened? We gave the order in which we want the functions to be called, each being passed the LSF job ids from any previous step as an LSF dependency string.
  26. Each function loads and calls an object module, passing in the requirements.
  27. The first is archive_dir. As nothing comes before it, it just creates the directory and returns no LSF job ids.
  28. What happened? The second is create_srf. This is a long job that needs the compute power of the farm, so it submits 8 jobs and returns an array of job ids.
  29. These are converted into an LSF job dependency string and passed to the next function, create_fastq.
  30. This continues until the whole list has been processed, with LSF now queuing the different scripts to run at the appropriate time.
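
Converting an array of job ids into a dependency string could be done along these lines. The slides do not say which dependency condition was used; `done(...)` joined with `&&` is an assumption here, chosen because it is a form that `bsub -w` accepts. The helper names and the `create_fastq.sh` script name are hypothetical.

```python
def lsf_dependency(job_ids, condition="done"):
    """Build an LSF dependency expression such as 'done(1) && done(2)'."""
    return " && ".join(f"{condition}({job_id})" for job_id in job_ids)


def bsub_command(script, job_ids):
    """Return a bsub argument list, adding -w only when there is something to wait for."""
    command = ["bsub"]
    if job_ids:
        command += ["-w", lsf_dependency(job_ids)]
    return command + [script]


# archive_dir returned no job ids, so the next step has nothing to wait for.
first = bsub_command("create_srf.sh", [])

# create_srf returned several job ids; create_fastq waits for all of them.
second = bsub_command("create_fastq.sh", [101, 102, 103])
```

The argument lists are only built, not executed, so the sketch runs without an LSF installation.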
  31. Summary. This is one way of producing a pluggable pipeline; there are probably many more.
  32. Flag waver: code that blindly works through an array of functions in a user-defined order, passing only the minimum data to each.
  33. Functions know nothing except what they are responsible for calling.
  34. Responsibility falls to the user to ensure the correct order of use, and to the components to ensure that they complete properly (or error sensibly).