Pluggable Pipelines
 

A slideshow showing the development of a pluggable pipeline system



    Pluggable Pipelines Presentation Transcript

    • Pluggable Pipelines. Andy Brown, New Pipeline Development, Wellcome Trust Sanger Institute
    • Pluggable Pipelines
      • Problem – The pipeline you are working on starts to become too large, and projects which would normally follow the pipeline don't need all of its components.
      • Possible solution – Make the components of the pipeline pluggable, so that they can be inserted, dropped or reordered depending on the project requirements.
    • The Flag Waver.
      • The flag waver contains a number of functions/methods that it can run in any order
      • It just needs to know any variables (e.g. a run folder), the functions it is to run, and the order to run them in
      • Each function knows nothing apart from what it is to do and what it should return (typically something which tells the next function/script when to run)
      [Diagram: Flag Waver dispatching to Function a, Function b, Function c, Function d, Function e]
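
    A minimal Python sketch of the flag waver idea described above, assuming a hypothetical FlagWaver class and a run-folder parameter; it only shows a registry of functions being run in whatever order the user supplies, with each function's return value handed to the next.

      # Hypothetical sketch: a flag waver that runs registered functions in any
      # user-supplied order, threading each function's return value into the next.
      class FlagWaver:
          def __init__(self, run_folder):
              self.run_folder = run_folder      # the only variable it needs to know
              self.functions = {}               # name -> callable registry

          def register(self, name, function):
              self.functions[name] = function

          def wave(self, order):
              """Run the named functions in the given order."""
              previous = None                   # whatever the last function returned,
              for name in order:                # e.g. job IDs or a dependency string
                  previous = self.functions[name](self.run_folder, previous)
              return previous
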
    • Responsibilities
      • When the user runs the flag waver, they input the order the functions will run in
      • Functions are only responsible for launching, not checking
      • It is up to the component which is launched to complete fully and properly – not the function!
      • It is also up to the user to select an order such that any component which depends on the output of another is queued after it
    • Operation
      • Flag waver is told to use a particular order
      • It also has the required parameters (e.g. a run folder)
      • The flag waver calls each function in turn, passing in anything that the script the function will call must wait for (e.g. LSF job IDs for dependencies)
      • The function calls the object which will launch the component script or perform an action, giving it any required parameters. (If anything needs to be looked up, the object should do this, not the flag waver or the function.)
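
    As a sketch of the function layer (names are made up): a function only knows which object to hand the work to and what to pass back; it does no checking and no lookups of its own.

      # Hypothetical sketch of one pluggable function: it launches, it does not check.
      def create_srf(run_folder, dependency_string):
          # The object does any lookups and the actual launching; the function just
          # forwards the parameters and returns whatever the object gives back.
          creator = SrfCreator(run_folder)          # launcher object, defined elsewhere
          return creator.launch(dependency_string)  # e.g. a list of LSF job IDs
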
    • Operation continued
      • The object does its work, most likely submitting the script to LSF. The object holds all the additional static parameters that the component process or script needs.
      • The object returns any information which the next function will require, again typically LSF job IDs.
      • The function passes the returned information back out if needed.
      • The flag waver moves on to the next function, or ends
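
    A sketch of the object layer described in the two slides above, assuming LSF's bsub command and its usual "Job <id> is submitted" output; the script path and job count are made-up static parameters.

      import re
      import subprocess

      class SrfCreator:
          """Hypothetical launcher object: holds the static parameters its script
          needs, submits it to LSF, and returns the resulting job IDs."""

          SCRIPT = "/software/pipeline/create_srf.pl"   # made-up script path
          JOBS = 8                                      # static parameter the object owns

          def __init__(self, run_folder):
              self.run_folder = run_folder

          def launch(self, dependency_string=None):
              job_ids = []
              for lane in range(1, self.JOBS + 1):
                  cmd = ["bsub"]
                  if dependency_string:                 # e.g. "done(101) && done(102)"
                      cmd += ["-w", dependency_string]
                  cmd += [self.SCRIPT, self.run_folder, str(lane)]
                  out = subprocess.run(cmd, capture_output=True, text=True).stdout
                  match = re.search(r"Job <(\d+)>", out)  # bsub prints "Job <12345> is submitted..."
                  if match:
                      job_ids.append(match.group(1))
              return job_ids                            # information the next function needs
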
    • Example: Archival File Creation
      • The flag waver has a number of functions which can help with archival file creation
      • We choose
        • archive_dir – we need a specific archive directory
        • create_srf – srf format files have been a standard
        • create_fastq – another format for data
        • create_md5 – md5sum the files for data integrity
        • qc_fastq – run some qc checks on the fastq files
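
    Using the sketches above, the chosen order for archival file creation could be wired up roughly like this; the function objects and run folder are illustrative only.

      # Illustrative only: register the components and run them in the chosen order.
      waver = FlagWaver(run_folder="/staging/run1234")

      for name, function in [("archive_dir", archive_dir),
                             ("create_srf", create_srf),
                             ("create_fastq", create_fastq),
                             ("create_md5", create_md5),
                             ("qc_fastq", qc_fastq)]:
          waver.register(name, function)

      waver.wave(["archive_dir", "create_srf", "create_fastq", "create_md5", "qc_fastq"])
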
    • What happened?
      • We have given the order we want the functions to be called in, each being passed the previous level's LSF job IDs as an LSF dependency string
      • The function loads and calls an object module, passing in the requirements.
      • The first is archive_dir. In this case, as nothing is before it, it just creates the directory, and returns no LSF job IDs.
    • What happened?
      • The second is create_srf. This is a long job which needs the compute power of the farm, so it submits 8 jobs and returns an array of job IDs.
      • These are converted into an LSF job dependency string and passed to the next function, create_fastq.
      • This continues until the whole list has been worked through, using LSF to queue the different scripts to run at the appropriate time.
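
    The conversion from an array of job IDs to an LSF dependency string could look like the helper below; the "done(...)" terms joined with "&&" follow standard bsub -w syntax, while the helper name itself is made up.

      def to_dependency_string(job_ids):
          """Turn ['101', '102', ...] into "done(101) && done(102) && ..." for bsub -w."""
          return " && ".join(f"done({job_id})" for job_id in job_ids) or None

      # e.g. the 8 create_srf job IDs become a single string that makes the
      # create_fastq jobs wait until every srf job has completed successfully.
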
    • Summary
      • This is one way of producing a pluggable pipeline; there are probably many more
      • Flag waver – code which blindly works through an array of functions in a user-defined order, launching each and passing only minimal data to it
      • Functions know nothing about anything except what they are responsible for calling
      • Responsibility falls to the user to ensure the correct order of use, and to the components to ensure that they complete properly (or fail sensibly).