BPIPE: a bioinformatics pipeline framework

Speaker: Mohamed Nadhir Djekidel (那弟尔)
2015/11/06

  1. BPIPE: BIOINFORMATICS PIPELINE FRAMEWORK
     Speaker: Mohamed Nadhir Djekidel (那弟尔)
     2015/11/06
  2. WHY WE NEED PIPELINES
     ➤ Bioinformatics analysis is generally a sequence of steps.
     ➤ Some analyses need a combination of tools (bowtie, samtools, etc.).
     ➤ Some tasks are repetitive, especially when there are many input files.
     ➤ If a program crashes midway, the script has to be edited before it can be rerun.
     ➤ Scripts are sometimes hard-coded and therefore not portable.
     ➤ …
  3. MOTIVATIONS BEHIND BPIPE
     ➤ A dedicated programming language for defining and executing bioinformatics pipelines
     ➤ Little programming skill is needed
     ➤ Simple definition of tasks
     ➤ Easy restart of a job from the point of failure
     ➤ Easy parallelism and job sequence management
     ➤ Integration with cluster resource managers (SGE, PBS, LSF)
     ➤ Modular development of reusable pipeline stages
     ➤ Automatic logging
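The "easy restart" point rests on an idea that can be illustrated outside Bpipe: before running a step, check whether its outputs already exist and are at least as new as its inputs, and skip the step if so. The Python sketch below is a make-style illustration of that idea under exactly that assumption; `is_up_to_date` and `run_step` are hypothetical names, not Bpipe's implementation.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List


def is_up_to_date(inputs: List[str], outputs: List[str]) -> bool:
    """True if every output exists and none is older than the newest input."""
    if not outputs or not all(os.path.exists(o) for o in outputs):
        return False
    newest_input = max((os.path.getmtime(i) for i in inputs), default=0.0)
    oldest_output = min(os.path.getmtime(o) for o in outputs)
    return oldest_output >= newest_input


def run_step(name: str, inputs: List[str], outputs: List[str],
             action: Callable[[], None], log: List[str]) -> None:
    """Run `action` only when its outputs are missing or stale (hypothetical helper)."""
    if is_up_to_date(inputs, outputs):
        log.append(f"skip {name}")
    else:
        action()
        log.append(f"run {name}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir, "reads.txt")
        dst = Path(workdir, "reads.upper.txt")
        src.write_text("acgt\n")
        log: List[str] = []

        def upper() -> None:
            dst.write_text(src.read_text().upper())

        # The first call does the work; the second finds the output current and
        # skips it, which is what lets a rerun continue after the last finished step.
        run_step("upper", [str(src)], [str(dst)], upper, log)
        run_step("upper", [str(src)], [str(dst)], upper, log)
        print(log)  # ['run upper', 'skip upper']
```

Comparing timestamps keeps the check cheap; a real tool might also record completed commands in a log, which this sketch does not attempt.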
  4. BPIPE’S ARCHITECTURE
     ➤ Bpipe language: based on Groovy, but plain shell scripting is generally fine
     ➤ Bpipe job management tool: Bash shell + Java
     ➤ Log management: creates a .bpipe folder
     ➤ Communication with resource managers: submitting jobs to the queue, etc.
  5. BASIC BPIPE STRUCTURES
     [Figure: two stages, stage_one and stage_two]
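In Bpipe, stages are joined into a pipeline with `+` inside a `run { }` block (for example `run { stage_one + stage_two }`). As a rough Python analogy, not Bpipe code, a hypothetical `Stage` class can overload `+` so that the outputs of the first stage become the inputs of the second:

```python
from typing import Callable, List


class Stage:
    """A named pipeline step: a function from a list of inputs to a list of outputs."""

    def __init__(self, name: str, fn: Callable[[List[str]], List[str]]):
        self.name = name
        self.fn = fn

    def __call__(self, inputs: List[str]) -> List[str]:
        return self.fn(inputs)

    def __add__(self, other: "Stage") -> "Stage":
        # Sequential composition: the outputs of self become the inputs of other.
        return Stage(f"{self.name} + {other.name}", lambda xs: other(self(xs)))


# Hypothetical stages that only rename files, standing in for real commands.
stage_one = Stage("stage_one", lambda xs: [x + ".one" for x in xs])
stage_two = Stage("stage_two", lambda xs: [x + ".two" for x in xs])

pipeline = stage_one + stage_two
print(pipeline.name)             # stage_one + stage_two
print(pipeline(["sample.txt"]))  # ['sample.txt.one.two']
```

Because `+` returns another `Stage`, longer chains such as `a + b + c` compose the same way.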
  6. CONVERT A SHELL SCRIPT TO BPIPE
     [Figure: the original Bash script next to the equivalent Bpipe script]
  7. DYNAMIC INPUT AND OUTPUT
     Use the variables $input and $output instead of hard-coded file names.
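The point of `$input` and `$output` is that commands refer to placeholders and the framework fills in concrete file names, with each stage's output becoming the next stage's input. A Python sketch of that substitution with the standard library's `string.Template` (which happens to use the same `$name` syntax) follows; `render_pipeline` is a hypothetical helper, and the rule of appending `.<stage name>` to the input name is an assumption of this sketch, not Bpipe's documented behavior.

```python
from string import Template
from typing import List, Tuple


def render_pipeline(first_input: str, stages: List[Tuple[str, str]]) -> List[str]:
    """Fill $input/$output in each command template and chain the stages.

    Each stage's $output becomes the next stage's $input, so no file name is
    hard-coded in the commands. The naming rule (append ".<stage>") is hypothetical.
    """
    commands = []
    current = first_input
    for stage_name, template in stages:
        output = f"{current}.{stage_name}"
        commands.append(Template(template).substitute(input=current, output=output))
        current = output
    return commands


stages = [
    ("sorted", "sort $input > $output"),
    ("uniq", "uniq -c $input > $output"),
]
for cmd in render_pipeline("reads.txt", stages):
    print(cmd)
# sort reads.txt > reads.txt.sorted
# uniq -c reads.txt.sorted > reads.txt.sorted.uniq
```

Only the first input name is supplied; changing it reruns the same commands on a different file without editing them.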
  8. PARALLEL TASKS
     Use square brackets [ ] to specify tasks that run in parallel.
     [Figure: example pipelines built from stages step1 to step5]
  9. PARALLEL TASKS (CONT.)
     [Figure: pipeline built from stages step1 to step6]
     step6 waits until both parallel branches have finished.
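The join behavior on this slide (a stage after a parallel block starts only once every branch has finished) can be sketched in Python with a thread pool: each branch runs in its own thread and the join blocks on every branch's `result()`. The stage names mirror the slide, but the branch layout used here (step4 in one branch, step3 then step5 in the other, joining at step6) is a hypothetical arrangement because the figure did not survive; `sequence`, `parallel` and `tag` are illustrative helpers, not Bpipe functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

StageFn = Callable[[List[str]], List[str]]


def sequence(*stages: StageFn) -> StageFn:
    """Run stages one after another, feeding each stage's outputs to the next."""
    def run(inputs: List[str]) -> List[str]:
        for stage in stages:
            inputs = stage(inputs)
        return inputs
    return run


def parallel(*branches: StageFn) -> StageFn:
    """Run branches concurrently on the same inputs and merge their outputs.

    result() blocks, so whatever follows runs only after every branch is done.
    """
    def run(inputs: List[str]) -> List[str]:
        with ThreadPoolExecutor(max_workers=len(branches)) as pool:
            futures = [pool.submit(branch, inputs) for branch in branches]
            merged: List[str] = []
            for future in futures:
                merged.extend(future.result())
        return merged
    return run


def tag(name: str) -> StageFn:
    """Hypothetical stage that just appends its name to each file name."""
    return lambda xs: [f"{x}.{name}" for x in xs]


pipeline = sequence(
    tag("step1"), tag("step2"),
    parallel(tag("step4"), sequence(tag("step3"), tag("step5"))),
    tag("step6"),
)
print(pipeline(["s.fq"]))
# ['s.fq.step1.step2.step4.step6', 's.fq.step1.step2.step3.step5.step6']
```

Outputs are merged in branch order rather than completion order, so the result is deterministic even though the branches run concurrently.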
  10. RUN ON A CLUSTER
      ➤ Create a bpipe.config file in your working directory.
      ➤ Select the SGE system and, optionally, specify its configuration.
  11. PIPELINE REPORT
      An index.html report file is generated in the doc folder.
  12. INPUT SPLIT
      ➤ Inputs can be grouped using regular expressions.
      ➤ * is used as a general wildcard, and it affects the ordering.
      ➤ % is used for splitting the inputs into groups.
      [Figure: example]
  13. INPUT SPLIT - EXAMPLES
      [Figure panels: Input; The script; Default parameters]
  14. INPUT SPLIT - EXAMPLES
      [Figure panels: Pass individual files; Order alphabetically; Group files]
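Assuming `%` marks the part of a file name that defines a group (one parallel branch per distinct value) and `*` matches anything, the grouping can be sketched in Python by translating the pattern into a regular expression. The file names and the `split_inputs` helper are hypothetical, only one `%` per pattern is handled, and within each group the files are sorted alphabetically, in the spirit of the "Order alphabetically" example. This models the idea only; it is not Bpipe's implementation.

```python
import re
from typing import Dict, List


def split_inputs(pattern: str, files: List[str]) -> Dict[str, List[str]]:
    """Group file names by the text matched by '%'; '*' matches anything.

    Files that do not match the pattern are ignored. With no '%' in the
    pattern, all matching files land in a single group keyed by "".
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append("(.+?)")   # the grouping key (shortest match)
        elif ch == "*":
            parts.append(".*")      # free wildcard
        else:
            parts.append(re.escape(ch))
    regex = re.compile("".join(parts))

    groups: Dict[str, List[str]] = {}
    for name in sorted(files):      # alphabetical order inside each group
        match = regex.fullmatch(name)
        if match is None:
            continue
        key = match.group(1) if match.groups() else ""
        groups.setdefault(key, []).append(name)
    return groups


files = ["sample2_R2.fastq", "sample1_R1.fastq", "notes.txt",
         "sample1_R2.fastq", "sample2_R1.fastq"]
for key, group in split_inputs("%_*.fastq", files).items():
    print(key, group)
# sample1 ['sample1_R1.fastq', 'sample1_R2.fastq']
# sample2 ['sample2_R1.fastq', 'sample2_R2.fastq']
```

Each printed group corresponds to what one parallel branch would receive, for example the paired read files of one sample.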
  15. CONTROLLING OUTPUT NAMING
      ➤ Filter: keeps the same extension and adds the filter name, e.g. file.csv → file.nocomments.csv
      ➤ Transform: changes the extension, e.g. file.csv → file.xml
      [Figure: file_fast.zip]
  16. CONTROLLING OUTPUT NAMING (CONT.)
      ➤ Produce: produces an output file with the specified name
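The three naming rules can be expressed as pure functions on file names. This Python sketch reproduces the examples from the slides (file.csv → file.nocomments.csv, file.csv → file.xml); the helper names are hypothetical, and because `os.path.splitext` treats only the last suffix as the extension, a compound extension such as `.fastq.gz` would be handled as `.gz` here.

```python
import os


def filter_name(path: str, filter_label: str) -> str:
    """Filter: keep the extension and insert the filter label before it."""
    stem, ext = os.path.splitext(path)
    return f"{stem}.{filter_label}{ext}"


def transform_name(path: str, new_ext: str) -> str:
    """Transform: replace the extension with a new one."""
    stem, _ = os.path.splitext(path)
    return f"{stem}.{new_ext}"


def produce_name(_path: str, name: str) -> str:
    """Produce: ignore the input name and use exactly the requested name."""
    return name


print(filter_name("file.csv", "nocomments"))    # file.nocomments.csv
print(transform_name("file.csv", "xml"))        # file.xml
print(produce_name("file.csv", "summary.txt"))  # summary.txt
```

Filter and transform derive the output from the input, so they keep working when the input name changes; produce fixes the name, which suits a single summary file.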
  17. RUNNING R CODE
  18. SOME COMMANDS
  19. ADDING INFORMATION TO THE SCRIPT
  20. USEFUL TUTORIALS
      ➤ Download Bpipe: https://github.com/ssadedin/bpipe
      ➤ Documentation: http://docs.bpipe.org/
      ➤ A complete workshop: https://github.com/tucano/bpipe_workshop
      ➤ Paper: http://bioinformatics.oxfordjournals.org/content/28/11/1525.full
  21. THANKS
