pipeline_structure_overview


Published in: Design

  1. NPG Pipeline Overview: analyse_RTA; PB_cal; post_qseq; post_qc_review; manual qc
  2. analyse_RTA: OLB (Bustard) to create qseq; demultiplex; CASAVA (Gerald) recalibration
  3. PB_cal: OLB (bcl2qseq) to create qseq; demultiplex; PB_cal lane recalibration; No recalibration
  4. post_qseq: Produce per lane fastq (qseq2fastq.pl); Produce per lane srf (illumina2srf & srf_index_hash); Split out nonconsented data; Split fastqs by multiplex tag; qX_yield; insert_size; adapter; sequence_error; gc_fraction
  5. post_qseq: Create run analysis schema information; contamination; bam file generation; md5 generation; bam_markduplicates
  6. post_qseq: gc_bias; bam indexing; Check cluster counts; Manual QC Stage
  7. post_qc_review: archive_to_sra; archive_to_irods; Upload fastqcheck; Upload auto_qc; Upload illumina analysis; Tidy up staging area
  8. Additional Notes
     ● Spider runs at the start, and Finish at the end, of all the pipelines.
     ● Spider caches web pages that are used throughout the pipeline and sets an environment variable so that all launched jobs can access them.
     ● Finish is very important: it ties off the log files and writes a JSON string of the processes launched, which is needed for the schema generation.
  9. Additional Notes
     ● Status changes have been left out, along with some file checking and the creation of tag-specific lane files, which occur at the start of the primary analysis pipelines. These steps do happen, but they are not directly responsible for the files and QC that you see.
     ● The production version of the primary pipeline launches a version of the secondary pipeline, which creates a Latest_Summary link to its archival files and QC.
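
The Spider and Finish notes describe a wrapper around every pipeline: set up a shared cache before any job starts, then record what was launched once everything is done. The Python sketch below is only an illustration of that pattern, not the NPG code. The environment variable name `PIPELINE_WEB_CACHE_DIR`, the function names, and the shape of the JSON record are all assumptions, since the slides do not specify them.

```python
import json
import os

# Hypothetical name: the slides do not say which environment variable Spider sets.
CACHE_ENV_VAR = "PIPELINE_WEB_CACHE_DIR"


def spider(cache_dir):
    """Prepare the web cache directory and advertise it through the environment."""
    os.makedirs(cache_dir, exist_ok=True)
    # Processes started later inherit os.environ, so they can find the cache.
    os.environ[CACHE_ENV_VAR] = cache_dir
    return cache_dir


def finish(launched):
    """Serialise the record of launched processes as a JSON string."""
    # The {"launched": [...]} layout is an assumption made for this sketch.
    return json.dumps({"launched": launched}, sort_keys=True)


def run_pipeline(stages, cache_dir):
    """Run Spider first, then every stage in order, then Finish."""
    spider(cache_dir)
    launched = []
    for name, job in stages:
        job()  # a real pipeline would submit a batch job here
        launched.append(name)
    return finish(launched)
```

Calling `run_pipeline([("post_qseq", some_callable)], "/tmp/cache")` would return a JSON string naming the stages that ran, which a later schema-generation step could parse with `json.loads`.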

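The Latest_Summary link mentioned in the notes can be pictured as a symbolic link that is re-pointed each time the secondary pipeline produces a new set of archival files. The sketch below assumes the link is a POSIX symlink inside the run directory. The function name and directory layout are hypothetical, and the slides do not say how the real pipeline creates the link.

```python
import os


def update_latest_summary(run_dir, archive_dir, link_name="Latest_Summary"):
    """Point run_dir/Latest_Summary at archive_dir, replacing any older link."""
    link_path = os.path.join(run_dir, link_name)
    tmp_path = link_path + ".tmp"
    # Build the new link under a temporary name, then rename it into place,
    # so readers never see a missing or half-updated link.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(archive_dir, tmp_path)
    os.replace(tmp_path, link_path)
    return link_path
```

Renaming a freshly made link over the old one, rather than deleting and recreating it, is a design choice for this sketch: a viewer following Latest_Summary while the pipeline runs always resolves to either the old or the new archive.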