Qbi Centre for Brain genomics (Informatics side)

An overview of QBI’s production informatics framework with an emphasis on what service will be provided and how the resulting data is made available: from interactive quality control to integration with external data on the genome browser.

Published in: Technology, Education

    1. QBI’s Centre for Brain Genomics: the informatics side of things
        [Sprengben [why not get a friend]]
        September 8, 2011
    2. Objectives of QBI’s Centre for Brain Genomics
          • On-time delivery
          • Reliable data production
          • Convincing data
          • Easy delivery
        Perkel JM. Coding your way out of a problem. Nat Methods. 2011 Jun. PMID: 21716280.
    3. Bird’s-eye view of the facility’s workflow
    4. Detailed workflow
        Diagram labels: cBot; HiSeq; flowcell; CASAVA; raw sequence reads; projects; 30 different programs; cluster
    5. Overview of the production informatics framework
        Diagram labels: Automatic; Manual; Processing; Evaluation; Run/; Data/; MakeFastq.sh; trigger.sh armed; trigger.sh html; Unaligned/; bwa/, reCaAl/, variant/; Summary.html; //clusterstorage; Apache, IGV, R, UCSC; //cluster-vm
    6. Trigger.sh
          • Keeping data separate from scripts
          • Automating verification, quality control and summary HTML generation
          • Rerunning the pipeline from any point
    7. Flexible generic names: header

            # Programs
            BWA="/clusterdata/hiseq_apps/bin/$MODE/bwa"
            SAMTOOLS="/clusterdata/hiseq_apps/bin/$MODE/samtools"
            IGVTOOLS="/clusterdata/hiseq_apps/bin/$MODE/igvtools/IGVTools/igvtools.jar"
            # Task names
            TASKFASTQC="fastQC"
            TASKBWA="bwa"
            TASKRCA="reCalAln"
            # File abbreviations
            READONE="read1"
            READTWO="read2"
            FASTQ="fastq.gz"
            ALN="aln" # aligned
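The generic names in the header can be combined with bash pattern substitution to derive file names, as the mapping step on slide 12 does. The following is a minimal sketch under that assumption; the helper functions `mate_file`, `sai_name` and `mate_sai_name` and the `/data/` path are hypothetical and are not part of the original scripts.

```shell
#!/usr/bin/env bash
# File abbreviations as defined in the header slide.
READONE="read1"
READTWO="read2"
FASTQ="fastq.gz"

# Hypothetical helper: second-read file name for a given first-read file.
mate_file() {
    local f="$1"
    echo "${f/$READONE/$READTWO}"
}

# Hypothetical helper: .sai output name for a first-read FASTQ file.
sai_name() {
    local n
    n="$(basename "$1")"
    echo "${n/$FASTQ/sai}"
}

# Hypothetical helper: .sai output name for the matching second-read file.
mate_sai_name() {
    local n
    n="$(basename "$1")"
    echo "${n/$READONE.$FASTQ/$READTWO.sai}"
}

mate_file "s_1_read1.fastq.gz"            # prints s_1_read2.fastq.gz
sai_name "/data/s_1_read1.fastq.gz"       # prints s_1_read1.sai
mate_sai_name "/data/s_1_read1.fastq.gz"  # prints s_1_read2.sai
```

Because every script refers to `$READONE`, `$READTWO` and `$FASTQ` rather than literal strings, a change in naming convention would only need to be made in the header.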
    8. Config.txt

            #********************
            # Tasks
            #********************
            mappingBWA="1"
            recalibrateQualScore="1"
            #********************
            # Paths
            #********************
            FASTA="/clusterdata/resources/hg19/hg19.fasta"
            SEQREG="chr1:229994688-230071581"
            DBSNP="/clusterdata/resources/hg19/snpdb132.vcf"
            #********************
            # PARAMETER
            #********************
            LIBRARY="QBI"
            ADDPARAMBWA="--force single"

        Annotations on the slide:
          • Specifies what to do, e.g. mapping and recalibration
          • Specifies where to find resources
          • Customizes the standard scripts for this project
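Since Config.txt consists of plain shell variable assignments, a script could read it by sourcing the file and then checking which task switches are set to "1". The sketch below assumes that mechanism; the slides do not show how trigger.sh actually reads the config, and the function `enabled_tasks` and the temporary config file are hypothetical. The task names printed are the ones from the header slide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the task names that a config file switches on.
enabled_tasks() {
    (
        # Subshell, so the config's variables do not leak into the caller.
        . "$1"
        if [ "${mappingBWA:-0}" = "1" ]; then echo "bwa"; fi
        if [ "${recalibrateQualScore:-0}" = "1" ]; then echo "reCalAln"; fi
    )
}

# Stand-in for a project's config.txt, written to a temporary file.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
mappingBWA="1"
recalibrateQualScore="1"
FASTA="/clusterdata/resources/hg19/hg19.fasta"
LIBRARY="QBI"
EOF

enabled_tasks "$cfg"   # prints bwa, then reCalAln
rm -f "$cfg"
```

Keeping the switches in a per-project file means the standard scripts stay untouched between projects, which fits the deck's stated aim of keeping data separate from scripts.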
    9. Call

            trigger.sh config.txt armed
            trigger.sh config.txt html

        Files shown in the diagram:
          • s_1_read1.fastq, s_1_read2.fastq, s_2_read1.fastq, s_2_read2.fastq, s_3_read1.fastq, s_3_read2.fastq, s_4_read1.fastq, s_4_read2.fastq
          • s_1.bam, s_2.bam, s_3.bam, s_4.bam
          • s_1.ashrr.bam, s_2.ashrr.bam, s_3.ashrr.bam, s_4.ashrr.bam
          • Sub1_s_1.out, Sub1_s_2.out, Sub2_s_3.out, Sub2_s_4.out
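The two calls differ only in their last argument, so a trigger script could dispatch on that mode. The sketch below is an assumption about the structure, not the original trigger.sh: the function `trigger`, its messages and its return codes are hypothetical, and the reading of `armed` as "start processing" and `html` as "build Summary.html" is inferred from the framework overview slide.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher: trigger <config> <mode>
trigger() {
    local config="$1" mode="$2"
    if [ ! -f "$config" ]; then
        echo "config not found: $config" >&2
        return 1
    fi
    case "$mode" in
        armed)
            # A real script would submit the cluster jobs here.
            echo "armed: would submit jobs for $config"
            ;;
        html)
            # A real script would collect the logs and write Summary.html here.
            echo "html: would build Summary.html for $config"
            ;;
        *)
            echo "usage: trigger.sh <config> armed|html" >&2
            return 2
            ;;
    esac
}

# Demonstration with a temporary, empty config file.
cfg="$(mktemp)"
trigger "$cfg" armed
trigger "$cfg" html
rm -f "$cfg"
```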
    10. Summary.html: Project Cards
          • Sequence statistics
          • Run checkpoints
          • Data visualization
          • Mapping stats
          • Download
          • Interesting regions
    11. Scaffold of pbsScripts.sh: error catching
        Code example for setting up which errors to look out for:

            # QCVARIABLES, loosing reads, unmapped read,no such file,file not found,bwa.sh: line

        Output in Summary.html:

            >>>>>>>>>> Errors
            QC_PASS .. 0 have We are loosing reads/184
            QC_PASS .. 0 have for unmapped read/184
            QC_PASS .. 0 have no such file/184
            QC_PASS .. 0 have file not found/184
            QC_PASS .. 0 have bwa.sh: line/184
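The error report above could be produced by counting, for each comma-separated pattern, how many log files contain it. The sketch below assumes that approach: the function `check_errors`, the `*.out` log location and the `QC_FAIL` label are hypothetical (the slide only shows `QC_PASS` lines), and the patterns are passed as an argument rather than parsed from the `# QCVARIABLES` comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_errors <log dir> <comma-separated patterns>
# Prints one line per pattern: "<status> .. <logs matching> have <pattern>/<logs total>".
check_errors() {
    local dir="$1" patterns="$2"
    local total=0 count status f p
    for f in "$dir"/*.out; do
        if [ -e "$f" ]; then total=$((total + 1)); fi
    done
    local IFS=','
    for p in $patterns; do
        p="${p# }"   # drop one leading space after a comma
        # grep -l lists each matching file once; -F treats the pattern literally.
        count=$(( $(grep -l -F -- "$p" "$dir"/*.out 2>/dev/null | wc -l) ))
        # Assumed rule: an error pattern passes only if no log contains it.
        if [ "$count" -eq 0 ]; then status="QC_PASS"; else status="QC_FAIL"; fi
        echo "$status .. $count have $p/$total"
    done
}

# Demonstration with two made-up log files.
logs="$(mktemp -d)"
echo "cannot open input: no such file" > "$logs/Sub1_s_1.out"
echo "finished without problems" > "$logs/Sub1_s_2.out"
check_errors "$logs" "loosing reads, unmapped read,no such file"
rm -rf "$logs"
```

With the two demonstration logs, the first two patterns report 0 of 2 and pass, while "no such file" reports 1 of 2 and is flagged.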
    12. Scaffold of pbsScripts.sh: checkpoints

            qsub -by -jy [PBSOPTIONS] pbsScript.sh -k HISEQINF [PARAMETERS]

        Code example for setting up checkpoints in pbsScript.sh:

            echo "********* mapping"
            $BWA aln -t $THREADS $FASTA $f > $OUT/${n/$FASTQ/sai}
            $BWA aln -t $THREADS $FASTA ${f/$READONE/$READTWO} > $OUT/${n/$READONE.$FASTQ/$READTWO.sai}

        Output in Summary.html:

            >>>>>>>>>> CheckPoints
            QC_PASS .. 184 have mapping/184
            QC_PASS .. 184 have sorting and bam-conversion/184
            QC_PASS .. 184 have mark duplicates/184
            QC_PASS .. 184 have statistics/184
            QC_PASS .. 184 have coverage track/184
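The checkpoint report mirrors the error report: here a step passes when every log contains its marker. The sketch below assumes the markers are the `echo "********* <name>"` lines written to each job's log; the functions `checkpoint` and `checkpoint_status` and the `QC_FAIL` label are hypothetical. Because the marker is printed before the command runs, as on the slide, it shows that a step was reached, not that it finished.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the checkpoint marker, then run the step's command.
checkpoint() {
    local name="$1"
    shift
    echo "********* $name"
    "$@"
}

# Hypothetical helper: checkpoint_status <log dir> <checkpoint name>
checkpoint_status() {
    local dir="$1" name="$2"
    local total=0 count f
    for f in "$dir"/*.out; do
        if [ -e "$f" ]; then total=$((total + 1)); fi
    done
    # -F keeps the asterisks in the marker literal.
    count=$(( $(grep -l -F -- "********* $name" "$dir"/*.out 2>/dev/null | wc -l) ))
    # Assumed rule: a checkpoint passes only if every log contains its marker.
    if [ "$count" -eq "$total" ]; then
        echo "QC_PASS .. $count have $name/$total"
    else
        echo "QC_FAIL .. $count have $name/$total"
    fi
}

# Demonstration: the second job never reaches the statistics step.
logs="$(mktemp -d)"
{ checkpoint "mapping" true; checkpoint "statistics" true; } > "$logs/Sub1_s_1.out"
checkpoint "mapping" true > "$logs/Sub1_s_2.out"
checkpoint_status "$logs" "mapping"      # prints QC_PASS .. 2 have mapping/2
checkpoint_status "$logs" "statistics"   # prints QC_FAIL .. 1 have statistics/2
rm -rf "$logs"
```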
    13. Availability: tailored to skills
          1. Website
          2. RStudio
          3. Command line
    14. The big picture
        Covering all aspects of design*, set-up*, maintenance* and usage (*except cluster)
        Documentation: Project Server, //project
          • 5 TB raw data
          • 750 GB processed data
          • 57 GB external data
          • 7 project cards
          • 10 projects, 6 HiSeq runs
          • 40 wiki pages, 250 tasks, 551 h logged
          • 160 commits
          • 35 external programs
          • 41 custom scripts (4197 lines of code)
        Diagram labels: Application; Backup/Version Control; Data Warehousing; Statistical Analysis; HiSeq Output; RStudio; Raw Data; Quality Control; Project Cards; Processed Data; Rsync; Hypothesis Generation; Software; BWA, GATK, samtools, etc.; Custom Scripts; Version Control; Data; Processing and Analysis; External Genomic Resources; Cluster; Genomes, Annotation, etc.; Project Server; Content; Galaxy; Visualization; IGV; Genome Browser; //cluster-vm; //clusterstorage; //groupshare, //ethan
    15. Three things to remember
          • Reliable data production: all projects have a similar structure and are processed in the same way
          • Convincing data: all steps are tightly quality controlled and the QC report is accessible
          • Easy delivery: we tailored data availability to skill levels (webpage, RStudio, console)
          • On-time delivery: production informatics has priority on the cluster
    16. Next week
        NGS Discussion group: Methylation analysis
        Kevin Dudley and Danay Baker-Andresen