Next generation sequencing in cloud computing era

Slides from a recent discussion session in Cambridge

Published in: Technology, Business

  1. Files, Tools, and Bioinformatics in the Cloud
     Thomas Keane, Vertebrate Resequencing Informatics, WTSI
     thomas.keane@sanger.ac.uk
     Vertebrate Resequencing Informatics, 17th November, 2009
  2. DATA is the problem!
     • NGS means large volumes of raw data
       • Previously SRF (~8-10 bytes per bp), now BAM (~1.6 bytes per bp)
     • How much data can a sequencing machine produce?
       • 20 Gbp per lane, 16 lanes per run (1 run = 1.5 weeks) => 11 Tbp/year
       • Small sequencing center: 4 machines?
       • 44 Tbp per year!
     • Raw data in BAM: 70 Tbytes
     • Alignment + BAM improvement
     • SV calling: SVMerge
     • Processed calls much smaller
       • 1000G pilot VCF < 1 Gbyte
  3. Simplistic Model: Cloud as compute resource
     • Sequencing Center/Institute
     • Transfers: SRF/Fastq/BAM (2 Mbps); BAM + VCF (2 Mbps)
     • Processes: align; variant calling (n x SNP callers, n indel callers, SV callers)
     • 3,240 days to upload!
  4. Move the raw data generation to the compute
     • Sequencing Center/Institute
     • Variant calling (n x SNP callers, n indel callers, SV callers)
     • Outputs: BAM, VCF
  5. Large Collaborative Projects: Cloud centric model
     • VCF
     • Analysis groups
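
The data-volume and upload-time figures on slides 2 and 3 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and is not from the slides: the function names are hypothetical, and it assumes a 52-week year, decimal units (1 Tbyte = 10^12 bytes), and a link that is fully used with no protocol overhead. It also reads the slide's "2Mbps/sec" as 2 megabits per second, which is the reading that matches the 3,240-day figure.

```python
# Back-of-envelope check of the slide figures (assumptions noted inline).

GBP_PER_LANE = 20          # from slide 2
LANES_PER_RUN = 16         # from slide 2
WEEKS_PER_RUN = 1.5        # from slide 2
MACHINES = 4               # "small sequencing center" on slide 2
BAM_BYTES_PER_BP = 1.6     # approximate BAM size per base, slide 2
LINK_BITS_PER_SEC = 2e6    # assumption: "2Mbps/sec" means 2 megabits/s
SECONDS_PER_DAY = 86_400


def yearly_bases_per_machine(weeks_per_year=52):
    """Bases sequenced per machine per year (assumes a 52-week year)."""
    runs_per_year = weeks_per_year / WEEKS_PER_RUN
    return GBP_PER_LANE * 1e9 * LANES_PER_RUN * runs_per_year


def upload_days(n_bytes, bits_per_sec=LINK_BITS_PER_SEC):
    """Days needed to push n_bytes over a saturated link, no overhead."""
    return n_bytes * 8 / bits_per_sec / SECONDS_PER_DAY


per_machine_tbp = yearly_bases_per_machine() / 1e12
center_tbp = per_machine_tbp * MACHINES
center_bam_tbytes = center_tbp * BAM_BYTES_PER_BP

print(f"Per machine: {per_machine_tbp:.1f} Tbp/year")
print(f"Center ({MACHINES} machines): {center_tbp:.1f} Tbp/year")
print(f"As BAM: {center_bam_tbytes:.1f} Tbytes/year")
# The slide rounds the BAM volume to 70 Tbytes before computing upload time.
print(f"Upload of 70 Tbytes at 2 Mbps: {upload_days(70e12):.0f} days")
```

Without the intermediate rounding the unrounded BAM volume is slightly above 70 Tbytes, so the upload estimate comes out a little higher than the slide's figure; either way it is on the order of nine years for a single year of output.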
