Next generation sequencing in cloud computing era
Some discussion slides from a recent discussion session in Cambridge

Published in: Technology, Business
Transcript

  • 1. Files, Tools, and Bioinformatics in the Cloud. Thomas Keane, Vertebrate Resequencing Informatics, WTSI. thomas.keane@sanger.ac.uk. Vertebrate Resequencing Informatics, 17th November, 2009
  • 2. DATA is the problem! NGS means large volumes of raw data. Previously SRF (~8-10 bytes per bp), now BAM (~1.6 bytes per bp). How much data can a sequencing machine produce? 20 Gbp per lane, 16 lanes per run (1 run = 1.5 weeks) => 11 Tbp/year. Small sequencing center: 4 machines? 44 Tbp per year! Raw data in BAM: 70 Tbytes. Processed calls much smaller: 1000G pilot VCF < 1 Gbyte. (Figure labels: Alignment + BAM improvement; SV Calling: SVMerge.)
  • 3. Simplistic Model: Cloud as compute resource. Processes: 1. Align; variant calling (n x SNP callers, n indel callers, SV callers). Transfers between the Sequencing Center/Institute and the cloud: SRF/Fastq/BAM (2 Mbps); BAM + VCF (2 Mbps). 3,240 days to upload!
  • 4. Move the raw data generation to the compute. Variant calling (n x SNP callers, n indel callers, SV callers). Sequencing Center/Institute. (Figure labels: BAM, VCF.)
  • 5. Large Collaborative Projects: Cloud centric model. VCF. Analysis groups.
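
The figures on slides 2 and 3 can be checked with a few lines of arithmetic. The sketch below is not from the slides; it only reuses their round numbers. It assumes that "2 Mbps" means 2 megabits per second, that 1 Tbyte is 10^12 bytes, and that a machine runs back to back for 52 weeks a year. The function names are hypothetical, chosen for illustration.

```python
SECONDS_PER_DAY = 86_400


def tbp_per_machine_per_year(gbp_per_lane=20, lanes_per_run=16, weeks_per_run=1.5):
    """Yearly output of one sequencer in Tbp, assuming back-to-back runs for 52 weeks."""
    runs_per_year = 52 / weeks_per_run
    return gbp_per_lane * lanes_per_run * runs_per_year / 1000  # Gbp -> Tbp


def bam_tbytes(tbp, bytes_per_bp=1.6):
    """Storage in Tbytes for `tbp` terabases held as BAM at ~1.6 bytes per bp."""
    return tbp * bytes_per_bp


def upload_days(tbytes, link_bits_per_sec=2e6):
    """Days needed to push `tbytes` (decimal terabytes) through the assumed link."""
    bits = tbytes * 1e12 * 8
    return bits / link_bits_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    per_machine = round(tbp_per_machine_per_year())  # slides round to 11 Tbp/year
    centre = per_machine * 4                         # small center: 4 machines
    storage = bam_tbytes(centre)                     # slides round to 70 Tbytes
    print(f"per machine: {per_machine} Tbp/year")
    print(f"center:      {centre} Tbp/year")
    print(f"as BAM:      {storage:.1f} Tbytes")
    print(f"upload:      {upload_days(70):.0f} days at 2 Mbit/s")
```

Under these assumptions the numbers agree with the slides: about 11 Tbp per machine per year, 44 Tbp for four machines, roughly 70 Tbytes as BAM, and about 3,240 days to upload 70 Tbytes.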
