Big data - short intro on NGS challenges

A 10-minute introduction to big data issues in Next Generation Sequencing (NGS) facilities.

Published in: Business, Technology

  1. Big data, big challenges
  2. Human Genome Project working draft: 13 years
  3. 1000 Genomes Project (www.1000genomes.org): a single genome in weeks
  4. Lots and lots of DNA: Roche, ABI, Illumina, Pacific Biosciences, Oxford Nanopore, Complete Genomics
  5. Estimated data outputs: Sanger Institute 150 TB/week; BGI ~20 TB/week; you: ?
  6. Estimated computing power: Sanger Institute 80,000 CPUs; BGI 5,000 CPUs; you: ?
     … ATTAGAGATAGGATCTCCCGTGTTGCCCAAGCTTGTCTCCAACTCCTGGGCTCGACGATCTTCCCTCCTCGGCCTCCCAAAATGCTGGGATTACAGGTACAAGCCATCACACCCCAGTGGGAGA GACCCCACTTGCTGCCACGTGACCATGGGCTGATGGTG…
  7. Too much data, not enough resources. And we don’t scale very well.
  8. Data storage vs. cost of resequencing
     - Cost as of today: $100,000
     - Data output: ~250 GB
     - Monthly cost of a backup: ~$40
     - After three years: ~$1,500

     - Cost as of today: $100,000
     - Data output*: ~20 GB
     - Monthly cost of a backup: ~$3
     - After three years: ~$100
     - Cost of resequencing in three years: ~$500

     Small difference, but what if you scale that up?
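The storage-versus-resequencing arithmetic from slide 8 can be sketched in a few lines of Python. This is a minimal sketch under assumptions that are not on the slide: backup prices stay flat for all 36 months, you keep the ~20 GB output instead of the ~250 GB one, and, as a worst case, every sample is resequenced once at ~$500. The function names (`storage_cost`, `keep_raw`, `keep_small_and_resequence`) are hypothetical.

```python
"""Hypothetical cost model for slide 8: keep raw data vs. keep less and resequence.

Assumptions (not from the slides): flat monthly backup price over 36 months,
and each sample is resequenced exactly once (worst case).
"""

MONTHS = 36  # "after three years"


def storage_cost(monthly_cost_usd: float, months: int = MONTHS) -> float:
    """Total cost of keeping one backup at a flat monthly price."""
    return monthly_cost_usd * months


def keep_raw(samples: int, raw_monthly: float = 40.0) -> float:
    """Keep the ~250 GB output (~$40/month backup) for every sample."""
    return samples * storage_cost(raw_monthly)


def keep_small_and_resequence(samples: int, small_monthly: float = 3.0,
                              reseq_cost: float = 500.0) -> float:
    """Keep only the ~20 GB output (~$3/month) and resequence each sample once."""
    return samples * (storage_cost(small_monthly) + reseq_cost)


if __name__ == "__main__":
    # The per-sample gap is small; multiplying by the number of samples is what hurts.
    for n in (1, 100, 10_000):
        raw = keep_raw(n)
        alt = keep_small_and_resequence(n)
        print(f"{n:>6} samples: keep raw ${raw:,.0f}, "
              f"keep small + resequence ${alt:,.0f}, difference ${raw - alt:,.0f}")
```

For a single sample this gives $1,440 for keeping the raw backup versus $608 for the smaller backup plus one resequencing run (the slide rounds these storage totals to ~$1,500 and ~$100). Because both totals grow linearly with the number of samples, the per-sample gap of $832 becomes $83,200 at 100 samples.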
