Big data - short intro on NGS challenges

A 10-minute introduction to big data issues in Next Generation Sequencing facilities.

Published in: Business, Technology
Transcript

  • 1. Big data big challenges
  • 2. Human Genome Project working draft - 13 years
  • 3. 1000 Genomes Project (www.1000genomes.org): a single genome in weeks
  • 4. Lots and lots of DNA: Roche, ABI, Illumina, Pacific Biosciences, Oxford Nanopore, Complete Genomics
  • 5. Estimated Data Outputs. Sanger Institute / BGI / You: 150TB/week / ~20TB/week / ?
  • 6. Estimated Computing Power. Sanger Institute / BGI / You: 80,000 CPUs / 5,000 CPUs / ?
    … ATTAGAGATAGGATCTCCCGTGTTGCCCAAGCTTGTCTCCAACTCCTGGGCTCGACGATCTTCCCTCCTCGGCCTCCCAAAATGCTGGGATTACAGGTACAAGCCATCACACCCCAGTGGGAGA GACCCCACTTGCTGCCACGTGACCATGGGCTGATGGTG…
  • 7. Too much data, not enough resources. And we don't scale very well.
  • 8. Data storage vs cost of resequencing
    • Cost as of today: $100 000
    • Data output: ~250GB
    • Monthly cost of a backup: ~$40
    • After three years: ~$1500
    • Cost as of today: $100 000
    • Data output * : ~20GB
    • Monthly cost of a backup: ~$3
    • After three years: ~$100
    • Cost of resequencing in three years: ~$500
    Small difference, but what if you scale that up?
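
The arithmetic behind slide 8 can be sketched in a few lines of Python. This is only a rough illustration, not the presenter's own calculation: it assumes a flat monthly backup price over 36 months, uses the slide's rounded figures, and introduces a hypothetical scale factor `n_datasets` to stand in for "scaling that up". The function names are made up for this example.

```python
# Back-of-the-envelope version of slide 8: keep the data, or resequence later?
# Figures are the slide's rounded estimates; flat monthly pricing is an assumption.

MONTHS = 36  # three years


def backup_cost(monthly_usd, months=MONTHS):
    """Cumulative backup cost in USD at a constant monthly price."""
    return monthly_usd * months


def compare(n_datasets):
    """Three-year totals in USD for n_datasets (hypothetical scale factor)."""
    return {
        "store_250GB": n_datasets * backup_cost(40),  # ~$40/month per data set
        "store_20GB": n_datasets * backup_cost(3),    # ~$3/month per data set
        "resequence": n_datasets * 500,               # ~$500 in three years
    }


if __name__ == "__main__":
    for n in (1, 1000):
        print(n, compare(n))
```

For a single data set this gives $1,440 and $108 for the two storage options, in line with the slide's ~$1500 and ~$100, against ~$500 for resequencing. At 1,000 data sets, keeping the full ~250GB output costs about $940,000 more than resequencing, which is the point of the closing question.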
