Slides from my talk given at the AWS Loft event in Stockholm, November 2018.
When genomic data is staged for analysis on Amazon S3, researchers have fast access to large volumes of data without needing to download and store their own copies. In this session, you will learn how a researcher at Sweden's SciLifeLab has made reference genome data available in the cloud as an AWS Public Dataset, and how this makes it easier for researchers to run large-scale genomic analyses using tools like Amazon EMR and AWS Batch.
29. The first human genome took 13 years to sequence
In 2016 at NGI Stockholm, we sequenced a human genome equivalent every 4.08 minutes
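To put the two figures above side by side, the arithmetic can be sketched roughly as follows. This is only a back-of-the-envelope comparison: it assumes a 365-day year and treats the 4.08-minute figure as a steady average rate, neither of which is stated in the talk, and the variable names are purely illustrative.

```python
# Back-of-the-envelope comparison of the two sequencing figures on the slide.
# Assumptions (not from the talk): a 365-day year and a steady average rate.

MINUTES_PER_YEAR = 365 * 24 * 60      # 525,600 minutes

first_genome_years = 13               # time taken for the first human genome
minutes_per_genome_2016 = 4.08        # NGI Stockholm, 2016 (genome equivalents)

# Genome equivalents produced in one year at the 2016 rate
genomes_per_year = MINUTES_PER_YEAR / minutes_per_genome_2016

# Ratio between 13 years and 4.08 minutes, both expressed in minutes
time_ratio = (first_genome_years * MINUTES_PER_YEAR) / minutes_per_genome_2016

print(f"Genome equivalents per year: {genomes_per_year:,.0f}")
print(f"13 years vs 4.08 minutes: about {time_ratio:,.0f}x")
```

Under these assumptions the 2016 rate works out to roughly 129,000 genome equivalents per year. The ratio compares the elapsed time for one project with an average throughput, so it is an illustration of scale rather than a like-for-like benchmark.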
Cost per Human Genome
~200TB data delivered to research groups
4k CPU cores, 64TB RAM, ~2PB disk
31. Human genome:
3.2ish billion DNA base pairs
..ACTGACTCTTAGCTATGGCTCTCTAGCTAGCTACGCTACTCGACTACGACTCGCTATCGCTAGCTATATATATTTCGATCGGCGCTATC..
TCTTAGCTATGGCTC CTCGCTATCGCTAGCTA
AGCTACGCTACTCGACTA
TTTCGATCGGCGC
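The picture above, with short fragments lined up under a longer sequence, can be mimicked with a naive exact-match search. This is a minimal sketch for illustration only: production aligners use indexed data structures and tolerate mismatches, and the function name `locate_reads` is hypothetical. The reference and read strings are copied from the slide.

```python
# Naive exact-match placement of short reads on a reference sequence.
# Illustration only: real aligners index the reference and allow mismatches.

reference = (
    "ACTGACTCTTAGCTATGGCTCTCTAGCTAGCTACGCTACTCGACTACGACTCGCTATCGCTAGCTATATATATTTCGATCGGCGCTATC"
)

reads = [
    "TCTTAGCTATGGCTC",
    "CTCGCTATCGCTAGCTA",
    "AGCTACGCTACTCGACTA",
    "TTTCGATCGGCGC",
]


def locate_reads(reference, reads):
    """Map each read to the 0-based offset of its first exact match (-1 if absent)."""
    return {read: reference.find(read) for read in reads}


positions = locate_reads(reference, reads)

# Print the reference, then each read indented so it sits under its match.
print(reference)
for read, pos in sorted(positions.items(), key=lambda item: item[1]):
    if pos >= 0:
        print(" " * pos + read)
    else:
        print(f"(no exact match for {read})")
```

`str.find` scans the reference once per read, which is fine for a toy example but would not scale to hundreds of millions of reads against a 3.2 billion base pair genome.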
DNA sequencing data:
50 DNA base pairs in a "read"
Between 10 and 350 million of them, per sample
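The numbers above imply a rough per-sample data volume, sketched below. The calculation assumes every read is exactly 50 base pairs and expresses the total as a multiple of a 3.2 billion base pair genome purely for scale; not every sample is a human genome, and the helper names are hypothetical.

```python
# Rough per-sample sequencing volume implied by the slide's numbers.
# Assumptions (not from the talk): all reads are exactly 50 bp, and the
# genome-multiple figure is simply total bases divided by genome length.

GENOME_LENGTH_BP = 3.2e9   # "3.2ish billion" DNA base pairs
READ_LENGTH_BP = 50        # base pairs per read


def sequenced_bases(n_reads, read_length=READ_LENGTH_BP):
    """Total number of bases across all reads in a sample."""
    return n_reads * read_length


def genome_multiple(n_reads, read_length=READ_LENGTH_BP,
                    genome_length=GENOME_LENGTH_BP):
    """Total sequenced bases expressed as a multiple of the genome length."""
    return sequenced_bases(n_reads, read_length) / genome_length


for n_reads in (10_000_000, 350_000_000):
    bases = sequenced_bases(n_reads)
    multiple = genome_multiple(n_reads)
    print(f"{n_reads:>11,} reads -> {bases / 1e9:.1f} Gbp, "
          f"{multiple:.2f}x a 3.2 Gbp genome")
```

At the low end this is 0.5 billion bases per sample and at the high end 17.5 billion, which gives a sense of why avoiding repeated downloads of shared reference data matters.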
55. ngisweden.scilifelab.se
opensource.scilifelab.se
NGI Stockholm
Maxime Garcia
Johannes Alneberg
Max Käller
Remi-Andre Olsen
Chuan Wang
Denis Moreno
Rickard Hammarén
Senthil Panneerselvam
QBiC Tübingen
Sven Fillinger
Alexander Peltzer
GIS Singapore
Andreas Wilm
Chih Chuan
CRG Barcelona
Paolo Di Tommaso
Phil Ewels
phil.ewels@scilifelab.se
ewels
tallphil
nf-co.re
ngisweden
AWS Cloud Credits for Research
sarek.scilifelab.se
Icons: flaticon.com