Human Genome Project
Collaborative project to sequence every single letter
of the human genetic code.
13 years and billions of dollars to complete.
Gigabyte-scale datasets (transferred between sites on …)
Beyond the Human Genome
45+ species sequenced: mouse, rat, gorilla, rabbit,
platypus, nematode, zebrafish…
Compare genomes between species to identify
biologically interesting areas of the genome.
100 GB-scale datasets. Increased computational resources required.
The Next Generation
New sequencing instruments lead to a dramatic
drop in the cost and time required to sequence a genome.
Sequence and compare the genetic code of individuals to
find areas of variation. Much more interesting.
Terabyte-scale datasets. Significant computational resources required.
The 1000 Genomes Project
Public/private consortium to build the world's largest
collection of human genetic variation.
Hugely important dataset to drive new insight into
known genetic traits, and the identification of new ones.
Vast, complex data and computational resources required,
beyond the reach of most research groups and hospitals.
1000 Genomes in the Cloud
The 1000 Genomes data made available to all on AWS.
Stored for free as part of the Public Datasets program.
200 TB. 1,700 individual genomes. As much compute and
storage as required, available to all.
75% of users select
movies based on recommendations.
More than 27 million users
~30 million plays per day
More than 40 billion events per day
~4 million ratings per day
~3 million searches per day
Time of day and week (it can now verify that users watch more TV shows during
the week and more movies during the weekend)
Metadata from third parties such as Nielsen
Social media data from Facebook and Twitter
What trades are executing right now?
What is the exception rate right now?
What is the ad click-through right now?
What topics are trending right now?
What queries are slow right now?
What are the high scores right now?
Sr. Manager, Solutions Architecture,
Principal Solutions Architect,
Amazon Web Services
Durable, highly consistent storage replicates data
across three data centers (availability zones)
Archiving 100s of terabytes of events to S3 supports later analysis in Hadoop.
Inexpensive: $0.028 per million PUTs.
High-Level Architecture
Clients load webpages hosted on S3.
Clients PUT votes directly to the Kinesis stream.
Kinesis workers read records from the stream.
Persistence and long-term analysis in Redshift.
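The client-side PUT step above can be sketched with boto3. This is a minimal sketch, not the talk's actual code: the stream name `votes`, the vote payload shape, and the helper `make_vote_record` are all assumptions.

```python
import json
import uuid

def make_vote_record(sentiment, stream_name="votes"):
    """Package one vote as Kinesis PutRecord parameters.

    The `votes` stream name and {"sentiment", "vote_id"} payload
    are hypothetical, chosen only to illustrate the shape.
    """
    payload = {"sentiment": sentiment, "vote_id": str(uuid.uuid4())}
    return {
        "StreamName": stream_name,
        # Kinesis Data must be bytes; JSON keeps it self-describing.
        "Data": json.dumps(payload).encode("utf-8"),
        # A random partition key spreads votes evenly across shards.
        "PartitionKey": payload["vote_id"],
    }

# With boto3 installed and AWS credentials configured, a client would send:
#   import boto3
#   kinesis = boto3.client("kinesis")
#   kinesis.put_record(**make_vote_record("positive"))
```

Random partition keys matter here: with 25 shards, a skewed key (e.g. a constant) would funnel every vote through one shard and cap throughput at a single shard's limit.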
Real-time average of voting sentiment
Real-time totals of votes across sentiments
Real-time display of votes per second
Tallying and live visualization of data
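The tallying step can be sketched as a small fold over batches of Kinesis records. The record and vote shapes here are assumptions (JSON votes with a `sentiment` field), and the boto3 polling loop is shown only in comments since it needs live AWS credentials.

```python
import json
from collections import Counter

def tally(records, counts=None):
    """Fold one batch of Kinesis records into per-sentiment counts.

    Each record's Data field is assumed to be a JSON-encoded vote
    with a "sentiment" key; this mirrors the hypothetical producer.
    """
    counts = counts if counts is not None else Counter()
    for record in records:
        vote = json.loads(record["Data"])
        counts[vote["sentiment"]] += 1
    return counts

# A worker would drive this with the Kinesis API, roughly:
#   import boto3
#   kinesis = boto3.client("kinesis")
#   it = kinesis.get_shard_iterator(
#       StreamName="votes", ShardId=shard_id,
#       ShardIteratorType="LATEST")["ShardIterator"]
#   counts = Counter()
#   while True:
#       out = kinesis.get_records(ShardIterator=it)
#       tally(out["Records"], counts)
#       it = out["NextShardIterator"]
```

In the architecture above, each worker's running counts would be pushed to ElastiCache for the live dashboards and batched into Redshift for long-term analysis.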
Service              Pricing                                                      Total cost per hour
Kinesis stream       25 shards @ 1.5 cents per shard per hour                     $0.38
Kinesis messages     24 million PUTs (all of Australia) @ 2.8 cents per million   $0.68
Kinesis workers      2 x m3.large                                                 $0.40
Redshift workers     2 x m1.medium                                                $0.24
Redshift cluster     2 x dw1.xlarge (4 TB total)                                  $2.50
ElastiCache cluster  2 x cache.m3.xlarge                                          $1.02
Tallyroom app        Fully redundant deployment with ELB & 2 x m1.small           $0.15
Total                                                                             $5.37
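The hourly total implied by the pricing table can be checked with a quick sum (values copied straight from the table, rounded to cents):

```python
# Per-hour costs (USD) from the pricing table above.
costs = {
    "Kinesis stream (25 shards)": 0.38,
    "Kinesis messages (24M PUTs)": 0.68,
    "Kinesis workers (2 x m3.large)": 0.40,
    "Redshift workers (2 x m1.medium)": 0.24,
    "Redshift cluster (2 x dw1.xlarge)": 2.50,
    "ElastiCache cluster (2 x cache.m3.xlarge)": 1.02,
    "Tallyroom app (ELB + 2 x m1.small)": 0.15,
}
total = round(sum(costs.values()), 2)
print(f"Total cost per hour: ${total:.2f}")  # → $5.37
```

Notably, the Redshift cluster dominates at roughly half the hourly bill, while the Kinesis stream itself costs well under a dollar an hour.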
… cents per GB of storage. 0.44 cents per 10,000 requests for 24 million …