Beyond the Fridge

The World of Connected Data
Dr. Werner Vogels
CTO, Amazon.com

  1. Beyond the Fridge: The World of Connected Data
     Dr. Werner Vogels, CTO, Amazon.com
  2. The amount of information generated during the first day of a baby’s life today is equivalent to 70 times the information contained in the Library of Congress.
  3. I. Science
  4. Observations – Theory – Models – Facts
  5. Human Genome Project
     Collaborative project to sequence every single letter of the human genetic code.
     13 years and $billions to complete.
     Gigabyte-scale datasets (transferred between sites on iPods!).
  6. Beyond the Human Genome
     45+ species sequenced: mouse, rat, gorilla, rabbit, platypus, nematode, zebra fish...
     Compare genomes between species to identify biologically interesting areas of the genome.
     100 GB scale datasets. Increased computational requirements.
  7. The Next Generation
     New sequencing instruments lead to a dramatic drop in the cost and time required to sequence a genome.
     Sequence and compare the genetic code of individuals to find areas of variation. Much more interesting.
     Terabyte-scale datasets. Significant computational requirements.
  8. The 1000 Genomes Project
     Public/private consortium to build the world’s largest collection of human genetic variation.
     Hugely important dataset to drive new insight into known genetic traits, and the identification of new ones.
     Vast, complex data and computational resources required, beyond the reach of most research groups and hospitals.
  9. 1000 Genomes in the Cloud
     The 1000 Genomes data made available to all on AWS.
     Stored for free as part of the Public Datasets program. Updated regularly.
     200 TB. 1700 individual genomes. As much compute and storage as required available to all.
  10. II. Consumer
  11. Dropcam is the biggest inbound video service on the Web
      • More data uploaded per minute than YouTube
      • Petabytes of data processed every month
      • Billions of motion events detected
  12. Lenddo’s Journey
      • Processes about 3.5 TB of social data
      • Social data growing with more users
      • Started with a MongoDB cluster on CR1 instance types on AWS, spending 10K USD/month
      • Re-architected to move all their data to S3 and keep caches in smaller MongoDB and DynamoDB clusters; uses EMR to process data
      • Now spending 3K USD/month
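The Lenddo re-architecture on slide 12 (bulk data in S3, caches in smaller MongoDB and DynamoDB clusters) resembles a read-through cache in front of a cheap bulk store. The sketch below is a minimal illustration of that pattern under stated assumptions: plain Python dictionaries stand in for S3 and for the cache cluster, and the class name `ReadThroughCache`, its capacity, and the LRU eviction policy are hypothetical choices, not details from the talk. It also works through the two monthly cost figures quoted on the slide.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Small LRU cache in front of a larger, slower bulk store.

    Hypothetical stand-ins: `bulk_store` plays the role of S3 and the
    OrderedDict plays the role of the smaller cache cluster.
    """

    def __init__(self, bulk_store, capacity=2):
        self.bulk_store = bulk_store
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.misses += 1
        value = self.bulk_store[key]  # fall through to the bulk store
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return value


def monthly_saving(before_usd, after_usd):
    """Return (absolute saving, fractional saving) between two monthly bills."""
    saving = before_usd - after_usd
    return saving, saving / before_usd


# Illustrative data only; keys and values are made up.
bulk = {"user:1": "profile-1", "user:2": "profile-2", "user:3": "profile-3"}
cache = ReadThroughCache(bulk, capacity=2)
for key in ["user:1", "user:1", "user:2", "user:3", "user:1"]:
    cache.get(key)

saving, fraction = monthly_saving(10_000, 3_000)  # figures from the slide
print(cache.hits, cache.misses)
print(saving, fraction)
```

The design point is that only the hot subset of the data needs to live in the more expensive database tier, while the full dataset sits in the cheaper store.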
  13. III. Retail
  14. UNCERTAINTY
  15. UNDERSTAND YOUR CUSTOMER
  16. Who is my customer really?
      What do people really like?
      What is happening socially with my products?
      Where do people consume my product?
      How do people really use my product?
  17. PERSONALIZE
  18. 75% of users select movies based on recommendations
  19. More than 27 million users
      ~30 million plays per day
      More than 40 billion events per day
      ~4 million ratings per day
      ~3 million searches per day
      Geo-location data
      Device information
      Time of day and week (it can now verify that users watch more TV shows during the week and more movies during the weekend)
      Metadata from third parties such as Nielsen
      Social media data from Facebook and Twitter
  20. BIGGER IS BETTER
  21. Wego
      • Search using flexible dates AND/OR locations and themes
        – FROM Singapore TO Beach FOR A Weekend Trip (theme location + flexible date)
        – FROM Singapore TO Paris FOR A Whole-week Vacation (specific destination + flexible date)
        – FROM Singapore TO Sydney IN Next Two Months (specific destination + flexible date)
        – FROM Singapore TO Family-friendly Destination ON 30-Apr to 05-May (theme location + fixed dates)
      • Need for a robust caching mechanism, with millions of flight searches across 10 million+ different flight routes
      • Use the AWS cloud to rapidly spin up machines to scale to the requirements
      • AWS allows them to do this in a scalable and cost-effective manner
  22. Wego – Search
  23. awsofa.info
  24. The only Asian company that made it to the CODE_n finalist list for CeBIT 2014
  25. Platform Architecture
      Diagram labels: Archival (Glacier); Storage (S3); Crawl Cluster (EC2); File Server (EC2); Processing Cluster (EC2); Choice Engine Cluster (EC2); Data Partners; End user interaction/Front End; Integration Engine; Data Acquisition
      Legend: On AWS; External to AWS
  26. IV. Industrial
  27. Access Materials Data and Models from Global Partners
      With Governance, Controllership, and Ownership
      CEED Collaborative Federated Environment
  28. V. Sports
  29. VI. Location
  30. VII. The Pipeline
  31-36. COLLECT | STORE | ORGANIZE | ANALYZE | SHARE
  37. VIII. Real-time
  38. What was happening yesterday?
  39. What ___ right now?
      trades are executing
      is the exception rate
      is the ad click-through
      topics are trending
      inventory remains
      queries are slow
      are the high scores
  40. Kinesis
  41. Kinesis architecture
      Amazon Web Services: AZ, AZ, AZ
      Durable, highly consistent storage replicates data across three data centers (availability zones)
      Aggregate and archive to S3
      Millions of sources producing 100s of terabytes per hour
      Front end: authentication, authorization
      Ordered stream of events supports multiple readers
      Real-time dashboards and alarms
      Machine learning algorithms or sliding window analytics
      Aggregate analysis in Hadoop or a data warehouse
      Inexpensive: $0.028 per million puts
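The properties listed on the Kinesis architecture slide (an ordered stream that supports multiple readers, sliding window analytics, and a price per million puts) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the Kinesis API: `OrderedStream`, `Reader`, `sliding_window_counts`, and `put_cost_usd` are hypothetical names, sequence numbers are simple list positions, and the only figure taken from the slide is $0.028 per million puts.

```python
from collections import deque

PRICE_PER_MILLION_PUTS_USD = 0.028  # figure quoted on the slide


class OrderedStream:
    """Append-only log; a record's sequence number is its list position."""

    def __init__(self):
        self.records = []

    def put(self, record):
        self.records.append(record)
        return len(self.records) - 1  # sequence number of the new record


class Reader:
    """Each reader keeps its own position, so readers do not interfere."""

    def __init__(self, stream):
        self.stream = stream
        self.position = 0

    def read(self, limit):
        batch = self.stream.records[self.position:self.position + limit]
        self.position += len(batch)
        return batch


def sliding_window_counts(timestamps, window_seconds):
    """For each event time, count events in the trailing window ending there."""
    window = deque()
    counts = []
    for t in timestamps:  # timestamps must be in non-decreasing order
        window.append(t)
        while window[0] <= t - window_seconds:
            window.popleft()  # drop events that fell out of the window
        counts.append(len(window))
    return counts


def put_cost_usd(num_puts):
    """Cost of a number of puts at the quoted per-million price."""
    return num_puts / 1_000_000 * PRICE_PER_MILLION_PUTS_USD


stream = OrderedStream()
for second in [0, 1, 2, 10, 11]:  # made-up event times
    stream.put({"t": second})

dashboard, archiver = Reader(stream), Reader(stream)
first = dashboard.read(2)      # dashboard advances two records
everything = archiver.read(5)  # archiver still sees the full ordered stream

counts = sliding_window_counts([r["t"] for r in everything], window_seconds=5)
print(counts)
print(put_cost_usd(1_000_000_000))  # cost of one billion puts
```

Because each reader tracks its own position, a real-time dashboard and an archiver can consume the same ordered events at different speeds, which is the behavior the slide attributes to the stream.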
  42. AWS Internal Metering Service
      Diagram labels: Capture Submissions; Process in Realtime; Store in Redshift; Clients Submitting Data
      Workload
      • Tens of millions of records/sec
      • Multiple TB per hour
      • 100,000s of sources
      New features
      • Scale with the business
      • Provide real-time alerting
      • Inexpensive
      • Improved auditing
  43. Internal AWS Data Warehouse
      Workload
      • Daily load of billions of records from millions of files from hundreds of sources
      • 3 hour SLA to load and audit data
      • Hundreds of customers
      • Hundreds of queries per hour
      New features
      • Our data is fresh, we ingest every 6 hours
      • Now processing triple the volume in less than 25% of the time
      • “Hammerstone” ETL solution
        – Built on AWS Data Pipeline
        – Build business specific marts
        – Build workload specific clusters
      • Supports a variety of analytics tools: Tableau, R, Toad, SQL Developer, etc.
      Diagram labels: Over 200 internal data sources; Data staged in Amazon S3; "Hammerstone": custom ETL using AWS Data Pipeline; Data processing Redshift cluster; Batch reporting Redshift cluster; Ad hoc query Redshift cluster
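The claim on slide 43, triple the volume in less than 25% of the time, implies a throughput gain of more than 12x, since throughput is volume divided by time. The short calculation below works that out. The function name `throughput_gain` is hypothetical, and the inputs are the two ratios from the slide, with 25% treated as an upper bound on the time ratio.

```python
def throughput_gain(volume_ratio, time_ratio):
    """Relative throughput = (new volume / old volume) / (new time / old time)."""
    if time_ratio <= 0:
        raise ValueError("time_ratio must be positive")
    return volume_ratio / time_ratio


# Triple the volume in at most 25% of the time: a lower bound on the gain.
gain = throughput_gain(volume_ratio=3.0, time_ratio=0.25)
print(gain)
```

Any time ratio below 0.25 yields a larger result, so the value computed here is a floor rather than an exact figure.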
  44. IX. Beyond the Display
  45. CONNECTED DATA REQUIRES NO LIMITS
  46. Cloud enables connected data collection
  47. Cloud enables connected data processing
  48. Cloud enables connected data collaboration
  49. werner@amazon.com
