AWS Summit Sydney 2014 | Closing Keynote - Dr Werner Vogels, VP & CTO,


  1. Beyond the Fridge: The World of Connected Data. Dr. Werner Vogels, CTO
  2. The amount of information generated during the first day of a baby’s life today is equivalent to 70 times the information contained in the Library of Congress.
  3. I. Science
  4. Observations – Theory – Models – Facts
  5. Human Genome Project: A collaborative project to sequence every single letter of the human genetic code. It took 13 years and billions of dollars to complete. Gigabyte-scale datasets (transferred between sites on iPods).
  6. Beyond the Human Genome: 45+ species sequenced, including mouse, rat, gorilla, rabbit, platypus, nematode, and zebrafish. Compare genomes between species to identify biologically interesting areas of the genome. 100 GB-scale datasets. Increased computational requirements.
  7. The Next Generation: New sequencing instruments lead to a dramatic drop in the cost and time required to sequence a genome. Sequence and compare the genetic code of individuals to find areas of variation. Much more interesting. Terabyte-scale datasets. Significant computational requirements.
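The idea of comparing individuals' genetic code to find areas of variation can be illustrated with a deliberately tiny sketch. It assumes each individual's sequence has already been aligned to a reference and has the same length; real variant calling works on huge numbers of short reads with quality scores and is far more involved. The function name and sample sequences are hypothetical.

```python
def find_variants(reference: str, individual: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, individual_base) for every mismatch.

    Hypothetical helper: assumes both sequences are pre-aligned and equal length.
    """
    if len(reference) != len(individual):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, individual))
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    # Made-up toy sequences, not real genomic data.
    reference = "ACGTACGTAC"
    individuals = {"sample_1": "ACGTACGAAC", "sample_2": "ACCTACGTAC"}
    for name, seq in individuals.items():
        print(name, find_variants(reference, seq))
```

Scaling this comparison from one pair of short strings to thousands of whole genomes is what turns it into the terabyte-scale computational problem the slides describe.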
  8. The 1000 Genomes Project: A public/private consortium to build the world’s largest collection of human genetic variation. A hugely important dataset to drive new insight into known genetic traits and the identification of new ones. The vast, complex data and computational resources required are beyond the reach of most research groups and hospitals.
  9. 1000 Genomes in the Cloud: The 1000 Genomes data is made available to all on AWS, stored for free as part of the Public Datasets program, and updated regularly. 200 TB; 1,700 individual genomes. As much compute and storage as required, available to all.
  10. II. Consumer
  11. Dropcam is the biggest inbound video service on the Web: more data uploaded per minute than YouTube; petabytes of data processed every month; billions of motion events detected.
  13. III. Retail
  14. UNCERTAINTY
  16. Who is my customer really? What do people really like? What is happening socially with my products? Where do people consume my product? How do people really use my product?
  17. PERSONALIZE
  18. 75% of users select movies based on recommendations.
  19. More than 27 million users; ~30 million plays per day; more than 40 billion events per day; ~4 million ratings per day; ~3 million searches per day; geo-location data; device information; time of day and week (it can now verify that users watch more TV shows during the week and more movies during the weekend); metadata from third parties such as Nielsen; social media data from Facebook and Twitter.
  21. IV. Industrial
  22. Access Materials Data and Models from Global Partners, with Governance, Controllership, and Ownership: CEED Collaborative Federated Environment
  23. V. Sports
  24. VI. Location
  25. VII. Multi-Sensor
  26. POWERFUL WEATHER INTELLIGENCE. Weather Forecasting 101: computer models (objective numerical forecasts), observations (what’s happening now?), and skilled people (expert interpretation, information synthesis, experience, tailored presentation) together produce the forecast.
  27. POWERFUL WEATHER INTELLIGENCE. Tropical cyclone guidance; aviation; volcanic ash; land, marine, and mountain warnings.
  28. POWERFUL WEATHER INTELLIGENCE. Amazon International Weather Sensing Network, fed through WebSocket connectors. Lightning: ~250 ms; radar: ~15 s; surface: ~5 s; MetService NZ forecasts: 5 min.
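Multi-sensor feeds like these arrive on very different timescales (the slide lists about 250 ms for lightning, 5 s for surface, and 15 s for radar). Assuming each connector already delivers its observations in timestamp order, one simple way to fuse them into a single timeline is a k-way merge. The sketch below uses Python's `heapq.merge`; the `Observation` type, feed names, and sample values are hypothetical and do not describe the actual MetService or AWS implementation.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Observation(NamedTuple):
    timestamp: float  # seconds since a shared reference time
    source: str       # hypothetical feed name, e.g. "lightning"
    value: str        # placeholder payload; a real feed would carry structured data


def merge_feeds(*feeds: Iterable[Observation]) -> Iterator[Observation]:
    """Merge feeds that are each already sorted by timestamp into one ordered stream."""
    return heapq.merge(*feeds, key=lambda obs: obs.timestamp)


if __name__ == "__main__":
    # Hypothetical sample feeds; the spacing loosely mirrors the timescales on the slide.
    lightning = [Observation(i * 0.25, "lightning", f"strike-{i}") for i in range(4)]
    surface = [Observation(i * 5.0, "surface", f"station-{i}") for i in range(2)]
    radar = [Observation(i * 15.0, "radar", f"scan-{i}") for i in range(2)]
    for obs in merge_feeds(lightning, surface, radar):
        print(f"{obs.timestamp:6.2f}s  {obs.source:<9}  {obs.value}")
```

`heapq.merge` is lazy, so it also works when the inputs are open-ended generators reading from live connections rather than finite lists.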
  29. VIII. The Pipeline
  30. MULTIPLE DOMAINS: Time, Properties, Locations, Sensors
  31. COLLECT | STORE | ORGANIZE | ANALYZE | SHARE
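The COLLECT | STORE | ORGANIZE | ANALYZE | SHARE stages, applied to data spanning time, properties, locations, and sensors, can be sketched as a chain of plain functions. This is an illustrative assumption rather than an AWS API: every function and field name is hypothetical, the in-memory list stands in for durable storage such as S3, and "analysis" is reduced to a per-location average.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Reading:
    time: int          # Time domain: seconds since some epoch
    location: str      # Location domain
    sensor: str        # Sensor domain
    properties: dict = field(default_factory=dict)  # Properties domain


def collect(raw_rows):
    """COLLECT: turn raw (time, location, sensor, temp_c) tuples into records."""
    return [Reading(t, loc, sensor, {"temp_c": temp}) for t, loc, sensor, temp in raw_rows]


def store(readings, archive):
    """STORE: append to an archive (a list standing in for durable storage such as S3)."""
    archive.extend(readings)
    return archive


def organize(archive):
    """ORGANIZE: index the records by location."""
    by_location = defaultdict(list)
    for reading in archive:
        by_location[reading.location].append(reading)
    return by_location


def analyze(by_location):
    """ANALYZE: mean temperature per location."""
    return {loc: mean(r.properties["temp_c"] for r in rs) for loc, rs in by_location.items()}


def share(results):
    """SHARE: publish a stable, sorted summary that other parties can consume."""
    return sorted(results.items())


if __name__ == "__main__":
    # Hypothetical sample readings.
    raw = [(0, "Sydney", "s1", 20.0), (60, "Sydney", "s2", 22.0), (0, "Auckland", "s3", 15.0)]
    print(share(analyze(organize(store(collect(raw), [])))))
```

Keeping each stage a separate function mirrors the slide's point that the stages are distinct concerns; in practice each would be backed by a different service and scaled independently.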
  37. IX. Real-time
  38. What was happening
  39. Right now: What trades are executing? What is the exception rate? What is the ad click-through? What topics are trending? What inventory remains? What queries are slow? What are the high scores?
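Questions such as "what topics are trending right now?" are usually answered by counting events over a sliding time window rather than over all history. The sketch below is a minimal in-memory version, assuming events arrive with non-decreasing timestamps; `SlidingWindowCounter` is a hypothetical helper, not part of Kinesis or any AWS SDK.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Hypothetical helper: counts keys seen in the window (now - window_seconds, now].

    Assumes events are added with non-decreasing timestamps.
    """

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, key) in arrival order
        self.counts = Counter()  # live count per key inside the window

    def add(self, timestamp: float, key: str) -> None:
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window, oldest first.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n: int, now: float) -> list[tuple[str, int]]:
        """The n most frequent keys in the window ending at `now`."""
        self._evict(now)
        return self.counts.most_common(n)


if __name__ == "__main__":
    trending = SlidingWindowCounter(window_seconds=60)
    # Hypothetical hashtag events: (seconds, topic).
    for ts, topic in [(0, "#aws"), (10, "#sydney"), (30, "#aws"),
                      (45, "#kinesis"), (50, "#kinesis"), (70, "#kinesis")]:
        trending.add(ts, topic)
    print(trending.top(2, now=70))
```

Each event is appended and evicted exactly once, so the amortized cost per event is constant, which is what makes this pattern suitable for continuously updated "right now" answers.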
  40. Kinesis: Glenn Gore, Sr. Manager, Solutions Architecture, AWS; Simon Elisha, Principal Solutions Architect, AWS
  41. Kinesis architecture: Millions of sources producing hundreds of terabytes per hour send data through a front end that handles authentication and authorization. Durable, highly consistent storage replicates the data across three data centers (Availability Zones). The ordered stream of events supports multiple readers, for example real-time dashboards and alarms, machine learning algorithms or sliding-window analytics, aggregation and archiving to S3, and aggregate analysis in Hadoop or a data warehouse. Inexpensive: $0.028 per million PUTs.
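The core idea of the architecture slide, many producers writing into an ordered, sharded stream that several readers consume independently, can be modelled in a few lines. Like Kinesis, this toy maps a partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and choosing the shard whose hash-key range contains it; the even split of the hash space across shards is an assumption. Everything else (`ToyStream`, in-memory lists, a single global sequence counter) is hypothetical and omits replication, retention, and the real Kinesis API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


class ToyStream:
    """Hypothetical in-memory model of a sharded, ordered stream (not the Kinesis API)."""

    def __init__(self, shard_count: int):
        self.shard_count = shard_count
        # Each shard is an append-only list of (sequence_number, partition_key, data).
        self.shards = [[] for _ in range(shard_count)]
        self.next_seq = 0

    def shard_for(self, partition_key: str) -> int:
        # Hash the key to a 128-bit integer, then pick the shard whose
        # equal-sized slice of the hash space contains it (assumed even split).
        h = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
        return h * self.shard_count // HASH_SPACE

    def put_record(self, partition_key: str, data: bytes) -> tuple[int, int]:
        """Append a record; records with the same key land in the same shard, in order."""
        shard = self.shard_for(partition_key)
        seq = self.next_seq
        self.next_seq += 1
        self.shards[shard].append((seq, partition_key, data))
        return shard, seq

    def read(self, shard: int, after_seq: int = -1):
        """Each reader tracks its own position, so many readers can consume one shard."""
        return [rec for rec in self.shards[shard] if rec[0] > after_seq]


if __name__ == "__main__":
    stream = ToyStream(shard_count=4)
    shard, first = stream.put_record("sensor-1", b"reading-a")
    stream.put_record("sensor-1", b"reading-b")
    dashboard_view = stream.read(shard)                 # reader 1 starts from the beginning
    archiver_view = stream.read(shard, after_seq=first)  # reader 2 has already seen one record
    print(shard, dashboard_view, archiver_view)
```

Because a partition key always hashes to the same shard, per-key ordering is preserved, while spreading many keys across shards is what lets throughput grow with shard count.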
  42. Voting demo high-level architecture: Mobile, tablet, and desktop clients load the Sentimentizer webpage hosted on S3 and use the AWS JavaScript SDK to PUT votes directly to the Kinesis stream. Consumers (a Kinesis Client Library Auto Scaling group and a Kinesis Redshift Connector Auto Scaling group) process records from the stream. Redshift provides persistence and long-term analysis, with JasperSoft analytics from AWS Marketplace. ElastiCache holds the live tally. The Tallyroom app (a Sinatra API on Elastic Beanstalk) handles tallying and live visualization of the data, serving S3-hosted webpages that use JavaScript and live calls to the API: Pulse (real-time average of voting sentiment), Tea Leaves (real-time totals of votes across sentiment), and Speedo (real-time display of votes per second).
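In the voting demo, consumer workers read vote records from the stream and maintain the live tally. The core of such a consumer can be sketched as a plain class that keeps per-sentiment totals (as Tea Leaves displays) and a running average sentiment (as Pulse displays). Assumptions, none of which come from the slides: each record is a JSON document with a `sentiment` field, the three-value sentiment scale is hypothetical, and a real Kinesis Client Library worker would also checkpoint its position and write results to ElastiCache, both omitted here.

```python
import json
from collections import Counter

# Hypothetical sentiment scale; the demo's actual categories are not given in the slides.
SENTIMENT_SCORES = {"negative": -1, "neutral": 0, "positive": 1}


class LiveTally:
    """Hypothetical consumer-side tally; checkpointing and cache writes are omitted."""

    def __init__(self):
        self.totals = Counter()  # votes per sentiment (cf. Tea Leaves)
        self.score_sum = 0
        self.vote_count = 0

    def process_records(self, records):
        """Consume a batch of JSON-encoded vote records pulled from the stream."""
        for raw in records:
            vote = json.loads(raw)
            sentiment = vote.get("sentiment")
            if sentiment not in SENTIMENT_SCORES:
                continue  # skip malformed votes rather than failing the whole batch
            self.totals[sentiment] += 1
            self.score_sum += SENTIMENT_SCORES[sentiment]
            self.vote_count += 1

    def average_sentiment(self) -> float:
        """Running mean score (cf. Pulse); 0.0 before any votes arrive."""
        return self.score_sum / self.vote_count if self.vote_count else 0.0


if __name__ == "__main__":
    tally = LiveTally()
    tally.process_records([b'{"sentiment": "positive"}', b'{"sentiment": "positive"}',
                           b'{"sentiment": "negative"}', b'{"sentiment": "bogus"}'])
    print(dict(tally.totals), tally.average_sentiment())
```

Keeping a running sum and count, rather than re-reading every vote, lets the average be updated in constant time per record as the stream flows.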
  43. Sentimentizer pricing (service: pricing = total cost per hour):
     - Kinesis stream: 25 shards @ 1.5 cents per shard per hour = $0.38
     - Kinesis messages: 24 million PUTs (all of Australia) @ 2.8 cents per million PUTs = $0.68
     - Kinesis workers: 2 x m3.large = $0.40
     - Redshift workers: 2 x m1.medium = $0.24
     - Redshift cluster: 2 x dw1.xlarge (4 TB total) = $2.50
     - ElastiCache cluster: 2 x cache.m3.xlarge = $1.02
     - Tallyroom app: fully redundant deployment with ELB & 2 x m1.small = $0.15
     - S3 websites (Sentimentizer, Pulse, Tea Leaves, Speedo): cents per GB of storage; 0.44 cents per 10,000 requests for 24 million requests = $10.56
     - TOTAL: $15.93
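The cost breakdown above can be checked directly. This sketch recomputes the three usage-based line items from the per-unit rates quoted on the slide and sums the listed hourly figures, using Python's `decimal` module to avoid binary floating-point rounding. The rates are the 2014 figures from the slide, not current AWS pricing. Recomputing gives $0.375 for the shards (listed as $0.38), $0.672 for the PUTs (listed as $0.68, apparently rounded up), and $10.56 for the S3 requests; the listed items sum to the slide's $15.93 total.

```python
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")

# Usage-based line items recomputed from the per-unit rates on the slide (dollars per hour).
kinesis_shards = 25 * Decimal("0.015")                           # 1.5 cents per shard-hour
kinesis_puts = 24 * Decimal("0.028")                             # 2.8 cents per million PUTs
s3_requests = Decimal(24_000_000) / 10_000 * Decimal("0.0044")   # 0.44 cents per 10,000 requests

# Hourly line items exactly as listed on the slide, in dollars.
SLIDE_LINE_ITEMS = {
    "Kinesis stream": Decimal("0.38"),
    "Kinesis messages": Decimal("0.68"),
    "Kinesis workers": Decimal("0.40"),
    "Redshift workers": Decimal("0.24"),
    "Redshift cluster": Decimal("2.50"),
    "ElastiCache cluster": Decimal("1.02"),
    "Tallyroom app": Decimal("0.15"),
    "S3 websites": Decimal("10.56"),
}


def to_cents(amount: Decimal) -> Decimal:
    """Round a dollar amount to whole cents, halves rounded up."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print("recomputed:", to_cents(kinesis_shards), to_cents(kinesis_puts), to_cents(s3_requests))
    print("slide total:", sum(SLIDE_LINE_ITEMS.values()))
```

The arithmetic also shows where the money goes: at these rates the S3 request charges, not Kinesis, dominate the hourly cost of the demo.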
  44. X. Beyond the Display
  46. Cloud enables connected data collection
  47. Cloud enables connected data processing
  48. Cloud enables connected data collaboration