AWS Enterprise Day | Closing Keynote - Data Without Limits, Dr Werner Vogels
Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Data without Limits. Dr. Werner Vogels, CTO, Amazon.com
    • I. Science
    • Observations – Theory – Models – Facts
    • Human Genome Project: Collaborative project to sequence every single letter of the human genetic code. 13 years and $billions to complete. Gigabyte scale datasets (transferred between sites on iPods).
    • Beyond the Human Genome: 45+ species sequenced: mouse, rat, gorilla, rabbit, platypus, nematode, zebra fish... Compare genomes between species to identify biologically interesting areas of the genome. 100Gb scale datasets. Increased computational requirements.
    • The Next Generation: New sequencing instruments lead to a dramatic drop in cost and time required to sequence a genome. Sequence and compare genetic code of individuals to find areas of variation. Much more interesting. Terabyte scale datasets. Significant computational requirements.
    • The 1000 Genomes Project: Public/private consortium to build the world’s largest collection of human genetic variation. Hugely important dataset to drive new insight into known genetic traits, and the identification of new ones. Vast, complex data and computational resources required, beyond the reach of most research groups and hospitals.
    • 1000 Genomes in the Cloud: The 1000 Genomes data made available to all on AWS. Stored for free as part of the Public Datasets program. Updated regularly. 200Tb. 1700 individual genomes. As much compute and storage as required available to all.
    • II. Consumer
    • UNCERTAINTY
    • UNDERSTAND YOUR CUSTOMER
    • Who is my customer really? What do people really like? What is happening socially with my products? Where do people consume my product? How do people really use your product?
    • PERSONALIZE
    • 75% of users select movies based on recommendations
    • A/B TESTING
    • BIGGER IS BETTER
    • Wego
      • Search using Flexible dates AND/OR Locations and Themes
        – FROM Singapore TO Beach FOR A Weekend Trip (theme location + flexible date)
        – FROM Singapore TO Paris FOR A Whole-week Vacation (specific destination + flexible date)
        – FROM Singapore TO Sydney IN Next Two Months (specific destination + flexible date)
        – FROM Singapore TO Family-friendly Destination ON 30-Apr to 05-May (theme location + fixed dates)
      • Need for robust caching mechanism with millions of flight searches with 10 million+ different flight routes
      • Use the AWS cloud to rapidly spin up machines to scale to the requirements
      • AWS allows them to do this in a scalable and cost effective manner
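The caching need named on the Wego slide can be illustrated with a minimal time-to-live (TTL) cache. This is only a sketch and not Wego's implementation: the class name `FlightSearchCache`, the tuple-shaped search key, and the 300-second default TTL are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class FlightSearchCache:
    """Hypothetical in-memory TTL cache keyed by search parameters."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop stale results lazily on read
            return None
        return value

    def put(self, key: Hashable, value: Any) -> None:
        """Store a value that expires ttl_seconds from now."""
        self._entries[key] = (self._clock() + self._ttl, value)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return a fresh cached value, running the search only on a miss."""
        cached = self.get(key)
        if cached is not None:
            return cached
        value = compute()
        self.put(key, value)
        return value
```

A key such as `("SIN", "SYD", "next-two-months")` would let repeated flexible-date searches reuse one result until it expires. A production system at the scale described would more likely use a shared cache service than a per-process dictionary.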
    • Wego – Search
    • Dropcam is the biggest inbound video service on the Web
      • More data uploaded per minute than YouTube
      • Petabytes of data processed every month
      • Billions of motion events detected
    • III. Industrial
    • IV. Sports
    • V. Startups
    • Experiment – Measure – Iterate or Pivot
    • The only Asian company which made it to the CODE_n finalist list for CeBIT 2014
    • Platform Architecture (diagram labels): Archival (Glacier); Storage (S3); Crawl Cluster (EC2); File Server (EC2); Processing Cluster (EC2); Choice Engine Cluster (EC2); Data Partners; End user interaction/Front End; Integration Engine; Data Acquisition. Legend: On AWS; External to AWS.
    • Lenddo’s Journey
      • Process about 3.5TB of social data
      • Social data growing with more users
      • Started with a MongoDB cluster on CR1 instance types on AWS, spending 10K USD/month
      • Re-architected to move all their data to S3 and keep caches in smaller MongoDB and DynamoDB clusters. Use EMR to process data
      • Now spending 3K/month
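The two spending figures on the Lenddo slide imply a reduction that can be worked out directly. The sketch below assumes the "3K/month" figure is in USD like the "10K USD/month" figure; the function name `monthly_savings` is made up for illustration.

```python
from typing import Tuple


def monthly_savings(before_usd: float, after_usd: float) -> Tuple[float, float]:
    """Return (absolute saving per month, saving as a fraction of the old bill)."""
    saved = before_usd - after_usd
    return saved, saved / before_usd


# Figures from the slide; the currency of the second figure is an assumption.
saved, fraction = monthly_savings(10_000, 3_000)
print(f"Saved {saved} USD/month ({fraction:.0%} lower)")
```

Under that assumption the re-architecture cut the monthly bill by 7,000 USD, or 70%.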
    • VI. The Pipeline
    • The amount of information generated during the first day of a baby’s life today is equivalent to 70 times the information contained in the Library of Congress
    • MULTIPLE DOMAINS: Time, Properties, Locations, Sensors
    • COLLECT | STORE | ORGANIZE | ANALYZE | SHARE
    • VII. Real-time
    • What was happening yesterday?
    • What … right now?
      • … trades are executing
      • … is the exception rate
      • … is the ad click-through
      • … topics are trending
      • … inventory remains
      • … queries are slow
      • … are the high scores
    • Kinesis architecture (diagram labels)
      • Millions of sources producing 100s of terabytes per hour
      • Front End: Authentication, Authorization
      • Amazon Web Services (AZ, AZ, AZ): Durable, highly consistent storage replicates data across three data centers (availability zones)
      • Ordered stream of events supports multiple readers
      • Real-time dashboards and alarms
      • Machine learning algorithms or sliding window analytics
      • Aggregate and archive to S3
      • Aggregate analysis in Hadoop or a data warehouse
      • Inexpensive: $0.028 per million puts
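Two points from the Kinesis slide, "ordered stream of events supports multiple readers" and "$0.028 per million puts", can be illustrated with a toy in-memory model. This is a sketch of the idea only and does not use or imitate the Kinesis API: `OrderedStream`, `put`, `read`, and `put_cost_usd` are hypothetical names, and the price constant is simply the figure quoted on the slide.

```python
from typing import Any, Dict, List

PRICE_PER_MILLION_PUTS_USD = 0.028  # figure quoted on the slide


def put_cost_usd(num_puts: int) -> float:
    """Cost of a number of put operations at the slide's quoted price."""
    return num_puts / 1_000_000 * PRICE_PER_MILLION_PUTS_USD


class OrderedStream:
    """Toy append-only log in which each reader tracks its own position."""

    def __init__(self) -> None:
        self._events: List[Any] = []
        self._positions: Dict[str, int] = {}

    def put(self, event: Any) -> int:
        """Append an event and return its sequence number."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, reader: str, limit: int = 100) -> List[Any]:
        """Return the next batch for this reader, in the order events were put."""
        start = self._positions.get(reader, 0)
        batch = self._events[start:start + limit]
        self._positions[reader] = start + len(batch)
        return batch
```

Because each reader keeps an independent position, a dashboard and an archiver can consume the same events at different speeds without affecting each other. At the quoted price, one billion puts would cost about 28 USD.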
    • AWS Internal Metering Service
      • Flow: Clients Submitting Data; Capture Submissions; Process in Realtime; Store in Redshift
      • Workload
        – Tens of millions records/sec
        – Multiple TB per hour
        – 100,000s of sources
      • New features
        – Scale with the business
        – Provide real-time alerting
        – Inexpensive
        – Improved auditing
    • Internal AWS Data Warehouse
      • Workload
        – Daily load of billions of records from millions of files from hundreds of sources
        – 3 hour SLA to load and audit data
        – Hundreds of customers
        – Hundreds of queries per hour
      • New features
        – Our data is fresh, we ingest every 6 hours
        – Now processing triple the volume in less than 25% of the time
        – “Hammerstone” ETL solution: built on AWS Data Pipeline; build business specific marts; build workload specific clusters
        – Supports a variety of analytics tools: Tableau, R, Toad, SQL Developer, etc.
      • Diagram labels: Over 200 internal data sources; Data staged in Amazon S3; “Hammerstone”: Custom ETL using AWS Data Pipeline; Data processing Redshift cluster; Batch reporting Redshift cluster; Ad hoc query Redshift cluster
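The claim "triple the volume in less than 25% of the time" implies a throughput gain that is easy to work out. The sketch below is plain arithmetic on the slide's two figures; the function name `throughput_gain` is hypothetical.

```python
def throughput_gain(volume_factor: float, time_factor: float) -> float:
    """Ratio of new throughput to old, where throughput = volume / time."""
    return volume_factor / time_factor


# 3x the volume in 0.25x the time. The slide says "less than 25%",
# so this result is a lower bound on the real gain.
gain = throughput_gain(3.0, 0.25)

# Ingesting every 6 hours means this many loads per day.
loads_per_day = 24 // 6

print(f"Throughput gain: at least {gain:.0f}x; loads per day: {loads_per_day}")
```

So the new warehouse processes data at least 12 times faster than before, and loads four times a day.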
    • Big Science & Big Data Verticals
      • Media/Advertising: Targeted Advertising; Image and Video Processing
      • Oil & Gas: Seismic Analysis
      • Retail: Recommendations; Transaction Analysis
      • Life Sciences: Genome Analysis
      • Financial Services: Monte Carlo Simulations; Risk Analysis
      • Security: Anti-virus; Fraud Detection; Image Recognition
      • Social Network/Gaming: User Demographics; Usage analysis; In-game metrics
    • BIG-DATA REQUIRES NO LIMITS
    • Cloud enables big data collection
    • Cloud enables big data processing
    • Cloud enables big data collaboration
    • werner@amazon.com