Data Without Limit - Dr. Werner Vogels - AWS Summit 2012 Australia

Closing presentation from Dr. Werner Vogels at the AWS Summit in Sydney, May 2012

Published in: Technology, Business

  1. 1. Data without Limits Dr. Werner Vogels CTO, Amazon.com
  2. 2. http://wv.ly/4thpar
  3. 3. DATA Intensive Centric
  4. 4. BIG-DATA DATA Centric Intensive BIG-DATA
  5. 5. BIG-DATA When your data sets become so large that you have to start innovating how to collect, store, organize, analyze and share it
  6. 6. 3Vs
  7. 7. 3Vs: Volume, Velocity, Variety
  8. 8. BIG-DATA The collection and analysis of large amounts of data to create a competitive advantage
  9. 9. BIGGER IS BETTER
  10. 10. UNCERTAINTY
  11. 11. BIG-DATA REQUIRES NO LIMITS
  12. 12. COLLECT | STORE | ORGANIZE | ANALYZE |
  13. 13. COLLECT | STORE | ORGANIZE | ANALYZE |
  14. 14. Step 1: Tracking. We’ve created a unique tracking application. It keeps track of all websites visited, software used, and/or ads seen. Step 2: Panel. We invite members of a research panel to install it. We know not only their digital habits, but also their offline demographics and behavior. Step 3: Dashboard. Usage data now begins to pour into the Wakoopa dashboard in real-time. Log in, and create beautiful visualizations and useful reports.
  15. 15. Technology. Diagram labels: Panel, Activity Data, AWS (SQS, EMR, RDS, S3), Kamek*, Metrics, Wakoopa dashboard
  16. 16. Direct Connect
  17. 17. AWS IMPORT/EXPORT
  18. 18. COLLECT | STORE | ORGANIZE | ANALYZE |
  19. 19. Storage Muck
  20. 20. Database Muck
  21. 21. Amazon DynamoDB
  22. 22. COLLECT | STORE | ORGANIZE | ANALYZE |
  23. 23. DATA QUALITY
  24. 24. DATA QUALITY • Control Data
  25. 25. DATA QUALITY • Control Data • Correct Data
  26. 26. DATA QUALITY • Control Data • Correct Data • Validate Data
  27. 27. DATA QUALITY • Control Data • Correct Data • Validate Data • Enrich Data
  28. 28. A large provider of business listings (over 20MM in the US) needs to determine where each data element belongs and if it is valid. 1 MM new pieces of data are reviewed a day. Data Engine Processing • 2 exception cases: conflicting information; new information that requires validation. Exceptions are sent to Mechanical Turk • Workers validate new information through Web and Phone research • Workers remove duplicates. New Data is published • Data can be pushed out to the website for monetization.
  29. 29. COLLECT | STORE | ORGANIZE | ANALYZE |
  30. 30. Computational
  31. 31. HADOOP, AMAZON ELASTIC MAPREDUCE
  32. 32. Forrester Wave: Enterprise Hadoop Solutions, Q1 ’12. The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or
  33. 33. HP
  34. 34. COLLECT | STORE | ORGANIZE | ANALYZE |
  35. 35. http://aws.amazon.com/publicdatasets
  36. 36. Big Data Verticals. Media/Advertising: Targeted Advertising; Image and Video Processing. Oil & Gas: Seismic Analysis. Retail: Recommendations; Transaction Analysis. Life Sciences: Genome Analysis. Financial Services: Monte Carlo Simulations; Risk Analysis. Security: Anti-virus; Fraud Detection; Image Recognition. Social Network/Gaming: User Demographics; Usage analysis; In-game metrics.
  37. 37. COLLECT | STORE | ORGANIZE | ANALYZE |
  38. 38. werner@amazon.com
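The data-quality workflow on slide 28 (automated processing, with two exception cases routed to human workers before publication) could be sketched roughly as below. This is only an illustration under stated assumptions: the `Listing` record, the `route` and `process` functions, and the choice of a phone number as the compared field are all hypothetical, and the review queue is a plain list standing in for Mechanical Turk rather than a call to its API.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical record shape; the slides do not describe the real schema.
    business_id: str
    phone: str


def route(incoming: Listing, known: dict[str, Listing]) -> str:
    """Return 'publish', or the name of the exception case needing review."""
    existing = known.get(incoming.business_id)
    if existing is None:
        # Exception case: new information that requires validation.
        return "new_information"
    if existing.phone != incoming.phone:
        # Exception case: conflicting information.
        return "conflicting_information"
    return "publish"


def process(batch: list[Listing], known: dict[str, Listing]):
    """Split a batch into publishable records and a human-review queue."""
    published, review_queue = [], []
    for item in batch:
        decision = route(item, known)
        if decision == "publish":
            published.append(item)
        else:
            # Stand-in for handing the record to human workers.
            review_queue.append((decision, item))
    return published, review_queue
```

Only the exceptions reach human reviewers in this sketch, which mirrors the slide's split between engine processing and worker validation.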