BDT101 Big Data with Amazon Elastic MapReduce - AWS re:Invent 2012


Big data technologies let you work with any velocity, volume, or variety of data in a highly productive environment. This session seeks to answer questions such as "what is big data," "how can I use unstructured data," and "how can I integrate data collections from different sources" using Hadoop with Amazon Elastic MapReduce. Join general manager of EMR, Peter Sirota, on a journey through real-world use cases of data-driven discovery.


  1. BIG-DATA: When your data sets become so large that you have to start innovating how to collect, store, analyze and share it
  2. 3Vs: Volume, Velocity, Variety
  3. BIG-DATA: The collection and analysis of large amounts of data creates competitive advantage
  5. Online Population, Mobile Phone, Machine Data
  6. 1 Trillion Objects!
  9. • Stream data to Amazon using Apache Flume • Amazon S3 • Amazon Elastic MapReduce
  11. Chart of data stores by Structure (low to high) and Size (small to large): S3, EMR, HDFS, HBase, DynamoDB, RDS, logs on app servers
  13. DynamoDB Table: Daily-Orders (NoSQL Table); On-Premise DB Table: Customer-Demographics (SQL Table); RDS Table: Targeting-Information
  14. DynamoDB Table: Daily-Orders (NoSQL Table); On-Premise DB Table: Customer-Demographics (SQL Table); S3://clickstream-data/ (Apache Logs); 3rd Party Data: Social Networking Information (accessed via web API); RDS Table: Targeting-Information
  15. S3 file: s3://weekly-trend-data/ (CSV Report); S3 file: s3://monthly-trend-data/ (CSV Report)
  16. AMAZON ELASTIC MAPREDUCE • Reduces complexity/cost of Hadoop Management • Integrates seamlessly with AWS Services • Leverages unmatched operational experience
  17. Hadoop on Elastic MapReduce lowers the cost of developing and operating a distributed system.
  18. Amazon EMR and Amazon S3
  19. Data consumed in multiple ways: Recommendation Engine, Ad-hoc Analysis, Personalization (diagram labels: S3, Prod Cluster (EMR), EMR)
  20. Diagram labels: S3, Prod Cluster (EMR), Query Cluster (EMR), EMR
  21. DynamoDB, S3
  22. EMR, DynamoDB, S3
  23. DynamoDB
  25. Big Data Use Cases
  26. Digital Advertising, Web Analytics, Log Processing, Data Warehousing
  27. Media/Advertising: Targeted Advertising, Image and Video Processing; Oil & Gas: Seismic Analysis; Retail: Recommendations, Transactions Analysis; Life Sciences: Genome Analysis; Financial Services: Monte Carlo Simulations, Risk Analysis; Security: Anti-virus, Fraud Detection, Image Recognition; Social Network/Gaming: User Demographics, Usage analysis, In-game metrics
  28. Who is VivaKi? ©2011. All rights reserved. VivaKi. Proprietary and Confidential.
  29. Big Data Challenge for VivaKi: Enablement, Activation, Attribution
  30. The Product Solution – Fluent from Razorfish: A digital marketing technology platform that provides marketers and agencies with a single, integrated software application to target, distribute, and manage multi-channel digital campaigns and experiences. • Marketing Central (Marketing Planning and Management, Team Collaboration and Workflow) • Experience Publishing (CMS / DMS, Multi-Channel and Multi-Device Distribution, Social Monitoring) • Targeting (Multi-Channel Aware Segmentation and Targeting) • Insights (Analytics and Reporting, including Attribution) • Data Warehouse (Data Sources - 1st and 3rd Party, Data Normalization + Transformation, Data Management) • Amazon Cloud Infrastructure
  31. VivaKi Technology Solution
  32. Example: Atlas Cookie Level Data. Diagram labels: Click Stream, Historical Click Stream Data, Feed, User Browsing Session, Ad Server Logs, Data Mining, Apply Segmentation & Categorization Algorithm, Customization, Customer Loyalty Data, Ad Serving System, Cross Selling System
  33. Example: Atlas Cookie Level Data, Operational Specifics (Traditional Data Center Solution vs. Amazon Cloud Solution)
      • Configuration: 30 Processing Servers (HP ProLiant DL-360), 3 SQL Servers (HP ProLiant DL-580), and 10TB SAN Storage vs. EMR Cluster of up to 1000 EC2 Instances and 200GB additional S3 storage per month
      • Processing: 2 to 30 hours vs. reliably 9 hours
      • Data Retention: 90 days vs. 18 months
      • System Cost: $5000/month vs. $10000/month
      • Personnel Cost: $15000/month vs. $5500/month
      • Business Impact:
          • no upfront investment in hardware
          • no hardware procurement delay
          • no additional operations staff was hired
          • We completed development and testing of our first client project in six weeks. Our process is completely automated.
          • our first client campaign experienced a 500% increase in their return on ad spend
  34. Better?
  35. Search Ads Restyled
  36. Etsy on Oprah, Search Ads Restyled, Hurricane Strikes, Justin Bieber Sneezes, New Cat Meme
  37. 5% / 95%
  38. Thank you!
  39. We are sincerely eager to hear your FEEDBACK on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
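
The monthly cost figures on slide 33 can be tallied with a short sketch like the one below. It assumes that the first value in each pair on the slide belongs to the traditional data center column and the second to the Amazon cloud column. The names `MONTHLY_COSTS`, `monthly_total`, and `monthly_savings` are hypothetical and used only for illustration; the totals are derived here and do not appear in the deck.

```python
# Monthly figures in USD as read from slide 33.
# Assumption: first value = traditional data center, second = Amazon cloud.
MONTHLY_COSTS = {
    "traditional": {"system": 5000, "personnel": 15000},
    "cloud": {"system": 10000, "personnel": 5500},
}


def monthly_total(solution: str) -> int:
    """Sum system and personnel cost for one solution."""
    return sum(MONTHLY_COSTS[solution].values())


def monthly_savings() -> int:
    """Difference between the traditional and cloud monthly totals."""
    return monthly_total("traditional") - monthly_total("cloud")


if __name__ == "__main__":
    for name in MONTHLY_COSTS:
        print(f"{name}: ${monthly_total(name)}/month")
    print(f"difference: ${monthly_savings()}/month")
```

Under that column assignment, the higher system cost of the cloud solution is more than offset by the lower personnel cost.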