BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012

Big data technologies let you work with any velocity, volume, or variety of data in a highly productive environment. This session seeks to answer questions such as "what is big data," "how can I use unstructured data," and "how can I integrate data collections from different sources" using Hadoop with Amazon Elastic MapReduce. Join general manager of EMR, Peter Sirota, on a journey through real-world use cases of data-driven discovery.

Transcript

  • 1. BIG-DATA: When your data sets become so large that you have to start innovating how to collect, store, analyze and share it
  • 2. 3Vs: Volume | Velocity | Variety
  • 3. BIG-DATA: The collection and analysis of large amounts of data creates competitive advantage
  • 4. BIGGER IS BETTER
  • 5. Online Population | Mobile Phone | Machine Data
  • 6. 1 Trillion Objects!
  • 7. COLLECT | STORE | ANALYZE | SHARE
  • 8. COLLECT | STORE | ANALYZE | SHARE
  • 9. • Stream data to Amazon using Apache Flume • Amazon S3 • Amazon Elastic MapReduce
  • 10. COLLECT | STORE | ANALYZE | SHARE
  • 11. [Chart: data stores plotted by Structure (Low to High) and Size (Small to Large): S3, EMR, HDFS, HBase, DynamoDB, RDS, Logs on App servers]
  • 12. ANALYZE: ORGANIZE | CLEAN | ENRICH | CONDENSE
  • 13. DynamoDB Table: Daily-Orders (NoSQL Table) | On-Premise DB Table: Customer-Demographics (SQL Table) | RDS Table: Targeting-Information
  • 14. DynamoDB Table: Daily-Orders (NoSQL Table) | On-Premise DB Table: Customer-Demographics (SQL Table) | S3://clickstream-data/ (Apache Logs) | 3rd Party Data: Social Networking Information (Accessed via web API) | RDS Table: Targeting-Information
  • 15. S3 file: s3://weekly-trend-data/ (CSV Report) | S3 file: s3://monthly-trend-data/ (CSV Report)
  • 16. AMAZON ELASTIC MAPREDUCE: Reduces complexity/cost of Hadoop Management | Integrates seamlessly with AWS Services | Leverages unmatched operational experience
  • 17. Hadoop on Elastic MapReduce lowers the cost of developing and operating a distributed system.
  • 18. Amazon EMR and Amazon S3 S3
  • 19. Data consumed in multiple ways: Recommendation Engine | Ad-hoc Analysis | Personalization [Diagram: S3, Prod Cluster (EMR), EMR]
  • 20. [Diagram: S3, Prod Cluster (EMR), Query Cluster (EMR), and further EMR clusters]
  • 21. DynamoDB S3
  • 22. EMR | DynamoDB | S3
  • 23. DynamoDB
  • 24. ANALYZE | SHARE: VISUALIZE | EXPLORE | DECIDE
  • 25. Big Data Use Cases
  • 26. Digital Advertising Web Analytics Log Processing Data Warehousing
  • 27. Use cases by industry:
      Media/Advertising: Targeted Advertising | Image and Video Processing
      Oil & Gas: Seismic Analysis
      Retail: Recommendations | Transactions Analysis
      Life Sciences: Genome Analysis
      Financial Services: Monte Carlo Simulations | Risk Analysis
      Security: Anti-virus | Fraud Detection | Image Recognition
      Social Network/Gaming: User Demographics | Usage analysis | In-game metrics
  • 28. Who is VivaKi? ©2011. All rights reserved. VivaKi. Proprietary and Confidential.
  • 29. Big Data Challenge for VivaKi: Enablement | Activation | Attribution
  • 30. The Product Solution – Fluent from Razorfish: A digital marketing technology platform that provides marketers and agencies with a single, integrated software application to target, distribute, and manage multi-channel digital campaigns and experiences.
      Marketing Central (Marketing Planning and Management, Team Collaboration and Workflow)
      Experience Publishing (CMS / DMS, Multi-Channel and Multi-Device Distribution, Social Monitoring)
      Targeting (Multi-Channel Aware Segmentation and Targeting)
      Insights (Analytics and Reporting, including Attribution)
      Data Warehouse (Data Sources - 1st and 3rd Party, Data Normalization + Transformation, Data Management)
      Amazon Cloud Infrastructure
  • 31. VivaKi Technology Solution
  • 32. Example: Atlas Cookie Level Data [Diagram labels: Click Stream | Historical Click Stream Data | User Browsing Session | Ad Server Logs | Feed | Data Mining | Apply Customization Algorithm | Segmentation & Categorization | Customer Loyalty Data | Ad Serving System | Cross Selling System]
  • 33. Example: Atlas Cookie Level Data, Operational Specifics
      Item | Traditional Data Center Solution | Amazon Cloud Solution
      Configuration | 30 Processing Servers (HP Proliant DL-360), 3 SQL Servers (HP Proliant DL-580), 10TB SAN Storage | EMR Cluster of up to 1000 EC2 Instances, 200GB additional S3 storage per month
      Processing | 2 to 30 hours | reliably 9 hours
      Data Retention | 90 days | 18 months
      System Cost | $5000/month | $10000/month
      Personnel Cost | $15000/month | $5500/month
      Business Impact:
      - no upfront investment in hardware
      - no hardware procurement delay
      - no additional operations staff was hired
      - We completed development and testing of our first client project in six weeks. Our process is completely automated.
      - our first client campaign experienced a 500% increase in their return on ad spend
  • 34. Better?
  • 35. Search Ads Restyled
  • 36. Etsy on Oprah | Search Ads Restyled | Hurricane Strikes | Justin Bieber Sneezes | New Cat Meme
  • 37. 5% | 95%
  • 38. Thank you! aws.amazon.com/big-data
  • 39. We are sincerely eager to hear your FEEDBACK on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
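
As a rough check on the cost comparison in slide 33, the monthly figures can be totaled in a few lines of Python. This is only an illustrative sketch, not part of the presentation: it assumes the first dollar amount in each row belongs to the traditional data center and the second to the Amazon cloud solution (the order in which they appear in the transcript), and the names COSTS, monthly_total, and monthly_savings are made up for this example.

```python
# Monthly figures in USD as listed on slide 33.
# Assumption: in each row the first value is the traditional data center
# and the second is the Amazon cloud solution.
COSTS = {
    "traditional": {"system": 5000, "personnel": 15000},
    "cloud": {"system": 10000, "personnel": 5500},
}


def monthly_total(option: str) -> int:
    """Sum system and personnel cost for one option."""
    return sum(COSTS[option].values())


def monthly_savings() -> int:
    """Traditional total minus cloud total."""
    return monthly_total("traditional") - monthly_total("cloud")


def savings_percent() -> float:
    """Savings as a percentage of the traditional total."""
    return 100 * monthly_savings() / monthly_total("traditional")


if __name__ == "__main__":
    for option in COSTS:
        print(f"{option}: ${monthly_total(option)}/month")
    print(f"savings: ${monthly_savings()}/month ({savings_percent():.1f}%)")
```

Under that reading, the cloud system cost is higher but the lower personnel cost more than offsets it, so the combined monthly total comes out lower for the cloud solution.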