BDT101 Big Data with Amazon Elastic MapReduce - AWS re:Invent 2012
3. BIG-DATA
When your data sets become
so large that you have to start
innovating how to collect, store,
analyze and share them
5. BIG-DATA
The collection and
analysis of large amounts
of data creates
competitive advantage
13. • Stream data to Amazon using Apache Flume
• Amazon S3
• Amazon Elastic MapReduce
16. [Chart: storage options plotted by Structure (High to Low) against Size (Large to Small). Labels, roughly from large to small: S3, EMR HDFS, HBase, DynamoDB, RDS, logs on app servers.]
18. • DynamoDB Table: Daily-Orders (NoSQL Table)
• On-Premise DB Table: Customer-Demographics (SQL Table)
• RDS Table: Targeting-Information
19. • DynamoDB Table: Daily-Orders (NoSQL Table)
• On-Premise DB Table: Customer-Demographics (SQL Table)
• S3://clickstream-data/: Apache Logs
• 3rd Party Data: Social Networking Information (accessed via web API)
• RDS Table: Targeting-Information
24. Hadoop on Elastic MapReduce
lowers the cost of developing and
operating a distributed system.
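For readers new to the programming model that Hadoop runs, the following is a minimal single-process sketch of a map/reduce pass in plain Python. It counts requests per path in Apache-style log lines, similar in spirit to the Apache logs shown under S3://clickstream-data/ on an earlier slide. The sample log lines and all function names are hypothetical and are not taken from the presentation; a real job on Elastic MapReduce would be distributed by Hadoop across many nodes rather than run in one process.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample lines in Apache common log format (not from the slides).
SAMPLE_LOG = [
    '10.0.0.1 - - [27/Nov/2012:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [27/Nov/2012:10:00:01 +0000] "GET /cart HTTP/1.1" 200 512',
    '10.0.0.1 - - [27/Nov/2012:10:00:02 +0000] "GET /index.html HTTP/1.1" 304 0',
]


def map_line(line):
    """Map step: emit (path, 1) for each well-formed request line."""
    try:
        request = line.split('"')[1]  # e.g. 'GET /index.html HTTP/1.1'
        path = request.split()[1]
    except IndexError:
        return  # skip malformed lines instead of failing the whole job
    yield path, 1


def reduce_counts(path, counts):
    """Reduce step: sum all counts emitted for one path."""
    return path, sum(counts)


def run_job(lines):
    """Run map, then sort by key (the 'shuffle'), then reduce per key."""
    mapped = [pair for line in lines for pair in map_line(line)]
    mapped.sort(key=itemgetter(0))
    results = {}
    for path, group in groupby(mapped, key=itemgetter(0)):
        key, total = reduce_counts(path, (count for _, count in group))
        results[key] = total
    return results


if __name__ == "__main__":
    for path, hits in sorted(run_job(SAMPLE_LOG).items()):
        print(path, hits)
```

The map and reduce functions are kept free of shared state, which is the property that lets a framework such as Hadoop run them on separate machines and merge the results.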
27. Data consumed in multiple ways
[Diagram: S3 and a production EMR cluster (Prod Cluster), with three consumers: Recommendation Engine, Ad-hoc Analysis, Personalization.]
28. [Diagram: S3 with a production EMR cluster (Prod Cluster), a Query Cluster (EMR), and several additional EMR clusters.]
38. • Social Network/Gaming: User Demographics, Usage Analysis, In-game Metrics
• Media/Advertising: Targeted Advertising, Image and Video Processing
• Oil & Gas: Seismic Analysis
• Retail: Recommendations, Transactions Analysis
• Life Sciences: Genome Analysis
• Financial Services: Monte Carlo Simulations, Risk Analysis
• Security: Anti-virus, Fraud Detection, Image Recognition
41. Who is VivaKi?
©2011. All rights reserved. VivaKi. Proprietary and Confidential.
42. Big Data Challenge for VivaKi
Enablement Activation Attribution
43. The Product Solution – Fluent from Razorfish
A digital marketing technology platform that provides marketers and agencies with a single,
integrated software application to target, distribute, and manage multi-channel digital campaigns and
experiences.
• Marketing Central (Marketing Planning and Management, Team Collaboration and Workflow)
• Experience Publishing (CMS / DMS, Multi-Channel and Multi-Device Distribution, Social Monitoring)
• Targeting (Multi-Channel Aware Segmentation and Targeting)
• Insights (Analytics and Reporting, including Attribution)
• Data Warehouse (Data Sources - 1st and 3rd Party, Data Normalization + Transformation, Data Management)
• Amazon Cloud Infrastructure
45. Example: Atlas Cookie Level Data
[Diagram labels: Click Stream, Historical Click Stream, and Ad Server Logs (data feed); User Browsing Session; Data Mining; Apply Customization, Segmentation & Categorization Algorithm; Customer Loyalty Data; Ad Serving System; Cross Selling System.]
46. Example: Atlas Cookie Level Data
Operational Specifics
Item | Traditional Data Center Solution | Amazon Cloud Solution
Configuration | 30 Processing Servers (HP Proliant DL-360), 3 SQL Servers (HP Proliant DL-580), 10TB SAN Storage | EMR Cluster of up to 1000 EC2 Instances, 200GB additional S3 storage per month
Processing | 2 to 30 hours | reliably 9 hours
Data Retention | 90 days | 18 months
System Cost | $5000/month | $10000/month
Personnel Cost | $15000/month | $5500/month
Business Impact
• No upfront investment in hardware
• No hardware procurement delay
• No additional operations staff was hired
• We completed development and testing of our first client project in six weeks. Our process is completely automated.
• Our first client campaign experienced a 500% increase in its return on ad spend
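The cost rows in the operational comparison on this slide can be combined into a single monthly figure per solution. The sketch below does that arithmetic in Python. The dollar amounts are the slide's; adding system and personnel cost into one total, and the names used in the code, are assumptions made for illustration and are not something the presentation states.

```python
# Monthly costs in US dollars, as listed on the slide.
MONTHLY_COSTS = {
    "traditional": {"system": 5000, "personnel": 15000},
    "cloud": {"system": 10000, "personnel": 5500},
}


def total_monthly_cost(solution):
    """Sum system and personnel cost for one solution (an assumed reading)."""
    return sum(MONTHLY_COSTS[solution].values())


traditional_total = total_monthly_cost("traditional")  # 5000 + 15000
cloud_total = total_monthly_cost("cloud")              # 10000 + 5500
monthly_savings = traditional_total - cloud_total
savings_percent = 100 * monthly_savings / traditional_total

if __name__ == "__main__":
    print(f"Traditional: ${traditional_total}/month")
    print(f"Cloud:       ${cloud_total}/month")
    print(f"Savings:     ${monthly_savings}/month ({savings_percent:.1f}%)")
```

Under this reading, the higher system cost of the cloud setup is more than offset by the lower personnel cost.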
57. [Chart annotations: Etsy on Oprah; Search Ads Restyled; Hurricane Strikes; Justin Bieber Sneezes; New Cat Meme]
68. We are sincerely eager to
hear your FEEDBACK on this
presentation and on re:Invent.
Please fill out an evaluation
form when you have a
chance.