Big Data Use Cases in the cloud
Peter Sirota, GM Elastic MapReduce
@petersirota
What is Big Data?
Computer generated data
• Application server logs (web sites, games)
• Sensor data (weather, water, smart grids)
• Images/videos (traffic, security cameras)
Human generated data
• Twitter “Firehose” (50 million tweets/day, 1,400% growth per year)
• Blogs/Reviews/Emails/Pictures
Social graphs
• Facebook, LinkedIn, contacts
Big Data is full of valuable, unanswered questions!
Why is Big Data Hard (and Getting Harder)?
Data Volume
• Unconstrained growth
• Current systems don’t scale
Data Structure
• Need to consolidate data from multiple data sources in multiple formats across multiple businesses
Changing Data Requirements
• Faster response time on fresher data
• Sampling is not good enough and history is important
• Increasing complexity of analytics
• Users demand inexpensive experimentation
We need tools built specifically for Big Data!
Innovation #1:
Apache Hadoop
The MapReduce computational paradigm
Open source, scalable, fault tolerant, distributed system
Hadoop lowers the cost of developing a distributed
system for data processing
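The MapReduce paradigm named above can be illustrated with a minimal single-process sketch. This is not Hadoop code: the log format, the `LOG_LINES` sample, and all function names are assumptions made for illustration. It only shows the three stages (map, group by key, reduce) that a framework such as Hadoop distributes across machines.

```python
from collections import defaultdict

# Hypothetical web server log lines in the form "<client> <path> <status>".
LOG_LINES = [
    "10.0.0.1 /home 200",
    "10.0.0.2 /cart 200",
    "10.0.0.1 /home 200",
    "10.0.0.3 /home 404",
]


def hits_mapper(line):
    """Map step: emit one (path, 1) pair per log line."""
    _client, path, _status = line.split()
    return [(path, 1)]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def sum_reducer(_key, values):
    """Reduce step: add up the counts collected for one key."""
    return sum(values)


def count_hits(lines):
    """Run map, shuffle, and reduce in a single process."""
    pairs = []
    for line in lines:
        pairs.extend(hits_mapper(line))
    groups = shuffle(pairs)
    return {key: sum_reducer(key, values) for key, values in groups.items()}


if __name__ == "__main__":
    print(count_hits(LOG_LINES))
```

Because the mapper sees one record at a time and the reducer sees one key at a time, both steps can in principle run on many nodes in parallel, which is the property the slide attributes to Hadoop.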
Innovation #2:
Amazon Elastic Compute Cloud (EC2)
“provides resizable compute capacity in the cloud.”
Amazon EC2 lowers the cost of operating a
distributed system for data processing
Amazon Elastic MapReduce =
Amazon EC2 + Hadoop
Elastic MapReduce applications
Targeted advertising / Clickstream analysis
Security: anti-virus, fraud detection, image recognition
Pattern matching / Recommendations
Data warehousing / BI
Bio-informatics (Genome analysis)
Financial simulation (Monte Carlo simulation)
File processing (resize jpegs, video encoding)
Web indexing
Clickstream Analysis –
Big Box Retailer came to Razorfish
• 3.5 billion records
• 71 million unique cookies
• 1.7 million targeted ads required per day
Problem: Improve Return on Ad Spend (ROAS)
Clickstream Analysis –
Targeted Ad
User recently purchased a sports movie and is searching for video games (1.7 million per day)
Clickstream Analysis –
Lots of experimentation but final design:
• 100-node on-demand Elastic MapReduce cluster running Hadoop
Clickstream Analysis –
Processing time dropped from 2+ days to 8 hours
(with lots more data)
Clickstream Analysis –
Increased Return On Ad Spend by 500%
World’s largest handmade marketplace
• 8.9 million items
• 1 billion page views per month
• $320MM 2010 GMS
• Easy to ‘backfill’ and run experiments: just boot up a cluster with 100, 500, or 1000 nodes
[Diagram labels: Production DB snapshots; Web event logs; ETL – Step 1; ETL – Step 2; Job]
Recommendations
The Taste Test http://www.etsy.com/tastetest
Recommendations
etsy.com/gifts
Gift Ideas for Facebook Friends
• Yelp generates close to 400GB of logs per day
Yelp
• Yelp does not have a physical MapReduce cluster
• Running 250 production clusters per week
• All of those run on Elastic MapReduce
MapReduce at Yelp
Features driven by MapReduce
• Analyze ad stats (reporting, billing, algorithm inputs)
• Analyze A/B test results
• Detect duplicate business listings
• Email bounce processing
• Identify bots based on traffic patterns
More MapReduce uses
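As a rough illustration of how one of these uses, detecting duplicate business listings, could be phrased as a map and a reduce step, here is a small sketch. It is not Yelp's implementation: the sample `LISTINGS`, the `normalize` rule, and the choice of (normalized name, ZIP code) as the grouping key are all assumptions made for this example.

```python
from collections import defaultdict
import re

# Hypothetical listings: (listing_id, business name, ZIP code).
LISTINGS = [
    (1, "Joe's Pizza", "94110"),
    (2, "Joes Pizza ", "94110"),
    (3, "Joe's Pizza", "10001"),
    (4, "Blue Bottle Coffee", "94103"),
]


def normalize(name):
    """Assumed normalization: lowercase and drop non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def mapper(listing):
    """Map step: key each listing by (normalized name, ZIP code)."""
    listing_id, name, zip_code = listing
    yield (normalize(name), zip_code), listing_id


def reducer(key, ids):
    """Reduce step: emit only keys shared by more than one listing."""
    ids = sorted(ids)
    if len(ids) > 1:
        yield key, ids


def find_duplicates(listings):
    """Run the map, group-by-key, and reduce steps in a single process."""
    groups = defaultdict(list)
    for listing in listings:
        for key, value in mapper(listing):
            groups[key].append(value)
    results = {}
    for key, ids in groups.items():
        for out_key, out_ids in reducer(key, ids):
            results[out_key] = out_ids
    return results


if __name__ == "__main__":
    print(find_duplicates(LISTINGS))
```

A production job would need a more forgiving matching rule than exact key equality; the sketch only shows the shape of the computation.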
9/23/2011 Amazon EMR Strata Justin Moore - @injust
Big Data @ foursquare
How do we use EMR?
• Map-Reduce
– Run algorithms on our entire dataset
– Streaming jobs, complex analyses
• Hive
– Business intelligence
– Exploratory analyses
– Infographics!
How big is our data?
• Global reach (North Pole, Space)
• Native app for almost every smartphone, SMS,
web, mobile-web
• 10M+ users, 15M+ venues, ~1B check-ins
• Terabytes of log data
Our Stack
Computing venue-to-venue similarity
• Spin up 40 node cluster
• Submit Ruby streaming job
– Invert User x Venue matrix
– Grab Co-occurrences
– Compute similarity
• Spin down cluster
• Load data to app server
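The three steps of the streaming job listed above (invert the user x venue matrix, grab co-occurrences, compute similarity) can be sketched in a few lines. The slide says the real job was written in Ruby and does not say which similarity measure was used, so this Python version, the `CHECKINS` sample, and the use of cosine similarity over binary visit vectors are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical check-in records: (user, venue).
CHECKINS = [
    ("ann", "cafe"), ("ann", "gym"),
    ("bob", "cafe"), ("bob", "gym"), ("bob", "bar"),
    ("cat", "cafe"), ("cat", "bar"),
]


def venues_by_user(checkins):
    """Step 1: group check-ins by user to get each user's set of venues."""
    by_user = defaultdict(set)
    for user, venue in checkins:
        by_user[user].add(venue)
    return by_user


def co_occurrences(by_user):
    """Step 2: count users per venue and users shared by each venue pair."""
    visitors = defaultdict(int)
    pairs = defaultdict(int)
    for venues in by_user.values():
        for venue in venues:
            visitors[venue] += 1
        for a, b in combinations(sorted(venues), 2):
            pairs[(a, b)] += 1
    return visitors, pairs


def venue_similarity(checkins):
    """Step 3: cosine similarity between binary visit vectors of two venues."""
    visitors, pairs = co_occurrences(venues_by_user(checkins))
    return {
        (a, b): shared / sqrt(visitors[a] * visitors[b])
        for (a, b), shared in pairs.items()
    }


if __name__ == "__main__":
    for pair, score in sorted(venue_similarity(CHECKINS).items()):
        print(pair, round(score, 3))
```

Keys are venue pairs in alphabetical order, so each pair appears once. In a cluster setting, step 1 and step 2 would each be a map and reduce pass keyed by user and by venue pair respectively.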
Who is checking in?
What are people doing?
Where are our users?
When do people go to a place?
Thursday Friday Saturday Sunday
Why are people checking in?
• Explore their city, discover new places
• Find friends, meet up
• Save with local deals
• Get insider tips on venues
• Personal analytics, diary
• Follow brands and celebrities
• Earn points, badges, gamification of life
• The list grows…
How can we leverage these insights?
Join us!
foursquare is hiring
www.foursquare.com/jobs
Justin Moore
@injust
justin@foursquare.com
http://aws.amazon.com/elasticmapreduce/
