Measure Twice, Build Once
#RTanalytics
Douglas Moore
Principal Consultant & Architect
May 2013
Dhruv Bansal
CSO
About Us
Next generation Big Data stack to power your applications
Data science and engineering services that accelerate time to value
Douglas Moore, Principal Consultant & Architect
Dhruv Bansal, CSO & Co-Founder
Agenda
▪ Think Big Use Case
▪ Infochimps Cloud: Streams, Queries, Batch
▪ Think Big & Infochimps Together
Measure Twice, Build Once
Understand the problem
Model the solution
Test locally
Grow the infrastructure
Poll
RESULTS: How advanced is your organization's approach to Big Data?
Very Advanced: 18%
Advanced: 19%
Not Advanced: 45%
Not Started: 18%
Accelerating Your Time to Value
IMAGINE: Strategy and Roadmap
ILLUMINATE: Training and Education
IMPLEMENT: Hands-On Data Science and Data Engineering
Leading Provider of Data Science and Engineering Services
The Beauty of Predictive Analytics
▪ Use Cases
- Scale batch analysis pipeline
- Generate lively stats
- Recommendations
- Better Predictions
• Number of page views in the next 30 days?
▪ Environment
- AWS
- Version 1 already in production
▪ Project Plan
- 8-9 weeks
- Combined Data Engineering + Data Science Engagement
- Staff
• 1 Architect + 1 PM
• 1 Data Engineer
• 2 Data Scientists
• 3 Client Engineers
Predictive Analytics Process
▪ Predictive Model Design & Build Process
- Listening & Learning
- Discovery (Digging through the data)
- Creating a Research Agenda
- Testing & Learning
▪ Production Quality Predictive Model Development
- Data Cleansing, Aggregations, Conditioning
- Predictive Model Training Process
- Predictive Model Execution Process
▪ Challenges:
- What functional forms predict future impression counts given counts up to time T?
- Using robust estimators, such as medians rather than means, to cope with outliers
- How do we distinguish between new articles and old articles we're seeing for the first time?
- How well do impression counts correspond to real humans?
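The robust-estimator challenge can be illustrated with a small sketch. The hourly counts are made-up numbers, not data from the engagement; the single large value stands in for an outlier such as a burst of bot traffic.

```python
from statistics import mean, median

# Hypothetical hourly impression counts for one article (illustrative only).
# The 5000 plays the role of an outlier, e.g. a burst of non-human traffic.
hourly_impressions = [120, 135, 110, 5000, 128, 117, 131]

typical_mean = mean(hourly_impressions)      # dragged upward by the outlier
typical_median = median(hourly_impressions)  # barely affected by the outlier

print(f"mean:   {typical_mean:.1f}")
print(f"median: {typical_median}")
```

The median stays close to the typical hour, while the mean is dominated by the one extreme value. That is the reason to prefer medians when summarizing noisy impression data.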
Why Real-time?
▪ Better end-user experience
- View an ad, see the counter move.
▪ Need to catch fast-moving events
- Content half-life measured at 3 hours (H Mason: http://bit.ly/nu7IDw)
- Path to additional real-time capabilities
- Example: Trend analysis to recommend ‘hot’ articles.
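To make the 3-hour half-life concrete, here is a minimal sketch. It assumes attention decays exponentially, which is one simple reading of "half-life" and not necessarily the model used in the cited measurement. The function names and the example count are hypothetical.

```python
HALF_LIFE_HOURS = 3.0  # from the slide: content half-life measured at 3 hours


def remaining_fraction(hours_since_publish, half_life=HALF_LIFE_HOURS):
    """Share of an item's eventual attention still to come (assumes exponential decay)."""
    return 0.5 ** (hours_since_publish / half_life)


def projected_total(count_so_far, hours_since_publish, half_life=HALF_LIFE_HOURS):
    """Scale the count observed so far up to a projected lifetime total."""
    seen_fraction = 1.0 - remaining_fraction(hours_since_publish, half_life)
    if seen_fraction <= 0.0:
        raise ValueError("need a positive elapsed time to project a total")
    return count_so_far / seen_fraction


for hours in (3, 6, 12):
    print(hours, remaining_fraction(hours))

# Hypothetical example: 750 impressions observed in the first 6 hours.
print(projected_total(750, 6))
```

Under this assumption only about 6% of an item's attention is left after 12 hours, so a batch job that runs once a day would miss most of the event.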
Overall Architecture
[Architecture diagram. Labels: Ad Serving, Ad Management, Ad Selling, LB, Edge, Management Server, Relational Store, Queue, Hadoop, NoSQL, Memcache (Tuple fail tracking), S3, DFS, Archive Logs, View Ad, Impression, Events, Cleansing, Model Training, Recommendations, Model Parameters, getPrediction, Performance Counters, Impression Buckets, Monitoring & Alerting (Metrics, Alarms, Notifications).]
Storm
- Queue Management
- Simple Bot Filtering
- Real-time Bucketization
- Performance Counters
- Event Logging
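The "Simple Bot Filtering" step could look roughly like the sketch below. The slides do not say which rules were used, so the user-agent marker list, the event fields, and the function names are all assumptions for illustration. It is written in plain Python, not against the Storm API.

```python
# Hypothetical marker list; a real filter would likely use richer signals.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_probable_bot(user_agent):
    """Flag events with an empty user agent or one containing a known marker."""
    ua = (user_agent or "").lower()
    return ua == "" or any(marker in ua for marker in BOT_MARKERS)


def filter_impressions(events):
    """Keep only the impression events that do not look like bot traffic."""
    return [e for e in events if not is_probable_bot(e.get("user_agent"))]


# Made-up impression events for illustration.
events = [
    {"ad_id": "a1", "user_agent": "Mozilla/5.0 (Windows NT 6.1)"},
    {"ad_id": "a1", "user_agent": "ExampleBot/2.1"},
    {"ad_id": "a2", "user_agent": ""},
    {"ad_id": "a2", "user_agent": "Mozilla/5.0 (Macintosh)"},
]

human_events = filter_impressions(events)
print(len(human_events))
```

A cheap check like this fits in the streaming path because it needs no state. Heavier bot detection can then run in batch.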
Analytics Architecture
[Architecture diagram. Labels: Storm, Web Server, Impression Spout, Simple Bot Annotator, Time Series BucketBolt, NoSQL Bolt, DFS Adapter, Time Series Buckets (Batch), Time Series Buckets (Realtime), Impression Prediction, Predictive Model Parameters, Impressions, Hadoop, Impression Bucketization, Predictive Model Training, Time.]
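Time-series bucketization, the job of the BucketBolt in the diagram, can be sketched as counting impressions per ad per fixed time window. The 5-minute bucket width, the tuple layout, and the function names are assumptions. This is plain Python, not Storm code.

```python
from collections import Counter

BUCKET_SECONDS = 300  # hypothetical bucket width: 5 minutes


def bucket_start(timestamp, width=BUCKET_SECONDS):
    """Round a Unix timestamp down to the start of its bucket."""
    return timestamp - (timestamp % width)


def bucketize(impressions, width=BUCKET_SECONDS):
    """Count impressions per (ad_id, bucket start) pair."""
    counts = Counter()
    for ad_id, timestamp in impressions:
        counts[(ad_id, bucket_start(timestamp, width))] += 1
    return counts


# Made-up (ad_id, timestamp) pairs for illustration.
impressions = [("a1", 1000), ("a1", 1100), ("a1", 1250), ("a2", 1010), ("a1", 1499)]
buckets = bucketize(impressions)

for key in sorted(buckets):
    print(key, buckets[key])
```

The same function can run over archived logs in batch and over the live stream, which keeps the two sets of buckets comparable.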
Solution Approach
Analyze Massive Historical Data Set
Analyze Recent Past
Realtime Prediction
Historical Data Set = S3
Analyze = Hadoop + Pig + R
Recent Past = Storm + NoSQL
Analyze = R + Web Service
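One way the batch and real-time halves might meet at prediction time is sketched below. Everything here is an assumption for illustration: the rule that batch counts win where both exist, the linear model form, and the parameter names. Only the name getPrediction comes from the architecture slide.

```python
def merge_buckets(batch, realtime):
    """Combine bucket counts, preferring batch values where both sources overlap.

    Assumption: batch counts are the cleansed version of the same buckets,
    so real-time counts only fill in buckets the batch run has not reached.
    """
    merged = dict(realtime)
    merged.update(batch)
    return merged


def get_prediction(merged, params):
    """Apply a hypothetical linear model to the total observed impressions."""
    observed = sum(merged.values())
    return params["intercept"] + params["slope"] * observed


# Made-up counts keyed by bucket start time.
batch_buckets = {900: 40, 1200: 55}
realtime_buckets = {1200: 52, 1500: 30}

# Made-up model parameters, standing in for the output of batch training.
model_params = {"intercept": 10.0, "slope": 4.0}

merged = merge_buckets(batch_buckets, realtime_buckets)
prediction = get_prediction(merged, model_params)
print(merged)
print(prediction)
```

The split keeps the expensive training in batch while the serving path only reads parameters and recent counts.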
Poll
RESULTS: Say you are building a Big Data project. In which time frame would you want to build a production solution?
Less than 30 days: 8%
Less than 90 days: 54%
More than 6 months: 38%
Any Data, Any Analytics, Any Cloud
Data Flow Architecture
Inside Cloud::Streams
Full Example
Traditional & Social Media Listening Platform
New Media Data Sources
- Twitter: Gnip Powertrack
- Facebook: Gnip EDC
- Blogs: Moreover Metabase
Traditional Media Data Sources
- TV: Transcription
- Radio: Transcription
- Print: Transcription
Poll
Which element of the Big Data stack is most important to you?
Hadoop: 36%
Queries: 35%
Real-time: 29%
Don’t Build it Yourself
55% of enterprise Big Data projects fail*
*According to a December 2012 survey of 300 IT organizations by SSWUG
Project Costs by Function
[Pie chart. Categories: Compute, Software, Operations Staff, Engineering Staff. Slice values: 5%, 9%, 9%, 77%.]
How Do We Compare to the Competition?
              Competition                                         Think Big & Infochimps
Speed         6+ months to value                                  30 days to value
Experience    New college grads; few successful implementations   Advanced Degrees & Published Authors
Quality       Offshore                                            Onshore, Managed Service
Proven        Learn on your dime                                  Blue Chip Customers
Methodology   Waterfall                                           Agile, test & learn
Questions?
Thank you for participating!
Let’s continue the conversation!
infochimps.com/demo
thinkbiganalytics.com/about/contact
ThomasParaiso2
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 

[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics

  • 1. Measure Twice, Build Once. #RTanalytics. Douglas Moore, Principal Consultant & Architect. Dhruv Bansal, CSO. May 2013.
  • 2. About Us: Next generation Big Data stack to power your applications. Data science and engineering services that accelerate time to value. Douglas Moore, Principal Consultant & Architect. Dhruv Bansal, CSO & Co-Founder.
  • 3. Agenda: Think Big Use Case; Infochimps Cloud: Streams, Queries, Batch; Think Big & Infochimps Together. Measure Twice, Build Once: Understand the problem; Model the solution; Test locally; Grow the infrastructure.
  • 5. Poll results: How advanced is your organization's approach to Big Data? Very Advanced 18%; Advanced 19%; Not Advanced 45%; Not Started 18%.
  • 6. Accelerating Your Time to Value. IMAGINE: Strategy and Roadmap. ILLUMINATE: Training and Education. IMPLEMENT: Hands-On Data Science and Data Engineering. Leading Provider of Data Science and Engineering Services.
  • 7. The Beauty of Predictive Analytics. Use Cases: Scale batch analysis pipeline; Generate lively stats; Recommendations; Better Predictions (#page views in next 30 days?). Environment: AWS; Version 1 already in production. Project Plan: 8-9 weeks; Combined Data Engineering + Data Science Engagement; Staff: 1 Arch + 1 PM, 1 Data Engineer, 2 Data Scientists, 3 Client Engineers.
  • 8. Predictive Analytics Process. Predictive Model Design & Build Process: Listening & Learning; Discovery (Digging through the data); Creating a Research Agenda; Testing & Learning. Production Quality Predictive Model Development: Data Cleansing, Aggregations, Conditioning; Predictive Model Training Process; Predictive Model Execution Process. Challenges: What functional forms predict future impression counts given counts up to time T? Robust estimators, like medians rather than means, to cope with outliers. How do we distinguish between new articles, versus old articles we're seeing for the first time? How well do impression counts correspond to real humans?
  • 9. Why Real-time? Better end-user experience: View an ad, see the counter move. Need to catch fast moving events: Content half life measured at 3 hours (H Mason: http://bit.ly/nu7IDw). Path to additional real-time capabilities. Example: Trend analysis to recommend ‘hot’ articles.
  • 10. Overall Architecture (diagram labels): NoSQL Memcache (Tuple fail tracking) Queue Hadoop Ad Serving LB Edge Edge Impression S3 S3 S3 DFS Archive Logs Management Server LB Edge Edge Relational Store Ad Management Ad Selling Storm - Queue Management - Simple Bot Filtering - Real-time Bucketization - Performance Counters - Event Logging View Ad Cleansing Model Training Recommendations Events Monitoring & Alerting (Metrics, Alarms, Notifications) Model Parameters getPrediction Performance Counters Impression Buckets
  • 11. Analytics Architecture (diagram labels): Storm Web Server Time Series BucketBolt Simple Bot Annotator DFS Adapter Impression Spout Time Series Buckets (Batch) Time Series Buckets (Realtime) Impression Prediction Predictive Model Parameters Impressions Impressions Impressions Hadoop Impression Bucketization Predictive Model Training NoSQL Bolt Time
  • 12. Solution Approach: Analyze Massive Historical Data Set; Analyze Recent Past; Realtime Prediction. Historical Data Set = S3; Analyze = Hadoop + Pig + R; Recent Past = Storm + NoSQL; Analyze = R + Web Service.
  • 14. Poll results: Say you are building a Big Data project, which time frame would you want to build a production solution? Less than 30 days 8%; Less than 90 days 54%; More than 6 months 38%.
  • 15. Any Data, Any Analytics, Any Cloud
  • 16. Data Flow Architecture 3/18/2022
  • 17. Inside Cloud::Streams
  • 20. Poll: Which element of the Big Data stack is most important to you? Hadoop 36%; Queries 35%; Real-time 29%.
  • 21. Don’t Build it Yourself. 55% of enterprise Big Data projects fail* (*According to a December 2012 survey of 300 IT organizations by SSWUG). Project Costs by Function (chart labels): 5% 9% 9% 77%; Compute, Software, Operations Staff, Engineering Staff.
  • 22. How Do We Compare to the Competition? (Competition vs. Think Big & Infochimps) Speed: 6+ months to value vs. 30 days to value. Experience: New college grads, Few successful implementations vs. Advanced Degrees & Published Authors. Quality: Offshore vs. Onshore, Managed Service. Proven: Learn on your dime vs. Blue Chip Customers. Methodology: Waterfall vs. Agile, test & learn.
  • 24. Let’s continue the conversation! infochimps.com/demo thinkbiganalytics.com/about/contact
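
The Storm-side steps named on slides 10 and 11 (simple bot filtering, real-time bucketization) can be sketched roughly as follows. This is a minimal, single-process Python illustration, not the webinar's Storm topology: the bucket width, the user-agent markers, and the names is_probable_bot, bucket_start and bucketize are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical settings; the slides do not state a bucket width or bot rules.
BUCKET_SECONDS = 300
BOT_MARKERS = ("bot", "crawler", "spider")


def is_probable_bot(user_agent: str) -> bool:
    """Very simple bot filter: flag user agents containing a known marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def bucket_start(timestamp: int, width: int = BUCKET_SECONDS) -> int:
    """Round a Unix timestamp down to the start of its time bucket."""
    return timestamp - (timestamp % width)


def bucketize(impressions, width: int = BUCKET_SECONDS) -> Counter:
    """Count non-bot impressions per (article_id, bucket_start) pair.

    `impressions` is an iterable of (article_id, timestamp, user_agent).
    """
    counts = Counter()
    for article_id, timestamp, user_agent in impressions:
        if is_probable_bot(user_agent):
            continue  # drop traffic that looks automated
        counts[(article_id, bucket_start(timestamp, width))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("a1", 0, "Mozilla/5.0"),
        ("a1", 299, "Mozilla/5.0"),
        ("a1", 300, "Mozilla/5.0"),
        ("a1", 10, "Googlebot/2.1"),
        ("a2", 601, "Mozilla/5.0"),
    ]
    for key, value in sorted(bucketize(sample).items()):
        print(key, value)
```

In a streaming system the same counting would happen incrementally as each impression arrives, with the per-bucket totals written to a key-value store rather than held in memory.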

Editor's Notes

  1. We’ll leave this slide up in the time before we start the webinar.
  2. Douglas & Dhruv: Introduce ourselves and our companies.
  3. Dhruv
  4. Which element of the Big Data stack is most important to you? Hadoop Databases (SQL, NoSQL) Real-time (Storm, Kafka, Flume, Esper)
  5. Which element of the Big Data stack is most important to you? Hadoop Databases (SQL, NoSQL) Real-time (Storm, Kafka, Flume, Esper)
  6. DOUGLAS Thanks, Dhruv. Thanks for your time today, everyone. I'm excited to share how Think Big Analytics can help you make big data come alive and accelerate your time to value. In case you don’t know us, Think Big Analytics provides data science and engineering services that create value from Big Data. We help you IMAGINE your possibilities for big data and identify a plan for profitable projects. We ILLUMINATE your team through hands-on training and education on the latest in big data technologies. We then help you IMPLEMENT your analytics plan with hands-on data scientists and engineers who rapidly build data solutions to deliver value.
  7. Douglas Predictive analytics is helping companies generate competitive advantages and real value. I want to walk you through one project we completed for a 2012 SXSW tech finalist, which is an online ad publishing company. My comments will demonstrate how we used certain big data technologies to enhance the company’s product offering so they could charge their advertisers premium rates. Let me first explain the challenge this company addresses: Let’s say you own a local Mexican restaurant and you want to gear up for Cinco De Mayo by putting ads into your local online newspaper. The paper is running an article about the planned festivities in the nearby park. The restaurant owner buys banner ads with the newspaper, goes to the page about the festival, and looks for her advertisement. She clicks refresh, then hits refresh again and again. She might never see her ad. The company plans to solve this problem by letting the ad buyer glue their ad to a piece of content: she takes ownership of that content, promotes it, and can be satisfied that her family, friends and customers can see her ad and coupon.
  8. Douglas Creating a research agenda: What features to use? Content or behavior based? What features have predictive value? Which functional forms will work best? How successful is the prediction? Will the approach scale?
  9. Douglas Roll your own – Complicated, brittle, scalable, loss of focus. Multi-node scaling (which requires message routing, load balancing etc)
  10. For this customer, the overall architecture looked like this. Netty edge server. Change DynamoDB to NoSQL.
  11. And Storm supported the analytics architecture by…. The result was….
  12. Douglas The theory is that you can analyze and mine a massive historical data set in batch, at your leisure, and create a parameterized predictive model. In real time you collect the parameters, and upon request you deliver the prediction by executing the trained model over the latest parameters. This is not unlike the Lambda Architecture described by Nathan Marz of Twitter.
  13. How advanced is your organization’s approach to Big Data? Not Advanced Advanced Very Advanced Not Started
  14. How advanced is your organization’s approach to Big Data? Not Advanced Advanced Very Advanced Not Started
  15. Dhruv Any Data, Any Analytics, Any Cloud
  16. Dhruv
  17. Dhruv [NOTE: ADD INFOCHIMPS LOGO ON HEADER]
  18. Dhruv
  19. [NOTE: ADD INFOCHIMPS LOGO ON HEADER] If you are engaged in a Big Data project, over what timeframe are you interested in deploying a production solution? Less than 30 days Less than 90 days More than 6 months
  20. [NOTE: ADD INFOCHIMPS LOGO ON HEADER] If you are engaged in a Big Data project, over what timeframe are you interested in deploying a production solution? Less than 30 days Less than 90 days More than 6 months
  21. Dhruv [NOTE: ADD INFOCHIMPS LOGO ON HEADER]
  22. Dhruv [NOTE: ADD INFOCHIMPS LOGO ON HEADER]
  23. Dhruv & Douglas
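
The batch-plus-real-time idea in note 12, combined with the slide 8 suggestion to prefer medians over means, might look like the sketch below. The functional form used here (a single multiplicative growth ratio) is an assumption chosen for illustration; the slides pose the choice of functional form as an open question and do not say which model was used. The names train_growth_ratio and get_prediction are hypothetical (the latter only echoes the getPrediction label on slide 10), and the history values are made up.

```python
from statistics import median


def train_growth_ratio(history):
    """Batch step: estimate one model parameter from historical articles.

    `history` is a list of (count_up_to_T, total_after_30_days) pairs.
    The median of the per-article ratios is used instead of the mean so
    that a single viral outlier does not dominate the estimate.
    """
    ratios = [total / early for early, total in history if early > 0]
    if not ratios:
        raise ValueError("no usable historical articles")
    return median(ratios)


def get_prediction(growth_ratio, count_up_to_T):
    """Real-time step: apply the stored parameter to the latest count."""
    return growth_ratio * count_up_to_T


if __name__ == "__main__":
    # Made-up history: the third article is an outlier, and the fourth
    # has no early impressions, so it is skipped.
    history = [(100, 300), (50, 200), (10, 1000), (0, 5)]
    ratio = train_growth_ratio(history)
    print("growth ratio:", ratio)
    print("predicted 30-day impressions:", get_prediction(ratio, 20))
```

With this made-up history the median ratio is 4.0, whereas the mean of the same ratios would be pulled above 35 by the one outlier, which is the motivation for a robust estimator.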