Case studies w/Analytics, Real
Time, DM/ML in a hackathon
L’Oreal 8/27/2013
Not Hadoop
Agenda
• Problem Statement:
– Digital and Retail behavior analysis:
• Long tail problem similarities
– Propensity Marketing:
• Propensity for consumer to respond to promotion?
• Cover DM/ML Demographics presentation
– Profitability Marketing
• Who are the most profitable customers?
• Obvious answer: select * from customers join orders order by amt desc;
– Promotion Modeling
• What drives order values and who should receive promotions?
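The "obvious answer" query above can be made runnable as a minimal sketch. The schema here (customers with id and name, orders with customer_id and amt) and the sample rows are assumptions for illustration only, and the result ranks customers by total order amount, which is revenue rather than true profit.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amt REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 40.0), (1, 25.0), (2, 90.0), (3, 10.0);
""")

def top_customers(conn, limit=10):
    """Rank customers by total order amount (revenue, not true profit)."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.amt) AS total_amt, COUNT(*) AS n_orders
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY total_amt DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()

ranked = top_customers(conn)
```

Aggregating per customer before sorting avoids ranking single large orders instead of customers.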
What do I do
• Work, Tech lead Google, ~10y, Architect
Absolute SW
• Teach, mentor others on Big Data, Hadoop,
DM/ML
• http://www.meetup.com/HandsOnProgrammingEvents/
Review
• Theory:
– What is long tail?
– Long tail success case studies
– Demographic targeting/Modeling and prediction
– ML/DM success case studies
• Data Analysis Strategies/Structure
What is the Long Tail?
• Originated from search engines/Google
• Don’t focus on the top 20% of queries; focus on the bottom 50% first
• Why? The bottom 50% was the hardest:
LP&SB. The top 20% was automatic
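One way to see the head/tail split in a query log is to measure how much volume the most frequent queries carry. This is a rough sketch; the function name, the 20% cutoff default, and the sample log are assumptions, not figures from the talk.

```python
from collections import Counter

def head_tail_share(queries, head_fraction=0.2):
    """Return (head_share, tail_share) of total query volume.

    The head is the most frequent `head_fraction` of distinct queries.
    """
    counts = Counter(queries)
    ranked = [n for _, n in counts.most_common()]
    total = sum(ranked)
    if total == 0:
        raise ValueError("empty query log")
    head_size = max(1, int(len(ranked) * head_fraction))
    head = sum(ranked[:head_size])
    return head / total, (total - head) / total

# Made-up log: one popular query and several rare ones.
log = ["dvd"] * 6 + ["shoes"] * 2 + ["fuel filter", "red lipstick 05", "how to fix car"]
head_share, tail_share = head_tail_share(log)
```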
Long Tail Example, keywords
Keyword Lift/Complementary
Strategies
• 70% of the keywords are not used frequently.
• Page Rank/feature selection/Spam reduction
– Most data (demographics) is inaccurate (the eBay problem)
• Quality of features enables ML/DM modeling
– Identify these words first using simple SQL queries
then run a model and use A/B testing to iterate to
better results
– Example of ML/DM later
• Case study of data visualisation for search query
length
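The A/B testing step mentioned above can be sketched as a two-proportion z-test on conversion counts. This is a generic textbook test, not necessarily what was used in the hackathons; the function name and the sample counts are made up.

```python
import math

def ab_lift(conv_a, n_a, conv_b, n_b):
    """Relative lift of B over A, the z statistic, and a two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return (p_b - p_a) / p_a, z, p_value

# Made-up counts: 100/1000 conversions for A, 130/1000 for B.
lift, z, p_value = ab_lift(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

A small p-value suggests the difference is unlikely to be noise, but it says nothing about whether the lift is large enough to matter to the business.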
Complete solution not possible
• A complete solution to the long tail is not
possible via a hackathon
• Examples of Complete Solutions
– Example: Symantec uses modified page rank to see if
virus files are safe/not safe. Viruses are different, all
are unique. You can’t rely on past examples. >90%
accuracy rate. Uses people feedback.
– Example: Yahoo content system matching users to
content ~100 attributes->1k attributes. Most users
only go to Yahoo news for a few stories. MM guides
this
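For readers who have not seen page rank, here is a sketch of the plain textbook power-iteration version. It is not the modified variant attributed to Symantec above, whose details are not given here; the graph and parameter defaults are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over {node: [outgoing nodes]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank

# Made-up link graph: "c" is linked to by every other node.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```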
Another long tail on search query
length
Long Tail
• Obviously, longer queries imply the user wants a more precise
result. Precision vs. recall
• Obviously, these users are more valuable b/c their directed
intent is more focused. Seeing a user enter queries
with more precision is very valuable for shopping and
other applications with focused, directed intent
• The above case results in a $50.00 click to Google for
Salesforce/SAP ads (e.g. home financing/mortgages)
• Best way to see this is in a demo:
– Move the mouse over dots which are close to each other:
http://dataincolour.com:8888/#1144645000
– DEMO!
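As a reminder of the precision vs. recall trade-off, here is a small sketch comparing a broad result set with a narrow one. The item ids are made up.

```python
def precision_recall(returned, relevant):
    """Precision and recall of a returned result set against the relevant set."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"p1", "p2", "p3", "p4"}
# Short, vague query: many results, most of them off target.
broad = precision_recall(["p1", "p2", "p3", "x1", "x2", "x3", "x4", "x5"], relevant)
# Long, specific query: few results, all on target.
narrow = precision_recall(["p1", "p2"], relevant)
```

In this toy case the narrow result set trades recall (0.5 vs. 0.75) for precision (1.0 vs. 0.375).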
Example: real time applied to the previous example
• We looked at search keywords and search phrase
length. Visualizations as a substitute for machine
learning algorithms: much faster to implement
• Some students (under ~20 years old) did this in a
weekend hackathon:
http://www.dataincolour.com/2011/06/curiousnakes-visualization-of-aol-questions/
• http://datainsightsf.com/schedule-2/ Not
repeated
What to do?
• Brainstorm some more, definitely something here, play
w/data; will come in time. The most important part is
the definition of the problem, not the code
– Think more, code less
• Should you copy the data visualisation example on
Search Query Length?
– Probably not
• A long long time ago Google displayed the incoming
search queries in the lobby; this had practical use
• Real time constrains the problem: less complicated
processing, less about the algorithm, more about the
user
Why Real Time? Long Tail
• Do I really need real time? Yes. Why?
• Pre-2010, Google search displayed all the results, a
combination of precision and recall.
• Post-2010, Google went to instant search with limited recall.
Nobody drilled down to the 1Mth page for DVDs.
Better ad results with real time
• Analytics today is similar to pre-2010 Google search:
batch processing using click logs
• Real-time analytics is mostly custom solutions but can be
much more effective. Once the user leaves the website it is too
late to do anything. Many orders of magnitude
difference. Precision >> Recall
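To contrast with batch processing of click logs, here is a sketch of a sliding-window counter that keeps only recent events in memory. The class name, the 60-second window, and the event stream are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Count events per key over the last `window_seconds` of a stream."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key), oldest first
        self.counts = Counter()

    def add(self, timestamp, key):
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n=3):
        return self.counts.most_common(n)

window = SlidingWindowCounter(window_seconds=60)
stream = [(0, "dvd"), (10, "lipstick"), (30, "dvd"),
          (65, "lipstick"), (66, "lipstick"), (70, "serum")]
for ts, query in stream:
    window.add(ts, query)
```

After the last event, `window.top(1)` reflects only the most recent minute, which is the point: the answer is available while the user is still on the site.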
UI:mouse over a stream of dots
Mouse on a dot which is part of a
group which looks like a snake
• Can see what a user typed in as one query after
another. Here is one example:
• How to fix car -> What is a fuel filter -> How to
replace a fuel filter.
• This is valuable for adding additional features
for the user who asked this
• Can't get this from SQL queries easily, or at all.
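The query chain in the example can be recovered from an event stream with a few lines of code. This is a sketch under assumptions: events are (user, timestamp, query) tuples, and a chain is cut when the gap between a user's queries exceeds a hypothetical 300-second limit.

```python
from collections import defaultdict

def query_chains(events, max_gap_seconds=300):
    """Group (user, timestamp, query) events into per-user query chains.

    A new chain starts when the gap since the user's previous query
    exceeds `max_gap_seconds` (a hypothetical cutoff).
    """
    chains = defaultdict(list)  # user -> list of chains
    last_seen = {}
    for user, ts, query in sorted(events, key=lambda e: (e[0], e[1])):
        if user not in last_seen or ts - last_seen[user] > max_gap_seconds:
            chains[user].append([])
        chains[user][-1].append(query)
        last_seen[user] = ts
    return dict(chains)

# Made-up events, interleaved across two users.
events = [
    ("u1", 0, "how to fix car"),
    ("u2", 5, "red lipstick"),
    ("u1", 40, "what is a fuel filter"),
    ("u1", 95, "how to replace a fuel filter"),
    ("u1", 4000, "dvd player"),
]
chains = query_chains(events)
```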
What is the lesson here?
• Viewing data in real time has value
• At minimum, it helps clear the thinking for the
next step
• Use as an alerting system/QC process to show
if ML/DM is running correctly (proprietary in
Google/Yahoo). Every business has these.
• Key: visible to everybody w/o running a SQL
query
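An alerting/QC check of this kind can start very simply. The sketch below flags values that sit far from the recent average; the window size, the threshold, and the click counts are assumptions, and it is not a description of any proprietary system.

```python
from collections import deque
from statistics import mean, pstdev

def find_alerts(values, window=5, threshold=3.0):
    """Flag indexes whose value is more than `threshold` standard deviations
    from the mean of the previous `window` values."""
    history = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                alerts.append(i)
        history.append(value)
    return alerts

# Made-up metric: clicks per minute with one sudden drop.
clicks_per_minute = [100, 102, 98, 101, 99, 100, 103, 20, 101, 100]
alerts = find_alerts(clicks_per_minute)
```

Here only the sudden drop at index 7 is flagged; a dashboard could surface the same check to everybody without anyone running a query.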
Wisdom gained matches across 2
hackathons
• One of the most surprising pieces of work was
a unique data visualization from the DM
hackathon
• None of these positive results were defined in
the problem statement. Required creativity.
• Careful
Review ML/DM
• Review a small subset of these slides:
– http://www.slideshare.net/DougChang1/demographics-andweblogtargeting-10757778
• Agenda: review a case study of the Motley Fool
and how to create/target promotions to likely
subscribers for problem #2, propensity marketing
• Case study of a past hackathon.
– My role: I seed the ideas, Mike Bowles, Nick Kolegraff
ML/DM Slides
• DO NOT INSERT SLIDES, cover the original so
we don’t limit the scope of audience
questions
ML/DM and Hackathons
• Done 2 as examples,
– Motley Fool, cosponsored by Kaggle (Mike Bowles)
– Best Buy, paid Kaggle (Nick Kolegraff@Accenture/DM SIG,
we sought him out)
– These events require guidance and were very successful; both
are still receptive to more DM/ML events
• Careful: an algorithm doesn’t mean you have a
production process or something someone can
manage via a paid analyst headcount
• Why aren’t there more? Time investment to clean data,
tech talk to guide participants, min 3 months work
What do I do for others which may
help you?
• Seed the ideas; should add a structure to this. NDA. Run
SQL queries
• Current Case Study
– Starting to do the prep work for another real time analytics
example, teaching from this
– Nick/Mike did this for the other 2 hackathons.
• Match the strategy w/structure
– Take time off work to build an engineering prototype (Twitter
Storm in old slide deck)
– Not covering this here
– Strategy: first display the data in a real time dashboard then
iterate the visualizations, then add DM/ML algorithms after the
A/B testing framework is complete
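One building block of an A/B testing framework is stable assignment of users to variants. A common approach, sketched here under assumptions, is to hash the user id together with an experiment name; the experiment name "offer_layout_v1" and the variant labels are hypothetical.

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Hypothetical experiment name and user ids.
assignments = {uid: assign_variant(uid, "offer_layout_v1") for uid in range(1000)}
```

Because the assignment depends only on the ids, the same user sees the same variant on every visit without any stored state, and including the experiment name keeps assignments independent across experiments.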
One example, web page heat maps
Amazon Web Page
Google Shopping
Example/Reversed/Why?
Upper Left hand corner
Example of Kiehls
Kiehl’s Example
• Put in offers w/($ amount, product desc, click url)
customized per user, A/B test layouts and placement,
store data for customization and measure lift
• Measure Facebook ads via page rank
• Predict missing links application
• http://blog.echen.me/2012/07/31/edge-prediction-in-a-social-graph-my-solution-to-facebooks-user-recommendation-contest-on-kaggle/
• Careful, don’t copy. Example only. Generalize to
hackathon. Many other ideas
• Your answer is different from Yahoo & Google.
This isn’t a roadmap.
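As a starting point for the "predict missing links" idea, here is a simple neighborhood baseline: score unconnected pairs by the Jaccard similarity of their neighbor sets. This is not the solution described in the linked post, only a sketch; the graph is made up.

```python
from itertools import combinations

def suggest_links(graph, top_n=3):
    """Rank unconnected node pairs by Jaccard similarity of their neighbor sets."""
    scores = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already linked
        union = graph[a] | graph[b]
        if union:
            scores.append((len(graph[a] & graph[b]) / len(union), a, b))
    # Highest score first; ties broken alphabetically for stable output.
    scores.sort(key=lambda s: (-s[0], s[1], s[2]))
    return scores[:top_n]

# Made-up undirected graph as {node: set of neighbors}.
graph = {
    "ann": {"bob", "cy"},
    "bob": {"ann", "cy", "dee"},
    "cy": {"ann", "bob", "dee"},
    "dee": {"bob", "cy", "eve"},
    "eve": {"dee"},
}
suggestions = suggest_links(graph)
```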
Structure has to match Strategy
• Partner w/Macy’s? Develop a structure to
work with retail partners to increase their
sales
– E.g. customized shopkick
– Don’t just release APIs, release mobile app source
code ppl can modify
• Test promotions and building profiles?
• … lots of ideas

More Related Content

What's hot

How to make data science products
How to make data science productsHow to make data science products
How to make data science products
Prashant Mahajan
 
User vs data - How to make Product Decision
User vs data - How to make Product DecisionUser vs data - How to make Product Decision
User vs data - How to make Product Decision
Prashant Mahajan
 
Introduction to User Experience - Mike Biggs
Introduction to User Experience - Mike BiggsIntroduction to User Experience - Mike Biggs
Introduction to User Experience - Mike Biggs
Thoughtworks
 
Identifying and improving top tasks
Identifying and improving top tasksIdentifying and improving top tasks
Identifying and improving top tasks
Michele Ide-Smith
 
How to gain &amp; retain users
How to gain &amp; retain users How to gain &amp; retain users
How to gain &amp; retain users
Prashant Mahajan
 
Managing Top Tasks
Managing Top TasksManaging Top Tasks
Managing Top Tasks
Michele Ide-Smith
 
Hypothesis-Driven Development & How to Fail-Fast Hacking Growth
Hypothesis-Driven Development & How to Fail-Fast Hacking GrowthHypothesis-Driven Development & How to Fail-Fast Hacking Growth
Hypothesis-Driven Development & How to Fail-Fast Hacking Growth
Prabhat Gupta
 
A Practical Guide to Measuring User Experience
A Practical Guide to Measuring User ExperienceA Practical Guide to Measuring User Experience
A Practical Guide to Measuring User Experience
Richard Dalton
 
Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015
Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015
Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015
Jessica DuVerneay
 
How Machine Learning Can Transform The Customer Experience
How Machine Learning Can Transform The Customer ExperienceHow Machine Learning Can Transform The Customer Experience
How Machine Learning Can Transform The Customer Experience
Product School
 
Communicating data: Reporting user research
Communicating data: Reporting user researchCommunicating data: Reporting user research
Communicating data: Reporting user research
Puja Parakh
 
UX Field Research Toolkit - Updated for Big Design 2018
UX Field Research Toolkit - Updated for Big Design 2018UX Field Research Toolkit - Updated for Big Design 2018
UX Field Research Toolkit - Updated for Big Design 2018
Kelly Moran
 

What's hot (12)

How to make data science products
How to make data science productsHow to make data science products
How to make data science products
 
User vs data - How to make Product Decision
User vs data - How to make Product DecisionUser vs data - How to make Product Decision
User vs data - How to make Product Decision
 
Introduction to User Experience - Mike Biggs
Introduction to User Experience - Mike BiggsIntroduction to User Experience - Mike Biggs
Introduction to User Experience - Mike Biggs
 
Identifying and improving top tasks
Identifying and improving top tasksIdentifying and improving top tasks
Identifying and improving top tasks
 
How to gain &amp; retain users
How to gain &amp; retain users How to gain &amp; retain users
How to gain &amp; retain users
 
Managing Top Tasks
Managing Top TasksManaging Top Tasks
Managing Top Tasks
 
Hypothesis-Driven Development & How to Fail-Fast Hacking Growth
Hypothesis-Driven Development & How to Fail-Fast Hacking GrowthHypothesis-Driven Development & How to Fail-Fast Hacking Growth
Hypothesis-Driven Development & How to Fail-Fast Hacking Growth
 
A Practical Guide to Measuring User Experience
A Practical Guide to Measuring User ExperienceA Practical Guide to Measuring User Experience
A Practical Guide to Measuring User Experience
 
Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015
Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015
Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015
 
How Machine Learning Can Transform The Customer Experience
How Machine Learning Can Transform The Customer ExperienceHow Machine Learning Can Transform The Customer Experience
How Machine Learning Can Transform The Customer Experience
 
Communicating data: Reporting user research
Communicating data: Reporting user researchCommunicating data: Reporting user research
Communicating data: Reporting user research
 
UX Field Research Toolkit - Updated for Big Design 2018
UX Field Research Toolkit - Updated for Big Design 2018UX Field Research Toolkit - Updated for Big Design 2018
UX Field Research Toolkit - Updated for Big Design 2018
 

Viewers also liked

Team Presentation 1
Team Presentation 1Team Presentation 1
Team Presentation 1
Dennis Rojas
 
Stars & stardom
Stars & stardomStars & stardom
Stars & stardom
Charlotte Frazer
 
Costume & props
Costume & propsCostume & props
Costume & props
Charlotte Frazer
 
Question 2
Question 2Question 2
Question 2
Charlotte Frazer
 
Inglessss
InglessssInglessss
Inglessss
daninata
 
Portafolio daniela
Portafolio danielaPortafolio daniela
Portafolio daniela
daninata
 
Screenshots of digipak
Screenshots of digipakScreenshots of digipak
Screenshots of digipak
Charlotte Frazer
 
Teaser trailers
Teaser trailersTeaser trailers
Teaser trailers
sophiasmediaA2
 
Costume & props
Costume & propsCostume & props
Costume & props
Charlotte Frazer
 
Evaluation question 3
Evaluation question 3Evaluation question 3
Evaluation question 3
sophiasmediaA2
 
Feedback on rough cut
Feedback on rough cutFeedback on rough cut
Feedback on rough cut
Charlotte Frazer
 
Questionnaire results
Questionnaire resultsQuestionnaire results
Questionnaire results
sophiasmediaA2
 
Summer 2013 research
Summer 2013 researchSummer 2013 research
Summer 2013 research
Charlotte Frazer
 
Manajemen operasional
Manajemen operasionalManajemen operasional
Manajemen operasional
Uyund Syechkermaniaa
 
Completed shot list
Completed shot listCompleted shot list
Completed shot list
Charlotte Frazer
 
Stars & stardom
Stars & stardomStars & stardom
Stars & stardom
Charlotte Frazer
 

Viewers also liked (16)

Team Presentation 1
Team Presentation 1Team Presentation 1
Team Presentation 1
 
Stars & stardom
Stars & stardomStars & stardom
Stars & stardom
 
Costume & props
Costume & propsCostume & props
Costume & props
 
Question 2
Question 2Question 2
Question 2
 
Inglessss
InglessssInglessss
Inglessss
 
Portafolio daniela
Portafolio danielaPortafolio daniela
Portafolio daniela
 
Screenshots of digipak
Screenshots of digipakScreenshots of digipak
Screenshots of digipak
 
Teaser trailers
Teaser trailersTeaser trailers
Teaser trailers
 
Costume & props
Costume & propsCostume & props
Costume & props
 
Evaluation question 3
Evaluation question 3Evaluation question 3
Evaluation question 3
 
Feedback on rough cut
Feedback on rough cutFeedback on rough cut
Feedback on rough cut
 
Questionnaire results
Questionnaire resultsQuestionnaire results
Questionnaire results
 
Summer 2013 research
Summer 2013 researchSummer 2013 research
Summer 2013 research
 
Manajemen operasional
Manajemen operasionalManajemen operasional
Manajemen operasional
 
Completed shot list
Completed shot listCompleted shot list
Completed shot list
 
Stars & stardom
Stars & stardomStars & stardom
Stars & stardom
 

Similar to Real timeanalyticsl oreal

Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
MLconf
 
Google cloud certification data engineer
Google cloud certification data engineerGoogle cloud certification data engineer
Google cloud certification data engineer
Joseph Holbrook, Chief Learning Officer (CLO)
 
Analytic next gen usecases - presented for ISB, Hyderabad
Analytic next gen usecases - presented for ISB, HyderabadAnalytic next gen usecases - presented for ISB, Hyderabad
Analytic next gen usecases - presented for ISB, Hyderabad
Sandeep akinapelli
 
Doing Analytics Right - Designing and Automating Analytics
Doing Analytics Right - Designing and Automating AnalyticsDoing Analytics Right - Designing and Automating Analytics
Doing Analytics Right - Designing and Automating Analytics
Tasktop
 
Architecting large systems
Architecting large systemsArchitecting large systems
Architecting large systems
Simon Farrell
 
Surge engr 245 lean launchpad stanford 2020
Surge engr 245 lean launchpad stanford 2020Surge engr 245 lean launchpad stanford 2020
Surge engr 245 lean launchpad stanford 2020
Stanford University
 
How to Use Data to Drive Product Decisions by PayPal PM
How to Use Data to Drive Product Decisions by PayPal PMHow to Use Data to Drive Product Decisions by PayPal PM
How to Use Data to Drive Product Decisions by PayPal PM
Product School
 
Remote, unmoderated usability and user testing.
Remote, unmoderated usability and user testing.Remote, unmoderated usability and user testing.
Remote, unmoderated usability and user testing.
Marc-Oliver Gern
 
5 Tips to Bulletproof Your Analytics Implementation
5 Tips to Bulletproof Your Analytics Implementation5 Tips to Bulletproof Your Analytics Implementation
5 Tips to Bulletproof Your Analytics Implementation
ObservePoint
 
Professional Project Manager Should Be Proficient in Agile
Professional Project Manager Should Be Proficient in AgileProfessional Project Manager Should Be Proficient in Agile
Professional Project Manager Should Be Proficient in Agile
Nitor
 
Rails conference 2016 building applications better the first time
Rails conference 2016 building applications better the first timeRails conference 2016 building applications better the first time
Rails conference 2016 building applications better the first time
Jessica R.
 
What Product Management Frameworks Work by Google PM Lead
What Product Management Frameworks Work by Google PM LeadWhat Product Management Frameworks Work by Google PM Lead
What Product Management Frameworks Work by Google PM Lead
Product School
 
How to Use Data to Inform Your Design and Drive Your Business
How to Use Data to Inform Your Design and Drive Your BusinessHow to Use Data to Inform Your Design and Drive Your Business
How to Use Data to Inform Your Design and Drive Your Business
Kissmetrics on SlideShare
 
Executive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you thinkExecutive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you think
Peter Skomoroch
 
Feature Prioritization Techniques for an Agile PMs by Microsoft PM
Feature Prioritization Techniques for an Agile PMs by Microsoft PMFeature Prioritization Techniques for an Agile PMs by Microsoft PM
Feature Prioritization Techniques for an Agile PMs by Microsoft PM
Product School
 
Rejuvenating Agile Operations By Putting Lead And Cycle Time Front And Centre.
Rejuvenating Agile Operations By Putting Lead And Cycle Time Front And Centre.Rejuvenating Agile Operations By Putting Lead And Cycle Time Front And Centre.
Rejuvenating Agile Operations By Putting Lead And Cycle Time Front And Centre.
Zan Kavtaskin
 
Bridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder SupportBridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder Support
Peter Skomoroch
 
Demystifying ML/AI
Demystifying ML/AIDemystifying ML/AI
Demystifying ML/AI
Matthew Reynolds
 
Agile and data driven product development oleh Dhiku VP Product KMK Online
Agile and data driven product development oleh Dhiku VP Product KMK OnlineAgile and data driven product development oleh Dhiku VP Product KMK Online
Agile and data driven product development oleh Dhiku VP Product KMK Online
Rein Mahatma
 
PQF Overview
PQF OverviewPQF Overview
PQF Overview
Martin Hutchings
 

Similar to Real timeanalyticsl oreal (20)

Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
 
Google cloud certification data engineer
Google cloud certification data engineerGoogle cloud certification data engineer
Google cloud certification data engineer
 
Analytic next gen usecases - presented for ISB, Hyderabad
Analytic next gen usecases - presented for ISB, HyderabadAnalytic next gen usecases - presented for ISB, Hyderabad
Analytic next gen usecases - presented for ISB, Hyderabad
 
Doing Analytics Right - Designing and Automating Analytics
Doing Analytics Right - Designing and Automating AnalyticsDoing Analytics Right - Designing and Automating Analytics
Doing Analytics Right - Designing and Automating Analytics
 
Architecting large systems
Architecting large systemsArchitecting large systems
Architecting large systems
 
Surge engr 245 lean launchpad stanford 2020
Surge engr 245 lean launchpad stanford 2020Surge engr 245 lean launchpad stanford 2020
Surge engr 245 lean launchpad stanford 2020
 
How to Use Data to Drive Product Decisions by PayPal PM
How to Use Data to Drive Product Decisions by PayPal PMHow to Use Data to Drive Product Decisions by PayPal PM
How to Use Data to Drive Product Decisions by PayPal PM
 
Remote, unmoderated usability and user testing.
Remote, unmoderated usability and user testing.Remote, unmoderated usability and user testing.
Remote, unmoderated usability and user testing.
 
5 Tips to Bulletproof Your Analytics Implementation
5 Tips to Bulletproof Your Analytics Implementation5 Tips to Bulletproof Your Analytics Implementation
5 Tips to Bulletproof Your Analytics Implementation
 
Professional Project Manager Should Be Proficient in Agile
Professional Project Manager Should Be Proficient in AgileProfessional Project Manager Should Be Proficient in Agile
Professional Project Manager Should Be Proficient in Agile
 
Rails conference 2016 building applications better the first time
Rails conference 2016 building applications better the first timeRails conference 2016 building applications better the first time
Rails conference 2016 building applications better the first time
 
What Product Management Frameworks Work by Google PM Lead
What Product Management Frameworks Work by Google PM LeadWhat Product Management Frameworks Work by Google PM Lead
What Product Management Frameworks Work by Google PM Lead
 
How to Use Data to Inform Your Design and Drive Your Business
How to Use Data to Inform Your Design and Drive Your BusinessHow to Use Data to Inform Your Design and Drive Your Business
How to Use Data to Inform Your Design and Drive Your Business
 
Executive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you thinkExecutive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you think
 
Feature Prioritization Techniques for an Agile PMs by Microsoft PM
Feature Prioritization Techniques for an Agile PMs by Microsoft PMFeature Prioritization Techniques for an Agile PMs by Microsoft PM
Feature Prioritization Techniques for an Agile PMs by Microsoft PM
 
Rejuvenating Agile Operations By Putting Lead And Cycle Time Front And Centre.
Rejuvenating Agile Operations By Putting Lead And Cycle Time Front And Centre.Rejuvenating Agile Operations By Putting Lead And Cycle Time Front And Centre.
Rejuvenating Agile Operations By Putting Lead And Cycle Time Front And Centre.
 
Bridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder SupportBridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder Support
 
Demystifying ML/AI
Demystifying ML/AIDemystifying ML/AI
Demystifying ML/AI
 
Agile and data driven product development oleh Dhiku VP Product KMK Online
Agile and data driven product development oleh Dhiku VP Product KMK OnlineAgile and data driven product development oleh Dhiku VP Product KMK Online
Agile and data driven product development oleh Dhiku VP Product KMK Online
 
PQF Overview
PQF OverviewPQF Overview
PQF Overview
 

Recently uploaded

Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 

Recently uploaded (20)

Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Real time analytics L’Oreal

  • 1. Case studies w/Analytics, Real Time, DM/ML in a hackathon L’Oreal 8/27/2013 Not Hadoop
  • 2. Agenda • Problem Statement: – Digital and Retail behavior analysis: • Long tail problem similarities – Propensity Marketing: • Propensity for consumer to respond to promotion? • Cover DM/ML Demographics presentation – Profitability Marketing • Who are the most profitable customers? • Obvious answer, select * from customers join orders order by amt desc; – Promotion Modeling • What drives order values and who should receive promotions?
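The "obvious answer" query on the agenda slide can be made concrete. The following is only a minimal sketch using Python's built-in `sqlite3`; the table and column names (`customers.id`, `customers.name`, `orders.customer_id`, `orders.amt`) and the sample rows are assumptions for illustration, not the schema used at the hackathon. It also ranks by total order amount, which is revenue and only a rough stand-in for profitability.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with a hypothetical customers/orders schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amt REAL
        );
        INSERT INTO customers VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO orders (customer_id, amt)
        VALUES (1, 20.0), (2, 50.0), (1, 45.0), (3, 5.0);
        """
    )
    return conn


def top_customers(conn, limit=10):
    """Rank customers by the sum of their order amounts, largest first."""
    sql = """
        SELECT c.id, c.name, SUM(o.amt) AS total_amt
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY total_amt DESC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()
```

Aggregating per customer before sorting avoids ranking individual orders, which is what a plain join ordered by `amt` would do.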
  • 3. What do I do • Work, Tech lead Google, ~10y, Architect Absolute SW • Teach, mentor others on Big Data, Hadoop, DM/ML • http://www.meetup.com/HandsOnProgrammingEvents/
  • 4. Review • Theory: – What is long tail? – Long tail success case studies – Demographic targeting/Modeling and prediction – ML/DM success case studies • Data Analysis Strategies/Structure
  • 5. What is the Long Tail? • Originated from search engines/Google • Don’t focus on the top 20% queries, focus on the bottom 50% first • Why? The bottom 50% was the hardest: LP&SB. The top 20% was automatic
  • 7. Keyword Lift/Complementary Strategies • 70% of the keywords are not used frequently. • Page Rank/feature selection/Spam reduction – Most data (e.g. demographics) is inaccurate; eBay problem • Quality of features enables ML/DM modeling – Identify these words first using simple SQL queries, then run a model and use A/B testing to iterate to better results – Example of ML/DM later • Case study of data visualisation for search query length
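As a rough illustration of "identify these words first", the sketch below counts keyword frequencies in a list of queries and returns the rarely used ones along with their share of the vocabulary. The function name, the whitespace tokenization, and the `max_count` threshold are assumptions made for this example; the slides do not say how the infrequent keywords were actually identified.

```python
from collections import Counter


def tail_keywords(queries, max_count=1):
    """Return (sorted rare keywords, their share of all distinct keywords).

    A keyword is treated as part of the tail when it occurs at most
    `max_count` times across all queries (an illustrative threshold).
    """
    counts = Counter(word for q in queries for word in q.lower().split())
    tail = sorted(w for w, n in counts.items() if n <= max_count)
    share = len(tail) / len(counts) if counts else 0.0
    return tail, share
```

The resulting list is the kind of input that could then be fed to a model and iterated on with A/B testing, as the slide suggests.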
  • 8. Complete solution not possible • A complete solution to the long tail is not possible via a hackathon • Examples of Complete Solutions – Example: Symantec uses modified page rank to see if virus files are safe/not safe. Viruses are different, all are unique. You can’t rely on past examples. >90% accuracy rate. Uses people feedback. – Example: Yahoo content system matching users to content ~100 attributes->1k attributes. Most users only go to Yahoo news for a few stories. MM guides this
  • 9. Another long tail on search query length
  • 10. Long Tail • Obviously, longer queries imply the user wants a more precise result. Precision vs. Recall • Obviously, these users are more valuable because their intent is more focused. Seeing users enter queries with more precision is very valuable for shopping and other applications with focused, directed intent • The above case results in a $50.00 click to Google for Salesforce/SAP ads (e.g. home financing/mortgages) • Best way to see this is in a demo: • Move the mouse over dots which are close to each other: http://dataincolour.com:8888/#1144645000 • DEMO!
  • 11. Example real time applied to previous example • We looked at search keywords and search phrase length. Visualizations as a substitute for Machine Learning algorithms: much faster to implement • Some students (under ~20 years old) did this in a weekend hackathon: http://www.dataincolour.com/2011/06/curiousnakes-visualization-of-aol-questions/ • http://datainsightsf.com/schedule-2/ Not repeated
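For readers less familiar with the precision vs. recall trade-off mentioned on this slide, here is a minimal sketch of the two standard definitions. The function name and the use of plain sets of result identifiers are assumptions for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant results
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

A longer, more specific query tends to shrink the retrieved set, which is one way precision can rise even if recall falls.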
  • 12. What to do? • Brainstorm some more; there is definitely something here. Play w/data; it will come in time. The most important part is the definition of the problem, not the code – Think more, code less • Should you copy the data visualisation example on Search Query Length? – Probably not • A long, long time ago Google displayed the incoming search queries in the lobby; this had practical use • Real time constrains the problem: less complicated processing, less about the algorithm, more about the user
  • 13. Why Real Time? Long Tail • Do I really need real time? Yes. Why? • Pre-2010, Google search displayed all the results, a combination of precision and recall. • Post-2010, Google went to instant search with limited recall. Nobody drilled down to the 1Mth page for DVDs. Better ad results with real time • Analytics today is similar to pre-2010 Google search: batch processing using click logs • Real time analytics are mostly custom solutions but can be much more effective. Once the user leaves the website it is too late to do anything. Many orders of magnitude difference. Precision >> Recall
  • 14. UI: mouse over a stream of dots
  • 15. Mouse on a dot which is part of a group which looks like a snake • Can see what the user typed in as queries, one after another; here is one example: • How to fix car -> What is a fuel filter -> How to replace a fuel filter. • This is valuable for adding additional features for the user who asked this • Can't get this from SQL queries easily or at all.
  • 16. What is the lesson here? • Viewing data in real time has value • At minimum, it helps clear the thinking for the next step • Use as an alerting system/QC process to show if ML/DM is running correctly (proprietary in Google/Yahoo). Every business has these. • Key: visible to everybody w/o running a SQL query
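The "snake" of consecutive queries can be approximated offline by grouping a query log per user and splitting on long pauses. This is only a sketch under assumptions: the log format `(user_id, timestamp_seconds, query)`, the function name, and the 30-minute gap are all hypothetical, and it is not how the visualization in the demo was built.

```python
from collections import defaultdict


def query_chains(log, max_gap_seconds=1800):
    """Group queries into per-user chains, ordered by time.

    `log` is an iterable of (user_id, timestamp_seconds, query) tuples.
    A new chain starts whenever the pause between two consecutive queries
    from the same user exceeds `max_gap_seconds`.
    """
    by_user = defaultdict(list)
    for user, ts, query in log:
        by_user[user].append((ts, query))

    chains = []
    for user, events in by_user.items():
        events.sort()  # order each user's queries by timestamp
        current, last_ts = [], None
        for ts, query in events:
            if last_ts is not None and ts - last_ts > max_gap_seconds:
                chains.append((user, current))
                current = []
            current.append(query)
            last_ts = ts
        if current:
            chains.append((user, current))
    return chains
```

Each returned chain, such as the fuel filter sequence on the slide, is a candidate signal about what the user is trying to accomplish.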
  • 17. Wisdom gained matches across 2 hackathons • One of the most surprising pieces of work was a unique data visualization from the DM hackathon • None of these positive results were defined in the problem statement. Required creativity. • Careful
  • 18. Review ML/DM • Review a small subset of these slides: – http://www.slideshare.net/DougChang1/demographics-andweblogtargeting-10757778 • Agenda: review a case study of the Motley Fool and how to create/target promotions to likely subscribers for problem #2, propensity marketing • Case study of a past hackathon. – My role: I seed the ideas, Mike Bowles, Nick Kolegraff
  • 19. ML/DM Slides • DO NOT INSERT SLIDES, cover the original so we don’t limit the scope of audience questions
  • 20. ML/DM and Hackathons • Done 2 as examples, – Motley Fool, cosponsored by Kaggle (Mike Bowles) – Best Buy, paid Kaggle (Nick Kolegraff@Accenture/DM SIG, we sought him out) – These events require guidance/very successful, both still are receptive to more DM/ML events • Careful: an algorithm doesn’t mean you have a production process or something someone can manage via a paid analyst headcount • Why aren’t there more? Time investment to clean data, tech talk to guide participants, min 3 months work
  • 21. What do I do for others which may help you? • Seed the ideas; should add a structure to this. NDA. Run SQL queries • Current Case Study – Starting to do the prep work for another real time analytics example, teaching from this – Nick/Mike did this for the other 2 hackathons. • Match the strategy w/structure – Take time off work to build an engineering prototype (Twitter Storm in old slide deck) – Not covering this here – Strategy: first display the data in a real time dashboard then iterate the visualizations, then add DM/ML algorithms after the A/B testing framework is complete
  • 22. One example, web page heat maps
  • 25. Upper Left hand corner
  • 27. Kiehl’s Example • Put in offers w/($ amount, product desc, click url) customized per user, A/B test layouts and placement, store data for customization and measure lift • Measure Facebook ads via page rank • Predict missing links application • http://blog.echen.me/2012/07/31/edge-prediction-in-a-social-graph-my-solution-to-facebooks-user-recommendation-contest-on-kaggle/ • Careful, don’t copy. Example only. Generalize to hackathon. Many other ideas • Your answer is different from Yahoo & Google. This isn’t a roadmap.
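"Measure lift" for an A/B test of offer layouts can be sketched as the relative change in conversion rate between a control and a variant. The function names and the `(conversions, visitors)` input shape are assumptions for this example, and the sketch says nothing about statistical significance, which a real test would also need to check.

```python
def conversion_rate(conversions, visitors):
    """Fraction of visitors who converted."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return conversions / visitors


def relative_lift(control, variant):
    """Relative lift of the variant over the control.

    `control` and `variant` are (conversions, visitors) pairs.
    A result of 0.2 means the variant converted 20% better than control.
    """
    base = conversion_rate(*control)
    if base == 0:
        raise ValueError("control conversion rate is zero; lift is undefined")
    return (conversion_rate(*variant) - base) / base
```

For example, 50 conversions from 1,000 control visitors against 60 from 1,000 variant visitors gives a relative lift of about 0.2.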
  • 28. Structure has to match Strategy • Partner w/Macy’s? Develop a structure to work with retail partners to increase their sales – E.g. customized shopkick – Don’t just release APIs, release mobile app source code ppl can modify • Test promotions and building profiles? • … lots of ideas