Case studies w/Analytics, Real Time, DM/ML in a hackathon
L’Oreal 8/27/2013
Not Hadoop

Real time analytics L’Oreal

click log heat map, demographic targeting, long tail

Published in: Technology, Business

  1. Case studies w/Analytics, Real Time, DM/ML in a hackathon. L’Oreal, 8/27/2013. Not Hadoop
  2. Agenda
     • Problem Statement:
       – Digital and Retail behavior analysis:
         • Long tail problem similarities
       – Propensity Marketing:
         • What is the propensity of a consumer to respond to a promotion?
         • Cover DM/ML Demographics presentation
       – Profitability Marketing:
         • Who are the most profitable customers?
         • Obvious answer: select * from customers join orders order by amt desc;
       – Promotion Modeling:
         • What drives order values, and who should receive promotions?
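The one-line query on the agenda slide is shorthand: it has no join condition and no per-customer total. A minimal sketch of the same idea follows, using Python's built-in sqlite3 module. The column names (`id`, `name`, `customer_id`) and the demo rows are assumptions for illustration; only the table names and `amt` come from the slide.

```python
import sqlite3


def most_profitable_customers(conn, limit=10):
    """Return (customer name, total order amount) rows, highest total first."""
    query = """
        SELECT c.name, SUM(o.amt) AS total_amt
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id  -- join key is an assumed column
        GROUP BY c.id, c.name
        ORDER BY total_amt DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def demo_connection():
    """Build a tiny in-memory database with made-up rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amt REAL);
        INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO orders (customer_id, amt)
            VALUES (1, 20.0), (2, 75.0), (1, 30.0), (3, 10.0);
    """)
    return conn
```

Calling `most_profitable_customers(demo_connection(), limit=2)` ranks customers by summed order amount rather than by single largest order, which is usually what "most profitable" is taken to mean; true profitability would also need cost data that the slide does not mention.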
  3. What do I do
     • Work: Tech lead Google, ~10y; Architect Absolute SW
     • Teach: mentor others on Big Data, Hadoop, DM/ML
     • http://www.meetup.com/HandsOnProgrammingEvents/
  4. Review
     • Theory:
       – What is the long tail?
       – Long tail success case studies
       – Demographic targeting/Modeling and prediction
       – ML/DM success case studies
     • Data Analysis Strategies/Structure
  5. What is the Long Tail?
     • Originated with search engines/Google
     • Don’t focus on the top 20% of queries; focus on the bottom 50% first
     • Why? The bottom 50% was the hardest: LP&SB. The top 20% was automatic
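To make the head/tail split concrete, here is a small sketch that measures how much query volume comes from the top 20% and the bottom 50% of distinct queries. The function name and the input format (a flat list of query strings) are assumptions; the 20% and 50% cut-offs are the ones on the slide.

```python
from collections import Counter


def head_tail_share(queries, head_frac=0.2, tail_frac=0.5):
    """Return (head_share, tail_share) of total query volume.

    head_share: fraction of all searches that hit the most frequent
                `head_frac` of distinct queries.
    tail_share: fraction of all searches that hit the least frequent
                `tail_frac` of distinct queries.
    """
    counts = Counter(queries)
    if not counts:
        raise ValueError("need at least one query")
    ranked = sorted(counts.values(), reverse=True)  # most frequent first
    total = sum(ranked)
    head_n = max(1, int(len(ranked) * head_frac))
    tail_n = max(1, int(len(ranked) * tail_frac))
    return sum(ranked[:head_n]) / total, sum(ranked[-tail_n:]) / total
```

On a real click log the tail share is typically small per query but large in number of distinct queries, which is why the slide treats it as the harder problem.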
  6. Long Tail Example, keywords
  7. Keyword Lift/Complementary Strategies
     • 70% of the keywords are not used frequently.
     • Page Rank/feature selection/Spam reduction
       – Most data is inaccurate (e.g. demographics; the eBay problem)
     • Quality of features enables ML/DM modeling
       – Identify these words first using simple SQL queries, then run a model and use A/B testing to iterate to better results
       – Example of ML/DM later
     • Case study of data visualisation for search query length
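The "simple SQL queries" step for finding infrequently used keywords could look like the sketch below. The table `search_log`, its `keyword` column, and the cut-off of two uses are all hypothetical; the slide does not give a schema or a threshold.

```python
import sqlite3


def rare_keywords(conn, max_uses=2):
    """Return (keyword, uses) for keywords searched at most `max_uses` times."""
    return conn.execute(
        "SELECT keyword, COUNT(*) AS uses FROM search_log "
        "GROUP BY keyword HAVING COUNT(*) <= ? "
        "ORDER BY uses, keyword",
        (max_uses,),
    ).fetchall()


def rare_keyword_share(conn, max_uses=2):
    """Fraction of distinct keywords that count as rare under `max_uses`."""
    total = conn.execute(
        "SELECT COUNT(DISTINCT keyword) FROM search_log"
    ).fetchone()[0]
    if total == 0:
        return 0.0
    return len(rare_keywords(conn, max_uses)) / total
```

The list returned by `rare_keywords` is the kind of candidate set that could then feed a model and an A/B test, as the slide suggests.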
  8. Complete solution not possible
     • A complete solution to the long tail is not possible via a hackathon
     • Examples of Complete Solutions
       – Example: Symantec uses a modified page rank to see if virus files are safe/not safe. Viruses are all different and unique, so you can’t rely on past examples. >90% accuracy rate. Uses people feedback.
       – Example: Yahoo content system matching users to content, ~100 attributes -> 1k attributes. Most users only go to Yahoo news for a few stories. MM guides this
  9. Another long tail on search query length
  10. Long Tail
      • Obviously, longer queries imply the user wants a more precise result. Precision vs. Recall
      • Obviously, these users are more valuable b/c their intent is more focused. Showing that users enter queries with more precision is very valuable for shopping and other applications with focused, directed intent
      • The above case results in a $50.00 click to Google for Salesforce/SAP ads (e.g. home financing/mortgages)
      • Best way to see this is in a demo:
        – Move mouse over dots which are close to each other: http://dataincolour.com:8888/#1144645000
        – DEMO!!!!!
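Two quantities sit behind this slide: the length of each query in words, and the precision/recall trade-off. A minimal sketch of both follows. The function names are illustrative, and counting whitespace-separated words is an assumed definition of query length.

```python
from collections import Counter


def query_length_histogram(queries):
    """Count queries by number of whitespace-separated words (blank queries skipped)."""
    return Counter(len(q.split()) for q in queries if q.strip())


def precision_recall(retrieved, relevant):
    """Standard definitions:
    precision = relevant results shown / all results shown
    recall    = relevant results shown / all relevant results
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Plotting the histogram for a real query log would show the second long tail the deck refers to: many short queries and a thin tail of long, specific ones.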
  11. Example real time applied to previous example
      • We looked at search keywords and search phrase length. Visualizations as a substitute for Machine Learning algorithms; much faster to implement
      • Some students (<~20 years old) did this in a weekend hackathon: http://www.dataincolour.com/2011/06/curiousnakes-visualization-of-aol-questions/
      • http://datainsightsf.com/schedule-2/ Not repeated
  12. What to do?
      • Brainstorm some more; there is definitely something here. Play w/data; it will come in time. The most important part is the definition of the problem, not the code
        – Think more, code less
      • Should you copy the data visualisation example on Search Query Length?
        – Probably not
      • A long, long time ago Google displayed the incoming search queries in the lobby; this had practical use
      • Real time constrains the problem: less complicated processing, less about the algorithm, more about the user
  13. Why Real Time? Long Tail
      • Do I really need real time? Yes. Why?
      • Pre-2010, Google search displayed all the results, a combination of precision and recall.
      • Post-2010, Google went to instant search, with limited recall. Nobody drilled down to the 1Mth page for DVDs. Better ad results with real time
      • Analytics today is similar to pre-2010 Google search: batch processing using click logs
      • Real time analytics is mostly custom solutions but can be much more effective. Once the user leaves the website it is too late to do anything. Many orders of magnitude difference. Precision >> Recall
  14. UI: mouse over a stream of dots
  15. Mouse on a dot which is part of a group which looks like a snake
      • Can see what the user typed in as queries one after another. Here is one example:
      • How to fix car -> What is a fuel filter -> How to replace a fuel filter.
      • This is valuable for adding features for the user who asked this
      • Can't get this from SQL queries easily, or at all.
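The "snake" is a chain of consecutive queries from one user. A rough sketch of rebuilding such chains from a log follows. The record layout `(user_id, timestamp_seconds, query)` and the 30-minute gap that ends a chain are assumptions, not something the deck or the demo specifies.

```python
from collections import defaultdict


def query_chains(log, max_gap_seconds=1800):
    """Group each user's queries into time-ordered chains.

    log: iterable of (user_id, timestamp_seconds, query) tuples, in any order.
    A new chain starts when the gap since the user's previous query
    exceeds `max_gap_seconds`.
    Returns {user_id: [[query, query, ...], ...]}.
    """
    by_user = defaultdict(list)
    for user_id, ts, query in log:
        by_user[user_id].append((ts, query))

    chains = {}
    for user_id, events in by_user.items():
        events.sort()  # order by timestamp
        sessions, current, last_ts = [], [], None
        for ts, query in events:
            if last_ts is not None and ts - last_ts > max_gap_seconds:
                sessions.append(current)
                current = []
            current.append(query)
            last_ts = ts
        sessions.append(current)
        chains[user_id] = sessions
    return chains
```

Each inner list is one sequence such as the fuel filter example above, ready to be drawn or inspected.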
  16. What is the lesson here?
      • Viewing data in real time has value
      • At minimum, it helps clear the thinking for the next step
      • Use as an alerting system/QC process to show whether ML/DM is running correctly (proprietary in Google/Yahoo). Every business has these.
      • Key: visible to everybody w/o running a SQL query
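One simple form of the alerting/QC idea is a sliding-window rate check: if fewer events than expected arrive in the last N seconds, something upstream may be broken. The sketch below is illustrative only. The class name, the window size, and the threshold are assumptions, and it says nothing about how the proprietary systems mentioned on the slide work.

```python
from collections import deque


class RateMonitor:
    """Alert when fewer than `min_events` arrived in the last `window_seconds`."""

    def __init__(self, window_seconds, min_events):
        self.window_seconds = window_seconds
        self.min_events = min_events
        self._times = deque()  # timestamps of recent events, oldest first

    def record(self, timestamp):
        """Register one event (timestamps are assumed to be non-decreasing)."""
        self._times.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self._times and self._times[0] <= now - self.window_seconds:
            self._times.popleft()

    def is_alerting(self, now):
        """True if the recent event count is below the expected minimum."""
        self._evict(now)
        return len(self._times) < self.min_events
```

A dashboard could call `is_alerting` on a timer and change colour when it returns True, which keeps the signal visible to everybody without a SQL query.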
  17. Wisdom gained matches across 2 hackathons
      • One of the most surprising pieces of work was a unique data visualization from the DM hackathon
      • None of these positive results were defined in the problem statement. They required creativity.
      • Careful
  18. Review ML/DM
      • Review a small subset of these slides:
        – http://www.slideshare.net/DougChang1/demographics-andweblogtargeting-10757778
      • Agenda: review a case study of the Motley Fool and how to create/target promotions to likely subscribers for problem #2, propensity marketing
      • Case study of a past hackathon.
        – My role: I seed the ideas, Mike Bowles, Nick Kolegraff
  19. ML/DM Slides
      • DO NOT INSERT SLIDES, cover the original so we don’t limit the scope of audience questions
  20. ML/DM and Hackathons
      • Done 2 as examples:
        – Motley Fool, cosponsored by Kaggle (Mike Bowles)
        – Best Buy, paid Kaggle (Nick Kolegraff @ Accenture/DM SIG; we sought him out)
        – These events require guidance and were very successful; both are still receptive to more DM/ML events
      • Careful: an algorithm doesn’t mean you have a production process or something someone can manage via a paid analyst headcount
      • Why aren’t there more? Time investment to clean data and give tech talks to guide participants; min 3 months work
  21. What do I do for others which may help you?
      • Seed the ideas; should add a structure to this. NDA. Run SQL queries
      • Current Case Study
        – Starting to do the prep work for another real time analytics example; teaching from this
        – Nick/Mike did this for the other 2 hackathons.
      • Match the strategy w/structure
        – Take time off work to build an engineering prototype (Twitter Storm in old slide deck)
        – Not covering this here
        – Strategy: first display the data in a real time dashboard, then iterate on the visualizations, then add DM/ML algorithms after the A/B testing framework is complete
  22. One example, web page heat maps
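A click-log heat map can be approximated by binning click coordinates into a grid and counting per cell. This is a minimal sketch assuming each click is an `(x, y)` pixel position with the origin at the upper left of the page; the grid size and function name are illustrative.

```python
def click_heatmap(clicks, page_width, page_height, cols=4, rows=4):
    """Count clicks per grid cell.

    clicks: iterable of (x, y) pixel coordinates, origin at upper left.
    Returns a list of `rows` lists, each with `cols` counts;
    grid[0][0] is the upper left cell. Clicks outside the page are ignored.
    """
    grid = [[0] * cols for _ in range(rows)]
    for x, y in clicks:
        if not (0 <= x < page_width and 0 <= y < page_height):
            continue
        col = int(x * cols / page_width)
        row = int(y * rows / page_height)
        grid[row][col] += 1
    return grid
```

Rendering the grid with a colour scale gives the heat map; comparing cells across two page layouts is one way to see whether attention concentrates in a particular corner.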
  23. Amazon Web Page
  24. Google Shopping Example/Reversed/Why?
  25. Upper left hand corner
  26. Example of Kiehl’s
  27. Kiehl’s Example
      • Put in offers w/($ amount, product desc, click url) customized per user; A/B test layouts and placement; store data for customization and measure lift
      • Measure Facebook ads via page rank
      • Predict missing links application
      • http://blog.echen.me/2012/07/31/edge-prediction-in-a-social-graph-my-solution-to-facebooks-user-recommendation-contest-on-kaggle/
      • Careful, don’t copy. Example only. Generalize to hackathon. Many other ideas
      • Your answer is different from Yahoo & Google. This isn’t a roadmap.
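"Measure lift" in an A/B test usually means comparing the conversion rate of a variant layout against a control. The sketch below shows the arithmetic plus a pooled two-proportion z-score as a rough significance check. The function names and the `(conversions, visitors)` input format are assumptions; the deck does not say which metric or test was used.

```python
import math


def relative_lift(control, variant):
    """Relative lift of variant over control.

    control, variant: (conversions, visitors) tuples.
    Returns (variant_rate - control_rate) / control_rate.
    """
    (c_conv, c_vis), (v_conv, v_vis) = control, variant
    if c_vis <= 0 or v_vis <= 0:
        raise ValueError("visitor counts must be positive")
    base = c_conv / c_vis
    if base == 0:
        raise ValueError("control conversion rate is zero; lift is undefined")
    return (v_conv / v_vis - base) / base


def two_proportion_z(control, variant):
    """Pooled two-proportion z-score for the difference in conversion rates."""
    (c_conv, c_vis), (v_conv, v_vis) = control, variant
    pooled = (c_conv + v_conv) / (c_vis + v_vis)
    se = math.sqrt(pooled * (1 - pooled) * (1 / c_vis + 1 / v_vis))
    if se == 0:
        raise ValueError("no variation in outcomes; z-score is undefined")
    return (v_conv / v_vis - c_conv / c_vis) / se
```

For example, 50 conversions from 1000 visitors in control versus 60 from 1000 in the variant is a 20% relative lift, but the z-score is below 1, so a difference of that size could easily be noise at this sample size.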
  28. Structure has to match Strategy
      • Partner w/Macy’s? Develop a structure to work with retail partners to increase their sales
        – E.g. customized shopkick
        – Don’t just release APIs; release mobile app source code ppl can modify
      • Test promotions and building profiles?
      • … lots of ideas
