L'Oreal Tech Talk

Published in: Technology, Business

Transcript

  • 1. Case studies w/Analytics, Real Time, DM/ML in a hackathon. L’Oreal, 8/27/2013. Not Hadoop
  • 2. Agenda • Problem Statement: – Digital and Retail behavior analysis: • Long tail problem similarities – Propensity Marketing: • Propensity for consumer to respond to promotion? • Cover DM/ML Demographics presentation – Profitability Marketing • Who are the most profitable customers? • Obvious answer, select * from customers join orders order by amt desc; – Promotion Modeling • What drives order values and who should receive promotions?
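
The "obvious answer" query on the agenda slide can be made concrete. What follows is a minimal sketch using Python's built-in sqlite3 module. The table layout (`customers.id`, `customers.name`, `orders.customer_id`, `orders.amt`) and the demo rows are assumptions for illustration and are not from the talk. It adds a join condition and sums order amounts per customer, so one large order does not outrank a customer with many medium ones.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical customers and orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amt REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "ana"), (2, "bo"), (3, "cy")]
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amt) VALUES (?, ?)",
        [(1, 20.0), (2, 75.0), (1, 90.0), (3, 10.0)],
    )
    return conn


def top_customers(conn, limit=3):
    """Return (name, total order amount) rows, highest total first."""
    sql = """
        SELECT c.name, SUM(o.amt) AS total_amt
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY total_amt DESC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()
```

Calling `top_customers(build_demo_db(), 2)` ranks customers by total spend. Revenue is only a proxy for profitability here; margins and promotion costs would need their own columns.
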
  • 3. What do I do • Work: Tech lead Google, ~10y, Architect Absolute SW • Teach, mentor others on Big Data, Hadoop, DM/ML • http://www.meetup.com/HandsOnProgrammingEvents/
  • 4. Review • Theory: – What is long tail? – Long tail success case studies – Demographic targeting/Modeling and prediction – ML/DM success case studies • Data Analysis Strategies/Structure
  • 5. What is the Long Tail? • Originated from search engines/Google • Don’t focus on the top 20% of queries; focus on the bottom 50% first • Why? The bottom 50% was the hardest: LP&SB. The top 20% was automatic
  • 6. Long Tail Example, keywords
  • 7. Keyword Lift/Complementary Strategies • 70% of the keywords are not used frequently. • Page Rank/feature selection/Spam reduction – Most data (demographics) is inaccurate, eBay problem • Quality of features enables ML/DM modeling – Identify these words first using simple SQL queries, then run a model and use A/B testing to iterate to better results – Example of ML/DM later • Case study of data visualisation for search query length
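
The first step on this slide, finding the infrequently used keywords before any modeling, can be sketched in a few lines of Python. The whitespace tokenization and the `max_count` threshold are assumptions for illustration; the talk does not say how keywords were split or what counted as infrequent.

```python
from collections import Counter


def keyword_counts(queries):
    """Count how often each lowercase, whitespace-separated keyword appears."""
    return Counter(word for q in queries for word in q.lower().split())


def tail_keywords(queries, max_count=1):
    """Return keywords seen at most max_count times, sorted alphabetically."""
    counts = keyword_counts(queries)
    return sorted(w for w, c in counts.items() if c <= max_count)


def tail_fraction(queries, max_count=1):
    """Return the share of distinct keywords that fall in the tail."""
    counts = keyword_counts(queries)
    if not counts:
        return 0.0
    rare = sum(1 for c in counts.values() if c <= max_count)
    return rare / len(counts)
```

On a real query log, `tail_fraction` gives a quick check of whether the "70% of the keywords" figure holds for your own data before investing in a model.
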
  • 8. Complete solution not possible • A complete solution to the long tail is not possible via a hackathon • Examples of Complete Solutions – Example: Symantec uses modified page rank to see if virus files are safe/not safe. Viruses are different, all are unique. You can’t rely on past examples. >90% accuracy rate. Uses people feedback. – Example: Yahoo content system matching users to content ~100 attributes->1k attributes. Most users only go to Yahoo news for a few stories. MM guides this
  • 9. Another long tail on search query length
  • 10. Long Tail • Obviously, longer queries imply the user wants a more precise result. Precision vs. Recall • Obviously, these users are more valuable because their directed intent is more focused. Users who enter queries with more precision are very valuable for shopping and other applications with focused, directed intent • The above case results in a $50.00 click to Google for Salesforce/SAP ads (e.g. home financing/mortgages) • Best way to see this is in a demo: • Move mouse on dots which are close to each other: http://dataincolour.com:8888/#1144645000 • DEMO!!!!!
  • 11. Example real time applied to previous example • We looked at search keywords and search phrase length. Visualizations as a substitute for Machine Learning algorithms. Much faster to implement • Some students <~20 years old did this in a weekend hackathon: http://www.dataincolour.com/2011/06/curiousnakes-visualization-of-aol-questions/ • http://datainsightsf.com/schedule-2/ Not repeated
  • 12. What to do? • Brainstorm some more, definitely something here, play w/data; will come in time. The most important part is the definition of the problem, not the code – Think more, code less • Should you copy the data visualisation example on Search Query Length? – Probably not • A long, long time ago Google displayed the incoming search queries in the lobby; this had practical use • Real time constrains the problem: less complicated processing, less about the algorithm, more about the user
  • 13. Why Real Time? Long Tail • Do I really need real time? Yes. Why? • Pre-2010, Google search displayed all the results, a combination of precision and recall. • Post-2010, Google went to instant search with limited recall. Nobody drilled down to the 1Mth page for DVDs. Better ad results with real time • Analytics today is similar to pre-2010 Google search: batch processing using click logs • Real time analytics is mostly custom solutions but can be much more effective. Once the user leaves the website, it is too late to do anything. Many orders of magnitude difference. Precision >> Recall
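
For readers new to the terms, precision and recall have standard definitions: precision is the share of returned results that are relevant, and recall is the share of relevant results that were returned. Below is a minimal sketch; the function name and the set-based inputs are illustrative choices, not something from the talk.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a result list against a relevant set."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

A short result list where every item is relevant scores high on precision even though its recall is limited, which is the trade-off the slide describes with "Precision >> Recall".
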
  • 14. UI: mouse over a stream of dots
  • 15. Mouse on a dot which is part of a group which looks like a snake • Can see what the user typed in as queries, one after another. Here is one example: • How to fix car -> What is a fuel filter -> How to replace a fuel filter. • This is valuable for adding features for the user who asked this • Can't get this from SQL queries easily, or at all.
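
The "snake" of consecutive queries can be approximated offline by grouping a query log per user and splitting on long pauses. This is a sketch under stated assumptions: the `(user_id, timestamp_seconds, query)` log format and the 30-minute gap are hypothetical, and this is not how the dataincolour demo was built.

```python
from collections import defaultdict


def query_chains(log, max_gap_seconds=1800):
    """Group (user_id, timestamp_seconds, query) rows into per-user chains.

    A new chain starts whenever the pause between two consecutive queries
    from the same user exceeds max_gap_seconds.
    """
    by_user = defaultdict(list)
    for user_id, ts, query in log:
        by_user[user_id].append((ts, query))

    chains = []
    for user_id, events in by_user.items():
        events.sort()  # order each user's queries by timestamp
        current, last_ts = [], None
        for ts, query in events:
            if last_ts is not None and ts - last_ts > max_gap_seconds:
                chains.append((user_id, current))
                current = []
            current.append(query)
            last_ts = ts
        if current:
            chains.append((user_id, current))
    return chains
```

Each returned chain is a candidate "how to fix car -> fuel filter" sequence. The gap threshold decides where one intent ends and the next begins, so it is worth tuning against real sessions.
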
  • 16. What is the lesson here? • Viewing data in real time has value • At a minimum, it helps clarify the thinking for the next step • Use as an alerting system/QC process to show if ML/DM is running correctly (proprietary in Google/Yahoo). Every business has these. • Key: visible to everybody w/o running a SQL query
  • 17. Wisdom gained matches across 2 hackathons • One of the most surprising pieces of work was a unique data visualization from the DM hackathon • None of these positive results were defined in the problem statement. Required creativity. • Careful
  • 18. Review ML/DM • Review a small subset of these slides: – http://www.slideshare.net/DougChang1/demographics-andweblogtargeting-10757778 • Agenda: review a case study of the Motley Fool and how to create/target promotions to likely subscribers for problem #2, propensity marketing • Case study of a past hackathon. – My role: I seed the ideas, Mike Bowles, Nick Kolegraff
  • 19. ML/DM Slides • DO NOT INSERT SLIDES, cover the original so we don’t limit the scope of audience questions
  • 20. ML/DM and Hackathons • Done 2 as examples: – Motley Fool, cosponsored by Kaggle (Mike Bowles) – Best Buy, paid Kaggle (Nick Kolegraff@Accenture/DM SIG, we sought him out) – These events require guidance; both were very successful, and both are still receptive to more DM/ML events • Careful: an algorithm doesn’t mean you have a production process or something someone can manage via a paid analyst headcount • Why aren’t there more? Time investment to clean data, tech talk to guide participants, min 3 months work
  • 21. What do I do for others which may help you? • Seed the ideas; should add a structure to this. NDA. Run SQL queries • Current Case Study – Starting to do the prep work for another real time analytics example, teaching from this – Nick/Mike did this for the other 2 hackathons. • Match the strategy w/structure – Take time off work to build an engineering prototype (Twitter Storm in old slide deck) – Not covering this here – Strategy: first display the data in a real time dashboard then iterate the visualizations, then add DM/ML algorithms after the A/B testing framework is complete
  • 22. One example, real time analytics, web page heat maps
  • 23. Amazon Web Page
  • 24. Google Shopping Example/Reversed/Why?
  • 25. Upper Left hand corner
  • 26. Example of Kiehls
  • 27. Kiehl’s Example • Put in offers w/($ amount, product desc, click url) customized per user, A/B test layouts and placement, store data for customization and measure lift • Measure Facebook ads via page rank • Predict missing links application • http://blog.echen.me/2012/07/31/edge-prediction-in-a-social-graph-my-solution-to-facebooks-user-recommendation-contest-on-kaggle/ • Careful, don’t copy. Example only. Generalize to hackathon. Many other ideas • Your answer is different from Yahoo & Google. This isn’t a roadmap.
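
"Measure lift" in an A/B test is commonly computed as the relative change in conversion rate between a control layout and a variant. The sketch below assumes that definition; the function names and the counts in the usage note are hypothetical, and the slide does not specify which lift metric was intended.

```python
def conversion_rate(conversions, visitors):
    """Return conversions per visitor, or 0.0 when there were no visitors."""
    return conversions / visitors if visitors else 0.0


def relative_lift(control_conv, control_n, variant_conv, variant_n):
    """Return (variant rate - control rate) / control rate."""
    base = conversion_rate(control_conv, control_n)
    if base == 0:
        raise ValueError("control conversion rate is zero; lift is undefined")
    return (conversion_rate(variant_conv, variant_n) - base) / base
```

For example, `relative_lift(50, 1000, 60, 1000)` is roughly 0.2, a 20% relative lift. This sketch does not test for statistical significance, which a real promotion test would also need before acting on the number.
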
  • 28. Promotion Modeling • Is this a long tail problem? – How to formulate the graph and influence across nodes? – Which features to select to use for modeling? – Still ok if you don’t have the long tail answer. Follow the Demographics Customer modeling ex. • How to change the model over time? • Metrics for promotion effectiveness – Facebook campaigns are easy to iterate and run. Still need some form of A/B testing
  • 29. Structure has to match Strategy • Partner w/Macy’s? Develop a structure to work with retail partners to increase their sales – E.g. customized shopkick – Don’t just release APIs, release mobile app source code ppl can modify • Test promotions and building profiles? • … lots of ideas
