Focus the Mining Beacon: Lessons and Challenges from the World of E-Commerce
ECML and PKDD, Oct 3rd, 2005
Talk (and ML paper) available at http://www.kohavi.com
Speaker notes (per slide):
  • Abstract (from http://ecmlpkdd05.liacc.up.pt/): Electronic commerce is now entering its second decade, with Amazon.com and eBay now in existence for ten years. With massive amounts of data, an actionable domain, and measurable ROI, multiple companies use data mining and knowledge discovery to understand their customers and improve interactions. We present important lessons and challenges using e-commerce examples across two dimensions: (i) business-level to technical, and (ii) the mining lifecycle from data collection and data warehouse construction to discovery and deployment. Many of the lessons and challenges are applicable to domains outside e-commerce.
  • The advantage of doing electronic presentations is that you can modify them a few hours prior to the talk. http://www.eclipse-glasses.com/eclipse-glasses-entree.php?ide=1&idl=2 Pictures from the Pestana hotel.
  • When you listen to anyone giving advice and lessons, you should ask: what experience does the speaker have? E-commerce, and more generally web applications, are a "killer" domain. For lessons and challenges, I'll skip the ones you learn in an intro to knowledge discovery, such as "80% of the time is spent in data preparation." This talk gives a taste of some key lessons and challenges; the ML paper is much more detailed.
  • My goal in giving you these background slides is to tell you about my experiences, so that you know I'm not learning to shave on your face. MLC++ provided algorithms, such as discretization, that others used for comparisons.
  • Clients included Bluefly, Canadian Tire, Debenhams, Harley-Davidson, Gymboree, Kohl's, Mountain Equipment Co-op, Saks Fifth Avenue, Sainsbury, Sprint, and The Men's Wearhouse. At Blue Martini, people wanted us to tell them the time, not how to build clocks; this is the opposite of "Built to Last" by Collins and Porras. Most companies wanted to sell, not to build core competencies in data mining. They were interested in key insights, especially in the early days of going live. At Amazon, one simple idea implemented in about two months by one person was worth, in terms of profits, as much as what the other 64 people did.
  • When you choose to apply data mining to a domain, make sure it has the right ingredients.
    - Large amounts of data: clickstreams generate huge amounts of data. New e-commerce sites, even if small, quickly generate sufficient data for effective mining.
    - Rich data: this one is hard in many applications, since adding attributes that people do not think are relevant may yield the most surprising insights. For example, screen size was an important attribute for e-commerce, and not easy to collect. A good example is on the next slide. Collect product attributes, assortment attributes, promotions, and customer attributes (through registration).
    - Clean data: on the web, all data is electronically collected. If designed right, it can be clean and reliable. Collect data directly at the webstore: no legacy systems, no humans typing survey data from forms. Collect what is needed by design, not as an afterthought.
    - Actionable: it is easy to change web pages. The web is an experimental laboratory with quick deployment.
    - Measurable ROI: control/treatment or A/B tests are easy to do.
  • In many cases organizations that own the operations and the analysis don’t create such an automated process.
  • Oh my God, what’s wrong with this child? NOTHING. He’s sound asleep with no teeth
  • People typed search keywords into the e-mail box that says "sign up for e-mail." Easy to fix. By the way, search has to be on the home page; Amazon also made this mistake when it went live: there was no search box on the home page.
  • Top-5 lists taken 9/24/05 at 23:10 PST. Sales rank: you get a near-real-time (updated hourly) metric that is not clearly defined to the outside observer!
  • Skip slide
  • "There are three kinds of lies: lies, damn lies, and statistics" -- Mark Twain
  • Review counts (accepted / reviewed):
                  Conference 1    Conference 2    Total
      Ann         60 / 100        1 / 10          61 / 110
      Bob          9 / 10         30 / 100        39 / 110
  • For cities, treatment A was better in each city, but lost overall.
                      Small stones      Large stones       Total
      Treatment A     81/87   = 93%     192/263 = 73%      273/350 = 78%
      Treatment B     234/270 = 87%     55/80   = 69%      289/350 = 83%
  • Tank example
  • Some people bought fairly expensive products for less than 5 cents. Note this is an example of a multi-variate anomaly: it is OK for some products (e.g., gum) to be 5 cents, but not for other products. 26 different ways of spelling Mitsubishi! Use drop-down lists instead of free-text fields.
  • Explain the heatmap. Note that Fridays are generally weaker. I was pleased to learn that the next version of Office (Office 12) has heatmaps.
  • Naïve-Bayes: stay on the last picture and see that what your mother told you about education is true: higher education implies higher salary. Not politically correct, but nonetheless true in this census bureau data: being male slightly increases your chance of earning more money.
  • This isn't in the paper, but too many people aren't aware of it and I keep seeing this mistake made. Even with large amounts of data, the model will split the data into segments, and some will be small enough that it matters.
1. Focus the Mining Beacon: Lessons and Challenges from the World of E-Commerce
   ECML and PKDD, Oct 3rd, 2005
   Talk (and ML paper) available at http://www.kohavi.com

2. Solar Eclipse Today
   • I'd like to thank the organizers for arranging the conference on Oct 3rd in Porto!
     - The Sun was obscured 89.7% here
     - A few pictures I took from my hotel room with the Sky & Space glasses provided

3. Overview
   • Background/experience
   • E-commerce: great domain for data mining
   • Business lessons and Simpson's paradox
   • Technical lessons
   • Challenges
   • Q&A

4. Background (I)
   • 1993-1995: Led development of MLC++, the Machine Learning Library in C++ (Stanford University)
     - Implemented or interfaced many ML algorithms. Source code is public domain, used for algorithm comparisons
   • 1995-1998: Developed and managed MineSet
     - MineSet™ was a "horizontal" data mining and visualization product at Silicon Graphics, Inc. (SGI). Utilized MLC++. Now owned by Purple Insight
     - Key insight: customers want simple stuff: Naïve Bayes + Viz
   • ICML 1998 keynote: claimed that to be successful, data mining needs to be part of a complete solution in a vertical market
     - I followed this vision to Blue Martini Software
   • A consultant is someone who borrows your razor, charges you by the hour, and learns to shave on your face

5. Background (II)
   • 1998-2003: Director of Data Mining, then VP of Business Intelligence at Blue Martini Software
     - Developed end-to-end e-commerce platform with integrated business intelligence from collection, extract-transform-load (ETL) to data warehouse, reporting, mining, visualizations
     - Analyzed data from over 20 clients
     - Key insight: collection and ETL worked great. Found many insights. However, customers mostly just ran the reports/analyses we provided
   • 2003-2005: Director, Data Mining and Personalization, Amazon
     - Key insights: (i) simple things work, and (ii) human insight is key
   • Recently moved to Microsoft
     - Building a platform utilizing machine learning and user feedback to improve interactions
     - Shameless plug: we are hiring
6. Ingredients for Successful Data Mining
   • Large amount of data (many records)
   • Rich data with many attributes (wide records)
   • Clean data / reliable collection (avoid GIGO)
   • Actionable domain (have real-world impact, experiment)
   • Measurable return-on-investment (did the recipe help)
   • E-commerce has all the right ingredients
   • If you are choosing a domain to work in, make sure it has these ingredients

7. Business-level Lessons (I)
   • Auto-creation of the data warehouse worked very well
     - At Blue Martini we owned the operational side as well as the analysis; we had a "DSSGen" process that auto-generated a star-schema data warehouse
     - This worked very well. For example, if a new customer attribute was added on the operational side, it automatically became available in the data warehouse
   • Clients are reluctant to list specific questions
     - Conduct an interim meeting with basic findings. Clients often came up with a long list of questions when faced with basic statistics about their data

8. Business-level Lessons (II)
   • Collect business-level data from the operational side
     - Many things are not observable in weblogs (search information, shopping cart events, registration forms, time to return results). Log at the app server
     - External events: marketing promotions, advertisements, site changes
     - Choose to collect as much data as you realistically can, because you do not know what might be relevant for a future question. Discoveries that contradict our prior thinking are usually the most interesting

9. How Priors Fail Us
   • We tend to interpret the picture to the left as a serious problem

10. We Are Not Used to Seeing Pacifiers with Teeth

11. Collection Example – Form Errors
   • Here is a good example of data collection that we introduced without knowing a priori whether it would help: form errors
   • If a web form was filled in and a field did not pass validation, we logged the field and the value entered
   • This was the Bluefly home page when they went live
   • Looking at form errors, we saw thousands of errors every day on this page. Any guesses?

12. Business-level Lessons (III)
   • Crawl, Walk, Run
     - Do basic reporting first, generate univariate statistics, then use OLAP for hypothesis testing, and only then start asking characterization questions and use data mining algorithms
   • Agree on terminology
     - What is the difference between a visit and a session?
     - How do you define a customer (e.g., did every customer purchase)?
     - How is "top seller" defined when showing best sellers? Why are the lists from Amazon (left) and Barnes & Noble (right) so different? The answer: no agreed-upon definition of sales rank

13. Twyman's Law
   • Any statistic that appears interesting is almost certainly a mistake
   • Validate "amazing" discoveries in different ways. They are usually the result of a business process
     - 5% of customers were born on the same day
       - 11/11/11 is the easiest way to satisfy the mandatory birth date field
     - For US web sites, there will be a small sales spike later this month, on Oct 30, 2005
       - Hint: between 1-2 AM, sales will approximately double relative to the prior week
       - Due to daylight saving time ending, after 1:59 AM DST comes 1:00 AM without DST, so there are two actual hours from 1 AM to 2 AM

14. Twyman's Law (II)
   • KDD Cup 2000
   • Customers who were willing to receive e-mail correlated with heavy spenders (the target variable)
     - The default for the registration question was changed from "yes" to "no" on 2/28
     - When it was realized that nobody was opting in, the default was changed
     - This coincided with a $10 discount off every purchase
     - Lots of participants found this spurious correlation, but it was terrible for predictions on the test set
   • Sites go through phases (launches) and multiple things change together
15. Simpson's Paradox
   • Every talk (hopefully) has a few key points to take away
     - Simpson's paradox is one key takeaway from this talk
     - Lack of awareness of the phenomenon can lead to mistaken conclusions
     - Unlike esoteric brain teasers, it happens in real life
   • Flow for the next few slides
     - Examples that most of you might think are "impossible"
     - Explanation of why they are possible and do happen
     - Implications/warning

16. Example 1: Paper Reviews
   • Ann and Bob are paper reviewers for conferences
   • They participate in two review cycles: C1 and C2 (e.g., two conferences)
   • Both reviewed the same number of papers in total
     - Ann accepted 55%, Bob accepted 35% (stricter)
   • Who is the stricter reviewer?
   • It appears to be Bob, but it's possible to show that there are cases where Ann is stricter in both cycles. Specifically:
     - For C1, Ann is stricter: Ann accepted 60% of papers (stricter), Bob accepted 90% of papers
     - For C2, Ann is stricter: Ann accepted 10% of papers (stricter), Bob accepted 30% of papers
   (Adapted from the Wikipedia article on Simpson's paradox)

17. Example 2: Drug Treatment
   • Real-life example for kidney stone treatments
   • Overall success rates:
     - Treatment A succeeded 78%, Treatment B succeeded 83% (better)
   • Further analysis splits the population by stone size
     - For small stones: Treatment A succeeded 93% (better), Treatment B succeeded 87%
     - For large stones: Treatment A succeeded 73% (better), Treatment B succeeded 69%
     - Hence treatment A is better in both cases, yet was worse in total
   • A similar real-life example happened when the two population segments were cities
   (Adapted from the Wikipedia article on Simpson's paradox)

18. Example 3: Sex Bias?
   • Adapted from real data for UC Berkeley admissions
   • Women claim sexual discrimination
     - Only 34% of women were accepted,
     - while 44% of men were accepted
   • Segmenting by departments to isolate the bias, they find that all departments accept a higher percentage of women applicants than men applicants. (If anything, there is a slight bias in favor of women!)
   • There is no conflict in the above bullets. It's possible, and it happened
   (Bickel, P. J., Hammel, E. A., and O'Connell, J. W. Sex bias in graduate admissions: Data from Berkeley. Science, 187, 1975, 398-404.)

19. Example 4: Purchase Channels
   • Real example from a Blue Martini customer
   • We plotted the average customer spending for customers purchasing on the web or "on the web and offline (POS)" (multi-channel), segmented by number of purchases per customer
   • In all segments, multi-channel customers spent less
   • However, as shop.org predicted, ignoring the segments, multi-channel customers spent more on average
   ("Multichannel customers spend 72% more per year than single channel customers" -- State of Retailing Online, shop.org)

20. Last Example: Batting Average
   • Baseball example
     - (For those not familiar with baseball, batting average is the percent of hits.)
     - One player can hit for a higher batting average than another player during the first half of the year,
     - do so again during the second half,
     - but have a lower batting average for the entire year
   • Example
   • The key to the "paradox" is that the segmenting variable (e.g., half of the year) interacts with "success" and with the counts. E.g., "A" was sick and rarely played in the 1st half, then "B" was sick in the 2nd half, but the 1st half was "easier" overall

21. Not Really a Paradox, Yet Non-Intuitive
   • If a/b < A/B and c/d < C/D, it's possible that (a+c)/(b+d) > (A+C)/(B+D)
   • We are essentially dealing with weighted averages when we combine segments
   • Here is a simple example with two treatments
     - Each cell has Success / Total = Percent Success %
     - T1 is superior in both segment C1 and segment C2, yet loses overall
     - C1 is "harder" (lower success for both treatments)
     - T1 gets tested more in C1
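To see the inequality with real numbers, here is a minimal Python sketch (mine, not from the talk) that checks the kidney-stone counts from the speaker notes above: treatment A wins within each stone-size segment yet loses on the pooled totals, because the segments carry very different weights.

```python
from fractions import Fraction

# Success/total counts per segment (kidney-stone example from the speaker notes).
treatment_a = {"small": (81, 87), "large": (192, 263)}
treatment_b = {"small": (234, 270), "large": (55, 80)}

def rate(successes, total):
    return Fraction(successes, total)

# Treatment A wins in every segment...
for seg in ("small", "large"):
    assert rate(*treatment_a[seg]) > rate(*treatment_b[seg])

# ...but loses once the segments are pooled (a weighted average in disguise).
pooled_a = rate(sum(s for s, _ in treatment_a.values()),
                sum(t for _, t in treatment_a.values()))
pooled_b = rate(sum(s for s, _ in treatment_b.values()),
                sum(t for _, t in treatment_b.values()))
assert pooled_a < pooled_b
print(f"A pooled: {float(pooled_a):.0%}, B pooled: {float(pooled_b):.0%}")
```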
22. The Other Examples
   • Paper reviews: Ann was tougher in general, but she reviewed most of her papers in the "write-only" conference where acceptance is always higher
   • Kidney stones: the treatments did not work well against large stones, but treatment A was heavily tested on those
   • Sex bias: departments differed in their acceptance rates, and women applied more to departments where such rates were lower
   • Web vs. multi-channel: customers that visited often spent more on average, and multi-channel customers visited more

23. Key Takeaway
   • Why is this so important?
   • In knowledge discovery, we state probabilities (correlations) and associate them with causality
     - Reviewer Bob is stricter
     - Treatment T1 works better
     - Berkeley discriminates against women
   • We must be careful to check for confounding variables
   • Confounding variables may not be ones we are collecting (e.g., latent/hidden)

24. Controlled Experiments (I)
   • Controlled experiments (A/B tests, or control/treatment) are the gold standard
   • Make sure to randomize properly
     - You cannot run option A on day 1 and option B on day 2; you have to run them in parallel
     - When running in parallel, you cannot randomize based on IP (e.g., load-balancer randomization) because all of AOL's traffic comes from a few proxy servers
     - Every customer must have an equal chance of falling into control or treatment and must stick to that group
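The talk does not specify an assignment mechanism; a common way to get "equal chance, and the user sticks to one group" is to hash a stable identifier such as a cookie ID together with an experiment name. The sketch below is illustrative only; the experiment name and the 50/50 split are assumptions.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_fraction: float = 0.5) -> str:
    """Deterministically assign a user to 'control' or 'treatment'.

    Hashing (experiment, user_id) gives every user an equal chance of either group,
    keeps the same user in the same group on every visit, and makes assignments
    independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform value in [0, 1]
    return "treatment" if bucket < treatment_fraction else "control"

if __name__ == "__main__":
    # The same cookie ID always lands in the same group for a given experiment.
    print(assign_variant("cookie-12345", "checkout-button-color"))
    print(assign_variant("cookie-12345", "checkout-button-color"))
```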
25. Controlled Experiments (II)
   • Issues with controlled experiments
     - Duration: we measure only short-term impact. Hard to assess long-term effects
     - Primacy effect: changing navigation on a website may degrade customer experience, even if the new navigation is better
     - Multiple experiments: on a large site, you may have multiple experiments running in parallel. Scheduling and QA are complex
     - Consistency/contamination: on the web, assignment is usually cookie-based, but people may use multiple computers
     - Statistical tests: distributions are far from normal. E.g., 97% of sessions do not purchase, so there's a large mass at zero spending
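The slide does not prescribe a fix for the zero-heavy, non-normal spending distribution; one common option is a percentile bootstrap on the mean, sketched here with made-up session revenues (the 97% zero rate mirrors the slide, everything else is synthetic).

```python
import random

random.seed(0)

# Illustrative per-session revenue: ~97% of sessions spend nothing.
sessions = [0.0] * 1940 + [round(random.uniform(5, 200), 2) for _ in range(60)]

def bootstrap_mean_ci(data, n_boot=1000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the mean of a skewed sample."""
    n = len(data)
    means = sorted(sum(random.choices(data, k=n)) / n for _ in range(n_boot))
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return means[lo_idx], means[hi_idx]

low, high = bootstrap_mean_ci(sessions)
mean = sum(sessions) / len(sessions)
print(f"mean revenue/session: {mean:.2f}, 95% bootstrap CI: [{low:.2f}, {high:.2f}]")
```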
26. Technical Lessons – Cleansing (I)
   • Auditing data
     - Make sure time-series data exists for the whole period. It is very easy to conclude that this week was bad relative to last week because some data is missing (e.g., a collection bug)
     - Synchronize clocks from all data collection points. In one example, some servers were set to GMT and others to EST, leading to strange anomalies. Even being a few minutes off can cause add-to-carts to appear "prior" to the search
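A cheap audit that catches missing stretches of data is to count records per collection point per day and flag empty cells. This is a toy sketch, with invented server names and dates rather than anything from Blue Martini.

```python
from collections import Counter
from datetime import date, timedelta

# Illustrative event log: (event date, collection point).
events = [
    (date(2005, 9, 1), "web-server-1"),
    (date(2005, 9, 1), "web-server-2"),
    (date(2005, 9, 2), "web-server-1"),
    # web-server-2 is silent on Sept 2 -- a gap worth investigating.
    (date(2005, 9, 3), "web-server-1"),
    (date(2005, 9, 3), "web-server-2"),
]

def audit_coverage(events):
    """Flag (day, source) pairs with zero records over the observed date range."""
    counts = Counter(events)
    days = [day for day, _ in events]
    sources = sorted({source for _, source in events})
    day = min(days)
    while day <= max(days):
        for source in sources:
            if counts[(day, source)] == 0:
                print(f"WARNING: no data from {source} on {day}")
        day += timedelta(days=1)

audit_coverage(events)
```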
27. Technical Lessons – Cleansing (II)
   • Auditing data (continued)
     - Remove test data. QA organizations constantly test the system. Make sure the data can be identified and removed from analysis
     - Remove robots/bots: 5-40% of e-commerce site traffic is generated by crawlers from search engines and students learning Perl. These significantly skew results unless removed
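The transcript does not say how robots were identified; the simplest first pass is a user-agent and request-rate heuristic like the hypothetical sketch below. The patterns and the 60-requests-per-minute threshold are illustrative, and slide 37 notes that real detection also uses IP and JavaScript signals.

```python
import re

# Substrings that commonly identify declared crawlers (illustrative, not exhaustive).
BOT_UA_PATTERN = re.compile(r"bot|crawler|spider|slurp|wget|curl|libwww|perl", re.I)

def filter_bot_sessions(sessions, max_requests_per_minute=60):
    """Drop sessions that declare a crawler user-agent or request at inhuman rates.

    `sessions` is a list of dicts with 'user_agent', 'num_requests', 'duration_minutes'.
    """
    kept = []
    for s in sessions:
        if BOT_UA_PATTERN.search(s["user_agent"]):
            continue  # declared robot
        rate = s["num_requests"] / max(s["duration_minutes"], 1 / 60)
        if rate > max_requests_per_minute:
            continue  # too fast to be a human browsing
        kept.append(s)
    return kept

sessions = [
    {"user_agent": "Mozilla/4.0 (compatible; MSIE 6.0)", "num_requests": 12, "duration_minutes": 8},
    {"user_agent": "Googlebot/2.1", "num_requests": 500, "duration_minutes": 10},
    {"user_agent": "LWP::Simple/5.8 (Perl)", "num_requests": 3, "duration_minutes": 1},
]
print(len(filter_bot_sessions(sessions)))  # -> 1
```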
28. Data Processing
   • Utilize hierarchies
     - Generalizations are hard to find when there are many attribute values (e.g., every product has a Stock Keeping Unit number)
     - Collapse such attribute values based on hierarchies
   • Remember date/time attributes
     - Date/time attributes are often ignored, but contain information
     - Convert them into cyclical attributes, such as hour of day or morning/afternoon/evening, day of week, etc.
     - Compute deltas between such attributes (e.g., ship date minus order date)
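As a small illustration of the derived attributes the slide recommends (cyclical date/time features and deltas such as ship date minus order date), here is a sketch with assumed column names.

```python
from datetime import datetime

def datetime_features(order_ts: datetime, ship_ts: datetime) -> dict:
    """Derive cyclical and delta attributes from raw timestamps (illustrative names)."""
    return {
        "order_hour_of_day": order_ts.hour,
        "order_day_of_week": order_ts.strftime("%A"),
        "order_part_of_day": ("morning" if order_ts.hour < 12
                              else "afternoon" if order_ts.hour < 18
                              else "evening"),
        "order_is_weekend": order_ts.weekday() >= 5,
        # Deltas between related timestamps are often more predictive than either alone.
        "days_to_ship": (ship_ts - order_ts).days,
    }

print(datetime_features(datetime(2005, 10, 3, 14, 30), datetime(2005, 10, 6, 9, 0)))
```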
29. Analysis / Model Building
   • Mining at the right granularity level
     - To answer questions about customers, we must aggregate clickstreams, purchases, and other information to the customer level
     - Defining the right transformation and creating summary attributes is the key to success
   • Phrase the problem to avoid leaks
     - A leak is an attribute that "gives away" the label. E.g., heavy spenders pay more sales tax (VAT)
     - Phrasing the problem to avoid leaks is a key insight. Instead of asking who is a heavy spender, ask which customers migrate from spending a small amount in period 1 to a large amount in period 2
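A toy sketch of both points: aggregate session-level records to one row per customer, and phrase the label as migration from low spend in period 1 to high spend in period 2, so that only period-1 behavior is exposed as features. The thresholds and field names are invented for illustration.

```python
from collections import defaultdict

# Toy session records: (customer_id, period, visits, spend).
sessions = [
    ("c1", 1, 3, 20.0), ("c1", 2, 5, 400.0),
    ("c2", 1, 1, 250.0), ("c2", 2, 2, 300.0),
    ("c3", 1, 2, 15.0), ("c3", 2, 1, 10.0),
]

def customer_rows(sessions, low=50.0, high=200.0):
    """One row per customer: features from period 1 only, label from period 2 only,
    so the label cannot leak into the features."""
    agg = defaultdict(lambda: {"p1_visits": 0, "p1_spend": 0.0, "p2_spend": 0.0})
    for cust, period, visits, spend in sessions:
        if period == 1:
            agg[cust]["p1_visits"] += visits
            agg[cust]["p1_spend"] += spend
        else:
            agg[cust]["p2_spend"] += spend
    return [
        {
            "customer": cust,
            "p1_visits": a["p1_visits"],
            "p1_spend": a["p1_spend"],
            "migrated_to_heavy": a["p1_spend"] < low and a["p2_spend"] > high,
        }
        for cust, a in agg.items()
    ]

for row in customer_rows(sessions):
    print(row)
```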
30. Data Visualizations
   • Picking the right visualization is key to seeing patterns
     - On the left is traffic by day – note the weekends (but hard to see patterns)
     - On the right is a heatmap, showing traffic colored from green to yellow to red, utilizing the cyclical nature of the week (going up in columns). It's easy to see the weekends, Labor Day on Sept 3, and the effect of Sept 11
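The screenshots are not reproduced in this transcript; as an illustration of the reshaping behind such a heatmap, this sketch pivots synthetic daily traffic into a day-of-week by week grid that a heatmap would then color.

```python
from datetime import date, timedelta
import random

random.seed(1)

# Synthetic daily visit counts for four weeks starting on a Monday (illustrative only).
start = date(2001, 8, 27)
daily = {start + timedelta(days=i): random.randint(800, 1200) for i in range(28)}

# Pivot into a day-of-week (rows) by week (columns) grid -- the shape a heatmap colors.
weeks = sorted({d.isocalendar()[1] for d in daily})
grid = {(d.weekday(), d.isocalendar()[1]): v for d, v in daily.items()}

print("     " + "  ".join(f"wk{w}" for w in weeks))
for dow, name in enumerate(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]):
    row = "  ".join(f"{grid.get((dow, w), 0):4d}" for w in weeks)
    print(f"{name}  {row}")
```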
31. Model Visualizations
   • When we build models for prediction, it is sometimes important to understand them
   • For MineSet™, we built visualizations for all models
   • Here is one: the Naïve-Bayes / Evidence model (movie)

32. UI Tweaks – Feedback in Help
   • Small UI changes can make a big difference
   • Example from Microsoft Help
   • When reading help (from the product or the web), you have an option to give feedback

33. Two Variants of Feedback (A and B)
   • Feedback A puts everything together, whereas feedback B is two-stage: the question follows the rating
   • Feedback A just has 5 stars, whereas B annotates the stars from "Not helpful" to "Very helpful" and makes them lighter
   • Which one has a higher response rate?
   • Feedback B gets more than double the response rate!

34. Another Feedback Variant
   • Call this variant C
   • Which one has a higher response rate, B or C?
   • Feedback C outperforms B by a factor of 3.5!

35. A Real Technical Lesson: Computing Confidence Intervals
   • In many situations we need to compute confidence intervals, which are simply estimated as acc_h ± z*stdDev, where
     - acc_h is the estimated mean accuracy,
     - stdDev is the estimated standard deviation, and
     - z is usually 1.96 for a 95% confidence interval
   • This fails miserably for small amounts of data
     - For example: if you see three coin tosses that are all heads, the confidence interval for the probability of heads would be [1, 1]
   • Use a more accurate formula that does not require using stdDev (but still assumes normality)
     - It's not used often because it's more complex, but that's what computers are for
     - See Kohavi, "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," IJCAI-95
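The slide does not write out the "more accurate formula"; the standard interval matching its description (normal assumption, but no plug-in standard deviation) is the Wilson score interval, so the sketch below shows that as a plausible reading. For three heads in three tosses it gives roughly [0.44, 1.0] rather than the degenerate [1, 1].

```python
import math

def normal_approx_ci(successes: int, n: int, z: float = 1.96):
    """Naive interval p ± z*sqrt(p(1-p)/n); collapses when the observed p is 0 or 1."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def wilson_ci(successes: int, n: int, z: float = 1.96):
    """Wilson score interval: solves the normal approximation for p instead of plugging in p."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)

# Three coin tosses, all heads:
print(normal_approx_ci(3, 3))  # (1.0, 1.0) -- useless
print(wilson_ci(3, 3))         # roughly (0.44, 1.0)
```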
36. Challenges (I)
   • Finding a way to map business questions to data transformations
     - Don Chamberlin wrote on the design of SQL: "What we thought we were doing was making it possible for non-programmers to interact with databases." The SQL99 standard is now about 1,000 pages
     - Many operations that are needed for mining are not easy to write in SQL
   • Explaining models to users
     - What are ways to make models more comprehensible?
     - How can association rules be visualized/summarized?

37. Challenges (II)
   • Dealing with "slowly changing dimensions"
     - Customer attributes change (people get married, their children grow, and we need to change recommendations)
     - Product attributes change, or products are packaged differently. New editions of books come out
   • Supporting hierarchical attributes
   • Deploying models
     - Models are built based on constructed attributes in the data warehouse. Translating them back to attributes available on the operational side is an open problem
   • For web sites, detecting robots/spiders
     - Detection is based on heuristics (user agent, IP, JavaScript)

38. Challenges (III)
   • Analyzing and measuring the long-term impact of changes
     - Control/treatment experiments give us short-term value. How do we address the long-term impact of changes?
     - For non-commerce sites, how do we measure user satisfaction? Example: users hit F1 for help in Microsoft Office and execute a series of queries, browsing through documents. How do we measure satisfaction other than through surveys?

39. Summary
   • Pick a domain that has the right ingredients. The Web and e-commerce are excellent
   • Think about the problem end-to-end: from collection, transformations, reporting, visualizations, and modeling to taking action
   • The lessons and challenges are from e-commerce, but likely to be applicable in other domains
   • Beware of hidden variables when concluding causality. Think about Simpson's paradox. Conduct control/treatment experiments with proper randomization

40. Fun Lessons
   • For eBay: do not bid on every word in Google's AdWords
   • "One accurate measurement is worth a thousand expert opinions" -- Admiral Grace Hopper
   • Advertising may be described as the science of arresting the human intelligence long enough to get money from it
   • "Not everything that can be counted counts, and not everything that counts can be counted" -- Albert Einstein
   • Entropy requires no maintenance
   • In God we trust. All others must have data
   • For a copy of the talk and the full paper, visit http://kohavi.com