When you listen to anyone giving advice and lessons, you should ask “what experience does the speaker have?” E-commerce, and more generally web applications, are a “killer” domain. For lessons and challenges, I’ll skip the ones you learn in an intro to knowledge discovery, such as that 80% of the time is spent in data preparation. This talk gives a taste of some key lessons and challenges. The ML paper is much more detailed.
My goal in giving you these background slides is to tell you about my experiences so that you know I’m not shaving on your face. MLC++ provided algorithms, such as discretization, that others used for comparisons.
Clients included Bluefly, Canadian Tire, Debenhams, Harley Davidson, Gymboree, Kohl’s, Mountain Equipment Co-op, Saks Fifth Avenue, Sainsbury, Sprint, and The Men’s Wearhouse. At Blue Martini, people wanted us to tell them the time, not how to build clocks. This is the opposite of “Built to Last” by Collins and Porras. Most companies wanted to sell, not to build core competencies in data mining. They were interested in key insights, especially in the early days of going live.
In many cases organizations that own the operations and the analysis don’t create such an automated process.
People typed search keywords into the box that says “sign up for e-mail.” Easy to fix. BTW, search has to be on the home page. Amazon also made this mistake when it went live: there was no search box on the home page.
Top-5 taken 9/24/05 at 23:10 PST. Sales rank: you get a near-real-time (updated hourly) metric that is not clearly defined to the outside observer!
Sunglasses example: You tell someone that there’s a big difference in per-capita sales of sunglasses between Seattle and LA. They say, “well, that’s obvious, you guys up in Seattle never need sunglasses since it’s so cloudy.” You then state that Seattle sells more sunglasses per capita than LA (a true fact). They look surprised, and you can see the neural net in their brain updating its weights with this surprising info, but then after 10 seconds they say “well, that’s obvious. It’s so cloudy there that when the sun comes out, you need sunglasses and don’t know where you left them, so you buy another pair.” http://www.see-seattle.com/seattlefirsts.htm http://www.pubclub.com/pacificnw/seattlepre.htm
Oh my God, what’s wrong with this child? NOTHING. He’s sound asleep with no teeth.
Closed space between top "Proceed to Checkout" button line and next line. Removed top "Continue Shopping" button. Removed "Update" button underneath the quantity box. Moved "Total" box down a line. Text and amount appear in different boxes. Above the "Total" box is a "Discount" box, with amount in a box next to it. Above "Shipping Method" line is "Enter Coupon Code" with a box to enter it. New "Recalculate" button left of "Continue Shopping." Bottom tool bar on two lines. Shopping cart icon one space closer to the words "Shopping Cart."
Sadly, the 5-star version was launched as an attempt to improve on yes/no.
http://webexhibits.org/daylightsaving/b.html For Oct 29, 2006 it’s both Europe and the US. For starting DST, the dates are different. Don’t forget to change your batteries: More than 90 percent of homes in the United States have smoke detectors, but one-third are estimated to have dead or missing batteries.
For cities, treatment A was better in each city, but lost overall.
             Small Stones     Large Stones     Total
Treatment A  81/87 = 93%      192/263 = 73%    273/350 = 78%
Treatment B  234/270 = 87%    55/80 = 69%      289/350 = 83%
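The reversal in the treatment counts above can be checked directly. The following is a minimal sketch that uses only the success/trial counts given for Treatments A and B; the function and variable names are illustrative, not from the talk.

```python
from fractions import Fraction

# (successes, trials) per treatment and segment, taken from the counts above.
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)

def overall(segments):
    """Pool the segments: sum successes and trials, then divide."""
    s = sum(succ for succ, _ in segments.values())
    n = sum(trials for _, trials in segments.values())
    return rate(s, n)

# Which treatment has the higher success rate inside each segment?
segment_winner = {
    seg: max(results, key=lambda t: rate(*results[t][seg]))
    for seg in ("small", "large")
}
# Which treatment has the higher success rate once segments are pooled?
overall_winner = max(results, key=lambda t: overall(results[t]))

for t, segs in results.items():
    per_seg = ", ".join(f"{seg} {float(rate(*v)):.0%}" for seg, v in segs.items())
    print(f"Treatment {t}: {per_seg}, total {float(overall(segs)):.0%}")
print("Winner per segment:", segment_winner, "| winner overall:", overall_winner)
```

Treatment A wins in both segments, yet B wins after pooling, because B's cases are concentrated in the segment where success rates are higher for everyone.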
Good example of primacy: Office 2007 vs. Office 2003
[Some people bought fairly expensive products for less than 5 cents. Note this is an example of a multivariate anomaly: it is OK for some products (e.g., gum) to be 5 cents, but not for other products. 2 6 different ways of spelling Mitsubishi! Use drop-down lists instead of free-text fields.]
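One way to catch this kind of multivariate anomaly is to compare each price with what is typical for that product rather than with a global threshold. This is a rough sketch only: the order lines, the median as the "typical" price, and the 10% cutoff are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical order lines: (product, price paid in dollars).
orders = [
    ("gum", 0.05), ("gum", 0.05), ("gum", 0.06),
    ("camera", 299.00), ("camera", 310.00), ("camera", 0.04),
]

def flag_price_anomalies(order_lines, min_ratio=0.1):
    """Return order lines priced below min_ratio times the product's median price."""
    by_product = defaultdict(list)
    for product, price in order_lines:
        by_product[product].append(price)
    typical = {p: median(prices) for p, prices in by_product.items()}
    return [(p, price) for p, price in order_lines if price < min_ratio * typical[p]]

anomalies = flag_price_anomalies(orders)
print(anomalies)
```

A 5-cent gum sale passes, while a 4-cent camera sale is flagged, even though both prices look the same in a single-variable check.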
Explain the heatmap. Note that Fridays are generally weaker. The next version of Office (Office 2007) has heatmaps.
Naïve-Bayes: Stay on the last picture and see that
- What your mother told you about education is true: higher education implies higher salary
- Not politically correct, but nonetheless true in this Census Bureau data: being male slightly increases your chance of earning more money.
This isn’t in the paper, but too many people aren’t aware of this and I keep seeing this mistake made. Even with large amounts of data, the model will split it into segments, and some will be small enough that it matters.
Focus the Mining Beacon: Lessons and Challenges from the World of E-Commerce SF Bay ACM Data Mining SIG, 6/13/2006
Many things not observable in weblogs (search information, shopping cart events, registration forms, time to return results). Log more at app-server
External events: marketing promotions, advertisements, site changes
Choose to collect as much data as you realistically can because you do not know what might be relevant for a future question. (Subject to privacy issues, but aggregated/anonymous data is usually OK.)
Collection example – Form Errors
Here is a good example of data collection that we introduced without knowing a priori whether it would help: form errors
If a web form was filled and a field did not pass validation, we logged the field and the value filled
This was the Bluefly home page when they went live
Looking at form errors, we saw thousands of errors every day on this page
Any guesses?
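Form-error logging of the kind described here could look roughly like the sketch below. The validator, the field and page names, and the sample submissions are all hypothetical; a real deployment would write to app-server logs and would need to respect the privacy constraints noted earlier.

```python
from collections import Counter

form_error_log = []  # stand-in for persistent app-server logging

def validate_email(value):
    """Very rough e-mail check, for illustration only."""
    return "@" in value and "." in value.split("@")[-1]

def submit_field(page, field, value, validator):
    """Validate one field; on failure, record the page, field, and value entered."""
    ok = validator(value)
    if not ok:
        form_error_log.append({"page": page, "field": field, "value": value})
    return ok

# Hypothetical submissions to a sign-up box on a home page.
for text in ["jane@example.com", "red shoes", "leather jacket", "bob@example"]:
    submit_field("home", "email_signup", text, validate_email)

# Count errors per (page, field) and look at what people actually typed.
errors_per_field = Counter((e["page"], e["field"]) for e in form_error_log)
print(errors_per_field.most_common())
print([e["value"] for e in form_error_log])
```

Counting errors per field shows where the problem is; keeping the rejected values is what makes the cause visible.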
Many explanations we give to “success” are backward-looking. Hindsight is 20/20.
Sales of sunglasses per-capita in Seattle vs. LA example
Our intuition at assessing new ideas is usually very poor
We are especially bad at assessing ideas that are not incremental, i.e., radical changes
We commonly confuse ourselves with the target audience
Discoveries that contradict our prior thinking are usually the most interesting
The next set of slides is a series of examples where you can test your intuition, or your “prior probabilities.”
Do you believe in intuition? No, but I have a feeling I might someday
How Priors Fail Us
We tend to interpret the picture to the left as a serious problem
Warning: graphic image may be disturbing to some people. However, it’s just your priors.
We are not Used to Seeing Pacifiers with Teeth
Checkout Page Example from Bryan Eisenberg’s article on clickz.com
The conversion rate is the percentage of visits to the website that include a purchase
Which version has a higher conversion rate? Why?
A B
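The conversion-rate definition above is per visit, not per purchase. A small sketch makes the distinction concrete; the visit log, event names, and version labels are made up for illustration and say nothing about which real page won.

```python
# Hypothetical visit log: one entry per visit, listing the events seen in that visit.
visits = [
    {"version": "A", "events": ["view", "add_to_cart", "purchase"]},
    {"version": "A", "events": ["view"]},
    {"version": "B", "events": ["view", "purchase", "purchase"]},
    {"version": "B", "events": ["view", "add_to_cart"]},
    {"version": "B", "events": ["view"]},
    {"version": "A", "events": ["view", "add_to_cart"]},
]

def conversion_rate(visit_log, version):
    """Percentage of visits to `version` that include at least one purchase."""
    relevant = [v for v in visit_log if v["version"] == version]
    if not relevant:
        raise ValueError(f"no visits for version {version!r}")
    converted = sum(1 for v in relevant if "purchase" in v["events"])
    return 100.0 * converted / len(relevant)

for version in ("A", "B"):
    print(f"{version}: {conversion_rate(visits, version):.1f}%")
```

A visit with two purchases still counts once, so both hypothetical versions come out at the same rate here.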
When reading help (from product or web), you have an option to give feedback
Office Online Feedback
A B
Feedback A puts everything together, whereas feedback B is two-stage: question follows rating.
Feedback A just has 5 stars, whereas B annotates the stars with “Not helpful” to “Very helpful” and makes them lighter.
Which one has a higher response rate? By how much?
Only 34% of women were accepted, while 44% of men were accepted
Segmenting by departments to isolate the bias, they found that all departments accept a higher percentage of women applicants than men applicants. (If anything, there is a slight bias in favor of women!)
There is no conflict in the above statements. It’s possible and it happened
Bickel, P. J., Hammel, E. A., and O'Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley. Science, 187, 398-404.
(For those not familiar with baseball, batting average is percent of hits.)
One player can hit for a higher batting average than another player during the first half of the year
Do so again during the second half
But have a lower batting average for the entire year
Key to the “paradox” is that the segmenting variable (e.g., half year) interacts with “success” and with the counts. E.g., “A” was sick and rarely played in the 1st half, then “B” was sick in the 2nd half, but the 1st half was “easier” overall.
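One way to see why the counts matter is to write the yearly average as a weighted average of the two half-year averages. The symbols below are notation introduced here for illustration: h_i is a player's hits and n_i his at-bats in half i.

```latex
\[
\text{avg}_{\text{year}}
  = \frac{h_1 + h_2}{n_1 + n_2}
  = \frac{n_1}{n_1 + n_2}\cdot\frac{h_1}{n_1}
  + \frac{n_2}{n_1 + n_2}\cdot\frac{h_2}{n_2}
\]
```

Each player has his own weights n_i / (n_1 + n_2). A player who rarely batted in the easier half puts most of his weight on the harder half, so he can lead in both halves and still trail for the year.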
Make sure time-series data exists for the whole period. It is very easy to conclude that this week was bad relative to last week because some data is missing (e.g., collection bug)
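A completeness check before any week-over-week comparison can be as simple as the sketch below. The daily counts are hypothetical, and treating a zero count as suspicious is an assumption that suits a busy site but not every metric.

```python
from datetime import date, timedelta

def missing_days(daily_counts, start, end):
    """Return the dates in [start, end] that have no record or a zero count."""
    days = (end - start).days + 1
    expected = [start + timedelta(days=i) for i in range(days)]
    return [d for d in expected if daily_counts.get(d, 0) == 0]

# Hypothetical page-view counts keyed by day; 2006-06-07 was never collected.
page_views = {
    date(2006, 6, 5): 10_412,
    date(2006, 6, 6): 9_870,
    date(2006, 6, 8): 10_095,
    date(2006, 6, 9): 0,
}

gaps = missing_days(page_views, date(2006, 6, 5), date(2006, 6, 9))
print(gaps)
```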
Synchronize clocks from all data collection points. In one example, some servers were set to GMT and others to EST, leading to strange anomalies. Even being a few minutes off can cause add-to-carts to appear “prior” to the search
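The GMT/EST mix-up can be illustrated with a small sketch that converts every timestamp to UTC before ordering events. The server names and events are hypothetical, and the fixed offsets are a simplification. This handles only zone differences; clocks that drift by a few minutes still need to be synchronized at the source.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for the two server settings mentioned above (server names are made up).
SERVER_TZ = {
    "web-1": timezone.utc,                   # server clock set to GMT
    "web-2": timezone(timedelta(hours=-5)),  # server clock set to EST
}

def to_utc(server, local_timestamp):
    """Attach the server's zone to a naive timestamp and convert it to UTC."""
    return local_timestamp.replace(tzinfo=SERVER_TZ[server]).astimezone(timezone.utc)

# Hypothetical events: the search really happened one minute before the add-to-cart.
events = [
    ("web-1", "search", datetime(2006, 6, 13, 17, 0)),
    ("web-2", "add_to_cart", datetime(2006, 6, 13, 12, 1)),
]

# Sorting on raw local timestamps puts the add-to-cart first.
naive_order = [name for _, name, ts in sorted(events, key=lambda e: e[2])]
# Sorting on UTC restores the true order.
utc_order = [name for _, name, _ in sorted(events, key=lambda e: to_utc(e[0], e[2]))]

print("naive:", naive_order)
print("utc:  ", utc_order)
```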
Picking the right visualization is key to seeing patterns
On the left is traffic by day – note the weekends (but hard to see patterns)
On the right is a heatmap, showing traffic colored from green to yellow to red, utilizing the cyclical nature of the week (going up in columns). It’s easy to see the weekend, Labor Day on Sept 3, and the effect of Sept 11
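The rearrangement behind such a heatmap is simple: fold the daily series so that each column is one week and each row is one day of the week. The sketch below assumes that layout and uses made-up traffic numbers; text characters stand in for the green-to-red color scale.

```python
from datetime import date, timedelta

def week_grid(daily, start, weeks):
    """Arrange a daily series into rows = day of week (Mon..Sun), columns = weeks."""
    monday = start - timedelta(days=start.weekday())  # back up to that week's Monday
    grid = [[None] * weeks for _ in range(7)]
    for w in range(weeks):
        for d in range(7):
            grid[d][w] = daily.get(monday + timedelta(days=7 * w + d))
    return grid

def shade(value, low, high):
    """Map a value to a coarse text shade; a plotting library would use colors."""
    if value is None:
        return " "
    levels = ".:*#"
    idx = int((value - low) / (high - low) * (len(levels) - 1)) if high > low else 0
    return levels[idx]

# Hypothetical traffic: 100 visits on weekdays, 60 on weekends, for three weeks.
start = date(2006, 6, 5)  # a Monday
daily = {}
for i in range(21):
    day = start + timedelta(days=i)
    daily[day] = 60 if day.weekday() >= 5 else 100

grid = week_grid(daily, start, weeks=3)
for name, row in zip("Mon Tue Wed Thu Fri Sat Sun".split(), grid):
    print(name, "".join(shade(v, 60, 100) for v in row))
```

With the week folded this way, the weekend shows up as two visibly lighter rows, and a holiday or an unusual day stands out as a single odd cell.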
Analyzing and measuring long-term impact of changes
Control/Treatment experiments give us short-term value. How do we address long-term impact of changes?
For non-commerce sites, how do we measure user satisfaction? Example: users hit F1 for help in Microsoft Office and execute a series of queries, browsing through documents. How do we measure satisfaction other than through surveys?