
Embedded Automatic Model Training and Forecasting in an Enterprise Software Application



How can the process of Knowledge Discovery in Databases be automated, competitive and reliable? One approach is to focus on a narrow vertical market application, with known data sources and data feeds. Then you can automate the Exploratory Data Analysis (EDA) and Preprocessing phases. But how do you automate the selection of training data? Can the enterprise application be installed and configured at a variety of clients without a Senior Knowledge Discovery Engineer? How can you minimize "worst case" results of such a system when used by a business user going through their normal business role? How can you deeply investigate and model "business values" (i.e. things that can get an end user promoted or fired) into the core of the data mining algorithms?

This talk will answer these questions and more. The patent-pending application, ELF, is an enterprise application in the retail supply chain vertical market. Before the development of this system, one enterprise application was used to lay out a weekly newspaper flier three weeks before the sales event, which in turn fed data into a replenishment application. The replenishment application kept products on the store shelves, with a minimal amount of over stock and under stock. The pain point was that the retail buyer had to manually estimate the sales lift, or the multiplier increase in sales, for every item in every store. While human expertise can be great, it does not scale to a sales event with 1,000 - 4,000 items on sale in 6,000 stores. ELF (Event Lift Forecasting) imports data from a planned event and automatically analyzes and forecasts the lift for each store-item combination. Data elements used include pricing, placement in the flier, store geography and demographics, seasonality, and product hierarchy.

The resulting ELF system produced an 8-30% reduction in over and under stock costs, which is very significant given the low profit margins in the supply chain industry.

About the Speaker
Greg Makowski is a Principal Consultant of Golden Data Mining, in Los Altos, California. Since 1992, he has deployed over 70 data mining models for clients in targeted marketing, financial services, supply chain, e-commerce, and Internet advertising in North America, South America and Europe. He has applied a variety of data mining algorithms during these engagements and has experience using SQL, SAS, Java, and areas of Cloud Computing. Greg has eight years of experience in Product Management and over six years of experience working with startups.

Published in: Technology, Business


  1. 1. Embedded Automatic Model Training and Forecasting in an Enterprise Software Application (… or how to embed a data mining consultant in a box). Presented to the SF Bay ACM Data Mining SIG, March 11, 2009, by Greg Makowski, Principal Consultant, Golden Data Mining
  2. 2. Outline. Challenge: How to automate not only forecasting, but model training? Solution: Focus on a vertical market application; deeply investigate the business & technical issues. Result: An enterprise application; up to a 30% reduction in $ lost to over and under stock.
  3. 3. Challenge: Business Pain Point. JDA Software (who owns the IP) has dozens of enterprise retail supply chain applications. The Replenishment software does a very good job keeping store shelves stocked at the right level when sales are steady; it moves product from warehouse to DC to store. Sales are NOT STEADY during sales events! PAIN POINT: The event planner has to estimate the lift in sales for every store-item combination: (6k stores) * (1k to 4k items) = up to 24 million store-item lift estimates.
  4. 4. Retail (context) Challenge: 16 Page Newspaper Insert Can vary by region or ZIP
  5. 5. Event Lift Forecasting (ELF). Lift is a multiplier for the increase in sales over normal: “Prod X in Store Y will sell 6.8 times more than normal”. Normal sales are measured around the event, for the same: time period (i.e. Thr – Sun), a week before and after (non-overlapping); store and product (SKU is a key for product).
  6. 6. Retail Challenge: Appropriate for Business User. A retail event planner: has revenue goals and a “budget” of discount $; has to get through a lot of detail quickly; does not typically create mathematical forecasts; uses an enterprise application to lay out the event flyer about 3 weeks in advance; decides for the event: departments / items / pricing / photos / language; uses the software to specify SKUs and images and to lay out the flyer.
  7. 7. Product Mgmt / Software Arch: Challenge: How to Productize (Agile)? This is not a one-off consulting project, but software. Software engineering needs (to get in the ballpark): the right starting position, metrics, use cases, data flow. Support a good Agile development process. Goals: at least 90% software and 10% configuration, not repeated consulting projects; control the Total Cost of Ownership for the product; RELIABLE when used by the business user, working at the level of detail that the user cares about.
  8. 8. Product Mgmt Challenge: Details we Have vs. Need to Start
  9. 9. Outline. Challenge: How to automate not only forecasting, but model training? Solution: Focus on a vertical market application; deeply investigate the business & technical issues. Result: An enterprise application; up to a 30% reduction in $ lost to over and under stock.
  10. 10. Product Mgmt / Data Mining: Path to Solution. Customer led, product driven: design general. Can’t data mine without data: start the data request process with several clients. Jumpstart efforts with Monte Carlo: combine Census fields with noise to create a target. The models and forecast matter less; the process matters MORE. Ask for business interviews: understand users, metrics, past challenges. What is the BATNA? Best Alternative To A New Alternative (system)?
  11. 11. Data Mining: Data Sources. Event attributes (for the event planned in 3 weeks & past events): pricing, placement (page #, position on a page); products, departments, layout; store features, demographics of the population in the area. Past events: flyers may have 1, 8, 12, 16, 20, 64 pages; the same week last year may have a different product mix. Calculate Lift for all store-items for all past events: normal sales (not during an event) near in time; event sales; Lift = (event sales) / (non-event sales).
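The lift formula above can be written out as a short sketch. The slides say normal sales come from the same weekday window a week before and a week after the event; averaging those two windows is an assumption made here for illustration, and the function and argument names are hypothetical.

```python
def compute_lift(event_units, before_units, after_units):
    """Lift = (event sales) / (non-event sales) for one store-item.

    before_units / after_units: units sold in the same-length, non-overlapping
    window one week before and one week after the event. Averaging the two
    windows is an assumption of this sketch.
    """
    baseline = (before_units + after_units) / 2.0
    if baseline <= 0:
        return None  # no normal sales to compare against
    return event_units / baseline

# A store-item that sold 68 units during the event, 9 and 11 units in the
# surrounding normal windows.
lift = compute_lift(event_units=68, before_units=9, after_units=11)
print(lift)
```

Store-items with no normal sales return `None` rather than an infinite lift, so they can be handled separately downstream.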
  12. 12. Data Mining: Iterative KDD Process. Knowledge Discovery in Databases (KDD): 1. Select Data for Analysis (from prior event app); 2. Exploratory Data Analysis (EDA); 3. Preprocessing (manipulating fields); 4. Model Building (training DM algorithms); 5. Model Evaluation (apply to hold-out data); 6. Post-process score to business value; 7. Feed the next application (Lift / store-item).
  13. 13. Data Mining / Product Mgmt: Easiest to Automate From the Core. Go through the full process, automating: model building / evaluation; EDA & preprocessing; selecting past marketing campaigns.
  14. 14. Data Mining: Hypothesis to Select Past Campaigns: 1) Most Similar Past Events. Attention: your expertise will be quizzed! Hypothesis: a close fit to the new event is better. Compare high-level event attributes: number of pages of the flyer; discount (average, max); “primary” departments, sub-department, category, sub-category; … and so on. Use a “fuzzy” Euclidean distance to match past events to the event planned in 3 weeks. Select the 1-10 most similar events in the last year.
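A minimal sketch of this first hypothesis follows. The slides do not define what makes the distance “fuzzy”, so this example assumes a scaled Euclidean distance over numeric event attributes plus a fixed penalty when the primary department differs; the attribute names, scales, and penalty are hypothetical.

```python
import math

def event_distance(planned, past, scales, dept_penalty=1.0):
    """Scaled Euclidean distance between two events (an assumed form of 'fuzzy')."""
    total = 0.0
    for name, scale in scales.items():
        diff = (planned[name] - past[name]) / scale  # scale makes units comparable
        total += diff * diff
    if planned["primary_dept"] != past["primary_dept"]:
        total += dept_penalty ** 2  # categorical mismatch counts as one unit away
    return math.sqrt(total)

def most_similar_events(planned, past_events, scales, k=10):
    """Return the k past events closest to the planned event."""
    ranked = sorted(past_events, key=lambda e: event_distance(planned, e, scales))
    return ranked[:k]

planned = {"pages": 16, "avg_discount": 0.25, "primary_dept": "grocery"}
past_events = [
    {"id": "A", "pages": 16, "avg_discount": 0.20, "primary_dept": "grocery"},
    {"id": "B", "pages": 64, "avg_discount": 0.25, "primary_dept": "grocery"},
    {"id": "C", "pages": 12, "avg_discount": 0.30, "primary_dept": "toys"},
]
scales = {"pages": 8.0, "avg_discount": 0.10}

closest = most_similar_events(planned, past_events, scales, k=2)
print([e["id"] for e in closest])
```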
  15. 15. Data Mining: Hypothesis to Select Past Campaigns: 2) Select Broadly. Hypothesis: more training records provide a wider variety of behavior and better generalization. Exclude past marketing events that are quite different (but be broadly inclusive): if the planned event is 10-18 pages, exclude 1-2 and 64 page events. Audience Quiz: VOTE for what you expect: 1) Close fit, or 2) Broad fit?
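The broad-selection rule can be sketched as a simple filter. The slide gives only the example (a 10-18 page planned event excludes 1-2 page and 64 page events); expressing it as a maximum page-count ratio of 3 is an assumption chosen here because it reproduces that example, and the names are hypothetical.

```python
def select_broadly(planned_pages, past_events, max_ratio=3.0):
    """Keep every past event whose page count is within max_ratio of the plan.

    max_ratio=3.0 is an illustrative threshold, not a value from the talk.
    """
    kept = []
    for event in past_events:
        pages = event["pages"]
        ratio = max(pages, planned_pages) / min(pages, planned_pages)
        if ratio <= max_ratio:
            kept.append(event)
    return kept

# Page counts mentioned in the slides.
past_events = [{"pages": p} for p in (1, 2, 8, 12, 16, 20, 64)]
training_events = select_broadly(16, past_events)
print([e["pages"] for e in training_events])
```

Unlike a nearest-neighbor pick, this keeps everything that is not clearly out of range, which is the "broadly inclusive" behavior the hypothesis calls for.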
  16. 16. Data Mining: Select Past Campaigns: Results & Why. Answer from testing: BROADLY selecting past marketing events to train for the planned event works much better. Why: breadth gives robust generalization. The same sale last year was different in many ways: broad variety of price points per item or department; variety of items on the cover; variation over geography.
  17. 17. Data Mining: Exploratory Data Analysis (EDA). Front cover items had a lift 5.1 times higher than the average elsewhere! Lift was as high as 130, in an after-Halloween candy sale. The top 5% of the records had 90% of the lift (over all store-item combinations).
  18. 18. Data Mining / Retail: Exploratory Data Analysis (EDA): The Cash Flow is Very Concentrated. [Figure: two panels, “Range of Lift Values (Omitting the Largest)” and “Range of Lift Values (The Top 5% Provides 88% of the Lift)”; x-axis: bins of an equal number of records; y-axis: Lift (Target); series: Lift, Baseline.] Test weight and target variations, lift and lift_log.
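The concentration statistic quoted in the EDA slides (the top 5% of records providing roughly 88-90% of the lift) can be checked with a few lines. This sketch assumes the statistic is the share of summed lift held by the highest-lift records; the slides do not spell out the exact definition, and the sample data is made up.

```python
def top_share(lifts, top_fraction=0.05):
    """Share of the total lift contributed by the top fraction of records."""
    ordered = sorted(lifts, reverse=True)
    n_top = max(1, int(round(len(ordered) * top_fraction)))
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:n_top]) / total

# Made-up data: one extreme store-item among twenty.
sample_lifts = [100.0] + [1.0] * 19
share = top_share(sample_lifts)
print(round(share, 3))
```

A distribution this skewed is one reason to test both `lift` and `lift_log` as targets, as the slide suggests.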
  19. 19. Data Mining++: Preprocessing - Categorical. Average past Lift per category: percent-off bin (i.e. 0%, 5%, 10%, 15% … 80%); price savings bin (i.e. $2, $4, $6 …); store hierarchy; product hierarchy (50k to 100k SKUs, 4-6 levels): Department, Sub-department, Category, Sub-category; seasonality, time, month, week; reason codes (the event is a circular, clearance); location on the page in the flyer (top right, top left..). Multivariate combinations are powerful & scalable: (price bin) + (page loc bin) + (sub-cat).
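The categorical preprocessing above amounts to binning and then averaging past lift per category or per combination of categories. Here is a minimal sketch; the record layout and field names are hypothetical, and the 5-point bin width follows the example bins on the slide.

```python
from collections import defaultdict

def percent_off_bin(regular_price, sale_price, step=5):
    """Round the discount down to a bin edge: 0, 5, 10, ... percent off."""
    pct = 100.0 * (regular_price - sale_price) / regular_price
    return int(pct // step) * step

def average_lift_by_key(records, key_fields):
    """Average past lift for each combination of the given categorical fields."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        sums[key] += rec["lift"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical past store-item records.
past = [
    {"pct_bin": percent_off_bin(10.0, 8.0), "page_loc": "top_right", "sub_cat": "candy", "lift": 4.0},
    {"pct_bin": percent_off_bin(5.0, 4.0), "page_loc": "top_right", "sub_cat": "candy", "lift": 6.0},
    {"pct_bin": percent_off_bin(4.0, 3.5), "page_loc": "top_left", "sub_cat": "soap", "lift": 1.5},
]

# Multivariate combination: (price bin) + (page loc bin) + (sub-cat).
combo_lift = average_lift_by_key(past, ["pct_bin", "page_loc", "sub_cat"])
print(combo_lift)
```

The resulting averages can be joined back onto planned store-items as numeric inputs, which keeps high-cardinality hierarchies such as 50k to 100k SKUs manageable.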
  20. 20. Data Mining++: Preprocessing – Interactions
  21. 21. Data Mining: Design Of Experiments (DOE). Model Notebook (pictured in next slide): one row per model trained; input columns: data version, model parameters; output columns: training time, results in-sample, out of sample, gap (bigger is worse), and gap-penalized results. Sections per data mining algorithm, i.e. Stepwise Regression, Naïve Bayes, Cubist (tree w/ regression in leaves), Neural Net, TreeNet (from Salford Systems).
  22. 22. Data Mining++: Instead of Occam’s Razor: Model Notebook Tracks DOE. Generalization Error = abs( in sample res – out of sample res ). Conservative Result = worst( in, out samp ) + Generalization Err. Model results are Mean Abs Err (lower is good).
      N in ser | Eng  | Analysis engine settings / comment                        | In Samp | Out of Samp | Gen: In-Out | Out + Gen
      1        | regr | Try target: LIFT_LOG; 58 vars selected                     | 1.184   | 1.264       | 0.08        | 1.34
      2        | regr | Try target: LIFT_LOG; limit to 15 vars                     | 1.21    | 1.289       | 0.08        | 1.37
      3        | regr | Try target: LIFT; 65 vars selected                         | 1.732   | 2.654       | 0.92        | 3.58
      4        | regr | Try target: LIFT; limit to 15 vars                         | 1.714   | 1.837       | 0.12        | 1.96
      5        | regr | Start with unv4_trn, and set larger wgt's for larger lift values: wgt_2=1; IF(2<lift) wgt_2 = 2; IF(5<lift) wgt_2 = 3; 60 vars selected | 1.20 | 1.42 | 0.22 | 1.63
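The two notebook formulas on this slide can be written directly as code. In this sketch "worst" is taken to mean the larger value, which is an interpretation that fits an error metric such as mean absolute error; the function names are hypothetical. The weighting rule is transcribed from row 5 of the notebook.

```python
def generalization_error(in_sample, out_sample):
    """Gap between in-sample and out-of-sample results (bigger is worse)."""
    return abs(in_sample - out_sample)

def conservative_result(in_sample, out_sample):
    """worst(in, out) + generalization error, for a lower-is-better metric."""
    worst = max(in_sample, out_sample)
    return worst + generalization_error(in_sample, out_sample)

def training_weight(lift):
    """Row 5 weighting: wgt_2=1; IF(2<lift) wgt_2=2; IF(5<lift) wgt_2=3."""
    if lift > 5:
        return 3
    if lift > 2:
        return 2
    return 1

# Rows 1, 3 and 4 of the notebook: (in-sample MAE, out-of-sample MAE).
notebook_rows = {1: (1.184, 1.264), 3: (1.732, 2.654), 4: (1.714, 1.837)}
for row, (mae_in, mae_out) in notebook_rows.items():
    print(row, round(generalization_error(mae_in, mae_out), 2),
          round(conservative_result(mae_in, mae_out), 2))
```

Ranking by the conservative result penalizes row 3, whose in-sample error looks acceptable but whose gap signals poor generalization.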
  23. 23. Data Mining++: Data Mining Algorithm Improvements. Cubist: Ross Quinlan uses a “greedy algorithm” to select regression fields for each leaf. Tested and changed to “stepwise regression” for each leaf. [Diagram: tree with Split 1, Split 2, Split 3 and Leaf 1, Leaf 2, Leaf 3, Leaf 4.]
  24. 24. Data Mining / Retail: Training Priority – a Complex Surface. [Figure: 3-D surface; vertical axis from $0 to $180,000, labeled Event Lift * Non-Event Cash Flow * Num Store-Items; horizontal axes are bins of Lift (up to 4.1) and of Non-Event Cash Flow (up to $7,647).] Cash Flow = Non-Event Units/day * Price.
  25. 25. Data Mining / Retail: Model Notebook: Example of Describing Models. Top 1/6 of most expensive items, $5.30+ |||||||||||||||||||||||||||||||||| Past lift by store, sub-dept, dept, front page ||||||||||| Average daily sales per item over prior events |||||||||| Average price ||| Item is located on the front page of the flyer | Number of Saturdays & Sundays in the event; Item comes from the Health and Beauty dept; Item in the Stationery department; Avg # items sold / day
  26. 26. Data Mining / Retail: Calculate $ of “Business Pain”. [Figure: Business Pain versus forecast error, with zero error at the center, Over Stock on one side and Under Stock on the other.]
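  27. 27. Data Mining / Retail: Calculate $ of “Business Pain”. [Figure: Business Pain versus forecast error, with zero error at the center, Over Stock on one side and Under Stock on the other; labeled pain levels: 15% business pain $ and 1% business pain $.] Equal mistakes, unequal PAIN in $.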
  28. 28. Data Mining++ / Retail: Calculate $ of “Business Pain”. No way – that could get you fired! New progress in getting feedback. [Figure: Business Pain versus forecast error, with zero error at the center, Over Stock on one side and Under Stock on the other; labeled pain levels: 30% business pain $, 15% business pain $ and 1% business pain $; annotations: 4 week supply of SKU, 30% off sale.] Equal mistakes, unequal PAIN in $.
  29. 29. Data Mining: Best Models by Lift Correlation <> Best by $. The order of “best” models ranked by the best technical metrics (correlation, MAD) vs. the business pain metric didn’t match. A HUGE mismatch! Change the error function of the data mining algorithms to “$ over stock and under stock”.
  30. 30. Data Mining++: Change Data Mining Algorithm Error Function. The error function depends on knowing the threshold per SKU: “4 weeks of normal sales volume for the SKU”. Neural Net (proprietary, from missile targeting): after an epoch, i.e. a forward pass of 1000 records, calculate this error to minimize. Stepwise Regression & Cubist Leaf Regression: change the optimization problem from an RMSE of the target to an RMSE of this error function & target.
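One way to picture an asymmetric "business pain" error function with a per-SKU threshold is sketched below. This is a hypothetical reconstruction, not the function used in ELF: the slides show pain levels of 1%, 15% and 30% and a 4-week-supply threshold, but which rate belongs to which side is not recoverable from the transcript. The assignment here (15% for under stock, 1% for mild over stock, 30% beyond a 4-week supply) is an assumption, as are all names.

```python
import math

def business_pain(forecast_units, actual_units, price, weekly_normal_units,
                  under_rate=0.15, over_rate=0.01, clearance_rate=0.30,
                  supply_weeks=4):
    """Dollar pain of one store-item forecast error (all rates are assumptions)."""
    error_units = forecast_units - actual_units
    if error_units < 0:
        # Under stock: units that could have sold but were not on the shelf.
        return under_rate * price * (-error_units)
    # Over stock: mild up to the per-SKU threshold, severe beyond it.
    threshold = supply_weeks * weekly_normal_units
    mild_units = min(error_units, threshold)
    severe_units = max(0.0, error_units - threshold)
    return price * (over_rate * mild_units + clearance_rate * severe_units)

def rms_pain(records):
    """Root mean square of business pain, to minimize in place of target RMSE."""
    squares = [business_pain(**rec) ** 2 for rec in records]
    return math.sqrt(sum(squares) / len(squares))

# Equal-size mistakes (20 units either way) produce unequal pain in $.
under = business_pain(100, 120, price=2.0, weekly_normal_units=10)
over = business_pain(120, 100, price=2.0, weekly_normal_units=10)
print(under, over)
```

Whatever the true rates are, the structure illustrates the slide's point: a model ranked best by correlation or MAD can rank poorly once errors are priced this way.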
  31. 31. Product Mgmt / Retail: Worry About Response Time
  32. 32. Product Mgmt / Data Mining: User Interface: 5 Levels of Complexity. It needs to be reliable for the simplest step. Source data fields: use what is available & populated; ensure the minimum data enables a reliable system; use metadata to select fields (i.e. exclude low correlation, empty). Level 1: train 6 models each for 3 fast engines, or with fast settings (i.e. more shallow trees) (~30 seconds). Later levels: add a more extensive search per engine of model parameters, more models in the DOE, use slower engines, stay time sensitive (~30 minutes to 2 hours).
  33. 33. Product Mgmt / Data Mining: How is ELF Software and Not Consulting? Software install and configuration process: connect to Event Planning, connect to Replenishment. Use metadata tags on custom fields: not dependent on field names; semantic (i.e. spending) and analytic tags (categorical, source). Preprocessing executes if supporting data is available. The installer validates by using ELF to create test models. End users create production models.
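The tag-driven configuration described here can be sketched as follows: fields are found by their metadata tags rather than by name, and a preprocessing step runs only when every tag it needs is backed by populated data. The tag vocabulary, field names, and step names are hypothetical; the slides describe the idea but not ELF's actual schema.

```python
def populated_fields(rows):
    """Names of fields that have at least one non-empty value."""
    return {name for row in rows for name, value in row.items()
            if value not in (None, "")}

def runnable_steps(steps, field_tags, rows):
    """Preprocessing steps whose required semantic tags are all available."""
    present = populated_fields(rows)
    available = {tags["semantic"] for name, tags in field_tags.items()
                 if name in present}
    return [step for step, needs in steps.items() if needs <= available]

# One client's custom field names, mapped to tags at install time.
field_tags = {
    "CUST_PRC_01": {"semantic": "price", "analytic": "source"},
    "PG_POS": {"semantic": "page_location", "analytic": "categorical"},
    "HH_INCOME": {"semantic": "demographics", "analytic": "source"},
}
rows = [
    {"CUST_PRC_01": 3.99, "PG_POS": "top_right", "HH_INCOME": None},
    {"CUST_PRC_01": 1.49, "PG_POS": "top_left", "HH_INCOME": ""},
]
steps = {
    "price_bins": {"price"},
    "price_by_page_location": {"price", "page_location"},
    "demographic_segments": {"demographics"},
}

print(runnable_steps(steps, field_tags, rows))
```

Because the demographics field is empty for this client, that step is skipped instead of failing, which matches the "support, but not require, fields" approach in the talk's summary.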
  34. 34. Outline. Challenge: How to automate not only forecasting, but model training? Solution: Focus on a vertical market application; deeply investigate the business & technical issues. Result: An enterprise application; up to a 30% reduction in $ lost to over and under stock.
  35. 35. Retail / Data Mining: Result: Reduction in Business Pain. 8 to 30% Reduction in Business Pain $. [Table: ELF, Model 117; columns cover ELF and $ figures for over stock, high over stock, and under stock. Extracted values: 181, 87, $87; 190, 31, $31; 183, 46, $46; 115, 77, $233; 179, 105, $105; 191, 109, $109; 252, 101, $101; 176, 40, $40; 122, 37, $111; 169, 6, $6; 183, 122, $122; 119, 37, $112; 287, 130, $477; 412, 141, $281.]
  36. 36. Product Mgmt / Software Dev: Result: Start Agile Process After… Product Requirements Document (PRD); Technical Specifications: data flow diagrams, use cases, business metric; working prototype, support for testing. Go through Agile & Scrum efforts with the software engineering group. Review, revise, evaluate vs. business metrics.
  37. 37. Product Mgmt / Data Mining: Result: Patent Application Process. Provisional Patent. Re-write with the help of a patent attorney, very formal. The application will not be published for 18 months. “Ordinary Skill in the Art”, written by… Jeffrey D Ullman, Stanford Computer Science h //i f l b f d d / ll / b/f 00 h l. The idea must be “novel,” “non obvious” & useful. Novel: does not appear in previous literature. Non obvious: would not be discovered by one of “ordinary skill in the art” when the idea is needed. How obvious is “obvious?” To how many of 100?
  38. 38. Data Mining: To What Other Verticals Could This Apply? It can apply where past examples, in volume, relate to future examples. Marketing / Advertising (media independent): finding new customers, clickers, buyers, spending; cross sell, up sell; customer attrition (most likely to cancel). Mortgage Bond pricing (help US out of this mess): rating mortgages inside, forecasting prepayment & default rates. Many other verticals.
  39. 39. Summary. How to automate? From the center out (i.e. onion); narrow vertical application, known data source & feeds. How to select training data? Broadly. Best improvement? Optimize by what gets people promoted or fired; change the DM algorithm to optimize the business metric. How to make robust? Support, but not require, fields; heavy Research and Prototyping (R&P) before starting Agile. How to succeed in business software? Support end users at the level of complexity they want; help them succeed consistently and reliably.
  40. 40. Questions & Answers? (408)781-6808 cell. This PPT will be posted on SF Bay ACM and LinkedIn, below (Video company). Future talks for ACM and ACM DM SIG: http://www.sfbayacm.org/dmsig.php Other talks: http://www.meetup.com/Bay-Area-Collective-Intelligence/ (business intelligence & other sigs)