Data mining with Clementine for smarter retailing



  1. 1. White paper – executive briefing Data mining with Clementine for smarter retailing
  2. 2. Introduction

     Retailing is a highly competitive business, subject to many changes and cycles in customer buying preferences. Since the introduction of point-of-sale terminals and universal bar coding, retailers have accumulated a wealth of sales data. More recently, the use of credit cards, and especially of store cards (e.g., “frequent shopper” or retail credit cards), has made it possible to relate purchases to particular customers. This wealth of data provides the opportunity for competitive advantage.

     Online retailing has provided new opportunities for retailers as well. In addition to collecting information on what customers buy, retailers can collect information on what customers look at during the shopping process. This provides another level of detail for understanding what customers want and meeting these needs more efficiently: customers are happier because they find what they need, while retailers maximize profitability.

     The Web also offers new possibilities for merchandising. In brick-and-mortar operations, placing items next to each other to increase cross-selling usually isn’t physically possible: changing shelf placement upsets other product affinities, and store space is limited and inflexible. With the Web, a virtual store offers endless possibilities for tailoring “shelf space” for each shopper.

     Data mining is the key to taking advantage of these new opportunities: it enables organizations to understand and predict customer behavior. This paper highlights data mining myths and truths, offers common retailing applications and explores market basket analysis in detail.

     Data mining myths and truths

     • Myth: data mining is a new way of doing business

     Data mining is a natural business process: Data mining isn’t about expensive hardware and even more expensive consulting. In actuality, data mining is straightforward and usually provides quick and substantial payback. Put simply, data mining means finding patterns in your data that you can use to run your business better. At one level, data mining is merely automation of the oldest and most fundamental of business processes: analyzing what you did in the past and interpreting the results in order to learn how to do better in the future. Yet at another level, this same process can revolutionize retailing by enabling a one-to-one relationship with each and every customer. Whenever individual customers come into the store, we know exactly which products we most want to sell to them in order to build loyalty and profitability.

     • Myth: the more data you have, the better your results

     Business knowledge is more important than massive data: The key to success in data mining is the quality of the business input. Good data mining is directed by business goals. For example, undirected segmentation of customers is likely to produce illusory or transient segments. One study showed that when a supermarket repeated the same clustering exercise one year later, more than 50 percent of customers had changed segments. Similarly, simple associations, such as the well-known “diapers and beer” pattern, even if real, have negligible business value.
  3. 3. Before taking any action, the retailer must check all the other associations with diapers or beer, since any action to leverage this unusual association runs the danger of upsetting many other valuable product associations.

     Data needs to be analyzed with a business goal in mind. Classifying customers into “high-margin,” “average-margin” and “low-margin” segments can provide real business value. Data mining can then be used to find the patterns or “signatures” underlying the high-margin customers, who can then be motivated through appropriate, targeted promotions. It can also identify the average- or low-margin customers with the most potential to be developed into high-margin accounts.

     • Myth: data mining is throwing algorithms at data

     Business knowledge will suggest how best to treat data: Usually the raw data is not in itself the most fruitful data to mine. We may want to look at purchases per week, frequency, recency, or store-specific or category-specific ratios such as average sales per square foot of shelf space or sales uplift following a promotion. The crucial importance of these derived attributes is well known to the business-savvy executive but not to a data-oriented analyst. So when you are data mining, the quality of the business-knowledge input to the project is crucial. To be effective, data mining tools must not distance the business expert from the data: good results depend on making this business input easy to provide and to understand.

     Figure 1. After minimal training, a business user treated data by deriving higher-value fields, represented by these “nodes” in a data mining process. These new fields provided the boost required for better store-siting models.

     In summary, data mining algorithms are very smart about finding patterns in data. But what we really want is patterns in business behavior. It is the quality of the business input that turns one into the other. The Clementine data mining system is designed to invite the business insight that leads to better data analysis.
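The derived attributes mentioned above (recency, frequency, purchases per week) can be illustrated with a minimal Python sketch. It is not how Clementine derives fields; the record layout, the customer IDs, the `derive_customer_fields` function and all values are hypothetical and exist only to show the idea of turning raw transactions into higher-value fields.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw point-of-sale records: (customer_id, purchase_date, amount).
transactions = [
    ("C1", date(2000, 1, 3), 42.50),
    ("C1", date(2000, 1, 17), 18.00),
    ("C1", date(2000, 2, 21), 63.25),
    ("C2", date(2000, 1, 10), 120.00),
    ("C2", date(2000, 1, 12), 15.75),
]


def derive_customer_fields(records, as_of):
    """Turn raw transactions into per-customer derived fields."""
    by_customer = defaultdict(list)
    for customer_id, day, amount in records:
        by_customer[customer_id].append((day, amount))

    derived = {}
    for customer_id, purchases in by_customer.items():
        days = [day for day, _ in purchases]
        total = sum(amount for _, amount in purchases)
        # Observation window in weeks, from first purchase to the as_of date
        # (floored at one week to avoid dividing by a tiny window).
        weeks_observed = max((as_of - min(days)).days / 7, 1.0)
        derived[customer_id] = {
            "recency_days": (as_of - max(days)).days,   # days since last purchase
            "frequency": len(purchases),                # number of purchases
            "purchases_per_week": len(purchases) / weeks_observed,
            "average_basket": total / len(purchases),   # mean spend per purchase
        }
    return derived


fields = derive_customer_fields(transactions, as_of=date(2000, 2, 28))
for customer_id, values in fields.items():
    print(customer_id, values)
```

Which derived fields are worth building is a business decision, which is the point the paper makes: the code is trivial once someone who knows the business has said what to compute.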
  4. 4. Data mining in retail

     Retailing offers many profitable applications for data mining, including customer relationship management (CRM) and e-CRM, store performance analysis, and purchasing and stock management, among others.

     CRM and e-CRM:
     • Customer profiling
     • Customer profitability analysis
     • Targeted marketing
     • Basket analysis
     • Opportunities for up-selling or cross-selling
     • Churn prediction

     These applications work by data mining basket data when the customer identity is known, usually as a result of e-store tracking or card membership information. Baskets can be analyzed for profitability, and successive baskets bought by the same customer can be aggregated to yield a measure of customer profitability. This data can then be mined to find the highest-profit customer profile. Often, high profit is associated with the purchase of certain categories or brands. Other customers can then be scanned to find those with the potential to move into the high-profit category. The relevant products can be offered to these customers via custom online content, a mailing or a coupon generated at the checkout.

     Data mining can also be used to examine the behavior of customers in the period leading up to an apparent defection, e.g., they’ve stopped coming into the store or abandoned an online shopping cart. If we then find other customers, and especially our profitable customers, exhibiting the same behavior, we can take urgent action to reinforce loyalty.

     Overall store performance:
     • Store (or department) revenue or profit forecasting
     • Store performance assessment
     • Store site assessment
     • Store closure program management

     Store performance analysis is performed by building models that predict sales (or profits) based on store, site and demographic inputs. These models can be used to analyze stores by identifying which ones perform above the level predicted by the model and which fall below. The model can also be used to assess the likely sales from a possible new site, to lessen the risk of expansion.

     Purchasing and stock management applications:
     • Category management
     • Promotions planning and analysis
     • Demand forecast for individual stock items or categories
     • Reliability/timeliness/quality of suppliers
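The store performance analysis described above (predict sales, then see which stores land above or below the prediction) can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the paper's method: it assumes a single input (floor area) and a straight-line least-squares fit, whereas the models described use store, site and demographic inputs. The store names, figures and the `fit_line` and `predict_new_site` helpers are hypothetical.

```python
# Hypothetical stores: name -> (floor area in thousands of square feet, annual sales).
stores = {
    "A": (10, 105.0),
    "B": (20, 190.0),
    "C": (30, 320.0),
    "D": (40, 385.0),
}


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(list(stores.values()))

# Residual = actual sales minus the sales the model predicts for that store.
residuals = {
    name: sales - (intercept + slope * area)
    for name, (area, sales) in stores.items()
}
above_model = sorted(name for name, r in residuals.items() if r > 0)
below_model = sorted(name for name, r in residuals.items() if r < 0)


def predict_new_site(area):
    """Predicted sales for a candidate site with the given floor area."""
    return intercept + slope * area


print("Above prediction:", above_model)
print("Below prediction:", below_model)
print("Candidate site, area 25:", round(predict_new_site(25), 1))
```

Stores with a positive residual are doing better than their inputs suggest, and stores with a negative residual are candidates for closer review. The same fitted model gives a first estimate for a site that does not exist yet.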
  5. 5. Data mining can analyze and model sales patterns and how they vary with store demographics, time of year, economic climate, weather, promotions and many other factors. If data is available, the uptake of new lines can be modeled with a view to optimizing the life-cycle profitability of a line. The resulting models can be used to make more accurate purchasing decisions, to tune promotion strategies and to align these with stock purchase decisions.

     Miscellaneous retail applications:
     • Human resources: optimize recruitment, manage staff development and reduce staff turnover
     • Credit risk assessment of customers and suppliers
     • Equipment reliability modeling and pre-emptive maintenance planning

     Performing market basket analysis to increase customer profitability

     Selling as many different products as possible to your customers maximizes their value to your business. One way to accomplish this goal is to understand which products or services customers tend to purchase at the same time, or later on as follow-up purchases. A common data mining approach for this type of problem is market basket analysis: the analysis of transactions, that is, of the items purchased together by a customer.

     Business managers or analysts can use a market basket analysis to plan:
     • e-store layout. Develop custom content, such as a list of other items that may be of interest to the customer
     • Couponing and discounting. Changing a coupon to feature a second product that a customer might buy can increase sales at no additional promotion expense
     • Product placement. Place products that have a strong purchasing relationship close together to take advantage of the natural correlation between the products. Alternatively, place such products far apart to increase traffic past other items
     • Timing and cross-marketing. For example, assume that analysis produced a rule that says people who have purchased pet food are three times more likely to purchase pet toys in the period one to three months after the pet food was purchased; a pet-toy promotion could then be timed to fall in that window

     Case study: market baskets in a grocery store

     The following is an example of determining product affinities (which products are most likely to be purchased together) for a grocery store to use in planning future discount promotions. The data sample was taken for a set of customers over a one-month period.

     In reality, stores sell specific items identified by universal product codes (UPCs): for example, Tropicana orange juice in a 12-ounce bottle, Minute Maid orange juice in a 6-ounce can, etc. While one might sometimes be interested in rules that deal in specific items, this example generates more general rules for a product hierarchy that groups similar items together: produce, deli, bakery, seafood, frozen, meat, etc. This hierarchy was itself identified through data mining techniques.
  6. 6. Figure 2, produced in the process of discovering affinities in the basket contents, illustrates associations between the products; the darker the line, the stronger the association.

     Figure 2. A diagram produced in determining product-purchasing relationships.

     From this display, three groups of customers stand out: 1) those who buy produce and seafood items, 2) those who buy frozen and miscellaneous items and 3) those who buy deli, meat and bakery items.

     Rule  Left-hand side      Right-hand side   Confidence (percent)  Lift ratio  Support (percent)
     1.    Deli            –>  Bakery            50.00                  8.33       2.00
     2.    Bakery          –>  Deli              33.33                  8.33       2.00
     3.    Deli            –>  Meat              40.00                 20.00       1.60
     4.    Meat            –>  Deli              80.00                 20.00       1.60
     5.    Meat            –>  Bakery             2.20                  0.37       0.04
     6.    Bakery          –>  Meat               0.73                  0.37       0.04
     7.    Deli & Meat     –>  Bakery             2.50                  0.42       0.04
     8.    Deli & Bakery   –>  Meat               2.00                  1.00       0.04
     9.    Meat & Bakery   –>  Deli              90.91                 22.73       0.04
     10.   Deli            –>  Meat & Bakery      1.00                 22.73       0.04
     11.   Meat            –>  Deli & Bakery      2.00                  1.00       0.04
     12.   Bakery          –>  Deli & Meat        0.67                  0.42       0.04

     Figure 3. Example of rules for the deli, bakery and meat products with their confidence, lift and support.

     The web diagram in Figure 2 suggests a strong association between meat, deli and bakery purchases. This group was analyzed further using an association rule detection technique. Figure 3 is an example of the rules produced, which give more detail on the relationships within this product grouping.
  7. 7. Each rule has a left-hand side (when people buy a product) and a right-hand side (they also buy a different product). A rule has three measures, called confidence, lift and support. Confidence measures how often the right-hand side is purchased when the left-hand side is purchased. When people buy deli product(s), they also buy bakery product(s) 50 percent of the time, which is the confidence for this rule. Lift is the ratio of the confidence of a rule to its expected confidence, that is, to how often the right-hand side is purchased across all transactions. Lift is one measure of the strength of an effect. Support measures how often the items occur together, as a percentage of the total transactions.

     The greatest amount of lift is found in the ninth and tenth rules, which both have a lift greater than 22. Looking at the ninth rule, the lift of 22.73 means that people who purchase meat and bakery items are almost 23 times more likely to also purchase deli items than shoppers in general. Also note the “negative lift” (a lift ratio of less than 1) in the fifth, sixth, seventh and last rules. The seventh and last rules both have a lift ratio of approximately 0.42. The negative lift on the seventh rule is interpreted to mean that people who buy deli and meat items are less likely to buy bakery items than one would ordinarily expect of a shopper. It is not coincidental, though it might seem somewhat counterintuitive, that the lift of a rule and of its inverse rule are the same: lift depends only on how often the two sides occur together and how often each occurs on its own, not on which side is named first.

     This type of analysis can help companies identify groups of products or services that customers have already demonstrated a tendency to acquire together or in subsequent purchases. The example above illustrates how this type of analysis fits into retail applications, but detecting and assessing relevant patterns can benefit any business that accumulates large volumes of transactions.
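As a rough illustration of the rule measures discussed above, the following Python sketch computes support, confidence and lift for a rule over a handful of baskets. It is not Clementine's association rule algorithm; the ten baskets and the `rule_measures` function are hypothetical. Lift is computed as a ratio (confidence divided by expected confidence), consistent with the lift ratio column in Figure 3.

```python
# Hypothetical baskets, each a set of product categories bought together.
baskets = [
    {"deli", "bakery"},
    {"deli", "meat"},
    {"bakery"},
    {"produce", "seafood"},
    {"deli", "meat", "bakery"},
    {"frozen"},
    {"bakery", "produce"},
    {"meat", "deli"},
    {"produce"},
    {"frozen", "bakery"},
]


def rule_measures(baskets, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs, as fractions."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(baskets)
    n_lhs = sum(1 for b in baskets if lhs <= b)           # baskets containing lhs
    n_rhs = sum(1 for b in baskets if rhs <= b)           # baskets containing rhs
    n_both = sum(1 for b in baskets if (lhs | rhs) <= b)  # baskets containing both
    if n_lhs == 0 or n_rhs == 0:
        raise ValueError("lhs and rhs must each occur in at least one basket")
    support = n_both / n                 # how often both sides occur together
    confidence = n_both / n_lhs          # how often rhs occurs given lhs
    expected_confidence = n_rhs / n      # how often rhs occurs overall
    lift = confidence / expected_confidence
    return support, confidence, lift


for lhs, rhs in [({"meat"}, {"deli"}), ({"deli"}, {"meat"})]:
    support, confidence, lift = rule_measures(baskets, lhs, rhs)
    print(sorted(lhs), "->", sorted(rhs),
          f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this toy data the rule and its inverse have different confidence but the same lift, which is the symmetry the paper points out. The same arithmetic can be read backward from Figure 3: for rule 1, a confidence of 50.00 percent with a lift ratio of 8.33 implies that bakery items appear in about 6 percent of all baskets.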
     Adding profitability to market basket analysis

     The analysis can be taken a step further by examining profitability. From the data, we know whether a basket is high value (high margin of profit), and we can tie this back to the customers who are buying the associated items.

     For this example, we are looking at a different dataset, which has a strong association between customers who buy beer, “beans” (canned goods) and “pizza” (frozen items). The row marked “T” (True = the group in question) shows that only 0.685 percent of the beer/beans/pizza customers have high-value baskets, while 55.497 percent, the majority, have low-value baskets.

     A rule induction algorithm can then be used to profile the beer/beans/pizza customers in terms of demographics.
  8. 8. The rule set highlights the profile of those who are likely to buy beer/beans/pizza. It has found one rule stating that males with an income of 16,000 units or less are 83.8 percent likely to buy this set of goods; in the dataset, this rule covers 165 cases. The profiles also tell us who is very unlikely to buy these goods together, notably higher earners and/or females.

     For completeness, a report of this profile (model) in terms of accuracy and performance is shown below. The model is correct over 96 percent of the time. The coincidence matrix shows that the model’s prediction for true coincides with actual observations 139 times and for false 828 times. Only a handful of cases are misclassified.
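The coincidence matrix mentioned above is a table of actual outcomes against predicted outcomes; accuracy is the share of cases on its diagonal. A minimal Python sketch follows. The labels are hypothetical and unrelated to the beer/beans/pizza dataset, and `coincidence_matrix` and `accuracy` are illustrative helper names, not Clementine functions.

```python
from collections import Counter

# Hypothetical actual outcomes and model predictions for eight customers.
actual    = [True, True, False, False, False, True, False, False]
predicted = [True, False, False, False, False, True, False, True]


def coincidence_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Share of cases where the prediction coincides with the actual outcome."""
    total = sum(matrix.values())
    correct = sum(count for (a, p), count in matrix.items() if a == p)
    return correct / total


matrix = coincidence_matrix(actual, predicted)
for (a, p), count in sorted(matrix.items()):
    print(f"actual={a!s:5} predicted={p!s:5} count={count}")
print("accuracy:", accuracy(matrix))
```

Applied to the report described above, the diagonal would hold the 139 and 828 agreeing cases. The off-diagonal counts are not given in the text, so the exact accuracy cannot be recomputed from it.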
  9. 9. Other retail applications

     Details of the work carried out for most retail clients are confidential. However, some details from retail applications can be shared.

     Halfords used Clementine to model the total sales from stores. The key to success was the ability of the business user, Chris Hawkins, Finance Manager of the Property Department, to model the relevant data and business ratios for each store himself. Halfords had previously bought three months of external consulting from an analytical group in a major consulting company, which failed to generate an accurate model. The model built by the user himself in three weeks using Clementine proved highly accurate and is used to assess investments in new stores.

     Somerfield Stores is a top-five UK supermarket chain that serves more than seven million customers each week. SPSS Inc., together with the Parallel Applications Center, is working to maximize the value in Somerfield’s data. Somerfield’s Jerry Warren stated, “We have uncovered a great many opportunities for data mining within our organization – one application will be to optimize consumer choice in our smaller stores.”

     Using Clementine on its customer database, a major US retailer uncovered buying patterns among its customers. The retailer was then able to create a segmented mail-order promotion that matched one of several different direct-mail advertisements with the buying patterns of a specific customer. The result was a sales increase of 30 percent.

     Other retail applications successfully implemented by Clementine customers have included basket analysis, human resources planning, promotions analysis and site planning.

     Conclusion

     New opportunities for improving customer relationships are emerging now, especially in the world of online retailing. Both online and traditional retailers need to discover and predict customer behavior to survive and excel in the new competitive landscape. Data mining provides the competitive difference in retailing by delivering the intelligence for creating more profitable customer relationships.
  10. 10. Data mining makes the difference

     SPSS Inc. enables organizations to develop more profitable customer relationships by providing analytical solutions that discover what customers want and predict what they will do. The company delivers analytical solutions at the intersection of customer relationship management and business intelligence. SPSS analytical solutions integrate and analyze market, customer and operational data and deliver results in key vertical markets worldwide, including telecommunications, health care, banking, finance, insurance, manufacturing, retail, consumer packaged goods, market research and the public sector. For more information, visit the SPSS Web site.

     Contacting SPSS

     To place an order or to get more information, call your nearest SPSS office or visit our World Wide Web site.

     SPSS Inc. +1.312.651.3000
       Toll-free +1.800.543.2185
     SPSS Argentina +5411.4814.5030
     SPSS Asia Pacific +65.245.9110
     SPSS Australasia +61.2.9954.5660
       Toll-free +1.800.024.836
     SPSS Belgium +32.16.317070
     SPSS Benelux +31.183.651.777
     SPSS Brasil +55.11.5505.3644
     SPSS Czech Republic +420.2.24813839
     SPSS Danmark +
     SPSS East Africa +254.2.577.262
     SPSS Federal Systems (U.S.) +1.703.527.6777
       Toll-free +1.800.860.5762
     SPSS Finland +358.9.4355.920
     SPSS France +
     SPSS Germany +49.89.4890740
     SPSS Hellas +
     SPSS Hispanoportuguesa +34.91.447.37.00
     SPSS Hong Kong +852.2.811.9662
     SPSS Ireland +353.1.415.0234
     SPSS Israel +972.9.9526700
     SPSS Italia +800.437300
     SPSS Japan +81.3.5466.5511
     SPSS Korea +82.2.3446.7651
     SPSS Latin America +1.312.651.3539
     SPSS Malaysia +603.7873.6477
     SPSS Mexico +52.5.682.87.68
     SPSS Miami +1.305.627.5700
     SPSS Norway +
     SPSS Polska +48.12.6369680
     SPSS Russia +
     SPSS Schweiz +
     SPSS Singapore +65.324.5150
     SPSS South Africa +27.11.807.3189
     SPSS South Asia +91.80.2088069
     SPSS Sweden +46.8.506.105.50
     SPSS Taiwan +886.2.25771100
     SPSS Thailand +
     SPSS UK +44.1483.719200

     SPSS is a registered trademark and the other SPSS products named are trademarks of SPSS Inc. All other names are trademarks of their respective owners. Printed in the U.S.A. © Copyright 2000 SPSS Inc. CLMRWP-0900