Data Mining & Knowledge Discovery
Published in: Technology, Business

  • 1. Data Mining & Knowledge Discovery: Personalization Technologies for One to One Marketing (Bhagi Narahari)
  • 2. Outline of Lecture
    • What and Why of Data Mining and KDD?
      • Importance and Applications to E-commerce
    • How ?
    • Personalization
      • personalized one-to-one business on the internet
    • Part I: Overview of Personalization
    • Part II: The Data Mining Process
  • 3. Predictive Modelling
    • A “black box” that makes predictions about the future based on information from the past and present
    Diagram: Age, balance, income → Model (crystal ball?) → How much will the customer spend on the next catalog order?
  • 4. What is Data Mining?
    • It is the exploration and analysis by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules.
  • 5. Why now? (A historical perspective)
    • Because data is now available (wasn’t always)
    • Distributed sources
    • Technology evolution
    • Competition (do what you can to outdo)
  • 6. Why DM?
    • CRM (Customer Relationship Management) - important success factor in E-commerce
      • price differentiation no longer enough
      • customer service more important
    • Links with suppliers already exist (B2B) - JIT, joint forecasting, planning, procurement
    • Current emphasis on links with customers - feedback, input in design, etc.
  • 7. CRM
    • Identifying profitable customers
    • Better service for more valued customers
    • Retaining profitable customers
      • Getting a new customer costs a lot more than retaining an existing one
      • it costs about 5X as much to acquire a new customer (Peppers & Rogers)
      • An increase from 75% to 80% in retention reduces costs by about 10%
    • Larger share of customer pool
  • 8. CRM
    • Product differentiations based on “price” and “quality” are increasingly difficult
      • need to differentiate based on relationships
    • Increasingly sophisticated mass marketing increases probability of success
      • cost of mass marketing is driven down by internet (reach)
  • 9. CRM
    • Goal: Positively interact with your customers and prospects
      • define customer segments
      • lights out execution of campaigns against segments
      • attribution and evaluation of responses
  • 10. Personalization in Ecommerce
    • Positive:
      • much better chance of personalization
        • customer identification
        • tracking across visits and within visit
      • ability to do ‘what if’ experiments
    • Negative:
      • cost of switching is much less
      • is web-based shopping good for ‘touchy feely’ things?
      • price differentiation across geographies not easy
  • 11. Personalization
    • Customer Chain: Product Discovery, Product Evaluation, Terms Negotiation, Order Placement, Order Payment, Customer Service & Support
    • Producer Chain: Market Research, Market Stimulation/Education, Terms Negotiation, Order Receipt, Order Billing and Payment Management, Customer Service & Support
  • 12. B2C Personalization Objectives
    • Know the customer
      • profile - registration, cookies
    • Determine what the customer wants
      • Ask: Questionnaires
        • what is the incentive for truthfulness
      • Deduce: click streams, history, collaborative filtering (Amazon!!)
    • Deliver
      • Customize the look and feel
      • offer special promotions
      • offer customized products (Holy Grail)
  • 13. Use of Personalization
    • In addition to storing and retrieving information on the individual’s profile “on the fly”
      • can also use mining software to analyze the information in the database to make recommendations or comments specific to the individual
  • 14. Impact of Personalization
    • Customer relationship
    • Learn more about customers
      • learn and understand the why and how they prefer to do business with your organization
    • In tandem with tracking provides you with a tool to monitor your website
      • what works, what doesn’t, what makes your audience “click”
  • 15. Security and Privacy as Barrier to Personalization
    • Large number of customers concerned about personalization (DoubleClick!)
    • will they pay more to preserve privacy?
    • Some falsify info to preserve privacy
    • customers give more info to trusted site
    • need secure site with clear privacy policies stated at site
  • 16. Personalization
    • Know the customer
      • Identify: Profile, Login, Credit Card #
    • Predicting the wants
      • Questionnaires, Past history, Click streams
      • Mapping to “peers”
      • Extrapolation from past
      • Extrapolation from peers
    • Give the customer his/her wants
      • Look & feel
      • Product selection & promotions
      • New product
  • 17. Know the customer
    • Cookies
      • backlash (users do not trust them)
    • OPS: Open Profiling Standard
      • combined with eTrust certification
    • Registration
      • User certificates: logons
    • Key Question:
      • how do you know that this online customer is the same person as the one who goes to your storefront?
      • need standard warehouse techniques like address resolution, credit card resolution, etc.
  • 18. Know the Customer: OPS
    • Two drivers
      • users should not have to retype basic info again and again
      • users need assurance that data is used in a trusted fashion (not leaked, other data not seen, etc.)
    • Two parts
      • Common data
        • demographics (country,zip,age,gender)
        • Contact (name, address, CreditCard…)
        • User agent preferences
      • Per-site Sections (can be shared across sites, if user allows)
  • 19. What if no profile???
    • Deduce
      • collect information: history of purchases, time spent on pages
      • ask questions (offer rewards)
      • combine with database marketing data
    • Predict behaviour
      • buy probabilities
      • build customer relationship
    • mining is key!
  • 20. Personalization: Actions to take- Look and feel
    • Personalized pages
      • specific data
      • specific presentation and design
      • sent through various mediums
    • Manage Customers not products: 1-1 marketing
      • deliver personalized pages
        • e.g., stock portfolio, personal info including alarm, travel reservations
      • use different mediums
        • WAP-enabled phones (e.g., Sprint PCS Web)
  • 21. Storefront Personalization
    • Customers visit Store Website
      • Howard buys ties
      • Rob buys Baby Products
      • Ray buys toys
      • Amy buys clothes
    • Provide a view of the store to these customers
      • present them with what they are likely to buy?
        • Howard: ties, and men’s formal wear
        • Ray: Toys and gadgets
        • Rob: Infant, Toddler section
        • Amy: Women’s Clothes section
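    A minimal sketch of how a storefront view could be derived from purchase history, as in the Howard/Rob/Ray/Amy example above. The `RELATED_SECTIONS` mapping, the `storefront_view` function, and the "Home" fallback are illustrative assumptions, not part of the slides; a real site would learn such mappings from data rather than hard-code them.

```python
from collections import Counter

# Hypothetical mapping from purchased category to store sections,
# loosely based on the examples on the slide.
RELATED_SECTIONS = {
    "ties": ["Ties", "Men's Formal Wear"],
    "baby products": ["Infant", "Toddler"],
    "toys": ["Toys", "Gadgets"],
    "clothes": ["Women's Clothes"],
}

def storefront_view(history, default=("Home",)):
    """Order store sections by how often the customer bought each category."""
    counts = Counter(history)
    view = []
    for category, _ in counts.most_common():
        for section in RELATED_SECTIONS.get(category, []):
            if section not in view:
                view.append(section)
    # Customers with no usable history get the default view.
    return view or list(default)

howard_view = storefront_view(["ties", "ties"])
ray_view = storefront_view(["toys"])
new_visitor_view = storefront_view([])
```

    Here `howard_view` lists the ties and formal wear sections first, while a visitor with no history falls back to the generic view.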
  • 22. More Actions: Product Presentations & Promotions
    • Basic storefront product hierarchy
      • Clothes
        • Men’s: Shirts, Pants
        • Women’s: Casuals, Evening
        • Children’s: Infants, Kids
    • Personalized views of the hierarchy: John’s View, Mary’s View
  • 23.
    • BroadVision One-to-One application
      • allows businesses to develop and manage personalized web sites
      • interactively profile each visitor and dynamically match info based on their profile and business rules specified by providers of site & services
        • users do not go through hoops finding relevant data
  • 24. DM Terminology
    • OLAP, ROLAP, Data Warehouse, Data Marts, Data Stores, Neural Networks, Genetic Algorithms, Data Mining, Rule Based Systems, SQL
  • 25. How?
    • Determine probability of buying as a function of customer attributes such as age, income, past buying patterns, etc.
    • Target customers by ranking from highest to lowest probabilities
    • Other techniques: Decision Trees, Neural Networks, ….
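    The approach on this slide (estimate a buy probability from customer attributes, then rank customers) can be sketched as follows. The logistic form, the `WEIGHTS` and `INTERCEPT` values, and the sample customers are all assumptions made for illustration; in practice the coefficients would be fitted to historical data.

```python
import math

# Hypothetical coefficients; a real model would be fitted to past purchases.
WEIGHTS = {"age": 0.02, "income_k": 0.01, "past_orders": 0.5}
INTERCEPT = -3.0

def buy_probability(customer):
    """Logistic model: P(buy) = 1 / (1 + exp(-(b0 + sum(w_i * x_i))))."""
    z = INTERCEPT + sum(w * customer[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_customers(customers):
    """Sort customers from highest to lowest buy probability."""
    return sorted(customers, key=buy_probability, reverse=True)

# Made-up customers (income in thousands).
customers = [
    {"id": "A", "age": 30, "income_k": 40, "past_orders": 0},
    {"id": "B", "age": 45, "income_k": 90, "past_orders": 4},
    {"id": "C", "age": 60, "income_k": 60, "past_orders": 1},
]

ranked = rank_customers(customers)
for c in ranked:
    print(c["id"], round(buy_probability(c), 3))
```

    A campaign would then target customers from the top of `ranked` downward until the budget or a probability cutoff is reached.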
  • 26. KDD
    • Knowledge Discovery in Databases
    • It is the process of identifying valid, novel, potentially useful, and understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth)
    • It involves data preparation, pattern extraction, knowledge evaluation, and refinement, in iteration
  • 27. KDD
    • Data mining is a step in the KDD process that involves the application of certain algorithms to extract patterns
    • Steps in the KDD process:
        • Select Data
        • Data Cleansing and Pre-processing
        • Data Mining
        • Results interpretation
        • Implementation
  • 28. Pre-processing in KDD
    • 80-90% of KDD process is spent here
    • Why?
        • Operational data is incomplete, inconsistent, in different formats across systems
        • DM techniques might require data in a specific format
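    To illustrate why pre-processing takes so much effort, here is a small sketch that normalizes records arriving in inconsistent formats. The field names, the two date formats in `DATE_FORMATS`, and the sample rows are assumptions chosen for the example, not taken from any particular system.

```python
from datetime import datetime

# Assumed date formats used by two hypothetical source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def parse_date(text):
    """Try each known format; return None when the value cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

def clean_record(raw):
    """Normalize IDs, dates, and income; keep missing values as None."""
    income_text = (raw.get("income") or "").replace("$", "").replace(",", "").strip()
    return {
        "customer_id": raw["customer_id"].strip().upper(),
        "signup": parse_date(raw.get("signup") or ""),
        "income": float(income_text) if income_text else None,
    }

# Made-up operational rows with inconsistent formatting and gaps.
raw_rows = [
    {"customer_id": " c001 ", "signup": "2000-03-15", "income": "$52,000"},
    {"customer_id": "C002", "signup": "03/20/2000", "income": ""},
    {"customer_id": "c003", "signup": "unknown", "income": "48000"},
]

cleaned = [clean_record(r) for r in raw_rows]
```

    Missing or unparseable values are kept as `None` rather than guessed, so a later step can decide whether to impute or drop them.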
  • 29. Data Mining Problems
    • Classification/Segmentation
      • Binary (Yes/No)
      • Multiple Category (Large/Medium/Small)
    • Forecasting (how much)
    • Association Rule extraction (market basket analysis)
    • Sequence detection
      • balance increase -> missed payment -> default
  • 30. Typical DM tasks
    • Prediction and Classification
      • Directed
      • Decision trees, Neural networks, memory based reasoning, logistic regression
      • Examples:
        • How many units will be sold on a given day?
        • What will be the stock price on a given day?
        • Will a customer buy the product or not?
  • 31. DM tasks
    • Affinity grouping
      • Undirected
      • Which products go together naturally?
      • The beer-diaper syndrome?
      • Market basket analysis
      • Examples:
        • Which products peak in demand simultaneously?
  • 32. DM tasks
    • Clustering task
      • Undirected
      • Segmenting into similar clusters
      • Different from classification
      • Examples
        • Customers with similar buying profiles
        • Products with similar demand patterns
  • 33. DM success factors
    • Integration with data warehouses and DSS
    • Users should develop a good understanding of techniques
    • Recognize that these tools cannot automatically find patterns without being told what to do
    • Most methods now used are extensions of analytical methods that have been around for decades
  • 34. Legal and Ethical Issues
    • Privacy concerns
      • becoming more important
      • will impact the way that data can be used and analyzed
      • ownership issues
      • European data laws have implications on US
    • Often data included in the data warehouse cannot legally be used in decision making process
      • Race, Gender, Age
    • Data contamination will become critical
  • 35. Making Decisions
    • Diagram: multiple Data sources → Data Warehouse? → Models → Decisions
  • 36. Data Warehouse
    • Bill Inmon: “A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions.”
    • is managed data that is situated after and outside the operational systems
  • 37. Data Warehousing
    • Increasing need to find, summarize, and interpret large amounts of data effectively
      • Especially when data is distributed across many different databases
    • Transaction processing systems not easily accessible to other systems
      • Plus TP systems have time constraints
  • 38. Enter the Data Warehouse
    • To deliver decision data to decision makers
    • by integrating data from various TPS to a single storage which can then
    • feed a range of decision support applications
    • through an OLAP interface!
  • 39. Data Complications
    • Noise
    • Missing data
    • Transformation
      • numeric data
      • text
    • Need to differentiate between variables you can control and those you cannot
      • Actionable: size of discount, number of offers etc.
      • Non-actionable: age, income ..
  • 40. Data Mining Techniques
    • Market Basket Analysis
    • Memory Based Reasoning
    • Cluster Detection
    • Link Analysis
    • Decision Trees and Rule Induction
    • Neural Networks
    • Genetic Algorithms
    • OLAP
  • 41. OLAP: On Line Analytical Processing
    • While a data warehouse brings data together, OLAP lets you look at data and manipulate interactively
    • OLAP allows users to “slice and dice” data
    • Allows user to drill-down into detail data
  • 42. Relational vs Multidimensional
  • 43. Consolidations
  • 44. Multidimensional Terminology
    • East, West, Central are input members of the Region dimension. Total Region is an output member of the Region dimension . Similarly, Nuts, Screws, Bolts, Washers, and Total are members of the Product dimension.
    • Variables are typically numerical measures like Sales, Costs, Profits, Expenses, and so forth.
    • Dimensions are roughly equivalent to Fields in a relational database. Cells are roughly equivalent to Records.
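    A minimal sketch of the multidimensional terminology above, using a dictionary as a tiny cube. The Region and Product members come from the slide; the sales figures and the `roll_up` and `slice_cube` helper names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical Sales variable keyed by (Region member, Product member).
sales = {
    ("East", "Nuts"): 50, ("East", "Screws"): 60,
    ("East", "Bolts"): 100, ("East", "Washers"): 40,
    ("West", "Nuts"): 40, ("West", "Screws"): 70,
    ("West", "Bolts"): 80, ("West", "Washers"): 30,
    ("Central", "Nuts"): 60, ("Central", "Screws"): 80,
    ("Central", "Bolts"): 100, ("Central", "Washers"): 30,
}

def roll_up(cube, axis):
    """Consolidate along one dimension (axis 0 = Region, axis 1 = Product)."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)

def slice_cube(cube, axis, member):
    """Keep only the cells for one member of a dimension (a 'slice')."""
    return {key: value for key, value in cube.items() if key[axis] == member}

by_region = roll_up(sales, 0)         # input members consolidated per region
by_product = roll_up(sales, 1)
total_region = sum(by_region.values())  # the "Total Region" output member
west_slice = slice_cube(sales, 0, "West")
```

    Drilling down is the reverse of `roll_up`: starting from `by_region["West"]` and moving to the individual cells in `west_slice`.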
  • 45. Steps in DW and OLAP
    • Data sources → Data Loader → Data Converter → Data Scrubber → Data Transformer → Data Warehouse → OLAP Server → OLAP Interface
  • 47. Cluster Detection
    • Undirected data mining
    • Finds records that are similar to each other (clusters)
    • Clusters are found using geometric methods, statistical methods, and neural networks
    • Good way to start any analysis
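    As one example of the geometric methods mentioned above, here is a minimal k-means sketch. The slides do not name a specific algorithm, so k-means is a swapped-in illustration; the naive initialization (first k points) and the sample customer data are assumptions.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, iterations=20):
    """Plain k-means; initial centers are simply the first k points."""
    centers = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Made-up customers: (orders per year, average order value).
points = [(1, 20), (2, 25), (1, 30), (10, 200), (12, 220), (11, 210)]
centers, clusters = k_means(points, 2)
```

    On this toy data the two clusters separate occasional low-value buyers from frequent high-value buyers, which is the kind of segment a marketer would then inspect by hand.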
  • 48. Market Basket Analysis
    • Form of clustering used for finding items that occur together (in a transaction or market basket)
    • Likelihood of different products being purchased together as rules
    • Planning store layouts, limiting specials to one of the products in a set,...
  • 49. Transaction data
  • 50. Co-occurrence matrix
  • 51. Support and confidence
    • For a rule that says: If A then B
    • Support is defined as the ratio of number of transactions that include both A and B to total number of transactions
    • Confidence is defined by the ratio of the number of transactions that include both A and B to the number of transactions that include A.
    • How do you specify ‘significant’ support and confidence ?
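    The two definitions above translate directly into code. The transactions below are hypothetical and are not the sample data from the slides; the function names are chosen for this sketch.

```python
def support(transactions, antecedent, consequent):
    """Fraction of all transactions that contain both A and B."""
    both = antecedent | consequent
    return sum(1 for t in transactions if both <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions containing A, the fraction that also contain B."""
    with_a = [t for t in transactions if antecedent <= t]
    if not with_a:
        return 0.0
    both = antecedent | consequent
    return sum(1 for t in with_a if both <= t) / len(with_a)

# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "milk"},
    {"diapers", "milk"},
    {"milk", "bread"},
]

rule_support = support(transactions, {"beer"}, {"diapers"})
rule_confidence = confidence(transactions, {"beer"}, {"diapers"})
```

    Note that support is symmetric in A and B, while confidence is not: "if diapers then beer" can have a different confidence from "if beer then diapers".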
  • 52. Algorithm for Finding Association Rules
    • Input is Min-Support and Min-Confidence
    • Find all sets of items with Min-Support ( frequent itemsets )
      • Frequent Itemsets Property: Every subset of a frequent itemset must also be a frequent itemset
        • iterative algorithm: start with frequent itemsets with one item, and construct larger itemsets using only smaller frequent itemsets.
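    The iterative construction described above can be sketched as follows. This is a simplified, unoptimized version of the frequent-itemset step only (rule generation against Min-Confidence is omitted); the transactions are hypothetical and the function name is an assumption.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if supp(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = supp(s)
        k += 1
        # Join: build size-k candidates from frequent (k-1)-itemsets only.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if supp(c) >= min_support}
    return result

# Hypothetical market baskets.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "milk"},
    {"diapers", "milk"},
    {"milk", "bread"},
]

frequent = frequent_itemsets(transactions, min_support=0.25)
```

    The prune step is where the frequent itemsets property pays off: candidates containing an infrequent subset are discarded without ever counting them against the data.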
  • 53. MBA example
    • Using the sample data create a co-occurrence table
    • Let relevant Support = 25% and Confidence= 50%:
      • Beer and Diapers appear in 3/5= 60%
      • If beer then diapers has confidence of 2/3=67%
      • Thus, “If customer buys beer then customer buys diapers” satisfies 25% support & 50% confidence
    • Conclusion drawn by mining system:
      • Customers who buy beer also buy diapers
  • 54. Applying MBA Results
    • Is the relationship useful ?
      • Beer and Diapers may not be of use
      • Victoria’s Secret transaction mining led to specific apparel sent to specific stores -- Microstrategy software
    • Who defines “usefulness”?
      • only as good as rules specified by humans/marketing workforce
      • NBA mining: designers of s/w did not include height mismatches at first…coaches made the correction
  • 55. Data Mining Algorithms
    • Four algorithms commonly cited
      • Association Rule (used in over 90% of the cases!)
      • Nearest Neighbor
        • quick and easy but models get large
      • Decision Tree
      • Neural Network
        • difficult to interpret and time-consuming
  • 56. Decision Trees
    • Series of if/then rules
      • easy to understand, complexity in implementation
    Diagram: a decision tree with tests on Balance (< 10K vs. > 10K) and Age (> 48 vs. < 48), and leaves labeled No, Yes, Yes
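    A decision tree reads naturally as nested if/then rules. The sketch below uses the Balance and Age tests from the slide, but which leaf hangs off which branch is not recoverable from the transcript, so the leaf assignments here (and the handling of the exact boundary values) are assumptions for illustration only.

```python
def predict(balance, age):
    """Hypothetical tree: leaf labels and branch layout are assumed."""
    if balance < 10_000:
        return "no"    # assumed leaf for low balances
    if age > 48:
        return "yes"   # assumed leaf: high balance, older customer
    return "no"        # assumed leaf: high balance, younger customer

examples = [(5_000, 60), (20_000, 60), (20_000, 30)]
predictions = [predict(balance, age) for balance, age in examples]
```

    Each path from the root to a leaf is one rule, for example "if balance is at least 10K and age is over 48 then yes", which is what makes trees easy to explain to business users.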
  • 57. CRM and Data Mining
    • Recall: customer segmentation is key in CRM
      • data mining can help improve understanding of customer behaviour
        • helps locate meaningful segments from customer data
      • users want to turn that understanding into automated interactions with their customers
  • 58. Integrating Data Mining & CRM
    • Data mining application owns the modelling process
    • CRM application owns the campaign execution process
    • Goals:
      • minimize pain involved with using models in campaigns
      • score records only when and where necessary
  • 59. Integrating Mining & CRM
    • Step 1:
      • analytic user creates model using mining system
      • model is then exported into campaign management system
    • Step 2:
      • Marketing user creates campaign that includes predictive models
      • when campaign executes, data mining engine scores customers dynamically
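    The two-step flow above can be sketched as a campaign that receives an exported model and scores only the customers in its segment at execution time. The `exported_model` scoring rule, the segment definition, and the customer records are hypothetical stand-ins, not any vendor's API.

```python
def exported_model(customer):
    """Stand-in for a model exported from the mining system (assumed rule)."""
    return min(1.0, 0.1 + 0.2 * customer["past_orders"])

def run_campaign(customers, in_segment, model, threshold):
    """Score only segment members, and only when the campaign executes."""
    targets = []
    scored = 0
    for c in customers:
        if not in_segment(c):
            continue  # records outside the segment are never scored
        scored += 1
        if model(c) >= threshold:
            targets.append(c["id"])
    return targets, scored

# Made-up customer database.
customers = [
    {"id": 1, "region": "East", "past_orders": 3},
    {"id": 2, "region": "East", "past_orders": 0},
    {"id": 3, "region": "West", "past_orders": 4},
    {"id": 4, "region": "East", "past_orders": 2},
]

targets, scored = run_campaign(
    customers,
    in_segment=lambda c: c["region"] == "East",
    model=exported_model,
    threshold=0.4,
)
```

    Only three of the four records are scored here, which is the point of scoring "when and where necessary" instead of scoring the whole database up front.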
  • 60. Benefits of Integration
    • Pre-generated model selection
    • Score defined segments “on the fly”
      • eliminates need to score entire database
      • improve efficiency of campaigns
    • Reduces manual intervention and error
    • Accelerates the market cycle
      • increases likelihood of reaching customers before competitors
      • improves campaign results and lowers costs
  • 61. Summary
    • “Using the new media of the one-to-one future, you will be able to communicate directly with customers individually...” - Don Peppers & Martha Rogers (One-to-One Future)
    • “What are you afraid of? ... Even if you’re not afraid of these things, the beauty is, with proper marketing, we can make you afraid” -- Michael Saylor, CEO, Microstrategy.