Data Mining & Knowledge Discovery Personalization Technologies For One To One Marketing










Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved


    Presentation Transcript

    • Data Mining & Knowledge Discovery: Personalization Technologies for One to One Marketing Bhagi Narahari
    • Outline of Lecture
      • What and Why of Data Mining and KDD?
        • Importance and Applications to E-commerce
      • How?
      • Personalization
        • personalized one-to-one business on the internet
      • Part 1: Overview of Personalization
      • Part 2: The Data Mining Process
    • Predictive Modelling
      • A “black box” that makes predictions about the future based on information from the past and present
      • [Diagram: inputs (age, balance, income) → Model (“crystal ball”?) → prediction: how much will the customer spend on the next catalog order?]
    • What is Data Mining?
      • It is the exploration and analysis by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules.
    • Why now? (A historical perspective)
      • Because data is now available (wasn’t always)
      • Distributed sources
      • Technology evolution
      • Competition (do what you can to outdo)
    • Why DM?
      • CRM (Customer Relationship Management) - important success factor in E-commerce
        • price differentiation no longer enough
        • customer service more important
      • Links with suppliers already exist (B2B) - JIT, joint forecasting, planning, procurement
      • Current emphasis on links with customers - feedback, input in design, etc.
    • CRM
      • Identifying profitable customers
      • Better service for more valued customers
      • Retaining profitable customers
        • Getting a new customer costs a lot more than retaining an existing one
        • acquiring a new customer costs about 5X as much as retaining one (Peppers & Rogers)
        • An increase from 75% to 80% in retention reduces costs by about 10%
      • Larger share of customer pool
    • CRM
      • Product differentiations based on “price” and “quality” are increasingly difficult
        • need to differentiate based on relationships
      • Increasingly sophisticated mass marketing increases probability of success
        • cost of mass marketing is driven down by internet (reach)
    • CRM
      • Goal: Positively interact with your customers and prospects
        • define customer segments
        • lights-out (fully automated) execution of campaigns against segments
        • attribution and evaluation of responses
    • Personalization in Ecommerce
      • Positive:
        • much better chance of personalization
          • customer identification
          • tracking across visits and within visit
        • ability to do ‘what if’ experiments
      • Negative:
        • cost of switching is much less
        • is web-based shopping good for ‘touchy feely’ things?
        • price differentiation across geographies not easy
    • Personalization
      • Customer chain: Product Discovery → Product Evaluation → Terms Negotiation → Order Placement → Order Payment → Customer Service & Support
      • Producer chain: Market Research → Market Stimulation/Education → Terms Negotiations → Order Receipt → Order Billing and Payment Management → Customer Service & Support
    • B2C Personalization Objectives
      • Know the customer
        • profile - registration, cookies
      • Determine what the customer wants
        • Ask: Questionnaires
          • what is the incentive for truthfulness?
        • Deduce: click streams, history, collaborative filtering (Amazon!!)
      • Deliver
        • Customize the look and feel
        • offer special promotions
        • offer customized products (Holy Grail)
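The "deduce" route via collaborative filtering can be illustrated with a minimal user-based sketch. The purchase data and the helper names `jaccard` and `recommend` are assumptions made up for this example; this is not a description of how Amazon or any specific vendor implements it.

```python
from collections import Counter

# Hypothetical purchase histories (customer -> set of items bought).
PURCHASES = {
    "howard": {"tie", "dress shirt", "cufflinks"},
    "rob": {"diapers", "baby wipes", "stroller"},
    "ray": {"toy car", "gadget", "board game"},
    "new_visitor": {"tie", "dress shirt"},
}

def jaccard(a, b):
    """Similarity of two item sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(customer, purchases, top_n=3):
    """Suggest items bought by similar customers that `customer` does not own."""
    own = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        sim = jaccard(own, items)
        if sim == 0.0:
            continue  # ignore customers with nothing in common
        for item in items - own:
            scores[item] += sim  # weight each candidate by peer similarity
    return [item for item, _ in scores.most_common(top_n)]
```

Calling `recommend("new_visitor", PURCHASES)` suggests the one item that the most similar peer bought and the visitor has not.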
    • Use of Personalization
      • In addition to storing and retrieving information on the individual’s profile “on the fly”
        • can also use mining software to analyze the information in the database to make recommendations or comments specific to the individual
    • Impact of Personalization
      • Customer relationship
      • Learn more about customers
        • learn and understand the why and how they prefer to do business with your organization
      • In tandem with tracking provides you with a tool to monitor your website
        • what works, what doesn’t, what makes your audience “click”
    • Security and Privacy as Barrier to Personalization
      • Large number of customers concerned about personalization (DoubleClick!)
      • will they pay more to preserve privacy?
      • Some falsify info to preserve privacy
      • customers give more info to trusted site
      • need secure site with clear privacy policies stated at site
    • Personalization
      • Know the customer
        • Identify: profile, login, credit card #
        • Questionnaires, past history, click streams
      • Predicting the wants
        • mapping to “peers”
        • extrapolation from past
        • extrapolation from peers
      • Give the customer his/her wants
        • look & feel
        • product selection & promotions
        • new product
    • Know the customer
      • Cookies
        • backlash (users do not trust them)
      • OPS: Open Profiling Standard
        • combined with eTrust certification
      • Registration
        • User certificates: logons
      • Key Question:
        • how do you know that this customer is the same one who goes to your storefront?
        • need standard warehouse techniques like address resolution, credit card resolution, etc.
    • Know the Customer:OPS
      • Two drivers
        • user should not have to retype basic info again and again
        • data is used in a fashion users can trust (not leaked, other data not seen, etc.)
      • Two parts
        • Common data
          • demographics (country,zip,age,gender)
          • Contact (name, address, CreditCard…)
          • User agent preferences
        • Per-site Sections (can be shared across sites, if user allows)
    • What if no profile???
      • Deduce
        • collect information: history of purchases, time spent on pages
        • ask questions (offer rewards)
        • combine with database marketing data
      • Predict behaviour
        • buy probabilities
        • build customer relationship
      • mining is key!
    • Personalization: Actions to take- Look and feel
      • Personalized pages
        • specific data
        • specific presentation and design
        • sent through various mediums
      • Manage Customers not products: 1-1 marketing
        • deliver personalized pages
          • eg: stock portfolio, personal info including alarm, travel reservations
        • use different mediums
          • WAP-enabled phones (eg: Sprint PCS Web)
    • Storefront Personalization
      • Customers visit Store Website
        • Howard buys ties
        • Rob buys Baby Products
        • Ray buys toys
        • Amy buys clothes
      • Provide a view of the store to these customers
        • present them with what they are likely to buy?
          • Howard: ties, and men’s formal wear
          • Ray: Toys and gadgets
          • Rob: Infant, Toddler section
          • Amy: Women’s Clothes section
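A per-customer store view of this kind can be sketched as a simple lookup from purchase history to featured sections. The mapping table, the fallback section, and the function name `storefront_view` are illustrative assumptions; a real storefront would likely combine this with the mining techniques covered later.

```python
from collections import Counter

# Hypothetical mapping from purchase category to store sections to feature.
SECTIONS_BY_CATEGORY = {
    "ties": ["Ties", "Men's Formal Wear"],
    "baby products": ["Infant", "Toddler"],
    "toys": ["Toys", "Gadgets"],
    "clothes": ["Women's Clothes"],
}
DEFAULT_SECTIONS = ["Home"]  # assumed fallback for customers with no history

def storefront_view(purchase_history):
    """Return the sections to feature, based on the most frequent category."""
    if not purchase_history:
        return DEFAULT_SECTIONS
    top_category, _ = Counter(purchase_history).most_common(1)[0]
    return SECTIONS_BY_CATEGORY.get(top_category, DEFAULT_SECTIONS)
```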
    • More Actions: Product Presentations & Promotions
      • [Diagram: basic storefront product hierarchy: Clothes → Men’s (Shirts, Pants), Women’s (Casuals, Evening), Children’s (Infants, Kids), with separate John’s View and Mary’s View of the hierarchy]
      • BroadVision One-to-One application
        • allows businesses to develop and manage personalized web sites
        • interactively profile each visitor and dynamically match info based on their profile and business rules specified by providers of site & services
          • users do not go through hoops finding relevant data
    • DM Terminology
      • OLAP, ROLAP, Data Warehouse, Data Marts, Data Stores, Neural Networks, Genetic Algorithms, Data Mining, Rule Based Systems, SQL
    • How?
      • Determine probability of buying as a function of customer attributes such as age, income, past buying patterns, ...
      • Target customers by ranking from highest to lowest probabilities
      • Other techniques: Decision Trees, Neural Networks, ...
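Scoring and ranking by buy probability can be sketched with a logistic model. The coefficients, the intercept, and the attribute names below are invented for illustration; in practice they would be fitted to historical data, and the lecture does not prescribe this particular model.

```python
import math

# Hypothetical coefficients; in practice they would be fitted to past data.
WEIGHTS = {"age": 0.02, "income_k": 0.01, "past_orders": 0.5}
INTERCEPT = -3.0

def buy_probability(customer):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(w_i * x_i))))."""
    z = INTERCEPT + sum(w * customer[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_customers(customers):
    """Sort customers from highest to lowest predicted buy probability."""
    return sorted(customers, key=buy_probability, reverse=True)

customers = [
    {"id": "A", "age": 30, "income_k": 40, "past_orders": 0},
    {"id": "B", "age": 45, "income_k": 90, "past_orders": 4},
    {"id": "C", "age": 52, "income_k": 60, "past_orders": 1},
]
ranked_ids = [c["id"] for c in rank_customers(customers)]
```

A campaign would then target the top of `ranked_ids` first.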
    • KDD
      • Knowledge Discovery in Databases
      • It is the process of identifying valid, novel, potentially useful, and understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth)
      • It involves data preparation, pattern extraction, knowledge evaluation, and refinement, in iteration
    • KDD
      • Data mining is a step in the KDD process that involves the application of certain algorithms to extract patterns
      • Steps in the KDD process:
          • Select Data
          • Data Cleansing and Pre-processing
          • Data Mining
          • Results interpretation
          • Implementation
    • Pre-processing in KDD
      • 80-90% of KDD process is spent here
      • Why?
          • Operational data is incomplete, inconsistent, in different formats across systems
          • DM techniques might require data in a specific format
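To make the pre-processing burden concrete, here is a small cleansing sketch that normalizes records arriving in inconsistent formats. The field names (`name`, `income`, `signup`) and the list of date formats are assumptions for this example only.

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")  # assumed source formats

def parse_date(text):
    """Try each known format; return an ISO date string or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_record(raw):
    """Normalize one customer record from any source system."""
    # Collapse stray whitespace and unify capitalization of the name.
    name = " ".join(raw.get("name", "").split()).title() or None
    income = raw.get("income")
    try:
        income = float(str(income).replace(",", "").replace("$", ""))
    except ValueError:
        income = None  # missing or unparseable value stays explicit
    return {"name": name, "income": income,
            "signup": parse_date(raw.get("signup", ""))}
```

Keeping missing values as explicit `None` lets the later mining step decide how to handle them rather than hiding the gap.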
    • Data Mining Problems
      • Classification/Segmentation
        • Binary (Yes/No)
        • Multiple Category (Large/Medium/Small)
      • Forecasting (how much)
      • Association Rule extraction (market basket analysis)
      • Sequence detection
        • balance increase -> missed payment -> default
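As a rough illustration of the sequence example, the sketch below checks whether a known pattern occurs, in order, in a customer's event history. It only matches a pattern that is already given; discovering such sequences from data is a harder problem that this sketch does not attempt. Event names and the sample histories are hypothetical.

```python
def contains_sequence(events, pattern):
    """True if `pattern` occurs in `events` in order (gaps are allowed)."""
    it = iter(events)
    # `step in it` advances the iterator, so later steps must come later.
    return all(step in it for step in pattern)

RISK_PATTERN = ("balance_increase", "missed_payment", "default")

# Hypothetical event histories per customer.
histories = {
    "cust_1": ["balance_increase", "payment", "missed_payment", "default"],
    "cust_2": ["missed_payment", "balance_increase", "payment"],
}
flagged = [cid for cid, ev in histories.items()
           if contains_sequence(ev, RISK_PATTERN)]
```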
    • Typical DM tasks
      • Prediction and Classification
        • Directed
        • Decision trees, Neural networks, memory based reasoning, logistic regression
        • Examples:
          • How many units will be sold on a given day?
          • What will be the stock price on a given day?
          • Will a customer buy the product or not?
    • DM tasks
      • Affinity grouping
        • Undirected
        • Which products go together naturally?
        • The beer-diaper syndrome?
        • Market basket analysis
        • Examples:
          • Which products peak in demand simultaneously?
    • DM tasks
      • Clustering task
        • Undirected
        • Segmenting into similar clusters
        • Different from classification
        • Examples
          • Customers with similar buying profiles
          • Products with similar demand patterns
    • DM success factors
      • Integration with data warehouses and DSS
      • Users should develop a good understanding of techniques
      • Recognize that these tools cannot automatically find patterns without being told what to do
      • Most methods now used are extensions of analytical methods that have been around for decades
    • Legal and Ethical Issues
      • Privacy concerns
        • becoming more important
        • will impact the way that data can be used and analyzed
        • ownership issues
        • European data laws have implications on US
      • Often data included in the data warehouse cannot legally be used in decision making process
        • Race, Gender, Age
      • Data contamination will become critical
    • Making Decisions
      • [Diagram: several Data sources → Data Warehouse? → Models → Decisions]
    • Data Warehouse
      • Bill Inmon: “A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions.”
      • is managed data that is situated after and outside the operational systems
    • Data Warehousing
      • Increasing need to find, summarize, and interpret large amounts of data effectively
        • Especially when data is distributed across many different databases
      • Transaction processing systems not easily accessible to other systems
        • Plus TP systems have time constraints
    • Enter the Data Warehouse
      • To deliver decision data to decision makers
      • by integrating data from various TPS to a single storage which can then
      • feed a range of decision support applications
      • through an OLAP interface!
    • Data Complications
      • Noise
      • Missing data
      • Transformation
        • numeric data
        • text
      • Need to differentiate between variables you can control and those you cannot
        • Actionable: size of discount, number of offers, etc.
        • Non-actionable: age, income, ...
    • Data Mining Techniques
      • Market Basket Analysis
      • Memory Based Reasoning
      • Cluster Detection
      • Link Analysis
      • Decision Trees and Rule Induction
      • Neural Networks
      • Genetic Algorithms
      • OLAP
    • OLAP: On Line Analytical Processing
      • While a data warehouse brings data together, OLAP lets you look at data and manipulate interactively
      • OLAP allows users to “slice and dice” data
      • Allows user to drill-down into detail data
    • Relational vs Multidimensional
    • Consolidations
    • Multidimensional Terminology
      • East, West, Central are input members of the Region dimension. Total Region is an output member of the Region dimension. Similarly, Nuts, Screws, Bolts, Washers, and Total are members of the Product dimension.
      • Variables are typically numerical measures like Sales, Costs, Profits, Expenses, and so forth.
      • Dimensions are roughly equivalent to Fields in a relational database. Cells are roughly equivalent to Records.
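The terminology above can be tried out on a tiny two-dimensional cube. The Sales figures are made up for illustration, and `consolidate` and `slice_cube` are hypothetical helper names, not functions of any OLAP product.

```python
from collections import defaultdict

# Made-up Sales figures keyed by (region, product); not from the lecture.
SALES = {
    ("East", "Nuts"): 10, ("East", "Bolts"): 20,
    ("West", "Nuts"): 5, ("West", "Screws"): 15,
    ("Central", "Washers"): 8,
}

def consolidate(cells, axis):
    """Sum the variable per member of one dimension (0 = region, 1 = product)."""
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[key[axis]] += value
    return dict(totals)

def slice_cube(cells, region):
    """'Slice': fix one member of the Region dimension, keep the products."""
    return {prod: v for (reg, prod), v in cells.items() if reg == region}

by_region = consolidate(SALES, 0)        # input members East, West, Central
total_region = sum(by_region.values())   # the "Total Region" output member
```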
    • Steps in DW and OLAP
      • [Diagram: Data sources → Data Loader → Data Converter → Data Scrubber → Data Transformer → Data Warehouse → OLAP Server → OLAP Interface]
    • Cluster Detection
      • Undirected data mining
      • Finds records that are similar to each other (clusters)
      • Clusters are found using geometric methods, statistical methods, and neural networks
      • Good way to start any analysis
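One geometric method for finding clusters is k-means, sketched below in plain Python. The lecture does not name a specific algorithm, so k-means is a choice made for this example; the customer points (orders per year, average spend) and the starting centroids are invented.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (orders per year, average spend per order).
points = [(1, 20), (2, 25), (1, 22), (10, 200), (12, 210), (11, 190)]
centroids, clusters = kmeans(points, [points[0], points[3]])
```

Because no target variable is supplied, this is undirected: the two groups emerge from the data alone.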
    • Market Basket Analysis
      • Form of clustering used for finding items that occur together (in a transaction or market basket)
      • Likelihood of different products being purchased together as rules
      • Planning store layouts, limiting specials to one of the products in a set,...
    • Transaction data
    • Co-occurrence matrix
    • Support and confidence
      • For a rule that says: If A then B
      • Support is defined as the ratio of number of transactions that include both A and B to total number of transactions
      • Confidence is defined by the ratio of the number of transactions that include both A and B to the number of transactions that include A.
      • How do you specify ‘significant’ support and confidence?
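The two definitions translate directly into code. The transactions below are made up purely to exercise the formulas, and the function names are illustrative.

```python
# Hypothetical transactions, chosen only to illustrate the definitions.
TRANSACTIONS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "milk"},
    {"milk", "bread"},
    {"diapers", "bread"},
]

def support(transactions, a, b):
    """Fraction of all transactions that contain both A and B."""
    both = sum(1 for t in transactions if a <= t and b <= t)
    return both / len(transactions)

def confidence(transactions, a, b):
    """Of the transactions containing A, the fraction that also contain B."""
    has_a = [t for t in transactions if a <= t]
    return sum(1 for t in has_a if b <= t) / len(has_a) if has_a else 0.0
```

Note that confidence is directional: "if A then B" and "if B then A" share the same support but can have different confidence.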
    • Algorithm for Finding Association Rules
      • Input is Min-Support and Min-Confidence
      • Find all sets of items with Min-Support ( frequent itemsets )
        • Frequent Itemsets Property: Every subset of a frequent itemset must also be a frequent itemset
          • iterative algorithm: start with frequent itemsets with one item, and construct larger itemsets using only smaller frequent itemsets.
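The iterative search for frequent itemsets can be sketched as follows, in the spirit of the Apriori approach. The sample transactions are hypothetical, and rule generation against Min-Confidence is left out to keep the sketch short.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: build size-k candidates from frequent (k-1)-itemsets."""
    n = len(transactions)

    def is_frequent(itemset):
        return sum(1 for t in transactions if itemset <= t) / n >= min_support

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if is_frequent(frozenset([i]))}
    frequent = set(level)
    while level:
        size = len(next(iter(level))) + 1
        # Join step: unions of two frequent sets that have exactly `size` items.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, size - 1))}
        level = {c for c in candidates if is_frequent(c)}
        frequent |= level
    return frequent

# Hypothetical market baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "milk"},
    {"milk", "bread"},
    {"diapers", "bread"},
]
result = frequent_itemsets(baskets, 0.4)
```

The prune step is where the frequent itemsets property pays off: candidates with an infrequent subset are discarded without counting them against the data.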
    • MBA example
      • Using the sample data create a co-occurrence table
      • Let relevant Support = 25% and Confidence= 50%:
        • Beer appears in 3/5 = 60% of transactions; beer and diapers appear together in 2/5 = 40%
        • If beer then diapers has confidence of 2/3=67%
        • Thus, “If customer buys beer then customer buys diapers” satisfies 25% support & 50% confidence
      • Conclusion drawn by mining system:
        • Customers who buy beer also buy diapers
    • Applying MBA Results
      • Is the relationship useful?
        • Beer and Diapers may not be of use
        • Victoria’s Secret transaction mining led to specific apparel sent to specific stores -- Microstrategy software
      • Who defines “usefulness”?
        • only as good as rules specified by humans/marketing workforce
        • NBA mining: designers of the software did not include height mismatches at first…coaches made the correction
    • Data Mining Algorithms
      • Four algorithms commonly cited
        • Association Rule (used in over 90% of the cases!)
        • Nearest Neighbor
          • quick and easy but models get large
        • Decision Tree
        • Neural Network
          • difficult to interpret and slow to train
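The nearest neighbor entry can be illustrated with a minimal k-nearest-neighbor classifier. The training examples and the function name `knn_predict` are assumptions for this sketch. It also shows why such models "get large": the model is the stored data itself.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest stored examples."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical examples: (age, balance in $K) -> did the customer respond?
TRAINING = [
    ((25, 2), "no"), ((30, 4), "no"), ((35, 3), "no"),
    ((50, 20), "yes"), ((55, 25), "yes"), ((60, 18), "yes"),
]
```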
    • Decision Trees
      • Series of if/then rules
        • easy to understand, complexity in implementation
      • [Diagram: example tree splitting on Balance (< 10K vs. > 10K), then on Age (> 48 vs. < 48), with yes/no leaves]
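A decision tree really is just nested if/then rules, as this sketch shows. The split thresholds follow the slide's diagram, but the leaf labels are assumed for illustration because the diagram's leaves are not fully recoverable, and the slide does not say how values exactly at a threshold are handled.

```python
def predict_response(balance, age):
    """Hypothetical tree: thresholds follow the slide, leaf labels are assumed."""
    if balance < 10_000:
        return "no"    # low-balance branch
    if age > 48:
        return "yes"   # high balance, older customer (assumed leaf)
    return "no"        # high balance, younger customer (assumed leaf)
```

Reading a prediction back as the chain of conditions that produced it is what makes trees easy to understand.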
    • CRM and Data Mining
      • Recall:customer segmentation is key in CRM
        • data mining can help improve understanding of customer behaviour
          • helps locate meaningful segments from customer data
        • users want to turn that understanding into automated interactions with their customers
    • Integrating Data Mining & CRM
      • Data mining application owns the modelling process
      • CRM application owns the campaign execution process
      • Goals:
        • minimize pain involved with using models in campaigns
        • score records only when and where necessary
    • Integrating Mining & CRM
      • Step 1:
        • analytic user creates model using mining system
        • model is then exported into campaign management system
      • Step 2:
        • Marketing user creates campaign that includes predictive models
        • when campaign executes, data mining engine scores customers dynamically
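The two-step hand-off can be sketched as below. The JSON export format, the logistic scoring function, and all names (`export_model`, `load_scorer`, `run_campaign`) are assumptions for this example; real mining and campaign tools define their own exchange formats.

```python
import json
import math

def export_model(weights, intercept):
    """Mining side: serialize a fitted model so the campaign tool can load it."""
    return json.dumps({"weights": weights, "intercept": intercept})

def load_scorer(payload):
    """Campaign side: turn the exported model into a scoring function."""
    model = json.loads(payload)

    def score(record):
        z = model["intercept"] + sum(
            w * record[name] for name, w in model["weights"].items())
        return 1.0 / (1.0 + math.exp(-z))

    return score

def run_campaign(customers, in_segment, score, threshold=0.5):
    """Score only the records in the campaign's segment, at execution time."""
    return [c["id"] for c in customers
            if in_segment(c) and score(c) >= threshold]
```

Because `in_segment` is checked before `score` is called, records outside the segment are never scored, which is the "score only when and where necessary" goal.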
    • Benefits of Integration
      • Pre-generated model selection
      • Score defined segments “on the fly”
        • eliminates need to score entire database
        • improve efficiency of campaigns
      • Reduces manual intervention and error
      • Accelerates the market cycle
        • increases likelihood of reaching customers before competitors
        • improves campaign results and lowers costs
    • Summary
      • “Using the new media of the one-to-one future, you will be able to communicate directly with customers individually…” - Don Peppers & Martha Rogers (One-to-One Future)
      • “What are you afraid of?… Even if you’re not afraid of these things, the beauty is, with proper marketing, we can make you afraid” -- Michael Saylor, CEO Microstrategy.