Data Mining & Knowledge Discovery

  • 1. Data Mining & Knowledge Discovery: Personalization Technologies for One-to-One Marketing (Bhagi Narahari)
  • 2. Outline of Lecture
    • What and Why of Data Mining and KDD?
      • Importance and Applications to E-commerce
    • How?
    • Personalization
      • personalized one-to-one business on the internet
    • Part I: Overview of Personalization
    • Part II: The Data Mining Process
  • 3. Predictive Modelling
    • A “black box” that makes predictions about the future based on information from the past and present
    • Figure: inputs (age, balance, income) → model (a “crystal ball”?) → output: how much will the customer spend on the next catalog order?
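As a minimal sketch of the “black box” idea, the code below fits a straight line predicting catalog spend from a single input (income) using ordinary least squares, then uses it to predict for a new customer. The customer figures are hypothetical, and a real model would use all of the inputs in the figure (age, balance, income), not just one.

```python
# Minimal sketch: fit spend ≈ intercept + slope * income by ordinary least squares.
# The customer data is hypothetical, invented only for illustration.

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(model: tuple[float, float], x: float) -> float:
    """Apply the fitted line to a new input value."""
    intercept, slope = model
    return intercept + slope * x

# Hypothetical past customers: annual income (thousands) and last catalog spend.
incomes = [30.0, 45.0, 60.0, 75.0, 90.0]
spends = [40.0, 55.0, 70.0, 85.0, 100.0]

model = fit_line(incomes, spends)
print(predict(model, 50.0))  # 60.0 for this perfectly linear toy data
```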
  • 4. What is Data Mining?
    • It is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules.
  • 5. Why now? (A historical perspective)
    • Data is now available (it wasn’t always)
    • Distributed sources
    • Technology evolution
    • Competition (do what you can to outdo competitors)
  • 6. Why DM?
    • CRM (Customer Relationship Management) - important success factor in E-commerce
      • price differentiation no longer enough
      • customer service more important
    • Links with suppliers already exist (B2B) - JIT, joint forecasting, planning, procurement
    • Current emphasis on links with customers - feedback, input in design, etc.
  • 7. CRM
    • Identifying profitable customers
    • Better service for more valued customers
    • Retaining profitable customers
      • Acquiring a new customer costs far more than retaining an existing one
      • about 5x as much, according to Peppers & Rogers
      • An increase from 75% to 80% in retention reduces costs by about 10%
    • Larger share of customer pool
  • 8. CRM
    • Product differentiation based on “price” and “quality” is increasingly difficult
      • need to differentiate based on relationships
    • Increasingly sophisticated mass marketing increases the probability of success
      • cost of mass marketing is driven down by internet (reach)
  • 9. CRM
    • Goal: Positively interact with your customers and prospects
      • define customer segments
      • lights-out (fully automated) execution of campaigns against segments
      • attribution and evaluation of responses
  • 10. Personalization in Ecommerce
    • Positive:
      • much better chance of personalization
        • customer identification
        • tracking across visits and within visit
      • ability to do ‘what if’ experiments
    • Negative:
      • switching costs are much lower
      • is web-based shopping suited to ‘touchy-feely’ products?
      • price differentiation across geographies not easy
  • 11. Personalization
    • Customer chain: product discovery → product evaluation → terms negotiation → order placement → order payment → customer service & support
    • Producer chain: market research → market stimulation/education → terms negotiation → order receipt → order billing and payment management → customer service & support
  • 12. B2C Personalization Objectives
    • Know the customer
      • profile - registration, cookies
    • Determine what the customer wants
      • Ask: questionnaires
        • what is the incentive for truthfulness?
      • Deduce: click streams, history, collaborative filtering (e.g., Amazon)
    • Deliver
      • Customize the look and feel
      • offer special promotions
      • offer customized products (Holy Grail)
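The “deduce” route via collaborative filtering can be illustrated with a simple co-purchase count: recommend items that other customers bought together with what this customer already owns. This is a minimal sketch of one basic variant, not Amazon’s actual system; the purchase histories and the `recommend` helper are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories: customer -> set of items bought.
histories = {
    "c1": {"tie", "dress shirt", "cufflinks"},
    "c2": {"tie", "dress shirt"},
    "c3": {"tie", "cufflinks"},
    "c4": {"toy car", "puzzle"},
    "c5": {"tie"},  # new customer with a single purchase
}

def co_purchase_counts(histories):
    """Count, for each ordered item pair (a, b), how many customers bought both."""
    counts = Counter()
    for items in histories.values():
        for a, b in permutations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts

def recommend(customer, histories, top_n=3):
    """Score items the customer lacks by how often they were co-bought with owned items."""
    counts = co_purchase_counts(histories)
    owned = histories[customer]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in owned and b not in owned:
            scores[b] += n
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("c5", histories))  # cufflinks and dress shirt, never the toys
```

Counting raw co-occurrences favors popular items; production recommenders usually normalize these scores (for example with cosine similarity), which is omitted here for brevity.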
  • 13. Use of Personalization
    • In addition to storing and retrieving the individual’s profile “on the fly”
      • mining software can analyze the information in the database to make recommendations or comments specific to that individual
  • 14. Impact of Personalization
    • Customer relationship
    • Learn more about customers
      • understand why and how they prefer to do business with your organization
    • Combined with tracking, provides a tool to monitor your website
      • what works, what doesn’t, what makes your audience “click”
  • 15. Security and Privacy as Barrier to Personalization
    • Many customers are concerned about personalization (e.g., DoubleClick!)
    • will they pay more to preserve privacy?
    • Some falsify information to preserve privacy
    • customers give more information to trusted sites
    • sites need to be secure and state clear privacy policies
  • 16. Personalization
    • Know the customer (identify): profile, login, credit card #
    • Predict the wants: questionnaires, past history, click streams; mapping to “peers”; extrapolation from past; extrapolation from peers (firefly.com)
    • Give the customer his/her wants: look & feel, product selection & promotions, new products
  • 17. Know the customer
    • Cookies
      • backlash (users do not trust them)
    • OPS: Open Profiling Standard
      • combined with eTrust certification
    • Registration
      • User certificates: logons
    • Key question:
      • how do you know that this customer is the same one who visits your storefront?
      • need standard warehouse techniques such as address resolution, credit card resolution, etc.
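Address and credit-card resolution amount to building a normalized match key so that differently formatted records from two channels collapse to the same customer. The sketch below assumes a tiny hypothetical abbreviation table and matches on normalized address plus the card’s last four digits; real record-linkage tools use fuzzier matching and would not handle raw card numbers this way.

```python
import re

# Hypothetical abbreviation table; real address standardization uses much larger lists.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "apt": "apartment"}

def normalize_address(addr: str) -> str:
    """Canonical form: lowercase, punctuation dropped, common abbreviations expanded."""
    tokens = re.findall(r"[a-z0-9]+", addr.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def match_key(record: dict) -> tuple[str, str]:
    """Key on the normalized address plus the last four digits of the card number."""
    digits = re.sub(r"\D", "", record["card"])
    return normalize_address(record["address"]), digits[-4:]

def resolve(web_records, store_records):
    """Return (web_id, store_id) pairs that appear to be the same customer."""
    index = {match_key(r): r["id"] for r in store_records}
    return [(r["id"], index[match_key(r)]) for r in web_records if match_key(r) in index]

# Hypothetical records from the web site and the physical storefront.
web = [
    {"id": "w1", "address": "12 Main St., Apt 4", "card": "4111-1111-1111-1234"},
    {"id": "w2", "address": "99 Oak Ave", "card": "5500 0000 0000 0004"},
]
store = [{"id": "s9", "address": "12 main street apartment 4", "card": "4111 1111 1111 1234"}]

print(resolve(web, store))  # [('w1', 's9')]
```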
  • 18. Know the Customer: OPS
    • Two drivers
      • users should not have to retype basic information again and again
      • data is used in a fashion users can trust (not leaked, not seen by other parties, etc.)
    • Two parts
      • Common data
        • demographics (country, zip, age, gender)
        • Contact (name, address, credit card…)
        • User agent preferences
      • Per-site Sections (can be shared across sites, if user allows)
  • 19. What if there is no profile?
    • Deduce
      • collect information: history of purchases, time spent on pages
      • ask questions (offer rewards)
      • combine with database marketing data
    • Predict behaviour
      • buy probabilities
      • build customer relationship
    • mining is key!
  • 20. Personalization: Actions to Take (Look and Feel)
    • Personalized pages
      • specific data
      • specific presentation and design
      • sent through various mediums
    • Manage Customers not products: 1-1 marketing
    • Strategy.com
      • deliver personalized pages
        • e.g., stock portfolio, personal info including alarms, travel reservations
      • use different mediums
        • WAP-enabled phones (e.g., Sprint PCS Web)
  • 21. Storefront Personalization
    • Customers visit the store website
      • Howard buys ties
      • Rob buys Baby Products
      • Ray buys toys
      • Amy buys clothes
    • Provide a view of the store to these customers
      • present them with what they are likely to buy?
        • Howard: ties and men’s formal wear
        • Ray: toys and gadgets
        • Rob: infant and toddler section
        • Amy: women’s clothes section
  • 22. More Actions: Product Presentations & Promotions
    • Figure: basic storefront product hierarchy (Clothes → Men’s, Women’s, Children’s; subcategories Shirts, Pants, Casuals, Evening, Infants, Kids), presented as different personalized views for John and Mary
  • 23. BroadVision.com
    • BroadVision One-to-One application
      • allows businesses to develop and manage personalized web sites
      • interactively profile each visitor and dynamically match info based on their profile and business rules specified by providers of site & services
        • users do not have to jump through hoops to find relevant data
  • 24. DM Terminology
    • OLAP, ROLAP, data warehouse, data marts, data stores, neural networks, genetic algorithms, data mining, rule-based systems, SQL
  • 25. How?
    • Determine the probability of buying as a function of customer attributes such as age, income, past buying patterns, etc.
    • Target customers by ranking them from highest to lowest probability
    • Other techniques: decision trees, neural networks, …
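The two steps above (estimate a buying probability from attributes, then rank) can be sketched with a logistic model. The coefficients here are hypothetical placeholders; in practice they would be estimated from past campaign responses, and logistic regression is only one of several possible scoring techniques.

```python
import math

# Hypothetical coefficients; in practice these are learned from past campaign data.
WEIGHTS = {"age": 0.02, "income": 0.03, "past_purchases": 0.5}
BIAS = -4.0

def buy_probability(customer: dict) -> float:
    """Logistic model: P(buy) = 1 / (1 + exp(-(BIAS + sum of weight * attribute)))."""
    z = BIAS + sum(w * customer[k] for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical customers; income is in thousands.
customers = [
    {"id": "A", "age": 25, "income": 40, "past_purchases": 0},
    {"id": "B", "age": 45, "income": 90, "past_purchases": 4},
    {"id": "C", "age": 35, "income": 60, "past_purchases": 1},
]

# Target customers from highest to lowest probability of buying.
ranked = sorted(customers, key=buy_probability, reverse=True)
for c in ranked:
    print(c["id"], round(buy_probability(c), 3))
```

A campaign would then mail only the top of this list, for example everyone above a probability cutoff chosen to fit the budget.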
  • 26. KDD
    • Knowledge Discovery in Databases
    • It is the process of identifying valid, novel, potentially useful, and understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth)
    • It involves data preparation, pattern extraction, knowledge evaluation, and refinement, iteratively
  • 27. KDD
    • Data mining is a step in the KDD process that involves the application of certain algorithms to extract patterns
    • Steps in the KDD process:
        • Select Data
        • Data Cleansing and Pre-processing
        • Data Mining
        • Results interpretation
        • Implementation
  • 28. Pre-processing in KDD
    • 80-90% of the effort in the KDD process is spent here
    • Why?
        • Operational data is incomplete, inconsistent, in different formats across systems
        • DM techniques might require data in a specific format
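A small sketch of typical pre-processing: harmonizing codes and formats that differ across operational systems, and filling missing values. The records, code tables, and date formats are hypothetical, and median imputation is just one simple choice among many.

```python
from datetime import datetime
from statistics import median

# Hypothetical operational records from two systems with different conventions.
raw = [
    {"id": 1, "gender": "M", "income": "52,000", "joined": "2001-03-15"},
    {"id": 2, "gender": "female", "income": None, "joined": "03/20/2001"},
    {"id": 3, "gender": "F", "income": "61000", "joined": "2001-04-02"},
]

GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def parse_date(text):
    """Try each known format; return None (missing) rather than guess."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

def parse_income(text):
    """Strip thousands separators; keep missing values as None."""
    return float(text.replace(",", "")) if text else None

def clean(records):
    rows = [
        {
            "id": r["id"],
            "gender": GENDER_CODES.get(str(r["gender"]).strip().lower()),
            "income": parse_income(r["income"]),
            "joined": parse_date(r["joined"]),
        }
        for r in records
    ]
    # Impute missing incomes with the median of the known values.
    known = [r["income"] for r in rows if r["income"] is not None]
    fill = median(known) if known else None
    for r in rows:
        if r["income"] is None:
            r["income"] = fill
    return rows

rows = clean(raw)
for row in rows:
    print(row)
```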
  • 29. Data Mining Problems
    • Classification/Segmentation
      • Binary (Yes/No)
      • Multiple Category (Large/Medium/Small)
    • Forecasting (how much)
    • Association Rule extraction (market basket analysis)
    • Sequence detection
      • balance increase -> missed payment -> default
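Sequence detection asks whether events occur in a given order over time, not necessarily back to back. The sketch below shows only the matching step that sequence-mining algorithms build on: checking which (hypothetical) account histories contain the candidate sequence from the slide and what fraction of accounts that is.

```python
def contains_sequence(events, pattern):
    """True if the pattern's events occur in order (gaps allowed) within events."""
    it = iter(events)
    # `step in it` advances the iterator, so later steps must come after earlier ones.
    return all(step in it for step in pattern)

PATTERN = ["balance increase", "missed payment", "default"]

# Hypothetical account event histories, oldest first.
histories = {
    "acct1": ["payment", "balance increase", "payment", "missed payment", "default"],
    "acct2": ["missed payment", "balance increase", "payment"],
    "acct3": ["balance increase", "missed payment", "missed payment", "default"],
}

matching = [acct for acct, events in histories.items() if contains_sequence(events, PATTERN)]
print(matching)                         # ['acct1', 'acct3']
print(len(matching) / len(histories))   # fraction of accounts showing the sequence
```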
  • 30. Typical DM tasks
    • Prediction and Classification
      • Directed
      • Decision trees, Neural networks, memory based reasoning, logistic regression
      • Examples:
        • How many units will be sold on a given day?
        • What will be the stock price on a given day?
        • Will a customer buy the product or not?
  • 31. DM tasks
    • Affinity grouping
      • Undirected
      • Which products go together naturally?
      • The beer-diaper syndrome?
      • Market basket analysis
      • Examples:
        • Which products peak in demand simultaneously?
  • 32. DM tasks
    • Clustering task
      • Undirected
      • Segmenting into similar clusters
      • Different from classification: there are no predefined classes
      • Examples
        • Customers with similar buying profiles
        • Products with similar demand patterns
  • 33. DM success factors
    • Integration with data warehouses and DSS
    • Users should develop a good understanding of techniques
    • Recognize that these tools cannot automatically find patterns without being told what to do
    • Most methods now used are extensions of analytical methods that have been around for decades
  • 34. Legal and Ethical Issues
    • Privacy concerns
      • becoming more important
      • will impact the way that data can be used and analyzed
      • ownership issues
      • European data laws have implications for the US
    • Often, data included in the data warehouse cannot legally be used in the decision-making process
      • Race, Gender, Age
    • Data contamination will become critical
  • 35. Making Decisions
    • Figure: data from several sources → data warehouse? → models → decisions
  • 36. Data Warehouse
    • Bill Inmon: “A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions.”
    • It is managed data situated after, and outside, the operational systems
  • 37. Data Warehousing
    • Increasing need to find, summarize, and interpret large amounts of data effectively
      • Especially when data is distributed across many different databases
    • Transaction processing (TP) systems are not easily accessible to other systems
      • Plus, TP systems have time constraints
  • 38. Enter the Data Warehouse
    • To deliver decision data to decision makers
    • by integrating data from various TP systems into a single store, which can then
    • feed a range of decision support applications
    • through an OLAP interface!
  • 39. Data Complications
    • Noise
    • Missing data
    • Transformation
      • numeric data
      • text
    • Need to differentiate between variables you can control and those you cannot
      • Actionable: size of discount, number of offers etc.
      • Non-actionable: age, income ..
  • 40. Data Mining Techniques
    • Market Basket Analysis
    • Memory Based Reasoning
    • Cluster Detection
    • Link Analysis
    • Decision Trees and Rule Induction
    • Neural Networks
    • Genetic Algorithms
    • OLAP
  • 41. OLAP: On Line Analytical Processing
    • While a data warehouse brings data together, OLAP lets you view and manipulate that data interactively
    • OLAP allows users to “slice and dice” data
    • Allows users to drill down into detailed data
  • 42. Relational vs Multidimensional
  • 43. Consolidations
  • 44. Multidimensional Terminology
    • East, West, Central are input members of the Region dimension. Total Region is an output member of the Region dimension. Similarly, Nuts, Screws, Bolts, Washers, and Total are members of the Product dimension.
    • Variables are typically numerical measures like Sales, Costs, Profits, Expenses, and so forth.
    • Dimensions are roughly equivalent to Fields in a relational database. Cells are roughly equivalent to Records.
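Using the slide’s terminology, a tiny cube can be sketched as a dictionary of cells keyed by (Region, Product) input members, holding the Sales variable. Consolidation rolls a dimension up into its “Total” output member, and a slice fixes one member of a dimension; drill-down is the reverse of consolidation, going from a Total back to its cells. The sales figures are hypothetical, and real OLAP servers precompute and index such aggregates.

```python
from collections import defaultdict

# Hypothetical Sales cells keyed by (region, product) input members.
sales = {
    ("East", "Nuts"): 50, ("East", "Screws"): 40,
    ("West", "Nuts"): 60, ("West", "Bolts"): 30,
    ("Central", "Washers"): 20, ("Central", "Screws"): 10,
}

def consolidate(cells, dim):
    """Roll up dimension `dim` (0 = Region, 1 = Product) into its 'Total' output member."""
    totals = defaultdict(int)
    for key, value in cells.items():
        rolled = list(key)
        rolled[dim] = "Total"
        totals[tuple(rolled)] += value
    return dict(totals)

def slice_cells(cells, dim, member):
    """Slice: keep only the cells whose coordinate on `dim` equals `member`."""
    return {k: v for k, v in cells.items() if k[dim] == member}

by_product = consolidate(sales, 0)              # Total Region for each product
grand_total = consolidate(by_product, 1)        # Total Region x Total Product
print(by_product[("Total", "Nuts")])            # 110
print(grand_total[("Total", "Total")])          # 210
print(slice_cells(sales, 0, "East"))
```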
  • 45. Steps in DW and OLAP
    • Figure: data sources → data loader → data converter → data scrubber → data transformer → data warehouse → OLAP server → OLAP interface
  • 46.  
  • 47. Cluster Detection
    • Undirected data mining
    • Finds records that are similar to each other (clusters)
    • Clusters are found using geometric methods, statistical methods, and neural networks
    • Good way to start any analysis
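k-means is one widely used geometric clustering method. The sketch below clusters hypothetical customers described by (visits per year, average basket in dollars): it repeatedly assigns each point to its nearest centroid and moves each centroid to the mean of its points until nothing changes. Features are left unscaled for brevity, although in practice they are usually standardized first.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means with Euclidean distance; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new = []
        for i, members in enumerate(clusters):
            if members:
                new.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new.append(centroids[i])  # keep an empty cluster's centroid in place
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

# Hypothetical customers: (visits per year, average basket in dollars).
customers = [(1, 20), (2, 25), (1, 22), (20, 200), (22, 210), (21, 190)]
centroids, clusters = kmeans(customers, k=2)
for c in clusters:
    print(c)
```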
  • 48. Market Basket Analysis
    • Form of clustering used for finding items that occur together (in a transaction or market basket)
    • Expresses the likelihood of different products being purchased together as rules
    • Uses: planning store layouts, limiting specials to one of the products in a set, …
  • 49. Transaction data
  • 50. Co-occurrence matrix
  • 51. Support and confidence
    • For a rule that says: If A then B
    • Support is defined as the ratio of number of transactions that include both A and B to total number of transactions
    • Confidence is defined by the ratio of the number of transactions that include both A and B to the number of transactions that include A.
    • How do you specify ‘significant’ support and confidence?
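The two definitions above translate directly into counts over transactions. This minimal sketch uses hypothetical grocery transactions; itemsets are passed as Python sets, and confidence returns 0.0 when the antecedent never occurs, which is a convention chosen here rather than part of the definition.

```python
def count(transactions, itemset):
    """Number of transactions containing every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))

def support(transactions, antecedent, consequent):
    """Support of 'if A then B': transactions with both A and B / all transactions."""
    return count(transactions, set(antecedent) | set(consequent)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of 'if A then B': transactions with both A and B / transactions with A."""
    n_a = count(transactions, antecedent)
    if n_a == 0:
        return 0.0
    return count(transactions, set(antecedent) | set(consequent)) / n_a

# Hypothetical transactions.
transactions = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"bread"}, {"milk", "eggs"}]

print(support(transactions, {"milk"}, {"bread"}))     # 2 of 4 transactions = 0.5
print(confidence(transactions, {"milk"}, {"bread"}))  # 2 of the 3 milk transactions
```

The slide’s open question remains a business choice: the thresholds are typically set by trial, lowering them until the rules found are neither too few nor too many.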
  • 52. Algorithm for Finding Association Rules
    • Input is Min-Support and Min-Confidence
    • Find all sets of items with Min-Support (frequent itemsets)
      • Frequent Itemsets Property: Every subset of a frequent itemset must also be a frequent itemset
        • iterative algorithm: start with frequent itemsets with one item, and construct larger itemsets using only smaller frequent itemsets.
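The iterative procedure described above is the Apriori approach. The sketch below finds frequent itemsets only (rule generation from them with Min-Confidence is left out); it joins frequent (k-1)-itemsets into k-item candidates and prunes any candidate with an infrequent subset, using the frequent-itemset property. The transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: frequent single items.
    current = frequent({frozenset([i]) for t in transactions for i in t})
    all_frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = frequent(candidates)
        all_frequent.update(current)
        k += 1
    return all_frequent

# Hypothetical transactions.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "soda"},
    {"diapers", "milk"},
    {"milk", "soda"},
]
for itemset, s in sorted(apriori(baskets, 0.4).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```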
  • 53. MBA example
    • Using the sample data, create a co-occurrence table
    • Let relevant Support = 25% and Confidence = 50%:
      • Beer and Diapers each appear in 3/5 = 60% of transactions, and together in 2/5 = 40%
      • If beer then diapers has confidence of 2/3 = 67%
      • Thus, “If customer buys beer then customer buys diapers” meets the 25% minimum support (40%) and the 50% minimum confidence (67%)
    • Conclusion drawn by mining system:
      • Customers who buy beer also buy diapers
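The co-occurrence table for this example can be built by counting, for every pair of items, the transactions containing both; the diagonal holds each item’s own count. The sample data from the slide is not reproduced in this transcript, so the five transactions below are hypothetical, chosen to be consistent with the 2/3 confidence quoted in the example (beer in 3 transactions, beer and diapers together in 2).

```python
from itertools import combinations_with_replacement

# Hypothetical transactions consistent with the example's 2/3 confidence.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "soda"},
    {"diapers", "milk"},
    {"milk", "soda"},
]

# Co-occurrence table: cooc[(a, b)] = transactions containing both a and b.
items = sorted(set().union(*transactions))
cooc = {}
for a, b in combinations_with_replacement(items, 2):
    n = sum(1 for t in transactions if a in t and b in t)
    cooc[(a, b)] = cooc[(b, a)] = n

rule_support = cooc[("beer", "diapers")] / len(transactions)        # 2/5
rule_confidence = cooc[("beer", "diapers")] / cooc[("beer", "beer")]  # 2/3
print(rule_support >= 0.25 and rule_confidence >= 0.50)  # True
```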
  • 54. Applying MBA Results
    • Is the relationship useful ?
      • Beer and Diapers may not be of use
      • Victoria’s Secret transaction mining led to specific apparel being sent to specific stores (MicroStrategy software)
    • Who defines “usefulness”?
      • only as good as the rules specified by humans/marketing workforce
      • NBA mining: the software’s designers did not include height mismatches at first; coaches made the correction
  • 55. Data Mining Algorithms
    • Four algorithms commonly cited
      • Association Rule (used in over 90% of the cases!)
      • Nearest Neighbor
        • quick and easy but models get large
      • Decision Tree
      • Neural Network
        • difficult to interpret and slow to train
  • 56. Decision Trees
    • Series of if/then rules
      • easy to understand, but complex to implement
    • Figure: example tree splitting on Balance (< 10K vs > 10K) and on Age (> 48 vs < 48), with yes/no leaves
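Each if/then rule in a tree comes from a split such as “Balance < 10K”. The sketch below shows one common way a split threshold is chosen: try midpoints between observed values and keep the one with the lowest weighted Gini impurity (the criterion used by CART). The slide does not say which criterion its tree uses, and the labeled customer rows are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 'yes'/'no' labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(1 for y in labels if y == "yes") / n
    return 1.0 - p * p - (1 - p) * (1 - p)

def best_split(rows, attr):
    """Try midpoints between sorted distinct values; return (threshold, weighted Gini)."""
    values = sorted({r[attr] for r in rows})
    best = (None, float("inf"))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r["label"] for r in rows if r[attr] < t]
        right = [r["label"] for r in rows if r[attr] >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical customers: balance in thousands, age, and whether they responded.
rows = [
    {"balance": 2, "age": 30, "label": "no"},
    {"balance": 5, "age": 60, "label": "no"},
    {"balance": 8, "age": 50, "label": "no"},
    {"balance": 12, "age": 55, "label": "yes"},
    {"balance": 20, "age": 35, "label": "yes"},
    {"balance": 30, "age": 52, "label": "yes"},
]

print(best_split(rows, "balance"))  # (10.0, 0.0): a perfect 'balance < 10' rule
print(best_split(rows, "age"))      # no threshold on age separates these rows cleanly
```

A full tree builder applies the same search to every attribute, splits on the best one, and recurses on each side until the leaves are pure or too small.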
  • 57. CRM and Data Mining
    • Recall: customer segmentation is key in CRM
      • data mining can help improve understanding of customer behaviour
        • helps locate meaningful segments in customer data
      • users want to turn that understanding into automated interactions with their customers
  • 58. Integrating Data Mining & CRM
    • Data mining application owns the modelling process
    • CRM application owns the campaign execution process
    • Goals:
      • minimize pain involved with using models in campaigns
      • score records only when and where necessary
  • 59. Integrating Mining & CRM
    • Step 1:
      • analytic user creates model using mining system
      • model is then exported into campaign management system
    • Step 2:
      • Marketing user creates campaign that includes predictive models
      • when campaign executes, data mining engine scores customers dynamically
  • 60. Benefits of Integration
    • Pre-generated model selection
    • Score defined segments “on the fly”
      • eliminates need to score entire database
      • improve efficiency of campaigns
    • Reduces manual intervention and error
    • Accelerates the market cycle
      • increases likelihood of reaching customers before competitors
      • improves campaign results and lowers costs
  • 61. Summary
    • “Using the new media of the one-to-one future, you will be able to communicate directly with customers individually…” - Don Peppers & Martha Rogers (One-to-One Future)
    • “What are you afraid of?… Even if you’re not afraid of these things, the beauty is, with proper marketing, we can make you afraid” - Michael Saylor, CEO, MicroStrategy