Data Mining & Knowledge Discovery - Presentation Transcript
Data Mining & Knowledge Discovery: Personalization Technologies for One to One Marketing Bhagi Narahari
Outline of Lecture
What and Why of Data Mining and KDD?
Importance and Applications to E-commerce
How ?
Personalization
personalized one-to-one business on the internet
Part I: Overview of Personalization
Part II: The Data Mining Process
Predictive Modelling
A “black box” that makes predictions about the future based on information from the past and present
[Figure: customer attributes (age, balance, income) feed a model (crystal ball?) that answers: how much will the customer spend on the next catalog order?]
What is Data Mining?
It is the exploration and analysis by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules.
Why now? (A historical perspective)
Because data is now available (wasn’t always)
Distributed sources
Technology evolution
Competition (do what you can to outdo)
Why DM?
CRM (Customer Relationship Management) - important success factor in E-commerce
Current emphasis on links with customers - feedback, input in design, etc.
CRM
Identifying profitable customers
Better service for more valued customers
Retaining profitable customers
Getting a new customer costs a lot more than retaining an existing one
acquiring a new customer costs 5X as much as retaining one (Peppers & Rogers)
An increase from 75% to 80% in retention reduces costs by about 10%
Larger share of customer pool
CRM
Product differentiations based on “price” and “quality” are increasingly difficult
need to differentiate based on relationships
Increasingly sophisticated mass marketing increases probability of success
cost of mass marketing is driven down by internet (reach)
CRM
Goal: Positively interact with your customers and prospects
define customer segments
lights out execution of campaigns against segments
attribution and evaluation of responses
Personalization in Ecommerce
Positive:
much better chance of personalization
customer identification
tracking across visits and within visit
ability to do ‘what if’ experiments
Negative:
cost of switching is much less
is web-based shopping good for ‘touchy feely’ things?
price differentiation across geographies not easy
Personalization
Customer Chain: Product Discovery, Product Evaluation, Terms Negotiation, Order Placement, Order Payment, Customer Service & Support
Producer Chain: Market Research, Market Stimulation/Education, Terms Negotiations, Order Receipt, Order billing and payment management, Customer Service & Support
In addition to storing and retrieving information on the individual’s profile “on the fly”
can also use mining software to analyze the information in the database to make recommendations or comments specific to the individual
Impact of Personalization
Customer relationship
Learn more about customers
learn and understand the why and how they prefer to do business with your organization
In tandem with tracking provides you with a tool to monitor your website
what works, what doesn’t, what makes your audience “click”
Security and Privacy as Barrier to Personalization
Large number of customers concerned about personalization (DoubleClick!)
will they pay more to preserve privacy?
Some falsify info to preserve privacy
customers give more info to trusted site
need secure site with clear privacy policies stated at site
[Figure: Personalization. Labels: Know the Customer; Identify; Give the customer his/her wants; Questionnaires; Past history; Click Streams; Profile; Login; Credit Card#; Predicting the wants; Mapping to “peers”; Extrapolation from past; Extrapolation from peers (firefly.com); Look & feel; Product selection & promotions; New Product]
Know the customer
Cookies
backlash (users do not trust them)
OPS: Open Profiling Standard
combined with eTrust certification
Registration
User certificates: logons
Key Question:
how do you know that this customer is the same one who goes to your storefront?
need standard warehouse techniques like address resolution, credit card resolution, etc.
Know the Customer:OPS
Two drivers
user should not have to retype basic info again and again
data is used in a fashion trusted by users (not leaked, other data not seen, etc.)
Two parts
Common data
demographics (country,zip,age,gender)
Contact (name, address, CreditCard…)
User agent preferences
Per-site Sections (can be shared across sites, if user allows)
What if no profile???
Deduce
collect information: history of purchases, time spent on pages
ask questions (offer rewards)
combine with database marketing data
Predict behaviour
buy probabilities
build customer relationship
mining is key!
Personalization: Actions to take- Look and feel
Personalized pages
specific data
specific presentation and design
sent through various mediums
Manage Customers not products: 1-1 marketing
Strategy.com
deliver personalized pages
e.g.: stock portfolio, personal info including alarm, travel reservations
allows businesses to develop and manage personalized web sites
interactively profile each visitor and dynamically match info based on their profile and business rules specified by providers of site & services
users do not have to jump through hoops to find relevant data
DM Terminology: OLAP, ROLAP, Data Warehouse, Data Marts, Data Stores, Neural Networks, Genetic Algorithms, Data Mining, Rule Based Systems, SQL
How?
Determine probability of buying as a function of customer attributes such as age, income, past buying patterns, ...
Target customers by ranking from highest to lowest probabilities
Other techniques: Decision Trees, Neural Networks, ...
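The ranking idea on this slide can be sketched as follows. This is a minimal illustration, assuming a logistic model; the weights, the attribute names (`income_k`, `past_orders`) and the customer records are invented for the example, and a real model would be fitted to past purchase data.

```python
import math

# Hypothetical coefficients; a real model would be fitted to historical data.
WEIGHTS = {"age": 0.02, "income_k": 0.01, "past_orders": 0.4}
BIAS = -3.0


def buy_probability(customer):
    """Logistic model: probability of buying as a function of attributes."""
    z = BIAS + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def rank_customers(customers):
    """Return customers ordered from highest to lowest buy probability."""
    return sorted(customers, key=buy_probability, reverse=True)


# Made-up customers (income in thousands).
customers = [
    {"id": "A", "age": 30, "income_k": 40, "past_orders": 0},
    {"id": "B", "age": 45, "income_k": 90, "past_orders": 5},
    {"id": "C", "age": 60, "income_k": 60, "past_orders": 2},
]

ranked = rank_customers(customers)
for c in ranked:
    print(c["id"], round(buy_probability(c), 3))  # order printed: B, C, A
```

A campaign would then target the top of `ranked` and stop where the expected return no longer covers the cost of contact.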
KDD
Knowledge Discovery in Databases
It is the process of identifying valid, novel, potentially useful, and understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth)
It involves data preparation, pattern extraction, knowledge evaluation, and refinement, in iteration
KDD
Data mining is a step in the KDD process that involves the application of certain algorithms to extract patterns
Steps in the KDD process:
Select Data
Data Cleansing and Pre-processing
Data Mining
Results interpretation
Implementation
Pre-processing in KDD
80-90% of KDD process is spent here
Why?
Operational data is incomplete, inconsistent, in different formats across systems
DM techniques might require data in a specific format
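To make the pre-processing burden concrete, here is a small sketch of cleansing records that arrive incomplete and in different formats. The field names and the cleaning rules are assumptions chosen for illustration, not a prescribed procedure.

```python
def clean_record(raw):
    """Normalize one raw record; return None if it is unusable."""
    name = (raw.get("name") or "").strip().title()
    if not name:
        return None  # incomplete: no way to identify the customer

    # Income arrives as "52,000", "$52000" or a number, depending on the system.
    income_text = str(raw.get("income", "")).replace(",", "").replace("$", "").strip()
    try:
        income = float(income_text)
    except ValueError:
        income = None  # keep the record, mark the value as missing

    # Map the different gender encodings onto one code (None if unknown).
    gender = {"m": "M", "male": "M", "f": "F", "female": "F"}.get(
        str(raw.get("gender", "")).strip().lower()
    )
    return {"name": name, "income": income, "gender": gender}


# Made-up records as they might come from different operational systems.
raw_records = [
    {"name": "  alice SMITH ", "income": "52,000", "gender": "female"},
    {"name": "Bob Jones", "income": 48000, "gender": "M"},
    {"name": "", "income": "30000", "gender": "f"},
    {"name": "carol lee", "income": "n/a", "gender": "F"},
]

cleaned = [r for r in (clean_record(x) for x in raw_records) if r is not None]
```

Only after this step do the records share one format that a mining technique can consume; the record without a name is dropped and the unparseable income is kept as missing.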
Data Mining Problems
Classification/Segmentation
Binary (Yes/No)
Multiple Category (Large/Medium/Small)
Forecasting (how much)
Association Rule extraction (market basket analysis)
Sequence detection
balance increase -> missed payment -> default
Typical DM tasks
Prediction and Classification
Directed
Decision trees, Neural networks, memory based reasoning, logistic regression
Examples:
How many units will be sold on a given day?
What will be the stock price on a given day?
Will a customer buy the product or not?
DM tasks
Affinity grouping
Undirected
Which products go together naturally?
The beer-diaper syndrome?
Market basket analysis
Examples:
Which products peak in demand simultaneously?
DM tasks
Clustering task
Undirected
Segmenting into similar clusters
Different from classification
Examples
Customers with similar buying profiles
Products with similar demand patterns
DM success factors
Integration with data warehouses and DSS
Users should develop a good understanding of techniques
Recognize that these tools cannot automatically find patterns without being told what to do
Most methods now used are extensions of analytical methods that have been around for decades
Legal and Ethical Issues
Privacy concerns
becoming more important
will impact the way that data can be used and analyzed
ownership issues
European data laws have implications on US
Often data included in the data warehouse cannot legally be used in the decision-making process
Race, Gender, Age
Data contamination will become critical
[Figure: Making Decisions. Several data sources feed a data warehouse (?), which feeds models, which lead to decisions]
Data Warehouse
Bill Inmon: “A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions.”
A data warehouse is managed data that is situated after and outside the operational systems
Data Warehousing
Increasing need to find, summarize, and interpret large amounts of data effectively
Especially when data is distributed across many different databases
Transaction processing systems not easily accessible to other systems
Plus TP systems have time constraints
Enter the Data Warehouse
To deliver decision data to decision makers
by integrating data from various TPS to a single storage which can then
feed a range of decision support applications
through an OLAP interface!
Data Complications
Noise
Missing data
Transformation
numeric data
text
Need to differentiate between variables you can control and those you cannot
Actionable: size of discount, number of offers, etc.
Non-actionable: age, income, etc.
Data Mining Techniques
Market Basket Analysis
Memory Based Reasoning
Cluster Detection
Link Analysis
Decision Trees and Rule Induction
Neural Networks
Genetic Algorithms
OLAP
OLAP: On Line Analytical Processing
While a data warehouse brings data together, OLAP lets you look at the data and manipulate it interactively
OLAP allows users to “slice and dice” data
Allows user to drill-down into detail data
Relational vs Multidimensional
Consolidations
Multidimensional Terminology
East, West, Central are input members of the Region dimension. Total Region is an output member of the Region dimension. Similarly, Nuts, Screws, Bolts, Washers, and Total are members of the Product dimension.
Variables are typically numerical measures like Sales, Costs, Profits, Expenses, and so forth.
Dimensions are roughly equivalent to Fields in a relational database. Cells are roughly equivalent to Records.
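The terminology above can be illustrated with a tiny in-memory cube. This is only a sketch: the sales figures are invented, and the helper names `consolidate` and `slice_cube` are hypothetical, not part of any OLAP product.

```python
from collections import defaultdict

# Each cell is addressed by (region, product) and holds the Sales variable.
# The numbers are invented for illustration.
sales = {
    ("East", "Nuts"): 50, ("East", "Screws"): 60,
    ("West", "Nuts"): 40, ("West", "Bolts"): 30,
    ("Central", "Nuts"): 20, ("Central", "Washers"): 10,
}


def consolidate(cells, axis):
    """Sum the variable grouped by one dimension (0 = Region, 1 = Product)."""
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[key[axis]] += value
    return dict(totals)


def slice_cube(cells, region):
    """Slice: fix the Region dimension and keep the Product dimension."""
    return {product: v for (r, product), v in cells.items() if r == region}


by_region = consolidate(sales, axis=0)   # sales per input member of Region
total_region = sum(by_region.values())   # the output member "Total Region"
by_product = consolidate(sales, axis=1)  # sales per member of Product
east_slice = slice_cube(sales, "East")   # drill into one region
```

Here `by_region` and `total_region` are consolidations, while `east_slice` is the kind of view a user gets when slicing on one member and drilling down into its detail.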
Steps in DW and OLAP: Data (multiple sources), Data Loader, Data Converter, Data Scrubber, Data Transformer, Data Warehouse, OLAP Server, OLAP Interface
Cluster Detection
Undirected data mining
Finds records that are similar to each other (clusters)
Clusters are found using geometric methods, statistical methods, and neural networks
Good way to start any analysis
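As one example of a geometric method, the sketch below uses plain k-means. The slide does not name a specific algorithm, so k-means is an assumption here, and the customer data (orders per year, average order value) is made up.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on 2-D points, starting from the given centers."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)

        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for cluster, old in zip(clusters, centers):
            if cluster:
                new_centers.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centers.append(old)  # an empty cluster keeps its center

        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, clusters


# (orders per year, average order value) for six hypothetical customers.
points = [(1, 20), (2, 25), (1, 30), (10, 200), (12, 220), (11, 210)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
```

No target variable is supplied, which is what makes the task undirected: the two groups (occasional small buyers and frequent large buyers) emerge from the data alone.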
Market Basket Analysis
Form of clustering used for finding items that occur together (in a transaction or market basket)
Expresses the likelihood of different products being purchased together as rules
Planning store layouts, limiting specials to one of the products in a set,...
Transaction data
Co-occurrence matrix
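A co-occurrence matrix can be built directly from transaction data, as in this sketch. The baskets are hypothetical, and storing the matrix as a counter keyed by item pairs is just one convenient representation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"jam", "milk"},
]


def cooccurrence(transactions):
    """Count how often each item, and each pair of items, appears."""
    counts = Counter()
    for basket in transactions:
        for item in basket:
            counts[(item, item)] += 1  # diagonal: item frequency
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1        # off-diagonal: pair frequency
    return counts


def pair_count(table, a, b):
    """Look up a cell regardless of the order the items are given in."""
    return table[tuple(sorted((a, b)))]


table = cooccurrence(transactions)
print(pair_count(table, "butter", "bread"))  # 2
```

The diagonal cells give how often each item sells at all, and the off-diagonal cells give the raw counts from which rules about items bought together are derived.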
Support and confidence
For a rule that says: If A then B
Support is defined as the ratio of number of transactions that include both A and B to total number of transactions
Confidence is defined by the ratio of the number of transactions that include both A and B to the number of transactions that include A.
How do you specify ‘significant’ support and confidence ?
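The two definitions translate directly into code. The sketch below uses generic items A to D in made-up transactions; the thresholds at the end are arbitrary, chosen only to show how a rule is accepted.

```python
def support(transactions, a, b):
    """Share of all transactions that contain both a and b."""
    both = sum(1 for t in transactions if a in t and b in t)
    return both / len(transactions)


def confidence(transactions, a, b):
    """Among transactions containing a, the share that also contain b."""
    has_a = sum(1 for t in transactions if a in t)
    both = sum(1 for t in transactions if a in t and b in t)
    return both / has_a if has_a else 0.0


# Made-up transactions over generic items.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
    {"C", "D"},
]

s = support(transactions, "A", "B")     # 2 of 5 transactions
c = confidence(transactions, "A", "B")  # 2 of the 3 transactions with A
rule_accepted = s >= 0.25 and c >= 0.50
```

Note that confidence is not symmetric: "if A then B" and "if B then A" share the same support but divide by different counts.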
Algorithm for Finding Association Rules
Input is Min-Support and Min-Confidence
Find all sets of items with Min-Support (frequent itemsets)
Frequent Itemsets Property: Every subset of a frequent itemset must also be a frequent itemset
iterative algorithm: start with frequent itemsets with one item, and construct larger itemsets using only smaller frequent itemsets.
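The iterative construction can be sketched as below, in the style usually called Apriori. Only the frequent-itemset phase is shown (turning itemsets into rules with Min-Confidence is omitted), and the transactions are made up.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)

    def is_frequent(itemset):
        return sum(1 for t in transactions if itemset <= t) / n >= min_support

    # Start with frequent itemsets of one item.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if is_frequent(frozenset([i]))}
    result = set(current)

    k = 2
    while current:
        # Join: build k-item candidates from frequent (k-1)-itemsets only.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if is_frequent(c)}
        result |= current
        k += 1
    return result


# Made-up transactions over generic items.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
    {"C", "D"},
]

result = frequent_itemsets(transactions, min_support=0.4)
```

With these transactions, {A, B, C} is never counted against the data: its subset {B, C} is not frequent, so the frequent itemsets property rules it out in the prune step.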
MBA example
Using the sample data create a co-occurrence table
Let relevant Support = 25% and Confidence= 50%:
Beer and Diapers appear in 3/5= 60%
If beer then diapers has confidence of 2/3=67%
Thus, “If customer buys beer then customer buys diapers” satisfies 25% support & 50% confidence
Conclusion drawn by mining system:
Customers who buy beer also buy diapers
Applying MBA Results
Is the relationship useful ?
Beer and Diapers may not be of use
Victoria’s Secret transaction mining led to specific apparel sent to specific stores -- Microstrategy software
Who defines “usefulness”?
only as good as rules specified by humans/marketing workforce
NBA mining: designers of the software did not include height mismatches at first; coaches made the correction
Data Mining Algorithms
Four algorithms commonly cited
Association Rule (used in over 90% of the cases!)
Nearest Neighbor
quick and easy but models get large
Decision Tree
Neural Network
difficult to interpret and takes a long time
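The remark that nearest-neighbor models are quick and easy but get large is visible in a sketch: the "model" is simply the stored examples. The features (age, income in thousands) and labels below are invented for illustration.

```python
def nearest_neighbor(training, query):
    """Classify `query` with the label of the closest stored example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # No training step: every prediction scans all stored examples.
    _, label = min(training, key=lambda example: squared_distance(example[0], query))
    return label


# (age, income in thousands) -> did the customer buy?  Hypothetical data.
training = [
    ((25, 20), "no"),
    ((30, 35), "no"),
    ((45, 90), "yes"),
    ((50, 80), "yes"),
]

print(nearest_neighbor(training, (48, 85)))  # yes
print(nearest_neighbor(training, (28, 30)))  # no
```

Because every past record is kept and scanned, storage and prediction time grow with the size of the customer base.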
Decision Trees
Series of if/then rules
easy to understand, complexity in implementation
[Figure: decision tree with splits on Balance (< 10K / > 10K) and Age (< 48 / > 48), ending in yes/no leaves]
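A decision tree read as a series of if/then rules might look like the sketch below. The split variables and thresholds (balance 10K, age 48) come from the slide, but which branch leads to which answer is not recoverable from the transcript, so the outcomes here, and the handling of values exactly at a threshold, are assumptions.

```python
def predict_buyer(balance, age):
    """Each path from the root to a leaf is one if/then rule."""
    if balance >= 10_000:   # rule 1: high balance -> yes (assumed outcome)
        return "yes"
    if age >= 48:           # rule 2: low balance, older -> yes (assumed)
        return "yes"
    return "no"             # rule 3: low balance, younger -> no (assumed)


print(predict_buyer(balance=20_000, age=30))  # yes
print(predict_buyer(balance=5_000, age=60))   # yes
print(predict_buyer(balance=5_000, age=30))   # no
```

Reading the rules is easy; the complexity lies in the algorithm that chooses the splits from data, which is not shown here.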
CRM and Data Mining
Recall:customer segmentation is key in CRM
data mining can help improve understanding of customer behaviour
helps locate meaningful segments from customer data
users want to turn that understanding into automated interactions with their customers
Integrating Data Mining & CRM
Data mining application owns the modelling process
CRM application owns the campaign execution process
Goals:
minimize pain involved with using models in campaigns
score records only when and where necessary
Integrating Mining & CRM
Step 1:
analytic user creates model using mining system
model is then exported into campaign management system
Step 2:
Marketing user creates campaign that includes predictive models
when campaign executes, data mining engine scores customers dynamically
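The two steps can be sketched as follows. This is an illustration only: exporting the model as JSON, the logistic form of the model, and the names `load_model` and `run_campaign` are assumptions, not features of any particular mining or campaign management product.

```python
import json
import math

# Step 1 (mining side): the analyst exports a fitted model as plain data.
# The coefficients are invented for this example.
exported = json.dumps({"bias": -1.0, "weights": {"past_orders": 0.5, "visits": 0.1}})


# Step 2 (campaign side): the model is loaded and applied at execution time.
def load_model(text):
    spec = json.loads(text)

    def score(customer):
        z = spec["bias"] + sum(w * customer[k] for k, w in spec["weights"].items())
        return 1.0 / (1.0 + math.exp(-z))

    return score


def run_campaign(customers, in_segment, score, threshold):
    """Score only customers in the segment; return ids to contact."""
    scored = 0
    targets = []
    for c in customers:
        if not in_segment(c):
            continue  # outside the segment: never scored
        scored += 1
        if score(c) >= threshold:
            targets.append(c["id"])
    return targets, scored


# Made-up customer records.
customers = [
    {"id": 1, "region": "east", "past_orders": 4, "visits": 10},
    {"id": 2, "region": "east", "past_orders": 0, "visits": 2},
    {"id": 3, "region": "west", "past_orders": 6, "visits": 20},
]

targets, scored = run_campaign(
    customers,
    in_segment=lambda c: c["region"] == "east",
    score=load_model(exported),
    threshold=0.5,
)
```

Only the two customers in the segment are scored, which is the point of scoring records only when and where necessary.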
Benefits of Integration
Pre-generated model selection
Score defined segments “on the fly”
eliminates need to score entire database
improve efficiency of campaigns
Reduces manual intervention and error
Accelerates the market cycle
increases likelihood of reaching customers before competitors
improves campaign results and lowers costs
Summary
“Using the new media of the one-to-one future, you will be able to communicate directly with customers individually...” - Don Peppers & Martha Rogers (One-to-One Future)
“What are you afraid of? ... Even if you’re not afraid of these things, the beauty is, with proper marketing, we can make you afraid” -- Michael Saylor, CEO Microstrategy.