2. AIM OF THIS PPT
“Torture The Data, and it will Confess to Anything”
-Ronald Coase
• This presentation mainly focuses on the application of
Data Mining in the Online Retail Industry.
• This presentation is based on the technical article “Data
mining for the online retail industry: A case study of
RFM model-based customer segmentation using data
mining” by Daqing Chen, Sai Laing Sain & Kun
Guo.
3. INTRODUCTION
Nowadays, online shopping is a daily practice in urban areas.
Online penetration of retail is expected to reach 10.7% by 2024,
compared with 4.7% in 2019. Moreover, the number of online shoppers
in India is expected to reach 220 million by 2025.
Various factors are responsible for this growth:
o Localization of Internet content
o Growth in cities beyond metros
o Growth of mobile commerce
o Growing usage of debit cards for cashless transactions
o COVID-19
With this increase in smart online customers, traders and
sellers also need to be more aware and to understand the
personalities of different buyers.
4. Problems faced by online retailers
Which products’ web pages has a customer visited? How long has a
customer stayed on each web page, and in which sequence has a customer
visited a set of products’ web pages?
Who are the most / least loyal customers, and how are they characterized?
What are customers’ purchase behavior patterns? Which products / items have
customers often purchased together?
In which sequence have the products been purchased?
Which types of customers are more likely to respond to a certain promotion
mailing?
What are the sales patterns from various perspectives, such as products
/ items, regions and time (weekly, monthly, quarterly, yearly and seasonally)?
Who are the most / least valuable customers to the business? What are
their distinct characteristics?
5. Data Mining
Data mining is the process of analyzing massive volumes of
data to discover business intelligence that helps companies
solve problems, mitigate risks, and seize new
opportunities.
Technically, data mining is the process of finding
correlations or patterns among dozens of fields in large
relational databases.
6. Different steps involved in Data Mining
Data Cleaning
This step involves the removal of noisy or incomplete data from the collection.
Data Integration
When multiple heterogeneous data sources such as databases, data cubes or files are combined for analysis, this process is called data integration.
Data Reduction
This technique is applied to obtain relevant data for analysis from the collection of data. The size of the representation is much smaller in volume while maintaining integrity.
Data Transformation
In this process, data is transformed into a form suitable for the data mining process. Data is consolidated so that the mining process is more efficient and the patterns are easier to understand.
Data Mining
The data is represented in the form of patterns, and models are structured using classification and clustering techniques.
Pattern Evaluation
This step involves identifying interesting patterns representing the knowledge based on interestingness measures. Data summarization and visualization methods are used to make the data understandable by the user.
Knowledge Representation
Knowledge representation is a step where data visualization and knowledge representation tools are used to represent the mined data. Data is visualized in the form of reports, tables, etc.
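The first four steps above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the two order sources, the field names `customer` and `amount`, and the helper functions are all hypothetical and do not come from the article. Min-max scaling is used here as one common example of a transformation, not as the only option.

```python
# Hypothetical records from two separate sources (invented for illustration).
web_orders = [{"customer": "C1", "amount": 120.0}, {"customer": "C2", "amount": None}]
phone_orders = [{"customer": "C3", "amount": 40.0}, {"customer": "C1", "amount": 80.0}]

def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [record for source in sources for record in source]

def reduce_to_totals(records):
    """Data reduction: keep one total per customer instead of every record."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

def min_max_scale(totals):
    """Data transformation: rescale the totals to the range [0, 1]."""
    low, high = min(totals.values()), max(totals.values())
    span = (high - low) or 1.0  # avoid division by zero when all totals are equal
    return {key: (value - low) / span for key, value in totals.items()}

combined = integrate(clean(web_orders), clean(phone_orders))
totals = reduce_to_totals(combined)
scaled = min_max_scale(totals)
```

The scaled totals could then be passed to a mining step such as clustering, where variables on very different scales would otherwise dominate the distance measure.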
7. Case Study Review
In the article, the authors use a sample case to demonstrate the data mining process for the online retail industry.
Company Overview:
The authors consider a UK-based non-store business with some 80 members of staff.
The company was established in 1981, mainly selling unique all-occasion gifts. For years, the merchant
relied heavily on direct mailing catalogues, and orders were taken over the phone.
It was only 2 years ago that the company launched its own website and shifted completely to the Web.
Since then, the company has maintained a steady and healthy number of customers from all parts of the United
Kingdom and Europe, and has accumulated a huge amount of data about many customers.
The company also uses Amazon.co.uk to market and sell its products.
9. Methodology
Data mining in this case is performed using RFM model-based clustering analysis. The following
steps are involved in the process:
Data pre-processing
K-Means Clustering
Enhancing clustering analysis using decision tree
Conclusion
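The K-Means Clustering step listed above can be sketched as follows. This is a minimal, plain-Python version of Lloyd's algorithm, not the tool or configuration used in the article. The sample points are hypothetical (recency, frequency, monetary) values assumed to be already scaled to [0, 1], and seeding the centroids with the first k points is an assumption made only to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids (assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical scaled (recency, frequency, monetary) values per customer.
rfm_points = [
    (0.10, 0.90, 0.80),  # recent, frequent, high spend
    (0.90, 0.10, 0.10),  # lapsed, infrequent, low spend
    (0.20, 0.80, 0.90),
    (0.80, 0.20, 0.20),
    (0.15, 0.85, 0.70),
    (0.95, 0.05, 0.05),
]
labels, centroids = kmeans(rfm_points, k=2)
```

In practice the number of clusters and the initial centroids strongly affect the result, so several values of k are usually compared before the segments are interpreted.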
10. Data Pre-processing
The first step is to select the appropriate variables of interest from the dataset, for example Invoice, StockCode,
Quantity, Price, InvoiceDate and PostCode.
Create an aggregated variable named Amount by multiplying Quantity by Price, which gives the total
amount of money spent per product / item in each transaction.
Separate the variable InvoiceDate into two variables, Date and Time. This allows different transactions
created by the same consumer on the same day but at different times to be treated separately.
Filter out any transactions that do not have a postcode associated with them. This resolves any missing value issues
in relation to the variable PostCode.
Sort the dataset by PostCode and create three essential aggregated variables: Recency, Frequency and
Monetary. Calculate the values of these variables per postcode.
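The pre-processing steps above can be sketched in plain Python. The transactions, the reference date, and the exact definitions used here are assumptions for illustration: Recency is taken as days since the last purchase, Frequency as the number of distinct invoices, and Monetary as the summed Amount. The article may define or scale these differently.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (PostCode, Invoice, Date, Quantity, Price).
transactions = [
    ("AB1 2CD", "536365", date(2011, 11, 1), 6, 2.55),
    ("AB1 2CD", "536365", date(2011, 11, 1), 2, 3.39),
    ("AB1 2CD", "536990", date(2011, 12, 5), 10, 1.25),
    ("EF3 4GH", "536401", date(2011, 6, 20), 1, 15.00),
    (None, "536402", date(2011, 7, 2), 3, 4.00),  # no postcode: filtered out
]

def compute_rfm(rows, reference_date):
    """Aggregate Recency, Frequency and Monetary per postcode."""
    invoices = defaultdict(set)
    last_purchase = {}
    monetary = defaultdict(float)
    for postcode, invoice, invoice_date, quantity, price in rows:
        if not postcode:
            continue  # drop transactions without a postcode
        amount = quantity * price  # the aggregated variable Amount
        invoices[postcode].add(invoice)
        monetary[postcode] += amount
        if postcode not in last_purchase or invoice_date > last_purchase[postcode]:
            last_purchase[postcode] = invoice_date
    return {
        pc: {
            "recency_days": (reference_date - last_purchase[pc]).days,
            "frequency": len(invoices[pc]),
            "monetary": round(monetary[pc], 2),
        }
        for pc in invoices
    }

rfm = compute_rfm(transactions, reference_date=date(2011, 12, 31))
```

Here the postcode "AB1 2CD" ends up with two invoices and a single Monetary total, which is the per-postcode form that a clustering step would consume.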
12. Enhancing clustering analysis using decision tree
The customers can be divided into such categories
as frequency less than 2.5 with an average
monetary value of 990.66; frequency more
than 2.5 and less than 3.5 with an average
monetary value of 1056.70; and so on.
It is also interesting to note that the relationship
between frequency and monetary value appears to be
monotonic and linear.
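Thresholds such as 2.5 and 3.5 arise because a decision tree typically tests split points halfway between neighbouring observed values of a variable. The sketch below shows a single regression-style split on frequency, chosen to minimise the squared error of monetary value within each branch. The data are invented for illustration and the function `best_split` is hypothetical; it is not the decision tree procedure used in the article.

```python
def best_split(frequencies, monetary):
    """Return (threshold, mean_left, mean_right) minimising total squared error."""
    def sse(values):
        # Sum of squared deviations from the branch mean.
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    distinct = sorted(set(frequencies))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct frequency values to split")

    best = None
    # Candidate thresholds are midpoints between neighbouring frequency values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [m for f, m in zip(frequencies, monetary) if f < threshold]
        right = [m for f, m in zip(frequencies, monetary) if f >= threshold]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            best = (error, threshold, sum(left) / len(left), sum(right) / len(right))
    return best[1:]

# Hypothetical customers within one cluster.
frequencies = [1, 2, 2, 3, 3]
monetary = [100.0, 110.0, 120.0, 300.0, 320.0]
threshold, mean_low, mean_high = best_split(frequencies, monetary)
```

A full tree would repeat this search inside each branch, which is how nested rules such as "more than 2.5 and less than 3.5" are produced.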
13. Interpretation & Recommendations
The most valuable consumers of the business contributed more than 60 per cent of the total sales in
2011, whereas the least valuable ones made up only 4 per cent of the total sales.
For each of these consumer groups, it is essential to find out which products the customers in each group
have purchased, which products have been purchased together most frequently and in which sequence the
products have been purchased.
Many of the consumers of the business were organizational consumers who bought a high quantity of a product per
transaction. Examining at which specific times (seasons) they purchase, and which products and types of products they have
purchased frequently, will be beneficial to the business.
Another aspect worth further investigation is linking consumer groups to geographical locations. This correlation,
if it exists, may help the business look into other factors, such as culture, customs, and economics, that may affect a
consumer’s buying intention and preferences.
14. Conclusion
As shown in the case study, data mining can help businesses understand the purchasing behavior of their customers,
which helps them promote their products and services.
Customer segmentation helps businesses differentiate between the right target
customers and those who are unprofitable.
The information businesses gain through data mining and analysis is the basis for forward-looking
strategies and growth.