2. WHAT DO OUR CUSTOMERS THINK?
“I received extraordinary service yesterday from Kim Sears, a sales associate at
the Lands’ End store in Concord, NH. I'm on vacation in northern NH currently,
and phoned the Concord store for assistance with a purchase. I was greeted by
Kim, who was courteous, knowledgeable, and entirely professional. Kim
helped me to sort through some options, buy a rash shirt and a dress, and she
made the entire purchase experience easy, positive and fun!
At the end of our conversation, I told Kim how much I appreciated her time and
outstanding customer service and what a pleasure it had been to do business
with her. She is a great asset to the Lands’ End store in Concord, NH!”
Sincerely,
Tracy W.
Grand Rapids, MI
3. THE SURVIVAL OF THE FITTEST
1. We need to push ourselves to keep up with current trends.
2. All big organizations make use of their data.
3. We don’t pay for data; we already have it, like rain (store it and drink it).
“Go data-driven; data doesn’t lie.”
4. WHAT IS ADVANCED ANALYTICS?
“Better utilization of data using machine learning and statistical techniques.”
5. WHY DO WE NEED THESE ANALYTICS?
1. Answer questions that reporting cannot.
2. Big organizations already use them and benefit.
3. Data is in our genes.
Why for the retail industry?
1. Predict future sales.
2. Improve business performance.
8. SUMMER INTERN PROJECTS
1. Category prediction for a customer using a supervised machine learning approach.
2. Category prediction for a customer using unsupervised learning and a distance measure.
3. Product recommendation based on product similarity.
4. Demographic-based clusters for customers.
9. OUR DATA
1. Primary source: the Netezza database, which includes transactional data and the Axiom data.
2. RGB values of color codes, obtained from the color lab.
10. WHY PYTHON PROGRAMMING?
1. Very easy to learn.
2. Very good core libraries such as numpy, pandas, and sklearn.
3. Very good community support.
4. No cost.
5. Widely used, alongside R.
15. ALGORITHM: RANDOM FOREST CLASSIFIER
This algorithm is based on multiple decision trees.
The same task was also tested with other algorithms, such as a Bayesian classifier and logistic regression.
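The comparison described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (generated with scikit-learn's make_classification, not the Netezza data), Gaussian naive Bayes stands in for the unspecified "Bayesian classifier", and all hyperparameters are arbitrary choices rather than the ones used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for customer features (X) and purchased category (y).
# Hypothetical data: 600 customers, 12 features, 4 categories.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    n_classes=4, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The three model families named on the slide (GaussianNB is an assumption).
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive_bayes": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Fit each model on the training split and record held-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

Accuracy on synthetic data says nothing about which model worked best on the real customer data; the sketch only shows how such a side-by-side test might be set up.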
17. FINDING THE TOP FIVE CATEGORIES
1. This was done for email marketing.
2. Instead of sending one generalized creative, send the creative for one of these five categories if the customer fits it.
Why only five?
1. The email team asked for only five for testing.
2. Can be extended to all categories as well.
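One possible reading of this step, sketched below: take the predicted category for every customer, keep the five most frequently predicted categories, and send the generalized creative to anyone whose prediction falls outside those five. The customer IDs, the predictions, and the function name assign_creatives are all hypothetical.

```python
from collections import Counter

# Hypothetical model output: one predicted category per customer.
predicted = {
    "c01": "swimsuit", "c02": "knit tops", "c03": "swimsuit",
    "c04": "bottoms", "c05": "footwear", "c06": "outerwear",
    "c07": "knit tops", "c08": "xr swimsuit", "c09": "swimsuit",
    "c10": "sleepwear", "c11": "bottoms", "c12": "footwear",
    "c13": "xr swimsuit", "c14": "knit tops",
}

def assign_creatives(predictions, n_top=5, fallback="generalized"):
    """Pick the n_top most common predicted categories and map each
    customer to a category creative, or to the fallback creative."""
    counts = Counter(predictions.values())
    top = [category for category, _ in counts.most_common(n_top)]
    assignment = {
        customer: (category if category in top else fallback)
        for customer, category in predictions.items()
    }
    return top, assignment

top_categories, creative_for = assign_creatives(predicted)
print(top_categories)
print(creative_for["c06"])  # this customer's category is outside the top five
```

Raising n_top extends the same logic to more categories, in line with the note that the approach is not limited to five.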
18. RESULTS: TOP FIVE CATEGORIES (WOMEN SEGMENT)
SWIMSUIT | KNIT TOPS | BOTTOMS | FOOTWEAR | XR SWIMSUIT
19. HOW IS THIS DIFFERENT FROM THE CURRENT APPROACH?
1. We just analyze the past purchases of each customer and predict their future category order.
2. These machine learning techniques group similar customers and thus make better predictions.
21. PROJECT OUTLINE
1. Similar to the previous task.
2. The difference is that clustering and a distance measure are used.
Model building: Generate clusters -> Pick each cluster -> Build the recommender system for that cluster
22. MODEL TESTING
1. Fit the new customer to one of the clusters.
2. Then recommend a category for that customer based on the ‘k’ most similar customers.
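The two testing steps above might look like the sketch below. Several things here are assumptions rather than details from the slides: k-means is used as the clustering algorithm, cosine distance is used to find similar customers, the recommendation is a majority vote among the k neighbours, and the feature matrix and category labels are random placeholder data.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Hypothetical data: 300 customers, 5 numeric features, one category each.
features = rng.normal(size=(300, 5))
categories = rng.choice(
    ["swimsuit", "knit tops", "bottoms", "footwear"], size=300
)

# Model building: generate clusters, then build one neighbour index per cluster.
n_clusters = 4
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
members = {c: np.flatnonzero(kmeans.labels_ == c) for c in range(n_clusters)}
indexes = {
    c: NearestNeighbors(metric="cosine").fit(features[idx])
    for c, idx in members.items()
}

def recommend_category(new_customer, k=5):
    """Assign the customer to a cluster, then take a majority vote over the
    categories of the k most similar customers in that cluster."""
    row = np.asarray(new_customer, dtype=float).reshape(1, -1)
    cluster = int(kmeans.predict(row)[0])
    idx = members[cluster]
    k_eff = min(k, len(idx))  # a cluster may hold fewer than k customers
    _, neighbours = indexes[cluster].kneighbors(row, n_neighbors=k_eff)
    votes = Counter(categories[idx[neighbours[0]]])
    return cluster, votes.most_common(1)[0][0]

cluster, category = recommend_category(rng.normal(size=5), k=5)
print(cluster, category)
```

Restricting the neighbour search to one cluster keeps each lookup small, at the cost of missing similar customers that k-means happened to place in a neighbouring cluster.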
30. ALGORITHM: COSINE SIMILARITY
Cosine similarity is a measure of
similarity between two non zero
vectors of an inner product space
that measures the cosine of the
angle between them.
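In formula form, cos(theta) = (a . b) / (|a| |b|). A minimal sketch of that definition with numpy follows; the function name and the example vectors are illustrative only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        # The definition only covers non-zero vectors.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return float(np.dot(a, b) / (norm_a * norm_b))

# Same direction, different magnitude: similarity close to 1.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
# Orthogonal vectors: similarity 0.
print(cosine_similarity([1, 0], [0, 1]))
```

Because only the angle matters, two customers with the same purchase mix but different order volumes come out as highly similar.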
32. RESULTS: TOP FIVE CATEGORIES (ALL CUSTOMERS)
SWIMSUIT KNIT TOPS BOTTOMS KNIT TOPS BOTTOMS
33. DIFFERENCE BETWEEN PROJECT ONE AND TWO
Project one
1. Predicts the category for a given order of a customer.
2. More purchase oriented.
Project two
1. Predicts the category for a given customer.
2. More customer oriented.
35. WHY PRODUCT RECOMMENDATION?
1. About 30% of Amazon's sales reportedly come from recommendations.
2. Keep customers on the website.
3. Better marketing strategy.
38. ALGORITHM: COSINE SIMILARITY
Cosine similarity is a measure of
similarity between two non zero
vectors of an inner product space
that measures the cosine of the
angle between them.
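For product recommendation, the same measure can be applied to every pair of products at once. The sketch below assumes each product is described by a small numeric feature vector (the three columns could be scaled RGB values, for instance); the product IDs, feature values, and function name are hypothetical.

```python
import numpy as np

# Hypothetical products with three scaled attributes each.
product_ids = ["P1", "P2", "P3", "P4"]
features = np.array([
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.1, 0.1, 0.9],
])

# Normalise each row to unit length; the dot products of the unit
# vectors are then the pairwise cosine similarities.
unit = features / np.linalg.norm(features, axis=1, keepdims=True)
similarity = unit @ unit.T

def similar_products(product_id, n=2):
    """Return the n products most similar to product_id, best first."""
    i = product_ids.index(product_id)
    scores = similarity[i].copy()
    scores[i] = -np.inf  # never recommend the product itself
    best = np.argsort(scores)[::-1][:n]
    return [product_ids[j] for j in best]

print(similar_products("P1", n=2))
```

Precomputing the full matrix is convenient for a small catalogue; for a large one, a nearest-neighbour index would likely be a better fit.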
50. FURTHER STUDIES WITH CLUSTERS
1. Develop more clusters based on geography, interests, past purchases, etc.
2. Better analysis is possible with several clusters; for example, customers with high fashion interest and high income can be grouped and analyzed together.
51. CHALLENGES INVOLVED
1. Hardware infrastructure.
2. Lack of data for products.
3. Lack of resources.
4. Data storage: may become a challenge in the near future.
52. POTENTIAL SOLUTIONS
1. Use of distributed systems and cloud.
2. Record more data such as product rating,
recommendations and more.
3. Expansion of data science team.
53. POTENTIAL DATA SCIENCE PROJECTS
1. Product review analysis.
2. Customized home page.
3. Analysis of which webpages or paths are performing well or poorly.
4. Which products perform well across seasons, demographics, and geographies.
5. Fraud detection in transactions.
6. Attribute preferences of a customer, such as color, size, design, etc.
54. SPECIAL MENTIONS
● Shaishav Singh
● Jignesh Patel
● Mike Zhang
● Dave Oesper
● Prashanth Motupalli
● David Garber
● Ankita Chaudhari
● Color lab team
● Alexander Steeno
● Harshini Mohan