This document discusses using data science techniques to build a content recommendation engine and predict customer preferences for the clothing company Betabrand. It analyzes Betabrand product descriptions and customers' Facebook likes, using vectorization, clustering, and cross-tabulation to develop recommendations. Key findings include interesting correlations between Facebook likes and product types, and that the content recommendation engine and cross-tabulation provided more useful results than k-means clustering with the available data. Recommendations include using the results for upselling and targeted advertising.
5. Problem: Can We Make Recommendations for
“Similar Items” based on Story Descriptions?
6. The Data
• Around 1,600 Clothing Products and Story Descriptions for Each
Product, in Excel, from Betabrand
• Facebook Likes (along with “Category” of Likes) of Users Who
Purchased Betabrand items
• Types of Analysis
• Content Recommendation Engine
• Cross tab in pandas for raw counts
• K-Means clustering analysis
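The content recommendation engine listed above could look roughly like the sketch below. The slides only mention "vectorization", so TF-IDF features and cosine similarity are assumptions here, and the product names, stories, and the `similar_items` helper are hypothetical toy examples, not Betabrand data or code. Duplicates are dropped first because the results slide notes the engine was only useful once duplicates were removed.

```python
# Minimal sketch of "similar items" from story descriptions.
# Assumptions (not from the slides): TF-IDF vectorization, cosine similarity,
# and the made-up product names and stories below.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

products = pd.DataFrame({
    "name": ["Travel Pants", "Commuter Pants", "Disco Hoodie", "Party Jacket"],
    "story": [
        "Stretchy pants built for long flights and travel days",
        "Stretchy pants built for bike commutes and office days",
        "A shiny reflective hoodie made for the dance floor",
        "A shiny reflective jacket made for the party and dance floor",
    ],
})

# Drop exact duplicate descriptions before computing similarity.
products = products.drop_duplicates(subset="story").reset_index(drop=True)

# Turn each story into a TF-IDF vector, ignoring common English stop words.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(products["story"])

# Pairwise cosine similarity between all product stories.
similarity = cosine_similarity(tfidf)


def similar_items(name, top_n=2):
    """Return the top_n products whose stories are most similar to `name`."""
    idx = products.index[products["name"] == name][0]
    scores = pd.Series(similarity[idx], index=products["name"]).drop(name)
    return scores.sort_values(ascending=False).head(top_n)


print(similar_items("Travel Pants", top_n=1))
```

In this toy data the two pants share most of their vocabulary, so each ranks as the other's closest match. With real descriptions, near-duplicates such as size or color variants of one item would likely need the same de-duplication step.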
10. Top Facebook Like Categories of Users Who
Bought a Particular Product?
• Use of Cross Tab
• Use of Lift When Facebook Likes are too “generalized” across
different products
• Results were interesting! They were actually informative with
different results per product.
• Can see how the Category of Facebook Like for a User who
bought a product made sense based on the designer’s profile
• *FUTURE EXPLORATION: Can NLP analysis of the designer profile and
the story description text of a product be correlated with
Facebook Likes? It would be interesting to examine the linkage.
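The cross tab and lift steps above can be sketched as follows. The slides do not define lift, so this assumes the usual definition: the share of a Like category among one product's buyers, divided by that category's share across all buyers. The data frame, column names, and counts are made up for illustration.

```python
# Minimal sketch: raw counts via pandas crosstab, then lift per product.
# Assumptions (not from the slides): the column names, the toy rows, and
# lift defined as P(category | product) / P(category).
import pandas as pd

# One row per (purchased product, Like category of the buyer).
likes = pd.DataFrame({
    "product": ["Product A", "Product A", "Product A",
                "Product B", "Product B", "Product B"],
    "like_category": ["Education", "Shopping & Retail", "Education",
                      "Aerospace/Defense", "Shopping & Retail", "Food"],
})

# Raw counts: rows are products, columns are Like categories.
counts = pd.crosstab(likes["product"], likes["like_category"])

# Share of each category within a product's buyers.
share_within_product = counts.div(counts.sum(axis=1), axis=0)

# Share of each category across all buyers.
overall_share = counts.sum(axis=0) / counts.values.sum()

# Lift > 1 means the category is over-represented for that product.
lift = share_within_product.div(overall_share, axis=1)

print(counts)
print(lift.round(2))
```

A category that is liked about equally by buyers of every product ends up with a lift near 1, which is why lift helps when Likes are too "generalized" to tell products apart by raw counts alone.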
12. Dataframe: Top Facebook Like Categories for
Executive Ponte Top
• Science, Medical, Health
• School
• Shopping & Retail
• Education
• Society/Culture
• Professional Services
• Health/Wellness
13. Compare this to the Toaster!
• Aerospace/Defense
• Performance Venue
• Song, Concert, Record Label, Musical Instrument
• Food
• Computers/Internet Website
• Internet/Software
14. What About K-Means Clustering?
• Analyze Category of Facebook Likes to develop User Personas
• Map those Personas to Clothing Preferences
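One possible shape for the two steps above is sketched below, using scikit-learn's `KMeans`. The slides do not say which implementation, features, or number of clusters were used, so the per-user category counts, the row normalization, `n_clusters=2`, and the purchase table are all assumptions for illustration.

```python
# Minimal sketch: cluster users into personas from Like categories,
# then map personas to purchased products.
# Assumptions (not from the slides): scikit-learn KMeans, two clusters,
# row-normalized category counts, and the toy users and products below.
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical counts of Facebook Likes per category for each user.
user_likes = pd.DataFrame(
    {"Education": [5, 4, 0, 1], "Music": [0, 1, 6, 5], "Food": [1, 0, 2, 3]},
    index=["user_1", "user_2", "user_3", "user_4"],
)

# Normalize each row to shares so heavy likers do not dominate distances.
profiles = user_likes.div(user_likes.sum(axis=1), axis=0)

# Step 1: develop user personas from Like categories.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
personas = pd.Series(
    kmeans.fit_predict(profiles), index=profiles.index, name="persona"
)

# Step 2: map personas to clothing preferences via purchases.
purchases = pd.DataFrame({
    "user": ["user_1", "user_2", "user_3", "user_4"],
    "product": ["Product A", "Product A", "Product B", "Product B"],
})
purchases["persona"] = purchases["user"].map(personas)
persona_by_product = pd.crosstab(purchases["persona"], purchases["product"])

print(personas)
print(persona_by_product)
```

The cluster labels themselves (0 or 1) are arbitrary; only the grouping matters. With only a handful of users per product, clusters like these are unlikely to be stable.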
18. Results
• K-Means had too small a sample size to identify any meaningful
trends in persona clustering
• Content Recommendation Engine delivered useful results once
duplicates were removed. Additional NLP analysis on designer
profiles might help exclude from the analysis items that are
similar because of the designer
• Cross Tab: simplest analysis of raw counts, but perhaps the most
informative
19. Impact and Future Directions
• Results of Content Recommendation Engine can be used for upsell
opportunities at Betabrand: identifying similar products to suggest
to users at the point of checkout on the Betabrand website
• Pandas Crosstab can be used for better Facebook advertisement
targeting, and to market certain products to certain customer
segments via email campaigns or other channels
• K-Means will need to be refined to identify meaningful user clusters for
collaborative filtering. In combination with other methods, Betabrand
could do some powerful targeting for key demographics and encourage
designers to design for certain audiences.