Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
@aabtzu
Using Graphs for High Quality Recommendations
PyData
Nov 2015
Amit Bhattacharyya
Sr Data Scientist
Teachers Pay Te...
TpT is an online marketplace for
teachers to buy, sell and share original
educational resources.
3M+
ACTIVE
MEMBERS
1M+
RESOURCES50k
ACTIVE
SELLERS
$150,000,000+
PAID TO TEACHERS
TpT landing page
Product page for a best seller
What are we trying to do?
We would like to identify a cluster of
buyers and send a product
recommendations email to the cluster
The Problem
Collaborative filtering (CF) methods are
great for individual recommendations but
hard for clusters.
Collaborative filtering for recommendations
I use GraphLab Create for easy to
implement recommender systems
Core version is open source
https://dato.com/products/crea...
It is harder to use CF methods when
making recommendations for a group
● not optimized for finding clusters
● coefficients...
Why want clusters anyways?
Clusters at TpT
● natural groupings inform buyer behavior
● teachers just getting started may not have
long purchase histo...
The Solution
Cluster Users by Creating a Graph
My
LinkedIn
Network
bhattacharyya’s
finance
career
neighbors
astrophysics
high school
sister in laws
hoosiers
berkeley
pro...
All we are given is a list of users and items
they have purchased
user_id item_id item_name
206067 1020240 Rotation and Re...
How to construct the graph:
● each user is a node
● there is an edge between two users if they
have made a common purchase...
Graphs in Python
● NetworkX
● GraphLab Create (now called Dato)
● igraph
Graphs (not in Python)
● Gephi
example of a simple graph
users = nodes
common purchases = edges
reduce complexity by removing all nodes
with degree = 2
run community detection algorithm and
clusters begin to emerge ...
What do the clusters mean?
Need to look incorporate item info to
determine physical intuition of cluster
item_id name
506116 Solving Systems of Equat...
Another Problem
Community detection is hard to
do at scale
Another Solution
Label propagation
Two step process:
● Run community detection on a fraction
of the users to determine clusters
● Use label propagation to fi...
Label
propagation
in clusters
Recommendation algorithm can be as
simple as choosing the most purchased by
a cluster
Clustering also helps restrict number of
user-item pairs in CF recommender
systems
Let’s do some real Python ...
We’re Hiring!
engineering mostly
TeachersPayTeachers.com/careers
Upcoming SlideShare
Loading in …5
×

Using Graphs for High Quality Recommendations - PyData NYC 2015

784 views

Published on

Teachers Pay Teachers is an online marketplace for teachers to buy, sell and share original educational resources. We find that it is hard to extract good recommendations for a group using traditional recommender systems because of the focus on user-specific results. We present a method to create recommendations for a cluster of users using community detection and label propagation.

Published in: Data & Analytics
  • //DOWNLOAD THIS BOOKS INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... //DOWNLOAD PDF EBOOK here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... //DOWNLOAD EPUB Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... //DOWNLOAD doc Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... //DOWNLOAD PDF EBOOK here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... //DOWNLOAD EPUB Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... //DOWNLOAD doc Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Using Graphs for High Quality Recommendations - PyData NYC 2015

  1. 1. @aabtzu Using Graphs for High Quality Recommendations PyData Nov 2015 Amit Bhattacharyya Sr Data Scientist Teachers Pay Teachers
  2. 2. TpT is an online marketplace for teachers to buy, sell and share original educational resources.
  3. 3. 3M+ ACTIVE MEMBERS 1M+ RESOURCES50k ACTIVE SELLERS
  4. 4. $150,000,000+ PAID TO TEACHERS
  5. 5. TpT landing page
  6. 6. Product page for a best seller
  7. 7. What are we trying to do?
  8. 8. We would like to identify a cluster of buyers and send a product recommendations email to the cluster
  9. 9. The Problem
  10. 10. Collaborative filtering (CF) methods are great for individual recommendations but hard for clusters.
  11. 11. Collaborative filtering for recommendations
  12. 12. I use GraphLab Create for easy to implement recommender systems Core version is open source https://dato.com/products/create/ m = graphlab.recommender.create(data_frame, user_id='user', item_id='movie') recs = m.recommend()
  13. 13. It is harder to use CF methods when making recommendations for a group ● not optimized for finding clusters ● coefficients are hard to interpret ● lack of distance measurement for k-means clustering ● ratings on TpT are useless
  14. 14. Why want clusters anyways?
  15. 15. Clusters at TpT ● natural groupings inform buyer behavior ● teachers just getting started may not have long purchase history but are likely to be very similar to others ● can reduce complexity of CF recommendations by pre-clustering users
  16. 16. The Solution
  17. 17. Cluster Users by Creating a Graph
  18. 18. My LinkedIn Network bhattacharyya’s finance career neighbors astrophysics high school sister in laws hoosiers berkeley programmers
  19. 19. All we are given is a list of users and items they have purchased user_id item_id item_name 206067 1020240 Rotation and Revolution Model: Sun, Earth and ... 3927028 1498533 Naming Compounds Puzzle - A Fun Chemical Nomen... 3927028 821435 Ionic Bonding Task Cards 3927028 355690 Make Your Own Color By Number Clipart Collecti... 268012 1100472 Introduction to Meso-America Vocabulary 268012 882788 Aztec, Mayan, & Incan Graphic Organizer 268012 1417449 Rise of Empires (Maya, Inca, Aztec)
  20. 20. How to construct the graph: ● each user is a node ● there is an edge between two users if they have made a common purchase ● multiple common purchases will get a higher edge weight
  21. 21. Graphs in Python ● NetworkX ● GraphLab Create (now called Dato) ● igraph Graphs (not in Python) ● Gephi
  22. 22. example of a simple graph users = nodes common purchases = edges
  23. 23. reduce complexity by removing all nodes with degree = 2
  24. 24. run community detection algorithm and clusters begin to emerge ...
  25. 25. What do the clusters mean?
  26. 26. Need to look incorporate item info to determine physical intuition of cluster item_id name 506116 Solving Systems of Equations by Graphing and S... 1080423 Exponent Rules Valentine's Day Coloring Activity 1415508 Trigonometry {SOH CAH TOA} Coloring Activity 590706 Factoring a Greatest Common Factor (GCF) Drag ... 1196123 Algebra: Graphing and Writing Compound Inequal... 967719 Multiplying Polynomials {FOIL} Coloring Activity 1013496 Systems of Equations Relay Races 426830 Algebra 1: Quadratic Equations (Unit 8) - Unit...
  27. 27. Another Problem
  28. 28. Community detection is hard to do at scale
  29. 29. Another Solution
  30. 30. Label propagation
  31. 31. Two step process: ● Run community detection on a fraction of the users to determine clusters ● Use label propagation to fill in the rest
  32. 32. Label propagation in clusters
  33. 33. Recommendation algorithm can be as simple as choosing the most purchased by a cluster
  34. 34. Clustering also helps restrict number of user-item pairs in CF recommender systems
  35. 35. Let’s do some real Python ...
  36. 36. We’re Hiring! engineering mostly TeachersPayTeachers.com/careers

×