In a modern recommender system, it is important to understand how products relate to each other. For example, while a user is looking for mobile phones, it might make sense to recommend other phones, but once they buy a phone, we might instead want to recommend batteries, cases, or chargers. These two types of recommendations are referred to as substitutes and complements: substitutes are products that can be purchased instead of each other, while complements are products that can be purchased in addition to each other. In this talk I will present methods for automatically identifying networks of substitute and complementary relationships between products, using text from their online reviews. We treat this as a supervised learning problem, trained using product networks derived from user behavior data. The product graph allows users to navigate, explore, and discover new and previously unknown products. It can also be used to identify interesting product combinations; for example, we can recommend outfits by matching a shirt with complementary trousers and a jacket. Finally, the graph can serve as a candidate-generation step in providing better and more context-relevant recommendations.
Bio:
Jure Leskovec is Chief Scientist at Pinterest and Assistant Professor of Computer Science at Stanford University. His work focuses on machine learning and data mining in large social and information networks. The problems he investigates are motivated by large-scale data, the Web, and online media. Leskovec received his bachelor's degree in computer science from the University of Ljubljana, Slovenia, his PhD in machine learning from Carnegie Mellon University, and postdoctoral training at Cornell University. Jure also co-founded a machine learning startup, Kosei, which was acquired by Pinterest. You can follow him on Twitter @jure.
11. Object Graph: Products
Pins & product catalogs:
10s of millions of products
100s of millions of product reviews
How do we build the product graph?
Three components:
Link Prediction
Topic models
Product hierarchies
13. Product Graph: Description
[Figure: example products with review-derived descriptions: “cleaner; quieter”, “cheaper; high power”, “well made, easy to install”, “fits perfectly, great value”]
15. Product Graph: What it does
1. Understand the notions of substitute and complement goods.
[Figure: example product pairs: one product “is substitutable for” another; a third product “complements” it]
16. Product Graph: What it does
2. Generate explanations of why certain products are preferred.
People prefer this product because: “Good quality, soft, light weight, the colors are beautiful and exactly like the picture!”
17. Product Graph: What it does
3. Recommend baskets of related items.
[Figure: query items with suggested complete outfits]
18. Product Graph: Overview
Building networks of products
Modeling: Can we use product data to model product relationships?
Understanding: Can we explain why people prefer certain products over others?
19. Problem Setting
Binary prediction task: given a pair of products, x and y, predict whether they are related (substitute/complementary).
Goal: build a probabilistic model that encodes the probability p((x, y) ∈ E) that the two products are linked.
20. Problem Setting
How do we learn from the data? Train by maximum likelihood.
[Figure: example product pairs labeled complementary (positive) vs. not complementary (negative)]
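As a minimal sketch of this setup (a toy Python illustration with hypothetical random data; logistic regression stands in for the parameterized link model, since fitting it maximizes exactly this binary log-likelihood):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy product features (hypothetical): 100 products, 20 features each.
rng = np.random.default_rng(0)
F = rng.random((100, 20))

# Candidate pairs and toy labels: 1 = complementary, 0 = not.
pairs = [(i, j) for i in range(100) for j in range(i + 1, 100)]
X = np.array([F[i] * F[j] for i, j in pairs])   # element-wise pair features
y = rng.integers(0, 2, len(pairs))

# Maximum-likelihood training of p((x, y) in E).
clf = LogisticRegression(max_iter=1000).fit(X, y)
p_edge = clf.predict_proba(X[:5])[:, 1]          # predicted link probabilities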
21. Attempt 1: Big bags of features
Features of product i (word counts over the vocabulary, from “aardvark” to “zoetrope”):
[0,0,0,0,0,0,0,1,0,5,0,0,0, … ,0,1,0,0,0,0,0,1,2]
Features of product j:
[0,0,0,1,0,0,0,0,0,0,0,1,0, … ,0,0,0,0,0,0,0,1,0]
Parameterized probability measure (essentially weighted nearest-neighbor; see the sketch after the list below).
• High-dimensional
• Prone to overfitting
• Too fine-grained
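A minimal sketch of such a measure, assuming one simple form: a logistic function of learned per-word weights on the two products' bag-of-words disagreement. The weight vector is as long as the vocabulary, which is exactly why this approach is high-dimensional and prone to overfitting:

import numpy as np

def p_related(f_x, f_y, w, b=0.0):
    # Weighted nearest-neighbor-style link probability:
    # sigmoid of a learned weighting of per-word feature differences.
    d = np.abs(f_x - f_y)
    return 1.0 / (1.0 + np.exp(-(w @ d + b)))

f_x = np.array([0, 0, 1, 0, 5, 0, 1, 2], dtype=float)   # toy word counts
f_y = np.array([0, 1, 0, 0, 0, 1, 1, 0], dtype=float)
w = np.zeros(8)                                          # one weight per vocabulary word
print(p_related(f_x, f_y, w))                            # 0.5 before any training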
24. Attempt 2: Features from Topics
Topic models: LDA (Blei & McAuliffe, 2007)
Use any kind of product-related features: brand, price, reviews, product descriptions, …
[Figure: example product topics, e.g. “Shoes”, “Female”, “Fashion”]
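As a sketch, assuming per-product review text and using scikit-learn's LatentDirichletAllocation as a stand-in topic model:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical corpus: one concatenated review document per product.
docs = [
    "great shoes fit perfectly very comfortable",
    "stylish warm jacket love the fashion look",
    "durable slim phone case protects the screen",
]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
theta = lda.fit_transform(counts)
# Each row of theta is a low-dimensional topic mixture for one product,
# e.g. something like [0.1, 0.4, 0.2, 0.1, 0.2].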
25. Attempt 2: Features from Topics
Features of product i (topic mixture): [0.1, 0.4, 0.2, 0.1, 0.2]
Features of product j: [0.3, 0.1, 0.3, 0.2, 0.1]
[Figure: topic dimensions labeled, e.g. “Shoes”, “Female”]
26. Attempt 2: Features from Topics
On the right track, but are the topics we are discovering relevant to link prediction?
27. Attempt 3: Learn “good” topics
Learn to discover topics that explain the graph structure.
28. Attempt 3: Learn “good” topics
[Figure: link prediction coupled with product “topics”]
Idea: learn both simultaneously.
Discover topics that “explain” product relations.
29. Attempt 3: Learn “good” topics
Conceptually, we want to learn to project products into topic space such that related products are nearby.
30. The SCEPTRE Model
Combining topic models with link prediction.
Topic model with topic distribution θ.
But the topics should be “good” as features for the link prediction.
31. The SCEPTRE Model: Details
[Figure: model details, including the topic-membership structure]
32. The SCEPTRE Model
Issue 1: The relationships we want to learn are not symmetric.
Why do people who view X eventually buy Y?
There is a link between the two products because people use similar words to describe them, but in what direction does the link flow?
33. The SCEPTRE Model
Why do people who view X eventually buy Y?
Solution: learn “directedness” in addition to “relatedness”.
Relatedness: explained by product “properties” (“baby, pajamas, pants, colorful”).
Directedness: captured by subjective/qualitative language (“true size, fits well, items are the same color as on the picture”).
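A minimal sketch of one way to decompose the score, assuming a symmetric term for relatedness plus an antisymmetric term for direction (an illustration, not the exact SCEPTRE parameterization):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_x_then_y(theta_x, theta_y, w_rel, w_dir, b=0.0):
    rel = w_rel @ (theta_x * theta_y)   # symmetric: unchanged if x and y swap
    dirn = w_dir @ (theta_x - theta_y)  # antisymmetric: flips sign when x and y swap
    return sigmoid(rel + dirn + b)

theta_x = np.array([0.1, 0.4, 0.2, 0.1, 0.2])
theta_y = np.array([0.3, 0.1, 0.3, 0.2, 0.1])
w_rel, w_dir = np.ones(5), np.ones(5)
print(p_x_then_y(theta_x, theta_y, w_rel, w_dir),
      p_x_then_y(theta_y, theta_x, w_rel, w_dir))  # differ only via the direction term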
34. Learning Multiple Graphs
Issue 2: We want to learn multiple relationships simultaneously, e.g. “browsed together” and “bought together”.
We could fit two independent models, but learning both at once:
1) gives us more data on which to train the complete model;
2) helps with interpretability, since both relationships are explained in terms of the same topics.
35. Learning Multiple Graphs
Solution: learn multiple regressors simultaneously, one per graph, all operating on a single shared set of topics.
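A sketch under those assumptions (hypothetical graph names; one logistic regressor per relationship, all reading the same shared topic vectors):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight vector per relationship graph; topics are shared across graphs.
weights = {
    "browsed_together": np.array([0.5, -0.2, 0.1, 0.0, 0.3]),
    "bought_together":  np.array([0.1, 0.4, -0.3, 0.2, 0.0]),
}

def p_link(theta_x, theta_y, graph):
    # Same shared topic interaction, scored by the graph-specific regressor.
    return sigmoid(weights[graph] @ (theta_x * theta_y))

theta_x = np.array([0.1, 0.4, 0.2, 0.1, 0.2])
theta_y = np.array([0.3, 0.1, 0.3, 0.2, 0.1])
print(p_link(theta_x, theta_y, "browsed_together"),
      p_link(theta_x, theta_y, "bought_together"))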
36. SCEPTRE Is Not Tractable
Issue 3: The model has too many parameters:
thousands of topics multiplied by millions of products.
37. Including Hierarchy
Idea: use the category hierarchy to sparsify the model.
Solution: product hierarchy.
38. Including Hierarchy
Associate each node in the category tree with a small number of topics.
Now we can fit models with thousands of topics, but only 10-20 are active per product.
“Car audio” topics, for example, have probability zero of being selected for a product outside that branch of the hierarchy.
Topics at the top of the hierarchy are common to all electronics products and will contain generic (though electronics-specific) language.
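A minimal sketch of this sparsification, with hypothetical category paths and topic IDs:

# Each node in the category tree owns a few topics (hypothetical assignment).
topics_per_node = {
    "Electronics": [0, 1],
    "Electronics/Audio": [2, 3],
    "Electronics/Audio/Car Audio": [4, 5],
}

def active_topics(category_path):
    # A product can only use topics along its own category path;
    # every other topic has probability zero for this product.
    parts = category_path.split("/")
    active = []
    for i in range(1, len(parts) + 1):
        active += topics_per_node.get("/".join(parts[:i]), [])
    return active

print(active_topics("Electronics/Audio"))  # [0, 1, 2, 3]; "Car Audio" topics excluded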
39. Training the model: EM
E-step (topic assignments)
M-step (link prediction)
Other topic/regression parameters: word distribution φ and topic assignments z
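Schematically, training alternates the two steps. Below is a toy, heavily hedged skeleton rather than the actual SCEPTRE updates: the E-step here is a crude stand-in for resampling topic assignments, and only the M-step's gradient ascent on the logistic link likelihood is faithful in spirit; all data is synthetic:

import numpy as np

rng = np.random.default_rng(0)
n_products, n_topics = 20, 5
theta = rng.dirichlet(np.ones(n_topics), n_products)   # per-product topic mixtures
w = np.zeros(n_topics)                                 # link-regressor weights
links = [(i, (i + 1) % n_products) for i in range(n_products)]  # toy positive edges

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10):
    # E-step stand-in: pull linked products' topic mixtures together
    # (the real model resamples word-topic assignments z given the link likelihood).
    for i, j in links:
        avg = (theta[i] + theta[j]) / 2.0
        theta[i] = 0.9 * theta[i] + 0.1 * avg
        theta[j] = 0.9 * theta[j] + 0.1 * avg
    # M-step: gradient ascent on the logistic link likelihood, topics held fixed.
    for i, j in links:
        x = theta[i] * theta[j]
        w += 0.1 * (1.0 - sigmoid(w @ x)) * x

print(sigmoid(w @ (theta[0] * theta[1])))   # link probability after training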
40. Building the Product Graph
Now we can generate the product graph by identifying the most probable links: for every product, rank all other products according to p(x is related to y).
But this is slow: a quadratic number of comparisons!
Solution: use the product hierarchy and a matching engine.
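A minimal sketch of the candidate-restriction idea (hypothetical data structures; a simple category index stands in for a real matching engine):

from collections import defaultdict

products = [("p1", "Electronics/Audio"), ("p2", "Electronics/Audio"), ("p3", "Home/Kitchen")]

# Index products by category so each product is scored only against
# a small candidate set instead of all O(n^2) pairs.
by_category = defaultdict(list)
for pid, cat in products:
    by_category[cat].append(pid)

def top_links(pid, cat, p_related, k=10):
    candidates = [q for q in by_category[cat] if q != pid]
    return sorted(candidates, key=lambda q: p_related(pid, q), reverse=True)[:k]

print(top_links("p1", "Electronics/Audio", lambda a, b: 0.5))  # ['p2']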
41. Experiments
Just for fun, let’s use the Amazon product catalog.
43. Ranking Performance
Manual examination shows great performance (false positives are actually very relevant).
46. Explaining user preferences
Explain recommendations by identifying the words that “best explain” the link:
The topic model assigns a topic to each word.
The logistic regressor uses the words to make predictions.
Identify the phrases that maximize the likelihood of the link in order to explain it.
Use the “directedness” model to generate explanations, as it selects more subjective language (i.e., how the products differ, and why one product was “preferable” over another).
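A minimal sketch of the idea (hypothetical scorer; the actual model scores words through their topics and the link regressor):

def explain_link(words, p_link_given_words, top_k=3):
    # Rank words by how much each one raises the predicted link probability.
    base = p_link_given_words(words)
    gain = {w: base - p_link_given_words([x for x in words if x != w]) for w in set(words)}
    return sorted(gain, key=gain.get, reverse=True)[:top_k]

# Toy stand-in scorer: pretend subjective words drive the link.
subjective = {"soft", "beautiful", "light"}
toy_scorer = lambda ws: sum(w in subjective for w in ws) / (len(ws) + 1.0)
print(explain_link("good quality soft light weight beautiful colors".split(), toy_scorer))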