Search is central to e-commerce platforms. Diversification of search
results is essential to cater to the diverse preferences of the customers. One of the primary metrics of e-commerce businesses is
revenue. On the other hand, the prices of the products shown
influence customer preferences. Hence, diversifying e-commerce
search results requires learning the diverse price preferences of the
customers and simultaneously maximizing the revenue without
hurting the relevance of the results. In this paper, we introduce
the learning to diversify problem for e-commerce search. We also
show that diversification improves the median customer lifetime
value (CLV), which is a critical long-term business metric for an
e-commerce business. We design three algorithms for the task. The
first two are modifications of algorithms previously developed for
the diversification problem in web search. The third is a novel
approximate-knapsack-based semi-bandit algorithm. We derive the
regret and pay-off bounds of all three algorithms and conduct
experiments with synthetic data and simulation to validate and
compare them. In our simulation, we compute revenue, median CLV, and
purchase-based mean reciprocal rank (PMRR) under various scenarios,
such as user preferences that change over time, to compare the
performance of the algorithms. We show that our proposed third
algorithm is more practical and efficient than the first two and can
produce higher revenue while maintaining a better median CLV and
PMRR.
1. SIGIR ECOM 2019
Learning to Diversify for E-commerce Search with
Multi-Armed Bandit
Anjan Goswami (UC Davis), Chengxiang Zhai (UIUC), Prasant
Mohapatra (UC Davis)
July 24, 2019
Agenda of this Presentation
The Problem
Contribution
Algorithms
Evaluation and Results
Future Work
Diversity problem
Figure: Query: “Sunglasses for Men”, Site: Amazon, Evaluation: Only
the cheaper sunglasses are shown at the top, but a user may be
interested in an expensive one that Amazon carries.
Diversity problem
Figure: Query: “Sunglasses for Men”, Site: Walmart, Evaluation: It
even shows two sunglasses from the same brand, but a user may want to
explore samples from multiple brands to understand the diversity of
the selection available at Walmart.
Why yet another learning to diversify problem for
e-commerce?
Learn the diverse (price) preferences of the customers from
the data.
Aim to maximize the revenue.
Not hurt the relevance of the search results.
Our contribution
Defining the learning to diversify problem for e-commerce.
A novel semi-bandit optimization algorithm for learning to
diversify (KPBA).
A simulation-based evaluation methodology (similar to
counterfactual learning [3]).
Learning to Diversify Algorithms
Revenue Ranked Explore and Commit (RREC) [2]
Revenue Ranked Bandits Algorithm (RRBA) [2]
Knapsack based bandit algorithm (KPBA)
Revenue Ranked Explore and Commit (RREC)
Baseline greedy algorithm.
It shows all the products iteratively to estimate the demand.
Eventually maximizes the revenue.
Can have arbitrarily poor performance.
Cannot learn further once it has committed to a ranking.
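The explore-then-commit pattern behind RREC can be sketched as follows. This is an illustrative Python sketch under assumed interfaces (a `prices` map and a simulated `user_buys` feedback function), not the paper's exact routine:

```python
import random

def rrec(prices, exploration_rounds, user_buys, k, seed=0):
    """Revenue Ranked Explore and Commit (sketch).

    prices: dict product -> price
    user_buys: callable(product, rng) -> bool, simulated purchase feedback
    Returns the committed top-k ranking by estimated revenue.
    """
    rng = random.Random(seed)
    purchases = {p: 0 for p in prices}
    impressions = {p: 0 for p in prices}
    # Exploration: show every product iteratively to estimate demand.
    for _ in range(exploration_rounds):
        for p in prices:
            impressions[p] += 1
            purchases[p] += user_buys(p, rng)
    # Commit: rank by estimated revenue = purchase rate x price,
    # and never update the estimates again.
    est_rev = {p: purchases[p] / impressions[p] * prices[p] for p in prices}
    return sorted(prices, key=est_rev.get, reverse=True)[:k]
```

The commit step is where the weakness noted above lives: after committing, the estimates are frozen, so the algorithm cannot adapt if demand shifts.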
Revenue Ranked Bandits Algorithm (RRBA)
A straightforward modification of the algorithm proposed in [2].
Uses k bandits for k positions.
Each product can be an arm.
A product can be part of several MABs.
Does not optimize all the MABs simultaneously.
Complex to realize in practice.
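One round of the position-wise selection can be sketched as below: a hedged illustration of the ranked-bandits idea from [2] with UCB1 per position, products as arms, and duplicates skipped top-down (the `stats` layout is an assumption, not the paper's data structure):

```python
import math

def rrba_round(stats, n_products, k, t):
    """One round of RRBA (sketch): one UCB1 bandit per position, each
    product is an arm, and a product already placed at a higher
    position is skipped so the list has no duplicates.

    stats[pos][arm] = (pulls, cumulative_reward)
    """
    ranking = []
    for pos in range(k):
        best, best_ucb = None, float("-inf")
        for arm in range(n_products):
            if arm in ranking:
                continue  # each product appears at most once in the list
            pulls, reward = stats[pos][arm]
            if pulls == 0:
                best = arm
                break  # unexplored arms are pulled first
            ucb = reward / pulls + math.sqrt(2 * math.log(t) / pulls)
            if ucb > best_ucb:
                best, best_ucb = arm, ucb
        ranking.append(best)
    return ranking
```

The k separate bandit states, plus the duplicate-skipping coupling between them, is what makes the realization complex: the positions share products but do not share statistics.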
Knapsack based bandit algorithm (KPBA)
The main algorithm proposed in this paper.
Semi-bandit optimization.
Each product can be an arm.
Selects k out of n arms in every iteration.
Simple to realize in practice.
Algorithms for Diversity: KPBA
At each round t = 1, . . . , T, choose the top-k ranking solving
\[
\max \sum_{j=1}^{k} v^{\mathrm{UCB}}_{jt}
\quad \text{subject to} \quad
\sum_{j=1}^{k} s_j \ge B
\tag{1}
\]
\[
v_{r_j} = \frac{p_{r_j}}{i_{r_j}} \times \rho_j \times Z
  + \alpha \sqrt{\frac{2 \ln t}{i_{r_j}}}
\]
p: purchases, i: impressions, ρ: price, s: relevance score, B: relevance
threshold, Z: normalization.
Algorithms for Diversity: KPBA
\[
\max_{x_{1t}, \dots, x_{nt}} \sum_{i=1}^{n} x_{it} \times v^{\mathrm{UCB}}_{it}
\quad \text{subject to} \quad
\sum_{i=1}^{n} x_{it}\,\hat{s}_i \le \hat{B},
\qquad
\sum_{i=1}^{n} x_{it} = k
\tag{2}
\]
Algorithms for Diversity: KPBA properties
KPBA computes a 1/2-approximate solution for E-kKP in O(n)
time.
No need for k MABs for k positions.
A semi-bandit algorithm that is optimal for ranking.
Regret matches RRBA (MAB-based): O(√(nT lg T))
(proven).
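A greedy sketch of the knapsack selection, for intuition only: this is the classic density-greedy 1/2-approximation for a budgeted variant with at most k items, not the paper's exact E-kKP routine (which selects exactly k arms), and positive weights are assumed:

```python
def kp_greedy(values, weights, budget, k):
    """Greedy knapsack sketch: pick up to k arms maximizing total UCB
    value subject to the total weight staying within budget.
    Assumes all weights are positive."""
    n = len(values)
    # Greedily take items in decreasing value-density order.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    chosen, total_w, total_v = [], 0.0, 0.0
    for i in order:
        if len(chosen) < k and total_w + weights[i] <= budget:
            chosen.append(i)
            total_w += weights[i]
            total_v += values[i]
    # Classic safeguard for the 1/2 guarantee: compare against the
    # single most valuable feasible item.
    singles = [i for i in range(n) if weights[i] <= budget]
    if singles:
        best = max(singles, key=lambda i: values[i])
        if values[best] > total_v:
            chosen = [best]
    return chosen
```

The single linear scan after sorting is what keeps the selection cheap; the paper's O(n) claim refers to its own E-kKP routine, which avoids even the sort.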
Algorithms for Diversity: Evaluation Metrics
Average Revenue per Query: ARQ
Median Customer Lifetime Value: MCV
Purchase-based Mean Reciprocal Rank: PMRR
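PMRR can be computed as in this small sketch (assuming each session records the 1-indexed positions at which purchases happened; sessions with no purchase contribute zero):

```python
def pmrr(sessions):
    """Purchase-based mean reciprocal rank: the reciprocal of the
    best (highest) purchased position per session, averaged over all
    sessions. Sessions without a purchase contribute 0."""
    total = 0.0
    for purchased_positions in sessions:
        if purchased_positions:
            total += 1.0 / min(purchased_positions)
    return total / len(sessions)
```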
Algorithms for Diversity: Evaluation based on Simulation
Synthetically generate a product data set.
Assign a demand to each product based on a realistic
distribution for each query.
Assign a utility (relevance) score to each product for each
query.
Build a user model.
Simulate user search sessions with a specific ranking function.
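A minimal user-model sketch for one simulated session; the functional form here (geometric position bias, purchase probability equal to relevance) is an assumption for illustration, and the paper's simulator is richer:

```python
import random

def simulate_session(ranking, relevance, prices, rng, pos_bias=0.85):
    """One simulated search session: the user scans top-down, examines
    each position with geometrically decaying probability, and buys an
    examined product with probability equal to its relevance score.
    Returns the revenue earned in this session (0.0 if no purchase)."""
    examine = 1.0
    for product in ranking:
        if rng.random() < examine and rng.random() < relevance[product]:
            return prices[product]  # a purchase ends the session
        examine *= pos_bias  # position bias: lower slots seen less often
    return 0.0
```

Running many such sessions against each ranking function yields the ARQ, MCV, and PMRR numbers compared in the next slides.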
Algorithms for Diversity: Simulation
Figure: Price histograms with their corresponding relevance scores and
purchase rates. Note that the plot shows the correlation between
relevance score and purchase rate with a fitted line.
Algorithms for Diversity: Results
Figure: Note that the red curves represent RREC metrics, the blue
curves RRBA, and the green curves KPBA. The revenue metric uses a log
scale.
Algorithms for Diversity: Results with position bias
Figure: Note that the red curves represent RREC metrics, the blue
curves RRBA, and the green curves KPBA. The revenue metric uses a log
scale.
Algorithms for Diversity: with changing customer
preference
Figure: Note that the red curves represent RREC metrics, the blue
curves RRBA, and the green curves KPBA. The revenue metric uses a log
scale.
Possible extensions
Learn more complex functions of customer preferences by
incorporating multiple product attributes such as brand.
Combine the online learning framework with traditional
learning-to-rank functions [1].
References I
[1] Tie-Yan Liu. Learning to rank for information retrieval.
Foundations and Trends in Information Retrieval, 3(3):225–331, 2009.
[2] Filip Radlinski, Robert Kleinberg, and Thorsten Joachims.
Learning diverse rankings with multi-armed bandits. In Proceedings of
the 25th International Conference on Machine Learning, ICML ’08, 2008.
[3] Adith Swaminathan and Thorsten Joachims. Counterfactual risk
minimization: Learning from logged bandit feedback. In ICML, pages
814–823, 2015.