Algorithmic pricing: Forecasting and Pricing
How can differentiable programming be combined with large-scale models to find optimal prices in e-commerce?
5.
FORECASTING & PRICING
Forecaster: an oracle over all possible futures
Optimizer: finds the ONE optimal future action
6.
FORECASTING & PRICING OVERVIEW
Forecasting
● Forecast sales
○ per article/object
○ per shop/market
○ for a given time horizon, per time unit
Pricing
● Find the optimal price for a
○ given objective function (revenue/profit, customer satisfaction)
○ given constraints (min profit, max accepted loss, max overstock)
○ given optimization time horizon
○ given compute budget
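Sketch (not from the talk): how a forecast, an objective and a constraint fit together. The code picks, from a grid of candidate prices, the one with the highest forecast revenue among those that satisfy a minimum-profit constraint. The linear `forecast_demand` curve, the unit cost and all numbers are hypothetical stand-ins for a real forecaster.

```python
def forecast_demand(price: float) -> float:
    """Hypothetical forecaster: linear demand curve, clipped at zero."""
    return max(0.0, 100.0 - 4.0 * price)


def best_price(candidates, unit_cost, min_profit):
    """Return (price, revenue) with the highest forecast revenue among
    candidates whose forecast profit is at least min_profit, else None."""
    best = None
    for price in candidates:
        units = forecast_demand(price)
        revenue = price * units                 # objective
        profit = (price - unit_cost) * units    # constrained quantity
        if profit < min_profit:
            continue                            # constraint violated
        if best is None or revenue > best[1]:
            best = (price, revenue)
    return best


# Candidate prices 5.0, 5.5, ..., 20.0 (assumed grid).
candidates = [p / 2 for p in range(10, 41)]

unconstrained = best_price(candidates, unit_cost=8.0, min_profit=float("-inf"))
constrained = best_price(candidates, unit_cost=8.0, min_profit=250.0)

print(unconstrained)  # (12.5, 625.0)
print(constrained)    # (13.5, 621.0)
```

The constraint moves the chosen price away from the pure revenue optimum. A grid search like this only works for a handful of candidates per article, so it is meant as an illustration rather than a scalable optimizer.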
8.
PRICING STRATEGIES
Long time horizon
Reward (objective function) value is revealed very late ⇒
● Many time steps are dependent
● Prediction is difficult
● You need a proper optimizer
● Needs a long-term A/B test
Short time horizon
Reward (objective function) value becomes clear immediately ⇒
● Far future is irrelevant
● Prediction is much easier
● You can even use an online optimizer
● Can be tested quickly
9.
PRICING STRATEGIES
Long time horizon
● Model-based reinforcement learning
○ Learn a market model from historic data, D
○ Learn an action policy: 𝞹(d|S)
○ Deterministic: pure optimization
○ Stochastic: exploration
Short time horizon
● Multi-armed bandit / online learning
○ Learn online or from historic data
○ Select an action/price
○ Observe its effect and update
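Sketch (not from the talk): the select / observe / update loop of a multi-armed bandit, using epsilon-greedy as one simple choice of algorithm. Each arm is a candidate price. The function name `run_bandit`, the noisy revenue simulator and all numbers are illustrative assumptions.

```python
import random


def run_bandit(prices, reward_fn, rounds, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over a fixed list of candidate prices.

    Returns (estimates, counts): the running mean reward and the
    number of pulls per price.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(prices)
    counts = [0] * len(prices)
    for t in range(rounds):
        if t < len(prices):
            arm = t                                  # try every price once
        elif rng.random() < epsilon:
            arm = rng.randrange(len(prices))         # explore
        else:
            arm = max(range(len(prices)), key=lambda i: estimates[i])  # exploit
        reward = reward_fn(prices[arm])              # observe the effect
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # update mean
    return estimates, counts


# Hypothetical market: linear demand plus Gaussian noise.
noise = random.Random(1)


def noisy_revenue(price):
    demand = max(0.0, 10.0 - price) + noise.gauss(0.0, 0.5)
    return price * demand


prices = [2.0, 4.0, 5.0, 6.0, 8.0]
estimates, counts = run_bandit(prices, noisy_revenue, rounds=2000)
for p, est, n in zip(prices, estimates, counts):
    print(f"price={p:4.1f}  mean revenue={est:6.2f}  pulls={n}")
```

In this simulated market the noise-free revenue peaks at a price of 5.0, so most pulls should end up near that arm. Because the reward is observed right after each action, no market model is needed, which is what makes this setup suited to the short-horizon case.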
10.
FORECASTING AT SCALE: CHALLENGES
● Many articles
○ Not sensible to have one model per article
● Frequent price updates
○ Forecasting and optimization need to be fast
● Many shops/markets
○ They might interact with each other
11.
FORECASTING AT SCALE: CHALLENGES
● Various stakeholders
○ Might need forecasts at different aggregation levels
○ Training samples are no longer independent
● Long forecasting horizons
○ Put a lot of pressure on the system, both computationally and accuracy-wise
○ Hard to evaluate
12.
TRADITIONAL FORECASTING METHODS
● ARIMA / Prophet: one demand forecast model per article
○ Model management is an issue
○ Remember: the learned model SHOULD be a function of price
Demand = f(price)
Bigger problem: short time series contain few price changes, so the estimate of demand as a function of price will be very noisy.
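One way to see why few price changes give a noisy estimate (an illustration under assumed conditions, not the talk's model): suppose demand q_t follows a log-linear curve in price p_t, fitted per article by ordinary least squares, with independent noise of variance σ². The variance of the estimated price elasticity β̂ is then:

```latex
\log q_t = \alpha + \beta \log p_t + \varepsilon_t,
\qquad
\operatorname{Var}(\hat{\beta})
  = \frac{\sigma^2}{\sum_{t=1}^{T} \left( \log p_t - \overline{\log p} \right)^2}
```

The denominator measures how much the price actually moved over the series. With only a few small price changes it stays close to zero, so the variance of β̂ becomes large however long the series is.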
13.
MODERN APPROACH
● One model shared among all articles
● Shop models can be shared or not, but they SHOULD communicate
● The model must be able to provide the gradient of its forecast with respect to price
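Sketch (not from the talk): why a gradient with respect to price is useful. The code implements a tiny forward-mode automatic differentiation with dual numbers in plain Python, then uses the gradient of forecast revenue to climb toward a better price. The exponential `forecast_demand` curve, the step size and the `Dual` class are illustrative assumptions. In practice a framework such as PyTorch or JAX would supply the gradient.

```python
import math


class Dual:
    """A value together with its derivative with respect to price."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def exp(x):
    x = Dual.lift(x)
    e = math.exp(x.value)
    return Dual(e, e * x.deriv)  # chain rule


def forecast_demand(price):
    """Hypothetical differentiable forecaster: demand decays with price."""
    return 200.0 * exp(-0.1 * price)


def revenue(price):
    return price * forecast_demand(price)


def revenue_and_gradient(price):
    out = revenue(Dual(price, 1.0))  # seed d(price)/d(price) = 1
    return out.value, out.deriv


# Gradient ascent on revenue, starting from an arbitrary price.
price = 5.0
for _ in range(500):
    _, grad = revenue_and_gradient(price)
    price += 0.05 * grad

print(round(price, 4))  # 10.0, the analytic optimum 1 / 0.1 for this curve
```

The optimizer never needs to know the shape of the demand model. It only needs the model to return a gradient, which is the property the slide asks for.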
15.
DIFFERENTIABLE PROGRAMMING
● Develop models in gradient-descent-based frameworks
○ PyTorch, Chainer, MXNet, TensorFlow, ...
○ Yes, these are neural network frameworks
○ No, not all models need to be neural-network based
23.
TRANSFORMER
"Attention Is All You Need", Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
● Replacement for LSTM
○ Extremely parallelizable: O(1) sequential operations with respect to sequence length
● A stack of feed-forward networks, which makes it much easier to counter overfitting
24.
TRANSFORMER
● Multi-head attention mechanism
○ Attention: every time point computes its similarity with every other time point in the sequence
○ The sequence is first projected into k different subspaces, and attention is then applied in each
● No hidden state: the encoded output sequence has the same length as the input sequence
● It encodes a set: without positional encodings, the order of the time points is ignored
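Sketch (not from the talk): a single attention head in plain Python, simplified by using identity projections instead of learned ones. A multi-head layer would first project the sequence into k subspaces and run this step in each. The code shows two properties from the slide: the output has the same length as the input, and reordering the input only reorders the output, so the sequence is treated as a set.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with identity projections.

    seq: list of time points, each a list of d floats.
    Returns a list of the same length. Each output is a weighted average
    of all time points, weighted by similarity to the current one.
    """
    d = len(seq[0])
    out = []
    for query in seq:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in seq]
        weights = softmax(scores)
        out.append([sum(w * value[j] for w, value in zip(weights, seq))
                    for j in range(d)])
    return out


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded = self_attention(seq)

# Same time points in a different order.
shuffled = self_attention([seq[2], seq[0], seq[1]])

print(len(encoded) == len(seq))  # True: no fixed-size hidden state
```

Every output position is computed independently of the others, which is what allows the whole sequence to be processed in parallel.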
25.
Infrastructure
● Low-latency data processing pipeline
● Multi-GPU training
● SageMaker, Kubernetes, Databricks for large-scale analysis
● One-click generation of training data with new features