Efficient Counterfactual Learning from
Bandit Feedback
Narita, Yasui and Yata (2018)
Presenter: Masahiro Kato
October 6th 2018 @Ichimura Seminar
April 11th 2020 @Bread Seminar
Purpose and Contribution
This paper looks for the most statistically efficient (i.e., lowest asymptotic
variance with no asymptotic bias) way to do off-line policy evaluation and
optimization with log data of bandit feedback.
Contribution:
• Proposed an efficient estimator for off-line evaluation.
• Conducted experiments using real data.
[Diagram: the logging policy observes each user's context $X_t$ and chooses a treatment with probability $p$. We consider evaluating a policy $\pi$ using a pool of such logged data; $p$ does not have to have any relationship with $\pi$.]
Experimental Result
[Image-only slide: experimental results.]
Problem Setting
We consider a general multi-armed contextual bandit setting.
• There is a set of $m + 1$ actions (equivalently, arms or treatments), $A = \{0, \dots, m\}$,
that the decision maker can choose from.
• Let $Y(\cdot): A \to \mathbb{R}$ denote a potential reward function that maps actions into
rewards or outcomes, where $Y(a)$ is the reward when action $a$ is chosen (e.g.,
whether an advertisement as an action results in a click).
• Let $X$ denote context or covariates (e.g., the user's demographic profile and
browsing history) that the decision maker observes when picking an action. We
denote the set of contexts by $\mathcal{X}$.
• We think of $(Y(\cdot), X)$ as a random vector with unknown distribution $G$.
Data Generating Process
We consider log data coming from the following data generating process (DGP),
which is similar to those used in the literature on the offline evaluation of
contextual bandit algorithms (Li et al. 2010; Strehl et al. 2010; Li et al. 2011; Li
et al. 2012; Swaminathan and Joachims 2015a; Swaminathan and Joachims
2015b; Swaminathan et al. 2017).
We observe data $\{(Y_t, X_t, D_t)\}_{t=1}^{T}$ with $T$ observations. $D_t \equiv (D_{t0}, \dots, D_{tm})'$,
where $D_{ta}$ is a binary variable indicating whether action $a$ is chosen in round $t$.
• $Y_t$ denotes the reward observed in round $t$, i.e., $Y_t \equiv \sum_{a=0}^{m} D_{ta} Y_t(a)$.
• $X_t$ denotes the context observed in round $t$.
A key feature of our DGP is that the data $\{(Y_t, X_t, D_t)\}_{t=1}^{T}$ are divided into $B$
batches, where different batches may use different choice probabilities
(propensity scores).
Let $X_t^b \in \{1, 2, \dots, B\}$ denote a random variable indicating the batch to which
round $t$ belongs.
We treat this batch number as one of the context variables and write $X_t = (\tilde{X}_t, X_t^b)$,
where $\tilde{X}_t$ is the vector of context variables other than $X_t^b$.
  round t:      1     2     3     4     5     …    T−2      T−1      T
  batch X_t^b:  1     1     2     2     2     …    B−1      B        B
  context X̃_t:  X̃_1   X̃_2   X̃_3   X̃_4   X̃_5   …    X̃_{T−2}  X̃_{T−1}  X̃_T
  action D_t:   1     m     ?     ?     ?     …    ?        ?        ?
  reward Y_t:   1     0     ?     ?     ?     …    ?        ?        ?
(Note: the batch structure does not appear in the main algorithm.)
Let $p_t = (p_{t0}, \dots, p_{tm})' \in \Delta(A)$ denote the potentially unknown probability
vector indicating the probability that each action is chosen in round $t$.
• Here $\Delta(A) \equiv \{(p_a) \in \mathbb{R}_+^{m+1} \mid \sum_a p_a = 1\}$ with $p_a$ being the probability that
action $a$ is chosen.
A contextual bandit algorithm is a sequence $\{F_b\}_{b=1}^{B}$ of distribution functions of
choice probabilities $p_t$ conditional on $X_t$, where $F_b: \mathcal{X} \to \Delta(\Delta(A))$ for
$b \in \{1, 2, \dots, B\}$, $\mathcal{X}$ is the support of $X_t$, and $\Delta(\Delta(A))$ is the set of
distributions over $\Delta(A)$.
• $F_b$ takes context $X_t$ as input and returns a distribution of the probability vector $p_t$
in rounds of batch $b$.
• $F_b$ can vary across batches but does not change across rounds within batch $b$.
DGP and Contextual Bandit
We assume that the log data are generated by a contextual bandit algorithm
$\{F_b\}_{b=1}^{B}$ as follows:
• In each round $t = 1, \dots, T$, $(Y_t(\cdot), X_t)$ is i.i.d. drawn from distribution $G$. Re-order
round numbers so that they are monotonically increasing in their batch numbers $X_t^b$.
• In each round $t$ within batch $b \in \{1, 2, \dots, B\}$ and given $\tilde{X}_t$, the probability vector
$p_t = (p_{t0}, \dots, p_{tm})'$ is drawn from $F_b(\cdot \mid \tilde{X}_t)$. An action is randomly chosen
based on probability vector $p_t$, creating the action choice $D_t$ and the associated reward $Y_t$.
Data Generating Process
To make the notation simpler, we put $\{F_b\}_{b=1}^{B}$ together into a single distribution
$F: \mathcal{X} \to \Delta(\Delta(\mathcal{A}))$ obtained by $F(\cdot \mid \tilde{X}, X^b = b) = F_b(\cdot \mid \tilde{X})$
for each $b \in \{1, 2, \dots, B\}$.
We use this to rewrite our DGP as follows:
• In each round $t = 1, \dots, T$, $(Y_t(\cdot), X_t)$ is i.i.d. drawn from distribution $G$. Given
$X_t$, the probability vector $p_t = (p_{t0}, \dots, p_{tm})'$ is drawn from $F(\cdot \mid X_t)$. An action
is randomly chosen based on probability vector $p_t$, creating the action choice $D_t$ and
the associated reward $Y_t$.
Define
$$p_{0a}(x) \equiv \Pr_{D \sim p,\, p \sim F}(D_a = 1 \mid X = x)$$
for each $a$, and let $p_0(x) \equiv (p_{00}(x), \dots, p_{0m}(x))'$.
This is the choice probability vector conditional on each context.
(Note: the subscript 0 is a bit confusing, but this is the true probability that was actually
used when assigning a treatment, i.e., when pulling an arm in bandit terms.)
Data Generating Process
$F$ is common for all rounds regardless of the batch to which they belong.
(? Doesn't it change from batch to batch?)
• Thus $p_t$ and $D_t$ are i.i.d. across rounds.
Because $(Y_t(\cdot), X_t)$ is i.i.d. and $Y_t = \sum_{a=0}^{m} D_{ta} Y_t(a)$, each observation
$(Y_t, X_t, D_t)$ is i.i.d.
Note also that $D_t$ is independent of $Y_t(\cdot)$ conditional on $X_t$.
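To fix ideas, here is a minimal NumPy simulation of this DGP; the context distribution, logging probabilities, and reward model are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T, m = 10_000, 2                       # rounds; actions A = {0, ..., m}

# Context X_t drawn i.i.d. from G (here: one binary covariate).
X = rng.integers(0, 2, size=T)

# Logging policy p_0(x): context-dependent choice probabilities.
p0 = np.where(X[:, None] == 1, [0.2, 0.3, 0.5], [0.5, 0.3, 0.2])

# Draw the action D_t ~ p_t and one-hot encode it.
actions = np.array([rng.choice(m + 1, p=p) for p in p0])
D = np.eye(m + 1)[actions]

# Potential rewards Y_t(a); the observed reward is Y_t = sum_a D_ta Y_t(a).
Y_pot = X[:, None] + np.arange(m + 1) + rng.normal(size=(T, m + 1))
Y = (D * Y_pot).sum(axis=1)
```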
Parameters of Interest
We are interested in using the log data to estimate the expected reward from any
given counterfactual policy $\pi: \mathcal{X} \to \Delta(\mathcal{A})$:
$$V^\pi \equiv \mathbb{E}_{(Y(\cdot),X) \sim G}\left[\sum_{a=0}^{m} Y(a)\, \pi(a \mid X)\right]
= \mathbb{E}_{(Y(\cdot),X) \sim G,\, D \sim p_0(X)}\left[\sum_{a=0}^{m}
\frac{Y(a)\, D_a\, \pi(a \mid X)}{p_{0a}(X)}\right], \tag{1}$$
where we use the independence of $D$ and $Y(\cdot)$ conditional on $X$ and the
definition of $p_0(\cdot)$ for the last equality.
• Here, $\Delta(\mathcal{A}) \equiv \{(p_a) \in \mathbb{R}^{m+1} \mid \sum_a p_a \le 1\}$ (a distribution of actions).
We allow the counterfactual policy $\pi$ to be degenerate, i.e., $\pi$ may choose a
particular action with probability 1.
(Recall $p_{0a}(x) \equiv \Pr_{D \sim p,\, p \sim F}(D_a = 1 \mid X = x)$ and
$p_0(x) \equiv (p_{00}(x), \dots, p_{0m}(x))'$.)
Depending on the choice of $\pi$, $V^\pi$ represents a variety of parameters of interest.
When we set $\pi(a \mid x) = 1$ for a particular action $a$ and $\pi(a' \mid x) = 0$ for all
$a' \in \mathcal{A} \setminus \{a\}$ and all $x \in \mathcal{X}$, $V^\pi$ equals
$\mathbb{E}_{(Y(\cdot),X) \sim G}[Y(a)]$, the expected reward from action $a$.
? When we set $\pi(a \mid x) = 1$, $\pi(0 \mid x) = -1$ and $\pi(a' \mid x) = 0$ for all
$a' \in \mathcal{A} \setminus \{0, a\}$ and all $x \in \mathcal{X}$, $V^\pi$ equals
$\mathbb{E}_{(Y(\cdot),X) \sim G}[Y(a) - Y(0)]$, the average treatment effect of action $a$
over action 0.
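As a small illustration, these two special cases can be encoded as weight vectors over actions; the shapes and names below are illustrative, not from the slides:

```python
import numpy as np

m, a = 2, 1                  # actions {0, 1, 2}; action of interest a = 1

# Expected reward from action a: pi puts probability 1 on a for every x.
pi_reward = np.zeros(m + 1)
pi_reward[a] = 1.0

# Average treatment effect of a over 0: signed weights with sum <= 1.
pi_ate = np.zeros(m + 1)
pi_ate[a], pi_ate[0] = 1.0, -1.0
```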
Efficient Value Estimation
We consider the efficient estimation of the expected reward from a
counterfactual policy, 𝑉 𝜋.
This scalar parameter is defined as a function of the distribution of (𝑌 (⋅), 𝑋), on
which we impose no parametric assumption.
This is therefore a semiparametric estimation problem. For this problem, we first
derive the semiparametric efficiency bound, a bound on how precisely the parameter
can be estimated, which is a semiparametric analog of the Cramér-Rao bound
(Bickel et al. 1993).
Semiparametric Efficiency
The asymptotic variance of any $\sqrt{T}$-consistent and asymptotically normal
estimator (i.e., any consistent estimator with a convergence rate $1/\sqrt{T}$ whose
limiting distribution is normal) is no smaller than the semiparametric efficiency
bound.
We then show certain estimators achieve the semiparametric efficiency bound,
i.e., they minimize the asymptotic variance among all $\sqrt{T}$-consistent and
asymptotically normal estimators.
Assumptions
A couple of regularity conditions are used for the analysis.

Assumption 1. There exists some $p$ such that
$0 < p \le \Pr_{D \sim p,\, p \sim F}(D_a = 1 \mid X = x) \equiv p_{0a}(x)$ for any
$x \in \mathcal{X}$ and for $a = 0, \dots, m$.
(The logging policy $p_0(\cdot)$ ex ante chooses every action with a positive probability
for every context.)

Assumption 2. $\mathbb{E}[Y(a)^2] < \infty$ for $a = 0, \dots, m$.
(The potential rewards have finite second moments.)
Theoretical Results
The following lemma provides the semiparametric efficiency bound for $V^\pi$.

Lemma 1 (Semiparametric Efficiency Bound). Under Assumptions 1 and 2, the
semiparametric efficiency bound for $V^\pi$, the expected reward from counterfactual
policy $\pi$, is
$$\mathbb{E}\left[\sum_{a=0}^{m} \mathbb{V}[Y(a) \mid X]\,
\frac{\pi(a \mid X)^2}{p_{0a}(X)} + \left(\theta(X) - V^\pi\right)^2\right],$$
where $\theta(X) = \sum_{a=0}^{m} \mathbb{E}[Y(a) \mid X]\, \pi(a \mid X)$ is the expected
reward from policy $\pi$ conditional on $X$.
Theoretical Results
Lemma 1 also implies the semiparametric efficiency bounds for the expected
reward from each action and for the average treatment effect, since they are
special cases of $V^\pi$.

Corollary 1. Suppose that Assumptions 1 and 2 hold. Then, the semiparametric
efficiency bound for the expected reward from each action, $\mathbb{E}[Y(a)]$, is
$$\mathbb{E}\left[\frac{\mathbb{V}[Y(a) \mid X]}{p_{0a}(X)}
+ \left(\mathbb{E}[Y(a) \mid X] - \mathbb{E}[Y(a)]\right)^2\right].$$
The semiparametric efficiency bound for the average treatment effect,
$\mathbb{E}[Y(a) - Y(0)]$, is
$$\mathbb{E}\left[\frac{\mathbb{V}[Y(a) \mid X]}{p_{0a}(X)}
+ \frac{\mathbb{V}[Y(0) \mid X]}{p_{00}(X)}
+ \left(\mathbb{E}[Y(a) - Y(0) \mid X] - \mathbb{E}[Y(a) - Y(0)]\right)^2\right],$$
which follows from Lemma 1 with $\pi(a \mid x) = 1$ and $\pi(0 \mid x) = -1$.
Theoretical Results
The proposed estimator consists of two steps:
1. Nonparametrically estimate the propensity score vector $p_0(\cdot)$ by a consistent
estimator $\hat{p}(\cdot)$.
2. Plug the estimated propensity $\hat{p}(\cdot)$ into the sample analogue of expression (1)
to estimate $V^\pi$:
$$\hat{V}^\pi = \frac{1}{T} \sum_{t=1}^{T} \sum_{a=0}^{m}
\frac{Y_t D_{ta}\, \pi(a \mid X_t)}{\hat{p}_a(X_t)}.$$
Alternatively, one can use a "self-normalized" estimator inspired by Swaminathan
and Joachims (2015b) when $\sum_{a=0}^{m} \pi(a \mid x) = 1$ for all $x \in \mathcal{X}$:
$$\hat{V}_{SN}^\pi = \frac{\frac{1}{T} \sum_{t=1}^{T} \sum_{a=0}^{m}
\frac{Y_t D_{ta}\, \pi(a \mid X_t)}{\hat{p}_a(X_t)}}
{\frac{1}{T} \sum_{t=1}^{T} \sum_{a=0}^{m}
\frac{D_{ta}\, \pi(a \mid X_t)}{\hat{p}_a(X_t)}}.$$
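A minimal NumPy sketch of these two plug-in estimators; the array shapes and names are illustrative assumptions, matching the simulated data sketched earlier:

```python
import numpy as np

def ipw_value(Y, D, pi, p_hat, self_normalize=False):
    """Estimate V^pi from logged bandit data.

    Y:     (T,) observed rewards
    D:     (T, m+1) one-hot action indicators D_ta
    pi:    (T, m+1) target policy probabilities pi(a | X_t)
    p_hat: (T, m+1) estimated propensities p_hat_a(X_t)
    """
    # Weights sum_a D_ta pi(a|X_t) / p_hat_a(X_t); D_t is one-hot,
    # so only the logged action contributes to the inner sum.
    w = (D * pi / p_hat).sum(axis=1)
    if self_normalize:
        return (Y * w).mean() / w.mean()   # self-normalized V_SN^pi
    return (Y * w).mean()                  # plain plug-in V^pi
```

For instance, with the earlier simulated data and `pi_reward` tiled to shape `(T, m+1)`, a call like `ipw_value(Y, D, np.tile(pi_reward, (T, 1)), p_hat)` evaluates the "always choose action $a$" policy, once `p_hat` has been estimated as in the propensity-score slides below.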
Theoretical Results
Swaminathan and Joachims (2015b) suggest that $\hat{V}_{SN}^\pi$ tends to be less biased
than $\hat{V}^\pi$ in small samples.
Unlike Swaminathan and Joachims (2015b), however, we use the estimated
propensity score rather than the true one. We find that these estimators are
efficient, building upon Chen, Hong, and Tarozzi (2008) and Ackerberg et al.
(2014) among others.

Theorem 1 (Efficient Estimators). Suppose that Assumptions 1 and 2 hold and
that $\hat{p}(\cdot)$ is a consistent estimator for $p_0(\cdot)$. Then, the asymptotic
variance of $\hat{V}^\pi$ and $\hat{V}_{SN}^\pi$ achieves the semiparametric
efficiency bound for $V^\pi$.
Inefficient Value Estimation
In some environments, we know the true $p_0(\cdot)$ or observe the realization of the
probability vectors $\{p_t\}_{t=1}^{T}$.
In this case, an alternative way to estimate $V^\pi$ is to use the sample analogue of
expression (1) without estimating the propensity score.
If we know $p_0(\cdot)$, a possible estimator is
$$\tilde{V}^\pi = \frac{1}{T} \sum_{t=1}^{T} \sum_{a=0}^{m}
\frac{Y_t D_{ta}\, \pi(a \mid X_t)}{p_{0a}(X_t)}.$$
If we observe the realization of $\{p_t\}_{t=1}^{T}$, we may use
$$\check{V}^\pi = \frac{1}{T} \sum_{t=1}^{T} \sum_{a=0}^{m}
\frac{Y_t D_{ta}\, \pi(a \mid X_t)}{p_{ta}}.$$
(Here $\tilde{V}^\pi$ uses the known $p_0(\cdot)$ and $\check{V}^\pi$ uses the realized
$p_t$, while $\hat{V}^\pi$ uses the estimated $\hat{p}(\cdot)$.)
When $\sum_{a=0}^{m} \pi(a \mid x) = 1$ for all $x \in \mathcal{X}$, it is again possible to use
their self-normalized versions:
$$\tilde{V}_{SN}^\pi = \frac{\frac{1}{T} \sum_{t=1}^{T} \sum_{a=0}^{m}
\frac{Y_t D_{ta}\, \pi(a \mid X_t)}{p_{0a}(X_t)}}
{\frac{1}{T} \sum_{t=1}^{T} \sum_{a=0}^{m}
\frac{D_{ta}\, \pi(a \mid X_t)}{p_{0a}(X_t)}}, \qquad
\check{V}_{SN}^\pi = \frac{\frac{1}{T} \sum_{t=1}^{T} \sum_{a=0}^{m}
\frac{Y_t D_{ta}\, \pi(a \mid X_t)}{p_{ta}}}
{\frac{1}{T} \sum_{t=1}^{T} \sum_{a=0}^{m}
\frac{D_{ta}\, \pi(a \mid X_t)}{p_{ta}}}.$$
Theoretical Results
These intuitive estimators turn out to be less efficient than the estimators with
the estimated propensity score, as the following result shows.

Theorem 2 (Inefficient Estimators). Suppose that the propensity score $p_0(\cdot)$ is
known and we observe the realization of $\{p_t\}_{t=1}^{T}$. Suppose also that
Assumptions 1 and 2 hold and that $\hat{p}(\cdot)$ is a consistent estimator for
$p_0(\cdot)$. Then, the asymptotic variances of $\tilde{V}^\pi$, $\check{V}^\pi$,
$\tilde{V}_{SN}^\pi$ and $\check{V}_{SN}^\pi$ are no smaller than those of
$\hat{V}^\pi$ and $\hat{V}_{SN}^\pi$. Generically, $\tilde{V}^\pi$, $\check{V}^\pi$,
$\tilde{V}_{SN}^\pi$ and $\check{V}_{SN}^\pi$ are strictly less efficient than
$\hat{V}^\pi$ and $\hat{V}_{SN}^\pi$ in the following sense.
Theoretical Results

Theorem 2 (Inefficient Estimators), continued.
1. Suppose at least one of the following holds:
a. $\Pr\left(\mathbb{E}[Y(a) \mid X]\, \frac{\pi(a \mid X)}{p_{0a}(X)} \ne \theta(X)\right) > 0$
for some $a$, or
b. $\Pr\left(\theta(X) \ne V^\pi\right) > 0$.
Then the asymptotic variances of $\tilde{V}^\pi$, $\check{V}^\pi$, $\tilde{V}_{SN}^\pi$
and $\check{V}_{SN}^\pi$ are strictly larger than those of $\hat{V}^\pi$ and $\hat{V}_{SN}^\pi$.
2. If $\Pr\left(\mathbb{E}[Y(a)^2 \mid X]\, \pi(a \mid X)^2\,
\mathbb{E}\left[\frac{1}{p_a} - \frac{1}{p_{0a}(X)} \,\middle|\, X\right] \ne 0\right) > 0$
for some $a$, then the asymptotic variance of $\check{V}^\pi$ and $\check{V}_{SN}^\pi$ is
strictly larger than that of $\tilde{V}^\pi$ and $\tilde{V}_{SN}^\pi$.
(Recall $\theta(X) = \sum_{a=0}^{m} \mathbb{E}[Y(a) \mid X]\, \pi(a \mid X)$.)
Theorems 1 and 2 suggest that we should estimate the propensity score and use
the estimated score regardless of whether the propensity score is known.
Intuition for Theorems 1 and 2
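A small Monte Carlo sketch (my own illustrative construction, not from the paper) shows the phenomenon behind Theorems 1 and 2: even when the logging probability is known, re-weighting by within-context empirical action frequencies typically gives a smaller spread than weighting by the true probability:

```python
import numpy as np

rng = np.random.default_rng(1)
T, reps = 2_000, 500
p_true = 0.5                      # known P(action 1 | x) under logging
est_known, est_fitted = [], []

for _ in range(reps):
    X = rng.integers(0, 2, size=T)            # binary context
    D1 = rng.random(T) < p_true               # indicator of action 1
    Y = np.where(D1, 1.0 + X, 0.5) + rng.normal(0.0, 1.0, T)

    # Evaluate the policy "always choose action 1", i.e. pi(1|x) = 1.
    w_known = D1 / p_true                     # weights with the true p_0
    p_hat = np.array([D1[X == x].mean() for x in (0, 1)])[X]
    w_fitted = D1 / p_hat                     # weights with estimated p_0

    est_known.append((Y * w_known).mean())
    est_fitted.append((Y * w_fitted).mean())

print("sd, true propensity:     ", np.std(est_known))
print("sd, estimated propensity:", np.std(est_fitted))   # typically smaller
```

With a discrete context, the estimated-propensity estimator reduces to a stratified mean of treated outcomes, which removes the binomial noise in the number of treated units; this is exactly the kind of variance reduction condition 1a of Theorem 2 captures.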
How to Estimate Propensity Scores?
1. A sieve Least Squares (LS) estimator:
$$\hat{p}_a(\cdot) = \arg\min_{p_a(\cdot) \in \mathcal{H}_{aT}}
\frac{1}{T} \sum_{t=1}^{T} \left(D_{ta} - p_a(X_t)\right)^2,$$
where $\mathcal{H}_{aT} = \left\{ p_a(x) = \sum_{j=1}^{k_{aT}} q_{aj}(x)\, \lambda_{aj}
= q^{k_{aT}}(x)' \lambda_a \right\}$ and $k_{aT} \to \infty$ as $T \to \infty$.
Here $\{q_{aj}\}_{j=1}^{\infty}$ are some known basis functions defined on $\mathcal{X}$
and $q^{k_{aT}}(\cdot) = (q_{a1}(\cdot), \dots, q_{a k_{aT}}(\cdot))'$.
• What is a "sieve"? A sequence of finite-dimensional function classes that grows with
the sample size and thereby approximates a nonparametric function space.
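A minimal sketch of this estimator, with a polynomial basis as the assumed choice of $q_{aj}$; the basis, the truncation $k$, and the clipping are illustrative:

```python
import numpy as np

def sieve_ls_propensity(X, D, k):
    """Least-squares propensity estimates on a polynomial sieve.

    X: (T,) scalar context; D: (T, m+1) one-hot actions;
    k: number of basis terms (k grows with T in the theory).
    """
    Q = np.column_stack([X**j for j in range(k)])   # basis q^k(x)
    # One least-squares fit of D_ta on the basis per action a
    # (lstsq handles all m+1 columns of D at once).
    lam, *_ = np.linalg.lstsq(Q, D, rcond=None)
    p_hat = Q @ lam
    # LS fits need not be probabilities; clip away from 0 (Assumption 1).
    return np.clip(p_hat, 1e-3, 1.0)
```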
How to Estimate Propensity Scores?
2. A sieve Logit Maximum Likelihood estimator:
$$\hat{p}(\cdot) = \arg\max_{p(\cdot) \in \mathcal{H}_T}
\log \prod_{t=1}^{T} \prod_{a=0}^{m} p_a(X_t)^{D_{ta}}
= \arg\max_{p(\cdot) \in \mathcal{H}_T}
\frac{1}{T} \sum_{t=1}^{T} \sum_{a=0}^{m} D_{ta} \log p_a(X_t),$$
where $\mathcal{H}_T$ is a sieve of multinomial logit form, e.g.,
$\mathcal{H}_T = \left\{ p(x) : p_a(x) = \frac{\exp(q^{k_T}(x)' \lambda_a)}
{\sum_{a'=0}^{m} \exp(q^{k_T}(x)' \lambda_{a'})} \right\}$
with $k_T \to \infty$ as $T \to \infty$.
3. Modern Machine Learning Algorithms.
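For option 2 or 3 in practice, an off-the-shelf multinomial logistic regression on basis-expanded features can serve as the propensity model; this sketch assumes scikit-learn and a polynomial basis:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

def logit_propensity(X, actions, degree=3):
    """Multinomial-logit propensity estimates on a polynomial basis.

    X: (T,) scalar context; actions: (T,) chosen labels in {0, ..., m}.
    Returns a (T, m+1) array of estimated probabilities p_hat_a(X_t).
    """
    Q = PolynomialFeatures(degree).fit_transform(X.reshape(-1, 1))
    model = LogisticRegression(max_iter=1000).fit(Q, actions)
    # Columns follow model.classes_; with labels 0..m this is p_hat_a.
    return model.predict_proba(Q)
```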
Estimating Asymptotic Variance
We can also estimate the asymptotic variance of $\hat{V}^\pi$.
However, we additionally have to estimate $\mu_0(a \mid x)$ (the conditional mean
reward $\mathbb{E}[Y(a) \mid X = x]$).
In the experiments, the outcome is a click rate, so the authors use ridge regression
to estimate $\mu_0(a \mid x)$.
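One concrete way to do this, sketched under the assumption that the variance is estimated by the empirical second moment of the estimated efficient influence function from Lemma 1 (here `mu_hat` would come from, e.g., per-action ridge regressions of $Y$ on the context):

```python
import numpy as np

def asymptotic_variance(Y, D, pi, p_hat, mu_hat, V_hat):
    """Plug-in variance estimate from the estimated influence function.

    mu_hat: (T, m+1) estimates of mu_0(a | X_t) = E[Y(a) | X_t];
    V_hat:  scalar estimate of V^pi (e.g., the IPW estimate above).
    """
    theta_hat = (mu_hat * pi).sum(axis=1)            # estimated theta(X_t)
    resid = (D * pi / p_hat * (Y[:, None] - mu_hat)).sum(axis=1)
    psi = resid + theta_hat - V_hat                  # influence function
    return (psi ** 2).mean()     # asymptotic variance of sqrt(T)(V_hat - V)
```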
Real-World Applications
[Image-only slide: real-world experimental results.]
Conclusion
[Image-only slide.]