Melding the Data-Decision
Pipeline: Decision-Focused
Learning for Combinatorial
Optimization
Bryan Wilder, Bistra Dilkina and Milind Tambe
University of Southern California
AAAI 2019
Abstract
• Introduce a general framework for decision-focused learning, where
the machine learning model is directly trained in conjunction with the
optimization algorithm.
• Instantiate the framework for two broad classes of combinatorial
problems: linear programming and submodular maximization.
• Experiments show that the proposed method outperforms the traditional
two-stage method in terms of solution quality.
Introduction
• Machine learning: use data to predict unknown quantities with the
help of a loss function.
• Optimization algorithm: use the predictions to arrive at a decision
that maximizes some objective.
• Training the model entirely separately from the optimization may
result in bad decisions.
• Focusing on combinatorial optimization, the authors propose a
decision-focused learning framework that integrates the prediction and
optimization steps.
Background
Matrix Calculus
• scalar by vector
• vector by scalar
• vector by vector
Implicit differentiation
• Example:
• We want to find the slope of the tangent line to the circle $x^2 + y^2 = 25$ at the point $(3, -4)$.
• One way is to solve for $y$ explicitly:
• $y = -\sqrt{25 - x^2}$ (the point $(3, -4)$ lies on the bottom semicircle)
• $\Rightarrow y' = -\frac{1}{2}(25 - x^2)^{-\frac{1}{2}} \cdot (-2x) = \frac{x}{\sqrt{25 - x^2}}$
• $m = y' = \frac{3}{\sqrt{25 - 3^2}} = \frac{3}{4}$
Source: https://www.math.ucdavis.edu/~kouba/CalcOneDIRECTORY/implicitdiffdirectory/ImplicitDiff.html
Implicit differentiation (cont’d)
• However, not every function can be written explicitly as a function of
another variable.
• In implicit differentiation, we differentiate each side of an equation in
two variables while treating one variable as a function of the other.
• Using implicit differentiation, we treat $y$ as an implicit function of $x$:
• $x^2 + y^2 = 25$
• $\Rightarrow 2x + 2y \frac{dy}{dx} = 0$
• $\Rightarrow y' = \frac{dy}{dx} = \frac{-2x}{2y} = \frac{-x}{y}$
• $m = y' = \frac{-x}{y} = \frac{-3}{-4} = \frac{3}{4}$
Source: https://www.khanacademy.org/math/ap-calculus-ab/ab-differentiation-2-new/ab-3-2/a/implicit-differentiation-review
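The same derivative can be checked mechanically. A minimal sympy sketch (sympy's `idiff` performs exactly this implicit differentiation; the instance is the circle above):

```python
import sympy as sp

x, y = sp.symbols('x y')
eq = x**2 + y**2 - 25           # circle written as eq = 0

# idiff differentiates eq = 0 implicitly, treating y as a function of x
dydx = sp.idiff(eq, y, x)       # -> -x/y
slope = dydx.subs({x: 3, y: -4})
print(dydx, slope)              # -x/y  3/4
```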
Lagrange Multiplier
• Consider the optimization problem
$\max f(x, y)$ subject to $g(x, y) = 0$
• Observing the graph: at a constrained optimum, the contour of $f$ is
tangent to the constraint curve, so the gradients are parallel:
$\nabla_{x,y} f(x, y) = -\lambda \nabla_{x,y} g(x, y), \quad g(x, y) = 0 \qquad (1)$
where $\nabla_{x,y} f(x, y) = \left( \frac{\partial f(x,y)}{\partial x}, \frac{\partial f(x,y)}{\partial y} \right)^T$
• Let $\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda g(x, y)$
• Solving $\nabla_{x,y,\lambda} \mathcal{L}(x, y, \lambda) = \mathbf{0}$ is equivalent to solving equation (1)
blue: contours of $f(x, y)$ with $d_1 > d_2 > d_3$
red: constraint $g(x, y) = c$
Source: https://en.m.wikipedia.org/wiki/Lagrange_multiplier
Lagrange Multiplier (cont’d)
• Generalize to $n$ variables:
• $\mathbf{x} = (x_1, x_2, \cdots, x_n)^T$
• Solve $\nabla_{x_1, \ldots, x_n, \lambda} \mathcal{L}(x_1, \ldots, x_n, \lambda) = \mathbf{0}$
• Generalize to $M$ constraints:
• $\mathcal{L}(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_M) = f(x_1, \ldots, x_n) + \sum_{k=1}^{M} \lambda_k g_k(x_1, \ldots, x_n)$
• Solve $\nabla_{x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_M} \mathcal{L}(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_M) = \mathbf{0}$
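As a concrete check of this stationarity system, here is a minimal sympy sketch on a toy instance of ours (maximize $f(x, y) = xy$ subject to $g(x, y) = x + y - 1 = 0$; not an example from the slides):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam')
f = x * y
g = x + y - 1                   # constraint g(x, y) = 0
L = f + lam * g                 # Lagrangian

# Solve grad_{x, y, lam} L = 0
sol = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sol)                      # [{x: 1/2, y: 1/2, lam: -1/2}]
```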
KKT condition
• Consider the optimization problem
$\max f(\mathbf{x})$
subject to
$g_i(\mathbf{x}) \le 0$ for $i = 1, \ldots, m$,
$h_j(\mathbf{x}) = 0$ for $j = 1, \ldots, l$.
• If $\mathbf{x}^*$ is a local optimum, then there exist $\mu_i$ ($i = 1, \ldots, m$) and $\lambda_j$ ($j = 1, \ldots, l$) such that
• Stationarity
$\nabla f(\mathbf{x}^*) = \sum_{i=1}^{m} \mu_i \nabla g_i(\mathbf{x}^*) + \sum_{j=1}^{l} \lambda_j \nabla h_j(\mathbf{x}^*)$
• Primal feasibility
$g_i(\mathbf{x}^*) \le 0$, for $i = 1, \ldots, m$
$h_j(\mathbf{x}^*) = 0$, for $j = 1, \ldots, l$
• Dual feasibility
$\mu_i \ge 0$, for $i = 1, \ldots, m$
• Complementary slackness
$\mu_i g_i(\mathbf{x}^*) = 0$, for $i = 1, \ldots, m$
Source: https://en.m.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions
Source: https://www.cs.cmu.edu/~ggordon/10725-F12/slides/16-kkt.pdf
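A quick sanity check on a toy instance of ours (not from the slides): maximize $f(x) = -(x - 2)^2$ subject to $g(x) = x - 1 \le 0$. The optimum is $x^* = 1$ with the constraint active. Stationarity gives $\nabla f(x^*) = 2 = \mu \nabla g(x^*) = \mu$, so $\mu = 2 \ge 0$ satisfies dual feasibility, and $\mu \, g(x^*) = 2 \cdot 0 = 0$ satisfies complementary slackness; all four conditions hold.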
Linear programming relaxation
• Example:
• In a 0-1 integer program, all variables are constrained to
$x_i \in \{0, 1\}$
• After the relaxation,
$x_i \in [0, 1]$
• The relaxation transforms an NP-hard optimization problem into a
problem that can be solved in polynomial time.
Source: https://en.wikipedia.org/wiki/Linear_programming_relaxation
Source: https://en.wikipedia.org/wiki/Convex_hull
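A minimal sketch of such a relaxation on a toy 0-1 knapsack of ours (scipy's `linprog` minimizes, hence the negated objective):

```python
from scipy.optimize import linprog

# Relaxed 0-1 knapsack: max 5*x1 + 4*x2  s.t.  2*x1 + 3*x2 <= 4,  0 <= xi <= 1
res = linprog(c=[-5, -4], A_ub=[[2, 3]], b_ub=[4], bounds=[(0, 1), (0, 1)])
print(res.x)  # [1.0, 0.667]: a fractional optimum, an upper bound on the 0-1 optimum
```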
Method
Problem description
• Consider a combinatorial optimization problem
$\max_{x \in \mathcal{X}} f(x, \theta)$
where $\mathcal{X}$ is a discrete set containing all feasible decisions.
• Without loss of generality, $\mathcal{X} \subseteq \{0,1\}^n$, so $x$ is a binary decision vector.
• The objective $f$ depends on a parameter $\theta \in \Theta$, which is unknown and
must be inferred from data.
• We observe a feature vector $y \in \mathcal{Y}$ which is correlated with $\theta$.
• Let $m: \mathcal{Y} \mapsto \Theta$ denote a model mapping observed features to
parameters.
Problem description (cont’d)
• Use the training data $(y_1, \theta_1), \ldots, (y_N, \theta_N)$ drawn from a distribution $P$ to fit the
model $m$ (in a supervised manner).
• Define $x^*(\theta) = \arg\max_{x \in \mathcal{X}} f(x, \theta)$ to be the optimal $x$ for a given $\theta$.
• Objective:
$\max \mathbb{E}_{y, \theta \sim P}[f(x^*(m(y)), \theta)]$
• Example:
• $y$: user ratings of movies
• $\theta$: movie-actor assignments
• Predict which actors are associated with each movie.
• Classical solution (two-stage method, sketched below):
1. Learn a model $m$ by minimizing a standard prediction loss:
$\min_{\omega} \mathbb{E}_{y, \theta \sim P}[\mathcal{L}(\theta, m(y, \omega))]$
2. Use the learned model's predictions to solve the optimization problem.
• Possible cons:
• The loss function does not consider how $\omega$ will affect the decision making.
• Is it possible to do better?
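A minimal PyTorch sketch of the two-stage baseline (the model, optimizer, and data interface are assumptions for illustration, not the paper's code):

```python
import torch

def two_stage_step(model, optimizer, y, theta_true):
    """Stage 1: fit m(y, omega) with a pure prediction loss."""
    theta_hat = model(y)
    loss = torch.nn.functional.mse_loss(theta_hat, theta_true)  # ignores decision quality
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Stage 2 (at decision time): pass theta_hat = model(y) to any combinatorial
# solver for max_{x in X} f(x, theta_hat); the solver sits outside training.
```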
General framework
• $x^*(\theta) = \arg\max_{x \in \mathcal{X}} f(x, \theta)$
• $x^*$ is a decision from a discrete binary set, which renders the output
non-differentiable with respect to $\omega$.
• Consider the continuous relaxation of the original problem,
$x(\theta) = \arg\max_{x \in conv(\mathcal{X})} f(x, \theta)$
where $conv(\mathcal{X})$ denotes the convex hull of $\mathcal{X}$.
• The quantity to maximize becomes
$\max_{x \in conv(\mathcal{X})} f(x(\hat{\theta}), \theta) = \max_{x \in conv(\mathcal{X})} f(x(m(y, \omega)), \theta)$
where $\hat{\theta} = m(y, \omega)$.
• Obtain a gradient by sampling a single $(y, \theta)$ from the training data and
applying the chain rule (see the sketch after this slide):
$\frac{df(x(\hat{\theta}), \theta)}{d\omega} = \frac{df(x(\hat{\theta}), \theta)}{dx(\hat{\theta})} \cdot \frac{dx(\hat{\theta})}{d\hat{\theta}} \cdot \frac{d\hat{\theta}}{d\omega}$
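A minimal sketch of one decision-focused gradient step, assuming a hypothetical differentiable solver `diff_solver` for the relaxed problem (for instance, the differentiable QP layer sketched in the linear programming section below):

```python
import torch

def decision_focused_step(model, optimizer, diff_solver, f, y, theta_true):
    theta_hat = model(y)          # theta_hat = m(y, omega)
    x = diff_solver(theta_hat)    # x(theta_hat): relaxed argmax, differentiable
    loss = -f(x, theta_true)      # maximize decision quality under the *true* theta
    optimizer.zero_grad()
    loss.backward()               # df/dx * dx/dtheta_hat * dtheta_hat/domega
    optimizer.step()
    return -loss.item()
```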
General framework (cont’d)
• $\frac{dx(\hat{\theta})}{d\hat{\theta}}$ measures how the optimal decision changes with respect to $\hat{\theta}$.
• For continuous problems, the optimal continuous decision must
satisfy the KKT conditions.
• The constraint set is a convex hull, which can be represented as $\{x : Ax \le b\}$.
• Let $(x, \lambda)$ be the pair of primal and dual variables; differentiating the
KKT conditions yields the desired derivative, as follows.
Derivation-Stationarity
• Recall stationarity: $\nabla f(\mathbf{x}^*) = \sum_{i=1}^{m} \mu_i \nabla g_i(\mathbf{x}^*) + \sum_{j=1}^{l} \lambda_j \nabla h_j(\mathbf{x}^*)$
• For the constraints $\{x : Ax \le b\}$ (i.e., $g(x) = Ax - b \le 0$), stationarity reads
$\nabla_x f(x, \theta) = A^T \lambda$
• By implicit differentiation (treating $x$ and $\lambda$ as implicit functions of $\theta$),
$\nabla_x^2 f(x, \theta) \frac{dx}{d\theta} + \frac{\partial \nabla_x f(x, \theta)}{\partial \theta} = A^T \frac{d\lambda}{d\theta}$
Derivation-Complementary slackness
• Recall complementary slackness: $\mu_i g_i(\mathbf{x}^*) = 0$, for $i = 1, \ldots, m$
• Define $D(\lambda)$ as the diagonal matrix with the entries of $\lambda$ on its diagonal, so that
complementary slackness reads $D(\lambda)(Ax - b) = 0$.
• By implicit differentiation (again treating $x$ and $\lambda$ as implicit functions of $\theta$),
$D(\lambda) A \frac{dx}{d\theta} + D(Ax - b) \frac{d\lambda}{d\theta} = 0$
• Stacking the two differentiated conditions gives a linear system:
$\begin{bmatrix} \nabla_x^2 f(x, \theta) & -A^T \\ D(\lambda) A & D(Ax - b) \end{bmatrix} \begin{bmatrix} \frac{dx}{d\theta} \\ \frac{d\lambda}{d\theta} \end{bmatrix} = \begin{bmatrix} -\frac{\partial \nabla_x f(x, \theta)}{\partial \theta} \\ 0 \end{bmatrix}$
• By solving this linear system, we can obtain the desired $\frac{dx}{d\theta}$ (a numerical
sketch follows).
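A numpy sketch of that solve, specialized to the regularized objective $f(x, \theta) = \theta^T x - \gamma \|x\|_2^2$ of the next slide with inequality constraints $Gx \le h$ only (the function name and interface are ours):

```python
import numpy as np

def dx_dtheta(x, lam, G, h, gamma):
    """Differentiated KKT system for max theta^T x - gamma*||x||^2 s.t. Gx <= h."""
    n, m = len(x), len(lam)
    # Rows: differentiated stationarity, then differentiated complementary slackness
    K = np.block([
        [-2.0 * gamma * np.eye(n), -G.T],
        [np.diag(lam) @ G, np.diag(G @ x - h)],
    ])
    # grad_x f = theta - 2*gamma*x, so d(stationarity)/d(theta) = I, moved to the RHS
    rhs = np.vstack([-np.eye(n), np.zeros((m, n))])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]                # dx/dtheta, an n-by-n Jacobian
```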
Linear programming
• Consider a linear program with equality and inequality constraints:
$\max \theta^T x \quad \text{s.t.} \quad Ax = b, \; Gx \le h$
• Problem: $\nabla_x^2 f(x, \theta)$ is always zero, so the left-hand-side matrix of the
linear system above becomes singular.
• Solve a regularized problem instead:
$\max \theta^T x - \gamma \|x\|_2^2 \quad \text{s.t.} \quad Ax = b, \; Gx \le h$
• The regularization transforms the LP into a quadratic program (QP).
• All other terms can be derived from $(x, \lambda)$, which is output by standard
QP solvers (see the sketch below).
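A sketch of the regularized LP as a differentiable layer using cvxpylayers (the library choice and the toy constraint set are our assumptions; such layers differentiate the QP's KKT system internally, matching the derivation above):

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, gamma = 10, 0.1
x = cp.Variable(n)
theta = cp.Parameter(n)
# Regularized LP (a QP): max theta^T x - gamma * ||x||^2 over a toy polytope
objective = cp.Maximize(theta @ x - gamma * cp.sum_squares(x))
constraints = [cp.sum(x) <= 3, x >= 0, x <= 1]
layer = CvxpyLayer(cp.Problem(objective, constraints),
                   parameters=[theta], variables=[x])

theta_hat = torch.randn(n, requires_grad=True)
x_star, = layer(theta_hat)   # forward pass: solve the QP
x_star.sum().backward()      # backward pass: dx*/dtheta_hat via the KKT system
print(theta_hat.grad.shape)  # torch.Size([10])
```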
Submodular maximization
• Consider the problem of maximizing a set function $f: 2^V \mapsto \mathbb{R}$, where $V$ is a
ground set of items.
• A set function is submodular if it satisfies either of the following equivalent conditions:
• For every $A, B \subseteq V$ with $A \subseteq B$ and any $v \in V \setminus B$, we have
$f(A \cup \{v\}) - f(A) \ge f(B \cup \{v\}) - f(B)$.
• For every $A, B \subseteq V$, we have $f(A) + f(B) \ge f(A \cup B) + f(A \cap B)$.
• Focus on the cardinality-constrained optimization $\max_{|S| \le k} f(S)$.
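For reference, the classical greedy baseline for this problem (the standard $(1 - 1/e)$-approximation for monotone submodular $f$; a baseline, not the continuous relaxation the paper trains through):

```python
def greedy(f, V, k):
    """max_{|S| <= k} f(S): repeatedly add the element with the largest marginal gain."""
    S = set()
    for _ in range(k):
        best = max((v for v in V if v not in S),
                   key=lambda v: f(S | {v}) - f(S))
        S.add(best)
    return S

# Example with a toy coverage function: f(S) = size of the union of covered items
cover = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c'}}
f = lambda S: len(set().union(*(cover[v] for v in S))) if S else 0
print(greedy(f, cover.keys(), 2))  # {1, 2}
```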
Submodular maximization (cont’d)
• View a set function as defined on the domain $\{0,1\}^V$ (indicator view).
• The multilinear extension $F$ is defined on $[0,1]^V$ (probability view):
$F(x) = \mathbb{E}[f(S)] = \sum_{S \subseteq V} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)$
where $x_i$ denotes the probability that item $i$ is independently included in $S$.
• Instead of solving $\max_{|S| \le k} f(S)$, we can solve
$\max_{x \in conv(\mathcal{X})} F(x)$
where $\mathcal{X} = \{x \in \{0,1\}^V : \sum_i x_i \le k\}$.
• The multilinear extension has a closed form for coverage functions:
• There is a set of items $U$, and each item $j \in U$ has a weight $w_j$.
• We choose from a set of actions $V$, and each action $i \in V$ covers each item $j$
independently with probability $\theta_{ij}$:
$F(x, \theta) = \sum_{j \in U} w_j \left( 1 - \prod_{i \in V} (1 - x_i \theta_{ij}) \right)$
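A differentiable torch sketch of this closed form (the shapes and names are our assumptions: `x` indexed by actions $V$, `theta` of shape $|V| \times |U|$, item weights `w`):

```python
import torch

def coverage_F(x, theta, w):
    """F(x, theta) = sum_j w_j * (1 - prod_i (1 - x_i * theta_ij))."""
    # x: (|V|,), theta: (|V|, |U|), w: (|U|,)
    p_not_covered = torch.prod(1.0 - x.unsqueeze(1) * theta, dim=0)  # (|U|,)
    return torch.sum(w * (1.0 - p_not_covered))

# Both dF/dx (for the inner maximization) and dF/dtheta (for decision-focused
# training) then come from autograd, e.g.:
# x.requires_grad_(True); coverage_F(x, theta, w).backward()
```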
Experiments
• For linear programming:
• Bipartite matching
• Feature vector: whether each word appears in the paper.
• Objective: reconstruct the citation network.
• For submodular maximization:
• Budget allocation
• Model an advertiser's choice of how to divide a finite budget $k$ between a set of channels.
• Feature vector: the ground-truth $\theta$ passed through a DNN.
• Objective: expected number of customers reached.
• Diverse recommendation
• Feature vector: user ratings of movies.
• Objective: predict which actors are associated with each movie.
Solution quality
• Quality: the objective value of the model's decision, evaluated using the true $\theta$.
NN2: two-layer neural network
RF: random forest
Accuracy
MSE: mean squared error
CE: cross entropy
Conclusion
• Focus on combinatorial optimization and introduce a general
framework for decision-focused learning.
• Instantiate the framework for linear programming and submodular
maximization.
• Experiments show that the proposed method leads to better solution
quality, although it may lose some predictive accuracy.