Melding the Data-Decision Pipeline: Decision-Focused Learning for Combinatorial Optimization
Bryan Wilder, Bistra Dilkina and Milind Tambe
University of Southern California
AAAI 2019
Abstract
• Introduce a general framework for decision-focused learning, where the machine learning model is trained directly in conjunction with the optimization algorithm.
• Instantiate the framework for two broad classes of combinatorial problems: linear programming and submodular maximization.
• Experiments show that the proposed method outperforms the traditional method in terms of solution quality.
Introduction
• Machine learning: use data to predict unknown quantities, guided by a loss function.
• Optimization algorithm: use the predictions to arrive at a decision that maximizes some objective.
• Training the model entirely separately from the optimization may result in bad decisions.
• Focus on combinatorial optimization and propose a decision-focused learning framework that integrates prediction and the optimization algorithm.
Matrix Calculus
• scalar by vector • vector by scalar • vector by vector
Implicit differentiation
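• Example:
• We want to find the slope of the tangent line to the circle $x^2 + y^2 = 25$ at the point (3, −4).
• One way is to solve for $y$ explicitly:
• $y = -\sqrt{25 - x^2}$ (the point (3, −4) lies on the bottom semicircle)
• $\Rightarrow y' = -\frac{1}{2}\,(25 - x^2)^{-1/2}\,(-2x) = \frac{x}{\sqrt{25 - x^2}}$
• $m = y' = \frac{3}{\sqrt{25 - 3^2}} = \frac{3}{4}$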
Implicit differentiation (cont’d)
• However, not every function can be written explicitly as a function of another variable.
• In implicit differentiation, we differentiate each side of an equation in two variables by treating one of the variables as a function of the other.
• Using implicit differentiation, we treat $y$ as an implicit function of $x$:
• $x^2 + y^2 = 25$
• $\Rightarrow 2x + 2y\,\frac{dy}{dx} = 0$
• $\Rightarrow y' = \frac{dy}{dx} = \frac{-2x}{2y} = \frac{-x}{y}$
• $m = y' = \frac{-x}{y} = \frac{-3}{-4} = \frac{3}{4}$
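The two derivations above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names are illustrative, and the finite-difference step `h` is an arbitrary choice.

```python
import math


def slope_explicit(x: float) -> float:
    """Slope from the explicit form y = -sqrt(25 - x^2): y' = x / sqrt(25 - x^2)."""
    return x / math.sqrt(25.0 - x * x)


def slope_implicit(x: float, y: float) -> float:
    """Slope from implicit differentiation of x^2 + y^2 = 25: y' = -x / y."""
    return -x / y


def slope_numeric(x: float, h: float = 1e-6) -> float:
    """Central finite difference on the bottom semicircle, as an independent check."""
    def y(t: float) -> float:
        return -math.sqrt(25.0 - t * t)
    return (y(x + h) - y(x - h)) / (2.0 * h)


m_explicit = slope_explicit(3.0)
m_implicit = slope_implicit(3.0, -4.0)
m_numeric = slope_numeric(3.0)
print(m_explicit, m_implicit, m_numeric)
```

All three values agree at 3/4, up to the finite-difference error in the last one.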
Lagrange Multiplier
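• Consider the optimization problem
$\max f(x, y)$
subject to $g(x, y) = 0$
• From the graph, the optimum occurs where a contour of $f$ touches the constraint curve; this is condition (1),
where $\nabla_{x,y} f(x, y) = \left( \frac{\partial f(x,y)}{\partial x}, \frac{\partial f(x,y)}{\partial y} \right)^T$
• Let $\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda\, g(x, y)$
• Solving $\nabla_{x,y,\lambda} \mathcal{L}(x, y, \lambda) = \mathbf{0}$ is equivalent to solving equation (1)
Figure: blue: contours of $f(x, y)$ at levels $d_1 > d_2 > d_3$; red: constraint $g(x, y) = c$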
Lagrange Multiplier (cont’d)
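• Generalize to $n$ variables
• $\mathbf{x} = (x_1, x_2, \dots, x_n)^T$
• Solve $\nabla_{x_1, x_2, \dots, x_n, \lambda} \mathcal{L}(x_1, x_2, \dots, x_n, \lambda) = \mathbf{0}$
• Generalize to $M$ constraints
• $\mathcal{L}(x_1, \dots, x_n, \lambda_1, \dots, \lambda_M) = f(x_1, \dots, x_n) + \sum_{k=1}^{M} \lambda_k\, g_k(x_1, \dots, x_n)$
• Solve $\nabla_{x_1, \dots, x_n, \lambda_1, \dots, \lambda_M} \mathcal{L}(x_1, \dots, x_n, \lambda_1, \dots, \lambda_M) = \mathbf{0}$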
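As a small worked instance of the Lagrangian condition, the sketch below uses a problem chosen here for illustration (it is not from the slides): maximize $f(x, y) = xy$ subject to $g(x, y) = x + y - 2 = 0$. Solving $\nabla \mathcal{L} = \mathbf{0}$ by hand gives $x = y = 1$, $\lambda = -1$, and the code verifies that the gradient of the Lagrangian vanishes there.

```python
def lagrangian(x: float, y: float, lam: float) -> float:
    """L(x, y, lambda) = f(x, y) + lambda * g(x, y) for the illustrative problem."""
    return x * y + lam * (x + y - 2.0)


def numeric_gradient(fn, point, h: float = 1e-6):
    """Central-difference gradient of fn at point (a tuple of floats)."""
    grad = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        grad.append((fn(*up) - fn(*down)) / (2.0 * h))
    return grad


candidate = (1.0, 1.0, -1.0)  # (x, y, lambda) obtained by hand
grad_at_candidate = numeric_gradient(lagrangian, candidate)

# Independent check: scan the constraint line y = 2 - x for the best objective value.
best_on_constraint = max((t / 100.0) * (2.0 - t / 100.0) for t in range(-300, 501))
print(grad_at_candidate, best_on_constraint)
```

The scan along the constraint confirms that the stationary point is the constrained maximum, with objective value 1.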
KKT condition
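• Consider the optimization problem
$\max f(\mathbf{x})$
subject to
$g_i(\mathbf{x}) \le 0$ for $i = 1, \dots, m$,
$h_j(\mathbf{x}) = 0$ for $j = 1, \dots, l$.
• If $\mathbf{x}^*$ is a local optimum, then there exist $\mu_i$ ($i = 1, \dots, m$) and $\lambda_j$ ($j = 1, \dots, l$) such that
• Stationarity
$\nabla f(\mathbf{x}^*) = \sum_{i=1}^{m} \mu_i \nabla g_i(\mathbf{x}^*) + \sum_{j=1}^{l} \lambda_j \nabla h_j(\mathbf{x}^*)$
• Primal feasibility
$g_i(\mathbf{x}^*) \le 0$, for $i = 1, \dots, m$
$h_j(\mathbf{x}^*) = 0$, for $j = 1, \dots, l$
• Dual feasibility
$\mu_i \ge 0$, for $i = 1, \dots, m$
• Complementary slackness
$\mu_i\, g_i(\mathbf{x}^*) = 0$, for $i = 1, \dots, m$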
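The four conditions can be checked mechanically at a candidate point. The sketch below uses a small problem made up for illustration (not from the slides): maximize $f(\mathbf{x}) = -(x_1 - 2)^2 - (x_2 - 1)^2$ subject to $x_1 + x_2 - 2 \le 0$ and $-x_1 \le 0$, with no equality constraints. The candidate $\mathbf{x}^* = (1.5, 0.5)$ and multipliers $\mu = (1, 0)$ were worked out by hand.

```python
def grad_f(x):
    """Gradient of f(x) = -(x1 - 2)^2 - (x2 - 1)^2."""
    return [-2.0 * (x[0] - 2.0), -2.0 * (x[1] - 1.0)]


# Each inequality constraint g_i(x) <= 0 is stored as (g_i, gradient of g_i).
INEQUALITIES = [
    (lambda x: x[0] + x[1] - 2.0, lambda x: [1.0, 1.0]),
    (lambda x: -x[0], lambda x: [-1.0, 0.0]),
]


def kkt_holds(x, mu, tol: float = 1e-9) -> bool:
    """Check stationarity, primal/dual feasibility and complementary slackness."""
    rhs = [0.0, 0.0]
    for mu_i, (_, grad_g) in zip(mu, INEQUALITIES):
        rhs = [r + mu_i * c for r, c in zip(rhs, grad_g(x))]
    stationarity = all(abs(a - b) <= tol for a, b in zip(grad_f(x), rhs))
    primal = all(g(x) <= tol for g, _ in INEQUALITIES)
    dual = all(m >= -tol for m in mu)
    slackness = all(abs(m * g(x)) <= tol for m, (g, _) in zip(mu, INEQUALITIES))
    return stationarity and primal and dual and slackness


ok_at_optimum = kkt_holds([1.5, 0.5], [1.0, 0.0])
# The unconstrained peak (2, 1) has zero gradient but violates x1 + x2 <= 2.
ok_at_unconstrained_peak = kkt_holds([2.0, 1.0], [0.0, 0.0])
print(ok_at_optimum, ok_at_unconstrained_peak)
```

The second constraint is inactive at the optimum, so complementary slackness forces its multiplier to zero.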
Linear programming relaxation
• Example:
• In a 0-1 integer program, all variables are binary:
• $x_i \in \{0, 1\}$
• After the relaxation,
• $x_i \in [0, 1]$
• The relaxation transforms an NP-hard optimization problem into a problem that can be solved in polynomial time.
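To make the relaxation concrete, the sketch below compares a 0-1 knapsack with its LP relaxation. The instance (values, weights, capacity) is a made-up example, not from the slides. For a knapsack with a single capacity constraint, the relaxation can be solved by the greedy fractional rule, so no LP solver is needed here.

```python
from itertools import product

values = [60.0, 100.0, 120.0]
weights = [10.0, 20.0, 30.0]
capacity = 50.0


def best_integer(values, weights, capacity):
    """Brute force over all x in {0,1}^n (exponential, fine for tiny n)."""
    best = 0.0
    for x in product((0, 1), repeat=len(values)):
        if sum(w * xi for w, xi in zip(weights, x)) <= capacity:
            best = max(best, sum(v * xi for v, xi in zip(values, x)))
    return best


def best_relaxed(values, weights, capacity):
    """Relaxation with x in [0,1]^n: fill greedily by value/weight ratio."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    x = [0.0] * len(values)
    remaining = capacity
    for i in order:
        if remaining <= 0:
            break
        x[i] = min(1.0, remaining / weights[i])
        remaining -= x[i] * weights[i]
    return sum(v * xi for v, xi in zip(values, x)), x


integer_value = best_integer(values, weights, capacity)
relaxed_value, relaxed_x = best_relaxed(values, weights, capacity)
print(integer_value, relaxed_value, relaxed_x)
```

The relaxed optimum is an upper bound on the integer optimum, and its solution takes a fractional amount of the last item.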
Problem description
• Consider the combinatorial optimization problem
$\max_{x \in \mathcal{X}} f(x, \theta)$
where $\mathcal{X}$ is a discrete set containing all feasible decisions.
• Without loss of generality, $\mathcal{X} \subseteq \{0, 1\}^n$, and $x$ is a binary decision vector.
• The objective $f$ depends on $\theta \in \Theta$. Here $\theta$ is unknown and must be inferred from data.
• We observe a feature vector $y \in \mathcal{Y}$ that is correlated with $\theta$.
• Let $m: \mathcal{Y} \mapsto \Theta$ denote a model mapping observed features to parameters.
Problem description (cont’d)
• Use the training data $(y_1, \theta_1), \dots, (y_N, \theta_N)$ drawn from $P$ to find the model $m$ (in a supervised manner).
• Define $x^*(\theta) = \arg\max_{x \in \mathcal{X}} f(x, \theta)$ to be the optimal $x$ for a given $\theta$.
• Objective:
$\max_{m} \; \mathbb{E}_{(y, \theta) \sim P}\left[ f(x^*(m(y)), \theta) \right]$
• Example:
• $y$: user ratings of the movies
• $\theta$: movie-actor assignments
• Predict which actors are associated with each movie.
• Classical solution (two-stage method)
1. Learn a model $m$ (with parameters $\omega$) by minimizing a loss function.
• $\min_{\omega} \; \mathbb{E}_{(y, \theta) \sim P}\left[ \mathcal{L}(\theta, m(y, \omega)) \right]$
2. Use the learned model to solve the optimization problem.
• Possible cons:
• The loss function does not consider how $\omega$ will affect the decision making.
• Is it possible to do better?
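The drawback of the two-stage method can be seen in a tiny example. The sketch below is illustrative only: the numbers are invented, and the decision problem (pick the single item with the largest predicted value) stands in for a general combinatorial problem. Prediction B has a much lower mean squared error than prediction A, yet leads to a worse decision when evaluated under the true $\theta$.

```python
theta_true = [1.0, 1.1]   # true item values (hypothetical)
pred_a = [0.5, 0.7]       # large error, but preserves the ranking
pred_b = [1.08, 1.02]     # small error, but flips the ranking


def mse(pred, truth):
    """Mean squared error, the loss a two-stage method would minimize."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def decide(pred):
    """Optimization step: choose the item with the largest predicted value."""
    return max(range(len(pred)), key=lambda i: pred[i])


def decision_quality(pred, truth):
    """Objective value of the chosen decision, evaluated with the true theta."""
    return truth[decide(pred)]


mse_a, mse_b = mse(pred_a, theta_true), mse(pred_b, theta_true)
quality_a = decision_quality(pred_a, theta_true)
quality_b = decision_quality(pred_b, theta_true)
print(mse_a, mse_b, quality_a, quality_b)
```

A loss that only measures prediction error would prefer B, while the decision objective prefers A.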
General framework
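• $x^*(\theta) = \arg\max_{x \in \mathcal{X}} f(x, \theta)$
• $x^*$ is a decision from a binary set, which renders the output non-differentiable with respect to $\omega$.
• Consider a continuous relaxation of the original problem,
$x(\theta) = \arg\max_{x \in conv(\mathcal{X})} f(x, \theta)$
where $conv$ denotes the convex hull.
• Obtain a gradient by sampling a single $(y, \theta)$ from the training data,
$\frac{d f(x(\hat{\theta}), \theta)}{d\omega} = \frac{d f(x(\hat{\theta}), \theta)}{d x(\hat{\theta})} \cdot \frac{d x(\hat{\theta})}{d\hat{\theta}} \cdot \frac{d\hat{\theta}}{d\omega}$
where $\hat{\theta} = m(y, \omega)$
$\max_{x \in conv(\mathcal{X})} f(x(\hat{\theta}), \theta) = \max_{x \in conv(\mathcal{X})} f(x(m(y, \omega)), \theta)$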
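The chain rule above can be traced on a one-dimensional toy problem. Everything in this sketch is an assumption made for illustration: the model is $\hat{\theta} = \omega y$, the inner problem is the unconstrained concave quadratic $\max_x \hat{\theta} x - \gamma x^2$ (so $x(\hat{\theta}) = \hat{\theta} / 2\gamma$ in closed form), and `GAMMA`, the learning rate and the sample are arbitrary. It is not the setup used in the paper.

```python
GAMMA = 0.5  # curvature of the toy inner problem (assumed)


def model(y: float, omega: float) -> float:
    """theta_hat = m(y, omega): a one-parameter linear model."""
    return omega * y


def decision(theta_hat: float) -> float:
    """x(theta_hat) = argmax_x theta_hat * x - GAMMA * x^2."""
    return theta_hat / (2.0 * GAMMA)


def objective(x: float, theta: float) -> float:
    """f(x, theta), evaluated with the true theta."""
    return theta * x - GAMMA * x * x


def decision_gradient(y: float, theta: float, omega: float) -> float:
    """df/domega = df/dx * dx/dtheta_hat * dtheta_hat/domega."""
    x = decision(model(y, omega))
    df_dx = theta - 2.0 * GAMMA * x
    dx_dtheta_hat = 1.0 / (2.0 * GAMMA)
    dtheta_hat_domega = y
    return df_dx * dx_dtheta_hat * dtheta_hat_domega


def end_to_end(y: float, theta: float, omega: float) -> float:
    return objective(decision(model(y, omega)), theta)


y_sample, theta_sample, omega = 2.0, 3.0, 1.0
analytic = decision_gradient(y_sample, theta_sample, omega)
h = 1e-6
numeric = (end_to_end(y_sample, theta_sample, omega + h)
           - end_to_end(y_sample, theta_sample, omega - h)) / (2.0 * h)

# Gradient ascent on the decision objective with the single sample.
for _ in range(50):
    omega += 0.1 * decision_gradient(y_sample, theta_sample, omega)
print(analytic, numeric, omega)
```

Ascent drives $\omega$ toward 1.5, the value at which $\hat{\theta}$ equals the true $\theta$ for this sample.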
General framework (cont’d)
• $\frac{d x(\hat{\theta})}{d\hat{\theta}}$ measures how the optimal decision changes with respect to $\hat{\theta}$.
• For continuous problems, the optimal continuous decision must satisfy the KKT conditions.
• The constraints describe a convex hull, which can be represented as $\{x : Ax \le b\}$.
• Let $(x, \lambda)$ be a pair of primal and dual variables; differentiating the KKT conditions then yields a linear system.
• Recall stationarity: $\nabla f(\mathbf{x}^*) = \sum_{i=1}^{m} \mu_i \nabla g_i(\mathbf{x}^*) + \sum_{j=1}^{l} \lambda_j \nabla h_j(\mathbf{x}^*)$
Derivation-Stationarity (cont’d)
• By implicit differentiation (treating $\lambda$ as an implicit function of $x$),
Derivation-Complementary slackness
• Recall complementary slackness: $\mu_i\, g_i(\mathbf{x}^*) = 0$, for $i = 1, \dots, m$
Derivation-Complementary slackness (cont’d)
• By implicit differentiation (treating $\lambda$ as an implicit function of $x$),
• By solving this linear system, we obtain the desired $\frac{d x(\hat{\theta})}{d\hat{\theta}}$.
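The equations on these derivation slides did not survive extraction. The following is a sketch of how the linear system can be derived, under conventions assumed here: the problem is $\max_x f(x, \theta)$ subject to $Ax \le b$, with dual variables $\lambda \ge 0$ and stationarity written as $\nabla_x f = A^T \lambda$. Signs may differ from the original slides and paper depending on how the Lagrangian is written.

```latex
% KKT conditions at the optimum (x, \lambda), inequality constraints only:
\nabla_x f(x, \theta) - A^{T} \lambda = 0
\qquad\text{(stationarity)}
\\
\operatorname{diag}(\lambda)\,(Ax - b) = 0
\qquad\text{(complementary slackness)}

% Differentiate both conditions with respect to \theta,
% treating x and \lambda as implicit functions of \theta:
\nabla_x^{2} f(x, \theta)\,\frac{dx}{d\theta}
  + \frac{\partial \nabla_x f(x, \theta)}{\partial \theta}
  - A^{T}\,\frac{d\lambda}{d\theta} = 0
\\
\operatorname{diag}(\lambda)\,A\,\frac{dx}{d\theta}
  + \operatorname{diag}(Ax - b)\,\frac{d\lambda}{d\theta} = 0

% Stacked as one linear system in the unknowns dx/d\theta and d\lambda/d\theta:
\begin{bmatrix}
  \nabla_x^{2} f(x, \theta) & -A^{T} \\
  \operatorname{diag}(\lambda)\,A & \operatorname{diag}(Ax - b)
\end{bmatrix}
\begin{bmatrix}
  \dfrac{dx}{d\theta} \\[6pt] \dfrac{d\lambda}{d\theta}
\end{bmatrix}
=
\begin{bmatrix}
  -\dfrac{\partial \nabla_x f(x, \theta)}{\partial \theta} \\[6pt] 0
\end{bmatrix}
```

The top-left block is the Hessian of the objective, so an objective with no curvature leaves that block empty.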
Linear programming
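• Consider a linear program with equality and inequality constraints
$\max \; \theta^T x \quad \text{s.t. } Ax = b, \; Gx \le h$
• Problem: $\nabla_x^2 f(x, \theta)$ is always zero for a linear objective, so the corresponding block of the left-hand-side matrix vanishes and the linear system can become singular.
• Solve the regularized problem instead
$\max \; \theta^T x - \gamma \lVert x \rVert_2^2$
$\text{s.t. } Ax = b, \; Gx \le h$
• This transforms the LP into a quadratic program (QP).
• All other terms can be derived from $(x, \lambda)$, which is the output of the QP.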
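The effect of the regularizer is easiest to see in a special case with a closed-form solution. The sketch below assumes box constraints only, $0 \le x_i \le 1$, which is a simplification chosen for illustration and not the general $Ax = b$, $Gx \le h$ setting. Under that assumption the LP solution is a step function of $\theta$ and the regularized solution is $x_i = \mathrm{clip}(\theta_i / 2\gamma, 0, 1)$. The values of `theta` and `gamma` are arbitrary.

```python
def lp_box_solution(theta):
    """argmax of theta^T x over the box [0,1]^n: a step function of theta."""
    return [1.0 if t > 0 else 0.0 for t in theta]


def qp_box_solution(theta, gamma):
    """argmax of theta^T x - gamma * ||x||^2 over [0,1]^n: a clipped linear map."""
    return [min(1.0, max(0.0, t / (2.0 * gamma))) for t in theta]


def sensitivity(solver, theta, i, h=1e-4):
    """Central finite difference of x_i with respect to theta_i."""
    up, down = list(theta), list(theta)
    up[i] += h
    down[i] -= h
    return (solver(up)[i] - solver(down)[i]) / (2.0 * h)


theta = [0.3, -0.2, 0.8]
gamma = 1.0
lp_sens = [sensitivity(lp_box_solution, theta, i) for i in range(len(theta))]
qp_sens = [sensitivity(lambda t: qp_box_solution(t, gamma), theta, i)
           for i in range(len(theta))]
print(lp_sens, qp_sens)
```

The LP decision does not move under small changes in $\theta$, so it passes no gradient back to the model, while the regularized decision has slope $1 / 2\gamma$ wherever it is strictly inside the box.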
Submodular maximization
• Consider the problem of maximizing a set function $f: 2^V \mapsto \mathbb{R}$, where $V$ is a ground set of items.
• A set function is submodular if it satisfies one of the following equivalent conditions.
• For every $A, B \subseteq V$ with $A \subseteq B$ and any $v \in V \setminus B$, we have
$f(A \cup \{v\}) - f(A) \ge f(B \cup \{v\}) - f(B)$.
• For every $A, B \subseteq V$, we have $f(A) + f(B) \ge f(A \cup B) + f(A \cap B)$.
• Focus on the cardinality-constrained optimization $\max_{|S| \le k} f(S)$.
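The diminishing-returns condition can be verified by brute force on a small ground set. In the sketch below, the coverage sets in `COVERS` and both example functions are made up for illustration. The check enumerates every pair $A \subseteq B$, so it is only practical for a handful of items.

```python
from itertools import combinations

# Hypothetical ground set: each item covers a set of elements.
COVERS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}


def coverage(S):
    """f(S) = number of elements covered by the items in S."""
    covered = set()
    for v in S:
        covered |= COVERS[v]
    return len(covered)


def squared_size(S):
    """f(S) = |S|^2: marginal gains grow with the set, so it is not submodular."""
    return len(S) ** 2


def subsets(items):
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(f, ground):
    """Check f(A + v) - f(A) >= f(B + v) - f(B) for all A <= B and v outside B."""
    ground = frozenset(ground)
    for B in subsets(ground):
        for A in subsets(B):
            for v in ground - B:
                if f(A | {v}) - f(A) < f(B | {v}) - f(B):
                    return False
    return True


coverage_ok = is_submodular(coverage, COVERS)
squared_ok = is_submodular(squared_size, COVERS)
print(coverage_ok, squared_ok)
```

Coverage functions pass the check, which is why they appear as the running example for submodular maximization.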
Submodular maximization (cont’d)
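• View a set function as defined on the domain $\{0, 1\}^{|V|}$ (indicator view).
• The multilinear extension $F$ is defined on $[0, 1]^{|V|}$ (probability view).
$F(x) = \mathbb{E}[f(S)] = \sum_{S \subseteq V} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)$
where $x_i$ denotes the probability that item $i$ is included in $S$, independently of the other items.
• Instead of solving $\max_{|S| \le k} f(S)$, we can solve $\max_{x \in conv(\mathcal{X})} F(x)$,
where $\mathcal{X} = \{x \in \{0, 1\}^{|V|} : \sum_i x_i \le k\}$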
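The definition of $F$ translates directly into an enumeration over all subsets. The sketch below is illustrative: the coverage sets are invented, and the enumeration is exponential in $|V|$, so it only serves to show what the formula computes.

```python
from itertools import product

# Hypothetical ground set: each item covers a set of elements.
COVERS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
ITEMS = ["a", "b", "c"]


def coverage(S):
    """f(S) = number of elements covered by the items in S."""
    covered = set()
    for v in S:
        covered |= COVERS[v]
    return float(len(covered))


def multilinear_extension(f, items, x):
    """F(x) = sum over S of f(S) * prod_{i in S} x_i * prod_{i not in S} (1 - x_i)."""
    total = 0.0
    for mask in product((0, 1), repeat=len(items)):
        prob = 1.0
        for bit, xi in zip(mask, x):
            prob *= xi if bit else (1.0 - xi)
        S = frozenset(v for v, bit in zip(items, mask) if bit)
        total += prob * f(S)
    return total


# On an indicator vector, F agrees with f on the corresponding set.
at_vertex = multilinear_extension(coverage, ITEMS, [1.0, 0.0, 1.0])
# At a fractional point, F is the expected coverage under independent sampling.
at_half = multilinear_extension(coverage, ITEMS, [0.5, 0.5, 0.5])
print(at_vertex, coverage({"a", "c"}), at_half)
```

At $x = (0.5, 0.5, 0.5)$ the elements covered by one item contribute 0.5 each and those covered by two items contribute 0.75 each, giving 3.0 in total.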
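• The multilinear extension has a closed form for coverage functions.
• There is a set of items $U$, and each item $j \in U$ has a weight $w_j$.
• Choose from a set of actions $V$; each action $a_i$ covers each item independently with probability $\theta_{ij}$.
$F(x, \theta) = \sum_{j \in U} w_j \left( 1 - \prod_{i \in V} (1 - x_i \theta_{ij}) \right)$
• This follows from the general definition
$F(x) = \mathbb{E}[f(S)] = \sum_{S \subseteq V} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)$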
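The closed form can be compared against the general definition on a small instance. The weights and coverage probabilities below are made-up numbers, and the discrete objective is assumed to be the expected covered weight, $f(S) = \sum_j w_j \left(1 - \prod_{i \in S} (1 - \theta_{ij})\right)$, which matches the description of independent coverage above.

```python
from itertools import product

weights = [1.0, 2.0]                            # w_j for two items (hypothetical)
theta = [[0.5, 0.2], [0.4, 0.9], [0.1, 0.3]]    # theta[i][j] for three actions


def covered_weight(S, theta, weights):
    """f(S): expected weight covered when the actions in S are chosen."""
    total = 0.0
    for j, w in enumerate(weights):
        miss = 1.0
        for i in S:
            miss *= 1.0 - theta[i][j]
        total += w * (1.0 - miss)
    return total


def closed_form(x, theta, weights):
    """F(x, theta) = sum_j w_j * (1 - prod_i (1 - x_i * theta_ij))."""
    total = 0.0
    for j, w in enumerate(weights):
        miss = 1.0
        for i, xi in enumerate(x):
            miss *= 1.0 - xi * theta[i][j]
        total += w * (1.0 - miss)
    return total


def by_enumeration(x, theta, weights):
    """F(x) from the definition: expectation of f(S) over independent inclusion."""
    total = 0.0
    for mask in product((0, 1), repeat=len(x)):
        prob = 1.0
        for bit, xi in zip(mask, x):
            prob *= xi if bit else (1.0 - xi)
        S = [i for i, bit in enumerate(mask) if bit]
        total += prob * covered_weight(S, theta, weights)
    return total


x = [0.3, 0.6, 0.9]
value_closed = closed_form(x, theta, weights)
value_enumerated = by_enumeration(x, theta, weights)
print(value_closed, value_enumerated)
```

The closed form costs a product per item instead of a sum over all subsets, and it is differentiable in both $x$ and $\theta$.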
• For linear programming:
• Bipartite matching
• Feature vector: whether each word appeared in the paper.
• Objective: Reconstruct the citation network
• For submodular maximization:
• Budget allocation
• Model an advertiser’s choice of how to divide a finite budget 𝑘 between a set of channels.
• Feature vector: ground truth 𝜃 passed to DNN
• Objective: expected number of customers reached
• Diverse recommendation
• Feature vector: user rating of movie
• Objective: predict which actors are associated with each movie.
Solution quality
• Quality: the objective value of its decision evaluated using the true 𝜃
NN2: two layer NN
RF: random forest
MSE: mean squared error
CE: cross entropy
• Focus on combinatorial optimization and introduce a general framework for decision-focused learning.
• Instantiate the framework for linear programming and submodular maximization.
• Experiments show that the proposed method leads to better solution quality, although it may lose some accuracy.


Paper Study: Melding the data decision pipeline

  • 1. Melding the Data-Decision Pipeline: Decision-Focused Learning for Combinatorial Optimization Bryan Wilder, Bistra Dilkina and Milind Tambe University of Southern California AAAI 2019
  • 2. Abstract • Introduce a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm. • Instantiate the framework for two broad classes of combinatorial problems: linear programming and submodular maximization. • Experiments show that proposed method outperforms the traditional method in terms of solution quality.
  • 3. Introduction • Machine learning: use data to predict unknown quantities with the help of loss function. • Optimization algorithm: use predictions to arrive at decision which maximizes some objective. • Separating two pieces entirely to train the model may result in bad decision. • Focus on combinatorial optimization, propose decision-focused learning framework which integrates prediction and optimization algorithm.
  • 5. Matrix Calculus • scalar by vector • vector by scalar • vector by vector
  • 6. Implicit differentiation • Example: • We want to find the slope of the tangent line to the circle $x^2 + y^2 = 25$ at the point $(3, -4)$. • One way is to solve for $y$ explicitly and differentiate: • $y = -\sqrt{25 - x^2}$ (the point $(3, -4)$ lies on the lower semicircle) • $\Rightarrow y' = -\frac{1}{2}(25 - x^2)^{-1/2}(-2x) = \frac{x}{\sqrt{25 - x^2}}$ • $m = y' = \frac{3}{\sqrt{25 - 3^2}} = \frac{3}{4}$
  • 7. Implicit differentiation (cont’d) • However, not every relation can be written explicitly as a function of one variable in terms of another. • In implicit differentiation, we differentiate each side of an equation in two variables, treating one of the variables as a function of the other. • Here we treat $y$ as an implicit function of $x$: • $x^2 + y^2 = 25$ • $\Rightarrow 2x + 2y \frac{dy}{dx} = 0$ • $\Rightarrow y' = \frac{dy}{dx} = \frac{-2x}{2y} = -\frac{x}{y}$ • $m = y' = -\frac{x}{y} = -\frac{3}{-4} = \frac{3}{4}$
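The two derivations on the circle example can be checked numerically. The following is a minimal sketch; the function names are illustrative and do not come from the slides.

```python
import math

def slope_explicit(x):
    """Slope on the lower semicircle y = -sqrt(25 - x^2): y' = x / sqrt(25 - x^2)."""
    return x / math.sqrt(25 - x * x)

def slope_implicit(x, y):
    """Slope from differentiating x^2 + y^2 = 25 implicitly: y' = -x / y."""
    return -x / y

def slope_numeric(x, h=1e-6):
    """Central finite difference on the lower semicircle, as an independent check."""
    def y(t):
        return -math.sqrt(25 - t * t)
    return (y(x + h) - y(x - h)) / (2 * h)

if __name__ == "__main__":
    # All three values should agree with the slope 3/4 at the point (3, -4).
    print(slope_explicit(3.0), slope_implicit(3.0, -4.0), slope_numeric(3.0))
```

The implicit version needs only the point $(x, y)$ and never solves for $y$, which is the property the later KKT derivation relies on.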
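  • 8. Lagrange Multiplier • Consider the optimization problem $\max f(x, y)$ subject to $g(x, y) = 0$. • From the graph, the optimum occurs where a contour of $f$ touches the constraint curve, which gives condition (1), where $\nabla_{x,y} f(x, y) = \left(\frac{\partial f(x,y)}{\partial x}, \frac{\partial f(x,y)}{\partial y}\right)^T$. • Let $\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda g(x, y)$. • Solving $\nabla_{x,y,\lambda} \mathcal{L}(x, y, \lambda) = \mathbf{0}$ is equivalent to solving equation (1). • Figure: blue curves are contours of $f(x, y)$ with $d_1 > d_2 > d_3$; the red curve is the constraint $g(x, y) = c$.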
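  • 9. Lagrange Multiplier (cont’d) • Generalize to $n$ variables • $\mathbf{x} = (x_1, x_2, \dots, x_n)^T$ • Solve $\nabla_{x_1, \dots, x_n, \lambda} \mathcal{L}(x_1, \dots, x_n, \lambda) = \mathbf{0}$ • Generalize to $M$ constraints • $\mathcal{L}(x_1, \dots, x_n, \lambda_1, \dots, \lambda_M) = f(x_1, \dots, x_n) + \sum_{k=1}^{M} \lambda_k g_k(x_1, \dots, x_n)$ • Solve $\nabla_{x_1, \dots, x_n, \lambda_1, \dots, \lambda_M} \mathcal{L}(x_1, \dots, x_n, \lambda_1, \dots, \lambda_M) = \mathbf{0}$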
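A small worked instance of the Lagrangian condition may help. The sketch below assumes a hypothetical problem that is not on the slides: maximize $f(x, y) = x + y$ subject to $g(x, y) = x^2 + y^2 - 1 = 0$, using the slides' convention $\mathcal{L} = f + \lambda g$.

```python
import math

def f(x, y):
    return x + y

def g(x, y):
    return x * x + y * y - 1.0   # constraint g(x, y) = 0 (unit circle)

def grad_lagrangian(x, y, lam):
    """Gradient of L(x, y, lam) = f(x, y) + lam * g(x, y) w.r.t. (x, y, lam)."""
    return (1.0 + 2.0 * lam * x,   # dL/dx
            1.0 + 2.0 * lam * y,   # dL/dy
            g(x, y))               # dL/dlam recovers the constraint

s = 1.0 / math.sqrt(2.0)
# Both points solve grad L = 0; one is the constrained maximum, the other the minimum.
candidates = [(s, s, -s), (-s, -s, s)]
best = max(candidates, key=lambda c: f(c[0], c[1]))

if __name__ == "__main__":
    for c in candidates:
        print(c, grad_lagrangian(*c), f(c[0], c[1]))
```

Setting the gradient to zero only identifies stationary points, so the candidates still have to be compared by their objective value.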
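  • 10. KKT conditions • Consider the optimization problem $\max f(\mathbf{x})$ subject to $g_i(\mathbf{x}) \le 0$ for $i = 1, \dots, m$ and $h_j(\mathbf{x}) = 0$ for $j = 1, \dots, l$. • If $\mathbf{x}^*$ is a local optimum (under suitable regularity conditions), then there exist $\mu_i$ ($i = 1, \dots, m$) and $\lambda_j$ ($j = 1, \dots, l$) such that • Stationarity: $\nabla f(\mathbf{x}^*) = \sum_{i=1}^{m} \mu_i \nabla g_i(\mathbf{x}^*) + \sum_{j=1}^{l} \lambda_j \nabla h_j(\mathbf{x}^*)$ • Primal feasibility: $g_i(\mathbf{x}^*) \le 0$ for $i = 1, \dots, m$, and $h_j(\mathbf{x}^*) = 0$ for $j = 1, \dots, l$ • Dual feasibility: $\mu_i \ge 0$ for $i = 1, \dots, m$ • Complementary slackness: $\mu_i g_i(\mathbf{x}^*) = 0$ for $i = 1, \dots, m$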
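The four conditions can be checked mechanically at a candidate point. The sketch below assumes a hypothetical problem that is not on the slides: maximize $f(\mathbf{x}) = -(x_1 - 2)^2 - (x_2 - 1)^2$ subject to the single inequality $g(\mathbf{x}) = x_1 + x_2 - 2 \le 0$, with no equality constraints.

```python
def grad_f(x):
    # f(x) = -(x1 - 2)^2 - (x2 - 1)^2
    return [-2.0 * (x[0] - 2.0), -2.0 * (x[1] - 1.0)]

def g(x):
    # single inequality constraint g(x) = x1 + x2 - 2 <= 0
    return x[0] + x[1] - 2.0

GRAD_G = [1.0, 1.0]   # gradient of g, constant because g is linear

def check_kkt(x, mu, tol=1e-9):
    """Evaluate each KKT condition at the candidate x with multiplier mu."""
    stationarity = all(abs(df - mu * dg) <= tol
                       for df, dg in zip(grad_f(x), GRAD_G))
    return {
        "stationarity": stationarity,                   # grad f = mu * grad g
        "primal_feasibility": g(x) <= tol,              # g(x) <= 0
        "dual_feasibility": mu >= -tol,                 # mu >= 0
        "complementary_slackness": abs(mu * g(x)) <= tol,
    }

if __name__ == "__main__":
    print(check_kkt([1.5, 0.5], 1.0))   # constrained optimum
    print(check_kkt([2.0, 1.0], 0.0))   # unconstrained optimum, infeasible here
```

The unconstrained maximizer $(2, 1)$ passes stationarity with $\mu = 0$ but fails primal feasibility, which shows why all four conditions are needed together.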
  • 11. Linear programming relaxation • Example: • In a 0-1 integer program, all variables are binary: $x_i \in \{0, 1\}$. • After the relaxation, $x_i \in [0, 1]$. • The relaxation transforms an NP-hard optimization problem into a problem that can be solved in polynomial time.
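To illustrate the relaxation, the sketch below uses a hypothetical 0-1 knapsack instance (not from the slides). The integer program is solved by brute force, and its relaxation by the value-to-weight greedy rule, which is known to solve the LP relaxation exactly when there is a single knapsack constraint. Positive weights are assumed.

```python
from itertools import product

def knapsack_01(values, weights, capacity):
    """Brute-force optimum of the 0-1 program (x_i in {0, 1}); exponential time."""
    best = 0.0
    for x in product((0, 1), repeat=len(values)):
        if sum(w * xi for w, xi in zip(weights, x)) <= capacity:
            best = max(best, sum(v * xi for v, xi in zip(values, x)))
    return best

def knapsack_relaxed(values, weights, capacity):
    """Optimum of the relaxation (x_i in [0, 1]) via greedy fill by value/weight."""
    total, remaining = 0.0, float(capacity)
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    for i in order:
        take = min(1.0, remaining / weights[i])   # fraction of item i taken
        total += take * values[i]
        remaining -= take * weights[i]
        if remaining <= 0:
            break
    return total

if __name__ == "__main__":
    values, weights, capacity = [60, 100, 120], [10, 20, 30], 50
    print(knapsack_01(values, weights, capacity))
    print(knapsack_relaxed(values, weights, capacity))
```

The relaxed optimum is an upper bound on the integer optimum because every binary solution is also feasible for the relaxation.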
  • 13. Problem description • Consider the combinatorial optimization problem $\max_{x \in \mathcal{X}} f(x, \theta)$, where $\mathcal{X}$ is a discrete set containing all feasible decisions. • Without loss of generality, $\mathcal{X} \subseteq \{0, 1\}^n$, so $x$ is a binary decision vector. • The objective $f$ depends on $\theta \in \Theta$. Assume $\theta$ is unknown and must be inferred from data. • We observe a feature vector $y \in \mathcal{Y}$ that is correlated with $\theta$. • Let $m: \mathcal{Y} \to \Theta$ denote a model mapping observed features to parameters.
  • 14. Problem description (cont’d) • Use the training data $(y_1, \theta_1), \dots, (y_N, \theta_N)$ drawn from a distribution $P$ to fit the model $m$ (supervised learning). • Define $x^*(\theta) = \arg\max_{x \in \mathcal{X}} f(x, \theta)$ to be the optimal $x$ for a given $\theta$. • Objective: $\max \mathbb{E}_{(y, \theta) \sim P}[f(x^*(m(y)), \theta)]$ • Example: • $y$: user ratings of the movie • $\theta$: movie-actor assignments • Predict which actors are associated with each movie.
  • 15. • Classical solution (two-stage method) 1. Learn a model $m$ by minimizing a loss function: $\min_{\omega} \mathbb{E}_{(y, \theta) \sim P}[\mathcal{L}(\theta, m(y, \omega))]$. 2. Use the learned model's predictions to solve the optimization problem. • Possible drawback: the loss function does not account for how $\omega$ affects the resulting decision. • Is it possible to do better?
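The drawback of the two-stage method can be seen in a toy case. The sketch below uses made-up numbers (not from the slides or the paper) and a deliberately trivial optimization step: choose the single item with the largest predicted value. One prediction has lower mean squared error but leads to the worse decision.

```python
def mse(pred, true):
    """Mean squared error between predicted and true parameters."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def decide(theta):
    """Toy optimization step: index of the item with the largest predicted value."""
    return max(range(len(theta)), key=lambda i: theta[i])

def decision_quality(pred, true):
    """Objective value of the decision made from `pred`, evaluated with the true theta."""
    return true[decide(pred)]

theta_true = [1.0, 0.9]
pred_a = [0.94, 0.96]   # close to theta_true, but ranks the items in the wrong order
pred_b = [2.0, 0.9]     # far from theta_true, but ranks the items correctly

if __name__ == "__main__":
    for name, pred in (("A", pred_a), ("B", pred_b)):
        print(name, mse(pred, theta_true), decision_quality(pred, theta_true))
```

A loss such as MSE prefers prediction A, while the decision objective prefers prediction B; this gap is what decision-focused training targets.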
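  • 16. General framework • $x^*(\theta) = \arg\max_{x \in \mathcal{X}} f(x, \theta)$ • $x^*$ is a discrete (binary) decision, so the output is not differentiable with respect to the model parameters $\omega$. • Consider a continuous relaxation of the original problem, $x(\theta) = \arg\max_{x \in conv(\mathcal{X})} f(x, \theta)$, where $conv$ denotes the convex hull. • Obtain a gradient by sampling a single $(y, \theta)$ from the training data: $\frac{d f(x(\hat{\theta}), \theta)}{d\omega} = \frac{d f(x(\hat{\theta}), \theta)}{d x(\hat{\theta})} \frac{d x(\hat{\theta})}{d\hat{\theta}} \frac{d\hat{\theta}}{d\omega}$, where $\hat{\theta} = m(y, \omega)$. • Quantity being maximized over $\omega$: $f(x(\hat{\theta}), \theta) = f(x(m(y, \omega)), \theta)$, with $x(\cdot) \in conv(\mathcal{X})$.
  • 17. General framework (cont’d) • $\frac{d x(\hat{\theta})}{d\hat{\theta}}$ measures how the optimal decision changes with respect to $\hat{\theta}$. • For continuous problems, the optimal continuous decision must satisfy the KKT conditions. • The feasible region is the convex hull $conv(\mathcal{X})$, which can be represented as $\{x : Ax \le b\}$. • Let $(x, \lambda)$ be a pair of primal and dual variables; differentiating the KKT conditions then yields a linear system in $\frac{dx}{d\theta}$ and $\frac{d\lambda}{d\theta}$.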
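  • 18. Derivation-Stationarity • Recall stationarity: $\nabla f(\mathbf{x}^*) = \sum_{i=1}^{m} \mu_i \nabla g_i(\mathbf{x}^*) + \sum_{j=1}^{l} \lambda_j \nabla h_j(\mathbf{x}^*)$
  • 19. Derivation-Stationarity (cont’d) • Apply implicit differentiation to the stationarity condition, viewing $\lambda$ as an implicit function of $x$.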
  • 20. Derivation-Complementary slackness • Recall complementary slackness: $\mu_i g_i(\mathbf{x}^*) = 0$ for $i = 1, \dots, m$.
  • 22. Derivation-Complementary slackness (cont’d) • Apply implicit differentiation to the complementary slackness condition, viewing $\lambda$ as an implicit function of $x$.
  • 23. Solving this linear system yields the desired $\frac{dx}{d\theta}$.
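The chain rule and the differentiated KKT system can be traced end to end on a one-dimensional toy problem. Everything in the sketch below is an assumption made for illustration and is not taken from the slides: the objective $f(x, \theta) = \theta x - \gamma x^2$, the single constraint $x \le 1$, the model $\hat{\theta} = \omega y$, and the sign convention $\mathcal{L} = f - \lambda (x - 1)$, which may differ from the one used in the deck.

```python
def solve_decision(theta_hat, gamma):
    """Maximize theta_hat * x - gamma * x**2 subject to x <= 1.
    Returns the optimal x and the multiplier lam of the constraint."""
    x_free = theta_hat / (2.0 * gamma)
    if x_free < 1.0:
        return x_free, 0.0                   # constraint inactive
    return 1.0, theta_hat - 2.0 * gamma      # constraint active, lam >= 0

def dx_dtheta(x, lam, gamma):
    """Differentiate the KKT conditions
         theta_hat - 2*gamma*x - lam = 0     (stationarity)
         lam * (x - 1) = 0                   (complementary slackness)
    with respect to theta_hat and solve the 2x2 linear system
         [[-2*gamma, -1], [lam, x - 1]] [dx, dlam]^T = [-1, 0]^T
    by Cramer's rule. Assumes strict complementarity, so det != 0."""
    a, b, c, d = -2.0 * gamma, -1.0, lam, x - 1.0
    det = a * d - b * c
    return -d / det   # first unknown for the right-hand side (-1, 0)

def decision_gradient(omega, y, theta, gamma):
    """Chain rule: d f(x(theta_hat), theta) / d omega with theta_hat = omega * y."""
    theta_hat = omega * y
    x, lam = solve_decision(theta_hat, gamma)
    df_dx = theta - 2.0 * gamma * x          # objective evaluated with the TRUE theta
    return df_dx * dx_dtheta(x, lam, gamma) * y

def quality(omega, y, theta, gamma):
    """Decision quality as a function of omega, used for a finite-difference check."""
    x, _ = solve_decision(omega * y, gamma)
    return theta * x - gamma * x * x

if __name__ == "__main__":
    h = 1e-6
    analytic = decision_gradient(0.5, 1.0, 1.5, 1.0)
    numeric = (quality(0.5 + h, 1.0, 1.5, 1.0) - quality(0.5 - h, 1.0, 1.5, 1.0)) / (2 * h)
    print(analytic, numeric)
```

When the constraint is active the system gives $dx/d\hat{\theta} = 0$: a small change in the prediction does not move the decision, so no gradient flows back to $\omega$ from that sample.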
  • 24. Linear programming • Consider a linear program with equality and inequality constraints: $\max \theta^T x$ s.t. $Ax = b$, $Gx \le h$. • Problem: $\nabla_x^2 f(x, \theta)$ is always zero, so the left-hand-side matrix of the linear system becomes singular. • Solve the regularized problem instead: $\max \theta^T x - \gamma \lVert x \rVert_2^2$ s.t. $Ax = b$, $Gx \le h$. • This transforms the LP into a quadratic program (QP).
  • 28. • All other terms can be computed from $(x, \lambda)$, which QP solvers output.
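The effect of the regularizer on the linear system can be checked on a small instance. The sketch below makes several assumptions that are not in the slides: the only constraint is $\mathbf{1}^T x = 1$ (the inequality $Gx \le h$ is taken to be inactive and dropped), the Lagrangian is $\theta^T x - \gamma \lVert x \rVert_2^2 - \nu(\mathbf{1}^T x - 1)$, and `solve` is a plain Gauss-Jordan helper written for this example.

```python
def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [float(v) for v in B[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular KKT matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def kkt_jacobian(gamma, n=3):
    """dx/dtheta for max theta^T x - gamma*||x||^2 s.t. sum(x) = 1.
    Differentiating stationarity (theta - 2*gamma*x - nu*1 = 0) and the
    constraint with respect to theta gives
        [[-2*gamma*I, -1], [1^T, 0]] [dx/dtheta; dnu/dtheta] = [-I; 0]."""
    M = [[-2.0 * gamma if i == j else 0.0 for j in range(n)] + [-1.0]
         for i in range(n)]
    M.append([1.0] * n + [0.0])
    B = [[-1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    B.append([0.0] * n)
    return solve(M, B)[:n]   # drop the last row, which holds dnu/dtheta

if __name__ == "__main__":
    for row in kkt_jacobian(0.5):
        print(row)
    try:
        kkt_jacobian(0.0)            # unregularized LP: Hessian block is zero
    except ValueError as err:
        print("gamma = 0:", err)
```

For this instance the Jacobian equals $(I - \mathbf{1}\mathbf{1}^T / n) / (2\gamma)$, so a smaller $\gamma$ keeps the problem closer to the original LP but makes the gradients larger and the system closer to singular.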
  • 29. Submodular maximization • Consider the problem of maximizing a set function $f: 2^V \to \mathbb{R}$, where $V$ is a ground set of items. • A set function is submodular if it satisfies either of the following equivalent conditions. • For every $A, B \subseteq V$ with $A \subseteq B$ and any $v \in V \setminus B$, we have $f(A \cup \{v\}) - f(A) \ge f(B \cup \{v\}) - f(B)$. • For every $A, B \subseteq V$, we have $f(A) + f(B) \ge f(A \cup B) + f(A \cap B)$. • Focus on the cardinality-constrained problem $\max_{|S| \le k} f(S)$.
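The first (diminishing-returns) condition can be verified by brute force on a small ground set. The sketch below uses two hypothetical set functions that are not from the slides: a coverage function, which is submodular, and $|S|^2$, which is not.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Check f(A + v) - f(A) >= f(B + v) - f(B) for all A <= B <= V, v not in B."""
    ground = frozenset(ground)
    for B in subsets(ground):
        for A in subsets(B):
            for v in ground - B:
                gain_a = f(A | {v}) - f(A)
                gain_b = f(B | {v}) - f(B)
                if gain_a < gain_b - 1e-12:
                    return False
    return True

# Hypothetical data: which elements each of three sets covers.
COVERS = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}

def coverage(S):
    """Number of distinct elements covered by the chosen sets."""
    covered = set()
    for i in S:
        covered |= COVERS[i]
    return len(covered)

def squared_size(S):
    """|S|^2 has increasing marginal gains, so it is not submodular."""
    return len(S) ** 2

if __name__ == "__main__":
    print(is_submodular(coverage, COVERS))
    print(is_submodular(squared_size, COVERS))
```

The check enumerates all nested pairs of subsets, so it is only practical for very small ground sets.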
  • 30. Submodular maximization (cont’d) • View a set function as defined on the domain $\{0, 1\}^V$ (indicator view). • The multilinear extension $F$ is defined on $[0, 1]^V$ (probability view): $F(x) = \mathbb{E}[f(S)] = \sum_{S \subseteq V} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)$, where $x_i$ denotes the probability that item $i$ is included in $S$, independently of the other items. • Instead of solving $\max_{|S| \le k} f(S)$, we can solve $\max_{x \in conv(\mathcal{X})} F(x)$, where $\mathcal{X} = \{x \in \{0, 1\}^V : \sum_i x_i \le k\}$.
  • 31. • The multilinear extension has a closed form for coverage functions. • There is a set of items $U$, and each item $j \in U$ has a weight $w_j$. • We choose from a set of actions $V$; each action $a_i$ covers item $j$ independently with probability $\theta_{ij}$. • Closed form: $F(x, \theta) = \sum_{j \in U} w_j \left(1 - \prod_{i \in V} (1 - x_i \theta_{ij})\right)$ • Recall the general definition: $F(x) = \mathbb{E}[f(S)] = \sum_{S \subseteq V} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)$
  • 32. $F(x, \theta) = \sum_{j \in U} w_j \left(1 - \prod_{i \in V} (1 - x_i \theta_{ij})\right)$
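The closed form can be compared against the general definition of the multilinear extension by enumerating all subsets. In the sketch below, the set function $f(S) = \sum_j w_j \left(1 - \prod_{i \in S} (1 - \theta_{ij})\right)$ and all numeric values are assumptions chosen for illustration.

```python
from itertools import product

def coverage_value(S, w, theta):
    """f(S): expected covered weight when exactly the actions in S are chosen."""
    total = 0.0
    for j, wj in enumerate(w):
        miss = 1.0
        for i in S:
            miss *= 1.0 - theta[i][j]     # probability that action i misses item j
        total += wj * (1.0 - miss)
    return total

def multilinear_by_enumeration(x, w, theta):
    """F(x) = sum over S of f(S) * prod_{i in S} x_i * prod_{i not in S} (1 - x_i)."""
    total = 0.0
    for mask in product((0, 1), repeat=len(x)):
        prob = 1.0
        for i, bit in enumerate(mask):
            prob *= x[i] if bit else 1.0 - x[i]
        S = [i for i, bit in enumerate(mask) if bit]
        total += prob * coverage_value(S, w, theta)
    return total

def multilinear_closed_form(x, w, theta):
    """F(x, theta) = sum_j w_j * (1 - prod_i (1 - x_i * theta_ij))."""
    total = 0.0
    for j, wj in enumerate(w):
        miss = 1.0
        for i, xi in enumerate(x):
            miss *= 1.0 - xi * theta[i][j]
        total += wj * (1.0 - miss)
    return total

if __name__ == "__main__":
    w = [1.0, 2.0]                                   # item weights
    theta = [[0.5, 0.2], [0.1, 0.9], [0.3, 0.3]]     # theta[i][j], 3 actions x 2 items
    x = [0.2, 0.7, 0.5]                              # inclusion probabilities
    print(multilinear_by_enumeration(x, w, theta), multilinear_closed_form(x, w, theta))
```

The enumeration costs $2^{|V|}$ evaluations while the closed form costs $|U| \cdot |V|$ multiplications, which is what makes gradient-based optimization of $F$ practical for coverage objectives.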
  • 35. Experiments • For linear programming: bipartite matching. Feature vector: whether each word appeared in the paper. Objective: reconstruct the citation network. • For submodular maximization, two tasks: • Budget allocation: model an advertiser’s choice of how to divide a finite budget $k$ among a set of channels. Feature vector: ground truth $\theta$ passed to a DNN. Objective: expected number of customers reached. • Diverse recommendation. Feature vector: user ratings of movies. Objective: predict which actors are associated with each movie.
  • 36. Solution quality • Quality: the objective value of a method's decision, evaluated using the true $\theta$. • Legend: NN2 = two-layer neural network; RF = random forest.
  • 37. Accuracy • MSE: mean squared error • CE: cross entropy
  • 38. Conclusion • Focus on combinatorial optimization and introduce a general framework for decision-focused learning. • Instantiate the framework for linear programming and submodular maximization. • Experiments show that the proposed method leads to better solution quality, although it may lose some predictive accuracy.