© Mitsubishi Electric Corporation
High-Dimensional Bayesian Optimization with Constraints:
Application to Powder Weighing
Shoki Miyagawa, Atsuyoshi Yano, Naoko Sawada, Isamu Ogawa
(Mitsubishi Electric Corporation)
Background
Bayesian optimization (BO) can explore optimal parameters in black-box problems in limited trials.
[Figure: Bayesian optimization proposes new input parameters x_new to a black-box model; the model returns the output y_new to be maximized, and the observation is fed back to the optimizer.]
However, it does not work well for high-dimensional parameters (typically more than 10 dimensions) because the exploration space is too large.
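The loop described above can be sketched in a few lines. This is a minimal illustration (not the authors' code): a Gaussian-process surrogate plus a UCB acquisition proposes x_new, the black box returns y_new, and the observation is added to the dataset. The toy objective and all values are assumptions.

```python
# Minimal Bayesian-optimization loop: GP surrogate + UCB acquisition (sketch).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def black_box(x):                  # toy objective to be maximized (assumption)
    return -(x - 0.3) ** 2

X = np.array([[0.0], [0.5], [1.0]])          # initial trials
y = black_box(X).ravel()
grid = np.linspace(0.0, 1.0, 201).reshape(-1, 1)

for _ in range(8):
    gp = GaussianProcessRegressor(alpha=1e-6, normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    x_new = grid[np.argmax(mu + 2.0 * sigma)]    # UCB: mean + 2 * std
    X = np.vstack([X, [x_new]])                  # run the "trial"
    y = np.append(y, black_box(x_new))

best_x = X[np.argmax(y), 0]                      # best parameter found so far
print(round(float(best_x), 2))
```

In a real weighing system the `black_box` call is one physical trial, which is why the number of iterations must stay small.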
Background
Related works explore parameters in a low-dimensional space acquired by the following methods.
Subspace extraction / Linear embedding
• Dropout [IJCAI'17]: BO in the non-dropped dimensions of the original space.
• LINEBO [ICML'19]: BO in a single dimension.
• REMBO [IJCAI'13]: BO in randomly embedded dimensions (x obtained from z via a fixed random matrix).
〇 Constraints can be easily introduced. × Exploration is not efficient (particularly for image and NLP tasks).

Nonlinear embedding
1. Encode datasets with a DNN encoder.
2. Run BO in the latent space to obtain z_new.
3. Decode the latent parameters: x_new = Decoder(z_new).
〇 Efficient exploration. × Constraints cannot be explicitly expressed in the latent space. → We tackle this problem!
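The REMBO-style linear embedding mentioned above can be sketched as follows. This is an illustrative sketch, not the cited paper's implementation; the dimensionalities and the box bounds are made-up assumptions.

```python
# REMBO-style random linear embedding: optimize a low-dimensional latent z
# and map it to the high-dimensional original parameters via a fixed random matrix.
import numpy as np

rng = np.random.default_rng(0)
D, d = 100, 4                        # original / embedded dimensionality (assumption)
A = rng.normal(size=(D, d))          # fixed random embedding matrix

def to_original(z):
    # map a latent candidate to the original space, clipped to the box [-1, 1]^D
    return np.clip(A @ z, -1.0, 1.0)

z_new = rng.normal(size=d)           # in practice proposed by BO in the latent space
x_new = to_original(z_new)
print(x_new.shape)
```

Because the map is linear and fixed, box constraints survive the clipping step, which is why constraints are easy to introduce here, unlike in a learned nonlinear embedding.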
Proposed method: Key idea
This study focuses on two types of constraints.
• Known equality constraints → decomposition into variable and fixed parameters is useful.
• Unknown inequality constraints → introducing disentangled representation learning (DRL) into the nonlinear embedding is useful.
[Figure: Nonlinear embedding pipeline: 1. encode datasets with a DNN encoder, 2. run Bayesian optimization in the latent space to obtain z_new, 3. decode z_new into x_new with a DNN decoder.]
[Figure: Disentangled representation learning: each latent parameter is interpretable and independent (e.g. "rotation", "smile"); figure from β-VAE [ICLR'17]. DRL is generally used to control generative models (VAE, GAN, ...).]
[Figure: Parameter decomposition: the parameters x are split into variable parameters x_v without equality constraints, explored by Bayesian optimization, and fixed parameters x_f with equality constraints, set directly from the required condition.]
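The decomposition for known equality constraints can be made concrete with a small sketch. The array values and helper names here are hypothetical; the point is that the fixed part x_f never enters the optimizer, so the equality constraints hold by construction.

```python
# Parameter decomposition: pin x_f to its required values, optimize only x_v (sketch).
import numpy as np

x = np.array([0.9, 0.7, 0.5, 0.3])               # all parameters (toy values)
is_fixed = np.array([False, True, False, True])  # entries carrying equality constraints
x_f_target = np.array([0.7, 0.3])                # required values for the fixed part

x_v = x[~is_fixed]                               # this part is explored by BO

def recombine(x_v_new):
    # rebuild the full parameter vector from an explored x_v candidate
    out = np.empty_like(x)
    out[~is_fixed] = x_v_new                     # explored values
    out[is_fixed] = x_f_target                   # equality constraints always satisfied
    return out

print(recombine(np.array([0.8, 0.4])))
```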
Proposed method: Key idea
• Unknown inequality constraint
Problem in the previous methods: if the inequality constraints are known, we can simply restrict BO to the region of the original parameter space that satisfies them. After nonlinear embedding, however, that feasible region has no explicit expression in the latent space, so BO can generate parameters that do not satisfy the constraints.
[Figure: the region that satisfies the constraints in the original parameter space maps to an irregular region in the latent space.]
Proposed method: Key idea
• Unknown inequality constraint
→ Introducing disentangled representation learning (DRL) into the nonlinear embedding is useful
because users need only check whether the constraints are satisfied for data in each axis.
Example 1: generating a face under the constraint that the face is male.
[Figure: axis #1 ("rotation") is not related to the constraint; decoded faces along this axis stay male, so this region locally satisfies the constraint.]
[Figure: axis #2 ("smiling") is likewise not related to the constraint; decoded faces along it also stay male.]
[Figure: mixed features, i.e. combinations of axis #1 and axis #2, also satisfy the constraint.]
Example 2: generating a face under the constraint that the face is smiling.
[Figure: axis #2 ("smiling") is related to the constraint, so part of this axis possibly does not satisfy it; the exploration area is therefore restricted to axis #1.]
We can thus control the exploration area in the latent space even if the inequality constraints are unknown.
[Figure: exploration area in Example 1 (both axes) versus Example 2 (axis #1 only).]
Proposed method: Overview
[Figure: overview of the proposed method.]
Step 1: Dimensionality reduction
For the variable parameters, we used a β-VAE to introduce DRL into the VAE and obtained the latent space z_v ∈ ℝ^{d_v}. (For the fixed parameters, we used PCA for simplicity.)
Hyperparameters (the dimensionality of the latent space d_v and the coefficient β) control a tradeoff between the two losses:

β-VAE loss = reconstruction loss + β × KL-divergence loss

• Large β: 〇 features are more disentangled; × BO generates coarse-grained features (hard to optimize parameters).
• Small β: 〇 BO generates fine-grained features; × features are less disentangled (hard to consider constraints).

[Figure: β-VAE architecture; the encoder maps x to z, the decoder reconstructs x′, with a reconstruction loss between x and x′ and a KL-divergence loss between the latent distribution and N(0, 1).]
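The β-VAE objective above can be written out numerically. This is a sketch with toy shapes and values, using the closed-form KL divergence between the encoder's Gaussian q(z|x) = N(mu, exp(logvar)) and the prior N(0, 1); it is not the authors' training code.

```python
# beta-VAE loss = reconstruction loss + beta * KL-divergence loss (sketch).
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta):
    recon = np.mean((x - x_recon) ** 2)                         # reconstruction loss
    kl = -0.5 * np.mean(1 + logvar - mu ** 2 - np.exp(logvar))  # KL(q(z|x) || N(0,1))
    return recon + beta * kl

x = np.array([0.2, 0.5]); x_recon = np.array([0.25, 0.45])
mu = np.zeros(2); logvar = np.zeros(2)     # q(z|x) equals the prior, so KL = 0
print(beta_vae_loss(x, x_recon, mu, logvar, beta=0.1))  # reconstruction term only
```

Raising `beta` weights the KL term more heavily, which is exactly the disentanglement/reconstruction tradeoff in the table above.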
Step 2: Bayesian optimization
We used Gaussian process regression (GPR) and maximized the UCB (upper confidence bound) acquisition function a(z_v, z_f = z_f^target):

a_UCB(z_v, z_f) = μ(z_v, z_f) + α · σ(z_v, z_f)

where μ is the GPR predictive mean and σ the predictive uncertainty. We generated three candidate parameters z_v^new and let the user select one of them:
• exploitation-oriented (α = 0.001)
• intermediate (α = 0.5)
• exploration-oriented (α = 1.0)
[Figure: GPR posterior mean and uncertainty over (z_v, z_f) and the resulting acquisition function a_UCB.]
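The three user-facing candidates can be sketched with scikit-learn's GPR. This is a toy 1-D latent example with made-up observations; the real method also conditions on z_f = z_f^target.

```python
# Three UCB candidates (alpha = 0.001, 0.5, 1.0) from one GPR posterior (sketch).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

Z = np.array([[0.0], [0.4], [0.9]])        # observed latent points (toy)
y = np.array([0.1, 0.8, 0.3])              # observed objective values (toy)
gp = GaussianProcessRegressor(alpha=1e-6, normalize_y=True).fit(Z, y)

grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
mu, sigma = gp.predict(grid, return_std=True)

candidates = {}
for name, alpha in [("exploitation", 0.001), ("intermediate", 0.5), ("exploration", 1.0)]:
    # a_UCB = mu + alpha * sigma; each alpha gives one candidate z_v_new
    candidates[name] = float(grid[np.argmax(mu + alpha * sigma), 0])

print(candidates)   # the user then picks one of the three z_v_new values
```

Presenting all three lets the operator trade off exploitation against exploration per trial instead of committing to one α up front.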
Usage Scenario: Powder weighing system
System overview
The system needs to precisely weigh a powder by switching the valve opening degree v_i → v_{i+1} when the scale value reaches the corresponding switching weight s_{i+1} (0 ≤ i ≤ 8).
[Figure: the valve opening degree decreases over 9 steps from v_0 (start) to v_9 (end); the valve switches from v_i to v_{i+1} when the scale reaches the switching weight s_{i+1} (s_1, ..., s_9).]
Usage Scenario: Powder weighing system
Two types of inequality constraints
• Non-negative constraints : 𝑣𝑖 > 0, 𝑠𝑖 > 0
• Monotonically decreasing constraints : 𝑣𝑖 > 𝑣𝑖+1, 𝑠𝑖 < 𝑠𝑖+1
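The two inequality constraints above can be written as an explicit check. A small sketch (the example values are made up):

```python
# Check the powder-weighing constraints: v_i > 0, s_i > 0,
# v monotonically decreasing, s monotonically increasing (sketch).
import numpy as np

def satisfies_constraints(v, s):
    """v: valve opening degrees v_0..v_9; s: switching weights s_1..s_9."""
    v, s = np.asarray(v, dtype=float), np.asarray(s, dtype=float)
    nonneg = (v > 0).all() and (s > 0).all()                      # non-negative
    monotone = (np.diff(v) < 0).all() and (np.diff(s) > 0).all()  # v down, s up
    return bool(nonneg and monotone)

print(satisfies_constraints(v=[9, 8, 7, 6, 5, 4, 3, 2, 1, 0.5],
                            s=[1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

In the paper's setting these constraints are treated as unknown to the optimizer; a check like this is only available after a candidate has been decoded.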
Usage Scenario: Powder weighing system
Preprocessing
Two preprocessing pipelines were used (shown as two columns on the slide), differing only in the third step:
• Normalization; outlier removal; duplication removal to prevent imbalanced learning; train/test split.
• Normalization; outlier removal; data filtering to restrict the exploration area locally; train/test split.
Usage Scenario: Powder weighing system
Datasets
The datasets contained 60 types of powder and consisted of 1,792 trials (31.33 ± 19.48 trials per powder on average).
• Parameters x_f with equality constraints (used for learning PCA and GPR)
• Parameters x_v without equality constraints (used for learning β-VAE and GPR)
• Objective value y, representing the error between the measured and required weight (used for learning GPR)
Experiments overview
• Experiments 1-1 and 1-2: we verify the effect of the hyperparameters in β-VAE learning (d_v and β) on how well the inequality constraints are considered.
• Experiment 2: we verify whether the proposed method can determine optimal parameters within a reasonable number of trials, i.e., until the weighing error y falls below 1% of the required weight (manual tuning typically needs about 20 trials in practice).
Experiment 1-1: Quantitative evaluation of hyperparameter effects
Procedure (repeated for all 75 hyperparameter combinations):
1. Select hyperparameter values: d_v ∈ {2, 4, 6, 8, 10}, β ∈ {0.1, 0.2, ..., 1.5}.
2. Train the β-VAE.
3. Sample randomly in the latent space ℝ^{d_v} (n = 1000), decode each z to x with the DNN decoder, and label each sample as suitable or unsuitable depending on whether it satisfies the constraints. (For comparison, samples are also drawn randomly in the original space, n = 1000.)
4. Evaluate by the number of unsuitable data, i.e., samples that do not satisfy the constraints.
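The evaluation loop of Experiment 1-1 can be sketched as follows. The decoder here is a made-up linear stand-in for the trained β-VAE decoder, and the constraint check is simplified to the valve-degree constraints; everything else follows the procedure above.

```python
# Sample the latent space, decode, and count constraint violations (sketch).
import numpy as np

rng = np.random.default_rng(0)
d_v = 2

def decode(z):
    # stand-in for the beta-VAE decoder (assumption): maps a latent z
    # to 9 valve opening degrees via an offset and a slope perturbation
    base = np.linspace(0.9, 0.1, 9)
    return base + 0.3 * z[0] + 0.1 * z[1] * np.linspace(-1, 1, 9)

def is_unsuitable(x):
    # violates non-negativity or monotone decrease
    return not ((x > 0).all() and (np.diff(x) < 0).all())

Z = rng.normal(size=(1000, d_v))      # random samples in the latent space
n_unsuitable = sum(is_unsuitable(decode(z)) for z in Z)
print(n_unsuitable, "of 1000 samples violate the constraints")
```

Repeating this count per (d_v, β) setting gives exactly the quantity plotted in the result slide.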
Result
Findings:
• Larger β decreases the number of unsuitable data. We conjecture that DRL enables the inequality constraints to be considered.
• Larger latent dimensionality d_v increases the number of unsuitable data. We conjecture that samples far from the origin of the latent space tend to be unsuitable, because fine-grained features are emphasized in the area far from the origin.
[Figure: number of unsuitable data for each hyperparameter setting; regions far from the origin, where undesirable features are emphasized, overlap the exploration area.]
Experiment 1-2: Qualitative evaluation of hyperparameter effects
Procedure (repeated for all 3 hyperparameter combinations):
1. Select hyperparameter values: d_v = 2, β ∈ {0.1, 0.5, 1.0}.
2. Train the β-VAE.
3. Sample at equal intervals along the axes of the latent space (n = 15), decode each z to x with the DNN decoder, and visualize the meaning of the disentangled features. (For comparison, samples are also taken at equal intervals along the axes of the original space, n = 15.)
4. Check whether the disentangled features satisfy the constraints.
Result
• Large β: sufficient consideration of the constraints but poor diversity (only the initial point of the profile changes along an axis).
• Small β: rich diversity (both the initial point and the gradient of the profile change) but lack of consideration of the constraints.
[Figure: decoded valve-opening-degree / switching-weight profiles for samples along each latent axis, for each β.]
We used the rich-diversity setting (d_v = 2, β = 0.1) in Experiment 2.
Experiment 1
Discussion
• Can DRL consider inequality constraints?
➢ Yes.
• How should we set the hyperparameter values?
➢ Visualizing the effects quantitatively and qualitatively is helpful for determining them.
➢ We recommend determining the value of d_v first, because the suitable value of β depends on d_v.
[Figure: the acceptable parameter area over (d_v, β); too large a β makes the reconstruction loss too high (parameters have poor diversity), while too small a β leads to lack of consideration of the constraints.]
Experiment 2
Procedure
• We used three types of powder, A, B, and C, not included in the dataset. (A PCA visualization of the fixed parameters shows that powders A, B, and C are not outliers.)
• From the result of Experiment 1, we set d_v = 2 and β = 0.1, which leads to rich diversity and a low reconstruction error.
Result
• The proposed method reduces the number of trials from about 20 (the manual-tuning baseline) to around 5.
• For powders B and C, the generated parameters satisfied the constraints in all trials.
• For powder A, unsuitable data were generated in one trial, presumably because the method explored areas far from the origin of the latent space (consistent with the observation in Experiment 1).
Limitations
• The relationship between the hyperparameters (d_v and β) and the number of required trials is still unclear: Experiment 1 connected the hyperparameters to constraint consideration, and Experiment 2 connected constraint consideration to the number of trials, but the direct link remains open.
• The exploration area needs to be set manually.
[Figure: size of the exploration area (bounding box) in the latent space; if it is too small, the optimal parameters cannot be explored, and if it is too large, parameters that do not satisfy the constraints are generated. Finding the best area automatically is future work.]
Conclusion
• We proposed methods to handle two types of constraints in Bayesian optimization
even after the nonlinear embedding.
➢ Known equality constraints : Parameter decomposition is useful.
➢ Unknown inequality constraints : Disentangled representation learning is useful.
• We conducted two experiments.
➢ Experiment 1 showed the effect of hyperparameters on considering inequality constraints
and the visualization to determine the values.
➢ Experiment 2 demonstrated that the proposed method reduces the number of trials by approximately 66% compared to manual tuning.
Do you have any questions?
High-Dimensional Bayesian Optimization with Constraints: Application to Powder weighing (PDPAT2022/MPS139)

More Related Content

What's hot

Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)Yoshitaka Ushiku
 
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and EditingDeep Learning JP
 
【論文調査】XAI技術の効能を ユーザ実験で評価する研究
【論文調査】XAI技術の効能を ユーザ実験で評価する研究【論文調査】XAI技術の効能を ユーザ実験で評価する研究
【論文調査】XAI技術の効能を ユーザ実験で評価する研究Satoshi Hara
 
【DL輪読会】A Path Towards Autonomous Machine Intelligence
【DL輪読会】A Path Towards Autonomous Machine Intelligence【DL輪読会】A Path Towards Autonomous Machine Intelligence
【DL輪読会】A Path Towards Autonomous Machine IntelligenceDeep Learning JP
 
モンテカルロ法と情報量
モンテカルロ法と情報量モンテカルロ法と情報量
モンテカルロ法と情報量Shohei Miyashita
 
[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative Models[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative ModelsDeep Learning JP
 
Counterfaual Machine Learning(CFML)のサーベイ
Counterfaual Machine Learning(CFML)のサーベイCounterfaual Machine Learning(CFML)のサーベイ
Counterfaual Machine Learning(CFML)のサーベイARISE analytics
 
(文献紹介)Deep Unrolling: Learned ISTA (LISTA)
(文献紹介)Deep Unrolling: Learned ISTA (LISTA)(文献紹介)Deep Unrolling: Learned ISTA (LISTA)
(文献紹介)Deep Unrolling: Learned ISTA (LISTA)Morpho, Inc.
 
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)MLSE
 
オントロジー工学に基づくセマンティック技術(1)オントロジー工学入門
オントロジー工学に基づくセマンティック技術(1)オントロジー工学入門オントロジー工学に基づくセマンティック技術(1)オントロジー工学入門
オントロジー工学に基づくセマンティック技術(1)オントロジー工学入門Kouji Kozaki
 
ナレッジグラフ入門
ナレッジグラフ入門ナレッジグラフ入門
ナレッジグラフ入門KnowledgeGraph
 
【メタサーベイ】数式ドリブン教師あり学習
【メタサーベイ】数式ドリブン教師あり学習【メタサーベイ】数式ドリブン教師あり学習
【メタサーベイ】数式ドリブン教師あり学習cvpaper. challenge
 
Active Learning 入門
Active Learning 入門Active Learning 入門
Active Learning 入門Shuyo Nakatani
 
“機械学習の説明”の信頼性
“機械学習の説明”の信頼性“機械学習の説明”の信頼性
“機械学習の説明”の信頼性Satoshi Hara
 
[DL輪読会]Weakly-Supervised Disentanglement Without Compromises
[DL輪読会]Weakly-Supervised Disentanglement Without Compromises[DL輪読会]Weakly-Supervised Disentanglement Without Compromises
[DL輪読会]Weakly-Supervised Disentanglement Without CompromisesDeep Learning JP
 
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法SSII
 
【DL輪読会】HyperTree Proof Search for Neural Theorem Proving
【DL輪読会】HyperTree Proof Search for Neural Theorem Proving【DL輪読会】HyperTree Proof Search for Neural Theorem Proving
【DL輪読会】HyperTree Proof Search for Neural Theorem ProvingDeep Learning JP
 
Direct feedback alignment provides learning in Deep Neural Networks
Direct feedback alignment provides learning in Deep Neural NetworksDirect feedback alignment provides learning in Deep Neural Networks
Direct feedback alignment provides learning in Deep Neural NetworksDeep Learning JP
 
【DL輪読会】Reward Design with Language Models
【DL輪読会】Reward Design with Language Models【DL輪読会】Reward Design with Language Models
【DL輪読会】Reward Design with Language ModelsDeep Learning JP
 
【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised Learningまとめ【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised LearningまとめDeep Learning JP
 

What's hot (20)

Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)Curriculum Learning (関東CV勉強会)
Curriculum Learning (関東CV勉強会)
 
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing[DL輪読会]GLIDE: Guided Language to Image Diffusion  for Generation and Editing
[DL輪読会]GLIDE: Guided Language to Image Diffusion for Generation and Editing
 
【論文調査】XAI技術の効能を ユーザ実験で評価する研究
【論文調査】XAI技術の効能を ユーザ実験で評価する研究【論文調査】XAI技術の効能を ユーザ実験で評価する研究
【論文調査】XAI技術の効能を ユーザ実験で評価する研究
 
【DL輪読会】A Path Towards Autonomous Machine Intelligence
【DL輪読会】A Path Towards Autonomous Machine Intelligence【DL輪読会】A Path Towards Autonomous Machine Intelligence
【DL輪読会】A Path Towards Autonomous Machine Intelligence
 
モンテカルロ法と情報量
モンテカルロ法と情報量モンテカルロ法と情報量
モンテカルロ法と情報量
 
[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative Models[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative Models
 
Counterfaual Machine Learning(CFML)のサーベイ
Counterfaual Machine Learning(CFML)のサーベイCounterfaual Machine Learning(CFML)のサーベイ
Counterfaual Machine Learning(CFML)のサーベイ
 
(文献紹介)Deep Unrolling: Learned ISTA (LISTA)
(文献紹介)Deep Unrolling: Learned ISTA (LISTA)(文献紹介)Deep Unrolling: Learned ISTA (LISTA)
(文献紹介)Deep Unrolling: Learned ISTA (LISTA)
 
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)
【基調講演】『深層学習の原理の理解に向けた理論の試み』 今泉 允聡(東大)
 
オントロジー工学に基づくセマンティック技術(1)オントロジー工学入門
オントロジー工学に基づくセマンティック技術(1)オントロジー工学入門オントロジー工学に基づくセマンティック技術(1)オントロジー工学入門
オントロジー工学に基づくセマンティック技術(1)オントロジー工学入門
 
ナレッジグラフ入門
ナレッジグラフ入門ナレッジグラフ入門
ナレッジグラフ入門
 
【メタサーベイ】数式ドリブン教師あり学習
【メタサーベイ】数式ドリブン教師あり学習【メタサーベイ】数式ドリブン教師あり学習
【メタサーベイ】数式ドリブン教師あり学習
 
Active Learning 入門
Active Learning 入門Active Learning 入門
Active Learning 入門
 
“機械学習の説明”の信頼性
“機械学習の説明”の信頼性“機械学習の説明”の信頼性
“機械学習の説明”の信頼性
 
[DL輪読会]Weakly-Supervised Disentanglement Without Compromises
[DL輪読会]Weakly-Supervised Disentanglement Without Compromises[DL輪読会]Weakly-Supervised Disentanglement Without Compromises
[DL輪読会]Weakly-Supervised Disentanglement Without Compromises
 
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法
 
【DL輪読会】HyperTree Proof Search for Neural Theorem Proving
【DL輪読会】HyperTree Proof Search for Neural Theorem Proving【DL輪読会】HyperTree Proof Search for Neural Theorem Proving
【DL輪読会】HyperTree Proof Search for Neural Theorem Proving
 
Direct feedback alignment provides learning in Deep Neural Networks
Direct feedback alignment provides learning in Deep Neural NetworksDirect feedback alignment provides learning in Deep Neural Networks
Direct feedback alignment provides learning in Deep Neural Networks
 
【DL輪読会】Reward Design with Language Models
【DL輪読会】Reward Design with Language Models【DL輪読会】Reward Design with Language Models
【DL輪読会】Reward Design with Language Models
 
【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised Learningまとめ【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised Learningまとめ
 

Similar to High-Dimensional Bayesian Optimization with Constraints: Application to Powder weighing (PDPAT2022/MPS139)

ABAQUS LEC.ppt
ABAQUS LEC.pptABAQUS LEC.ppt
ABAQUS LEC.pptAdalImtiaz
 
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...Jihun Yun
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Steady state CFD analysis of C-D nozzle
Steady state CFD analysis of C-D nozzle Steady state CFD analysis of C-D nozzle
Steady state CFD analysis of C-D nozzle Vishnu R
 
Radioss analysis quality_ht
Radioss analysis quality_htRadioss analysis quality_ht
Radioss analysis quality_htAltairKorea
 
Self-dependent 3D face rotational alignment using the nose region
Self-dependent 3D face rotational alignment using the nose regionSelf-dependent 3D face rotational alignment using the nose region
Self-dependent 3D face rotational alignment using the nose regionMehryar (Mike) E., Ph.D.
 
Deep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksDeep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksSangwoo Mo
 
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...Borhan Kazimipour
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningSangmin Woo
 
Acm Tech Talk - Decomposition Paradigms for Large Scale Systems
Acm Tech Talk - Decomposition Paradigms for Large Scale SystemsAcm Tech Talk - Decomposition Paradigms for Large Scale Systems
Acm Tech Talk - Decomposition Paradigms for Large Scale SystemsVinayak Hegde
 
Metric Recovery from Unweighted k-NN Graphs
Metric Recovery from Unweighted k-NN GraphsMetric Recovery from Unweighted k-NN Graphs
Metric Recovery from Unweighted k-NN Graphsjoisino
 
Linear programming models - U2.pptx
Linear programming models - U2.pptxLinear programming models - U2.pptx
Linear programming models - U2.pptxMariaBurgos55
 
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...Borhan Kazimipour
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningSungchul Kim
 

Similar to High-Dimensional Bayesian Optimization with Constraints: Application to Powder weighing (PDPAT2022/MPS139) (20)

ABAQUS LEC.ppt
ABAQUS LEC.pptABAQUS LEC.ppt
ABAQUS LEC.ppt
 
PF_MAO2010 Souma
PF_MAO2010 SoumaPF_MAO2010 Souma
PF_MAO2010 Souma
 
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
 
Steady state CFD analysis of C-D nozzle
Steady state CFD analysis of C-D nozzle Steady state CFD analysis of C-D nozzle
Steady state CFD analysis of C-D nozzle
 
Radioss analysis quality_ht
Radioss analysis quality_htRadioss analysis quality_ht
Radioss analysis quality_ht
 
Self-dependent 3D face rotational alignment using the nose region
Self-dependent 3D face rotational alignment using the nose regionSelf-dependent 3D face rotational alignment using the nose region
Self-dependent 3D face rotational alignment using the nose region
 
Deep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksDeep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural Networks
 
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
A sensitivity analysis of contribution-based cooperative co-evolutionary algo...
 
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
Optimization for Deep Networks (D2L1 2017 UPC Deep Learning for Computer Vision)
 
Tutorial_01_Quick_Start.pdf
Tutorial_01_Quick_Start.pdfTutorial_01_Quick_Start.pdf
Tutorial_01_Quick_Start.pdf
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 
Acm Tech Talk - Decomposition Paradigms for Large Scale Systems
Acm Tech Talk - Decomposition Paradigms for Large Scale SystemsAcm Tech Talk - Decomposition Paradigms for Large Scale Systems
Acm Tech Talk - Decomposition Paradigms for Large Scale Systems
 
High-Dimensional Bayesian Optimization with Constraints: Application to Powder weighing (PDPAT2022/MPS139)

  • 1. © Mitsubishi Electric Corporation High-Dimensional Bayesian Optimization with Constraints: Application to Powder Weighing Shoki Miyagawa, Atsuyoshi Yano, Naoko Sawada, Isamu Ogawa (Mitsubishi Electric Corporation)
  • 2. Background. Bayesian optimization (BO) can explore optimal parameters in black-box problems within a limited number of trials. [Figure: BO loop — BO proposes new input parameters 𝑥new to the black-box model and observes the output 𝑦new, which is to be maximized.]
  • 3. Background. BO can explore optimal parameters within limited trials, but it does not work for high-dimensional parameters (typically > 10 dimensions) because the exploration area is too wide. [Figure: the same BO loop, now over a high-dimensional input space.]
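The BO loop on these slides can be sketched with a Gaussian-process surrogate and a UCB acquisition on a toy 1-D objective; the objective, candidate grid, and trial budget below are illustrative assumptions, not the paper's setup.

```python
# Minimal BO sketch: GP surrogate + UCB acquisition on a toy 1-D
# objective (illustrative; not the paper's actual model or data).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def black_box(x):
    # Toy black-box output to be maximized (peak at x = 2).
    return -(x - 2.0) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(3, 1))            # a few initial trials
y = black_box(X).ravel()
grid = np.linspace(0, 5, 201).reshape(-1, 1)  # candidate inputs

for _ in range(10):                           # limited trial budget
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    x_new = grid[np.argmax(mu + 1.0 * sigma)].reshape(1, 1)  # UCB
    X = np.vstack([X, x_new])
    y = np.append(y, black_box(x_new).ravel())

best_x = float(X[np.argmax(y), 0])
```

In a 1-D toy problem a handful of trials suffices; the slide's point is that this loop degrades once the input has tens of dimensions.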
  • 4. Background. Related works explore parameters in a low-dimensional space acquired by one of the following methods.
Subspace extraction / linear embedding: Dropout [IJCAI’17] runs BO in the non-dropped dimensions; LINEBO [ICML’19] runs BO along a single dimension; REMBO [IJCAI’13] runs BO in randomly embedded dimensions (a random matrix maps the latent 𝒛 to the original 𝒙). Constraints can be easily introduced, but exploration is not efficient (particularly for image and NLP data).
Nonlinear embedding: 1. encode the datasets with a DNN encoder (𝒙 → 𝒛); 2. run BO in the latent space; 3. decode the latent parameters with a DNN decoder (𝒛new → 𝒙new). Exploration is efficient, but constraints cannot be explicitly expressed in the latent space. We tackle this problem!
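The random-embedding idea behind REMBO can be sketched as a fixed random matrix mapping a low-dimensional 𝒛 to the original space, with clipping to the search box; the dimensionalities and box bounds below are assumptions for illustration.

```python
# Sketch of a REMBO-style random linear embedding (illustrative
# dimensions): BO explores only the d-dimensional latent z.
import numpy as np

rng = np.random.default_rng(0)
D, d = 100, 2                    # original / latent dimensionality
A = rng.normal(size=(D, d))      # fixed random embedding matrix

def to_original(z):
    # Map latent z to the original space and clip to the search
    # box [-1, 1]^D, the usual convention for bounded problems.
    return np.clip(A @ np.asarray(z), -1.0, 1.0)

x_new = to_original([0.1, -0.2])
```

Because the box constraint is applied in the original space after the linear map, more general constraints have no explicit form in 𝒛, which is exactly the limitation the slide points out.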
  • 5. Proposed method: Key idea. This study focuses on two types of constraints.
- Known equality constraints: decomposition into variable and fixed parameters is useful. Variable parameters 𝒙v (without equality constraints) are explored by BO, while fixed parameters 𝒙f (with equality constraints) are held at their conditioned values.
- Unknown inequality constraints: introducing disentangled representation learning (DRL) into the nonlinear embedding is useful. With DRL, each latent parameter is interpretable and independent (e.g., the “rotation” and “smile” axes in β-VAE [ICLR’17]); DRL is generally used to control generative models (VAE, GAN, ...).
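The decomposition for known equality constraints can be sketched as recombining the explored 𝒙v with the pinned 𝒙f; the index sets and target values below are hypothetical, chosen only for illustration.

```python
# Sketch of the variable/fixed parameter decomposition (indices and
# target values are hypothetical): BO explores x_v only, while x_f is
# pinned to the values required by the known equality constraints.
import numpy as np

variable_idx = [0, 1, 2]              # x_v: explored by BO
fixed_idx = [3, 4]                    # x_f: equality-constrained
x_f_target = np.array([0.5, 1.0])     # conditioned values for x_f

def assemble(x_v):
    # Recombine an explored x_v with the fixed x_f into a full vector.
    x = np.empty(len(variable_idx) + len(fixed_idx))
    x[variable_idx] = x_v
    x[fixed_idx] = x_f_target
    return x

x_full = assemble(np.array([0.1, 0.2, 0.3]))
```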
  • 8. Proposed method: Key idea. Unknown inequality constraints: the problem in previous methods is that, while known inequality constraints allow BO to be applied directly within the feasible region of the original parameter space, after a nonlinear embedding the feasible region has no explicit form in the latent space, so BO may generate parameters that do not satisfy the constraints. [Figure: feasible and infeasible regions in the original and latent parameter spaces.]
  • 9. Proposed method: Key idea. Introducing DRL into the nonlinear embedding is useful because users need only check, for each latent axis, whether the constraint is satisfied. Example 1: generating a face under the constraint that it be a male face. The axis controlling rotation is not related to this constraint, so the region along it locally satisfies the constraint. [Figure: decoded faces along latent axes #1 and #2.]
  • 10. Example 1 (continued): the axis controlling smiling is also not related to the male-face constraint. [Figure: decoded faces varying along that axis.]
  • 11. Example 1 (continued): mixed features (combinations along both axes) also satisfy the constraint. [Figure: decoded faces over the axis #1 / axis #2 grid.]
  • 12. Example 2: generating a face under the constraint that it be a smiling face. The axis controlling smiling is related to this constraint, so the region along it may violate the constraint and the exploration area is restricted to axis #1. [Figure: decoded faces along the two axes, with the restricted exploration area highlighted.]
  • 13. Comparing the exploration areas in examples 1 and 2 shows that we can control the exploration area in the latent space even if the inequality constraints are unknown. [Figure: exploration areas on axes #1 and #2 for the two examples.]
  • 14. Proposed method: Overview.
  • 15. Step 1: Dimensionality reduction. For variable parameters, we use β-VAE to introduce DRL into a VAE and acquire the latent space 𝑧v ∈ ℝ^(𝑑v). (For fixed parameters, we use PCA for simplicity.) The β-VAE loss is: reconstruction loss + β × KL-divergence loss. The hyperparameters (the latent dimensionality 𝑑v and the coefficient β) control a trade-off between the two losses: with large β, features are more disentangled, but BO generates coarse-grained features (hard to optimize the parameters); with small β, BO generates fine-grained features, but they are less disentangled (hard to take the constraints into account).
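The β-VAE objective on this slide can be written out for a Gaussian encoder; this is a generic sketch (squared-error reconstruction, standard-normal prior), not the paper's exact implementation.

```python
# Sketch of the beta-VAE loss: reconstruction + beta * KL, assuming a
# Gaussian posterior q(z|x) = N(mu, exp(logvar)) and prior N(0, I).
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta):
    recon = np.sum((x - x_recon) ** 2)   # reconstruction loss
    # KL( N(mu, exp(logvar)) || N(0, I) ), in closed form:
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
    return recon + beta * kl

# Perfect reconstruction with a posterior equal to the prior gives 0.
loss0 = beta_vae_loss(np.ones(3), np.ones(3), np.zeros(3), np.zeros(3), beta=0.5)
```

Increasing `beta` weights the KL term more heavily, which is what pushes the latent axes toward the disentangled, coarse-grained regime described above.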
  • 16. Step 2: Bayesian optimization. We use Gaussian process regression (GPR) and maximize the UCB (upper confidence bound) acquisition function 𝑎(𝑧v, 𝑧f = 𝑧f^target), defined as 𝑎UCB(𝑧v, 𝑧f) = 𝜇(𝑧v, 𝑧f) + 𝛼 · 𝜎(𝑧v, 𝑧f), where 𝜇 and 𝜎 are the GPR mean and standard deviation. We generate three candidate parameters 𝒛v^new and let the user select one of them: exploitation-oriented (𝛼 = 0.001), intermediate (𝛼 = 0.5), and exploration-oriented (𝛼 = 1.0).
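The three-candidate generation can be sketched with a GPR fitted on toy latent observations; the data, candidate sampling, and objective below are illustrative assumptions.

```python
# Sketch of generating three UCB candidates (one per alpha) from a GPR
# fitted on toy latent observations; data and objective are illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
Z = rng.uniform(-2, 2, size=(20, 2))     # observed latent points z_v
y = -np.sum(Z**2, axis=1)                # toy objective (max at origin)
gp = GaussianProcessRegressor(normalize_y=True).fit(Z, y)

cand = rng.uniform(-2, 2, size=(500, 2)) # candidate latent points
mu, sigma = gp.predict(cand, return_std=True)

choices = {}
for name, alpha in [("exploitation", 0.001), ("intermediate", 0.5),
                    ("exploration", 1.0)]:
    ucb = mu + alpha * sigma             # a_UCB = mu + alpha * sigma
    choices[name] = cand[np.argmax(ucb)] # z_v^new offered to the user
```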
  • 17. Usage scenario: powder weighing system. The system must weigh a powder precisely by changing the valve opening degree from 𝑣i to 𝑣i+1 whenever the scale value reaches the corresponding switching weight 𝑠i+1 (0 ≤ i ≤ 8), giving 9 steps from 𝑣0 (start) to 𝑣9 (end).
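The switching rule can be sketched as a lookup from the current scale value to a valve opening degree; the v and s sequences below are made-up monotone values, not real machine settings.

```python
# Sketch of the valve-switching rule: use opening degree v[i] until the
# scale value reaches switching weight s[i+1], then move to v[i+1].
# The sequences below are made-up monotone values for illustration.
def valve_degree(scale_value, v, s):
    i = 0
    while i < len(s) and scale_value >= s[i]:
        i += 1          # switch to the next (smaller) opening degree
    return v[i]

v = [90, 80, 70, 60, 50, 40, 30, 20, 10, 5]  # v0..v9, decreasing
s = [10, 20, 30, 40, 50, 60, 70, 80, 90]     # s1..s9, increasing
```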
  • 18. Usage scenario: two types of inequality constraints. Non-negativity constraints: 𝑣i > 0, 𝑠i > 0. Monotonicity constraints: the valve opening degrees decrease (𝑣i > 𝑣i+1) while the switching weights increase (𝑠i < 𝑠i+1).
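These two constraint types translate directly into a feasibility check; this sketch assumes the parameters are given as plain sequences of valve degrees and switching weights.

```python
# Sketch of a feasibility check for the two inequality-constraint
# types: non-negativity and strict monotonicity.
import numpy as np

def satisfies_constraints(v, s):
    v, s = np.asarray(v, float), np.asarray(s, float)
    nonneg = np.all(v > 0) and np.all(s > 0)          # v_i > 0, s_i > 0
    monotone = np.all(np.diff(v) < 0) and np.all(np.diff(s) > 0)
    return bool(nonneg and monotone)

ok = satisfies_constraints([90, 70, 50, 30, 10], [10, 20, 30, 40])
bad = satisfies_constraints([90, 70, 80, 30, 10], [10, 20, 30, 40])
```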
  • 19. Usage scenario: preprocessing. Two pipelines are applied, both consisting of normalization, outlier removal, and a train/test split; one additionally removes duplicates to prevent imbalanced learning, and the other filters the data to restrict the exploration area locally.
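The shared preprocessing steps can be sketched as below; the z-score threshold, test fraction, and ordering of steps are assumptions for illustration, not the paper's exact settings.

```python
# Sketch of the preprocessing pipeline: duplicate removal,
# normalization, outlier removal, and a train/test split.
# Thresholds and ordering are illustrative assumptions.
import numpy as np

def preprocess(X, z_thresh=3.0, test_frac=0.2, seed=0):
    X = np.unique(X, axis=0)                        # duplication removal
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Xn = (X - mu) / sd                              # normalization
    Xn = Xn[np.all(np.abs(Xn) < z_thresh, axis=1)]  # outlier removal
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Xn))                  # train/test split
    n_test = int(len(Xn) * test_frac)
    return Xn[idx[n_test:]], Xn[idx[:n_test]]

train, test = preprocess(np.vstack([np.eye(5), np.eye(5)]))
```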
  • 20. Usage scenario: datasets. The dataset contained 60 types of powder and consisted of 1,792 trials (on average 31.33 ± 19.48 trials per powder). Parameters 𝒙f with equality constraints were used for learning PCA and GPR; parameters 𝒙v without equality constraints were used for learning β-VAE and GPR; the objective value 𝑦, representing the error between the measured and required weight, was used for learning GPR.
  • 21. Experiments overview. Experiments 1-1 and 1-2 verify the effect of the β-VAE hyperparameters (𝑑v and β) on taking the inequality constraints into account. Experiment 2 verifies whether the proposed method can determine optimal parameters, i.e. a weighing error 𝑦 of less than 1% of the required weight, within a reasonable number of trials (manual tuning typically needs about 20 trials in practice).
  • 23. Experiment 1-1: quantitative evaluation of hyperparameter effects. Hyperparameters: 𝑑v ∈ {2, 4, 6, 8, 10}, β ∈ {0.1, 0.2, ..., 1.5}. Procedure (repeated 75 times, once per hyperparameter combination): select the hyperparameter values, train the β-VAE, sample n = 1000 points randomly in the latent space ℝ^(𝑑v), decode each 𝑧 into 𝑥, and count the unsuitable samples, i.e. those that do not satisfy the constraints (n = 1000 points are also sampled randomly in the original space for reference).
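The Experiment 1-1 procedure can be sketched as a sampling loop; the decoder and feasibility check below are hypothetical stand-ins (a linear map and a box constraint), not the trained β-VAE or the real powder constraints.

```python
# Sketch of the Experiment 1-1 evaluation loop: sample latent points,
# decode them, and count infeasible results. `decode` and `feasible`
# are hypothetical stand-ins, not the trained beta-VAE.
import numpy as np

def count_unsuitable(decode, feasible, d_v, n=1000, seed=0):
    rng = np.random.default_rng(seed)
    zs = rng.normal(size=(n, d_v))    # random samples in latent space
    return sum(not feasible(decode(z)) for z in zs)

decode = lambda z: 0.5 + 0.05 * z                       # stand-in decoder
feasible = lambda x: bool(np.all((x >= 0) & (x <= 1)))  # box constraint
n_bad = count_unsuitable(decode, feasible, d_v=2)
```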
  • 24. Experiment 1-1 result and findings. Larger β decreases the number of unsuitable samples; we conjecture that DRL enables the inequality constraints to be taken into account. Larger 𝑑v increases the number of unsuitable samples; we conjecture that samples far from the origin of the latent space tend to be unsuitable because fine-grained features are emphasized there. [Figure: number of unsuitable samples per setting; latent-space diagram showing the exploration area and the outlying regions where undesirable features are emphasized.]
  • 25. Experiment 1-2: qualitative evaluation of hyperparameter effects. Hyperparameters: 𝑑v = 2, β ∈ {0.1, 0.5, 1.0}. Procedure (repeated 3 times, once per β): train the β-VAE, sample n = 15 points at equal intervals along each latent axis, decode them, and check visually whether the disentangled features satisfy the constraints (n = 15 points are likewise sampled at equal intervals in the original space for reference).
  • 26. Experiment 1-2 result. Large β gives sufficient consideration of the constraints but poor diversity; small β gives rich diversity but lacks consideration of the constraints. Along one latent axis the initial point of the profile changes; along the other, both the initial point and the gradient change. [Figure: decoded valve-opening-degree and switching-weight profiles along the latent axes for each β.]
  • 27. Experiment 1-2 result (continued): we used this setting in Experiment 2. [Figure: same visualization as the previous slide, with the selected setting highlighted.]
  • 28. Experiment 1 discussion. Can DRL take inequality constraints into account? Yes. How should we set the hyperparameter values? Visualizing their effects quantitatively and qualitatively is helpful; we recommend determining the value of 𝑑v first, because the suitable value of β depends on it. If β is too small, the constraints are insufficiently considered; if β is too large, the reconstruction loss becomes too high and the parameters have poor diversity. [Figure: acceptable parameter area as a function of 𝑑v and β.]
  • 30. Experiment 2. Procedure: we used three types of powder (A, B, and C) not included in the dataset; a PCA visualization of the fixed parameters shows that powders A, B, and C are not outliers. Based on the results of Experiment 1, we set 𝑑v = 2 and β = 0.1, which leads to rich diversity and a low reconstruction error. Result: the proposed method reduces the number of trials compared to the manual-tuning baseline (from 20 to around 5). Features of the generated parameters: for powders B and C, the constraints were satisfied in all trials; for powder A, one trial produced unsuitable parameters, presumably because the method explored areas far from the origin of the latent space (consistent with the observation in Experiment 1).
  • 31. Limitations. The relationship between the hyperparameters and the number of required trials is still unclear, and the exploration area must be set manually: if the exploration area (a bounding box in the latent space) is too small, the optimal parameters cannot be reached; if it is too large, parameters that do not satisfy the constraints may be generated. Finding the best area size is future work.
  • 32. Conclusion. We proposed methods to handle two types of constraints in Bayesian optimization even after a nonlinear embedding: for known equality constraints, parameter decomposition is useful; for unknown inequality constraints, disentangled representation learning is useful. Experiment 1 showed the effect of the hyperparameters on taking the inequality constraints into account, and how visualization helps determine their values. Experiment 2 demonstrated that the proposed method reduces the number of trials by approximately 66% compared to manual tuning.
  • 33. Do you have any questions?