The slideshow contains a brief explanation of Gaussian process regression and Bayesian optimization. For two optimization problems, benchmarks against other local gradient-based and global heuristic optimization methods are included. They show that Bayesian optimization can identify better designs in exceptionally short computation times.
A machine learning method for efficient design optimization in nano-optics
1. A machine learning method for efficient design optimization in nano-optics
2. Computational challenges in nano-optics
The optical behavior of small structures (e.g. scattering into a certain direction) is dominated by diffraction, interference and resonance phenomena.
• A full solution of Maxwell's equations is required.
• The behavior is only known implicitly (black-box function).
• Computing the solution is time consuming (expensive black-box function).
3. Analysis of expensive black-box functions
Typical questions:
• Regression: What is the response f(x) for unknown parameter values x?
• Optimization: What are the best parameter values that lead to a measured/desired response?
• Integration: What is the average response?
[Figure: a black-box function maps input parameters (k, ω, p1, p2, …) to a system response whose computation requires the solution of Maxwell's equations. Application examples: isolated scatterers, metamaterials, geometry reconstruction.]
4. Regression models
• Regression models are important tools to interpolate between known data points.
• Further, they can be used for model-based optimization and numerical integration (quadrature).
5. Regression models (small selection)
In order of increasing predictive power and computational demands:
• K-nearest neighbors
• Linear regression
• Support vector machine
• Random forest trees
• Gaussian process regression (Kriging)
• (Deep) neural networks
Gaussian process regression:
+ Accurate and data efficient
+ Reliable (provides uncertainties)
+ Interpretable results
− Computationally demanding, but not as much as training neural networks
[C. E. Rasmussen, "Gaussian Processes in Machine Learning". Advanced Lectures on Machine Learning, Springer (2004)]
[B. Shahriari et al., "Taking the Human Out of the Loop: A Review of Bayesian Optimization". Proc. IEEE 104(1), 148 (2016)]
9. Gaussian process regression
In the following we do not need correlated random vectors of function values, but just the probability distribution of a single function value $y$ at some $x_* \in \mathcal{X}$.
This is simply a normal distribution $y \sim \mathcal{N}(\bar{y}, \sigma^2)$ with mean and variance
$$\bar{y} = m(x_*) + \sum_{i,j} k(x_*, x_i)\,(K^{-1})_{ij}\,[f(x_j) - m(x_j)],$$
$$\sigma^2 = k(x_*, x_*) - \sum_{i,j} k(x_*, x_i)\,(K^{-1})_{ij}\,k(x_j, x_*),$$
where $K_{ij} = k(x_i, x_j)$ is the covariance matrix of the training points.
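A minimal numpy sketch of these posterior formulas, assuming a zero prior mean m(x) = 0 and a squared-exponential kernel (the slides do not fix a particular kernel); the helper names are illustrative and not part of any JCMsuite API:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance k(x, x') between two point sets."""
    d = x1[:, None, :] - x2[None, :, :]
    return variance * np.exp(-0.5 * np.sum(d**2, axis=-1) / length_scale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-10):
    """Posterior mean and variance at X_test for a zero-mean GP."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train)              # k(x_*, x_i)
    K_inv = np.linalg.inv(K)
    mean = K_star @ K_inv @ y_train                   # prior mean m(x) = 0
    var = rbf_kernel(X_test, X_test).diagonal() - np.einsum(
        "ij,jk,ik->i", K_star, K_inv, K_star)
    return mean, var

# Example: condition on three observations and predict at x_* = 1.5
X = np.array([[0.0], [1.0], [2.5]])
y = np.array([1.0, 0.2, -0.5])
print(gp_posterior(X, y, np.array([[1.5]])))
```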
13. Bayesian optimization
Problem: Find parameters $x \in \mathcal{X}$ that minimize $f(x)$. For the currently known smallest function value $y_{\min}$ we define the improvement
$$I(y) = \begin{cases} 0 & : \; y \ge y_{\min} \\ y_{\min} - y & : \; y < y_{\min} \end{cases}$$
We sample at the point of largest expected improvement $\alpha_{\mathrm{EI}}(x) = \mathbb{E}[I(y)]$, an analytic function derived from the normal distribution of $y$.
As more and more data points are collected in a local minimum, $\alpha_{\mathrm{EI}}(x) \to 0$ there. Hence, we do not get trapped in local minima, but eventually jump out of them.
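The expected improvement has a closed form that follows directly from the Gaussian posterior above; a short sketch (the helper name and argument convention are illustrative):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, y_min):
    """E[I(y)] for y ~ N(mean, std^2) and current best value y_min (minimization)."""
    std = np.maximum(std, 1e-12)              # guard against zero predictive std
    z = (y_min - mean) / std
    return (y_min - mean) * norm.cdf(z) + std * norm.pdf(z)

# Example: posterior prediction with mean 0.4 and std 0.3, current best 0.5
print(expected_improvement(np.array([0.4]), np.array([0.3]), 0.5))
```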
21. Utilizing derivatives
The JCMsuite FEM solver can also compute derivatives w.r.t. geometric parameters, material parameters and others. We can use derivatives to train the GP because differentiation is a linear operator:
• What is the mean function of the GP for derivative observations?
$$m_D(x) \equiv \mathbb{E}[\nabla f(x)] = \nabla \mathbb{E}[f(x)] = \nabla m(x) = 0$$
• What is the kernel function between an observation at $x$ and a derivative observation at $x'$?
$$k_D(x, x') \equiv \mathrm{cov}\big(f(x), \nabla f(x')\big) = \mathbb{E}\big[(f(x) - m(x))(\nabla f(x') - m_D(x'))\big] = \nabla_{x'} k(x, x')$$
• Analogously, the kernel function between a derivative observation at $x$ and a derivative observation at $x'$ is given as
$$k_{DD}(x, x') \equiv \mathrm{cov}\big(\nabla f(x), \nabla f(x')\big) = \nabla_x \nabla_{x'} k(x, x')$$
⇒ We can build a large GP (i.e. a large mean vector and covariance matrix) containing observations of the objective function and its derivatives.
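For a concrete kernel, these derivative blocks are available in closed form. Below is a sketch for a one-dimensional squared-exponential kernel (an assumed kernel choice, not necessarily the one used in JCMsuite); `joint_covariance` assembles the enlarged covariance matrix mentioned above:

```python
import numpy as np

def k(x, xp, ell=1.0, s2=1.0):
    """1D squared-exponential kernel k(x, x')."""
    return s2 * np.exp(-0.5 * (x - xp) ** 2 / ell**2)

def k_D(x, xp, ell=1.0, s2=1.0):
    """k_D(x, x') = d/dx' k(x, x'): covariance of f(x) with f'(x')."""
    return k(x, xp, ell, s2) * (x - xp) / ell**2

def k_DD(x, xp, ell=1.0, s2=1.0):
    """k_DD(x, x') = d^2/(dx dx') k(x, x'): covariance of f'(x) with f'(x')."""
    r = x - xp
    return k(x, xp, ell, s2) * (1.0 / ell**2 - r**2 / ell**4)

def joint_covariance(X, ell=1.0, s2=1.0):
    """Covariance matrix of the joint vector [f(x_1..n), f'(x_1..n)]."""
    xi, xj = np.meshgrid(X, X, indexing="ij")
    K, KD, KDD = k(xi, xj, ell, s2), k_D(xi, xj, ell, s2), k_DD(xi, xj, ell, s2)
    return np.block([[K, KD], [KD.T, KDD]])

print(joint_covariance(np.array([0.0, 0.5, 1.0])).shape)  # (6, 6)
```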
38. Making Bayesian optimization time efficient
Solving $\arg\max_x \alpha_{\mathrm{EI}}(x)$ can be very time consuming. Bayesian optimization runs inefficiently if the sample computation takes longer than the objective function calculation (simulation).
⇒ We use differential evolution to maximize $\alpha_{\mathrm{EI}}(x)$ and adapt the effort (i.e. the population size and number of generations) to the simulation time.
⇒ We calculate one sample in advance while the objective function is evaluated.
See Schneider et al., arXiv:1809.06674 (2019) for details.
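A minimal sketch of the first point using SciPy's differential evolution; the slides describe the strategy but not the implementation, so the function name and the default effort settings here are illustrative only:

```python
import numpy as np
from scipy.optimize import differential_evolution

def propose_next_sample(acquisition, bounds, popsize=15, maxiter=50):
    """Maximize alpha_EI(x) over box bounds; popsize and maxiter control
    the effort and could be adapted to the simulation time."""
    result = differential_evolution(
        lambda x: -acquisition(x),   # differential_evolution minimizes, so flip the sign
        bounds=bounds,
        popsize=popsize,
        maxiter=maxiter,
    )
    return result.x

# Toy acquisition function in place of the GP-based expected improvement
x_next = propose_next_sample(lambda x: -np.sum((x - 0.3) ** 2), bounds=[(-1, 1)] * 2)
print(x_next)
```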
40. Rastrigin function
• Defined on an $n$-dimensional domain as $f(\mathbf{x}) = A\,n + \sum_{i=1}^{n} \big[x_i^2 - A \cos(2\pi x_i)\big]$ with $A = 10$. We use $n = 3$ and $x_i \in [-2.5, 2.5]$.
• Sleeping for 10 s during each evaluation to make the function call "expensive".
• Parallel minimization with 5 parallel evaluations of $f(\mathbf{x})$.
Global minimum $f_{\min} = 0$ at $\mathbf{x} = 0$.
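A sketch of this benchmark objective; the artificial delay mimics an expensive simulation as described above (the parallel-evaluation machinery is not shown):

```python
import time
import numpy as np

A = 10.0
BOUNDS = [(-2.5, 2.5)] * 3   # search domain used in the benchmark

def rastrigin(x):
    """3D Rastrigin function, made artificially 'expensive' by sleeping 10 s."""
    x = np.asarray(x, dtype=float)
    time.sleep(10)
    return A * x.size + float(np.sum(x**2 - A * np.cos(2 * np.pi * x)))

# Global minimum: rastrigin(np.zeros(3)) == 0.0
```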
41. Choice of optimization algorithms
We compare the performance of Bayesian optimization (BO) with:
• Local optimization: gradient-based low-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS-B), started in parallel from 10 different locations
• Global heuristic optimization: differential evolution (DE), particle swarm optimization (PSO), covariance matrix adaptation evolution strategy (CMA-ES)
All optimization methods are run with standard parameters (see the sketch below).
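A minimal sketch of how two of these baselines could be run with SciPy defaults (the slides do not specify the implementation; PSO and CMA-ES would require additional packages, and the 10-second sleep is omitted for brevity):

```python
import numpy as np
from scipy.optimize import minimize, differential_evolution

def rastrigin(x):
    """Rastrigin objective from slide 40 (A = 10, n = 3)."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + float(np.sum(x**2 - 10.0 * np.cos(2 * np.pi * x)))

bounds = [(-2.5, 2.5)] * 3
rng = np.random.default_rng(0)

# Multi-start L-BFGS-B from 10 random starting points
starts = rng.uniform(-2.5, 2.5, size=(10, 3))
local = [minimize(rastrigin, x0, method="L-BFGS-B", bounds=bounds) for x0 in starts]
best_local = min(local, key=lambda r: r.fun)

# Differential evolution with standard parameters
de = differential_evolution(rastrigin, bounds, seed=0)
print(best_local.fun, de.fun)
```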
42. Benchmark on Rastrigin function
[Laptop with 2-core Intel Core i7 @ 2.7 GHz]
• BO converges significantly faster than the other methods.
• Although more elaborate, BO has no significant computation time overhead (total overhead approx. 3 min).
43. Benchmark on Rastrigin function with derivatives
[Laptop with 2-core Intel Core i7 @ 2.7 GHz]
• Derivative information speeds up the minimization.
• BO with and without derivatives finds lower function values than multi-start L-BFGS-B with derivatives.
44. Benchmark against open-source BO (scikit-optimize)
[Laptop with 2-core Intel Core i7 @ 2.7 GHz]
Comparison against the Bayesian optimization of scikit-optimize (https://scikit-optimize.github.io/stable/) shows that the implemented sample computation methods lead to better samples in drastically reduced computation time.
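For reference, a compact sketch of how the scikit-optimize baseline could be run on the Rastrigin problem; the evaluation budget `n_calls=60` is an assumption, not taken from the slides:

```python
import numpy as np
from skopt import gp_minimize

# Rastrigin objective from slide 40 (A = 10, n = 3), without the artificial sleep
f = lambda x: float(10 * len(x) + sum(xi**2 - 10 * np.cos(2 * np.pi * xi) for xi in x))

res = gp_minimize(f, dimensions=[(-2.5, 2.5)] * 3, n_calls=60, random_state=0)
print(res.x, res.fun)
```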
45. More benchmarks…
More benchmarks for realistic photonic optimization problems can be found in the publication ACS Photonics 6, 2726 (2019), https://arxiv.org/abs/1809.06674:
• Single-photon source
• Metasurface
• Parameter reconstruction
46. Conclusion
• Bayesian optimization is a highly efficient method for shape optimization.
• It can incorporate derivative information if available.
• It can be used for very expensive simulations but also for fast/parallelized simulations (e.g. one simulation result every two seconds).
47. Acknowledgements
We are grateful to the following institutions for funding this research:
• European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 675745 (MSCA-ITN-EID NOLOSS)
• EMPIR programme co-financed by the Participating States and by the European Union's Horizon 2020 research and innovation programme under grant agreement number 17FUN01 (Be-COMe)
• Virtual Materials Design (VIRTMAT) project by the Helmholtz Association via the Helmholtz program Science and Technology of Nanosystems (STN)
• Central Innovation Programme for SMEs of the German Federal Ministry for Economic Affairs and Energy on the basis of a decision by the German Bundestag (ZF4450901)
48. Resources
• Description of the FEM software JCMsuite
• Getting started with JCMsuite
• Tutorial on optimization with JCMsuite using Matlab®/Python
• Free trial download of JCMsuite