MTH702 Optimization
Nonlinear Optimization
Optimization
General optimization problem
$\min_{x \in \mathbb{R}^d} f(x)$
with $x \in X \subseteq \mathbb{R}^d$
candidate solutions, variables, parameters: $x \in \mathbb{R}^d$
objective function: $f : \mathbb{R}^d \to \mathbb{R}$
typical technical assumption: f is continuous and differentiable
Q: Is this problem easy? When is it easy?
Q: How do we find the best (optimal) solution?
Questions
$\min_{x \in \mathbb{R}^d} f(x)$
with $x \in X \subseteq \mathbb{R}^d$
Q: Why do we study optimization?
Q: What are you hoping to learn?
Why? And How?
Optimization is everywhere
machine learning, big data, statistics, data analysis of all kinds, finance, logistics, planning, control theory, mathematics, search engines, simulations, and many other applications ...
Mathematical Modeling:
defining & modeling the optimization problem
Computational Optimization:
running an (appropriate) optimization algorithm
Optimization for Machine Learning
Mathematical Modeling:
defining and measuring the machine learning model
Computational Optimization:
learning the model parameters
Theory vs. practice:
libraries are available, algorithms treated as “black box” by most practitioners
“... just use Adam ...”
Not here: we look inside the algorithms and try to understand why and how fast they work!
Optimization Algorithms
Optimization at large scale: simplicity rules!
In special cases (f is smooth, $X = \mathbb{R}^d$) we have “basic”/“simple” algorithms:
Gradient Descent
Stochastic Gradient Descent (SGD)
Coordinate Descent
History:
1847: Cauchy proposes gradient descent
1950s: Linear Programs, soon followed by non-linear, SGD
1980s: General optimization, convergence theory
2005-today: Large scale optimization, convergence of SGD
Example: Coordinate Descent
Goal: Find $x^\star \in \mathbb{R}^d$ minimizing f(x). (Example: d = 2)
[Figure: axes $x_1$ and $x_2$, with the minimizer $x^\star$ marked]
Idea: Update one coordinate at a time, while keeping others fixed.
Q: How to pick coordinate direction? How to find out how far to go? Does it always
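The coordinate descent idea can be sketched as follows. This is a minimal illustration, not the lecture's implementation. It assumes a convex quadratic f(x) = 0.5 x^T A x - b^T x with d = 2; the matrix A, the vector b, the cyclic coordinate order, and the function name are illustrative choices made here. For this particular f, the exact minimizer along one coordinate has a closed form, which settles "how far to go" in this special case only.

```python
def coordinate_descent(A, b, x0, sweeps=50):
    """Cyclic coordinate descent for f(x) = 0.5 * x^T A x - b^T x.

    Assumes A is symmetric positive definite, so every A[i][i] > 0.
    """
    x = list(x0)
    d = len(x)
    for _ in range(sweeps):
        for i in range(d):
            # Set df/dx_i = sum_j A[i][j] * x[j] - b[i] to zero and solve
            # for x[i], with all other coordinates held fixed.
            rest = sum(A[i][j] * x[j] for j in range(d) if j != i)
            x[i] = (b[i] - rest) / A[i][i]
    return x


# Hypothetical 2-D instance; its exact minimizer solves A x = b.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = coordinate_descent(A, b, x0=[0.0, 0.0])
```

For this instance the iterates approach the solution of A x = b, which is (0.2, 0.6). For a general f there is no closed-form step, and the choice of coordinate and step length remains exactly the open question raised on the slide.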
Oracle
Definitions
$\min_{x \in \mathbb{R}^d} f(x)$
with $x \in X \subseteq \mathbb{R}^d$
P is an optimization problem (from a class of problems, $P \in \mathcal{P}$)
Oracle O answers questions for some optimization method M
Q: What kind of questions would we need answered?
A: What is f(x) for a given x? Is $x \in X$? Can we compute $\nabla f(x)$, and what is it?
Performance
The performance of M on P is the total amount of computational effort required by method M to solve the problem P
Questions
Performance
The performance of M on P is the total amount of computational effort required by method M to solve the problem P
“... to solve the problem ...” Q: What does it mean?
Example: $\min_x \frac{1}{2} x^2$, and let M be such that, given x, it returns $x - \frac{x}{2}$. Q: Will we even solve the problem?
Approximate solution to P
in many areas of numerical analysis, it is impossible to find an exact solution
relaxed goal: find an approximate solution to P with some accuracy $\epsilon > 0$!
let T be some termination criterion
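A small sketch of the example above. The method returns x - x/2 = x/2, so starting from x0 = 1 the iterates are 1/2^k and never equal the exact minimizer 0, yet they reach any accuracy after finitely many steps. The termination criterion used here, f(x) <= eps, is an assumption for illustration and works only because the optimal value 0 is known; the function name and the use of exact fractions are choices made here.

```python
from fractions import Fraction


def iterations_to_accuracy(x0, eps):
    """Apply M(x) = x - x/2 until f(x) = x^2 / 2 is at most eps.

    Exact rational arithmetic is used so that rounding cannot make an
    iterate equal to zero by accident.
    """
    x = Fraction(x0)
    eps = Fraction(eps)
    k = 0
    while x * x / 2 > eps:
        x = x - x / 2
        k += 1
    return k, x


k, x_k = iterations_to_accuracy(1, Fraction(1, 10**6))
```

With eps = 10^-6 the loop stops after k = 10 steps at x_k = 1/1024, which is still nonzero: the problem is solved to accuracy eps but never exactly.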
Complexity of General Iterative Scheme [N+18]
Analytical complexity
number of calls of the oracle necessary to solve problem P to accuracy $\epsilon$
Arithmetical complexity
total number of arithmetic operations (including the work of the oracle and the work of the method) necessary for solving problem P up to accuracy $\epsilon$
Standard Oracles
Zero-order oracle
returns the function value f(x)
First-order oracle
returns the function value f(x) and the gradient $\nabla f(x)$
Second-order oracle
returns the function value f(x), the gradient $\nabla f(x)$, and the Hessian $\nabla^2 f(x)$
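One way to picture the three oracles, and to measure analytical complexity, is a wrapper that counts every query. This is a hypothetical sketch: the class name `CountingOracle`, its method names, and the one-dimensional test function are all assumptions made for illustration.

```python
class CountingOracle:
    """Wraps f, its gradient, and its Hessian, and counts every query."""

    def __init__(self, f, grad=None, hess=None):
        self._f, self._grad, self._hess = f, grad, hess
        self.calls = 0

    def zero_order(self, x):
        self.calls += 1
        return self._f(x)

    def first_order(self, x):
        self.calls += 1
        return self._f(x), self._grad(x)

    def second_order(self, x):
        self.calls += 1
        return self._f(x), self._grad(x), self._hess(x)


# Illustrative one-dimensional problem f(x) = x^2 / 2.
oracle = CountingOracle(
    f=lambda x: 0.5 * x * x,
    grad=lambda x: x,
    hess=lambda x: 1.0,
)
value = oracle.zero_order(3.0)
value_and_grad = oracle.first_order(3.0)
everything = oracle.second_order(3.0)
```

After these three queries `oracle.calls` is 3. The counter tracks analytical complexity only; the arithmetic work done inside each call is not measured.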
Complexity Bounds for Global Optimization
Assume a simple problem
$\min_{x \in B_d} f(x)$
where $B_d = \{x \in \mathbb{R}^d : 0 \le x_i \le 1 \ \forall i\}$
Q: Can we find an $\epsilon$-solution for a given $\epsilon > 0$? How many times do we need to call the zero-order oracle O?
We need some assumptions on f to derive complexity bounds, and we need an algorithm!
Lipschitz Continuity of f
$f : \mathbb{R}^d \to \mathbb{R}$ is Lipschitz continuous on $B_d$: $|f(x) - f(y)| \le L \|x - y\|_\infty \ \forall x, y \in B_d$
Q: How can it help us?
A: Assume we cover $B_d$ with a grid of points and let $\Delta$ be the grid spacing. If we return the “best” grid point, how small must $\Delta$ be to guarantee an $\epsilon$-solution?
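Uniform Grid Method [N+18]
note that Nesterov uses n for the dimension of the problem
any two neighboring points x, y in the grid have $\|x - y\|_\infty \le \frac{1}{p}$
for $x^*$, there is a grid point $\bar{x}$ such that $\|x^* - \bar{x}\|_\infty \le \frac{1}{2p}$ (the nearest grid point is at most half a grid spacing away)
$|f(\bar{x}) - f(x^*)| \le L \|\bar{x} - x^*\|_\infty \le \frac{L}{2p}$
Q: How many oracle calls does the method need?
Q: How to pick p to guarantee an $\epsilon$-solution?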
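A minimal sketch of the uniform grid method, assuming the grid points are the centers of p^d equal cells of the unit box. That is one placement consistent with the bounds on this slide; the function name and the test function are illustrative assumptions.

```python
from itertools import product


def uniform_grid_search(f, d, p):
    """Evaluate f at the p**d cell centers of the unit box and keep the best.

    Along each axis the centers are (2*i - 1) / (2*p) for i = 1, ..., p, so
    neighbors are 1/p apart and every point of the box lies within 1/(2*p)
    of some center in the infinity norm.
    """
    axis = [(2 * i - 1) / (2 * p) for i in range(1, p + 1)]
    best_x, best_val, calls = None, float("inf"), 0
    for x in product(axis, repeat=d):
        val = f(x)  # one zero-order oracle call
        calls += 1
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val, calls


# Hypothetical test function: Lipschitz with L = 1 in the infinity norm,
# minimized at (0.3, 0.7) with optimal value 0.
def f(x):
    return max(abs(x[0] - 0.3), abs(x[1] - 0.7))


x_best, f_best, calls = uniform_grid_search(f, d=2, p=10)
```

Here the method makes 10^2 = 100 oracle calls, and the returned value is within L/(2p) = 0.05 of the optimum, up to floating-point rounding. The returned point is at least as good as the grid point nearest to the minimizer, which is why the bound on that nearest point carries over to the output.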
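Final Complexity
to find an $\epsilon$-solution, we need $\frac{L}{2p} \le \epsilon \Rightarrow p = \left\lfloor \frac{L}{2\epsilon} \right\rfloor + 1$
Analytical Complexity
Q: How many calls of the zero-order oracle do we need?
A: We need $\left( \left\lfloor \frac{L}{2\epsilon} \right\rfloor + 1 \right)^d$ zero-order oracle calls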
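The two formulas above can be evaluated directly. The sketch below uses hypothetical function names, and its inputs are chosen to be exactly representable in floating point so that the floor is not affected by rounding.

```python
import math


def grid_resolution(L, eps):
    """p = floor(L / (2 * eps)) + 1, which guarantees L / (2 * p) < eps."""
    return math.floor(L / (2 * eps)) + 1


def zero_order_calls(L, eps, d):
    """Number of grid points, hence oracle calls, of the uniform grid method."""
    return grid_resolution(L, eps) ** d


p = grid_resolution(L=2.0, eps=0.25)
calls_2d = zero_order_calls(L=2.0, eps=0.25, d=2)
calls_10d = zero_order_calls(L=2.0, eps=0.25, d=10)
```

With L = 2 and eps = 0.25 this gives p = 5, so 25 calls for d = 2 but 5^10 = 9765625 calls for d = 10: the count grows exponentially in the dimension.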
Q: Is this also the worst-case behaviour (lower bound), or are we just using a “very naïve” algorithm?
Lower Bound and Computational Need for a Tiny Problem
Lower Bound
We can build an L-Lipschitz function that requires any method to explore $\left( \frac{L}{2\epsilon} \right)^d$ points before it can identify an $\epsilon$-solution.
Example
Assume L = 2, d = 10 and $\epsilon = 0.01$
if we change d to d + 1, then the estimate is multiplied by one hundred
if we multiply $\epsilon$ by two, we reduce the complexity by a factor of a thousand
if $\epsilon = 8\%$, then we need only two weeks
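The arithmetic behind this example can be checked with a short sketch. The slide does not state how oracle calls are converted into time, so the cost per call (d arithmetic operations) and the machine speed (10^6 operations per second) below are assumptions made here for illustration.

```python
def lower_bound_calls(L, eps, d):
    """Lower bound (L / (2 * eps)) ** d on zero-order oracle calls."""
    return (L / (2 * eps)) ** d


base = lower_bound_calls(L=2, eps=0.01, d=10)  # about 1e20 calls
one_more_dim = lower_bound_calls(L=2, eps=0.01, d=11)
double_eps = lower_bound_calls(L=2, eps=0.02, d=10)
coarse = lower_bound_calls(L=2, eps=0.08, d=10)

# Assumptions for converting calls to time (not stated on the slide):
# each oracle call costs d = 10 arithmetic operations and the machine
# performs 1e6 arithmetic operations per second.
OPS_PER_SECOND = 1e6
seconds = coarse * 10 / OPS_PER_SECOND
days = seconds / 86_400
```

The ratio `one_more_dim / base` is 100 and `base / double_eps` is 2^10 = 1024, matching the "one hundred" and "a thousand" factors above. Under the stated assumptions, the 8% case comes out at roughly 11 days, the same order of magnitude as the two weeks quoted on the slide.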
Conclusion
the simple example above shows that optimization is hard!
Q: What can save us?
we can assume some special properties of the problems
use a different oracle (e.g., use gradients)
Bibliography
[N+18] Yurii Nesterov et al. Lectures on Convex Optimization, volume 137. Springer, 2018.
Thanks also to Prof. Martin Jaggi and Prof. Mark Schmidt for their slides and lectures and [N+18].
mbzuai.ac.ae
Mohamed bin Zayed
University of Artificial Intelligence
Masdar City
Abu Dhabi
United Arab Emirates
