Quasi Newton Method

Kyeongmin Woo
Quasi Newton Method

1
Iterative Methods for Continuous Unconstrained Optimization

Continuous Unconstrained Optimization
The goal is to find the optimal point (solution) x* that minimizes f(x).
f(x) is a continuous objective function → Continuous
There are no constraint equations → Unconstrained

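Iterative Methods
An Iterative Method finds the solution by repeatedly updating x, as in the following formulas.
Gradient Descent: $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$
Newton's Method: $x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$
Quasi Newton Method: $x_{k+1} = x_k - \alpha_k H_k \nabla f(x_k)$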

Iterative Methods
Gradient Descent, Newton's Method, and the Quasi Newton Method differ in their Update Direction.

Gradient Descent
$x_{k+1} = x_k - \alpha_k \nabla f(x_k)$
The update moves from the current point x in the negative gradient direction.
Only the gradient is needed, so it is a First-Order Method.
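A minimal sketch of this first-order update, assuming a fixed step size and a small quadratic test objective. The names `gradient_descent`, `Q`, and `b` are illustrative and do not come from the slides.

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Minimize f by repeatedly stepping along the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # gradient ~ 0: stationary point reached
            break
        x = x - alpha * g             # first-order update, no Hessian needed
    return x

# Assumed test objective: f(x) = 0.5 x^T Q x - b^T x, whose gradient is Qx - b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_gd = gradient_descent(lambda x: Q @ x - b, x0=[0.0, 0.0])
```

For this objective the minimizer satisfies Qx = b, so `Q @ x_gd` should be close to `b`.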

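Gradient Descent
The Update Direction is orthogonal to the contour of the objective function f at the current point x.
Because the update moves in the direction of steepest slope at the current point, it is also called the Steepest Descent Method.
Figure source: Numerical Optimization, 2nd ed., p.21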

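Gradient Descent
The steepest direction at the current point is not necessarily the direction that approaches the optimum fastest.
Figure source: Numerical Optimization, 2nd ed., p.21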

Gradient Descent
Under the usual smoothness assumptions, convergence toward the optimum is guaranteed if
1. the Update Direction is a Descent Direction, and
2. the step size alpha is sufficiently small.

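Newton's Method
$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$
Unlike Gradient Descent, it requires the Hessian.
It is a Second-Order Method.
$p_k = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$ is called the Newton Direction.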

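Newton's Method
The Newton Direction is obtained from the second-order Taylor approximation.
$f(x_k + p) \approx f(x_k) + \nabla f(x_k)^T p + \frac{1}{2} p^T \nabla^2 f(x_k) p$
To find the vector p that minimizes the left-hand side, note that the right-hand side is quadratic in p, so it suffices to find the point where its derivative with respect to p is zero.
$\nabla f(x_k) + \nabla^2 f(x_k) p = 0 \;\Rightarrow\; p = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$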
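The step obtained by setting the derivative of the quadratic model to zero can be sketched as below. This is an illustrative example only: `newton_step` and the quadratic test problem (`Q`, `b`) are assumptions, and the linear system is solved instead of forming the inverse explicitly.

```python
import numpy as np

def newton_step(grad, hess, x):
    """Return x + p, where p solves hess(x) p = -grad(x)."""
    p = np.linalg.solve(hess(x), -grad(x))   # Newton direction
    return x + p

# Assumed test objective: f(x) = 0.5 x^T Q x - b^T x (gradient Qx - b, Hessian Q).
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_newton = newton_step(lambda x: Q @ x - b, lambda x: Q, np.array([5.0, -3.0]))
```

Because the second-order Taylor model is exact for a quadratic objective, a single step from any starting point lands on the minimizer, where Qx = b.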

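Newton's Method
Advantages of Newton's Method
The pure Newton step uses a step size of 1, so no step size has to be chosen.
Quadratic Convergence Rate (near the solution)
It converges faster than Gradient Descent.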

Newton's Method
Disadvantages of Newton's Method
The Hessian is too expensive.
Time complexity: O(n³)
Space complexity: O(n²)
It is hard to apply to Non-Convex problems.
A descent direction is guaranteed only when the Hessian is Positive Definite.

Newton's Method
Disadvantages of Newton's Method
Can this be done more efficiently?
→ What if the Hessian is replaced by an approximation?
→ Quasi Newton Method

Quasi Newton Method
Goal of Quasi Newton Method
Can the Hessian Inverse be approximated with little computation?
Key Idea of Quasi Newton Method
Could this be done by updating the approximation recursively while preserving the properties of the Hessian as much as possible?

Quasi Newton Method
Update formula of the Quasi Newton Method
$x_{k+1} = x_k - \alpha_k H_k \nabla f(x_k)$
Each update needs only the Gradient.
Here H is a matrix that approximates the Hessian Inverse.
H is also updated iteratively.

Quasi Newton Method
Algorithm
x and H are updated iteratively.
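The iterative structure can be sketched roughly as follows. Everything here is an assumption made for illustration: the function name `quasi_newton`, the pluggable `update_H(H, dx, dg)` callback, the identity starting matrix, and the backtracking (Armijo) step-size rule are not taken from the slides.

```python
import numpy as np

def quasi_newton(f, grad, x0, update_H, tol=1e-6, max_iter=500):
    """Generic quasi-Newton loop; update_H(H, dx, dg) returns the next H."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                     # symmetric positive definite start
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                         # search direction from the current H
        alpha = 1.0
        # Backtracking (Armijo) line search: an assumed, simple step-size rule.
        while alpha > 1e-12 and f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        H = update_H(H, x_new - x, g_new - g)   # only gradients are needed
        x, g = x_new, g_new
    return x

# Assumed test objective: f(x) = 0.5 x^T Q x - b^T x.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b
# Placeholder update that keeps H fixed, which reduces the loop to gradient descent.
x_qn = quasi_newton(f, grad, [0.0, 0.0], update_H=lambda H, dx, dg: H)
```

Passing a different `update_H` changes only how H evolves; the rest of the loop stays the same.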

How to approximate Hessian
H: a matrix that approximates the Hessian Inverse
H must have properties similar to those of the Hessian Inverse.
Properties of the Hessian Inverse
1. The Hessian Inverse is a Symmetric Matrix.
2. The Hessian is the Derivative of the Gradient.
3. The Hessian must be Positive Definite for a Descent Direction to be guaranteed.

Properties of the Hessian Inverse
The Hessian is a Symmetric Matrix (for twice continuously differentiable f), and the Inverse of a Symmetric Matrix is also Symmetric.
The Inverse of a Positive Definite Matrix is also Positive Definite.

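Taylor Series and Hessian
A first-order Taylor expansion of the gradient gives
$\nabla f(x_{k+1}) \approx \nabla f(x_k) + \nabla^2 f(x_k)(x_{k+1} - x_k)$
Rearranging, with $\Delta x_k = x_{k+1} - x_k$ and $\Delta g_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, yields the Secant Equation:
$\nabla^2 f(x_k) \, \Delta x_k \approx \Delta g_k$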

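Secant Equation
The equation
$B_{k+1} \Delta x_k = \Delta g_k$
where $B_{k+1}$ approximates the Hessian, $\Delta x_k$ is the change in x, and $\Delta g_k$ is the change in the gradient, can also be written in terms of the Hessian Inverse approximation $H_{k+1}$:
$H_{k+1} \Delta g_k = \Delta x_k$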
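As a small numerical illustration, both forms of the secant equation hold exactly when f is quadratic, since its Hessian is constant. The objective (`Q`, `b`) and the two points below are arbitrary assumptions chosen for the check.

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # Hessian of the assumed quadratic
b = np.array([1.0, 1.0])
grad = lambda x: Q @ x - b               # gradient of 0.5 x^T Q x - b^T x

x_old, x_new = np.array([0.0, 0.0]), np.array([0.4, -0.1])
dx = x_new - x_old                       # change in x
dg = grad(x_new) - grad(x_old)           # change in the gradient

secant_hessian = np.allclose(Q @ dx, dg)                  # B dx = dg with B = Q
secant_inverse = np.allclose(np.linalg.solve(Q, dg), dx)  # H dg = dx with H = Q^{-1}
```

For a non-quadratic objective the relation holds only approximately, which is why it is imposed as a condition on the approximation.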

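3. The Hessian must be Positive Definite for a unique solution to exist.
Modified Newton Method
For an update along a direction d to be a descent direction, d must satisfy
$\nabla f(x_k)^T d < 0$
The search direction of Newton's Method is
$d_k = -\nabla^2 f(x_k)^{-1} \nabla f(x_k)$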

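Convergence proof of the Modified Newton Method
Let $\phi(\alpha) = f(x_k + \alpha d_k)$. Then
$\phi'(0) = \nabla f(x_k)^T d_k = -\nabla f(x_k)^T \nabla^2 f(x_k)^{-1} \nabla f(x_k) < 0$
$\phi'(0) < 0$ means that the value of f decreases as alpha increases from 0.
This applies only when the Hessian Inverse is Positive Definite.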

Properties of the Hessian Inverse → what to consider when approximating it
1. Symmetric
2. Secant Equation
3. Positive Definite

3
Algorithms for Quasi Newton Method

Algorithms for Quasi Newton Method
Single-Rank Symmetric (SRS) Algorithm
Davidon–Fletcher–Powell (DFP) Algorithm
Broyden, Fletcher, Goldfarb, and Shanno (BFGS) Algorithm

Quasi Newton Method
Algorithm
x and H are updated iteratively.
The algorithms differ in how H is updated.

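Notations
Formulas tied to the three conditions on the Hessian Inverse appear repeatedly.
$g_k = \nabla f(x_k)$, $\Delta x_k = x_{k+1} - x_k$, $\Delta g_k = g_{k+1} - g_k$
Rewrite Secant Equation
$H_{k+1} \Delta g_k = \Delta x_k$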

Single-Rank Symmetric (SRS) algorithm
An algorithm that updates H by adding a Symmetric Matrix.
$H_{k+1} = H_k + \alpha_k z_k z_k^T$
It uses the rank one correction formula.
Because a Symmetric Matrix is added, symmetry is preserved.

Single-Rank Symmetric (SRS) algorithm
$z z^T$: the outer product of the vector z with itself
* The outer product of a vector with itself is
1. a Symmetric Matrix
2. a Rank 1 Matrix
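Both properties are easy to confirm numerically. The vector `z` below is an arbitrary example chosen for illustration.

```python
import numpy as np

z = np.array([1.0, -2.0, 3.0])
Z = np.outer(z, z)                  # z z^T, a 3x3 matrix

is_symmetric = np.allclose(Z, Z.T)  # (z z^T)^T = z z^T
rank = np.linalg.matrix_rank(Z)     # every column is a multiple of z
```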

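Single-Rank Symmetric (SRS) algorithm
Symmetry is preserved, so to also satisfy the second property, the secant equation, express alpha and z in terms of the known quantities H, g, x. (Here alpha is the scalar coefficient of the correction term, not the step size.)
$H_{k+1} = H_k + \alpha_k z_k z_k^T$
Multiply both sides by $\Delta g_k$:
$H_{k+1} \Delta g_k = H_k \Delta g_k + \alpha_k z_k (z_k^T \Delta g_k)$
Apply the secant equation $H_{k+1} \Delta g_k = \Delta x_k$:
$\Delta x_k = H_k \Delta g_k + \alpha_k z_k (z_k^T \Delta g_k)$
$z_k^T \Delta g_k$: scalar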

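Single-Rank Symmetric (SRS) algorithm
Express every term with H, g, x!
$z_k = \frac{\Delta x_k - H_k \Delta g_k}{\alpha_k (z_k^T \Delta g_k)}$, $\quad \alpha_k (z_k^T \Delta g_k)^2 = \Delta g_k^T (\Delta x_k - H_k \Delta g_k)$
$H_{k+1} = H_k + \frac{(\Delta x_k - H_k \Delta g_k)(\Delta x_k - H_k \Delta g_k)^T}{\Delta g_k^T (\Delta x_k - H_k \Delta g_k)}$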
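A minimal sketch of the resulting rank one update, written with only H, the change in x, and the change in the gradient. The function name `srs_update` and the test data are assumptions for illustration, and the sketch has no safeguard against a denominator close to zero.

```python
import numpy as np

def srs_update(H, dx, dg):
    """Rank-one (SRS) update of the inverse-Hessian approximation H."""
    r = dx - H @ dg                 # dx - H dg
    denom = dg @ r                  # dg^T (dx - H dg), a scalar
    return H + np.outer(r, r) / denom

# Assumed data: a quadratic with Hessian Q, so that dg = Q dx exactly.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
H0 = np.eye(2)
dx = np.array([0.4, -0.1])
dg = Q @ dx
H1 = srs_update(H0, dx, dg)
```

The updated matrix stays symmetric and satisfies the secant equation, that is, `H1 @ dg` reproduces `dx`.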

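Single-Rank Symmetric (SRS) algorithm
Limitations of SRS
1. H may fail to be Positive Definite, so a Descent Direction is not always guaranteed.
2. Close to the solution, the denominator $\Delta g_k^T (\Delta x_k - H_k \Delta g_k)$ approaches 0, which can make the computation difficult.
Is there a way to keep H Positive Definite? → DFP algorithm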

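Davidon–Fletcher–Powell (DFP) Algorithm
An algorithm that preserves positive definiteness.
It adds a rank two Symmetric Matrix.
$H_{k+1} = H_k + \frac{\Delta x_k \Delta x_k^T}{\Delta x_k^T \Delta g_k} - \frac{(H_k \Delta g_k)(H_k \Delta g_k)^T}{\Delta g_k^T H_k \Delta g_k}$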

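Davidon–Fletcher–Powell (DFP) Algorithm
In DFP, if $H_k$ is Positive Definite, then $H_{k+1}$ is also Positive Definite.
Express it as a Quadratic Form: for any $x \neq 0$,
$x^T H_{k+1} x = x^T H_k x + \frac{(x^T \Delta x_k)^2}{\Delta x_k^T \Delta g_k} - \frac{(x^T H_k \Delta g_k)^2}{\Delta g_k^T H_k \Delta g_k}$
Notations: $a = H_k^{1/2} x$, $b = H_k^{1/2} \Delta g_k$, so that
$x^T H_{k+1} x = \frac{\|a\|^2 \|b\|^2 - (a^T b)^2}{\|b\|^2} + \frac{(x^T \Delta x_k)^2}{\Delta x_k^T \Delta g_k}$
Cauchy Schwarz Inequality: $\|a\|^2 \|b\|^2 \geq (a^T b)^2$, so the first term is nonnegative.
H is Positive Definite and alpha is positive: with $\Delta x_k = -\alpha_k H_k g_k$ and an exact line search ($\Delta x_k^T g_{k+1} = 0$), $\Delta x_k^T \Delta g_k = \alpha_k g_k^T H_k g_k > 0$, so the second term is nonnegative.
Both terms vanish only if $a = \beta b$ (that is, $x = \beta \Delta g_k$) and $x^T \Delta x_k = 0$, which forces $\beta = 0$ and $x = 0$.
Next H is also Positive Definite!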
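The rank two DFP update and its positive definiteness can be checked on a small example. This is only a sketch: `dfp_update` and the test data are assumptions, and the data are chosen so that the inner product of the change in x and the change in the gradient is positive.

```python
import numpy as np

def dfp_update(H, dx, dg):
    """Rank-two (DFP) update of the inverse-Hessian approximation H."""
    Hdg = H @ dg
    return H + np.outer(dx, dx) / (dx @ dg) - np.outer(Hdg, Hdg) / (dg @ Hdg)

# Assumed data: a quadratic with positive definite Hessian Q, so dx^T dg = dx^T Q dx > 0.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
H0 = np.eye(2)
dx = np.array([0.4, -0.1])
dg = Q @ dx
H1 = dfp_update(H0, dx, dg)
eigenvalues = np.linalg.eigvalsh(H1)      # all positive when H1 is positive definite
```

The updated matrix also satisfies the secant equation, so `H1 @ dg` reproduces `dx`.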

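Broyden, Fletcher, Goldfarb, and Shanno (BFGS) Algorithm
Update formula of DFP
$H_{k+1}^{DFP} = H_k + \frac{\Delta x_k \Delta x_k^T}{\Delta x_k^T \Delta g_k} - \frac{(H_k \Delta g_k)(H_k \Delta g_k)^T}{\Delta g_k^T H_k \Delta g_k}$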

BFGS Algorithm
Instead of approximating the Hessian Inverse directly, what if we approximate the Hessian?
The approximation $B_{k+1}$ only has to satisfy the Secant Equation
$B_{k+1} \Delta x_k = \Delta g_k$

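BFGS Algorithm
Update formula of DFP (Update Approximation of Hessian Inverse Directly)
$H_{k+1}^{DFP} = H_k + \frac{\Delta x_k \Delta x_k^T}{\Delta x_k^T \Delta g_k} - \frac{(H_k \Delta g_k)(H_k \Delta g_k)^T}{\Delta g_k^T H_k \Delta g_k}$
Update formula of BFGS (Update Approximation of Hessian)
$B_{k+1} = B_k + \frac{\Delta g_k \Delta g_k^T}{\Delta g_k^T \Delta x_k} - \frac{(B_k \Delta x_k)(B_k \Delta x_k)^T}{\Delta x_k^T B_k \Delta x_k}$
Interchanging the roles: $H \leftrightarrow B$, $\Delta x \leftrightarrow \Delta g$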

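BFGS Algorithm
Update formula of BFGS (Update Approximation of Hessian Inverse)
$H_{k+1}^{BFGS} = (B_{k+1})^{-1}$
If the Hessian is approximated and an inverse must then be taken on top of that, is there any point in doing it this way?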

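BFGS Algorithm
Sherman-Morrison formula: uses the identity for the inverse of a matrix plus an outer product
$(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}$
Update formula of BFGS
$H_{k+1}^{BFGS} = H_k + \left(1 + \frac{\Delta g_k^T H_k \Delta g_k}{\Delta g_k^T \Delta x_k}\right) \frac{\Delta x_k \Delta x_k^T}{\Delta x_k^T \Delta g_k} - \frac{H_k \Delta g_k \Delta x_k^T + (H_k \Delta g_k \Delta x_k^T)^T}{\Delta g_k^T \Delta x_k}$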
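The following sketch compares the two BFGS forms numerically: the update of the Hessian approximation B and the direct update of its inverse H. The function names `bfgs_update` and `bfgs_hessian_update` and the test data are assumptions made for illustration.

```python
import numpy as np

def bfgs_update(H, dx, dg):
    """BFGS update of the inverse-Hessian approximation H (assumes symmetric H)."""
    rho = 1.0 / (dg @ dx)
    Hdg = H @ dg
    return (H
            + (1.0 + rho * (dg @ Hdg)) * rho * np.outer(dx, dx)
            - rho * (np.outer(Hdg, dx) + np.outer(dx, Hdg)))

def bfgs_hessian_update(B, dx, dg):
    """BFGS update of the Hessian approximation B (DFP with the roles swapped)."""
    Bdx = B @ dx
    return B + np.outer(dg, dg) / (dg @ dx) - np.outer(Bdx, Bdx) / (dx @ Bdx)

# Assumed data: a quadratic with Hessian Q, so that dg = Q dx exactly.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
H0 = np.eye(2)
dx = np.array([0.4, -0.1])
dg = Q @ dx
H1 = bfgs_update(H0, dx, dg)
B1 = bfgs_hessian_update(np.linalg.inv(H0), dx, dg)
```

If the two forms agree, `H1 @ B1` is the identity matrix, so the inverse never has to be computed explicitly inside the iteration.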

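BFGS Algorithm
Advantages
As a Quasi Newton Method, it has the Conjugate Direction Property (on quadratic problems with exact line search).
It preserves Positive Definiteness.
It is robust to inexact line searches, so less effort has to be spent on the Line Search.
It is often more efficient than the DFP Algorithm.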

References
Jorge Nocedal, Stephen J. Wright, 2006, Numerical Optimization, 2nd ed.
Edwin K. P. Chong, Stanislaw H. Żak, 2013, An Introduction to Optimization
박진우 et al., 2021, 모두를 위한 컨벡스 최적화, https://convex-optimization-for-all.github.io/
