LINEAR MODEL
2016/06/08
Yagi Takayuki
REFERENCE
Pattern Recognition and Machine Learning (PRML)
Chapters 3 and 4
TABLE OF CONTENTS
1. linear regression
1.1. what is regression
1.2. linear regression
1.3. ridge regression
1.4. lasso regression
1.5. generalization
1.6. maximum likelihood estimation
1.7. MAP estimation
2. linear classification
2.1. multi-class classification
2.2. disadvantages of the least-squares method
WHAT IS REGRESSION
given some data points,
we want to find the line that best fits them
WHAT IS REGRESSION
we want to find such a line in any situation
-> regression analysis
y = 0.6147 + 1.0562x₁ is the best fit
NOTATION
x, w, t : scalars
x, w, t : vectors (bold)
X, W, T : matrices
LINEAR MODEL
the simplest model is
y(x, w) = w₀ + w₁x₁ + ⋯ + w_D x_D
wᵢ : weight parameters
xᵢ : variables
D : the number of variables
FEATURE
linear with respect to w
linear with respect to x
model is too simple (poor expressive power)
EXTEND THE MODEL
add a linear combination of non-linear functions
y(x, w) = w₀ + Σ_{j=1}^{M−1} wⱼ ϕⱼ(x)
M − 1 is the number of basis functions
ϕⱼ(x) is called a basis function
w₀ is called the bias parameter
LINEAR MODEL
if we add a dummy basis function ϕ₀(x) = 1:
y(x, w) = Σ_{j=0}^{M−1} wⱼ ϕⱼ(x) = wᵀϕ(x)
w = (w₀, …, w_{M−1})ᵀ
ϕ(x) = (ϕ₀(x), …, ϕ_{M−1}(x))ᵀ
BASIS FUNCTION
there are various choices for the basis function
polynomial basis
gaussian basis
logistic sigmoid basis
POLYNOMIAL BASIS
ϕⱼ(x) = xʲ
GAUSSIAN BASIS
ϕⱼ(x) = exp(−(x − μⱼ)² / (2s²))
LOGISTIC SIGMOID BASIS
ϕⱼ(x) = σ((x − μⱼ) / s)
σ(a) = 1 / (1 + exp(−a))
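as a sketch, the three bases above can be written in a few lines of Python (NumPy assumed; the centre μⱼ and width s passed in below are illustrative values, not from the slides):

```python
import numpy as np

def polynomial_basis(x, j):
    """phi_j(x) = x^j"""
    return x ** j

def gaussian_basis(x, mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))"""
    return np.exp(-((x - mu) ** 2) / (2 * s ** 2))

def sigmoid_basis(x, mu, s):
    """phi_j(x) = sigma((x - mu_j) / s), with sigma(a) = 1 / (1 + exp(-a))"""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

x = np.linspace(-1, 1, 5)
print(polynomial_basis(x, 2))            # each element squared
print(gaussian_basis(x, mu=0.0, s=0.5))  # peaks at x = mu
print(sigmoid_basis(x, mu=0.0, s=0.5))   # 0.5 at x = mu
```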
LINEAR MODEL
y(x, w) = wᵀϕ(x)
FEATURE
linear with respect to w
non-linear with respect to x
we can choose any basis we like
LINEAR REGRESSION
we want to find the best w
y(x, w) = wᵀϕ(x)
y = 0.6147 + 1.0562x₁ is the best fit
HOW TO PERFORM REGRESSION
by reducing the error
THE MAIN IDEA OF REGRESSION
minimization of the error function
E(w) = (1/2) Σ_{i=1}^{N} (wᵀϕ(x⁽ⁱ⁾) − t⁽ⁱ⁾)²
N : number of data points
x⁽ⁱ⁾ : i-th data point
t⁽ⁱ⁾ : target value of the i-th data point
※ min_w E(w) is called the least-squares method
※ E(w) is called the sum-of-squares error
LINEAR REGRESSION
we want to minimize the error function
min_w E(w)
E(w) = (1/2) Σ_{i=1}^{N} (wᵀϕ(x⁽ⁱ⁾) − t⁽ⁱ⁾)² = (1/2) ||Φw − t||²
Φ = (ϕ(x⁽¹⁾), ϕ(x⁽²⁾), …, ϕ(x⁽ᴺ⁾))ᵀ
t = (t⁽¹⁾, t⁽²⁾, …, t⁽ᴺ⁾)ᵀ
LINEAR REGRESSION
E(w) = (1/2) ||Φw − t||²
take the partial derivative with respect to w:
∂E(w)/∂w = Φᵀ(Φw − t)
setting ∂E(w)/∂w = 0:
ΦᵀΦw = Φᵀt
∴ w = (ΦᵀΦ)⁻¹Φᵀt
implementation
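a minimal NumPy sketch of the closed-form solution w = (ΦᵀΦ)⁻¹Φᵀt, on made-up noisy data drawn near the line y = 0.6147 + 1.0562x₁ from the earlier slide:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
t = 0.6147 + 1.0562 * x + 0.05 * rng.standard_normal(50)

M = 2                                    # basis functions: phi_0(x) = 1, phi_1(x) = x
Phi = np.vander(x, M, increasing=True)   # design matrix, shape (N, M)

# solve Phi^T Phi w = Phi^T t (np.linalg.solve is more stable than inverting)
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
print(w)   # close to (0.6147, 1.0562)
```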
RIDGE REGRESSION
min_w E(w)
E(w) = (1/2) ||Φw − t||² + (λ/2) ||w||²
※ ||w||² is called the L2 regularization term
RIDGE REGRESSION
E(w) = (1/2) ||Φw − t||² + (λ/2) ||w||²
∂E(w)/∂w = Φᵀ(Φw − t) + λw = 0
(ΦᵀΦ + λI)w = Φᵀt
∴ w = (ΦᵀΦ + λI)⁻¹Φᵀt
implementation
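a sketch of the ridge solution w = (ΦᵀΦ + λI)⁻¹Φᵀt (the data and λ below are illustrative), compared against ordinary least squares to show the shrinkage the penalty causes:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
t = 0.6147 + 1.0562 * x + 0.05 * rng.standard_normal(50)

Phi = np.vander(x, 4, increasing=True)   # degree-3 polynomial basis
lam = 0.1
I = np.eye(Phi.shape[1])

w_ridge = np.linalg.solve(Phi.T @ Phi + lam * I, Phi.T @ t)
w_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# the L2 penalty shrinks the weight vector toward zero:
print(np.linalg.norm(w_ridge), np.linalg.norm(w_ols))
```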
LASSO REGRESSION
E(w) = (1/2) ||Φw − t||² + (λ/2) Σ_{j=1}^{M−1} |wⱼ|
※ Σ_{j=1}^{M−1} |wⱼ| is called the L1 regularization term
LASSO REGRESSION
cannot be solved analytically (the L1 term is non-differentiable)
solved by coordinate descent
performs variable selection (drives some parameters exactly to 0)
implementation
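a minimal coordinate-descent sketch for the objective above (NumPy only; the soft-thresholding update is the standard one for this objective, the bias w₀ is left unpenalized as in the sum over j = 1..M−1, and the data and λ are illustrative):

```python
import numpy as np

def soft_threshold(rho, lam):
    """shrink rho toward zero by lam; exactly zero inside [-lam, lam]"""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(Phi, t, lam, n_iter=200):
    """coordinate descent for (1/2)||Phi w - t||^2 + lam * sum_{j>=1} |w_j|"""
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iter):
        for j in range(M):
            # residual with feature j's current contribution removed
            r = t - Phi @ w + Phi[:, j] * w[j]
            rho = Phi[:, j] @ r
            z = Phi[:, j] @ Phi[:, j]
            if j == 0:
                w[j] = rho / z                       # unpenalized bias
            else:
                w[j] = soft_threshold(rho, lam) / z  # penalized weights
    return w

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
t = 0.6147 + 1.0562 * x + 0.05 * rng.standard_normal(50)
Phi = np.vander(x, 6, increasing=True)   # degree-5 polynomial basis

w = lasso_cd(Phi, t, lam=0.5)
print(w)   # with a large enough lam, some coefficients come out exactly 0
```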
GENERALIZATION
a general form covering ridge and lasso
E(w) = (1/2) ||Φw − t||² + (λ/2) Σ_{j=1}^{M−1} |wⱼ|^q
RE-EXPRESSION
min_w ( (1/2) ||Φw − t||² + (λ/2) Σ_{j=1}^{M} |wⱼ|^q )
is equal to
min_w (1/2) ||Φw − t||²  s.t.  Σ_{j=1}^{M} |wⱼ|^q ≤ η
※ η is calculated from the Lagrange multiplier method
IMAGE
left : ridge regression
right : lasso regression
PROOF
min_w (1/2) Σ_{i=1}^{N} (wᵀϕ(x⁽ⁱ⁾) − t⁽ⁱ⁾)²  s.t.  Σ_{j=1}^{M} |wⱼ|^q ≤ η
by using the Lagrange multiplier method:
L(w, λ) = (1/2) Σ_{i=1}^{N} (wᵀϕ(x⁽ⁱ⁾) − t⁽ⁱ⁾)² + (λ/2) (Σ_{j=1}^{M} |wⱼ|^q − η)
by using the KKT conditions:
∂L(w, λ)/∂λ = Σ_{j=1}^{M} |wⱼ|^q − η = 0
∴ Σ_{j=1}^{M} |w*ⱼ|^q = η
MAXIMIZE LIKELIHOOD AND SUM-OF-SQUARES ERROR
we assume t is the sum of y(x, w) and Gaussian noise ε
t = y(x, w) + ε
MAXIMIZE LIKELIHOOD AND SUM-OF-SQUARES ERROR
p(t|x, w, β) = N(t | y(x, w), β⁻¹)
N(x | μ, σ²) = 1 / (2πσ²)^{1/2} · exp(−(x − μ)² / (2σ²))
LIKELIHOOD
p(t|x, w, β) = ∏_{n=1}^{N} N(tₙ | wᵀϕ(xₙ), β⁻¹)
※ t = (t₁, t₂, …, t_N)ᵀ
MAXIMIZE LIKELIHOOD
ln p(t|x, w, β) = Σ_{n=1}^{N} ln N(tₙ | wᵀϕ(xₙ), β⁻¹)
ln p(t|x, w, β) = −(β/2) Σ_{n=1}^{N} (tₙ − wᵀϕ(xₙ))² + (N/2) ln β − (N/2) ln(2π)
therefore, maximizing the likelihood is equivalent to minimizing the sum-of-squares error
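a quick numeric check of this equivalence on made-up data (β fixed at 1): since ln p differs from −E(w) only by a scale and a constant, the least-squares solution attains a log-likelihood at least as large as any perturbed w.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
t = 0.6147 + 1.0562 * x + 0.05 * rng.standard_normal(50)
Phi = np.vander(x, 2, increasing=True)

def log_likelihood(w, beta=1.0):
    """Gaussian log-likelihood from the slide above"""
    N = len(t)
    sse = np.sum((t - Phi @ w) ** 2)
    return -beta / 2 * sse + N / 2 * np.log(beta) - N / 2 * np.log(2 * np.pi)

w_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)   # least-squares solution

# any perturbation of w_ols can only lower the log-likelihood
for _ in range(100):
    w_other = w_ols + 0.1 * rng.standard_normal(2)
    assert log_likelihood(w_other) <= log_likelihood(w_ols)
print("least-squares solution maximizes the likelihood")
```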
FORMULA DEFORMATION
p(t|X, w, β) = ∏_{n=1}^{N} N(tₙ | wᵀϕ(xₙ), β⁻¹)
p(t|X, w, β) = ∏_{n=1}^{N} (β/2π)^{1/2} exp(−(β/2)(wᵀϕ(xₙ) − tₙ)²)
ln p(t|X, w, β) = −(β/2) Σ_{n=1}^{N} (tₙ − wᵀϕ(xₙ))² + (N/2) ln β − (N/2) ln(2π)
MAP ESTIMATION AND L2 REGULARIZATION
we add a prior distribution
by using Bayes' theorem:
p(w|x, t, α, β) ∝ p(t|x, w, β) p(w|α)
PRIOR DISTRIBUTION
we use this prior distribution
because it makes the calculation easy
p(w|α) = N(w | 0, α⁻¹I) = (α/2π)^{(M+1)/2} exp(−(α/2) wᵀw)
MAP ESTIMATION
max_w p(w|x, t, α, β)
is equal to
min_w ( (1/2) ||Φw − t||² + (λ/2) ||w||² )  with λ = α/β
p(w|x, t, α, β) ∝ p(t|x, w, β) p(w|α)
p(t|x, w, β) = ∏_{n=1}^{N} N(tₙ | wᵀϕ(xₙ), β⁻¹)
p(w|α) = N(w | 0, α⁻¹I) = (α/2π)^{(M+1)/2} exp(−(α/2) wᵀw)
so, MAP estimation is equal to ridge regression
FORMULA DEFORMATION
p(w|x, t, α, β) ∝ ( ∏_{n=1}^{N} (β/2π)^{1/2} exp(−(β/2)(wᵀϕ(xₙ) − tₙ)²) ) · (α/2π)^{(M+1)/2} exp(−(α/2) wᵀw)
ln p(w|x, t, α, β) = −(β/2) Σ_{n=1}^{N} (tₙ − wᵀϕ(xₙ))² + (N/2) ln β − (N/2) ln(2π) + ((M+1)/2) ln α − ((M+1)/2) ln(2π) − (α/2) wᵀw + const
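a quick numeric check (α and β below are illustrative values): maximizing the log posterior gives (βΦᵀΦ + αI)w = βΦᵀt, which coincides with the ridge solution at λ = α/β.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
t = 0.6147 + 1.0562 * x + 0.05 * rng.standard_normal(50)
Phi = np.vander(x, 4, increasing=True)
M = Phi.shape[1]

alpha, beta = 0.1, 400.0

# MAP: maximize ln p(w|t)  ->  (beta Phi^T Phi + alpha I) w = beta Phi^T t
w_map = np.linalg.solve(beta * Phi.T @ Phi + alpha * np.eye(M),
                        beta * Phi.T @ t)

# ridge with lambda = alpha / beta
lam = alpha / beta
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ t)

print(np.allclose(w_map, w_ridge))   # True
```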
SUMMARY OF LINEAR REGRESSION
I introduced some linear regression models
I showed that maximizing the likelihood is equivalent to minimizing the sum-of-squares error
I showed that MAP estimation is equivalent to ridge regression
LINEAR CLASSIFICATION
we consider K-class (K > 2) classification
so, we prepare K linear models
yₖ(x) = wₖᵀϕ(x)
y(x) = W̃ᵀϕ(x)
W̃ = (w₁, w₂, …, w_K)
y(x) = (y₁(x), y₂(x), …, y_K(x))ᵀ
1-OF-K CODING
we prepare a target vector p
p = (c₁, c₂, …, c_K)ᵀ
cᵢ = 1 (if x is class i)
cᵢ = 0 (if x is not class i)
ex) p = (0, 0, …, 0, 1, 0, …, 0)ᵀ
LEAST-SQUARES METHOD
E(W̃) = (1/2) Σ_{i=1}^{N} ||y(x⁽ⁱ⁾) − p⁽ⁱ⁾||²
= (1/2) Σ_{i=1}^{N} ||W̃ᵀϕ(x⁽ⁱ⁾) − p⁽ⁱ⁾||²
= (1/2) Σ_{i=1}^{N} (W̃ᵀϕ(x⁽ⁱ⁾) − p⁽ⁱ⁾)ᵀ(W̃ᵀϕ(x⁽ⁱ⁾) − p⁽ⁱ⁾)
= (1/2) Σ_{i=1}^{N} (ϕ(x⁽ⁱ⁾)ᵀW̃W̃ᵀϕ(x⁽ⁱ⁾) − 2ϕ(x⁽ⁱ⁾)ᵀW̃p⁽ⁱ⁾ + ||p⁽ⁱ⁾||²)
∂E(W̃)/∂W̃ = Σ_{i=1}^{N} (ϕ(x⁽ⁱ⁾)ϕ(x⁽ⁱ⁾)ᵀW̃ − ϕ(x⁽ⁱ⁾)p⁽ⁱ⁾ᵀ)
LEAST-SQUARES METHOD
by X and P:
∂E(W̃)/∂W̃ = XᵀXW̃ − XᵀP
setting ∂E(W̃)/∂W̃ = 0:
XᵀXW̃ = XᵀP
∴ W̃ = (XᵀX)⁻¹XᵀP
X = (ϕ(x⁽¹⁾), ϕ(x⁽²⁾), …, ϕ(x⁽ᴺ⁾))ᵀ
P = (p⁽¹⁾, p⁽²⁾, …, p⁽ᴺ⁾)ᵀ
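a sketch of the whole procedure, W̃ = (XᵀX)⁻¹XᵀP with 1-of-K targets, on made-up 2-D clusters (NumPy assumed; the basis here is the identity plus a constant, ϕ(x) = (1, x₁, x₂)ᵀ):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 3, 30
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
data = np.vstack([m + rng.standard_normal((n, 2)) for m in means])

# design matrix: phi(x) = (1, x1, x2)
X = np.hstack([np.ones((K * n, 1)), data])

# 1-of-K coding matrix P: row i is the target vector p for sample i
labels = np.repeat(np.arange(K), n)
P = np.eye(K)[labels]

W = np.linalg.solve(X.T @ X, X.T @ P)   # W~, shape (3, K)

# classify by the largest output y_k(x)
pred = np.argmax(X @ W, axis=1)
print((pred == labels).mean())          # training accuracy on the clusters
```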
WEAK TO OUTLIERS
red : least-squares method
green : logistic regression
ASSUMING A NORMAL DISTRIBUTION
left : least-squares method
right : logistic regression
DISADVANTAGES OF LEAST-SQUARES METHOD
does not treat the outputs as probabilities
weak to outliers
assumes a normal distribution
if the data does not follow a normal distribution, the results are poor
we should not use the least-squares method for classification problems
SUMMARY
I introduced linear models (regression and classification)
they are the basis for many other machine learning models
PRML is difficult for me, but I want to keep reading it
thank you

More Related Content

What's hot

Integration
IntegrationIntegration
Integrationsuefee
 
Local linear approximation
Local linear approximationLocal linear approximation
Local linear approximationTarun Gehlot
 
Common derivatives integrals
Common derivatives integralsCommon derivatives integrals
Common derivatives integralsolziich
 
numericai matmatic matlab uygulamalar ali abdullah
numericai matmatic  matlab  uygulamalar ali abdullahnumericai matmatic  matlab  uygulamalar ali abdullah
numericai matmatic matlab uygulamalar ali abdullahAli Abdullah
 
Interpolation and Extrapolation
Interpolation and ExtrapolationInterpolation and Extrapolation
Interpolation and ExtrapolationVNRacademy
 
Tensor Train decomposition in machine learning
Tensor Train decomposition in machine learningTensor Train decomposition in machine learning
Tensor Train decomposition in machine learningAlexander Novikov
 
Solovay Kitaev theorem
Solovay Kitaev theoremSolovay Kitaev theorem
Solovay Kitaev theoremJamesMa54
 
Fast and efficient exact synthesis of single qubit unitaries generated by cli...
Fast and efficient exact synthesis of single qubit unitaries generated by cli...Fast and efficient exact synthesis of single qubit unitaries generated by cli...
Fast and efficient exact synthesis of single qubit unitaries generated by cli...JamesMa54
 
Integral Calculus
Integral CalculusIntegral Calculus
Integral Calculusitutor
 
Lecture 11 systems of nonlinear equations
Lecture 11 systems of nonlinear equationsLecture 11 systems of nonlinear equations
Lecture 11 systems of nonlinear equationsHazel Joy Chong
 
NUMERICAL METHODS -Iterative methods(indirect method)
NUMERICAL METHODS -Iterative methods(indirect method)NUMERICAL METHODS -Iterative methods(indirect method)
NUMERICAL METHODS -Iterative methods(indirect method)krishnapriya R
 
Inner Product Space
Inner Product SpaceInner Product Space
Inner Product SpacePatel Raj
 

What's hot (20)

11365.integral 2
11365.integral 211365.integral 2
11365.integral 2
 
exponen dan logaritma
exponen dan logaritmaexponen dan logaritma
exponen dan logaritma
 
Integration
IntegrationIntegration
Integration
 
Local linear approximation
Local linear approximationLocal linear approximation
Local linear approximation
 
Common derivatives integrals
Common derivatives integralsCommon derivatives integrals
Common derivatives integrals
 
Reduction forumla
Reduction forumlaReduction forumla
Reduction forumla
 
Interpolation
InterpolationInterpolation
Interpolation
 
numericai matmatic matlab uygulamalar ali abdullah
numericai matmatic  matlab  uygulamalar ali abdullahnumericai matmatic  matlab  uygulamalar ali abdullah
numericai matmatic matlab uygulamalar ali abdullah
 
Interpolation and Extrapolation
Interpolation and ExtrapolationInterpolation and Extrapolation
Interpolation and Extrapolation
 
Tensor Train decomposition in machine learning
Tensor Train decomposition in machine learningTensor Train decomposition in machine learning
Tensor Train decomposition in machine learning
 
Solovay Kitaev theorem
Solovay Kitaev theoremSolovay Kitaev theorem
Solovay Kitaev theorem
 
Jacobi method
Jacobi methodJacobi method
Jacobi method
 
HERMITE SERIES
HERMITE SERIESHERMITE SERIES
HERMITE SERIES
 
Fast and efficient exact synthesis of single qubit unitaries generated by cli...
Fast and efficient exact synthesis of single qubit unitaries generated by cli...Fast and efficient exact synthesis of single qubit unitaries generated by cli...
Fast and efficient exact synthesis of single qubit unitaries generated by cli...
 
NUMERICAL METHODS
NUMERICAL METHODSNUMERICAL METHODS
NUMERICAL METHODS
 
Integral Calculus
Integral CalculusIntegral Calculus
Integral Calculus
 
Lecture 11 systems of nonlinear equations
Lecture 11 systems of nonlinear equationsLecture 11 systems of nonlinear equations
Lecture 11 systems of nonlinear equations
 
Numerical Methods Solving Linear Equations
Numerical Methods Solving Linear EquationsNumerical Methods Solving Linear Equations
Numerical Methods Solving Linear Equations
 
NUMERICAL METHODS -Iterative methods(indirect method)
NUMERICAL METHODS -Iterative methods(indirect method)NUMERICAL METHODS -Iterative methods(indirect method)
NUMERICAL METHODS -Iterative methods(indirect method)
 
Inner Product Space
Inner Product SpaceInner Product Space
Inner Product Space
 

Viewers also liked

Data Visualization at codetalks 2016
Data Visualization at codetalks 2016Data Visualization at codetalks 2016
Data Visualization at codetalks 2016Stefan Kühn
 
線形識別モデル
線形識別モデル線形識別モデル
線形識別モデル貴之 八木
 
Visualizing Data Using t-SNE
Visualizing Data Using t-SNEVisualizing Data Using t-SNE
Visualizing Data Using t-SNEDavid Khosid
 
混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム貴之 八木
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEKai-Wen Zhao
 
トピックモデル
トピックモデルトピックモデル
トピックモデル貴之 八木
 
自然言語処理システムに想像力を与える試み
自然言語処理システムに想像力を与える試み自然言語処理システムに想像力を与える試み
自然言語処理システムに想像力を与える試みtm_2648
 
11 ak45b5 5
11 ak45b5 511 ak45b5 5
11 ak45b5 5crom68
 
自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭Yuya Unno
 
word2vec - From theory to practice
word2vec - From theory to practiceword2vec - From theory to practice
word2vec - From theory to practicehen_drik
 
Step by Stepで学ぶ自然言語処理における深層学習の勘所
Step by Stepで学ぶ自然言語処理における深層学習の勘所Step by Stepで学ぶ自然言語処理における深層学習の勘所
Step by Stepで学ぶ自然言語処理における深層学習の勘所Ogushi Masaya
 
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)STAIR Lab, Chiba Institute of Technology
 
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料tmprcd12345
 
fastTextの実装を見てみた
fastTextの実装を見てみたfastTextの実装を見てみた
fastTextの実装を見てみたYoshihiko Shiraki
 
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心にtmprcd12345
 
Visualizing Data Using t-SNE
Visualizing Data Using t-SNEVisualizing Data Using t-SNE
Visualizing Data Using t-SNETomoki Hayashi
 
Chainerの使い方と 自然言語処理への応用
Chainerの使い方と自然言語処理への応用Chainerの使い方と自然言語処理への応用
Chainerの使い方と 自然言語処理への応用Yuya Unno
 

Viewers also liked (20)

Data Visualization at codetalks 2016
Data Visualization at codetalks 2016Data Visualization at codetalks 2016
Data Visualization at codetalks 2016
 
最適腕識別
最適腕識別最適腕識別
最適腕識別
 
線形識別モデル
線形識別モデル線形識別モデル
線形識別モデル
 
Visualizing Data Using t-SNE
Visualizing Data Using t-SNEVisualizing Data Using t-SNE
Visualizing Data Using t-SNE
 
混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム混合ガウスモデルとEMアルゴリスム
混合ガウスモデルとEMアルゴリスム
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNE
 
主成分分析
主成分分析主成分分析
主成分分析
 
トピックモデル
トピックモデルトピックモデル
トピックモデル
 
自然言語処理システムに想像力を与える試み
自然言語処理システムに想像力を与える試み自然言語処理システムに想像力を与える試み
自然言語処理システムに想像力を与える試み
 
t-SNE
t-SNEt-SNE
t-SNE
 
11 ak45b5 5
11 ak45b5 511 ak45b5 5
11 ak45b5 5
 
自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭自然言語処理@春の情報処理祭
自然言語処理@春の情報処理祭
 
word2vec - From theory to practice
word2vec - From theory to practiceword2vec - From theory to practice
word2vec - From theory to practice
 
Step by Stepで学ぶ自然言語処理における深層学習の勘所
Step by Stepで学ぶ自然言語処理における深層学習の勘所Step by Stepで学ぶ自然言語処理における深層学習の勘所
Step by Stepで学ぶ自然言語処理における深層学習の勘所
 
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)
自然言語処理分野の最前線(ステアラボ人工知能シンポジウム2017)
 
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料
新事業で目指す自然言語処理ビジネス、その未来 Machine Learning 15minutes! 発表資料
 
fastTextの実装を見てみた
fastTextの実装を見てみたfastTextの実装を見てみた
fastTextの実装を見てみた
 
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に
最近の機械学習テクノロジーとビジネスの応用先 自然言語処理を中心に
 
Visualizing Data Using t-SNE
Visualizing Data Using t-SNEVisualizing Data Using t-SNE
Visualizing Data Using t-SNE
 
Chainerの使い方と 自然言語処理への応用
Chainerの使い方と自然言語処理への応用Chainerの使い方と自然言語処理への応用
Chainerの使い方と 自然言語処理への応用
 

Similar to 線形回帰モデル

Unit 1 Operation on signals
Unit 1  Operation on signalsUnit 1  Operation on signals
Unit 1 Operation on signalsDr.SHANTHI K.G
 
University of manchester mathematical formula tables
University of manchester mathematical formula tablesUniversity of manchester mathematical formula tables
University of manchester mathematical formula tablesGaurav Vasani
 
Hermite integrators and 2-parameter subgroup of Riordan group
Hermite integrators and 2-parameter subgroup of Riordan groupHermite integrators and 2-parameter subgroup of Riordan group
Hermite integrators and 2-parameter subgroup of Riordan groupKeigo Nitadori
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfAlexander Litvinenko
 
An Efficient Boundary Integral Method for Stiff Fluid Interface Problems
An Efficient Boundary Integral Method for Stiff Fluid Interface ProblemsAn Efficient Boundary Integral Method for Stiff Fluid Interface Problems
An Efficient Boundary Integral Method for Stiff Fluid Interface ProblemsAlex (Oleksiy) Varfolomiyev
 
Admissions in India 2015
Admissions in India 2015Admissions in India 2015
Admissions in India 2015Edhole.com
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Alexander Litvinenko
 
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Gabriel Peyré
 
Linear Regression
Linear RegressionLinear Regression
Linear RegressionVARUN KUMAR
 
Mathematical formula tables
Mathematical formula tablesMathematical formula tables
Mathematical formula tablesSaravana Selvan
 
Discrete Signal Processing
Discrete Signal ProcessingDiscrete Signal Processing
Discrete Signal Processingmargretrosy
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchAhmed BESBES
 

Similar to 線形回帰モデル (20)

5.n nmodels i
5.n nmodels i5.n nmodels i
5.n nmodels i
 
Unit 1 Operation on signals
Unit 1  Operation on signalsUnit 1  Operation on signals
Unit 1 Operation on signals
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
University of manchester mathematical formula tables
University of manchester mathematical formula tablesUniversity of manchester mathematical formula tables
University of manchester mathematical formula tables
 
Hermite integrators and 2-parameter subgroup of Riordan group
Hermite integrators and 2-parameter subgroup of Riordan groupHermite integrators and 2-parameter subgroup of Riordan group
Hermite integrators and 2-parameter subgroup of Riordan group
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdf
 
Adaline and Madaline.ppt
Adaline and Madaline.pptAdaline and Madaline.ppt
Adaline and Madaline.ppt
 
An Efficient Boundary Integral Method for Stiff Fluid Interface Problems
An Efficient Boundary Integral Method for Stiff Fluid Interface ProblemsAn Efficient Boundary Integral Method for Stiff Fluid Interface Problems
An Efficient Boundary Integral Method for Stiff Fluid Interface Problems
 
NODDEA2012_VANKOVA
NODDEA2012_VANKOVANODDEA2012_VANKOVA
NODDEA2012_VANKOVA
 
Admissions in India 2015
Admissions in India 2015Admissions in India 2015
Admissions in India 2015
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics
 
Section4 stochastic
Section4 stochasticSection4 stochastic
Section4 stochastic
 
Randomized algorithms ver 1.0
Randomized algorithms ver 1.0Randomized algorithms ver 1.0
Randomized algorithms ver 1.0
 
Matrix calculus
Matrix calculusMatrix calculus
Matrix calculus
 
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Quadratic Function.pptx
Quadratic Function.pptxQuadratic Function.pptx
Quadratic Function.pptx
 
Mathematical formula tables
Mathematical formula tablesMathematical formula tables
Mathematical formula tables
 
Discrete Signal Processing
Discrete Signal ProcessingDiscrete Signal Processing
Discrete Signal Processing
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from Scratch
 

Recently uploaded

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Recently uploaded (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

線形回帰モデル

  • 2. REFERENCE Pattern Recognition and Machine Learning (PRML) chapter3,4
  • 3. TABLE OF CONTENTS 1. linear regression 1.1. what is regression 1.2. linear regression 1.3. ridge regression 1.4. lasso regression 1.5. generalization 1.6. maximum likelihood estimation 1.7. MAP estimation 2. linear classification 2.1. multi-class classification 2.2. disadvandages of least squares method
  • 4. we want to know a line that most fitting WHAT IS REGRESSION there are some of data
  • 5. we want to know such a line in any situation. -> regression analysis WHAT IS REGRESSION is best fittingy = 0.6147 + 1.0562x1
  • 6. NOTATION : scalarx, w, t : vectorx, w, t : matrixX, W, T
  • 7. LINEAR MODEL the simplest model is y(x, w) = + + ⋯ +w0 w1 x1 wD xD : weight parameteswi : variablesxi : the number of variablesD
  • 8. FEATURE linear with respect to linear with respect to model is too simple (poor expressive power) w x
  • 9. EXTEND THE MODEL add linear combination of the non-linear function y(x, w) = + (x)w0 ∑M−1 j=1 wj ϕj is number of basis functionsM − 1 called basis function(x)ϕj called bias parameterw0
  • 10. LINEAR MODEL if we add dummy basis function( )(x) = 1ϕ0 y(x, w) = (x) = ϕ(x)∑M−1 j=0 wj ϕj w T w = ( , …,w0 wM−1 ) T ϕ(x) = ( (x), …, (x)ϕ0 ϕM−1 ) T
  • 11. BASIS FUNCTION there are various choices for the basis function polynomial basis gaussian basis logistic sigmoid basis
  • 13. GAUSSIAN BASIS (x) = exp ( − ) ϕj (x−μj ) 2 2s 2
  • 14. LOGISTIC SIGMOID BASIS (x) = σ( ) ϕj x−μj s σ(a) = 1 1+exp (−a)
  • 15. LINEAR MODEL y(x, w) = ϕ(x)w T
  • 16. FEATURE linear with respect to non-linear with respect to we can choose a favorite basis w x
  • 17. LINEAR REGRESSION we want to find the best y(x, w) = ϕ(x)w T w is best fittingy = 0.6147 + 1.0562x1
  • 19. THE MAIN IDEA OF REGRESSION Minimization of the error function : number of data : i-th data : target value of i-th data E(w) = ( ϕ( ) −1 2 ∑N i=1 w T x (i) t (i) ) 2 N x (i) t (i) ※ called least squares method ※ E(w)minw called sum-of-squares errorE(w)
  • 20. LINEAR REGRESSION we want to minimize the error function: min_w E(w), where E(w) = (1/2) Σ_{i=1}^N (w^T φ(x^(i)) − t^(i))² = (1/2) ||Φw − t||², with Φ = (φ(x^(1)), φ(x^(2)), …, φ(x^(N)))^T and t = (t^(1), t^(2), …, t^(N))^T
  • 21. LINEAR REGRESSION E(w) = (1/2) ||Φw − t||²; taking the partial derivative with respect to w: ∂E(w)/∂w = Φ^T (Φw − t); setting ∂E(w)/∂w = 0 gives Φ^T Φ w = Φ^T t, ∴ w = (Φ^T Φ)^{−1} Φ^T t (implementation)
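The closed-form solution above is a one-liner in NumPy. A minimal sketch (the toy data and its true coefficients 0.5 and 2.0 are assumptions for illustration; `lstsq` solves the same normal equations but avoids forming an explicit inverse, which is numerically safer):

```python
import numpy as np

def fit_least_squares(Phi, t):
    """Least-squares solution of min_w (1/2)||Phi w - t||^2,
    mathematically w = (Phi^T Phi)^{-1} Phi^T t."""
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

# Toy data from y = 0.5 + 2x plus small Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
t = 0.5 + 2.0 * x + 0.05 * rng.standard_normal(50)

Phi = np.stack([np.ones_like(x), x], axis=1)  # bias column + identity basis
w = fit_least_squares(Phi, t)
print(w)  # close to [0.5, 2.0]
```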
  • 22. RIDGE REGRESSION min_w E(w), where E(w) = (1/2) ||Φw − t||² + (λ/2) ||w||² ※ ||w||² is called the L2 regularization term
  • 23. RIDGE REGRESSION E(w) = (1/2) ||Φw − t||² + (λ/2) ||w||²; ∂E(w)/∂w = Φ^T (Φw − t) + λw = 0, so (Φ^T Φ + λI) w = Φ^T t, ∴ w = (Φ^T Φ + λI)^{−1} Φ^T t (implementation)
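The ridge solution is also available in closed form. A minimal sketch (the sine toy data, seed, and λ values are illustrative assumptions), showing the characteristic effect that a larger λ shrinks the weight vector:

```python
import numpy as np

def fit_ridge(Phi, t, lam):
    """Ridge solution w = (Phi^T Phi + lambda I)^{-1} Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ t)

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)
Phi = np.stack([x ** j for j in range(7)], axis=1)  # degree-6 polynomial basis

w_small = fit_ridge(Phi, t, lam=1e-4)
w_big = fit_ridge(Phi, t, lam=1.0)
# larger lambda pulls the weights toward zero
print(np.linalg.norm(w_big) < np.linalg.norm(w_small))  # True
```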
  • 24. LASSO REGRESSION E(w) = (1/2) ||Φw − t||² + (λ/2) Σ_{j=1}^{M−1} |w_j| ※ |w| is called the L1 regularization term
  • 25. LASSO REGRESSION cannot be solved analytically (the L1 term is non-differentiable); solved by coordinate descent; performs variable selection (drives some of the parameters to 0) (implementation)
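Coordinate descent for the lasso reduces to repeated soft-thresholding. A minimal sketch (not the slides' implementation; it uses the convention λ Σ|w_j| rather than the slides' λ/2, which only rescales λ, and omits convergence checks and an intercept):

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator, the key step of L1 coordinate descent."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def fit_lasso(Phi, t, lam, n_iter=200):
    """Coordinate descent for (1/2)||Phi w - t||^2 + lam * sum_j |w_j|."""
    N, M = Phi.shape
    w = np.zeros(M)
    z = (Phi ** 2).sum(axis=0)  # per-coordinate curvature ||phi_j||^2
    for _ in range(n_iter):
        for j in range(M):
            # residual with coordinate j's contribution added back
            r = t - Phi @ w + Phi[:, j] * w[j]
            rho = Phi[:, j] @ r
            w[j] = soft_threshold(rho, lam) / z[j]
    return w

rng = np.random.default_rng(2)
Phi = rng.standard_normal((100, 5))
w_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])  # sparse ground truth (toy data)
t = Phi @ w_true + 0.01 * rng.standard_normal(100)

w = fit_lasso(Phi, t, lam=5.0)
print(w)  # coordinates 1, 2, 4 are driven to (near) zero: variable selection
```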
  • 26. GENERALIZATION a generalization of ridge and lasso: E(w) = (1/2) ||Φw − t||² + (λ/2) Σ_{j=1}^{M−1} |w_j|^q
  • 27. RE-EXPRESSION min_w ((1/2) ||Φw − t||² + (λ/2) Σ_{j=1}^M |w_j|^q) is equal to min_w (1/2) ||Φw − t||² s.t. Σ_{j=1}^M |w_j|^q ≤ η ※ η is calculated from the Lagrange multiplier method
  • 28. IMAGE left: ridge regression, right: lasso regression
  • 29. PROOF min_w (1/2) Σ_{i=1}^N (w^T φ(x^(i)) − t^(i))² s.t. Σ_{j=1}^M |w_j|^q ≤ η; by the Lagrange multiplier method, L(w, λ) = (1/2) Σ_{i=1}^N (w^T φ(x^(i)) − t^(i))² + (λ/2) (Σ_{j=1}^M |w_j|^q − η); by the KKT conditions, ∂L(w, λ)/∂λ = Σ_{j=1}^M |w_j|^q − η = 0, ∴ Σ_{j=1}^M |w*_j|^q = η
  • 30. MAXIMUM LIKELIHOOD AND SUM-OF-SQUARES ERROR we assume t is represented by the sum of y(x, w) and Gaussian noise ε: t = y(x, w) + ε
  • 31. MAXIMUM LIKELIHOOD AND SUM-OF-SQUARES ERROR p(t|x, w, β) = N(t | y(x, w), β^{−1}), where N(x | μ, σ²) = (1 / (2πσ²)^{1/2}) exp(−(x − μ)² / (2σ²))
  • 32. LIKELIHOOD p(t|x, w, β) = Π_{n=1}^N N(t_n | w^T φ(x_n), β^{−1}) ※ t = (t_1, t_2, …, t_N)^T
  • 33. MAXIMUM LIKELIHOOD ln p(t|x, w, β) = Σ_{n=1}^N ln N(t_n | w^T φ(x_n), β^{−1}) = −(β/2) Σ_{n=1}^N (t_n − w^T φ(x_n))² + (N/2) ln β − (N/2) ln(2π); therefore, maximizing the likelihood is equal to minimizing the sum-of-squares error
  • 34. FORMULA DEFORMATION p(t|X, w, β) = Π_{n=1}^N N(t_n | w^T φ(x_n), β^{−1}) = Π_{n=1}^N (β/2π)^{1/2} exp(−(β/2) (w^T φ(x_n) − t_n)²); ln p(t|X, w, β) = −(β/2) Σ_{n=1}^N (t_n − w^T φ(x_n))² + (N/2) ln β − (N/2) ln(2π)
  • 35. MAP ESTIMATION AND L2 REGULARIZATION we add a prior distribution; by Bayes' theorem, p(w|x, t, α, β) ∝ p(t|x, w, β) p(w|α)
  • 36. PRIOR DISTRIBUTION we use this prior distribution because it makes the calculation easy: p(w|α) = N(w | 0, α^{−1} I) = (α/2π)^{(M+1)/2} exp(−(α/2) w^T w)
  • 37. MAP ESTIMATION max_w p(w|x, t, α, β) is equal to min_w ((1/2) ||Φw − t||² + (λ/2) ||w||²), since p(w|x, t, α, β) ∝ p(t|x, w, β) p(w|α), with p(t|x, w, β) = Π_{n=1}^N N(t_n | w^T φ(x_n), β^{−1}) and p(w|α) = N(w | 0, α^{−1} I) = (α/2π)^{(M+1)/2} exp(−(α/2) w^T w); so, MAP estimation is equal to ridge regression
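The equivalence can be checked numerically: setting the gradient of the log posterior to zero gives (β Φ^T Φ + α I) w = β Φ^T t, which is exactly the ridge solution with λ = α/β. A minimal sketch (the random data and the values of α and β are arbitrary illustrations):

```python
import numpy as np

def ridge_solution(Phi, t, lam):
    """Ridge: w = (Phi^T Phi + lam I)^{-1} Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ t)

def map_solution(Phi, t, alpha, beta):
    """MAP estimate: zero gradient of the log posterior gives
    (beta Phi^T Phi + alpha I) w = beta Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(beta * Phi.T @ Phi + alpha * np.eye(M),
                           beta * (Phi.T @ t))

rng = np.random.default_rng(3)
Phi = rng.standard_normal((40, 4))
t = rng.standard_normal(40)
alpha, beta = 0.5, 25.0

# the MAP estimate coincides with ridge regression at lambda = alpha / beta
print(np.allclose(map_solution(Phi, t, alpha, beta),
                  ridge_solution(Phi, t, alpha / beta)))  # True
```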
  • 38. FORMULA DEFORMATION p(w|x, t, α, β) ∝ (Π_{n=1}^N (β/2π)^{1/2} exp(−(β/2) (w^T φ(x_n) − t_n)²)) · (α/2π)^{(M+1)/2} exp(−(α/2) w^T w); ln p(w|x, t, α, β) = −(β/2) Σ_{n=1}^N (t_n − w^T φ(x_n))² + (N/2) ln β − (N/2) ln(2π) + ((M+1)/2) ln α − ((M+1)/2) ln(2π) − (α/2) w^T w
  • 39. SUMMARY OF LINEAR REGRESSION I introduced some linear regression models; I showed that maximizing the likelihood is equal to minimizing the sum-of-squares error; I showed that MAP estimation is equal to ridge regression
  • 40. LINEAR CLASSIFICATION we consider K-class (K > 2) classification, so we prepare K linear models: y_k(x) = w_k^T φ(x), collected as y(x) = W̃^T φ(x), where W̃ = (w_1, w_2, …, w_K) and y(x) = (y_1(x), y_2(x), …, y_K(x))^T
  • 41. 1-OF-K CODING we prepare a vector p = (c_1, c_2, …, c_K)^T with c_i = 1 if x is class i and c_i = 0 if x is not class i, e.g. p = (0, 0, …, 0, 1, 0, …, 0)^T
  • 42. LEAST-SQUARES METHOD E(W̃) = (1/2) Σ_{i=1}^N ||y(x^(i)) − p^(i)||² = (1/2) Σ_{i=1}^N ||W̃^T φ(x^(i)) − p^(i)||² = (1/2) Σ_{i=1}^N (W̃^T φ(x^(i)) − p^(i))^T (W̃^T φ(x^(i)) − p^(i)) = (1/2) Σ_{i=1}^N (φ(x^(i))^T W̃ W̃^T φ(x^(i)) − 2 φ(x^(i))^T W̃ p^(i) + ||p^(i)||²); ∂E(W̃)/∂W̃ = Σ_{i=1}^N (φ(x^(i)) φ(x^(i))^T W̃ − φ(x^(i)) p^(i)T)
  • 43. LEAST-SQUARES METHOD with X = (φ(x^(1)), φ(x^(2)), …, φ(x^(N)))^T and P = (p^(1), p^(2), …, p^(N))^T, we get ∂E(W̃)/∂W̃ = X^T X W̃ − X^T P; setting ∂E(W̃)/∂W̃ = 0 gives X^T X W̃ = X^T P, ∴ W̃ = (X^T X)^{−1} X^T P
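The whole pipeline above — 1-of-K targets, the closed-form W̃, and argmax classification — fits in a few lines. A minimal sketch (the three Gaussian blobs are assumed toy data; an identity basis with a prepended bias column stands in for φ):

```python
import numpy as np

def one_of_k(labels, K):
    """Encode integer labels as 1-of-K target vectors p."""
    P = np.zeros((len(labels), K))
    P[np.arange(len(labels)), labels] = 1.0
    return P

def fit_multiclass_ls(X, labels, K):
    """W_tilde = (X^T X)^{-1} X^T P, computed via lstsq for stability."""
    W, *_ = np.linalg.lstsq(X, one_of_k(labels, K), rcond=None)
    return W

def predict(X, W):
    """Assign each point to the class with the largest discriminant y_k(x)."""
    return np.argmax(X @ W, axis=1)

rng = np.random.default_rng(4)
centers = np.array([[0, 0], [4, 0], [0, 4]])  # three well-separated blobs
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])
labels = np.repeat([0, 1, 2], 30)
Xb = np.hstack([np.ones((90, 1)), X])  # prepend bias feature phi_0 = 1

W = fit_multiclass_ls(Xb, labels, K=3)
acc = (predict(Xb, W) == labels).mean()
print(acc)  # near-perfect on well-separated blobs
```

On cleanly separated data this works; the next slides show why it degrades with outliers or non-Gaussian class distributions.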
  • 44. WEAK TO OUTLIERS red: least-squares method, green: logistic regression
  • 45. ASSUMING A NORMAL DISTRIBUTION left: least-squares method, right: logistic regression
  • 46. DISADVANTAGES OF THE LEAST-SQUARES METHOD it cannot handle the labels as probabilities; it is weak to outliers; it assumes a normal distribution, so if the data does not follow a normal distribution the result is bad; we should not use the least-squares method in classification problems
  • 47. SUMMARY I introduced linear models (regression and classification); they are the basis for some other machine learning models; PRML is difficult for me, but I want to continue reading