Gradient Descent
Natural Language Processing
Emory University
Jinho D. Choi
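Supervised Learning
2
Training data: $(X, Y) = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x$ is the input and $y$ is the output.
$\hat{y} = f(x)$ predicts the output of $x$ ($\hat{y}$ is the prediction).
Output $y = \pm 1$: binomial distribution.
Expected risk (unknown!), where $\ell$ is the loss function and $P(x, y)$ is the joint distribution:
$$E(f) = \int \ell(\hat{y}; y) \, dP(x, y)$$
Empirical risk (minimize!):
$$\hat{E}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(\hat{y}_i; y_i)$$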
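The empirical risk on this slide is just an average of per-instance losses, which a few lines of Python can make concrete. This is a minimal sketch: the 0/1 loss, the threshold predictor `f`, and the toy data are assumptions made up for illustration and do not come from the slides.

```python
def empirical_risk(f, data, loss):
    """Average loss of predictor f over the n training pairs (x_i, y_i)."""
    return sum(loss(f(x), y) for x, y in data) / len(data)


def zero_one_loss(y_hat, y):
    # Hypothetical loss for illustration: 1 for a wrong label, 0 for a correct one.
    return 0.0 if y_hat == y else 1.0


def f(x):
    # Hypothetical fixed predictor: the sign of a scalar input.
    return 1 if x >= 0 else -1


# Toy data with outputs y = +1 or -1 (made up for this example).
data = [(-2.0, -1), (-1.0, -1), (0.5, 1), (3.0, 1), (-0.5, 1)]

risk = empirical_risk(f, data, zero_one_loss)
print(risk)  # one of the five instances is misclassified
```

The expected risk cannot be computed this way because the joint distribution is unknown; only the training sample is available.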
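Linear Prediction
3
Linear function, where $\Phi(x)$ is the feature vector:
$$\hat{y} = f(x) = w^T \Phi(x) = w^T x$$
Least squares:
$$\ell(\hat{y}; y) = \frac{1}{2} (\hat{y} - y)^2$$
$$\ell(w, x; y) = \frac{1}{2} (w^T x - y)^2$$
$$\hat{E}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(\hat{y}_i; y_i)$$
Find a weight vector that minimizes the loss.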
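The linear function and the least squares loss translate directly into code. The sketch below assumes dense feature vectors stored as Python lists; the weight and feature values are invented for the example.

```python
def predict(w, x):
    """Linear function: y_hat = w^T x."""
    return sum(wj * xj for wj, xj in zip(w, x))


def least_squares_loss(w, x, y):
    """Least squares loss: 1/2 * (w^T x - y)^2."""
    return 0.5 * (predict(w, x) - y) ** 2


# Hypothetical weight vector, feature vector, and gold output.
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]
y = 1.0

y_hat = predict(w, x)                # 0.5 - 2.0 + 1.0 = -0.5
loss = least_squares_loss(w, x, y)   # 0.5 * (-1.5)^2 = 1.125
print(y_hat, loss)
```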
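Gradient Descent
4
$$w_{t+1} \leftarrow w_t - \eta_t \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial w} \ell(w_t, x_i; y_i)$$
$\eta_t$: learning rate
$\frac{\partial}{\partial w} \ell$: derivative of the loss
Minimize loss
Derivative → 0
Global optimum?
Convex optimization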
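To see "derivative → 0" in action, here is a small sketch of the update on a one-dimensional convex function. The objective $J(w) = (w - 3)^2$, the fixed learning rate, and the step count are assumptions chosen for illustration; because the function is convex, the point where the derivative vanishes is the global optimum.

```python
def gradient_descent_1d(derivative, w0, eta=0.1, steps=100):
    """Repeat w <- w - eta * dJ/dw, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - eta * derivative(w)
    return w


def derivative(w):
    # Derivative of the hypothetical convex objective J(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)


w_min = gradient_descent_1d(derivative, w0=0.0)
print(w_min, derivative(w_min))  # w approaches 3 and the derivative approaches 0
```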
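Gradient Descent
5
How often is the weight vector updated?
$$w_{t+1} \leftarrow w_t - \eta_t \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial w} \ell(w_t, x_i; y_i)$$
For the least squares loss $\ell(w, x; y) = \frac{1}{2} (w^T x - y)^2$:
$$\frac{\partial}{\partial w} \ell(w, x; y) = \frac{\partial}{\partial w} \frac{1}{2} (w^T x - y)^2 = (w^T x - y) x$$
$$w_{t+1} \leftarrow w_t - \eta_t \frac{1}{n} \sum_{i=1}^{n} (w_t^T x_i - y_i) x_i$$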
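The batch update for least squares can be sketched as follows: the weight vector changes once per pass over all n instances. The toy data (generated from y = 2·x₁ + 1 with a bias feature), the constant learning rate, and the epoch count are assumptions for illustration only.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def batch_gradient(w, data):
    """(1/n) * sum_i (w^T x_i - y_i) x_i for the least squares loss."""
    n = len(data)
    grad = [0.0] * len(w)
    for x, y in data:
        err = dot(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / n
    return grad


def gradient_descent(data, dim, eta=0.1, epochs=1000):
    w = [0.0] * dim
    for _ in range(epochs):
        grad = batch_gradient(w, data)   # uses every instance
        w = [wj - eta * gj for wj, gj in zip(w, grad)]  # one update per pass
    return w


# Hypothetical data: x = [bias, x1], y = 2 * x1 + 1.
data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0),
        ([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)]

w = gradient_descent(data, dim=2)
print(w)  # close to [1.0, 2.0]
```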
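Stochastic Gradient Descent
6
Gradient descent:
$$w_{t+1} \leftarrow w_t - \eta_t \frac{1}{n} \sum_{i=1}^{n} (w_t^T x_i - y_i) x_i$$
Stochastic gradient descent, updated for every instance:
$$w_{t+1} \leftarrow w_t - \eta_t (w_t^T x_i - y_i) x_i$$
[Figure: example instances marked + and - around 0, with the updates traced below]
$w_0 \leftarrow 0$
$w_0^T x_1 > 0$: $w_1 \leftarrow w_0 - \eta (w_0^T x_1 + 1) x_1$
$w_1^T x_2 < 0$: $w_2 \leftarrow w_1 - \eta (w_1^T x_2 + 1) x_2$
$w_2^T x_3 < 0$: $w_3 \leftarrow w_2 - \eta (w_2^T x_3 - 1) x_3$
$w_3^T x_4 > 0$: $w_4 \leftarrow w_3 - \eta (w_3^T x_4 - 1) x_4$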
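A stochastic version of the least squares update might look like the sketch below, where the weight vector changes after every single instance. Shuffling the instances each epoch, the fixed seed, the constant learning rate, and the toy data are assumptions; the slides do not specify an instance order or a learning-rate schedule.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgd_least_squares(data, dim, eta=0.05, epochs=500, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    w = [0.0] * dim
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # assumption: visit instances in random order
        for i in order:
            x, y = data[i]
            err = dot(w, x) - y
            # Updated for every instance, using that instance's gradient only.
            w = [wj - eta * err * xj for wj, xj in zip(w, x)]
    return w


# Hypothetical data: x = [bias, x1], y = 2 * x1 + 1.
data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0),
        ([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)]

w = sgd_least_squares(data, dim=2)
print(w)  # close to [1.0, 2.0] on this noise-free data
```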
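Perceptron
7
Stochastic gradient descent:
$$w_{t+1} \leftarrow w_t - \eta_t \nabla \ell$$
Least squares:
$$\ell(w, x; y) = \frac{1}{2} (w^T x - y)^2$$
$$\nabla \ell = (w^T x - y) x$$
Perceptron:
$$\ell(w, x; y) = \max\{0, -w^T x \cdot y\}$$
$$\nabla \ell = \begin{cases} -x \cdot y & \text{if } w^T x \cdot y < 0 \\ 0 & \text{otherwise} \end{cases}$$
$$w_{t+1} \leftarrow w_t + \eta_t \begin{cases} x \cdot y & \text{if } w_t^T x \cdot y < 0 \\ 0 & \text{otherwise} \end{cases}$$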
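The perceptron update can be sketched as a short training loop. One deliberate deviation is labeled in the code: the test uses `<= 0` rather than the strict `< 0` of the slide, an assumption made so that an all-zero initial weight vector can trigger its first update. The toy data, learning rate, and epoch count are also made up for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(data, dim, eta=1.0, epochs=10):
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in data:
            # Assumption: `<= 0` instead of the slide's `< 0`, otherwise
            # w = 0 gives w^T x * y = 0 and no update would ever happen.
            if dot(w, x) * y <= 0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


# Hypothetical linearly separable data: x = [bias, x1], y in {-1, +1}.
data = [([1.0, 3.0], 1), ([1.0, 4.0], 1),
        ([1.0, -1.0], -1), ([1.0, 0.0], -1)]

w = train_perceptron(data, dim=2)
print(w, [dot(w, x) * y for x, y in data])
```

Unlike the least squares update, the weight vector changes only on instances that are misclassified.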
Averaged Perceptron
8
The final hyperplane may be overfitted to later instances.
Take the average of all hyperplanes, including ones that are not updated.
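Averaged Perceptron
9
Average of all hyperplanes:
$$w \leftarrow \frac{1}{c} \sum_{t=0}^{c-1} w_t$$
Sparse vector?
Initialization: $c \leftarrow 1$
Update rule, if $w_t^T x \cdot y < 0$:
$$w_{t+1} \leftarrow w_t + \eta_t (x \cdot y)$$
$$v_{t+1} \leftarrow v_t + \eta_t \cdot c (x \cdot y)$$
For every instance: $c \leftarrow c + 1$
Averaged weight vector:
$$w \leftarrow w - \frac{1}{c} \cdot v$$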
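The counter-based update on this slide can be checked against a naive average of all hyperplanes. The sketch below keeps both: `v` follows the slide's rule and only ever receives the (possibly sparse) update vector, while `total` sums every dense weight vector and exists only for the comparison. Incrementing `c` after each instance, the `<= 0` test, and the toy data are assumptions for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def averaged_perceptron(data, dim, eta=1.0, epochs=5):
    w = [0.0] * dim      # current hyperplane w_t
    v = [0.0] * dim      # count-weighted sum of the updates
    total = [0.0] * dim  # naive sum of every w_t, kept only for the check
    c = 1
    for _ in range(epochs):
        for x, y in data:
            total = [tj + wj for tj, wj in zip(total, w)]
            # Assumption: `<= 0` instead of the slide's `< 0` so that
            # w = 0 triggers the first update.
            if dot(w, x) * y <= 0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                v = [vj + eta * c * y * xj for vj, xj in zip(v, x)]
            c += 1  # incremented for every instance, updated or not
    total = [tj + wj for tj, wj in zip(total, w)]  # include the final w_t
    averaged = [wj - vj / c for wj, vj in zip(w, v)]
    naive = [tj / c for tj in total]
    return averaged, naive


# Hypothetical linearly separable data: x = [bias, x1], y in {-1, +1}.
data = [([1.0, 3.0], 1), ([1.0, 4.0], 1),
        ([1.0, -1.0], -1), ([1.0, 0.0], -1)]

averaged, naive = averaged_perceptron(data, dim=2)
print(averaged, naive)  # the two vectors agree up to rounding
```

In a real implementation `total` would be dropped, so each step costs only as much as the number of non-zero features in `x`.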
Multinomial Perceptron
10
Binomial distribution requires 1 hyperplane to separate 2 classes.
How many for m classes?
Multinomial distribution requires m hyperplanes to separate m classes.
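Multinomial Perceptron
11
5 features (including bias): $x = [1, 0, 0, 1, 0]$
Binomial, $y \in \{-1, 1\}$:
$$w = [a, b, c, d, e]$$
$$w^T x = a + d$$
$$\hat{y} = \begin{cases} 1 & \text{if } w^T x \geq 0 \\ -1 & \text{otherwise} \end{cases}$$
Multinomial, $y \in \{0, 1, 2, 3\}$:
$$w = [a_0, a_1, a_2, a_3, b_0, b_1, b_2, b_3, c_0, c_1, c_2, c_3, d_0, d_1, d_2, d_3, e_0, e_1, e_2, e_3]$$
$$w_y^T x = a_y + d_y$$
$$\hat{y} = \arg\max_y w_y^T x$$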
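Multinomial prediction with one weight vector per class can be sketched as below. Storing the weights as a list of per-class vectors (rather than the single interleaved vector on the slide) is an assumption made for readability, and the weight values are invented. With the slide's feature vector, each class score reduces to $a_y + d_y$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict_multinomial(weights, x):
    """Return (argmax_y w_y^T x, list of per-class scores)."""
    scores = [dot(w_y, x) for w_y in weights]
    y_hat = max(range(len(scores)), key=lambda k: scores[k])
    return y_hat, scores


# Hypothetical weights [a_y, b_y, c_y, d_y, e_y] for classes y = 0, 1, 2, 3.
weights = [
    [1.0, 0.0, 2.0, -3.0, 1.0],   # a_0 + d_0 = -2
    [2.0, 1.0, 0.0, 1.0, 0.0],    # a_1 + d_1 = 3
    [0.0, 5.0, 1.0, 4.0, -1.0],   # a_2 + d_2 = 4
    [-1.0, 2.0, 2.0, 0.0, 3.0],   # a_3 + d_3 = -1
]
x = [1, 0, 0, 1, 0]  # feature vector from the slide

y_hat, scores = predict_multinomial(weights, x)
print(y_hat, scores)
```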
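Binomial vs. Multinomial Perceptron
12
Binomial, if $w_t^T x \cdot y < 0$:
$$w_{t+1} \leftarrow w_t + \eta_t (x \cdot y)$$
Multinomial, if $y \neq \hat{y}$:
$$w_{y,t+1} \leftarrow w_{y,t} + \eta_t \cdot x$$
$$w_{\hat{y},t+1} \leftarrow w_{\hat{y},t} - \eta_t \cdot x$$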
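The multinomial update can be sketched as a training loop: on a mistake, the gold class's weights move toward x and the wrongly predicted class's weights move away from it. The per-class weight layout, tie-breaking toward the lowest class index, and the toy data are assumptions for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict(weights, x):
    # Ties are broken in favor of the lowest class index (an assumption).
    return max(range(len(weights)), key=lambda k: dot(weights[k], x))


def train_multinomial_perceptron(data, num_classes, dim, eta=1.0, epochs=10):
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(weights, x)
            if y_hat != y:
                # Promote the gold class, demote the predicted class.
                weights[y] = [wj + eta * xj for wj, xj in zip(weights[y], x)]
                weights[y_hat] = [wj - eta * xj
                                  for wj, xj in zip(weights[y_hat], x)]
    return weights


# Hypothetical data: x = [bias, f1, f2, f3], y in {0, 1, 2}.
data = [([1.0, 1.0, 0.0, 0.0], 0),
        ([1.0, 0.0, 1.0, 0.0], 1),
        ([1.0, 0.0, 0.0, 1.0], 2)]

weights = train_multinomial_perceptron(data, num_classes=3, dim=4)
print([predict(weights, x) for x, _ in data])
```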
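Hinge Loss
13
Perceptron:
$$\ell(w, x; y) = \max\{0, -w^T x \cdot y\}$$
$$\nabla \ell = \begin{cases} -x \cdot y & \text{if } w^T x \cdot y < 0 \\ 0 & \text{otherwise} \end{cases}$$
Hinge loss:
$$\ell(w, x; y) = \max\{0, 1 - w^T x \cdot y\}$$
$$\nabla \ell = \begin{cases} -x \cdot y & \text{if } w^T x \cdot y < 1 \\ 0 & \text{otherwise} \end{cases}$$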
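The difference between the two losses shows up on an instance that is classified correctly but with a small margin. In the sketch below (values invented for illustration), the margin $w^T x \cdot y$ is 0.5: the perceptron loss and gradient are zero, while the hinge loss still penalizes the instance and produces an update.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def perceptron_loss(w, x, y):
    return max(0.0, -dot(w, x) * y)


def hinge_loss(w, x, y):
    return max(0.0, 1.0 - dot(w, x) * y)


def perceptron_gradient(w, x, y):
    if dot(w, x) * y < 0:
        return [-y * xj for xj in x]
    return [0.0] * len(x)


def hinge_gradient(w, x, y):
    if dot(w, x) * y < 1:
        return [-y * xj for xj in x]
    return [0.0] * len(x)


# Hypothetical instance with margin w^T x * y = 0.5.
w = [0.5, 0.0]
x = [1.0, 2.0]
y = 1

print(perceptron_loss(w, x, y), perceptron_gradient(w, x, y))
print(hinge_loss(w, x, y), hinge_gradient(w, x, y))
```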
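Adaptive Gradient Descent
14
Update condition:
Perceptron: if $w_t^T x \cdot y < 0$
Hinge loss: if $w_t^T x \cdot y < 1$
Gradient descent update:
$$w_{t+1} \leftarrow w_t + \eta_t (x \cdot y)$$
Adaptive update, where $x \circ x$ is the element-wise square and the division is applied per feature:
$$g_{t+1} \leftarrow g_t + x \circ x$$
$$w_{t+1} \leftarrow w_t + \frac{\eta}{\rho + \sqrt{g_{t+1}}} \cdot (x \cdot y)$$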
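One adaptive step can be sketched as follows, using the hinge-loss condition. Reading the accumulator update as an element-wise square, the default values of `eta` and `rho`, and the example instance are assumptions for illustration. The effect is that each feature gets its own step size, which shrinks as that feature accumulates squared values.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def adaptive_step(w, g, x, y, eta=0.1, rho=1e-8, margin=1.0):
    """One update; margin=1.0 is the hinge condition, 0.0 the perceptron one."""
    if dot(w, x) * y < margin:
        # Accumulate the element-wise square of x (y^2 = 1 for y = +1 or -1).
        g = [gj + xj * xj for gj, xj in zip(g, x)]
        # Per-feature learning rate eta / (rho + sqrt(g_j)).
        w = [wj + eta / (rho + math.sqrt(gj)) * xj * y
             for wj, gj, xj in zip(w, g, x)]
    return w, g


# Hypothetical instance whose second feature is twice as large as the first.
w, g = adaptive_step([0.0, 0.0], [0.0, 0.0], x=[1.0, 2.0], y=1)
print(w, g)  # both weights move by about eta despite the different scales
```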

CS571: Gradient Descent

  • 1. Gradient Descent. Natural Language Processing, Emory University. Jinho D. Choi.
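  • 2. Supervised Learning. $(X, Y) = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x$ is the input and the prediction $\hat{y} = f(x)$ predicts the output of $x$. Expected risk: $E(f) = \int \ell(\hat{y}; y) \cdot P(x, y)$, where $\ell$ is the loss function and $P(x, y)$ is the joint distribution (unknown!). Empirical risk (minimize!): $\hat{E}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(\hat{y}_i; y_i)$. Output $y = \pm 1$: binomial distribution.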
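  • 3. Linear Prediction. Linear function: $\hat{y} = f(x) = w^T \phi(x) = w^T x$, where $x$ is the feature vector. Least squares: $\ell(\hat{y}; y) = \frac{1}{2} (\hat{y} - y)^2$, that is, $\ell(w, x; y) = \frac{1}{2} (w^T x - y)^2$. Empirical risk: $\hat{E}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(\hat{y}_i; y_i)$. Find a weight vector that minimizes the loss.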
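  • 4. Gradient Descent. $w_{t+1} \leftarrow w_t - \eta_t \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial w} \ell(w_t, x_i; y_i)$, where $\eta_t$ is the learning rate and $\frac{\partial}{\partial w} \ell$ is the derivative of the loss. Minimize loss: derivative → 0. Global optimum? Convex optimization.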
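  • 5. Gradient Descent. $w_{t+1} \leftarrow w_t - \eta_t \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial w} \ell(w_t, x_i; y_i)$. With $\ell(w, x; y) = \frac{1}{2} (w^T x - y)^2$: $\frac{\partial}{\partial w} \ell(w, x; y) = \frac{\partial}{\partial w} \frac{1}{2} (w^T x - y)^2 = (w^T x - y)\,x$, so $w_{t+1} \leftarrow w_t - \eta_t \frac{1}{n} \sum_{i=1}^{n} (w_t^T x_i - y_i)\,x_i$. How often is the weight vector updated?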
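  • 6. Stochastic Gradient Descent. Batch: $w_{t+1} \leftarrow w_t - \eta_t \frac{1}{n} \sum_{i=1}^{n} (w_t^T x_i - y_i)\,x_i$. Stochastic: $w_{t+1} \leftarrow w_t - \eta_t (w_t^T x_i - y_i)\,x_i$, updated for every instance. Example: $w_0 \leftarrow 0$; $w_0^T x_1 > 0$: $w_1 \leftarrow w_0 - \eta\,(w_0^T x_1 + 1)\,x_1$; $w_1^T x_2 < 0$: $w_2 \leftarrow w_1 - \eta\,(w_1^T x_2 + 1)\,x_2$; $w_2^T x_3 < 0$: $w_3 \leftarrow w_2 - \eta\,(w_2^T x_3 - 1)\,x_3$; $w_3^T x_4 > 0$: $w_4 \leftarrow w_3 - \eta\,(w_3^T x_4 - 1)\,x_4$.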
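  • 7. Perceptron. Stochastic gradient descent: $w_{t+1} \leftarrow w_t - \eta_t \nabla \ell$. Least squares: $\ell(w, x; y) = \frac{1}{2} (w^T x - y)^2$, $\nabla \ell = (w^T x - y)\,x$. Perceptron: $\ell(w, x; y) = \max\{0, -w^T x \cdot y\}$, $\nabla \ell = \begin{cases} -x \cdot y & w^T x \cdot y < 0 \\ 0 & \text{otherwise} \end{cases}$, which gives $w_{t+1} \leftarrow w_t + \eta_t \begin{cases} x \cdot y & w_t^T x \cdot y < 0 \\ 0 & \text{otherwise} \end{cases}$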
  • 8. Averaged Perceptron. The final hyperplane may be overfitted to later instances. Take the average of all hyperplanes, including ones that are not updated.
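  • 9. Averaged Perceptron. Average of all hyperplanes: $w \leftarrow \frac{1}{c} \sum_{t=0}^{c-1} w_t$ (sparse vector?). Initialization: $c \leftarrow 1$. Update rule: $w_{t+1} \leftarrow w_t + \eta_t (x \cdot y)$ and $v_{t+1} \leftarrow v_t + \eta_t \cdot c\,(x \cdot y)$ if $w_t^T x \cdot y < 0$. For every instance: $c \leftarrow c + 1$. Final weights: $w \leftarrow w - \frac{1}{c} \cdot v$.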
  • 10. Multinomial Perceptron. Binomial distribution requires 1 hyperplane to separate 2 classes. How many for m classes? Multinomial distribution requires m hyperplanes to separate m classes.
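  • 11. Multinomial Perceptron. 5 features (including bias). Binomial: $y = \{-1, 1\}$, $w = [a\ b\ c\ d\ e]^T$, $x = [1\ 0\ 0\ 1\ 0]^T$, $w^T x = a + d$, $\hat{y} = \begin{cases} 1 & w^T x \geq 0 \\ -1 & \text{otherwise} \end{cases}$. Multinomial: $y = \{0, 1, 2, 3\}$, $w = \begin{bmatrix} a_0 & a_1 & a_2 & a_3 \\ b_0 & b_1 & b_2 & b_3 \\ c_0 & c_1 & c_2 & c_3 \\ d_0 & d_1 & d_2 & d_3 \\ e_0 & e_1 & e_2 & e_3 \end{bmatrix}$, $w_y^T x = a_y + d_y$, $\hat{y} = \arg\max_y w_y^T x$.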
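  • 12. Binomial vs. Multinomial Perceptron. Binomial: $w_{t+1} \leftarrow w_t + \eta_t (x \cdot y)$ if $w_t^T x \cdot y < 0$. Multinomial: $w_{y,t+1} \leftarrow w_{y,t} + \eta_t \cdot x$ and $w_{\hat{y},t+1} \leftarrow w_{\hat{y},t} - \eta_t \cdot x$ if $y \neq \hat{y}$.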
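  • 13. Hinge Loss. Perceptron: $\ell(w, x; y) = \max\{0, -w^T x \cdot y\}$, $\nabla \ell = \begin{cases} -x \cdot y & w^T x \cdot y < 0 \\ 0 & \text{otherwise} \end{cases}$. Hinge loss: $\ell(w, x; y) = \max\{0, 1 - w^T x \cdot y\}$, $\nabla \ell = \begin{cases} -x \cdot y & w^T x \cdot y < 1 \\ 0 & \text{otherwise} \end{cases}$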
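  • 14. Adaptive Gradient Descent. Standard update: $w_{t+1} \leftarrow w_t + \eta_t (x \cdot y)$, applied if $w_t^T x \cdot y < 0$ (perceptron) or if $w_t^T x \cdot y < 1$ (hinge loss). Adaptive update: $g_{t+1} \leftarrow g_t + x \circ x$ (element-wise square of $x$), then $w_{t+1} \leftarrow w_t + \frac{\eta}{\rho + \sqrt{g_{t+1}}} \cdot (x \cdot y)$.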
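
The update rules on slides 9 and 14 can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the course's reference implementation: the function names, the toy dataset, and the default hyperparameters are hypothetical. One deliberate deviation: the averaged perceptron below updates when $w^T x \cdot y \leq 0$ rather than $< 0$, because with a zero initial weight vector the strict condition would never trigger an update.

```python
import math


def dot(w, x):
    """Inner product w^T x for two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def predict(w, x):
    """Binomial prediction: +1 if w^T x >= 0, otherwise -1."""
    return 1 if dot(w, x) >= 0 else -1


def train_averaged_perceptron(data, epochs=5, eta=1.0):
    """Averaged perceptron with the counter trick: w, v, and c.

    data is a list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    """
    dim = len(data[0][0])
    w = [0.0] * dim  # current weight vector
    v = [0.0] * dim  # updates weighted by the counter c
    c = 1            # initialization: c <- 1
    for _ in range(epochs):
        for x, y in data:
            # Assumption: "<= 0" instead of "< 0" so that w = 0 can update.
            if dot(w, x) * y <= 0:
                for j in range(dim):
                    w[j] += eta * y * x[j]
                    v[j] += eta * c * y * x[j]
            c += 1  # incremented for every instance, updated or not
    # Averaged weights: w - (1/c) * v
    return [wj - vj / c for wj, vj in zip(w, v)]


def train_adagrad_hinge(data, epochs=5, eta=1.0, rho=1e-8):
    """Adaptive gradient descent with the hinge-loss condition (margin < 1)."""
    dim = len(data[0][0])
    w = [0.0] * dim
    g = [0.0] * dim  # running sum of squared feature values
    for _ in range(epochs):
        for x, y in data:
            if dot(w, x) * y < 1:
                for j in range(dim):
                    g[j] += x[j] * x[j]
                    # Per-feature learning rate eta / (rho + sqrt(g_j))
                    w[j] += eta / (rho + math.sqrt(g[j])) * y * x[j]
    return w


# Hypothetical toy data: first feature is the bias, labels are +1 / -1.
TOY_DATA = [
    ([1.0, 2.0, 2.0], 1),
    ([1.0, 1.0, 3.0], 1),
    ([1.0, -2.0, -1.0], -1),
    ([1.0, -1.0, -3.0], -1),
]

if __name__ == "__main__":
    w_avg = train_averaged_perceptron(TOY_DATA)
    w_ada = train_adagrad_hinge(TOY_DATA)
    print("averaged perceptron:", w_avg)
    print("adagrad + hinge:    ", w_ada)
    print([predict(w_avg, x) for x, _ in TOY_DATA])
```

Keeping `v` alongside `w` avoids storing every intermediate hyperplane: only two vectors and one counter are needed, and the average is recovered once at the end.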