From Logistic Regression to Linear-Chain CRF
Yow-Bang (Darren) Wang
12/20/2012
Outline
● Introduction
● Logistic Regression
● Log-Linear Model
● Linear-Chain CRF
○ Example: Part of Speech (POS) Tagging
● CRF Training and Testing
○ Example: Part of Speech (POS) Tagging
● Example: Speech Disfluency Detection
Introduction
We can approach the theory of CRF from
1. Maximum Entropy
2. Probabilistic Graphical Model
3. Logistic Regression ← today's talk
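Linear Regression
● Input x: real-valued features (RV)
● Output y: Gaussian distribution (RV)
● Model parameter: $\theta = \{a_0, a_1, \dots, a_N\}$, with $y \sim \mathcal{N}\left(a_0 + \sum_{n=1}^{N} a_n x_n,\ \sigma^2\right)$
● ML (conditional likelihood) estimation of $\theta$: $\hat{\theta} = \arg\max_{\theta} P(Y \mid X; \theta)$, where {X, Y} are the training data.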
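With a Gaussian output and fixed variance, maximizing the conditional likelihood amounts to minimizing squared error, so the ML estimate has a closed form. The sketch below shows this for a single feature; the function name `fit_linear_1d` and the toy data are illustrative assumptions, not taken from the slides.

```python
def fit_linear_1d(xs, ys):
    """Closed-form least-squares fit of y ~ a0 + a1 * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


# Toy data (hypothetical) generated from y = 1 + 2x without noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a0, a1 = fit_linear_1d(xs, ys)
```

On noise-free data the fit recovers the generating weights; with noisy data it returns the weights that maximize the Gaussian conditional likelihood.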
Linear Regression
● Input x: real-valued features (RV)
● Output y: Gaussian distribution (RV)
● Represented with a graphical model:
[Figure: inputs 1, x1, …, xN connected to output y with weights a0, a1, …, aN]
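Logistic Regression
● Input x: real-valued features (RV)
● Output y: Bernoulli distribution (RV)
● Model parameter: $\theta = \{a_0, a_1, \dots, a_N\}$, with $\log \frac{p}{1 - p} = a_0 + \sum_{n=1}^{N} a_n x_n$, where $p = P(y = 1 \mid x; \theta)$
Q: Why this form?
A: Both sides range over $(-\infty, \infty)$.
No analytical solution for ML → gradient descent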
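Since there is no closed-form ML solution, the weights are found iteratively. The sketch below uses batch gradient ascent on the conditional log-likelihood (equivalent to gradient descent on its negative); the learning rate, epoch count, and toy data are assumptions chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(a, x):
    """p = P(y = 1 | x); a[0] is the bias a0, a[1:] are the feature weights."""
    return sigmoid(a[0] + sum(w * v for w, v in zip(a[1:], x)))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Batch gradient ascent; the gradient for weight k is sum (y - p) * x_k."""
    a = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(a)
        for x, y in zip(xs, ys):
            err = y - predict(a, x)
            grad[0] += err                 # bias sees a constant input of 1
            for k, v in enumerate(x):
                grad[k + 1] += err * v
        a = [w + lr * g for w, g in zip(a, grad)]
    return a


# Toy, non-separable data (hypothetical).
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 1, 0, 1]
a = train_logistic(xs, ys)
```

At convergence the bias gradient is zero, so the predicted probabilities on the training set sum to the number of examples with Y=1.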
Logistic Regression
● Input x: real-valued features (RV)
● Output y: Bernoulli distribution (RV)
● Represented with a graphical model:
[Figure: inputs 1, x1, …, xN combined with weights a0, a1, …, aN and passed through a sigmoid to give p]
Logistic Regression
Advantages of Logistic Regression:
1. Correlated features x don't lead to problems (in contrast to Naive Bayes)
2. Well-calibrated probability (in contrast to SVM)
3. Not sensitive to unbalanced training data
number of "Y=1"
Multinomial Logistic Regression
● Input x: real-valued features (RV), N-dimension
● Output y: categorical distribution (RV), M-class
● Represented with a graphical model:
[Figure: inputs 1, x1, …, xN connected through a softmax to outputs p1, …, pM]
● pm: probability of the m-th class
● This is a neural network with 2 layers!
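The softmax step can be sketched as follows: each class gets a linear score, and the scores are exponentiated and normalized. The weight layout (`[bias, a_1, ..., a_N]` per class) and the numbers are illustrative assumptions.

```python
import math


def softmax(scores):
    """Normalize scores into probabilities; subtract the max for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(weights, x):
    """weights[m] = [bias, a_1, ..., a_N] for class m; returns [p_1, ..., p_M]."""
    scores = [w[0] + sum(a * v for a, v in zip(w[1:], x)) for w in weights]
    return softmax(scores)


# Three classes, two features (hypothetical weights): scores are 2, 1, 0.
weights = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
probs = class_probabilities(weights, [2.0, 1.0])
```

With M = 2 and one class's weights fixed to zero, this reduces to the sigmoid of binary logistic regression.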
Log-Linear Model
An interpretation: a Log-Linear Model is a Structured Logistic Regression
● Structured: allows non-numerical input and output by defining proper feature functions
● Special case: Logistic regression
General form: $p(y \mid x; w) = \frac{\exp\left(\sum_j w_j F_j(x, y)\right)}{\sum_{y'} \exp\left(\sum_j w_j F_j(x, y')\right)}$
● $F_j(x, y)$: j-th feature function
Log-Linear Model
Note:
1. "Feature" vs. "Feature function"
○ Feature: corresponds only to the input
○ Feature function: corresponds to both input and output
2. The denominator must sum over all possible labels y'; this normalizes the result into [0, 1].
General form: $p(y \mid x; w) = \frac{\exp\left(\sum_j w_j F_j(x, y)\right)}{\sum_{y'} \exp\left(\sum_j w_j F_j(x, y')\right)}$
● $F_j(x, y)$: j-th feature function
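A minimal sketch of the general form, with a non-numerical input (a word) and output (a label). The two feature functions, the label set, and the weights are hypothetical; the point is only that each feature function looks at both x and y, and that the denominator runs over every candidate label.

```python
import math


def log_linear_prob(x, y, labels, feature_functions, weights):
    """p(y | x; w) = exp(sum_j w_j F_j(x, y)) / sum_{y'} exp(sum_j w_j F_j(x, y'))."""
    def score(label):
        return sum(w * F(x, label) for w, F in zip(weights, feature_functions))

    scores = {label: score(label) for label in labels}
    m = max(scores.values())                       # stabilize the exponentials
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[y] - m) / z


# Hypothetical feature functions F_j(x, y).
def F_capitalized_and_proper(x, y):
    return 1.0 if x[:1].isupper() and y == "proper noun" else 0.0


def F_ends_ly_and_adverb(x, y):
    return 1.0 if x.endswith("ly") and y == "adverb" else 0.0


labels = ["proper noun", "adverb", "verb"]
features = [F_capitalized_and_proper, F_ends_ly_and_adverb]
weights = [2.0, 1.5]
p = log_linear_prob("Tuesday", "proper noun", labels, features, weights)
```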
Linear-Chain CRF
Conditional Random Field (CRF)
From probabilistic graphical model perspective:
● CRF is a Markov Random Field with some disjoint RVs observed and some hidden.
[Figure: Markov Random Field over nodes x, y, z, p, q, r, each marked hidden or observed]
Linear-Chain CRF
From probabilistic graphical model perspective:
● Linear-Chain CRF: a specific structure of CRF
[Figure: linear-chain CRF with hidden and observed nodes]
We often refer to "linear-chain CRF" as simply "CRF"
Linear-Chain CRF
From Log-Linear Model point of view: Linear-Chain CRF is a Log-Linear Model, in which
1. The length L of the output y can vary.
2. Each feature function is a sum of "low-level feature functions": $F_j(x, y) = \sum_{i=1}^{L} f_j(x, y_i, y_{i-1}, i)$
[Figure: hidden sequence y above observed sequence x]
Linear-Chain CRF
From Log-Linear Model point of view: Linear-Chain CRF is a Log-Linear Model, in which
1. The length L of the output y can vary.
2. Each feature function is a sum of "low-level feature functions": $F_j(x, y) = \sum_{i=1}^{L} f_j(x, y_i, y_{i-1}, i)$
"We can have a fixed set of feature-functions $F_j$ for log-linear training, even though the training examples are not fixed-length." [1]
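The sum over positions is what lets one fixed set of feature functions handle sequences of any length. A small sketch, assuming a `START` placeholder for the tag before the first position and two hypothetical low-level feature functions:

```python
START = "<START>"   # assumed placeholder for y_0, the tag before position 1


def f_capitalized_proper_noun(x, y_i, y_prev, i):
    """Fires when the i-th word is capitalized and tagged 'proper noun'."""
    return 1.0 if x[i][:1].isupper() and y_i == "proper noun" else 0.0


def f_article_then_noun(x, y_i, y_prev, i):
    """Fires on the tag transition article -> noun."""
    return 1.0 if y_prev == "article" and y_i == "noun" else 0.0


def global_feature(f, x, y):
    """F_j(x, y): sum of the low-level f_j over every position i."""
    total = 0.0
    for i in range(len(x)):
        y_prev = y[i - 1] if i > 0 else START
        total += f(x, y[i], y_prev, i)
    return total


x1 = ["He", "sat", "on", "the", "mat"]
y1 = ["pronoun", "verb", "preposition", "article", "noun"]
x2 = ["Mary", "met", "John"]
y2 = ["proper noun", "verb", "proper noun"]

F_trans_1 = global_feature(f_article_then_noun, x1, y1)        # length-5 input
F_cap_2 = global_feature(f_capitalized_proper_noun, x2, y2)    # length-3 input
```

The same two functions are applied to a 5-word and a 3-word sentence; only the number of terms in the sum changes.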
Example: Part of Speech (POS) Tagging
Input (observed) x: word sequence
Output (hidden) y: POS tag sequence
● For example:
x = "He sat on the mat."
y = "pronoun verb preposition article noun"
He/pron. sat/v. on/prep. the/art. mat/n.
Example: Part of Speech (POS) Tagging
Input (observed) x: word sequence
Output (hidden) y: POS tag sequence
● With CRF we hope
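CRF: $p(y \mid x; w) = \frac{\exp\left(\sum_j w_j F_j(x, y)\right)}{\sum_{y'} \exp\left(\sum_j w_j F_j(x, y')\right)}$, where $F_j(x, y) = \sum_{i=1}^{L} f_j(x, y_i, y_{i-1}, i)$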
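Example: Part of Speech (POS) Tagging
An example of a low-level feature function $f_j(x, y_i, y_{i-1}, i)$:
● "The i-th word in x is capitalized, and POS tag $y_i$ = proper noun." [TRUE(1) or FALSE(0)]
If $w_j$ is large and positive: given x and all other conditions fixed, y is more probable if $f_j(x, y_i, y_{i-1}, i)$ is activated.
CRF: $p(y \mid x; w) = \frac{\exp\left(\sum_j w_j F_j(x, y)\right)}{\sum_{y'} \exp\left(\sum_j w_j F_j(x, y')\right)}$, where $F_j(x, y) = \sum_{i=1}^{L} f_j(x, y_i, y_{i-1}, i)$
Note: a feature function may not use all the given information.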
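The effect of a large positive weight can be checked on a tiny case by enumerating every tag sequence. This brute-force sketch is only practical for toy inputs; the two-tag set and the sentence are assumptions. Note that the feature function ignores `y_prev`, illustrating that a feature function need not use all of its arguments.

```python
import itertools
import math

TAGS = ["proper noun", "verb"]   # hypothetical two-tag set


def f_cap_proper(x, y_i, y_prev, i):
    """i-th word capitalized and y_i = proper noun; y_prev is unused."""
    return 1.0 if x[i][:1].isupper() and y_i == "proper noun" else 0.0


def sequence_score(x, y, features, weights):
    """sum_j w_j F_j(x, y), with F_j expanded into its per-position terms."""
    total = 0.0
    for i in range(len(x)):
        y_prev = y[i - 1] if i > 0 else None
        total += sum(w * f(x, y[i], y_prev, i) for w, f in zip(weights, features))
    return total


def crf_prob(x, y, features, weights):
    """p(y | x; w), normalizing by brute force over all tag sequences."""
    z = sum(math.exp(sequence_score(x, list(cand), features, weights))
            for cand in itertools.product(TAGS, repeat=len(x)))
    return math.exp(sequence_score(x, y, features, weights)) / z


x = ["Mary", "runs"]
y = ["proper noun", "verb"]
p_zero = crf_prob(x, y, [f_cap_proper], [0.0])    # weight 0: all sequences tie
p_large = crf_prob(x, y, [f_cap_proper], [3.0])   # weight 3: y becomes likelier
```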
CRF Training and Testing
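Training
Stochastic Gradient Ascent
● Partial derivative of conditional log-likelihood: $\frac{\partial}{\partial w_j} \log p(y \mid x; w) = F_j(x, y) - E_{y' \sim p(y' \mid x; w)}\left[F_j(x, y')\right]$
● Update weight by $w_j \leftarrow w_j + \alpha \left( F_j(x, y) - E_{y' \sim p(y' \mid x; w)}\left[F_j(x, y')\right] \right)$, where $\alpha$ is the learning rate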
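One stochastic gradient ascent step can be sketched as "observed feature value minus expected feature value under the current model". The expectation here is computed by brute-force enumeration, which is only feasible for toy inputs; the tag set, feature functions, and learning rate are assumptions.

```python
import itertools
import math


def global_features(x, y, low_level):
    """[F_j(x, y) for each j], where F_j sums f_j over positions."""
    return [sum(f(x, y[i], y[i - 1] if i > 0 else None, i) for i in range(len(x)))
            for f in low_level]


def sga_step(x, y, tags, low_level, weights, lr=0.5):
    """w_j <- w_j + lr * (F_j(x, y) - E_{y'}[F_j(x, y')]) for one example."""
    candidates = [list(c) for c in itertools.product(tags, repeat=len(x))]
    feats = [global_features(x, c, low_level) for c in candidates]
    scores = [sum(w * F for w, F in zip(weights, Fs)) for Fs in feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    expected = [sum(e / z * Fs[j] for e, Fs in zip(exps, feats))
                for j in range(len(weights))]
    observed = global_features(x, y, low_level)
    return [w + lr * (o - e) for w, o, e in zip(weights, observed, expected)]


# Hypothetical low-level feature functions f_j(x, y_i, y_prev, i).
def f_cap_noun(x, y_i, y_prev, i):
    return 1.0 if x[i][:1].isupper() and y_i == "N" else 0.0


def f_noun_then_verb(x, y_i, y_prev, i):
    return 1.0 if y_prev == "N" and y_i == "V" else 0.0


def f_ing_verb(x, y_i, y_prev, i):
    return 1.0 if x[i].endswith("ing") and y_i == "V" else 0.0


new_w = sga_step(["Mary", "runs"], ["N", "V"], ["N", "V"],
                 [f_cap_noun, f_noun_then_verb, f_ing_verb], [0.0, 0.0, 0.0])
```

The third feature never fires on this sentence for any tag sequence, so both its observed and expected values are zero and its weight is left unchanged.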
Training
Note: if the j-th feature function is not activated by this training example
→ we don't need to update its weight!
→ usually only a few weights need to be updated in each iteration
Testing
For 1-best derivation: $\hat{y} = \arg\max_{y} p(y \mid x; w)$
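Example: Part of Speech (POS) Tagging
For 1-best derivation:
1. Pre-compute $g_i(y_{i-1}, y_i) = \sum_j w_j f_j(x, y_i, y_{i-1}, i)$ as a table for each i
[Figure: table with rows and columns indexed by POS tags N, V, Adj, ...]
2. Perform dynamic programming to find the best sequence y: $\hat{y} = \arg\max_{y} \sum_{i=1}^{L} g_i(y_{i-1}, y_i)$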
Example: Part of Speech (POS) Tagging
For 1-best derivation:
1. Pre-compute $g_i(y_{i-1}, y_i)$ as a table for each i
2. Perform dynamic programming to find the best sequence y:
● Complexity: $O(M^2 L D)$
○ $M^2$: build a table
○ L: for each element in the sequence
○ D: # of feature functions
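The dynamic programming step is the Viterbi recursion. The sketch below assumes the tables have already been built (that is where the factor D is spent) and are stored as dictionaries keyed by `(previous_tag, tag)`, with `None` standing in for the tag before the first position; this data layout and the numbers are assumptions.

```python
def viterbi(g, tags):
    """Best tag sequence under scores g[i][(u, v)]; g[0] uses u = None."""
    L = len(g)
    best = [{v: g[0][(None, v)] for v in tags}]   # best[i][v]: top score ending in v
    back = [{}]                                   # back[i][v]: best predecessor of v
    for i in range(1, L):
        best.append({})
        back.append({})
        for v in tags:
            u_star = max(tags, key=lambda u: best[i - 1][u] + g[i][(u, v)])
            best[i][v] = best[i - 1][u_star] + g[i][(u_star, v)]
            back[i][v] = u_star
    last = max(tags, key=lambda v: best[L - 1][v])
    path = [last]
    for i in range(L - 1, 0, -1):                 # follow back-pointers
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[L - 1][last]


tags = ["N", "V"]
g = [
    {(None, "N"): 1.0, (None, "V"): 0.0},
    {("N", "N"): 0.0, ("N", "V"): 2.0, ("V", "N"): 0.5, ("V", "V"): 0.0},
    {("N", "N"): 0.5, ("N", "V"): 0.0, ("V", "N"): 1.5, ("V", "V"): 0.0},
]
path, score = viterbi(g, tags)
```

Given the tables, the recursion itself costs O(M² L): for each of L positions it considers every (previous tag, tag) pair.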
Testing
For probability estimation:
● must also sum over all possible y (e.g. all possible POS sequences) for the denominator
● Can be calculated by matrix multiplication!
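The denominator can be sketched as a chain of vector-matrix products: exponentiate each table of position scores into an M × M matrix and multiply them left to right. The table layout (dictionaries keyed by `(previous_tag, tag)`, `None` before the first position) and the numbers are assumptions; the brute-force version is included only to check the result on a toy case.

```python
import itertools
import math


def partition_by_matrices(g, tags):
    """Z = sum over all y of exp(sum_i g_i(y_{i-1}, y_i)), via matrix products."""
    alpha = [math.exp(g[0][(None, v)]) for v in tags]            # row vector
    for i in range(1, len(g)):
        m = [[math.exp(g[i][(u, v)]) for v in tags] for u in tags]
        alpha = [sum(alpha[a] * m[a][b] for a in range(len(tags)))
                 for b in range(len(tags))]                       # alpha times m
    return sum(alpha)


def partition_by_enumeration(g, tags):
    """Same quantity by listing every tag sequence (exponential cost)."""
    total = 0.0
    for y in itertools.product(tags, repeat=len(g)):
        score = g[0][(None, y[0])]
        score += sum(g[i][(y[i - 1], y[i])] for i in range(1, len(g)))
        total += math.exp(score)
    return total


tags = ["N", "V"]
g = [
    {(None, "N"): 1.0, (None, "V"): 0.0},
    {("N", "N"): 0.0, ("N", "V"): 2.0, ("V", "N"): 0.5, ("V", "V"): 0.0},
    {("N", "N"): 0.5, ("N", "V"): 0.0, ("V", "N"): 1.5, ("V", "V"): 0.0},
]
z_fast = partition_by_matrices(g, tags)
z_slow = partition_by_enumeration(g, tags)
```

The matrix version costs O(M² L) instead of enumerating all M^L sequences.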
Example: Speech Disfluency Detection
Example: Speech Disfluency Detection
One application of CRF in speech recognition: Boundary/Disfluency Detection [5]
● Repetition: "It is is Tuesday."
● Hesitation: "It is uh… Tuesday."
● Correction: "It is Monday, I mean, Tuesday."
● etc.
Possible clues: prosody
● Pitch
● Duration
● Energy
● Pause
● etc.
"It is uh… Tuesday."
● Pitch reset?
● Long duration?
● Low energy?
● Pause existence?
Example: Speech Disfluency Detection
One application of CRF in speech recognition: Boundary/Disfluency Detection [5]
● CRF Input x: prosodic features
● CRF Output y:
[Figure labels: Speech Recognition, Rescoring]
Reference
[1] Charles Elkan, “Log-linear Models and Conditional Random
Fields”
○ Tutorial at CIKM08 (ACM International Conference on Information and
Knowledge Management)
○ Video: http://videolectures.net/cikm08_elkan_llmacrf/
○ Lecture notes: http://cseweb.ucsd.edu/~elkan/250B/cikmtutorial.pdf
[2] Hanna M. Wallach, “Conditional Random Fields: An
Introduction”
[3] Jeremy Morris, “Conditional Random Fields: An Overview”
○ Presented at OSU Clippers 2008, January 11, 2008
[4] J. Lafferty, A. McCallum, F. Pereira, “Conditional
random fields: Probabilistic models for segmenting and
labeling sequence data”, 2001.
[5] Liu, Y. and Shriberg, E. and Stolcke, A. and Hillard, D.
and Ostendorf, M. and Harper, M., “Enriching speech
recognition with automatic detection of sentence boundaries
and disfluencies”, in IEEE Transactions on Audio, Speech,
and Language Processing, 2006.
