Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
John Lafferty, Andrew McCallum, Fernando Pereira
Speaker: Shu-Ying Li

Outline
Introduction
Conditional Random Fields
Parameter Estimation for CRFs
Experiments
Conclusions
Introduction
Sequence Segmenting and Labeling
Generative Models
Assign a joint probability to paired observation and label sequences.
The parameters are typically trained to maximize the joint likelihood of the training examples.
[Figure: graphical model over states S_{t-1}, S_t, S_{t+1} and observations O_t, O_{t+1}]
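As a minimal sketch of what "assigning a joint probability to paired observation and label sequences" means, the snippet below scores one (states, observations) pair under a hidden Markov model. Taking an HMM as the generative model is an assumption here, and the two-tag tables (`initial`, `transition`, `emission`) are made-up illustrative values, not taken from the slides.

```python
import math

def hmm_joint_log_prob(states, observations, initial, transition, emission):
    """log P(states, observations) for a first-order HMM.

    P(s, o) = P(s_1) * prod_t P(s_t | s_{t-1}) * prod_t P(o_t | s_t)
    """
    log_p = math.log(initial[states[0]])
    for prev, cur in zip(states, states[1:]):
        log_p += math.log(transition[prev][cur])
    for state, obs in zip(states, observations):
        log_p += math.log(emission[state][obs])
    return log_p

# Hypothetical toy parameters (assumed values for illustration only).
initial = {"N": 0.6, "V": 0.4}
transition = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emission = {"N": {"dogs": 0.9, "bark": 0.1}, "V": {"dogs": 0.2, "bark": 0.8}}

log_p = hmm_joint_log_prob(["N", "V"], ["dogs", "bark"], initial, transition, emission)
print(math.exp(log_p))
```

Maximizing the joint likelihood then amounts to choosing the three tables so that the sum of such log-probabilities over the training pairs is as large as possible.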
Introduction (cont.)
Conditional Model
Allow arbitrary, non-independent features of the observation sequence X.
The probability of a transition between labels may depend on past and future observations.
Maximum Entropy Markov Models (MEMMs)
[Figure: graphical model over states S_{t-1}, S_t, S_{t+1} and observations O_{t-1}, O_t, O_{t+1}]
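Introduction (cont.)
The Label Bias Problem:
$\Pr(1 \text{ and } 2 \mid ro) = \Pr(2 \mid 1, ro)\,\Pr(1 \mid ro) = \Pr(2 \mid 1, o)\,\Pr(1 \mid r)$
$\Pr(1 \text{ and } 2 \mid ri) = \Pr(2 \mid 1, ri)\,\Pr(1 \mid ri) = \Pr(2 \mid 1, i)\,\Pr(1 \mid r)$
Because state 1 has a single outgoing transition, $\Pr(2 \mid 1, o) = \Pr(2 \mid 1, i) = 1$, so
$\Pr(1 \text{ and } 2 \mid ro) = \Pr(1 \text{ and } 2 \mid ri)$.
But it should be $\Pr(1 \text{ and } 2 \mid ro) < \Pr(1 \text{ and } 2 \mid ri)$!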
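The effect can be reproduced numerically with a small sketch of per-state normalization, the mechanism that causes label bias in MEMM-style models. The raw scores below are assumed values chosen for illustration, and state 4 is a hypothetical competitor to state 1 after the first letter. State 1 strongly prefers the observation i over o, yet after local normalization that preference disappears because state 2 is its only successor.

```python
def normalize(scores):
    """Turn non-negative successor scores into a probability distribution."""
    total = sum(scores.values())
    return {state: score / total for state, score in scores.items()}

# raw_scores[prev_state][observation] -> {next_state: unnormalized score}
# Assumed illustrative values: from state 1, 'i' fits well and 'o' fits badly.
raw_scores = {
    0: {"r": {1: 1.0, 4: 1.0}},
    1: {"i": {2: 10.0}, "o": {2: 0.1}},
}

def path_prob(observations):
    """Probability of the state path 0 -> 1 -> 2 given a two-letter input."""
    first = normalize(raw_scores[0][observations[0]])[1]
    second = normalize(raw_scores[1][observations[1]])[2]
    return first * second

p_ri = path_prob("ri")
p_ro = path_prob("ro")
print(p_ri, p_ro)
```

Both calls return the same value: the scores 10.0 and 0.1 are each divided by themselves, so the second observation has no influence on the path probability.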
Introduction (cont.)
Solving the Label Bias Problem
Change the state-transition structure of the model.
Start with a fully connected model and let the training procedure figure out a good structure.
Conditional Random Fields
Random Field
Conditional Random Fields
Suppose $P(Y_v \mid X, \text{all other } Y) = P(Y_v \mid X, \text{neighbors}(Y_v))$; then X with Y is a conditional random field.
Example: $P(Y_3 \mid X, \text{all other } Y) = P(Y_3 \mid X, Y_2, Y_4)$, where $X = X_1, \ldots, X_{n-1}, X_n$.
Conditional Random Fields
Conditional Distribution [2]
$s_k(y_i, x, i)$ is a state feature function of the label at position $i$ and the observation sequence.
$\lambda_k$ and $\mu_k$ are parameters to be estimated from training data.
Conditional Distribution [1]
y : label sequence
v : vertex from vertex set V
e : edge from edge set E over V
$f_k$ : Boolean edge feature; $g_k$ : Boolean vertex feature
k : index over the features
$\lambda_k$ and $\mu_k$ are parameters to be estimated
$y|_e$ is the set of components of y defined by edge e
$y|_v$ is the set of components of y defined by vertex v
[Figure: chain of labels $Y_{t-1}, Y_t, Y_{t+1}$ over observations $X_{t-1}, X_t, X_{t+1}$]
Conditional Random Fields
Conditional Distribution
$Z(x)$ is a normalization factor over the data sequence $x$.

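[1]: $p_\theta(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{e \in E,\,k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V,\,k} \mu_k g_k(v, y|_v, x)\Big)$
[2]: $p(y \mid x, \lambda) = \frac{1}{Z(x)} \exp\Big(\sum_j \lambda_j F_j(y, x)\Big)$ with $F_j(y, x) = \sum_{i=1}^{n} f_j(y_{i-1}, y_i, x, i)$, where each $f_j(y_{i-1}, y_i, x, i)$ is either a state function $s(y_{i-1}, y_i, x, i)$ or a transition function $t(y_{i-1}, y_i, x, i)$.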
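The notation of [2] can be made concrete with a small sketch: feature functions are ordinary callables $f_j(y_{i-1}, y_i, x, i)$, the global feature $F_j$ sums them over positions, and the distribution is normalized here by brute-force enumeration. The two feature functions, the weights, and the two-label alphabet are hypothetical choices for illustration only.

```python
import itertools
import math

LABELS = ["N", "V"]  # assumed toy label alphabet

def transition_n_then_v(prev, cur, x, i):
    """Hypothetical transition function t: label N followed by label V."""
    return 1.0 if prev == "N" and cur == "V" else 0.0

def state_capitalized_n(prev, cur, x, i):
    """Hypothetical state function s: a capitalized word labeled N."""
    return 1.0 if x[i][0].isupper() and cur == "N" else 0.0

FEATURES = [transition_n_then_v, state_capitalized_n]
WEIGHTS = [1.5, 0.8]  # assumed lambda_j values

def global_features(y, x):
    """F_j(y, x) = sum_i f_j(y_{i-1}, y_i, x, i), with y_0 = 'start'."""
    padded = ["start", *y]
    return [sum(f(padded[i], padded[i + 1], x, i) for i in range(len(x)))
            for f in FEATURES]

def unnormalized(y, x):
    return math.exp(sum(w * F for w, F in zip(WEIGHTS, global_features(y, x))))

def prob(y, x):
    """p(y | x) with Z(x) computed by enumerating every label sequence."""
    Z = sum(unnormalized(list(c), x)
            for c in itertools.product(LABELS, repeat=len(x)))
    return unnormalized(y, x) / Z

x = ["Dogs", "bark"]
print(prob(["N", "V"], x), prob(["V", "N"], x))
```

Enumeration is exponential in the sequence length, so it only serves to check the definition on tiny inputs; the matrix form on the following slides computes the same $Z(x)$ efficiently.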
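$y'$ and $y$ are labels drawn from the label alphabet.
Define a set of $n+1$ matrices $\{M_i(x) \mid i = 1, \ldots, n+1\}$, where each $M_i(x)$ is a matrix with elements of the form
$$M_i(y', y \mid x) = \exp\Big(\sum_k \lambda_k f_k(e_i, y|_{e_i} = (y', y), x) + \sum_k \mu_k g_k(v_i, y|_{v_i} = y, x)\Big)$$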
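Conditional Random Fields
The normalization function is the (start, end) entry of the product of these matrices:
$$Z(x) = \big(M_1(x)\, M_2(x) \cdots M_{n+1}(x)\big)_{\text{start},\,\text{end}}$$
The conditional probability of label sequence $y$ is [1], [2]:
$$p(y \mid x) = \frac{\prod_{i=1}^{n+1} M_i(y_{i-1}, y_i \mid x)}{Z(x)},$$
where $y_0 = \text{start}$ and $y_{n+1} = \text{end}$.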
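A minimal NumPy sketch of the matrix form follows. The matrices are filled with random positive values rather than real feature scores, and the state indexing (0 = start, 1 = end, 2 and 3 = real labels) is an assumption made for the example. Transitions into start or end are zeroed at positions 1 to n so that only real labels can appear inside the sequence.

```python
import itertools
import numpy as np

def matrix_form_Z(Ms, start, end):
    """Z(x): the (start, end) entry of M_1(x) M_2(x) ... M_{n+1}(x)."""
    product = np.eye(Ms[0].shape[0])
    for M in Ms:
        product = product @ M
    return product[start, end]

def sequence_prob(Ms, y, start, end):
    """p(y | x) = prod_i M_i(y_{i-1}, y_i | x) / Z(x)."""
    path = [start, *y, end]
    score = 1.0
    for M, prev, cur in zip(Ms, path, path[1:]):
        score *= M[prev, cur]
    return score / matrix_form_Z(Ms, start, end)

# Toy setup (assumed): 4 states, sequence length n = 3, hence n + 1 matrices.
START, END, REAL_LABELS, n = 0, 1, [2, 3], 3
rng = np.random.default_rng(0)
Ms = []
for i in range(n + 1):
    M = np.exp(rng.normal(size=(4, 4)))
    if i < n:
        M[:, [START, END]] = 0.0  # positions 1..n carry real labels only
    Ms.append(M)

total = sum(sequence_prob(Ms, y, START, END)
            for y in itertools.product(REAL_LABELS, repeat=n))
print(total)
```

Summing `sequence_prob` over every label sequence gives 1 up to floating-point error, which is a quick check that the (start, end) entry really is the normalizer.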
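Parameter Estimation for CRFs
Problem definition: determine the parameters $\theta = (\lambda_1, \lambda_2, \ldots; \mu_1, \mu_2, \ldots)$.
Goal: maximize the log-likelihood objective function [1]
$$O(\theta) = \sum_{i=1}^{N} \log p_\theta(y^{(i)} \mid x^{(i)}) \propto \sum_{x,y} \tilde{p}(x, y) \log p_\theta(y \mid x),$$
where the sum runs over the $N$ training pairs and $\tilde{p}(x, y)$ is the empirical distribution of the training data.
This function is concave, guaranteeing convergence to the global maximum.
In [2], $E_p[\cdot]$ denotes expectation with respect to distribution $p$.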
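$\delta\lambda_k$ for edge feature $f_k$ is the solution of
$$\tilde{E}[f_k] = \sum_{x,y} \tilde{p}(x, y) \sum_{i=1}^{n+1} f_k(e_i, y|_{e_i}, x) = \sum_{x,y} \tilde{p}(x)\, p(y \mid x) \sum_{i=1}^{n+1} f_k(e_i, y|_{e_i}, x)\, e^{\delta\lambda_k T(x, y)},$$
where $T(x, y) = \sum_{i,k} f_k(e_i, y|_{e_i}, x) + \sum_{i,k} g_k(v_i, y|_{v_i}, x)$ is the total feature count.
Efficiently computing the exponential sums on the right-hand sides of these equations is problematic, because $T(x, y)$ is a global property of $(x, y)$ and dynamic programming will sum over sequences with potentially varying $T$.
Dynamic Programming [2]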
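Parameter Estimation for CRFs
For each index $i = 0, \ldots, n+1$, we define forward vectors $\alpha_i(x)$ and backward vectors $\beta_i(x)$ [1]:
$$\alpha_0(y \mid x) = \begin{cases} 1 & \text{if } y = \text{start} \\ 0 & \text{otherwise} \end{cases} \qquad \alpha_i(x) = \alpha_{i-1}(x)\, M_i(x)$$
$$\beta_{n+1}(y \mid x) = \begin{cases} 1 & \text{if } y = \text{end} \\ 0 & \text{otherwise} \end{cases} \qquad \beta_i(x)^{\top} = M_{i+1}(x)\, \beta_{i+1}(x)^{\top}$$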
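The two recurrences can be sketched in a few lines of NumPy. The matrices below are random positive placeholders, and the indices 0 for start and 1 for end are assumptions of the example; `Ms[i - 1]` holds $M_i(x)$ because Python lists are zero-based.

```python
import numpy as np

def forward_backward(Ms, start, end):
    """Return arrays alphas[i] = alpha_i(x) and betas[i] = beta_i(x), i = 0..n+1."""
    num_states = Ms[0].shape[0]
    last = len(Ms)  # index n + 1
    alphas = np.zeros((last + 1, num_states))
    betas = np.zeros((last + 1, num_states))
    alphas[0, start] = 1.0
    betas[last, end] = 1.0
    for i in range(1, last + 1):
        alphas[i] = alphas[i - 1] @ Ms[i - 1]   # alpha_i = alpha_{i-1} M_i
    for i in range(last - 1, -1, -1):
        betas[i] = Ms[i] @ betas[i + 1]         # beta_i^T = M_{i+1} beta_{i+1}^T
    return alphas, betas

rng = np.random.default_rng(1)
Ms = [np.exp(rng.normal(size=(4, 4))) for _ in range(4)]  # assumed toy values
alphas, betas = forward_backward(Ms, start=0, end=1)

Z = alphas[-1, 1]                                # alpha_{n+1}(end | x)
position_totals = (alphas * betas).sum(axis=1)   # alpha_i . beta_i for each i
print(Z, position_totals)
```

Every entry of `position_totals` equals $Z(x)$, since $\alpha_i(x) \cdot \beta_i(x)$ is the full matrix product split at position $i$. The same vectors give the edge marginals $\alpha_{i-1}(y' \mid x)\, M_i(y', y \mid x)\, \beta_i(y \mid x) / Z(x)$ used for the feature expectations.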
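Algorithm S [1] uses a slack feature
$$s(x, y) = S - \sum_{i}\sum_{k} f_k(e_i, y|_{e_i}, x) - \sum_{i}\sum_{k} g_k(v_i, y|_{v_i}, x),$$
where $S$ is a constant chosen so that $s(x^{(i)}, y) \geq 0$ for all $y$ and all observation vectors $x^{(i)}$ in the training set.
Feature $s$ is "global": it does not correspond to any particular edge or vertex.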
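Parameter Estimation for CRFs
Algorithm S [1]
$$\delta\lambda_k = \frac{1}{S} \log \frac{\tilde{E} f_k}{E f_k}, \qquad \delta\mu_k = \frac{1}{S} \log \frac{\tilde{E} g_k}{E g_k},$$
where
$$E f_k = \sum_x \tilde{p}(x) \sum_{i=1}^{n+1} \sum_{y', y} f_k(e_i, y|_{e_i} = (y', y), x)\, \frac{\alpha_{i-1}(y' \mid x)\, M_i(y', y \mid x)\, \beta_i(y \mid x)}{Z(x)}$$
$$E g_k = \sum_x \tilde{p}(x) \sum_{i=1}^{n} \sum_{y} g_k(v_i, y|_{v_i} = y, x)\, \frac{\alpha_i(y \mid x)\, \beta_i(y \mid x)}{Z(x)}$$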
Parameter Estimation for CRFs
Algorithm S [1]
The constant $S$ in Algorithm S can be quite large, since in practice it is proportional to the length of the longest training observation sequence.
The algorithm may therefore converge slowly, taking very small steps toward the maximum in each iteration.
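Use forward-backward recurrences to compute the expectations $a_{k,t}$ of feature $f_k$ and $b_{k,t}$ of feature $g_k$ given that $T(x) = t$.
$\beta_k$ and $\gamma_k$ are the unique positive roots of the polynomial equations
$$\sum_{t=0}^{T_{\max}} a_{k,t}\, \beta_k^{t} = \tilde{E} f_k, \qquad \sum_{t=0}^{T_{\max}} b_{k,t}\, \gamma_k^{t} = \tilde{E} g_k,$$
which can be easily computed by Newton's method.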
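A short sketch of the Newton step for one such polynomial is given below. The coefficients and the target value are made-up numbers standing in for $a_{k,t}$ and $\tilde{E} f_k$, and the function name `positive_root` is hypothetical. With non-negative coefficients the polynomial is increasing and convex for positive arguments, so the iteration started at 1 stays positive in this setting.

```python
import math

def positive_root(coeffs, target, beta=1.0, tol=1e-12, max_iter=100):
    """Solve sum_t coeffs[t] * beta**t = target for beta > 0 by Newton's method.

    Assumes non-negative coefficients with at least one positive coeffs[t], t >= 1.
    """
    for _ in range(max_iter):
        value = sum(a * beta ** t for t, a in enumerate(coeffs)) - target
        slope = sum(t * a * beta ** (t - 1) for t, a in enumerate(coeffs) if t > 0)
        step = value / slope
        beta -= step
        if abs(step) < tol:
            break
    return beta

# Assumed toy values: 0.5 * beta + 0.25 * beta**2 = 1.5
coeffs = [0.0, 0.5, 0.25]
target = 1.5
root = positive_root(coeffs, target)
delta = math.log(root)  # in [1] the parameter update is the log of the root
print(root, delta)
```

Unlike the fixed step size $1/S$ of Algorithm S, the size of this update adapts to the observed feature counts through the coefficients.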
Experiments
CRFs solve the label bias problem.
MEMMs converge in 100 iterations.
[Figure: MEMMs vs. HMM]
When the data is mostly second order ($\alpha \geq 1/2$), the discriminatively trained CRF usually outperforms the MEMM.
Data set: Penn Treebank
Use the optimal MEMM parameter vector as a starting point for training the corresponding CRF, to accelerate convergence.
Conclusions
Discriminatively trained models for sequence segmentation and labeling.
Combination of arbitrary, overlapping and agglomerative observation features from both the past and future.
Efficient training and decoding based on dynamic programming.
Parameter estimation guaranteed to find the global optimum.
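Decoding by dynamic programming can be sketched with the Viterbi recurrence over the log of the $M_i(x)$ matrices. This is a generic illustration rather than the presenters' implementation; the matrices are random placeholders and the indices 0 for start and 1 for end are assumptions of the example.

```python
import itertools
import numpy as np

def viterbi(log_Ms, start, end):
    """Most likely labels y_1..y_n given log-potentials log M_i(y', y | x)."""
    num_states = log_Ms[0].shape[0]
    best = np.full(num_states, -np.inf)
    best[start] = 0.0
    backpointers = []
    for log_M in log_Ms:
        scores = best[:, None] + log_M           # scores[y', y]
        backpointers.append(scores.argmax(axis=0))
        best = scores.max(axis=0)
    # Trace back from the end state, then drop the start and end markers.
    path = [end]
    for pointers in reversed(backpointers):
        path.append(int(pointers[path[-1]]))
    path.reverse()
    return path[1:-1], best[end]

rng = np.random.default_rng(2)
log_Ms = [rng.normal(size=(4, 4)) for _ in range(4)]  # assumed toy values
best_path, best_score = viterbi(log_Ms, start=0, end=1)
print(best_path, best_score)
```

The cost is linear in the sequence length and quadratic in the number of labels, compared with the exponential cost of enumerating every label sequence.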
References
[1] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning, 2001.
[2] Hanna M. Wallach. Conditional Random Fields: An Introduction. University of Pennsylvania CIS Technical Report MS-CIS-04-21.
Reference slides (by Rongkun Shen)