State-of-the-Art Named Entity
Recognition Framework
Anirudh Ganesh
Jayavardhan P Reddy
Speech and Language Processing
(CSE 5525) Spring 2018
Final Presentation
Prof. Eric Fosler-Lussier
Introduction
● The main goal of this tutorial is to develop a state-of-the-art Named Entity Recognizer.
● Along the way, we also want the student to understand and implement a deep learning approach to a given problem.
● The tutorial is also structured to give the student solid hands-on experience in replicating a journal publication, given the data and the model used.
Importance
● This matters because deep learning is gaining widespread traction across modern machine learning applications, especially NLP.
● Replicating the results of deep learning publications is critical for accelerated research growth.
● Replicability of deep-learning-based publications and studies is the crucial point we wanted to tackle, because we feel there is a huge shortage of such work.
Resources used
● PyTorch
● NumPy
● Jupyter Notebook
● Python 3.5
● CoNLL 2003 dataset for NER
● Paper: End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF by
Xuezhe Ma, Eduard Hovy; https://arxiv.org/abs/1603.01354
● Reported F1 score: 91.21
Main parts of the Architecture:
● Data Preparation
● Convolutional Neural Network (CNN) Encoder for Character Level
representation
● Bi-directional Long Short Term Memory (LSTM) for Word-Level Encoding
● Conditional Random Fields (CRF) for output decoding
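Before diving into each part, here is a minimal sketch of how these pieces might compose in PyTorch. All names, dimensions, and the inlined character CNN are our illustrative assumptions, not the paper's released code; the following slides sketch each component in more detail.

```python
import torch
import torch.nn as nn

class NERModel(nn.Module):
    """Illustrative skeleton: char CNN -> bi-LSTM -> per-token tag scores.
    A CRF layer (sketched on a later slide) decodes over these scores."""
    def __init__(self, n_words, n_chars, n_tags, w_dim=100, c_dim=30):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, w_dim)
        self.char_emb = nn.Embedding(n_chars, c_dim)
        self.char_conv = nn.Conv1d(c_dim, c_dim, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(w_dim + c_dim, 200, bidirectional=True,
                            batch_first=True)
        self.emit = nn.Linear(2 * 200, n_tags)  # emission scores per token
        self.trans = nn.Parameter(torch.randn(n_tags, n_tags))  # CRF transitions

    def forward(self, words, chars):
        # words: (batch, seq_len); chars: (batch, seq_len, word_len)
        b, s, l = chars.shape
        c = self.char_emb(chars.reshape(b * s, l)).transpose(1, 2)
        c = torch.relu(self.char_conv(c)).max(dim=2).values.reshape(b, s, -1)
        x = torch.cat([self.word_emb(words), c], dim=-1)
        h, _ = self.lstm(x)
        return self.emit(h)
```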
Data Preparation
The paper* uses the English data from the CoNLL 2003 shared task.
● Tag Update:
The authors use the BIOES tagging scheme (Beginning, Inside, Outside, End, Single) rather than the BIO scheme used by the dataset, so we first convert the tags from BIO to BIOES, as sketched below.
● Mappings:
Create mappings from words to ids, tags to ids, and characters to ids.
* “End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF”, Xuezhe Ma and Eduard H. Hovy, CoRR (2016) abs/1603.01354
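A minimal sketch of the BIO-to-BIOES conversion for one sentence's tags (the function name is ours, not from the paper's codebase):

```python
def bio_to_bioes(tags):
    """Convert one sentence's BIO tags to BIOES: a B- becomes S- if the
    entity is a single token; the last I- of an entity becomes E-."""
    new_tags = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag.startswith("B-"):
            # Single-token entity unless an I- of the same type follows.
            new_tags.append(tag if nxt == "I-" + tag[2:] else "S-" + tag[2:])
        elif tag.startswith("I-"):
            # Entity ends here unless another I- of the same type follows.
            new_tags.append(tag if nxt == tag else "E-" + tag[2:])
        else:
            new_tags.append(tag)  # "O" stays as-is
    return new_tags

# e.g. ["B-PER", "I-PER", "O", "B-LOC"] -> ["B-PER", "E-PER", "O", "S-LOC"]
```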
Word Embedding
● Using Pre-Trained Embeddings:
The paper uses 100-dimensional GloVe (Global Vectors) embeddings trained on the Wikipedia 2014 + Gigaword 5 corpus of 6 billion tokens.
● Word Embedding Mapping:
Create a mapping from each word in the vocabulary to its embedding vector, as in the sketch below.
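A sketch of building that mapping as an embedding matrix, assuming the standard `glove.6B.100d.txt` download and a `word_to_id` dictionary from the previous slide's mapping step:

```python
import numpy as np

def load_glove(path, word_to_id, dim=100):
    """Build an embedding matrix aligned with our word-to-id mapping.
    Words not found in GloVe keep small random vectors."""
    emb = np.random.uniform(-0.25, 0.25, (len(word_to_id), dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if parts[0] in word_to_id:
                emb[word_to_id[parts[0]]] = np.asarray(parts[1:], dtype="float32")
    return emb

# embeddings = load_glove("glove.6B.100d.txt", word_to_id)
```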
Model Details (CNN Encoder for Character Level representation)
● A convolution layer over the character embeddings captures local patterns across neighboring characters, such as prefixes and suffixes.
● Max-pooling extracts the most salient features from the convolution output.
● This gives us a dense vector representation of each word's characters.
● This representation is concatenated with the word's pre-trained GloVe embedding, retrieved by a simple lookup.
Figure Illustrating the Character Embedding CNN layer.
(Adapted from the paper)
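A minimal PyTorch sketch of this character-level encoder. The 30-dimensional character embeddings and window size 3 follow the paper's setup; the class name and the padding assumption (id 0) are ours:

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level encoder: embed characters, convolve, then max-pool
    over the word's length to get one fixed-size vector per word."""
    def __init__(self, n_chars, char_dim=30, n_filters=30, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim, padding_idx=0)  # assumes id 0 = pad
        self.conv = nn.Conv1d(char_dim, n_filters, kernel, padding=1)

    def forward(self, char_ids):
        # char_ids: (n_words, max_word_len)
        x = self.emb(char_ids).transpose(1, 2)   # (n_words, char_dim, len)
        x = torch.relu(self.conv(x))             # local character patterns
        return x.max(dim=2).values               # (n_words, n_filters)

# word_repr = torch.cat([glove_vec, CharCNN(n_chars=85)(char_ids)], dim=-1)
```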
Model Details (Bi-LSTM for Word-Level Encoding)
● The word representations generated by the previous layers are fed into a bi-directional LSTM.
● The forward layer takes in the sequence of word vectors and, at each step, generates a new vector based on what it has seen so far in the forward direction.
● This vector can be thought of as a summary of all the words seen so far.
● The backward layer does the same, but in the opposite direction.
Figure Illustrating the Sequence labelling LSTM layer.
(Adapted from the paper)
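A sketch of this word-level encoder. The hidden size of 200 per direction follows the paper; the input size assumes the 100-dim GloVe vectors concatenated with the 30-dim character features from the previous slides:

```python
import torch.nn as nn

# Bidirectional LSTM over the concatenated word + character representations.
# Each direction summarizes the sentence seen so far; concatenating their
# outputs gives every position both left and right context.
lstm = nn.LSTM(input_size=130,   # 100-dim GloVe + 30-dim character features
               hidden_size=200,  # per direction, as in the paper
               bidirectional=True,
               batch_first=True)

# word_reprs: (batch, seq_len, 130)
# outputs, _ = lstm(word_reprs)                # (batch, seq_len, 400)
# emissions = nn.Linear(400, n_tags)(outputs)  # per-token tag scores for the CRF
```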
Model Details (CRF Layer)
● The bi-LSTM captures contextual information, but the tagging decisions should exploit it jointly rather than independently.
● NER tags are heavily influenced by neighboring tagging decisions; for example, I-PER cannot follow B-ORG.
● This is why we apply a CRF rather than a traditional per-token softmax.
● Given a sequence of words and a sequence of score vectors, a linear-chain CRF defines a global score over whole tag sequences, yielding sentence-level likelihoods for the optimal tags.
Figure Illustrating Conditional Random Fields (CRF) for
sequence tagging.
(Adapted from the paper)
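A sketch of that global score under two assumptions: `emissions[i, t]` is the bi-LSTM's score for tag t at position i, and `transitions[a, b]` is a learned score for moving from tag a to tag b:

```python
import torch

def sequence_score(emissions, transitions, tags):
    """Global score of one tag sequence: the emission score of each chosen
    tag plus the transition score between consecutive tags."""
    score = emissions[0, tags[0]]
    for i in range(1, len(tags)):
        score = score + transitions[tags[i - 1], tags[i]] + emissions[i, tags[i]]
    return score
```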
Computing Tags
Recall that the CRF computes a conditional probability. Let y be a tag sequence, x an input sequence of words, and s(x, y) the global score defined on the previous slide. Training maximizes the likelihood of the gold tag sequences under

p(y | x) = exp(s(x, y)) / Σ_{y′} exp(s(x, y′)),

where the sum runs over all possible tag sequences y′. Viterbi decoding then applies dynamic programming to find the highest-scoring tag sequence at test time.
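A plain-Python sketch of Viterbi decoding over the same emission and transition scores as above (real implementations vectorize this, e.g. in PyTorch):

```python
def viterbi_decode(emissions, transitions, num_tags):
    """Dynamic programming: best[t] is the score of the best partial tag
    sequence ending in tag t; backpointers recover the full path."""
    best = [emissions[0][t] for t in range(num_tags)]
    history = []
    for i in range(1, len(emissions)):
        prev, step, best = best, [], []
        for t in range(num_tags):
            scores = [prev[s] + transitions[s][t] for s in range(num_tags)]
            s_star = max(range(num_tags), key=lambda s: scores[s])
            step.append(s_star)
            best.append(scores[s_star] + emissions[i][t])
        history.append(step)
    # Backtrack from the best final tag.
    t = max(range(num_tags), key=lambda t: best[t])
    path = [t]
    for step in reversed(history):
        t = step[t]
        path.append(t)
    return list(reversed(path))
```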
Closing Comments
Sample Output from the given model
Experience of presenting to friends
● Initial drafts:
○ Testers spent too much time on data preprocessing.
○ The PyTorch API was a sticking point.
● Changes:
○ Detailed comments explaining each step and the intuition behind it.
○ Detailed comments for the PyTorch functions.
● Final draft:
○ Takes a little longer than the intended time if the student solving it doesn't have sufficient background in deep learning and PyTorch.
Thank You!
  • 13. Experience of presenting to friends ● Initial few drafts ○ Too much time preprocessing data ○ PyTorch API ● Changes ○ Detailed comments explaining each step and the intuition behind it. ○ Detailed comments for the PyTorch functions ● Final Draft: ○ Takes a little longer than the intended time if the student solving it doesn’t have sufficient background in deep learning and PyTorch.