State-of-the-Art Named Entity
Recognition Framework
Anirudh Ganesh
Jayavardhan P Reddy
Speech and Language Processing
(CSE 5525) Spring 2018
Final Presentation
Prof. Eric Fosler-Lussier
Introduction
● The main goal of this tutorial is to develop a state-of-the-art named entity
recognizer.
● Along the way, we want the student to understand and implement a deep
learning approach to a given problem.
● The tutorial is also structured to give the student hands-on experience in
replicating a journal publication, given the data and the model used.
Importance
● This matters because deep learning is gaining widespread traction in most
modern machine learning applications, especially NLP.
● Replicating the results of deep learning research publications is critical
for accelerating research.
● Replicability of deep-learning-based publications and studies is the point we
most wanted to tackle, because we feel there is a huge shortage of such work.
Resources used
● PyTorch
● NumPy
● Jupyter Notebook
● Python 3.5
● CoNLL 2003 dataset for NER
● Paper: End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF by
Xuezhe Ma, Eduard Hovy; https://arxiv.org/abs/1603.01354
● F1 score reported in the paper: 91.21
Main parts of the Architecture:
● Data Preparation
● Convolutional Neural Network (CNN) encoder for character-level
representation
● Bi-directional Long Short-Term Memory (LSTM) for word-level encoding
● Conditional Random Field (CRF) for output decoding
Data Preparation
The paper* uses the English data from the CoNLL 2003 shared task.
● Tag update:
The authors use the BIOES tagging scheme (Beginning, Inside, Outside, End,
Single) rather than the BIO scheme used by the dataset, so we first convert the
tags from BIO to BIOES.
● Mappings:
Create word-to-id, tag-to-id, and character-to-id mappings.
* “End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF”, Xuezhe Ma and Eduard H. Hovy, CoRR (2016) abs/1603.01354
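The tag update and the mappings can be sketched as follows. This is a minimal illustration, not the tutorial's actual code: the function names `bio_to_bioes` and `build_mapping` are hypothetical, the input is assumed to be well-formed BIO tags for one sentence, and ordering ids by frequency is an arbitrary choice.

```python
from collections import Counter


def bio_to_bioes(tags):
    """Convert one sentence's BIO tags to BIOES (assumes well-formed BIO input)."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            bioes.append(tag if continues else "S-" + label)
        else:  # prefix == "I"
            bioes.append(tag if continues else "E-" + label)
    return bioes


def build_mapping(items):
    """Map each distinct item to an integer id, most frequent first (hypothetical helper)."""
    counts = Counter(items)
    ordered = sorted(counts, key=lambda item: (-counts[item], item))
    return {item: idx for idx, item in enumerate(ordered)}
```

The same `build_mapping` helper can be applied to the words, the tags, and the characters of the training set.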
Word Embedding
● Using pre-trained embeddings:
The paper uses 100-dimensional Global Vectors (GloVe) embeddings trained on the
Wikipedia 2014 + Gigaword 5 corpus, which contains 6 billion words.
● Word embedding mapping:
Create a mapping from each word in the vocabulary to its word embedding.
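One way to build this mapping is an embedding matrix indexed by word id. The sketch below assumes the GloVe file is plain text with one word followed by its vector per line. The helper names, the lowercase fallback, and the uniform random initialization for words missing from GloVe are illustrative assumptions rather than the tutorial's confirmed choices.

```python
import numpy as np


def parse_glove_lines(lines):
    """Parse 'word v1 v2 ...' lines into a dict of word -> vector."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.array([float(x) for x in parts[1:]])
    return vectors


def build_embedding_matrix(word_to_id, pretrained, dim=100, seed=0):
    """Return a (vocab_size, dim) matrix whose row i is the vector for word id i."""
    rng = np.random.default_rng(seed)
    bound = np.sqrt(3.0 / dim)
    # Assumption: words without a pre-trained vector keep a small random vector.
    matrix = rng.uniform(-bound, bound, size=(len(word_to_id), dim))
    for word, idx in word_to_id.items():
        vector = pretrained.get(word)
        if vector is None:
            vector = pretrained.get(word.lower())  # fall back to the lowercase form
        if vector is not None:
            matrix[idx] = vector
    return matrix
```

In practice `lines` would be an open file handle on the GloVe text file, and the resulting matrix would be copied into the model's word embedding layer.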
Model Details (CNN Encoder for Character Level representation)
● A convolution layer on top of the character embeddings captures
local patterns across neighboring characters
● Max-pooling extracts the most salient features
from the convolution output
● This gives us a dense vector
representation of each word.
● This representation is concatenated
with the word's pre-trained GloVe embedding,
which is obtained by a simple lookup
Figure Illustrating the Character Embedding CNN layer.
(Adapted from the paper)
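A minimal PyTorch sketch of such a character encoder is shown below. The class name `CharCNN`, the default sizes, and the convention that character id 0 is padding are assumptions made for illustration, so check them against the paper and the notebook before relying on them.

```python
import torch
import torch.nn as nn


class CharCNN(nn.Module):
    """Character embedding -> 1-D convolution -> max-pool over characters."""

    def __init__(self, num_chars, char_dim=30, num_filters=30, window=3):
        super().__init__()
        # Assumption: character id 0 is reserved for padding.
        self.embed = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, num_filters,
                              kernel_size=window, padding=window // 2)

    def forward(self, char_ids):
        # char_ids: (num_words, max_word_len)
        x = self.embed(char_ids)      # (num_words, max_word_len, char_dim)
        x = x.transpose(1, 2)         # (num_words, char_dim, max_word_len)
        x = self.conv(x)              # (num_words, num_filters, max_word_len)
        return torch.max(x, dim=2)[0]  # (num_words, num_filters)
```

The output can then be joined with the word embeddings via `torch.cat([word_vectors, char_vectors], dim=1)`, giving one combined vector per word.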
Model Details (Bi-LSTM for Word-Level Encoding)
● We feed the word representations generated by
the previous layer to a bi-directional
LSTM
● The forward layer takes in the sequence of word
vectors and, at each position, generates a new vector
based on what it has seen so far in the forward
direction
● This vector can be thought of as a summary
of all the words seen so far
● The backward layer does the same in the
opposite direction
Figure Illustrating the Sequence labelling LSTM layer.
(Adapted from the paper)
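The word-level encoder can be sketched with `nn.LSTM` and `bidirectional=True`. The class name `BiLSTMEncoder` and the final linear layer that maps hidden states to per-tag scores are assumptions for illustration, and the dimensions in the usage line are arbitrary.

```python
import torch
import torch.nn as nn


class BiLSTMEncoder(nn.Module):
    """Bi-directional LSTM over word vectors, followed by a per-tag scoring layer."""

    def __init__(self, input_dim, hidden_dim, num_tags):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        # Forward and backward states are concatenated, hence 2 * hidden_dim.
        self.to_tags = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, word_vectors):
        # word_vectors: (batch, seq_len, input_dim)
        states, _ = self.lstm(word_vectors)  # (batch, seq_len, 2 * hidden_dim)
        return self.to_tags(states)          # (batch, seq_len, num_tags)
```

For example, `BiLSTMEncoder(130, 16, 5)(torch.randn(2, 6, 130))` yields one score vector of length 5 for each of the 6 positions in both sentences.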
Model Details (CRF Layer)
● The bi-LSTM captures information from
the context, but the tagging decision
itself still needs to take advantage of it.
● NER is heavily influenced by
neighboring tagging decisions (for example,
I-ORG cannot follow B-PER).
● This is why we apply a CRF instead of a
traditional per-token softmax
● Given a sequence of words and a sequence
of score vectors, a linear-chain CRF defines
a global score over entire tag sequences,
which yields sentence-level likelihoods and
lets us pick the best tag sequence as a whole.
Figure Illustrating Conditional Random Fields (CRF) for
sequence tagging.
(Adapted from the paper)
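The global score can be written out explicitly. The notation here is an assumption, since the slide shows only a figure: $P_{i, y_i}$ denotes the bi-LSTM score for tag $y_i$ at position $i$, $A_{y_{i-1}, y_i}$ is a learned score for the transition from tag $y_{i-1}$ to tag $y_i$, and $n$ is the sentence length.

```latex
\begin{align}
s(x, y) &= \sum_{i=1}^{n} P_{i,\,y_i} + \sum_{i=2}^{n} A_{y_{i-1},\,y_i} \\
p(y \mid x) &= \frac{\exp s(x, y)}{\sum_{y'} \exp s(x, y')}
\end{align}
```

The denominator sums over every possible tag sequence $y'$, which is what makes the likelihood a sentence-level quantity rather than a per-token one.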
Computing Tags
Recall that the CRF computes a conditional probability. Let y be a tag sequence
and x an input sequence of words. Training maximizes the log-likelihood
log p(y | x) of the correct tag sequence, and at prediction time we choose the
tag sequence with the highest score.
Viterbi decoding applies dynamic programming to find that best tag sequence
efficiently.
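Viterbi decoding can be sketched in NumPy as follows. This is an illustrative version under stated assumptions, not the tutorial's implementation: `emissions` holds the per-position tag scores from the bi-LSTM, `transitions[i, j]` is the score of moving from tag i to tag j, and start and stop transitions are omitted for brevity.

```python
import numpy as np


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag path and its score.

    emissions:   (seq_len, num_tags) per-position tag scores
    transitions: (num_tags, num_tags), transitions[i, j] = score of tag i -> tag j
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()  # best score of any path ending in each tag
    backpointers = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        # candidate[i, j]: best path ending in tag i at t - 1, then moving to tag j
        candidate = score[:, None] + transitions
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0) + emissions[t]
    # Follow the backpointers from the best final tag to recover the path.
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(backpointers[t, path[-1]]))
    path.reverse()
    return path, float(score.max())
```

Each step keeps only the best path into every tag, so the cost grows linearly with sentence length instead of exponentially with the number of possible tag sequences.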
Closing Comments
Sample Output from the given model
Experience of presenting to friends
● Initial few drafts
○ Too much time spent preprocessing data
○ PyTorch API
● Changes
○ Detailed comments explaining each step and the intuition behind it.
○ Detailed comments for the PyTorch functions
● Final Draft:
○ Takes a little longer than the intended time if the student solving it
doesn’t have sufficient background in deep learning and PyTorch.
Thank You!
