Kfir Bar,
Chief Scientist, Basis Technology
Named Entity Recognition
Named entity recognition (NER)
Automatically find names of people, organizations, locations, and more in text across many languages.
According to Elon Musk, Mars rocket will fly
‘short flights’ next year.
?
Context is important
Edward Adelson
Neuroscientist, MIT
Checker shadow illusion
The squares represented by A and B
are of the same color
But sometimes it gets ambiguous...
Can't play Spain? Improve your playing via easy step-by-step video lessons!
NER as a sequence-labeling problem
➔ Processing one word after another
➔ Assigning a label to each word, based on local as well as global features
➔ Labels are B-PER, I-PER, B-LOC, I-LOC, OTHER, etc. (a.k.a. IOB)
I/O am/O working/O for/O Basis/B-ORG Technology/I-ORG
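A minimal sketch of how IOB labels map back to entity spans, using the slide's own tagged sentence. The function name `iob_to_spans` is an assumption made for illustration, and both `O` (used in the example) and `OTHER` (used in the label list) are treated as the outside label.

```python
def iob_to_spans(tokens, labels, outside=("O", "OTHER")):
    """Group IOB-labeled tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the entity being built, if any.
        nonlocal current_type, current_tokens
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, label in zip(tokens, labels):
        if label in outside:
            flush()
            continue
        prefix, _, entity_type = label.partition("-")
        # A B- tag, or an I- tag of a different type, starts a new entity.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)
    flush()
    return spans


tagged = "I/O am/O working/O for/O Basis/B-ORG Technology/I-ORG"
tokens, labels = zip(*(pair.rsplit("/", 1) for pair in tagged.split()))
print(iob_to_spans(tokens, labels))  # [('ORG', 'Basis Technology')]
```

The B-/I- split is what lets two adjacent entities of the same type stay separate: a new B- tag always closes the previous span.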
Traditional ML vs. Deep Learning
Traditional ML:
I love this movie
➔ Feature extraction: words, part of speech tags, lemmas, Brown clusters
➔ Vectorization: [00010010110000101001…..001]
➔ Modeling
☺ Positive
Deep learning:
I love this movie
➔ Embeddings lookup:
[0.323, -0.3434, 0.901, …, -0.267]
[-0.4923, 0.554, 0.001, …, -0.365]
[1.58845, 0.478, 0.0901, …, -0.171]
…
[-0.0592, 0.588, -0.01, …, -0.111]
➔ Modeling
☺ Positive
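The two input representations can be contrasted in a few lines. This is a sketch under stated assumptions: the vocabulary is just the four words of the example sentence, the only feature used is word presence, and the embedding values are random placeholders rather than learned vectors. The names `bag_of_words`, `embed`, and `EMBEDDING_DIM` are illustrative.

```python
import random

sentence = "I love this movie".split()
vocabulary = sorted(set(sentence))            # toy vocabulary: only these four words
index = {word: i for i, word in enumerate(vocabulary)}

def bag_of_words(words):
    """Traditional pipeline: one sparse 0/1 slot per known feature."""
    vector = [0] * len(vocabulary)
    for word in words:
        vector[index[word]] = 1
    return vector

# Deep learning pipeline: each word looks up a dense row in an embedding table.
# Random values stand in for embeddings that would normally be learned.
random.seed(0)
EMBEDDING_DIM = 4
embedding_table = [[random.uniform(-1, 1) for _ in range(EMBEDDING_DIM)]
                   for _ in vocabulary]

def embed(words):
    """Return one dense vector per word, in sentence order."""
    return [embedding_table[index[word]] for word in words]

sparse = bag_of_words(sentence)   # a single sparse vector for the whole sentence
dense = embed(sentence)           # a sequence of dense vectors, one per word
print(sparse)
print(len(dense), "vectors of size", len(dense[0]))
```

The sparse vector grows with every feature that is engineered, while the embedding size stays fixed no matter how large the vocabulary becomes.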
Word embeddings
Berlin - Germany + Japan = Tokyo
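A small sketch of the vector arithmetic behind this slide, assuming the equation reads Berlin - Germany + Japan = Tokyo. The 3-dimensional vectors are hand-made for the example (real embeddings are learned and have hundreds of dimensions), and "Munich" is added only as a distractor.

```python
import math

# Hand-made toy vectors, assumed for illustration.
# Dimensions loosely mean: [German-ness, Japanese-ness, capital-ness].
vectors = {
    "Germany": [1.0, 0.0, 0.0],
    "Berlin":  [1.0, 0.0, 1.0],
    "Munich":  [1.0, 0.0, 0.2],
    "Japan":   [0.0, 1.0, 0.0],
    "Tokyo":   [0.0, 1.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("Berlin", "Germany", "Japan"))  # Tokyo
```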
Feed forward network for NER
[Figure: the input words (I, listen, to, Spain, while) pass through Layer 1 and Layer 2 to an output layer over the labels B-PER, I-PER, B-LOC, ...]
Recurrent neural network (RNN)
[Figure: the input words (I, listen, to, Spain, while) pass through a recurrent Layer 1 to an output layer over the labels B-PER, I-PER, B-LOC, ...]
[Figure: the network unrolled over time steps t-1, t, t+1, emitting B-PER, I-PER, OTHER]
➔ At each time step we process one word concatenated with the output from previous time steps
➔ It remembers information for many time steps
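The recurrence described above can be sketched in plain Python. This is an illustration, not the speaker's model: the dimensions, the random weights, and the names `rnn_step` and `run_rnn` are assumptions, and the bias term is left out.

```python
import math
import random

random.seed(0)
INPUT_DIM, HIDDEN_DIM = 3, 2

# One weight matrix applied to the concatenation [x_t ; h_{t-1}].
# Random weights stand in for trained ones.
W = [[random.uniform(-0.5, 0.5) for _ in range(INPUT_DIM + HIDDEN_DIM)]
     for _ in range(HIDDEN_DIM)]

def rnn_step(x_t, h_prev):
    """New hidden state from the current word vector and the previous state."""
    concatenated = x_t + h_prev                      # list concatenation
    return [math.tanh(sum(w * v for w, v in zip(row, concatenated))) for row in W]

def run_rnn(inputs):
    """Process one word vector per time step, carrying the state forward."""
    h = [0.0] * HIDDEN_DIM                           # initial state
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h)
        states.append(h)
    return states

# Three toy word vectors for time steps t-1, t, t+1.
words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for step, state in enumerate(run_rnn(words)):
    print(step, [round(v, 3) for v in state])
```

Because each state is computed from the previous one, the same word produces a different state depending on what came before it.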
Long Short-Term Memory (LSTM)
It can forget information when necessary
[Figure: LSTM cells unrolled over time steps t-1, t, t+1, emitting B-PER, I-PER, OTHER]
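A sketch of one LSTM step using the standard gate equations; the forget gate `f` is the part that lets the cell drop old information. Dimensions, random weights, and the helper names are assumptions made for the example, and biases are omitted for brevity.

```python
import math
import random

random.seed(1)
INPUT_DIM, HIDDEN_DIM = 3, 2

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def random_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# One weight matrix per gate, each applied to [x_t ; h_{t-1}].
GATES = ("forget", "input", "output", "candidate")
weights = {name: random_matrix(HIDDEN_DIM, INPUT_DIM + HIDDEN_DIM) for name in GATES}

def affine(name, vector):
    """Matrix-vector product with the named gate's weights."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights[name]]

def lstm_step(x_t, h_prev, c_prev):
    z = x_t + h_prev                                    # concatenate input and previous output
    f = [sigmoid(v) for v in affine("forget", z)]       # how much old memory to keep
    i = [sigmoid(v) for v in affine("input", z)]        # how much new content to write
    o = [sigmoid(v) for v in affine("output", z)]       # how much memory to expose
    g = [math.tanh(v) for v in affine("candidate", z)]  # proposed new content
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

h, c = [0.0] * HIDDEN_DIM, [0.0] * HIDDEN_DIM
for x_t in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    h, c = lstm_step(x_t, h, c)
    print([round(v, 3) for v in h], [round(v, 3) for v in c])
```

When a forget gate value is close to 0, the matching entry of the old cell state `c_prev` is almost erased; close to 1, it is carried over unchanged.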
LSTM for Sequence Labeling
[Figure: a chain of LSTM cells reads one word per step and emits one label per word.]
Washington/B-PER said/OTHER in/OTHER Chicago/B-LOC last/OTHER ...
Bidirectional LSTM for Sequence Labeling
[Figure: each word is read by two LSTM cells, one per direction, whose outputs are combined (+) before the label is predicted.]
Washington/B-PER said/OTHER in/OTHER Chicago/B-LOC last/OTHER ...
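A sketch of the bidirectional idea: run one pass left to right, another right to left, and combine the two states for each word. Two assumptions are made here: the "+" in the diagram is read as concatenation, and `toy_step` is a simplified stand-in for an LSTM cell so the example stays short. The word vectors are made up.

```python
import math

def run_direction(inputs, step, hidden_dim):
    """Apply a recurrent step over the inputs and return every intermediate state."""
    h = [0.0] * hidden_dim
    states = []
    for x_t in inputs:
        h = step(x_t, h)
        states.append(h)
    return states

def bidirectional(inputs, forward_step, backward_step, hidden_dim):
    """Concatenate, per token, the left-to-right and right-to-left states."""
    forward = run_direction(inputs, forward_step, hidden_dim)
    # Run over the reversed sentence, then flip back so positions line up.
    backward = run_direction(inputs[::-1], backward_step, hidden_dim)[::-1]
    return [f + b for f, b in zip(forward, backward)]

def toy_step(x_t, h_prev):
    # Stand-in for an LSTM cell: squashed sum of the input and a decayed previous state.
    return [math.tanh(x + 0.5 * h) for x, h in zip(x_t, h_prev)]

tokens = ["Washington", "said", "in", "Chicago", "last"]
# Toy 2-d word vectors, made up for the example.
inputs = [[0.9, 0.1], [0.1, 0.2], [0.0, 0.3], [0.8, 0.4], [0.2, 0.1]]
HIDDEN_DIM = 2

states = bidirectional(inputs, toy_step, toy_step, HIDDEN_DIM)
for token, state in zip(tokens, states):
    print(token, [round(v, 3) for v in state])
```

The first half of each vector summarizes the words to the left, the second half the words to the right, so the label for "Washington" can take "said" into account.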
Multilayer LSTM for Sequence Labeling
[Figure: bidirectional LSTM layers stacked on top of each other; each layer's combined (+) outputs feed the next layer, and the top layer predicts the labels.]
Washington/B-PER said/OTHER in/OTHER Chicago/B-LOC last/OTHER
Alternative decoding using Conditional Random Fields (CRF)
[Figure: the bidirectional LSTM outputs for each word are combined (+) and passed to a CRF layer, which decodes the label sequence.]
Washington said in Chicago last ...
B-PER OTHER OTHER B-LOC OTHER
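A sketch of what CRF-style decoding adds: instead of picking the best label for each word independently, Viterbi search picks the best whole sequence under per-word scores plus label-to-label transition scores. All numbers below are invented for the example; in a trained model the per-word scores would come from the LSTM and the transition scores would be learned.

```python
def viterbi(emissions, transitions, labels):
    """Best label sequence under per-token scores plus transition scores.

    emissions: one {label: score} dict per token
    transitions: {(previous_label, label): score}; missing pairs score 0.0
    """
    best = {label: emissions[0][label] for label in labels}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda l: best[l])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

LABELS = ["B-PER", "I-PER", "B-LOC", "OTHER"]

def scores(**given):
    """Build a full score dict; labels not mentioned score 0.0."""
    return {label: given.get(label.replace("-", "_"), 0.0) for label in LABELS}

# Hypothetical per-word scores for: Washington said in Chicago last
emissions = [
    scores(B_PER=2.0, I_PER=0.1, B_LOC=1.8),
    scores(OTHER=3.0),
    scores(OTHER=3.0),
    scores(B_PER=0.5, I_PER=1.2, B_LOC=1.0),   # I-PER narrowly wins on its own
    scores(OTHER=3.0),
]
# Hypothetical transition scores: I-PER must continue a person name.
transitions = {("B-PER", "I-PER"): 1.0, ("OTHER", "I-PER"): -10.0, ("B-LOC", "I-PER"): -10.0}

greedy = [max(LABELS, key=lambda l: e[l]) for e in emissions]
print(greedy)                                    # per-word argmax
print(viterbi(emissions, transitions, LABELS))   # sequence-level decoding
```

With these numbers the per-word argmax labels "Chicago" as I-PER right after OTHER, which is not a valid IOB sequence; the transition penalty steers the sequence-level decoder to B-LOC instead.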
Character encoding
[Figure: the same network, where each word is also encoded from its characters, e.g. "said" as s, a, i, d.]
Washington said in Chicago last ...
B-PER OTHER OTHER B-LOC OTHER
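One way to picture character encoding is a small recurrent pass over the letters of each word, whose final state is appended to the word's embedding. The sketch below is an assumption about the general technique, not the architecture from the talk: the character vectors are a deterministic toy function rather than learned embeddings, and the recurrence is a simplified stand-in for an LSTM.

```python
import math
import string

ALPHABET = string.ascii_letters
CHAR_DIM = 3

def char_vector(ch):
    """Deterministic toy character embedding (a trained model would learn these)."""
    code = ALPHABET.index(ch) + 1 if ch in ALPHABET else 0
    return [math.sin(code * (k + 1)) for k in range(CHAR_DIM)]

def encode_characters(word):
    """Read the word one character at a time, keeping a running state."""
    state = [0.0] * CHAR_DIM
    for ch in word:
        state = [math.tanh(0.5 * s + c) for s, c in zip(state, char_vector(ch))]
    return state

def word_representation(word, word_embedding):
    """Word embedding followed by the character-level encoding."""
    return word_embedding + encode_characters(word)

# A word missing from the embedding table can fall back to a zero vector
# and still get a distinctive representation from its spelling.
unknown_embedding = [0.0, 0.0]
for word in ("said", "Chicago", "chicago"):
    print(word, [round(v, 3) for v in word_representation(word, unknown_embedding)])
```

Spelling cues such as capitalization survive in the character part of the vector, which is useful for names the model has never seen as whole words.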
Overall: better accuracy in multiple languages for NER, using deep learning!

                      English   Arabic   Korean
Deep learning model   91.3      83.3     86.4
Traditional model     89.3      80.3     80.7

https://developer.rosette.com/
What does LSTM actually learn?
[Figure: bidirectional LSTM for NER.]
Washington/B-PER said/OTHER in/OTHER Chicago/B-LOC last/OTHER ...
Let’s look at this cell vector over time ...
Neuron 280 - gets positive around some punctuation marks
Neuron 189 - gets negative around potential locations
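Observations like these come from following a single entry of the cell vector across the words of a sentence. The sketch below shows only the bookkeeping: the cell vectors are invented two-neuron toy values, not activations from the talk's model, and the neuron indices 0 and 1 merely play the roles of neurons 280 and 189.

```python
def neuron_trace(tokens, cell_states, neuron):
    """Pair each token with the value of one neuron at that time step."""
    return [(token, state[neuron]) for token, state in zip(tokens, cell_states)]

def tokens_where(trace, predicate):
    """Tokens whose neuron value satisfies the predicate."""
    return [token for token, value in trace if predicate(value)]

tokens = ["Washington", "said", ",", "in", "Chicago", "."]
# Made-up cell vectors, one per token, two neurons each (illustration only).
cell_states = [
    [-0.2, -0.7],
    [-0.1, 0.3],
    [0.8, 0.1],
    [-0.3, 0.2],
    [-0.4, -0.9],
    [0.9, 0.0],
]

punctuation_like = tokens_where(neuron_trace(tokens, cell_states, 0), lambda v: v > 0)
location_like = tokens_where(neuron_trace(tokens, cell_states, 1), lambda v: v < 0)
print(punctuation_like)  # [',', '.']
print(location_like)     # ['Washington', 'Chicago']
```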
Questions?
Thank you!
kfir@basistech.com
@kfirbar


PyData Tel Aviv
