Slides from the July Bay Area NLP Reading Group meetup covering "Boosting Named Entity Recognition with Neural Character Embeddings" by dos Santos and Guimarães
2. Announcements
Join our Slack channel!
https://bay-area-nlp-reading.slack.com/
To join, message me (Katie Bauer) on Meetup, talk to me after the meeting or
email bay.area.nlp.reading.group@gmail.com
3. Want to help out?
Present a paper you love
Demo your favorite NLP tool or library
Host a future meetup
Participate!
4. What is NER?
Extracting proper nouns and classifying them into categories
- Universally: person, location, organization
- Date/time, currencies, domain-specific
Traditional Approaches:
- gazetteers (list lookup; a toy sketch follows this slide)
- shallow parsing - ‘based in San Francisco’
Difficulties:
- Reconciling different versions of names - Noam Chomsky vs. Professor Chomsky
- Washington - person, place, collective name for US government
- May - person or month?
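To make list lookup concrete, here is a toy gazetteer tagger; the gazetteer entries and the gazetteer_tag helper are made up for illustration, not taken from the paper.
```python
# A toy gazetteer (list-lookup) tagger. Entries are made-up examples.
GAZETTEER = {
    "San Francisco": "LOCATION",
    "Noam Chomsky": "PERSON",
    "Google": "ORGANIZATION",
}

def gazetteer_tag(text):
    """Scan the text for known entity strings, longest match first."""
    hits = []
    for name, label in sorted(GAZETTEER.items(), key=lambda kv: -len(kv[0])):
        start = text.find(name)
        if start != -1:
            hits.append((name, label, start))
    return hits

print(gazetteer_tag("Google is based in San Francisco"))
# [('San Francisco', 'LOCATION', 19), ('Google', 'ORGANIZATION', 0)]
```
Note the limits this slide's "Difficulties" point at: a lookup like this cannot tell whether "May" is a person or a month.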
5. What are Convolutional Neural Nets?
1. Divide input into windows
2. Calculate some sort of summary
3. Feed that summary to next layer
4. Divide summary into windows
5. Summarize the summary
And so on and so forth (sketched in code below)
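A minimal sketch of that window-and-summarize loop as a 1-D convolution plus max-pooling in NumPy; the filter values and window sizes are arbitrary stand-ins:
```python
# Toy 1-D convolution + pooling pass illustrating "window -> summarize -> repeat".
import numpy as np

def conv1d(x, filt):
    """Slide a window of len(filt) over x and summarize each window
    with a dot product (steps 1-2 above)."""
    k = len(filt)
    return np.array([x[i:i+k] @ filt for i in range(len(x) - k + 1)])

def max_pool(x, size):
    """Summarize the summary: keep the max of each window (steps 4-5)."""
    return np.array([x[i:i+size].max() for i in range(0, len(x) - size + 1, size)])

x = np.random.randn(16)                    # toy input signal
h = conv1d(x, np.array([0.5, 1.0, 0.5]))   # layer 1: per-window summaries
h = max_pool(h, 2)                         # pool the summaries
h = conv1d(h, np.array([1.0, -1.0]))       # feed to the next layer
print(h.shape)
```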
6. What does that look like for language?
Windows are word contexts
If w_i = 'movie', then
[w_{i-2}, w_{i-1}, w_i, w_{i+1}, w_{i+2}] = [like, this, movie, very, much]
Each word w_i is a column vector (see the sketch below)
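A minimal sketch of building such a window, assuming a toy embedding table (the vectors are random stand-ins) and a <pad> token for sentence boundaries:
```python
# Build the word-context window around w_i and stack its vectors.
import numpy as np

rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(4) for w in
       ["<pad>", "I", "like", "this", "movie", "very", "much"]}

def window(words, i, half=2):
    """Return [w_{i-half}, ..., w_{i+half}], padding past sentence boundaries."""
    padded = ["<pad>"] * half + words + ["<pad>"] * half
    return padded[i : i + 2 * half + 1]

sent = ["I", "like", "this", "movie", "very", "much"]
ctx = window(sent, sent.index("movie"))    # ['like', 'this', 'movie', 'very', 'much']
r = np.concatenate([emb[w] for w in ctx])  # one long vector for the whole window
print(ctx, r.shape)
```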
7. Model
Task: Given a sentence, score the likelihood of each named entity class for each word
Input:
A sentence of N words: {w_1, w_2, ..., w_{N-1}, w_N}
Each word is the concatenation of a word-level and a character-level embedding:
w_n = [w^wrd; w^wch] (sketched below)
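A minimal sketch of w_n = [w^wrd; w^wch]: the word-level part is a table lookup, and the character-level part is a small convolution-and-max over character windows (the paper's character-level approach); all weights and dimensions here are random stand-ins.
```python
# Join a word-level embedding with a character-level one built by a char CNN.
import numpy as np

rng = np.random.default_rng(0)
d_wrd, d_chr, d_wch, k = 8, 4, 6, 3          # toy sizes; k = char window width
word_emb = {"movie": rng.standard_normal(d_wrd)}
char_emb = {c: rng.standard_normal(d_chr) for c in "abcdefghijklmnopqrstuvwxyz"}
W = rng.standard_normal((d_wch, k * d_chr))  # char-convolution filter bank

def char_features(word):
    """Convolve over character windows, then max-pool to a fixed-size w^wch."""
    chars = [char_emb[c] for c in word]
    windows = [np.concatenate(chars[i:i+k]) for i in range(len(chars) - k + 1)]
    return np.max([W @ z for z in windows], axis=0)

w_n = np.concatenate([word_emb["movie"], char_features("movie")])
print(w_n.shape)  # (d_wrd + d_wch,) = (14,)
```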
8. Model
Scoring
Concatenate the word vectors in a window centered on word n to get vector r
Pass r through two neural network layers to get a score for each tag
Add the transition score A_{t,u} (the score of moving from tag t to tag u) to capture dependencies between consecutive tags
Track the best-scoring tag sequence ending in each tag (Viterbi-style dynamic programming; sketched after this slide)
Pick the most likely sequence at the end of the sentence
Optimization
The sentence score defines a conditional probability, so minimize the negative log-likelihood
Train with stochastic gradient descent and backpropagation
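A minimal Viterbi sketch for the decoding step above; the per-word emission scores and the transition matrix A are random stand-ins for the network's outputs:
```python
# Viterbi decoding: find the tag sequence with the highest total score.
import numpy as np

def viterbi(emissions, A):
    """emissions: (N, T) per-word tag scores; A[t, u]: transition score
    from tag t to tag u. Returns the highest-scoring tag sequence."""
    N, T = emissions.shape
    score = emissions[0].copy()          # best score ending in each tag
    back = np.zeros((N, T), dtype=int)   # backpointers to recover the path
    for n in range(1, N):
        cand = score[:, None] + A + emissions[n][None, :]  # (prev, cur)
        back[n] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    tags = [int(score.argmax())]
    for n in range(N - 1, 0, -1):        # walk backpointers from the end
        tags.append(int(back[n][tags[-1]]))
    return tags[::-1]

rng = np.random.default_rng(0)
print(viterbi(rng.standard_normal((6, 5)), rng.standard_normal((5, 5))))
```
Rather than storing every possible tag sequence, this keeps only the best path ending in each tag at each step, which is what makes exact decoding tractable.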
9. Corpora
Portuguese
- Word embeddings initialized with three corpora
- Trained and tested on HAREM
- HAREM 1 for training, miniHAREM for test
Spanish
- Word embeddings initialized with Spanish Wikipedia
- Trained and tested on SPA CoNLL-2002
- SPA CoNLL-2002 comes with predefined training, development, and test sets
14. Takeaways
Different types of information are captured at word and character level
Prior knowledge (pretrained word embeddings) improves performance
With no prior knowledge, a bigger data set is better
15. Additional Resources
Introduction to Named Entity Recognition
https://gate.ac.uk/sale/talks/stupidpoint/diana-fb.ppt
Understanding Convolutional Neural Networks for NLP
http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
Implementing a CNN for Text Classification in Tensorflow
http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/