Recent Advances in
Natural Language
Processing
Seth Grimes
Alta Plana Corporation
@SethGrimes – grimes@altaplana.com
November 16, 2021
2019 & 2020
tedcomd.com
meetup.com/NY-NLP
Disclaimer
I use A LOT of commercial product materials in the
slides that follow. These are illustrations and not
recommendations, and I have no financial interest in
the companies (unless disclosed).
Natural Language Processing
Natural Language Understanding (NLU)
• OCR, language detection, tokenization, parsing
• Information extraction: parts of speech, chunks, entities,
aspects, topics/themes, relations, attributes, events, intent …
• Speech processing: verbal and nonverbal
Natural Language Generation (NLG)
NLU + NLG together, for example:
• Summarization
• Machine translation
• Conversational interfaces
• Question answering
Functions
https://gradientflow.com/2020nlpsurvey/
Empirical Methods in Natural Language Processing (EMNLP2020)
Explore EMNLP21
Early Days (1958)
Transcribing
Encoding
Abstracting
Who needs to know?
Who knows what?
What is known?
Hans Peter Luhn
“A Business Intelligence System”
IBM Journal, October 1958
“Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.”
-- H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
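The idea in the Luhn quote can be sketched in a few lines. This is a simplified illustration, not Luhn's exact procedure: it scores each sentence by the average corpus frequency of its non-stopword tokens, whereas the paper's significance factor is computed over clusters of significant words. The stopword list, the function names, and the `n_sentences` parameter are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "for", "by", "on", "are", "it"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def auto_abstract(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word significance: raw frequency over the whole text, stopwords excluded.
    freq = Counter(w for w in tokenize(text) if w not in STOPWORDS)

    def score(sentence):
        # Sentence significance: mean frequency of its significant words.
        words = [w for w in tokenize(sentence) if w not in STOPWORDS]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore document order
    return [sentences[i] for i in keep]


sample = "Cats chase mice. Cats sleep. Dogs bark at the mail carrier."
summary = auto_abstract(sample, n_sentences=1)
```

Averaging rather than summing keeps long sentences from winning on length alone; that is a design choice of this sketch, not something the quote specifies.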
“All models are wrong, but some are useful.”
-- George Box
+17 years
https://en.wikipedia.org/wiki/Document-term_matrix
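For readers who have not seen one, a document-term matrix can be built with nothing more than token counts. The sketch below is a minimal illustration under simple assumptions (whitespace tokenization, lowercasing, raw counts); the function name `document_term_matrix` and the two sample documents are hypothetical.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, rows): one row of term counts per document."""
    tokenized = [doc.lower().split() for doc in docs]
    # Columns are the sorted set of all terms seen in any document.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


docs = ["I like databases", "I dislike databases"]
vocab, matrix = document_term_matrix(docs)
# vocab  -> ['databases', 'dislike', 'i', 'like']
# matrix -> [[1, 0, 1, 1], [1, 1, 1, 0]]
```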
Skipping Over a Lot of Stuff…
Rules
Taxonomies & ontologies
Booleans
Statistical models, especially co-occurrence
Sequence models: RNNs & LSTMs
…
Word2Vec (2013)
https://code.google.com/p/word2vec/
https://developers.google.com/machine-learning/crash-course/embeddings/translating-to-a-lower-dimensional-space
“You shall know a word by the company it keeps.”
– J.R. Firth, 1957
Word2Vec: Key Concepts
Continuous bag-of-words (CBOW) predicts a word from a window of surrounding words.
Skip-gram uses a word to predict a window of surrounding words.
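The difference between the two objectives is easiest to see in the training examples each one generates. The sketch below only builds those (input, target) examples from a token list; it does not train any embeddings. The helper names and the `window` parameter are assumptions made for illustration and are not the word2vec tool's own API.

```python
def context_window(tokens, i, window):
    """Tokens within `window` positions of index i, excluding tokens[i]."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_examples(tokens, window=2):
    """CBOW: the surrounding words are the input, the center word is the target."""
    return [(context_window(tokens, i, window), tokens[i]) for i in range(len(tokens))]


def skipgram_examples(tokens, window=2):
    """Skip-gram: the center word is the input, each surrounding word is a target."""
    return [
        (tokens[i], ctx)
        for i in range(len(tokens))
        for ctx in context_window(tokens, i, window)
    ]


tokens = ["the", "quick", "brown", "fox"]
cbow = cbow_examples(tokens, window=1)
skip = skipgram_examples(tokens, window=1)
# cbow[1] -> (['the', 'brown'], 'quick')
# skip[1:3] -> [('quick', 'the'), ('quick', 'brown')]
```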
Doc2Vec (2014)
https://arxiv.org/abs/1405.4053
Sense2Vec (2015)
https://arxiv.org/abs/1511.06388
“Sense2vec (Trask et. al, 2015) is a new twist on word2vec that lets you learn more interesting, detailed and context-sensitive word vectors.”
Encoder-Decoder Architecture
Here, machine translation:
https://leonoverweel.com/projects/2019/nlu-coursework/
Transformers (2017)
https://arxiv.org/abs/1706.03762
2020:
https://arxiv.org/pdf/1910.03771.pdf
BERT (2018)
https://arxiv.org/abs/1810.04805
https://arxiv.org/pdf/1910.03771.pdf
Transfer Learning
https://pennylane.ai/qml/demos/tutorial_quantum_transfer_learning.html
https://pair-code.github.io/lit/
Back To The Garden
NLP Libraries
https://blog.rasa.com/rasa-nlu-in-depth-part-1-intent-classification/
Hugging Face
Model Hub
Hugging Face Pipeline Example
Hugging Face Pipeline Examples
Cloud Services
Amazon Comprehend Medical
https://aws.amazon.com/comprehend/medical/
“With a simple API call to Amazon Comprehend Medical you can quickly and
accurately extract information such as medical conditions, medications, dosages,
tests, treatments and procedures, and protected health information while retaining
the context of the information. Amazon Comprehend Medical can identify the
relationships among the extracted information to help you build applications for use
cases like population health analytics, clinical trial management, pharmacovigilance,
and summarization. You can also use Amazon Comprehend Medical to link the
extracted information to medical ontologies...”
Amazon Comprehend Medical: Ontology Linking
https://aws.amazon.com/blogs/aws/new-amazon-comprehend-medical-adds-ontology-linking/
Services and Solutions: Examples
https://www.qualtrics.com/experience-management/research/text-analysis/
Conversation / Analytics
https://blog.rasa.com/conversational-ai-your-guide-to-five-levels-of-ai-assistants-in-enterprise/
(2018)
Voice conversation analytics
Keep Up With NLP Developments
https://www.language-technology.com/twin
https://newsletter.ruder.io/