NLP with
Deep Learning
Fatih Mehmet Güler
Outline
• My Background

• MS CENG 2010

• YFYİ 2012 & Intel Global Challenge

• TÜBİTAK 1512 (2013)

• Projects so far (Intelligent Search Assistant, Neural Machine Translation, Summarization, Company Similarity)

• NLP with Deep Learning

• ‘NLP Almost From Scratch’ paper

• LSTM - SRL paper

• Word2Vec, GloVe, ELMo, BERT

• POS/NER/CHUNK/SRL

• QA - SQuAD

• Seq2Seq

• Siamese Networks

• Practical Applications & Tools & Problems

• PyTorch, AllenNLP, SentencePiece (BPE), LSTM Sequence Problem

• What’s Next?
My Background
• MS CENG - METU 2010

• Courses

• Artificial Intelligence

• Pattern Recognition

• Computational Linguistics

• Knowledge Engineering

• Syntax, Semantics and Computation

• Advanced Graphics

• Advanced Unix

• Real Time and Embedded Software Development

• Projects

• Implementation of Massively Multiplayer Online Game Architecture for Educational Games

• Conceptual Graph Based Expert System Shell

• Natural Intelligence – Question Answering System

• Voice Command Recognition With Nearest Neighbor Approach

• Relational Reinforcement Learning for Hitori Puzzle

• YFYİ 2012 & Intel Global Challenge

• TÜBİTAK 1512
My Background
• 2009-2010 Natural Intelligence Project
– Commonsense Question Answering with Conceptual Graphs (IJCAI 2009, ICCS 2010)
– CCG, C&C Tools, Conceptual Graphs, Common Sense Ontology (OpenCyc), KRR
Projects
• Intelligent Search Assistant

• Neural Machine Translation (PragmaMT)

• Summarization (OzetGecer)

• Company Similarity (PragmaPredict)
Intelligent Search Assistant
Neural Machine Translation
Beam Search Manipulation Example
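To make the beam-search manipulation idea concrete, here is a toy decoder loop (a generic sketch: step_log_probs stands in for the translation model's next-token distribution; this is not PragmaMT's actual code):

    # Toy beam search (generic sketch; `step_log_probs` stands in for the
    # NMT model's next-token log-probabilities, not PragmaMT code).
    import math

    def beam_search(step_log_probs, bos, eos, beam_size=4, max_len=20):
        beams = [([bos], 0.0)]  # each hypothesis: (token sequence, cumulative log-prob)
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq[-1] == eos:                  # finished hypotheses carry over
                    candidates.append((seq, score))
                    continue
                for tok, lp in step_log_probs(seq).items():
                    candidates.append((seq + [tok], score + lp))
            # Keep only the top `beam_size` hypotheses; manipulating this
            # ranking (e.g. boosting or penalizing tokens) steers the output.
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        return beams[0][0]

    # Tiny fake model: always prefers token 1; token 2 acts as the end token.
    fake = lambda seq: {1: math.log(0.6), 2: math.log(0.3), 3: math.log(0.1)}
    print(beam_search(fake, bos=0, eos=2, beam_size=2, max_len=5))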
Summarization
Company Similarity
NLP with Deep Learning
• Stages of Natural Language Processing

• POS, NER, CHUNK, SRL (+ Parsing, of course)

• ‘NLP Almost From Scratch’ Paper

• Word2Vec, GloVe, ELMo, BERT

• Question Answering - SQuAD

• Seq2Seq - Machine Translation
NLP Stages
The Seminal Paper: ‘NLP (Almost) from Scratch’ (Collobert et al., 2011)
SRL with LSTM Paper
• End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks

• Jie Zhou and Wei Xu, 2015 (Baidu Research)
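The model is essentially a deep bidirectional recurrent tagger that assigns a role label to every word. A toy PyTorch sketch in that spirit (layer count and sizes are illustrative, not the paper's configuration; the paper's deeper architecture and structured decoding are omitted here):

    # Toy BiLSTM sequence tagger in the spirit of end-to-end SRL
    # (dimensions and depth are illustrative, not the paper's).
    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        def __init__(self, vocab_size, num_labels, emb=100, hidden=200):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb)
            self.lstm = nn.LSTM(emb, hidden, num_layers=2,
                                bidirectional=True, batch_first=True)
            self.proj = nn.Linear(2 * hidden, num_labels)

        def forward(self, tokens):
            h, _ = self.lstm(self.emb(tokens))   # contextual state per word
            return self.proj(h)                  # role-label scores per word

    tagger = BiLSTMTagger(vocab_size=10000, num_labels=20)
    scores = tagger(torch.randint(0, 10000, (1, 9)))  # shape (1, 9, 20)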
Word Vectors
• Word2Vec (see the sketch after this list)

• CBOW: predict the word from its context

• several times faster to train than skip-gram, with slightly better accuracy for frequent words

• Skip-Gram: predict the context from the word

• works well with small amounts of training data and represents even rare words or phrases well

• GloVe: a count-based model that learns vectors by essentially doing dimensionality reduction on the co-occurrence count matrix

• ELMo

• BERT
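A minimal sketch of training both word2vec variants with the gensim library (gensim, the toy corpus, and all hyperparameters are illustrative choices, not from the talk; the vector_size argument is spelled size in gensim versions before 4.0):

    # Minimal word2vec sketch with gensim; sg switches between the two modes.
    from gensim.models import Word2Vec

    corpus = [
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "lay", "on", "the", "rug"],
    ]

    # sg=0: CBOW (predict the word from its context)
    # sg=1: skip-gram (predict the context from the word)
    cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)
    skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

    print(skipgram.wv.most_similar("cat", topn=3))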
ELMo
• Bidirectional LSTM Language Model

• Dynamic Word Embeddings

• The embedding changes according to the context (sketch below)
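A minimal sketch of getting context-dependent vectors from AllenNLP's Elmo module (API as of the AllenNLP releases that shipped it; the file names are placeholders for the published ELMo options/weights files):

    # Sketch: contextual embeddings via AllenNLP's Elmo module.
    from allennlp.modules.elmo import Elmo, batch_to_ids

    options_file = "elmo_options.json"   # placeholder: published ELMo options file
    weight_file = "elmo_weights.hdf5"    # placeholder: matching pretrained weights

    elmo = Elmo(options_file, weight_file, num_output_representations=1)

    # The same word ("bank") gets a different vector in each context.
    sentences = [["I", "deposited", "cash", "at", "the", "bank"],
                 ["We", "sat", "on", "the", "river", "bank"]]
    character_ids = batch_to_ids(sentences)
    embeddings = elmo(character_ids)["elmo_representations"][0]  # (batch, seq_len, dim)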
BERT
• Replaces language modeling with “masked language modeling”

• Words in a sentence are randomly erased and replaced with a special [MASK] token with a small probability (15%)

• Then, a Transformer is used to predict each masked word from the unmasked words surrounding it, on both the left and the right (sketch below)
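A minimal masked-word prediction sketch using the Hugging Face transformers library (an assumption; the talk does not name a library, and the API shown is the 4.x one):

    # Sketch: masked language modeling with a pretrained BERT
    # (Hugging Face transformers assumed; not named in the talk).
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    # Erase one word; the Transformer predicts it from both sides.
    inputs = tokenizer("The cat sat on the [MASK].", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    predicted_id = int(logits[0, mask_pos].argmax())
    print(tokenizer.decode([predicted_id]))  # a plausible filler, e.g. "mat"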
Sequence to Sequence
Seq2Seq Applications
• Machine Translation

• Summarization

• Email Reply

• Question Answering
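All of these applications share one encoder-decoder skeleton: encode the input sequence, then decode the output sequence from the encoder's final state. A bare-bones PyTorch sketch (sizes are made up; attention is omitted for brevity):

    # Bare-bones seq2seq encoder-decoder (illustrative sketch, no attention).
    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, emb=128, hidden=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, emb)
            self.tgt_emb = nn.Embedding(tgt_vocab, emb)
            self.encoder = nn.LSTM(emb, hidden, batch_first=True)
            self.decoder = nn.LSTM(emb, hidden, batch_first=True)
            self.out = nn.Linear(hidden, tgt_vocab)

        def forward(self, src, tgt):
            _, state = self.encoder(self.src_emb(src))           # encode source
            dec_out, _ = self.decoder(self.tgt_emb(tgt), state)  # seed decoder
            return self.out(dec_out)   # (batch, tgt_len, tgt_vocab) logits

    model = Seq2Seq(src_vocab=8000, tgt_vocab=8000)
    logits = model(torch.randint(0, 8000, (2, 7)), torch.randint(0, 8000, (2, 5)))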
Practical Applications
• Frameworks

• PyTorch

• TensorFlow

• Keras

• More High Level

• AllenNLP

• spaCy

• Flair, PyText, Torchtext

• Problems

• Unknown Words: Byte Pair Encoding (SentencePiece); see the sketch after this list

• LSTM Long Sequence Problem
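A minimal SentencePiece/BPE sketch for the unknown-word problem (file names and vocab size are made up; the keyword-argument API is that of recent sentencepiece releases):

    # Sketch: subword segmentation with SentencePiece, so out-of-vocabulary
    # words decompose into known pieces instead of a single <unk> token.
    import sentencepiece as spm

    # Train a BPE model on a raw-text corpus, one sentence per line.
    spm.SentencePieceTrainer.train(
        input="corpus.txt", model_prefix="bpe", vocab_size=8000, model_type="bpe"
    )

    sp = spm.SentencePieceProcessor(model_file="bpe.model")
    print(sp.encode("unfathomable", out_type=str))  # e.g. ['▁un', 'fath', 'om', 'able']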
What’s Next?
• More Variants of ELMo/BERT - Transfer Learning

• More NLP Applications - Embeddings all the way

• My Unsolicited Advice :)

• deeplearning.ai (course 5 - sequence models)

• read lots of papers (http://arxiv-sanity.com)

• twitter & facebook (!)

• Andrew Ng, Yann LeCun, Andrej Karpathy
