NLP with
Deep Learning
Fatih Mehmet Güler
Outline
• My Background

• MS CENG 2010

• YFYİ 2012 & Intel Global Challenge

• TÜBİTAK 1512 (2013)

• Projects so far (Intelligent Search Assistant, Neural Machine Translation, Summarization, Company Similarity)

• NLP with Deep Learning

• ‘NLP Almost From Scratch’ paper

• LSTM - SRL paper

• Word2Vec, GloVe, ELMo, BERT

• POS/NER/CHUNK/SRL

• QA - SQuAD

• Seq2Seq

• Siamese Networks

• Practical Applications & Tools & Problems

• PyTorch, AllenNLP, SentencePiece (BPE), LSTM Sequence Problem

• What’s Next?
My Background
• MS CENG - METU 2010

• Courses

• Artificial Intelligence

• Pattern Recognition

• Computational Linguistics

• Knowledge Engineering

• Syntax, Semantics and Computation

• Advanced Graphics

• Advanced Unix

• Real Time and Embedded Software Development

• Projects

• Implementation of Massively Multiplayer Online Game Architecture for Educational Games

• Conceptual Graph Based Expert System Shell

• Natural Intelligence – Question Answering System

• Voice Command Recognition With Nearest Neighbor Approach

• Relational Reinforcement Learning for Hitori Puzzle

• YFYİ 2012 & Intel Global Challenge

• TÜBİTAK 1512
My Background
• 2009-2010 Natural Intelligence Project
– Commonsense Question Answering with Conceptual Graphs (IJCAI 2009, ICCS 2010)
– CCG, C&C Tools, Conceptual Graphs, Common Sense Ontology (OpenCyc), KRR
Projects
• Intelligent Search Assistant

• Neural Machine Translation (PragmaMT)

• Summarization (OzetGecer)

• Company Similarity (PragmaPredict)
Intelligent Search Assistant
Neural Machine Translation
Beam Search Manipulation Example
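To make the beam-search manipulation idea concrete, here is a toy decoder loop (a generic sketch: step_log_probs stands in for the translation model's next-token distribution; this is not PragmaMT's actual code):

    # Toy beam search (generic sketch; `step_log_probs` stands in for the
    # NMT model's next-token log-probabilities, not PragmaMT code).
    import math

    def beam_search(step_log_probs, bos, eos, beam_size=4, max_len=20):
        beams = [([bos], 0.0)]  # each hypothesis: (token sequence, cumulative log-prob)
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq[-1] == eos:                  # finished hypotheses carry over
                    candidates.append((seq, score))
                    continue
                for tok, lp in step_log_probs(seq).items():
                    candidates.append((seq + [tok], score + lp))
            # Keep only the top `beam_size` hypotheses; manipulating this
            # ranking (e.g. boosting or penalizing tokens) steers the output.
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        return beams[0][0]

    # Tiny fake model: always prefers token 1; token 2 acts as the end token.
    fake = lambda seq: {1: math.log(0.6), 2: math.log(0.3), 3: math.log(0.1)}
    print(beam_search(fake, bos=0, eos=2, beam_size=2, max_len=5))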
Summarization
Company Similarity
NLP with Deep Learning
• Stages of Natural Language Processing

• POS, NER, CHUNK, SRL (+ Parsing, of course)

• ‘NLP Almost From Scratch’ Paper

• Word2Vec, GloVe, ELMo, BERT

• Question Answering - SQuAD

• Seq2Seq - Machine Translation
NLP Stages
The Seminal Paper: ‘NLP (Almost) from Scratch’ (Collobert et al., 2011)
SRL with LSTM Paper
• End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks

• Jie Zhou and Wei Xu, 2015 (Baidu Research)
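The model is essentially a deep bidirectional recurrent tagger that assigns a role label to every word. A toy PyTorch sketch in that spirit (layer count and sizes are illustrative, not the paper's configuration; the paper's deeper architecture and structured decoding are omitted here):

    # Toy BiLSTM sequence tagger in the spirit of end-to-end SRL
    # (dimensions and depth are illustrative, not the paper's).
    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        def __init__(self, vocab_size, num_labels, emb=100, hidden=200):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb)
            self.lstm = nn.LSTM(emb, hidden, num_layers=2,
                                bidirectional=True, batch_first=True)
            self.proj = nn.Linear(2 * hidden, num_labels)

        def forward(self, tokens):
            h, _ = self.lstm(self.emb(tokens))   # contextual state per word
            return self.proj(h)                  # role-label scores per word

    tagger = BiLSTMTagger(vocab_size=10000, num_labels=20)
    scores = tagger(torch.randint(0, 10000, (1, 9)))  # shape (1, 9, 20)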
Word Vectors
• Word2Vec (see the sketch after this list)

• CBOW: predict the word from its context

• several times faster to train than skip-gram, with slightly better accuracy for frequent words

• Skip-Gram: predict the context from the word

• works well with small amounts of training data and represents even rare words or phrases well

• GloVe: a count-based model that learns vectors by essentially doing dimensionality reduction on the co-occurrence count matrix

• ELMo

• BERT
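A minimal sketch of training both word2vec variants with the gensim library (gensim, the toy corpus, and all hyperparameters are illustrative choices, not from the talk; the vector_size argument is spelled size in gensim versions before 4.0):

    # Minimal word2vec sketch with gensim; sg switches between the two modes.
    from gensim.models import Word2Vec

    corpus = [
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "lay", "on", "the", "rug"],
    ]

    # sg=0: CBOW (predict the word from its context)
    # sg=1: skip-gram (predict the context from the word)
    cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)
    skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

    print(skipgram.wv.most_similar("cat", topn=3))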
ELMo
• Bidirectional LSTM Language Model

• Dynamic Word Embeddings

• The embedding changes according to the context (sketch below)
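A minimal sketch of getting context-dependent vectors from AllenNLP's Elmo module (API as of the AllenNLP releases that shipped it; the file names are placeholders for the published ELMo options/weights files):

    # Sketch: contextual embeddings via AllenNLP's Elmo module.
    from allennlp.modules.elmo import Elmo, batch_to_ids

    options_file = "elmo_options.json"   # placeholder: published ELMo options file
    weight_file = "elmo_weights.hdf5"    # placeholder: matching pretrained weights

    elmo = Elmo(options_file, weight_file, num_output_representations=1)

    # The same word ("bank") gets a different vector in each context.
    sentences = [["I", "deposited", "cash", "at", "the", "bank"],
                 ["We", "sat", "on", "the", "river", "bank"]]
    character_ids = batch_to_ids(sentences)
    embeddings = elmo(character_ids)["elmo_representations"][0]  # (batch, seq_len, dim)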
BERT
• Replaces language modeling with “masked language modeling”

• Words in a sentence are randomly erased and replaced with a special [MASK] token with a small probability (15%)

• Then, a Transformer is used to predict each masked word from the unmasked words surrounding it, on both the left and the right (sketch below)
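A minimal masked-word prediction sketch using the Hugging Face transformers library (an assumption; the talk does not name a library, and the API shown is the 4.x one):

    # Sketch: masked language modeling with a pretrained BERT
    # (Hugging Face transformers assumed; not named in the talk).
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    # Erase one word; the Transformer predicts it from both sides.
    inputs = tokenizer("The cat sat on the [MASK].", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    predicted_id = int(logits[0, mask_pos].argmax())
    print(tokenizer.decode([predicted_id]))  # a plausible filler, e.g. "mat"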
Sequence to Sequence
Seq2Seq Applications
• Machine Translation

• Summarization

• Email Reply

• Question Answering
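All of these applications share one encoder-decoder skeleton: encode the input sequence, then decode the output sequence from the encoder's final state. A bare-bones PyTorch sketch (sizes are made up; attention is omitted for brevity):

    # Bare-bones seq2seq encoder-decoder (illustrative sketch, no attention).
    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, emb=128, hidden=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, emb)
            self.tgt_emb = nn.Embedding(tgt_vocab, emb)
            self.encoder = nn.LSTM(emb, hidden, batch_first=True)
            self.decoder = nn.LSTM(emb, hidden, batch_first=True)
            self.out = nn.Linear(hidden, tgt_vocab)

        def forward(self, src, tgt):
            _, state = self.encoder(self.src_emb(src))           # encode source
            dec_out, _ = self.decoder(self.tgt_emb(tgt), state)  # seed decoder
            return self.out(dec_out)   # (batch, tgt_len, tgt_vocab) logits

    model = Seq2Seq(src_vocab=8000, tgt_vocab=8000)
    logits = model(torch.randint(0, 8000, (2, 7)), torch.randint(0, 8000, (2, 5)))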
Practical Applications
• Frameworks

• PyTorch

• TensorFlow

• Keras

• More High Level

• AllenNLP

• spaCy

• Flair, PyText, Torchtext

• Problems

• Unknown Words: Byte Pair Encoding (SentencePiece); see the sketch after this list

• LSTM Long Sequence Problem
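A minimal SentencePiece/BPE sketch for the unknown-word problem (file names and vocab size are made up; the keyword-argument API is that of recent sentencepiece releases):

    # Sketch: subword segmentation with SentencePiece, so out-of-vocabulary
    # words decompose into known pieces instead of a single <unk> token.
    import sentencepiece as spm

    # Train a BPE model on a raw-text corpus, one sentence per line.
    spm.SentencePieceTrainer.train(
        input="corpus.txt", model_prefix="bpe", vocab_size=8000, model_type="bpe"
    )

    sp = spm.SentencePieceProcessor(model_file="bpe.model")
    print(sp.encode("unfathomable", out_type=str))  # e.g. ['▁un', 'fath', 'om', 'able']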
What’s Next?
• More Variants of ELMo/BERT - Transfer Learning

• More NLP Applications - Embeddings all the way

• My Unsolicited Advice :)

• deeplearning.ai (course 5 - sequence models)

• read lots of papers (http://arxiv-sanity.com)

• twitter & facebook (!)

• Andrew Ng, Yann LeCun, Andrej Karpathy
