Natural language processing
Parsing and understanding language
Full Day of Applied AI
Morning
Session 1 Intro to Artificial Intelligence
09:00-09:45 Introduction to Applied AI
09:45-10:00 Coffee and break
Session 2 Live Coding a machine learning app
10:00-10:10 Getting your machine ready for machine learning
10:10-10:20 Training and evaluating the model
10:20-10:50 Improving the model
10:50-11:00 Coffee and break
Session 3 Machine learning in the wild - deployment
11:00-11:15 Coding exercise continued
11:15-11:45 Serving your own machine learning model | Code
11:45-11:55 How to solve problems | interactive exercise
11:55-12:00 Q and A
Lunch
12:00-13:00 Lunch
Afternoon
Session 4 Hello World Deep Learning (MNIST)
13:00-13:15 Deep Learning intro
13:00-13:15 Image recognition and CNNs | Talk |
13:15-13:45 Building your own convolutional neural network | Code |
13:45-14:00 Coffee and break
Session 5 Natural Language Processing
14:00-14:30 Natural language processing | Talk |
14:30-14:45 Working with language | Code |
14:45-15:00 Coffee and break
Session 6 Conversational interfaces and Time Series
15:00-15:20 Conversational interfaces
15:20-15:45 Time Series prediction
15:45-16:00 Coffee and break
Session 7 Generative models and style transfer
16:00-16:30 Generative models | Talk |
16:30-16:45 Trying out GANs and style transfer | Code |
16:45-17:00 Coffee and break
Anton Osika AI Research Engineer Sana Labs AB
anton.osika@gmail.com
Birger Moëll Machine Learning Engineer
birger.moell@gmail.com
Deep learning for speech
State-of-the-art machine learning models can understand human speech well enough to perform fairly complicated actions based on spoken commands.
Natural Language Processing
Text to speech (WaveNet, state of the art: https://deepmind.com/blog/wavenet-generative-model-raw-audio)
Speech to text (Alexa, Google Home, Google Cloud Speech-to-Text: https://cloud.google.com/speech-to-text/; see the sketch below)
Text to vector (GloVe, Word2Vec, BERT)
Natural language generation (LSTMs, generative models, GPT-2)
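To make the speech-to-text bullet concrete, here is a minimal sketch using the google-cloud-speech Python client (pip install google-cloud-speech). The file name, audio format and credentials setup are illustrative assumptions, not part of the original slides.

# Minimal speech-to-text sketch with the Google Cloud Speech client.
# Assumes GOOGLE_APPLICATION_CREDENTIALS is set and "command.wav" is a
# hypothetical 16 kHz mono LINEAR16 WAV file.
from google.cloud import speech

client = speech.SpeechClient()

with open("command.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    # each result holds one or more alternative transcripts, best first
    print(result.alternatives[0].transcript)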
Word vectors
Word vectors are high-dimensional representations of language in which each word is assigned a vector based on how close it is to other words in the training text. This gives the model a representation of language that also reflects the biases present in that text and the way word usage changes over time.
Word vector algebra
We can now compare how similar words are, and even do arithmetic on meanings: for example, king - man + woman ≈ queen.
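As a concrete illustration of word-vector similarity and "word algebra", here is a minimal sketch using pre-trained GloVe vectors through gensim's downloader API (pip install gensim). The specific model name and example words are assumptions chosen for illustration.

# Minimal word-vector sketch with gensim and pre-trained GloVe vectors.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")   # 100-dimensional GloVe vectors

# how similar are two words? (cosine similarity between their vectors)
print(vectors.similarity("cat", "dog"))

# nearest neighbours in the vector space
print(vectors.most_similar("stockholm", topn=3))

# word-vector algebra: king - man + woman ≈ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))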
How are word vectors used in ML?
Answer: Transfer learning
Using a model pre-trained on large text databases and then fine-tuning it on the desired task
has revolutionized NLP
Attention is all you need
The progress in NLP has largely been based on the attention mechanism from "Attention Is All You Need" (https://arxiv.org/abs/1706.03762).
As opposed to directional models, an attention-based model reads the entire sequence of words at once instead of sequentially.
It is therefore often described as "bidirectional", though it would be more accurate to say that it is "non-directional".
The ImageNet moment for NLP
October 2018: BERT arrives
BERT is hailed as an ImageNet moment for natural language processing: https://thegradient.pub/nlp-imagenet/
BERT paper: https://arxiv.org/abs/1810.04805
New state of the art on most NLP tasks
https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/Harvard_University.html?model=BERT%20(ensemble)%20(Google%20AI%20Language)&version=1.1
Training BERT
How BERT works
BERT uses attention to compute how much each word should be combined with every other word when computing their collective meaning.
http://jalammar.github.io/illustrated-bert/
Query and Key vectors "attend" to Value vectors
The Query, Key and Value vectors give each word knowledge about how much it should be combined with the other words.
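To make the Query/Key/Value idea concrete, here is a minimal NumPy sketch of scaled dot-product attention. The toy dimensions and random projection matrices are illustrative assumptions and are not taken from BERT itself.

# Minimal scaled dot-product attention in NumPy.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # scores[i, j]: how much word i should attend to word j
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                 # 3 "words", 4-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))

out, attn = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(attn)   # 3x3 matrix: how much each word is combined with the others
print(out)    # new, context-mixed representation of each word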
Attention visualized
The trainable Query, Key and
Value vectors result in what we
call “attention”.
The BERT encoder uses stacked attention layers that perform the attention computation many times in parallel.
Output from the encoder is directly
mapped to the predicted output sequence.
Stacked attention heads
Fine-tuning on the desired task
Several implementations of BERT exist that give us access to BERT tensors to work with.
However, in order to use BERT for language tasks we need to train a classifier for our specific problem.
This can be done with a classifier (for example a neural network) that takes BERT-encoded word vectors, together with their labels, as input (for classification); a sketch follows after the links below.
TensorFlow model: https://github.com/google-research/bert
PyTorch model: https://github.com/huggingface/pytorch-pretrained-BERT
BERT as a service: https://github.com/hanxiao/bert-as-service
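As a concrete version of the classifier idea above, here is a minimal sketch that uses bert-as-service (linked above) to obtain BERT sentence vectors and scikit-learn for the classifier. The toy texts, labels and model directory are made-up examples, and a plain logistic regression stands in for the neural network mentioned on the slide.

# Minimal "classifier on top of BERT vectors" sketch.
# Assumes a bert-as-service server is already running, e.g.:
#   bert-serving-start -model_dir /path/to/uncased_L-12_H-768_A-12 -num_worker=1
from bert_serving.client import BertClient
from sklearn.linear_model import LogisticRegression

texts = ["this movie was wonderful", "what a great film",
         "terribly boring", "I hated every minute"]   # hypothetical toy data
labels = [1, 1, 0, 0]                                  # 1 = positive, 0 = negative

bc = BertClient()
features = bc.encode(texts)                # (n_sentences, 768) BERT sentence vectors

clf = LogisticRegression(max_iter=1000)    # simple classifier on top of BERT features
clf.fit(features, labels)

print(clf.predict(bc.encode(["a truly great movie"])))   # expected: [1]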
GPT-2
GPT-2 is a model created by OpenAI that improves the state of the art in language generation.
Because of fears of malicious use, the full model was not initially released.
https://talktotransformer.com/
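For readers who want to try GPT-2 locally rather than through talktotransformer.com, here is a minimal generation sketch using the Hugging Face transformers library; the library choice, model size and prompt are assumptions, not part of the original deck.

# Minimal GPT-2 text-generation sketch (pip install transformers torch).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # the publicly released small model
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Natural language processing is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# sample a continuation instead of greedy decoding, so the text varies each run
output = model.generate(input_ids, max_length=50, do_sample=True, top_k=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))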
Let's get started coding
Open up the notebooks inside Natural Language Processing to train your own deep neural network for natural language processing.
Almost human level...