SlideShare a Scribd company logo
Images and words
mechanics of automated captioning
with neural networks
Alberto Massidda
Who we are
● Founded in 2001;
● Branches in Milan, Rome and London;
● Market leader in enterprise ready solutions based on Open Source tech;
● Expertise:
○ DevOps
○ Cloud
○ BigData and many more...
This presentation is Open Source (yay!)
https://creativecommons.org/licenses/by-nc-sa/3.0/
Outline
1. Task introduction
2. Object recognition
3. Language generation
4. Putting all together
5. Improving performance
6. Beyond captioning: deep image search
The task
Generating a description from an image.
“A man jumps over a skateboard”
The challenges
1. Recognize objects in the image
2. Generate a fluent description in natural language
Neural object recognition
A solved problem: Convolutional Neural Networks do the trick.
CNN is an architecture specialized in finding topological invariants in the input.
Finds relationships between atoms and infers higher abstractions.
Highly resistant to noise and spatial transformations.
It learns automatically what are the relevant features to extract from an input.
Not limited to images: CNNs can be applied to text, audio, etc..
An image as integers
A handwritten “8” can be
represented as a matrix of
integers.
● 0 for blank
● 1-255 for grayscales
white-to-black
Architecture of a ConvNet
Filter
Convolution
+
ReLU
Max
Pooling Filter
Convolution
+
ReLU
Max
Pooling
Fully
Connected
1. Convolution
2. Non Linearity (ReLU)
3. Pooling or Sub Sampling
4. Classification (Fully Connected Layer)
Convolution intuition
Let’s multiply a sliding matrix (the “brushing filter”) with our input matrix.
For example, the matrix does edge detection.
Convolution in CNN
Each new generated image is called “channel”. A common RGB has 3 channels.
Channels hold different perspectives about the image.
We start with random filters and tune these matrixes as part of our training.
We end up with filters that have learned perspectives of interest.
Convolution example
Rectifier Linear Unit
We can apply another filter to rule out pixel that don’t contribute.
Max Pooling 1/2
After this, we downsample the image by “hashing” it to fewer values. We can:
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
6 8
3 4
13 21
8 8
Max pooling
2x2
stride 2
Sum pooling
2x2
stride 2
● Max: pick only the
highest element
● Sum: sum together all
the elements
Max Pooling 2/2
Fully connected layer
After a couple of “convolute, relu and pool” cycles, we have maybe 128 channels
of 14x14 pixel images.
Concat and reshape them in a linear array of 25088 cells.
Feed it to a feed forward neural network that will output our classes.
CNN demo time
Real time web handwritten digit recognition
http://scs.ryerson.ca/~aharley/vis/conv/flat.html
There are a lot of “famous” nets that can be freely downloaded and used off the shelf,
like ResNet which has an error rate of 3.6% over 20000 categories.
Why not just using a MLP?
Why MLP suffers
The Multi Layer Perceptron can actually classify images as just array of pixels.
But it loses if I move and/or rotate the image.
This is because it lacks support for learning the invariant topological properties
that are maintained when the image goes through a spatial transformation.
Language generation with Recurrent Networks
Language generation is a serial task. We generate words one after another.
This is well modeled by Recurrent Neural Cells: a neuron that uses itself over and
over again to accept serial inputs, outputting each time a new value.
Words as integers without embedding
Vocabulary of words.
V = [‘fight’, ‘kill’, ‘queen’, ‘king’, ‘man’, ‘woman’, ‘love’,...]
“One hot vector” encoding representation of single words.
‘fight’ = [1 0 0 0 0 0 0 …]
‘kill’ = [0 1 0 0 0 0 0 …]
‘queen’ = [0 0 1 0 0 0 0 …]
Can correlate documents (TF-IDF), but can’t correlate single words to each other:
“I fight the king” = [ 1 1 0 1 0 0 0 …]
“fight the tirannny”= [ 0 1 0 1 0 0 1 …]
Words as floats with vector embedding
Word embedding.
Fixed length, real valued vector encoding representation of single words.
Close concepts have close vectors.
‘fight’ = [0.17 0.53 0.89 0.03 0.00 0.54 0.11 ]
‘kill’ = [0.17 0.53 0.91 0.06 0.00 0.54 0.12 ]
‘queen’ = [0.22 0.45 0.13 0.53 0.90 0.41 0.00 ]
Vector operation yields to coherent results: king - man + woman = queen
How language is generated
x1
h1
y1
x2
h2
y2
x3
h3
y3
“What” “is” “the”
“problem”“the”“is”target word
output likelihood
hidden state
input embedding
input word
Whh
Why
Wxh
RNN and the problem of memory
All network state is held in a single cell, used over and over again. Internal state
can get really complicated. Moving the values around during training can lead to
loss of data.
RNN has a “plugin” architecture, in which we can use different types of cells:
Simple RNN cell: fastest, but breaks over long sequences. Outdated.
LSTM cell: slower, supports selectively forgetting and keeping data. Standard.
GRU cell: like LSTM, but faster due to simpler internal architecture. State of art.
RNN demo
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Putting all together
This is a classical seq2seq.
An image is fed to the CNN.
The CNN generates a state that
model the scene as a cluster of
objects.
The state is fed to a LSTM cell
Avoid getting distracted and Attention
We can train an
intermediate network called
Attention that emphasizes
relationships between
different parts of the
encoder (image) with
different time step of the
decoder (current word
being generated).
How attention works
End to end demo
https://www.captionbot.ai/
Beyond captioning: deep image search
Inputting an image and a question, the network will output an answer.
Chain together CNN and RNN models to a FC outputting to our vocabulary.
http://vqa.cloudcv.org/
CNN
h
How many wheels has the skate?
RNN h
F
C
F
C
F
C
Thank you

More Related Content

What's hot

Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
Akshay Hegde
 
Basics of Deep learning
Basics of Deep learningBasics of Deep learning
Basics of Deep learning
Ramesh Kumar
 
Deep Learning
Deep Learning Deep Learning
Deep Learning
Roshan Chettri
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
Philip Zheng
 
Understanding RNN and LSTM
Understanding RNN and LSTMUnderstanding RNN and LSTM
Understanding RNN and LSTM
健程 杨
 
Deep Learning Tutorial
Deep Learning Tutorial Deep Learning Tutorial
Deep Learning Tutorial
Ligeng Zhu
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
Sungjoon Choi
 
P03 neural networks cvpr2012 deep learning methods for vision
P03 neural networks cvpr2012 deep learning methods for visionP03 neural networks cvpr2012 deep learning methods for vision
P03 neural networks cvpr2012 deep learning methods for vision
zukun
 
A Brief Introduction on Recurrent Neural Network and Its Application
A Brief Introduction on Recurrent Neural Network and Its ApplicationA Brief Introduction on Recurrent Neural Network and Its Application
A Brief Introduction on Recurrent Neural Network and Its Application
Xiaohu ZHU
 
Animesh Prasad and Muthu Kumar Chandrasekaran - WESST - Basics of Deep Learning
Animesh Prasad and Muthu Kumar Chandrasekaran - WESST - Basics of Deep LearningAnimesh Prasad and Muthu Kumar Chandrasekaran - WESST - Basics of Deep Learning
Animesh Prasad and Muthu Kumar Chandrasekaran - WESST - Basics of Deep Learning
NUS Institute of Applied Learning Sciences and Educational Technology
 
Deep learning
Deep learning Deep learning
Deep learning
Rajgupta258
 
Natural language processing techniques transition from machine learning to de...
Natural language processing techniques transition from machine learning to de...Natural language processing techniques transition from machine learning to de...
Natural language processing techniques transition from machine learning to de...
Divya Gera
 
From Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxFrom Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptx
Chun-Hao Chang
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTM
Divya Gera
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
Massimiliano Ruocco
 
Deep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applicationsDeep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applications
Buhwan Jeong
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep Learning
Asim Jalis
 
Ersatz meetup - DeepLearning4j Demo
Ersatz meetup - DeepLearning4j DemoErsatz meetup - DeepLearning4j Demo
Ersatz meetup - DeepLearning4j Demo
Adam Gibson
 
Speech Processing with deep learning
Speech Processing  with deep learningSpeech Processing  with deep learning
Speech Processing with deep learning
Mohamed Essam
 
Deep Learning Primer - a brief introduction
Deep Learning Primer - a brief introductionDeep Learning Primer - a brief introduction
Deep Learning Primer - a brief introduction
ananth
 

What's hot (20)

Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
 
Basics of Deep learning
Basics of Deep learningBasics of Deep learning
Basics of Deep learning
 
Deep Learning
Deep Learning Deep Learning
Deep Learning
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
 
Understanding RNN and LSTM
Understanding RNN and LSTMUnderstanding RNN and LSTM
Understanding RNN and LSTM
 
Deep Learning Tutorial
Deep Learning Tutorial Deep Learning Tutorial
Deep Learning Tutorial
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
 
P03 neural networks cvpr2012 deep learning methods for vision
P03 neural networks cvpr2012 deep learning methods for visionP03 neural networks cvpr2012 deep learning methods for vision
P03 neural networks cvpr2012 deep learning methods for vision
 
A Brief Introduction on Recurrent Neural Network and Its Application
A Brief Introduction on Recurrent Neural Network and Its ApplicationA Brief Introduction on Recurrent Neural Network and Its Application
A Brief Introduction on Recurrent Neural Network and Its Application
 
Animesh Prasad and Muthu Kumar Chandrasekaran - WESST - Basics of Deep Learning
Animesh Prasad and Muthu Kumar Chandrasekaran - WESST - Basics of Deep LearningAnimesh Prasad and Muthu Kumar Chandrasekaran - WESST - Basics of Deep Learning
Animesh Prasad and Muthu Kumar Chandrasekaran - WESST - Basics of Deep Learning
 
Deep learning
Deep learning Deep learning
Deep learning
 
Natural language processing techniques transition from machine learning to de...
Natural language processing techniques transition from machine learning to de...Natural language processing techniques transition from machine learning to de...
Natural language processing techniques transition from machine learning to de...
 
From Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxFrom Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptx
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTM
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
Deep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applicationsDeep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applications
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep Learning
 
Ersatz meetup - DeepLearning4j Demo
Ersatz meetup - DeepLearning4j DemoErsatz meetup - DeepLearning4j Demo
Ersatz meetup - DeepLearning4j Demo
 
Speech Processing with deep learning
Speech Processing  with deep learningSpeech Processing  with deep learning
Speech Processing with deep learning
 
Deep Learning Primer - a brief introduction
Deep Learning Primer - a brief introductionDeep Learning Primer - a brief introduction
Deep Learning Primer - a brief introduction
 

Similar to Alberto Massidda - Images and words: mechanics of automated captioning with neural networks - Codemotion Milan 2018

Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
RADO7900
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson Studio
Sasha Lazarevic
 
Cnn
CnnCnn
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
Sujit Pal
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
nyomans1
 
Deep learning (2)
Deep learning (2)Deep learning (2)
Deep learning (2)
Muhanad Al-khalisy
 
Automatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face RecognitionAutomatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face Recognition
vatsal199567
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Simplilearn
 
Java and Deep Learning
Java and Deep LearningJava and Deep Learning
Java and Deep Learning
Oswald Campesato
 
Deep learning
Deep learningDeep learning
Deep learning
Aman Kamboj
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
Oswald Campesato
 
Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)
Oswald Campesato
 
lecun-01.ppt
lecun-01.pptlecun-01.ppt
lecun-01.ppt
VenkyChinna8
 
Deep learning for Computer Vision intro
Deep learning for Computer Vision introDeep learning for Computer Vision intro
Deep learning for Computer Vision intro
Nadav Carmel
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
Pierre de Lacaze
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Oswald Campesato
 
Deep Learning and TensorFlow
Deep Learning and TensorFlowDeep Learning and TensorFlow
Deep Learning and TensorFlow
Oswald Campesato
 
NIPS2007: deep belief nets
NIPS2007: deep belief netsNIPS2007: deep belief nets
NIPS2007: deep belief nets
zukun
 
What's Wrong With Deep Learning?
What's Wrong With Deep Learning?What's Wrong With Deep Learning?
What's Wrong With Deep Learning?
Philip Zheng
 
deep learning
deep learningdeep learning
deep learning
Hassanein Alwan
 

Similar to Alberto Massidda - Images and words: mechanics of automated captioning with neural networks - Codemotion Milan 2018 (20)

Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson Studio
 
Cnn
CnnCnn
Cnn
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
nlp dl 1.pdf
nlp dl 1.pdfnlp dl 1.pdf
nlp dl 1.pdf
 
Deep learning (2)
Deep learning (2)Deep learning (2)
Deep learning (2)
 
Automatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face RecognitionAutomatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face Recognition
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
 
Java and Deep Learning
Java and Deep LearningJava and Deep Learning
Java and Deep Learning
 
Deep learning
Deep learningDeep learning
Deep learning
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)
 
lecun-01.ppt
lecun-01.pptlecun-01.ppt
lecun-01.ppt
 
Deep learning for Computer Vision intro
Deep learning for Computer Vision introDeep learning for Computer Vision intro
Deep learning for Computer Vision intro
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)
 
Deep Learning and TensorFlow
Deep Learning and TensorFlowDeep Learning and TensorFlow
Deep Learning and TensorFlow
 
NIPS2007: deep belief nets
NIPS2007: deep belief netsNIPS2007: deep belief nets
NIPS2007: deep belief nets
 
What's Wrong With Deep Learning?
What's Wrong With Deep Learning?What's Wrong With Deep Learning?
What's Wrong With Deep Learning?
 
deep learning
deep learningdeep learning
deep learning
 

More from Codemotion

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Codemotion
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
Codemotion
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
Codemotion
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
Codemotion
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Codemotion
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Codemotion
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Codemotion
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Codemotion
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Codemotion
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Codemotion
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Codemotion
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Codemotion
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Codemotion
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Codemotion
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Codemotion
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
Codemotion
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Codemotion
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Codemotion
 

More from Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Recently uploaded

AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
Federico Razzoli
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 

Recently uploaded (20)

AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 

Alberto Massidda - Images and words: mechanics of automated captioning with neural networks - Codemotion Milan 2018

  • 1. Images and words mechanics of automated captioning with neural networks Alberto Massidda
  • 2. Who we are ● Founded in 2001; ● Branches in Milan, Rome and London; ● Market leader in enterprise ready solutions based on Open Source tech; ● Expertise: ○ DevOps ○ Cloud ○ BigData and many more...
  • 3. This presentation is Open Source (yay!) https://creativecommons.org/licenses/by-nc-sa/3.0/
  • 4. Outline 1. Task introduction 2. Object recognition 3. Language generation 4. Putting all together 5. Improving performance 6. Beyond captioning: deep image search
  • 5. The task Generating a description from an image. “A man jumps over a skateboard”
  • 6. The challenges 1. Recognize objects in the image 2. Generate a fluent description in natural language
  • 7. Neural object recognition A solved problem: Convolutional Neural Networks do the trick. CNN is an architecture specialized in finding topological invariants in the input. Finds relationships between atoms and infers higher abstractions. Highly resistant to noise and spatial transformations. It learns automatically what are the relevant features to extract from an input. Not limited to images: CNNs can be applied to text, audio, etc..
  • 8. An image as integers A handwritten “8” can be represented as a matrix of integers. ● 0 for blank ● 1-255 for grayscales white-to-black
  • 9. Architecture of a ConvNet Filter Convolution + ReLU Max Pooling Filter Convolution + ReLU Max Pooling Fully Connected 1. Convolution 2. Non Linearity (ReLU) 3. Pooling or Sub Sampling 4. Classification (Fully Connected Layer)
  • 10. Convolution intuition Let’s multiply a sliding matrix (the “brushing filter”) with our input matrix. For example, the matrix does edge detection.
  • 11. Convolution in CNN Each new generated image is called “channel”. A common RGB has 3 channels. Channels hold different perspectives about the image. We start with random filters and tune these matrixes as part of our training. We end up with filters that have learned perspectives of interest.
  • 13. Rectifier Linear Unit We can apply another filter to rule out pixel that don’t contribute.
  • 14. Max Pooling 1/2 After this, we downsample the image by “hashing” it to fewer values. We can: 1 1 2 4 5 6 7 8 3 2 1 0 1 2 3 4 6 8 3 4 13 21 8 8 Max pooling 2x2 stride 2 Sum pooling 2x2 stride 2 ● Max: pick only the highest element ● Sum: sum together all the elements
  • 16. Fully connected layer After a couple of “convolute, relu and pool” cycles, we have maybe 128 channels of 14x14 pixel images. Concat and reshape them in a linear array of 25088 cells. Feed it to a feed forward neural network that will output our classes.
  • 17. CNN demo time Real time web handwritten digit recognition http://scs.ryerson.ca/~aharley/vis/conv/flat.html There are a lot of “famous” nets that can be freely downloaded and used off the shelf, like ResNet which has an error rate of 3.6% over 20000 categories.
  • 18. Why not just using a MLP?
  • 19. Why MLP suffers The Multi Layer Perceptron can actually classify images as just array of pixels. But it loses if I move and/or rotate the image. This is because it lacks support for learning the invariant topological properties that are maintained when the image goes through a spatial transformation.
  • 20. Language generation with Recurrent Networks Language generation is a serial task. We generate words one after another. This is well modeled by Recurrent Neural Cells: a neuron that uses itself over and over again to accept serial inputs, outputting each time a new value.
  • 21. Words as integers without embedding Vocabulary of words. V = [‘fight’, ‘kill’, ‘queen’, ‘king’, ‘man’, ‘woman’, ‘love’,...] “One hot vector” encoding representation of single words. ‘fight’ = [1 0 0 0 0 0 0 …] ‘kill’ = [0 1 0 0 0 0 0 …] ‘queen’ = [0 0 1 0 0 0 0 …] Can correlate documents (TF-IDF), but can’t correlate single words to each other: “I fight the king” = [ 1 1 0 1 0 0 0 …] “fight the tirannny”= [ 0 1 0 1 0 0 1 …]
  • 22. Words as floats with vector embedding Word embedding. Fixed length, real valued vector encoding representation of single words. Close concepts have close vectors. ‘fight’ = [0.17 0.53 0.89 0.03 0.00 0.54 0.11 ] ‘kill’ = [0.17 0.53 0.91 0.06 0.00 0.54 0.12 ] ‘queen’ = [0.22 0.45 0.13 0.53 0.90 0.41 0.00 ] Vector operation yields to coherent results: king - man + woman = queen
  • 23. How language is generated x1 h1 y1 x2 h2 y2 x3 h3 y3 “What” “is” “the” “problem”“the”“is”target word output likelihood hidden state input embedding input word Whh Why Wxh
  • 24. RNN and the problem of memory All network state is held in a single cell, used over and over again. Internal state can get really complicated. Moving the values around during training can lead to loss of data. RNN has a “plugin” architecture, in which we can use different types of cells: Simple RNN cell: fastest, but breaks over long sequences. Outdated. LSTM cell: slower, supports selectively forgetting and keeping data. Standard. GRU cell: like LSTM, but faster due to simpler internal architecture. State of art.
  • 26. Putting all together This is a classical seq2seq. An image is fed to the CNN. The CNN generates a state that model the scene as a cluster of objects. The state is fed to a LSTM cell
  • 27. Avoid getting distracted and Attention We can train an intermediate network called Attention that emphasizes relationships between different parts of the encoder (image) with different time step of the decoder (current word being generated).
  • 29. End to end demo https://www.captionbot.ai/
  • 30. Beyond captioning: deep image search Inputting an image and a question, the network will output an answer. Chain together CNN and RNN models to a FC outputting to our vocabulary. http://vqa.cloudcv.org/ CNN h How many wheels has the skate? RNN h F C F C F C