SlideShare a Scribd company logo
1 of 47
Download to read offline
let's build a system to describe an
image
Debarko De (@debarko)
Engineering Manager @ Practo
Who am I?
● Debarko De
● twitter.com/@debarko
● ex @Hashcube
○ Facebook
○ Mobile platforms
● Engineering Manager @ Practo
○ eCommerce Division of Practo
○ Medicines & Lab Tests
● AI ML Application Developer (Researcher)
Agenda
● Problem Statement
● Basic building blocks for the network
● CNN
● Transfer Learning
● RNN
● LSTM
● How do we wire them together?
● Code
● Other places this can be implemented
● Interaction & Questions
Setting right expectations
● I’ll cover the basic theory needed to understand the network
● This is not a math session about NNs
● It’s more about implementation rather than theory
● Preferable I would like to take the questions at the end
● Slides and Code will be shared separately
● Time is of essence, so let's begin.
What are we trying to do?
Use cases
● Visual to Text systems for blind people
● Search Engines for searching medical records
● Auto Tagging medical imaging data
● Tagging video consultations
Parts to an Image Captioning System
● CNN
● RNN with LSTM unit
● Training Data
● Training
● Eval with BLEU Score
Convolutional Neural Networks
CNN vs Traditional Imaging Ways
References for CNN
● Image classification using CNN
https://www.slideshare.net/debarko/image-classification-using-cnn
● Deep Learning for Noobs (Part 1 & Part 2)
https://hackernoon.com/supervised-deep-learning-in-image-classification-for-noobs-part-1-9f831b6d430d
● Karpathy’s Article on CNN
http://cs231n.github.io/convolutional-networks/
Recurrent Neural Network
RNN
● As humans we understand context
● Every single time we don’t reset our understanding
● Thoughts have persistence
● Traditional NNs like CNNs don’t have persistence
● Usage: speech recognition, language modeling, translation
PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain'd into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.
Second Senator:
They are away this miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.
DUKE VINCENTIO:
Well, your wit is in the care of side and that.
Second Lord:
They would be ruled after this chamber, and
my fair nues begun out of the fact, to be conveyed,
Whose noble souls I'll have the heart of the wars.
Clown:
Come, sir, I will make did behold your worship.
VIOLA: I'll drink it.
● RNNs are supposed to remember
● Like in video understanding based on previous scenes
● Can RNNs do it?
Long term dependency problem
the grass is _____
the grass is green
In a long sentence that is really big do you think that RNNs
can really remember the text that happened here? Will it
remember whether the sentence was ____ or ____?
In a long sentence that is really big do you think that RNNs
can really remember the text that happened here? Will it
remember whether the sentence was ____ or ____?
In a long sentence that is really big do you think that RNNs
can really remember the text that happened here? Will it
remember whether the sentence was long or short?
RNNs can’t connect the dots ...
● RNNs technically should be capable to store long dependencies
● In practice the data gets diluted
● This problem gave birth to LSTM Networks
Long term dependency problem
Long Short Term Memory networks
A Regular RNN
LSTM
Sigmoid and tanh neural layers
LSTM
Sigmoid and tanh neural layers
Final Model
Training Data
1. Flickr 8k Dataset (Link)
2. Flickr 30k Dataset (Link)
3. Microsoft COCO Dataset (Link)
8kDB: M. Hodosh, P. Young and J. Hockenmaier (2013) "Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics", Journal of Artifical Intellegence Research,
Volume 47, pages 853-899 http://www.jair.org/papers/paper3994.html
Let’s get our hands dirty
Get Set Code...
Evaluation
BLEU, or the Bilingual Evaluation Understudy, is a score for comparing a candidate
translation of text to one or more reference translations.
Human captions are the reference translations
Generated text via the LSTM network is the candidate translation
DEMO
https://cs.stanford.edu/people/karpathy/deepimagesent/generationdemo/
http://localhost:3000/ - Demo server running locally on my laptop
Other places to use this
● Fashion websites can ask users to take photos and then get matching dresses for
sale
● Blind people can use a mobile app which reads out the current scene, helping
them understand the environment better and also enjoy the beauty of sunset.
● Image search engines can be built on this
Current Production Ready models
● Google Show and Tell
● NeuralTalk2 (https://github.com/karpathy/neuraltalk2)
● https://www.leadergpu.com/
References
● http://colah.github.io/posts/2015-08-Understanding-LSTMs/
● http://www.jair.org/media/3994/live-3994-7274-jair.pdf
● https://github.com/debarko/truview_models
● https://github.com/debarko/truview
● https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.ht
ml
● https://github.com/tensorflow/models/tree/master/research/im2txt
● One of the best GPU server farms (https://www.leadergpu.com/)
● https://en.wikipedia.org/wiki/BLEU
fin.
@debarko

More Related Content

What's hot

Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Edureka!
 
Deep Belief Networks
Deep Belief NetworksDeep Belief Networks
Deep Belief NetworksHasan H Topcu
 
Text summarization using deep learning
Text summarization using deep learningText summarization using deep learning
Text summarization using deep learningAbu Kaisar
 
[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)Susang Kim
 
Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)nikhilus85
 
Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & OpportunityiTrain
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance SegmentationHichem Felouat
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers Arvind Devaraj
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learningSushant Shrivastava
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesFellowship at Vodafone FutureLab
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnnSumeraHangi
 
text summarization using amr
text summarization using amrtext summarization using amr
text summarization using amramit nagarkoti
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learningijtsrd
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer VisionSungjoon Choi
 

What's hot (20)

Deep Learning for Video: Action Recognition (UPC 2018)
Deep Learning for Video: Action Recognition (UPC 2018)Deep Learning for Video: Action Recognition (UPC 2018)
Deep Learning for Video: Action Recognition (UPC 2018)
 
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
 
Deep Belief Networks
Deep Belief NetworksDeep Belief Networks
Deep Belief Networks
 
Text summarization using deep learning
Text summarization using deep learningText summarization using deep learning
Text summarization using deep learning
 
[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)
 
Text summarization
Text summarization Text summarization
Text summarization
 
Human Emotion Recognition
Human Emotion RecognitionHuman Emotion Recognition
Human Emotion Recognition
 
Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)
 
Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & Opportunity
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance Segmentation
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learning
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network Approaches
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
 
text summarization using amr
text summarization using amrtext summarization using amr
text summarization using amr
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learning
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
Dive Deep Into Deep Learning
Dive Deep Into Deep LearningDive Deep Into Deep Learning
Dive Deep Into Deep Learning
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
 

Similar to Image captioning with Keras and Tensorflow - Debarko De @ Practo

Conversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopConversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopTom Bocklisch
 
Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...PyData
 
Deprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stackDeprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stackJustina Petraitytė
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Grigory Sapunov
 
Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deakin University
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingGrigory Sapunov
 
Keras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learningKeras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learningDr. Ananth Krishnamoorthy
 
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0Plain Concepts
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchNatasha Latysheva
 
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdfITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdfOrtus Solutions, Corp
 
Bridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionBridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionLiad Magen
 
Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsShreyas Suresh Rao
 
Blockchain Experience Design Meetup #1
Blockchain Experience Design Meetup #1Blockchain Experience Design Meetup #1
Blockchain Experience Design Meetup #1Gendry Morales
 
AIS Technical Development Workshop 2: Text Analytics with Python
AIS Technical Development Workshop 2: Text Analytics with PythonAIS Technical Development Workshop 2: Text Analytics with Python
AIS Technical Development Workshop 2: Text Analytics with PythonNhi Nguyen
 
Writing effective content
Writing effective contentWriting effective content
Writing effective contentHarshal Patil
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2Karthik Murugesan
 
NATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxNATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxsaivinay93
 

Similar to Image captioning with Keras and Tensorflow - Debarko De @ Practo (20)

Conversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopConversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData Workshop
 
Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...
 
Deprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stackDeprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stack
 
Understanding deep learning
Understanding deep learningUnderstanding deep learning
Understanding deep learning
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 
Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2
 
Messaging
MessagingMessaging
Messaging
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image Processing
 
Keras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learningKeras: A versatile modeling layer for deep learning
Keras: A versatile modeling layer for deep learning
 
Messaging
MessagingMessaging
Messaging
 
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
 
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdfITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
 
Bridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionBridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full version
 
Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application Trends
 
Blockchain Experience Design Meetup #1
Blockchain Experience Design Meetup #1Blockchain Experience Design Meetup #1
Blockchain Experience Design Meetup #1
 
AIS Technical Development Workshop 2: Text Analytics with Python
AIS Technical Development Workshop 2: Text Analytics with PythonAIS Technical Development Workshop 2: Text Analytics with Python
AIS Technical Development Workshop 2: Text Analytics with Python
 
Writing effective content
Writing effective contentWriting effective content
Writing effective content
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
 
NATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxNATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptx
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Recently uploaded (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Image captioning with Keras and Tensorflow - Debarko De @ Practo

  • 1. let's build a system to describe an image Debarko De (@debarko) Engineering Manager @ Practo
  • 2. Who am I? ● Debarko De ● twitter.com/@debarko ● ex @Hashcube ○ Facebook ○ Mobile platforms ● Engineering Manager @ Practo ○ eCommerce Division of Practo ○ Medicines & Lab Tests ● AI ML Application Developer (Researcher)
  • 3. Agenda ● Problem Statement ● Basic building blocks for the network ● CNN ● Transfer Learning ● RNN ● LSTM ● How do we wire them together? ● Code ● Other places this can be implemented ● Interaction & Questions
  • 4. Setting right expectations ● I’ll cover the basic theory needed to understand the network ● This is not a math session about NNs ● It’s more about implementation rather than theory ● Preferable I would like to take the questions at the end ● Slides and Code will be shared separately ● Time is of essence, so let's begin.
  • 5. What are we trying to do?
  • 6.
  • 7.
  • 8. Use cases ● Visual to Text systems for blind people ● Search Engines for searching medical records ● Auto Tagging medical imaging data ● Tagging video consultations
  • 9. Parts to an Image Captioning System ● CNN ● RNN with LSTM unit ● Training Data ● Training ● Eval with BLEU Score
  • 11.
  • 12.
  • 13. CNN vs Traditional Imaging Ways
  • 14.
  • 15. References for CNN ● Image classification using CNN https://www.slideshare.net/debarko/image-classification-using-cnn ● Deep Learning for Noobs (Part 1 & Part 2) https://hackernoon.com/supervised-deep-learning-in-image-classification-for-noobs-part-1-9f831b6d430d ● Karpathy’s Article on CNN http://cs231n.github.io/convolutional-networks/
  • 16.
  • 18. RNN ● As humans we understand context ● Every single time we don’t reset our understanding ● Thoughts have persistence ● Traditional NNs like CNNs don’t have persistence ● Usage: speech recognition, language modeling, translation
  • 19.
  • 20.
  • 21.
  • 22. PANDARUS: Alas, I think he shall be come approached and the day When little srain would be attain'd into being never fed, And who is but a chain and subjects of his death, I should not sleep. Second Senator: They are away this miseries, produced upon my soul, Breaking and strongly should be buried, when I perish The earth and thoughts of many states. DUKE VINCENTIO: Well, your wit is in the care of side and that. Second Lord: They would be ruled after this chamber, and my fair nues begun out of the fact, to be conveyed, Whose noble souls I'll have the heart of the wars. Clown: Come, sir, I will make did behold your worship. VIOLA: I'll drink it.
  • 23. ● RNNs are supposed to remember ● Like in video understanding based on previous scenes ● Can RNNs do it? Long term dependency problem
  • 24. the grass is _____
  • 25. the grass is green
  • 26. In a long sentence that is really big do you think that RNNs can really remember the text that happened here? Will it remember whether the sentence was ____ or ____?
  • 27. In a long sentence that is really big do you think that RNNs can really remember the text that happened here? Will it remember whether the sentence was ____ or ____?
  • 28. In a long sentence that is really big do you think that RNNs can really remember the text that happened here? Will it remember whether the sentence was long or short?
  • 29. RNNs can’t connect the dots ...
  • 30. ● RNNs technically should be capable to store long dependencies ● In practice the data gets diluted ● This problem gave birth to LSTM Networks Long term dependency problem
  • 31. Long Short Term Memory networks
  • 33. LSTM Sigmoid and tanh neural layers
  • 34. LSTM Sigmoid and tanh neural layers
  • 36. Training Data 1. Flickr 8k Dataset (Link) 2. Flickr 30k Dataset (Link) 3. Microsoft COCO Dataset (Link) 8kDB: M. Hodosh, P. Young and J. Hockenmaier (2013) "Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics", Journal of Artifical Intellegence Research, Volume 47, pages 853-899 http://www.jair.org/papers/paper3994.html
  • 37.
  • 38. Let’s get our hands dirty
  • 39.
  • 41. Evaluation BLEU, or the Bilingual Evaluation Understudy, is a score for comparing a candidate translation of text to one or more reference translations. Human captions are the reference translations Generated text via the LSTM network is the candidate translation
  • 43. Other places to use this ● Fashion websites can ask users to take photos and then get matching dresses for sale ● Blind people can use a mobile app which reads out the current scene, helping them understand the environment better and also enjoy the beauty of sunset. ● Image search engines can be built on this
  • 44. Current Production Ready models ● Google Show and Tell ● NeuralTalk2 (https://github.com/karpathy/neuraltalk2) ● https://www.leadergpu.com/
  • 45. References ● http://colah.github.io/posts/2015-08-Understanding-LSTMs/ ● http://www.jair.org/media/3994/live-3994-7274-jair.pdf ● https://github.com/debarko/truview_models ● https://github.com/debarko/truview ● https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.ht ml ● https://github.com/tensorflow/models/tree/master/research/im2txt ● One of the best GPU server farms (https://www.leadergpu.com/) ● https://en.wikipedia.org/wiki/BLEU
  • 46. fin.