Department of Artificial Intelligence
Winter 2022 (Session: 2022-2023)
G H Raisoni College of Engineering, Nagpur
Presented By:
1. Akundi Harshvardhan (A-20)
2. Arya Bharne (A-24)
3. Priyesh Gawali (A-62)
4. Rajat Satpure (A-65)
Guide:-
Prof. Pravin Kshirsagar
Assistant Professor
GHRCE, Nagpur
Title of the Project:-
Image Captioning using Deep Learning
and NLP
Contents
1. Introduction
2. Abstract
3. Objectives
4. Literature Survey
5. Proposed Methodology/System Architecture
6. Hardware / Software Specification
7. Conclusion
8. References
Introduction
• Looking at an image, a person can readily describe
what it contains.
• In this project, we are developing a model that
describes the content of an image, i.e. it generates
a caption for the image. This task is called
Image Captioning.
• We use deep neural networks (DNN): a
Convolutional Neural Network (CNN) and a
Long Short-Term Memory (LSTM) network.
• Because the captions are text data, we also use
Natural Language Processing (NLP).
• Combining these components gives a model
called an Image Caption Generator.
Example caption: "A muscular man standing."
Abstract
• Image captioning is an increasingly important
task. It can describe images in editing software,
assist visually impaired users, and generate
captions for social media posts.
• In recent years, researchers have made
significant progress in image captioning.
• Our solution uses a Long Short-Term Memory
(LSTM) network together with a Convolutional
Neural Network (CNN).
• The CNN extracts features from images, and
the LSTM generates a description from those
extracted features.
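As a rough illustration of what "extracting features" means for a CNN, the sketch below applies two hand-picked 2x2 kernels to a tiny image in plain Python. The image, the kernels and the helper names (`conv2d_valid`, `relu`, `global_max_pool`) are assumptions made for this example. The project's actual CNN learns its kernels during training and is not shown in the slides.

```python
# Toy convolutional feature extraction in pure Python (no framework).
# Assumption: a grayscale image stored as a list of rows; kernels are hand-picked,
# whereas a real CNN learns them from data.

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses, zero out negative ones."""
    return [[max(0, v) for v in row] for row in feature_map]

def global_max_pool(feature_map):
    """Reduce a response map to its single strongest activation."""
    return max(max(row) for row in feature_map)

# 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]    # responds to dark-to-bright, left to right
horizontal_edge = [[-1, -1], [1, 1]]  # responds to dark-to-bright, top to bottom

# One number per kernel: a (very small) feature vector for the image.
features = [
    global_max_pool(relu(conv2d_valid(image, kernel)))
    for kernel in (vertical_edge, horizontal_edge)
]
```

The vertical-edge kernel fires on this image and the horizontal-edge kernel does not, so the resulting vector already says something about the image content. A caption model would pass a much longer vector of this kind to the LSTM.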
Objectives
• To describe the contents of an image using a CNN.
• To showcase the effectiveness of LSTM.
• To create a working model that describes an image on the basis of its
extracted features.
• To understand the features of an image.
• To predict the next word from the extracted features and build a caption.
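The "predict the next word" objective can be made concrete by looking at how one caption might be turned into training examples. The sketch below is an assumption about the data preparation, not something the slides specify: the `startseq` and `endseq` marker tokens and the function name `make_training_pairs` are hypothetical choices.

```python
def make_training_pairs(caption, start="startseq", end="endseq"):
    """Split one caption into (words so far, next word) training examples.

    Assumption: captions are wrapped in start/end marker tokens so the model
    can learn where a caption begins and when to stop.
    """
    tokens = [start] + caption.lower().split() + [end]
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = make_training_pairs("A muscular man standing")
for prefix, next_word in pairs:
    print(" ".join(prefix), "->", next_word)
```

During training, each prefix would be paired with the image features, and the model would be asked to output the corresponding next word.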
Literature Survey (Survey of existing products)
● Template-based approaches can generate grammatically correct captions, but
because the templates are predefined, they cannot generate variable-length
captions.
● Retrieval-based methods produce general, grammatically correct captions.
However, they cannot generate more descriptive, semantically correct
captions.
● Captions are most often generated for the whole scene in an image. However,
captions can also be generated for different regions of an image, as in dense
captioning.
● A plain RNN used along with a CNN has only a very short-term memory.
● The multimodal recurrent neural network method is similar to the method of
Kiros et al., which uses a fixed-length context; in this method, however, the
temporal context is stored in a recurrent architecture that allows an arbitrary
context length.
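The short-memory point about plain RNNs can be illustrated with a toy scalar comparison. This is a sketch under strong assumptions: every weight below is hand-picked rather than learned, and the parameter names (`w_f`, `u_f`, `b_f`, and so on) are hypothetical. It only shows the mechanism by which an LSTM cell state can persist while a plain recurrent state fades.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rnn_step(x, h_prev, w_x=1.0, w_h=0.5):
    """Plain RNN update: the old state is squashed and rescaled at every step."""
    return math.tanh(w_x * x + w_h * h_prev)

def lstm_step(x, h_prev, c_prev, p):
    """Scalar LSTM update with forget (f), input (i) and output (o) gates."""
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])
    c = f * c_prev + i * g          # cell state: gated copy of the past plus gated new input
    h = o * math.tanh(c)            # hidden state exposed to the next layer
    return h, c

# Hand-picked parameters: forget gate almost fully open, input gate almost closed.
params = {name: 0.0 for name in (
    "w_f", "u_f", "b_f", "w_i", "u_i", "b_i",
    "w_o", "u_o", "b_o", "w_g", "u_g", "b_g",
)}
params["b_f"] = 10.0
params["b_i"] = -10.0

h_rnn = 1.0                # information stored in the plain RNN state
h_lstm, c_lstm = 0.0, 1.0  # the same information stored in the LSTM cell state
for _ in range(50):        # 50 steps with no new input
    h_rnn = rnn_step(0.0, h_rnn)
    h_lstm, c_lstm = lstm_step(0.0, h_lstm, c_lstm, params)

print(f"RNN state after 50 steps:  {h_rnn:.2e}")
print(f"LSTM cell after 50 steps:  {c_lstm:.4f}")
```

With these settings the plain RNN state shrinks toward zero, while the LSTM cell state stays close to its initial value. In a trained network the gates are learned, so how long information is kept depends on the data.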
Proposed Methodology/System Architecture
• We describe a thing by its features, and the same holds for an image. For
example, if we see a large red rose, we might describe it as "A big, beautiful
red rose." This sentence uses features such as size (big), colour (red) and
appearance (beautiful) to describe the flower, i.e. it gives the flower a caption.
Describing an image in this way is called Image Captioning.
• In this project we are developing an Image Caption Generator that extracts
features from an image using a Convolutional Neural Network (CNN). From the
extracted features, the model generates a caption for the given image by
arranging words into a meaningful sentence using a Recurrent Neural Network
(RNN) with Long Short-Term Memory (LSTM).
• The LSTM remembers the previous words, which helps it predict the words
that come later and form a proper sentence (the caption).
• Our model combines a CNN (for feature extraction) with an RNN with LSTM
(for predicting and arranging words).
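The word-by-word generation described above can be sketched as a greedy decoding loop. This is a minimal sketch, not the project's code: `predict_next` stands in for the trained CNN and LSTM model, `TOY_TRANSITIONS` is a hypothetical lookup table used so the example runs without a trained network, and the `startseq`/`endseq` tokens are assumed markers.

```python
def greedy_caption(image_features, predict_next, max_len=20,
                   start="startseq", end="endseq"):
    """Build a caption one word at a time, always taking the predicted next word."""
    words = [start]
    for _ in range(max_len):
        next_word = predict_next(image_features, words)
        if next_word == end:          # the model signals that the caption is complete
            break
        words.append(next_word)
    return " ".join(words[1:])        # drop the start marker

# Hypothetical stand-in for the trained model: the next word depends only on the last word.
TOY_TRANSITIONS = {
    "startseq": "a",
    "a": "muscular",
    "muscular": "man",
    "man": "standing",
    "standing": "endseq",
}

def toy_predict_next(image_features, words_so_far):
    # A real model would use both the image features and all previous words.
    return TOY_TRANSITIONS.get(words_so_far[-1], "endseq")

caption = greedy_caption([0.1, 0.9], toy_predict_next)
print(caption)
```

The `max_len` limit guards against a model that never emits the end marker. Greedy selection is only one possible strategy; the slides do not say which one the project uses.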
Hardware / Software Specification
• Category:
Machine Learning, Deep Learning and NLP
• Programming Language:
Python
• Tools & Libraries:
TensorFlow, Keras, NumPy, tqdm
• IDE:
Google Colab, Kaggle and Jupyter Notebook
• Prerequisites:
Python, Machine Learning, Deep Learning and NLP
• Dataset:
Flickr dataset
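Caption text usually needs light preprocessing before it reaches the model. The sketch below uses only the Python standard library and two example captions taken from these slides. The cleaning rules, the `min_count` parameter and the reserved index 0 for padding are assumptions, since the slides describe neither the preprocessing nor the layout of the Flickr caption files.

```python
import string
from collections import Counter

def clean_caption(text):
    """Lowercase, strip ASCII punctuation and drop tokens that are not purely alphabetic."""
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()
    return " ".join(w for w in words if w.isalpha())

def build_vocab(captions, min_count=1):
    """Map each sufficiently frequent word to an integer index.

    Assumption: index 0 is left free so it can be used for sequence padding.
    """
    counts = Counter(word for caption in captions for word in caption.split())
    kept = sorted(word for word, n in counts.items() if n >= min_count)
    return {word: index for index, word in enumerate(kept, start=1)}

raw_captions = ["A muscular man standing.", "A big beautiful Red rose."]
cleaned = [clean_caption(c) for c in raw_captions]
vocab = build_vocab(cleaned)

# Encode a caption as the integer sequence a model would consume.
encoded = [vocab[word] for word in cleaned[0].split()]
print(cleaned)
print(encoded)
```

Raising `min_count` drops rare words, which keeps the vocabulary smaller at the cost of losing some detail in the generated captions.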
Conclusion
Our solution is a model that describes an image using features extracted from
it, i.e. it generates a caption for the image. We use a Convolutional Neural
Network (CNN) to extract features from the image and a Recurrent Neural
Network (RNN) with Long Short-Term Memory (LSTM) to predict the words that
form a proper caption for the given image.
References
1. International Journal of Innovative Research in Electrical, Electronics,
Instrumentation and Control Engineering, NCIET-2020.
2. Aditya, A. N., Anditya, A. and Suyanto (2019). "Generating Image Description
on Indonesian Language using Convolutional Neural Network and Gated Recurrent
Unit", 7th International Conference on Information and Communication
Technology (ICoICT).
3. Chetan, A. and Vaishli, J. (2018). "Image Caption Generation using Deep
Learning Technique", Fourth International Conference on Computing
Communication Control and Automation (ICCUBEA).
4. Huda A. Al-muzaini, Tasniem N. and Hafida B. (2018). "Automatic Arabic Image
Captioning using RNN LSTM-Based Language Model and CNN", International
Journal of Advanced Computer Science and Applications (IJACSA), Vol. 9, No. 6.
5. J. Liu, G. Wang, P. Hu, L.-Y. Duan, and A. C. Kot. Global context-aware
attention LSTM networks for 3D action recognition. CVPR, 2017.
6. J. Lu, C. Xiong, D. Parikh, and R. Socher. Knowing when to look: Adaptive
attention via a visual sentinel for image captioning. CVPR, 2017.
7. S. J. Rennie, E. Marcheret, Y. Mroueh, J. Ross, and V. Goel. Self-critical
sequence training for image captioning. CVPR, 2017.
8. I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with restarts.
ICLR, 2016.
9. J. Johnson, A. Karpathy, and L. Fei-Fei. DenseCap: Fully convolutional
localization networks for dense captioning. CVPR, 2016.
Thank You
