Department of Artificial Intelligence
Winter 2022 (Session: 2022-2023)
G H Raisoni College of Engineering, Nagpur
Presented By:
1. Akundi Harshvardhan (A-20)
2. Arya Bharne (A-24)
3. Priyesh Gawali (A-62)
4. Rajat Satpure (A-65)
Guide:
Prof. Pravin Kshirsagar
Assistant Professor
GHRCE, Nagpur
Title of the Project:
Image Captioning using Deep Learning
and NLP
Contents
1. Introduction
2. Abstract
3. Objectives
4. Literature Survey
5. Proposed Methodology/System Architecture
6. Hardware / Software Specification
7. Conclusion
8. References
Introduction
• By looking at an image, we can describe what is in it.
• In this project, we are developing a model that describes the content of an image, i.e. it generates a caption for the image. This task is called Image Captioning.
• We use deep neural networks, specifically a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM).
• Since we are dealing with text data (the captions), we also use Natural Language Processing (NLP).
• By combining these, we build a model called an Image Caption Generator.
(Example caption from such a model: "A muscular man standing.")
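Because the captions are text, some NLP preprocessing is normally applied before training. The steps below are a minimal sketch and our own assumption (the slides do not spell them out): lowercase each caption, strip punctuation and non-alphabetic tokens, and wrap the caption in `startseq`/`endseq` markers so the model knows where a caption begins and ends.

```python
import string

def clean_caption(caption):
    """Normalize one caption for training: lowercase, drop punctuation
    and non-alphabetic tokens, then add start/end markers."""
    caption = caption.lower().translate(str.maketrans("", "", string.punctuation))
    words = [w for w in caption.split() if w.isalpha()]
    return "startseq " + " ".join(words) + " endseq"

print(clean_caption("A muscular man, standing!"))
# -> "startseq a muscular man standing endseq"
```

The start/end markers later tell the decoder when to begin predicting and when a generated caption is complete.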
Abstract
• Image captioning is an important task today: it can describe images in editing software, assist visually impaired users, and generate captions for social media posts.
• In recent years, researchers have made significant progress on image captioning.
• Our solution uses Long Short-Term Memory (LSTM) together with a Convolutional Neural Network (CNN).
• The CNN extracts features from an image, and the LSTM generates a description from those extracted features.
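The CNN feature-extraction step can be sketched with a pretrained Keras backbone. Our choice of InceptionV3 is an assumption (the slides do not name a network); with global average pooling it yields one 2048-dimensional feature vector per image.

```python
import numpy as np
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input

# weights="imagenet" would load pretrained filters; weights=None keeps
# this sketch self-contained (no download), at the cost of being untrained.
encoder = InceptionV3(weights=None, include_top=False, pooling="avg")

image = np.random.rand(1, 299, 299, 3).astype("float32")  # stand-in for a real photo
features = encoder.predict(preprocess_input(image * 255.0), verbose=0)
print(features.shape)  # (1, 2048)
```

In a real pipeline the features for every image in the dataset are computed once and cached, since the encoder is not retrained.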
Objectives
• To describe the contents of an image using a CNN.
• To showcase the effectiveness of LSTM.
• To build a working model that describes an image on the basis of its extracted features.
• To understand the features of an image.
• To predict the next words from the extracted features to form a caption.
Literature Survey (Survey of existing approaches)
● Template-based approaches generate grammatically correct captions, but because the templates are predefined they cannot produce variable-length captions.
● Retrieval-based methods produce general, grammatically correct captions, but they cannot generate more descriptive and semantically rich captions.
● Captions are most often generated for the whole scene in an image; however, they can also be generated for different regions of an image, as in dense captioning.
● An RNN used alongside a CNN has a very short-term memory.
● The multimodal recurrent neural network method is similar to the method of Kiros et al., which uses a fixed-length context; in this method, however, the temporal context is stored in a recurrent architecture that allows an arbitrary context length.
Proposed Methodology/System Architecture
• We describe things by their features. For example, on seeing a large red rose we might say "a big, beautiful red rose": the features large (size), red (colour), and beautiful (appearance) describe the flower, i.e. they form its caption. This process of describing something by looking at it is what Image Captioning automates.
• In this project we are developing an Image Caption Generator that extracts features from an image using a Convolutional Neural Network (CNN); from those features, the model generates a caption by arranging words in a meaningful order using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM).
• The LSTM remembers the previous words, which helps it predict the words that follow and so form a proper sentence (caption).
• By combining a CNN (for feature extraction) with an RNN/LSTM (for predicting and ordering words), we build our model.
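The word-by-word generation loop above can be sketched end to end. The scorer below is a stand-in (a fixed toy rule, not a trained CNN+LSTM) so that the flow of greedy decoding stays visible: start from `startseq`, repeatedly pick the most probable next word, and stop at `endseq` or a length cap.

```python
import numpy as np

# Toy vocabulary; a real model learns its vocabulary from the caption corpus.
vocab = ["startseq", "a", "man", "standing", "endseq"]
word_to_id = {w: i for i, w in enumerate(vocab)}

def toy_next_word_probs(image_features, word_ids):
    """Stand-in for the trained CNN+LSTM: it simply favours the word
    whose id follows the last one, so decoding walks through the vocab."""
    probs = np.full(len(vocab), 0.01)
    probs[min(word_ids[-1] + 1, len(vocab) - 1)] = 1.0
    return probs / probs.sum()

def generate_caption(image_features, max_len=10):
    word_ids = [word_to_id["startseq"]]
    for _ in range(max_len):
        probs = toy_next_word_probs(image_features, word_ids)
        next_id = int(np.argmax(probs))   # greedy choice of next word
        word_ids.append(next_id)
        if vocab[next_id] == "endseq":
            break
    return " ".join(vocab[i] for i in word_ids[1:-1])  # drop the markers

features = np.zeros(2048)  # would come from the CNN encoder
print(generate_caption(features))  # -> "a man standing"
```

Swapping `toy_next_word_probs` for a trained model's softmax output turns this sketch into the real caption generator; beam search is a common refinement over the greedy choice shown here.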
Hardware / Software Specification
• Category: Machine Learning, Deep Learning and NLP
• Programming Language: Python
• Tools & Libraries: TensorFlow, Keras, NumPy, tqdm
• IDE: Google Colab, Kaggle and Jupyter Notebook
• Prerequisites: Python, Machine Learning, Deep Learning and NLP
• Dataset: Flickr dataset
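With the libraries above, the CNN-plus-LSTM decoder can be defined in Keras. This is a common "merge" layout for caption generators; the layer sizes and vocabulary limits below are typical tutorial choices, not values taken from the slides. Image features and the partial caption are encoded separately, added together, and mapped to a softmax over the vocabulary.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size, max_len = 5000, 34   # assumed; they depend on the caption corpus

img_in = Input(shape=(2048,))    # CNN feature vector for the image
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

seq_in = Input(shape=(max_len,))  # caption so far, as padded word ids
seq_vec = LSTM(256)(Dropout(0.5)(Embedding(vocab_size, 256, mask_zero=True)(seq_in)))

merged = Dense(256, activation="relu")(add([img_vec, seq_vec]))
next_word = Dense(vocab_size, activation="softmax")(merged)  # next-word distribution

model = Model(inputs=[img_in, seq_in], outputs=next_word)
model.compile(loss="categorical_crossentropy", optimizer="adam")
print(model.output_shape)  # (None, 5000)
```

Training pairs each (image features, caption prefix) with the caption's next word; at inference the same model is called repeatedly inside the generation loop.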
Conclusion
Our solution is a model that describes an image using the features extracted from it, i.e. it generates a caption for the image. We used a Convolutional Neural Network (CNN) for feature extraction and a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) to predict the words that form a proper caption for a given image.
References
1. International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering, NCIET-2020.
2. Aditya, A. N., Anditya, A. and Suyanto (2019). "Generating Image Description on Indonesian Language using Convolutional Neural Network and Gated Recurrent Unit", 7th International Conference on Information and Communication Technology (ICoICT).
3. Chetan, A. and Vaishali, J. (2018). "Image Caption Generation using Deep Learning Technique", Fourth International Conference on Computing Communication Control and Automation (ICCUBEA).
4. Huda A. Al-muzaini, Tasniem N. and Hafida B. (2018). "Automatic Arabic Image Captioning using RNN LSTM-Based Language Model and CNN", International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 9, No. 6.
5. J. Liu, G. Wang, P. Hu, L.-Y. Duan, and A. C. Kot (2017). "Global Context-Aware Attention LSTM Networks for 3D Action Recognition", CVPR.
6. J. Lu, C. Xiong, D. Parikh, and R. Socher (2017). "Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning", CVPR.
7. S. J. Rennie, E. Marcheret, Y. Mroueh, J. Ross, and V. Goel (2017). "Self-Critical Sequence Training for Image Captioning", CVPR.
8. I. Loshchilov and F. Hutter (2016). "SGDR: Stochastic Gradient Descent with Restarts", ICLR.
9. J. Johnson, A. Karpathy, and L. Fei-Fei (2016). "DenseCap: Fully Convolutional Localization Networks for Dense Captioning", CVPR.
Thank You
