INDUSTRIAL TRAINING
Submitted in partial fulfilment of the
requirements for the award of the degree
of
Bachelor of Technology
in
Computer Science & Engineering
By:
DAMANPREET KAUR (05913202719/ CSE1/ 2019)
Department of Computer Science & Engineering
Guru Tegh Bahadur Institute of Technology
Guru Gobind Singh Indraprastha University
INTRODUCTION
• What is Machine Learning?
Machine Learning is a family of computer algorithms that learn from examples and improve themselves
without being explicitly coded by a programmer. Machine learning is a part of artificial intelligence
that combines data with statistical tools to predict outputs, which can then be turned into actionable
insights.
• What is Deep Learning?
Deep learning is a branch of machine learning, which is itself a subset of artificial intelligence. Just as
neural networks imitate the human brain, so does deep learning. In deep learning, nothing is
programmed explicitly. It is a class of machine learning methods that uses many layers of nonlinear
processing units to perform feature extraction and transformation, where each successive layer takes the
output of the preceding layer as its input.
• CNN (Convolutional Neural Network)
A CNN is a deep learning architecture that takes an input image and assigns importance (learnable weights
and biases) to various aspects or objects in the image, which helps it differentiate one image from another.
One of the most popular applications of this architecture is image classification. The network consists of
several convolutional layers interleaved with nonlinear and pooling layers. After this series of convolutional,
nonlinear, and pooling layers, a fully connected layer is attached, which takes the output of the
convolutional layers and produces the final prediction.
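The convolution-pooling-dense stack described above can be sketched in Keras as follows; the layer sizes, filter counts, and 32x32 RGB input shape are illustrative assumptions, not values from the project:

```python
from tensorflow.keras import layers, models

def build_small_cnn(num_classes=10):
    # Convolution -> nonlinearity -> pooling blocks, followed by a
    # fully connected head that produces the final classification.
    model = models.Sequential([
        layers.Input(shape=(32, 32, 3)),            # assumed input size
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),                           # bridge to the dense head
        layers.Dense(64, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),
    ])
    return model
```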
• CNN-LSTM ARCHITECTURE
The CNN-LSTM architecture uses CNN layers for feature extraction on the input data, combined with
LSTMs to support sequence prediction. This architecture is designed specifically for sequence prediction
problems with spatial inputs such as images or videos, and is widely used in activity recognition, image
description, video description, and more.
INTRODUCTION TO PROJECT
An image caption generator recognizes the context of an image and annotates it with relevant captions
using deep learning and computer vision. It labels an image with English keywords with the help of the
datasets provided during model training.
The goal of image captioning is to convert a given input image into a natural language description.
The task of image captioning can logically be divided into two modules:
1. Image-based model – extracts the features of the image.
2. Language-based model – translates the features and objects extracted by the image-based model into a
natural sentence.
For the image-based model we use a CNN, and for the language-based model we use an LSTM.
WORKFLOW OF THE PROJECT
1. Perform Data Cleaning
This step cleans the data, taking all the descriptions as input. Textual data requires several kinds of
cleaning, including converting uppercase to lowercase, removing punctuation, and removing words that
contain numbers.
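A minimal sketch of the cleaning step described above, implementing the three operations (lowercasing, punctuation removal, dropping number-containing words) with the Python standard library:

```python
import string

def clean_description(desc):
    # Convert uppercase to lowercase
    desc = desc.lower()
    # Remove all punctuation characters
    desc = desc.translate(str.maketrans('', '', string.punctuation))
    # Drop any token that is not purely alphabetic (e.g. words with numbers)
    words = [w for w in desc.split() if w.isalpha()]
    return ' '.join(words)
```

For example, `clean_description("A Dog runs, 2 fast!")` yields `"a dog runs fast"`.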
2. Loading dataset for model training
We use the Flickr8k dataset from Kaggle, which contains 8000 images with 5 captions per image. The
captions for every image in the list of photos are stored in a dictionary. To help the LSTM model identify
the beginning and end of a caption, we append a start token and an end token to each caption.
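The caption-loading step can be sketched as below. The tab-separated "image id, caption" line format and the `startseq`/`endseq` marker tokens are common conventions for this dataset but are assumptions here, not details taken from the report:

```python
def load_descriptions(caption_lines):
    # Map each image id to a list of its captions, each wrapped in
    # assumed start/end marker tokens so the LSTM can learn where
    # a caption begins and ends.
    mapping = {}
    for line in caption_lines:
        image_id, caption = line.split('\t')
        caption = 'startseq ' + caption.strip() + ' endseq'
        mapping.setdefault(image_id, []).append(caption)
    return mapping
```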
3. Tokenizing the vocabulary
Machines are not familiar with complex English words, so the model's data needs a simple numerical
representation. That is why we map every word of the vocabulary to a separate unique index value.
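The word-to-index mapping can be built as in the following sketch; reserving index 0 for padding is an assumption (it matches common Keras practice), not a detail stated in the report:

```python
def build_word_index(descriptions):
    # Collect the vocabulary across all descriptions, then assign each
    # word a unique integer index. Index 0 is left free for padding.
    vocab = sorted({w for desc in descriptions for w in desc.split()})
    return {word: idx + 1 for idx, word in enumerate(vocab)}
```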
4. Define the CNN-LSTM model
We use the Keras Model from the Functional API to define the structure of the model. It
includes:
• Feature Extractor – extracts features from the images with a dense layer.
• Sequence Processor – an embedding layer followed by an LSTM layer, which handles the textual input.
• Decoder – merges the outputs of the above two layers and processes them through a dense layer to make
the final prediction.
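The three-part model above is commonly defined with the Keras Functional API along the following lines; the layer sizes, dropout rates, and the 2048-dimensional image feature vector are illustrative assumptions rather than the project's exact configuration:

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

def define_model(vocab_size, max_length, feature_dim=2048):
    # Feature extractor: a dense layer over pre-extracted CNN image features
    inputs1 = Input(shape=(feature_dim,))
    fe1 = Dropout(0.5)(inputs1)
    fe2 = Dense(256, activation='relu')(fe1)
    # Sequence processor: embedding layer followed by an LSTM over the caption
    inputs2 = Input(shape=(max_length,))
    se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
    se2 = Dropout(0.5)(se1)
    se3 = LSTM(256)(se2)
    # Decoder: merge both streams and predict the next word in the caption
    decoder1 = add([fe2, se3])
    decoder2 = Dense(256, activation='relu')(decoder1)
    outputs = Dense(vocab_size, activation='softmax')(decoder2)
    model = Model(inputs=[inputs1, inputs2], outputs=outputs)
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model
```

The merged decoder predicts one word at a time: at each step the model is fed the image features plus the caption generated so far, and it outputs a probability distribution over the vocabulary.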
CONCLUSION
The project “Caption an Image” has been developed as per the requirement specification. It has been
developed with machine learning, using libraries such as TensorFlow and Pandas and techniques
such as LSTM and regression. The complete system has been thoroughly tested against the available data,
with throughput reports prepared manually.
The results are found to be more accurate because information is available from various levels, and the
design is flexible enough that new modules can be incorporated easily.
FUTURE SCOPE OF PROJECT
• Visually Impaired: This project can assist visually impaired people by generating captions for
photographs of their surroundings, helping them understand their environment better.
• Social Media: The model can be integrated into various social media sites to power new features such
as filters. A tourist could also scan an object and use the generated caption to research it more easily.
The model can thus be used in everyday situations easily and effectively.
• NLP Applications: Furthermore, this model can also be used in various applications of Natural
Language Processing; for example, in digital image processing it can help research and train large
models.