2. Members
➔ Harshil Jain
➔ Syed Ahmad
➔ Krishna Chaitanya Pappu
➔ Kaleemullah Mohammed
Mentor
➔ Shashank Gupta
3. Problem Description
The objective is to develop an IR/ML
system that generates hashtags for
user-generated social media content
containing images as well as text.
4. Applications
The system can be easily incorporated into
various social media platforms to generate
hashtags.
It can also be used to store metadata
about the content.
6. Data Collection and Storage
➔ We crawled Twitter for tweets containing images,
text, and hashtags using Tweepy.
➔ The image URLs were saved and the images
were downloaded later.
➔ About 15 lakh (1.5 million) tweets were crawled,
containing around 18 lakh (1.8 million,
non-distinct) hashtags.
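A minimal sketch of how the text, hashtags, and image URLs might be pulled out of one crawled tweet. It assumes each tweet is available as a dict in the Twitter v1.1 JSON layout (`entities.hashtags`, `entities.media`, `media_url`); the function name `extract_record` and the sample tweet are hypothetical, and the actual crawler code is not shown in these slides.

```python
def extract_record(tweet):
    """Return (text, hashtags, image_urls), or None if any part is missing.

    Assumes a Twitter v1.1-style tweet dict; field names are an assumption.
    """
    entities = tweet.get("entities", {})
    hashtags = [h["text"] for h in entities.get("hashtags", [])]
    image_urls = [
        m["media_url"]
        for m in entities.get("media", [])
        if m.get("type") == "photo"
    ]
    text = tweet.get("text", "").strip()
    # Keep only tweets that have text, at least one hashtag, and an image.
    if not (text and hashtags and image_urls):
        return None
    return text, hashtags, image_urls


# Hypothetical sample tweet for illustration only.
sample = {
    "text": "Sunset at the beach #sunset #travel",
    "entities": {
        "hashtags": [{"text": "sunset"}, {"text": "travel"}],
        "media": [{"type": "photo", "media_url": "http://example.com/a.jpg"}],
    },
}
record = extract_record(sample)
```

Storing only the URL at crawl time keeps the crawl fast; the images can then be fetched in a separate pass.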
7. Data Cleaning
➔ The tweets were tokenized and a vocabulary
was made out of the tweets after basic text
processing.
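The tokenization and vocabulary step could look roughly like the sketch below. The exact "basic text processing" is not specified in the slides, so the choices here (lowercasing, dropping URLs and @-mentions, the `<pad>`/`<unk>` entries, and the `min_count` parameter) are assumptions for illustration.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase, strip URLs and @-mentions, and split into word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop mentions
    return TOKEN_RE.findall(text)


def build_vocab(tweets, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for t in tweets for tok in tokenize(t))
    vocab = {"<pad>": 0, "<unk>": 1}
    # Sort by frequency, then alphabetically, so ids are deterministic.
    for tok, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if c >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert a tweet to a list of token ids, using <unk> for unseen words."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)]


tweets = ["Loving the beach http://t.co/x", "the beach is great @bob"]
vocab = build_vocab(tweets)
ids = encode("the ocean", vocab)
```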
➔ Some images had already been removed from the
web by the time we tried to download them.
➔ We therefore filtered out the tweets with missing
images before training the model.
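One simple way to do this filtering is to keep only the records whose image file exists on disk and is non-empty. This is a sketch under assumptions: the `image_file` key and the `filter_available` helper are hypothetical names, and the slides do not say how the check was actually done.

```python
import os


def filter_available(records, image_dir):
    """Keep records whose image was downloaded successfully.

    A record is kept if its file exists in image_dir and is not empty.
    """
    kept = []
    for rec in records:
        path = os.path.join(image_dir, rec["image_file"])
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            kept.append(rec)
    return kept
```

A stricter check would also try to decode each file, since a failed download can leave behind an HTML error page instead of an image.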
9. Input
➔ Tweet text and Image
Output
➔ Hashtags generated on a character level
10. Workflow
➔ Features
◆ Image - Extracted from a pre-trained CNN
◆ Tweet - GloVe embeddings fed into an LSTM
➔ The two feature vectors are concatenated into a single
feature vector.
➔ The concatenated vector is fed to a character-level
language model (LSTM), which generates the
hashtags.
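The control flow of this pipeline can be sketched in plain Python as below. This is not the trained model: `toy_step` is a hypothetical stand-in for the character-level LSTM, the feature values are made up, and greedy decoding with a newline end marker is an assumption (the slides do not state the decoding strategy).

```python
def combine_features(image_feat, text_feat):
    """Concatenate the image and text feature vectors into one vector."""
    return list(image_feat) + list(text_feat)


def greedy_decode(feature, step_fn, max_len=40, end_char="\n"):
    """Generate one character at a time until end_char or max_len.

    step_fn(feature, prev_char, state) must return (probs, new_state),
    where probs maps each candidate character to its probability.
    """
    out, state, prev = [], None, ""
    for _ in range(max_len):
        probs, state = step_fn(feature, prev, state)
        ch = max(probs, key=probs.get)  # pick the most likely character
        if ch == end_char:
            break
        out.append(ch)
        prev = ch
    return "".join(out)


def toy_step(feature, prev_char, state):
    """Stand-in for the LSTM: spells out a fixed string, ignoring its inputs."""
    target = "#beach #sun\n"
    i = 0 if state is None else state
    return {target[i]: 1.0}, i + 1


feature = combine_features([0.1, 0.2], [0.3])  # made-up feature values
generated = greedy_decode(feature, toy_step)
```

In the real system, `step_fn` would run one LSTM step conditioned on the concatenated feature vector and return a distribution over the character set.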
13. Challenges
➔ Each entry has multiple hashtags, so it was not obvious
how to feed the targets to the network.
➔ We considered a couple of approaches and ended up
feeding the hashtags as “#Hash1 #Hash2 #Hash3”.
➔ Choosing the loss function and the evaluation metric was
difficult because the model is generative. We used
cross-entropy as the training loss function.
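The target format and the training loss can be illustrated as follows. This is a sketch, assuming the loss is averaged over the characters of the target string; `make_target` and `char_cross_entropy` are hypothetical helper names, and the predicted distributions below are made-up numbers.

```python
import math


def make_target(hashtags):
    """Join a tweet's hashtags into a single target string."""
    return " ".join("#" + h for h in hashtags)


def char_cross_entropy(pred_dists, target):
    """Mean negative log-likelihood of the target characters.

    pred_dists[i] maps characters to predicted probabilities at step i.
    """
    if len(pred_dists) != len(target):
        raise ValueError("one predicted distribution is needed per character")
    eps = 1e-12  # avoids log(0) when a character gets zero probability
    total = 0.0
    for dist, ch in zip(pred_dists, target):
        total += -math.log(dist.get(ch, 0.0) + eps)
    return total / len(target)


target = make_target(["Hash1", "Hash2", "Hash3"])

# Made-up predictions for the two-character target "#a".
loss = char_cross_entropy(
    [{"#": 0.5, "a": 0.5}, {"a": 0.25, "b": 0.75}],
    "#a",
)
```

Cross-entropy rewards the model for matching the reference string character by character, so it is sensitive to hashtag order. This is one reason evaluating a generative hashtag model is harder than training it.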