Welcome
Name: Vinothkumar
Course: BSc Computer Science
Semester: Final Year
Batch: 2018-2022
[College Logo]
Flickr8k Dataset Image Captioning Project
Student: Vinothkumar
Register No.: XXXXXXX
Guide: [Faculty Advisor Name]
[College Logo]
Problem Statement (Abstract)
• Huge image collections make manual captioning impractical.
• Goal: Automatically generate captions for images.
• Useful for accessibility, image search, and photo organization.
Introduction
• Combines Computer Vision + NLP.
• Dataset: Flickr8k (images + human-written captions).
• Scope: Feature extraction, preprocessing, model training, caption generation, evaluation.
Existing System (Literature Survey)
• Manual tagging or keyword-based search.
• Drawbacks: Time-consuming, inaccurate, and not scalable.
Proposed System
• Encoder-decoder architecture (VGG16 encoder + LSTM decoder).
• Generates captions for unseen images.
• Advantages: Scalable, automated, and improves accessibility.
System Design
• Data Flow Diagram (DFD)
• UML (Use Case / Class)
• ER Diagram
• Flowchart: preprocessing → training → captioning
Hardware and Software Requirements
Hardware:
• Processor: Intel i5 or above
• RAM: 8 GB minimum
• GPU recommended
Software:
• Python, TensorFlow/Keras
• NLTK, NumPy, Matplotlib
Modules Overview
1. Image Feature Extraction (VGG16)
2. Caption Preprocessing
3. Encoder-Decoder Model
4. Training & Validation
5. Inference & Caption Generation
6. Evaluation (BLEU Scores)
Module Explanation - Part 1
• Image Feature Extraction: VGG16's penultimate layer gives a 4096-dimensional feature vector per image (sketch below).
• Caption Preprocessing: cleaning, tokenization, padding (sketch below).
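
A minimal sketch of the feature-extraction step with Keras, assuming the standard approach of taking VGG16's second-to-last fully connected layer (fc2), which is where the 4096 dimensions come from; the image path is illustrative:

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Drop the final softmax layer; keep the 4096-dim fc2 output.
base = VGG16()
model = Model(inputs=base.inputs, outputs=base.layers[-2].output)

def extract_features(image_path):
    # VGG16 expects 224x224 RGB input.
    image = load_img(image_path, target_size=(224, 224))
    array = img_to_array(image)
    array = preprocess_input(np.expand_dims(array, axis=0))
    return model.predict(array, verbose=0)[0]  # shape: (4096,)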
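
A minimal preprocessing sketch; the sample captions are illustrative, and the startseq/endseq markers follow the convention visible in the example output later in the deck:

import re
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

def clean_caption(text):
    # Lowercase, drop punctuation/digits and single letters,
    # then wrap with start/end markers.
    text = re.sub(r'[^a-z ]', '', text.lower())
    words = [w for w in text.split() if len(w) > 1]
    return 'startseq ' + ' '.join(words) + ' endseq'

raw_captions = ['A dog runs across the grass.', 'Two children play ball.']  # illustrative
captions = [clean_caption(c) for c in raw_captions]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(captions)
vocab_size = len(tokenizer.word_index) + 1          # +1 for the padding index 0
max_length = max(len(c.split()) for c in captions)  # longest caption, in words
padded = pad_sequences(tokenizer.texts_to_sequences(captions), maxlen=max_length)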
Module Explanation - Part 2
• Encoder-Decoder Model: image encoder + LSTM decoder (sketch below).
• Training & Validation: data generator, epochs (generator sketch below).
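
One common way to wire the two branches together in Keras; a sketch, where the 256-unit layer sizes are assumptions, not taken from the slides:

from tensorflow.keras.layers import Input, Dense, LSTM, Embedding, Dropout, add
from tensorflow.keras.models import Model

# vocab_size and max_length come from the preprocessing step.
# Image branch: project the 4096-dim VGG16 feature into the decoder space.
inputs1 = Input(shape=(4096,))
fe1 = Dropout(0.5)(inputs1)
fe2 = Dense(256, activation='relu')(fe1)

# Text branch: embed the partial caption and run it through an LSTM.
inputs2 = Input(shape=(max_length,))
se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
se2 = Dropout(0.5)(se1)
se3 = LSTM(256)(se2)

# Merge both branches and predict the next word.
decoder1 = add([fe2, se3])
decoder2 = Dense(256, activation='relu')(decoder1)
outputs = Dense(vocab_size, activation='softmax')(decoder2)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')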
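
A sketch of a generator that streams (image feature, partial caption) → next-word pairs, using the yield format that tf.keras's model.fit accepts:

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

def data_generator(captions, features, tokenizer, max_length, vocab_size):
    # captions: dict image_id -> list of cleaned caption strings
    # features: dict image_id -> 4096-dim VGG16 feature vector
    while True:
        for image_id, caption_list in captions.items():
            feature = features[image_id]
            for caption in caption_list:
                seq = tokenizer.texts_to_sequences([caption])[0]
                for i in range(1, len(seq)):
                    in_seq = pad_sequences([seq[:i]], maxlen=max_length)[0]
                    out_word = to_categorical([seq[i]], num_classes=vocab_size)[0]
                    # model.fit expects an (inputs, targets) tuple,
                    # with the two model inputs grouped together.
                    yield (np.array([feature]), np.array([in_seq])), np.array([out_word])

The generator is then passed straight to model.fit with a steps_per_epoch value.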
Module Explanation - Part 3
• Caption Generation: predict_caption, greedy word-by-word decoding (sketch below).
• Evaluation: BLEU scores, example output (sketch below).
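
The predict_caption name comes from the slides; its body here is a hedged reconstruction of standard greedy decoding:

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def predict_caption(model, feature, tokenizer, max_length):
    # Greedy decoding: start from 'startseq' and append the most
    # probable next word until 'endseq' or the length limit.
    index_to_word = {i: w for w, i in tokenizer.word_index.items()}
    caption = 'startseq'
    for _ in range(max_length):
        seq = tokenizer.texts_to_sequences([caption])[0]
        seq = pad_sequences([seq], maxlen=max_length)
        yhat = model.predict([np.array([feature]), seq], verbose=0)
        word = index_to_word.get(int(np.argmax(yhat)))
        if word is None:
            break
        caption += ' ' + word
        if word == 'endseq':
            break
    return caption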
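
BLEU-1 and BLEU-2 can be computed with NLTK's corpus_bleu; the reference and hypothesis lists below are illustrative:

from nltk.translate.bleu_score import corpus_bleu

# references: one list of tokenized human captions per test image
# hypotheses: one tokenized generated caption per test image
references = [[['a', 'dog', 'runs'], ['dog', 'running']]]  # illustrative
hypotheses = [['a', 'dog', 'runs']]                        # illustrative

print('BLEU-1: %.2f' % corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0)))
print('BLEU-2: %.2f' % corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0)))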
Output & Results
• Training failed due to a TypeError in the data generator (see the note below).
• Example caption: 'startseq ended ended...'
• BLEU-1 = 0.02, BLEU-2 = 0.00
• Lesson: Data formatting is critical.
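
The slides do not show the traceback, so this is only a plausible reconstruction of the failure: tf.keras's model.fit is strict about how a generator packages each batch.

import numpy as np

# Dummy shapes, only to contrast the two yield formats.
X1 = np.zeros((1, 4096))  # image-feature batch
X2 = np.zeros((1, 20))    # padded word-index batch
y = np.zeros((1, 5000))   # one-hot next-word batch

def broken_generator():
    while True:
        # Old fit_generator-style nesting; model.fit can
        # reject this with a TypeError.
        yield [[X1, X2], y]

def fixed_generator():
    while True:
        # Accepted format: an (inputs, targets) tuple, with the
        # two model inputs grouped as the first element.
        yield (X1, X2), y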
Conclusion
Expected Outcome:
• Automated caption generation.
• Applications: Accessibility, search, tagging.
Future Scope:
• Fix the training error.
• Use larger datasets (Flickr30k, MS COCO).
• Explore Transformers & attention mechanisms.
Thank You
Acknowledgement:
• Guide
• Review Committee
— Vinothkumar
