PROGRESS REPORT
Computer Vision Capstone Project
Manga Dialogue Extraction using Computer Vision Techniques
– Group 16
Presentation Outline
• Problem Statement
• Motivation
• Problem Approach
• Expected Outcomes
• Tools and Datasets Preparation
• Implementation Timeline
• Conclusion
Part 1/7: Problem Statement
• Manga Dialogue Extraction is the task of automatically
identifying and extracting text from speech bubbles in manga
pages using Computer Vision techniques.
• Expected output (example): → “did you…” → “run out of cash?”
• Key objectives of Dialogue Extraction:
• Detect speech bubbles accurately, even in complex artwork.
• Extract the dialogue text inside these bubbles for further processing (e.g.,
translation, dubbing, voiceover).
• Handle challenges such as complicated backgrounds and diverse bubble
and text styles.
Part 2/7: Motivation
• Manga dialogue within speech bubbles is essential for various
tasks, including translation, audiobooks, and manga-reading apps.
• Manual text extraction is slow and error-prone, and manga
pages themselves pose challenges:
• Irregular bubble shapes,
• Overlapping text and artwork,
• Diverse fonts.
• An automated system using Computer Vision can make dialogue
extraction faster, more accurate, and scalable for real-world
applications.
Part 3/7: Problem Approach
• Speech Bubble Detection:
• This step identifies the locations of speech bubbles that contain dialogue.
• Instead of relying on deep learning models, we explore traditional vision
techniques:
• Image Filters: e.g., Gaussian (smoothing), Sobel & Laplacian (edge detection)
• Morphological Operations: e.g., dilation, erosion, opening to enhance bubble
shapes
• Histogram Methods: for analyzing intensity changes and edge patterns
• Contour & Shape Analysis: to detect rounded or elliptical regions typical of bubbles
Part 3/7: Problem Approach
• Text Extraction from Bubbles:
• Bubbles are first localized to:
• Reduce noise from the background and non-text regions
• Improve OCR accuracy and overall system speed
• Once bubbles are localized, OCR tools (e.g., Tesseract, EasyOCR) are
applied to extract dialogue.
Part 4/7: Project Outcomes
• Performance Comparison:
• Evaluate traditional vs. deep learning methods for speech bubble detection
• Metrics: Accuracy, Speed, and Memory Efficiency
• End-to-End CLI Application:
• Input: Manga-style images
• Output: Extracted dialogue in JSON or TXT
• Pipeline: Detect speech bubbles → Extract text using OCR
• Final Report
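The planned CLI pipeline (image in, JSON or TXT out) could be wired up as below. The stage interfaces and the stub detector/OCR lambdas are assumptions for illustration; the real stages would be the detection and OCR code from Part 3:

```python
import argparse
import json

def run_pipeline(image_path, detector, ocr):
    """Detect bubbles, OCR each one, return a JSON-serializable record."""
    boxes = detector(image_path)
    dialogues = [{"box": list(b), "text": ocr(image_path, b)} for b in boxes]
    return {"image": image_path, "dialogues": dialogues}

def main(argv=None):
    parser = argparse.ArgumentParser(description="Manga dialogue extraction")
    parser.add_argument("image")
    parser.add_argument("--format", choices=["json", "txt"], default="json")
    args = parser.parse_args(argv)
    # Stub stages so the CLI runs end to end; swap in real detection and OCR
    result = run_pipeline(args.image,
                          detector=lambda _path: [(10, 20, 80, 40)],
                          ocr=lambda _path, _box: "did you...")
    if args.format == "json":
        print(json.dumps(result, ensure_ascii=False))
    else:
        print("\n".join(d["text"] for d in result["dialogues"]))

# Example invocation with a hypothetical input file
main(["sample_page.jpg", "--format", "txt"])
```

Keeping the detector and OCR engine as injected callables makes the traditional-CV vs. deep-learning comparison a one-line swap in the benchmark harness.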
Part 5/7: Tools and Datasets Preparation
• Datasets:
• Manga Collection:
• Manga pages are crawled from online sources and saved as .jpeg images.
• The dataset includes a variety of visual styles and page layouts.
• Annotation for Training:
• For deep learning methods used as benchmarks, annotated datasets are required.
• We use the Roboflow platform to label speech bubble regions for training purposes.
• Tools and Technologies:
• Programming language: Python
• Libraries: OpenCV, Tesseract OCR, EasyOCR, Ultralytics
Part 6/7: Implementation Timeline
• Data Collection & Annotation
• Crawl manga images & annotate speech bubbles on Roboflow
• Speech Bubble Detection
• Implement traditional CV & benchmark YOLOv8
• OCR Integration
• Extract dialogue using Tesseract / EasyOCR
• Application & Evaluation
• Build CLI app and test performance
• Report Finalization
• Document methods, results, and challenges
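For the evaluation phase, detections from either method (traditional CV or YOLOv8) can be scored against the Roboflow annotations with intersection-over-union. A minimal sketch; boxes are `(x, y, w, h)` tuples and the 0.5 match threshold is a common convention, not a project-specified value:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extents along each axis (zero if the boxes are disjoint)
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def detection_accuracy(predictions, ground_truth, threshold=0.5):
    """Fraction of annotated bubbles matched by at least one prediction."""
    hits = sum(1 for g in ground_truth
               if any(iou(p, g) >= threshold for p in predictions))
    return hits / len(ground_truth) if ground_truth else 0.0
```

Running both detectors through the same scorer (plus wall-clock and memory measurements) yields the accuracy/speed/memory comparison listed under Part 4.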
Part 7/7: Conclusion
• We presented a system for extracting dialogue from manga using
computer vision techniques.
• The approach was divided into two main steps:
• Speech bubble detection using traditional methods and deep learning
• Text extraction using OCR tools
• The CLI-based tool and annotated dataset provide a foundation
for further development.
• Future work includes:
• Improving detection accuracy for complex layouts
• Adapting OCR for stylized and handwritten manga fonts
• Expanding to multi-language manga content
THANK YOU!