PROGRESS REPORT
Computer Vision Capstone Project
Manga Dialogue Extraction using Computer Vision Techniques
– Group 16
Presentation Outline
• Problem Statement
• Motivation
• Problem Approach
• Expected Outcomes
• Tools and Datasets Preparation
• Implementation Timeline
• Conclusion
Part 1/7: Problem Statement
• Manga Dialogue Extraction is the task of automatically
identifying and extracting text from speech bubbles in manga
pages using Computer Vision techniques.
• Expected output (example): → “did you…” → “run out of cash?”
• Key objectives of Dialogue Extraction:
• Detect speech bubbles accurately, even in complex artwork.
• Extract the dialogue text inside these bubbles for further processing (e.g.,
translation, dubbing, voiceover).
• Handle challenges such as complicated backgrounds and diverse bubble
and text styles.
Part 2/7: Motivation
• Manga dialogue within speech bubbles is essential for various
tasks, including translation, audiobooks, and manga-reading apps.
• Manual text extraction is slow and error-prone, and manga
pages themselves pose challenges:
• Irregular bubble shapes,
• Overlapping text and artwork,
• Diverse fonts.
• An automated system using Computer Vision can make dialogue
extraction faster, more accurate, and scalable for real-world
applications.
Part 3/7: Problem Approach
• Speech Bubble Detection:
• This step identifies the locations of speech bubbles that contain dialogue.
• Instead of relying on deep learning models, we explore traditional vision
techniques:
• Image Filters: e.g., Gaussian (smoothing), Sobel & Laplacian (edge detection)
• Morphological Operations: e.g., dilation, erosion, opening to enhance bubble
shapes
• Histogram Methods: for analyzing intensity changes and edge patterns
• Contour & Shape Analysis: to detect rounded or elliptical regions typical of bubbles
Part 3/7: Problem Approach
• Text Extraction from Bubbles:
• Bubbles are first localized to:
• Reduce noise from the background and non-text regions
• Improve OCR accuracy and overall system speed
• Once bubbles are localized, OCR tools (e.g., Tesseract, EasyOCR) are
applied to extract dialogue.
Part 4/7: Project Outcomes
• Performance Comparison:
• Evaluate traditional vs. deep learning methods for speech bubble detection
• Metrics: Accuracy, Speed, and Memory Efficiency
• End-to-End CLI Application:
• Input: Manga-style images
• Output: Extracted dialogue in JSON or TXT
• Pipeline: Detect speech bubbles → Extract text using OCR
• Final Report
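The planned CLI pipeline (image in, JSON or TXT out) could be wired up as below. The stage interfaces and the stub detector/OCR lambdas are assumptions for illustration; the real stages would be the detection and OCR code from Part 3:

```python
import argparse
import json

def run_pipeline(image_path, detector, ocr):
    """Detect bubbles, OCR each one, return a JSON-serializable record."""
    boxes = detector(image_path)
    dialogues = [{"box": list(b), "text": ocr(image_path, b)} for b in boxes]
    return {"image": image_path, "dialogues": dialogues}

def main(argv=None):
    parser = argparse.ArgumentParser(description="Manga dialogue extraction")
    parser.add_argument("image")
    parser.add_argument("--format", choices=["json", "txt"], default="json")
    args = parser.parse_args(argv)
    # Stub stages so the CLI runs end to end; swap in real detection and OCR
    result = run_pipeline(args.image,
                          detector=lambda _path: [(10, 20, 80, 40)],
                          ocr=lambda _path, _box: "did you...")
    if args.format == "json":
        print(json.dumps(result, ensure_ascii=False))
    else:
        print("\n".join(d["text"] for d in result["dialogues"]))

# Example invocation with a hypothetical input file
main(["sample_page.jpg", "--format", "txt"])
```

Keeping the detector and OCR engine as injected callables makes the traditional-CV vs. deep-learning comparison a one-line swap in the benchmark harness.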
Part 5/7: Tools and Datasets Preparation
• Datasets:
• Manga Collection:
• Manga pages are crawled from online sources and saved as .jpeg images.
• The dataset includes a variety of visual styles and page layouts.
• Annotation for Training:
• For deep learning methods used as benchmarks, annotated datasets are required.
• We use the Roboflow platform to label speech bubble regions for training purposes.
• Tools and Technologies:
• Programming language: Python
• Libraries: OpenCV, Tesseract OCR, EasyOCR, Ultralytics
Part 6/7: Implementation Timeline
• Data Collection & Annotation
• Crawl manga images & annotate speech bubbles on Roboflow
• Speech Bubble Detection
• Implement traditional CV & benchmark YOLOv8
• OCR Integration
• Extract dialogue using Tesseract / EasyOCR
• Application & Evaluation
• Build CLI app and test performance
• Report Finalization
• Document methods, results, and challenges
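For the evaluation phase, detections from either method (traditional CV or YOLOv8) can be scored against the Roboflow annotations with intersection-over-union. A minimal sketch; boxes are `(x, y, w, h)` tuples and the 0.5 match threshold is a common convention, not a project-specified value:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extents along each axis (zero if the boxes are disjoint)
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def detection_accuracy(predictions, ground_truth, threshold=0.5):
    """Fraction of annotated bubbles matched by at least one prediction."""
    hits = sum(1 for g in ground_truth
               if any(iou(p, g) >= threshold for p in predictions))
    return hits / len(ground_truth) if ground_truth else 0.0
```

Running both detectors through the same scorer (plus wall-clock and memory measurements) yields the accuracy/speed/memory comparison listed under Part 4.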
Part 7/7: Conclusion
• We presented a system for extracting dialogue from manga using
computer vision techniques.
• The approach was divided into two main steps:
• Speech bubble detection using traditional methods and deep learning
• Text extraction using OCR tools
• The CLI-based tool and annotated dataset provide a foundation
for further development.
• Future work includes:
• Improving detection accuracy for complex layouts
• Adapting OCR for stylized and handwritten manga fonts
• Expanding to multi-language manga content
THANK YOU!