Part 1/7: Problem Statement
• Manga Dialogue Extraction is the task of automatically identifying and extracting text from speech bubbles in manga pages using Computer Vision techniques.
• Expected output (for a sample page): “did you…” → “run out of cash?”
• Key objectives of Dialogue Extraction:
• Detect speech bubbles accurately, even in complex artwork.
• Extract the dialogue text inside these bubbles for further processing (e.g., translation, dubbing, voiceover).
• Handle challenges such as complicated backgrounds, diverse bubble styles, and varied text.
Part 2/7: Motivation
• Manga dialogue within speech bubbles is essential for various tasks, including translation, audiobooks, and manga-reading apps.
• Manual text extraction is slow and error-prone, while manga pages themselves pose several challenges:
• Irregular bubble shapes,
• Overlapping text and artwork,
• Diverse fonts.
• An automated system using Computer Vision can make dialogue extraction faster, more accurate, and scalable for real-world applications.
Part 3/7: Problem Approach
• Speech Bubble Detection:
• This step identifies the locations of speech bubbles that contain dialogue.
• Instead of relying on deep learning models, we explore traditional vision techniques (a code sketch follows this list):
• Image Filters: e.g., Gaussian (smoothing), Sobel & Laplacian (edge detection)
• Morphological Operations: e.g., dilation, erosion, opening to enhance bubble
shapes
• Histogram Methods: for analyzing intensity changes and edge patterns
• Contour & Shape Analysis: to detect rounded or elliptical regions typical of bubbles
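A minimal sketch of this traditional detection pipeline in OpenCV; the thresholds, kernel sizes, circularity cutoff, and the input path "page.jpg" are illustrative assumptions, not tuned values:

    import cv2
    import numpy as np

    def detect_bubble_candidates(image_path):
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)              # smooth scan noise
        # Bubbles are usually bright regions with dark outlines: Otsu binarization.
        _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        # Opening removes small specks; closing fills gaps inside bubble interiors.
        cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
        cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
        contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        boxes = []
        for c in contours:
            area = cv2.contourArea(c)
            if area < 1500:                                       # skip tiny regions
                continue
            perimeter = cv2.arcLength(c, True)
            circularity = 4 * np.pi * area / (perimeter ** 2 + 1e-6)
            if circularity > 0.3:                                 # keep roundish shapes
                boxes.append(cv2.boundingRect(c))                 # (x, y, w, h)
        return boxes

    if __name__ == "__main__":
        print(detect_bubble_candidates("page.jpg"))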
Part 3/7: Problem Approach
• Text Extraction from Bubbles:
• Localizing the bubbles first helps to:
• Reduce noise from the background and non-text regions
• Improve OCR accuracy and overall system speed
• Once bubbles are localized, OCR tools (e.g., Tesseract, EasyOCR) are applied to extract the dialogue (see the sketch below).
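A minimal sketch of the OCR step with pytesseract, assuming bounding boxes from the detection stage; the Tesseract binary must be installed, and the "--psm 6" setting and the output dictionary format are illustrative choices:

    import cv2
    import pytesseract

    def extract_dialogue(image_path, boxes):
        page = cv2.imread(image_path)
        dialogues = []
        for (x, y, w, h) in boxes:
            crop = page[y:y + h, x:x + w]                         # cut out one bubble
            gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
            # --psm 6 treats the crop as a single uniform block of text.
            text = pytesseract.image_to_string(gray, config="--psm 6").strip()
            if text:
                dialogues.append({"box": [x, y, w, h], "text": text})
        return dialogues

For stylized fonts, EasyOCR could be swapped in, e.g. easyocr.Reader(["en"]).readtext(crop).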
Part 4/7: Project Outcomes
• Performance Comparison:
• Evaluate traditional vs. deep learning methods for speech bubble detection
• Metrics: Accuracy, Speed, and Memory Efficiency
• End-to-End CLI Application:
• Input: Manga-style images
• Output: Extracted dialogue in JSON or TXT
• Pipeline: Detect speech bubbles → Extract text using OCR (see the CLI sketch below)
• Final Report
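A minimal CLI sketch wiring the two stages together; detect_bubble_candidates and extract_dialogue are the illustrative helpers sketched earlier, imported here from a hypothetical pipeline module:

    import argparse
    import json

    from pipeline import detect_bubble_candidates, extract_dialogue  # hypothetical module

    def main():
        parser = argparse.ArgumentParser(description="Extract dialogue from a manga page")
        parser.add_argument("image", help="path to a manga page image")
        parser.add_argument("--format", choices=["json", "txt"], default="json")
        args = parser.parse_args()

        boxes = detect_bubble_candidates(args.image)
        dialogues = extract_dialogue(args.image, boxes)

        if args.format == "json":
            print(json.dumps(dialogues, ensure_ascii=False, indent=2))
        else:
            print("\n".join(d["text"] for d in dialogues))

    if __name__ == "__main__":
        main()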
Part 5/7: Tools and Dataset Preparation
• Datasets:
• Manga Collection:
• Manga pages are crawled from online sources and saved as .jpeg images.
• The dataset includes a variety of visual styles and page layouts.
• Annotation for Training:
• For deep learning methods used as benchmarks, annotated datasets are required.
• We use the Roboflow platform to label speech bubble regions for training purposes.
• Tools and Technologies:
• Programming language: Python
• Libraries: OpenCV, Tesseract OCR, EasyOCR, Ultralytics (for the YOLOv8 benchmark; see the sketch below)
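A minimal sketch of the YOLOv8 benchmark with Ultralytics, assuming the annotated Roboflow export is available as data/bubbles.yaml (a hypothetical path); the checkpoint and hyperparameters are illustrative:

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")                        # small pretrained checkpoint
    model.train(data="data/bubbles.yaml", epochs=50, imgsz=640)

    results = model("page.jpg")                       # inference on one manga page
    for box in results[0].boxes.xyxy:                 # predicted bubble boxes (x1, y1, x2, y2)
        print(box.tolist())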
Part 6/7: Implementation Timeline
• Data Collection & Annotation
• Crawl manga images & annotate speech bubbles on Roboflow
• Speech Bubble Detection
• Implement traditional CV & benchmark YOLOv8
• OCR Integration
• Extract dialogue using Tesseract / EasyOCR
• Application & Evaluation
• Build CLI app and test performance
• Report Finalization
• Document methods, results, and challenges
Part 7/7: Conclusion
• We presented a system for extracting dialogue from manga using computer vision techniques.
• The approach was divided into two main steps:
• Speech bubble detection using traditional methods and deep learning
• Text extraction using OCR tools
• The CLI-based tool and annotated dataset provide a foundation
for further development.
• Future work includes:
• Improving detection accuracy for complex layouts
• Adapting OCR for stylized and handwritten manga fonts
• Expanding to multi-language manga content