A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recognition
Matthias Arnold, HRA, Universität Heidelberg
Agenda
• Project history
• Approaching the material
• Achievements
• Automatic object detection
• New system and user annotation
Matthias Arnold, HRA, Uni Heidelberg - Chinese Comics Database | Workshop Comics Annotation | Potsdam 2018-06-19
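"Street lending library for children" (Leihbibliothek für Kinder an der Straße), She Zeh Tschi, Der Holzschnitt im Neuen China (The Woodcut in New China), catalogue, Dresden 1951, p. 101
The picture shows children in the early 1980s lingering at a xiaorenshu (picture-book) stall. http://www.dili360.com/ch/article/p5350c3d9d48d394.htm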
Project History
2009-11 Main digitisation project
Funding: Cluster of Excellence "Asia and Europe" and Institute of Chinese Studies
Scanning: MediaLab at the Cluster
First database: eXist-db, metadata in MODS XML
Presentation at Cartoon Museum Basel "Visual Words - Comics from China" (2010/11)
Content expansion and separation of books and stories
Image analysis project with computer vision (radiances)
End-user interface (2010)
Project History (continued)
2018: Data migration (ongoing):
MongoDB, XML ingest, image service (IIIF), browse / search / filter, user annotation through the Mirador viewer
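The 2018 migration ingests XML metadata into MongoDB. As a minimal sketch, assuming MODS records like those of the first eXist-db database, one record could be flattened into a JSON-style document before insertion. The sample record, the selected fields, and the function name `mods_to_document` are hypothetical illustrations, not the project's actual ingest code.

```python
import xml.etree.ElementTree as ET

# Namespace of MODS version 3 (Library of Congress schema)
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical sample record; real records carry many more elements
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>红灯记</title></titleInfo>
  <name>
    <namePart>Example Artist</namePart>
    <role><roleTerm type="text">illustrator</roleTerm></role>
  </name>
  <originInfo><dateIssued>1970</dateIssued></originInfo>
</mods>"""


def mods_to_document(xml_text: str) -> dict:
    """Flatten a few MODS fields into a plain dict for a document store."""
    root = ET.fromstring(xml_text)
    # All non-empty titles of the record
    titles = [t.text for t in root.findall("mods:titleInfo/mods:title", MODS_NS) if t.text]
    # One entry per <name>, keeping the display name and its role
    agents = [
        {
            "name": name.findtext("mods:namePart", default="", namespaces=MODS_NS),
            "role": name.findtext("mods:role/mods:roleTerm", default="", namespaces=MODS_NS),
        }
        for name in root.findall("mods:name", MODS_NS)
    ]
    date = root.findtext("mods:originInfo/mods:dateIssued", namespaces=MODS_NS)
    return {"titles": titles, "agents": agents, "dateIssued": date}


if __name__ == "__main__":
    print(mods_to_document(SAMPLE_MODS))
```

With the `pymongo` driver, such a dict could then be stored via `collection.insert_one(doc)`; which MODS fields the project actually keeps is not stated on the slides.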
Approaching the material
趙百萬 Zhao Baiwan (1951)
红灯记 Hong deng ji (1970)
鲁迅和青年的故事 Lu Xun he qingnian de gushi (1976)
1. Spreadsheets
2. XML database (MODS records) and frontend
3. Refined metadata schema (books & stories, agents), SQL database
4. MongoDB, IIIF image service, user annotations in Mirador
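The refined schema of stage 3 separates books from stories and treats agents as entities of their own. A minimal sketch of such a structure, assuming a book bundles one or more stories and each agent may carry an authority link (GND and VIAF are the authorities named later in the deck). The class names, fields, and the helper `agents_of` are hypothetical and do not reproduce the project's actual schema; the authority identifiers used with it should be treated as placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# URI prefixes of the two authority files named in the deck
AUTHORITY_BASES = {
    "gnd": "https://d-nb.info/gnd/",
    "viaf": "https://viaf.org/viaf/",
}


@dataclass
class Agent:
    """A person or group credited on a book or story (e.g. artist, author, editor)."""
    name: str
    authority: Optional[str] = None      # "gnd" or "viaf"
    authority_id: Optional[str] = None   # identifier within that authority file

    def authority_uri(self) -> Optional[str]:
        """Return a resolvable authority URI, or None if the agent is not linked."""
        if self.authority is None or self.authority_id is None:
            return None
        return AUTHORITY_BASES[self.authority] + self.authority_id


@dataclass
class Story:
    """A single narrative; the same story may be printed in several books."""
    title: str
    agents: list[Agent] = field(default_factory=list)


@dataclass
class Book:
    """A physical volume that may bundle one or more stories."""
    title: str
    year: int
    stories: list[Story] = field(default_factory=list)


def agents_of(book: Book) -> list[str]:
    """Collect distinct agent names across all stories of a book, in order of appearance."""
    seen: list[str] = []
    for story in book.stories:
        for agent in story.agents:
            if agent.name not in seen:
                seen.append(agent.name)
    return seen
```

Keeping agents as shared objects means one authority link serves every story an artist worked on, instead of repeating name strings per book.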
Special cases
生死緣 Sheng si yuan (1953)
高歌猛进 Gaoge mengjin (1952)
Project Achievements
1. Scans from 5 collections (1250/1031 books)
2. Digitize material (grayscale .tif @ 600 dpi, ca. 4 TB)
3. Record all metadata (as provided on books)
4. Provide online access to metadata
5. Provide online access to full books (read books)
6. Open data for research annotations (Mirador)
7. Explore automatic content analysis (computer vision)
8. *Link data to authorities (agents)
9. **Generate fulltext
10. **Explore generic content description (XML)
Automatic object detection
• Chinese comics from the second half of the Cultural Revolution
• Over 1200 books (~120,000 pages)
• Grayscale .tif images @ 600 dpi, ~4.5 TB of data
• Focus: comic book production of the late 1960s and 1970s from China
• Shows the diversity of Chinese comic production in general
• Special type of emphasis (heroes, symbolic objects or idols)
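The corpus figures allow a rough plausibility check: about 4.5 TB spread over about 120,000 pages is roughly 37.5 MB per page. For an uncompressed 8-bit grayscale scan, file size is approximately (width × dpi) × (height × dpi) bytes. The sketch below assumes decimal terabytes, one byte per pixel, and no compression or header overhead; the 5 × 7 inch scan area is a hypothetical input, not a measured page size.

```python
TB = 10**12  # decimal terabyte, as storage sizes are usually quoted


def average_bytes_per_page(total_bytes: float, pages: int) -> float:
    """Mean storage per scanned page."""
    return total_bytes / pages


def raw_grayscale_bytes(width_in: float, height_in: float, dpi: int = 600) -> int:
    """Uncompressed size of an 8-bit grayscale scan (1 byte per pixel), ignoring file headers."""
    return round(width_in * dpi) * round(height_in * dpi)


if __name__ == "__main__":
    avg = average_bytes_per_page(4.5 * TB, 120_000)
    print(f"{avg / 1e6:.1f} MB per page")          # 37.5 MB per page
    # Hypothetical 5 x 7 inch scan area, for illustration only
    print(raw_grayscale_bytes(5, 7) / 1e6, "MB")  # 12.6 MB
```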
Specific type of emphasis: radiance
Monroy et al. (2011)
Outcome of experiment
• Automatic detection of objects using radiances
• No training data, multiple categories, multiple scales, intense clutter, high object variability
• However: no reusable tool for object detection, text-image separation, or automatic full-text generation
New system and user annotation
Website:
http://comics.freizo.org
Use of Mirador for (manual) annotation
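Manual annotation in Mirador means attaching a note to a rectangular region of a page image. A minimal sketch of what such an annotation looks like in the W3C Web Annotation data model (the standard cited in the deck), using a media-fragment `xywh` selector. The canvas URI and the function name `region_annotation` are hypothetical, and the exact format Mirador stores depends on its version and annotation backend.

```python
import json


def region_annotation(canvas_uri: str, x: int, y: int, w: int, h: int, text: str) -> dict:
    """Build a W3C Web Annotation commenting on a rectangular region of a canvas."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "commenting",
        # The comment itself, as plain text
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        # The annotated region: x/y offset plus width/height in pixels
        "target": {
            "source": canvas_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


if __name__ == "__main__":
    anno = region_annotation(
        "https://example.org/iiif/book1/canvas/p1",  # hypothetical canvas URI
        100, 200, 300, 400, "Hero figure with radiance",
    )
    print(json.dumps(anno, ensure_ascii=False, indent=2))
```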
Arnold/Decker - Annotation systems for image and video media (Annotationssysteme für Bild- und Videomedien), 2016-01-26
Standards - Interoperability
1. Image service:
IIIF Image API using standardized IIIF image requests, http://iiif.io
2. Image annotation:
Web Annotation using Mirador IIIF viewer
http://w3c.github.io/web-annotation/
3. Bibliographic metadata: MODS XML for library catalogs
4. Agent data: links to authorities (e.g. GND or VIAF)
5. Textual data (ideas):
Publish metadata on research data platform (HeiDATA)
Texts (stories, pre-/postface, paratext) in TEI XML
Structure in CBML?
6. Re-use and enhancements:
Include samples in the Graphic Narrative Corpus or the Visual Language Research Corpus?
Find partners for automatic (Chinese) comics analysis
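The IIIF Image API (item 1 above) addresses an image through URL path segments of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, plus `{base}/{identifier}/info.json` for technical metadata. A minimal sketch of composing such requests; the server base URL and identifiers are hypothetical, since the project's actual image server and identifier scheme are not given on the slides. The size keyword `max` is valid in Image API 2.1 and 3.0; older 2.0 servers expect `full`.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Compose an IIIF Image API request URL from its path segments."""
    ident = quote(identifier, safe="")  # identifiers must be URI-encoded, including '/'
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """URL of the image's info.json (dimensions, tiles, supported features)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    base = "https://example.org/iiif"  # hypothetical image server
    # Whole page at full size
    print(iiif_image_url(base, "book1/p001.tif"))
    # A 300 x 400 px detail, scaled to 500 px wide, in grayscale
    print(iiif_image_url(base, "book1/p001.tif", region="100,200,300,400",
                         size="500,", quality="gray"))
    print(iiif_info_url(base, "book1/p001.tif"))
```

Because region, size, and quality are plain URL segments, any IIIF-compliant viewer (such as Mirador) or script can request page details without downloading the 600 dpi master file.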
References
• Seifert, Andreas. Bildgeschichten für Chinas Massen: Comic und Comicproduktion im 20. Jahrhundert. Köln: Böhlau, 2008.
• Monroy, Antonio, Tobias Kröger, Matthias Arnold, and Björn Ommer. "Parametric Object Detection for Iconographic Analysis." In SCCH11, 1-8. Heidelberg, 2011.
• "Reddition: Reddition 63." Edition Alfons, Dec. 2015. https://www.reddition.de/index.php/shop/reddition/reddition-63-detail.
• Database: http://comics.freizo.org
Contact
Matthias Arnold
Heidelberg Research Architecture
Cluster of Excellence “Asia and Europe in a Global Context”
Heidelberg Centre for Transcultural Studies | HCTS
Karl Jaspers Centre
Voßstr. 2 | Building 4400 | Room 005b
69115 Heidelberg, Germany
arnold@asia-europe.uni-heidelberg.de
http://www.asia-europe.uni-heidelberg.de

MARGINALIZATION (Different learners in Marginalized Group
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 

Jingjing Zhang, zhang@asia-europe.uni-heidelberg.de

  • 1. A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recognition. Matthias Arnold, HRA, Universität Heidelberg
  • 2. Agenda • Project history • Approaching the material • Achievements • Automatic object detection • New system and user annotation
  • 3. “Street lending library for children” (German: „Leihbibliothek für Kinder an der Straße“), She Zeh Tschi, Der Holzschnitt im Neuen China, catalogue, Dresden 1951, p. 101
  • 6. Project History: 2009-11 main digitisation project. Funding: Cluster of Excellence “Asia and Europe” and Institute of Chinese Studies. Scanning: MediaLab at the Cluster. First database: eXist-db, metadata in MODS XML. Presentation at Cartoon Museum Basel, “Visual Words - Comics from China” (2010/11). Content expansion and separation of books and stories. Image analysis project with computer vision (radiances).
  • 7. End-user interface (2010)
  • 13. Project History: 2009-11 main digitisation project. Funding: Cluster of Excellence “Asia and Europe” and Institute of Chinese Studies. Scanning: MediaLab at the Cluster. First database: eXist-db, metadata in MODS XML. Presentation at Cartoon Museum Basel, “Visual Words - Comics from China” (2010/11). Content expansion and separation of books and stories. Image analysis project with computer vision (radiances). 2018: data migration (ongoing): MongoDB, XML ingest, IIIF image service, browse/search/filter, user annotation through the Mirador viewer.
  • 14. Approaching the material
  • 15. 趙百萬 Zhao Baiwan (1951)
  • 16. 红灯记 Hong deng ji (1970)
  • 17. 鲁迅和青年的故事 Lu Xun he qingnian de gushi (1976)
  • 18. 1. Spreadsheets; 2. XML database (MODS records) and frontend; 3. Refined metadata schema (books & stories, agents), SQL database; 4. MongoDB, IIIF image service, user annotations in Mirador
  • 21. Special cases
  • 22. 生死緣 Sheng si yuan (1953)
  • 23. 高歌猛进 Gaoge mengjin (1952)
  • 24. Project Achievements: 1. Scans from 5 collections (1250/1031 books); 2. Digitise material (greyscale .tif at 600 dpi, ca. 4 TB); 3. Record all metadata (as provided on books); 4. Provide online access to metadata; 5. Provide online access to full books (read books); 6. Open data for research annotations (Mirador); 7. Explore automatic content analysis (computer vision); 8. *Link data to authorities (agents); 9. **Generate full text; 10. **Explore generic content description (XML)
  • 25. Automatic object detection
  • 26. Chinese comics from the second half of the Cultural Revolution • Over 1,200 books (~120,000 pages) • Greyscale .tif images at 600 dpi, ~4.5 TB of data • Focus: Chinese comic book production of the late 1960s and 1970s • Shows the diversity of Chinese comic production in general • Special type of emphasis (heroes, symbolic objects, or idols)
  • 27. Specific type of emphasis: radiance
  • 28. Monroy et al. (2011)
  • 31. Outcome of experiment • Automatic detection of objects using radiances • No training data, multiple categories, multiple scales, intense clutter, high object variability • However: no tool for re-use • …for object detection, text-image separation, automatic full text
  • 32. New system and user annotation
  • 34. Use of Mirador for (manual) annotation
  • 35. Arnold/Decker, Annotationssysteme für Bild- und Videomedien (Annotation systems for image and video media), 2016-01-26
  • 36. Standards and interoperability: 1. Image service: IIIF Image API using standardised IIIF image requests, http://iiif.io; 2. Image annotation: Web Annotation using the Mirador IIIF viewer, http://w3c.github.io/web-annotation/; 3. Bibliographic metadata: MODS XML for library catalogs; 4. Agent data: links to authorities (e.g. GND or VIAF); 5. Textual data (ideas): publish metadata on a research data platform (HeiDATA); texts (stories, preface/postface, paratext) in TEI XML; structure in CBML?; 6. Re-use and enhancements: include samples in the Graphic Narrative Corpus or the Visual Language Research Corpus?; find partners for automatic (Chinese) comics analysis
  • 37. References • Seifert, Andreas. Bildgeschichten für Chinas Massen: Comic und Comicproduktion im 20. Jahrhundert. Köln: Böhlau, 2008. • Monroy, Antonio, Tobias Kröger, Matthias Arnold, and Björn Ommer. “Parametric Object Detection for Iconographic Analysis.” In SCCH11, 1-8. Heidelberg, 2011. • “Reddition 63.” Edition Alfons, Dec. 2015. https://www.reddition.de/index.php/shop/reddition/reddition-63-detail. • Database: http://comics.freizo.org
  • 38. Contact Matthias Arnold Heidelberg Research Architecture Cluster of Excellence “Asia and Europe in a Global Context” Heidelberg Centre for Transcultural Studies | HCTS Karl Jaspers Centre Voßstr. 2 | Building 4400 | Room 005b 69115 Heidelberg, Germany arnold@asia-europe.uni-heidelberg.de http://www.asia-europe.uni-heidelberg.de
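
Slide 36 names the IIIF Image API as the project's image service. The API addresses every derivative of a scan through one fixed URL pattern, {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}, which is what lets a viewer such as Mirador request only the crop and scale it needs from the 600 dpi masters. The following is a minimal Python sketch of that URL pattern, not the project's code: the base URL `https://example.org/iiif` and the page identifier `book0042/p017` are hypothetical, and the default size keyword `max` assumes a server implementing Image API 2.1 or later.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    region: "full" or "x,y,w,h" in pixels of the full-resolution image
    size:   "max", "w,", ",h", "w,h", ... (see the Image API specification)
    """
    # The identifier is percent-encoded in full, so a "/" inside it becomes %2F.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """URL of the image information document (dimensions, tiles, sizes)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    # Hypothetical server and page identifier, for illustration only.
    base = "https://example.org/iiif"
    # A 400-pixel-wide JPEG of one rectangular region of a page.
    print(iiif_image_url(base, "book0042/p017", region="100,200,800,600", size="400,"))
    print(iiif_info_url(base, "book0042/p017"))
```

The first call prints `https://example.org/iiif/book0042%2Fp017/100,200,800,600/400,/0/default.jpg`. Because the region is given in pixels of the full-resolution scan, the server does the cropping and downscaling and the client never has to fetch a full TIFF.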
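
Slides 34 and 36 pair manual annotation in Mirador with the W3C Web Annotation Data Model. A comment on one panel of a page comes down to a small JSON-LD document: a body (the comment text) and a target (the IIIF canvas plus a rectangular fragment of it). The Python sketch below assembles such a document by hand to show its shape; the annotation and canvas URIs are hypothetical, and the JSON that Mirador itself stores may differ, since older Mirador releases serialised annotations in the earlier Open Annotation model.

```python
import json


def region_annotation(anno_id: str, canvas_uri: str, x: int, y: int, w: int, h: int,
                      text: str, motivation: str = "commenting") -> dict:
    """Return a W3C Web Annotation commenting on a rectangle of a canvas.

    The rectangle is expressed as a Media Fragments "xywh" selector in pixels.
    """
    if w <= 0 or h <= 0:
        raise ValueError("width and height must be positive")
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": motivation,
        # The body carries the user's comment as plain text.
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        # The target points at the canvas and narrows it to one rectangle.
        "target": {
            "source": canvas_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    anno = region_annotation(
        "https://example.org/annotations/1",
        "https://example.org/iiif/book0042/canvas/p017",
        120, 340, 600, 450,
        "Radiance around the hero figure",
    )
    print(json.dumps(anno, ensure_ascii=False, indent=2))
```

Keeping the target as canvas plus selector, rather than a cropped image URL, means the same annotation stays valid whatever size or derivative a viewer later requests from the image service.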
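
Slide 36 also lists MODS XML for the bibliographic metadata and links from agents to authority files such as GND or VIAF. A stripped-down MODS record for one book can be generated with Python's standard `xml.etree.ElementTree`, as sketched below. This is an illustration, not the project's schema: which elements the database actually fills, recording the pinyin as an `alternative` title, and the sample agent with its placeholder `valueURI` (the MODS attribute for pointing at an authority record) are all assumptions. The title and year are taken from slide 16.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def _q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def mods_record(title, romanized_title, date_issued, publisher=None, agents=()):
    """Return a minimal MODS record as a string.

    agents: iterable of (name, role, authority_uri_or_None) tuples.
    """
    root = ET.Element(_q("mods"), {"version": "3.6"})
    # Title as printed on the book.
    ET.SubElement(ET.SubElement(root, _q("titleInfo")), _q("title")).text = title
    # Pinyin romanisation, stored here as an alternative title (one possible choice).
    alt = ET.SubElement(root, _q("titleInfo"), {"type": "alternative"})
    ET.SubElement(alt, _q("title")).text = romanized_title
    for name, role, value_uri in agents:
        attrs = {"type": "personal"}
        if value_uri:
            attrs["valueURI"] = value_uri  # link to an authority record (GND, VIAF, ...)
        name_el = ET.SubElement(root, _q("name"), attrs)
        ET.SubElement(name_el, _q("namePart")).text = name
        role_el = ET.SubElement(name_el, _q("role"))
        ET.SubElement(role_el, _q("roleTerm"), {"type": "text"}).text = role
    origin = ET.SubElement(root, _q("originInfo"))
    if publisher:
        ET.SubElement(origin, _q("publisher")).text = publisher
    ET.SubElement(origin, _q("dateIssued")).text = date_issued
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical agent and authority URI, for illustration only.
    print(mods_record("红灯记", "Hong deng ji", "1970",
                      agents=[("Placeholder Name", "artist",
                               "https://viaf.org/viaf/000000")]))
```

Splitting agents into their own `name` elements with an authority URI is what makes slide 24's item 8 (linking data to authorities) possible without duplicating biographical data in every book record.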