Building a Semantic Web of Comic Book Metadata: User Application Profiles for Publishing Linked Data in HTML/RDFa
Kent State University - November 11, 2014
The objective of this research was to present a case study for developing a domain ontology, and explore methodologies for improving the usability and potential usage of that vocabulary through the development of interoperable metadata application profiles designed for specific groups of users within a community. This objective was realized by the development of a metadata vocabulary for comic books and comic book collections, and a series of metadata application profiles designed for publishing Linked Data in the content of existing information systems using HTML/RDFa. Semantic Web standards and technologies represent an opportunity for connecting data about comic books and graphic novels in LOD datasets with detailed, community-created data on the open Web. Recognizing the potential for an open exchange of data about comic books and graphic novels, a case study was designed to gain a comprehensive understanding of the domain and develop an effective data model. The initial phase of the study involved a review of information and reference resources, acquisition of example materials, and practical experience gained indexing comics in a collaborative Web database. A metamodel for comics was then developed and realized as an XML schema, with those elements mapped as properties to classes in an OWL ontology. In order to align the ontology with the wider Web environment and validate the model, the final phase of the case study explored external sources through a review of existing information systems and an analysis of their content. Results were then summarized as skeleton, data-driven user persona documents, which were used to guide the design of a series of metadata application profiles representing the functional requirements identified. The profiles build upon a core schema and incorporate elements from other Web vocabularies as necessary, focusing on publishing Linked Data in existing information systems using HTML/RDFa. 
Examples were explored and validated for their ability to link to LOD resources and produce meaningful, valid RDF data consistent with the ontology. The final result is a flexible, extensible semantic model for comics. The Comic Book Ontology (CBO), as an RDFS/OWL vocabulary, is compatible with a variety of other systems, including next-generation library catalogs, where it can potentially be used in a collaborative exchange of data to describe relationships between comics material and content not previously available. This study demonstrates how an ontology can be applied to existing collaborative projects, databases, content, or research to enhance the visibility, reference, and utilization of those endeavors through their publication as Linked Data.
2.0 Problem
• No open, shared metadata vocabulary or standard for comics
• No domain ontology or comics Web vocabulary
• No model that describes all dimensions of a comic
• Comic book data exists in a variety of systems/formats
• Most community data can be found in spreadsheets (CSV) or existing hypertext (HTML) systems
3.0 Objectives
• Develop a domain ontology and metadata vocabulary for comic books and comic book collections
• Improve the usability of the vocabulary by creating application profiles for publishing Linked Data in HTML/RDFa
4.0 Methodology
• Case Study
• Phase I
• Example Materials
• Domain Model
• Pilot Study
• Phase II
• System Review
• Content Analysis
• Personas
6.1 Pilot – Workflow
Step 1: Raw data (CSV)
-> XML Map
Step 2: XML
-> XSLT
Step 3: RDF
comicmeta.org/tools/core-convert
6.2 Pilot – Publishing Linked Data
• Method 1: Parallel (Additional Dataset/Service)
• Method 2: Inline (Existing HTML Content)
• Problem 1: Understanding of RDFa/Microdata syntax
• Problem 2: Understanding of Web vocabularies
(Pohorec et al., 2013)
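Method 2 (inline) can be illustrated with a minimal sketch that wraps visible HTML text in RDFa attributes. It assumes schema.org terms (ComicIssue, name, issueNumber) rather than the study's own vocabulary; the function name, URI, and sample values are hypothetical.

```python
from html import escape

# Assumption: schema.org terms stand in for the vocabulary used in the study.
# The function name and the sample record below are hypothetical.
def render_issue_rdfa(issue_uri: str, title: str, number: str) -> str:
    """Return an HTML fragment whose visible text carries RDFa annotations."""
    return (
        f'<div vocab="http://schema.org/" typeof="ComicIssue" '
        f'resource="{escape(issue_uri, quote=True)}">\n'
        f'  <span property="name">{escape(title)}</span>\n'
        f'  #<span property="issueNumber">{escape(number)}</span>\n'
        f'</div>'
    )

fragment = render_issue_rdfa("http://example.org/issue/1", "Example Comics", "1")
print(fragment)
```

The existing page content stays visible to readers; only the attributes (vocab, typeof, resource, property) are added for data consumers.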
5.0 References
Duncan, R., & Smith, M. J. (2013). The power of comics: History, form, & culture. New York: Bloomsbury Academic.
Jaffe, M., & Holm, J. L. (2014). Raising a reader! How comics & graphic novels can help your kids love to read! New York, NY: Comic Book Legal Defense Fund.
Pohorec, S., Zorman, M., & Kokol, P. (2013, November). Analysis of approaches to structured data on the web. Computer Standards & Interfaces, 36(1), 256-262. doi:10.1016/j.csi.2013.06.003
Editor's Notes
Comic Book Convention
Traditional Conventions
Dealers/Shopowners Backissues
Collectors/Readers fill “Runs” of favorite series/volume, character, Publishers
Represent serial/magazine collected in GraphicNovel/TPB
Libraries, Bookstores
Interesting Object to describe
Many bibliographic relationships
Publication, Document, Story, Artwork
What metadata was available/used?
What data was available?
No lib. Background/Opportunity to Apply Concepts
No open/shared metadata vocab
No comics specific Web vocabulary
No model for all dimensions
Many participants, many systems/formats
Most data in CSV/HTML (no Standard)
Finding Aid for Notes about Story
DBpedia entry for Story
Community data for Story
Authoritative data for Writer
Objective
Domain ontology for comics
Existing hypertext -> Build application profiles to aid HTML/RDFa pub.
Methodology
Case Study for overview of domain
Phase I: Collect reference materials, volunteer for GCD to gain experience with Web data about comics, and collect example materials.
Phase II: Preliminary user research method. (start of project) -> broader view
Story Example
Issues in Story Arc
Reprints/Adapts
Copy Example
Issue
Translation
Copy -> Grading, summary of Condition, Serial Number
Artwork Example
Artwork
Page (in Layers->Pencils, Inks, Colors, Final)
Collected by Archives/Museums
Saved by Collectors
Domain Model for Ontology
Aligned with FRBR, inherent Biblio. Object
Work -> Issue
Expression -> Translation (also Audiobook, Dig Comic: Motion, Sound “Experienced differently”)
Manifestation -> Reprints, Variants
2nd Level: Copy is Manifestation of Manifestation
Grades, Price Guide
Item -> Concrete, physical item
Pilot Study: Develop Structures
XML
Common properties
All levels of description
OWL
Map prop. to classes/concepts
Difference
XML is closed (a db schema)
OWL is open
Properties/info from other sources
Pilot Study: Workflow
Map from common CSV to XML, to RDF
Pilot Study: Workflow
Step 1: Take spreadsheet map columns to XML
Using Software
Step 2: Generate XML records
Using Software
Step 3: Generate RDF
Using XSLT stylesheet, “transforms syntax”
Packaged into WebUtility
“Converts” does not “Publish”
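The three workflow steps can be sketched in a few lines. This is an illustration only: the column names, element names, and namespaces are hypothetical, and plain Python stands in for the XSLT stylesheet used in the pilot, since the actual XML map and stylesheet are not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, element names, and namespaces.
RAW_CSV = "id,title,number\nissue-1,Example Comics,1\n"
BASE = "http://example.org/comic/"   # placeholder base URI
VOCAB = "http://example.org/vocab#"  # placeholder vocabulary namespace

def csv_to_xml(text: str) -> ET.Element:
    """Steps 1-2: map each spreadsheet column to an XML element."""
    root = ET.Element("comics")
    for row in csv.DictReader(io.StringIO(text)):
        record = ET.SubElement(root, "comic", {"id": row["id"]})
        for column, value in row.items():
            if column != "id":
                ET.SubElement(record, column).text = value
    return root

def xml_to_ntriples(root: ET.Element) -> list:
    """Step 3: emit one N-Triples line per child element (stand-in for XSLT)."""
    triples = []
    for record in root.findall("comic"):
        subject = f"<{BASE}{record.get('id')}>"
        for field in record:
            literal = (field.text or "").replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{VOCAB}{field.tag}> "{literal}" .')
    return triples

triples = xml_to_ntriples(csv_to_xml(RAW_CSV))
print("\n".join(triples))
```

As in the notes, this only converts syntax; nothing here publishes the data or links it to other resources.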
How to connect resources?
2 Methods for Linked Data: Parallel/Inline
2 Problems: Understand syntax/Knowledge of Web Vocab
Web vocabs
What properties to use?
How to accomplish tasks?
How to combine vocabs?
Not data dictionaries->concept models?
Is data saying the right thing?
Modularize Onto. / Data Collection
Gain broader view/external source
Review existing information systems
Analyze content
Attempt to summarize goals/requirements
Aggregate research from Phase I + II in Persona for Design Phase
Phase II – System Review
5 Agents
4 Systems each, Quality data, Candidate
Criteria
3 Categories
Tasks: End User
Features: End User / Data
Data: Structure, Format, Markup
Agents identified in Communication Model of Comics
A Source, B Delivery, C Receivers
Phase II – Content Analysis
Content Object, Descriptive Comic Book Data
Visible Content / Markup carries Structured Data/Annotations
Process:
Group related areas
Assign identifier to each chunk of info
Distinguish by type of Metadata
Content Objects:
Lists of Items
Issue Details
Collected Edition Details
Phase II – Content Analysis
Summarize list of labeled data points by Content Object
Add terms to database
Join all lists, pull Distinct values using SQL
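A minimal sketch of the join-and-distinct step, assuming one table of labeled data points per content object. The table names and labels are hypothetical; the study's actual database is not described in the notes.

```python
import sqlite3

# Hypothetical tables and labels, one table per content object.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_list (label TEXT);
    CREATE TABLE issue_details (label TEXT);
    CREATE TABLE collected_edition (label TEXT);
    INSERT INTO item_list VALUES ('title'), ('issue number'), ('price');
    INSERT INTO issue_details VALUES ('title'), ('writer'), ('cover date');
    INSERT INTO collected_edition VALUES ('title'), ('isbn'), ('price');
""")

# UNION (without ALL) joins the lists and drops duplicate labels.
rows = conn.execute("""
    SELECT label FROM item_list
    UNION SELECT label FROM issue_details
    UNION SELECT label FROM collected_edition
    ORDER BY label
""").fetchall()
distinct_labels = [label for (label,) in rows]
print(distinct_labels)
```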
Phase II – Persona
Analysis summarized in User Persona
Describes User Group in relationship to System/Data
Aggregates research from Phase I + II
Identifies Goals and Requirements
System Review
Content Analysis
Reference for Design Phase -> Alignment/APs
Alternative to revising data/research.
Combines research from Multiple Sources
Analysis and Alignment
Step 1: Compare data points between systems
Step 2: Compare data points between groups
Overlapping requirements
Step 3: Align Ontology
Add properties from other vocab where necessary
Subclass where term was used freq.
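The comparison between groups can be pictured as set operations over each group's data points. The groups and data points below are hypothetical, and treating the overlap as core-profile candidates is an assumption made for illustration, not the study's documented procedure.

```python
# Hypothetical data points gathered per user group.
group_points = {
    "retailer": {"title", "issue number", "price", "condition"},
    "library": {"title", "issue number", "isbn", "availability"},
    "collector": {"title", "issue number", "condition", "grade"},
}

# Data points required by every group: the overlapping requirements.
core = set.intersection(*group_points.values())

# What remains for each group is a candidate for that group's own profile.
extras = {group: sorted(points - core) for group, points in group_points.items()}

print(sorted(core))
print(extras)
```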
Findings/Results
Final result is Ontology for Comic Books
1 core profile, identifies a Resource at all Levels of Description
User Profiles to address req. of specific Groups
When combined with base schema, retain Interoperability with Core and Ontology
OWL Ontology
Data Models
Work Model
Dimensions of Comic
Publication
Series/Volume not used uniformly
Aligned with Schema, wider Web of Data
Does not address Content
Universe Model
Hard to Separate
Important for Collocating Material
Important Access Points
Also a Creative Work, not a Comic
Summary of Creative Endeavors
Ontology within Ontology (“Cartoon Universe”)
Not a fictional world
Comics often refer to real people, places, events, auto-biographical
Does not assert Thing is fictional, just that it is an Avatar in a Comic Universe
Core Profile
Represents all levels of description to Item
Volume not used consistently
Often replaced by Series Year
Neither property mandatory
Infers resource to have qualities of both Series and Volume
Data consumer/app can decide how to Split Entities if at all
User Profile
Retailer AP -> Sell Product
Includes additional properties
Add schema.org vocab to Retailer AP subset
Better modeled in other vocab
Use of Persona in Design Phase
Library AP
Libraries offer product/service
Digital comic book libraries
Compare “Like” systems
Display holding using same schema.org method
Availability: Online
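A hedged sketch of how a holding might be displayed with schema.org terms nested inside a core description. It assumes schema.org's offers, Offer, and availability terms with the OnlineOnly value; the function name and sample values are hypothetical, and the study's actual profile markup is not reproduced here.

```python
from html import escape

# Assumption: schema.org's Offer/availability pattern models the holding.
# The function name, URI, and title are hypothetical.
def render_holding_rdfa(issue_uri: str, title: str,
                        availability: str = "OnlineOnly") -> str:
    """Return an RDFa fragment describing an issue and its availability."""
    return (
        f'<div vocab="http://schema.org/" typeof="ComicIssue" '
        f'resource="{escape(issue_uri, quote=True)}">\n'
        f'  <span property="name">{escape(title)}</span>\n'
        f'  <div property="offers" typeof="Offer">\n'
        f'    <link property="availability" '
        f'href="http://schema.org/{escape(availability, quote=True)}" />\n'
        f'  </div>\n'
        f'</div>'
    )

holding = render_holding_rdfa("http://example.org/issue/1", "Example Comics")
print(holding)
```

Passing a different schema.org availability value (for example "InStock") would adapt the same pattern to a retailer listing.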
Conclusion
Demonstrates using Core AP + User AP to address Func. Req. / Goals of User Groups
In publishing Linked Data in HTML/RDFa in existing hypertext systems
Building Graph of Resources from HTML
Future Studies
Study touched on Source, Delivery, Receivers
Other Aspects of Comics Culture
Conventions, Cosplay, Fan Art/Stories
Linking other resources, research, scholarship -> Topic OF/About
Especially digital encodings using CBML
Social Cataloging / NextGen
Compatible with BIBFRAME/RDFA as RDF model
Used alongside lib data
One application: Good material to encourage Reading, delicate Subjects