@azaroth42
rsanderson
@getty.edu
IIIF:Interoperability
Abstractions
&Audiences
@azaroth42
Tiers of Abstraction and Audience
In Cultural Heritage Data Modeling
Rob Sanderson
Semantic Architect
rsanderson@getty.edu
@azaroth42
Overview
• Introduction – Digital Cultural Heritage
• Tiers of Abstraction
• Separation of concerns between model, ontology, vocabulary
• Implementations and Usability
• The importance of usability and the role of profiles
• Tiers of Audience
• Progressive enhancement to serve increasing audience needs
Cultural Heritage?
• Libraries
• Value is the content, objects are merely carriers
• Archives
• Value is in the collection of information-carrying objects
• Museums
• Value is in the object
• Conservation (Science)
• Value is in research and conservation activities
• Aggregators
• All of the above!
Digital Heritage
Digital
• Non-rivalrous: Use/consumption by an individual does not reduce use by others
• Consistency is Good: Experience is better if the product reuses known interaction patterns
• Fewer is Better: Having few highly functional and usable digital products improves community and sustainability
Cultural
• Rivalrous: Use/consumption by one individual reduces simultaneous use by others
• Diversity is Good: Experience is better if the resource is novel, innovative and emotive
• More is Better: Having many diverse cultural heritage resources available improves impact to the user
Bridging the Digital / Cultural Divide?
Digital: Non-rivalrous, Consistency is Good, Fewer is Better
Cultural: Rivalrous, Diversity is Good, More is Better
• Openly licensed, published, digitized surrogates with data
• Community developed, open, usable digital standards
• Many publishers, using collaborative, sustainable products
Goal
Engineer a socio-technical ecosystem in which standards
are agreed upon and used to publish diverse cultural
heritage knowledge, that is easily usable by sustainable
applications to further public engagement and research
Abstraction Levels
• Conceptual Model
• Abstract way to think about the domain
• Ontology
• Shared format to encode that thinking
• Vocabulary
• Curated set of domain specific terms,
to make the ontology concrete
Conceptual Model
• Shared understanding of the knowledge
management framework
• Technology agnostic:
potential for increased participation
• Methodology for conflict resolution
• Risk of unending philosophy arguments
• Challenge to write good documentation
Examples: Conceptual Model
• CIDOC-CRM (Museums, Conservation)
• Robust, complex model
• BIBFRAME (Libraries)
• Almost undocumented model
• Resource in Context (Archives)
• Early, ongoing model
• Europeana (Aggregation)
• Model not separately documented from ontology
CIDOC Conceptual Reference Model
[Diagram: what, when, who, where, how]
BibFrame Model
Resource in Context Model
Model vs Ontology
[Diagram: Model, Ontology; relation: encodes]
• Single, new namespace or reuse terms?
• Easier argument that new terms are needed,
as other terms reflect other conceptual models
• Reuse of terms can take place downstream
• Opaque term names vs human-readable?
• Human-readable! The model gives the abstraction,
the ontology can be encoded in different ways if
needed.
• Ontology can use technology features
• Ontology encodes, not defines, the model
• RDFS vs OWL ; json-schema vs xml-schema
• Property Graphs, Named Graphs, simple Trees
Case Study: CRM’s meta properties
CIDOC-CRM has meta-properties
• Property Graph – Native support for meta-properties
• Named Graph – Name the triple, and assert the role
• Reification – Map the property to a class with a role
• Partitioning – Separate part of the production with a role
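The four options can be compared side by side on one statement: "this production was carried out by this agent, in the role of painter". The sketch below is illustrative only. The `ex:` identifiers are placeholders, the CRM-style term names are written from memory and should be checked against the published ontology, and plain Python tuples stand in for a real graph store.

```python
# Placeholder identifiers, invented for this example.
PRODUCTION = "ex:production/1"
ARTIST = "ex:person/1"
ROLE = "ex:role/painter"

# 1. Property graph: the edge itself carries attributes.
property_graph_edge = {
    "from": PRODUCTION,
    "label": "carried_out_by",
    "to": ARTIST,
    "attributes": {"in_the_role_of": ROLE},
}

# 2. Named graph: the triple sits in its own graph (4th element),
#    and the role is asserted about the graph's name.
named_graph = [
    (PRODUCTION, "crm:P14_carried_out_by", ARTIST, "ex:graph/1"),
    ("ex:graph/1", "crm:P14.1_in_the_role_of", ROLE, None),
]

# 3. Reification: the property becomes a node of a property class,
#    which points at the domain, the range and the role.
reified = [
    ("ex:pc14/1", "rdf:type", "crm:PC14_carried_out_by"),
    ("ex:pc14/1", "crm:P01_has_domain", PRODUCTION),
    ("ex:pc14/1", "crm:P02_has_range", ARTIST),
    ("ex:pc14/1", "crm:P14.1_in_the_role_of", ROLE),
]

# 4. Partitioning: a part of the production is typed with the role,
#    and the agent carries out that part.
partitioned = [
    (PRODUCTION, "crm:P9_consists_of", "ex:production/1/part/1"),
    ("ex:production/1/part/1", "crm:P2_has_type", ROLE),
    ("ex:production/1/part/1", "crm:P14_carried_out_by", ARTIST),
]


def roles_via_partition(triples, agent):
    """Return the role types of all activity parts carried out by `agent`."""
    parts = {s for s, p, o in triples
             if p == "crm:P14_carried_out_by" and o == agent}
    return {o for s, p, o in triples
            if p == "crm:P2_has_type" and s in parts}
```

Partitioning needs only ordinary triples, which is why a consumer can read it with a simple lookup such as `roles_via_partition(partitioned, ARTIST)`.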
Ontology vs Vocabulary
[Diagram: Model, Ontology, Vocabulary; relations: encodes, refines]
• Separation of Knowledge Management?
• Model and Ontology should be lean and general to
ensure breadth of use
• Modelers are often not domain experts, and domain
thesauri exist outside of any model
• Every Identity, Its Ontology
• Description of the vocabulary entity requires
ontology, so the separation is complex
• Concepts (AAT, ICONCLASS, LCSH) more appropriate
than Things (ULAN, VIAF, TGN, Geonames, …)
• Model needs to recognize Vocabulary
• Need to have the right slots for external vocab terms
• Otherwise ontology will take up the responsibility
CIDOC-CRM and Vocabularies
P2 has type (is type of)
“This property allows sub typing of CIDOC CRM entities - a form of specialization – through the use of a terminological hierarchy, or thesaurus.
The CIDOC CRM is intended to focus on the high-level entities and relationships needed to describe data structures. Consequently, it does not specialize entities any further than is required for this immediate purpose.” *
* This is very debatable ;)
Every entity can have an external classification, keeping the model lean.
Examples:
• Human Made Object: Painting, Brush, Book, XRF Scanner, …
• Identifier: DOI, ISBN, Local, Accession Number, …
• Dimension: Height, Width, Duration, File Size, …
• Linguistic Object: Description, Article, Abstract, Chapter, …
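As a rough sketch of what "external classification" means in data: the class comes from the ontology, while the specific kind of thing is a pointer to a vocabulary term. The key names (`classified_as`, `_label`) follow a JSON-LD style and are an assumption here, and the vocabulary URIs are hypothetical stand-ins for a real thesaurus.

```python
# Hypothetical vocabulary base; a real dataset would point at a thesaurus.
VOCAB = "https://example.org/vocab/"


def classified(entity_class, label, *type_terms):
    """Build an entity of a general ontology class, typed by vocabulary terms."""
    return {
        "type": entity_class,
        "_label": label,
        "classified_as": [{"id": VOCAB + t, "type": "Type"} for t in type_terms],
    }


def has_type(entity, term):
    """True if the entity is classified with the given vocabulary term."""
    return any(c["id"] == VOCAB + term for c in entity["classified_as"])


# The same four ontology classes cover very different things.
painting = classified("HumanMadeObject", "Example painting", "painting")
scanner = classified("HumanMadeObject", "Example XRF scanner", "xrf-scanner")
isbn = classified("Identifier", "Example ISBN", "isbn")
height = classified("Dimension", "Example height", "height")
```

Adding a new kind of object then means adding a vocabulary term, not a new ontology class.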
BibFrame, RiCO and Vocabularies
• BibFrame relies on Ontology for classification…
Examples:
41 Identifier subclasses: Ansi, AudioTake, Barcode, Coden, …
9 Digital Characteristics: EncodingFormat, ObjectCount, Resolution, …
6 Titles: Title, KeyTitle, VariantTitle, AbbreviatedTitle, ParallelTitle, …
• RiCO expands a small set of model classes to a long list of ontology classes,
that could easily be solved with vocabulary…
Examples:
47 Relation subclasses: AccumulationRelation, … WorkRelation
14 Type subclasses: ActivityType … RoleType
Completeness
[Diagram: Model, Ontology, Vocabulary; relations: encodes, refines; goal: Correct/Complete; tier: Abstraction]
• Goal for abstraction layer is completeness
• “No data left behind”
• Able to express everything that anyone might want to document
• Goal for instance data is correctness
• “No data left behind”
• Any errors impact trust in the data, reputation of the institution, and the reliability of research using the data
• Perfect is the Enemy of the Good
• Left unchecked, abstraction, mapping and data cleaning will consume all available time and effort
Separating Abstraction and Implementation?
• Model, Ontology, Vocabulary
• Available classes, properties and instances
• API
• How to interact with the data over the network
• LOD does not separate Model / Ontology and API by
requiring the syntax to directly reflect the ontology
• LOD does not separate Vocabulary and API by requiring
the terms to be instances (as above)
• Solutions?
[Metadata] Application Profile
[Diagram: Model, Ontology, Vocabulary, Profile; relations: encodes, refines, specialized by; goal: Correct/Complete]
For any given domain, there must be a selection of appropriate abstractions, documented as an application profile.
The more abstract the base, the more specification is needed for the profile.
The profile needs to act as the friction to balance Completeness with Usability.
Case Study: Linked Art
A Linked Open Usable Data profile for cultural heritage,
collaboratively designed to work across organizations,
that is easy to publish and use in consuming applications.
Design Principles:
• Focused on Usability, not 100% precision / completeness
• Consistently solves actual challenges from real data
• Development is iterative, as new use cases are found
• Solve 90% of use cases, with 10% of the effort
• https://linked.art/
Linked Art Collaboration
Formalization of the profile in ICOM, funded by Kress & AHRC
• Getty
• Rijksmuseum
• Metropolitan Museum of Art
• Smithsonian
• MoMA
• V&A
• NGA
• Philadelphia Art Museum
• Indianapolis Art Museum
• The Frick Collection
• Princeton University
• Yale University
• Oxford University
• Academia Sinica
• ETH Zurich
• FORTH
• University of the Arts, London
• Canadian Heritage Info. Network
• American Numismatic Society
• Europeana
Linked Art Profile
Model: CIDOC-CRM
Ontology: CRM-base, local extension, + cherry pickings
Vocabulary: Getty AAT, + cherry pickings
Look for consistent patterns across institutions’ data,
resulting in core patterns for ease of adoption
Linked Art: Vocabulary Extensibility
Allow vocabulary to be extensible,
while the profile remains usable.
Solution: Required Meta-Types for non-enumerable cases
Reference: https://github.com/linked-art/linked.art/issues/186
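One possible reading of "required meta-types", sketched under assumptions: a publisher may use any local term, as long as that term is itself classified with a meta-type from a small fixed list that clients can test for. The URIs and the `has_meta_type` helper below are hypothetical and are not taken from the referenced issue.

```python
# Hypothetical meta-type that every "short descriptive text" term must carry.
META_BRIEF_TEXT = "https://example.org/meta/brief-text"

statement = {
    "type": "LinguisticObject",
    "content": "Oil on canvas, signed lower left.",
    "classified_as": [{
        # Local term that a generic client has never seen before.
        "id": "https://museum.example/terms/curatorial-remark",
        "type": "Type",
        # The meta-type tells the client how to treat the unknown term.
        "classified_as": [{"id": META_BRIEF_TEXT, "type": "Type"}],
    }],
}


def has_meta_type(entity, meta_type):
    """True if any of the entity's classifications carries the meta-type."""
    for term in entity.get("classified_as", []):
        for meta in term.get("classified_as", []):
            if meta.get("id") == meta_type:
                return True
    return False
```

A client that only knows `META_BRIEF_TEXT` can still decide to display the statement, even though the vocabulary keeps growing.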
Application Programming Interface
[Diagram: Model, Ontology, Vocabulary, Profile, API; relations: encodes, refines, specialized by, implemented by; goal: Correct/Complete]
The API is how programmers interact with the data across system boundaries.
Profile vs API
A Profile is a selection of appropriate abstractions,
to encode the scope of what can be described.
An API is a selection of appropriate technologies,
to give access to the data managed using the profile.
Scope
• Classes
• Properties and Relationships
• Structure of Graph
• Vocabulary Terms
Access
• Document format(s)
• Document structure and boundary
• URI patterns
• Operations: CRUD, Browse, Search
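The scope/access split can be sketched as two independent objects, so that either can change without the other. This is a minimal illustration only: the class names, property names and URI pattern are invented for the example and do not come from any published profile or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Scope: what may be described."""
    classes: frozenset
    properties: frozenset

    def in_scope(self, record: dict) -> bool:
        # In scope if the class and every property used were selected.
        used = set(record) - {"id", "type"}
        return record.get("type") in self.classes and used <= self.properties


@dataclass(frozen=True)
class Api:
    """Access: how the described data is reached."""
    media_type: str
    uri_pattern: str
    operations: tuple

    def uri_for(self, kind: str, identifier: str) -> str:
        return self.uri_pattern.format(kind=kind, identifier=identifier)


profile = Profile(
    classes=frozenset({"HumanMadeObject", "Person"}),
    properties=frozenset({"identified_by", "classified_as", "produced_by"}),
)
api = Api(
    media_type="application/ld+json",
    uri_pattern="https://data.example.org/{kind}/{identifier}",
    operations=("read", "browse", "search"),
)

record = {
    "id": api.uri_for("object", "42"),
    "type": "HumanMadeObject",
    "identified_by": [],
}
```

Swapping the URI pattern or document format touches only `api`; widening what can be described touches only `profile`.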
Evaluation of APIs
/ht Michael Barth, Ulm University
✓ Abstraction level
✓ Comprehensibility
✓ Consistency
✓ Discoverability / Documentation
✓ Domain Correspondence
✓ Few Barriers to Entry
✓ Extensibility
✓ Infrastructure
Case Study: IIIF APIs
1. Scope design through shared use cases
2. Design for international use
3. As simple as possible, but no simpler
4. Make easy things easy, complex things possible
5. Avoid dependency on specific technologies
6. Use REST / Don’t break the web
7. Separate concerns, keep APIs loosely coupled
8. Design for JSON-LD, using LOD principles
9. Follow existing standards, best practices, when possible
10. Define success, not failure (for extensibility)
https://iiif.io/api/annex/notes/design_patterns/
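Principle 10 is often read as: a consumer processes what it understands and quietly skips the rest, so publishers can extend documents without breaking clients. The sketch below assumes that reading. The document, the `KNOWN` set and the `x_local_note` property are invented for illustration and are not taken from any IIIF specification.

```python
import json

# Properties this hypothetical client understands.
KNOWN = {"id", "type", "label"}


def read_resource(text):
    """Parse a JSON document, keeping known properties and noting the rest."""
    data = json.loads(text)
    understood = {k: v for k, v in data.items() if k in KNOWN}
    ignored = sorted(set(data) - KNOWN)  # skipped, never treated as an error
    return understood, ignored


doc = json.dumps({
    "id": "https://example.org/thing/1",
    "type": "Thing",
    "label": "Example",
    "x_local_note": "publisher-specific extension",
})

understood, ignored = read_resource(doc)
```

The unknown property ends up in `ignored` and the client still gets a usable resource.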
Application Programming Interface
[Diagram: Model, Ontology, Vocabulary, Profile, API; relations: encodes, refines, specialized by, implemented by; goals: Correct/Complete, Usable]
APIs must be Usable by software developers for them to be adopted and used.
Usability is the correction function for the tendency towards Completeness.
Usable vs Complete: Target Zone
From LOD to LOUD
https://linked.art/
Standards are for Consistency
[Diagram: Model, Ontology, Vocabulary, Profile, API; relations: encodes, refines, specialized by, implemented by; goals: Consistent, Correct/Complete, Usable; tiers: Abstraction, Implementation]
Audiences
[Diagram: Model, Ontology, Vocabulary, Profile, API; audiences: Human, Machine, Network, Research; relations: encodes, refines, specialized by, implemented by, serves; goals: Consistent, Correct/Complete, Usable; tiers: Abstraction, Implementation, Audience]
Audiences: Progressive Enhancement
• Data for: Humans - Strings
• Separate entities, with attached textual descriptions
• Data for: Machines - Structure
• Entities with machine-processable, comparable values
• Data for: The Network - d’Stributed
• Entities are connected across systems and institutions
• Data for: Research - Stringent
• Sufficient accuracy and comprehensiveness to answer
research questions from aggregated data
[Diagram: Human, Machine, Network, Research; relation: enhances]
Audience: Humans
• Strings: Entities with descriptions
• Easy to do with existing data
• Regardless of information system, can export data as strings
• Easy on-ramp … need to start somewhere
• Serves important audience: everyone
• It’s our cultural heritage, after all! :)
• Data not Document
• Better than today, as encourages multiple interfaces and reuse
• Can be enhanced by third party with more resources
Data for Humans
• Diagram: Name, Identifier, Dimensions and Description on a thing
Audience: Machines
• Structure: Connected, comparable values for machines
to process, rather than just display
• Comparison of Values
• Answering basic questions via dimensions, materials, age etc.
• Sorting entities by values rather than only computed relevance
• Indexing values
• Searching based on values rather than full text
• Facets require consistent, structured data
• Visualization
• Can only have visualization, rather than display, with structure
Data for Machines
• Same thing with dimensions as structured data
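A small sketch of the step from strings to structure, using made-up records. It assumes dimension strings of the form "height x width unit", which real collection data often does not follow. Strings that do not match are simply left without a structured value.

```python
import re

# Invented sample records with human-readable dimension strings.
records = [
    {"name": "Landscape", "dimensions": "30 x 40 cm"},
    {"name": "Portrait", "dimensions": "100 x 80 cm"},
    {"name": "Miniature", "dimensions": "9 x 7 cm"},
]

# Assumed layout: "<height> x <width> <unit>".
PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)\s*(cm|mm)\s*$")


def structure(dimensions_text):
    """Turn a dimension string into comparable values, or None if unparseable."""
    match = PATTERN.match(dimensions_text)
    if match is None:
        return None  # keep the string for humans, add no structure
    height, width, unit = match.groups()
    return {"height": float(height), "width": float(width), "unit": unit}


for record in records:
    record["dimensions_structured"] = structure(record["dimensions"])

# Sorting on the string compares characters; sorting on the value compares sizes.
by_string = sorted(records, key=lambda r: r["dimensions"])
by_height = sorted(records, key=lambda r: r["dimensions_structured"]["height"])
```

Here `by_string` puts the 100 cm portrait first because "1" sorts before "3" and "9", while `by_height` orders the records from smallest to largest.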
Audience: The Network
• d’Stributed: Entities connected across systems
• Part of the Web
• Not just on the web
• Shared Cataloging
• No need for everyone to describe everything
• Improved Discovery
• By leveraging the connections for search and reconciliation
• Towards Research Results
• By taking into account information across institutions, datasets
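As a toy illustration of being part of the web rather than just on it: when two institutions reference the same URI for an agent, their records can be connected without either one describing the agent in full. All URIs, property names and records below are invented for the example.

```python
# Shared identifier, described once by a (hypothetical) authority.
AGENT = "https://authority.example/agent/42"

authority_record = {"id": AGENT, "label": "Example Artist"}

# Records from two different systems that both point at the shared URI.
museum_record = {
    "id": "https://museum.example/object/1",
    "label": "Example painting",
    "creator": AGENT,
}
library_record = {
    "id": "https://library.example/book/7",
    "label": "Example monograph",
    "about": AGENT,
}
unrelated_record = {
    "id": "https://archive.example/file/3",
    "label": "Example letter",
    "creator": "https://authority.example/agent/99",
}


def referring_records(records, target_uri):
    """Return ids of records that reference target_uri in any property but `id`."""
    return sorted(
        r["id"] for r in records
        if any(v == target_uri for k, v in r.items() if k != "id")
    )


linked = referring_records(
    [museum_record, library_record, unrelated_record], AGENT
)
```

Discovery across institutions then reduces to following the shared identifier, and only the authority needs to maintain the agent's description.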
Audience: Research
• Stringent: Answering research questions requires
sufficient aggregation, precision and completeness
• Requirement or Stretch Goal?
• Audience is relatively small, but important
• Cultural Sector: Entertainment or Educational?
• Requires Collaboration and Continuity
• To be cost effective, must be ongoing, sustainable resource
used for multiple projects
• Need for Contextualization of Knowledge?
• The process of knowledge capture, and meta-meta-data
Research: Contextualized Data
• Data Provenance
• Creation of the data: Who, When and Why
• Uncertainty
• Confidence of the data owners in their data
• Localization of Knowledge
• When and Where does the data apply
• Considerations
• Who is the audience, human (string) or machine (structured)?
• Why do they want it?
• Context recorded at the dataset level, or at the assertion level?
Context: Data Provenance
• Creation / Modification of the Data: Who, When, Why?
• Considerations
• Researcher (human) wants Confidence in the dataset
• Developer will ignore it if possible
• Internal use for structure (edits by X between t1 and t2)
but not external research use
• Dataset Level description in prose is fine for external use
• Otherwise, need named graphs per triple (expensive, support?)
or to reify everything (expensive, un-ignorable)
Context: Uncertainty
• Confidence in the validity of the data?
• Considerations
• Researcher (human) wants Confidence in the dataset
• Developer will ignore it if possible
• Some valid uses:
• Possibly / Probably <some value>
• Machine generated scores
• Scoping: How do you want it to appear in search results?
• Needs to be in the data if at all
• Benefits from Data Provenance for meta-context
Context: Localization
• Socio-spatio-temporal validity of the data?
• Considerations
• Researcher wants to know where and when the data is valid,
to know when it should be used: factual not social
• Developer will still just ignore it if possible
• Many valid uses:
• How and when were dimensions measured?
• Jurisdiction of the ownership right?
• Over what period was the person a painter?
• Needs to be in the data, and thus in the model
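To make "how and when were dimensions measured?" concrete, here is a minimal sketch in which each measurement carries its own date and technique, so a researcher can ask which value applied at a given time. The values, dates and field names are invented sample data, not drawn from any real collection or published model.

```python
from datetime import date

# Invented measurements of the same dimension, each with its own context.
measurements = [
    {"value": 40.0, "unit": "cm",
     "measured_on": date(1986, 5, 1), "technique": "tape measure"},
    {"value": 40.3, "unit": "cm",
     "measured_on": date(2015, 3, 12), "technique": "laser scan"},
]


def latest_as_of(measurements, when):
    """Return the most recent measurement taken on or before `when`, if any."""
    known = [m for m in measurements if m["measured_on"] <= when]
    if not known:
        return None
    return max(known, key=lambda m: m["measured_on"])


in_2000 = latest_as_of(measurements, date(2000, 1, 1))
in_1980 = latest_as_of(measurements, date(1980, 1, 1))
```

A developer who does not care about the context can still read `value` and `unit` directly, which is the kind of ignorability the considerations above ask for.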
Case Study: Proxies
Case Study: Attribute Assignment
Conclusion
[Diagram: Model, Ontology, Vocabulary, Profile, API; audiences: Human, Machine, Network, Research; relations: encodes, refines, specialized by, implemented by, serves, enhances; goals: Consistent, Correct/Complete, Connected, Collaborative, Usable; tiers: Abstraction, Implementation, Audience]
Thank You!
Rob Sanderson
rsanderson@getty.edu
@azaroth42
https://www.slideshare.net/azaroth42/tiers-of-abstraction-and-audience-in-cultural-heritage-data-modeling

Editor's Notes

  • #5 Libraries – We don’t care about the paper, only the balance sheet. Archives – Look at this complete set of every denomination (that we can’t describe because there’s too much of it). Museums – Look at this mint condition 1975 Roosevelt Dime! Conservation – The metal in a penny is worth more than a penny. Aggregators – Please give us everything in only unmarked dollar bills.
  • #8 In order to do that, we need appropriate abstractions and implementations that meet the technical and social needs of diverse stakeholders and user communities.
  • #9 Encode = machine-actionable
  • #10 Consistency of thinking about and managing diversity
  • #12 Many trees worth of documentation, most of which is utterly opaque.
  • #13 That’s it. There’s Work, Instance, Item, Agent, Event and Subject.
  • #14 RiC is in early stages, but clearly building upon a conceptual model that is then instantiated in an ontology. Hard to know if inspired by CRM, but certainly has some of its hallmarks.
  • #15 An ontology encodes a model, regardless of whether that model is separately documented. The advantage of the separation is the ability to have multiple ontologies encoding the same model, thereby having some degree of semantic interoperability, even if not necessarily technical interoperability. As few models as possible! Alignment between models helps to get to alignment between ontologies.
  • #16 Neo4J and similar systems have property graphs. Named Graphs standardized but not well implemented, and tend to add complexity. Only get one Named Graph, so use it wisely. Reification is generally unloved, but used in CRM’s ontology – PC14 is the class (the number of the property) and then pc14.1 for the role of relationship on that class. Partitioning avoids the issue at the expense of some semantic precision, which is what we try to do in Linked Art profile.
  • #17 Vocabulary allows subdomains to be specific about their content, within a more generalized model. This is important, as we want as few models as possible. Vocabularies might conceptually relate to the model generally, but because each vocabulary term must itself be described, they also need to be thought of in terms of the ontology that encodes the model.
  • #22 Ontologists gonna Ontologize, but you shouldn’t have to care. A profile selects the features and instances in order to meet the needs of the application domain.
  • #25 Subset of the model, ontology and vocabularies as appropriate. Name, Identifier, Contact Point, “Statement”, …
  • #29 Michael Barth has six fundamental features for API evaluation, which relate directly to the value of the API as a standard for use. This seems like a good starting point for standards for digital interoperability. Abstraction Level: is the abstraction of the data and functionality appropriate to the audience and use cases? An end user of the "car" API presses a button or turns a key; a "car" developer needs direct access to the engine. Comprehensibility: is the audience able to understand how to use it to accomplish their goals? Consistency: if you know the "rules" of the API, how well does it stick to them, and how many exceptions are there to a core set of design principles? Documentation: how easy is it to find out the functionality of the API? Domain Correspondence: if you understand the core domain of the data and API, how closely does the understanding of the domain align with an understanding of the data? Barriers to Entry: what barriers to getting started are there? There are two more that I think are important.
  • #30 Note – these are all about access, not about modeling.
  • #32 Less important for IIIF to be complete as it’s about presentation not semantics. But for semantic description…
  • #36 Stressful but Strategic Stretch goal.
  • #37 Still within the framework of the profile and API – needs to be possible in the model, simultaneously with more structured data.
  • #38 Still within the framework of the profile and API – needs to be possible in the model, simultaneously with more structured data.
  • #44 Corpus Art History
  • #48 Painful to implement. ORE → Europeana, RICO.
  • #49 Knowledge Provenance, not data provenance. Adds evidence that the dimension was valid in 1986. Could add confidence or technique used. Doesn’t rely on reification, but doesn’t work everywhere.