Tiers of Abstraction and Audience in Cultural Heritage Data Modeling

A walkthrough of a framework, based on the distinctions between Abstraction, Implementation and Audience, for considering the value and utility of data modeling patterns and paradigms in cultural heritage information systems. It focuses in particular on CIDOC-CRM, BibFrame, RiC-CM/RiC-O, EDM, and IIIF, with the intent of demonstrating best practices and anti-patterns in modeling.

  1. Tiers of Abstraction and Audience in Cultural Heritage Data Modeling. Rob Sanderson, Semantic Architect, rsanderson@getty.edu, @azaroth42 (slide footer: @azaroth42, rsanderson@getty.edu, IIIF: Interoperability Abstractions & Audiences)
  2. Conclusion [Diagram: Model, Ontology, Vocabulary (Abstraction); Profile, API (Implementation); Human, Machine, Network, Research (Audience); links labelled "encodes", "refines", "specialized by", "implemented by", "serves" and "enhances"; qualities: Consistent, Correct/Complete, Connected, Collaborative, Usable]
  3. Overview • Introduction – Digital Cultural Heritage • Tiers of Abstraction • Separation of concerns between model, ontology, vocabulary • Implementations and Usability • The importance of usability and the role of profiles • Tiers of Audience • Progressive enhancement to serve increasing audience needs
  4. Cultural Heritage? • Libraries – Value is the content, objects are merely carriers • Archives – Value is in the collection of information-carrying objects • Museums – Value is in the object • Conservation (Science) – Value is in research and conservation activities • Aggregators – All of the above!
  5. Digital vs Cultural Heritage • Digital – Non-rivalrous: use/consumption by an individual does not reduce use by others. Consistency is Good: experience is better if the product reuses known interaction patterns. Fewer is Better: having few highly functional and usable digital products improves community and sustainability. • Cultural – Rivalrous: use/consumption by one individual reduces simultaneous use by others. Diversity is Good: experience is better if the resource is novel, innovative and emotive. More is Better: having many diverse cultural heritage resources available improves impact to the user.
  6. Bridging the Digital / Cultural Divide? • Digital: Non-rivalrous, Consistency is Good, Fewer is Better • Cultural: Rivalrous, Diversity is Good, More is Better • Openly licensed, published, digitized surrogates with data • Community developed, open, usable digital standards • Many publishers, using collaborative, sustainable products
  7. Goal: Engineer a socio-technical ecosystem in which standards are agreed upon and used to publish diverse cultural heritage knowledge that is easily usable by sustainable applications, to further public engagement and research
  8. Abstraction Levels • Conceptual Model – Abstract way to think about the domain • Ontology – Shared format to encode that thinking • Vocabulary – Curated set of domain-specific terms, to make the ontology concrete
  9. Conceptual Model • Shared understanding of the knowledge management framework • Technology agnostic: potential for increased participation • Methodology for conflict resolution • Risk of unending philosophical arguments • Challenge to write good documentation
  10. Examples: Conceptual Model • CIDOC-CRM (Museums, Conservation) – Robust, complex model • BIBFRAME (Libraries) – Almost undocumented model • Records in Contexts (Archives) – Early, ongoing model • Europeana (Aggregation) – Model not separately documented from ontology
  11. CIDOC Conceptual Reference Model [Diagram: what, when, who, where / how]
  12. BibFrame Model
  13. Records in Contexts Model
  14. Model vs Ontology [Diagram: Model and Ontology, linked by "encodes"] • Single, new namespace or reuse terms? • Easier argument that new terms are needed, as other terms reflect other conceptual models • Reuse of terms can take place downstream • Opaque term names vs human-readable? • Human-readable! The model gives the abstraction; the ontology can be encoded in different ways if needed. • Ontology can use technology features • Ontology encodes, not defines, the model • RDFS vs OWL; json-schema vs xml-schema • Property Graphs, Named Graphs, simple Trees
  15. Case Study: CRM’s meta-properties • CIDOC-CRM has meta-properties • Property Graph – Native support for meta-properties • Named Graph – Name the triple, and assert the role • Reification – Map the property to a class with a role • Partitioning – Separate part of the production with a role
  16. Ontology vs Vocabulary [Diagram: Model, Ontology and Vocabulary, linked by "encodes" and "refines"] • Separation of Knowledge Management? • Model and Ontology should be lean and general to ensure breadth of use • Modelers are often not domain experts, and domain thesauri exist outside of any model • Every Identity, Its Ontology • Description of the vocabulary entity requires ontology, so the separation is complex • Concepts (AAT, ICONCLASS, LCSH) are more appropriate than Things (ULAN, VIAF, TGN, GeoNames, …) • Model needs to recognize Vocabulary • Need to have the right slots for external vocabulary terms • Otherwise the ontology will take on the responsibility
  17. CIDOC-CRM and Vocabularies • P2 has type (is type of): “This property allows sub typing of CIDOC CRM entities – a form of specialization – through the use of a terminological hierarchy, or thesaurus. The CIDOC CRM is intended to focus on the high-level entities and relationships needed to describe data structures. Consequently, it does not specialize entities any further than is required for this immediate purpose.*” (* This is very debatable ;)) • Every entity can have an external classification, keeping the model lean. Examples: • Human Made Object: Painting, Brush, Book, XRF Scanner, … • Identifier: DOI, ISBN, Local, Accession Number, … • Dimension: Height, Width, Duration, File Size, … • Linguistic Object: Description, Article, Abstract, Chapter, …
  18. BibFrame, RiCO and Vocabularies • BibFrame relies on the Ontology for classification… Examples: 41 Identifier subclasses: Ansi, AudioTake, Barcode, Coden, …; 9 Digital Characteristics: EncodingFormat, ObjectCount, Resolution, …; 6 Titles: Title, KeyTitle, VariantTitle, AbbreviatedTitle, ParallelTitle, … • RiCO expands a small set of model classes into a long list of ontology classes, which could easily be handled with vocabulary… Examples: 47 Relation subclasses: AccumulationRelation, … WorkRelation; 14 Type subclasses: ActivityType … RoleType
  19. Completeness [Diagram: Model, Ontology and Vocabulary (Abstraction), linked by "encodes" and "refines"; quality: Correct/Complete] • Goal for abstraction layer is completeness • “No data left behind” • Able to express everything that anyone might want to document • Goal for instance data is correctness • “No data left behind” • Any errors impact trust in the data, reputation of the institution, and the reliability of research using the data • Perfect is the Enemy of the Good • Left unchecked, abstraction, mapping and data cleaning will consume all available time and effort
  20. Separating Abstraction and Implementation? • Model, Ontology, Vocabulary • Available classes, properties and instances • API • How to interact with the data over the network • LOD does not separate Model / Ontology and API, as it requires the syntax to directly reflect the ontology • LOD does not separate Vocabulary and API, as it requires the terms to be instances (as above) • Solutions?
  21. [Metadata] Application Profile [Diagram: Model, Ontology, Vocabulary and Profile, linked by "encodes", "refines" and "specialized by"; quality: Correct/Complete] For any given domain, there must be a selection of appropriate abstractions, documented as an application profile. The more abstract the base, the more specification is needed for the profile. The profile needs to act as the friction that balances Completeness with Usability.
  22. Case Study: Linked Art • A Linked Open Usable Data profile for cultural heritage, collaboratively designed to work across organizations, that is easy to publish and use in consuming applications. Design Principles: • Focused on Usability, not 100% precision / completeness • Consistently solves actual challenges from real data • Development is iterative, as new use cases are found • Solve 90% of use cases, with 10% of the effort • https://linked.art/
  23. Linked Art Collaboration • Formalization of the profile in ICOM, funded by Kress & AHRC • Getty • Rijksmuseum • Metropolitan Museum of Art • Smithsonian • MoMA • V&A • NGA • Philadelphia Museum of Art • Indianapolis Museum of Art • The Frick Collection • Princeton University • Yale University • Oxford University • Academia Sinica • ETH Zurich • FORTH • University of the Arts, London • Canadian Heritage Info. Network • American Numismatic Society • Europeana
  24. Linked Art Profile • Model: CIDOC-CRM • Ontology: CRM-base, local extension, + cherry pickings • Vocabulary: Getty AAT, + cherry pickings • Look for consistent patterns across institutions’ data, resulting in core patterns for ease of adoption
  25. Linked Art: Vocabulary Extensibility • Allow vocabulary to be extensible, while the profile remains usable. • Solution: Required Meta-Types for non-enumerable cases • Reference: https://github.com/linked-art/linked.art/issues/186
  26. Application Programming Interface [Diagram: Model, Ontology, Vocabulary, Profile and API, linked by "encodes", "refines", "specialized by" and "implemented by"; quality: Correct/Complete] The API is how programmers interact with the data across system boundaries.
  27. Profile vs API • A Profile is a selection of appropriate abstractions, to encode the scope of what can be described. • An API is a selection of appropriate technologies, to give access to the data managed using the profile. • Scope: Classes; Properties and Relationships; Structure of Graph; Vocabulary Terms • Access: Document format(s); Document structure and boundary; URI patterns; Operations: CRUD, Browse, Search
  28. Evaluation of APIs (h/t Michael Barth, Ulm University) • Abstraction level • Comprehensibility • Consistency • Discoverability / Documentation • Domain Correspondence • Few Barriers to Entry • Extensibility • Infrastructure
  29. Case Study: IIIF APIs 1. Scope design through shared use cases 2. Design for international use 3. As simple as possible, but no simpler 4. Make easy things easy, complex things possible 5. Avoid dependency on specific technologies 6. Use REST / Don’t break the web 7. Separate concerns, keep APIs loosely coupled 8. Design for JSON-LD, using LOD principles 9. Follow existing standards, best practices, when possible 10. Define success, not failure (for extensibility) https://iiif.io/api/annex/notes/design_patterns/
  30. Application Programming Interface [Diagram: Model, Ontology, Vocabulary, Profile and API, linked by "encodes", "refines", "specialized by" and "implemented by"; qualities: Correct/Complete, Usable] APIs must be usable by software developers for them to be adopted and used. Usability is the corrective for the tendency towards Completeness.
  31. Usable vs Complete: Target Zone
  32. From LOD to LOUD https://linked.art/
  33. Standards are for Consistency [Diagram: Model, Ontology, Vocabulary (Abstraction) and Profile, API (Implementation), linked by "encodes", "refines", "specialized by" and "implemented by"; qualities: Consistent, Correct/Complete, Usable]
  34. Audiences [Diagram: Model, Ontology, Vocabulary (Abstraction); Profile, API (Implementation); Human, Machine, Network, Research (Audience); links labelled "encodes", "refines", "specialized by", "implemented by" and "serves"; qualities: Consistent, Correct/Complete, Usable]
  35. Audiences: Progressive Enhancement • Data for: Humans – Strings • Separate entities, with attached textual descriptions • Data for: Machines – Structure • Entities with machine-processable, comparable values • Data for: The Network – d’Stributed • Entities are connected across systems and institutions • Data for: Research – Stringent • Sufficient accuracy and comprehensiveness to answer research questions from aggregated data [Diagram: Human, Machine, Network, Research, each linked to the next by "enhances"]
  36. Audience: Humans • Strings: Entities with descriptions • Easy to do with existing data • Regardless of information system, can export data as strings • Easy on-ramp … need to start somewhere • Serves important audience: everyone • It’s our cultural heritage, after all! :) • Data not Document • Better than today, as it encourages multiple interfaces and reuse • Can be enhanced by third party with more resources
  37. Data for Humans [Diagram: a thing with a Name, an Identifier, and its dimensions given only in a textual Description]
  38. Audience: Machines • Structure: Connected, comparable values for machines to process, rather than just display • Comparison of Values • Answering basic questions via dimensions, materials, age etc. • Sorting entities by values rather than only computed relevance • Indexing values • Searching based on values rather than full text • Facets require consistent, structured data • Visualization • Can only have visualization, rather than display, with structure
  39. Data for Machines [Diagram: the same thing, with its dimensions as structured data]
  40. Audience: The Network • d’Stributed: Entities connected across systems • Part of the Web • Not just on the web • Shared Cataloging • No need for everyone to describe everything • Improved Discovery • By leveraging the connections for search and reconciliation • Towards Research Results • By taking into account information across institutions, datasets
  41. [Image only]
  42. Audience: Research • Stringent: Answering research questions requires sufficient aggregation, precision and completeness • Requirement or Stretch Goal? • Audience is relatively small, but important • Cultural Sector: Entertainment or Educational? • Requires Collaboration and Continuity • To be cost effective, must be ongoing, sustainable resource used for multiple projects • Need for Contextualization of Knowledge? • The process of knowledge capture, and meta-meta-data
  43. Research: Contextualized Data • Data Provenance • Creation of the data: Who, When and Why • Uncertainty • Confidence of the data owners in their data • Localization of Knowledge • When and Where does the data apply • Considerations • Who is the audience, human (string) or machine (structured)? • Why do they want it? • Context recorded at the dataset level, or at the assertion level?
  44. Context: Data Provenance • Creation / Modification of the Data: Who, When, Why? • Considerations • Researcher (human) wants Confidence in the dataset • Developer will ignore it if possible • Internal use for structure (edits by X between t1 and t2) but not external research use • Dataset Level description in prose is fine for external use • Otherwise, need named graphs per triple (expensive, support?) or to reify everything (expensive, un-ignorable)
  45. Context: Uncertainty • Confidence in the validity of the data? • Considerations • Researcher (human) wants Confidence in the dataset • Developer will ignore it if possible • Some valid uses: • Possibly / Probably <some value> • Machine generated scores • Scoping: How do you want it to appear in search results? • Needs to be in the data if at all • Benefits from Data Provenance for meta-context
  46. Context: Localization • Socio-spatio-temporal validity of the data? • Considerations • Researcher wants to know where and when the data is valid, to know when it should be used: factual not social • Developer will still just ignore it if possible • Many valid uses: • How and when were dimensions measured? • Jurisdiction of the ownership right? • Over what period was the person a painter? • Needs to be in the data, and thus in the model
  47. Case Study: Proxies
  48. Case Study: Attribute Assignment
  49. Conclusion [Diagram: Model, Ontology, Vocabulary (Abstraction); Profile, API (Implementation); Human, Machine, Network, Research (Audience); links labelled "encodes", "refines", "specialized by", "implemented by", "serves" and "enhances"; qualities: Consistent, Correct/Complete, Connected, Collaborative, Usable]
  50. Thank You! Rob Sanderson rsanderson@getty.edu @azaroth42 https://www.slideshare.net/azaroth42/tiers-of-abstraction-and-audience-in-cultural-heritage-data-modeling
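
The difference between CIDOC-CRM's P2 has type approach (slide 17) and the subclass-heavy approach attributed to BibFrame and RiCO (slide 18) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any standard's API: records are plain dictionaries shaped loosely like Linked Art JSON-LD (`type`, `content`, `classified_as`), the one-class profile is hypothetical, and the `example.org` URIs and the `IsbnIdentifier` class name are invented placeholders.

```python
# Minimal sketch: one class refined by vocabulary (the P2 has type pattern)
# versus one ontology class per kind. All URIs and the subclass name
# "IsbnIdentifier" are hypothetical placeholders.

# The classes a lean, hypothetical profile accepts.
PROFILE_CLASSES = {"Identifier"}


def vocab_identifier(value: str, kind_uri: str) -> dict:
    """One generic class, refined by an external vocabulary term."""
    return {"type": "Identifier", "content": value, "classified_as": [{"id": kind_uri}]}


def subclass_identifier(value: str, class_name: str) -> dict:
    """A new ontology class minted for each kind of identifier."""
    return {"type": class_name, "content": value}


def accepted_by_profile(node: dict) -> bool:
    """True if a consumer that only knows the lean profile can process the node."""
    return node["type"] in PROFILE_CLASSES


isbn = vocab_identifier("placeholder-isbn", "https://example.org/vocab/isbn")
# A new kind of identifier needs only a new vocabulary term, not a new class.
accession = vocab_identifier("placeholder-accession", "https://example.org/vocab/accession-number")
isbn_subclassed = subclass_identifier("placeholder-isbn", "IsbnIdentifier")

print(accepted_by_profile(isbn), accepted_by_profile(accession))  # True True
print(accepted_by_profile(isbn_subclassed))  # False
```

In the vocabulary approach, a new kind of identifier means publishing a new term while consumers built for the lean profile keep working; in the subclass approach, each new kind is another class that every consumer must learn.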
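
The "Required Meta-Types for non-enumerable cases" solution on slide 25 can be pictured as follows, assuming the pattern works as we understand it: each classification term is itself classified with a meta-type, so a consumer can recognise what an unfamiliar, locally added term is for. The meta-type URI, the terms and the `work_types` helper are hypothetical; the referenced Linked Art issue and https://linked.art/ are the authority on the actual terms.

```python
# Sketch of the "required meta-type" idea: each classification term is itself
# classified with a meta-type, so a consumer can tell what role an unfamiliar
# term plays. All URIs are hypothetical placeholders.
TYPE_OF_WORK = "https://example.org/vocab/type-of-work"

record = {
    "type": "HumanMadeObject",
    "_label": "Example object",
    "classified_as": [
        # A term the consumer may already know.
        {"id": "https://example.org/vocab/painting",
         "classified_as": [{"id": TYPE_OF_WORK}]},
        # A local term the consumer has never seen: the meta-type still says what it is.
        {"id": "https://museum.example.org/terms/altar-panel",
         "classified_as": [{"id": TYPE_OF_WORK}]},
        # A term without the meta-type is not treated as a type of work.
        {"id": "https://example.org/vocab/conserved"},
    ],
}


def work_types(rec: dict) -> list:
    """Return the classification terms flagged with the type-of-work meta-type."""
    return [
        term["id"]
        for term in rec.get("classified_as", [])
        if any(meta["id"] == TYPE_OF_WORK for meta in term.get("classified_as", []))
    ]


print(work_types(record))
```

The vocabulary stays open-ended, while the profile only has to enumerate the small set of meta-types.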
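
The split on slide 27 (Profile vs API), scope on one side and access on the other, can be sketched as two small Python classes. The sketch assumes JSON-like dictionaries; `Profile`, `Api`, the reserved keys, the URI pattern and every example term are hypothetical illustrations rather than part of Linked Art or any other specification.

```python
from dataclasses import dataclass

# Keys every node may carry regardless of profile (an assumption for this sketch).
RESERVED = {"id", "type", "_label"}


@dataclass(frozen=True)
class Profile:
    """Scope: the classes, properties and vocabulary terms that may be used."""
    classes: frozenset
    properties: frozenset
    vocabulary: frozenset

    def problems(self, node) -> list:
        """Recursively list terms in a JSON-like node that fall outside the profile."""
        found = []
        if isinstance(node, list):
            for item in node:
                found += self.problems(item)
        elif isinstance(node, dict):
            if "type" in node and node["type"] not in self.classes:
                found.append(f"class {node['type']}")
            for key, value in node.items():
                if key not in RESERVED and key not in self.properties:
                    found.append(f"property {key}")
                if key == "classified_as":
                    found += [f"term {t['id']}" for t in value if t["id"] not in self.vocabulary]
                found += self.problems(value)
        return found


@dataclass(frozen=True)
class Api:
    """Access: URI pattern and document boundary for data scoped by a profile."""
    base: str
    top_level: frozenset  # classes that get their own document

    def uri(self, cls: str, local_id: str) -> str:
        return f"{self.base}/{cls.lower()}/{local_id}"

    def embed_or_reference(self, node: dict) -> dict:
        """Embed a node in the current document, or reduce it to a reference
        (id, type, label) if it lives in its own document."""
        if node.get("type") in self.top_level:
            return {k: node[k] for k in ("id", "type", "_label") if k in node}
        return node


profile = Profile(
    classes=frozenset({"HumanMadeObject", "Identifier", "Person"}),
    properties=frozenset({"identified_by", "content", "classified_as"}),
    vocabulary=frozenset({"https://example.org/vocab/accession-number"}),
)
api = Api(base="https://data.example.org", top_level=frozenset({"HumanMadeObject", "Person"}))

good = {
    "id": api.uri("HumanMadeObject", "1"),
    "type": "HumanMadeObject",
    "identified_by": [{"type": "Identifier", "content": "placeholder-accession",
                       "classified_as": [{"id": "https://example.org/vocab/accession-number"}]}],
}
bad = {"type": "HumanMadeObject", "has_isbn": [{"type": "IsbnIdentifier"}]}
person = {"id": api.uri("Person", "7"), "type": "Person", "_label": "Example Person",
          "identified_by": []}

print(profile.problems(good))  # []
print(profile.problems(bad))   # ['property has_isbn', 'class IsbnIdentifier']
```

Because `Api` never inspects the profile's terms, the same `Profile` could be served through a different URI scheme or document boundary without changing what can be described, which is the separation of concerns the slide argues for.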
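
The progressive enhancement on slides 35 to 40 (Strings, Structure, d'Stributed) can be sketched as three versions of one record, each adding to the previous one. Property names loosely follow Linked Art JSON-LD usage (`_label`, `referred_to_by`, `dimension`, `produced_by`, `carried_out_by`), but the record, its values and all URIs are hypothetical placeholders.

```python
# Three progressively enhanced versions of one hypothetical record. Property
# names loosely follow Linked Art JSON-LD usage; every URI is a placeholder.
HEIGHT = "https://example.org/vocab/height"
WIDTH = "https://example.org/vocab/width"
CM = "https://example.org/vocab/centimeters"

# Humans (Strings): an entity with an attached textual description.
for_humans = {
    "id": "https://data.example.org/object/1",
    "type": "HumanMadeObject",
    "_label": "Portrait of a Woman",
    "referred_to_by": [{"type": "LinguisticObject", "content": "Oil on canvas, 60 x 45 cm"}],
}

# Machines (Structure): the same entity, plus comparable dimension values.
for_machines = {
    **for_humans,
    "dimension": [
        {"type": "Dimension", "classified_as": [{"id": HEIGHT}], "value": 60, "unit": {"id": CM}},
        {"type": "Dimension", "classified_as": [{"id": WIDTH}], "value": 45, "unit": {"id": CM}},
    ],
}

# Network (d'Stributed): the artist is a link to an entity described elsewhere.
for_network = {
    **for_machines,
    "produced_by": {
        "type": "Production",
        "carried_out_by": [{"id": "https://authority.example.org/person/42", "type": "Person"}],
    },
}


def height_cm(record: dict):
    """Return the structured height in cm, or None when only strings are available."""
    for dim in record.get("dimension", []):
        is_height = any(c["id"] == HEIGHT for c in dim.get("classified_as", []))
        if is_height and dim.get("unit", {}).get("id") == CM:
            return dim["value"]
    return None


# Only the structured tiers support sorting or faceting by value.
other = {**for_machines, "id": "https://data.example.org/object/2",
         "dimension": [{"type": "Dimension", "classified_as": [{"id": HEIGHT}],
                        "value": 30, "unit": {"id": CM}}]}
by_height = sorted([for_machines, other], key=height_cm)
```

The string tier still serves human readers; the structured tier adds values a machine can compare; the network tier adds a link that a third party can follow or reconcile.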
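
The question on slide 43, whether context is recorded at the dataset level or at the assertion level, and the cost noted on slide 44 can be illustrated with quads (triples plus a graph name). This is a minimal pure-Python sketch, not an RDF library; the graph names, predicates and provenance values are hypothetical.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """A triple plus the named graph it belongs to."""
    subject: str
    predicate: str
    obj: object
    graph: str


# Dataset-level context: one prose description covers every triple.
DATASET = "https://data.example.org/graph/export"
dataset_quads = [
    Quad("ex:obj1", "ex:height_cm", 60, DATASET),
    Quad("ex:obj1", "ex:width_cm", 45, DATASET),
]
dataset_context = {DATASET: {"description": "Exported from the collection system; not all values reviewed."}}

# Assertion-level context: one named graph per triple, each with its own
# provenance and confidence (all values hypothetical).
assertion_quads = [
    Quad("ex:obj1", "ex:height_cm", 60, "ex:g1"),
    Quad("ex:obj1", "ex:width_cm", 45, "ex:g2"),
]
assertion_context = {
    "ex:g1": {"asserted_by": "ex:conservator", "year": 1986, "confidence": "probable"},
    "ex:g2": {"asserted_by": "ex:registrar", "year": 2019, "confidence": "certain"},
}


def triples(quads):
    """What a developer who ignores context sees: the graph name is dropped."""
    return {(q.subject, q.predicate, q.obj) for q in quads}


def context_records(quads, context):
    """How many context descriptions must be created and maintained."""
    return len({q.graph for q in quads} & context.keys())


def confident(quads, context, accepted=("certain",)):
    """Filter assertions by confidence, which only assertion-level context allows."""
    return [q for q in quads if context.get(q.graph, {}).get("confidence") in accepted]
```

Both encodings yield the same triples to a developer who ignores context, but the assertion-level version needs one context record per triple, which is where the expense lies; in return, only it can answer a question such as "which values are certain?".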

Editor's Notes

  • Libraries – We don’t care about the paper, only the balance sheet
    Archives – Look at this complete set of every denomination (that we can’t describe because there’s too much of it)
    Museums – Look at this mint condition 1975 Roosevelt Dime!
    Conservation – The metal in a penny is worth more than a penny
    Aggregators – Please give us everything in only unmarked dollar bills
  • To do that, we need appropriate abstractions and implementations that meet the technical and social needs of diverse stakeholders and user communities.
  • Encode = machine-actionable
  • Consistency of thinking about and managing diversity
  • Many trees worth of documentation, most of which is utterly opaque.
  • That’s it. There’s Work, Instance, Item, Agent, Event and Subject.
  • RiC is in early stages, but clearly building upon a conceptual model that is then instantiated in an ontology. Hard to know if inspired by CRM, but certainly has some of its hallmarks.
  • An ontology encodes a model, regardless of whether that model is separately documented. The advantage of the separation is the ability to have multiple ontologies encoding the same model, thereby having some degree of semantic interoperability, even if not necessarily technical interoperability. As few models as possible! Alignment between models helps to get to alignment between ontologies.
  • Neo4j and similar systems have property graphs. Named Graphs are standardized but not well implemented, and tend to add complexity. You only get one Named Graph, so use it wisely.
    Reification is generally unloved, but is used in CRM’s ontology: PC14 is the class (named for the number of the property, P14), and P14.1 gives the role of the relationship on that class.
    Partitioning avoids the issue at the expense of some semantic precision, which is what we try to do in the Linked Art profile.
  • Vocabulary allows subdomains to be specific about their content, within a more generalized model. This is important, as we want as few models as possible.
    Vocabularies might conceptually relate to the model generally, but given the description of the vocabulary term, needs to be thought of in terms of the ontology encoding of it as well.
  • Ontologists gonna Ontologize, but you shouldn’t have to care. A profile selects the features and instances in order to meet the needs of the application domain.
  • Subset of the model, ontology and vocabularies as appropriate.
    Name, Identifier, Contact Point, “Statement”, …
  • Michael Barth has six fundamental features for API evaluation, which relate directly to the value of the API as a standard for use. This seems like a good starting point for standards for digital interoperability.

    Abstraction Level -- is the abstraction of the data and functionality appropriate to the audience and use cases. An end user of the "car" API presses a button or turns a key. A "car" developer needs access to engine directly.
    Comprehensibility -- is the audience able to understand how to use it to accomplish their goals
    Consistency -- if you know the "rules" of the API, how well does it stick to them? Or how many exceptions are there to a core set of design principles
    Documentation -- How easy is it to find out the functionality of the API?
    Domain Correspondence -- If you understand the core domain of the data and API, how closely does the understanding of the domain align with an understanding of the data?
    Few Barriers to Entry -- what barriers to getting started are there?

    There are two more that I think are important.
  • Note – these are all about access, not about modeling.
  • Less important for IIIF to be complete as it’s about presentation not semantics. But for semantic description…
  • Stressful but Strategic Stretch goal.
  • Still within the framework of the profile and API – needs to be possible in the model, simultaneously with more structured data.
  • Corpus Art History
  • Painful to implement. ORE → Europeana, RiCO.
  • Knowledge Provenance, not data provenance. Adds evidence that the dimension was valid in 1986. Could add confidence or technique used. Doesn’t rely on reification, but doesn’t work everywhere.
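
The reification and partitioning options from slide 15 and the notes on it can be compared with two encodings of the same fact: an actor took part in a production in a particular role. The CRM-style names (`P01i_is_domain_of`, `P02_has_range`, `P14.1_in_the_role_of`, the `PC14_carried_out_by` class) are given as we understand the CRM RDFS encoding and should be checked against the current CRM release; the Linked Art-style `part` structure is a simplification; all URIs are hypothetical.

```python
# Hypothetical identifiers for the actor and the role term.
ARTIST = "https://example.org/person/1"
ROLE = "https://example.org/vocab/printmaking"

# Reification: the property becomes a node (CRM's PC14 class), so the role can
# be attached to it with P14.1 "in the role of".
reified = {
    "type": "E12_Production",
    "P01i_is_domain_of": [
        {
            "type": "PC14_carried_out_by",
            "P02_has_range": {"id": ARTIST},
            "P14.1_in_the_role_of": {"id": ROLE},
        }
    ],
}

# Partitioning: split off a part of the production, classify that part with
# the role, and attach the actor to the part with an ordinary property.
partitioned = {
    "type": "Production",
    "part": [
        {
            "type": "Production",
            "classified_as": [{"id": ROLE}],
            "carried_out_by": [{"id": ARTIST}],
        }
    ],
}


def roles_from_reified(production: dict) -> set:
    """Collect (actor, role) pairs from reified PC14 nodes."""
    return {
        (node["P02_has_range"]["id"], node["P14.1_in_the_role_of"]["id"])
        for node in production.get("P01i_is_domain_of", [])
        if node.get("type") == "PC14_carried_out_by"
    }


def roles_from_partitioned(production: dict) -> set:
    """Collect (actor, role) pairs from classified sub-productions."""
    return {
        (actor["id"], role["id"])
        for part in production.get("part", [])
        for actor in part.get("carried_out_by", [])
        for role in part.get("classified_as", [])
    }
```

Both yield the same (actor, role) pair, but in the partitioned form the role classifies the sub-activity rather than the actor's participation, which is the loss of semantic precision the note mentions.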
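
The note on slide 48 (evidence that a dimension was valid in 1986, optionally with the technique used) can be sketched by attaching an assignment activity to each dimension. The `assigned_by` / `AttributeAssignment` / `timespan` / `technique` structure is modelled loosely on Linked Art and is an assumption here, as are the measurements, dates and URIs; `height_known_in` is a hypothetical helper.

```python
from datetime import datetime

HEIGHT = "https://example.org/vocab/height"


def measured_height(value: float, end: str, technique: str) -> dict:
    """A height whose measurement activity is recorded as an AttributeAssignment."""
    return {
        "type": "Dimension",
        "classified_as": [{"id": HEIGHT}],
        "value": value,
        "assigned_by": [{
            "type": "AttributeAssignment",
            "timespan": {"type": "TimeSpan", "end_of_the_end": end},
            "technique": [{"id": technique}],
        }],
    }


# Two hypothetical measurements of the same object, made with different techniques.
obj = {
    "type": "HumanMadeObject",
    "dimension": [
        measured_height(60.0, "1986-12-31T23:59:59Z", "https://example.org/vocab/tape-measure"),
        measured_height(60.4, "2019-12-31T23:59:59Z", "https://example.org/vocab/laser-scan"),
    ],
}


def _measured_at(dim: dict) -> datetime:
    stamp = dim["assigned_by"][0]["timespan"]["end_of_the_end"]
    # Older Pythons' fromisoformat() rejects a trailing "Z", so normalize it.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def height_known_in(record: dict, year: int):
    """Most recent height measured no later than `year`, or None if there is none."""
    heights = [
        d for d in record.get("dimension", [])
        if any(c["id"] == HEIGHT for c in d.get("classified_as", []))
        and _measured_at(d).year <= year
    ]
    return max(heights, key=_measured_at)["value"] if heights else None
```

Because each value carries its own measurement context, a consumer can ask what was known at a given date without reifying the dimension statement itself.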
