Tiers of Abstraction and Audience in Cultural Heritage Data Modeling

A walk-through of a framework, based on the distinctions between Abstraction, Implementation and Audience, for considering the value and utility of data modeling patterns and paradigms in cultural heritage information systems. It focuses in particular on CIDOC-CRM, BibFrame, RiC-CM/RiC-O, EDM and IIIF, with the intent of demonstrating best practices and anti-patterns in modeling.

  1. @azaroth42 rsanderson IIIF: Interoperability, Abstractions & Audiences. Tiers of Abstraction and Audience in Cultural Heritage Data Modeling. Rob Sanderson, Semantic Architect, @azaroth42
  2. Conclusion [Diagram: Model, Ontology, Vocabulary, Profile and API linked in sequence by encodes, refines, specialized by and implemented by; the API serves the audience tiers Human, Machine, Network and Research, each of which enhances the previous one; qualities: Consistent, Correct/Complete, Connected, Collaborative, Usable; tiers: Abstraction, Implementation, Audience]
  3. Overview • Introduction – Digital Cultural Heritage • Tiers of Abstraction – Separation of concerns between model, ontology and vocabulary • Implementations and Usability – The importance of usability and the role of profiles • Tiers of Audience – Progressive enhancement to serve increasing audience needs
  4. Cultural Heritage? • Libraries: value is in the content; objects are merely carriers • Archives: value is in the collection of information-carrying objects • Museums: value is in the object • Conservation (Science): value is in research and conservation activities • Aggregators: all of the above!
  5. Digital vs Cultural Heritage • Digital – Non-rivalrous: use/consumption by an individual does not reduce use by others – Consistency is Good: experience is better if the product reuses known interaction patterns – Fewer is Better: having few highly functional and usable digital products improves community and sustainability • Cultural – Rivalrous: use/consumption by one individual reduces simultaneous use by others – Diversity is Good: experience is better if the resource is novel, innovative and emotive – More is Better: having many diverse cultural heritage resources available improves impact to the user
  6. Bridging the Digital / Cultural Divide? [Digital: non-rivalrous, consistency is good, fewer is better; Cultural: rivalrous, diversity is good, more is better] • Openly licensed, published, digitized surrogates with data • Community-developed, open, usable digital standards • Many publishers, using collaborative, sustainable products
  7. Goal: Engineer a socio-technical ecosystem in which standards are agreed upon and used to publish diverse cultural heritage knowledge that is easily usable by sustainable applications, to further public engagement and research
  8. Abstraction Levels • Conceptual Model – Abstract way to think about the domain • Ontology – Shared format to encode that thinking • Vocabulary – Curated set of domain-specific terms, to make the ontology concrete
  9. Conceptual Model • Shared understanding of the knowledge management framework • Technology agnostic: potential for increased participation • Methodology for conflict resolution • Risk of unending philosophical arguments • Challenge to write good documentation
  10. Examples: Conceptual Model • CIDOC-CRM (Museums, Conservation): robust, complex model • BIBFRAME (Libraries): almost undocumented model • Records in Contexts (Archives): early, ongoing model • Europeana (Aggregation): model not separately documented from ontology
  11. CIDOC Conceptual Reference Model [Diagram organized around: what, when, who, where / how]
  12. BibFrame Model
  13. Records in Contexts Model
  14. Model vs Ontology [Diagram: Model and Ontology linked by encodes] • Single, new namespace or reuse terms? – Easier argument that new terms are needed, as other terms reflect other conceptual models – Reuse of terms can take place downstream • Opaque term names vs human-readable? – Human-readable! The model gives the abstraction; the ontology can be encoded in different ways if needed. • Ontology can use technology features – Ontology encodes, not defines, the model – RDFS vs OWL; json-schema vs xml-schema – Property Graphs, Named Graphs, simple Trees
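To make "the ontology encodes, not defines, the model" concrete, here is a minimal Python sketch that holds one model-level statement (an object produced by a person) and encodes it two ways: as RDF-style triples using CIDOC-CRM term names, and as a simple JSON tree. The `ex:` identifiers are hypothetical, and the tree's key names are only loosely modeled on Linked Art's JSON-LD; neither encoding is presented as normative.

```python
import json

# Hypothetical identifiers for one model-level statement:
# "this object was produced by a production carried out by this person".
OBJ, PROD, PERSON = "ex:object/1", "ex:production/1", "ex:person/1"


def as_triples():
    """Encode the statement as RDF-style (subject, predicate, object) triples."""
    return [
        (OBJ, "rdf:type", "crm:E22_Human-Made_Object"),
        (OBJ, "crm:P108i_was_produced_by", PROD),
        (PROD, "rdf:type", "crm:E12_Production"),
        (PROD, "crm:P14_carried_out_by", PERSON),
        (PERSON, "rdf:type", "crm:E21_Person"),
    ]


def as_tree():
    """Encode the same statement as a nested JSON tree with readable keys."""
    return {
        "id": OBJ,
        "type": "HumanMadeObject",
        "produced_by": {
            "id": PROD,
            "type": "Production",
            "carried_out_by": [{"id": PERSON, "type": "Person"}],
        },
    }


if __name__ == "__main__":
    for triple in as_triples():
        print(triple)
    print(json.dumps(as_tree(), indent=2))
```

Both functions carry the same information; choosing between them is an implementation decision, which is why the model is kept separate from any single encoding.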
  15. Case Study: CRM’s meta-properties • CIDOC-CRM has meta-properties (properties of properties) • Property Graph – native support for meta-properties • Named Graph – name the triple, and assert the role • Reification – map the property to a class with a role • Partitioning – separate out part of the production, with a role
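The four options can be compared side by side. This sketch encodes one CIDOC-CRM meta-property, P14.1 in the role of (a property of P14 carried out by), in each style, using plain Python tuples and dicts rather than a real triple store. The identifiers are hypothetical; the `crm:PC14_carried_out_by` node with `P01_has_domain` / `P02_has_range` follows the pattern CIDOC-CRM's RDFS encoding uses for properties of properties, but check the current CRM release before relying on exact names.

```python
# Hypothetical identifiers; crm: names follow the CIDOC-CRM RDFS naming style.
PROD = "ex:production/1"
ACTOR = "ex:person/1"
ROLE = "ex:type/assistant"  # a role term, e.g. from an external vocabulary
P14 = "crm:P14_carried_out_by"
P14_1 = "crm:P14.1_in_the_role_of"

# 1. Property graph: the edge itself carries key/value attributes.
edge = {"from": PROD, "label": P14, "to": ACTOR, "attrs": {P14_1: ROLE}}

# 2. Named graph: the triple lives in its own graph, and the graph gets the role.
quads = [
    (PROD, P14, ACTOR, "ex:graph/1"),
    ("ex:graph/1", P14_1, ROLE, "ex:graph/meta"),
]

# 3. Reification: the property becomes a node of its own.
reified = [
    ("ex:pc14/1", "rdf:type", "crm:PC14_carried_out_by"),
    ("ex:pc14/1", "crm:P01_has_domain", PROD),
    ("ex:pc14/1", "crm:P02_has_range", ACTOR),
    ("ex:pc14/1", P14_1, ROLE),
]

# 4. Partitioning: the actor's share becomes a typed part of the production.
partitioned = [
    (PROD, "crm:P9_consists_of", "ex:production/1/part/1"),
    ("ex:production/1/part/1", "rdf:type", "crm:E12_Production"),
    ("ex:production/1/part/1", P14, ACTOR),
    ("ex:production/1/part/1", "crm:P2_has_type", ROLE),
]


def objects(triples, subject, predicate):
    """Objects of all triples matching a subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


# The same fact needs a different lookup path in each encoding.
role_pg = edge["attrs"][P14_1]
meta_triples = [(s, p, o) for (s, p, o, g) in quads if g == "ex:graph/meta"]
role_ng = objects(meta_triples, "ex:graph/1", P14_1)[0]
role_reified = objects(reified, "ex:pc14/1", P14_1)[0]
part = objects(partitioned, PROD, "crm:P9_consists_of")[0]
role_part = objects(partitioned, part, "crm:P2_has_type")[0]
```

Every encoding reaches the same role, but each needs its own lookup path: that is the practical cost a consumer pays for whichever pattern the ontology chooses.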
  16. Ontology vs Vocabulary [Diagram: Model, Ontology and Vocabulary linked by encodes and refines] • Separation of Knowledge Management? – Model and ontology should be lean and general to ensure breadth of use – Modelers are often not domain experts, and domain thesauri exist outside of any model • Every Identity, Its Ontology – Description of the vocabulary entity requires ontology, so the separation is complex – Concepts (AAT, ICONCLASS, LCSH) are more appropriate as vocabulary than Things (ULAN, VIAF, TGN, Geonames, …) • Model needs to recognize Vocabulary – Need to have the right slots for external vocabulary terms – Otherwise the ontology will take on the responsibility
  17. CIDOC-CRM and Vocabularies • P2 has type (is type of): “This property allows sub typing of CIDOC CRM entities - a form of specialization – through the use of a terminological hierarchy, or thesaurus. The CIDOC CRM is intended to focus on the high-level entities and relationships needed to describe data structures. Consequently, it does not specialize entities any further than is required for this immediate purpose.*” (* This is very debatable ;) ) • Every entity can have an external classification, keeping the model lean. Examples: – Human Made Object: Painting, Brush, Book, XRF Scanner, … – Identifier: DOI, ISBN, Local, Accession Number, … – Dimension: Height, Width, Duration, File Size, … – Linguistic Object: Description, Article, Abstract, Chapter, …
  18. BibFrame, RiCO and Vocabularies • BibFrame relies on the ontology for classification… Examples: – 41 Identifier subclasses: Ansi, AudioTake, Barcode, Coden, … – 9 Digital Characteristics: EncodingFormat, ObjectCount, Resolution, … – 6 Titles: Title, KeyTitle, VariantTitle, AbbreviatedTitle, ParallelTitle, … • RiCO expands a small set of model classes into a long list of ontology classes, which could easily be handled with vocabulary… Examples: – 47 Relation subclasses: AccumulationRelation … WorkRelation – 14 Type subclasses: ActivityType … RoleType
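The contrast between classifying through ontology subclasses and through vocabulary can be sketched as a small conversion. Below, records typed with one subclass per identifier kind (two class names taken from the BibFrame list above) are rewritten as a single generic `Identifier` class plus a `classified_as` term, in the style Linked Art uses. The vocabulary URIs and identifier values are made up for illustration.

```python
# Ontology-style typing: each kind of identifier is its own class.
# Class names are from the BibFrame list above; values are made up.
ontology_style = [
    {"type": "Barcode", "value": "0000000001"},
    {"type": "Coden", "value": "XXXXXX"},
]

# Vocabulary-style typing: one generic class plus an external term.
# These term URIs are hypothetical stand-ins for thesaurus entries.
TERMS = {
    "Barcode": "ex:vocab/barcode",
    "Coden": "ex:vocab/coden",
}


def to_vocabulary_style(record, terms=TERMS):
    """Replace a subclass with a generic Identifier classified by a vocabulary term."""
    return {
        "type": "Identifier",
        "content": record["value"],
        "classified_as": [terms[record["type"]]],
    }


converted = [to_vocabulary_style(r) for r in ontology_style]

# A new kind of identifier is a new vocabulary entry, not a new ontology class.
TERMS["AccessionNumber"] = "ex:vocab/accession-number"
converted.append(to_vocabulary_style({"type": "AccessionNumber", "value": "1999.1"}))
```

The last two lines are the point of the pattern: extending coverage touches only the vocabulary, while the ontology stays lean.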
  19. Completeness [Diagram: Model, Ontology and Vocabulary linked by encodes and refines; quality: Correct/Complete; tier: Abstraction] • Goal for the abstraction layer is completeness – “No data left behind” – Able to express everything that anyone might want to document • Goal for instance data is correctness – “No data left behind” – Any errors impact trust in the data, the reputation of the institution, and the reliability of research using the data • Perfect is the Enemy of the Good – Left unchecked, abstraction, mapping and data cleaning will consume all available time and effort
  20. Separating Abstraction and Implementation? • Model, Ontology, Vocabulary – Available classes, properties and instances • API – How to interact with the data over the network • LOD does not separate Model / Ontology and API, as it requires the syntax to directly reflect the ontology • LOD does not separate Vocabulary and API, as it requires the terms to be instances (as above) • Solutions?
  21. [Metadata] Application Profile [Diagram: Model, Ontology, Vocabulary and Profile linked by encodes, refines and specialized by; quality: Correct/Complete] For any given domain, there must be a selection of appropriate abstractions, documented as an application profile. The more abstract the base, the more specification the profile needs. The profile needs to act as the friction that balances Completeness with Usability.
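One way to picture a profile acting as friction is as a checkable selection of abstractions. The sketch below is a hypothetical, deliberately minimal profile (allowed classes, allowed properties per class, required properties) with a validator. Real profiles, including Linked Art's, are documented far more richly and can be enforced with tools such as JSON Schema or SHACL, which this does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A hypothetical, minimal application profile: which abstractions are in scope."""
    classes: set
    allowed: dict                                 # class name -> allowed property names
    required: dict = field(default_factory=dict)  # class name -> required property names


def validate(record, profile):
    """Return a list of profile violations for one record (empty means conformant)."""
    cls = record.get("type")
    if cls not in profile.classes:
        return [f"class {cls!r} is not in the profile"]
    errors = []
    allowed = profile.allowed.get(cls, set()) | {"id", "type"}
    for key in record:
        if key not in allowed:
            errors.append(f"property {key!r} is not allowed on {cls}")
    for key in sorted(profile.required.get(cls, set())):
        if key not in record:
            errors.append(f"required property {key!r} is missing on {cls}")
    return errors


# A hypothetical profile for objects: a narrow selection from a much larger model.
object_profile = Profile(
    classes={"HumanMadeObject"},
    allowed={"HumanMadeObject": {"identified_by", "classified_as", "produced_by"}},
    required={"HumanMadeObject": {"classified_as"}},
)

record = {"id": "ex:object/1", "type": "HumanMadeObject",
          "identified_by": [], "dimension": []}
problems = validate(record, object_profile)
```

Here `problems` reports that `dimension` is outside this profile and that the required classification is missing: the profile narrows what the underlying model would otherwise allow.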
  22. Case Study: Linked Art. A Linked Open Usable Data profile for cultural heritage, collaboratively designed to work across organizations, that is easy to publish and to use in consuming applications. Design Principles: • Focused on usability, not 100% precision / completeness • Consistently solves actual challenges from real data • Development is iterative, as new use cases are found • Solve 90% of use cases with 10% of the effort
  23. Linked Art Collaboration • Formalization of the profile in ICOM, funded by Kress & AHRC • Getty • Rijksmuseum • Metropolitan Museum of Art • Smithsonian • MoMA • V&A • NGA • Philadelphia Museum of Art • Indianapolis Museum of Art • The Frick Collection • Princeton University • Yale University • Oxford University • Academia Sinica • ETH Zurich • FORTH • University of the Arts, London • Canadian Heritage Info. Network • American Numismatic Society • Europeana
  24. Linked Art Profile • Model: CIDOC-CRM • Ontology: CRM-base, local extension, + cherry-pickings • Vocabulary: Getty AAT, + cherry-pickings • Look for consistent patterns across institutions’ data, resulting in core patterns for ease of adoption
  25. Linked Art: Vocabulary Extensibility • Allow vocabulary to be extensible, while the profile remains usable. • Solution: Required Meta-Types for non-enumerable cases • Reference:
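Here is a sketch of how required meta-types keep an extensible vocabulary usable, assuming a Linked Art-style structure in which a statement's type is itself `classified_as` a broader meta-type (such as "brief text"). A consumer that only knows the meta-type can still handle a local term it has never seen. The term URIs are hypothetical placeholders for what would normally be Getty AAT terms.

```python
# Hypothetical term URIs; in Linked Art these would normally be Getty AAT terms.
BRIEF_TEXT = "ex:aat/brief-text"  # the meta-type this consumer knows how to display

record = {
    "type": "HumanMadeObject",
    "referred_to_by": [
        {   # a well-known statement type, carrying the meta-type
            "type": "LinguisticObject",
            "content": "Description text.",
            "classified_as": [{"id": "ex:aat/description",
                               "classified_as": [{"id": BRIEF_TEXT}]}],
        },
        {   # a local type the consumer has never seen, with the same meta-type
            "type": "LinguisticObject",
            "content": "Local note text.",
            "classified_as": [{"id": "ex:local/gallery-label-note",
                               "classified_as": [{"id": BRIEF_TEXT}]}],
        },
        {   # no meta-type: the consumer cannot tell how to treat it
            "type": "LinguisticObject",
            "content": "Unclassified text.",
            "classified_as": [{"id": "ex:local/internal-remark"}],
        },
    ],
}


def texts_with_meta_type(entity, meta_type):
    """Return statement texts whose type is itself classified as `meta_type`."""
    found = []
    for statement in entity.get("referred_to_by", []):
        for term in statement.get("classified_as", []):
            metas = {m.get("id") for m in term.get("classified_as", [])}
            if meta_type in metas:
                found.append(statement["content"])
                break  # one matching type is enough for this statement
    return found
```

Calling `texts_with_meta_type(record, BRIEF_TEXT)` picks up both the well-known and the local statement, while the statement without a meta-type is skipped.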
  26. Application Programming Interface [Diagram: Model, Ontology, Vocabulary, Profile and API linked by encodes, refines, specialized by and implemented by; quality: Correct/Complete] The API is how programmers interact with the data across system boundaries.
  27. Profile vs API • A Profile is a selection of appropriate abstractions, to encode the scope of what can be described. • An API is a selection of appropriate technologies, to give access to the data managed using the profile. • Scope (Profile) – Classes – Properties and Relationships – Structure of Graph – Vocabulary Terms • Access (API) – Document format(s) – Document structure and boundary – URI patterns – Operations: CRUD, Browse, Search
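To illustrate the Access side, this sketch keeps the profile-level content fixed and shows two decisions that belong to the API instead: the URI pattern, and the document boundary (what is embedded in a document versus referenced by URI). The store, base URL and URI pattern are all hypothetical.

```python
# Hypothetical in-memory store; the profile decides what can be said,
# the API decides how it is split into documents and addressed.
BASE = "https://example.org/api"  # hypothetical base URL

STORE = {
    ("object", "1"): {
        "type": "HumanMadeObject",
        "_label": "Example object",
        "produced_by": {"type": "Production", "carried_out_by": [("person", "7")]},
    },
    ("person", "7"): {"type": "Person", "_label": "Example artist"},
}


def uri(kind, ident):
    """URI pattern: an API-level choice, not part of the profile."""
    return f"{BASE}/{kind}/{ident}"


def reference(kind, ident):
    """A short reference to an entity that has its own document."""
    entity = STORE[(kind, ident)]
    return {"id": uri(kind, ident), "type": entity["type"], "_label": entity["_label"]}


def object_document(ident):
    """Build the object's document: embed the production, reference the people."""
    entity = STORE[("object", ident)]
    production = entity["produced_by"]
    return {
        "id": uri("object", ident),
        "type": entity["type"],
        "_label": entity["_label"],
        # Inside the document boundary: the production has no document of its own.
        "produced_by": {
            "type": production["type"],
            # Outside the boundary: people are separate documents at their own URIs.
            "carried_out_by": [reference(k, i) for (k, i) in production["carried_out_by"]],
        },
    }
```

Moving the boundary (for example, giving productions their own documents) would change the API without changing anything the profile allows to be described.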
  28. Evaluation of APIs (h/t Michael Barth, Ulm University) • Abstraction level • Comprehensibility • Consistency • Discoverability / Documentation • Domain Correspondence • Few Barriers to Entry • Extensibility • Infrastructure
  29. Case Study: IIIF APIs 1. Scope design through shared use cases 2. Design for international use 3. As simple as possible, but no simpler 4. Make easy things easy, complex things possible 5. Avoid dependency on specific technologies 6. Use REST / Don’t break the web 7. Separate concerns, keep APIs loosely coupled 8. Design for JSON-LD, using LOD principles 9. Follow existing standards, best practices, when possible 10. Define success, not failure (for extensibility)
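Principles 8 and 10 can be illustrated with a client that reads only what it understands from a JSON-LD response and ignores everything else, so later additions do not break it. The manifest below follows the general shape of a IIIF Presentation 3.0 Manifest (a `label` language map and `items` of Canvases) but is trimmed to a few properties; the URLs and the extra `x_future_extension` key are hypothetical.

```python
# A Presentation 3-style manifest with hypothetical URLs, plus a property
# this client has never heard of.
manifest = {
    "@context": "http://iiif.io/api/presentation/3/context.json",
    "id": "https://example.org/iiif/manifest/1",
    "type": "Manifest",
    "label": {"en": ["Example manifest"]},
    "items": [
        {"id": "https://example.org/iiif/canvas/1", "type": "Canvas"},
    ],
    "x_future_extension": {"anything": True},
}


def read_manifest(doc):
    """Read only what this client understands; unknown keys are ignored, not errors."""
    if doc.get("type") != "Manifest":
        raise ValueError("not a Manifest")
    label = doc.get("label", {})
    # Language map: prefer English, otherwise take the first language present.
    values = label.get("en") or next(iter(label.values()), [])
    canvases = [item["id"] for item in doc.get("items", []) if item.get("type") == "Canvas"]
    return {"label": values[0] if values else None, "canvases": canvases}
```

The client states what it needs to succeed (a Manifest with a label and Canvases) rather than enumerating everything that should cause failure.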
  30. Application Programming Interface [Diagram: Model, Ontology, Vocabulary, Profile and API linked by encodes, refines, specialized by and implemented by; qualities: Correct/Complete, Usable] APIs must be usable by software developers for them to be adopted and used. Usability is the corrective function for the tendency towards Completeness.
  31. Usable vs Complete: Target Zone
  32. From LOD to LOUD
  33. Standards are for Consistency [Diagram: Model, Ontology, Vocabulary, Profile and API linked in sequence by encodes, refines, specialized by and implemented by; qualities: Consistent, Correct/Complete, Usable; tiers: Abstraction, Implementation]
  34. Audiences [Diagram: Model, Ontology, Vocabulary, Profile and API linked in sequence by encodes, refines, specialized by and implemented by; the API serves the audience tiers Human, Machine, Network and Research; qualities: Consistent, Correct/Complete, Usable; tiers: Abstraction, Implementation, Audience]
  35. Audiences: Progressive Enhancement • Data for Humans: Strings – Separate entities, with attached textual descriptions • Data for Machines: Structure – Entities with machine-processable, comparable values • Data for the Network: d’Stributed – Entities are connected across systems and institutions • Data for Research: Stringent – Sufficient accuracy and comprehensiveness to answer research questions from aggregated data [Diagram: Human, Machine, Network, Research, each tier enhancing the previous one]
  36. Audience: Humans • Strings: entities with descriptions • Easy to do with existing data – Regardless of information system, data can be exported as strings – Easy on-ramp … need to start somewhere • Serves an important audience: everyone – It’s our cultural heritage, after all! :) • Data, not Document – Better than today, as it encourages multiple interfaces and reuse – Can be enhanced by a third party with more resources
  37. Data for Humans [Diagram: a thing with a name, an identifier, and its dimensions given in a textual description]
  38. Audience: Machines • Structure: connected, comparable values for machines to process, rather than just display • Comparison of Values – Answering basic questions via dimensions, materials, age, etc. – Sorting entities by values rather than only by computed relevance • Indexing values – Searching based on values rather than full text – Facets require consistent, structured data • Visualization – Visualization, rather than mere display, is only possible with structure
  39. Data for Machines [Diagram: the same thing, with its dimensions as structured data]
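The difference between data for humans and data for machines can be sketched in a few lines: strings serve human readers, but sorting by dimension or building a material facet needs typed values with explicit units. The objects, values and conversion table are made up for illustration.

```python
from collections import Counter

# Data for humans: readable strings, but not reliably comparable. Made-up values.
for_humans = ["Object A: 50 x 40 cm, oil paint", "Object B: 12 in. high, bronze"]

# Data for machines: the same kind of facts as typed values with explicit units.
for_machines = [
    {"name": "Object A", "material": "oil paint",
     "dimensions": [{"type": "height", "value": 50, "unit": "cm"},
                    {"type": "width", "value": 40, "unit": "cm"}]},
    {"name": "Object B", "material": "bronze",
     "dimensions": [{"type": "height", "value": 12, "unit": "in"}]},
    {"name": "Object C", "material": "oil paint",
     "dimensions": [{"type": "height", "value": 120, "unit": "cm"}]},
]

CM_PER_UNIT = {"cm": 1.0, "in": 2.54}


def height_cm(obj):
    """Height converted to centimetres, or None if no height is recorded."""
    for dim in obj["dimensions"]:
        if dim["type"] == "height":
            return dim["value"] * CM_PER_UNIT[dim["unit"]]
    return None


# Comparison and sorting by value rather than by text relevance.
by_height = sorted((o for o in for_machines if height_cm(o) is not None), key=height_cm)

# Facets depend on consistent, structured values.
material_facet = Counter(o["material"] for o in for_machines)
```

Object B sorts first only because its 12 inches were converted to centimetres; the string "12 in. high" alone would not support that comparison.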
  40. Audience: The Network • d’Stributed: entities connected across systems • Part of the Web – Not just on the web • Shared Cataloging – No need for everyone to describe everything • Improved Discovery – By leveraging the connections for search and reconciliation • Towards Research Results – By taking into account information across institutions and datasets
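A minimal sketch of why shared identifiers matter for the network tier: when two institutions reference the same authority URI, their records can be joined without either one re-describing the other's data. The URIs below are hypothetical placeholders for shared authorities such as ULAN or VIAF.

```python
from collections import defaultdict

# Two institutions' records pointing at the same shared authority URI (all hypothetical).
institution_a = [
    {"id": "https://a.example/object/1", "creator": "https://authority.example/person/500"},
]
institution_b = [
    {"id": "https://b.example/object/9", "creator": "https://authority.example/person/500"},
    {"id": "https://b.example/object/10", "creator": "https://authority.example/person/600"},
]


def index_by_creator(*datasets):
    """Group object URIs from many datasets by the shared creator URI they reference."""
    index = defaultdict(list)
    for dataset in datasets:
        for record in dataset:
            index[record["creator"]].append(record["id"])
    return dict(index)


index = index_by_creator(institution_a, institution_b)
```

The join only works because both institutions used the same URI; with local name strings, this step would need reconciliation first.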
  42. Audience: Research • Stringent: answering research questions requires sufficient aggregation, precision and completeness • Requirement or Stretch Goal? – Audience is relatively small, but important – Cultural Sector: Entertainment or Educational? • Requires Collaboration and Continuity – To be cost-effective, it must be an ongoing, sustainable resource used for multiple projects • Need for Contextualization of Knowledge? – The process of knowledge capture, and meta-meta-data
  43. Research: Contextualized Data • Data Provenance – Creation of the data: who, when and why • Uncertainty – Confidence of the data owners in their data • Localization of Knowledge – When and where the data applies • Considerations – Who is the audience, human (string) or machine (structured)? – Why do they want it? – Is context recorded at the dataset level, or at the assertion level?
  44. Context: Data Provenance • Creation / Modification of the Data: who, when, why? • Considerations – Researcher (human) wants confidence in the dataset – Developer will ignore it if possible – Internal use for structure (edits by X between t1 and t2), but not for external research use – Dataset-level description in prose is fine for external use – Otherwise, it needs named graphs per triple (expensive; is it supported?) or reification of everything (expensive, and cannot be ignored)
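To show what the "named graphs per triple" option buys, this sketch stores each assertion in its own named graph with provenance attached to the graph, which makes a query like "edits by X between t1 and t2" straightforward. It uses plain tuples instead of a quad store; identifiers, editors and dates are hypothetical.

```python
from datetime import date

# Quads: (subject, predicate, object, graph). All identifiers and dates are hypothetical.
quads = [
    ("ex:object/1", "ex:height_cm", 50, "ex:graph/1"),
    ("ex:object/1", "ex:material", "oil paint", "ex:graph/2"),
    ("ex:object/2", "ex:height_cm", 120, "ex:graph/3"),
]

# Provenance attached per named graph (i.e. per assertion), not per dataset.
graph_provenance = {
    "ex:graph/1": {"editor": "ex:staff/x", "date": date(2019, 3, 1)},
    "ex:graph/2": {"editor": "ex:staff/y", "date": date(2019, 6, 15)},
    "ex:graph/3": {"editor": "ex:staff/x", "date": date(2020, 1, 10)},
}


def edits_by(editor, start, end):
    """Triples asserted by `editor` between `start` and `end`, inclusive."""
    result = []
    for s, p, o, g in quads:
        prov = graph_provenance[g]
        if prov["editor"] == editor and start <= prov["date"] <= end:
            result.append((s, p, o))
    return result
```

The cost is visible too: every triple needs its own graph and provenance entry, which is why a prose, dataset-level description is often enough for external use.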
  45. Context: Uncertainty • Confidence in the validity of the data? • Considerations – Researcher (human) wants confidence in the dataset – Developer will ignore it if possible • Some valid uses: – Possibly / Probably <some value> – Machine-generated scores • Scoping: How do you want it to appear in search results? – Needs to be in the data, if at all – Benefits from Data Provenance for meta-context
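As one hedged illustration of scoping uncertainty in search, the sketch below attaches a certainty label and a numeric score to each attribution and lets the caller decide the threshold for what appears in results. The scores, labels and default threshold are assumptions, not a recommended scale.

```python
# Hypothetical attributions, each carrying a certainty label and a numeric score.
attributions = [
    {"object": "ex:object/1", "artist": "ex:person/1", "certainty": "certain", "score": 1.0},
    {"object": "ex:object/2", "artist": "ex:person/1", "certainty": "probably", "score": 0.8},
    {"object": "ex:object/3", "artist": "ex:person/1", "certainty": "possibly", "score": 0.4},
    {"object": "ex:object/4", "artist": "ex:person/1", "certainty": "machine-generated",
     "score": 0.65},
]


def search_by_artist(artist, min_score=0.5):
    """Objects attributed to `artist`, hiding attributions below `min_score`."""
    return [a["object"] for a in attributions
            if a["artist"] == artist and a["score"] >= min_score]
```

Because the uncertainty is in the data, each application can choose its own threshold instead of inheriting one baked into the export.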
  46. Context: Localization • Socio-spatio-temporal validity of the data? • Considerations – Researcher wants to know where and when the data is valid, to know when it should be used: factual, not social – Developer will still just ignore it if possible • Many valid uses: – How and when were dimensions measured? – Jurisdiction of the ownership right? – Over what period was the person a painter? • Needs to be in the data, and thus in the model
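Localization means an assertion carries the span over which it is valid, so the model needs a slot for it. This sketch uses a time span only (place or jurisdiction could be added the same way) to answer "over what period was the person a painter?"; the class, field names and data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ScopedAssertion:
    """An assertion that only holds within a time span (None = open or unknown)."""
    subject: str
    predicate: str
    value: str
    begin: Optional[date] = None
    end: Optional[date] = None

    def holds_on(self, day: date) -> bool:
        return (self.begin is None or self.begin <= day) and (self.end is None or day <= self.end)


# Hypothetical data: a person who was a painter for a period, then also a teacher.
assertions = [
    ScopedAssertion("ex:person/1", "occupation", "painter", date(1880, 1, 1), date(1905, 12, 31)),
    ScopedAssertion("ex:person/1", "occupation", "teacher", date(1900, 1, 1)),
]


def values_on(subject, predicate, day):
    """Values of `predicate` for `subject` that are valid on `day`."""
    return [a.value for a in assertions
            if a.subject == subject and a.predicate == predicate and a.holds_on(day)]
```

Without the span, both occupations would appear as simply true, and the consumer could not tell when each one applied.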
  47. Case Study: Proxies
  48. Case Study: Attribute Assignment
  49. Conclusion [Diagram: Model, Ontology, Vocabulary, Profile and API linked in sequence by encodes, refines, specialized by and implemented by; the API serves the audience tiers Human, Machine, Network and Research, each of which enhances the previous one; qualities: Consistent, Correct/Complete, Connected, Collaborative, Usable; tiers: Abstraction, Implementation, Audience]
  50. Thank You! Rob Sanderson @azaroth42