
Community Challenges for Practical Linked Open Data - Linked Pasts keynote



A call to action to discuss and agree on practical considerations around the creation, publication and discovery of linked open data about historical activities and objects.

Text of approximately what I said: http://bit.ly/usable_lod


  1. Community Challenges For Practical Linked Open Data. Rob Sanderson, Semantic Architect, J. Paul Getty Trust. rsanderson@getty.edu / @azaroth42
  2. Meta Header
  3. Meta Header
  4. Meta Header
  5. Call To Action! https://www.flickr.com/photos/archivesfoundation/9517852418/
  6. Call To Action! Come Together as a Community To Agree on How Best to Create & Publish Historical LOD
  7. Call To Action! Come Together as a Community To Agree on How Best to Create & Publish Historical LOD (And then Do It!)
  8. Agenda: Come Together as a Community To Agree on How Best to Create & Publish Historical LOD
  9. Community not Committee. Key features of successful communities: • Focused: Solve real problems from within • Open: Requirement is participation not reputation • Active: Constant attention to product & process • Flexible: Adapt to changing situation
  10. Community Engagement Pyramid: Leaders, Experts, Contributors, Members, Watchers /ht Katherine Skinner, @educopia
  11. Community Leadership: 1. Know Your Audience 2. Meet on Their Terms 3. Have a Conversation 4. Create Opportunities for Meaningful Participation /ht Catherine Bracy, @cbracy
  12. Know Your Audience: Who is the Audience for Linked Open Data?
  13. http://knowyourmeme.com/photos/424743-x-x-everywhere
  14. LOD Community Pyramid: Architects, Providers, Developers, Users (esp. Researchers), Watchers
  15. Meet on Their Terms. "Listening can reveal how your community speaks and can help you speak easier with them and to them. You can use their language and meet them on their terms" -- Kevan Lee, Director of Marketing at Buffer, https://blog.bufferapp.com/social-media-marketing-voice-and-tone
  16. Have a Conversation: What do you need to be successful? Is our data understandable? Can you do what you want with it? What could we improve? Are your users satisfied?
  17. Create Participation Opportunities https://www.flickr.com/photos/helvetas_vietnam/6793512507/
  18. Create Participation Opportunities: "Shouldn't that E89 Propositional Object be E33 Linguistic Object instead?"
  19. Create Participation Opportunities: "Can't you just give me some JSON?!"
  20. [image slide]
  21. [image slide]
  22. [image slide]
  23. Patrick Hochstenbach, @hochstenbach
  24. Linked Open Data: • Complete • Usable • Accurate
  25. Linked Open Data: • Complete • Usable • Accurate. Pick One.
  26. Linked Open Data: • Complete • Usable • Accurate. Pick One. And Pick Usable.
  27. Usable? Complete? Accurate?
  28. Optimizing Complete and Usable?
  29. Optimizing Complete and Usable?
  30. Optimizing Complete and Usable?
  31. Usable vs Complete
  32. Usable vs Complete
  33. Target Zone
  34. Forest for the Trees? @azaroth42 & @bekisanderson
  35. Evaluation? https://www.nngroup.com/articles/which-ux-research-methods/
  36. API Evaluation: Abstraction level; Comprehensibility; Consistency; Discoverability / Documentation; Domain Correspondence; Few Barriers to Entry /ht Michael Barth, Ulm University
  37. "Just Use Federated SPARQL Queries!" ❌ Abstraction level: Poor ❌ Comprehensibility: Terrible ❌ Consistency: Mediocre ❌ Discoverability / Documentation: Poor ❌ Domain Correspondence: Very poor ❌ Few Barriers to Entry: Abysmal
  38. "Just Use Federated SPARQL Queries!" Now you have more problems than you can count
  39. Venn: JSON vs SPARQL Developers
  40. Venn: JSON vs SPARQL Developers
  41. Linked Pasts? Scope: Description of Historical Activities • Ontology • Identity • Activity Type/Intent • Actor • Time of Activity • Place of Activity • Acted on/with Object(s) • Outcome of Activity
  42. Serialization: Use JSON-LD
      {
        "@context": "https://lod.museum/ns/context/1/full.jsonld",
        "id": "https://lod.museum/example/object/1",
        "type": "ManMadeObject",
        "classified_as": "aat:300033618",
        "label": "Example Painting",
        "made_of": {
          "id": "aat:300015045",
          "type": "Material",
          "label": "watercolor"
        }
      }
  43. Or … {} are the New <>
      {
        "@context": "https://lod.museum/ns/context/1/full.jsonld",
        "id": "https://lod.museum/example/object/1",
        "type": "ManMadeObject",
        "classified_as": "aat:300033618",
        "label": "Example Painting",
        "made_of": {
          "id": "aat:300015045",
          "type": "Material",
          "label": "watercolor"
        }
      }
  44. 5 Hardest Challenges in Practical LOD: 5 - Order
  45. 5 - Order https://www.ajactraining.org/women-diversity/timeline/
  46. 5 Hardest Challenges in Practical LOD: 5 - Order; 4 - Boundary of Representation
  47. 4 - Boundary of Representation
      {
        "@context": "https://lod.museum/ns/context/1/full.jsonld",
        "id": "https://lod.museum/example/object/1",
        "type": "ManMadeObject",
        "classified_as": "aat:300033618",   # by reference
        "label": "Example Painting",
        "made_of": {
          "id": "aat:300015045",            # by (minimal) value
          "type": "Material",
          "label": "watercolor"
        }
      }
  48. 5 Hardest Challenges in Practical LOD: 5 - Order; 4 - Boundary of Representation; 3 - Meta-Meta-*-Data
  49. 3 - Meta-Meta-Meta-Meta-Meta-…-Data http://allsmallthings.blogspot.com/2012/05/inception-info-graphic.html
  50. 5 Hardest Challenges in Practical LOD: 5 - Order; 4 - Boundary of Representation; 3 - Meta-Meta-*-Data; 2 - Naming Things
  51. 2 - Naming Things http://www.getty.edu/art/collection/objects/249050/
  52. 5 Hardest Challenges in Practical LOD: 5 - Order; 4 - Boundary of Representation; 3 - Meta-Meta-*-Data; 2 - Naming Things; 1 - Cache Invalidation
  53. 1 - Cache Invalidation
  54. 5 Hardest Challenges in Practical LOD: 5 - Order; 4 - Boundary of Representation; 3 - Meta-Meta-*-Data; 2 - Naming Things; 1 - Cache Invalidation; 0 - Off-by-One Errors
  55. Practical Linked Open Data? https://www.flickr.com/photos/dusty7s/4271619606
  56. Practical Linked Open Data? We Want U
  57. https://www.flickr.com/photos/harris77/3357537737
  58. With Community … CLOUD!
  59. Community Linked Open Usable Data! The Community includes Everyone. Linking to others' data reduces the Completeness burden. Enabling feedback from users reduces the Accuracy burden. Working with developers validates Usability. Remember FOAF: Focused, Open, Active, Flexible
  60. Challenge Suggestions: Publish JSON-LD, using Lists for local order, using Frames for graph boundaries validated by application use, as an API with understandable keys and aliased URIs validated by developer understanding. And publish notifications when you change things.
  61. Thank You! Rob Sanderson rsanderson@getty.edu / @azaroth42
  62. Discuss!

Editor's Notes

  • Good morning! Thank you Leif/Elton for the introduction, and for the invitation to come and give the keynote presentation. I'm honestly excited to be here, as my relatively recent move to the Getty [smile at Nicole] has given me an amazing opportunity to become more deeply involved in both Linked Open Data, and how to use it pragmatically to describe objects and the events that carry them through history.
  • I’m going to start with a brief meta-level "header" about Keynotes, if you'll indulge me. There are three types of Keynotes, in my experience. The first two types tend to bore me to tears, so I hope not to repeat their mistakes!
  • This is not a “CV” sort of keynote, where I just talk about things that I’ve done. If you want to read my CV, well ... you can read my CV ... or better, come and talk to me over the next couple of days!
  • And it’s not a dry, background work, scene setting, domain survey. You already know all of that, and while you might have heard about a couple of things that you didn’t know about, you might also have been asleep by that point.
  • Which leaves the third type: the call to action. I’m going to try and lay out where I think we, as a community, should be trying to go, and some of the challenges we need to decide how to resolve along the way. How can we make a difference beyond these two days, when we get back to our regular work? And I’m not going to be coy and keep you in suspense as to what I want us to do ...
  • It’s pretty simple. We should come together as a Community to agree on how Best to create and publish linked open data about historical events and their participants.
  • And then, of course, to go home and actually do something about it :)
  • That's also pretty much the outline of the presentation. I’m going to talk about Community, which leads into what I think constitutes good linked data, and then about creating and publishing it.
  • In my experience of the technical information standards world, starting with Z39.50 and SRU about 20 years ago, the most successful specifications are those that come from Communities, not as mandates from self-appointed Committees.
    Direction and vision are required... the focus of the community needs to be articulated clearly and convincingly. Communities need awareness and understanding of the problems that they are facing, and the motivation to work together to overcome them.
    There needs to be leadership, but not at the expense of being open and welcoming. Not at the expense of engaging and understanding what is actually needed. Instead the focus gives a methodology for community based decision making, ensuring that the problems being solved are real for the members of the community.
    Participation is the key requirement in solving practical challenges, not reputation. And Active participation, not just lurking and occasionally making a snide comment about how nothing is getting done. A community that isn't doing anything, isn't a community. It needs to not only think about the end result, but actively consider and adapt how it is getting there.
    So Flexible, not Agile? I prefer the notion of the community being flexible but strong – when the willow tree sways with the wind it’s flexible, but is true to its purpose. A cat agilely avoiding a dangerous situation doesn’t address the danger, it just avoids it. Agility lets you dance around the problem, Flexibility lets you overcome it.

    Focused, Open, Active, Flexible ... sounds great, but how do we get there?
  • Katherine Skinner of Educopia introduced me to this notion of the community engagement pyramid. There are a few people at the top of the pyramid, and increasingly many as you move down the tiers. In the IIIF community, for example, there are probably 5 active leaders, but a good 10-20 experts and advocates, maybe 50-100 contributors, 500 or so members that aren't constantly engaged but actively following, and then an impossible to determine number of people on the edges looking in. Or up.

    The point is not to have a hierarchical and fixed structure, but to recognize that people look upwards and it is the sign of a healthy community when everyone on the tier above is reaching down to help those who want to take on a bigger role to do so.

    In order to be successful in pulling off the amazing transformation of cultural heritage into linked open data, we need to have a solid understanding of how to work together strategically, while advancing our own organizations' immediate goals.
  • Catherine Bracy, Director of Community for Code for America, gave an outstanding keynote at the Museum Computer Network in November, and I'm shamelessly echoing her points on how community leadership can most effectively work because they're bang on.

    There are four easy steps:
    First, know your audience. Who is the community, and who, as a community leader, do you need to be working directly with.
    Secondly, reach out to those people on their terms, not on yours. You need them to participate, and for that, they need to understand and agree with the goals and direction.
    Thirdly, have a continuing conversation with the community about everything!
    And make opportunities for people to actively and meaningfully participate. Through participation you're building the ladders to bring them to the next level of the pyramid.

    Let's get more concrete...
  • And perhaps a little controversial ... Who IS the audience for Linked Open Data?

    (beat)
  • Developers. Developers, Developers everywhere. And I mean that literally, not just your developers, but developers everywhere. Developers who cannot come to your office and ask why an E12 Production is an Activity, but an E6 Destruction is only an Event.

    Oh and if you can explain that to _me_, that'd be great too.
  • Let's look at the LOD community pyramid, rather than the generic one. There are a few architects, ontologists and leaders who discuss, design, create and advocate for ontologies. Then there are experts and content providers who understand and use those ontologies to make their data available ... to developers to build applications for the users, and one of the most focused-on classes of user is researchers.

    Individuals can fit into multiple tiers, and you should think of them for the purposes of this pyramid as if they were the highest of those tiers. A researcher that can also write code or SPARQL queries is a developer. Thinking that “lots of researchers use my data” because they can write SPARQL queries is a category error – developers that are also researchers use your data.
  • Meet the developers on their terms. Don’t go to developers and talk about ontologies and triples and namespaces and reification and inference and sparql and quadstores and turtle and ... You lost them way back at ontologies.

    Instead talk about JSON, and APIs, and HTTP, and content, and applications. Because remember, developers are your gateway to the users of their applications. So as Lee says, listen to them and engage in a way that makes them feel needed and wanted, because they are!
  • And don't just talk AT them, discuss WITH them. Have a conversation about what they need, what you can do to help them, and of course what they know their users need. Some middle aged white guy [cough] talking at an audience is a presentation, not a community.
  • Finally, create opportunities for the community to participate meaningfully. Not just listening but actively engaging. Can the developers help fix your bugs? Can their users annotate your content with corrections and suggestions of related content?

    Listening to and acting on feedback is important, but think about ways for others to _get involved_. Can you provide APIs to let developers and users actually do something directly?

    I love the expression of the woman in the middle. She's REALLY dubious about something! Maybe ...
  • Ha! Said no Developer, Ever. Meeting on their terms, remember.
  • That seems much more likely! Or at least, that's what I hear most often
  • So let's look at how this workflow plays out. It starts with the data being created and knowledge is being represented – which is expensive! Money goes in, and something called "triples" comes out the other end of what appears to most management like a black box.
  • Those triples go to the developer, who has to actually DO something with them to build a web application ...
  • Which is then used by researchers (and others) to form hypotheses, do research and write papers. But each of them have different expectations ...
  • For the creation and transformation of the dataset, there should be “No data left behind” – the ontology and model should be thorough and complete, a good knowledge representation.

    The developer however wants the data to be usable and understandable, otherwise he can’t do his job of making it available.

    And the researcher, not knowing any of the process behind the HTML application, needs the data to be accurate, otherwise when using it and comparing it with other data, his research will be equally inaccurate.
  • What I've come to realize is that Linked Data can be Complete, Usable, Accurate ...
  • ... Pick one.
  • ... And pick Usable.
  • It would be great to pick all three, but like “Good, Cheap, Fast, Pick two” it’s a matter of limited resources and priorities. For today, let’s take Accuracy of the data off the table. Ensuring that all statements correctly represent reality is a direct function of resources (spending time to find and fix errors), not the model or its representation. And we’re not going to solve the challenges of humanities funding today.
  • By trying to make the data both Complete AND Usable, we're trying to optimize for two independent variables at the same time, with different purposes and most importantly different audiences. Are we prioritizing the needs of the ontologist and data manager, or the developer that has to work with the result? In my experience, we tend to meet our own requirements first because we care about the knowledge representation and hope that the developer can make do with what they get, without much thought to the API.
  • Optimizing for completeness fulfills the knowledge representation use case. But Linked Data also has a protocol aspect – it’s a fundamental part of LOD that the representation is available at its URI via HTTP.
  • This means that we are implicitly also designing, and hopefully co-optimizing for, a Usable API. And this is where the Completeness and Usability axes become more complicated. The question is not only can I understand and use the model or data in that model, it’s how easy is it to use that data in the way it is made available – its serialization and the network transfers.

    I think the chart of ontologies looks something like this ...
  • At the beginning, any completeness adds to usability. Then there's a dramatic rise in usability, without adding so much completeness when you hit the sweet spot of "just enough data". Then as you add more to the ontology, it starts to drop off slowly at first ... not so much as to be a significant problem ... but then you reach the tipping point where it becomes incomprehensible, and as completeness tends to 100%, usability tends to 0%

    Okay, I know you want examples, so some more potential controversy ...
  • Bibframe 1.0 was terrible. It was complex without actually addressing the issues, worse than Simple Dublin Core, which at least has enough to get /something/ done with. Then you have frameworks further up the usability scale like Web Annotations and EDM, and a little more complete, but by no means everything you want to say.

    The IIIF Presentation API is, demonstrably, about as usable as a linked data API can get ... but we're constantly fighting to stay at high usability by resisting requests to add features.

    Then comes the slippery slope that schema.org is further down ... still usable for now, but they're constantly adding to it ... and not in a sustainable or directed fashion ... until you hit rock bottom with CIDOC CRM and the meta-meta-meta statements available via CRMInf.
  • The zone most ontologies should aim to be in, in my opinion, is the top right hand corner ... maximizing usability and completeness, maximizing the area between 0,0 and the x,y point reached. The community needs to take into account where on that slope will result in greatest adoption.
  • For our wedding anniversary, my wife and I drove up to see the giant redwoods in Northern California. There’s a living tree you can drive your car through even. However, there’s still the forest surrounding the trees that people want to look at, and without the forest, those trees would fall in the next storm. You can cut down a few trees to make sure that the rest survive and people can see them, but there’s no need to build paths to every tree. Like the forest, you can and should keep data around towards completeness and stability of the whole, without exposing it to developers. In the Getty vocabularies, we have a changelog of every edit for every resource … we publish that, and it just gets in the way of understanding for no value. We don’t have to throw it away completely to increase usability, we can just leave it out of the API.

    So how do we understand what we should publish?
  • The other stages in the workflow have reasonably well understood evaluation processes -- the formal validity of the ontology, the extent to which it can encode all of the required information, unit and integration tests for code, user acceptance tests for the application, user personae to guide development ... but how should we evaluate the quality of the API we're providing to our data?
  • Michael Barth lays out six fundamental features for API evaluation.

    Abstraction Level -- is the abstraction of the data and functionality appropriate to the audience and use cases. An end user of the "car" API presses a button or turns a key. A "car" developer needs access to engine controls, all the modern safety features and so forth.
    Comprehensibility -- is the audience able to understand how to use it to accomplish their goals
    Consistency -- if you know the "rules" of the API, how well does it stick to them? Or how many exceptions are there to a core set of design principles (like Destruction not being an Activity)
    Documentation -- How easy is it to find out the functionality of the API?
    Domain Correspondence -- If you understand the core domain of the data and API, how closely does the understanding of the domain align with an understanding of the data?
    And finally, what barriers to getting started are there?
  • Sometimes I hear linked data experts say "You should just use Federated SPARQL Queries". But SPARQL, let alone federations of systems, performs very poorly on all of those metrics. To explain why "consistency" is "mediocre", that's because everyone has different underlying models and exceptions, and SPARQL is a complex but very thin layer over those models. The Abstraction level for SPARQL is for the car designer who knows everything and needs to be able to get at it, which is a very very limited number of people.

    So when I hear people say "You should use SPARQL", my internal reaction is ...
  • Now you have more problems than you can count.
    Or … You can tell that people are using your SPARQL endpoint, because it’s down.
  • And even if you love SPARQL, the incontrovertible fact is that when you compare to REST + JSON developers, the Venn diagram looks something like this. Sorry it's a little hard to see, let me zoom in a little for you ...
  • Is that better? There MIGHT be one SPARQL developer who doesn't know JSON, but I doubt it.
  • Okay, so what do we need to have available in that data? (This is the brief background, scene setting bit)

    The scope, in my view, is broadly covered by "historical activities", and the participants in those activities. We need a model for describing them, and shared resources such as people, places and objects, should have shared identities. There's no need to get into philosophical wars right now about the best ontology or the most appropriate sources of shared identity, so what CAN we practically and pragmatically discuss?
  • Let's start with serialization. JSON-LD. It's JSON with explicit, managed structure. The keys are named in a way that’s easily understandable to humans and easy to use when programming. No hyphens, no numbers, no strange symbols everywhere, …

    And let me explain the significance of the colors. All the blue strings are URIs, including "ManMadeObject" and "Material". The only actual string literals are the two red labels. JSON-LD lets you manage the complexity of the graph in a way that ends up familiar to the audience, the developer, not daunting to them. Remember: Meet On Their Terms.
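    A minimal sketch of the kind of JSON-LD being described. The context URL, object identifier and object label here are invented for illustration; only the AAT material term is real:

```json
{
  "@context": "https://example.org/contexts/crm.json",
  "id": "https://example.org/object/1",
  "type": "ManMadeObject",
  "label": "Portrait of a Woman",
  "made_of": {
    "id": "http://vocab.getty.edu/aat/300015045",
    "type": "Material",
    "label": "watercolor"
  }
}
```

    Behind the scenes this is still a graph of triples, but the developer sees familiar keys and nested objects, not angle brackets.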
  • Or ... Curly brackets are the new Angle brackets.
  • If we’ve solved serialization ... well, have a direction at least ... what ARE the challenges we need to work on? Let’s count down my top 5.

    Coming in at number 5 … Order!
  • History is Ordered, and order is hard in RDF, right?

    We’re fortunate that history is ordered globally by the steady march of time, not locally.
    For historical events, we can be universally correct, Dr Who time travel aside, saying that an event in 1500 occurred after an event in 1490. No need to worry about explicitly ordering them, when applications can use the timestamps to do it themselves, as needed.

    But for local ordering, use rdf:List. The serialization in JSON-LD is good, SPARQL 1.1 supports property paths, and SPARQL support shouldn’t be our main concern anyway.
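    In JSON-LD, an rdf:List is just an ordered array behind an `@list` container. A small sketch, assuming a context that maps an invented `items` key to a hypothetical vocabulary term:

```json
{
  "@context": {
    "items": {
      "@id": "https://example.org/vocab/items",
      "@container": "@list"
    }
  },
  "@id": "https://example.org/sequence/1",
  "items": ["first", "second", "third"]
}
```

    The developer just sees an array in order; the rdf:first / rdf:rest machinery stays hidden in the serialization layer.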
  • Number 4 is "boundary of representation" ... or which triples should be included when you dereference a particular URI.
  • This is a critical point for the use of Linked Data as an API. We need to optimize the representations for use cases, based on the developer audience and what they need. There's no one rule that can generally determine the best practice here.

    Note that the terms from AAT are used both by reference -- the object is classified as a painting, but without even a label -- and by value -- we're explicit that aat:300015045 is a Material, and it has a label of "watercolor". Why? Well, why indeed?! This is just an example, but one that we need to question with a critical eye, and discuss with developers as to whether it meets their needs.

    JSON-LD does have an algorithm called Framing that makes this sort of effort much more consistent and efficient. Also, we might take a leaf or two out of Facebook's GraphQL book, where the request can govern some aspects of the response's representation.
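    A frame is itself a JSON-LD document that says which parts of the graph to embed in the response. A hypothetical sketch, reusing the invented context and keys from earlier, that embeds the full Material description but leaves classifications as bare references:

```json
{
  "@context": "https://example.org/contexts/crm.json",
  "type": "ManMadeObject",
  "made_of": {
    "@embed": "@always"
  },
  "classified_as": {
    "@embed": "@never"
  }
}
```

    The point is that the boundary decision becomes an explicit, reviewable artifact rather than ad hoc code in every publisher's system.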
  • Number 3! The many levels of meta-data that we try to squash into a single response.
• RDF is good at making statements about reality, and bad at making statements about statements. At three levels it's terrible, and when you end up trying to make statements about what you believe that others used to believe, you’ve gone too far into the dream and need to get back to reality.

    Inception made for a cool movie, but would be a terrible API. Make broad statements about your dataset, and leave it at that. For example … Associate a license with the dataset, not with each resource … You’re not letting people use Rembrandt, the actual person, with a CC-BY license, so don't claim that (like ULAN currently does). And don't reify everything in order to make mostly blanket statements about certainty or preference. No one is absolutely certain about anything, and everyone has a preference about which label or description to use ... but don't try to encode all of those subjective assertions against each triple!
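    Concretely, that means one license statement on the dataset description, not millions on the individual resources. A sketch using the VoID and Dublin Core Terms vocabularies, with an invented dataset URI:

```json
{
  "@context": {
    "void": "http://rdfs.org/ns/void#",
    "dct": "http://purl.org/dc/terms/"
  },
  "@id": "https://example.org/dataset",
  "@type": "void:Dataset",
  "dct:license": {"@id": "https://creativecommons.org/licenses/by/4.0/"}
}
```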
  • Of course ... naming things.
• Plato has Socrates discuss the correctness of names in Cratylus, which leads to the theory of forms and to nominalism. As a species, we haven’t solved this question in the last 2,500 years, so I would predict that we’re unlikely to solve it today.

    We have this problem in several different guises:
    The URIs of Predicates, and their JSON-LD keys
    The URIs of instances, particularly ones that are common across datasets, and hopefully across organizations

    The only way to know the "best" or "most correct" name for something is through use and discussion.
  • And if number 2 is Naming Things, then number 1 must surely be Cache Invalidation.
• And it really is. Efficient change management in distributed systems, without a layer of orchestration, is essentially asking us to predict the future.

    The best we can do is add a lightweight notifications layer, using a publish/subscribe pattern with standards like LDN (Linked Data Notifications) and WebSub. Both are currently going through the standardization process within the Social Web Working Group of the W3C.

    That gives us distributed hosting with the potential for centralized discovery and use: applications are informed when remote systems have been updated, so that they can refresh their caches of that information.
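    A notification in this pattern is just a small resource POSTed to a subscriber's inbox. A hedged sketch of what an update announcement might look like, using the ActivityStreams 2.0 vocabulary, with invented publisher and object URIs:

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Update",
  "actor": "https://museum.example.org/",
  "object": "https://museum.example.org/object/1",
  "published": "2016-07-01T12:00:00Z"
}
```

    The receiver doesn't need the changed data in the message, just enough to know which resource to re-fetch.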
  • And finally Number Zero, Off-By-One errors. (beat) Oh.

  • Home stretch folks, thank you for your attention so far!

    Practical Linked Open Data... P L O D .. Plod. I've gotta say, Leif, it's not an awe-inspiring acronym, sorry. Mr Plod the Police Officer is just not the most exciting figure. Police make you think of being told to stop, and we want to move forwards and learn from our mistakes. To get something out there, and iterate.
  • Remember ... We Want U. U for Usable.

    (beat)
  • So my call to action, fans, is to Get LOUD. Linked Open Usable Data.
  • With the Community ... Community Linked Open Usable Data ... or the CLOUD.

    (And it wouldn't be a linked data keynote without the LOD CLOUD diagram, would it!)
• The community includes everyone in that pyramid, and together we can share the burdens across all levels. Shared ontologies, shared identities, shared code, shared use cases ... but also think about this ... By linking to other people's data, you're reducing your own completeness burden. And theirs, at the same time. Not everyone needs a complete description of Rembrandt, or Plato.
    Enabling users and developers to provide feedback on your data reduces your burden of Accuracy. They have the possibility to work with you to correct it.
    And working directly with developers, regardless of whether they're in your organization or not, validates Usability. Barth has several really good options for how to put that evaluation into practice.
    Remember FOAF ... who got that first time round? ... Focused, Open, Active, Flexible.
  • The audience for your actual Linked Data is the developers within your community.
    We need to meet on their terms, and allow them (and their users) to participate in the creation and management process for the data. We thus need to focus on usability of the data, not necessarily its completeness or accuracy.

    A good way to do that is through the use of JSON-LD, with frames governing the graph boundaries and validated through use cases and discussion about the names used. Let them know, through notifications, when the data is updated.

    Let developers help with usability, let the community help with completeness, and let the users help with accuracy.
    (beat)
    You’ll have to deal with the off by one errors yourself, unfortunately.
  • Thank You!
  • I don't want to "take questions". I want us, as a community, to discuss :) So ... Discussion?
