Linked Data Workshop Stanford University
 

Presented by Jerry Persons at Linked Data and Libraries 2011, London 14th July 2011

  • a brief synopsis of the end-of June workshop
  • the slogan & logo a) it’s the web, stupid! is pretty long in the tooth these days b) nevertheless, tension between dogma & web-driven functionality for linked data does exist c) this offers a gentle reminder of this particular workshop’s focus
  • checklist of the participating institutions and organizations 25 people altogether a) library-centric by design b) other initiatives & events are addressing a wider range of resources & agencies (this JISC session, Europeana, LOD_LAM in SF, early June) c) yeast added to the mix: research, corporate, and non-profit content technology agencies
  • here’s where you get to do a bit of work ... I won’t read these a) objectives for the workshop fund-able plans for creating tools, processes, and vehicles to expedite a disruptive paradigm shift catch phrase for the effort as a whole: less talking, more doing ... doing at web scale (pace!)
  • (pace!)
  • (pace!)
  • CLIR funded creation of a linked-data survey ... state of affairs to set a framework and provide a means for participants to begin work with relatively similar definitions for “linked data”
  • screen shot of the survey’s outline (note headings ... pace!)
  • drilling down ... extant metadata (note headings ... pace -- )
  • looking at a specific subheading
  • which takes one to the content checklists of content with URLs 1st version published to the Workshop participants final version (with updates from the workshop & ensuing developments in the linked data arena) will be public on the CLIR site by end of summer ... ever changing landscape: schema.org W3C Library Linked Data Incubator Group David / Richard will hear when the web site is publicly accessible
  • some additional work for you ... first is Josh Greenberg’s telling synopsis
  • and Mike Bergman’s thoughts on provenance & co-referencing ... worth consideration relative to cultural heritage efforts & objectives
  • Stefano Mazzocchi’s tale on linking take note of the date and give some thought to this snippet ... in terms of what’s playing out in the dialogs among proponents & advocates for varied flavors of structured data ... linked-data in the W3C canon, RDF, RDFa, rich snippets, schema.org, Facebook’s Open Graph Protocol, Freebase’s GraphD driven topics
  • ... speaks for itself
  • this checklist surfaced near the Workshop’s mid-point a) procedurally the workshop was shaped by an evolving agenda b) said agenda driven by the products of four 6-person workgroups c) cross pollination among workgroups provided by alternating plenary ... workgroup ... plenary ... workgroup sessions it’s a “prioritized” list of the issues coming separately from the workgroups ... it’s ordered via a simple voting process, each participant having 7 votes ... any number of votes could go to any issue (1 – 7) -- it’s a rough hewn working document, a quick snapshot that then went back into the work-group process 1. co-reference, reconciliation ... of URIs 3. killer apps ... tension between cultural heritage built ... emergent via web scale entrepreneurs 11. feedback, metrics ... what’s the value-add
  • 12. user seduction ! 13. workflow ... gross simplifications
  • sketch of the chief components of a linked-data creation pipeline MARC / MODS library data was the specific use case considerable detail developed in the working documents note: 1) co-references captured early pay big dividends later (less URI bloat) 2) comparison of machine vs. human ID of co-references 80 / 20 – ideal !! mebbe as low as 60 / 40 in many cases need efficient human resource contribution crowdsourcing Freebase reconciliation pipelines note also: 1) revisions that go back thru the whole pipeline can be expensive think about transforming MARC to linked data then pushing revisions back into MARC highly dysfunctional impedance (granularity) mismatch
  • how soon can the published canon of linked data become the metadata resource of record ?? !! in Workshop’s parlance, what can be done to make the change ... and can that change be incremental
  • a 10,000 foot snapshot of a work plan for a specific project a) large pools of journal citations b) highly constrained workflows in extant systems with narrow flexibility in staff resources (ranging from little to none) c) using a rough project planning outline, where do the ISSUES fit into the planning and workflow of such a project sketches of Use cases Data modeling expectations for production workflows
  • maintenance and distribution
  • bottom line IS and REMAINS metrics ... feedback, reporting, reward systems VALUE ADD ... for consumers
  • VALUE ACCRUED linked data publishers where’s the convincing elevator speech for VALUE ADD / ACCRUED via linked data by the cultural heritage community ?? !!
  • another take on putting linked data to work across many levels of expertise found in GLAM environments a checklist of tasks / events / considerations / planning nodes
  • placed in a matrix with 3 levels of organization maturity novice journeyman master (oops ... we left out apprentice ...)
  • at the nexus of each matrix junction institutions could find reference implementations that addressed the issues and functions needed by organizations to move forward with their linked data projects
  • OK ... the unspoken issues much discussed at the Workshop !! URIs change in basic culture ... flat, string-based records throughout the cultural heritage community -- conversion of data hard -- conversion of culture ... DOABLE ?! co-references and provenance inseparable ... trust comes from source of assertions how choices about sameAs were made ... AND NOT MADE crucial value ADD and value ACCRUED
  • an array of VALUE assessments from Europeana’s mid-June workshop increased relevance ... an apparent winner ... metrics for same? new customers & public mission data enhancement
  • things to keep in mind ... recurrent themes during the week at Stanford co-references / reconciliation is more web emergent and less policy driven -- in the terms of one of the groups at the workshop schema last ... data first
  • Wendy Hall talked with the Stanford Libraries’ staff last Summer (2010) ... she recounted her experience with the emerging web, from the perspective of one who developed a well structured, standards-based means of providing access to a massive archive of historical papers at Southampton in the end, the web worked better than what she’d created her message to the Stanford folk: scruffy works
  • speaks for itself
  • what’s coming from the Workshop Public version of the CLIR survey Documents from the workshop A small number of proposals (over a relatively short period of time)

Linked Data Workshop Stanford University Presentation Transcript

  • Linked Data Workshop Stanford University June 27 – July 1, 2011
    • who
    • CLIR (Council on Library and Information Resources)
    • Research Libraries
      • Michigan
      • Stanford
      • Virginia
      • Bibliotheca Alexandrina
      • California
      • Emory
    • National Libraries
      • Bibliothèque nationale de France
      • British Library
      • Deutsche Nationalbibliothek
      • Kongelige Bibliotek (Denmark)
      • Library of Congress
    • HighWire Press
    • LOCKSS / CLOCKSS
    • Metaweb / Freebase (Google)
    • Research Center for Informatics, National Institute for Informatics, Japan
    • sameAs.org, Seme4 and University of Southampton
    • Semantic Computing Research Group (SeCo), Aalto University, Finland
    • what
    The Stanford Workshop focused on crafting fund-able plans for creating tools, processes, and vehicles to expedite a disruptive paradigm shift in the work flows, data stores, and interfaces used for managing, discovering and navigating the knowledge and information resources that fuel scholarship and research. The goal was identifying knowledge management capabilities and specifying designs for requisite new components, mechanisms, environments, and communities that will:
    • move beyond current metadata practices based on discrete, distributed, and replicated database records;
    • precipitate a new family of methods and tools to replace today’s metadata records with an array of emergent, open, link-driven meta services ;
    • rapidly expand the breadth, density, and reliability of well-curated identifiers and links associated with the publications, data, manuscripts, documents, artifacts, and other resources available via the services and holdings of the world’s national+research libraries, museums, archives, and other science, social science, and cultural heritage institutions; and
    • provide for continuous improvement in the quality and density of link-driven navigation and discovery capabilities through provision of open, managed feedback and annotation by individuals and communities who seek, gather, consume, and build content in the course of their reading, teaching, learning, scholarship, research, and other knowledge-based activities.
    • context
    I’ve liked to characterize the current moment as a circle of libraries, museums, archives, universities, journalists, publishers, broadcasters and a number of others in the culture industries standing around, eyeing other and at the space in between them while wondering how they need to reconfigure for a world of digitally networked knowledge.
    Josh Greenberg, Moving a handful of blocks north …, April, 2010.

    Whichever organizations do an excellent job of providing context and coherent linkages will be the go-to ones for data consumers. As we have seen to date, merely publishing linked data triples does not meet this test.
    Mike Bergman, I have yet to metadata I didn’t like, 2010.
    • context
    The biggest problem we face right now is a way to ‘link’ information that comes from different sources that can scale to hundreds of millions of statements (and hundreds of thousands of equivalences). Equivalences and subclasses are the only things that we have ever needed of OWL and RDFS, we want to ‘connect’ dots that otherwise would be unconnected.
    Stefano Mazzocchi, Darkness is relative, I guess, January, 2007.

    comment: for every one of these questions, I know multiple librarians who would know the answers off the top of their heads
    rejoinder: can I have copies of those librarians?
    (anonymized from the IRC back channel at a Code4Lib meeting)
    • issues ... snapshot at mid-point of workshop
    • 1. co-referencing, reconciliation – across formats, disciplines ...
    • 2. use of extant, well curated metadata – including authority files, ...
    • 3. killer apps – via GLAM communities? ... emergent via web?
    • 4. provenance – attribution / origin / authority
    • 5. staff training; creating, deriving, publishing URIs, making links, using links in discovery environments
    • 6. usability of data -- “reifiable”
    • 7. QC – immediate and over time – across language boundaries
    • 8. standards for URIs – versioning
    • 9. data curation – i.e. linked data and its various components
    • 10. distribution of responsibility – e.g. preserve metadata, content
    • 11. feedback, reporting, reward systems, metrics, contribute linkable data (filling gaps), contribute URIs (SEO issues)
    • issues
    • 12. marketing / outreach – user seduction & training
    • 13. workflow
    • 14. scalability [an indicator of success, fixes exist]
    • 15. indexing – how to get data once you have the link
    • 16. use of ontologies
    • 17. licensing – focused on metadata at this juncture, content later
    • 18. annotation – linked data extended / improved by its consumers
    • 19. relationship to e-scholarship (esp. e-science) & e-learning
    • 20. cultural diversity (languages, character sets) – existing schema adequate?
    • 21. search engine optimization
    • 22. social networking (Facebook, Google+, ...)
    • vectors: 1. workflow / pipeline
    [Pipeline diagram; recoverable labels: extant metadata, + newly minted, transcode, reconcile (via algorithm), reconcile (via people), revise, published canon, WWW fabric of linked data, killer app(s)]
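The two reconcile stages in the pipeline (via algorithm, via people) can be pictured as a triage step that sends confident matches straight through and queues the uncertain ones for human review. What follows is only a sketch under stated assumptions: the similarity measure (Python's standard `difflib.SequenceMatcher` on normalized strings), both thresholds, the function names, and the example.org URIs used to exercise it are hypothetical and do not come from the workshop documents.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical thresholds; the workshop did not specify any values.
AUTO_ACCEPT = 0.92   # at or above: accept the machine match
AUTO_REJECT = 0.50   # at or below: treat as a new entity


def _normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; a stand-in for a real matcher."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def triage(label: str, candidates: dict[str, str]) -> tuple[str, str | None]:
    """Route a string label to 'machine', 'human', or 'mint'.

    candidates maps candidate URIs to their preferred labels.
    Returns the route and, where relevant, the best candidate URI.
    """
    if not candidates:
        return ("mint", None)
    best_uri, best_score = max(
        ((uri, similarity(label, cand)) for uri, cand in candidates.items()),
        key=lambda pair: pair[1],
    )
    if best_score >= AUTO_ACCEPT:
        return ("machine", best_uri)   # reconcile via algorithm
    if best_score <= AUTO_REJECT:
        return ("mint", None)          # no plausible match: mint a new URI
    return ("human", best_uri)         # reconcile via people
```

Moving the two thresholds closer together or further apart is what would shift the machine / human split mentioned in the workshop notes; the values here are placeholders, not recommendations.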
    • vectors: 2. projects  issues
    Bring issues to bear in project plans for a real-life project
    1. Use cases [3. killer apps]
       a. put yourself in role of linked-data developer and/or consumer - what’s needed, what will foster new/better capabilities
       b. what are relationships between this and other data - what vocabularies, schema, URIs, and models are in play
       c. components (the test case is journals) [2. extant authorities] - names, journal & article titles, date ranges, citations, publishers, ISSN, language, topics/classification
       d. effect of proposed project [19. relationship to e-scholarship, etc.]
    2. Output data representation / model
       a. [17. licensing] for the metadata
       b. schema / vocabulary selection - [8. standards for URIs] - [6. usability] and [20. cultural / language issues]
    3. Production [13. workflows]
       a. [5. staff training ...]
       b. [1. co-referencing & reconciliation]
       c. massive conversion from strings to URIs typical w/ extant data
    • vectors: 2. projects  issues
    4. Maintenance
       - production systems vs. new mgmt requirements for linked data
       - where are updates & revisions applied?
       - [9. data curation] and [7. QC, immediate & over time]
       - [10. shared responsibilities, e.g. metadata preservation]
    5. Distribution
       - [12. marketing/outreach, user seduction]
       - [14. scalability]
       - [21. SEO]
       - [22. social networking (Facebook, Google+, etc.)]
       - [15. indexing] and [18. annotation]
    6. Metrics [11. feedback, reporting, reward systems, ...]
       - value added: linked-data consumers
       - value accrued: metadata producers
    • vectors: 3. cookbook  issues
      • maturity levels (matrix columns): novice, journeyman, master
      • tasks (matrix rows): value statements; use cases; ingestion of data; confidence of data, provenance; publishing data; providing / engendering services; education / outreach; user seduction
      • matrix cells: reference implementations
    • elephants in the room
    • URIs, not strings
      • must not underestimate the amount of effort required to transform large subsets of GLAM metadata from flat records into linked data replete with URIs
    • reconciliation   provenance
      • need plans for mgmt of co-references emerging from large swaths of newly minted GLAM linked data, e.g.
        • -- norms / vehicles for provenance that track and record reconciliation events, agents, criteria, etc.
        • -- means to track negative co-reference decisions
    • feedback, reporting, reward systems, metrics
      • need persuasive justifications for building and supporting linked-data systems for the cultural heritage community
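One way to picture the provenance points above (recording reconciliation events, agents, and criteria, and keeping track of negative co-reference decisions) is a small decision log. This is a minimal sketch, not a format proposed at the workshop: the class names, field names, and the idea of an append-only log are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CoreferenceDecision:
    """One reconciliation event about a pair of URIs (hypothetical shape)."""
    subject: str       # URI being reconciled
    target: str        # URI it was compared against
    same: bool         # True: asserted equivalent; False: explicitly rejected
    agent: str         # person or process that made the call
    criteria: str      # rule or evidence the decision rests on
    decided_on: date


@dataclass
class CoreferenceLog:
    """Append-only record of positive and negative decisions."""
    decisions: List[CoreferenceDecision] = field(default_factory=list)

    def record(self, decision: CoreferenceDecision) -> None:
        self.decisions.append(decision)

    def status(self, a: str, b: str) -> Optional[CoreferenceDecision]:
        """Return the last recorded decision for the unordered pair (a, b).

        None means the pair was never assessed, which is different from
        a recorded decision with same=False.
        """
        for decision in reversed(self.decisions):
            if {decision.subject, decision.target} == {a, b}:
                return decision
        return None
```

Keeping rejected pairs in the same log as accepted ones is the design choice that matters here: a consumer can tell "assessed and judged different" apart from "never looked at", and can see who made the call and on what grounds.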
  • http://blog.okfn.org/2011/06/24/notes-from-open-metadata-workshop-hague-15th-june-2011/ Notes from Open Metadata Workshop [Europeana] The Hague, 15th June 2011 Posted on June 24, 2011 by Jonathan Gray
    • e.g.
    • caveats
    • mgmt of co-references needs to be a bottom-up process
      • funders will pressure to impose standards
      • risk is that top-down approach will capsize the effort
      • need to let things grow organically
    • build systems that accept the way the world is, not what you would like it to be
    • focus on changing current practices (in the long run), not only on reconciling data (in the short run)
      •  preventing problems is better than solving them
    • stay tuned
    • CLIR linked-data survey
    • Workshop documents
      • introductory presentations
      • agendas as they evolved
      • reports from the work groups
      • summaries
    • Proposals for work
      • specific projects
      • communities of practice
      • opportunities to collaborate & contribute
    • questions / thoughts ?
    Thank you for your time and attention Jerry Persons [email_address]