ResourceSync in 24x7

Overview of ResourceSync (http://www.openarchives.org/rs) given as a 24x7 presentation at Open Repositories 2013 (http://or2013.net/), July 10, 2013.

Published in: Education, Technology, Business

  • 0:00 In the next 7 minutes I will attempt to motivate the creation of the ResourceSync framework and explain what it means in a slightly less circular manner than the title. But first, I cannot claim that this is all my work...
  • 0:17 Core team comprises
  • 0:34 Technical committee
  • 0:51 and all this would not have been possible without funding for in-person meetings and some core team time: primary funding from Sloan, UK participation funding from Jisc
  • 1:08 Let me pull apart the two words of the title and framework name
  • 1:25 ResourceSync is about Web Resources: things on the web with a URI identifier that can be dereferenced to get one or more representations. The project is making an observation and a statement that repositories should really exist on the web. The resources range from tens on a small website to tens of millions in big repositories; they include large data resources, publications, and linked data; and they change anywhere from multiple times per second to the infrequent changes of archival records.
  • 1:42 So far I’ve told you that a whole bunch of people are using up some generous funding to think about how to better synchronize web resources between systems. Why would we do this? What is the need? I am going to give just two example use cases; there are more in the D-Lib article from about a year ago.
  • 1:59 There are many contexts in which copies of resources in scholarly repositories are necessary. For one repository I’m involved with, arXiv.org, these include mirroring, a copy for indexing, and a copy for research. Currently the choice is between ad-hoc approaches and the very blunt instrument of web crawling.
  • 2:16 It is perhaps ironic that while linked data is fundamentally distributed, many applications require local copies, and bulk copying is done with ad-hoc approaches.
  • 2:33 OAI-PMH was introduced over 12 years ago (before the first JCDL, before OR was even imagined)
  • 2:50 Now that we know why we need this new protocol, what should it do? We took a BIG step back to look at the fundamentals of the synchronization problem and came up with the following 3 operations.
  • 3:07 Use a Resource List, or a Resource Dump, which includes a Resource List as a manifest along with the actual content
  • 3:24
  • 3:41
  • 3:58 So, we have three operations; how do these get implemented? What is the lowest-barrier, most widely compatible, most performant, and most future-proof way? Preferably one that invents as little new stuff as possible.
  • 4:15 Do everything with sitemaps. We considered many options, but sitemaps won because they are a good match, widely adopted, simple, and extensible. Only minor extensions are required.
  • 4:32 Yes, really minor. Two extra elements, with attributes borrowed from several other specifications, notably Atom Link Extensions. In January the Sitemaps.org folks modified their schema to allow extension of the top-level elements, and thus all ResourceSync documents are schema-valid sitemap (or sitemap index) documents.
  • 4:49 A really cool thing about using sitemaps is that by implementing the most basic capability, the Resource List, you are also producing a sitemap that can be used by all the major search engines
  • 5:06
  • 5:23 It is just possible that we haven’t thought of everything or got everything perfect. There are three areas of extensibility: expression of relations between resources, expression of fixity and other information about resources, and, at the framework level, the addition of new capabilities
  • 5:40
  • 5:57
  • 6:14
  • 6:31
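
The notes above say that implementing the most basic capability, the Resource List, also produces a sitemap usable by search engines. A minimal sketch of that idea in Python follows. The two namespace URIs come from the slides; the function name `build_resource_list` and the `capability="resourcelist"` attribute on the top-level `rs:md` element are assumptions made for illustration, since the slides elide the attributes.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as shown on the "Minor?" slide.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"

# Serialize the sitemap namespace as the default and the extension as "rs".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("rs", RS_NS)


def build_resource_list(resources):
    """Return sitemap XML (str) for an iterable of (uri, lastmod) pairs.

    Hypothetical helper: the name and the rs:md attribute are illustrative.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    # Assumed attribute marking the document as a Resource List.
    ET.SubElement(urlset, f"{{{RS_NS}}}md", {"capability": "resourcelist"})
    for uri, lastmod in resources:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_resource_list([
        ("http://example.com/res1", "2013-01-02T13:00:00Z"),
    ]))
```

A consumer that only understands plain sitemaps can read the `loc` and `lastmod` values and ignore the `rs:` elements, which is the compatibility point made in the talk.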

    1. “Synchronize your resources with ResourceSync” July 10, 2013, Open Repositories 2013, PEI, Canada. Simeon Warner (Cornell University Library)
    2. Team sport
    3. more, still more missing. JISC: Richard Jones, Graham Klyne, Stuart Lewis. OCLC: Jeff Young. LOCKSS: David Rosenthal. RedHat: Christian Sadilek. Ex Libris Inc.: Shlomo Sanders. Library of Congress: Kevin Ford
    4. Alfred P. Sloan Foundation
    5. Synchronize • keep “in sync” (colloq.) • Following changes over time and • Keeping copies on different systems the same • Tackle only the unidirectional problem: From a Source, to a Destination
    6. Resources aka Web Resources: have URI, HTTP GET representation(s) • Many / Few • Big / Small • Fast / Slow
    7. Why?
    8. Scholarly repositories • Replicate data/articles for mirroring, reuse, indexing, ... • OAI-PMH for metadata • Many custom solutions for full content
    9. Linked data. Fundamentally distributed but local copy often required. Either: 1. cache 2. sync local copy... • Many custom solutions for local copy. Last.FM, MusicBrainz, GeoNames, DBpedia, BBC, others...
    10. Didn’t you sell us OAI-PMH? Or... will ResourceSync replace OAI-PMH? ✓ Proven metadata transfer protocol ✓ Widely adopted in our community X Predates REST, not “of the web” X Not adopted for content transfer. Can replace, likely coexistence
    11. What?
    12. 1. Baseline sync. Initial load, copy, or catch-up from source • need list of all resources • optional packaged content. Want to • avoid out-of-band setup & customization
    13. 2. Incremental sync. Keep up-to-date with changes at a source • need information about changes • optional packaged content • minimal primitives: create/update/delete. Want • allow catch-up after destination offline • lower latency and/or greater efficiency than repeated baseline sync
    14. 3. Audit. Destination should be able to verify whether it is synchronized with a source • need list of all resources + fixity info. Want • lower latency and/or greater efficiency than baseline sync • note: subject to some latency
    15. How?
    16.
    17. Minor?

        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:rs="http://www.openarchives.org/rs/terms/">
          <rs:ln …/>
          <rs:md …/>
          <url>
            <loc>http://example.com/res1</loc>
            <lastmod>2013-01-02T13:00:00Z</lastmod>
            <rs:ln …/>
            <rs:md …/>
          </url>
          <url> … </url>
        </urlset>

    18. Baseline sync & Google. Most basic capability is Resource List: • Snapshot of state of resources • URI, datestamp + optional extra fixity info • Destination does GET on each resource. ResourceSync: Baseline sync & Audit. Google/Bing/Yahoo!/etc.: harvest
    19. Modular Discovery Four Core Capabilities 1 2 3 4
    20. Extensible. Extensible use of Link Relations from Atom • Spec describes use for mirrors, patches, historical, provenance, conneg... • Use <rs:ln rel="your-relation-here" .../> Extensible attributes for fixity etc. • Includes lastmod, fixity, length, type... Extensible framework -> new capabilities
    21. Push = Lower latency. Pull • easy setup, no trust required. Push Changes • lower latency, better scaling • same descriptions as pull • standard transports (XMPP, Websockets...) • can push discovery info to trigger pull
    22. Timeline: January 2013, June 2013, July 2013, Fall 2013 • Tools and libraries being developed to ease implementation • First beta • Version 0.9 • Update and push spec • NISO standardization • Tutorials at major conferences (OAI8, OR, JCDL, ...)
    23. http://www.openarchives.org/rs/ • Framework • Archives • Push (to come) • Links to Google group, associated articles, blogs, etc.
    24.
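
The baseline sync, incremental sync, and audit slides above all reduce to comparing a source's Resource List with what a destination holds, using the create/update/delete primitives. The following Python sketch illustrates that comparison under stated assumptions: it uses only the `loc` and `lastmod` values shown on the "Minor?" slide, treats any difference in `lastmod` as an update, and the function names `parse_resource_list` and `plan_sync` are hypothetical, not part of any ResourceSync library.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_resource_list(xml_text):
    """Map each resource URI to its lastmod string (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    state = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        state[loc] = url.findtext("sm:lastmod", namespaces=NS)
    return state


def plan_sync(source, destination):
    """Return (create, update, delete) URI lists that bring the destination
    in line with the source. All three empty means the audit passes."""
    create = sorted(set(source) - set(destination))
    delete = sorted(set(destination) - set(source))
    # Assumption: a differing lastmod means the resource changed.
    update = sorted(u for u in source
                    if u in destination and source[u] != destination[u])
    return create, update, delete


SOURCE_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/res1</loc>
       <lastmod>2013-01-02T13:00:00Z</lastmod></url>
  <url><loc>http://example.com/res2</loc>
       <lastmod>2013-01-03T18:00:00Z</lastmod></url>
</urlset>"""

if __name__ == "__main__":
    source = parse_resource_list(SOURCE_XML)
    # What the destination recorded at its last sync (illustrative values).
    destination = {
        "http://example.com/res1": "2013-01-01T09:00:00Z",
        "http://example.com/res3": "2012-12-30T00:00:00Z",
    }
    print(plan_sync(source, destination))
```

With an empty destination every source URI lands in the create list, which corresponds to a baseline sync; the destination would then do a GET on each URI. Comparing fixity information instead of `lastmod` would make the audit stronger, but the slides do not show those attributes, so the sketch leaves them out.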
