ResourceSync - An Introduction


Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

ResourceSync - An Introduction

  1. 1. ResourceSync - An Introduction Todd Carpenter Executive Director, NISO ALCTS Continuing Resources Standards Forum Sunday, June 24, 2012 With thanks to Herbert Van de Sompel and Robert Sanderson (LANL)
  2. 2. @TAC_NISO Twitter Highlights• Presenting this morning on the ResourceSync project at ALCTS Continuing Resources Standards Forum #ALCTSCRS #ala12• I’m pre-tweeing my slides during #rsync presentation. Slides here:  _________ #ala12• NISO mission is to develop and maintain technical standards related to information, documentation, discovery and distribution of content #ala12• Standards are all around us, even if we dont notice them, especially in books.  Things like page numbers, paper, binding, even spelling is standardized. #NISO #ala12• Machines don’t talk like people do.  Then again some people don’t talk like other people do, particularly teenagers #ala12• So where did the ResourceSync project start?  #NISO approached OAI about updating the PMH protocol. #ala12• The #NISO / OAI ResourceSync project was possible through the generous support of the Alfred P. Sloan Foundation.  Thank you!  #ala12• What is RSync trying to solve: Source Server has resources that change.  Destination servers want to leverage some or all of Source on regular ongoing basis in near-real-time & at web scale. #ala12• Syntonization can be good enough or perfect and synchronization can be fast or fast enough.  #ala12• RSync is studying a number of existing protocols to determine which (or combination of) protocols can best meet needs.  We have an bias against developing new spec from scratch. #ala12• There are several models for synchronizing content: pull, push, conditional pull, mediated feed and pull, and a mix of feed/push/pull/service models. #ala12• The goal of ResourceSync is to find the model that most efficiently distributes the content, while limiting the tax on the source system. #ala12• This is very early days in the process of standards development. We’re still in the incubation stage. Consensus and adoption phases will come in 2013 and beyond. #ala12• We hope to have a beta specification available by the end of 2012 of ResourceSync #ala12
  3. 3. AboutNon-profit industry trade associationaccredited by ANSIMission of developing and maintaining technicalstandards related to information,documentation, discovery and distribution ofpublished materials and mediaVolunteer driven organization: 400+ spread outacross the world
  4. 4. Standards  are  familiar,  even  if  you  don’t  no4ceJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 4
  5. 5. Machines don’t talk like people doJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 5
  6. 6. Machines talk like thisJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 6
  7. 7. How  did  we  get  here?• OAI-­‐PMH  Protocol – Developed  in  200X – Developed  by  Herbert  van  de  Sompel,  Carl  Lagoze  and   the  OAI  team – Fairly  wide  adopQon  in  scholarly  community• In  spring  2011,  NISO  approached  OAI  to  discuss   updaQng  PMH  Protocol• Response  was  “Let’s  try  something  else  more  in   line  with  more  modern  technology”  June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 7
  8. 8. A partnership is born• Agreement to launch RSync as a NISO standards process• Partnership on grant application• OAI team comprised core technology team• Partnership on grant applicationJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 8
  9. 9. Special  thanks  are  due  to...  June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 9
  10. 10. What  we  trying  to  solve? Consideration: Source (server) A has resources that change over time: they get created, modified, deleted, moved, … Destination (servers) X, Y, and Z leverage (some) resources of Source A.Problem: Destinations want to keep in step with the resource changes at Source A: resource synchronization.Task of ResourceSync effort: Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities. The approach must scale better than recurrent HTTP HEAD/ GET on resources.June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 10
  11. 11. Use  cases  differ How good is the synchronization? Perfect Good  enough How fast is the synchronization? Fast Fast  enoughJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 11
  12. 12. 3  disQnct  needs  regarding  resource  synchronizaQon Baseline  matching:  An  approach  to  allow  a  DesQnaQon  that  wants   to  start  synchronizing  with  a  Source  to  perform  an  iniQal  catch  up   –  Dump. Incremental  resource  synchronizaQon:  An  approach  to  allow  a   DesQnaQon  to  remain  up-­‐to-­‐date  regarding  changes  at  the   Source. Audit:  An  approach  to  allow  checking  whether  a  DesQnaQon  is  in   sync  with  a  Source    –  Inventory. =>  All  3  are  considered  in  scope  for  the  effortJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 12
  13. 13. ResourceSync  Working  GroupHerbert Van de Sompel (Chair) Stuart LewisLos Alamos National Laboratory Joint Information Systems Committee (JISC)Todd Carpenter (Co-Chair) Peter MurrayNational Information Standards Organization (NISO) LyrasisNettie Lagace Michael NelsonNational Information Standards Organization (NISO) Old Dominion University David RosenthalManuel Bernhardt Stanford UniversityDelving B.V. Christian SadilekKevin Ford Red HatLibrary of Congress Shlomo SandersBernhard Haslhofer Ex Libris, Inc.Cornell University Robert SandersonRichard Jones Los Alamos National LaboratoryJoint Information Systems Committee (JISC) Sjoerd SiebingaMartin Klein Delving B.V.Los Alamos National Laboratory Ed SummersGraham Klyne Library of CongressJoint Information Systems Committee (JISC) Simeon WarnerCarl Lagoze Cornell UniversityCornell University Jeff Young OCLC Online Computer Library CenterJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 13
  14. 14. hep:// Data  AeribuQon  and  CitaQon  Workshop 14
  15. 15. Change  NoQficaQon  -­‐  Protocols Atom PubSubHubbub (PuSH) XMPP PubSub extension BoSH (XMPP over HTTP) Comet / HTTP Streaming Open an HTTP connection and keep reading from it Bayeux Protocol Long Polling Keep HTTP connection open until a message, then reopen BoSH, Bayeux option WebSockets NullMQ / ZeroMQ XMPP over WebSockets?June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 15
  16. 16. Incremental  Synchroniza9on   Change  NoQficaQon  (CN) Alert  that  something  happened   (create,update,delete) Content  Transfer  (CT) Transfer  of  just  the  change  or  the  full  resourceJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 16
  17. 17. Trivial  versus  OpQmal  Approaches• Trivial  Approach  -­‐  Retrieve  &  Compare• OpQmal  Approach  -­‐  push only the change to only the destinations monitoring the resourceJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 17
  18. 18. More  advanced  opQons• Trivial  Approach  plus  CondiQonal  GET: – Retrieve  every  resource  if  it  has  changed – EssenQally  this  is  a  Change  NoQficaQon  Pull – Not  scalable,  strain  on  Source  Systems,  no  way  to   know  of  new  resourcesJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 18
  19. 19. More  advanced  opQons Simplest  Workable  Model: Introduce  a  Feed  of  change  noQficaQons  for  all   resources Atom,  RSS,  OAI-­‐PMH,  SiteMaps,  etc =>SQll  not  efficient,  no  way  to  know  when  to  pullJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 19
  20. 20. More  advanced  opQons Feed  Extension  SoluQon: ConQnue  the  Feed  paradigm,  but  introduce   aggregaQng  service  and  ping  noQficaQon  to  re-­‐pull   (simulated  push) Only  advantageous  if  Source  already  supports  a  FeedJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 20
  21. 21. The  lifecycle  of  standards   You  are  hereJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 21
  22. 22. Ongoing  Research• Change  NoQficaQon  -­‐  XMPP  &  XMPP  PubSub  &  bleeps – LANL – Ongoing  Experiment  with  Live  DBPedia• Change  NoQficaQon  -­‐  Comet  /  HTTP  Streaming  &  bleeps – ODU – Bayeux  Protocol  via  Faye  ImplementaQon• Change  NoQficaQon  -­‐  Change  Simulator – Cornell  U – Generate  configurable  change  noQficaQons – Use  as  standardized  input  to  different  systems  for  tesQng• Baseline  Matching  &  Audit – Cornell  UJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 22
  23. 23. Timeline• Project  Launch  =  November  2011• Approved  work  item  =  December  2011• Working  Group  formed  =  February  2012• Webinar  on  project  =  March  2012• JCDL  meeQng,  Washington  DC  =  June  2012• Alpha  =  ??  September  2012• Beta/Dran  for  trail  use  =  ??  December  2012• Comment  period  =  ??  December  2012  -­‐  March  2012• Training  =  ??  May  -­‐  July  2013• Approval  =  ??  December  2013June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 23
  24. 24. Thank you! Todd Carpenter, Executive Director National Information Standards Organization (NISO) One North Charles Street, Suite 1905 Baltimore, MD 21201 USA +1 (301) 654-2512 NOTE  =>NISO  IS  MOVING  IN  JULY  2012  <= www.niso.orgJune  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 24