SWORD2 & BITTORRENT
A Network Admin’s Worst Nightmare
Tim Brody, Damian Steer, Sander van der Waal, Steve Welburn
WHAT IS SWORD2?
SWORD2 is a protocol for depositing stuff, and its metadata, with a repository. It's implemented as a profile of the Atom Publishing Protocol, which roughly works like this:
1. Client GETs the service document from the server.
2. Client POSTs stuff for deposit, plus metadata, to a URL mentioned in the service document.
3. Server responds with "created this at url".
4. Client can GET url.
5. Client can edit the content at url with PUT.
6. Client can DELETE url.
Atom originated in blogs, and SWORD2 essentially just expands the metadata used.
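To make steps 1 and 2 concrete, here is a minimal Python sketch of how a client might read a service document and pick out the collection URL it POSTs deposits to. The service document, its titles and the repository URL are hypothetical, and a real SWORD2 service document carries more elements than this; only the namespaces and the `workspace`/`collection`/`href` structure come from the Atom Publishing Protocol.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Atom Publishing Protocol service documents.
NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
}

# Hypothetical, heavily trimmed service document; real SWORD2 servers
# add more elements (accepted packaging formats, policies, ...).
SERVICE_DOC = """<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Example Repository</atom:title>
    <collection href="https://repo.example.org/sword/collection/datasets">
      <atom:title>Datasets</atom:title>
    </collection>
  </workspace>
</service>"""


def collection_urls(service_xml: str) -> dict[str, str]:
    """Map each collection title to the URL a client POSTs deposits to."""
    root = ET.fromstring(service_xml)
    urls = {}
    for coll in root.iterfind("app:workspace/app:collection", NS):
        title = coll.findtext("atom:title", default="", namespaces=NS)
        urls[title] = coll.get("href")
    return urls


print(collection_urls(SERVICE_DOC))
# prints {'Datasets': 'https://repo.example.org/sword/collection/datasets'}
```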
THE PROBLEM WITH BIG DEPOSITS
Big deposits take ages to transfer, which makes them susceptible to interruption, whether through errors or simple boredom (oops, I closed my laptop...). In itself that ought to be recoverable, since HTTP can resume an interrupted transfer of an identified resource using the Range header. However, if you look at steps 2 and 3 above you may see a problem:
2. Client POSTs stuff for deposit and metadata
3. Server responds with "created this at url"
The url only exists once the server has received the whole deposit, so an interrupted POST leaves nothing to resume against.
THE IDEA
Send a reference to the content via SWORD, rather than the content itself.
We could then use any number of schemes, such as ftp, rsync or http. (HTTP works fine this way around, because the content already has an identity, so an interrupted download can be resumed.)
(Aside: it's also interesting that a repository could choose not to download at all, for example where the data is already stored in a national subject repository.)
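As a sketch of the "this way around" case, assume the repository has been given a plain HTTP URL to fetch. Python's standard library can then resume a download by asking only for the bytes it does not yet have. The URL and file names are hypothetical, and the server must support range requests for the append branch to be taken.

```python
import os
import shutil
import urllib.request


def resume_request(url: str, partial_path: str) -> urllib.request.Request:
    """Build a GET that asks only for the bytes not yet on disk."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    # "bytes=N-" means "from byte offset N to the end of the resource".
    headers = {"Range": f"bytes={have}-"} if have else {}
    return urllib.request.Request(url, headers=headers)


def download_with_resume(url: str, partial_path: str) -> None:
    """Fetch url into partial_path, continuing an earlier interrupted attempt."""
    with urllib.request.urlopen(resume_request(url, partial_path)) as resp:
        # 206 Partial Content: the server honoured Range, so append.
        # 200 OK: it ignored Range and sent everything, so start the file again.
        # (416 Range Not Satisfiable is raised as urllib.error.HTTPError.)
        mode = "ab" if resp.status == 206 else "wb"
        with open(partial_path, mode) as out:
            shutil.copyfileobj(resp, out)
```

The key point is that the Range request only works because the URL names the content before the transfer starts, which is exactly what a SWORD POST lacks.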
OR BITTORRENT
Unlike rsync, ftp or http, bittorrent has many implementations that can act as the serving end, with nice GUIs, for a variety of platforms and in a number of languages. (The server and client labels aren't especially helpful with bittorrent.)
It handles partial downloading with ease.
No packaging required: moving directories is as easy as moving individual files.
WHAT DO YOU NEED?
A bittorrent client at the depositor's end. This is where the files start.
A bittorrent client at the repository end. This is where the files will appear.
A bittorrent tracker.
WHAT WE NOW KNOW
Bittorrent is a peer-to-peer network. The clients are peers; it just happens that some have all the data (seeders) and some are seeking data (leechers). Data is identified (very roughly) using a hash of the content.
Clients need to find each other, and to do this they use servers called trackers, whose URLs are included in torrent files. Trackers are pretty simple: you can contact them to say "I am interested in X", and find other clients interested in X.
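The tracker's job can be sketched as a dictionary from info hash to the peers that have announced interest in it. This toy, in-memory version is illustrative only (the class name, peer IDs and addresses are made up), and it leaves out everything a real tracker does on the wire: the HTTP announce request with `info_hash`, `peer_id`, `port`, `uploaded`, `downloaded` and `left` parameters, the bencoded response, and expiry of peers that go away.

```python
from collections import defaultdict


class ToyTracker:
    """In-memory sketch of a tracker: info hash -> peers announced for it."""

    def __init__(self) -> None:
        # info_hash -> {peer_id: (ip, port)}
        self.swarms: dict[bytes, dict[bytes, tuple[str, int]]] = defaultdict(dict)

    def announce(self, info_hash: bytes, peer_id: bytes,
                 ip: str, port: int) -> list[tuple[str, int]]:
        """Record that peer_id is interested in info_hash; return the *other* peers."""
        swarm = self.swarms[info_hash]
        swarm[peer_id] = (ip, port)
        return [addr for pid, addr in swarm.items() if pid != peer_id]


tracker = ToyTracker()
content = bytes(20)  # placeholder 20-byte info hash
tracker.announce(content, b"depositor", "192.0.2.1", 6881)        # returns []
print(tracker.announce(content, b"repository", "192.0.2.2", 6881))
# prints [('192.0.2.1', 6881)]
```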
USING SWORD2 AND BITTORRENT
The uploader opens a bittorrent client and creates a torrent file for a file or directory.
The tracker used may be the repository itself.
The SWORD deposit is made as usual, but the content is the torrent file.
The repository's bittorrent client fetches the content, which is then deposited.
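What "creating a torrent file" involves can be sketched for the single-file case: bencode a metainfo dictionary holding the tracker's announce URL and an `info` dictionary of name, length, piece length and concatenated SHA-1 piece hashes. The SHA-1 of the bencoded `info` dictionary is the info hash that identifies the content. This minimal sketch follows the original BitTorrent metainfo format (BEP 3); the announce URL is hypothetical, standing in for a repository acting as tracker, and real tools add optional fields and multi-file (directory) support.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for the types a .torrent file uses."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings, sorted by their raw bytes.
        items = sorted((k.encode() if isinstance(k, str) else k, v)
                       for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_torrent(name: str, data: bytes, announce: str,
                 piece_length: int = 256 * 1024) -> tuple[bytes, str]:
    """Build single-file torrent metainfo; return (torrent bytes, hex info hash)."""
    # SHA-1 digest of each fixed-size piece, concatenated (last piece may be short).
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}
    # Peers and trackers identify the content by the hash of the info dict.
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    return bencode({"announce": announce, "info": info}), info_hash


torrent, info_hash = make_torrent(
    "results.csv", b"id,value\n1,42\n",
    "https://repo.example.org/tracker/announce")  # hypothetical tracker URL
```

Because the pieces are hashed individually, a client can verify and resume each piece independently, which is what makes interrupted transfers cheap to recover.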
IMPLEMENTATION
Tim has made (or is still making) EPrints a bittorrent tracker.
It will spot torrents uploaded via SWORD.
It uses transmission-cli to download.
Steve is making a deposit client.
It makes a torrent file, opens it in a torrent client, and uploads it via SWORD.
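On the deposit side, a client like Steve's ultimately sends the torrent file as an ordinary SWORD2 binary deposit. The following sketch only builds the request; the collection URL, credentials and filename are hypothetical, and it assumes the server accepts `application/x-bittorrent` content with the SWORD2 `Binary` packaging, which is not confirmed here. How the repository then spots the torrent and hands it to transmission-cli is not shown.

```python
import base64
import urllib.request

# SWORD2 packaging identifier for an opaque file the server should not unpack.
BINARY_PACKAGING = "http://purl.org/net/sword/package/Binary"


def torrent_deposit_request(collection_url: str, torrent_bytes: bytes,
                            filename: str, user: str,
                            password: str) -> urllib.request.Request:
    """Build (but do not send) a SWORD2 POST whose content is a torrent file."""
    auth = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    headers = {
        "Content-Type": "application/x-bittorrent",   # assumed MIME type
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": BINARY_PACKAGING,
        "In-Progress": "false",                        # deposit is complete
        "Authorization": f"Basic {auth}",
    }
    return urllib.request.Request(collection_url, data=torrent_bytes,
                                  headers=headers, method="POST")
```

Sending it with `urllib.request.urlopen(req)` would return the server's deposit receipt, including the "created this at url" location; the slow transfer of the actual data then happens over bittorrent rather than inside this POST.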
INTERESTING STUFF
This really helps with the other issue of large datasets: downloading. I hope people will typically want individual files, but this would allow full downloads without killing the server.
MORE INTERESTING STUFF
It's robust, and actually quite secure: you can't download without the torrent file (or at least its info hash).
Torrents can be limited to a particular tracker.
The tracker also provides basic usage information.
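Limiting a torrent to one tracker has two halves: clients honour the `private` flag in the info dictionary (BEP 27) and use only the listed tracker, and the tracker itself can refuse to serve info hashes it does not know about. The tracker half, plus the usage figures, can be sketched as below. The class is hypothetical; the `complete`/`incomplete` counts mirror the fields a tracker's scrape response reports.

```python
class WhitelistTracker:
    """Sketch: only serve swarms for torrents the repository has registered."""

    def __init__(self, allowed: set[bytes]) -> None:
        self.allowed = set(allowed)
        # info_hash -> {peer_id: bytes still left to download}
        self.swarms: dict[bytes, dict[bytes, int]] = {h: {} for h in self.allowed}

    def announce(self, info_hash: bytes, peer_id: bytes, left: int) -> bool:
        """Record a peer's progress; refuse info hashes not on the whitelist."""
        if info_hash not in self.allowed:
            return False
        self.swarms[info_hash][peer_id] = left
        return True

    def stats(self, info_hash: bytes) -> dict[str, int]:
        """Usage figures: peers holding everything vs. peers still downloading."""
        peers = self.swarms.get(info_hash, {})
        complete = sum(1 for left in peers.values() if left == 0)
        return {"complete": complete, "incomplete": len(peers) - complete}
```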
THE DISTRIBUTED CONTENT STORE
The picture shifts from repositories holding data to data being spread across the network.