OARJ Project @ #jiscDEPO programme meeting, 1st March 2011. Theo Andrew, Project Manager, EDINA
Talk outline <ul><li>Aims </li></ul><ul><li>Background </li></ul><ul><ul><li>Discovery </li></ul></ul><ul><ul><li>Delivery </li></ul></ul><ul><li>Proof-of-concept </li></ul><ul><li>Demonstrator service </li></ul><ul><li>Issues & Next steps </li></ul>
Aims: assist deposit into multiple existing repository services by developing middleware that will aid both discovery of repository targets and delivery of the content
Background <ul><li>Depot (2007/09) - unmediated eprints repo </li></ul><ul><li>EDINA added a referral service, called Repository Junction, to redirect users to existing IR services. </li></ul><ul><li>Succeeded by the OpenDepot.org service run by EDINA. </li></ul><ul><li>OA-RJ (2009/11) – to expand on the concept of the Repository Junction </li></ul><ul><li>Initial focus was on the discovery aspect; however, </li></ul><ul><li>the concept of data mining for target repo identification led to the broker service. </li></ul>
Discovery: The Junction [diagram] SOURCES: OpenDOAR, ROAR, UKAMF, named entity recognition, WhoIS, ORCID, funding codes, other AMFs. Junction db: org IDs matched to IRs. INPUTS (via the API): known org ID, article XML, known IP location. OUTPUT: matched repositories.
The Junction API: a suite of three APIs for interacting with the data: <ul><li>/api [primary point of interaction] </li></ul><ul><li>/cgi/list/ [lists known values - type/content/country/lang/org/net] </li></ul><ul><li>/cgi/get [used for internal AJAX functions: orgs, repos, net] </li></ul> http://oarepojunction.wordpress.com/junction-api/ <ul><li>/api can be given a specific locus (an IP address or an ID code) that identifies the organisation whose repositories should be deduced; otherwise it deduces a locus from the calling client. </li></ul><ul><li>The script can be asked to restrict the returned list by repository type (institutional/learning/..) or accepted content (pre-prints/data/theses/...) </li></ul><ul><li>Data is returned in JSON, Text, or XML format </li></ul>
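The /api behaviour described above (optional locus, optional filters, choice of output format) could be exercised by a small URL builder such as the one below. This is only a sketch: the slide does not give the query parameter names or the service host, so `locus`, `type`, `content`, `format` and the example base URL are all hypothetical placeholders; the real names are documented on the Junction API page linked above.

```python
from urllib.parse import urlencode

# Output formats named on the slide.
ALLOWED_FORMATS = {"json", "text", "xml"}


def build_junction_query(base_url, locus=None, repo_type=None,
                         content=None, fmt="json"):
    """Build a query URL for the /api endpoint.

    ASSUMPTION: the parameter names ("locus", "type", "content",
    "format") are hypothetical; the slide does not specify them.
    """
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")

    params = {}
    if locus is not None:
        # An IP address or an organisation ID code. If omitted, the
        # service is described as deducing a locus from the caller.
        params["locus"] = locus
    if repo_type is not None:
        params["type"] = repo_type      # e.g. "institutional"
    if content is not None:
        params["content"] = content     # e.g. "pre-prints"
    params["format"] = fmt

    return f"{base_url.rstrip('/')}/api?{urlencode(params)}"


if __name__ == "__main__":
    # "junction.example.org" is a placeholder host, not the real service.
    print(build_junction_query("http://junction.example.org",
                               locus="192.0.2.10",
                               repo_type="institutional"))
```

Leaving `locus` unset mirrors the case where the service works out the organisation from the calling client.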
Consider a complete bipartite graph between two sets, where Set A (3 nodes) passes information to Set B (5 nodes). Total number of edges = 3 × 5 = 15. Each data provider needs to broker an agreement with every target repository, and each target repository needs to authenticate each data provider. This does not scale.
Consider adding a central node to connect the sets: Set A (3 nodes) passes information to the central node, and the central node passes information to Set B (5 nodes). Number of edges = 3 + 5 = 8. In this structure, each party maintains just one relationship, with a trusted operator.
<ul><li>Nodes: </li></ul><ul><li>185 repos listed in OpenDOAR for the UK </li></ul><ul><li>200+ publishers listed in SHERPA </li></ul><ul><li>Edges: </li></ul><ul><li>37,000 (185 × 200, every publisher paired with every repository) or 385 (185 + 200, via a central broker) </li></ul><ul><li>... what are the global figures? Researchers are not confined to the UK's borders </li></ul>
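The edge counts quoted on these slides follow from two simple formulas: a complete bipartite graph has one edge per provider-repository pair, while a hub has one edge per participant. A minimal sketch (the function names are illustrative only, and 200 is used as a lower bound for the "200+" publishers):

```python
def direct_edges(providers: int, repositories: int) -> int:
    """Edges in a complete bipartite graph: every provider
    needs a relationship with every repository."""
    return providers * repositories


def brokered_edges(providers: int, repositories: int) -> int:
    """Edges when a single central broker sits between the sets:
    each party keeps exactly one relationship, with the broker."""
    return providers + repositories


if __name__ == "__main__":
    # Toy example from the slides: 3 providers, 5 repositories.
    print(direct_edges(3, 5), brokered_edges(3, 5))
    # UK figures from the slides: 200+ publishers, 185 repositories.
    print(direct_edges(200, 185), brokered_edges(200, 185))
```

The direct model grows with the product of the two set sizes, the brokered model only with their sum, which is why the gap widens so quickly once global figures are considered.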
Demonstrator service: how a broker model could simplify things <ul><li>one consistent deposit process </li></ul><ul><li>single sign up for content providers and receivers </li></ul><ul><li>building a network of trust </li></ul> [Diagram: Broker connected to Institutional Repository 1, Institutional Repository 2 and Institutional Repository 3]
Case study 1: multi-authored paper [Diagram labels: Journal Y; Paper A; Researchers 1, 2 and 3; Copies A1, A2 and A3; Metadata A1, A2 and A3; Repositories 1, 2 and 3]
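Case study 1 combines the two halves of the project: discovery (match each author's organisation to a repository) and delivery (send one copy to each matched repository). The sketch below shows only the routing decision, under stated assumptions: the `Author` type, the `org_to_repo` lookup table and all identifiers in it are hypothetical stand-ins for the Junction database, and the actual delivery step (for example over SWORD) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    name: str
    org_id: str  # hypothetical organisation identifier


def deposit_targets(authors, org_to_repo):
    """Return the repositories that should receive a copy of a paper.

    Repositories are listed once each, in author order. Authors whose
    organisation has no known repository are skipped.
    """
    targets = []
    for author in authors:
        repo = org_to_repo.get(author.org_id)
        if repo is not None and repo not in targets:
            targets.append(repo)
    return targets


if __name__ == "__main__":
    # Hypothetical lookup table standing in for the Junction db.
    org_to_repo = {
        "org-1": "Repository 1",
        "org-2": "Repository 2",
        "org-3": "Repository 3",
    }
    paper_a_authors = [
        Author("Researcher 1", "org-1"),
        Author("Researcher 2", "org-2"),
        Author("Researcher 3", "org-3"),
    ]
    for repo in deposit_targets(paper_a_authors, org_to_repo):
        print(f"Paper A -> {repo}")
```

De-duplicating the targets means two co-authors at the same institution result in a single deposit rather than two.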
Case study 2: mandated open access [Diagram labels: Journal Y; Paper A; Researcher 1; £000s; Copy A1 (shown twice); Researchers 2 & 3]
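Estimate of the number of broker-transferred items during a six-month demonstrator service. Data is based on the number of papers published in journals from the participating NPG portfolio during Jan - June 2010. Data retrieved from PubMed Central and ISI Web of Knowledge.
<table>
<tr><th>Institutional partner</th><th>All NPG journals</th><th>Participating NPG journals</th><th>50% author participation rate*</th></tr>
<tr><td>Edinburgh</td><td>194</td><td>65</td><td>32</td></tr>
<tr><td>Cambridge</td><td>1429</td><td>476</td><td>237</td></tr>
<tr><td>Oxford</td><td>962</td><td>321</td><td>160</td></tr>
<tr><td>MIT</td><td>499</td><td>166</td><td>83</td></tr>
<tr><td>Cornell</td><td>275</td><td>92</td><td>46</td></tr>
<tr><td>Yale**</td><td>248</td><td>83</td><td>41</td></tr>
<tr><td>Auckland</td><td>53</td><td>17</td><td>8</td></tr>
<tr><td>TOTAL</td><td>3660</td><td>1220</td><td>607</td></tr>
</table>
(*Figure rounded down. **Still to be confirmed as a participating institution.)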
Issues and dependencies <ul><li>Common deposit package for SWORD </li></ul><ul><li>Missing data – provenance/embargo details/ author affiliations </li></ul><ul><li>Licensing – content providers and repos </li></ul><ul><li>Institutional sign-up – federation model? </li></ul>
Project Blog: http://oarepojunction.wordpress.com/ Thank you for listening. Questions?