JISC CNI Meeting, Edinburgh 2010


  1. Supporting Technical Innovation in the UK: RepositoriesUK
     Paul Walk
     UKOLN is supported by: (supporter logos not captured in the transcript)
     A centre of expertise in digital information management
  2. innovation support
     • UKOLN is now one of two JISC-funded Innovation Support Centres
     • this role is being worked out
     • UKOLN has a long-standing role supporting and helping to develop the JISC Information Environment
     • repositories
     • UKOLN has an increasing role in supporting developers in UK HE
     • RepositoriesUK is a JISC-funded UKOLN project
  3. provenance....
     • Intute IRS
     • nothing to do with taxes....
     • Intute Institutional Repository Search
     • a managed aggregation underpinning a search interface for researchers
     • ePrints UK and the Resource Discovery Network
  4. lessons
     • the aggregation has general potential value
     • a cache on the network
     • a search service is only one realisation of that potential value
     • separation of concerns was needed
     • a particular service (such as search) should not dictate the entire infrastructure
     • lessons from this project complemented some thinking I was doing elsewhere....
  5. familiar?
     [Diagram: "some aggregated data of broad interest and potential usefulness" at the centre, exposed through three APIs (labelled "machine interfaces") and one UI serving an end-user.]
  6. a pessimistic view....
     [Diagram: "some aggregated data of broad interest and potential usefulness" with its own UI and end-user, plus three APIs. Each API leads to a "Future 3rd-party dev" with a UI and an end-user. Legend: certainty, belief, speculation.]
  7. why is this?
     • funding follows services & happy users (& new features?)
     • funders like to see their investment showcased
     • infrastructure is mostly invisible - hard to ascertain impact from users
     • so, there is strong motivation to develop a user-facing service, and then concentrate resources on this
  8. a better pattern?
     [Diagram: "some aggregated data of broad interest and potential usefulness" exposed through a single API, which serves two apps, each with a UI and an end-user:
     • 3rd-party app: pre-existing user-facing service (OPAC, VLE, Facebook, NetVibes....)
     • focussed app: application developed for specific requirement (might be simply for research and development)
     Legend: certainty, belief, speculation.]
  9. RepUK
     • RepositoriesUK
     • a managed aggregation of repository metadata from UK HE institutions
     • un-normalised records
     • well-formed XML (no check for validity)
     • focussed on academic papers
     • goals:
       1. support innovation
       2. develop some business intelligence
       3. develop infrastructure component for services
  10. design principles
     • tiered service model (quasi SOA)
     • serving intermediaries
     • negotiated supply to consumers
     • built around an unnormalised cache of metadata
     • well-formed is good enough
     • just as well really....
     [Diagram: Core Services behind an API, then Common Services, then Local Services, arranged along a scale from "closely integrated" to "loosely coupled".]
  11. RepUK
     [Architecture diagram with numbered stages 1 to 5. Labels recoverable from the transcript:
     • sources: Repository (x3), Registry (OpenDOAR)
     • harvesting: Harvester, Scheduler & Admin, Operational Metadata MySQL Database
     • storage: XML Files, XML Database, RDF Database, SOLR Index (x2)
     • enrichment services: WorldCat Identities, MIMAS Names, Google language identifier, LCSH & JACS
     • output: Export Process (x3), XQuery, HTML & RDFa Files, XML Files, Document HTTP Server
     The connections between components are not recoverable from the transcript.]
  12. progress
     • 750,000+ metadata records
     • ~140 repositories
     • 6 consuming projects so far....
  13. ‘consumers’ to date
     • RIDIR: identifiers
     • & FixRep: metadata & full-text
     • RKBExplorer & sameas: metadata to inform linked data
     • NaCTeM: full-text (text-mining)
     • Talis....?: hosting linked data
  14. developer appreciation
     "We have found that the RepUK aggregated repository datasets are a very useful basis on which to build, and have used the data in a number of projects.... The ability to build on other services means that we can reuse what has been done, rather than replicating functionality, freeing more time to work on the key functionality of our own projects."
  15. issues
     • state management is the real challenge!
     • deletions
     • changes
     • federation is consequently non-trivial
     • scale & inequality (one repository = half of all the records)
     • linking?
     • should the records in the aggregation ever be the target of a link? Or, should such links point to the source repository?
     • if we succeed with SEO, are we undermining source repositories?
  16. new lessons
     • developers need infrastructure too!
     • finding the right place to intervene
     • funders need to find ways to measure value which does not necessarily stem from direct end-user satisfaction
     • a leap of faith....
     • doing what no one else wants to do, to paraphrase Prof. David Baker
     • creating the right environmental conditions to allow innovative services to emerge
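
Slides 9 and 10 describe the cache as holding un-normalised records and accepting any well-formed XML without checking validity. A minimal Python sketch of that acceptance rule might look like the following. The function names and the (identifier, XML text) input shape are assumptions made for illustration; nothing here is taken from the RepUK code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Tuple


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML. No schema or DTD validation is attempted."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def cache_records(
    records: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Store well-formed records exactly as supplied; set the rest aside.

    `records` is a hypothetical iterable of (identifier, xml_text) pairs.
    Records are kept un-normalised: the original text is cached unchanged.
    """
    cached: Dict[str, str] = {}
    rejected: List[str] = []
    for identifier, xml_text in records:
        if is_well_formed(xml_text):
            cached[identifier] = xml_text
        else:
            rejected.append(identifier)
    return cached, rejected


if __name__ == "__main__":
    sample = [
        ("rec-1", "<record><title>A paper</title></record>"),
        # Unknown element names are fine: only well-formedness is checked.
        ("rec-2", "<record><made-up-field>x</made-up-field></record>"),
        # Mismatched tags: not well-formed, so this one is set aside.
        ("rec-3", "<record><title>Broken</record>"),
    ]
    cached, rejected = cache_records(sample)
    print(sorted(cached), rejected)
```

Keeping the stored text untouched leaves any normalisation to the consuming projects, which is one way to read "negotiated supply to consumers".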

Editor's Notes

  • the cache is valuable without having to layer on added value ourselves

  • Who recognises this?
    lots of standards based apis allowing seamless interoperability
    I think this is an anti-pattern. In software engineering terms, an anti-pattern is a design approach which seems plausible and attractive but which has been shown, in practice, to be non-optimal or even counter-productive.
  • what this often means in reality (pessimistic but frequently observed)
    orange stuff is what actually gets built and delivered
    the users are yellow because they represent an expected demand, rather than an actual demand
    major investment in UI is wasted. Investment in APIs is also wasted
    neither infrastructure, nor focussed end-user service

  • a slightly better version
    investment in API is immediately realised - service is built on API - both infrastructure and service
    risk of locally built focussed app is reduced because API is developed anyway. This might be orange if properly understood. It might be OK for it to be yellow because it might be R&D
    reality will be more than this.
  • we have concentrated on 1 and 2, with 2 being the test for the approach being taken in 1
    rapid innovation projects - 6 months, small grants, waste of time and money assembling the data. Lots of interest in linked data R&D on this data set.
    business intelligence - shape of UK research, gap analysis, topic maps etc.
  • we started to think about infrastructure. Infrastructure might not serve end users. It might serve those who provide services to end users.
    opportunistic developers
  • white is an external system
    blue is wholly controlled by the project - we might call this infrastructure
    yellow is negotiated between RepUK and developer projects. This might eventually become a candidate for infrastructure
    Google & SEO from HTML & RDFa
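
Slide 15 names state management (deletions and changes) as the real challenge for the aggregation. The sketch below shows one way an aggregator could reconcile its cache with a harvest batch. It assumes each harvested record carries an identifier, a datestamp and a deleted flag; that record shape, and every name in the code, is hypothetical and not drawn from RepUK.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class HarvestedRecord:
    identifier: str
    datestamp: str           # assumed ISO 8601 date in one format, so strings sort correctly
    xml: Optional[str]       # None when the source reports the record as deleted
    deleted: bool = False


def apply_harvest(
    cache: Dict[str, HarvestedRecord], batch: Iterable[HarvestedRecord]
) -> Dict[str, int]:
    """Update the cache in place and count what happened to each record."""
    counts = {"added": 0, "changed": 0, "deleted": 0, "unchanged": 0}
    for record in batch:
        existing = cache.get(record.identifier)
        if record.deleted:
            # Only count a deletion if we actually held the record.
            if existing is not None:
                del cache[record.identifier]
                counts["deleted"] += 1
        elif existing is None:
            cache[record.identifier] = record
            counts["added"] += 1
        elif record.datestamp > existing.datestamp:
            cache[record.identifier] = record
            counts["changed"] += 1
        else:
            counts["unchanged"] += 1
    return counts


def silently_removed(
    cache: Dict[str, HarvestedRecord], current_ids: Set[str]
) -> Set[str]:
    """Identifiers still cached but absent from a full listing of the source.

    Needed when a source repository does not announce its deletions.
    """
    return set(cache) - current_ids


if __name__ == "__main__":
    cache: Dict[str, HarvestedRecord] = {}
    print(apply_harvest(cache, [
        HarvestedRecord("a", "2010-01-01", "<r/>"),
        HarvestedRecord("b", "2010-01-01", "<r/>"),
    ]))
    print(apply_harvest(cache, [
        HarvestedRecord("a", "2010-02-01", "<r2/>"),
        HarvestedRecord("b", "2010-02-01", None, deleted=True),
    ]))
    print(sorted(cache))
```

The second function hints at why federation is non-trivial: when a source does not report deletions, the only way to notice them is to compare against a complete listing, and every downstream cache has to repeat that work.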