data.ac.uk briefing paper

Published in: Technology

  1. 1. What should be at data.ac.uk?<br />Briefing Paper and Recommendations<br />
  2. 2. The Primary Community Concerns<br />Overall themes from community consultation (see quotes in notes section below this page):<br />Opening up data is something on which Universities should now be demonstrating good practice.<br />Without a collective voice for how Universities, and as importantly individuals, should be exposing their data, <br />Data is different: in its various forms, including spreadsheets, it is more liable than any other kind of resource to be lost, as search engines do not easily facilitate data discovery to support learning and research uses.<br />The future of research is becoming dependent upon interrelating more granular pieces of data. The future Web is providing us with the ability to expose data so that detailed visualisation and specific answers can be better provided for end users.<br />Data access and minting (permanence) of URIs needs to happen across sector, institution, department and individual in order to support this interrelation of data.<br />There is an urgency to enable a URL syntax (XXX.data.ac.uk) so that institutions can start to open up the small bits and pieces of data they have now; otherwise more data is being lost every day.<br />
  3. 3. So, what should be at <data.ac.uk>?<br />data.ac.uk continuum<br />The List – The Register – The Safe Storage<br />Option 1: The List – Provide a simple single web page list that points to data applicable to the Academic domain.<br />Option 2: The Register – A multi-page registry and catalogue that can point to datasets around the web and provide a search facility and API to find data sets (but will not store data itself).<br />Option 3: The Safe Storage – A single central data repository where people can come to store their data for the long term.<br />NOTE: Regardless of which solution or technology is selected for data.ac.uk, further guidance to the community is required.<br />
  4. 4. Option 1: The List<br />Summary: A single web page that will list data sets applicable to Academia, along with where they can be found and who owns them (community curated).<br />The Problem it Solves: Will make it easier for librarians and developers to find data sets, which will in turn enable them to deliver data sets (via new tools and helpdesks) to researchers and learners for use specific to their subject niche.<br />The Audience: Developers and librarians will make their data discoverable via auto lookup, and librarians will help encourage end users to register their specific datasets at data.ac.uk.<br />Value to UK H/FEIs:<br />Will help encourage a common way of exposing data across multiple institutional, departmental and individual websites so that others in the Academic sector can find and reuse the data.<br />Risks:<br />If not popular or in vogue, could fall into disrepair and cease to be a useful resource.<br />Could confuse end users as to the reason for having a registry that is separate from where the data is held.<br />Examples:<br />Barcode registry and lookup service: http://www.keyword.com/barcode_upc.htm<br />Suggested Features:<br />Guidance on how to make your data discoverable to the list at data.ac.uk would be provided, modelled on the Government’s recommendations on URI Sets.<br />A crawler tool should be built into the website to enable auto-discovery of data from data.xyzUniversity.ac.uk sites and to list the datasets found along with an owner.<br />A submission form allowing anyone with a “.ac.uk” email address to add a dataset to the list or to suggest an amendment or correction: a “Wikipedia”-like interface (but only one page, not multiple web pages).<br />
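The two mechanical checks behind the suggested features for The List (who may submit, and which hosts a crawler would visit) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a design from the paper: the function names `is_ac_uk_email` and `candidate_data_host` are hypothetical, and the only rules encoded are the ones the slide mentions (a “.ac.uk” email address, and data published at `data.<institution>.ac.uk`).

```python
def is_ac_uk_email(address: str) -> bool:
    """Return True if the address has a local part and a domain ending in '.ac.uk'.

    Assumption: eligibility is judged purely on the domain suffix; a real
    submission form would also need to confirm the address by email.
    """
    local, sep, domain = address.strip().rpartition("@")
    return bool(local) and sep == "@" and domain.lower().endswith(".ac.uk")


def candidate_data_host(institution_domain: str) -> str:
    """Map an institutional domain to the host a crawler would try first.

    Example: 'www.xyzUniversity.ac.uk' -> 'data.xyzuniversity.ac.uk'.
    Raises ValueError for domains outside '.ac.uk'.
    """
    domain = institution_domain.strip().lower().rstrip(".")
    if domain.startswith("www."):
        domain = domain[len("www."):]
    if not domain.endswith(".ac.uk"):
        raise ValueError(f"not an .ac.uk domain: {institution_domain!r}")
    return "data." + domain
```

A crawler along these lines would take a list of known institutional domains, derive one candidate host per institution, and only then fetch and index whatever it finds there; the fetching itself is deliberately left out of the sketch.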
  5. 5. Option 2: The Registry<br />Summary: A multi-page website and catalogue that can point to datasets around the web and provide a search facility and API to find data sets (but will not store data itself).<br />The Problem it Solves: Will increase the likelihood that small and medium-sized datasets, which are otherwise not discoverable via current search and discovery technologies, are found and used by the end user.<br />The Audience: Researchers, teachers, developers, librarians, administrators and other academic staff who are tasked with the collection of data from across and within the institution and its departments.<br />Value to UK H/FEIs:<br />Will provide a one-stop shop for acquiring metadata on a diverse range of datasets that will be categorised and listed according to various tags.<br />Will suggest tools and methods for working with the datasets along with examples and good practice.<br />Risks:<br />Same risks as Option 1 (the List), plus...<br />A multi-page catalogue could confuse users and act as a multi-click barrier, which would dissuade the community from participating in the community curation of the website.<br />Examples:<br />CKAN http://www.ckan.net/ (the same software that data.gov.uk uses).<br />Semantic Wiki: http://semantic-mediawiki.org/wiki/Semantic_MediaWiki<br />Suggested Features:<br />Same features as Option 1 (the List), but in a multi-page, wiki-like space.<br />Auto-discovery of metadata and community curation would be the emphasis, with additional support from an editorial team.<br />Could provide a place for datasets that could not be hosted at an .ac.uk address.<br />Could potentially list both datasets and tools for working with the datasets.<br />Advice on how to use CKAN at local “data.*.ac.uk” institutional sites would be provided, so as to enable a networked, aggregated search layer.<br />
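The defining property of The Registry, that it stores metadata and pointers but never the data itself, can be made concrete with a small in-memory sketch. This is not CKAN and not a proposed implementation; `DatasetRecord`, `Registry` and their fields are hypothetical names chosen for illustration, and the search is a plain substring and tag match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata about a dataset held elsewhere; the registry keeps only the pointer."""
    title: str
    url: str            # where the data actually lives
    owner: str
    tags: frozenset = frozenset()


class Registry:
    """Hypothetical in-memory registry keyed by dataset URL."""

    def __init__(self):
        self._records = {}

    def register(self, record: DatasetRecord) -> None:
        # Re-registering the same URL replaces the earlier entry,
        # which is how a community correction would be applied.
        self._records[record.url] = record

    def search(self, term: str) -> list:
        """Return records whose title contains the term or that carry it as a tag."""
        needle = term.lower()
        hits = [
            r for r in self._records.values()
            if needle in r.title.lower()
            or any(needle == t.lower() for t in r.tags)
        ]
        return sorted(hits, key=lambda r: r.title)
```

Keying on the URL is one possible design choice: it keeps the pointer as the record's identity, so duplicates from crawler and form submissions collapse into one entry.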
  6. 6. Option 3: The Safe Storage<br />Summary: A full search and retrieve repository that is able to list and hold data in a single place; this would act as both a registry and as a datastore that would assure the per<br />The Problem it Solves: This would attempt to bring all data into a central storage facility so that it could be managed and curated by a central team and staff.<br />The Audience: All UK Academics would be encouraged to place their data here for the long term so that it would be <br />Value to UK H/FEIs:<br />This would encourage long-term preservation of data regardless of institutional change.<br />Risks:<br />Central repositories have had limited success and are often met with criticism in the community.<br />The time and expense required to deliver a single solution, compared with what an individual institution can provide immediately, could make this system out of date prior to being launched.<br />Examples:<br />National Learning Object Repository Jorum - http://www.jorum.ac.uk/<br />Suggested Features:<br />Features of both Option 1 (the List) and Option 2 (the Registry), plus...<br />Deposit features and long-term citable persistence of URIs<br />Committee for selecting and minting URI domains for various subject areas (e.g. www.classicalcomposers.data.ac.uk)<br />Database team for assuring delivery of URIs in the long term<br />Transcoding team in place to assure delivery of multiple formats including CSV, RSS, ATOM, JSON and RDF<br />Marketing team for engaging a wider community of contributors and volunteer curators.<br />Business and service models as a self-sustaining, self-contained organisation.<br />
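As a rough indication of what the transcoding work listed above involves, the simplest case (a CSV spreadsheet export delivered as JSON) needs only the Python standard library. The function name `csv_to_json` is hypothetical, and the sketch assumes the first row of the CSV is a header; RSS, ATOM and RDF output would each need a vocabulary decision that the paper does not specify.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    Every value stays a string, because CSV carries no type information;
    deciding which columns are numbers or dates is a curation decision.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)
```

Even this small case shows why a standing team is proposed: the conversion is trivial, but choices such as typing columns and naming fields consistently across datasets are not.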
  7. 7. What to choose?<br />Option 1<br />Option 2<br />Option 3<br />data.ac.uk continuum<br />The List – The Register – The Safe Storage<br />Recommendations:<br />The first step must be immediate advice to the local community, as they are ready to publish data NOW, e.g. use of the data.*.ac.uk URL syntax must be in place tomorrow.<br />A central hub at data.ac.uk (regardless of solution or technology) should enable the community to come together and share advice and thoughts; this should be an attitude of ‘aggregation from across the Web’ rather than ‘build it and they will come’.<br />While community should be first and foremost at data.ac.uk, second to this should be the urgency of government legislation for having this data open.<br />JISC should continue to pursue supporting activities that engage communities of end users around the use of their spreadsheets as they pertain to administration, teaching, learning and research (cite jiscEXPO and jiscMRD).<br />JISC should continue to help change the perception of senior management from a resource-oriented view to a parallel data-centric view of the sector.<br />
  8. 8. Recommendation: a step-by-step action plan<br />A suggested tiered approach: action is needed now so as to engage the community (first and foremost), with the technology to follow iteratively:<br /><ul><li>Project 1: JISC could action a small project now (£50k) that could begin to engage the community and provide fundamental support.
  9. 9. In each project cycle, review, community reflection and re-scoping would occur prior to further funding.
  10. 10. Early wins, guidance on URL syntax, etc.
  11. 11. It would be the institution's decision during the bidding process as to which technology (list, registry, safe-store) would be applicable to support the community engagement. </li></ul>[Diagram: iterations data.ac.uk v0, data.ac.uk v1 and data.ac.uk v2, separated by “Reflect & Re-Scope” steps]<br />
  12. 12. data.ac.uk URL syntax recommendations<br />One thing that almost all community members agreed upon is the need for a guidance document on URL syntax recommendations, i.e. an equivalent of the OPSI’s guidance on URI Sets, but for UK H/FEIs.<br />
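Until such guidance exists, the hostname shapes that appear in this paper can at least be written down precisely. The sketch below is an assumption drawn only from the examples in the slides (`data.*.ac.uk` for institutions, `XXX.data.ac.uk` and `www.classicalcomposers.data.ac.uk` for subject areas); it is not the OPSI guidance and not an agreed syntax, and `classify_host` is a hypothetical helper name.

```python
import re

# One DNS label: letters, digits and inner hyphens.
_LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

# Institution-level publishing point, e.g. data.xyzuniversity.ac.uk
INSTITUTION_HOST = re.compile(rf"^data\.({_LABEL})\.ac\.uk$")

# Subject-level name minted under the hub, e.g. classicalcomposers.data.ac.uk
SUBJECT_HOST = re.compile(rf"^(?:www\.)?({_LABEL})\.data\.ac\.uk$")


def classify_host(host: str):
    """Return (kind, name) where kind is 'hub', 'institution', 'subject' or 'other'."""
    h = host.strip().lower().rstrip(".")
    if h in ("data.ac.uk", "www.data.ac.uk"):
        return ("hub", None)
    match = INSTITUTION_HOST.match(h)
    if match:
        return ("institution", match.group(1))
    match = SUBJECT_HOST.match(h)
    if match:
        return ("subject", match.group(1))
    return ("other", None)
```

Writing the patterns out exposes the questions a guidance document would have to settle, for example whether a leading `www.` is permitted and how paths beneath each host should be structured.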
  13. 13. Suggested Timeline<br />Sept 2010 = Closed ITT to Southampton, Manchester and Oxford to implement data.ac.uk<br />Oct 2010 = Selection of Project <br />Nov 2010 – April 2011 = data.ac.uk project 1<br />April 2011 = JISC community review of data.ac.uk project 1<br />May 2011 = Consideration for data.ac.uk project 2<br />This would be a non-committal funding cycle that would be subject to community review and approval prior to continuation. <br />
  14. 14. Value proposition diagram for institutions exposing their data.<br />developer community building subject-specific tools<br />audience-specific website<br />available open data<br />
