Discovery: A Metadata Ecology for UK Education and Research. What makes data (or metadata) open? (Relevant to international/USA audience.) What’s the business case?
Add in JISC, BL, National Archives and other logos. Use keyboard slide. Discovery = implementation for the RDTF vision (screenshot of Discovery logo). National agenda with national stakeholder representatives (including JISC, British Library, The National Archives, Collections Trust, etc.). JISC funded.
The Resource Discovery Taskforce arrived at this vision after much debate. The question was: if we had to start over, what would we build? No agreement on the solution, but agreement on the vision. [Include RDTF partners slides and the questions they attempted to answer? But probably not enough time to answer; maybe for OCLC.] [Show this image, but then also crop the vision part and read it out.]
Agreement over the Vision, but not how to get there. (Agreement/disagreement slide.) Significant disagreement over services (and what ‘infrastructure’ means) (buckets vs siloes; one ring to bind them all, versus, etc.). (LOTR slide & Linked Data slides?) Agreement over metadata and over Open Data being the necessary means to achieve this (but this is not as simple as it seems). From technical infrastructure to a web ‘ecosystem’… Whilst business drivers vary and use cases are distinct, our overarching aim across these sectors is the same: we want our collections to be instrumental in teaching, learning and research. Given the paradigms of the web, that aim is most likely to be achieved if those collections are discoverable through popular search engines as well as through specialised services and aggregations, and if they can be exposed through social platforms ranging from scholarly reference management systems to Facebook and Twitter.
From a shared services perspective, the offer of the cloud is increasingly compelling. Aggregate and store in the cloud, and build services on top…
But there are other equally strong voices arguing that this approach is too monolithic: the attempt to circumscribe, control and monopolise is completely at odds with the dispersed and entropic way the web works. For each solution you implement, a new set of problems and unanticipated or ignored requirements emerges. Centralisation and dispersal.
The possibilities of ‘Linked Data’ as the semantic web done right are beguiling in this context. Could Linked Data be the answer: the way to knit together content to create a meaningful ‘web of data’ for researchers?
Ecosystem slide (find image or use Discovery logo)
What’s Open Data? (Barriers to reuse are removed as much as possible.) (Find image: unlocking/removing barriers, etc.)
We need to open data from a legal perspective: you need an explicit open licensing statement (not simply an open interface). (Solicitor wig.) We can’t de-risk ‘Open’. Concerns arising through our interviews. In perpetuity. But risks are alleviated as soon as benefits are made clearer.
Approach taken by Discovery (screenshot of website pages & PDF). Open Metadata Licensing Principles. CC-0 vs CC-BY. Community consensus/building a critical mass or movement (again, minimizes risk).
We recommend that institutions and agencies should proceed on the presumption that their metadata is by default made freely available for use and reuse, unless explicitly precluded by third party rights or licences.
We strongly advocate that all metadata releases require licensing, for which institutions and agencies should adopt a standard open licensing framework that is suited to their purposes.
Reference to permissible usage under the terms of a standard open licence will promote confident and appropriate use.
When licensing open metadata in the majority of circumstances, the standard Open Data Commons Public Domain Dedication & Licence (ODC-PDDL), the broadly similar Creative Commons CC0 licence or the UK Open Government Licence (OGL) will be appropriate.
Avoidance of variations to such standard licences will make it easier to combine data from different resources and will reduce repeated requirement for legal advice.
Highlight/list key points from principles here…
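As a rough illustration of the point that openness needs an explicit licence statement (not simply an open interface), the sketch below attaches a standard licence URI to a metadata record and checks for it. The record shape, the `license` field name, the helper names and the exact licence URIs are assumptions made for illustration, not something the Discovery principles prescribe; the URIs should be verified against each licence's official page.

```python
# Minimal sketch: a record counts as "open" only if it carries an explicit
# statement naming one of the standard open licences.
# ASSUMPTION: these URIs are illustrative and should be checked before use.
OPEN_LICENCES = {
    "ODC-PDDL": "http://opendatacommons.org/licenses/pddl/1.0/",
    "CC0": "http://creativecommons.org/publicdomain/zero/1.0/",
    "OGL": "http://www.nationalarchives.gov.uk/doc/open-government-licence/",
}


def with_licence(record: dict, licence_key: str) -> dict:
    """Return a copy of the record with an explicit licence statement added."""
    licensed = dict(record)  # copy, so the caller's record is left untouched
    licensed["license"] = OPEN_LICENCES[licence_key]
    return licensed


def is_explicitly_open(record: dict) -> bool:
    """True only when the record names one of the standard open licences."""
    return record.get("license") in OPEN_LICENCES.values()
```

A record with no `license` field fails the check even if it is freely downloadable, which mirrors the "open by default, but say so explicitly" stance; restricting the check to a short list of standard licences reflects the advice to avoid home-grown variations.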
We need to open data from a technical perspective, exposing data in ways that ensure it is findable by Google and reusable by developers.
http://technicalfoundations.ukoln.info/guidance/technical-guidelines-discovery-ecosystem
1. Discovery is heterogeneous
The Discovery ecosystem is a heterogeneous environment, encompassing a wide variety of users, resources and types of resources, domains, technologies and business models. Discovery balances the need for a degree of homogeneity to serve management and interoperability requirements, with a recognition of the importance of variety in any ecosystem.
2. Discovery is resource-oriented
Discovery is innately resource-oriented. It is a principle of Discovery that metadata resources may have intrinsic value, and that the ‘opening up’ of these to all will create more value as they are used, enhanced and combined with other resources.
3. Discovery is distributed
The Web is starting to be realised as a network where nodes are both client and server, functioning in potentially many different interactions with other nodes.
This allows for, and even encourages, the possibility that systems operating in the Discovery ecosystem can be both providers of information resources and services at the same time that they consume and use other, remote resources and services.
The idea of the Application Programming Interface (API), and principles of modular systems design, are important concepts for Discovery.
4. Discovery relies on persistent global identifiers
The resource-oriented architecture encourages the identification of information entities. In the Discovery ecosystem, such entities are typically metadata records, although there is growing interest in experimenting with a finer granularity of metadata in a Linked Data context. In any information system, such entities are uniquely identified. As Discovery deals with open data, such identifiers must be globally unique for the distribution of resources and services to work. The default global identifier scheme for The Web is the HTTP URI; however, there are other important schemes in use in the Discovery ecosystem.
5. Discovery is built on aggregations of metadata
Metadata aggregation is a foundational aspect of the Discovery vision. This might seem somewhat in opposition to the previous principle: however, The Web is sufficiently unrestrictive that it allows both distribution and aggregation as useful strategies in certain contexts. Dempsey uses the terms diffusion and concentration to describe these two approaches and indicates how they are complementary.
6. Discovery works well with global search engines
Search Engine Optimisation (SEO) is the process of exploiting an understanding of the functions and algorithms of the major global search engines. With such an understanding, Web content providers can present web resources in such a way that they gain the optimum ranking in the indexes created by those search engines. SEO is a fully developed industry in the commercial sector, but many of its principles and techniques are well known and applicable to the Discovery ecosystem.
7. Discovery balances consensus with agility
Consensus on technical and information standards is what allows information systems in the Discovery ecosystem to interoperate. Discovery favours open standards, but is also pragmatic about the adoption of less open standards where they are in mainstream use.
While there are, undoubtedly, benefits to be gained from consolidation, standardisation and consistency of approach in the Discovery ecosystem, it is also understood that there are domains and communities of practice within the ecosystem which take different approaches, use different standards…
Technical principles currently in development (screenshot) (put list here)
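To make principles 4 and 6 a little more concrete, here is a minimal sketch of giving a metadata record a globally unique HTTP URI and describing it in a form that search engines can read. The `example.org` namespace, the function names and the choice of a schema.org `CreativeWork` description in JSON-LD are all assumptions for illustration; the Discovery guidance does not mandate any of them, and other identifier schemes are in use in the ecosystem.

```python
import json
from urllib.parse import quote, urlparse

# ASSUMPTION: a hypothetical institutional namespace for record identifiers.
BASE = "http://example.org/id/record/"


def mint_uri(local_id: str) -> str:
    """Turn a local record identifier into a globally unique HTTP URI."""
    # Percent-encode everything unsafe, including "/" and spaces.
    return BASE + quote(local_id, safe="")


def is_http_uri(value: str) -> bool:
    """Check that an identifier uses the Web's default scheme (HTTP/HTTPS)."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def to_json_ld(local_id: str, title: str) -> str:
    """Describe the record as JSON-LD, suitable for embedding in a web page."""
    description = {
        "@context": "https://schema.org",  # assumed vocabulary choice
        "@id": mint_uri(local_id),
        "@type": "CreativeWork",
        "name": title,
    }
    return json.dumps(description, indent=2)
```

Calling `to_json_ld("ms 42/a", "Example manuscript")` yields a description whose `@id` is an HTTP URI under the assumed namespace, while `is_http_uri("urn:isbn:0451450523")` is false because that identifier belongs to a different scheme.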
Beyond rhetoric. Beyond hype. If we’re going to convince institutions to join us, then the benefits need to be more clearly defined. [Benefits we’re working to realise: shorten these; maybe present as a Wordle type of thing?]
Libraries can enhance support for efficiencies such as collaborative cataloguing and collection management.
Memory institutions can combine information to provide a more complete set of signposts to support a richer range of narratives and user quests.
Any special collection can become more discoverable and therefore more widely used.
Aggregators can be enabled to work more innovatively to promote exposure of contributing collections. [Ref: Kira’s presentation]
The wider community of developers, of finding aid authors and of narrators can be leveraged as co-creators to benefit access and articulation in both planned and serendipitous ways.
Overall, institutions can focus their efforts on adding service value and providing authentic raw material, rather than on preserving the dikes and halting the waves.
'Open metadata creates the opportunity for enhancing impact through the release of descriptive data about library, archival and museum resources. It allows such data to be made freely available and innovatively reused to serve researchers, teachers, students, service providers and the wider community in the UK and internationally.'
This is about recasting the value chain. The Discovery initiative, and this movement more broadly, are about embracing and facilitating the growth of new business models, not only rethinking our value proposition but also reflecting on our very purpose. We hope you will join us, not in blind pursuit of an ideal but rather by contributing to the community dialogue about rationale and business case, and consequently to the shared reservoir of open metadata.
Discovery: Implementing a vision for a 'virtuous' flow of metadata across the Web<br />Joy Palmer<br />Mimas, University of Manchester<br />
Key points<br />Discovery: A Metadata Ecology for UK Education and Research <br />What makes data (or metadata) open? <br />What’s the business case for opening data?<br />
A national initiative, funded by JISC<br />Includes major research universities, The National Archives, British Library & Collections Trust…<br />Libraries, Archives & Museums<br />
There was agreement on the Vision. But not how to get there…<br />
Disagreement over services…<br />Agreement over metadata<br />http://www.flickr.com/photos/arenamontanus/4970297633/sizes/m/in/photostream/<br />http://www.flickr.com/photos/yesiamisme/3629393406/sizes/m/in/photostream/<br />
With the introduction of data cycles we have a real ecosystem not a one way street and this ecosystem thrives on collaboration, componentization and open data… <br />(Rufus Pollock, OKFN blog, March 31 2011)<br />
By working on a national level to:<br />Open up metadata about institutional collections<br />Achieve further clarity and advocacy over licensing (especially around opening up data for reuse)<br />Provide sustainable technical guidance<br />Develop core standards guidance for metadata<br />Engage the key stakeholders at every level throughout the process<br />
Your data’s not open unless it has an explicit open licensing statement<br />
We the people…<br />discovery.ac.uk/principles<br />
In short…<br />Adopt an ‘open by default’ mindset<br />All metadata releases should adopt standard open licences<br />In the vast majority of cases ODC-PDDL or CC0 is appropriate<br />Avoid home-grown variations<br />
The Discovery ecosystem is…<br />Heterogeneous<br />Resource-orientated (not service)<br />Built on aggregations of metadata<br />Distributed<br />Reliant on persistent global identifiers<br />Striving to work well with global search engines<br />
This is all very well in principle…<br />The next year is about animating the principles on the ground…<br />
Where’s the business case?<br />User demand, benefit, added value = sustainability<br />
Beyond rhetoric and hype… we can:<br />Support institutional efficiencies<br />Provide more complete signposts to support richer discovery<br />Increase discoverability of hidden collections<br />Leverage aggregators to exploit data for us<br />Leverage the technical creativity of others & innovate<br />
And maybe recast the value chain<br />New business models<br />New value propositions<br />New purpose?<br />