This presentation is based on a simple premise: that the current inaccuracies and inadequacies of data in the supply chain are not only detrimental to the user experience and to the ability of institutions to manage their collections effectively, but that resolving them is becoming increasingly unsustainable at the institutional level, with an impact not only on access to resources but more widely on institutional decision making. In this presentation I hope to explain how two projects, KB+ in the UK and GOKb in the USA, increasingly working in cooperation with a range of other partners, are adopting a community-centric approach to try to resolve these issues as well as to broaden the scope and utility of knowledge bases. It is our belief that it is only through collaboration at a wide range of levels and on a number of fronts that these challenges can be overcome. This is not to belittle the efforts of any of the vendors out there and the products which they continue to develop, rather to suggest
Just a warning at the start, none of what I’m about to say is rocket science.
KB+ is a new service from JISC Collections building a shared academic knowledge base for the UK academic community. KB+ was funded through HEFCE’s Universities Modernisation Fund in response to studies conducted by SCONUL and Jisc suggesting that library directors wanted to see greater action on shared services in the areas of library management systems, electronic resource management and licensing. KB+ is concerned with capturing and representing the information that institutions need to manage their subscribed resources: things like journal packages, e-resource licences and institutional entitlements. It provides a one-stop shop for this information for UK institutions, but also makes sure that the information is available to others in the supply chain, such as the systems vendors who provide services to institutions. GOKb is a sister project of KB+, funded by the Mellon Foundation in the US. Its aim is to create a community-sourced knowledge base, primarily for users of Kuali’s Open Library Environment. However, like KB+, with which it has worked on software development and data architecture, it intends to make all of the data it collects available under an open licence so that it can be used in other systems. Whilst both KB+ and GOKb have a shared interest in package information and licence information, GOKb is less concerned with local-level subscription and entitlement information. As such the two complement each other, each being both a source and a user of the data collected by the other.
There are so many problems with the data in the supply chain that I don’t really know where to start, though this quote from a librarian goes some way towards illustrating the issues we’re facing. The first issue is accuracy: it is unfortunate that many publishers do not seem entirely certain about what exactly they publish. One of my colleagues in KB+ has so far spent over 70 hours creating a complete list of the content of one of the major publishers, a list that the publisher is unable to generate from its own systems, meaning that much of the work has to be done by hand, clicking through the content of the publisher’s web site. They are not alone: some publishers have started using our lists themselves so that they know what they publish. From there we have the issue of availability. Not all parts of the supply chain have access to all of the information they need, and if some publishers are unsure about what they publish, they are even more unsure about what institutions are buying now and have bought in the past. Add to that the fact that there sometimes appear to be entire classes of information that are absent from the wider supply chain in any format that can be used with systems, the main one here being licence information.
• Data – Accuracy and availability – Contextualisation
• Interoperability – Data silos and flows – Implementation of standards
• Workflows – “the workflow support problem” (NISO, 2012)
• Duplication of effort – Maintaining numerous KBs WITHIN an institution – Maintaining numerous KBs ACROSS institutions
So you have duplication of effort at the service provider level, and that is before you get anywhere near the amount of work that is then going on at the local institutional level to correct inaccuracies in the knowledge base data and then make sure that it is accurate for one’s own institution. Such duplication of effort may have been tenable in an age of plenty, but in the UK at least, and I suspect elsewhere, it just isn’t sustainable to have intelligent academic librarians spending a large proportion of their time correcting and maintaining what is essentially commodity information.
But it isn’t just about the knowledge bases for link resolvers. At the institutional level, one is attempting to synchronise a number of different views of the same data, each telling a slightly different story and giving a different version of the same thing.
One of the common misconceptions about this work is that we just want to reproduce what is already there, only more accurately. That is only part of the story: by working at different levels, we believe that we can actually achieve more. We can all contribute to and make use of globally relevant information, and we see GOKb as the venue for that information: publisher, title and platform information as well as standard licence agreements. However, we also know that there are important differences at the national and institutional level that need to be managed appropriately. Some examples might be the details of national purchases of journal archives where they differ from what is available from the publisher; one might also add specific national licences such as the JISC model licence, or consortially negotiated entitlements that differ from the standard. Organisations such as JISC Collections can help add important context to national-level information by providing the benefit of their direct knowledge of the licensing process and negotiations. There is also the local institutional world of local holdings, financial data and documentation. What we are seeking to achieve, though, is to minimise the activity that occurs at the level of the individual institution wherever commonality of the data means that the work could have been undertaken nationally or internationally.
One of the chief critiques of GOKb, KB+ and other similar initiatives is this: given the limited resources available, how can they hope to offer the comprehensiveness of the existing major offerings? It’s a fair point when you consider that some of these KBs employ 30-odd people just to work on the data. However, we feel that in some ways this misses the point. At the national level there are already tens of librarians working by themselves, or in groups aligned to products, to maintain and update knowledge bases. This effort could be coordinated to bring benefits to all by reducing and hopefully eliminating any duplication of effort across institutions. Similarly, by collaborating internationally, we can maximise this network effect. The groups listed above have all actively started to investigate the means by which they can share and collaborate on data management together, thus overcoming our own limited national capabilities to build something greater than the sum of its parts. And because we use open data and set no limits on what each of us can do with our data, we are free to go on establishing new partnerships with any group that wants to see this type of information openly available to all, irrespective of platform, system, service or vendor. We don’t all need to do everything: we can divide up the work to reflect different areas of expertise and according to national priorities. But it is an essential piece of the sustainability of these efforts.
There is a range of data available to all of us, each source containing part of the overall picture. It is only by bringing this data and information together in the right context that we get the answers we need to drive and inform the decisions that we want to take. One example might be the use of licence and holdings information in KB+ coupled with preservation information from the Keepers Registry to provide understanding about post-cancellation access entitlements or the disposal of print. Similarly, package information from KB+ together with usage information from JUSP can inform collection development. Or one may simply want to know from SHERPA/RoMEO what copyright policies apply to the various titles in one’s collection. The data sources aren’t the really important part; it is the fact that we can bring multiple sources together to answer different questions.
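As a rough illustration of bringing two sources together, the pairing of package information with usage information can be sketched as a simple join on ISSN. Everything in this sketch is an assumption made for illustration: the field names, the titles, the request counts and the threshold are hypothetical and do not reflect the actual KB+ or JUSP data formats.

```python
def titles_with_usage(package_titles, usage_by_issn):
    """Attach a request count to each package title (None if no usage was reported)."""
    merged = []
    for title in package_titles:
        merged.append({**title, "requests": usage_by_issn.get(title["issn"])})
    return merged


def review_candidates(merged, threshold):
    """Return the titles whose reported usage falls below the threshold."""
    return [
        t["title"]
        for t in merged
        if t["requests"] is not None and t["requests"] < threshold
    ]


# Hypothetical package list (the kind of information a knowledge base might hold).
package = [
    {"title": "Journal A", "issn": "0378-5955"},
    {"title": "Journal B", "issn": "1234-5679"},
    {"title": "Journal C", "issn": "2048-7754"},
]

# Hypothetical annual request counts keyed by ISSN (the kind of figure a usage service might report).
usage = {"0378-5955": 1520, "1234-5679": 12}

merged = titles_with_usage(package, usage)
candidates = review_candidates(merged, threshold=50)
```

Titles with no reported usage are kept with a count of None rather than treated as zero, since a missing figure is a data gap to chase up, not evidence of low use.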
So far I have concentrated on data, its management, improvement and collation, but underpinning all of this is human interaction and relationships. As anyone who has ever read a licence will know, what’s written down on the page is often only half the story. Experience and knowledge are both needed in order to apply that information correctly, and all too often individuals are subjecting themselves to a lot of stress and worry because they don’t know who to ask or even what to ask.
• We want to maximise the collective knowledge in the library community
• Asking questions, sharing knowledge, filling in gaps, adding context
• Licence interpretation as a key aspect of this
When we started KB+, I had a vision that by encouraging the adoption of standards, and making use of those that
The use of standards, identifiers and compliance with best practice are all fundamental to our concept of how to resolve the issues we face with regard to this information. Their implementation supports accuracy, availability and exchange of data, which is what KB+ and GOKb are all about. When I started the KB+ project I had a rather naïve view that we could use standards etc. to minimise the need for human manipulation of the data, thus creating machine-based processes for the collation, maintenance and distribution of data. How misguided was I? There are two real problems in all of this. The first is that even where a standard exists, it has often not been implemented widely or indeed at all – this is particularly the case with some of the ONIX standards for serials and licences. The second problem is that even when standards have been implemented, there are often still ongoing problems with the accuracy of the data itself. So we have had to be practical. Where standards are there we will try to use them, as in the case of KBART; sometimes we will put the data into the required standard ourselves, as with KBART and ONIX-PL. Other times we map identifiers together so that we can build up an increasingly accurate picture, for example by using either the e-ISSN or the ISSN. However, none of this can take away from the fact that we’re still having to do far too much work on this data ourselves. Encouraging the adoption of standards is hard enough, and the experience of ONIX-PL testifies to that, but when you discover that you are having to put together the basic data itself, correct it and then format it accordingly, you do suspect that some other people in the supply chain could lend a hand.
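The idea of mapping identifiers together, matching a title on either its e-ISSN or its ISSN, can be sketched roughly as follows. This is a minimal illustration and not the KB+ or GOKb implementation: the record layout, field names and function names are hypothetical, and the sample ISSNs are used only because they pass the standard ISSN check-digit test (pairing them on one title is made up).

```python
import re


def normalise_issn(raw):
    """Return an ISSN as 'NNNN-NNNC', or None if it is malformed or fails the check digit."""
    cleaned = re.sub(r"[^0-9Xx]", "", raw or "").upper()
    if not re.fullmatch(r"\d{7}[\dX]", cleaned):
        return None
    # Standard ISSN check: weight the first seven digits 8 down to 2, modulo 11.
    total = sum(int(d) * w for d, w in zip(cleaned[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if cleaned[7] != expected:
        return None
    return cleaned[:4] + "-" + cleaned[4:]


def build_index(titles):
    """Index each title under every valid identifier it carries (print ISSN and e-ISSN)."""
    index = {}
    for title in titles:
        for key in ("issn", "eissn"):
            issn = normalise_issn(title.get(key))
            if issn:
                index[issn] = title
    return index


def match(record, index):
    """Find a title sharing any identifier with the record, whichever field it appears in."""
    for key in ("issn", "eissn"):
        issn = normalise_issn(record.get(key))
        if issn and issn in index:
            return index[issn]
    return None


# Hypothetical records: one list carries both identifiers, the other only an unhyphenated e-ISSN.
publisher_list = [{"title": "Journal A", "issn": "0378-5955", "eissn": "2048-7754"}]
agent_record = {"title": "Jnl. A (online)", "issn": "", "eissn": "20487754"}

index = build_index(publisher_list)
matched = match(agent_record, index)
```

Rejecting identifiers that fail the check digit, rather than matching on them anyway, reflects the point made above: even where a standard identifier is present, the data itself may be inaccurate.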
And finally, last year my colleague Ben Showers urged delegates at this conference to set their data free so that it could achieve its potential. As I’ve said, I’m a big believer in open data, and as much of the data that we have as we can will be made available under an open licence. However, I’m also quite a believer in accurate data, so as much as I’d like you all to let your data be promiscuous, I’d be grateful if you could give it a quick tidy-up first so that it looks presentable and doesn’t get knocked back.
1. Maximising the Knowledge Base: KB+ and GOKb
2. WHY ARE WE DOING THIS? The problems we are seeking to address
3. Data quality and availability: “My sales rep asked me if we could let them know which of their journals we subscribe to. We didn’t know so we asked our agent, but they didn’t know either.” Librarian, JISC Collections Roadshow, 12th June 2012
4. Duplicated Effort: The Big Four KBs employ c80 FTE on KB maintenance BUT… “minor points of differentiation in their comprehensiveness and quality” Breeding, M, E-resource knowledge bases and link resolvers: an assessment of the current products and emerging trends, Insights, 2012, 25(2), 173–182, doi: 10.1629/2048-7754.25.2.173
5. Interoperability [diagram labels: Publisher, Subs Agent, Consortia, Library staff, My Content, ERM, Spreadsheets, Link resolver, Filing cabinets]
6. APPROACHES: So what do we mean by maximising the knowledge base?
7. Approaches to resolving the problem: Open data; Collaborative Communities; Enriched information; Standards and Best Practice. Maximising the knowledge base: Publications – Packages – Licences – Subscriptions – Entitlements
8. OPEN DATA
9. Open data delivers practical benefits • Share and Collaborate with anyone and everyone – Improve accuracy – Publication information provided to all systems vendors – Reduces the burden on any one element in the supply chain – Efforts not tied in or limited to specific products or systems – Easier to seed knowledge bases
10. COLLABORATIVE COMMUNITIES
11. Doing more by working together • Global (GOKb): Publisher Data, Package information, Standard licences • National (KB+): National/Consortial information, National licences, Central Services • Institutional: Local holdings, Financial information, Documentation • 3rd Party Systems and Services
12. International Cooperation ? GOKb
13. ENRICHED INFORMATION
14. A Fraction of the Whole: Data (Licence Information and Interpretation) + Data (Holdings and Entitlements) → Answer (Post-Cancellation Access Entitlements) → Decision (Print Disposal)
15. Communities of knowledge: “A human must turn information into intelligence or knowledge” Grace Hopper, "The Wit and Wisdom of Grace Hopper" by Philip Schieber in OCLC Newsletter, No. 167 (March/April 1987)
16. STANDARDS AND BEST PRACTICE
17. “The nice thing about standards is that you have so many to choose from” Andrew Tanenbaum ….but do please choose one