Hi everyone and thanks for coming to the webinar today. My name is Kristen Wilson and I work at North Carolina State University Libraries. I’m currently splitting my time between my permanent role as associate head of Acquisitions & Discovery and the GOKb Editor role. My involvement with GOKb runs the gamut from figuring out how to collect data, to testing and planning the software releases, to helping build a community around the project. Today I’m going to talk a little bit about GOKb and how it aims to fill a new role in the knowledgebase ecosystem.
Here’s a quick overview of the topics I’ll cover.
Before I dive into the details, I want to explain briefly what GOKb is, to give you some context.
In many ways GOKb is a lot like the knowledgebases we’re all used to working with – it’s a collection of data about titles, packages, and platforms offered as part of the academic publishing supply chain. But GOKb will also be different from traditional knowledgebases in several ways.
Enhanced – it will contain data not found in kbs in the past, like changes over time and an expanded set of identifiers.
Community managed – partners will have the ability to edit the kb data directly, which gives us more ownership of the data. Users who are not partners will be able to suggest changes.
Open – the data will be freely available under a CC0 license, which means anyone can use it for any purpose, no strings attached.
What about link resolvers? A quick note here: GOKb is not a link resolver. But that doesn’t mean that someone couldn’t build one using GOKb, as we’ll see.
Who is working on GOKb?
Some useful background info to know is that GOKb was originally started as an offshoot of the Kuali OLE project in the US and the KB+ project in the UK. OLE is an open source ILS currently under development and being funded by the Mellon Foundation. KB+ is a project of JISC Collections, and it’s an ERM primarily marketed to libraries in the UK. Both of these projects realized they could be enhanced by integration with a knowledgebase. Since there was no open source product currently filling that role, they decided to create their own.
GOKb is currently being funded by the Mellon Foundation. The ultimate plan is for it to become a part of the Kuali OLE project when the Mellon funding ends.
While GOKb is primarily being worked on by these partners right now, we envision opportunities for others to get involved later this year. I’ll talk more about that at the end.
Here’s a time line that gives a good overview of what’s been done with GOKb so far.
Currently, in my role as the GOKb editor, I’m working with the partners to prepare for the release: creating documentation, drawing up a plan for collecting and loading data, and doing some initial data loading myself. We’re hoping to begin pulling in the OLE and KB+ partners this summer to start working on data collection on a larger scale.
Now I’d like to switch gears for just a couple slides and talk a bit about the problem space that GOKb will address.
Over the past decade or so, knowledge bases evolved primarily as a solution to the problem of access. They were really designed to power link resolvers and get patrons to articles. But as the ERMS came on the scene, we realized that they could be helpful for management purposes as well. So the knowledgebase became part of some of the early ERMs.
These systems taught us a lot about what was really needed from an ERMS – notably the fact that add-on, downstream systems are not ideal. This has resulted in the current movement toward systems that integrate the functions that have previously been divided between the ERMS and the ILS.
While we’ve moved away from the add-on nature of these systems, I believe that this time of experimentation was a positive and necessary experience for growth. By interacting with knowledgebases in their early states, we’ve learned more about what we need them to do.
What are the lessons learned from the knowledge base experiment over the past ten years?
First off, we’ve learned that Kbs are needed for management too. What I think this really means is that the current way of doing knowledge bases, essentially a snapshot of what’s available right now, is not enough. To truly manage e-resources, we need to know more about the history and context of each title: title changes, publisher changes, relationships to other titles. We also need a history of what was (the titles that used to be in a package, the packages that used to be offered) as well as some documentation of what’s to come.
Next, identifiers are crucial. They’re a critical component to managing connections, match-points, and relationships across our data.
These systems also need to support all electronic resources. Ebooks are no longer an afterthought. In fact, new workflows to support management of ebooks are beginning to really blur the roles between serials and monograph technical services staff.
And finally, a flexible system should accommodate data from multiple sources. Most of us currently rely on numerous knowledge bases to access data and information for a variety of purposes. When these data are pulled into a single system, identifiers will need to be assigned or utilized to create relationships. And to make the best use of these identifiers, it’s important for systems to be knowledgebase-centric. The key data components should be a part of the knowledgebase.
Why is it so important that library management systems be knowledgebase-centric?
The first reason is that we need to manage the right “thing.” When you start to look at the e-resources that libraries really purchase and manage, you realize that we’re not just dealing with titles. As you can see in the example I’ve used here, what I really need to manage is the electronic version of the title Tetrahedron Letters, that’s published by Elsevier, that’s part of the Freedom Collection, and that’s hosted on the platform ScienceDirect.
For this reason, we need to go beyond bibliographic description, which really only captures the title, format, and publisher (and often that’s wrong). In most traditional ILSs, the bib record is the center of the universe. Everything hangs off of that central point. I believe we need to see new systems in which the knowledgebase record takes on that central role.
And to that end, it’s important that the conceptual entity I’ve shown here have its own identifier, which will allow it to be more easily described and communicated across systems.
How can GOKb help address this problem space?
I’ll talk in a bit more detail about each of these points, but at a high level:
Enhanced data means that we can manage what’s important for ERM, not just access.
Open data means that GOKb is not wedded to any one system, so new integrations can be built.
Community-managed data means that we can contribute directly to the quality of the knowledgebase and see changes implemented faster.
So now I’ll get into some specific ways that GOKb will address the problem space that I’ve been talking about.
The first is a concept within the GOKb data model known as the TIPP. This stands for title instance package platform. The TIPP is our way of addressing the need to manage the right thing when it comes to e-resources. You can see here that the TIPP has links to the title instance, package, and platform, in addition to the organizations related to these entities. By creating a component to represent the TIPP, we’re able to better identify and describe the entity that libraries are really managing.
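To make the TIPP idea concrete, here is a minimal Python sketch of the components and how a TIPP ties them together, using the Tetrahedron Letters example from earlier. The class and field names are hypothetical simplifications for illustration, not GOKb’s actual data model.

```python
from dataclasses import dataclass

# Hypothetical, simplified classes: field names are illustrative only,
# not GOKb's actual schema.

@dataclass(frozen=True)
class Org:
    name: str

@dataclass(frozen=True)
class TitleInstance:
    title: str
    publisher: Org

@dataclass(frozen=True)
class Package:
    name: str
    provider: Org

@dataclass(frozen=True)
class Platform:
    name: str
    host: Org

@dataclass(frozen=True)
class TIPP:
    """One title instance, in one package, on one platform."""
    title: TitleInstance
    package: Package
    platform: Platform
    status: str = "current"

    def describe(self) -> str:
        return (f"{self.title.title} ({self.title.publisher.name}) "
                f"in {self.package.name} on {self.platform.name}")

elsevier = Org("Elsevier")
tipp = TIPP(
    title=TitleInstance("Tetrahedron Letters", elsevier),
    package=Package("Freedom Collection", elsevier),
    platform=Platform("ScienceDirect", elsevier),
)
print(tipp.describe())
# prints: Tetrahedron Letters (Elsevier) in Freedom Collection on ScienceDirect
```

The point of the design is that the TIPP, not the title, is the unit that gets its own record and identifier.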
When a system like OLE integrates with GOKb, it can use the GOKb TIPP identifier to create a record that corresponds to this TIPP, pull in metadata about that TIPP, and receive updates via API if anything changes.
GOKb will also manage changes over time.
Our goal in managing title changes is not to replicate cataloging rules, but to use a more lightweight approach that will better mirror the way that publishers and vendors view these changes. GOKb will use ISSN changes as the principal indicator of a change and will define all relationships as either earlier or later related titles.
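A minimal sketch of the ISSN-driven approach, assuming each title record is keyed by ISSN and carries lists of Earlier and Later Related Titles. The Title class, the record_issn_change function, and the sample titles and ISSNs are all made up for illustration; they are not GOKb code or real journal data.

```python
from dataclasses import dataclass, field

# Hypothetical title record keyed by ISSN (illustrative only).
@dataclass
class Title:
    name: str
    issn: str
    earlier: list[str] = field(default_factory=list)  # Earlier Related Titles
    later: list[str] = field(default_factory=list)    # Later Related Titles

def record_issn_change(titles: dict[str, Title], old_issn: str, new_issn: str) -> None:
    """Treat a new ISSN as the signal of a title change and link both records."""
    old, new = titles[old_issn], titles[new_issn]
    if new_issn not in old.later:
        old.later.append(new_issn)
    if old_issn not in new.earlier:
        new.earlier.append(old_issn)

# Made-up titles and ISSNs, for illustration only.
titles = {
    "1111-1111": Title("Journal of Examples", "1111-1111"),
    "2222-2222": Title("Examples Quarterly", "2222-2222"),
}
record_issn_change(titles, "1111-1111", "2222-2222")
print(titles["2222-2222"].earlier)  # ['1111-1111']
```

Only two relationship types exist here, which is the lightweight part: no attempt is made to model the finer distinctions of cataloging rules.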
GOKb will also manage the TIPPs within a package using a TIPP status. TIPPs can have a status of current, expected, or retired. When you look at a package, you’ll be able to see the retired TIPPs – in other words what used to be in the package. Additionally, if the information is available, GOKb will also be able to display the TIPPs that may be moving into a package in the future.
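As a rough illustration, assuming a package is just a list of (title, status) pairs, grouping by TIPP status yields the past, present, and expected contents of the package. TippStatus and package_view are hypothetical names, not part of GOKb.

```python
from enum import Enum

class TippStatus(Enum):
    CURRENT = "current"
    EXPECTED = "expected"
    RETIRED = "retired"

def package_view(tipps):
    """Group (title, status) pairs by status: what is in, what was in, what is coming."""
    view = {status: [] for status in TippStatus}
    for title, status in tipps:
        view[status].append(title)
    return view

# Hypothetical package contents.
package = [
    ("Title A", TippStatus.CURRENT),
    ("Title B", TippStatus.RETIRED),
    ("Title C", TippStatus.EXPECTED),
    ("Title D", TippStatus.CURRENT),
]
view = package_view(package)
print(view[TippStatus.RETIRED])  # ['Title B']
```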
Finally, GOKb will capture organization role changes, especially for transfer titles. If a title changes publishers, a new TIPP will be created and a relationship established with the old publisher TIPP. Unlike bibliographic description, a new publisher/package/platform means a new record – which more accurately reflects the importance of these types of changes in managing e-resources.
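The sketch below shows one way to model a publisher transfer as a new TIPP linked to its predecessor instead of an in-place edit. Marking the old TIPP as retired is an assumption of this sketch, and the Tipp class, the transfer function, and all sample values are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Tipp:
    tipp_id: str
    title: str
    publisher: str
    package: str
    platform: str
    status: str = "current"
    previous: Optional[str] = None  # id of the TIPP this one succeeded, if any

def transfer(old: Tipp, new_id: str, publisher: str, package: str, platform: str):
    """Model a transfer as a new TIPP linked to the old one, not an in-place edit.

    Retiring the old TIPP is an assumption of this sketch.
    """
    retired = replace(old, status="retired")
    successor = Tipp(new_id, old.title, publisher, package, platform,
                     previous=old.tipp_id)
    return retired, successor

# Hypothetical identifiers and names.
old = Tipp("tipp-1", "Journal X", "Publisher A", "Package A", "Platform A")
retired, successor = transfer(old, "tipp-2", "Publisher B", "Package B", "Platform B")
print(retired.status, successor.previous)  # retired tipp-1
```

Because the records are immutable, the original TIPP is never overwritten, so the history of the transfer stays visible.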
The final component of the data I want to focus on is an expanded set of identifiers. The TIPP and title instance components in GOKb can be assigned any number of identifiers. So we will capture traditional information like print and e ISSNs, but we can also capture other types of identifiers, like vendor internal IDs or union catalog numbers. (Assuming people are willing to share them.) The goal here is to allow users to create a crosswalk between identifiers, which can be very helpful in comparing different sets of data or moving data between systems.
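A crosswalk of this kind can be sketched as a two-way index: every identifier points to a shared record, and each record lists all of its identifiers. The Crosswalk class, namespace labels such as "vendor-x", and the sample values are hypothetical; GOKb’s own co-reference service may work quite differently.

```python
from collections import defaultdict

class Crosswalk:
    """Two-way index: identifier -> record, and record -> all of its identifiers."""

    def __init__(self):
        self._record_of = {}             # (namespace, value) -> record id
        self._ids_of = defaultdict(set)  # record id -> {(namespace, value), ...}

    def add(self, record_id, namespace, value):
        # This sketch assumes each identifier is attached to only one record.
        key = (namespace, value)
        self._record_of[key] = record_id
        self._ids_of[record_id].add(key)

    def lookup(self, namespace, value, target_namespace):
        """Translate one identifier into the matching identifiers of another type."""
        record_id = self._record_of.get((namespace, value))
        if record_id is None:
            return []
        return sorted(v for ns, v in self._ids_of[record_id] if ns == target_namespace)

# Hypothetical record id, namespaces, and values.
cw = Crosswalk()
cw.add("ti-1", "eissn", "1111-1111")
cw.add("ti-1", "pissn", "2222-2222")
cw.add("ti-1", "vendor-x", "VX-42")
print(cw.lookup("vendor-x", "VX-42", "eissn"))  # ['1111-1111']
```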
Open data offers a lot of practical benefits…
First of all, GOKb is purely data; it’s not tied to any one system. This means that it can be reused more easily than data that’s tied to a proprietary product. It does mean, however, that users cannot actually manage their local information in GOKb.
This brings us to the next point, which is that GOKb can support any project. Kuali OLE and KB+ are two projects that are already working on using GOKb, but any other project (open source or not) will be free to pull in GOKb’s data too. This could be something like an open source link resolver or an existing project, like CORAL, the open source ERMS.
GOKb’s data will be available via API, and documentation about that will be available with the public release.
And GOKb will also include a co-reference service, which will allow people to create crosswalks between different sets of identifiers, as I described earlier.
The community managed nature of GOKb also brings benefits to the project.
One of the chief critiques of GOKb, KB+, and other similar initiatives is this: given the limited resources available, how can these projects hope to offer a comprehensive listing of the offerings out there? It’s a fair point when you consider that some of the commercial Kbs employ 30-some people just to work on the data.
However, we feel that in some ways this misses the point:
Librarians are already maintaining and updating knowledge bases, working by themselves or in groups. This effort could be coordinated to benefit everyone by reducing, and hopefully eliminating, duplicated effort across institutions.
The groups listed above, in addition to some potential international partners, have all actively started to investigate the means by which they can share and collaborate on data management together, thus overcoming our own limited capabilities to build something greater than the sum of its parts.
And because we use open data and set no limits on what each of us can do with our data, we are free to go on establishing new partnerships with any group that wants to see this type of information openly available to all irrespective of platform/system/service or vendor.
We don’t all need to do everything; we can divide up the work according to different areas of expertise and priorities.
This kind of collaboration is an essential piece of the sustainability of these efforts.
GOKb currently utilizes two tools for community contributions to the knowledgebase.
The first, OpenRefine, is a data manipulation tool. The GOKb development team has written a custom extension for OpenRefine that allows users to apply rules and validate their data, so that it will be consistent and compliant with GOKb’s conventions. I’ll also mention that GOKb’s data standards are based on the KBART standard, which you’ll be hearing about later in the webinar. So it’s been very helpful for us to work with data from publishers who have already adopted KBART.
The second tool is the GOKb web application. This is the place where users can go to browse and search data, make corrections to individual titles, and eventually report errors.
And I’ve got a couple quick screenshots of what those tools look like.
Here’s a view of OpenRefine (formerly Google Refine). You can see here that we’ve imported data from a spreadsheet or text file, and in Refine it can be viewed and manipulated. Over to the left, the user receives a series of error and warning messages. These are designed to ensure proper formatting and to check that the most important pieces of data are present. Once all the errors are resolved, the user will see an option to import the data into GOKb.
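To give a feel for the kind of checks involved, here is a minimal sketch of row-level validation for KBART-style serials data. The column names (publication_title, print_identifier, online_identifier, title_url) are KBART fields, but the specific rules, the error/warning split, and the check_row and ready_to_import functions are assumptions for illustration, not the logic of the GOKb OpenRefine extension.

```python
import re

# Rough ISSN shape check (no check-digit validation): 4 digits, hyphen, 3 digits, digit or X.
ISSN_PATTERN = re.compile(r"^\d{4}-\d{3}[\dX]$")

def check_row(row):
    """Return (errors, warnings) for one KBART-style serials row; rules are illustrative."""
    errors, warnings = [], []
    if not row.get("publication_title", "").strip():
        errors.append("missing publication_title")
    ids = [row.get(k, "").strip() for k in ("print_identifier", "online_identifier")]
    if not any(ids):
        errors.append("no print_identifier or online_identifier")
    for value in ids:
        if value and not ISSN_PATTERN.match(value):
            warnings.append(f"identifier {value!r} does not look like an ISSN")
    if not row.get("title_url", "").strip():
        warnings.append("missing title_url")
    return errors, warnings

def ready_to_import(rows):
    """Only offer import once no row has any remaining errors (warnings allowed)."""
    return all(not check_row(row)[0] for row in rows)

# Hypothetical rows.
good = {"publication_title": "Journal X", "online_identifier": "1234-5678",
        "title_url": "https://example.org/journal-x"}
bad = {"publication_title": "", "print_identifier": "12345678"}
print(check_row(bad))
print(ready_to_import([good]), ready_to_import([good, bad]))  # True False
```

Separating errors from warnings mirrors the workflow described above: errors block the import, while warnings only flag data worth a second look.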
And here’s a sample of what GOKb itself looks like. This is a view of the title instance. You can see that there’s a variety of metadata about the TI, including a publication history. (Which I see is either not correct, or missing something – but it’s just an example).
Here’s a view of a TIPP. You can see the relationships to each of the components that make up the TIPP, in addition to the TIPP status. GOKb also supports some light workflow functionality. You can create a review request for any component and assign it to a user within the system. Review requests are also generated automatically after the import from OpenRefine if the system detects any conflicts or problems that need to be resolved.
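Automatic review requests can be sketched as a side effect of import: records that load cleanly are stored, and conflicts become open requests assigned to a user. The conflict rule used here (same ISSN, different title) and the ReviewRequest and import_titles names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ReviewRequest:
    component_id: str
    reason: str
    assignee: str
    is_open: bool = True

def import_titles(existing, incoming, assignee):
    """Load incoming {issn: title} records; conflicts become review requests.

    The conflict rule (same ISSN, different title) is an assumption of this sketch.
    """
    requests = []
    for issn, title in incoming.items():
        current = existing.get(issn)
        if current is None:
            existing[issn] = title  # no conflict: load the record
        elif current != title:
            # Leave the stored record alone and ask a person to decide.
            requests.append(ReviewRequest(
                issn, f"title mismatch: {current!r} vs {title!r}", assignee))
    return requests

# Hypothetical data and user name.
existing = {"1111-1111": "Journal X"}
incoming = {"1111-1111": "Journal X Letters", "2222-2222": "Journal Y"}
requests = import_titles(existing, incoming, assignee="editor")
print(len(requests), requests[0].reason)
```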
So to wrap up, I wanted to talk a little bit about some next steps for GOKb.
As I mentioned, getting up and running with data collection is our current priority. We’re planning to hold workshops in the coming months with the Kuali OLE and KB+ partners to get them started on contributing data to GOKb.
We’re currently looking at future development for the project, and the two big areas we’re focusing on are ebooks and linked data. As I mentioned earlier, ebooks are just as critical to manage as e-journals, and we need to review the GOKb data model to ensure that it’s robust enough to handle ebooks too. We also want to expose GOKb’s identifiers as linked data, especially the identifiers for concepts like the TIPP and the organizations related to it. By making these identifiers into linked data, we’re hoping to codify the conceptual entities that we manage when we manage e-resources.
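As a hedged illustration of what exposing TIPP identifiers as linked data might look like, the sketch below renders a TIPP as a JSON-LD node whose links are URIs. The base URI, the vocabulary, and the tipp_as_jsonld function are placeholders, since GOKb’s linked data design was still future work at the time of this talk.

```python
import json

BASE = "https://gokb.example.org/"  # hypothetical base URI, not a real GOKb endpoint

def tipp_as_jsonld(tipp_id, title_id, package_id, platform_id, status):
    """Render one TIPP as a JSON-LD node whose links are dereferenceable URIs."""
    return {
        "@context": {"@vocab": BASE + "vocab#"},  # hypothetical vocabulary
        "@id": f"{BASE}tipp/{tipp_id}",
        "@type": "TIPP",
        "title": {"@id": f"{BASE}title/{title_id}"},
        "package": {"@id": f"{BASE}package/{package_id}"},
        "platform": {"@id": f"{BASE}platform/{platform_id}"},
        "status": status,
    }

doc = tipp_as_jsonld("1", "10", "20", "30", "current")
print(json.dumps(doc, indent=2))
```

Giving the TIPP its own URI is what lets other systems point at "this title, in this package, on this platform" without copying the record.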
Finally, we’re focused on building community. This includes developing the existing community, but also beginning to put feelers out for partners outside the existing relationships. We’ve been working with a few international groups who are interested in participating in a similar way to KB+. But we’re also scouting out libraries who might like to participate.
Why should you get involved?
GOKb is a community, not a start up. So we’re not out to compete or dominate the marketplace. Rather, we are inclusive, and our goal is to make the best product possible for the benefit of all. So we welcome not only libraries and consortia to participate, but publishers and vendors too. Ensuring consistency of data across the supply chain benefits everyone. And because we’re providing open data and software, there are no restrictions on what you can do with the data. We just hope that those who do find a use for GOKb once it’s available will consider contributing something back. We’re interested in hearing from national knowledgebase projects, vendors, and individual libraries who might be interested in participating. We’re happy to talk with you about time lines and what participation would mean.
You can feel free to contact me if you are interested in learning more about GOKb or with any questions you think of later.
Thanks for listening! Are there any questions I can answer right now?
Building the Global Open Knowledgebase
May 14, 2014
WHAT IS GOKB?
An enhanced knowledgebase
A community-managed knowledgebase
An open data knowledgebase
(What about link resolvers?)
GOKB TIME LINE
(Timeline spanning 2011 to 2016)
GOKb and KB+ collaborate on data
GOKb Phase I: Proof of Concept
Funded by Mellon Foundation & Kuali
GOKb Phase II: Enhanced functionality
THE PROBLEM SPACE
• Kbs were primarily used for access
• They became a part of the add-on ERMS
• A positive and necessary experience
THE PROBLEM SPACE
• Kbs are needed for management too
• We need to manage all e-resources
• Systems should be integrated and Kb-centric
Managing the right “thing”
for what we need to
HOW CAN GOKB HELP?
Enhanced data means we can manage what’s important for ERM, not just access
Open data means that the knowledgebase is not wedded to any one system
Community-managed data means we contribute directly to the quality of the knowledgebase
CHANGES OVER TIME
ISSN change as principal indicator
Earlier Related Title and Later Related Title
Titles within a package on a platform (TIPPs)
Organization role changes, especially for transfer titles
ENHANCED DATA: IDENTIFIERS
GOKb is just data – it’s not tied to any one system
GOKb will support Kuali OLE and KB+, but it can support any other project too
External systems will access data via API
GOKb will include a co-referencing service to crosswalk between different sets of identifiers
DOING MORE TOGETHER
Global (GOKb)
• Publisher Data
• Package information
• Standard licences
National (KB+)
• National/Consortial information
• National licences
• Central Services
Local
• Local holdings
• Financial information
TOOLS FOR CONTRIBUTORS
Initial ingest: OpenRefine
Working with data: GOKb web application
• Browse and search data
• Submit error reports
WHY SHOULD YOU GET INVOLVED?
It’s a community, not a start up
Ensure consistency of data across supply chain
Open data and software
Extensible community model for data management
Structured participation will be possible for:
Associate Head of Acquisitions & Discovery
North Carolina State University Libraries