GOKb: What it builds on, what it can build (code4lib 2012)
GOKb, the Global Open Knowledge base
What it builds on, and what it can build
John Mark Ockerbloom
University of Pennsylvania
Code4Lib Mid-Atlantic, October 17, 2012
• Managing electronic resources now involves lots of
redundant information management
– Across institutions
– (Penn, Lehigh, Villanova…)
– Across systems within an institution
– (e-resource discovery, catalog, link resolver, ERM, subscriptions)
• Info about electronic resources has both global and local components
– What’s offered generally; what your inst. takes & manages
– Global components can be managed globally
• We can build systems, communities to manage global info
– Drawing on open source, linked open data principles
A community coming together
• Kuali OLE institutions
• JISC: KB+ project
• Mellon Foundation
• Previous standards work
• DLF: ERMI
• Requirements & workflows for acquiring, managing e-resources
• UKSG/NISO: KBART
• Data standards for simple information about offered e-resources
• W3C: Linked data/semantic web
• Flexible ways to represent and link together structured
information in open, standardized, extensible ways
What will GOKb produce?
• Flexible data model supporting ERM tasks
• covering all types of electronic resources
• Initial emphasis: journals
• Active repository of electronic resource data
• With no restrictions on use (CC0)
• Open mechanisms for accessing the data
• APIs usable both by OLE and other library systems
How GOKb will roll out
• Mellon project: June 2012-June 2014
• Will produce first version of deliverables
• Immediate follow-on support by OLE
• Variety of data, APIs may increase
• Developing long-term plan for governance,
• You can help
Each entity has:
-- Globally unique identifiers
-- Possibly associated documents
Global data Local data
Bill of materials model
a bibo:Journal , gokb:TitleInstance;
rdfs:label "Academic Pediatrics" ;
bibo:issn "1876-2859" ;
dcterms:publisher <http://gokb.org/org/Elsevier> ;
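The split between global and local data can be sketched in a few lines of Python. This is only an illustration: the class names, field names, identifier format, and the two libraries are assumptions made up for the example, not GOKb's actual schema. Only the journal label and publisher come from the record above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalTitle:
    """Globally managed facts: the same for every library."""
    gokb_id: str      # hypothetical identifier format
    label: str
    publisher: str


@dataclass
class LocalHolding:
    """Locally managed facts: what one institution takes and manages."""
    institution: str
    gokb_id: str      # a link to the global record, not a copy of it
    subscribed: bool = True


def describe(holding, titles):
    """Combine a local holding with the global record it points to."""
    title = titles[holding.gokb_id]
    status = "subscribes to" if holding.subscribed else "does not subscribe to"
    return f"{holding.institution} {status} {title.label} ({title.publisher})"


# One shared global record (identifier is made up for this sketch).
titles = {
    "title/example-1": GlobalTitle(
        "title/example-1", "Academic Pediatrics", "Elsevier"
    )
}

# Two hypothetical libraries referencing the same global record.
library_a = LocalHolding("Library A", "title/example-1")
library_b = LocalHolding("Library B", "title/example-1", subscribed=False)

print(describe(library_a, titles))
print(describe(library_b, titles))
```

Because each holding stores only the identifier, a correction to the global record reaches every library that links to it.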
The GOKb pipeline
• Gather data
– FTP, feeds, manual entry…
• Normalize format and syntax
– Standard conversion routines
• Refine the content
– Rules engine (now evaluating possibilities)
– Via query APIs, websites, bulk downloads
• An editorial as well as programmatic process
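A minimal sketch of the normalize and refine steps, assuming incoming rows are simple dictionaries with "title" and "issn" keys. The function names and the review queue are hypothetical. The single rule here, the standard ISSN check digit test, stands in for whatever rules engine is eventually chosen.

```python
import re


def normalize_issn(raw):
    """Normalize syntax: drop noise characters, uppercase, insert the hyphen."""
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if len(compact) != 8:
        return None
    return f"{compact[:4]}-{compact[4:]}"


def is_valid_issn(issn):
    """Apply the ISSN check digit rule (weights 8 down to 2, modulo 11)."""
    digits = issn.replace("-", "")
    if not re.fullmatch(r"\d{7}[\dX]", digits):
        return False
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def run_pipeline(rows):
    """Split rows into accepted records and ones needing editorial review."""
    accepted, needs_review = [], []
    for row in rows:
        issn = normalize_issn(row.get("issn", ""))
        record = {"title": row.get("title", "").strip(), "issn": issn}
        if record["title"] and issn and is_valid_issn(issn):
            accepted.append(record)
        else:
            needs_review.append(record)
    return accepted, needs_review


# Sample input: the second row is made up and has a bad check digit.
rows = [
    {"title": " Academic Pediatrics ", "issn": "18762859"},
    {"title": "Made-up Journal", "issn": "1234-5678"},
]
accepted, needs_review = run_pipeline(rows)
print(accepted)
print(needs_review)
```

Rows that fail a rule go to a review queue instead of being dropped, which reflects the editorial side of the process.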
Where does the data come from?
• From publishers and platform hosts
– Bulk data often dirty, needing correction
– Not a one-time process, need updating
• From participating libraries
– Specialized (and open access?) resources
– Corrections and additions (data and rules)
– Imports from JISC’s KB+ database
• From external partners
– via links involving GOKb identifiers
Linked open data
(Image from cafepress.com, which sells the mug at
What can we do with this data?
• Consume it!
• Improve it!
• Extend it?
– How to get to resources? (link resolver data)
– Which resources are open access?
– Which are being preserved?
– What rights apply to resources?
– What are the contents?
– Where can I get free versions of the content?
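Extending the data in this way mostly means joining other datasets on a shared identifier. The sketch below assumes records are plain dictionaries keyed by a GOKb identifier. The identifiers, labels, and open access flags are all invented for illustration.

```python
def link_by_identifier(gokb_records, external_facts):
    """Attach external facts to each record that shares an identifier."""
    linked = []
    for record in gokb_records:
        extra = external_facts.get(record["id"], {})
        # Records with no match keep only their original fields.
        linked.append({**record, **extra})
    return linked


# Hypothetical GOKb records and a hypothetical external dataset.
gokb_records = [
    {"id": "title/example-1", "label": "Example Journal A"},
    {"id": "title/example-2", "label": "Example Journal B"},
]
open_access = {"title/example-1": {"open_access": True}}

linked = link_by_identifier(gokb_records, open_access)
for record in linked:
    print(record)
```

The same join would work for preservation status, rights, or link resolver data, as long as the external source uses the same identifiers.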
Some things to think about
• How can you build or configure your local
systems to take advantage of GOKb data?
• How can you help improve the quantity and
quality of data in GOKb?
• What useful new applications can you make
with GOKb data?
• What useful additional data can you link with GOKb data?
• GOKB website: http://gokb.org/
– (right now a blog; will have more info)
• Kuali OLE website: http://www.kuali.org/ole
– (And stick around for Michelle Suranofsky’s talk)
• We’d love to hear about your needs & ideas
– My email: email@example.com