
Establishing the Connection: Creating a Linked Data Version of the BNB


Presentation for the Talis Linked Data in Libraries event, July 14 2011.

Describes some of the choices made and lessons learned in migrating from traditional bibliographic metadata to linked open data.

Published in: Education
  • A very useful presentation about making a major bibliographic dataset into Linked Data.

    Very open about both the benefits and the challenges.

    From this I get the challenges as
    * decreasing resources
    * address multiple constituents (traditional libraries, researchers wanting to data mine catalogs, linked data developers & users)
    * licensing (get attribution while allowing wide reuse)
    * collaborate with the user communities to determine appropriate cross-domain formats
    * existing staff (librarians rather than IT experts), data, and hardware
    * new ways of thinking
    * 'legacy data wasn't designed for this purpose so starting can be problematic'
    * multiple options
    * need careful thought for data modelling, sustainability
    * need to be responsive to technical criticism
    * steep learning curve
    * iterative -- must be willing to make & learn from mistakes

    from slides 2, 8, 20, 21, 22 & 23

Establishing the Connection: Creating a Linked Data Version of the BNB

  1. Establishing the Connection: Creating a Linked Data Version of the BNB
     Neil Wilson, Head of Metadata Services
  2. Changing Expectations: Public Sector Metadata
     * The Web has accelerated development of a collaboration culture & fostered expectations that information & content should be as freely available as the Internet itself
     * Many wider benefit arguments have been advanced for public bodies to make their data freely available
     * 2009 saw an increasing Government commitment to the principle of opening up public data for wider re-use
     * The “Putting the Frontline First: Smarter Government” report required “the majority of government-published information to be reusable, linked data by June 2011”
  3. Developing an Open Metadata Strategy: Choices and Challenges
     When developing an open metadata strategy we wanted to:
     * Try to break away from library specific formats e.g. MARC and use more cross domain XML based standards e.g. DC, RDF etc
     * Develop the new formats with communities using the metadata
     * Get some form of attribution while also adopting a licensing model appropriate to the widest re-use of the metadata
     * Adopt a multi track approach addressing the needs of:
       * Traditional libraries
       * Researchers wanting to ‘data mine’ catalogues
       * New linked data developers & users
     * … And deliver the above with decreasing resources
  4. First Steps Toward An Open Metadata Strategy: During 2010 We…
     * Developed a capability to supply metadata using RDF/XML standards used in the wider web community
     * Conducted trials with a range of new users including the UK Intellectual Property Office & UNESCO
     * Developed a free Z39.50 MARC record download service for libraries to assist with derived cataloguing etc
     * Hosted a linked data workshop with 40 representatives from key international organisations
  5. Current Status: Since August 2010 We Have:
     * Created a new email enquiry point for BL metadata issues: [email_address]
     * Signed up nearly 400 organisations worldwide to the free MARC21 Z39.50 service
     * Worked with JISC, Talis & other linked data implementers on technical challenges, standards & licensing issues
     * Begun to offer sets of RDF/XML metadata under a Creative Commons 0 (CC0) license
     * Supplied multi-million record sets to organisations including the Open Bibliography Project, the Open Library & Wikimedia Commons
  6. Library Metadata & The Promise of Linked Data
     * Traditional library metadata uses a self contained, proprietary document based model
     * The Semantic Web uses a more dynamic data based model to establish relationships between data elements via links
     * By migrating from traditional models libraries could begin to:
       * Integrate their resources in the web, increasing visibility & reaching new users
       * Offer users a richer resource discovery experience
       * Transition from costly specialist technologies & suppliers & widen their choice of options
     ‘Semantic’ Metadata Properties:
     * Open standards
     * Dynamic/reactive
     * Links to external resources
     * Micro portal: interacts with users & systems in response to queries
     * Offers options for further inquiry
     Traditional Library Metadata Properties:
     * Proprietary, library specific standards
     * Passive
     * Self contained
     * Linear text: ‘read’ by users as result of database query
     * Offers end result
  7. Our Linked Data Journey… What to Offer?
     * Wanted to offer data allowing useful experimentation & advancing discussions from theory to practice
     * Why BNB?
       * General database of published output and not an institutional catalogue of unique items
       * Mass produced works on all subjects, many with internationally recognised identifiers e.g. ISBN
       * Reasonably uniform format across 60 years of publication
       * Significant amount of data: 3 million records in various languages
  8. Our Linked Data Journey… What do we need to get there?
     Wanted to undertake the work as an extension of existing activities and as an opportunity to develop expertise using:
     * Existing staff: librarians rather than IT experts
     * As many pre-existing tools or technologies as possible
     * Standard PC hardware for conversion
     * Library MARC21 data as a starting point
     * Established linked data resources to connect to
     * A proven platform that would enable us to concentrate on the data issues
  9. Our Linked Data Journey… First stage: How To Migrate the Metadata?
     * From a flat catalogue card model to something more appropriate…
     * Preliminaries:
       * Staff training in linked data modelling concepts & increased familiarisation with RDF & XML concepts
       * Experience of working with JISC Open Bibliography Project & others
       * Feedback on initial MARC to XML conversion work
     * Incremental approach adopted:
       * Open Data License
       * RDF/XML Format
       * Add External Links
       * Re-model
       * Create Linked Data
  10. Our Linked Data Journey… Second stage: Selecting trusted resources to link to
     * To begin placing library data in a wider context & supplement or replace literal values in records
     * Looked for library sites:
       * Dewey Info
       * LCSH SKOS
       * VIAF
     * Plus more general sites:
       * GeoNames
       * Lexvo
       * RDF Book Mashup
  11. Our Linked Data Journey… Third Stage: Matching and Generating Links
     Three main approaches used:
     * Automatic generation of URIs from elements in records e.g. DDC
     * Matching of text in records with linked data dumps to identify URIs e.g. personal names to VIAF & subjects to LCSH
     * Two stage crosswalk/matching process for some coded information e.g. MARC country & language codes for GeoNames
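The three approaches on slide 11 can be sketched in a few lines of Python. This is only an illustration and not the in-house tooling the slides describe. The URI patterns, the three-entry crosswalk table, the `example.org` authority URI and every function name are assumptions made for the example.

```python
from typing import Dict, Optional

# Assumed URI patterns, for illustration only.
DEWEY_BASE = "http://dewey.info/class/"
LEXVO_BASE = "http://lexvo.org/id/iso639-3/"

# Stage one of the crosswalk: MARC language code -> ISO 639-3 code.
# A tiny hand-made subset; a real table would cover every MARC code.
MARC_TO_ISO639_3 = {"eng": "eng", "fre": "fra", "ger": "deu"}


def ddc_uri(ddc_number: str) -> str:
    """Approach 1: build a URI directly from a DDC number in the record."""
    cleaned = ddc_number.strip().replace("/", "")  # drop segmentation marks
    return f"{DEWEY_BASE}{cleaned}/"


def normalise_label(label: str) -> str:
    """Lower-case, collapse whitespace and strip trailing punctuation."""
    return " ".join(label.lower().split()).rstrip(" .,;:/")


def match_label(label: str, index: Dict[str, str]) -> Optional[str]:
    """Approach 2: look up record text in an index built from a data dump.

    `index` maps normalised labels to URIs and is assumed to have been
    built beforehand from a dump such as VIAF or LCSH.
    """
    return index.get(normalise_label(label))


def language_uri(marc_code: str) -> Optional[str]:
    """Approach 3: crosswalk a MARC code to an ISO code, then to a URI."""
    iso = MARC_TO_ISO639_3.get(marc_code.strip().lower())
    return f"{LEXVO_BASE}{iso}" if iso else None


if __name__ == "__main__":
    names = {"dickens, charles": "http://example.org/authority/1"}  # hypothetical
    print(ddc_uri("823/.914"))
    print(match_label("Dickens,  Charles,", names))
    print(language_uri("fre"))
```

Exact string matching after normalisation is the simplest possible strategy. Names that differ by dates or fuller forms would need a more tolerant comparison, and unmatched values are returned as `None` so they can stay as literals.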
  12. Our Linked Data Journey… MARC to RDF Conversion Workflow
     MARC to RDF conversion consists of multiple automated steps using a range of tools:
     * 1) Selection (in-house utilities / MARC Report)
       * Exclusions (CIP; multiparts; serials)
     * 2) Pre-processing (MARC Global)
       * Normalise data values
       * Remove trailing punctuation
       * Move/copy data values to improve machine matching/transformation
     * 3) Character set conversion (in-house utilities)
       * Decomposed UTF-8 converted to precomposed for conformance with W3C recommendations
     * 4) URI creation (in-house utilities)
       * Create BL URIs in MARC fields
       * Harvest URIs from external sources
     * 5) Data transformation (MARC Report & MARC 21/RDF XSLT)
       * Convert to RDF & insert URI prefixes
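Steps 2 and 3 of the workflow on slide 12 are easy to illustrate with the Python standard library. The sketch below is not the MARC Global or in-house tooling named on the slide. The set of punctuation characters and the function names are assumptions. Converting decomposed to precomposed characters is done here with Unicode normalisation form NFC.

```python
import re
import unicodedata

# Assumed set of ISBD-style trailing marks to drop; adjust to the data.
TRAILING_PUNCTUATION = re.compile(r"[\s/:;,.=]+$")


def strip_trailing_punctuation(value: str) -> str:
    """Remove trailing punctuation and whitespace from a field value."""
    return TRAILING_PUNCTUATION.sub("", value)


def to_precomposed(value: str) -> str:
    """Convert decomposed characters (base + combining mark) to NFC."""
    return unicodedata.normalize("NFC", value)


def preprocess(value: str) -> str:
    """Apply both clean-up steps to a single field value."""
    return to_precomposed(strip_trailing_punctuation(value.strip()))


if __name__ == "__main__":
    raw = "Cafe\u0301 society /"  # 'e' followed by a combining acute accent
    cleaned = preprocess(raw)
    print(cleaned, len(raw), len(cleaned))
```

Stripping a trailing full stop is a blunt rule: it would also remove the final stop of an abbreviation or initial, so a production rule set would need exceptions per field.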
  13. Our Linked Data Journey… MARC to RDF Conversion Workflow
  14. Our Linked Data Journey… Which took us from here...
  15. Our Linked Data Journey… Via here...
  16. Our Linked Data Journey… To here...
  17. Preview Options
     * Includes: BNB Books 2005-11
     * 485,000 records
     * 18,000,000 RDF triples
  18. Sample ‘Labelled Concise Bound Description’
  19. Our Linked Data Journey… Journey’s End…Point?
     * Preview details at: http://
     * Roadmap for next steps includes:
       * Staged release over coming months for: books, serials, multi-parts etc
       * Aiming to update on a monthly basis once complete
       * Documentation & further refinement of data model
       * Looking at RDF triple dump option
       * What else might be offered?
  20. Lessons Learned on the Journey: General
     * It is a new way of thinking
     * Legacy data wasn’t designed for this purpose so starting can be problematic
     * There are many opinions… but few real certainties. Everyone is learning & multiple solutions exist, so you may be the best judge
     * Don’t reinvent the wheel... there are often tools or experience you can use. Start simple & develop in line with evolving staff expertise
     * Give careful thought to data modelling & sustainability issues e.g.
       * Where possible use cross domain standards e.g. ISO codes in data
       * Select relevant & stable targets when providing links if you are doing so
  21. Lessons Learned on the Journey: Data Issues
     * Reality check by offering samples for feedback to wider groups
     * Be prepared for some technical criticism in addition to positive feedback & try to continually improve in response
     * Conversion inevitably identifies hidden data issues… & creates new ones!
     * … But it’s often better to release an imperfect something than a perfect nothing!
  22. Lessons Learned Along The Way: Staff and Resource Issues
     It can be a steep learning curve so:
     * Look for training opportunities to develop staff skills to support new open metadata standards
     * Cultivate a culture of enquiry & innovation among staff to widen perspectives on new possibilities
     * Look into collaborative pilot projects with peer organisations to share resources & expertise
     * See what tools are already out there that can save you development time or assist in checking data
  23. Final Thoughts… For Others Contemplating a Similar Journey
     * It’s never going to be perfect first time
     * We expect to make mistakes
     * We aim to learn from them
     * We hope others will learn something too
     * … and that everyone benefits from the experience
     * So if anyone is thinking of undertaking a similar journey…
     * Just do it!
  24. Any Questions…? Images from