Developments in catalogues and data sharing

A talk given at the Bodleian Libraries 'From cataloguing to metadata' event in November 2011.
Personal opinions on changing trends in library metadata creation and consumption. Also considers the challenges and rewards associated with providing and licensing data for re-use by machines and the people that program them.

Published in: Education, Technology
  • I’ll look at how our catalogues, and thus the data within them, have changed to meet the changing expectations of the user, and at cataloguing to provide data that better serves the needs of machines (and the people that program them). It is also something of a reflection on the changes that have taken place in the last 10 years …
  • I’m trying to frame the next 40 minutes or so as a narrative
  • When attempting to guess where we are going, it helps to take a step back. 1) To simplify things (a little): librarians and cataloguers used to have full control of their data and the way it was used (consumed). - We created it (or paid others to do so for us) - Our readers consumed it, in our libraries, served via ledgers, card indexes and OPACs - We had / have policies and standards (AACR2, Marc21), procedures (LOC authority control), organisations (RLUK, OCLC) and technology (Z39.50, OPACs)
  • - We created and largely owned a closed ecosystem for our readers, our data and ourselves, and it worked. Through this control of production, consumption and material (total ownership) we were successful. Closed ecosystems can be successful today, just look at Apple. It is not a dead concept in itself, but I believe that it could not last for libraries …
  • And along with estate agents, travel agents, government, landlords and bookshops, we needed to think again … We already had our own networks, but now there was a global one with a rapid pace of change. Whilst Apple could grow a new ecosystem, ours was under threat.
  • 2) - We slowly lost our place as the single prime authority for data - Commercial, social and academic discovery mechanisms: other sources of information for our users to turn to, and eventually sources of content too. We also had to cope with a growth in digital content - The publishing shift to digital (it took a while: journals came first, and they were only a small part of our business, as analytical cataloguing was not standard practice) – this is resulting in massive changes in metadata and discovery usage …
  • Library still in its bubble. Alternative discovery mechanisms and academic data &amp; content sources suddenly existed alongside our sealed environment – all very heavily branded, very slick, constantly evolving. Some we pay for, some we contribute to, some we view as inferior competition – but they exist, all legitimate means to discover bibliographic material of interest to the researcher or the scholar, and they act as a direct alternative to our traditional model. All have their own data environments, standards, procedures and protocols – not necessarily ours. In light of this I argue that we could no longer maintain the closed ecosystem – to argue otherwise has become a fallacy, even in the mighty libraries of Oxford and Cambridge with their world class special collections.
  • With the new environment come new users, termed Generation Y. Generation Y, it is argued, have grown up and worked outside of our bubble all along, and are used to a very different mode of consumption for data and resources. They were born between 1984 and 1990, but I would argue the concept can be stretched further, way back, to probably anyone who has studied science since the mid to late 1990s … Cambridge Arcadia report 2009: Preference for search engine over catalogue. Online over in-building. Trust tutors and peers over librarian. Still respect the library ‘brand’. All of this has led to a direct and open questioning of the purpose of the academic library – never mind the public one.
  • I’ll now very quickly go over how our services and interfaces have responded to this need for change, with points one and two, And what else is to come in points three and four …
  • Lightweight simple things Started small … Libraries gateway Search boxes in website – make it a focus Catalogue pages used to be a single link in a wall of text New approach to online services - Don’t hide things, don’t post rules, ‘ 15 tips from those in the know – it used to be that we guarded our knowledge’. When your library looks like a prison, this is pretty vital
  • Keyword based discovery services. Rich faceting. Greater linking. New ways to exploit old data. Is the OPAC dead? It is in your case, and I’m quite jealous … All possible due to the richness of our data – our authority controlled catalogue records generally work quite well in faceted environments – we gain a competitive edge over folk whose data is not in such good shape. Catalogues are easier to pick up, easier to teach and provide a more cohesive experience, even if they don’t always work in the way we as librarians would like. Our data is still in use; it is valuable and relevant, partly as a result of these changes in interface. And I know this because when you launched Solo a couple of years ago, some of your undergrads became our postgrads and told us what they thought of our interfaces.
  • But our data has evolved along with the catalogue Enrichment – made possible by use of identifiers – something we do very well External data can be indexed alongside yours – redefines what a record means – breaks down the concept of it as composite entity Catalogue records co-exist alongside data from other sources –part of a framework of data
  • And it’s not just supplier data; nowadays we let readers into our catalogues: Tags. Public lists. Reader reviews. Dramatic growth in access points. Input from true subject specialists (i.e. at least those who have read the book). Lack of structure (well, our structure is still there). No quality control. Compromise of sanctity? I would argue that in the academic sphere this is fairly self-weeding; few people are likely to ‘deface’ a record on a specific niche subject area. These features are also popular: it’s expected by users, it’s useful for them, and we are doing it.
  • Web scale -&gt; Resource discovery concept taken further, to large centralised indexes. In products such as Summon, library catalogue records are taken, merged, modified and stored as part of a central index of over 800 million items. WorldCat Local works on these lines, as does Ebsco Discovery Service. You use Primo Central; it’s doing the same thing but not holding onto local content … yet. These are the catalogue interfaces students will be using to search your records. A recent development here lies in full text enrichment from Hathi Trust – records for print material are being boosted with full text where possible. And I’m sure that now they have the technology to do this, Serials Solutions are looking elsewhere for other sources of full text. Early days, but a real paradigm shifter. Full text is not necessarily a substitute for metadata, but if handled correctly in the keyword based world, it could blow things wide open … Or it could fail dreadfully – but this development is being explored by major vendors. True evolution of the catalogue to the ‘net’.
  • Catalogue data now goes through several processes The record you create is not always the record readers will see The way it is searched and accessed Yet we still build it with the same rules and container formats as we did 20 years ago
  • This may not always be to our tastes and practices as librarians, but in terms of reader experience it has provided a dramatic improvement in service quality, as readers have a fighting chance of understanding our interfaces. If all this works, we are perhaps in better shape, perhaps on the same page as some of the competition. At least for Generation Y, and for a fair few others, I believe we are now doing a better job …
  • What comes next is rather unknown … with no map to guide us In management speak, we may be able to meet generation Y’s use cases, but we also need to be ready for the use cases we’ve not yet thought of …
  • (Talk over black screen …) We have to stop and think about what has changed. We’ve lost the bubble around the library. Our old, successful closed data ecosystem has been rendered obsolete, or at least severely degraded, by competition on the web. As data and service providers, we’ve had to change and innovate to get back to somewhere on the same page. Much of this change has come from our technology partners, our suppliers and our vendors, as well as a growing community of open source software developers in and around libraries. It’s been a pretty painful few years and we’ve all had to play a lot of catch-up. To a certain extent, we are still on the back foot. To cope better in the future, we need to get better at handling change; we need to be faster and more ready to evolve. Stepping totally away from the closed model, we cannot exist in isolation. As well as vendors and open source providers, we need help from outside the library community to prepare and innovate for the future. One way to encourage this help is by finding ways to make our catalogue data easier to share and for others to reuse. Even in its new form of a discovery service, the library catalogue is still a silo and still exists as something of a barrier to sharing. And despite the changes in interfaces we’ve gone through, the way we create, own and share data has until now largely carried on as normal … The practices here are as much of a barrier to sharing as the technology around the data. Let’s look at how most research libraries currently share data …
  • One way to prepare is to open up. We need to share and open up our raw data and make it easier for others to re-use. I would argue each of these groups has as much right to our raw data as we do, and each would have different use cases for it. By and large, in the field of online services, I’m talking about software developers, but the same applies in many areas. Allow others to innovate on our data on our behalf, think of those use cases and explore them.
  • And there is demand. This slide is based on the ideas of a certain Cambridge academic. Bibliographic data linked to many aspects of teaching and research Citation lists – measure output Shared bibliography – core of research group work Reading lists – backbone of undergraduate teaching Quality of data – in terms of consistency and accuracy and form we are much easier to handle than museums and archives All exists already, but not in an open, linked capacity that can be tied quickly and easily into other institutional and external services
  • This is my colleague Katie’s write-up of a talk led by Owen Stephens; it really sums it all up …
  • Success of distributed access outside of cultural heritage – Amazon can put a lot of their success down to distributed marketing. When you discover an Amazon product, it’s not necessarily on the Amazon site, but they’ve shared their data in such a way that you can get to their catalogue. And a single means / point of discovery is a myth – our one stop shops are actually our first stop shops, or second stop shops. It is unrealistic to expect everyone who may want to access your collections to come to you, to your interfaces and domain. And in case we are worried about selling out our ‘IP’ – most of this data was funded by the taxpayer, and that includes businesses and web startups.
  • We are not alone in thinking like this. There is a national and international trend for the public release of data Get ourselves into this domain
  • This is recognised nationally by the JISC, who earlier this year launched the Discovery initiative. The Oxford Text Archive contributed a project, we did one with catalogue data, and they are funding some very exciting work …
  • Search engines – we’ve opened our Aquabrowser catalogue up to web crawlers. The argument in the past has always been ‘why should we bother?’; I would argue ‘why should we withhold?’ We&apos;ve actually been here for a while (WorldCat) - no-one is using it (or are they?). Used semantic tags in HTML to indicate some structure: author, title, format, availability. Still a commercial application designed for advertising - not a great fit. Schema data - getting better. What is the use case? – Google is designed to sell … For what it’s worth, Google have only taken 10%. So perhaps the primary reason for doing this was to shut people up! I think there are better ways to share data than simply letting the spiders in …
  • All semantic web talks include this – it grows every year, will be interesting to see if growth is sustained.
  • More access points to records Better mechanisms for record enrichment Revised cataloguing workflows – imagine LOC subject and name authority entries that simply update themselves Access to developers
  • Hard to understand and decode Supporting ‘stack’ not up to scratch No seriously compelling use case (yet) Other ways to provide linked data
  • Whatever the format, there are common challenges in getting our data to them … Now some of the challenges involved in getting there
  • For this to work, we need to lift the legal barriers to sharing data: make it public, make it open, and open as defined by the wider Internet community. We have tended to have a lot of restrictions on record re-sharing in the past, but there has been a lot of movement in this area in the past two years. Most Cambridge data could be released under a permissive license. Europeana Digital Library approve CC0 licensing of data. OCLC looking at attribution only licensing. British Library BNB – Creative Commons ‘Zero’. Move away from ‘non-commercial’ wording.
  • No one involved in the library open data movement wants OCLC to go under Record vendors are valued partners when you have truck loads of legal deposit turning up every week Focus on sharing ‘non-marc21’ formats Of greater use to the non-Librarian I think we are seeing a shift in the way we operate
  • Based on a 40 year old format. Based on a need to print a human readable card. Syntax, vocabulary, fields and content all intertwined. According to OCLC Research: only 10% of all Marc tags in Worldcat appear in 100% of all Worldcat records; 65% of tags appear in less than 1% of records.
  • Marc – data rich, structurally poor. If you as cataloguers agonise about where to put punctuation, we developers and hackers agonise about taking it out. It’s waste. We are not printing out card indexes anymore, so why use formats and standards designed for that purpose? Provide truly granular data and let the interface render it depending on the rules … Very difficult to map to RDF and emerging standards – that $d in particular, especially when you use bs, ds, and cs … Mixed fields (text and numbers) (020$a). Duplication: author name (100 and 245$c)? Format? One hundred notes fields (or close enough)?
  • Marc21 is binary encoded – hard to crack, needing specialised code libraries that few software developers are willing to learn or support. Most developers know how to deal with XML / JSON and are happy with it. Also, all those bs, ds, and cs? It’s really hard to understand and remember. It’s a dark art. To be easily understood, data needs to be human readable - numbers for field names need a whole website dedicated to explaining them. To learn to code with library data, you technically need to learn to catalogue – an artificial barrier. Bad encoding is allowed by Marc - it can crash whole systems when imported – XML would stop this.
  • LOC Bibliographic Framework Transition declares a shift away from Marc. Delay in introduction of RDA until we get a ‘better container’ No system vendor is going forward with Marc21 as the internal storage mechanism in their next generation of systems. They may allow you to write records in it. Will take 10+ years What is to come next?
  • So despite the change, it’s my worry that those in charge of Marc21 and RDA developments are not thinking widely enough about the new open ecosystem which our data must inhabit.
  • If we don’t try and shift … It becomes easier to go to Amazon – who have awesome APIs. Or even Google Books (theirs are rubbish). Our status as authoritative data providers will be further eroded. No-one will want to play with us if we do not share.
  • Developments in catalogues and data sharing

    1. 1. <ul><li>How our catalogues are evolving </li></ul><ul><li>Opening and sharing the data within them </li></ul><ul><li>Ed Chamberlain </li></ul><ul><li>Systems Development Librarian – Cambridge University Library </li></ul>
    2. 2. <ul><li>Systems Development Librarian at the other place </li></ul><ul><li>Data ‘munger’ </li></ul><ul><li>Data consumer? </li></ul>
    3. 3. <ul><li>Control over data creation </li></ul><ul><li>Control over data consumption </li></ul><ul><li>Control over data environment </li></ul><ul><li>Control over data technology </li></ul>
    4. 6. <ul><li>No longer the single authority for content and data </li></ul><ul><li>Commercial, social and academic discovery mechanisms </li></ul><ul><li>Explosion of digital content </li></ul><ul><li>Illusion of ‘all on the web’ </li></ul>
    5. 8. <ul><li>Studies into Google Generation / ‘Generation Y’ 1 </li></ul><ul><li>Cambridge Arcadia IRIS report 2009 2 </li></ul><ul><ul><li>Preference for search engine over catalogue </li></ul></ul><ul><ul><li>Online over in-building </li></ul></ul><ul><ul><li>Trust tutors and peers over Librarian </li></ul></ul><ul><ul><li>Still respect the library ‘brand’ </li></ul></ul><ul><ul><li>1) ”The Google generation: the information behaviour of the researcher of the future” </li></ul></ul><ul><ul><li>Aslib Proceedings, V60, issue 4 10.1108/00012530810887953 </li></ul></ul><ul><ul><li>2) Arcadia IRIS Project report - http://arcadiaproject.lib.cam.ac.uk/docs/Report_IRIS_final.pdf </li></ul></ul>
    6. 9. <ul><li>So far … </li></ul><ul><li>Evolution of catalogues </li></ul><ul><li>Changes in exposure of data </li></ul><ul><li>To come? </li></ul><ul><li>Greater sharing of data </li></ul><ul><li>Library data used in non-library environments </li></ul>
    7. 11. <ul><li>Keyword based discovery services </li></ul><ul><li>New ways to exploit old data </li></ul><ul><ul><li>Relevancy ranking </li></ul></ul><ul><ul><li>Rich faceting </li></ul></ul><ul><ul><li>Greater linking </li></ul></ul><ul><ul><li>Search is the new browse </li></ul></ul><ul><ul><li>Repositories and archives </li></ul></ul><ul><li>Is the OPAC dead? </li></ul>
    8. 13. <ul><li>Citations </li></ul><ul><li>Abstracts </li></ul><ul><li>Table of Contents </li></ul>
    9. 15. <ul><li>Tags </li></ul><ul><li>Public lists </li></ul><ul><li>Reader reviews </li></ul><ul><li>Dramatic growth in access points </li></ul><ul><li>Input from true subject specialists </li></ul><ul><li>Lack of structure </li></ul><ul><li>No quality control </li></ul><ul><li>Compromise of sanctity? </li></ul>
    10. 16. <ul><li>Web scale - resource discovery concept taken further </li></ul><ul><ul><li>Primo Central </li></ul></ul><ul><ul><li>Summon </li></ul></ul><ul><ul><li>Ebsco Discovery </li></ul></ul><ul><ul><li>Worldcat local </li></ul></ul><ul><li>Hathi trust data can be used for full text searching of print collections </li></ul>
    11. 17. <ul><li>Catalogue data is now: </li></ul><ul><ul><li>Consumed as keywords (not left anchored) </li></ul></ul><ul><ul><li>Faceted (not browsed) </li></ul></ul><ul><ul><li>Supplemented </li></ul></ul><ul><ul><li>Transformed </li></ul></ul><ul><ul><li>Merged </li></ul></ul><ul><ul><li>Amalgamated </li></ul></ul>
    12. 21. Our local catalogues National / international aggregations Joe Public Teenage software developer / hacker Booksellers Web start-ups Search engines Wikipedia Other libraries Research group website
    13. 22. <ul><li>Bibliographic data linked to many aspects of successful teaching and research </li></ul><ul><ul><li>Citation lists – measure output </li></ul></ul><ul><ul><li>Shared bibliography – core of research group work </li></ul></ul><ul><ul><li>Reading lists – backbone of undergraduate teaching </li></ul></ul><ul><ul><li>High quality data needed for re-use </li></ul></ul><ul><li>Not all possible whilst data resides in the library ‘silo’ </li></ul>
    14. 23. <ul><li>“ Library catalogues have imposed on them librarian or supplier-made decisions about what can/can’t be searched and in what way.  Some of these decisions are limited by current cataloguing rules, but not all; often the data is recorded, but not in a usable way, or is there but isn’t tapped by the interface.  For example, in most catalogues you can limit by publication type to newspapers, but you can’t limit by frequency of the issues.” </li></ul><ul><li>“ Releasing data means that people can start to use it in the way they want to.” </li></ul>
    15. 24. <ul><li>Success of distributed access outside of cultural heritage </li></ul><ul><li>Single point of discovery? </li></ul><ul><li>Taxpayer generated – give it back! </li></ul>Why not share?
    16. 25. <ul><li>Past few years have seen a massive release of public data in government and cultural heritage sectors </li></ul><ul><ul><li>Open Government Data - http://data.gov.uk </li></ul></ul><ul><ul><li>Open Knowledge Foundation - http://okfn.org </li></ul></ul><ul><li>EU Commission mandate to open data </li></ul><ul><li>Shared in ways for easy reuse and linking </li></ul>
    17. 26. <ul><li>RLUK and JISC initiative </li></ul><ul><li>Galleries, libraries, archives, museums </li></ul><ul><li>The Discovery principles propose that: </li></ul><ul><ul><li>' Open metadata creates the opportunity for enhancing impact through the release of descriptive data about library, archival and museum resources. It allows such data to be made freely available and innovatively reused to serve researchers, teachers, students, service providers and the wider community in the UK and internationally.' </li></ul></ul>http://discovery.ac.uk
    18. 28. <ul><li>Why not? </li></ul><ul><li>WorldCat has done this for years </li></ul><ul><li>Schema.org microdata– some semantic structure </li></ul><ul><li>Use case for catalogue data in an advertising environment? </li></ul><ul><li>Google taken 10% (so far) </li></ul>
    19. 29. <h1 itemprop=&quot;name&quot;>The Cambridge companion to Spenser edited by Andrew Hadfield. [electronic resource] /</h1> <span style=&quot;display: none;&quot; itemprop=&quot;publisher&quot;>Cambridge University Press,</span> <span style=&quot;display: none;&quot; itemprop=&quot;datePublished&quot;>2001.</span>
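The markup on this slide hides publisher and date in `itemprop`-tagged spans so that a crawler can pick them up. As a rough illustration of the consuming side, here is a minimal sketch using only Python's standard `html.parser`. The class and function names are hypothetical, the sample markup is a trimmed copy of the slide's, and a real microdata parser would also handle `itemscope`, `itemtype` and nested elements, which this does not.

```python
from html.parser import HTMLParser


class ItempropExtractor(HTMLParser):
    """Collect the text of elements that carry an itemprop attribute.

    Assumes flat markup (no elements nested inside an itemprop element).
    """

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # itemprop name of the element being read, if any

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        if name:
            self._current = name
            self.properties[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the current property in this flat sketch.
        self._current = None


def extract_itemprops(markup):
    """Return a dict of itemprop name -> stripped text content."""
    parser = ItempropExtractor()
    parser.feed(markup)
    parser.close()
    return {name: text.strip() for name, text in parser.properties.items()}


SAMPLE = (
    '<h1 itemprop="name">The Cambridge companion to Spenser</h1>'
    '<span style="display: none;" itemprop="publisher">Cambridge University Press,</span>'
    '<span style="display: none;" itemprop="datePublished">2001.</span>'
)

props = extract_itemprops(SAMPLE)
```

Note that the extracted values still carry the cataloguing punctuation ("Cambridge University Press," and "2001."), which is exactly the clean-up burden the later slides complain about.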
    20. 30. <ul><li>Application Programme Interface (API) </li></ul><ul><li>Layered over LMS </li></ul><ul><li>Expose catalogue data feeds for developers </li></ul><ul><li>Anyone can use them </li></ul><ul><li>Simple request, simple response </li></ul><ul><li>http://www.lib.cam.ac.uk/api </li></ul>
    21. 31. http://www.lib.cam.ac.uk/api/voyager/newtonSearch.cgi?searchArg=darwin&databases=depfacaedb
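The "simple request" side of this API can be sketched as plain URL construction. The following is an illustration only: it uses the two query parameters visible in the slide's URL (`searchArg` and `databases`), the function name is hypothetical, and since the slides do not show the response format, nothing is fetched or parsed here. The endpoint itself may no longer be live.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint as shown on the slide.
BASE_URL = "http://www.lib.cam.ac.uk/api/voyager/newtonSearch.cgi"


def build_search_url(search_arg, databases):
    """Build a search URL from the two parameters seen on the slide."""
    query = urlencode({"searchArg": search_arg, "databases": databases})
    return f"{BASE_URL}?{query}"


url = build_search_url("darwin", "depfacaedb")
```

Using `urlencode` rather than string concatenation means multi-word search terms are escaped correctly, which is the kind of detail a developer consuming the feed would otherwise have to discover by trial and error.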
    22. 33. <ul><li>COMET project </li></ul><ul><li>80% of CUL bib records converted to Resource Description Framework (RDF) </li></ul><ul><li>Enriched with direct links to the Library of Congress </li></ul><ul><li>Vocab in-line with British Library work </li></ul><ul><li>OCLC FAST and VIAF authority sources </li></ul><ul><li>http://data.lib.cam.ac.uk </li></ul>
    23. 34. Marc21 … 001 1000346 245$aEarly medieval history of Kashmir : $b[with special reference to the Loharas] A.D. 1003-1171 / DC XML … <dc:identifier>1000346</dc:identifier> <dc:title>Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171</dc:title> RDF triples … <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> &quot;Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171&quot;
    24. 35. 1. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> &quot;Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171&quot; . 2. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/1cb251ec0d568de6a929b520c4aed8d1> . 3. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> . 4. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/identifier> &quot;UkCU1000346&quot; . 5. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/issued> &quot;1981&quot; . 6. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> . 7. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/language> <http://id.loc.gov/vocabulary/iso639-2/eng> . 8. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://RDVocab.info/Elements/placeOfPublication> <http://id.loc.gov/vocabulary/countries/ii>
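To make the Marc21-to-triples step concrete, here is a minimal sketch that turns the slide's 001 and 245 values into the first triple in the list. It is not the COMET project's conversion code: the function names are hypothetical, the `$`-delimited input is the slide's display notation rather than real MARC, and only the title mapping is attempted.

```python
# URI patterns copied from the triples on the slide.
ENTRY_BASE = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_"
DC_TITLE = "http://purl.org/dc/terms/title"


def parse_subfields(field_text):
    """Split '$aFoo $bBar' display notation into [(code, value), ...]."""
    return [(part[0], part[1:]) for part in field_text.split("$")[1:] if part]


def title_from_245(field_text):
    """Join $a and $b, then drop the trailing ISBD ' /' separator."""
    values = [value.strip() for code, value in parse_subfields(field_text)
              if code in ("a", "b")]
    return " ".join(values).rstrip(" /")


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def title_triple(record_id, field_245):
    """Emit one N-Triples line giving the record's dcterms:title."""
    title = escape_literal(title_from_245(field_245))
    return f'<{ENTRY_BASE}{record_id}> <{DC_TITLE}> "{title}" .'


FIELD_245 = ("$aEarly medieval history of Kashmir : "
             "$b[with special reference to the Loharas] A.D. 1003-1171 /")
triple = title_triple("1000346", FIELD_245)
```

Even in this tiny case the code has to know that the trailing " /" is punctuation, not data, which is the mapping difficulty the later Marc21 slides describe.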
    25. 37. The Linking Open Data cloud diagram - http://richard.cyganiak.de/2007/10/lod
    26. 38. <ul><li>Wikipedia </li></ul><ul><li>Archives Hub </li></ul><ul><li>British Library BNB </li></ul><ul><li>British Museum </li></ul><ul><li>Library of Congress </li></ul><ul><li>LOD at Bibliothèque nationale de France </li></ul><ul><li>BBC Nature </li></ul><ul><li>University of Southampton </li></ul><ul><li>Open University </li></ul>
    27. 39. <ul><li>More data out there for cataloguers to reuse </li></ul><ul><li>More access points in records </li></ul><ul><li>Better mechanisms for record enrichment </li></ul><ul><li>Scope for revised cataloguing workflows </li></ul><ul><li>Records have a permanent identity on the web </li></ul>
    28. 40. <ul><li>Hard to understand and decode </li></ul><ul><li>Supporting ‘stack’ not up to scratch </li></ul><ul><li>No seriously compelling use case (yet) </li></ul><ul><li>Other ways to provide linked data </li></ul><ul><li>Use URIs for people and things </li></ul>
    29. 41. <ul><li>Initial attempts with RDF </li></ul><ul><li>Newer lightweight formats and databases </li></ul><ul><li>Focus on citation metadata for the sciences </li></ul><ul><li>New ways for scientists to share and work with bibliography </li></ul><ul><li>http://openbiblio.net/ </li></ul><ul><li>http://openbiblio.net/principles/ </li></ul>
    30. 42. <ul><li>If developers are now consumers of our data … </li></ul>
    31. 43. <ul><li>Most Cambridge data could be released under a permissive license (PDDL) </li></ul><ul><li>Europeana Digital Library approve Creative Commons ‘Zero’ licensing of data </li></ul><ul><li>British Library BNB – Creative Commons ‘Zero’ </li></ul><ul><li>OCLC looking at attribution only licensing </li></ul><ul><li>Move away from ‘non-commercial’ wording </li></ul>Open Data Commons Public Domain Dedication and License (PDDL)
    32. 44. <ul><li>No one wants OCLC to go under (partners on COMET) </li></ul><ul><li>Valued partners </li></ul><ul><li>Focus on sharing ‘non-marc21’ formats of greater use to the non-Librarian </li></ul><ul><li>Vendors aim to profit from services based on data rather than data for its own sake? </li></ul>
    33. 45. <ul><li>Based on a 40 year old format </li></ul><ul><li>Based on a need to print a human readable card </li></ul><ul><li>Syntax, vocabulary, field names and content all intertwined </li></ul><ul><li>According to OCLC Research : </li></ul><ul><ul><li>Only 10% of all Marc tags in Worldcat appear in 100% of all Worldcat records </li></ul></ul><ul><ul><li>65% of tags appear in less than 1% of records. </li></ul></ul>
    34. 46. <ul><li>AACR2 / MARC21 uses punctuation to denote content (100$d) </li></ul><ul><li>Mixed fields (text and numbers) (020$a) </li></ul><ul><li>Duplication </li></ul><ul><ul><li>author name </li></ul></ul><ul><ul><li>format </li></ul></ul><ul><ul><li>One hundred notes fields (or close enough) ? </li></ul></ul>df100$aBradford, Gamaliel$d1863 - 1932. <authorParsed> <surname>Bradford</surname> <restOfName> Gamaliel</restOfName> <birthDate>1863</birthDate> <birthDateNormalised>18630101</birthDateNormalised> <deathDate>1932</deathDate> <deathDateNormalised>19320101</deathDateNormalised> </authorParsed>
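The `<authorParsed>` structure on this slide implies a parsing step from the punctuated 100 field. A minimal sketch of that step follows, assuming the simplest "surname, forenames" and "YYYY - YYYY." patterns only. The function name and the regular expression are illustrative assumptions; the dictionary keys mirror the element names on the slide, and the "0101" suffix copies the slide's normalised form.

```python
import re

# Matches 'YYYY - YYYY.', 'YYYY-' or '-YYYY'; anything richer is left unparsed.
DATE_RANGE = re.compile(r"^\s*(\d{4})?\s*-\s*(\d{4})?\s*\.?\s*$")


def parse_author(name_a, dates_d=""):
    """Split 100$a / 100$d strings into the fields shown in <authorParsed>."""
    surname, _, rest = name_a.partition(",")
    parsed = {"surname": surname.strip(), "restOfName": rest.strip()}
    match = DATE_RANGE.match(dates_d)
    if match:
        birth, death = match.groups()
        if birth:
            parsed["birthDate"] = birth
            parsed["birthDateNormalised"] = birth + "0101"
        if death:
            parsed["deathDate"] = death
            parsed["deathDateNormalised"] = death + "0101"
    return parsed


author = parse_author("Bradford, Gamaliel", "1863 - 1932.")
```

Values such as "ca. 1500-1560" or "b. 1863" fall outside this pattern and come back with names only, which illustrates why punctuation-as-structure is painful to consume.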
    35. 47. <ul><li>Marc21 is binary encoded </li></ul><ul><li>Web-friendly standards are now the norm (XML/JSON) 1 </li></ul><ul><li>Numbers for field names? </li></ul><ul><li>Bad character encoding allowed </li></ul>
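"Binary encoded" here refers to the ISO 2709 exchange layout that Marc21 uses: a 24-byte leader, a directory of fixed-width 12-byte entries, then field data separated by control characters. The sketch below builds and re-reads a toy record to show what a developer has to decode before seeing any data. The function names and the leader's filler values are illustrative assumptions, and in practice a dedicated MARC library would be used instead.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_record(fields):
    """Pack (tag, data) pairs into an ISO 2709 style byte string."""
    directory, body = b"", b""
    for tag, data in fields:
        data += FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag:3}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)        # where field data begins
    total = base + len(body) + 1      # +1 for the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + body + RECORD_TERMINATOR


def parse_record(raw):
    """Recover (tag, data) pairs by walking the leader and directory."""
    base = int(raw[12:17].decode("ascii"))
    directory = raw[24:base - 1]      # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7].decode("ascii"))
        start = int(entry[7:12].decode("ascii"))
        # Drop the trailing field terminator from the data.
        fields.append((tag, raw[base + start:base + start + length - 1]))
    return fields


FIELDS = [
    ("001", b"1000346"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aEarly medieval history of Kashmir :"),
]
RAW = build_record(FIELDS)
```

Nothing in `RAW` is self-describing: offsets, lengths and numeric tags must all be known in advance, whereas the XML and JSON forms mentioned on the slide carry their structure with them.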
    36. 48. <ul><li>LOC Bibliographic Framework Transition declares a shift away from Marc21 </li></ul><ul><li>Is the delay in introduction of RDA until we get a ‘better container’ ? </li></ul><ul><li>No system vendor is going forward with Marc21 </li></ul><ul><li>Will take 10+ years </li></ul><ul><li>What is to come next? </li></ul>
    37. 49. <ul><li>Steering for RDA and Marc replacement needs non-librarian input or ownership </li></ul><ul><li>Offer from NISO to take the work on </li></ul>Karen Coyle criticises the Marc21 Bibliographic Framework Transition Initiative for not including museums, publishing, and IT professionals … She argues that our data is not just for us to consume alone … “ The next data carrier for libraries needs to be developed as a truly open effort. It should be led by a neutral organization (possibly ad hoc) that can bring together  the wide range of interested parties and make sure that all voices are heard. Technical development should be done by computer professionals with expertise in metadata design. The resulting system should be rigorous yet flexible enough to allow growth and specialization.” http://kcoyle.blogspot.com/2011/08/bibliographic-framework-transition.html
    38. 50. <ul><li>It becomes (even) easier to go to Amazon </li></ul><ul><li>Our status as authoritative data providers will be (further) eroded </li></ul><ul><li>No-one will want to play with us if we cannot learn to share </li></ul>
    39. 51. <ul><li>http://www.discovery.ac.uk - Discovery </li></ul><ul><li>NGC4Lib mailing list </li></ul><ul><li>http://okfn.org - Open Knowledge Foundation </li></ul><ul><li>http://data.lib.cam.ac.uk </li></ul>
    40. 52. <ul><li>Ed Chamberlain </li></ul><ul><ul><li>@edchamberlain </li></ul></ul><ul><ul><li>[email_address] </li></ul></ul><ul><ul><li>http://www.slideshare.net/EdmundChamberlain/ </li></ul></ul>
