Text to data

Talk / rant about MARC21-derived metadata

Published in: Education, Technology

  1. Text to data (MashCat 2012, Ed Chamberlain)
  2. Me
      • Librarian (systems)
      • Data ‘munger’
      • Data consumer?
  3. The way it used to be …
      • Control over record consumption
      • Control over record environment
      • Control over technology
  4. (no text on this slide)
  5. Competition …
      • No longer the single authority for content and description
      • Commercial, social and academic discovery mechanisms
      • Explosion of digital content
      • Illusion of ‘all on the web’
  6. Fit for purpose?
      • Studies into Google Generation / ‘Generation Y’ (1)
      • Cambridge Arcadia IRIS report 2009 (2)
      • Preference for search engine over catalogue
      • Online over in-building
      • Trust tutors and peers over librarian
      • Still respect the library ‘brand’
      1) “The Google generation: the information behaviour of the researcher of the future”, Aslib Proceedings, vol. 60, issue 4, doi:10.1108/00012530810887953
      2) Arcadia IRIS Project report, http://arcadiaproject.lib.cam.ac.uk/docs/Report_IRIS_final.pdf
  7. Improve catalogues
      • Keyword-based discovery services
      • New ways to exploit old data
      • Relevancy ranking
      • Rich faceting
      • Greater linking
      • Search is the new browse
      • Repositories and archives
      • Is the OPAC dead?
  8. Different but the same? Catalogue data is now:
      • Consumed as keywords (not left-anchored access points)
      • Faceted (not browsed)
      • Supplemented
      • Transformed
      • Merged
      • Amalgamated
  9. Prepare for the future …
      • ‘Use case you’ve not yet thought of’
      • ‘Consumer as producer’
      • ‘Pro-Am’
      • ‘Free from silo’
      • Developers as well as readers
      • Preference for data over text
  10. [Diagram] Library data: our local catalogues; research group website; Wikipedia; web start-ups; national / international aggregations; Joe Public; search engines; other libraries; booksellers; teenage software developer / hacker
  11. Libraries have a lot to offer
      • Bibliographic data linked to many aspects of successful teaching and research
      • Citation lists – measure output
      • Shared bibliography – core of research group work
      • Reading lists – backbone of undergraduate teaching
      • High quality data needed for re-use
      • Not all possible whilst data resides in the library ‘silo’
  12. Open metadata creates the opportunity for enhancing impact through the release of descriptive data about library, archival and museum resources. It allows such data to be made freely available and innovatively reused to serve researchers, teachers, students, service providers and the wider community in the UK and internationally. (http://discovery.ac.uk)
  13. Open data releases …
  14. But …
      • Is MARC21 the right format for developers (or libraries)?
      • Is it easy to convert into something more palatable?
  15. What can we do with an ISBN?
      • Build union catalogues
      • Find existing or alternative records (copy cataloguing)
      • Find related works (xISBN, thingISBN)
      • Match and mash with resources on the web:
          • Images
          • Reviews
          • Citations and references
  16. 020 - ISBN
      What cataloguer record users want:
      • Accuracy
      • Contextualization
      • Access point
      • Something legible to read
      What data consumers want:
      • Accuracy
      • Contextualization
      • Access point
      • Reusability
      • Granularity
  17. So …
      • Take ISBN from an 020$a
      • my $isbn = $record->field('020')->as_string('a');
      • 0123456789(pbk)
      • (pbk)?
      • Is it the same as (.pbk) I noticed earlier?
      • I’m a developer – I can solve this …
      • Regex /^([0-9]+)/ - just gets numbers …
      • Oh hang on, don’t some ISBNs end in X?
      • And all that information on hardback / paperback is lost …
  18. Non-MARC …
      • XML: <identifier type="isbn" relation="hardback">0123456789x</identifier>
      • JSON: "identifier": {"id": "0123456789", "type": "isbn", "rel": "hardback"}
      • RDF (N-Triples):
        <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_100045> <http://purl.org/dc/terms/identifier> "urn:isbn:2853990060" .
        <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Book> .
  19. Advantages
      • Self-describing (if you read English)
      • Granular
      • Data NOT text for display (although this can be easily generated)
  20. $100 …
      • MARC: 1001_ |a Greenwood, James, |d 1832-1929.
      • Display text: Greenwood, James, 1832-1929.
      • Data:
        "author" : [
          {
            "name" : "Greenwood, James",
            "lastname" : "Greenwood",
            "firstname" : "James",
            "birthDate" : "1832",
            "deathDate" : "1929"
          }
        ]
  21. (Perl)
      # Assumes $record is a MARC::Record object and %exportRecord is declared elsewhere.
      # trim() was not shown on the slide; this stand-in strips surrounding
      # whitespace and trailing commas / full stops.
      sub trim {
          my ($s) = @_;
          return '' unless defined $s;
          $s =~ s/^\s+//;
          $s =~ s/[\s,.]+$//;
          return $s;
      }

      my @exportAuthors;
      foreach my $eachAuthor ($record->field('100')) {
          my %exportAuthor;
          my $authorFull = trim($eachAuthor->subfield('a'));
          $exportAuthor{name} = $authorFull;
          my ($lastname, $firstname) = split /,/, $authorFull, 2;
          $exportAuthor{lastname}  = trim($lastname);
          $exportAuthor{firstname} = trim($firstname);

          # The glorious 100$d disassembled ...
          my $dates = $eachAuthor->subfield('d');
          if ($dates) {
              # First of all, skip "ca." and "fl.", which aren't real birth or death dates
              if ($dates =~ /fl\.|ca\./) {
                  # do nothing
              }
              # Otherwise, if the date contains a hyphen, assume a range
              # (this also covers unterminated dates such as "1950-")
              elsif ($dates =~ /-/) {
                  my @range = split /-/, $dates;
                  $exportAuthor{birthDate} = trim($range[0]);
                  $exportAuthor{deathDate} = trim($range[1]) if $range[1];
              }
              # No hyphen: assume a single date. Look for a definitive birth event with a "b." ...
              elsif ($dates =~ /b\.\s*(.+)/) {
                  $exportAuthor{birthDate} = trim($1);
              }
              # ... or a definitive death event with a "d."
              elsif ($dates =~ /d\.\s*(.+)/) {
                  $exportAuthor{deathDate} = trim($1);
              }
              # Final assumption: a single recorded date with no hyphen is a birth date
              else {
                  $exportAuthor{birthDate} = trim($dates);
              }
          }

          # Assemble author object (push a reference so the hash is not flattened)
          push @exportAuthors, {%exportAuthor};
      }
      # Add list of authors to export object
      $exportRecord{author} = \@exportAuthors if @exportAuthors;
  22. How is this being solved?
      • Fix it at the source:
          • RDA
          • MARC transition initiative
      • Other initiatives – BL, OCLC linked data releases
      • ONIX
      • MODS
  23. Pragmatism: the end of big standards
      • Adoption of one new standard (or several) for its own sake is pointless
      • Fit in around changing needs of libraries and systems
      • Data needs to be flexible and re-purposable
      • No standard to ‘rule them all’ in the post-MARC21 world
  24. If we do nothing?
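
Slides 17 and 18 describe pulling an ISBN out of an 020$a without losing the X check character or the binding note, then re-expressing it as data. The following is a minimal sketch of that idea, not the code used in the talk. The helper name parse_020a and the way the bracketed qualifier is cleaned are assumptions made for illustration. It does not validate check digits, and it uses only core Perl (JSON::PP) rather than MARC::Record so that it runs on its own.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper (not from the slides): split a raw 020$a value into
# the ISBN itself and any trailing qualifier such as "(pbk)" or "(.pbk)".
sub parse_020a {
    my ($raw) = @_;
    return unless defined $raw;

    # Try 13 digits first, then 9 digits plus a check character (digit or X).
    # Hyphens and spaces inside the number are tolerated.
    return
        unless $raw =~ /^\s*((?:\d[\s-]?){12}\d|(?:\d[\s-]?){9}[\dXx])\s*(.*)$/;
    my ($isbn, $rest) = ($1, $2);

    $isbn =~ s/[\s-]//g;    # drop separators
    $isbn = uc $isbn;       # normalise the check character to "X"

    # Assumption: whatever sits inside the brackets is the qualifier, minus
    # stray punctuation, so "(pbk)" and "(.pbk)" both give "pbk".
    my $qualifier;
    if ($rest =~ /\(\s*[.:]?\s*([^)]*?)\s*[.:]?\s*\)/) {
        $qualifier = $1 if length $1;
    }

    my %id = (id => $isbn, type => 'isbn');
    $id{rel} = $qualifier if defined $qualifier;
    return \%id;
}

# canonical() sorts keys so the output is stable between runs.
my $json = JSON::PP->new->canonical;
for my $raw ('0123456789(pbk)', '012345678X (.pbk)', '9780123456786') {
    my $id = parse_020a($raw);
    print $json->encode({ identifier => $id }), "\n" if $id;
}
```

For the first input this prints {"identifier":{"id":"0123456789","rel":"pbk","type":"isbn"}}. Keeping the qualifier as its own key is the point of the exercise: the binding information survives as data instead of being stripped by a digits-only regex.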