OMG! MY METADATA IS AS FRESH AS THE BACKSTREET BOYS: HOW GOOGLE REFINE CAN UPDATE, CLEAN UP AND LINK YOUR METADATA TO THE WIDER WORLD
SARAH BETH WEEKS
LIBRARY TECHNOLOGY CONFERENCE 2013
WEEKSS@STOLAF.EDU @RASCALWHALE
SAMPLE PROJECT: NORDIC AMERICAN IMPRINTS
Situation: Wanted to match publishers of our books against a list of important Nordic American Publishers (compiled by Penny Huffman) to find materials for our special collections.
Problem: Hard to compare when publication info is not controlled.
ANSWER: GOOGLE REFINE!
Google Refine can “match and merge” messy data filled with random, leading or trailing spaces, stray punctuation, typos, odd capitalization and more!
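To give a feel for how this kind of matching can work, here is a minimal Python sketch of key-collision ("fingerprint") clustering. It is an illustration written for these notes, not Google Refine's actual code: the function names and the sample publisher strings are hypothetical, and the normalization steps are an assumption about what a reasonable fingerprint looks like.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a messy string to a normalized key (illustrative only)."""
    # Fold accented characters to plain ASCII where possible.
    folded = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Trim spaces, lowercase, and drop ASCII punctuation.
    cleaned = folded.strip().lower().translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, remove duplicate tokens, and sort them.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep only groups with more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical publisher strings, made up for this example.
samples = [
    " Augsburg Pub. House",
    "Augsburg pub house,",
    "AUGSBURG PUB HOUSE",
    "Lutheran Free Church Pub. Co.",
]

for group in cluster(samples):
    print(group)
```

The first three sample strings collapse to the same key and land in one cluster, ready to be merged to a single preferred form. Note that this simple keying handles spacing, punctuation and capitalization but not typos; catching those needs a fuzzier comparison (for example an edit-distance method).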
END RESULT?
Using Google Refine we were able to reduce the 3230 unique values for city (260|a) to just 1153. For publishers (260|b) we went from 11342 unique names to approximately 6500. This project helped to identify over 2,000 potential candidates for our Nordic American Imprints collection (these are still being evaluated). The controlled publishers, cities of publication and dates will be added to a local 9xx field for faceting in our future special collections discovery tool. Users will be able to browse our Nordic American Imprints collection by publisher, city or state.
THIS NEW DATA IS NOW ADDED TO YOUR SPREADSHEET
TO SEE WHAT COLUMNS (DATA) YOU CAN ADD FROM FREEBASE:
Browse the properties at: http://schemas.freebaseapps.com/
MATCH LOCAL SUBJECT HEADING TO LC (FREEYOURMETADATA.ORG)
SPARQL ENDPOINTS
Install the RDF Extension for Google Refine: http://refine.deri.ie/
SPARQL Endpoints: http://labs.mondeca.com/sparqlEndpointsStatus/index.html
CKAN Data Hub: http://datahub.io/dataset/
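For context, matching a local term against a SPARQL endpoint comes down to sending a query over HTTP. The Python sketch below only builds such a request and does not send it. The endpoint URL is a placeholder, the function names are hypothetical, and it assumes the target vocabulary stores its headings in skos:prefLabel; this is not how the RDF Extension itself is implemented.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder endpoint for illustration only; substitute a live endpoint.
ENDPOINT = "http://example.org/sparql"


def build_label_query(label, lang="en"):
    """Return a SPARQL query for concepts whose preferred label equals `label`."""
    # Escape backslashes and double quotes so the label is a valid string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept WHERE {\n"
        f'  ?concept skos:prefLabel "{escaped}"@{lang} .\n'
        "}\n"
        "LIMIT 10"
    )


def build_request_url(endpoint, query):
    """Encode the query as a GET request using the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical local subject heading to look up.
query = build_label_query("Norwegian Americans")
print(query)
print(build_request_url(ENDPOINT, query))
```

An exact-label lookup like this only finds headings that already match character for character, which is why cleaning the local data first makes the matching step more productive.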