
Tools for Data Manipulation - UKAD Open Refine Workshop

Held at Jisc London 18th March 2016.

Details at http://www.nationalarchives.gov.uk/archives-sector/engaging-with-ukad.htm

  1. Tools for Data Manipulation: UKAD Open Refine Workshop, Jisc London, 18th March 2016
     Adrian Stevenson, Senior Technical Coordinator, Jisc Manchester
  2. Workshop Resources
     Tools for Data Manipulation - Workshop resources at http://data.archiveshub.ac.uk/workshops/ukad2016/
     Available from: http://data.archiveshub.ac.uk/workshops/ukad2016/readme.html
     • Link to OpenRefine and plugins
     • Link to example data used for the workshop
     • Link to completed OpenRefine project from today's workshop
  3. Open Refine
     OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
     Main uses:
     • Explore data
     • Clean and transform data
     • Reconcile and match data
  4. Installing and running Open Refine
     • Download from: http://openrefine.org/download.html
     • Run it, then in a web browser go to: http://127.0.0.1:3333/
     • Select ‘Create Project’ and browse for the Archives Hub example CSV data file
     • Note: you may need to clear the browser cache to see new projects
  5. Clean and Transform - Facets and Clustering
     • Strip white space
     • Transform: upper case, title case
     • Split multi-valued cells, or Edit column > Split into several columns
     • Facet on label
     • Order by count
     • Cluster and rename rows
     • Undo
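
The "Cluster and rename rows" step can be pictured with a small Python sketch. It is a simplified approximation of key-collision clustering with a fingerprint key, not OpenRefine's actual implementation; the function names and sample labels are made up for illustration.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified fingerprint key for a label (an approximation only)."""
    # Trim, lowercase, and fold accented characters to plain ASCII.
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, then sort the unique whitespace-separated tokens.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(labels):
    """Group labels whose fingerprints collide; keep groups of 2+ variants."""
    groups = defaultdict(set)
    for label in labels:
        groups[fingerprint(label)].add(label)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical sample labels: three variants of one name plus one other name.
labels = ["Webb, Beatrice", "  webb, beatrice ", "Beatrice Webb", "Shaw, George Bernard"]
clusters = cluster(labels)
```

Each cluster lists the variant spellings that would be offered for merging into a single preferred form.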
  6. Clean - Remove Duplicate Rows
     • Sort on the column with duplicates and reorder rows permanently
     • Facet duplicates to check
     • Watch for OpenRefine switching from rows view to records view
     • Edit cells > Blank down
     • Facet by blank
     • Remove all matching rows
     The essence of OpenRefine is using facets and filters to isolate rows, then invoking commands that affect all of those rows together.
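
As a rough illustration of why the sort, blank down, and remove sequence removes duplicates, here is a minimal Python sketch of the same logic. The function name, column names, and sample rows are hypothetical and not part of the workshop data.

```python
def remove_duplicates(rows, key):
    """Mimic sort + blank down + remove blank rows on one column."""
    # Step 1: sort so that duplicate values sit next to each other.
    ordered = sorted(rows, key=lambda row: row[key])
    # Step 2: "blank down": blank a value that repeats the one above it.
    blanked = []
    previous = object()  # sentinel that never equals a real cell value
    for row in ordered:
        value = row[key]
        blanked.append({**row, key: "" if value == previous else value})
        previous = value
    # Step 3: facet by blank and remove all matching rows.
    # Note: rows whose key was already blank are removed as well.
    return [row for row in blanked if row[key] != ""]


# Hypothetical sample rows with one duplicated name.
rows = [
    {"name": "Webb, Beatrice", "ref": "A1"},
    {"name": "Shaw, George Bernard", "ref": "B2"},
    {"name": "Webb, Beatrice", "ref": "A3"},
]
deduped = remove_duplicates(rows, "name")
```

Only the first row of each run of identical values survives, which is why the sort has to come first.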
  8. LD Design Issues: URIs, Triples
     http://www.w3.org/DesignIssues/LinkedData.html
  9. Triples
     Triple statements:
     • ‘Things’ have ‘properties’ with ‘values’
     • Subject - Predicate - Object
     Examples:
     • Repository - Provides Access To - Archival Resource
     • Jane Austen - Is Author Of - Pride and Prejudice
     Triples are the basis of RDF and Linked Data
  10. owl:sameAs
      Hub Person - owl:sameAs - VIAF Person
      <http://data.archiveshub.ac.uk/id/person/nra/webbmarthabeatrice1858-1943socialreformer> owl:sameAs <http://viaf.org/viaf/86607236> .
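
To make the subject, predicate, object structure concrete, the owl:sameAs statement can be serialised as one N-Triples line. This is a minimal sketch; the helper name `to_ntriple` is made up, and it only handles triples whose three parts are all URIs.

```python
# Full IRI that the owl:sameAs prefixed name stands for.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialise a triple of three URIs as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


hub_person = (
    "http://data.archiveshub.ac.uk/id/person/nra/"
    "webbmarthabeatrice1858-1943socialreformer"
)
viaf_person = "http://viaf.org/viaf/86607236"

line = to_ntriple(hub_person, OWL_SAME_AS, viaf_person)
```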
  11. Matching Names to VIAF
      You may need to join columns together, for example to give a more consistent name form, e.g. using:
      cells["FamilyName"].value + ", " + cells["GivenName"].value + ", " + cells["Dates"].value
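
For comparison, the same join can be sketched in Python. This variant goes one step beyond the expression on the slide by skipping empty parts so that no stray commas appear; that behaviour, and the function name, are assumptions for illustration.

```python
def name_form(family, given, dates):
    """Join name parts as 'Family, Given, Dates', skipping empty parts."""
    parts = [part.strip() for part in (family, given, dates) if part and part.strip()]
    return ", ".join(parts)


full = name_form("Webb", "Martha Beatrice", "1858-1943")
partial = name_form("Webb", "", "1858-1943")  # given name missing
```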
  12. Matching Names to VIAF
      • VIAF reconciliation service details at: http://iphylo.blogspot.co.uk/2013/04/reconciling-author-names-using-open.html
      • May need to add it as a ‘standard service’ under Reconcile > Start reconciling. Service URL is: http://iphylo.org/~rpage/phyloinformatics/services/reconciliation_viaf.php
      • Other reconciliation services, e.g. LCSH, at: https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources
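
Behind the scenes, a standard reconciliation service receives batches of name queries encoded as JSON. The sketch below only builds such a request body and does not contact any server; it assumes the service follows the usual `queries` parameter convention, and whether this particular VIAF service is still online is not something the slides confirm.

```python
import json
from urllib.parse import parse_qs, urlencode

# Service URL as given on the slide; availability is not guaranteed.
SERVICE_URL = "http://iphylo.org/~rpage/phyloinformatics/services/reconciliation_viaf.php"


def build_queries(names, limit=3):
    """Build a form-encoded body holding one reconciliation query per name."""
    batch = {
        f"q{index}": {"query": name, "limit": limit}
        for index, name in enumerate(names)
    }
    return urlencode({"queries": json.dumps(batch)})


body = build_queries(["Webb, Martha Beatrice, 1858-1943"])

# Decode the body again to show what the service would receive.
decoded = json.loads(parse_qs(body)["queries"][0])
```

The body would be sent to `SERVICE_URL` as an HTTP POST; OpenRefine does this itself when reconciliation is started from the column menu.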
  13. RDF Export
      • Download the RDF Refine extension from http://refine.deri.ie/
      • Unzip it
      • Open Project > Browse workspace directory
      • Create an ‘extensions’ folder (if it doesn’t exist)
      • Copy the unzipped RDF Refine folder into the ‘extensions’ folder in the workspace directory
      • Restart OpenRefine
      • Need to create a column with VIAF URIs for export: "http://viaf.org/viaf/"+cell.recon.match.id
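
The URI-building expression assumes every cell has a reconciliation match. A small Python sketch of the same idea, with unmatched cells left empty rather than raising an error, might look like this; the function name and the sample identifiers are illustrative only.

```python
VIAF_BASE = "http://viaf.org/viaf/"


def viaf_uri(match_id):
    """Return the VIAF URI for a matched identifier, or None if unmatched."""
    if not match_id:
        return None
    return VIAF_BASE + str(match_id)


# One matched cell and one unmatched cell (None stands for "no match").
uris = [viaf_uri(match_id) for match_id in ["86607236", None]]
```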
  14. Matching Subjects to LCSH
      Click the RDF button in the top right corner and select ‘Add reconciliation service’ > ‘Based on SPARQL endpoint’. Add the following parameters:
      • Name: LCSH
      • Endpoint URL: http://sparql.freeyourmetadata.org/
      • Graph URI: http://id.loc.gov/authorities/subjects
      • Type: Virtuoso
      • Label properties: check only skos:prefLabel
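
To show how these parameters fit together, the sketch below builds an exact-match SPARQL query against skos:prefLabel in the named graph. It is an illustration only: the extension's own matching is not shown on the slides, and the `en` language tag on labels is an assumption.

```python
LCSH_GRAPH = "http://id.loc.gov/authorities/subjects"


def preflabel_query(label, graph=LCSH_GRAPH, lang="en"):
    """Build a SPARQL query finding subjects with exactly this prefLabel."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        f"SELECT ?subject FROM <{graph}>\n"
        f'WHERE {{ ?subject skos:prefLabel "{escaped}"@{lang} . }}'
    )


query = preflabel_query("Socialism")
```

The resulting string could be pasted into the endpoint's query form to check by hand what a given label matches.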
  15. Martha Beatrice Webb
      • Family name: Webb
      • Life dates: 1858-1943
      • Epithet: social reformer and historian
      • Place of birth: Gloucester, England
      • Place of death: Liphook, Hampshire, England
      Biographical Notes
      Image from: Beatrice Webb letters
      Beatrice Webb (1858 - 1943). Fabian Socialist, social reformer, writer, historian, diarist. Wife, collaborator and assistant of Sidney Webb, later Lord Passfield. Together they contributed to the radical ideology first of the Liberal Party and later of the Labour Party.
      from: Beatrice Webb, A summer holiday in Scotland, 1884.
      Beatrice Webb (1858-1943), nee Potter, social reformer and diarist. Married to Sidney Webb, pioneers of social science. She was involved in many spheres of political and social activity including the Labour Party, Fabianism, social observation, investigations into poverty, development of socialism, the foundation of the National Health Service and post war welfare state, the London School of
      Works
      • Our Partnership
      • My Apprenticeship
      • The case for the factory acts
      • Beatrice Webb’s diaries; edited by Margaret Cole
      • The Diary
      Knows
      • http://dbpedia.org/page/George_Bernard_Shaw
      • http://dbpedia.org/page/Sidney_Webb,_1st_Baron_Passfield
  16. Contact
      Adrian Stevenson, Senior Technical Coordinator, Jisc Manchester
      http://www.jisc.ac.uk
      adrian.stevenson@jisc.ac.uk
      http://www.twitter.com/adrianstevenson
      https://www.linkedin.com/in/adrianstevenson
  17. CC License
      This presentation is available under Creative Commons Non Commercial-Share Alike: http://creativecommons.org/licenses/by-nc/2.0/uk/
