OpenRefine for Librarians
How a power tool for Google is now
being used by librarians to clean up
data and connect it to the world
Mita Williams
Scholarly Communications Librarian
University of Windsor
October 24, 2018 : 2:45 - 3:15 pm
NISO: That Cutting Edge: Technology’s Impact on Scholarly
Research Processes in the Library
PART ONE:
AN INTRODUCTION
TO OPENREFINE
The most popular library tool you’ve never heard of…
PART TWO:
WHY NOT KEEP USING EXCEL?
The most popular library tool you’ve never heard of…
Why use OpenRefine?
• Ability to handle more types of data
TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML
• Ability to handle larger amounts of data
Excel’s max: 1,048,576 rows by 16,384 columns
• Better control of data
• Ability to script processes
• Ability to share and reproduce these scripts
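To make the last two bullets concrete, here is a minimal Python sketch of a cleanup written down as data and replayed against a file. It is an illustration of the idea only: the recipe format, the operation names (trim, collapse_spaces, uppercase), the helper functions, and the sample record are all hypothetical and are not OpenRefine's own export format or API.

```python
import csv
import io
import json

# Hypothetical recipe: a list of cleanup steps stored as JSON so it can be
# saved, shared, and replayed. This is NOT OpenRefine's own JSON format.
RECIPE_JSON = """
[
  {"op": "trim", "column": "title"},
  {"op": "collapse_spaces", "column": "title"},
  {"op": "uppercase", "column": "isbn"}
]
"""

# Each operation name maps to a function applied to one cell value.
OPS = {
    "trim": lambda v: v.strip(),
    "collapse_spaces": lambda v: " ".join(v.split()),
    "uppercase": lambda v: v.upper(),
}


def read_rows(text, delimiter=","):
    """Parse delimited text (CSV, TSV, or any other *SV) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def apply_recipe(rows, recipe):
    """Apply every step of the recipe to every row, returning new rows."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for step in recipe:
            column = step["column"]
            new_row[column] = OPS[step["op"]](new_row[column])
        cleaned.append(new_row)
    return cleaned


# Made-up sample record with stray whitespace and a lowercase check character.
SAMPLE_TSV = "title\tisbn\n  Moby   Dick \t080701429x\n"

recipe = json.loads(RECIPE_JSON)
rows = read_rows(SAMPLE_TSV, delimiter="\t")
cleaned = apply_recipe(rows, recipe)
print(cleaned)
```

Because the steps live in a plain JSON file rather than in someone's memory of which cells they edited, a colleague can rerun the same recipe on next month's export and get the same result.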
PART THREE:
HOW ARE LIBRARIANS
USING OPENREFINE?
!!! OpenRefine is NOT Excel !!!
• An institution was changing its library management system
and wished to migrate its catalogue data
• Approximately 50,000 bibliographic records
• MARC output from the existing system would not load into
the new system
Been there! Done that! Bought the t-shirt!
(Any questions?)
