3. HOW DOES IT COMPARE TO OTHER TOOLS?
OpenRefine
• Can batch edit rows and columns
• Excellent for exploring & transforming data
• No schema needed
• Data is always visible
Spreadsheets
• Edit one cell at a time
• Excellent for data entry, functions, calculations
• No schema needed
• Data is always visible
Databases
• Schema and scripting language needed for editing
• Data is mostly out of sight unless programming is used to run queries or build views
4. LIVE DEMO – BASIC ORIENTATION
• Create/open/import project
• Basic navigation
• The zones of the central viewing area; the functions of the “All” column vs. the other columns
• Export options
• Undo/redo
• Facet/filter
5. LIVE DEMO – EXPLORING & TRANSFORMING
• Faceting options
• Flag and remove
• Common transforms
• Transform; Add column based on this column
• GREL
• search/replace with multiple commands
• cell.cross
• Split/join cells
6. GETTING STARTED
Are you seeing this error when you open a project?
You can ignore it. It is trying to reach the Freebase service that
no longer exists.
7. USEFUL GREL OPERATIONS
Search and replace:
value.replace(",","")
“Atlanta, GA” becomes “Atlanta GA”
You can combine multiple commands by connecting them with periods:
value.replace(",","").replace(":","")
“Atlanta, GA: 30303” becomes “Atlanta GA 30303”
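To check what a chained replace will produce before running it on a whole column, the same logic can be sketched outside OpenRefine. The Python below is only an illustration of the chaining idea; the function name strip_punctuation is made up for this example and is not part of GREL.

```python
def strip_punctuation(value: str) -> str:
    # Mirrors the chained GREL expression: remove commas first,
    # then remove colons from the result of the first replace.
    return value.replace(",", "").replace(":", "")


if __name__ == "__main__":
    print(strip_punctuation("Atlanta, GA: 30303"))
```

Each replace runs on the output of the one before it, so the order only matters when one replacement could create or remove text that a later one looks for.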
8. USEFUL GREL OPERATIONS
Replace (transform) the values in your current column with
those from another column in the same project:
cells["column"].value
where column represents the name of the column you are
getting the values from
9. USEFUL GREL OPERATIONS
Concatenation:
Adding a string to the value of the current column:
"added string" + cells["current column"].value
Combining the values of two columns:
cells["column1"].value + " " + cells["column2"].value
Note: if any of the cells have blank values, problems will arise; see
http://kb.refinepro.com/2011/07/merge-2-columns-that-have-both-blank.html
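The blank-cell problem is easier to see in a small standalone sketch. The Python below is not the GREL fix from the linked article; it only illustrates one reasonable rule, assuming a blank cell arrives as None or as an empty string: skip blank parts and add the separator only when both sides have content. The function name combine is hypothetical.

```python
from typing import Optional


def combine(first: Optional[str], second: Optional[str], sep: str = " ") -> str:
    # Keep only the parts that are present and not just whitespace.
    parts = [p.strip() for p in (first, second) if p is not None and p.strip()]
    # Joining the surviving parts avoids a stray separator when one side is blank.
    return sep.join(parts)


if __name__ == "__main__":
    print(combine("Atlanta", "GA"))
    print(combine(None, "GA"))
```

With this rule a row with one blank column keeps the other value unchanged, and a row with two blanks yields an empty string rather than an error.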
10. USEFUL GREL OPERATIONS
Changing the date format of a string-formatted date:
Note: True date formats in OpenRefine are colored in green and formatted like
this: 2018-10-03T00:00:00Z. But you may have imported dates that retained their
text format (particularly if you turned off the option to parse text into numbers and
dates during the import process, as this speeds up the import process).
To transform 2018-10-03 to display just the year 2018:
toString(toDate(value),"yyyy")
The expression first converts the value to a date, keeps just
the year, then converts it back to a string.
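The same parse, extract, format sequence can be sketched in Python to confirm the expected result. This assumes the text dates look like 2018-10-03 (year-month-day); the function name year_of is made up for the example.

```python
from datetime import datetime


def year_of(value: str) -> str:
    # Step 1: parse the text into a real date (assumes YYYY-MM-DD input).
    parsed = datetime.strptime(value, "%Y-%m-%d")
    # Step 2: format the date back to text, keeping only the four-digit year.
    return parsed.strftime("%Y")


if __name__ == "__main__":
    print(year_of("2018-10-03"))
```

Parsing first, rather than slicing the first four characters, means a value that is not a valid date raises an error instead of silently producing a wrong year.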
11. USEFUL GREL OPERATIONS
Import a column from a different project into your current
project based on a matching column (cell.cross function):
cell.cross("JSTOR 201806 JR1", "Print ISSN").cells["Reporting Period Total"].value[0]
Use the “add a column based on this column” menu option
on your Print ISSN column. The other project is “JSTOR
201806 JR1”, you are matching that project’s “Print ISSN”
column, and you are importing that project’s “Reporting
Period Total” column.
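Conceptually, this is a lookup: find the rows in the other project whose key column matches the current cell, then take a value from the first match. The Python below sketches that idea only; it does not show how OpenRefine implements cell.cross. The ISSNs and totals are invented sample data, and build_index and cross_value are hypothetical helper names.

```python
def build_index(rows, key_column):
    # Group the other project's rows by the value in the matching column.
    index = {}
    for row in rows:
        index.setdefault(row[key_column], []).append(row)
    return index


def cross_value(index, key, column):
    # Return the chosen column from the first matching row, like the [0] in
    # the expression above; return None when nothing matches.
    matches = index.get(key, [])
    return matches[0][column] if matches else None


# Invented rows standing in for the "JSTOR 201806 JR1" project.
other_project = [
    {"Print ISSN": "0000-0001", "Reporting Period Total": 120},
    {"Print ISSN": "0000-0002", "Reporting Period Total": 45},
]
index = build_index(other_project, "Print ISSN")

if __name__ == "__main__":
    print(cross_value(index, "0000-0001", "Reporting Period Total"))
```

The sketch makes two behaviors explicit: only the first match is used when a key appears more than once, and a key with no match produces no value.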
12. CLUSTERING DEMO
Clustering – a semi-automated process to identify groups of
different values that might represent the same thing, then
correct or normalize them:
“organization” AND “organisation”
“New York” AND “new york”
“François Mauriac” AND “Francois Mauriac”
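One way to see how values like these can be grouped automatically is a simplified "fingerprint" key: normalize each value, then group values that share the same key. The Python below is a rough sketch of that idea under stated assumptions (lowercase, strip accents and ASCII punctuation, sort unique words); OpenRefine's own clustering methods are more elaborate, and the function names here are made up.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    # Normalize case and surrounding whitespace.
    text = value.strip().lower()
    # Split accented letters into base letter + accent, then drop the accents.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove ASCII punctuation, then sort the unique words.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    # Group values by fingerprint; keep only groups with more than one member.
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    print(cluster(["New York", "new york", "François Mauriac", "Francois Mauriac"]))
```

A key like this catches the case and accent variants, but not spelling variants such as “organization” and “organisation”, whose keys differ; those need a similarity-based method, which is why the process stays semi-automated and a person confirms each group.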
13. RECONCILIATION
A service that semi-automates the process of matching data in
your project to authoritative data in other sources, for example:
• VIAF (Virtual International Authority File)
• FAST (Faceted Application of Subject Terminology)
• Library of Congress Subject Headings
• Journal TOCs
Other reconcilable data sources
14. RECONCILIATION
Wikidata reconciliation is the only built-in service. Any
others must be added.
To reconcile against only the LC source in VIAF:
http://refine.codefork.com/reconcile/viafproxy/LC
From the column menu: Reconcile:
Start reconciling…
[Screenshot: reconciliation dialog, with callouts labeled Step 1 through Step 4]
16. RECONCILIATION
Next steps:
• Verify the matched titles. The links will take you to the LC Name Authority File records so you can check.
• Select matches for the unmatched titles by clicking either the single or the double check mark: the single check mark matches just that cell; the double check mark matches all identical cells.
17. RECONCILIATION
Now you have a list of proper LC headings.
To get the match IDs for the column you just reconciled:
• Edit Column – Add column based on this column
• Name the new column
• In “Expression” box enter:
cell.recon.match.id
18. ADDITIONAL RESOURCES
• Using OpenRefine (2013), by Ruben Verborgh and Max De Wilde
A somewhat dated but still useful book that provides a comprehensive introduction to OpenRefine.
• Cleaning Data with OpenRefine:
https://libjohn.github.io/openrefine/
An excellent tutorial developed by John Little at Duke University Libraries.
19. ADDITIONAL RESOURCES
• OpenRefine’s Documentation page:
http://openrefine.org/documentation.html
Links to several online courses and an extensive curated
tutorial list
• Official documentation and reference for the General Refine
Expression Language (GREL):
https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#reference
20. ADDITIONAL RESOURCES
• Reconciling author names using Open Refine and VIAF:
http://iphylo.blogspot.com/2013/04/reconciling-author-names-using-open.html
• Reconciling Smithsonian Library data with VIAF:
https://allysonota.weebly.com/uploads/5/7/9/6/57968819/ota_viaf.pdf
• Reconciliation in OpenRefine, videos by Owen Stephens
https://www.youtube.com/watch?v=q8ffvdeyuNQ (part 1)
https://www.youtube.com/watch?v=q8ffvdeyuNQ (part 2)