2. Programme
11:00-11:05 – Introduction to the session and
presenters
PRESENTATION OF PROJECTS
11:05-11:20 – Jodi: Mapping Titian, Mapping
Paintings
11:20-11:35 – Catherine: Mapping Sculpture
PRESENTATION OF TOOLS
11:35-12:05 – Angela: OpenRefine, TimelineJS
12:05-12:35 – Catherine: Palladio, CARTO
Hands-on
4. OpenRefine
Cleaning up messy data from a
spreadsheet
Spelling errors
Uniform data
Removing whitespace
Splitting columns
Enriching data from external sources
Etc.
You will not edit your data cell by cell, but
in groups and sets. This makes the application
suitable for very large data sets.
5. OpenRefine
Apart from cleaning data, you can also
use OpenRefine for other purposes
Word counts in sets
Combine sheets
Enriching reconciled data with OpenRefine:
Import data from Wikidata or VIAF
6. OpenRefine
Free, open source software
Works best with Google Chrome (less well with Safari and
Internet Explorer)
Written in Java. Requires a Java Runtime Environment (JRE)
It is an Interactive Data Transformation tool (IDT),
which allows you to change a large data set in one operation. It is
similar to a spreadsheet, but has more functionality.
Works as a desktop application. It does not store your
data. Save it! It may be used in several tabs
at the same time.
The .exe file opens a terminal window in which a small local
server runs; the application itself is used in the web browser.
The terminal window needs to remain open.
Runs offline through the terminal window.
7. OpenRefine
Choose a project and upload it.
Rename the project (save it later, OpenRefine does not save
or store automatically!!)
Use UTF-8 encoding
Configure your data: You will be shown a preview of your
data. In the lower blue field, make sure “Parse data as” is
set to “CSV / TSV / separator-based files”. Where it says
character encoding, click in the blank field next to it and
select UTF-8 from the pop-up window of encodings. Make
sure the first row with your column headers is recognized
as headers (boldfaced) and not as your data. If it is not
automatically recognized, tick the checkbox for “Parse
next ‘1’ line(s) as column headers”. Since our exercise file
is a CSV, activate the radio button “commas (CSV)” as the
separator.
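The import settings above (UTF-8 encoding, comma separator, first row as headers) can be sketched outside OpenRefine as follows. This is only an illustration of what those settings mean, not how OpenRefine itself parses files; the sample data and the function name read_csv_utf8 are hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for the exercise file (not from the slides).
# The umlauts show why the encoding setting matters.
raw_bytes = "Name,City\nAlbrecht Dürer,Nürnberg\nSandro Botticelli,Firenze\n".encode("utf-8")


def read_csv_utf8(data: bytes):
    """Decode bytes as UTF-8, split on commas, and treat row 1 as headers."""
    text = data.decode("utf-8")
    rows = list(csv.reader(io.StringIO(text), delimiter=","))
    headers, records = rows[0], rows[1:]
    return headers, records


headers, records = read_csv_utf8(raw_bytes)
print(headers)
print(records)
```

If the first row were parsed as data instead of headers, "Name" and "City" would show up as an ordinary record, which is the situation the “Parse next ‘1’ line(s) as column headers” option avoids.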
8. OpenRefine – basic clean
up
Text facet -> cluster
Get rid of whitespace: «Edit cells» -> «Common
transforms» -> «Trim leading and trailing whitespace» /
«Collapse consecutive whitespace»
Divide columns: «Edit column» -> «Split into several
columns…»
Reorder columns
Cluster: «Edit cells» -> «Cluster and edit…» (only works
for entire clusters to be merged, no selection possible)
Replace: Edit cells -> replace
Undo/redo: step by step index in the menu
Deleting rows: Text facet –> choose what to eliminate and
place a star –> facet by star –> select «true» –> under «All» –>
remove all matching rows
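To make the whitespace and clustering steps more concrete, here is a minimal Python sketch. It is loosely modelled on the key-collision ("fingerprint") idea behind clustering; the details of OpenRefine's own implementation may differ, and the function names and sample values are hypothetical.

```python
import re
import string
from collections import defaultdict


def trim_and_collapse(value: str) -> str:
    """Trim leading/trailing whitespace and collapse consecutive whitespace."""
    return re.sub(r"\s+", " ", value.strip())


def fingerprint(value: str) -> str:
    """Build a normalized key: lowercase, no punctuation, sorted unique tokens."""
    cleaned = trim_and_collapse(value).lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a key; keep only groups with differing spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]


# Hypothetical messy column values.
sample = ["Botticelli, Sandro", "  sandro   botticelli ", "Sandro Botticelli", "Titian"]
clusters = cluster(sample)
print(clusters)
```

Each resulting group is a candidate for merging into one spelling, which corresponds to accepting a cluster in the «Cluster and edit…» dialog.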
9. OpenRefine - transform
Change values: Edit cells -> transform ->
GREL language -> transform the value
Replace: value.replace('xx', 'x')
Add characters to a column: "prefix" + value
Cleaning up a date to show only the year:
datePart(value,'year') (if the cell holds text rather than a date, convert it first with value.toDate())
GREL: General Refine Expression Language on
GitHub
https://github.com/OpenRefine/OpenRefine/wiki/General-Refine-Expression-Language
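For readers more familiar with a general-purpose language, the three transforms on this slide can be approximated in Python. This is a rough analogue, not GREL; the function names, the sample inputs, and the assumed date format (YYYY-MM-DD) are hypothetical.

```python
from datetime import datetime


def replace_text(value: str, old: str, new: str) -> str:
    """Analogue of value.replace('xx', 'x')."""
    return value.replace(old, new)


def add_prefix(value: str, prefix: str) -> str:
    """Analogue of "prefix" + value."""
    return prefix + value


def year_of(value: str, fmt: str = "%Y-%m-%d") -> int:
    """Analogue of datePart(value, 'year'); assumes text in the given format."""
    return datetime.strptime(value, fmt).year


print(replace_text("Raffaello", "ff", "f"))
print(add_prefix("123", "ID-"))
print(year_of("1445-03-01"))
```

As in OpenRefine, each function is applied to one cell value at a time; the tool then runs the same expression over every row of the column.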
10. OpenRefine – example from
Wikipedia – Italian artists
Download table from Wikipedia
You want to separate names and years
Add column based on this column
Edit cells -> replace (to change the brackets into a colon, to be
used later as separator)
Edit column – split into several columns (use colon as separator)
Replace ) with nothing (empty string)
value + ", " + cells["mycell"].value
Person separate: edit column – add column based on this
column – value.split(" ")[1]
○ 1 = last name / 0 = first name
Add last name, first name together: value + ", " +
cells["Firstname"].value
Another option: Split cells: Choose ‘Edit cells’, ‘Split multi-
valued cells’, entering ‘|’ as the value separator.
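The name/year separation described above can be sketched in Python as a single pass. The sketch assumes each cell looks like "First Last (years)", which is an assumption about the Wikipedia table rather than something stated on the slide; the function names and sample cells are hypothetical.

```python
import re


def split_name_and_years(cell: str):
    """Split 'Name (years)' into (name, years); years is '' if no brackets."""
    match = re.match(r"^(.*?)\s*\((.*)\)\s*$", cell)
    if match is None:
        return cell.strip(), ""
    return match.group(1).strip(), match.group(2).strip()


def last_first(name: str) -> str:
    """Mirror value.split(" ")[1] + ", " + value.split(" ")[0]."""
    parts = name.split(" ")
    if len(parts) < 2:
        return name  # single-word names are left unchanged
    return parts[1] + ", " + parts[0]


name, years = split_name_and_years("Sandro Botticelli (1445-1510)")
print(name, "|", years)
print(last_first(name))
```

Like the split(" ")[1] expression on the slide, this only handles two-part names correctly; names with more parts (for example "Leonardo da Vinci") would need manual review.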
11. OpenRefine for Data
enrichment
(using Linked Open Data)
Fetch URLs using Refine
Construct URL queries to retrieve
information from a simple web API
Using query services like:
Wikidata
Google Maps API
VIAF (Virtual International Authority File)
etc.
12. Retrieving data from
Wikidata
You need a column Wikidata_uri
Create a column Wikidata_id: Edit column –> add
column based on this column –> to extract the ID,
enter the expression
replace(value,"http://www.wikidata.org/entity/", "")
On the Wikidata_id column: Edit column -> add column
by fetching URLs -> to query birth dates, use the
property «P569» and enter the expression
"https://tools.wmflabs.org/openrefine-wikidata/en/fetch_values?item="+value+"&prop=P569"
-> name the column «date_of_birth_Wikidata». The
result is in JSON.
Clean the data: edit cells -> transform -> enter the
expression forEach(value.parseJson().values,v,v).join(";")
Cleaning up a date to show only the year:
datePart(value,'year')
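The three steps on this slide (extract the ID, build the query URL, flatten the JSON result) can be sketched in Python. No request is actually sent here. The endpoint URL is the one quoted on the slide; the shape of the response (an object with a "values" list) is inferred from the GREL expression above, and the sample response text and function names are hypothetical.

```python
import json
from urllib.parse import urlencode

ENTITY_PREFIX = "http://www.wikidata.org/entity/"
# Endpoint as quoted on the slide; whether it is still reachable is not checked.
FETCH_ENDPOINT = "https://tools.wmflabs.org/openrefine-wikidata/en/fetch_values"


def extract_id(uri: str) -> str:
    """Analogue of replace(value, "http://www.wikidata.org/entity/", "")."""
    return uri.replace(ENTITY_PREFIX, "")


def build_query(item_id: str, prop: str = "P569") -> str:
    """Build the URL that «Add column by fetching URLs» would request."""
    return FETCH_ENDPOINT + "?" + urlencode({"item": item_id, "prop": prop})


def join_values(response_text: str) -> str:
    """Analogue of forEach(value.parseJson().values, v, v).join(";")."""
    return ";".join(str(v) for v in json.loads(response_text)["values"])


item_id = extract_id("http://www.wikidata.org/entity/Q42")
print(build_query(item_id))

# Hypothetical response text, shaped as the GREL expression implies.
sample_response = '{"values": ["1952-03-11T00:00:00Z"]}'
print(join_values(sample_response))
```

Changing the prop argument to another Wikidata property ID retrieves a different attribute with the same pattern.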
13. Retrieving data from
Wikidata
Reconcile (how simple is this!!)
Choose the source – Wikidata (include
other columns too if needed)
Start reconciling – records will be
automatically linked to Wikidata (the remainder
has to be matched manually)
Use values as identifiers
14. OpenRefine - export
At the end: export your data set! (OpenRefine
does not change your original
data set)
Single column export -> facet -> choose
facet -> export CSV
Full sheet export -> comma-separated
values
It is also possible to only export parts of
your sheet.
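As a rough illustration of what a full or partial CSV export produces, the following Python sketch writes either all columns or a chosen subset. It is not OpenRefine's export code; the function name export_rows and the sample table are hypothetical.

```python
import csv
import io


def export_rows(headers, rows, columns=None):
    """Return CSV text for all columns, or only those named in `columns`."""
    columns = columns or headers
    indexes = [headers.index(c) for c in columns]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[i] for i in indexes])
    return buffer.getvalue()


# Hypothetical cleaned table.
headers = ["Name", "Year"]
rows = [["Botticelli, Sandro", "1445"], ["Vecellio, Tiziano", "1488"]]

print(export_rows(headers, rows))            # full sheet
print(export_rows(headers, rows, ["Name"]))  # single column
```

Values containing a comma, such as "Botticelli, Sandro", are quoted automatically so that they stay in one column when the file is reopened.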
15. OpenRefine tutorials
http://openrefine.org/
https://programminghistorian.org/en/lessons/cleaning-data-with-openrefine
https://github.com/miriamposner/get-started-with-openrefine/blob/master/get-started-with-openrefine.md
https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users
Retrieving data from Wikidata or VIAF
https://medium.com/the-bytegeist-blog/enriching-reconciled-data-with-openrefine-89b885dcadbb
There are many more!!
17. Timelines (selection)
Timeline JS (Northwestern University)
https://news.northwestern.edu/stories/2012/03/knight-lab-digital-timelines/ (with
examples and spreadsheet)
Neatline – for Omeka
http://docs.neatline.org/creating-records.html
Google Timeline
https://www.google.com/maps/timeline?pb
Office Timelines (for Excel or PowerPoint)
https://templates.office.com/en-us/Timelines?page=1
18. TimelineJS
With Google Chrome and Google Spreadsheets
Advantages
Easy to use for a chronological visualization
Incorporates maps and images from the web
Can be embedded in websites and
PowerPoint presentations
Disadvantages
Limited interactivity
Only uses images published on the web, not
from own collection
19. TimelineJS
With Google Chrome
https://timeline.knightlab.com/
Botticelli spreadsheet:
https://docs.google.com/spreadsheets/d/1BAg-2_XZM-Oap1cwQoftBcYjrJYBjXOSNOqdXBwQWyY/edit#gid=0
Botticelli timeline (embedded link to
website or presentation)
20. Thank you !
Dr. Angela Dressen
Villa I Tatti, The Harvard University Center
for Italian Renaissance Studies / Florence,
Italy
adressen@itatti.harvard.edu
Discipline Representative for Digital
Humanities at the Renaissance Society of
America (RSA)
Editor's Notes
Cleaning up your own accumulated data or data gathered from the net. Works with an algorithm.
Wikidata provides an endpoint for querying data as a URL. Once you know the property you would like to retrieve, the objective is to use OpenRefine to build a query string and retrieve the data you want from that endpoint.