OPENREFINE
Tricia Clayton
Collection Assessment and Discovery Librarian
Georgia State University
WHAT IS OPENREFINE?
OpenRefine Main Functions
• Clean & Transform
• Extend & Reconcile
• Explore
http://openrefine.org
HOW DOES IT COMPARE TO OTHER TOOLS?
OpenRefine
• Can batch edit
rows and columns
• Excellent for
exploring &
transforming data
• No schema
needed
• Data is always
visible
Spreadsheets
• Edit one cell at a
time
• Excellent for data
entry, functions,
calculations
• No schema
needed
• Data is always
visible
Databases
• Schema and
scripting language
needed for editing
• Data is mostly out
of sight unless
programming is
used to run
queries or build
views
LIVE DEMO – BASIC ORIENTATION
• Create/open/import project
• Basic navigation
• The zones of the central viewing area; the functions of the “All” column
vs. the other columns
• Export options
• Undo/redo
• Facet/filter
LIVE DEMO – EXPLORING & TRANSFORMING
• Faceting options
• Flag and remove
• Common transforms
• Transform; Add column based on this column
• GREL
• search/replace with multiple commands
• cell.cross
• Split/join cells
GETTING STARTED
Are you seeing this error when you open a project?
You can ignore it. OpenRefine is trying to reach the Freebase service, which
no longer exists.
USEFUL GREL OPERATIONS
Search and replace -
value.replace(",", "")
“Atlanta, GA” becomes “Atlanta GA”
You can combine multiple commands by connecting
them with periods.
value.replace(",", "").replace(":", "")
“Atlanta, GA: 30303” becomes “Atlanta GA 30303”
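To check the expected result outside OpenRefine, the chained replace can be mirrored in plain Python. This is only an illustrative sketch, not GREL; the function name strip_punctuation is hypothetical.

```python
def strip_punctuation(value: str) -> str:
    """Plain-Python mirror of value.replace(",", "").replace(":", "")."""
    # Each replace returns a new string, so the calls can be chained
    # with periods, just as in the GREL expression.
    return value.replace(",", "").replace(":", "")


print(strip_punctuation("Atlanta, GA"))         # Atlanta GA
print(strip_punctuation("Atlanta, GA: 30303"))  # Atlanta GA 30303
```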
USEFUL GREL OPERATIONS
Replace (transform) the values in your current column with
those from another column in the same project:
cells["column"].value
where column represents the name of the column you are
getting the values from
USEFUL GREL OPERATIONS
Concatenation:
Adding a string to the value of the current column –
"added string" + cells["current column"].value
Combining the values of two columns -
cells["column1"].value + " " + cells["column2"].value
Note – if any of the cells have blank values, problems will arise: see
http://kb.refinepro.com/2011/07/merge-2-columns-that-have-both-blank.html
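One way to think about the blank-value problem is to join only the cells that actually contain something. The plain-Python sketch below illustrates that idea. It is not GREL and not necessarily the approach in the linked article; the function name join_cells and the use of None for a blank cell are assumptions made for illustration.

```python
from typing import Optional


def join_cells(first: Optional[str], second: Optional[str], sep: str = " ") -> str:
    """Join two cell values, skipping any that are blank (None or empty)."""
    # Keep only the non-blank parts so no stray separator is left behind.
    parts = [part for part in (first, second) if part]
    return sep.join(parts)


print(join_cells("Atlanta", "GA"))  # Atlanta GA
print(join_cells("Atlanta", None))  # Atlanta
```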
USEFUL GREL OPERATIONS
Changing the date format of a string formatted date:
Note: True date formats in OpenRefine are colored in green and formatted like
this: 2018-10-03T00:00:00Z. But you may have imported dates that retained their
text format (particularly if you turned off the option to parse text into numbers and
dates during the import process, as this speeds up the import process).
To transform 2018-10-03 to display just the year 2018:
toString(toDate(value),"yyyy")
The expression first converts the text value to a date, takes just
the year, then returns it as a string.
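The same parse-then-format idea can be sketched in plain Python for comparison. This is not GREL; the function name year_of is hypothetical, and the sketch assumes the text is always in yyyy-MM-dd form.

```python
from datetime import datetime


def year_of(text_date: str) -> str:
    """Parse a yyyy-MM-dd string and return just the year as text."""
    parsed = datetime.strptime(text_date, "%Y-%m-%d")  # string -> date
    return parsed.strftime("%Y")                       # date -> year string


print(year_of("2018-10-03"))  # 2018
```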
USEFUL GREL OPERATIONS
Import a column from a different project into your current
project based on a matching column (cell.cross function):
cell.cross("JSTOR 201806 JR1", "Print ISSN").cells["Reporting Period Total"].value[0]
Use the “add a column based on this column” menu option
on your Print ISSN column. The other project is “JSTOR
201806 JR1”, you are matching that project’s “Print ISSN”
column, and you are importing that project’s “Reporting
Period Total” column.
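Conceptually, this is a lookup from one table into another on a shared key. The plain-Python sketch below shows that idea only; it is not how OpenRefine implements cell.cross. The sample rows, ISSN values, and the function name cross_lookup are all hypothetical.

```python
# Hypothetical sample rows standing in for the two projects.
other_project = [
    {"Print ISSN": "0001-0001", "Reporting Period Total": 120},
    {"Print ISSN": "0002-0002", "Reporting Period Total": 45},
]
current_project = [
    {"Title": "Journal A", "Print ISSN": "0002-0002"},
    {"Title": "Journal B", "Print ISSN": "9999-9999"},
]


def cross_lookup(rows, other_rows, key, wanted):
    """For each row, fetch `wanted` from the first other row with the same `key`."""
    index = {}
    for other in other_rows:
        # setdefault keeps the first match, similar in spirit to taking [0].
        index.setdefault(other[key], other[wanted])
    # Rows with no match in the other table get None.
    return [index.get(row[key]) for row in rows]


totals = cross_lookup(current_project, other_project,
                      "Print ISSN", "Reporting Period Total")
print(totals)  # [45, None]
```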
CLUSTERING DEMO
Clustering – a semi-automated process to identify groups of
different values that might represent the same thing, then
correct or normalize them:
“organization” AND “organisation”
“New York” AND “new york”
“François Mauriac” AND “Francois Mauriac”
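A simplified sketch of key-collision clustering, in plain Python, shows why some of these pairs group together. It is only loosely modeled on the fingerprint idea (trim, lowercase, strip accents and punctuation, sort unique tokens) and is not OpenRefine's actual code; the function names are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a value to a normalized key so that near-duplicates collide."""
    text = value.strip().lower()
    # Drop accents: decompose characters, then discard the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique tokens so that word order does not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a fingerprint; keep groups with 2+ members."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


names = ["organization", "organisation", "New York", "new york",
         "François Mauriac", "Francois Mauriac"]
print(cluster(names))
# [['New York', 'new york'], ['François Mauriac', 'Francois Mauriac']]
```

In this sketch, “organization” and “organisation” do not collide because their spellings differ. Pairs like that usually call for a different clustering method, such as nearest-neighbor matching.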
RECONCILIATION
A service that semi-automates the process of matching data in
your project to authoritative data in other sources, for example:
• VIAF (Virtual International Authority File)
• FAST (Faceted Application of Subject Terminology)
• Library of Congress Subject Headings
• Journal TOCs
Other reconcilable data sources
RECONCILIATION
Wikidata reconciliation is the only built-in service. Any
others must be added.
To reconcile against only the LC source in VIAF:
http://refine.codefork.com/reconcile/viafproxy/LC
From the column menu: Reconcile:
Start reconciling…
RECONCILIATION
Choose:
• what type of entity
to reconcile
against
• if you want it to
auto match
candidates with
high confidence
RECONCILIATION
Next steps:
• Verify the matched titles.
The links will take you to
the LC Name Authority
File records so you can
check.
• Select matches for the
unmatched titles by either
clicking the single or
double check marks:
the single check mark
matches just that cell; the
double check mark matches
all identical cells
RECONCILIATION
Now you have a list of proper LC
headings.
To get the match IDs for the column
you just reconciled:
• Edit Column – Add column
based on this column
• Name the new column
• In “Expression” box enter:
cell.recon.match.id
ADDITIONAL RESOURCES
• Using OpenRefine (2013), by Ruben Verborgh and Max De
Wilde
A somewhat dated but still useful book that provides a
comprehensive introduction to OpenRefine.
• Cleaning Data with OpenRefine:
https://libjohn.github.io/openrefine/
An excellent tutorial developed by John Little at Duke
University Libraries.
ADDITIONAL RESOURCES
• OpenRefine’s Documentation page:
http://openrefine.org/documentation.html
Links to several online courses and an extensive curated
tutorial list
• Official documentation and reference for the General Refine
Expression Language (GREL):
https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#reference
ADDITIONAL RESOURCES
• Reconciling author names using Open Refine and VIAF:
http://iphylo.blogspot.com/2013/04/reconciling-author-names-using-open.html
• Reconciling Smithsonian Library data with VIAF:
https://allysonota.weebly.com/uploads/5/7/9/6/57968819/ota_viaf.pdf
• Reconciliation in OpenRefine, videos by Owen Stephens
https://www.youtube.com/watch?v=q8ffvdeyuNQ (part 1)
https://www.youtube.com/watch?v=q8ffvdeyuNQ (part 2)

OpenRefine

  • 1. OPENREFINE Tricia Clayton Collection Assessment and Discovery Librarian Georgia State University
  • 2. WHAT IS OPENREFINE? OpenRefine Main Functions Clean & Transform Extend & ReconcileExplore http://openrefine.org
  • 3. HOW DOES IT COMPARE TO OTHER TOOLS? OpenRefine • Can batch edit rows and columns • Excellent for exploring & transforming data • No schema needed • Data is always visible Spreadsheets • Edit one cell at a time • Excellent for data entry, functions, calculations • No schema needed • Data is always visible Databases • Schema and scripting language needed for editing • Data is mostly out of site unless programming is used to run queries or build views
  • 4. LIVE DEMO – BASIC ORIENTATION • Create/open/import project • Basic navigation • The zones of central viewing area; the functions of the “All” column vs. the other columns • Export options • Undo/redo • Facet/filter
  • 5. LIVE DEMO – EXPLORING & TRANSFORMING • Faceting options • Flag and remove • Common transforms • Transform; Add column based on this column • GREL • search/replace with multiple commands • cell.cross • Split/join cells
  • 6. GETTING STARTED Are you seeing this error when you open a project? You can ignore it. It is trying to reach the Freebase service that no longer exists.
  • 7. USEFUL GREL OPERATIONS Search and replace - value.replace (",","") “Atlanta, GA” becomes “Atlanta GA” You can combine multiple commands together by connecting them with periods. value.replace (",","").replace (":","") “Atlanta, GA: 30303” becomes “Atlanta GA 30303”
  • 8. USEFUL GREL OPERATIONS Replace (transform) the values in your current column with those from another column in the same project: cells["column"].value where column represents the name of the column you are getting the values from
  • 9. USEFUL GREL OPERATIONS Concatenation: Adding a string to the value of the current column – "added string" + cells["current column"].value Combining the values of two columns - cells["column1"].value + " " + cells["column2"].value Note – if any of the cells have blank values, problems will arise: see http://kb.refinepro.com/2011/07/merge-2-columns-that-have-both-blank.html
  • 10. USEFUL GREL OPERATIONS Changing the date format of a string-formatted date: Note: True date formats in OpenRefine are colored in green and formatted like this: 2018-10-03T00:00:00Z. But you may have imported dates that retained their text format (particularly if you turned off the option to parse text into numbers and dates during the import process, as this speeds up the import process). To transform 2018-10-03 to display just the year 2018: toString(toDate(value),"yyyy") The GREL first converts the value to a date, extracts just the year, then converts it back to a string.
  • 11. USEFUL GREL OPERATIONS Import a column from a different project into your current project based on a matching column (cell.cross function): cell.cross("JSTOR 201806 JR1", "Print ISSN").cells["Reporting Period Total"].value[0] Use the “add a column based on this column” menu option on your Print ISSN column. The other project is “JSTOR 201806 JR1”, you are matching that project’s “Print ISSN” column, and you are importing that project’s “Reporting Period Total” column.
  • 12. CLUSTERING DEMO Clustering – a semi-automated process to identify groups of different values that might represent the same thing, then correct or normalize them: “organization” AND “organisation” “New York” AND “new york” “François Mauriac” AND “Francois Mauriac”
  • 13. RECONCILIATION A service that semi-automates the process of matching data in your project to authoritative data in other sources, for example: • VIAF (Virtual International Authority File) • FAST (Faceted Application of Subject Terminology) • Library of Congress Subject Headings • Journal TOCs Other reconcilable data sources
  • 14. RECONCILIATION Wikidata reconciliation is the only built-in service. Any others must be added. To reconcile against only the LC source in VIAF: http://refine.codefork.com/reconcile/viafproxy/LC From the column menu: Reconcile: Start reconciling…
  • 15. RECONCILIATION Choose: • what type of entity to reconcile against • if you want it to auto match candidates with high confidence
  • 16. RECONCILIATION Next steps: • Verify the matched titles. The links will take you to the LC Name Authority File records so you can check. • Select matches for the unmatched titles by either clicking the single or double check marks: the single check mark matches just that cell; the double check mark matches all identical cells
  • 17. RECONCILIATION Now you have a list of proper LC headings. To get the match IDs for the column you just reconciled: • Edit Column – Add column based on this column • Name the new column • In “Expression” box enter: cell.recon.match.id
  • 18. ADDITIONAL RESOURCES • Using OpenRefine (2013), by Ruben Verborgh and Max De Wilde A somewhat dated but still useful book that provides a comprehensive introduction to OpenRefine. • Cleaning Data with OpenRefine: https://libjohn.github.io/openrefine/ An excellent tutorial developed by John Little at Duke University Libraries.
  • 19. ADDITIONAL RESOURCES • OpenRefine’s Documentation page: http://openrefine.org/documentation.html Links to several online courses and an extensive curated tutorial list • Official documentation and reference for the General Refine Expression Language (GREL): https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#reference
  • 20. ADDITIONAL RESOURCES • Reconciling author names using Open Refine and VIAF: http://iphylo.blogspot.com/2013/04/reconciling-author-names-using-open.html • Reconciling Smithsonian Library data with VIAF: https://allysonota.weebly.com/uploads/5/7/9/6/57968819/ota_viaf.pdf • Reconciliation in OpenRefine, videos by Owen Stephens https://www.youtube.com/watch?v=q8ffvdeyuNQ (part 1) https://www.youtube.com/watch?v=q8ffvdeyuNQ (part 2)
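
For readers who want to check the logic of the USEFUL GREL OPERATIONS slides outside OpenRefine, here is a minimal Python sketch of rough analogues. It is not GREL and not part of the original deck: the function names, the blank-skipping behavior in `join_columns`, and the sample row are all assumptions made for illustration.

```python
from datetime import datetime


def strip_punctuation(value: str) -> str:
    # Rough analogue of value.replace(",","").replace(":","") from slide 7.
    return value.replace(",", "").replace(":", "")


def join_columns(left, right, sep=" "):
    # Rough analogue of cells["column1"].value + " " + cells["column2"].value.
    # Assumption: blank or missing cells are skipped so no stray separator
    # is left behind (slide 9 warns that blanks cause problems).
    parts = [p for p in (left, right) if p is not None and p != ""]
    return sep.join(parts)


def year_of(value: str) -> str:
    # Rough analogue of toString(toDate(value),"yyyy") for text dates
    # in YYYY-MM-DD form, as in the slide 10 example.
    return datetime.strptime(value, "%Y-%m-%d").strftime("%Y")


def cross_lookup(key, other_rows, key_column, value_column):
    # Rough analogue of cell.cross(project, key_column).cells[value_column].value[0]:
    # return the value from the first row of the other table whose key matches.
    for row in other_rows:
        if row.get(key_column) == key:
            return row.get(value_column)
    return None


# Hypothetical stand-in for the other project's rows (values are made up).
other_project = [{"Print ISSN": "0000-0000", "Reporting Period Total": 12}]

cleaned = strip_punctuation("Atlanta, GA: 30303")
joined = join_columns("Atlanta", None)
year = year_of("2018-10-03")
total = cross_lookup("0000-0000", other_project, "Print ISSN", "Reporting Period Total")
```

Chaining `str.replace` calls reads the same way as chaining GREL commands with periods, which is why it is used here. The lookup returns `None` when no row matches, a choice made for this sketch rather than a description of how OpenRefine itself behaves.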