We’ll start at the top of the ½ hour
And as a reminder – this session will be recorded (please keep videos off since I am recording today)
Merging, Clustering, and Integrations…oh my!
Terry Reese
Topics
This may be a bit of an eclectic session – as I’m interested
in covering a range of topics that are coming up during
this period of remote work.
Merging records
Integrations (OCLC and Alma)
Building clustering tools
Moving MARC data in and out of OpenRefine
Merge Records Tool
Allows users to merge MARC data
from two files
Allows users to merge unique data,
selected data and all data.
Merge Records Tool
The Merge Records tool functions in two modes:
1. Merging data across two files
◦ Use Case: Merging one or more fields from one file into another based on a match point
2. Merging Duplicate data into a single record set
◦ Use Case: Source file has multiple duplicates or versions of records. The goal is to merge all like
records into one using a match point.
Merging Records
Merging Data Across Two Files
◦ This mode is activated when the Source File and Merge File are different
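As a rough illustration of the two-file mode, here is a minimal sketch in Python. It is not how MarcEdit is implemented; records are modeled as plain dictionaries of field tag to values, and the function name `merge_across_files`, the sample control numbers, and the choice to skip values already present are all assumptions made for the example.

```python
from typing import Dict, List

# A record is modeled as field tag -> list of values.
# This is a simplification for illustration, not MarcEdit's internal model.
Record = Dict[str, List[str]]


def merge_across_files(source: List[Record], merge: List[Record],
                       match_tag: str, copy_tags: List[str]) -> List[Record]:
    """Copy selected fields from `merge` records into matching `source` records."""
    # Index the merge file by its match-point value for quick lookup.
    by_key: Dict[str, Record] = {}
    for rec in merge:
        for key in rec.get(match_tag, []):
            by_key.setdefault(key, rec)

    merged = []
    for rec in source:
        # Work on a copy so the source records are left untouched.
        out = {tag: list(values) for tag, values in rec.items()}
        match = next((by_key[k] for k in rec.get(match_tag, []) if k in by_key), None)
        if match is not None:
            for tag in copy_tags:
                for value in match.get(tag, []):
                    # Assumption: only add values the source record lacks.
                    if value not in out.setdefault(tag, []):
                        out[tag].append(value)
        merged.append(out)
    return merged


source_file = [
    {"001": ["ocm0001"], "245": ["Moby Dick"]},
    {"001": ["ocm0002"], "245": ["Walden"]},
]
merge_file = [
    {"001": ["ocm0001"], "650": ["Whaling"], "856": ["https://example.org/moby"]},
]

result = merge_across_files(source_file, merge_file, match_tag="001", copy_tags=["650"])
print(result[0])
```

Only the 650 field is copied because it is the only tag listed in `copy_tags`; the second source record has no match and passes through unchanged.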
Merging Records
Merging Data: Consolidating data in the same file
◦ This mode is activated when the Source File and Merge File are the same
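The consolidation mode can be pictured with a similar sketch. Again, this is only an illustration under assumptions: records are plain dictionaries, the function name `consolidate` is hypothetical, and keeping one copy of each distinct value is a choice made for the example rather than a statement about MarcEdit's behavior.

```python
from typing import Dict, List

# A record is modeled as field tag -> list of values (illustrative only).
Record = Dict[str, List[str]]


def consolidate(records: List[Record], match_tag: str) -> List[Record]:
    """Fold records that share a match-point value into one record per value."""
    merged: Dict[str, Record] = {}
    unmatched: List[Record] = []
    for rec in records:
        keys = rec.get(match_tag, [])
        if not keys:
            # No match point present: pass the record through untouched.
            unmatched.append(rec)
            continue
        target = merged.setdefault(keys[0], {})
        for tag, values in rec.items():
            bucket = target.setdefault(tag, [])
            for value in values:
                # Assumption: keep a single copy of each distinct value.
                if value not in bucket:
                    bucket.append(value)
    return list(merged.values()) + unmatched


records = [
    {"001": ["ocm0001"], "245": ["Moby Dick"], "650": ["Whaling"]},
    {"001": ["ocm0001"], "245": ["Moby Dick"], "650": ["Sea stories"]},
    {"001": ["ocm0002"], "245": ["Walden"]},
]

deduped = consolidate(records, "001")
print(len(deduped), deduped[0]["650"])
```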
Merging Records
How merging occurs
◦ Matches occur by setting the Identifier
◦ Identifier is either a field or field + subfield
◦ Merges can use multiple fields, delimited by a pipe. For example: 001|901$a
◦ What if there is no common match point?
◦ You can use the MARC21 option
◦ The Unicode Encoded option changes how field data is evaluated. By default, values are evaluated as binary data, but when this option is selected, data is evaluated as characters. This is important when working with multibyte languages
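Two small sketches may help here. The first splits an identifier written like the `001|901$a` example into field and subfield parts; the helper name `parse_identifier` is hypothetical and it only shows the syntax, not how MarcEdit applies multiple fields. The second shows, in general terms, why evaluating a value as bytes differs from evaluating it as characters; how MarcEdit uses this internally is not described in the slides.

```python
from typing import List, Optional, Tuple


def parse_identifier(spec: str) -> List[Tuple[str, Optional[str]]]:
    """Split a spec such as '001|901$a' into (field, subfield) pairs."""
    parts = []
    for piece in spec.split("|"):
        field, _, subfield = piece.strip().partition("$")
        # A control field like 001 has no subfield, so store None.
        parts.append((field, subfield or None))
    return parts


match_points = parse_identifier("001|901$a")
print(match_points)

# Bytes versus characters for a value with multibyte characters.
value = "Dvo\u0159\u00e1k"          # "Dvořák", written with escapes for clarity
raw = value.encode("utf-8")

print(len(value), len(raw))         # character count versus byte count
print(value[:4])                    # first four characters stay intact
print(raw[:4])                      # first four bytes end in the middle of a character
```

The byte view is two units longer than the character view because two of the letters each take two bytes in UTF-8, so any position-based comparison gives different answers in the two views.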
Let’s talk about Integrations
How MarcEdit Works with Alma
• https://developers.exlibrisgroup.com/alma/apis/bibs
• Because the API is rate limited (i.e., you can only process so many transactions concurrently through
the API, and all Alma operations use the API), MarcEdit limits API processes to a single thread. It
takes a little longer, but eliminates the possibility that using MarcEdit to automate workflows will
bring down your system because the tool is trying to communicate with the system too quickly.
MarcEdit works through the API endpoints listed above. With this API, MarcEdit can:
• Edit holdings data (and holdings records)
• Create and update bibliographic data
• Extract records
• Though discovery should be done via Z39.50 or SRU (which is preferred)
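The single-thread approach described above can be sketched as a loop that handles one request at a time and pauses between calls. This is a generic pattern, not MarcEdit's code and not the Alma API: `process_serially`, `fake_update`, the record IDs, and the `min_interval` value are all hypothetical, and no network request is made.

```python
import time
from typing import Callable, Iterable, List


def process_serially(items: Iterable[str], call: Callable[[str], str],
                     min_interval: float = 0.2) -> List[str]:
    """Run `call` on each item one at a time, pausing between requests."""
    results = []
    last_start = None
    for item in items:
        if last_start is not None:
            # Wait out whatever remains of the minimum gap between requests.
            wait = min_interval - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)
        last_start = time.monotonic()
        results.append(call(item))
    return results


def fake_update(record_id: str) -> str:
    # Stand-in for a real HTTP request; nothing is sent anywhere.
    return f"updated {record_id}"


started = time.monotonic()
outcome = process_serially(["991", "992", "993"], fake_update, min_interval=0.05)
elapsed = time.monotonic() - started
print(outcome)
```

Because each call finishes before the next begins, the remote system never sees more than one request in flight, at the cost of a longer total run time.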
Working with OCLC Connexion
https://youtu.be/a7Cen0gxFCw?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg
Call Number Classifications
Integrates with OCLC Classify Service
◦ Provides LC or Dewey numbers
◦ Processes data based on OCLC number, ISBN, ISSN, or author/title pair
◦ Generates call numbers based on the most widely used call number
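Picking the most widely used call number amounts to a frequency count. The sketch below assumes the candidate call numbers are already in hand as a list of strings; the sample values and the function name `most_used_call_number` are hypothetical, and the code does not call the Classify service or reflect its response format.

```python
from collections import Counter
from typing import List, Optional


def most_used_call_number(candidates: List[str]) -> Optional[str]:
    """Return the call number that appears most often, or None if there are none."""
    # Trim stray whitespace so identical numbers are counted together.
    cleaned = [c.strip() for c in candidates if c.strip()]
    if not cleaned:
        return None
    return Counter(cleaned).most_common(1)[0][0]


# Hypothetical sample of call numbers reported for one title.
reported = ["PS2384 .M6", "PS2384 .M6", "PS2384 .M62", "PS2384 .M6 "]
chosen = most_used_call_number(reported)
print(chosen)
```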
Working with OCLC’s Metadata API
MarcEdit can work directly with WorldCat via the Metadata API.
MarcEdit: Batch WorldCat Holdings Management
MarcEdit: Batch Bibliographic Record Upload
MarcEdit has OpenRefine Options
Export and import data from MARC into formats OpenRefine can read and understand
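To give a feel for what moving MARC data into a format OpenRefine can read might involve, here is a minimal round-trip sketch using tab-delimited text. The column layout (`record_id`, `tag`, `subfield`, `value`) and the function names are assumptions made for illustration; the slides do not describe the layout MarcEdit actually writes.

```python
import csv
import io
from typing import Dict, List, Tuple

# record id -> list of (tag, subfield code, value); an illustrative model only.
Records = Dict[str, List[Tuple[str, str, str]]]


def to_tsv(records: Records) -> str:
    """Flatten records into one tab-delimited row per subfield."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["record_id", "tag", "subfield", "value"])
    for record_id, fields in records.items():
        for tag, code, value in fields:
            writer.writerow([record_id, tag, code, value])
    return buffer.getvalue()


def from_tsv(text: str) -> Records:
    """Rebuild the record structure from the tab-delimited rows."""
    records: Records = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        records.setdefault(row["record_id"], []).append(
            (row["tag"], row["subfield"], row["value"]))
    return records


sample = {
    "ocm0001": [("100", "a", "Melville, Herman"), ("245", "a", "Moby Dick")],
}
exported = to_tsv(sample)
round_trip = from_tsv(exported)
print(exported)
```

Keeping the record identifier on every row is what allows edited values to be matched back to the right record on import.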
Cluster Format Support
MARC support
• The tool provides direct manipulation of MARC data,
with clustering options to edit at the field, or subfield
level, with clustering criteria set at the field/subfield
level
• MARC data processing supports any character encoding
Delimited Data Support
• New as of 3/2/2018 (on Windows/Linux; Mac coming)
• Allows for indexing by column
• Initial implementation limits editing to files with 100
or fewer columns
• Delimited formats supported:
• Tab Delimited
• Comma Delimited
Why not just use OpenRefine?
If you know how to use OpenRefine, please use it
• OpenRefine provides a much wider and richer set of functionality and is designed specifically to help users deal with data issues found within largely unstructured data
But MARC is a pain
• While OpenRefine has some MARC importing tools, the tooling specific to MARC isn’t particularly good or intuitive
• OpenRefine has some practical data (file size) limits when run on a
desktop/client
MarcEdit’s clustering support was built with catalogers
in mind
• To make working with library formats easy
• To minimize the need to migrate data between different formats
(and risk losing information in the conversion processes)
Clustering in MarcEdit 7
MarcEdit’s built-in clustering tools support native grouping and batch editing and work well on files of a million records or fewer (they can work on larger sets, but the larger the file, the longer the cluster operation takes)
Clustering Options
• Levenshtein Distance
• This algorithm is best for people, places, and subjects
• This algorithm builds clusters based on the number of position/character differences between words or phrases
• This algorithm is generally faster
• Composite Coefficient
• This algorithm is best for highly variable data where a great deal
of fuzziness is desired.
Clustering Algorithms
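For readers who have not met Levenshtein distance before, the sketch below computes it with the standard dynamic-programming method and then groups values whose distance to a cluster's first member is small. The grouping strategy, the `max_distance` threshold of 2, and the function names are assumptions for illustration and are not taken from MarcEdit.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Count the single-character edits needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cluster(values: List[str], max_distance: int = 2) -> List[List[str]]:
    """Greedily group values that are close to a cluster's first member."""
    clusters: List[List[str]] = []
    for value in values:
        for group in clusters:
            if levenshtein(value.lower(), group[0].lower()) <= max_distance:
                group.append(value)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([value])
    return clusters


names = ["Melville, Herman", "Melvile, Herman", "Melville, Hermann",
         "Thoreau, Henry David"]
groups = cluster(names)
print(groups)
```

The two misspelled forms are each one edit away from "Melville, Herman", so they land in the same cluster, while the unrelated name starts its own.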
• Default – normalized data tokens
• Reese, Terry and Terry Reese become reeseterry and terryreese
• Works best for spelling errors, form changes
• Fingerprint tokens
• Reese, Terry and Terry Reese both become reese terry
• Works best when you are unsure of the quality of your data
Token Types
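The two token types can be reproduced with a few lines of Python that give the same results as the examples on the slide. The exact normalization rules MarcEdit applies are not spelled out here, so the steps below (lowercasing, dropping punctuation, and, for fingerprints, sorting and de-duplicating words in the style of OpenRefine's fingerprint method) are assumptions.

```python
import re


def normalized_token(value: str) -> str:
    """Lowercase and drop everything that is not a letter or digit."""
    return re.sub(r"[\W_]+", "", value.lower())


def fingerprint_token(value: str) -> str:
    """Lowercase, drop punctuation, then sort and de-duplicate the words."""
    words = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return " ".join(sorted(set(words)))


for name in ("Reese, Terry", "Terry Reese"):
    print(name, "->", normalized_token(name), "|", fingerprint_token(name))
```

With normalized tokens the two forms stay distinct because word order is preserved; with fingerprint tokens word order is discarded, so both forms collapse to the same key.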
Clustering Changes
Clustered changes are queued and
stacked. Changes happen once all
edits have been set.
Clustered changes can be made by
group, across groups, or selected
items within a group
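The idea of queuing changes and applying them together can be sketched as a small class that records edits and touches the data only when asked. The class name `ChangeQueue`, its methods, and the rule that a later edit to the same value replaces an earlier one are all assumptions for illustration; the slides do not say how MarcEdit resolves stacked edits.

```python
from typing import Dict, List, Tuple


class ChangeQueue:
    """Collect value replacements and apply them in a single pass."""

    def __init__(self) -> None:
        self._pending: List[Tuple[str, str]] = []

    def queue(self, old_value: str, new_value: str) -> None:
        # Nothing is changed yet; the edit is only recorded.
        self._pending.append((old_value, new_value))

    def apply(self, values: List[str]) -> List[str]:
        mapping: Dict[str, str] = {}
        for old, new in self._pending:
            # Assumption: a later edit to the same value wins.
            mapping[old] = new
        self._pending.clear()
        return [mapping.get(v, v) for v in values]


values = ["Reese, Terry", "Reese, T.", "Terry Reese", "Melville, Herman"]
changes = ChangeQueue()
changes.queue("Reese, T.", "Reese, Terry")
changes.queue("Terry Reese", "Reese, Terry")

updated = changes.apply(values)
print(updated)
```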
Additional Clustering Enhancements
Clustering support works on non-MARC data as well
◦ Specifically, tab delimited formatted data
◦ Supports the extraction of data based on the clusters selected
(rather than changing data)
Questions

MarcEdit Shelter-In-Place Webinar 4: Merging, Clustering, and Integrations…oh my!

  • 1. We’ll start at the top of the ½ hour. As a reminder, this session will be recorded (please keep videos off since I am recording today).
  • 3. Topics: This may be a bit of an eclectic session, as I’m interested in covering a range of topics that are coming up during this period of remote work: merging records; integrations (OCLC and Alma); building clustering tools; moving MARC data in and out of OpenRefine.
  • 4. Merge Records Tool: Allows users to merge MARC data from two files. Allows users to merge unique data, selected data, and all data.
  • 5. Merge Records Tool: Merge Records functions in two modes. 1. Merging data across two files ◦ Use Case: Merging one or more fields from one file into another based on a match point. 2. Merging duplicate data into a single record set ◦ Use Case: Source file has multiple duplicates or versions of records. The goal is to merge all like records into one using a match point.
  • 6. Merging Records: Merging Data Across Two Files ◦ This mode is activated when the Source File and Merge File are different.
  • 7. Merging Records: Merging Data: Consolidating data in the same file ◦ This mode is activated when the Source File and Merge File are the same.
  • 8. Merging Records: How merging occurs ◦ Matches occur by setting the Identifier ◦ Identifier is either a field or field + subfield ◦ The Identifier can list multiple fields, delimited by a pipe. For example: 001|901$a ◦ What if there is no common match point? ◦ You can use the MARC21 Option ◦ The Unicode Encoded option changes how field data is evaluated. By default, values are evaluated as binary data, but by selecting this option, data will be evaluated as characters. This is important when working with multibyte languages.
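To make the match-point idea concrete, here is a minimal Python sketch of merging across two files. It is not MarcEdit's code: records are modelled as plain dictionaries keyed by "field" or "field$subfield", the names `match_key` and `merge_across_files` are hypothetical, and the sketch assumes that a pipe-delimited identifier such as 001|901$a is combined into a single composite key (MarcEdit's actual handling of multiple match fields may differ).

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a record is a dict mapping "field" or "field$subfield" to its values.
Record = Dict[str, List[str]]


def match_key(record: Record, identifier: str) -> Optional[Tuple[str, ...]]:
    """Build a composite key from a pipe-delimited identifier such as '001|901$a'."""
    parts = []
    for tag in identifier.split("|"):
        values = record.get(tag)
        if not values:
            return None  # a record missing any part of the key cannot match
        parts.append(values[0].strip())
    return tuple(parts)


def merge_across_files(source: List[Record], merge: List[Record],
                       identifier: str, fields: List[str]) -> List[Record]:
    """Copy `fields` from matching records in `merge` into copies of `source` records."""
    lookup: Dict[Tuple[str, ...], Record] = {}
    for rec in merge:
        key = match_key(rec, identifier)
        if key is not None:
            lookup.setdefault(key, rec)  # first record seen for a key wins

    merged = []
    for rec in source:
        out = {tag: list(vals) for tag, vals in rec.items()}  # leave the input untouched
        donor = lookup.get(match_key(rec, identifier))
        if donor is not None:
            for tag in fields:
                for value in donor.get(tag, []):
                    if value not in out.setdefault(tag, []):
                        out[tag].append(value)  # add only data not already present
        merged.append(out)
    return merged
```

Calling `merge_across_files(source, merge, "001|901$a", ["650$a"])` copies subject values into every source record whose 001 and 901$a both match a record in the merge file. Consolidating duplicates within one file would instead group records by the same key.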
  • 10. How MarcEdit Works with Alma. MarcEdit works through the following API endpoints: • https://developers.exlibrisgroup.com/alma/apis/bibs • Because the API is rate limited (i.e., you can only process so many transactions concurrently through the API, and all Alma operations use the API), MarcEdit limits API processes to a single thread. It takes a little longer, but eliminates the possibility that using MarcEdit to automate workflows will bring down your system because the tool is trying to communicate with the system too quickly. With this API, MarcEdit can: • Edit holdings data (and Holdings Records) • Create and update bibliographic data • Extract records, though discovery should be done via Z39.50 or SRU (which is preferred)
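The single-thread design can be illustrated with a short Python sketch. This is not MarcEdit's implementation and it makes no real Alma calls: `send` stands in for whatever function issues one API request, and the `min_interval` pause between requests is an added assumption (the slide only says that processing is limited to one thread).

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_sequentially(items: Iterable[T], send: Callable[[T], R],
                         min_interval: float = 0.0,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> List[R]:
    """Send items one at a time so that only one request is ever in flight.

    `send` is a hypothetical callable that performs a single API request.
    `min_interval` (an assumption, not from the slides) spaces out request starts.
    """
    results: List[R] = []
    last_start = None
    for item in items:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)  # pause before starting the next request
        last_start = clock()
        results.append(send(item))  # blocks until this request finishes
    return results
```

Because each call to `send` completes before the next begins, concurrency against the API never exceeds one, which is the trade of speed for safety described on the slide.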
  • 11. Working with OCLC Connexion https://youtu.be/a7Cen0gxFCw?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg
  • 12. Call Number Classifications: Integrates with OCLC Classify Service ◦ Provides LC or Dewey numbers ◦ Processes data based on OCLC Number, ISBN, ISSN, or Author/Title pair ◦ Generates call numbers based on the most widely used call number
  • 13. Working with OCLC’s Metadata API MARCEDIT CAN WORK DIRECTLY WITH WORLDCAT VIA THE METADATA API.
  • 16. MarcEdit has OpenRefine Options EXPORT AND IMPORT DATA FROM MARC INTO FORMATS OPENREFINE CAN READ AND UNDERSTAND
  • 17. Cluster Format Support. MARC support • The tool provides direct manipulation of MARC data, with clustering options to edit at the field or subfield level, with clustering criteria set at the field/subfield level • MARC data processing supports any character encoding. Delimited Data Support • New as of 3/2/2018 (on Windows/Linux; Mac coming) • Allows for indexing by column • Initial implementation limits editing to files with 100 or fewer columns • Delimited formats supported: • Tab Delimited • Comma Delimited
  • 18. Why not just use OpenRefine? If you know how to use OpenRefine, please use it • OpenRefine provides a much wider and richer set of functionality and is designed specifically to help users deal with data issues found within largely unstructured data. But MARC is a pain • While OpenRefine has some MARC importing tools, the tooling specific to MARC isn’t particularly good or intuitive • OpenRefine has some practical data (file size) limits when run on a desktop/client. MarcEdit’s clustering support was built with catalogers in mind • To make working with library formats easy • To minimize the need to migrate data between different formats (and risk losing information in the conversion process)
  • 19. Clustering in MarcEdit 7: MarcEdit’s built-in clustering tools support native grouping and batch editing and work well on files of a million records or fewer (they can work on larger sets, but the larger the file, the longer the cluster operation takes)
  • 20. Clustering Options. Clustering Algorithms: • Levenshtein Distance • This algorithm is best for people, places, and subjects • This algorithm builds clusters based on the number of positions/characters that differ between words or phrases • This algorithm is generally faster • Composite Coefficient • This algorithm is best for highly variable data where a great deal of fuzziness is desired. Token Types: • Default – normalized data tokens • Reese, Terry and Terry Reese become reeseterry and terryreese • Works best for spelling errors, form changes • Fingerprint tokens • Reese, Terry and Terry Reese both become reese terry • Works best when you are unsure of the quality of your data
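The two token types and the Levenshtein comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MarcEdit's algorithm: the function names are hypothetical, the normalization rules (lowercase, drop punctuation; for fingerprints also sort and de-duplicate words) are inferred from the Reese, Terry example on the slide, and the greedy grouping with a `max_distance` threshold is a simplification. The Composite Coefficient option is not covered.

```python
import re
from typing import Callable, List, Tuple


def normalized_token(value: str) -> str:
    """Default token: lowercase with punctuation and spaces removed."""
    return re.sub(r"[\W_]+", "", value.lower())


def fingerprint_token(value: str) -> str:
    """Fingerprint token: lowercase words, de-duplicated, sorted, space-joined."""
    words = re.findall(r"[^\W_]+", value.lower())
    return " ".join(sorted(set(words)))


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cluster(values: List[str], tokenize: Callable[[str], str],
            max_distance: int = 0) -> List[List[str]]:
    """Greedy grouping: join the first cluster whose token is within max_distance."""
    clusters: List[Tuple[str, List[str]]] = []
    for value in values:
        token = tokenize(value)
        for representative, members in clusters:
            if levenshtein(representative, token) <= max_distance:
                members.append(value)
                break
        else:
            clusters.append((token, [value]))
    return [members for _, members in clusters]
```

With fingerprint tokens, "Reese, Terry" and "Terry Reese" collapse to the same token and cluster together even at distance 0. With the default tokens they stay apart, while a misspelling such as "Reese, Terri" still joins "Reese, Terry" at distance 1.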
  • 21. Clustering Changes Clustered changes are queued and stacked. Changes happen once all edits have been set. Clustered changes can be made by group, across groups, or selected items within a group
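One way to picture queued changes is a small Python sketch in which edits are recorded first and applied in a single pass at the end. The `ChangeQueue` class is hypothetical, and the rule that a later edit to the same value overrides an earlier one is an assumption about what "stacked" means here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ChangeQueue:
    """Collects (old value, new value) edits and applies them together."""
    pending: List[Tuple[str, str]] = field(default_factory=list)

    def queue(self, old: str, new: str) -> None:
        """Record an edit without touching any data yet."""
        self.pending.append((old, new))

    def apply(self, values: List[str]) -> List[str]:
        """Apply all queued edits in one pass, then empty the queue."""
        mapping: Dict[str, str] = {}
        for old, new in self.pending:
            mapping[old] = new  # assumption: a later edit to the same value wins
        self.pending.clear()
        return [mapping.get(value, value) for value in values]
```

Nothing in the data changes until `apply` is called, which mirrors the slide's point that changes happen once all edits have been set.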
  • 22. Additional Clustering Enhancements Clustering support works on non-MARC data as well ◦ Specifically, tab delimited formatted data ◦ Supports the extraction of data based on the clusters selected (rather than changing data)