This beginner's video gives users who are new to MarcEdit's regular expression syntax a primer and examples of how to use the language. It covers strategies, resources, and some useful hints to help people get started.
These slides accompany a YouTube video, which is available at: https://youtu.be/7YXvS4xBEfw
Getting Started with Regular Expressions in MarcEdit
1. Getting Started with
Regular Expressions
in MarcEdit
TERRY REESE
HEAD OF DIGITAL INITIATIVES, THE OHIO STATE
UNIVERSITY
2. Topics
MarcEdit Regular Expression Support Information
Understanding .NET Regular Expressions
◦ Major components of the language
◦ Understanding grouping mechanisms and references
How MarcEdit Implements Expressions
Getting Regular Expression Help
3. MarcEdit Regular Expression
Support
Functions that presently support regular expressions
◦ Delete Field
◦ Edit Field
◦ Copy Field
◦ Swap Field
◦ Build New Field
◦ Extract/Delete Records
◦ Validation Processing
◦ Linked Data tooling
◦ More…
4. MarcEdit Regular Expression
Support
When processing regular expressions with MarcEdit, MarcEdit makes
entire fields or subfields available for processing
◦ i.e., when processing a delete field function, all data from =[field number]
onward is part of the field that can be queried.
MarcEdit’s regular expression engine by default deals with one field at a time
(i.e., regular expressions do not allow you to find data across fields by
default)
MarcEdit’s Regular Expression Support is defined by Microsoft .NET’s
Regular Expression object
◦ This object uses a syntax that looks Perl-like, but has some differences.
5. Microsoft’s Regular Expression
language
Concepts:
◦ Character escapes
◦ Anchors
◦ Character classes
◦ Grouping
◦ Quantifiers
◦ Substitutions
MSDN Documentation: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
PDF Quick Reference:
http://download.microsoft.com/download/D/2/4/D240EBF6-A9BA-4E4F-A63F-AEB6DA0B921C/Regular%20expressions%20quick%20reference.pdf
6. How we use Regular
Expressions in MarcEdit
The most important parts of the regular expression language are:
1. Character escapes: \d \r \n \$ \x##
2. Character Classes [] & [^]
3. Grouping Elements ()
4. Anchors: ^$
5. Quantifiers: *?+{#}
6. Substitutions: $#
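The six components listed above can be tried out in a few lines. This is a minimal sketch in Python's `re` module, used only as a stand-in: MarcEdit itself uses the .NET engine, and the sample field is hypothetical. The pattern syntax shown is shared by both engines, but group references in replacements are written `\1` in Python and `$1` in .NET and MarcEdit.

```python
import re

# Hypothetical mnemonic-format field, used only for illustration.
field = "=245  10$aOhio road atlas /$cOhio Dept. of Transportation."

# 1. Character escape: \$ matches a literal "$" (the subfield delimiter).
subfield_codes = re.findall(r"\$(.)", field)

# 2. Character class: [^$] matches any single character except "$".
# 3. Grouping: the parentheses capture the text of subfield a.
subfield_a = re.search(r"\$a([^$]*)", field).group(1)

# 4. Anchors: ^ is the start of the text and an unescaped $ is the end.
ends_with_period = re.search(r"\.$", field) is not None

# 5. Quantifier: {3} requires exactly three digits after the "=".
tag = re.search(r"^=([0-9]{3})", field).group(1)

# 6. Substitution: Python writes the group reference as \1 (MarcEdit: $1).
retagged = re.sub(r"^=245(.*)$", r"=246\1", field)

print(subfield_codes, repr(subfield_a), ends_with_period, tag)
print(retagged)
```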
7. How Expressions Manifest in
MarcEdit
Part of understanding regular expressions in
MarcEdit is understanding what data is exposed to
the regular expression engine.
Each of MarcEdit’s global edit functions sees a different
level of data
This is important to understand when:
Creating processing strategies
Knowing which global editing function to choose
9. Replace Function
Provides:
Access to all field data
Can be processed across fields
(lines)
Can do preconditional
sorting/evaluation before
evaluating for replacement (can
search for data in one field, and
then perform an action on
another if true)
Provides most access to record
data for evaluation
11. Add/Delete Function
Provides:
Access to all field data from the
equal sign to end of line
No option to evaluate across fields
Only available when deleting data
13. Edit Field Data Function
Provides:
Access to all data after the indicators
(no access to the field tag or indicators)
Can be used to break up fields into
new fields and do recursive
searching
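As a rough illustration of what "levels of data" means, the sketch below slices one field the way the slides describe each function seeing it. It is written in Python and is not MarcEdit code. It assumes the mnemonic layout of "=", a three-digit tag, two spaces, and two indicators before the subfield data, and the sample field is hypothetical.

```python
import re

# Hypothetical field; assumed layout: "=" + tag + two spaces + two indicators + data.
field = "=245  10$aOhio road atlas /$cOhio Dept. of Transportation."

# Replace and Add/Delete are described as seeing the field from "=" to end of line.
whole_field = field

# Edit Field Data is described as seeing only what follows the indicators.
PREFIX_LENGTH = len("=245") + 2 + 2  # tag, two spaces, two indicators
field_data = field[PREFIX_LENGTH:]

# A pattern anchored on the tag only works where the tag is exposed.
tag_in_whole = re.search(r"^=245", whole_field) is not None
tag_in_data = re.search(r"^=245", field_data) is not None
starts_with_subfield = re.search(r"^\$a", field_data) is not None

print(field_data)
print(tag_in_whole, tag_in_data, starts_with_subfield)
```

The practical consequence is that an expression written for one function may need its leading `=245.{4}` portion removed or added before it works in another.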
16. Regular Expression Basics
I like to think of regular expressions the same way I think of
diagramming a sentence.
http://www.english-grammar-revolution.com/images/puzzler_words_october_2012.jpg
17. Regular Expression Basics
I try to look at the data I want to replace and break it into its
component parts. For example, suppose I wanted to add a period to the 500 if
it is missing.
Source Fields:
=500  \\$aPrime meridians: Greenwich and Washington
=500  \\$aPrime meridians: Greenwich and Washington?
Structure:
Expression: (=500.*[^\W])$
18. Examples
Looking at example.txt using the replace function:
◦ Add a period to the 500 if it is missing
◦ Add a $h of [cartographic resource] between the $a and $c.
◦ Split the 856 into two fields, breaking on the $u.
19. Example 1
◦ Add a period to the 500 if it is missing
◦ Find What: (=500.*[^\W])$
◦ Replace With: $1.
Explanation:
◦ (=500.*[^\W])$
◦ Searches for the 500, then matches all data in the line up to the final character. It
then checks that the final character is a word character (a letter, digit, or underscore), which means the field does not already end in punctuation.
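The expression can be checked outside MarcEdit with a short script. This is a sketch in Python's `re` module rather than the .NET engine MarcEdit uses. The pattern is the same, but the replacement is written `\1.` where MarcEdit uses `$1.`. The second sample field (already ending in a period) is an assumption added for contrast.

```python
import re

# Two sample 500 fields in mnemonic format; \\ stands for two blank indicators.
fields = [
    r"=500  \\$aPrime meridians: Greenwich and Washington",
    r"=500  \\$aPrime meridians: Greenwich and Washington.",
]

# [^\W] is "not a non-word character", so the last character must be a word character.
pattern = re.compile(r"(=500.*[^\W])$")

# Python's \1 plays the role of MarcEdit's $1.
fixed = [pattern.sub(r"\1.", field) for field in fields]

for line in fixed:
    print(line)
```

Only the first field changes. The second ends in punctuation, so the pattern does not match and the field is left alone.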
20. Example 2
◦ Add a $h of [cartographic resource] between the $a and $c.
Find What: (=245.{4})(\$a.*)(/.*)
◦ (=245.{4})
◦ Match the 245 field tag, accepting any value in the next 4 characters (the spacing and indicators).
◦ (\$a.*)
◦ Select everything within the subfield a
◦ (/.*)
◦ Select the / value and the subfield c (and other data)
Replace With: $1$2$$h[cartographic resource] $3
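A quick way to see what each group captures is to run the expression against a sample field. This sketch uses Python's `re` as a stand-in for the .NET engine, and the 245 field is hypothetical. In a .NET replacement `$$` produces a literal `$`, while in a Python replacement a single `$` is already literal and groups are written `\1`, `\2`, `\3`.

```python
import re

# Hypothetical 245 field in mnemonic format.
field = "=245  10$aOhio road atlas /$cOhio Dept. of Transportation."

pattern = re.compile(r"(=245.{4})(\$a.*)(/.*)")

# Show what each of the three groups captured.
match = pattern.search(field)
groups = match.groups()

# Python equivalent of the MarcEdit replacement $1$2$$h[cartographic resource] $3
updated = pattern.sub(r"\1\2$h[cartographic resource] \3", field)

print(groups)
print(updated)
```

Because `.*` is greedy, the second group runs to the last `/` in the field. If the subfield c itself contains a slash, the $h lands before that later slash, so it is worth testing against a few real records first.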
21. Example 3
Split the 856 into two fields, breaking on the $u.
◦ Find What: (=856.{4})(\$u.*[^$])(\$u.*)
◦ (=856.{4})
◦ Matches the 856 field
◦ (\$u.*[^$])
◦ Match $u, but stop at the end of the subfield
◦ (\$u.*)
◦ Match remainder of field
◦ Replace With: $1$2\n=856  41$3
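The split can be sketched the same way. This example uses Python's `re` rather than MarcEdit's .NET engine, the URLs are placeholders, and the replacement assumes the mnemonic layout of two spaces between the tag and the indicators. Python writes the group references as `\1`, `\2`, `\3` and also interprets `\n` in the replacement as a line break.

```python
import re

# Hypothetical 856 field carrying two $u subfields.
field = "=856  41$uhttp://example.org/a$uhttp://example.org/b"

pattern = re.compile(r"(=856.{4})(\$u.*[^$])(\$u.*)")

# Python equivalent of the MarcEdit replacement: groups 1 and 2, a newline,
# then a new 856 tag with indicators 41 followed by group 3.
split_fields = pattern.sub(r"\1\2\n=856  41\3", field)

print(split_fields)
```

With three or more $u subfields, the greedy `.*` leaves only the last $u in the new field, so the replacement would need to be run again to separate the rest.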
22. Lcase/ucase
MarcEdit’s regular expression engine includes two extension functions for
dealing with case switching of characters.
◦ lcase & ucase
◦ Usage: (=450.{4})(\$a.)(.*)
◦ $1$2lcase($3)
◦ Example: Find the 500 with all upper case characters and convert the case of
all values but the first letter in the sentence to lower case.
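`lcase` and `ucase` are MarcEdit extensions, so other regular expression engines do not have them. The sketch below only emulates the effect of `$1$2lcase($3)` in Python by passing a function as the replacement. It uses the 500 field named in the example description (the usage line shows a 450), and the sample note is hypothetical.

```python
import re

# Hypothetical all-caps note; \\ stands for two blank indicators.
field = r"=500  \\$aTHIS NOTE IS IN UPPER CASE."

pattern = re.compile(r"(=500.{4})(\$a.)(.*)")

def lowercase_third_group(match):
    # Keep groups 1 and 2 as matched; lower-case everything in group 3.
    return match.group(1) + match.group(2) + match.group(3).lower()

converted = pattern.sub(lowercase_third_group, field)

print(converted)
```

The second group takes `$a` plus exactly one character, which is what protects the first letter of the sentence from the case change.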
23. Multi-Field Replacements
By default, MarcEdit handles one field at a time when doing regular
expressions.
◦ However, when you need to evaluate multiple fields, you can do so
by adding /m to the end of your replacement in the Replace Function in the
MarcEditor
◦ This is a special function added to the MarcEdit regular expression engine
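The difference between field-at-a-time and multi-field evaluation can be illustrated in general terms. This sketch shows only the idea, in Python. It does not reproduce how MarcEdit implements /m, and the two-field record is hypothetical.

```python
import re

# Hypothetical record fragment, one field per line.
record_lines = [
    "=245  10$aOhio road atlas.",
    r"=500  \\$aScale not given.",
]

# A pattern that spans two fields needs the line break between them.
pattern = re.compile(r"(=245.*)\n(=500.*)")

# Field at a time: no single field contains a line break, so nothing matches.
per_field_hits = [pattern.search(line) is not None for line in record_lines]

# Whole record: the same pattern matches once the fields are presented together.
record_text = "\n".join(record_lines)
whole_record_match = pattern.search(record_text)

print(per_field_hits)
print(whole_record_match.group(2))
```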
24. Delete Field Function
The delete field function exposes all the data in the field to be acted
upon as a regular expression.
◦ i.e. =856 .*
◦ So the first value in the Delete Field evaluation is an =, not the subfield data
◦ The reason to do this is to allow for explicit evaluations of indicators.
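Because the evaluation starts at the `=`, a pattern can test the indicators before it ever reaches the subfield data. A minimal sketch in Python, with hypothetical 856 fields and an assumed layout of two spaces between tag and indicators, that drops only the fields whose second indicator is 2:

```python
import re

# Hypothetical 856 fields that differ only in the second indicator.
fields = [
    "=856  40$uhttp://example.org/a",
    "=856  42$uhttp://example.org/b",
]

# "=856", two spaces, any first indicator, then a second indicator of 2.
delete_pattern = re.compile(r"^=856  .2")

# Keep every field the pattern does not match.
kept = [field for field in fields if not delete_pattern.search(field)]

print(kept)
```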
25. Getting Regular Expression
Help
The MarcEdit Listserv has a number of regular expression experts who
provide a lot of help to users looking for it
http://metis3.gmu.edu/cgi-bin/wa?A0=MARCEDIT-L