1. Reusing Collection Metadata as Data
Mapping the Spanish Mission Landscape Workshop
March 2, 2019 | University of Texas at Austin
Presentation by: Itza Carbajal, Latin American Metadata Librarian
2. who creates metadata?
● WHO DOESN’T is the real question
● Individuals
○ Tagging of photos, file naming, project contributions
● Information Science professionals (librarians, archivists, database managers, etc.)
○ Cataloging book records
○ Access mechanisms such as finding aids, online repositories, CMS
○ Databases
● Mixed media creators
○ Film production, photography, software developers, music producers
● Publishing
○ Publication agencies, writers working with digital materials, illustrators
4. what types of metadata are typically captured?
Administrative
Metadata used in managing and administering collections and information
resources
Descriptive
Metadata used to identify and describe collections and related information
resources
Technical
Metadata related to how a system functions or how metadata behaves
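One way to picture the three types is as separate groups of fields attached to a single item. The sketch below is only an illustration: the `ItemMetadata` class, the `flatten` helper, and every field name and value in it are hypothetical and do not come from any particular standard or collection.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ItemMetadata:
    """Hypothetical grouping of one item's metadata by type."""
    administrative: dict = field(default_factory=dict)
    descriptive: dict = field(default_factory=dict)
    technical: dict = field(default_factory=dict)


# Made-up example values, for illustration only.
record = ItemMetadata(
    administrative={"rights": "Public domain", "holding_institution": "Example Library"},
    descriptive={"title": "Map of a mission site", "subject": ["Missions", "Maps"]},
    technical={"file_format": "image/tiff", "width_px": 6000},
)


def flatten(item):
    """Return one flat dict whose keys are prefixed by metadata type."""
    flat = {}
    for group, values in asdict(item).items():
        for key, value in values.items():
            flat[f"{group}.{key}"] = value
    return flat


flat_record = flatten(record)
```

Flattening with a type prefix (for example `descriptive.title`) keeps the three groups distinguishable when the record is later exported to a spreadsheet.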
5. re-purpose metadata for digital scholarship
● Classroom Instruction
○ Discovery and deep group discussions
● Layered Analysis
○ Geographic Information systems
● In-depth searchability
○ Transcription
6. capturing metadata
Scribe - an open source framework for community transcription,
built by NYPL Labs in collaboration with Zooniverse
Scraper - gets data out of web pages and into spreadsheets
Optical Character Recognition (OCR) technologies -
including programs like Google Drive, Tesseract or Adobe
Acrobat that can detect text to make it searchable/readable
*Rate of accuracy varies, and access to affordable software is not consistent
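The idea of getting data out of web pages and into spreadsheets can be sketched with the Python standard library alone. This is not how the Scraper tool itself works; it is a minimal illustration that assumes the data sits in a simple HTML table. The `TableScraper` class, the `html_table_to_csv` function, and the sample HTML are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Convert the table rows found in an HTML string into CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample page fragment, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>Mission plan</td><td>1772</td></tr>
</table>
"""
csv_text = html_table_to_csv(SAMPLE_HTML)
```

The resulting CSV text can be saved to a file and opened in any spreadsheet program.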
7. accessing existing metadata
The Digital Public Library of America (DPLA) open API enables people to use
millions of records describing cultural heritage resources held by
institutions across the US.
Flickr has over 5 billion photos with valuable metadata such as tags,
geolocation, and Exif data
Europeana provides access to over 50 million digitised items – books,
music, artworks and more from thousands of European archives, libraries
and museums
HathiTrust Digital Library has more than 2 million volumes that are in the
public domain and freely viewable on the Web
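As a rough illustration of what using an open API such as DPLA's looks like, the sketch below only builds a search URL and does not send a request. It assumes the DPLA v2 items endpoint and its `q`, `page_size`, and `api_key` query parameters; the `build_dpla_query` helper and the placeholder key are hypothetical, and a real key must be requested from DPLA.

```python
from urllib.parse import urlencode

# Assumed endpoint for item searches in version 2 of the DPLA API.
DPLA_ITEMS_ENDPOINT = "https://api.dp.la/v2/items"


def build_dpla_query(search_terms, api_key, page_size=10):
    """Return a search URL for DPLA item records (no request is made)."""
    params = {"q": search_terms, "page_size": page_size, "api_key": api_key}
    return f"{DPLA_ITEMS_ENDPOINT}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_dpla_query("spanish missions", api_key="YOUR_API_KEY")
```

The URL could then be fetched with any HTTP client, and the JSON response inspected for the descriptive fields of each record.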
8. analyzing metadata
Map Warper, built by NYPL Labs, is a tool suite used to align (or "rectify")
historical maps to the digital maps of today.
Gephi is open-source software for network visualization and analysis of
data sets to summarize their main characteristics, often with visual
methods.
MALLET is a Java-based package for statistical natural language
processing, document classification, clustering, topic modeling,
information extraction, and other machine learning applications to text.
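Before a tool like Gephi can visualize a network, the metadata has to be expressed as nodes and edges. The sketch below shows one possible approach under stated assumptions: it links subject terms that appear on the same record and writes a CSV edge table with `Source`, `Target`, and `Weight` columns, which Gephi's spreadsheet import can read. The records, field names, and helper functions are hypothetical.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Made-up records, for illustration only.
records = [
    {"id": "a", "subjects": ["Missions", "Maps", "Texas"]},
    {"id": "b", "subjects": ["Missions", "Texas"]},
    {"id": "c", "subjects": ["Maps"]},
]


def cooccurrence_edges(records):
    """Count how often each pair of subjects appears on the same record."""
    weights = Counter()
    for record in records:
        # Sorting makes (A, B) and (B, A) count as the same undirected pair.
        for pair in combinations(sorted(set(record["subjects"])), 2):
            weights[pair] += 1
    return weights


def edges_to_csv(weights):
    """Write the weighted pairs as a CSV edge table."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return out.getvalue()


edges = cooccurrence_edges(records)
edge_csv = edges_to_csv(edges)
```

Subjects that share many records end up with heavier edges, which is one simple way to surface clusters of related topics in a collection.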
9. manipulating metadata
OpenRefine - clean up messy or inconsistent data
Data Wrangler - used to merge, delete, autofill, fill in missing
data, incorporate data from another source, and move
information in your set.
Data Science Toolkit - a set of open-source tools for data
science information transformation needs
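To give a sense of what cleaning up inconsistent data involves, the sketch below groups spelling variants of the same value by a normalized key. It is loosely modeled on the "fingerprint" key-collision idea used for clustering in OpenRefine, but it is a simplified stand-in written for illustration, not OpenRefine's own code (for instance, it does not normalize accented characters). The function names and sample values are hypothetical.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Normalize a string: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster_variants(values):
    """Group values sharing a fingerprint; keep only groups with real variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: members for key, members in groups.items() if len(set(members)) > 1}


# Made-up place names with inconsistent formatting, for illustration only.
names = ["Austin, Texas", "texas austin", "Austin Texas ", "San Antonio"]
clusters = cluster_variants(names)
```

Each cluster lists values that probably refer to the same thing, leaving it to a person to choose the preferred form.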