Welcome, and thank you for attending our session today. I’m Rebekah Cummings…
As you can probably tell from our title, today we are going to talk about the intersection between digital humanities and libraries, looking at three things:
1. How libraries support digital humanities, and some of the ways we are natural collaborators in this endeavor;
2. How libraries are engaging in DH projects, through case studies of several interesting DH projects happening in libraries across the country; and
3. Some emerging trends at the intersection of libraries and DH that draw on the expertise of both fields.
Because this is a diverse group today, I don’t want to assume that everyone here knows exactly what happens in libraries. As researchers, I’m sure you all know that the library is where you go to find valuable resources, but you may not know that in addition to working in a service profession, librarians at academic libraries also conduct their own research in the field of library and information science.
Librarians also have strong professional ethics around promoting access to information, universal education, and public service.
In her presentation “Why the Digital Humanities?”, Lisa Spiro gives five motivations for engaging in digital humanities work. [read list] When you think about the role of libraries and the ethics of librarians, it’s easy to see the natural affinity between libraries and the digital humanities.
There are many ways that libraries can support, and have supported, the digital humanities. Anna and I will look more closely at some of the items on this list throughout the presentation, but there is one point I want to emphasize with this slide. The most obvious role of the library is as a physical space where DH projects can happen. We are usually large buildings centrally located on campus, and we are a neutral space, which suits the interdisciplinary nature of DH work. Beyond physical space, however, the library has a host of services and expertise to offer DH projects.
Libraries have a long history of digitization. Examples include Chronicling America, a huge newspaper project; HathiTrust, which supports large-scale text mining of books; and the Utah American Indian Digital Archive, a large collaborative project between the Native American tribes of Utah, the University of Utah Marriott Library, the Utah Department of Heritage and Arts, and the American West Center that has produced a regional history portal.
The J. Willard Marriott Library alone hosts over 2 million digital photographs, newspaper pages, maps, books, audio recordings, and other items, and there are many more great digitization collaborators here in Utah.
Metadata, or descriptive information about an item, is what allows you to search for it. If you digitize something, you need metadata to make it discoverable and useful. Librarians are used to creating metadata in a variety of formats, which makes them ideal collaborators for digital humanities projects that require wrangling information.
Valley of the Shadow is an early digital history site: an archive of primary source documents from Civil War communities. Started in the 1990s, it needed major preservation and reformatting work in 2009, which took two years to complete.
Many DH projects involve building tools or developing online exhibits, archives, or visualizations.
Not everything that we place on the web has to last forever, but preservation is always worth considering when starting a new project. Librarians may already be engaged in large-scale digital preservation and may already have infrastructure in place that can help.
Challenge #1: Evolving standards and best practices. We might have digitized something five years ago and now realize we need to do more work in a certain area, such as including geospatial information in our metadata so that items can easily be browsed on a map. Library practices also follow trends: in descriptive metadata, for example, there was a movement toward user-based tags, or folksonomies, that is now shifting back toward controlled vocabularies. When is enough enough? At the Marriott Library we are currently engaged in a large-scale review of our digital collections metadata (over 2 million items) to spot areas for improvement.
It’s not just standards and best practices that have changed over time. There have been significant technical improvements since libraries started digitizing in the mid-to-late 1990s. One common feature of digitization is optical character recognition, or OCR: the conversion of images of typed, printed, or handwritten text into machine-encoded text that scholars can search. OCR has steadily improved, with lower error rates and a better ability to handle irregular text. New projects are emerging, such as eMOP (the Early Modern OCR Project), which is attempting to turn early printed texts, with their variable baselines and fonts, into OCR’d text that can be mined for DH projects.
These rapid technological improvements do present a challenge for libraries, though. Over the years, we’ve scanned a lot of material with poor OCR. At what point do you rescan? And how much work should we do to clean up OCR so that it is completely free of mistakes?
The last challenge we are going to mention today, given our limited time, is anticipating future uses of our digitized collections. We are already looking back at past digitization projects and thinking, “If only we had done it this way, it would have been so much better!” For instance, many of our rights statements don’t allow for public and flexible use of digital collections. Much of our data is locked up in proprietary, vendor-based repository systems that don’t let people get at our data easily. Only recently have our deeds of gift begun to specify how donated materials can be shared online. And as Anna mentioned, some of the ways we captured information, such as geographic coverage, didn’t account for the GIS work that DH projects often require. At what point do we stop creating metadata, knowing that our time and resources are finite?
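One common way to put a number on “bad OCR” when deciding whether to rescan or clean up is the character error rate (CER): the edit distance between the OCR output and a hand-corrected transcription of the same passage, divided by the length of the transcription. Here is a minimal sketch under our own assumptions; the 5% cutoff and the sample strings are hypothetical, not an established standard.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = cur
    return prev[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance normalized by the length of the hand-corrected reference text."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


# Hypothetical cutoff for flagging a page for rescanning or manual cleanup.
RESCAN_THRESHOLD = 0.05


def needs_attention(ocr_text: str, ground_truth: str) -> bool:
    return character_error_rate(ocr_text, ground_truth) > RESCAN_THRESHOLD


if __name__ == "__main__":
    sample_ocr = "Tbe Salt Lake Herald, 1S92"  # made-up OCR output
    corrected = "The Salt Lake Herald, 1892"   # made-up hand correction
    print(character_error_rate(sample_ocr, corrected), needs_attention(sample_ocr, corrected))
```

In practice a library would correct only a sample of pages by hand and use their CER to estimate the quality of a whole collection.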
We simply can’t anticipate every potential future use of our digital collections.
While we are talking about the ways libraries support DH, there is still a lot of work to be done. Some libraries have established digital humanities centers inside the library, but a recent survey showed that among libraries supporting digital humanities projects, much of that support is provided on an informal or ad hoc basis.
Note here that these are things some libraries do, but not all: “And, as my colleagues and I found when we conducted a survey through the Association of Research Libraries, with the exception of a few well-known programs, most library-based DH is being done in a very piecemeal fashion. Forty-eight percent of survey respondents described their libraries’ digital humanities support as “ad hoc” (Bryson, Varner, Pierre, & Posner, 2011, p. 16). Relatively few libraries have dedicated digital humanities centers or programs, and many existing initiatives are still in the developmental stages.” https://docs.google.com/document/d/1tch6xW7bh_vbJOzG7xYs9z6yX-ATC68x-Fg-BRVD298/edit
One example of how libraries make DH projects possible is Chronicling America. This large-scale newspaper digitization effort by the Library of Congress and the National Endowment for the Humanities has formed the basis of many digital humanities projects, such as Viral Texts, which explores reprinting patterns in early newspapers, and Image Analysis for Archival Discovery, which is developing ways to automatically track poems published in newspapers.
When this project started over 10 years ago, no one could have imagined that people would use the data in this way. At the same time, funding priorities and rights restrictions for this particular project mean that people can explore newspapers this way only up to 1922.
There are also many digital newspapers that aren’t in this database. For example, digitalnewspapers.org from the U of U has been adding many papers that aren’t in Chronicling America. There is still room for someone to do large-scale newspaper aggregation.
Another interesting case study of how libraries are experimenting in DH is the “Robots Reading Vogue” project from Yale University Library. The library purchased over 6 terabytes of data from the Vogue Archive and conducted experiments in topic modeling, n-gram search, and colorimetric analysis. The research looked at over 2,800 covers and 440,000 pages of Vogue, which of course hold many stories about the way we think as a culture about beauty, women, race and gender, and a host of other topics.
Here is just one example of text mining on the Vogue data. You can see here the frequency of the descriptive words lovely, pretty, beautiful, and sexy. One of the most interesting things I see is that the word “sexy” was almost nonexistent in Vogue until the 1960s and spiked in the 1970s. By 2000 “sexy” had surpassed “lovely” and “pretty” in sheer usage, although its usage has dipped in the new millennium and “pretty” has made a bit of a comeback. Apparently the word “beautiful” has always been in fashion.
What I love about this example is not just that it’s a really cool DH project, but that it highlights another role libraries can play: acquiring datasets for large, expensive projects. While I don’t know the exact cost of the Vogue dataset, it was in the five figures. Although our collection budgets are far from limitless, I think it’s entirely possible that in the future libraries will play a larger role in acquiring and stewarding datasets for DH projects.
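The Vogue word chart described above is, at its core, a relative word-frequency count per time period. A minimal sketch of that computation, assuming the corpus is available as hypothetical (year, text) pairs and using a simple regular-expression tokenizer; the Robots Reading Vogue team’s actual pipeline is not described here and may well differ.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_by_decade(docs, targets):
    """Return {decade: {word: occurrences per 10,000 tokens}} for each target word.

    docs: iterable of (year, text) pairs (a hypothetical input format).
    targets: words to track, e.g. ["lovely", "pretty", "beautiful", "sexy"].
    """
    counts = defaultdict(Counter)  # decade -> count of every token seen in that decade
    totals = Counter()             # decade -> total number of tokens in that decade
    for year, text in docs:
        decade = (year // 10) * 10
        tokens = tokenize(text)
        counts[decade].update(tokens)
        totals[decade] += len(tokens)
    # Normalize by decade size, since the amount of text per decade varies.
    return {
        decade: {
            word: 10_000 * counts[decade][word] / totals[decade] if totals[decade] else 0.0
            for word in targets
        }
        for decade in sorted(counts)
    }


if __name__ == "__main__":
    sample = [(1955, "A lovely hat and a pretty glove."), (1974, "Sexy, sexy boots.")]
    print(frequency_by_decade(sample, ["lovely", "sexy"]))
```

Normalizing per 10,000 tokens, rather than plotting raw counts, keeps a decade with more pages from looking as if every word became more popular.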
The next case study that we are going to look at is from a team of UCLA students taking a capstone class in the Digital Humanities. These five students worked closely with experts at the Getty Research Institute to look through GRI’s extensive database, the Getty Provenance Index, which holds 1.5 million records on the sale and transmission of artwork. Among the students’ findings were that Nazis used “code words” to designate art objects seized from Jewish families and that Roman families tended to hang artwork depicting particular subject matter in specific rooms in their houses.
What I love about this project is not only that the student group included people from information studies, art history, comparative literature, and musicology, but also that it relied heavily on data and expertise from the library, museum, and archive world.
- Metadata (linked data)
- Packaging digital library objects for digital humanists (ex: Doc South)
- Data curation and managing research assets
- Multiple access points
- MWDL/DPLA: Dan Cohen and libraries!
Linked data is a term coined by Tim Berners-Lee for a way of publishing structured data on the Semantic Web, one that provides more opportunities for use, reuse, and augmentation than the web pages we commonly use now: data can be queried, visualized, mashed up, and enhanced in new ways. One example is this map of connections for Louis Armstrong from the Linked Jazz project, which lets researchers explore transcripts of interviews and the connections between musicians.
This is another example of a library linked data project, one that showcases the relationships behind an archive of primary source documents. The data structure behind it lets you see the relationships between people, places, and groups. Clicking on any area of this map brings up primary source documents, so you can explore history in a new way, returning browsing and serendipity to an online environment.
This is a small snapshot of the variants of one person’s name that we see across many libraries in the Mountain West region. One of the first steps toward the kinds of linked data projects I was just discussing is having standardized terms and clean metadata.
A human can glance at all those name variants and realize they are all the same person, but computers aren’t as smart as we are!
We are waiting to hear back about a collaborative grant that would let a group of libraries in the Mountain West region explore processes and software for representing our information as linked data. Maybe we’ll be back next year to talk about it!
The University of North Carolina at Chapel Hill has released primary source documents marked up in TEI, so researchers can download the raw data files and explore the collection with whatever visualization tools they want to use.
This is an interesting way of presenting traditional library digital collection materials: it gives researchers direct access to all the information associated with an item, not just a scanned image and some descriptive metadata, and in effect creates a free digital humanities dataset.
This approach would take much more time than what we currently do with digital collections, but it could be a great collaborative project with scholars who are interested in learning markup!
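Under the hood, linked data describes things as subject-predicate-object statements (triples), so a question like “who is connected to this person?” becomes a query over statements rather than a search over page text. The sketch below models that idea with plain Python tuples; real projects use RDF, URIs, and query languages such as SPARQL, and every identifier and relationship here is a hypothetical placeholder, not data from Linked Jazz or any other project.

```python
from collections import deque

# Hypothetical triples: (subject, predicate, object). Real linked data would use
# full URIs (for example from an authority file) instead of these short placeholder IDs.
TRIPLES = [
    ("ex:musicianA", "ex:playedWith", "ex:musicianB"),
    ("ex:musicianB", "ex:playedWith", "ex:musicianC"),
    ("ex:musicianA", "ex:mentionedIn", "ex:interview1"),
    ("ex:musicianC", "ex:mentionedIn", "ex:interview1"),
    ("ex:musicianD", "ex:playedWith", "ex:musicianE"),
]


def neighbors(node, triples):
    """Everything linked to node by any predicate, following links in both directions."""
    linked = set()
    for s, _p, o in triples:
        if s == node:
            linked.add(o)
        elif o == node:
            linked.add(s)
    return linked


def within_hops(start, triples, max_hops):
    """Breadth-first search: {node: distance} for nodes reachable in at most max_hops links."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbors(node, triples):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]  # report only the connections, not the starting node itself
    return seen


if __name__ == "__main__":
    print(within_hops("ex:musicianA", TRIPLES, 2))
```

Because the relationships are explicit data rather than prose, the same triples could drive a network map like the ones on these slides.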
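Authority control addresses this by pointing every variant at one authorized heading. As a rough first pass before human review, variants can be grouped by a crude match key. The sketch below assumes a (surname, first initial) key, a heuristic of our own rather than how any particular library or authority file works; it would wrongly merge different people who share a surname and initial, so a cataloger still has to confirm each group.

```python
import re
from collections import defaultdict


def match_key(name: str) -> tuple[str, str]:
    """Build a crude (surname, first initial) key from a personal-name string."""
    # Drop parenthetical expansions such as "(Charles Roscoe)".
    name = re.sub(r"\([^)]*\)", " ", name)
    if "," in name:
        # Inverted form: "Surname, Given names"
        surname, given = name.split(",", 1)
    else:
        # Direct form: "Given names Surname"
        parts = name.split()
        if not parts:
            return ("", "")
        surname, given = parts[-1], " ".join(parts[:-1])
    surname = re.sub(r"[^a-z]", "", surname.lower())
    letters = re.sub(r"[^a-z]", "", given.lower())
    return (surname, letters[:1])


def group_variants(names):
    """Group name strings that share a match key; each group is a candidate for one heading."""
    groups = defaultdict(list)
    for n in names:
        groups[match_key(n)].append(n)
    return dict(groups)


VARIANTS = [
    "Savage, C. R. (Charles Roscoe)",
    "Charles R. Savage",
    "C. R. (Charles Roscoe) Savage",
    "Savage, C. R.",
]

if __name__ == "__main__":
    print(group_variants(VARIANTS))
```

Once a group is confirmed, each variant can be mapped to the authorized form, which is the clean, consistent identifier that linked data work depends on.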
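Because TEI is XML, a downloaded file can be processed with ordinary XML tools. A minimal sketch using Python’s standard library, assuming a file in the TEI P5 namespace that tags people and places with `<persName>` and `<placeName>`; the tiny document below is a made-up stand-in, not a file from UNC’s collection, and older TEI P4 files without a namespace would need the `tei:` prefix dropped.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}  # the TEI P5 namespace

# Hypothetical miniature TEI document standing in for a downloaded file.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample letter</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>Dear <persName>Mary Smith</persName>, I arrived in <placeName>Raleigh</placeName>
    and met <persName>John Doe</persName> and <persName>Mary Smith</persName>.</p>
  </body></text>
</TEI>"""


def count_tagged(xml_text: str, tag: str) -> Counter:
    """Count the text content of every TEI element with the given local name."""
    root = ET.fromstring(xml_text)
    names = (
        " ".join("".join(el.itertext()).split())  # join nested text, normalize whitespace
        for el in root.iterfind(f".//tei:{tag}", TEI_NS)
    )
    return Counter(names)


if __name__ == "__main__":
    print(count_tagged(SAMPLE_TEI, "persName"))
```

This is the kind of extraction a scanned image plus descriptive metadata cannot support: the markup itself tells you where people and places appear.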
Another emerging trend in libraries is the ability to access archival materials online and from multiple access points. Libraries have been digitizing and putting archival collections online for some time, but now that data is being aggregated into larger pools of information so that it can be found more easily. Regionally, we’ve been doing this work for almost 15 years through the Mountain West Digital Library, which provides free access to almost a million records from nearly 200 member institutions in the Mountain West region. MWDL then serves up those records to the Digital Public Library of America, where they can be searched alongside records for over 11 million items from across the country.
How is this useful for DH? If you are looking for primary source documents on a subject like Japanese American internment camps, you can now compare information from the National Archives and the Topaz Museum in Delta, Utah, in one easy-to-use interface on DPLA.
I think it’s also worth noting that the Executive Director of the Digital Public Library of America is DH scholar Dan Cohen, who has done incredible work promoting DPLA to all kinds of communities, from academics to K-12 students.
The power of an aggregator like DPLA goes beyond the findability of archival materials. DPLA is also an excellent data source, with an open API that allows bulk downloads and lets you access and manipulate library data without ever visiting the DPLA website. DPLA hosts hackathons across the country and encourages people to play with the data and develop apps, which are then added to the DPLA App Library so that users can search and experiment in new ways.
The last emerging trend I am going to talk about is the growth of data management and data curation in libraries. This is actually my role at the Marriott Library: working with researchers to help them manage and share their data according to best practices. While this service has a long history in the sciences and social sciences, it is gaining popularity with humanities scholars as well. One reason is that NEH’s Office of Digital Humanities, the Bill and Melinda Gates Foundation, and other funders now require data management plans as part of grant proposals. That means humanists have to think about their data from the beginning of a project: what data they will collect or mine, how to describe and organize it, how it will be stored during the project, and how it will be shared, if possible, at the project’s completion.
Digital humanities requires a diverse set of skills, both subject-specific and technical. It also requires manipulating, organizing, housing, and visualizing large amounts of information, which is where librarians can be useful collaborators. We hope this has given you a good overview of some of the areas where DH and libraries intersect and complement each other. We’re excited to be part of the conversation here in Utah as the digital humanities community comes together in new ways through this symposium!
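To give a sense of what accessing DPLA data “without going to the website” looks like, here is a minimal sketch of a keyword query against the DPLA API. The endpoint (`https://api.dp.la/v2/items`) and the `q`, `page_size`, and `api_key` parameters reflect DPLA’s v2 API documentation as we understand it; check the current documentation before relying on them. A (free) API key is assumed, and `YOUR_API_KEY` is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as we understand DPLA's v2 API; verify against current documentation.
DPLA_ITEMS_ENDPOINT = "https://api.dp.la/v2/items"


def build_items_query(keywords: str, api_key: str, page_size: int = 10) -> str:
    """Build a DPLA item-search URL; urlencode escapes spaces and special characters."""
    params = {"q": keywords, "page_size": page_size, "api_key": api_key}
    return f"{DPLA_ITEMS_ENDPOINT}?{urlencode(params)}"


def search_items(keywords: str, api_key: str, page_size: int = 10) -> dict:
    """Fetch one page of results as parsed JSON (requires network access and a valid key)."""
    with urlopen(build_items_query(keywords, api_key, page_size)) as response:
        return json.load(response)


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; request a real key from DPLA first.
    print(build_items_query("Japanese American internment Topaz", "YOUR_API_KEY"))
```

Keeping URL construction separate from the network call makes it easy to inspect or log exactly what is being requested before writing a bulk-harvesting script or an app around it.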
Your digital humanities are in my library! No, your library is in my digital humanities! How libraries are enabling and engaging in digital humanities projects
Rebekah Cummings, Research Data Management Librarian, University of Utah
Anna Neatrour, Metadata Librarian, University of Utah
Utah Digital Humanities Symposium, Utah Valley University
February 26, 2016
Why do digital humanities?
1. Provide wide access to cultural information
2. Enable manipulation of that data
3. Transform scholarly communication
4. Enhance teaching and learning
5. Make a public impact.
-Lisa Spiro, “Why the Digital Humanities?”
How libraries support DH
• Offer physical/neutral space
• Offer virtual server space
• Librarians as project
• Find/acquire data
• Metadata creation/
• Librarians as collaborators
• Digital preservation
• Data management
DH Support: Digital Preservation
• Valley of the Shadow: early digital archive of Civil War primary source documents, started in the 1990s
• Required major reformatting in 2009, helped with substantial grant funding
http://valley.lib.virginia.edu/
Challenge #2 – Technological Improvements
Text scans, University of Virginia Libraries, 1998
The Initiative for Digital Humanities, Media, and Culture at Texas A&M
Challenge #3 – Anticipating future use of digitized materials
• Rights statements
• Inflexible vendor based repository systems
• Deeds of gift that cover future use
• Structuring metadata to enable GIS/spatial
• Robust metadata
How [48% of] libraries support DH
A survey of libraries supporting digital humanities in 2008 found that only a few had a dedicated center for DH, with almost half the respondents reporting that DH services were provided on an ad hoc basis.
By Staecker - Own work, Public Domain,
Case Study: Robots Reading Vogue
Peter Leonard, Librarian for Digital Humanities Research, Yale University Library
Lindsay King, Public Services Librarian, Yale University Library
Emerging Trend: Libraries and Linked Data
Pratt Institute School of Library Information Science
Emerging Trend: Libraries and Linked Data
Kansas City Public Library and many partners
Linked Data: Regional Library Work
Authority Control, or one name to rule them all
● Savage, C. R. (Charles Roscoe),
● C. R. Savage (Charles Roscoe Savage and George Ottinger), Pioneer Art Gallery, East Temple Street, Salt Lake City,
● Charles R. Savage
● C. R. (Charles Roscoe) Savage,
● Savage, C. R.
C.R. Savage Bust Portrait, BYU Special
Emerging Trend: Digital Library Objects designed for DH
Emerging Trend: Multiple Access Points
• Open APIs
• Hack-a-thons
• Visualizations
• Bulk data downloads
• Twitterbots
Emerging Trend: Data Management
“The activity of managing digital materials for research: digital curation, digital stewardship, data curation, digital archiving.” (Trevor Munoz, University of Maryland)
Librarians as DH collaborators
• Subject knowledge
• Data visualization
• Data organization and
• Identify and acquire
• Physical space
Libraries + Digital Humanities: two great things that go great together!
• Dh+lib Community, http://acrl.ala.org/dh/
• Lisa Spiro, “Why Digital Humanities?”
• Trevor Munoz, “Digital Humanities in Libraries Isn’t a Service”
• Miriam Posner, “No Half Measures: Overcoming Common Challenges to Doing Digital Humanities in the Library,” Journal of Library Administration 53:1 (January 2013)
Thank you! Questions?
Rebekah Cummings, Research Data Management Librarian
Anna Neatrour, Metadata Librarian