I’m so excited to be here to talk about some of the innovative work we’re doing at NYPL. We’ve been going through some major changes within NYPL Labs and NYPL Digital and I don’t know if there could be a more appropriate backdrop to this talk about change than what’s happening today in the US. I want to talk today about some of the new directions we’re taking and a new aggregation project we’re taking on at NYPL,
but I want to focus that discussion within the lens of change--whether it’s political, economic, or organizational--I would guess that no one here is a stranger to disruptive change,
but until recently, I’ve been lucky enough in my career that I’ve never really had to think seriously about how change can negatively affect access and preservation of digital cultural heritage, and how we can help it become more resilient to changes in the digital world.
In my capacity within NYPL Labs, as manager of metadata services, I’ve had the opportunity to work on both the innovative R&D side of digital cultural heritage and the production and application maintenance side, so I’ve gotten multiple perspectives on how we provide and sustain access to cultural heritage materials on the web, and I’ve seen how there are different challenges to stewarding digital materials online than there are in the physical archive. So when I got the prompt for this session, “Disruptive innovation in aggregation,” I bristled a bit at the term “disruptive innovation” applied to cultural heritage organizations. Who or what exactly are we disrupting with innovation? Our competitors--Google? Vendors? Other libraries? Or are we talking about disrupting our own institutions? Old modes of operations and thinking? If we mean the latter, I want to challenge this notion of disruptive innovation to argue that perhaps we are disrupted enough by external change and should instead focus our innovation on mitigating the very real risks to cultural heritage online by exploring new ways to build systems, practices, and organizational cultures that support its integrity in spite of disruptive change.
Probably the most obvious risk to cultural heritage is that access will be disrupted. It’s amazing to me that in an age of maturing digital libraries we still see digital projects and exhibitions going online without sustainability plans.
One project my metadata production team is working on right now is the migration of ~600 Holocaust memorial books from one of NYPL’s earliest but most heavily visited digital sites to our Digital Collections repository and website as well as DPLA.
This has been a heavy lift, involving locating the master TIFFs, repairing file headers, re-ingesting and normalizing metadata, and adding access points to replicate some of the functionality of the original site. Migration can be a costly process, and it’s one that libraries don’t think to budget into core operations. And while it’s easy to get grants to pursue innovation, it’s nearly impossible to secure new money for maintenance. When we don’t consider the costs of maintaining access to cultural heritage with new innovative projects, the technical debt we incur may be something we can never pay off.
Another risk to the integrity of cultural heritage is the endangerment of user understanding through the loss of context that can result from successive generations of metadata aggregation, metadata crosswalks to other schemas, or metadata enrichments where quality is not carefully considered. Our physical archives and their service models are structured and controlled in a way that puts context and provenance at the forefront of the user experience. In the unmediated environment of the web, it is incredibly easy to separate artifacts from their archival or curatorial context, which is a disservice not only to our materials and culture, leaving them open to misinterpretation, but also to users, who are denied all available background and cues to understanding and to properly and respectfully reusing this content.
I’m so happy to see lately larger conversations addressing issues of cultural sensitivity, ethics, and privacy in serving and describing cultural heritage collections on the web. Just last week, I read an excellent case study by archivist Elvia Arroyo-Ramirez on her experience soliciting help on processing a born digital collection with filenames containing non-Latin characters.
In the text of her talk, she quotes MIT library director Chris Bourg in cautioning us about our default approaches to processing and providing access to digital materials, that “despite the democratizing promise of technology… the digital tools we build and provide are likely to reflect and perpetuate stereotypes, biases, and inequalities.” While the tools and protocols of the web may serve many core functions of our daily lives well, it is easy to ignore the fact that the creators of many of these tools and their attitudes towards privacy, ethics, and cultural sensitivity are not representative of most users of the web and may not consider the unintended consequences of their technologies. It’s important to recognize that the protocols available on the web cannot provide the same respect for cultural norms and fine-grained controls over access, reuse, and description that archivists and curators can give in the reading room.
So knowing these risks, what can we do about it? We often don’t have control over when people, funding, or priorities change, but we can acknowledge and anticipate these inevitable changes and build resilient systems, processes, and services that can fail gracefully and adapt quickly. Now I am just an armchair academic, if that, but in the face of many changes the past few months, I’ve been reading up on the area of resilience thinking, and while I am no expert in this subject at all, I’ve found it a useful frame for questioning and evaluating the digital library work we do in support of cultural heritage.
Resilience thinking grew out of the field of socio-ecological economics in the 1970s and has since gained popularity in many fields interested in managing the effects of change on complex systems. I frequently hear libraries aspirationally refer to their interconnected applications as “ecosystems” so perhaps this is at least a good analogy for looking at these issues.
Now it might seem that the work commonly thought of as “innovation” has had little to do with building resilient core systems and processes. In fact, it’s probably a fair criticism of innovation labs to ask why, with all of the points of potential failure within the library, we should direct resources towards projects that seem to exist along the edges of usefulness to core library services. It’s a criticism we’ve become familiar with at Labs, coming from colleagues within the library community and even from within our own institution, but I want to explore, through this lens of resilience thinking, how some of the innovative work we’ve done in Labs has (maybe unintentionally) shown characteristics of the processes that support resilient systems and shows promise for application in our development of core products and services.
The first characteristic of resilient systems I want to explore is redundancy. By distributing responsibility for a system’s or service’s operations across many components or constituents, you reduce the risk of catastrophic loss when one component fails.
In January of this year, we released over 180,000 of our Digital Collections items to the public domain. But we did not stop at merely stamping these items with an open license. We provided high-resolution TIFF images for free download, offered a data dump of corresponding metadata,
and developed creative and engaging digital demo projects to encourage the public to share in ownership of our cultural heritage through reuse and remix. While I don’t advocate this as a library’s primary digital preservation strategy, getting more copies of our digitized images out into the world and getting the public to feel more responsible for preservation is one way we generate awareness for a very hidden but core function of the library.
Another way to build resiliency is to increase the number and diversity of backgrounds and perspectives contributing to a system’s design or operations. This characteristic is most evident in the crowdsourcing projects that Labs is best known for.
One of the common design principles in projects such as What’s on the Menu, Building Inspector, Map Warper, Scribe, and Oral History Transcript Editor
is the breaking down of complex jobs into their simplest and most engaging tasks to encourage participation from the widest range of users possible, regardless of age, background, or level of education.
Another way to support resiliency is through tight feedback mechanisms. When we monitor the effects of our systems and actions, we can more easily develop ways to react and respond to that feedback and to help our systems heal more quickly. This is an area we have just begun exploring in Metadata Services through early data quality audits and planning for ongoing monitoring, but if we’re looking at this as more analogy than reality, Labs’ model of working out in the open and publishing open-source code is a good example.
For most of our projects, we host our source code on GitHub while we’re building it.
We’ve also seen benefits for the larger cultural heritage community by sharing our code open-source, with numerous reuses, like our Oral History Transcript Editor
which was adopted by the State Library of New South Wales in Australia. We also make efforts to get out and talk about our works in progress, not just to share our work, but to invite feedback from peers and friends in the library community and beyond.
Modularity has recently become a more critical guiding principle for Labs application development. With a number of recent projects, we’ve recognized the possibility of being able to easily scale or adapt an application by composing it of modular, pluggable components that can be utilized as needed as priorities or resources shift. You can see this modularity in the GitHub repositories for these projects.
This architecture diagram for our recent local aggregation project--The Registry-- shows the many components that make up this complex project,
with each corresponding to its own discrete GitHub repo. With this last project, The Registry, we’ve had a chance to take these principles of resilience and apply them to the process of bringing innovation into production. The Registry grew out of a charge to imagine how we could surface our hidden collections and make them more accessible to the public.
Because we describe books, archival materials, records, works of art, and their digital surrogates, we currently have four major online systems for description with many of these materials represented in more than one system.
In addition to these four core systems, we have resource descriptions in countless dictionary catalogs, unpublished inventories, databases, Excel spreadsheets, bibliographies, and more.
We also have a large body of staff at NYPL who have been here for years who have encyclopedic knowledge of their domains.
We’ve also extracted so much structured data from our collections through our crowdsourcing projects, but these datasets still sit in their respective projects’ databases.
So, there’s much potential lost in this hidden data, not to mention information sitting in the minds of our staff and the public. So we wondered, what if we could truly let anyone say anything about anything? What if we could use linked data to empower people to connect their knowledge and stories to our cultural record?
The result was a framework for both discovery and description of our collections, particularly our unique resources, to provide a common language for existing data silos using RDF, and a low-barrier way for staff and even users to add or enhance descriptions of resources to this collections graph.
This was a data model and aggregation project that drew heavily from the fine work of some of the best aggregators--Europeana and DPLA--and tried to use common patterns and vocabularies, such as the entification of people, organizations, and topics.
We experimented with enrichment using data from the OCLC Classify service, VIAF, Wikipedia, and HathiTrust, and created a prototype interface to our data to road-test our model through indexing and search.
At this point we had aggregated data from our 4 major data silos and had a lot of refinement, enrichment and quality improvement to do when our R&D work was disrupted by a necessary but abrupt shift in priorities to reconfigure our entire NYPL Digital department and solve probably the least Labs-iest research problem, but a very critical institutional problem of aggregation--
our library catalog interface. A recent collaboration in shared physical storage necessitated that we adjust our catalog infrastructure to display our partners’ holdings and serve them to our users, something not easily supported by our current vendor.
In approaching this we were also asked to address some other issues in discovery--to provide a smooth and seamless fulfillment experience, integrate electronic resources, and to clarify user interface issues that make it difficult for users to distinguish between our circulating branch materials and resources only available in the research centers. We assessed possible vendor solutions but decided that if we were to take on this work, we should take two steps forward instead of entering into another vendor contract.
Since we had already invested work in the Registry model, we decided to apply this to our catalog problem to lay the foundation for future integrations with our archives portal, our prints and photographs catalog, our digital collections, and any number of card files, indexes, spreadsheets, or datasets converted to our core metadata profile. This work will also allow us to leverage work IDs from OCLC to provide users with work-based clustering, and it will allow us to provide agent entities to better manage name authorities and provide users with a new kind of jumping-off point for research. We will also start to bridge this work with our digitized cultural heritage, starting with links and embedded viewers for digital content.
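To make the idea of work-based clustering concrete, here is a minimal sketch in Python. It assumes each catalog record already carries a work identifier obtained from an enrichment step; the record fields, titles, and identifiers below are invented for illustration and are not taken from NYPL's data or from OCLC.

```python
from collections import defaultdict

# Hypothetical catalog records; field names and work IDs are invented for illustration.
records = [
    {"id": "b1", "title": "Moby Dick", "work_id": "w100"},
    {"id": "b2", "title": "Moby-Dick; or, The Whale", "work_id": "w100"},
    {"id": "b3", "title": "Walden", "work_id": "w200"},
    {"id": "b4", "title": "Untitled pamphlet", "work_id": None},
]


def cluster_by_work(records):
    """Group record IDs under their work ID; records without one stay on their own."""
    clusters = defaultdict(list)
    for record in records:
        # Fall back to the record's own ID so unmatched records form singleton clusters.
        key = record.get("work_id") or f"record:{record['id']}"
        clusters[key].append(record["id"])
    return dict(clusters)


clusters = cluster_by_work(records)
```

In this sketch the two editions that share a work ID land in one cluster, while the record with no work ID remains discoverable as its own singleton cluster rather than being dropped.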
In our new team configuration for this project, we brought our experiences to the table to come up with some guiding principles for design. Unsurprisingly, what comes out in these values are some of the same resilience-building strategies, such as inclusion, diversity, and feedback, that we’ve been infusing our past projects with. But part of this shift in priorities has also necessitated an integration of Labs R&D staff into the whole of our larger NYPL Digital org. While I have been highlighting Labs’ work up to this point, our Discovery project team has been a collaboration of amazingly talented members from across NYPL Digital, and this new diversity of perspectives and past experience, I think, has only strengthened our ability as a team to bring this project to a successful alpha stage on a very tight deadline.
Of all the resilience-building strategies we’ve applied to this project, modularity has been one of our brightest guiding lights. Not only will building atomistic but interconnected components reduce the risk of failure when staff turn over or a technology becomes obsolete, it will allow us to do more work in parallel, so we can iterate more quickly.
We are using an API-based approach to keep our messaging between components abstracted from their specific technologies and using a plugin-based architecture made up of different serializers for applying metadata enrichments.
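As a rough illustration of what a plugin-based enrichment layer can look like, here is a minimal Python sketch. It is not the project's actual code: the registry, the plugin names, the `ex:` predicates, and the lookup tables are all hypothetical stand-ins for calls to external services.

```python
from typing import Callable, Dict, List, Tuple

Statement = Tuple[str, str, str]  # (subject, predicate, object)
Plugin = Callable[[dict], List[Statement]]

# Hypothetical registry mapping plugin names to enrichment functions.
PLUGINS: Dict[str, Plugin] = {}


def register(name: str):
    """Decorator that adds an enrichment function to the registry under a name."""
    def decorator(func: Plugin) -> Plugin:
        PLUGINS[name] = func
        return func
    return decorator


@register("classification")
def classification_plugin(record: dict) -> List[Statement]:
    # Stand-in for an external classification lookup; the table is invented.
    lookup = {"9780000000001": "PS2384"}
    isbn = record.get("isbn")
    if isbn in lookup:
        return [(record["uri"], "ex:classification", lookup[isbn])]
    return []


@register("language_label")
def language_plugin(record: dict) -> List[Statement]:
    # Stand-in for a code-to-label enrichment; the table is invented.
    labels = {"eng": "English", "yid": "Yiddish"}
    code = record.get("language")
    if code in labels:
        return [(record["uri"], "ex:languageLabel", labels[code])]
    return []


def enrich(record: dict, enabled: List[str]) -> List[Statement]:
    """Run the enabled plugins in order, skipping any that are missing or that fail."""
    statements: List[Statement] = []
    for name in enabled:
        plugin = PLUGINS.get(name)
        if plugin is None:
            continue  # an unknown plugin is skipped rather than treated as fatal
        try:
            statements.extend(plugin(record))
        except Exception:
            continue  # one failing plugin should not block the others
    return statements


record = {"uri": "ex:item/1", "isbn": "9780000000001", "language": "yid"}
result = enrich(record, ["classification", "missing", "language_label"])
```

The design choice this sketch tries to show is that each enrichment is independent: one can be added, removed, or fail without the rest of the pipeline needing to change.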
We’ve also come to understand the importance of decoupling our data from our system--using RDF as our data structure will allow us to migrate between data storage and serialization technologies and to communicate more easily with external applications. We are also conscious of building a data model and metadata application profile that can speak a common language with our peers in this area, and so we’ve drawn from existing and emerging vocabularies and patterns.
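One way to picture this decoupling is a small Python sketch in which the same set of triples is written out in two different serializations without changing the data itself. This is a simplification under stated assumptions: the `example.org` URIs and the title are placeholders, any string starting with `http` is treated as a URI, and literal escaping is approximated with JSON string escaping rather than a full N-Triples implementation.

```python
import json

# Placeholder triples; the example.org URIs and the title are invented for illustration.
triples = {
    ("http://example.org/item/1", "http://purl.org/dc/terms/title", "Sample memorial book"),
    ("http://example.org/item/1", "http://purl.org/dc/terms/isPartOf", "http://example.org/collection/9"),
}


def is_uri(value: str) -> bool:
    # Simplifying assumption: anything that looks like an http(s) URL is a URI, not a literal.
    return value.startswith("http://") or value.startswith("https://")


def to_ntriples(triples) -> str:
    """Serialize triples as N-Triples-style lines, sorted for stable output."""
    lines = []
    for s, p, o in sorted(triples):
        obj = f"<{o}>" if is_uri(o) else json.dumps(o)
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def to_json(triples) -> str:
    """Serialize the same triples as a JSON array of three-element arrays."""
    return json.dumps(sorted(triples))


def from_json(text: str):
    """Rebuild the set of triples from the JSON form."""
    return {tuple(t) for t in json.loads(text)}


ntriples_text = to_ntriples(triples)
round_tripped = from_json(to_json(triples))
```

Because the statements are the unit of data, moving between storage or serialization formats becomes a matter of swapping the reader and writer functions, which is the property the paragraph above describes.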
We’ve realized that we will also need to apply another resilience-building strategy--building trust and cooperation among our library stakeholders to make this project a success, so at these early stages, meeting minimum requirements while providing a stable level of quality is key. We’ve used some placeholder properties to mimic more closely the traditional catalog experience--we won’t be showing agent entities or converting all of our LCSH headings to FAST for our early internal releases. But we will gradually add these features more prominently, particularly as we can improve quality, to show stakeholders the potential that lies in our data.
Most importantly, we’ve been conscious of leveraging our existing networks and building new relationships with stakeholders at many levels across the library through user interviews and shadowing, to ensure we get a diversity of feedback as we iterate through versions of this platform. We recognize that while we have been experimenting on the edges, these colleagues have been maintaining core systems and services and managing the risk to cultural heritage the best they can with the tools and resources available to them, so rather than disrupting the critical stewardship work they’re doing, we can innovate together. Does this mean we won’t return to the creative, boundary-pushing projects we’ve become known for? I don’t know for sure, but I hope it doesn’t because while innovating for resilience is necessary to our success, so is innovating for our drive, our spirit, and hope for the future.
Europeana Network Association AGM 2016 - 9 November - Speaker Shawn Averkamp
Change at NYPL
Shawn Averkamp, Manager of Metadata Services,
9 November 2016
How do you handle
How do we innovate to make cultural
heritage more resilient to disruptive
Change challenges online cultural
Different stewardship challenges exist for cultural heritage
online than for physical archives
What’s at stake?
User understanding, context
Cultural sensitivity, privacy, respect
What’s at stake: Continued access
Risks: Disruption due to neglect, aging websites,
costs of migration, high staff turnover, little
What’s at stake: User comprehension
Risks: Loss of context and meaning due to
successive generations of aggregation, “dumbing
down” from one metadata schema to another,
What’s at stake: Cultural sensitivity,
Risks: Limited ability to apply cultural norms,
fine-grained access, and discovery controls
Cultural sensitivity, privacy, respect
“...despite the democratizing promise of
technology… the digital tools we build and provide
are likely to reflect and perpetuate stereotypes,
biases, and inequalities.”
-- Chris Bourg
Can we reduce risk by building
“...the capacity of a system to absorb disturbance and
reorganize while undergoing change so as to still retain
essentially the same function, structure, identity, and
-- Brian Walker and David Salt
“...a way of looking at why some systems
collapse when they encounter shock, and some
-- Rob Hopkins
What can innovation
labs teach us about
“Redundancy provides ‘insurance’ within a
system by allowing some components to
compensate for the loss or failure of others.”
-- Stockholm Resilience Centre
“If a variety of people participate, from a diversity
of backgrounds and perspectives, it can uncover
perspectives that may not be acquired through
more traditional scientific processes.”
-- Stockholm Resilience Centre
Tight feedback mechanisms
“Feedbacks are the two-way ‘connectors’ between
variables that can either reinforce (positive
feedback) or dampen (negative feedback)
-- Stockholm Resilience Centre
“...bringing the results of our actions closer to
home, so that we cannot ignore them.”
-- Rob Hopkins
“Resilience capacity will be increased when
system components have enough independence
that damage or failure of one part or component
of a system is designed to have a low probability
of inducing failure of other similar or related
components in the system.”
● What if linked data could turn old databases and
description into new points of discovery to our resources?
● What if linked data was a low-barrier way for staff to share
their expertise and institutional knowledge about our
● What if linked data could empower our community to
connect their knowledge and stories to the cultural record
we steward for them?
What if we could let anyone say anything about anything?
The Registry: goals
Give all of our collections and items URIs, so anyone can
say anything about them.
Make it easy for our staff to add hidden or underdescribed
resources to this graph.
Build in mechanisms for staff and the public to say more
about these resources and connect them to others and
the wider web.
Provide public endpoints, so developers at NYPL and
beyond can build things on top of our data.
Discovery: short-term objectives
1. Make shared collections available (NYPL, Princeton U.,
2. Enable electronic hold request of materials for easier
3. Make e-resources more discoverable and integrated
4. Improve the usability of the online catalog
5. Implement and support emerging industry standards
Discovery: long-term objectives
1. Minimize the steps and complexity required to connect a patron
with a resource, physical or digital.
2. Evolve the OPAC from an inventory system to a discovery
3. Empower staff to influence and shape how patrons interact and
4. Lay the foundations of a platform to power NYPL projects for
the next 5-10 years.
5. Advance the field of cultural heritage discovery through
innovation and demonstration.
Modularity -- systems
Serializer plugins for data enrichments
OCLC Classify: Work IDs, LCC, FAST
Future plugins: Worldcat data, Wikidata, VIAF,
Modularity -- data
Data model based on:
Portland Common Data Model (PCDM) : structural
DPLA/Europeana : descriptive metadata for discovery
BIBFRAME : item/copy-level metadata for fulfillment
Organization schema : Research center and division entity
SKOS : entities, controlled vocabularies, codelists
Trust and cooperation -- soft feature
Meet immediate, core requirements at reasonable levels of
Temporary properties for literal representations of more
Demonstrate value of new enrichments
Clustering by OCLC Work IDs
Related items by added LC Classifications
Trust and cooperation -- networks
Leverage existing connections across the library for
feedback and to share progress
User interviews with stakeholders we don’t yet know
Shadow front-facing staff to learn about user needs
This presentation: http://bit.ly/NYPLatEuropeana2016
Bourg, Chris (2015). “Never neutral: Libraries, technology, and inclusion.”
Hopkins, Rob (2016). “Resilience thinking.”
ResilientCity.org (2016). “Resilient design principles.”
Stockholm Resilience Centre (2016). “GRAID at Stockholm Research Centre.”
Walker and Salt (2008). Resilience Thinking. Island Press
(slide 2) Wide World Photos, Inc. “Three forks,”
Unknown. (1907). “Interior work : construction of Astor Hall and the Fifth Avenue
Hine, Lewis Wickes (1921). “A third year high school girl in the chemical laboratory
of a rural consolidated school, October 1921.”