Seeing a Butterfly & Knowing What It Is: BHL: Past > Present > Future. Martin R. Kalfatovic. 2019 BHL Annual Meeting. Cornell University, Ithaca, NY. 30 April 2019.
1. Free & Open Access to Biodiversity Literature
Seeing a Butterfly & Knowing What It Is
Martin R. Kalfatovic
@UDCMRK & @BHLProgDirector
30 April 2019 | Ithaca, NY
BHL: Past>Present>Future
3. A dark Vanessa with a crimson band
Wheels in the low sun, settles on the sand
And shows its ink-blue wingtips flecked with white.
And through the flowing shade and ebbing light
A man, unheedful of the butterfly –
Some neighbor’s gardener, I guess – goes by
Trundling an empty barrow up the lane.
Vladimir Nabokov, Pale Fire (1962)
7. BHL is a Global Consortium
20 MEMBERS
AS OF MAY 2019
22 AFFILIATES
80+ WORLDWIDE PARTNERS
8. MEMBERS
• American Museum of Natural History Library
• BHL Australia
• BHL México
• Cornell University Library
• Field Museum of Natural History Library
• Harvard University Botany Libraries
• Harvard University, Museum of Comparative Zoology, Ernst Mayr Library
• Library of Congress
• The LuEsther T. Mertz Library, The New York Botanical Garden
• Missouri Botanical Garden, Peter H. Raven Library
• Museum für Naturkunde Berlin
• Muséum national d’Histoire naturelle
• National Library Board, Singapore
• Natural History Museum Library, London
• Royal Botanic Gardens, Kew, Library, Art & Archives
• Smithsonian Libraries
• United States Department of Agriculture, National Agricultural Library
• University Library, University of Illinois Urbana-Champaign
• University of Toronto Libraries
• Yale University
9. AFFILIATES
• Academy of Natural Sciences of Drexel University, Library and Archives
• Auckland War Memorial Museum
• BHL Africa
• BHL China
• BHL Egypt
• BHL SciELO (Brazil)
• Bibliothèque cantonale et universitaire - Lausanne
• California Academy of Sciences Library
• Canadian Museum of Nature
• Chicago Botanic Garden, Lenhardt Library
• Internet Archive
• Lloyd Library & Museum
• Los Angeles County Arboretum & Botanic Garden
• Marine Biological Laboratory/Woods Hole Oceanographic Institution Library (MBLWHOI Library)
• Mendel Museum
• Narodni Museum (National Museum, Prague)
• Natural History Museum Los Angeles County
• Naturalis Biodiversity Center
• Oak Spring Garden Foundation
• Smithsonian Institution Archives
• United States Geological Survey Libraries Program
• University Library Johann Christian Senckenberg
14. Executive Committee
BHL GOVERNANCE
BHL Members’ Council
IMMEDIATE PAST-CHAIR: Dr. Nancy E. Gwinn, Smithsonian Libraries
CHAIR: Constance Rinaldo, Harvard, Ernst Mayr Library, MCZ
VICE-CHAIR: Jane Smith, Natural History Museum, London
SECRETARY: Doug Holland, Missouri Botanical Garden
16. Presentations & Meetings
2019.03 Catalogue of Life Global Team Meeting (Champaign) | 25 people
2019.03 Plant Humanities (DC)
2018.12 Plant Humanities (DC)
2018.10 Global Summit of Research Museums (Berlin) | 100 people
2018.07 Botany 2018 (Rochester) | 50 people
2018.06 2nd Annual Digital Data in Biodiversity Research Conference (Berkeley)
2018.06 I Annotate 2018 (San Francisco) | 100 people
2018.05 EuropeanaTech (Rotterdam)
19. Presentations & Meetings
2018.02 FEDLINK Great Escape Tour (DC) | 20 people
2018.03 2018 BHL Annual Meeting (Los Angeles) | 35 people
2018.03 2018 BHL Annual Meeting Reception (Los Angeles) | 35 people
2018.04 National Library Week Open House (DC) | 60 people
2018.06 Smithsonian Libraries Intern Open House (DC) | 50 people
2018.06 eBooks at the Smithsonian (DC) | 20 people
2018.06 Special Libraries Association Annual Meeting (Baltimore) | 38 people
2018.07 IVLP China Group Tour (DC) | 12 people
2018.08 TDWG + SPNHC 2018 Joint Meeting (Dunedin) | 100 people
2018.09 American Library Association Visiting Staff Tour (DC) | 10 people
2018.10 Smithsonian Coast to Coast (San Diego) | 100 people
20. Continuing Education
Grace Costantino
Completed a graduate certificate program in Communication Management, with a concentration in Marketing Communication, at the University of Denver.
The skills gained in that program have been instrumental in many of the strategies and initiatives launched for BHL, including the BHL Members' Dashboards.
21. Presentations & Meetings
2018.03 2018 BHL Annual Meeting (Los Angeles) | Presentation | 35 people
2018.04 Field Museum training (Virtual) | Workshop | 6 people
2018.05 California Academy of Sciences training (Virtual) | Workshop | 1 person
2018.07 Oak Spring Garden Foundation + Chicago Botanic Garden training (Virtual) | Workshop | 6 people
2018.07 Museum für Naturkunde + UB Johann Christian Senckenberg (Virtual) | Workshop | 5 people
22. Other Highlights
Attentive contribution to BHL routine operations and the behind-the-scenes work that keeps the BHL wheels greased.
Implemented an improved, alternative BHL Staff call schedule to offer more ad hoc training opportunities and reduce the workload on Secretariat staff.
Led 6 conference calls, on time and on task, with an average of 16 BHL Staff, as well as 3 fruitful discussions with several Partners during the ad hoc sessions.
Independently identified a bug in BHL Reporting Statistics and clearly articulated to the Technical Team the need to resolve it. As a result, BHL collection stats are now more accurate and the reporting documentation has been updated.
23. Workshops & Orientations
2018.02 Orientation to California Academy of Sciences | 4 people
2018.03 Orientation to University Library Johann Christian Senckenberg | 8 people
2018.05 Orientation to Museum für Naturkunde Berlin | 4 people
2018.05 Orientation to Lloyd Library and Museum | 3 people
2018.08 Orientation to Auckland War Memorial Museum | 3 people
2018.12 Orientation to Natural History Museum of Los Angeles County | 5 people
26. Technical Advisory Group
2018 Technical Development
Martin R. Kalfatovic, BHL Program Director
Carolyn Sheffield, BHL Program Manager
Mike Lichtenberg, BHL Developer
Joel Richard, Smithsonian Libraries
Susan Lynch, The New York Botanical Garden
28. BHL Dark Storage (Smithsonian)
• Year 1 (2016) BHL: 109 TB
• Year 2 (2017) BHL: 122 TB
• Year 3 (2018) BHL: 135 TB
• Year 4 (2019) BHL: 148 TB
• Year 5 (2020) BHL: 162 TB
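The dark-storage figures grow by a roughly constant amount each year. The short Python sketch below (our own illustration; the function names are hypothetical) derives the yearly increments and the implied compound annual growth rate from the numbers listed on the slide.

```python
# Dark-storage footprint in TB, as listed on the slide for each year.
STORAGE_TB = {2016: 109, 2017: 122, 2018: 135, 2019: 148, 2020: 162}


def yearly_increments(series):
    """Growth in TB for each year relative to the previous year."""
    years = sorted(series)
    return {year: series[year] - series[prev] for prev, year in zip(years, years[1:])}


def compound_annual_growth(series):
    """Compound annual growth rate between the first and last years."""
    years = sorted(series)
    span = years[-1] - years[0]
    return (series[years[-1]] / series[years[0]]) ** (1 / span) - 1


if __name__ == "__main__":
    print(yearly_increments(STORAGE_TB))                # {2017: 13, 2018: 13, 2019: 13, 2020: 14}
    print(f"{compound_annual_growth(STORAGE_TB):.1%}")  # 10.4%
```

Because the absolute increment stays near 13 TB, the relative growth rate falls slightly each year; the CAGR condenses the whole period into one figure.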
29. New Service: Full Text Search
Search across the text of all 55+ million pages in BHL!
Search results display hits for search terms within both the bibliographic information and the full text of books in BHL.
Filter search results by content type, publication date, subject, language, and author with new faceted browsing.
Use “search inside” to search for terms within a book you are viewing.
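Faceted browsing amounts to counting search hits per facet value and then narrowing the hit list to the values a user selects. The sketch below is a generic, in-memory illustration under stated assumptions: the `Hit` record and its fields are hypothetical and do not reflect BHL's actual search index or implementation.

```python
from collections import Counter
from dataclasses import dataclass

FACETS = ("content_type", "subject", "language", "author")


@dataclass(frozen=True)
class Hit:
    """Hypothetical, simplified search hit (not BHL's real index schema)."""
    title: str
    content_type: str
    year: int
    subject: str
    language: str
    author: str


def facet_counts(hits, facet):
    """Number of hits for each value of one facet, used to render the facet list."""
    if facet not in FACETS:
        raise ValueError(f"unknown facet: {facet}")
    return Counter(getattr(hit, facet) for hit in hits)


def narrow(hits, years=None, **selected):
    """Keep hits matching every selected facet value and an optional (start, end) year range."""
    unknown = set(selected) - set(FACETS)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    kept = []
    for hit in hits:
        if any(getattr(hit, name) != value for name, value in selected.items()):
            continue
        if years is not None and not years[0] <= hit.year <= years[1]:
            continue
        kept.append(hit)
    return kept


if __name__ == "__main__":
    hits = [  # made-up sample data
        Hit("Title A", "Book", 1890, "Lepidoptera", "English", "Author X"),
        Hit("Title B", "Journal", 1925, "Lepidoptera", "German", "Author Y"),
        Hit("Title C", "Book", 1957, "Botany", "English", "Author X"),
    ]
    print(facet_counts(hits, "language"))  # Counter({'English': 2, 'German': 1})
    print([h.title for h in narrow(hits, years=(1880, 1930), subject="Lepidoptera")])
```

Publication date is handled as a range rather than an exact-match facet, since users typically filter by period.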
30. Improved Service: Updated APIs
The BHL Application Programming Interface (API) is a set of REST-like web services that can be invoked via HTTP queries (GET/POST requests) or SOAP.
Responses can be received in one of three formats: JSON, XML, or XML wrapped in a SOAP envelope.
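A request to a REST-like service of this kind usually boils down to a base URL plus query parameters, one of which selects the response format. The Python sketch below builds such a GET URL and unwraps a JSON response. The base URL (`/api3`), the operation name `PublicationSearch`, the `searchtype` value `F` (full text), and the `Status`/`ErrorMessage`/`Result` envelope fields reflect our reading of the BHL API version 3 documentation and should be treated as assumptions to check against the current docs; SOAP calls are not covered.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://www.biodiversitylibrary.org/api3"  # assumed v3 endpoint
FORMATS = {"json", "xml"}  # plain-HTTP formats; SOAP uses its own envelope


def build_request_url(op, api_key, fmt="json", **params):
    """Build a GET URL for one API operation, asking for JSON or XML."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = {"op": op, **params, "apikey": api_key, "format": fmt}
    return f"{API_BASE}?{urlencode(query)}"


def unwrap_json(body):
    """Return the Result payload of a JSON response, or raise if Status is not 'ok'."""
    envelope = json.loads(body)
    if envelope.get("Status") != "ok":
        raise RuntimeError(envelope.get("ErrorMessage") or "request failed")
    return envelope.get("Result")


if __name__ == "__main__":
    # Hypothetical full-text search request; replace YOUR-KEY with a real API key.
    print(build_request_url("PublicationSearch", "YOUR-KEY",
                            searchterm="Vanessa", searchtype="F"))
    # Offline example of the assumed response envelope.
    print(unwrap_json('{"Status": "ok", "ErrorMessage": "", "Result": []}'))
```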
31. Augmented Content: Transcription
Transcriptions in BHL were moved to production on 21 November 2018. Participants in the initial test phase have been adding transcriptions, which are visible in the BHL UI. Once a sizable body of transcriptions is available, we will begin promoting them on social media, most likely in early 2019. We’ll keep you all in the loop on those plans so that you can cross-promote at that time, should you choose to.
Some existing examples:
• Harvard MCZ
• NYBG
• SIA
• Museum Victoria
44. 2018 Dues Spending
Income
Income + Carryover $292,212.66
Member Dues $196,637.91
Affiliate Dues $24,977.00
Donations $9,566.00
Carryover $60,914.45
Outreach Revenue $117.30
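The income lines can be cross-checked against the stated total. A small sketch using Python's `Decimal` (to avoid floating-point rounding on currency); the figures are copied from the slide and the helper name is ours:

```python
from decimal import Decimal

# 2018 income lines as listed on the slide.
INCOME_2018 = {
    "Member Dues": Decimal("196637.91"),
    "Affiliate Dues": Decimal("24977.00"),
    "Donations": Decimal("9566.00"),
    "Carryover": Decimal("60914.45"),
    "Outreach Revenue": Decimal("117.30"),
}
STATED_TOTAL = Decimal("292212.66")  # the "Income + Carryover" line


def total(lines):
    """Exact sum of the line items."""
    return sum(lines.values(), Decimal("0"))


if __name__ == "__main__":
    computed = total(INCOME_2018)
    print(computed, computed == STATED_TOTAL)  # 292212.66 True
```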
45. 2018 Dues Spending
Secretariat
BHL Program Director (50%) (S&B) $0.00
BHL Program Manager (S&B) $0.00
Collections Coordinator (S&B) $0.00
Community Manager (S&B) $0.00
Global Coordinator Funding $0.00
Travel & other expenses (includes Exec Comm) $17,994.30
Other equipment (computers, etc.) $0.00
Gemini, Wiki, Flickr, & Metrics Subscriptions $1,639.31
Persistent Identifiers
TOTAL $19,633.61
46. 2018 Dues Spending
Technical Development
BHL Tech Director (S&B) $0.00
Data Analyst (S&B) $0.00
Travel $0.00
Equipment $3,034.19
Contract Programmer $164,800.00
TOTAL $167,834.19
47. 2018 Dues Spending
Digitization & Indirect
Internet Archive $9,050.00
Indirect Costs $950.00
Scanning support (FedEx, other transportation) $448.35
Macaw support $0.00
Cluster Storage Support $0.00
Smithsonian Storage Costs $0.00
TOTAL $10,448.35
48. 2018 Dues Spending
Meetings & Conferences
BHL Members travel $5,772.38
BHL Meeting cost $8,405.32
TOTAL $14,177.70
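The same check works for the four spending slides: each category's non-zero lines should add up to its TOTAL, and the category totals can be summed. A sketch with figures copied from the slides (zero-value lines omitted; the Persistent Identifiers line has no amount and is left out); the helper names are ours:

```python
from decimal import Decimal

D = Decimal
# category -> (non-zero line items, stated TOTAL), as listed on the 2018 dues-spending slides
SPENDING_2018 = {
    "Secretariat": ([D("17994.30"), D("1639.31")], D("19633.61")),
    "Technical Development": ([D("3034.19"), D("164800.00")], D("167834.19")),
    "Digitization & Indirect": ([D("9050.00"), D("950.00"), D("448.35")], D("10448.35")),
    "Meetings & Conferences": ([D("5772.38"), D("8405.32")], D("14177.70")),
}


def mismatches(categories):
    """Categories whose line items do not sum exactly to the stated total."""
    bad = {}
    for name, (lines, stated) in categories.items():
        computed = sum(lines, D("0"))
        if computed != stated:
            bad[name] = (computed, stated)
    return bad


def grand_total(categories):
    """Sum of the stated category totals."""
    return sum((stated for _, stated in categories.values()), D("0"))


if __name__ == "__main__":
    print(mismatches(SPENDING_2018))   # {}
    print(grand_total(SPENDING_2018))  # 212093.85
```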
53. A score of small butterflies, all of one kind,
were settled on a damp patch of sand, their
wings erect and closed, showing their pale
undersides with dark dots and tiny orange-rimmed peacock spots along the hind-wing
margins; one of Pnin’s shed rubbers disturbed
some of them … revealing the celestial hue of
their upper surface... “Pity Vladimir
Vladimirovich is not here,” remarked Chateau.
“He would have told us all about these
enchanting insects.”
Vladimir Nabokov
Pnin (1957)
60. 56+ MILLION PAGES
147,000+ TITLES
242,000+ VOLUMES
188+ MILLION INSTANCES OF TAXONOMIC NAMES
790+ IN-COPYRIGHT TITLES LICENSED FOR BHL
AGREEMENTS WITH 350+ LICENSORS
*Stats as of 17 March 2019
69. Where: Chicago, IL, The Field Museum
When: May 2019
Why: Making the Case for Natural History Collections
Who: BHL Poster: BHL and Specimen Collection Data: The Needle in the Festuca Stack
71. Improving access to hidden scientific data in the Biodiversity Heritage Library
Constance Rinaldo. Introduction to the Symposium: Improving access to hidden scientific data in the Biodiversity Heritage Library
Roderic Page. Text-mining BHL: towards new interfaces to the biodiversity literature
Nicole Kearney. It’s Not Always FAIR: Choosing the Best Platform for Your Biodiversity Heritage Literature
Dmitry Mozzherin. Finding scientific names in Biodiversity Heritage Library, or how to shrink Big Data
Martin R. Kalfatovic. BHL and Specimen Collection Data: The needle in the Festuca stack
Gretchen Stahlman. Facilitating Data Discovery in BHL Texts through Innovative Text Mining Strategies and Community Driven Initiatives
74. Feedback Management Plan
Improve the BHL FAQ on the BHL About site to answer the most commonly asked questions. Many are already addressed, but more could be added.
Retain the feedback web form as is, but include it in the BHL FAQ as the answer to the questions: A) “How can I submit feedback to the BHL?” B) “How can I submit a request for a book to be digitized?”
Redirect the current “Feedback” link in the BHL website header to the BHL FAQ on the About site, and change the link to read “FAQ” instead.
Redirect the “Report an Error” button in the BHL book viewer to the BHL FAQ on the About site, and change its alt text to read “FAQ” instead.
Feedback@biodiversitylibrary.org will continue to work as normal for BHL users and Staff. This address automatically forwards to the Gemini issue tracking system, which generates one new Gemini ticket per email.
The goal is to implement these changes in June.
A future iteration of the BHL website should include a flagging function (similar to the one on the Internet Archive website) that allows users to more easily indicate which materials contain metadata errors, missing pages, gaps in a series, etc.
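The proposed flagging function is not specified beyond its purpose. As a purely hypothetical sketch, a flag could be a small record tied to an item, with an issue type drawn from the categories the plan lists, and each flag could open one tracker ticket, mirroring the one-ticket-per-email rule for the feedback address. None of these names correspond to an existing BHL or Gemini API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Issue(Enum):
    """Issue categories named in the feedback plan, plus a catch-all."""
    METADATA_ERROR = "metadata error"
    MISSING_PAGES = "missing pages"
    SERIES_GAP = "gap in a series"
    OTHER = "other"


@dataclass
class Flag:
    """One user report about one item (hypothetical data model)."""
    item_id: str
    issue: Issue
    page: Optional[int] = None
    note: str = ""
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def ticket_summary(flag):
    """One-line summary for the single ticket this flag would open."""
    summary = f"[{flag.issue.value}] item {flag.item_id}"
    if flag.page is not None:
        summary += f", page {flag.page}"
    if flag.note:
        summary += f": {flag.note}"
    return summary


if __name__ == "__main__":
    print(ticket_summary(Flag("12345", Issue.MISSING_PAGES, page=7)))
    # [missing pages] item 12345, page 7
```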
76. WoRMS & COL & COL+
Catalogue of Life & Catalogue of Life Plus meeting. Urbana, 19-20 March 2019
World Register of Marine Species (WoRMS). 12 March 2019
77. NAME FINDING
Name Strings: 188,147,990
Unique Name Strings: 30,128,831 (the number of unique name strings)
Verified Name Strings: 9,972,029 (the number of unique and verified name strings; a verified name is one that has been resolved against a name authority such as NameBank, Catalogue of Life, etc.)
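The three counts differ in what they count: every occurrence of a name string, each distinct string once, and the distinct strings that match a name authority. A minimal sketch, assuming verification is a simple exact lookup in a set of authority names (real resolution against NameBank or the Catalogue of Life involves fuzzy matching and more); the function name and sample strings are made up:

```python
def name_string_stats(occurrences, authority):
    """Return (all occurrences, unique strings, unique strings verified against the authority)."""
    unique = set(occurrences)
    verified = unique & set(authority)  # exact-match stand-in for real name resolution
    return len(occurrences), len(unique), len(verified)


if __name__ == "__main__":
    found = [  # name strings as a name finder might extract them from OCR text
        "Vanessa atalanta", "Vanessa atalanta", "Vanesa atalanta",  # last one: OCR-style misspelling
        "Ursus maritimus", "Ursus maritimus", "Festuca ovina",
    ]
    authority = {"Vanessa atalanta", "Ursus maritimus", "Festuca ovina"}
    print(name_string_stats(found, authority))  # (6, 4, 3)
```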
78. Top Name Searches
/name/Ornithorhynchus_anatinus: 2,717 (0.07%)
/name/Canis_lupus_familiaris: 1,723 (0.04%)
/name/Acinonyx_jubatus: 1,624 (0.04%)
/name/Raphanus_sativus: 1,425 (0.04%)
/name/Ursus_maritimus: 1,359 (0.03%)
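Report lines in this form (URL path, hit count, share of all searches) are easy to turn into structured rows. A sketch that parses them and converts the underscore-separated URL slug back into a scientific name; the regular expression is written for exactly the format shown here, and applying it to any other report output is an assumption:

```python
import re

LINE = re.compile(r"^/name/(?P<slug>[^:]+):\s*(?P<count>[\d,]+)\s*\((?P<pct>[\d.]+)%\)$")


def parse_top_searches(lines):
    """Turn '/name/Genus_species: 1,234 (0.05%)' lines into (name, count, percent) tuples."""
    rows = []
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip headers or anything not in the expected format
        name = match["slug"].replace("_", " ")
        count = int(match["count"].replace(",", ""))
        rows.append((name, count, float(match["pct"])))
    return rows


if __name__ == "__main__":
    report = [
        "/name/Ornithorhynchus_anatinus: 2,717 (0.07%)",
        "/name/Canis_lupus_familiaris: 1,723 (0.04%)",
    ]
    for name, count, pct in parse_top_searches(report):
        print(f"{name}\t{count}\t{pct}%")
```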
87. Technical Advisory Group
2019 Technical Development
Martin R. Kalfatovic, BHL Program Director
Bianca Crowley, BHL Collections Manager
Mike Lichtenberg, BHL Developer
Joel Richard, Technical Coordinator
Susan Lynch, The New York Botanical Garden
NEW ROLES
90. Technical Goals: 2019
Evaluate Metadata Model Document
Review/Revise DOI Strategy
Incremental improvements
BHL EVO
Plan for IIIF implementation
Review/Revise BHL Data Model
System Planning
95. BHL EVO Goals: 2019
System Requirements
The overall functions the system should perform. These flow into the API.
Data Requirements
The rules needed for data integrity, etc.
UI Requirements
What users should be able to do in the user interface.
Data Model
A description of the actual data model we will be using.
Components and Implementation
Possible ideas on which tools or technologies to use for implementation.
96. DRAFT System Requirements
IIIF Image Viewer
Rich REST API
RDBMS
Strong Authentication Layer / API Keys
Support for Digitized Page Images, PDFs, Artwork Images, others?
Each type of thing needs a data model
Basic User Functions
Login
Create Lists
Save Searches
Advanced User Functions
Crowdsourcing of certain content (OCR/Transcriptions, Corrections, etc.)
Separation from Internet Archive: BHL is the source of the data
Send to Internet Archive content that is added to BHL and for which we have permission
Update Internet Archive with structural and metadata changes at BHL for which we have permission
Continue to ingest content from IA that is pertinent to BHL and not specifically in the BHL collection.
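The three Internet Archive rules can be read as a small decision function over an item's state. The sketch below is one hypothetical encoding; the `Item` fields (where the item lives, whether BHL has permission, whether it changed at BHL, whether it is pertinent and already in the BHL collection at IA) are our own and not part of any existing BHL system.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SEND_TO_IA = "send content added at BHL to Internet Archive"
    UPDATE_IA = "push BHL structural/metadata changes to Internet Archive"
    INGEST_FROM_IA = "ingest pertinent Internet Archive content into BHL"
    NONE = "no action"


@dataclass
class Item:
    at_bhl: bool                        # item exists in BHL (BHL is the source of the data)
    at_ia: bool                         # item exists at Internet Archive
    permission: bool = False            # BHL may share this item with IA
    changed_at_bhl: bool = False        # structure or metadata changed at BHL since last sync
    pertinent: bool = False             # IA item is in scope for BHL
    in_ia_bhl_collection: bool = False  # IA item already sits in the BHL collection at IA


def sync_action(item):
    """Apply the draft rules in order; anything else needs no action."""
    if item.at_bhl and item.permission:
        if not item.at_ia:
            return Action.SEND_TO_IA
        if item.changed_at_bhl:
            return Action.UPDATE_IA
    if item.at_ia and not item.at_bhl and item.pertinent and not item.in_ia_bhl_collection:
        return Action.INGEST_FROM_IA
    return Action.NONE
```

Permission gates both outbound actions, while inbound ingest depends only on relevance, which matches the direction of each rule in the draft.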
97. DRAFT Components & Implementation
Authentication Layer
Processes users, access tokens, API keys, etc.
Supports crowdsourcing
Supports blocking, rate limits, blacklisting, etc.
Needs a permissions and API-space model
Needs a technology (JWT? NodeJS? Tyk.io?)
Calls the API
Every request goes through this layer
API
REST-Based
Handles Data requests
Handles Asset Requests
Needs a technology (Node
Does not worry about permissions; lives behind the API
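The draft routes every request through the authentication layer before it reaches a permission-agnostic API. A minimal sketch of that idea, assuming plain API-key lookup, a blocklist, and a per-key token-bucket rate limit; it stands in for whichever technology is eventually chosen (JWT, Tyk, etc.), and all class names are hypothetical. The clock is injectable so the behavior can be tested deterministically.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, now):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.updated = now

    def allow(self, now):
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class AuthGateway:
    """Every request passes through here; the wrapped API never checks permissions itself."""

    def __init__(self, api, api_keys, capacity=5, rate=1.0, clock=time.monotonic):
        self.api = api
        self.api_keys = set(api_keys)
        self.blocked = set()
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets = {}

    def block(self, api_key):
        self.blocked.add(api_key)

    def handle(self, api_key, request):
        """Return an (HTTP-style status, body) pair."""
        if api_key not in self.api_keys:
            return 401, "unknown API key"
        if api_key in self.blocked:
            return 403, "API key blocked"
        now = self.clock()
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = self.buckets[api_key] = TokenBucket(self.capacity, self.rate, now)
        if not bucket.allow(now):
            return 429, "rate limit exceeded"
        return 200, self.api(request)


if __name__ == "__main__":
    gateway = AuthGateway(lambda req: f"result for {req}", {"demo-key"}, capacity=2)
    for _ in range(3):
        print(gateway.handle("demo-key", "/items/1"))
```

The website and the automated nightly processes would each hold their own key, which is one way to make them "just another user" of the system.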
98. DRAFT Components & Implementation
Asset Store & Delivery
REST-based (?)
Handles and serves all media (Images, PDF, OCR, Audio, etc.)
Needs a data model
Needs a technology
Needs some aspects of digital preservation
Does not worry about permissions; lives behind the API
Data Store
REST-based
Needs a data model (NoSQL, TripleStore, Wikibase, RDBMS)
Needs a technology (Apache Cassandra, MongoDB, etc.)
Does not worry about permissions; lives behind the API
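“Some aspects of digital preservation” could start with fixity: store a checksum with every asset and verify it whenever the asset is served or audited. A minimal in-memory sketch under that assumption (SHA-256 via Python's standard `hashlib`; the `AssetStore` class and its keying by item ID and asset kind are hypothetical). Like the draft components, it does no permission checks, leaving those to the layer in front of it.

```python
import hashlib


class FixityError(Exception):
    """Raised when stored bytes no longer match their recorded checksum."""


class AssetStore:
    """In-memory stand-in: (item_id, kind) -> (MIME type, bytes, SHA-256 hex digest)."""

    def __init__(self):
        self._assets = {}

    def put(self, item_id, kind, mime_type, data):
        """Store an asset and return the checksum recorded for it."""
        digest = hashlib.sha256(data).hexdigest()
        self._assets[(item_id, kind)] = (mime_type, data, digest)
        return digest

    def get(self, item_id, kind):
        """Serve an asset only if its bytes still match the recorded checksum."""
        mime_type, data, digest = self._assets[(item_id, kind)]
        if hashlib.sha256(data).hexdigest() != digest:
            raise FixityError(f"checksum mismatch for {item_id}/{kind}")
        return mime_type, data

    def audit(self):
        """Keys of all assets whose bytes fail the fixity check (e.g. for a nightly job)."""
        return [key for key, (_, data, digest) in self._assets.items()
                if hashlib.sha256(data).hexdigest() != digest]


if __name__ == "__main__":
    store = AssetStore()
    store.put("item-1", "ocr", "text/plain", b"A dark Vanessa with a crimson band")
    print(store.get("item-1", "ocr")[0])  # text/plain
    print(store.audit())                  # []
```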
99. DRAFT Components & Implementation
Search Index
Handles Searching
Handles Indexing Content
All requests pass to the Authentication Layer
The website is just another "user" of the system
Automated Processing
Also another "user" of the system
Nightly processes, reports, maintenance, etc
Filtering System
Conversion of one format to another
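At its core, a search index that handles both indexing and searching is an inverted index: a map from each term to the pages containing it, with multi-term queries answered by intersecting those sets. A minimal sketch of that structure, with a hypothetical class name and page IDs; a production search engine adds ranking, stemming, phrase queries, and persistence.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")  # Unicode-aware, so "für" stays one token


def tokenize(text):
    """Lower-case word tokens."""
    return WORD.findall(text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> IDs of pages containing it

    def add(self, page_id, text):
        """Index one page's text (e.g. its OCR)."""
        for term in tokenize(text):
            self.postings[term].add(page_id)

    def search(self, query):
        """IDs of pages containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("page-1", "A dark Vanessa with a crimson band")
    index.add("page-2", "Vanessa atalanta, the red admiral")
    print(sorted(index.search("vanessa")))          # ['page-1', 'page-2']
    print(sorted(index.search("crimson Vanessa")))  # ['page-1']
```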
100. DRAFT Components & Implementation
Website
Plain website with lots of AJAX
IIIF Viewer
PDF Viewer
Media Player (HTML5)
All requests pass to the Authentication Layer
The website is just another "user" of the system
102. CASH & IN-KIND CONTRIBUTIONS
VALUE OF MEMBER & AFFILIATE CONTRIBUTIONS 2017
DIRECT STAFF: $1,660,625.88
OTHER: $227,690.32
TOTAL IN-KIND CONTRIBUTIONS, 2016 VS 2017
2016: $1,817,543.82
2017: $1,888,316.20
21.6 TOTAL MEMBER & AFFILIATE FTEs WORKING ON BHL IN 2017
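The contribution figures are internally consistent: the two 2017 components add up to the 2017 total. A short sketch that checks this and computes the change from 2016 to 2017 (figures from the slide; the percentage is derived here and is not stated on the slide):

```python
from decimal import Decimal, ROUND_HALF_UP

DIRECT_STAFF_2017 = Decimal("1660625.88")
OTHER_2017 = Decimal("227690.32")
TOTAL_2016 = Decimal("1817543.82")
TOTAL_2017 = Decimal("1888316.20")


def summarize():
    """Return (sum of 2017 components, 2016-to-2017 change, change in percent to 2 places)."""
    components = DIRECT_STAFF_2017 + OTHER_2017
    change = TOTAL_2017 - TOTAL_2016
    pct = (change / TOTAL_2016 * 100).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return components, change, pct


if __name__ == "__main__":
    components, change, pct = summarize()
    print(components == TOTAL_2017)  # True
    print(change, f"{pct}%")         # 70772.38 3.89%
```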
105. I cannot separate the aesthetic
pleasure of seeing a butterfly and the
scientific pleasure of knowing what it
is.
Vladimir Nabokov
Interview in Sports Illustrated (1959)
107. The Tragedy of the Commons
…and the current success of BHL
The tragedy of the commons develops in this way. Picture a
pasture open to all. It is to be expected that each herdsman
will try to keep as many cattle as possible on the commons.
Such an arrangement may work reasonably satisfactorily for
centuries because tribal wars, poaching, and disease keep the
numbers of both man and beast well below the carrying
capacity of the land. Finally, however, comes the day of
reckoning, that is, the day when the long-desired goal of social
stability becomes a reality. At this point, the inherent logic of
the commons remorselessly generates tragedy.
Garrett Hardin. "The Tragedy of the Commons." Science (13 Dec 1968): Vol. 162, Issue 3859, pp. 1243-1248. DOI: 10.1126/science.162.3859.1243
108. BHL is a Global Partnership
… how do we manage catastrophic success?*
* Generous Thinking: A Radical Approach to Saving the University
(2019) by Kathleen Fitzpatrick
• Community building becomes another demand on resources
• Distributed development is prone to slowness
• Sustainability of non-profit projects is antithetical to non-profit structure
109. BHL is a Global Partnership
… how do we manage catastrophic success?*
* Generous Thinking: A Radical Approach to Saving the University
(2019) by Kathleen Fitzpatrick
• Social sustainability: how do people (institutions) commit not just to the THING but to the community?
• Community is an imaginary construct that papers over differences
• Community is an alibi for what should be societally funded activities
110. BHL is a Global Consortium
…what does that mean going forward?
• The current “Member” and “Affiliate” model does not allow for nuanced partnership with BHL
• The number of well-funded and well-staffed libraries in natural history museums and botanical gardens is finite
• BHL has probably tapped, globally, nearly all the potential Members from the natural history and botanical garden community
111. BHL is a Global Consortium
…what does that mean going forward?
• Large research institutions and national libraries have better resources and excellent collections, but rarely a single-minded focus on biodiversity
• Supporting growth of Members & Affiliates in the current model with existing central staff levels is not sustainable
• Increasing central staffing to support additional Members and/or Affiliates is not feasible without a quantum growth of partnerships
112. BHL succeeds through effective organization & management
… but has BHL outgrown the 2012 model?
• Does the current organizational structure:
• Allow for effective representation of all partners?
• Provide appropriate means for input into the governance of BHL?
• Allow for reflective and strategic planning for BHL?
• Allow for nimble and reactive management of BHL?
113. BHL is the largest repository of biodiversity literature
… but how can it be more useful?
• Maximize impact of content acquisition:
• Prioritize digitization (or ingest) of high-impact collections and content
• Deprecate “yet another copy of …”
• Stop chasing content from difficult sources
• Encourage partners to work on aggregating existing content instead of new digitization
• Seek relevant and unique local sources of content
114. BHL is the largest repository of biodiversity literature
… but how can it be more useful?
• Prioritize curation of existing content
• Enhance metadata
• Increase pagination
• Develop new tools and services to maximize machine use of BHL content
• Integrate BHL more deeply into the wider library community ecosystems (human and technical)
115. BHL 2020+
How can BHL …
• Help catalyze institutional change across the partnership to ensure success for both partners and BHL
• Integrate BHL’s specialized content with global scientific initiatives to engage a broader audience
• Enable collaboration with biodiversity library collections of all sizes and levels of resources
• Increase fundraising (institutional/philanthropic)
116. BHL 2020+
Governing the Commons* …
Further, all organizational arrangements are subject
to stress, weakness, and failure. Without an
adequate theory of self-organized collective action,
one cannot predict or explain when individuals will be
unable to solve a common problem through self-organization alone, nor can one begin to ascertain
which of many intervention strategies might be
effective in helping to solve particular problems.
Elinor Ostrom. Governing the Commons: The Evolution of
Institutions for Collective Action (1990)
117. BHL 2020+
Governing the Commons* …
The term "common-pool resource" refers to a
natural or man-made resource system that is
sufficiently large as to make it costly (but not
impossible) to exclude potential beneficiaries from
obtaining benefits from its use. To understand the
processes of organizing and governing CPRs, it is
essential to distinguish between the resource
system and the flow of resource units
produced by the system, while still recognizing the
dependence of the one on the other.
Elinor Ostrom. Governing the Commons: The Evolution of
Institutions for Collective Action (1990)
118. BHL 2020+
Governing the Commons* …
Making the switch, however, from independent to
coordinated or collective action is a
nontrivial problem. The costs involved in transforming
a situation from one in which individuals act
independently to one in which they coordinate
activities can be quite high. And the benefits
produced are shared by all appropriators, whether or
not they share any of the costs of transforming the
situation.
Elinor Ostrom. Governing the Commons: The Evolution of
Institutions for Collective Action (1990)
119. BHL 2020+
Governing the Commons* …
Designing and adopting new institutions to solve CPR
problems are difficult tasks, no matter how
homogeneous the group, how well informed the
members are about the conditions of their CPR, and
how deeply ingrained are the generalized norms
of reciprocity. Given the strong temptations to shirk,
free-ride, and generally act opportunistically that
usually are present when individuals face CPR
problems, overcoming such problems can never be
assured.
Elinor Ostrom. Governing the Commons: The Evolution of
Institutions for Collective Action (1990)
120. “The study of biodiversity is far and away the
most important endeavor in the history of
humanity, certainly until now, and very possibly
into the future as well .... We are building the
card catalog for the most important library that
has ever existed, and ever will exist (at least
from the perspective of humans).”
Dr. Richard L. Pyle
Bishop Museum (2010)