The document discusses the Biodiversity Heritage Library (BHL), an open access digital library focused on biodiversity literature. It provides details on the BHL's member institutions, organizational structure, content selection and digitization processes, metadata standards, and online platform. The BHL aims to make biodiversity literature from its member institutions openly available online by digitizing books and journals, generating metadata, and developing tools for access and discovery.
An Overview of the Biodiversity Heritage Library. Martin R. Kalfatovic. University of Pretoria Visitor Presentation. Smithsonian Libraries. 20 September 2013.
5. BHL “classic” or US/UK: 15 institutions … Formed in 2006, 13 members and 2 affiliates
15 Members
• Academy of Natural Sciences Library and Archives
• American Museum of Natural History Library
• California Academy of Sciences Library
• Cornell University Library
• The Field Museum Library
• Harvard University Botany Libraries
• Ernst Mayr Library of the Museum of Comparative Zoology
• Library of Congress
• Marine Biological Laboratory and Woods Hole Oceanographic Institution Library
• Missouri Botanical Garden Library
• Natural History Museum, London, Library & Archives
• The New York Botanical Garden
• Royal Botanic Gardens, Kew, Library & Archives
• Smithsonian Institution Libraries
• United States Geological Survey Libraries
6. BHL “classic” or US/UK: Key Organizational Points
• BHL is not a legal entity; fiduciary and legal agreements are generally delegated to individual members
• Membership is governed by a Memorandum of Understanding signed by all members
• Two levels of membership (as of 2013):
  • Member (voting and administrative input); annual dues of $10,000 USD
  • Affiliate (provide content or other services to BHL; no voting or input into overall BHL direction)
• BHL Secretariat (Administrative component), housed at the Smithsonian Libraries
• BHL Technical Team (housed at Missouri Botanical Garden)
• BHL Executive Committee (elected by Members): Chair, Vice-Chair and Secretary
11. Selection
High-yield taxonomic materials
Unique & rare materials
Permissions titles
User requested titles & gap-fills
Discipline specific subject matter
Non-BHL member materials ingested from the
Internet Archive
ACTIVE / HIGH PRIORITY
PASSIVE / LOW PRIORITY
12. Increase agreements with publishers of in-copyright materials
US Titles: 206
UK Titles: 67
TOTAL TITLES: 273
US Licensors: 85
UK Licensors: 40
TOTAL LICENSORS: 125
13. Deduplication
• We try to avoid duplication where possible
• Tools
• Serials = Scanlist
• Monographs = Monographic deduper
• Check the BHL before you send for scanning
• We do our best but duplication happens
• Post-digitization, we merge titles as necessary
14. BHL US/UK Principles: Digitization
Mass-digitization scanning operation
Scan books, cover-to-cover
As best we can, we seek to provide an exact digital copy of the original physical object
We are experimenting with field notebooks
Infrastructure supports book-like objects
We do not (yet) have maps, artworks, photographs
Workflow designed around scanning physical books
We are working on solutions for incorporating born-digital materials
15. BHL US/UK Principles: Digitization
Most BHL US/UK libraries scan directly through the
Internet Archive
We pay the Internet Archive to provide us with full
digitization services
Each BHL US/UK member library has its own
workflow:
Sending the books from our shelves
And the bibliographic metadata from our library catalogs
To the digitization station
And returning the books back to our shelves
16. BHL US/UK Principles: Metadata
Our baseline standard is MARC
We derive the metadata that displays on the BHL
website from the MARC records
We aggregate the bibliographic records from each of our library catalogs into the BHL database AS IS
We edit the metadata displayed on the BHL website
manually as necessary
BHL Digitization Specifications documentation currently
being updated
17. Digitization workflow
1. Titles vs. Items vs. Segments
2. Metadata we need:
• MARC for book and journal titles
• Volume information
• Page data
BHL Term     | Titles                 | Items         | Segments
Library Term | Book or Journal Titles | Volume, Piece | Articles, Book chapters
Meaning      | Conceptual unit        | Object        | Section of consecutive pages
20. Internet Archive Scanning
Northeast Regional Scanning Facility (Boston)
New Jersey Facility
Natural History Museum, London
Fedscan (Library of Congress)
Internet Archive (San Francisco)
Smithsonian Libraries
Missouri Botanical Garden
(Non-Scribe operation)
23. … and now including segments
92,356 “segments”
24. New Content Types
Field Books and other archival materials
Stand-alone or linked illustrations
25. Relevant information in Spanish
about Mexican biodiversity
BHL contains much in Spanish or about Mexico,
but it is not clearly broken out
Ensayo ornitologico de los troquilideos ó colibries de Mexico.
Mexico: I. Escalante, 1875
The Orchidaceae of Mexico and Guatemala.
London: Ridgway, [1837-1843]
A selection of the birds of Brazil and Mexico: the drawings
London: H.G. Bohn, 1841
27. 2012 BHL Member In-kind Staff FTE & Costs (incomplete)
14.193 FTE from the 14 member institutions
$1,239,300 staff and other costs
(does not include Secretariat or Technical staff)
28. 2012 BHL Central Support
7.055 FTE total: 4.43 FTE Technical, 2.625 FTE Secretariat
Staff $316,053 | Other $78,477 | Total $394,531
Staff $472,529 | Other $22,615 | Total $495,144
31. User Statistics: 2007 - 2013
Visitors: 3,628,088
Page Views: 17,604,395
New vs. Returning: 50.06% vs. 49.04%
146,798 visitors | November 2012
33. I am thrilled with what I have been able to find
re: archaic mammary embryology some of
which I had been hoping to find at the National
Library of Medicine, and to get it through your
program was a huge advantage. Last night I
believe I requested and received 11 PDFs, all
of which are essential to a review paper* I am
completing.
Olav T. Oftedal PhD
Smithsonian Environmental Research Center
* “Evo-Devo of the Mammary Gland” by Oftedal, et al.
Journal of Mammary Gland Biology and Neoplasia (May 2013)
34. “Thank you much for your
What an absolutely wonderful site.
It is a treasure trove of information.
Thank you!
May I compliment you on this splendid service? The
Library's invaluable for my work on seasonal
variability of climate and vector-borne disease in
British India, 1875-1940.
I really appreciate your work. The Biodiversity Heritage
Library is an excellent resource that regularly helps my
assistant and I obtain original descriptions for plants .... I
feel so privileged to be working in a day in age when such
resources are so readily available and easy to obtain.
36. BHL Social Media, February 2013
Facebook: Total Page Likes: 4,384
Twitter @BioDivLibrary: Total Followers: 2,369
Pinterest: 2,373 images & 16 collections
Blog: Total Visits: 9,096 (2Q13)
40. BHL Architecture: Window Seat Ed.
[Architecture diagram. Labels: Firewall; Internet Archive; Storage; Images (JP2); PDF; Coordinate-based OCR; XML metadata; BHL DB; Logic; Data Transform Utilities; Geocoding; Name Finding; Access; APIs; UI; Data Exports]
42. Hardware & Software
Hardware
Scribe station
Off-the-shelf scanners or good-quality digital cameras
Software
Wonderfetch -> Partner Meta App (when using Scribe
machines)
Macaw
Uploading directly to Internet Archive (for example: MBG’s
Botanicus)
43. Standards and formats to consider
The simplest way to contribute a text item to IA is currently as a single PDF file. IA creates a second PDF with a text layer if none exists.
Items can also be submitted as a stack of image files, one image per page. The files can be in JPEG2000, JPG, or TIFF format, but there are strict requirements for how the files in an image stack are named, and the stack must be packed into a single .zip or .tar file before submission.
When IA (Archive.org) scans a book for a Contributing Library, it uses the custom-engineered "Scribe" workstation, but for many materials, adequate images can be made with off-the-shelf scanners or good-quality digital cameras.
For best results, use the highest resolution your device is capable of. Most images IA processes were produced at a resolution of 300-600 ppi.
44. Standards and formats to consider
BHL recommends following, in part, the DLF's "Benchmark for
Faithful Digital Reproductions of Monographs and Serials"
(available online at
http://www.diglib.org/standards/bmarkfin.htm).
Bitonal: 600 dpi, 1-bit or bitonal TIFF images
Grayscale: 300 dpi, 8-bit grayscale uncompressed TIFF, or lossless
compressed image (e.g. LZW, JPEG2000 [*.jp2]).
Color: 300 dpi, 24-bit color uncompressed TIFF, or lossless compressed
images (e.g. LZW, JPEG2000 [*.jp2]).
NOTE: the above specifications are the preferred ones. BHL
will, however, accept lossy files. In the case of JPEG2000,
files with a compression level of 85% are acceptable.
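The preferred specifications above could be captured in a small lookup table for checking scans before submission. The sketch below is an illustration only: the function name, the mode labels, and the treatment of the listed dpi values as minimums are assumptions, not part of any BHL tool.

```python
# Preferred scan specifications from the slide, keyed by image mode.
# Treating the dpi figures as minimums is an assumption made for this sketch.
PREFERRED_SPECS = {
    "bitonal": {"dpi": 600, "bit_depth": 1},
    "grayscale": {"dpi": 300, "bit_depth": 8},
    "color": {"dpi": 300, "bit_depth": 24},
}


def meets_preferred_spec(mode: str, dpi: int, bit_depth: int) -> bool:
    """Return True if a scan matches the preferred spec for its mode."""
    spec = PREFERRED_SPECS.get(mode)
    if spec is None:
        raise ValueError(f"unknown image mode: {mode!r}")
    return dpi >= spec["dpi"] and bit_depth == spec["bit_depth"]


if __name__ == "__main__":
    print(meets_preferred_spec("grayscale", 300, 8))
    print(meets_preferred_spec("bitonal", 300, 1))
```

A scan that fails this check is not necessarily rejected, since the slide notes that lossy files are also accepted; the check only flags departures from the preferred settings.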
45. Standards and formats to consider
Currently, BHL data can be downloaded as MODS, EndNote and BibTeX. See our wiki page for more information:
http://biodivlib.wikispaces.com/Data+Exports#x--MODS
Title metadata, as well as pagination, descriptive and page order (structural) metadata, is being copied into METS files in the <biodiversity> collection at IA.
The purpose of these METS files is to accommodate our pagination data.
These METS files are pagination-specific and do not include the item/volume information.
If bibliographic metadata for BHL content is required, it can be found in the MODS files on the Data Exports page.
46. Standards and formats to consider
For the future, we are looking at serving OLEF as
an envelope format to share information with
other BHL Nodes.
See http://www.bhle.eu/bhl-schema/v0.3/ and
http://www.slideshare.net/HeimoRainer/bhleuropemetadataharmonisationtdwg20111018kollerwhrainer/6
47. Metadata generation and
indexing strategy
Each item to be uploaded needs a unique identifier within our central repository, currently Internet Archive (archive.org), and a folder with that name is created to hold the uploaded and generated (derivative) files.
Within BHL we record metadata at 3 levels of
bibliographic granularity – Title, Item & Page –
as well as metadata for the Creator(s) of the
title.
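The three levels of granularity, plus Creators, can be pictured as nested records. The following is a minimal sketch, assuming hypothetical class and field names chosen for illustration; it is not the BHL database schema, and it carries only a few of the fields listed on the later slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Creator:
    name: str
    role: Optional[str] = None  # e.g. author or publisher (illustrative)


@dataclass
class Page:
    sequence: int  # position within the item, starting at 1
    page_number: Optional[str] = None  # printed page number, if any


@dataclass
class Item:
    identifier: str  # unique identifier in the central repository
    volume: Optional[str] = None
    pages: List[Page] = field(default_factory=list)


@dataclass
class Title:
    title: str
    creators: List[Creator] = field(default_factory=list)
    items: List[Item] = field(default_factory=list)

    def page_count(self) -> int:
        """Total number of pages across all items of this title."""
        return sum(len(item.pages) for item in self.items)


if __name__ == "__main__":
    journal = Title(
        title="Example Journal",
        creators=[Creator("Example Society", role="publisher")],
        items=[Item("examplejournal01", volume="1", pages=[Page(1), Page(2)])],
    )
    print(journal.page_count())
```

Nesting pages under items and items under titles mirrors the Title, Item and Page levels named on the slide; a production schema would more likely use separate tables joined by keys.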
48. Metadata generation and
indexing strategy
Scanned material (jp2.zip) and basic title-level metadata content
(marc.xml), item-level metadata (meta.xml) and page-level
metadata (scandata.xml) are uploaded to Internet Archive
(IA), in the ‘biodiversity’ collection.
JP2.zip: The compressed JP2 images (Compression Quality 15) that IA will use for delivering pages to the Read Online feature, following a very specific naming convention for the filenames: master image files are named with the local library identifier + 4-digit sequence number (with no gaps).
MARC.xml: The MARC record for the title from the library catalog in
MARCXML format
Title, *Abbreviation, *Creator, Description, Publisher, Start Date Published,
End Date Published, Local Library Identifier, *OCLC Number, *ISSN,
*ISBN, *Call Number, *Subject, *Language, Date Created, Date Last
Modified, *Foreign Keys
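The naming rule for the image stack (local library identifier plus a gap-free 4-digit sequence number) could be applied when packing the zip file. This is a sketch under stated assumptions: the underscore separator, the starting number 0001, the archive name, and both function names are hypothetical, and the exact convention should be taken from the upload documentation.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List


def sequence_names(identifier: str, count: int, ext: str = "jp2") -> List[str]:
    """Build gap-free names: identifier + 4-digit sequence number."""
    if not 0 < count <= 9999:
        raise ValueError("count must fit in a 4-digit sequence")
    # The underscore separator and the start at 0001 are assumptions.
    return [f"{identifier}_{n:04d}.{ext}" for n in range(1, count + 1)]


def pack_stack(identifier: str, images: Iterable[Path], out_dir: Path) -> Path:
    """Store page images, in the order given, in one zip under the new names."""
    images = list(images)
    names = sequence_names(identifier, len(images))
    # The archive name is an assumption made for this sketch.
    archive = out_dir / f"{identifier}_jp2.zip"
    # ZIP_STORED: JP2 files are already compressed, so no deflate pass.
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
        for src, name in zip(images, names):
            zf.write(src, arcname=name)
    return archive


if __name__ == "__main__":
    print(sequence_names("mylib001", 3))
```

Generating the names from a counter, rather than renaming by hand, is what guarantees the sequence has no gaps.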
49. Metadata generation and
indexing strategy
META.xml: The item-level information (even where redundant with the title-level information), including the title, author, publisher, copyright information, digitizing sponsor, date published, type of item, and who originally uploaded it. IA may also update this XML file with information as it processes the pages of the item.
Barcode, Sequence, Local Library Identifier, +Start Volume, End
Volume, +Start Date, End Date, *Language, Scanning Institution,
*Scanning Contributor, *Scanning Sponsor, Date Created, Date
Last Modified
SCANDATA.xml: An XML file (scandata.xml) recording
information about each page image (handSide, cropBox,
original width & height, etc. )
FileName, Sequence, *Page number, *Page Type, Year, Volume,
IssuePrefix, Issue, Date Created, Date Last Modified
50. Metadata generation and
indexing strategy
CREATOR: A “Creator” is defined as a person or
company responsible for the creation of the Title.
Name, *Role, Date of Birth, Date of Death, Biography
A detailed description of the contents of each of these files, and of the whole process of uploading content to IA, is available at:
http://biodivlib.wikispaces.com/Upload
51. Metadata generation and
indexing strategy
Internet Archive runs the OCR process and
generates “derivative files” that include:
The files resulting from the OCR process with ABBYY FineReader (djvu, djvu.txt, djvu.xml, abbyy.gz)
A 100x152 pixel GIF with a looping, animated thumbnail of the first 20 pages of a book.
The presentation version on BHL in PDF format.
The MARC record in binary and XML formats.
And others (for a more detailed description, see http://biodivlib.wikispaces.com/Download+All+File+Types+and+Descriptions)
52. Metadata generation and
indexing strategy
The metadata from new items added to the BHL collection is loaded into the database and indexed for use in searches through the Portal and API services.
Periodically, the OCR pages are run through taxonomic name services to mine for new taxon names: currently TaxonFinder (ubio.org), and soon GNRDS (Global Names resolution tools and services: resolver.globalnames.org).
Taxon names are added to the database and written back into Internet Archive (names.xml)
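To give a feel for what mining OCR text for names involves, here is a toy stand-in. It is not TaxonFinder or the Global Names services and does not reproduce their algorithms: it simply looks for "Genus epithet" patterns with a regular expression and keeps those whose genus appears in a supplied list. The function name and the `known_genera` parameter are hypothetical.

```python
import re
from typing import List, Set

# Candidate binomials: a capitalized word followed by a lowercase word.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_candidate_names(ocr_text: str, known_genera: Set[str]) -> List[str]:
    """Return 'Genus epithet' candidates whose genus is in known_genera."""
    found: List[str] = []
    for genus, epithet in BINOMIAL.findall(ocr_text):
        name = f"{genus} {epithet}"
        # Keep first occurrences only, in order of appearance.
        if genus in known_genera and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    page = "Specimens of Festuca rubra were found near Quercus alba."
    print(find_candidate_names(page, {"Festuca", "Quercus"}))
```

Real name-finding services also have to cope with OCR errors, abbreviated genera and names split across lines, none of which this sketch attempts.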
54. Online Platform
Publication
BHL API
(biodivlib.wikispaces.com/Developer+Tools+and+API)
The BHL Application Programming Interface (API) is a set of
REST-like web services that can be invoked via HTTP queries
(GET/POST requests) or SOAP.
Responses can be received in one of three formats: JSON, XML,
or XML wrapped in a SOAP envelope.
We are currently developing a new API v3, closer to a RESTful
design than previous versions, using resource-centric
URLs (where possible) and GET/PUT/POST/DELETE verbs.
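A query to an operation-style HTTP service of this kind is typically a GET request with the operation and its arguments in the query string. The sketch below only assembles such a URL and makes no network call; the parameter names `op`, `format` and `apikey`, the operation name `GetTitleMetadata`, and the example endpoint are assumptions for illustration and should be checked against the API documentation.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, op: str, api_key: str,
                    response_format: str = "json", **params: str) -> str:
    """Assemble a GET query URL for an operation-style web service."""
    if response_format not in ("json", "xml"):
        raise ValueError("response_format must be 'json' or 'xml'")
    # Parameter names here are assumed, not taken from the slide.
    query = {"op": op, **params, "format": response_format, "apikey": api_key}
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # Placeholder endpoint and key; substitute the documented values.
    print(build_query_url("https://example.org/api", "GetTitleMetadata",
                          "YOUR_KEY", id="123"))
```

Keeping the endpoint as a parameter means the same helper works unchanged if the service moves to a new version or host.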
56. Online Platform
Management
BHL Admin Dashboard
Admin Functions
(Alert Message, Image Server, Collections, Institutions,
Languages, Page Types, PDF Requests, Segment Types)
Library Functions
(Titles/Items/Segments /Pagination/Authors)
Science Functions (Names (Taxa) on a Page)
Library Statistics
(Titles/Items/Pages/Names/Segments/Items with Segments,
Names, Pages with Names)
Growth Statistics
(Titles/Items/Pages/Names/Segments new this Month/Year)
57. Online Platform
Management
BHL Admin Dashboard
PDF Generation Statistics (Generated: 174,162)
Internet Archive Harvesting Statistics (Complete: 119,125 items)
BioStor Harvest Statistics (Published: 11,126 as of Aug. 29, 2013)
DOI Assignment Statistics (DOI Approved: 57,338 as of Aug 29,
2013)
Web Traffic Statistics (API v2, OpenURL)
Reports
(Item Pagination, Title Import History, Character Encoding Problems,
DOIs by Institution, Monographic Contributions,
Items by Contributor)
58. Online Platform
Management
Monographic Deduping Tool
The MBLWHOI Library has been working on a tool that
assists with de-duplicating the monographs that BHL
members are sending to IA for scanning.
The application is ready for use and it’s entirely web-based,
requiring no client or user configuration.
The monographic deduper acts as a master database that
contains records for all of the monographs that any BHL
partner institution has scanned.
59. Online Platform
Management
Monographic Deduping Tool
In addition, there is a process in place that allows material ingested from the Internet Archive, but not contributed by a BHL partner institution, to be added to the deduper database.
Ultimately, the Monographic deduper database should be seen as a living record of accountability that communicates to staff collaborating in the BHL network a partner’s promise to digitize a particular monographic title.
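The idea of a shared record of who has promised to digitize which monograph can be sketched as a small registry. This is an in-memory illustration only: the slides do not describe how the actual deduper matches records, so keying on an OCLC number, the normalization step, and all names below are assumptions.

```python
from typing import Dict, Optional


class DedupeRegistry:
    """In-memory stand-in for a shared record of claimed monographs."""

    def __init__(self) -> None:
        self._claims: Dict[str, str] = {}

    @staticmethod
    def _key(oclc_number: str) -> str:
        # Normalize whitespace and leading zeros (an assumed convention).
        return oclc_number.strip().lstrip("0")

    def claim(self, oclc_number: str, institution: str) -> bool:
        """Record a claim; return False if the title is already claimed."""
        key = self._key(oclc_number)
        if key in self._claims:
            return False
        self._claims[key] = institution
        return True

    def claimed_by(self, oclc_number: str) -> Optional[str]:
        """Return the claiming institution, or None if unclaimed."""
        return self._claims.get(self._key(oclc_number))


if __name__ == "__main__":
    registry = DedupeRegistry()
    print(registry.claim("00123", "Library A"))
    print(registry.claim("123", "Library B"))
    print(registry.claimed_by("123"))
```

Checking the registry before sending a book for scanning corresponds to the "check the BHL before you send for scanning" step on the deduplication slide.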
60. Online Platform
Management
Serials Bid List
It is a catalogue that allows users to browse and search
Serials titles held by BHL member institutions using
advanced filtering.
62. Scanning Locally, Collaborating Globally
6 global nodes: By country, region, language
Each node is independent and self-organized, but all work under a set of common principles
Share content as much as possible
Node leaders form a Global Coordinating Committee
Goal is to share a common portal where possible
Goal is to develop a multilingual portal
68. Looking Forward
In any well-appointed
Natural History Library
there should be found
every book and every
edition of every book
dealing in the remotest
way with the subjects
concerned.
Charles Davies Sherborn
Epilogue to Index Animalium, March 1922