COLLEGE OF INFORMATION STUDIES, UNIVERSITY OF MARYLAND, COLLEGE PARK
Information Audit Project
INFM 736 – Information Management Team Experience
Organization: Niels Bohr Library & Archives
Akashdeep Ray, Jeroen de Lange, Nishita Thakker, Thet Oo, You Zheng
5/13/2010
Table of Contents
Executive Summary
1. Introduction
2. Background Information
3. Project Rationale
4. Information Audit
5. Methodology
5.1 Interviews
5.2 Literature Review
6. Business Process Maps
6.1 Collections/Manuscripts
6.2 Photo Collections
6.3 Oral Histories
6.4 Books
7. Existing IT Structure
8. Fit Gap Analysis
8.1 Organization
8.2 Usability
9. Literature Review
9.1 Standards and Policies
9.2 Adopting IT Systems
9.3 Case Studies
10. Recommendations
11. Limitations
Appendix
References
List of Diagrams
Diagram 1 Business Process of Collections/Manuscripts
Diagram 2 Business Process of Photo Collections
Diagram 3 Business Process of Oral Histories
Diagram 4 Business Process of Books
Diagram 5 Existing IT Structure
List of Tables
Table 1 Introduction of IT systems
Table 2 Different file formats used by NBL&A
Table 3 Overview of Metadata collected by NBL&A in ICOS
Table 4 Overview of Metadata used by NBL&A outside ICOS
Executive Summary:
The project audits the digital assets of the Niels Bohr Library & Archives and maps the
business process flows for each of its information assets. This information audit was
conducted by a team of graduate students from the University of Maryland as part of a team
capstone project.
Information was gathered through interviews conducted at the organization with various
staff members responsible for cataloguing and archiving different types of assets. A comparative
literature review was also carried out to understand current industry trends in the information
technology and processes used by libraries and archives. Using this information, business
process maps were created, from which an overall picture of the current situation at the
organization emerges.
A fit gap analysis was also performed to identify where the current system falls short of the
organization’s needs. The key findings were that multiple systems are used for the various
digital assets; work is organized in silos; collections are connected by cumbersome HTML links;
different types of collections (single-item and multi-item) are not stored and organized in an
integrated manner; data has to be re-entered manually; and the systems lack customized access
control and a unified search engine across the digital assets.
The IT systems for digital asset management in the library and archives environment are
in a dynamic state, which makes any purchase a volatile decision. The tentative recommendations
are to adopt common, open industry standards for data and to adopt open source technology;
however, any selected system will require technical expertise to customize it to the
organization’s needs.
1. Introduction:
The aim of the project was to perform an information audit for the Niels Bohr Library &
Archives (NBL&A): to map the business processes thoroughly, identify problems with the
existing information environment, analyze those problems in depth and offer a range of
broad recommendations. The report first explains the project rationale, defines the problem
statement and scope, and sets out the analysis approach and methodology. The business process
maps are then explained for the organization members who work with each category of digital
asset at the NBL&A. The current IT platform for the digital assets is described along with its
existing problems. The fit-gap analysis compares the current situation with the ideal system
and shows how such a system could resolve certain issues. Finally, a broad range of solutions
is offered to the NBL&A to improve its business processes.
2. Background Information:
The American Institute of Physics (AIP) is a non-profit organization which “promotes the
advancement and diffusion of knowledge of physics and its application to human welfare”. To
accomplish its mission, AIP supports ten physics and astronomy societies (e.g. the American
Astronomical Society, the American Physical Society and the Society of Rheology; AIP, 2010b)
with publishing, membership administration, and the organization of exhibits and conferences.
AIP also supports individual scientists, students and the general public by offering a career
network, preserving the history of physics, and educating and supporting teachers in making
the history of physics known. However, AIP’s core business is publishing and selling
advertisements in its 50 journals, which earned it $77.2 million in 2009 (AIP, 2010c).
The Niels Bohr Library & Archives and the Center for History of Physics are divisions of
the American Institute of Physics that share a common mission: to help preserve and make
known the history of modern physics and allied sciences. The Library & Archives serves both as
a repository and a clearinghouse for information in the history of physics, astronomy, geophysics
and allied fields. In-house holdings include an outstanding collection of textbooks, monographs,
biographies, and related publications, dating mostly from ca. 1850–1950; over 30,000
photographs and other images; ca. 1,000 oral histories with many of the outstanding figures in
the fields that we cover; and archival records of AIP and its Member Societies along with other
archival records and personal papers of a select number of scientists. All of these materials are
indexed online.
As a clearinghouse, the NBL&A maintains and updates the International Catalog of Sources
for the History of Physics and Allied Sciences (ICOS for short), which contains descriptions of
over 9,000 archival and manuscript collections, oral history interviews, and other primary
sources in our fields at ca. 900 repositories worldwide. As part of its efforts, the NBL&A
actively encourages scientists’ home institutions to support archival programs that preserve their
papers and the institution’s history. The NBL&A also preserves the records of AIP and its Member
Societies and, occasionally, the papers of individuals such as Goudsmit whose papers do not
have a natural home elsewhere.
3. Project Rationale:
This information audit was started for several reasons. First, the status quo is changing:
AIP has adopted a publishing-based content management system called Polopoly. Because AIP
has a centralized IT strategy and limited resources (staff), it cannot support the NBL&A’s
unique IT needs. Second, the NBL&A’s current IT systems are not adequate for its future needs
and goals. This is the result of various factors, such as an ad hoc IT strategy over the
years, expansion of the collection through grant-based projects such as the Oral History
Interviews and the Goudsmit Digitization Project, an increasingly diverse user base, and the
rise of born-digital files.
Over the last year the NBL&A has tried provisioning several systems that it felt would
fulfill the requirements, but these systems fell far short of expectations. To determine what
type of system would work for its purposes, the NBL&A needs to assess its needs, current
situation and processes carefully. Our aim was therefore to understand all the existing
processes and identify the NBL&A’s information needs, to help it make a better-informed decision.
4. Information Audit:
An information audit is an analysis technique used to assess information needs and
assets. The information audit carried out for this project assessed the NBL&A’s needs and
correlated them with its current information landscape. Such an audit shows whether the
existing information environment is aligned with the goals and objectives of the
organization. The audit also helped identify possible ways to improve the existing
conditions within the constraints that may exist for the organization.
Several tools were used while performing the information audit:
• Business process maps – visualize the inflow and outflow of information, the
type/format of information and the physical/logistical location of information, and
identify potential issues and inefficiencies.
• Use case and activity diagrams – depict key users and dependencies within the
organization.
• Fit gap analysis – check the actual performance of the NBL&A against the desired
performance, and describe the gap to identify needs, purposes and objectives.
Information maps and business process maps allowed us to gain in-depth knowledge of the
current assets and processes. In identifying the current system, the information environment
as a whole was mapped out, giving unique insights into the existing IT structure at the NBL&A.
Activity diagrams targeted detailed workflows with the stepwise actions of various
organization members. These members’ workflows are described in detail to understand the
behaviors involved with the various digital assets, as categorized by the NBL&A. The diagrams
helped us better understand information flows within the NBL&A and identify the roadblocks in
various processes. Moreover, once the problems are clearly defined, potential solutions can be
considered, which in turn identify future system requirements.
After mapping out the information environment, a fit gap analysis was done to check the
performance of the ongoing activities and procedures at the NBL&A against the desired
performance, or “ideal scenario”. The first step of the analysis was to describe the
situation that would be ideal for the activities at the NBL&A. The second step was to compare
the ideal situation with the current situation, describing what “fits” and what poses a “gap”.
Taken together, this analysis can be used to set minimum system requirements and constraints
for potential solutions.
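The two steps of the fit gap analysis can be sketched as a simple set comparison. This is only a minimal illustration under stated assumptions, not a tool used in the audit: the function name is hypothetical and the requirement labels are shorthand taken from the findings in this report.

```python
def fit_gap(desired, current):
    """Split the desired capabilities into those already met and those missing."""
    desired_set = set(desired)
    current_set = set(current)
    return {
        # A "fit" is a desired capability that the current situation provides.
        "fits": sorted(desired_set & current_set),
        # A "gap" is a desired capability that the current situation lacks.
        "gaps": sorted(desired_set - current_set),
    }


# Illustrative labels only (assumed shorthand, not a complete requirement list).
desired = ["unified search", "customized access control", "online catalog search"]
current = ["online catalog search"]
result = fit_gap(desired, current)
```

The list of gaps is what would feed into the minimum system requirements mentioned above.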
5. Methodology:
To perform an in-depth problem analysis, we needed accurate information about the
current information needs and assets, and a thorough understanding of the various business
processes and ongoing activities at the NBL&A. The team started with a few group meetings and
then conducted multiple rounds of personal interviews. During the course of the project,
relevant literature was reviewed to understand the conditions at other libraries and archives.
5.1. Interviews:
The project team conducted multiple rounds of structured interviews with 10 organization
members. The interviews focused on the members’ day-to-day activities and on how they
interact with the various digital assets of the NBL&A. This gave us knowledge of the inflows
and outflows of different types of information. We also tried to identify existing
roadblocks in each member’s processes and to understand their expectations of a future
information structure.
5.2. Literature Review:
The relevant literature on information management and information audits provided a
structure for the project and its tasks, helping us deal with the problem at hand in a
systematic manner. In the last decade, libraries and archives around the world have undergone
many changes as the need for digitization has grown. The literature review looked at the
NBL&A’s peers to identify the systems they use and their processes. This review not only
helped us but also gives the NBL&A a point of comparison for correlating problems with
potential solutions.
6. Business Process Maps:
This section describes the various business processes at the NBL&A, based on the
information gathered from interviewing organization members. The processes are grouped by
the digital asset categories already defined at the NBL&A.
6.1. Collections/Manuscripts:
Diagram 1. Business Process Map for Collections/Manuscripts
The NBL&A does not aim to archive collections at its own location, but seeks to have them
preserved at the most appropriate repository. To do this, it maintains a database of contacts
and other information related to physicists, solicits the institutions where the researchers
worked to take possession of and archive their research papers, and regularly tracks
newspaper obituaries of physicists. Researchers’ names are checked against the Library of
Congress database, and cases are labeled according to the established unified name. Multiple
cases can be open at the same time; priorities are set based on the importance of the
researcher (N: Nobel Prize winner, W: very important, Q: anything else). How long a case
remains open for investigation depends on its importance; for example, cases labeled ‘Q’
remain open for six months. The universities or people contacted are also recorded in the
database. As a last resort, if a suitable home for a researcher’s work and materials is not
found, they are taken in by the NBL&A.
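The priority scheme for open cases can be sketched as follows. This is a hypothetical illustration, not the NBL&A’s actual database logic: the names `Priority`, `OPEN_MONTHS` and `review_deadline` are assumptions, and only the six-month window for ‘Q’ cases comes from this report, so the windows for ‘N’ and ‘W’ are deliberately left undefined.

```python
import calendar
from datetime import date
from enum import Enum
from typing import Optional


class Priority(Enum):
    N = "Nobel Prize winner"
    W = "very important"
    Q = "anything else"


# Only the six-month window for Q cases is stated in the report; the windows
# for N and W are not given, so they are left out rather than guessed.
OPEN_MONTHS = {Priority.Q: 6}


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamping to the end of short months."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def review_deadline(priority: Priority, opened: date) -> Optional[date]:
    """Date until which a case stays open, or None if the window is unknown."""
    months = OPEN_MONTHS.get(priority)
    return add_months(opened, months) if months is not None else None
```

A ‘Q’ case opened on 15 January 2010 would, under these assumptions, stay open until 15 July 2010, while an ‘N’ case returns `None` because its window is not documented here.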
Collections, which may include manuscripts, videotapes, prints and films, are donated to
the NBL&A by personal delivery or mail, uploaded digitally through the File Transfer
Protocol (FTP) server, or sent on CDs/DVDs or by email. They are then sorted by level of
copyright restriction, with priority given to important files with fewer copyright
restrictions. Digital files are checked for format, i.e. whether or not they can be opened
with the existing software. Document files are usually in .rtf or .doc format; audio files are
usually in .mp3 format. If a file cannot be opened with the existing software, help is sought
from the IT department. The digital files are stored on the preservation server. Physical
collections are stored in acid-free boxes. Video tapes are not given priority for archiving
since they are likely to deteriorate more quickly. The next step is to catalogue the
materials in two databases: the NBL&A database, accessed through Microsoft Access (MS Access)
forms, and an integrated library cataloguing system known as Horizon. Much of the information
stored in the two databases is the same, with some extra fields in the Horizon module. The
information in the Horizon database follows the MARC¹ format and standards. An initial part of
the cataloguing process is to thank the donors and record the details of the deed of gift
letter received from them. This letter contains the copyright permissions given to the NBL&A.
This information is entered into the databases along with other data such as the description,
donor information, unique catalogue number and name of the collection (following the naming
conventions of the Library of Congress).
¹ The MARC formats are standards for the representation and communication of bibliographic and
related information in machine-readable form.
The Horizon application has the added advantage of making the records searchable online
through the ICOS search module on the NBL&A website. The records may contain links to a
finding aid HTML page. The process of creating a finding aid also starts from the MS Access
forms, but it depends on two conditions: first, whether the collection is important enough to
warrant a finding aid, and second, whether the collection is large enough for a finding aid
to be created during a project cycle. The creation of finding aids is a continuous process
that can take place at any time after a collection has been catalogued. As stated earlier,
the content for a finding aid is first entered into Access forms (known as the EAD module).
The data entered is then converted into .xml and .html files through a series of steps that
includes exporting the data into Dreamweaver and finally converting it into web pages (in
versions with and without frames) using scripts. The finding aids can then be published on
the website. The resulting web page link for the finding aid is entered into the Horizon
record for the same collection.
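The conversion from a form record to .xml and .html output can be sketched as below. This is a simplified illustration, not the Dreamweaver-based scripts described above: the function names and record keys are assumptions, and the element names are borrowed from EAD for flavor only, so the output is not a complete or valid EAD document.

```python
import xml.etree.ElementTree as ET
from html import escape


def build_finding_aid(record: dict) -> ET.Element:
    """Build a small EAD-style element tree from one finding-aid record."""
    archdesc = ET.Element("archdesc", {"level": "collection"})
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = record["title"]
    ET.SubElement(did, "unitid").text = record["catalog_number"]
    for tag, key in (("bioghist", "historical_note"),
                     ("scopecontent", "scope_and_content")):
        section = ET.SubElement(archdesc, tag)
        ET.SubElement(section, "p").text = record[key]
    return archdesc


def to_xml(archdesc: ET.Element) -> str:
    """Serialize the tree to an XML string."""
    return ET.tostring(archdesc, encoding="unicode")


def to_html(archdesc: ET.Element) -> str:
    """Render the same tree as a very plain HTML fragment."""
    title = escape(archdesc.findtext("did/unittitle", default=""))
    parts = [f"<h1>{title}</h1>"]
    for tag, heading in (("bioghist", "Historical note"),
                         ("scopecontent", "Scope and content")):
        text = escape(archdesc.findtext(f"{tag}/p", default=""))
        parts.append(f"<h2>{heading}</h2><p>{text}</p>")
    return "\n".join(parts)


# Hypothetical record; the keys mirror fields listed for finding aids in Table 4.
record = {
    "title": "Papers of A & B",
    "catalog_number": "EX-001",
    "historical_note": "Example historical note.",
    "scope_and_content": "Example scope and content.",
}
tree = build_finding_aid(record)
xml_text = to_xml(tree)
html_text = to_html(tree)
```

Generating both outputs from one structured record is the point of the sketch: the .xml and .html versions cannot drift apart because neither is edited by hand.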
6.2. Photo Collections:
Diagram 2. Business Process Map for Photo Collections
The NBL&A photo archive is known as the Emilio Segrè Visual Archives (ESVA), named
after Emilio Segrè, an eminent physicist. While the biggest donors to the photo archives
have been Physics Today (since 1940) and AIP, photos are donated from other sources on a
regular basis. Precedence is given to recently donated photographs, while photographs from
the backlog are worked on throughout the year. Photos are sent by email, uploaded to the FTP
server or delivered physically by mail. A thank-you note is sent to the donor in response to
the deed of gift document. Based on the copyright permissions, photos are scanned and stored
in a temporary folder under the donor’s name on the R drive (located on the Novell server).
The photos are then catalogued through Access forms into an MS SQL Server database. All
copyright information is entered into the database. Any data entered into the database
contributes to the searchability of the digital photograph. Meanwhile, the photos are
converted into common file formats. The resolution is decreased using Photoshop and the files
are copied onto the NAS server at Melville. These files are cross-mounted onto the production
server APP1, through which the photos are displayed on the web pages. To link the metadata to
the picture files, the file name of each picture must match the file name entered in its
metadata record.
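Because the link between a photo and its metadata depends on an exact file name match, a simple consistency check can catch broken links before publication. The sketch below is an assumption-laden illustration, not an existing NBL&A script: the function name and the sample file names are hypothetical, and matching is treated as case-sensitive.

```python
def find_link_problems(metadata_file_names, image_file_names):
    """Compare file names in metadata records with the files on the server."""
    in_metadata = set(metadata_file_names)
    on_server = set(image_file_names)
    return {
        # Records whose file name matches no stored picture.
        "records_without_file": sorted(in_metadata - on_server),
        # Stored pictures that no metadata record points to.
        "files_without_record": sorted(on_server - in_metadata),
    }


# Hypothetical names; a difference in letter case is enough to break the link.
problems = find_link_problems(
    ["bohr_a1.jpg", "segre_b2.jpg"],
    ["bohr_a1.jpg", "Segre_B2.jpg"],
)
```

In this example one record and one file are reported, since `segre_b2.jpg` and `Segre_B2.jpg` are not an exact match.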
6.3. Oral Histories:
Diagram 3. Business Process Map for Oral Histories
The digitization of the oral history transcripts starts with checking the physical folders
of transcripts that carry old labels. The collection is checked for uniformity of cataloguing
information against the records in the NBL&A database and the ICOS records in the Horizon
database. The naming convention is also checked against the existing Library of Congress
authority records. The transcripts are converted into digital format by scanning them with
Microsoft OCR (Optical Character Recognition) software and are uploaded to an FTP server.
These files are in Microsoft Word format and are passed through Dreamweaver scripts to convert
them into .xhtml files. The .xhtml files are uploaded to the HostA server, where they are
linked into web pages and can thus be accessed online. Oral transcript collections whose
cataloguing information does not match the ICOS records are renamed with new labels and put
back into the folders to go through the overall process again.
6.4. Books:
Diagram 4. Business Process Map for Books
Because Horizon is an integrated library system, the acquisition of books is tracked via
an acquisition record made in Horizon. This record is printed out and sent to vendors for
purchase. Books are either bought from the yearly book budget of $3,000 or received as gifts.
Internationally standardized catalogue information for each book is downloaded from WorldCat
and OCLC records in the form of .dat files, which are then imported into Horizon. Since there
is a backlog of about 1,000 books, these are placed on shelves in the archives and recorded in
Horizon with their shelf number. Books that are taken out to be placed in the library are
checked against the Horizon records for errors. A unique call record based on the author name
is added for each book, after which it is placed on the rack in the library.
7. Existing IT Structure:
In serving the public and its member societies, the NBL&A has had a very ad hoc IT
strategy over the years. As events occurred and grants were received, the NBL&A chose
different types of IT systems based on short-term needs (Table 1).
Year       Event                                   Reason
1998       Visual Archives                         Commercial
1999       Online Public Access Catalog            Usability
1999       Integrated Library System Horizon       Year 2000 bugs
1999       Introduction of NBL&A Access database   Record more information on collections
2000       Introduction of Finding Aids            Consortium / Grant
2002       Update of Visual Archives (Oracle)      Unstable, server update
2004       Updated Horizon to 7.3.3                -
2005       Made ICOS Google searchable             Usability
2006       Update of Visual Archives (MYSQL)       Unstable
2007-2009  Oral History Interviews put online      Grant
2008-2009  Goudsmit project                        Grant
Table 1. Introduction of IT systems
(AIP, 1998, 1999a, 1999b, 2000, 2002, 2005, 2006, 2007, 2008)
The following paragraphs describe the IT systems the NBL&A uses daily. A complete
overview can be found in Diagram 5.
Diagram 5. Existing IT Structure
The International Catalog of Sources is part of an integrated library system called
Horizon, designed to catalog books, serials, and collections and manuscripts. The Horizon
library system runs on a dedicated server (LibServer) in Melville, NY and is searchable
through a built-in search engine called Dynex. Horizon is also used for book acquisitions,
but most of its functions are unused.
The photographs are the only digital assets archived separately, in the Emilio Segrè Visual
Archives (ESVA), which has its own website. They are sold commercially online via a software
layer connected to an MS SQL database. The ESVA runs on a NAS server in Melville, NY. It can
be searched via a quick query search and an advanced search (federated search).
Oral history interviews, finding aids, newsletters, etc. are hosted on the HostA web server
in Melville, NY and are an active part of the website. The website and the photographs are
searchable via Google and a Google custom search. The NBL&A also employs a federated
search engine called Varity, which indexes the ICOS, the ESVA and the website. Google used to
index the ICOS as well, but this has stopped working.
The systems discussed so far are all online and accessible to the public. The
NBL&A also uses a Microsoft Access database internally to record information about its
collections. This MS Access database is located on a Novell server in College Park, MD. The
same server, also known as the “preservation server”, permanently hosts all digital files
(photos, OHI, etc.).
8. Fit Gap Analysis:
The fit-gap analysis compares the desired situation with the current situation and
describes the gap between the two. It first describes what the NBL&A wants from a system,
then which of those aspects it already has, and from these identifies the gaps. The following
is a summary of information system requirements, based on an NBL&A in-house team’s vision of
integrating the diverse digital collections:
• Organization - the ability to organize the stored data while preserving the complex
relationships among them.
• Usability - information should be accessible to the stakeholders and visible to
internet search engines. This accessibility is, however, a controlled activity: the
NBL&A should be able to control the flow of interaction.
• Portability - information assets need to be standardized and in non-proprietary
formats, which makes it easier to migrate them when the system is changed or
expanded to include newer features.
• Cost - the system should not be prohibitive in terms of economic and manpower costs.
Please see Appendix A for the full description of the functional requirements as
described by the NBL&A.
The NBL&A wants an integrated system to organize, store and disseminate the
collection while preserving the complex relationships between the digital assets. The following
fit-gap analysis looks at the two broad requirement categories: organization and usability.
8.1. Organization:
In considering the organization of collections and the preservation of the complex relations
among the files, we identified two implications. First, the new system should be able to store
and organize different types of collections, such as flat, single-item collections and
hierarchical, multiple-item collections. Currently, different types of collections and file
types come with varying attributes or metadata. Larger multi-item collections are described
only as a whole, in a catalogue record and potentially in a finding aid (a finding aid may or
may not be created for a collection). Such collections do not have item-level metadata (they
are “meta-light”). Single-item collections (such as photo collections), on the other hand, are
described at item level in each file’s own metadata.
Second, the new system should be able to handle varying file formats and sizes,
depending on the type of digital asset. Different digital assets (such as books, transcripts,
manuscripts and photos) require different metadata and thus have different types of catalogue
records. As Table 2, Table 3 and Table 4 show, different collections require different types
of description and follow different standards. The future system thus needs to distinguish
between different types of collections and accommodate each collection’s individual needs,
which means it must offer different templates or forms for entering data about different
collections. With the various digital collections, many procedures are followed before the
digital records are made accessible. The future system could simplify these pre-posting
procedures and thus avoid redundancy. Since the individual collections are stored, catalogued
and accessed in very different ways, it has been difficult to present users with an integrated
search. Currently, different systems are used for different digital assets, and these systems
do not communicate with each other because the assets are treated as individual entities. This
is reflected not only in the systems but also in the procedures, the policies and the redundant
work performed by the organization members associated with the different digital assets.
As seen in the diagram in section 7, the NBL&A operates a separate database to record
additional information about archival collections and to write finding aids. This database is
also used to administer the use of the collections. The current structure of the database is
unorganized, confusing and not scalable. The tables are not normalized in any way, which makes
the database slow and can cause insertion, update and deletion anomalies. The database stands
completely separate from the online systems and causes redundancies and inefficiencies in the
workflow: information has to be manually re-entered into the integrated library system. The
database and the online systems are linked only by the common catalog number, which is used
as a unique identification number.
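To illustrate what normalization would change, the sketch below stores each donor and each collection once and lets accessions refer to them by key. It is a hypothetical schema, not the structure of the NBL&A Access database: the table and column names are assumptions loosely based on the fields in Table 4, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical normalized schema: donor and collection details are stored once
# and referenced by key, so a correction is made in exactly one place.
SCHEMA = """
CREATE TABLE donor (
    donor_id   INTEGER PRIMARY KEY,
    donor_name TEXT NOT NULL
);
CREATE TABLE collection (
    catalog_number TEXT PRIMARY KEY,
    title          TEXT NOT NULL
);
CREATE TABLE accession (
    accession_nr   TEXT PRIMARY KEY,
    accession_date TEXT NOT NULL,
    catalog_number TEXT NOT NULL REFERENCES collection(catalog_number),
    donor_id       INTEGER NOT NULL REFERENCES donor(donor_id)
);
"""


def open_database() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = open_database()
conn.execute("INSERT INTO donor VALUES (1, 'A. Donor')")
conn.execute("INSERT INTO collection VALUES ('C-1', 'Example papers')")
conn.execute("INSERT INTO accession VALUES ('A-1', '2009-01-01', 'C-1', 1)")
conn.execute("INSERT INTO accession VALUES ('A-2', '2010-01-01', 'C-1', 1)")

# One update corrects the donor name for every accession that refers to it.
conn.execute("UPDATE donor SET donor_name = 'Anna Donor' WHERE donor_id = 1")
donor_names = [
    row[0]
    for row in conn.execute(
        "SELECT d.donor_name FROM accession a "
        "JOIN donor d ON d.donor_id = a.donor_id ORDER BY a.accession_nr"
    )
]
```

With foreign keys enforced, an accession that points at a non-existent catalog number is rejected, which is one way to avoid the insertion anomalies mentioned above.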
The digital assets at the NBL&A are not well integrated. HTML links are used to
preserve relationships among different collections: Oral History Interviews are ‘linked’ to
their respective catalog records, and collection-specific finding aids to theirs. In the
future, the Goudsmit Digitization Project will also be linked to the existing finding aids and
catalog record via HTML. HTML links are very cumbersome: they have to be placed manually into
the catalog records, they are error-prone and they are not easily transferable to new systems.
Lastly, the NBL&A physically saves all of its born-digital files on a preservation server.
The metadata, however, is stored in the catalog records of the integrated library system,
which is physically located on a separate server (LibServ). The NBL&A database also stores
metadata, but only for internal use. The photos in the ESVA have their metadata stored in the
SQL database on the MS SQL Server. These constraints make it hard to migrate the current
system to a new one.
File type  File extensions
Text       txt, pdf, doc, rtf, wpd, email, indd, Access database logs
Photos     tiff, jpg, pdf
Audio      wav, mp3
Table 2. Different file formats used by NBL&A
ICOS Books              ICOS Archive          ICOS OHI
Title                   Name                  Name
Author, date            Author, date          Interviewer, interviewee, date
Publisher               Description, size     Description, size
Call No                 Owner                 Use and reproduction
Description, size       Country               Owner
ISBN                    Biography / History   Country
Added Author            Scope of Material     Notes
Location                Notes                 Added Author
Collection              Added Author          Genre Terms
Status                  Location              Location
Subjects                Collection            Collection
Edition                 Status                Status
Source of Acquisition   Subjects              Subjects
Table 3. Overview of Metadata collected by NBL&A in ICOS
22
Photos        NBL&A Access DB              OHI transcript       Finding Aids
Catalog nr    Accession Date               Name                 Name
Description   Accession Type               Copyright            Publisher, address
Date          Accession Nr                 Origin               Date
Credit        Old accession NR             Interviewee          Encoding information
Names         Items in Accession           Interviewer          Location
              Begin Location               Location, interview  Title and dates of collection
              End Location                 Date                 Papers created by
              Other Location               Abstract             Size
              Oversize Location            Sessions             Short description
              Main Entry                                        Language
              Member Society                                    Selected search terms / subjects
              Title                                             Historical note
              Collection ID                                     Scope and content of collection
              Description                                       Organization and arrangement of collection
              Notes                                             Access restrictions
              Begin Date                                        Restrictions of use
              End Date                                          Provenance and acquisition information
              Linear Feet                                       Processing information
              Proc Priority                                     Other related materials
              ICOS Nr                                           Container list
              Donor Name
              Restrictions
              Description of Restrictions
              Deed sent                                         Notes
              Date deed send
              Date deed returned
              Thank you sent, date
              * More information is collected in different tabs.
Table 4. Overview of Metadata used by NBL&A outside ICOS
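The field lists in Tables 3 and 4 suggest how records from the separate systems might be normalized into one shared description before a migration. The following is a minimal sketch in Python. The source field names are taken from the tables, but the common schema, the mapping itself, and the function name to_common are illustrative assumptions, not the NBL&A's actual design.

```python
# Hypothetical crosswalk: source system -> {source field: common field}.
# Source field names come from Tables 3 and 4; the common schema is an assumption.
CROSSWALK = {
    "ICOS Books": {"Title": "title", "Subjects": "subjects", "Location": "location"},
    "ICOS OHI": {"Name": "title", "Subjects": "subjects", "Location": "location"},
    "Photos": {"Catalog nr": "identifier", "Description": "description", "Date": "date"},
}


def to_common(source, record):
    """Rename the fields of one source record to the common schema.

    Fields without a mapping are kept under 'unmapped' so that nothing
    is dropped silently during a migration.
    """
    mapping = CROSSWALK[source]
    common = {"source": source, "unmapped": {}}
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            common["unmapped"][field] = value
    return common
```

Keeping unmapped fields visible, rather than discarding them, is one way to spot gaps in the crosswalk before any data is moved.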
8.2. Usability:
The second aspect of the fit-gap analysis is the usability of the system. The system's
usability has implications for all of its users. Users are broadly classified as researchers,
non-researchers and the staff at NBL&A. Thus, we have the general public (researchers and
non-researchers) who want to search the collection and the NBL&A employees who manage
the collection. For the general public it is important that the various digital assets are searchable,
as one collection, from major search engines such as Google and Yahoo. The current
NBL&A website uses a total of 4 different search engines, none of which is able to present a
complete overview of the collection. Search should support both general keyword queries and
in-depth queries using various search categories.
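One way to picture a single search across all asset types, rather than four separate engines, is a shared keyword index. The sketch below, in Python, is a minimal illustration under stated assumptions: the record ids, the sample texts, and the function names build_index and search are hypothetical, and a production system would more likely rely on an established search platform.

```python
import re
from collections import defaultdict


def build_index(records):
    """Map each lowercase word to the set of record ids whose text contains it.

    `records` is a dict of {record_id: descriptive text}; ids such as
    "photo:1" or "ohi:1" are hypothetical and only mark the asset type.
    """
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(rec_id)
    return index


def search(index, query):
    """Return the ids of records that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Because photos, interviews, and books share one index, a single query returns matches from every collection at once.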
For NBL&A employees the system should be standardized, easy to use, and
independent of the type of digital asset. The system should allow them to manipulate collections
and disseminate them in various ways, such as exhibitions, mobile devices, photo of the month, etc.
The system should have a user-friendly interface that can be used by organization members
with or without technical expertise. Moreover, the system should be able to control and record
which organization members get access to which types of files. This will help maintain uniformity
and help track changes that are made to various files.
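The access control and change tracking described above can be sketched as a permission table plus an audit log. This is a minimal Python illustration only: the role names, asset types, and the function edit_asset are hypothetical assumptions, not features of any existing NBL&A system.

```python
from datetime import datetime, timezone

# Hypothetical roles and the asset types each role may edit (assumption).
PERMISSIONS = {
    "archivist": {"manuscript", "oral_history"},
    "photo_librarian": {"photo"},
}

# Every attempted edit is appended here, whether or not it was allowed.
audit_log = []


def edit_asset(user, role, asset_type, asset_id):
    """Allow the edit only if the role covers the asset type; log every attempt."""
    allowed = asset_type in PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "asset": asset_id,
        "allowed": allowed,
    })
    return allowed
```

Logging refused attempts as well as successful ones keeps a complete record of who tried to change which file.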
9. Literature review
The goal of this literature review is to provide a baseline understanding of the current
state of research and practice in the management of digital assets in archives and libraries,
particularly regarding the integration of various digital assets.
9.1 Standards and Policies
In the archival environment, there is a lot of debate about standardization and policies.
Some institutions are trying to set up their own standards and policies, which range from very
strict, rigid Electronic Records Management System standards that require formal certification
to simple policies and tips (Joerling, 2010). Here is an overview of some of these standards and policies:
• Trustworthy Repositories Audit & Certification (TRAC)
• Trusted Digital Repository (TDR)
• Open Archival Information System Reference Model (OAIS), NASA
• Information Life cycle approach (Hodge, 2000)
• Department of Defense 5015.2
• Model Requirements for the Management of Electronic Records (MoReq2), EU
• Victorian Electronic Records Strategy (VERS), Australia
• Document Management and Electronic Archiving (DOMEA), Germany
• Records, Document and Information Management System (RDIMS), Canada
• International Standard Archival Authority Record (ISAAR), International Council on Archives
The NBL&A doesn't need to comply with any governmental mandate, and therefore
doesn't need to be certified. Most of these policies are very strict and would mean too much
bureaucracy and red tape. Moreover, the NBL&A already complies with some important archival
standards such as MARC and EAD and follows the name authority conventions of
the Library of Congress. These developments in the archival environment are therefore not of
immediate interest to the NBL&A.
9.2 Adopting IT Systems
The second step of the literature review was to research new IT systems that can be used
for managing digital archives. The following types of systems were encountered:
• Enterprise Content Management Systems (ECMS)
• Digital Asset Management Systems (DAMS)
• Electronic Records Management (ERM)
• Content Management Systems (CMS)
• Collection Management Systems (CMS)
• Document Management Systems (DMS)
• Integrated Library Systems (ILS)
There is considerable overlap between these systems, and not all of them match the
functional requirements set by the NBL&A or can deal with its constraints. The following
paragraphs give a quick evaluation of each type, followed by case studies of the ERM and
DAMS used by other organizations, which range from adapting existing library systems to
developing a whole system from scratch.
First, an ECMS is defined as “the strategies, methods and tools used to capture, manage,
store, preserve, and deliver content and documents related to organizational processes. ECM
tools and strategies allow the management of an organization's unstructured information,
wherever that information exists” (AIIM, 2010). This encompasses document and record
management, groupware, web content, and business process management. An ECMS goes well
beyond the needs of the NBL&A, so this type of system is not evaluated further.
A DAMS “consists of managing tasks and decisions surrounding the ingestion,
annotation, cataloguing, storage, retrieval and distribution of digital assets” (Jacobsen,
Schlenker, Edwards, 2005). A wide variety of systems can perform these tasks; however, it is
generally agreed that a database program lies at the center of such a solution (Peterashbyhayter, 2010).
Examples of DAMS include DSpace, Fedora, Greenstone, and EPrints. These systems match the
functional requirements given by the NBL&A and therefore should be explored further.
ERM is defined as “A computer-based facility for managing and controlling records throughout
the information life cycle” (Curaconsortium, 2010). An ERM system handles everything from
planning, classifying, storing, and securing records to their destruction or preservation, as
well as coordinating access. However, the functionality of this type of system is limited to
managing a record as “evidence” of an event and does not really allow for dissemination and
manipulation. Many organizations use ERM as a tool to track documents and comply with
government regulation. This tool therefore does not fit the needs of the NBL&A and is not
evaluated further.
A Content Management System can be defined in multiple ways. On the one hand, it can
be a web Content Management System, which allows users without any technical knowledge
to manage the content of a website. On the other hand, it can be a system to manage
documents, for example for enterprises, media, learning, collections, or mobile devices. A DMS,
which is a kind of Content Management System, “indexes and profiles documents based on
content; controls documents using such functions as check in/checkout, version control, audit
trails, and security of information; and facilitates searching by profile values or by some other
hierarchical structure such as folders and files” (ischool Texas, 2010). Lastly, the abbreviation
CMS can also stand for Collection Management System. A Collection
MS is “a piece of software that allows collecting institutions to manage data about their
collections and items they hold and are an integral part of managing the documenting
collections” (Collections Australia, 2010). A Collection MS thus describes and administers
information regarding collections, donors, locations, etc. These types of systems are mostly
used by museums and archives to manage their collections. In recent years, however,
due to the digitization of museums, these collection management systems have begun moving
towards digital asset management and electronic records management. They keep records and
catalogues of their collections, describe the collections in detail, and relate items to each other
and to history. Some of these systems thus match the NBL&A system requirements well
and should be evaluated further. A few examples of collection management systems are
CollectiveAccess, Artlid and Gallery Systems. Based on our understanding, CollectiveAccess
seems the most appropriate for the NBL&A.
Lastly, ILS are “enterprise resource planning systems for libraries, used to track items
owned, orders made, bills paid, and patrons who have borrowed” (Wikipedia, 2010). There are
many ILS, some proprietary, like SirsiDynix, Newgen and Ex Libris, and some open source, like
CDS Invenio.
9.3. Case Studies:
Several case studies were reviewed to learn how other organizations use some of
the systems and technologies mentioned above. They range from adapting existing library
systems to developing the whole system from scratch.
At Portland State University Library (PSU) (http://vikat.pdx.edu/), the capabilities of the
Integrated Library System were expanded to accommodate the hierarchical structure found in
traditional archival finding aids. PSU uses Electronic Resources Management (ERM) from
Innovative Interfaces Inc. Brenner, Larsen, & Weston exploited ERM's ability to “replicate the
two-level hierarchical relationships between aggregators or publishers and the electronic and
print resources”. The authors admitted, though, that the resource records
created were not as rich as those of traditional finding aids. In the same article, the
authors also described the approaches used by the Library of Congress and the University of
Washington. The Library of Congress selectively adds records of individual items from its
archival collections to its OPAC (Online Public Access Catalog). That approach gives users
complete access to those items within the library's collections. However, it
hides the hierarchical relationship between items and the collections they belong to. The
University of Washington takes a different approach: it puts collection-level MARC records in its
OPAC, where they are searchable like other bibliographic records. These collection-level records
are then linked to finding aids with more complete descriptions (Brenner, Larsen, & Weston, 2006).
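The difference between these approaches comes down to whether an item-level record keeps a pointer to the collection it belongs to. A minimal Python sketch of such a hierarchy follows; the class name Node, the function path, and the level titles used to exercise it are hypothetical and do not describe how any of the catalogs above are implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One level of an archival hierarchy (collection, series, folder, or item)."""
    title: str
    parent: Optional["Node"] = None


def path(node):
    """Return the titles from the top-level collection down to this node."""
    titles = []
    while node is not None:
        titles.append(node.title)
        node = node.parent
    return list(reversed(titles))
```

An item record that stores only its own title loses this path, while one that keeps the parent link can always be displayed in the context of its collection.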
The Washington Research Library Consortium (WRLC) (http://www.wrlc.org/) is a
consortium of eight libraries in the Washington DC metropolitan area. The WRLC member
libraries host a wide variety of unique special collections: manuscripts, photographs, slides,
full-text documents, magazines, comic books, audio recordings, video clips, and so on. Each
special collection needs to be accessible from both WRLC's Digital and Special Collections Web
site and the corresponding member library's special collections Web site. The digital objects also
need to be accessible through EAD finding aids. The public Web interface needs to be simple
enough for new, inexperienced users yet powerful enough for experienced power
users. After evaluating and testing both commercial systems and open source software, Allison
B. Zhang and Don Gourley of WRLC could not find any system that met all of their
requirements. They finally decided to build a customized system by integrating the best tools
available and chose Greenstone Digital Library software as the tool for presenting the digital
collections. Their customization of Greenstone involved designing and creating metadata,
designing plug-ins to import metadata, working with configuration files, and defining numerous
macros (Zhang & Gourley, 2006).
10. Recommendations:
Based on the results of the fit-gap analysis and the literature review, tentative
recommendations can be drawn to address the current and future needs of the NBL&A. According
to Tennant (2008), it is not the right time to choose a new library-related IT system, since the
market is in a state of flux. It would be better to wait for a period of time before taking the leap.
However, any future system should require standardized formats for data description,
metadata and storage. If cost is taken into consideration, open source products would
be the way forward. Possible options that can be implemented in the future² are:
• Maintenance of the current system while modifying elements, such as normalizing the
databases.
• Implementation of a digital asset management system.
• Implementation of a collection management system.
• Implementation of an integrated library system.
² More information is provided in the presentation slides.
11. Limitations:
The initial functional requirements provided by NBL&A were incomplete and
ambiguous. During the course of the project, assumptions were made about the metadata
structure and its implications for the future system, and about the
capabilities of the existing Horizon and Verity software applications. Quantifiable costs and
benefits still need to be calculated for the current system and for any future systems that may
be considered for implementation. The literature on this subject is limited, since not every
organization documents its experiences. User experiences with the system need to be recorded
in the future to make better recommendations.
Appendix A:
Niels Bohr Library & Archives and Center for History of Physics
Digital Assessment Project
We are looking for:
1) A system that allows us to store materials in an organized manner, while retaining complex
relationships between some items;
2) A system that allows us to use and manipulate these digital items more creatively, with better
searching capabilities, and to help us present these items online.
Our system requirements are:
Complexity - can handle hierarchies and relationships between digital items; store flat, single-
item digital collections, in addition to collections of varying sizes that include multiple file
formats (i.e. email messages with attached documents, or a folder that contains a Word
document, a .PDF file and a video clip); see attached inventory for specific file types and storage
needs.
Searching - will allow all levels of searching, from simple keyword searches on file names to
full-text searching on OCRed documents; will allow access via Google or other search engine
results; will allow commenting or other user interaction.
Permanence - that digital items are stored in a non-proprietary format, in a system that will be
committed to long-term use at ACP or that can be easily migrated without losing formatting or
hierarchies.
Access - that staff members, as well as outside users and library patrons, can easily access digital
items without registration or log-ins; and that if needed, certain items could be restricted or
shielded from outside users.
Metadata - will be able to ingest existing metadata in batches, with little or no manual re-keying
Expansion - will expand to fit future needs of storage space, additional projects, complexity of
hierarchies, and possible linking of projects.
Standards - conforms to accepted standards in the professional archival community, on all the
points listed above; standards include MARC, EAD, XML
Support - will be installed, customized, maintained and otherwise supported by AIP Web
Development staff.
Cost - will not be prohibitive in costs, in either direct spending, necessary hardware to run the
program, or staff time and resources to use and support the program.
Current projects
Oral history interview project: scanned and OCRed interview transcripts; full-text searchability;
use of audio files and images; ability to continue to add more interviews indefinitely
Digitized archival materials: scanned manuscript collections consisting of TIFF images of text;
ability to link the digital images to an existing EAD/XML interface; ability to retain the
hierarchies of the items in the collection
Databases: the History Center is currently assembling a database of biographical information on
acclaimed physicists, then linking profiles together in multiple ways - according to professional
interests, research teams, educational background, etc.
Our users are:
Experienced users: Library/archives patrons, historians and other researchers who are already
familiar with our catalog and resources. The public interface must be familiar and consistent
with our other webpages.
New users: Items will be discovered through searches in our online catalog, through searches in
Google and other search engines, links in web exhibits, newsletters, press releases, Facebook
updates, etc. The interface must be intuitive, and easy to navigate back to the main catalog so
the user can start a new search or browse other relevant materials.
In-house use: content will be regularly accessed by library and archives staff.
Examples of different online interfaces that inspire us:
The series description and box inventories of the Joseph Cornell papers at the Smithsonian
Archives of American Art. If you click on a digitized folder, this collection also has a nice
interface for clicking through the digital images -
http://www.aaa.si.edu/collectionsonline/cornjose/series1.htm
The interactive finding aid of the Aldo Leopold papers at the University of Wisconsin - Madison
http://digicoll.library.wisc.edu/cgi/f/findaid/findaid-idx?c=wiarchives;cc=wiarchives;view=text;rgn=main;didno=uw-lib-leopoldpapers
References:
AIIM. 2010, April 01. What is Ecms. Retrieved from
http://www.aiim.org/What-is-ECM-Enterprise-Content-Management.aspx
AIP. 1998, May. Searchable database of photo's. Retrieved from
http://www.aip.org/history/newsletter/fall98/esvaweb.htm
AIP a. 1999, Spring. New integrated library system. Retrieved from
http://www.aip.org/history/newsletter/spr99/ils.htm
AIP b. 1999, December. New online catalog for niels bohr library online. Retrieved from
http://www.aip.org/history/newsletter/spring2000/nbl.htm
AIP. 2000, Fall. Finding aids to major collections online. Retrieved from
http://www.aip.org/history/newsletter/fall2000/findaid.htm
AIP. 2002, Fall. New format: emilio segre visual archives. Retrieved from
http://www.aip.org/history/newsletter/fall2002/esva.htm
AIP. 2005, Spring. Enhancing web access to library catalog. Retrieved from
http://www.aip.org/history/newsletter/spring2005/catalog.htm
AIP. 2006, Spring. Improved online visual archives. Retrieved from
http://www.aip.org/history/newsletter/spring2006/visualarchives.htm
AIP. 2007, Fall. Major collection of oral history interviews mounted online. Retrieved from
http://www.aip.org/history/newsletter/fall2008/oral-history.html
AIP. 2008, Fall. Digitizing the samuel goudsmit papers. Retrieved from
http://www.aip.org/history/newsletter/current/digitizing_goudsmit.html
AIP a. 2010, April 01. About AIP, Retrieved from http://www.aip.org/aip
AIP b. 2010, April 01. AIP: A federation of physical sciences, Retrieved from
http://www.aip.org/aip/societies.html
AIP c. 2010, April 01. Annual report 2009, Retrieved from http://www.aip.org/aip/reports.html
Bak, G., Armstrong, P. 2009. Points of convergence: seamless long-term access to digital
publications and archival records at Library and Archives Canada. Archival Science, Vol 8: p279-293.
Bearman, D. 1991. Hypermedia and interactivity in museums: Proceedings of an international
conference.
Bearman, D. 1992. Documenting Documentation. Archivaria, 34, Summer.
Brandeis Institutional Repository Planning Documents, 2006 - 2007. (n.d.). Retrieved April 1,
2010, from Brandeis Institutional Repository: http://dcoll.brandeis.edu/handle/10192/21866
Brenner, M., Larsen, T., & Weston, C. (2006). Digital Collection Management through the
Library Catalog. Information Technology and Libraries, 65-77.
Cohen, P. 2010. Fending off digital decay, bit by bit. New York Times, March 15.
http://www.nytimes.com/2010/03/16/books/16archive.html?scp=1&sq=digital%20archives&st=cse
Collections Australia. 2010, April 01. Collection management system. Retrieved from
http://www.collectionsaustralia.net/sector_info_item/7
Curaconsortium. 2010, April 01. Glossary of information management terms. Retrieved from
http://www.curaconsortium.co.uk/glossary.html
Hodge, G. M. 2000, January. Best Practices for digital archiving. Retrieved from
http://www.dlib.org/dlib/january00/01hodge.html
Ischool Texas. 2010, April 01. Glossary. Retrieved from
www.ischool.utexas.edu/~scisco/lis389c.5/email/gloss.html
Jacobsen, J. Schlenker, T. Edwards, L. 2005. Implementing a Digital Asset Management System:
For Animation, Computer Games, and Web Development. Focal Press, Burlington, MA.
Joerling, K. 2010, March 19. The Truth and consequences of dod certification. Retrieved from
http://www.incontextmag.com/article/The-truth-and-consequences-of-DoD-certification
Kaplan, D. (2009). Choosing a Digital Asset Management System That's Right for You. Journal
of Archival Organization, 33-40.
Kurtz, M. (2010). Dublin Core, DSpace, and a Brief Analysis of Three University Repositories.
Information Technology and Libraries, 40-46.
NBL a. 2010, April 01. About the Niels Bohr Library & Archives, Retrieved from
http://www.aip.org/history/nbl/about.html
NBL b. 2010, January 20. Niels Bohr Library & Archives and Center for History of Physics,
digital assessment project, Retrieved from internal meetings
Peterashbyhayter. 2010, April 01. Photographic glossary. Retrieved from
http://www.peterashbyhayter.co.uk/glossaryD-E.html
Wikipedia. 2010, April 01. Integrated library system. Retrieved from
http://en.wikipedia.org/wiki/Integrated_library_system
Schneider, K. (2007, January 19). IT and Sympathy. Retrieved April 1, 2010, from ALA
TechSource: http://www.alatechsource.org/blog/2007/01/it-and-sympathy.html
Zhang, A. B., & Gourley, D. (2006). Building Digital Collections Using Greenstone Digital
Library Software. Internet Reference Services Quarterly, 11 (2), 71-89.
  • 1. 1 COLLEGE OF INFORMATION STUDIES, UNIVERSITY OF MARYLAND, COLLEGE PARK Information Audit Project INFM 736 – Information Management Team Experience Organization: Niels Bohr Library & Archives Akashdeep Ray, Jeroen de Lange, Nishita Thakker, Thet Oo, You Zheng 5/13/2010
  • 2. 2 Table of Content Exectutive Summary........................................................................................................... 5 1.Introduction...................................................................................................................... 6 2.Background Information.................................................................................................. 6 3.Project Rationale.............................................................................................................. 7 4.Information Audit ............................................................................................................ 8 5.Methodology.................................................................................................................... 9 5.1 Interviews................................................................................................................... 9 5.2 Litrature Review ........................................................................................................ 9 6.Business Process Maps .................................................................................................. 10 6.1 Collections/Manuscripts ......................................................................................... 10 6.2 Photo Collections.................................................................................................... 13 6.3 Oral Histories.......................................................................................................... 14 6.4 Books ...................................................................................................................... 15 7. Existing IT Structure..................................................................................................... 16 8. Fit Gap Analysis ........................................................................................................... 
18 8.1 Organization............................................................................................................ 19 8.2 Usability.................................................................................................................. 23 9. Literature Review.......................................................................................................... 24 9.1 Standards and Policies ............................................................................................ 24 9.2 Adopting IT Systems .............................................................................................. 25 9.3 Case Studies............................................................................................................ 27 10. Recommendations....................................................................................................... 28 11. Limitations.................................................................................................................. 29 Appendix........................................................................................................................... 30 References......................................................................................................................... 32
  • 3. 3 List of Diagrams
    Diagram 1 Business Process of Collections/Manuscripts .... 10
    Diagram 2 Business Process of Photo Collections .... 13
    Diagram 3 Business Process of Oral Histories .... 14
    Diagram 4 Business Process of Books .... 15
    Diagram 5 Existing IT Structure .... 17
  • 4. 4 List of Tables
    Table 1 Introduction of IT systems .... 16
    Table 2 Different file formats used by NBL&A .... 21
    Table 3 Overview of Metadata collected by NBL&A in ICOS .... 21
    Table 4 Overview of Metadata used by NBL&A outside ICOS .... 22
  • 5. 5 Executive Summary: The project audits the digital assets of the Niels Bohr Library & Archives, mapping the business process flows for each of its information assets. This information audit was conducted by a team of graduate students from the University of Maryland as part of a team capstone project. Information was gathered through interviews at the organization with the staff members responsible for cataloguing and archiving different types of assets. A comparative literature review was also done to understand current industry trends in the information technology and processes used by libraries and archives. Using this information, business process maps were created, from which an overall picture of the current situation at the organization emerges. A fit gap analysis was also carried out to identify where the current system falls short of the organization's needs. The key findings were that multiple systems are used for the various digital assets; work is organized in silos; collections are connected by cumbersome HTML links; different types of collections (single/multi item) are not stored and organized in an integrated manner; data is re-entered manually; customized access control is lacking; and there is no unified search engine across the digital assets. The IT systems for digital asset management in the library and archives environment are in a dynamic state, which makes any buying decision volatile. Tentative recommendations are to adopt common and open industry standards for data. Open source technology should be adopted; however, with any selected system, technical expertise would be required for customization based on the organization's needs.
  • 6. 6 1. Introduction: The aim of the project was to perform an Information Audit for the Niels Bohr Library & Archives (NBL&A) by thoroughly mapping the business processes to identify problems with the existing information environment, performing an in-depth problem analysis, and offering a range of broad recommendations. The report first explains the project rationale, defines the problem statement and scope, and defines the analysis approach and methodology. The business process maps are explained for certain members of the organization based on the digital assets they work with at the NBL&A. The current IT platform for the digital assets is explained along with its existing problems. The fit-gap analysis compares the current situation with the ideal system and shows how such a system could resolve certain issues. A range of broad solutions is offered to the NBL&A to improve its business processes.
    2. Background Information: AIP is a non-profit organization, which “promotes the advancement and diffusion of knowledge of physics and its application to human welfare”. In order to accomplish its mission, AIP supports ten physics and astronomy societies (e.g. American Astronomical Society, American Physical Society, Society of Rheology; AIP, 2010b) with publishing, membership administration, and organizing exhibits and conferences. Moreover, AIP also supports individual scientists, students and the general public by offering a career network, preserving the history of physics, and educating or supporting teachers in making the history of physics known. However, AIP's core business is publishing and selling advertisements in its 50 journals, which earned it $77.2 million in 2009 (AIP, 2010c). The Niels Bohr Library & Archives and the Center for History of Physics are divisions of the American Institute of Physics that share a common mission: to help preserve and make known the history of modern physics and allied sciences. 
The Library & Archives serves both as a repository and a clearinghouse for information in the history of physics, astronomy, geophysics and allied fields. In-house holdings include an outstanding collection of textbooks, monographs, biographies, and related publications, dating mostly from ca. 1850–1950; over 30,000 photographs and other images; ca. 1,000 oral histories with many of the outstanding figures in the fields that we cover; and archival records of AIP and its Member Societies along with other
  • 7. 7 archival records and personal papers of a select number of scientists. All of these materials are indexed online. As a clearinghouse, NBL&A maintains and updates the International Catalog of Sources for the History of Physics and Allied Sciences (ICOS for short), which contains descriptions of over 9,000 archival and manuscript collections, oral history interviews, and other primary sources in our fields at ca. 900 repositories worldwide. As part of its efforts, the NBL&A actively encourages scientists' home institutions to support archival programs that preserve their papers and the institution's history. NBL&A also preserves the records of AIP and its Member Societies and occasionally the papers of individuals like Goudsmit, whose papers don't have a natural home elsewhere.
    3. Project Rationale: This information audit was started for several reasons. First, the status quo is changing, as AIP has adopted a publishing-based Content Management System called Polopoly. Because AIP has a centralized IT strategy and limited resources (staff), it cannot provide support for NBL&A's unique IT needs. Second, the NBL&A's current IT systems are not adequate for its future needs and goals. This is the result of various factors, such as an ad-hoc IT strategy over the years, expansion of the collection due to grant-based projects such as the Oral History Interviews and the Goudsmit Digitization Project, an increasingly diverse user base, and the rise of born-digital files. Over the last year the NBL&A has tried provisioning several systems that it felt would fulfill the requirements, but these systems fell far short of expectations. In order to determine what type of system would work for its purposes, the NBL&A needs to carefully assess its needs, current situation and processes. Thus, our aim was to understand all the existing processes and identify NBL&A's information needs, to help it make a better-informed decision.
  • 8. 8 4. Information Audit: An information audit is an analysis technique used for the assessment of information needs and assets. The information audit for this project assessed the needs and correlated them with the current information landscape at the NBL&A. Based on this audit, one can determine whether the existing information environment is aligned with the goals and objectives of the organization. The audit helped us understand the possible solutions for improving the existing conditions, keeping in mind the constraints that may exist for the organization. Several tools were used while performing the information audit:
    - Business process maps: visualize the inflow and outflow of information, the type/format of information, and the physical/logistical location of information, and identify potential issues and inefficiencies.
    - Use case and activity diagrams: depict key users and dependencies within the organization.
    - Fit gap analysis: check the actual performance of the NBL&A against the desired performance, and describe the gap to identify needs, purposes and objectives.
    Information maps and business process maps allowed us to gain in-depth knowledge of the current assets and processes. While identifying the current system, the information environment as a whole was mapped out, giving unique insights into the existing IT structure at the NBL&A. Activity diagrams were targeted at detailed workflows with the stepwise actions of various organization members. These members' workflows are described in detail to understand the behaviors involved with the various digital assets as categorized by the NBL&A. The diagrams helped us better understand information flows within the NBL&A and assisted in identifying the roadblocks in the various processes. Moreover, once the problems are clearly defined, potential solutions can be considered, which in turn identify future system requirements. 
After mapping out the information environment, a fit gap analysis was done to check the performance of the ongoing activities and procedures at the NBL&A against the desired performance or “ideal scenario”. The first step of the analysis was to describe the desired situation that would be ideal for the activities at NBL&A. The second step was to compare the
  • 9. 9 ideal situation with the current situation, describing what “fits” and what poses a “gap”. Taken together, this analysis can be used to set minimum system requirements and constraints for potential solutions.
    5. Methodology: In order to perform an in-depth problem analysis, we needed accurate information about the current information needs and assets, and a thorough understanding of the various business processes and ongoing activities at the NBL&A. The team started out with a few group meetings and then conducted multiple rounds of personal interviews. During the course of the project, relevant literature was reviewed to understand the conditions at other libraries and archives.
    5.1. Interviews: The project team conducted multiple rounds of structured interviews with 10 organization members. The interviews focused on the day-to-day activities of the organization members and how they interact with the various digital assets of the NBL&A. This gave us knowledge about the in- and out-flows of different types of information. Moreover, we tried to identify existing roadblocks in each member's processes and understand their expectations of the potential information structure.
    5.2. Literature Review: The relevant literature on information management and information audits provided a structure for the project and its tasks, helping us deal with the problem at hand in a systematic manner. In the last decade there have been many changes within libraries and archives around the world, driven by the growing need for digitization. The literature review involved looking at NBL&A's peers to identify the various systems in use and their processes. This kind of review not only helped us but will also provide a comparable view for the NBL&A to correlate problems with potential solutions.
  • 10. 10 6. Business Process Maps: This section describes the various business processes at the NBL&A based on the information gathered from interviewing organization members. The processes are grouped by the digital asset categories pre-defined at the NBL&A.
    6.1. Collections/Manuscripts:
    Diagram 1. Business Process Map for Collections/Manuscripts
  • 11. 11 NBL&A does not aim to archive collections at its own location, but seeks to have them preserved at the most appropriate repository. This is done by maintaining a database of contacts and other information related to physicists, soliciting the institutions where the researchers worked to take possession of and archive their research papers, and regularly tracking newspaper obituaries of physicists. The researchers' names are checked against the Library of Congress database and cases are labeled according to the established unified name. Multiple cases can be open at the same time; priorities are set based on the importance of the researcher (N = Nobel Prize winner, W = very important, Q = anything else). How long a case remains open for investigation depends on its importance; for example, cases labeled 'Q' remain open for six months. The universities or people contacted are also recorded in the database. As a last resort, if a suitable home for a researcher's work and materials is not found, they are taken in by the NBL&A.
    Collections, which may include manuscripts, videotapes, prints, and films, are donated to NBL&A by personal delivery or mail, uploaded digitally through the File Transfer Protocol (FTP) server, or sent as CDs/DVDs or email. They are then sorted by level of copyright restriction, giving priority to important files and files with fewer copyright restrictions. Digital files are checked for their format, that is, whether or not they can be accessed using the existing software. Document files are usually in .rtf or .doc format. Audio files are usually in .mp3 format. If any of these files cannot be opened by the existing software, help is sought from the IT department. These digital files are stored on the preservation server. Physical collections are stored in acid-free boxes. 
Video tapes are not given priority for archiving since they are likely to deteriorate more quickly. The next step is to catalogue the materials in two databases: one is an NBL&A database accessed through Microsoft Access (MS Access) forms, and the second is an Integrated Library Cataloguing System known as Horizon. Much of the information stored in the two databases is the same, with some extra fields in the Horizon module. The information in the Horizon database follows the MARC format and standards (the MARC formats are standards for the representation and communication of bibliographic and related information in machine-readable form). An initial part of the cataloguing process is to thank the donors and record the details of the deed of gift letter received from them. This letter or document contains the copyright permissions given to NBL&A. This information is also entered into the databases
  • 12. 12 along with other data such as description, donor info, unique catalogue number, and name of the collection (with naming conventions as described by the Library of Congress). The Horizon application has the added advantage of making the records searchable online through the ICOS search module on the NBL&A website. The records may contain links to a finding aid HTML page.
    The process of creating finding aids also starts from the MS Access forms; however, it depends on two conditions: first, whether the collection is important enough to warrant a finding aid, and second, whether the collection is large enough to warrant creating a finding aid during a project cycle. The creation of finding aids is a continuous process which can take place at any time after the collection has been catalogued. As stated earlier, the content for finding aids is first entered into Access forms (known as the EAD module). The data entered is then converted into .xml and .html files through a series of steps that includes exporting the data into Dreamweaver and finally converting it into web pages (in both framed and frameless formats) using scripts. These finding aids can then be published on the website. The resulting web page link for the finding aid is entered into the Horizon record for the same collection.
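The format check described in section 6.1 (can a donated digital file be opened with the existing software, or does it need to go to the IT department?) can be pictured with a short sketch. This is only an illustration under stated assumptions: the function name `triage` and the set `OPENABLE` are hypothetical, and the list of openable extensions is limited to the formats the report mentions as usual (.rtf, .doc, .mp3), not an inventory of the NBL&A's actual software.

```python
from pathlib import Path

# Assumption: the formats the report names as "usual" stand in for
# "formats the existing software can open". The real list may differ.
OPENABLE = {".rtf", ".doc", ".mp3"}


def triage(filenames):
    """Split donated files into (openable, needs_it_help) by file extension."""
    openable, needs_it_help = [], []
    for name in filenames:
        ext = Path(name).suffix.lower()  # compare extensions case-insensitively
        if ext in OPENABLE:
            openable.append(name)
        else:
            needs_it_help.append(name)
    return openable, needs_it_help
```

A call such as `triage(["memoir.RTF", "notes.wpd", "talk.mp3"])` would separate the two files that can be opened directly from the one that would be referred to IT. An extension check is a simplification; it says nothing about whether a file is actually intact.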
  • 13. 13 6.2. Photo Collections
    Diagram 2. Business Process Map for Photo Collections
    The NBL&A photo archive is known as the Emilio Segrè Visual Archives (ESVA), named after Emilio Segrè, an eminent physicist. While the biggest donors to the photo archives have been Physics Today (since 1940) and AIP, photos are donated from other sources on a regular basis. Precedence is given to recently donated photographs, while other photographs from the backlog are worked on throughout the year. Photos are sent by email, uploaded to the FTP server, or delivered physically by mail. A thank-you note is sent to the donor in response to the deed of gift document. Based on the copyright permissions, photos are scanned and stored in a temporary folder on the R drive (located on the Novell server) under the donor's name. The photos are then catalogued through Access forms into an MS SQL Server database. All copyright information is entered into the database. Any data entered into the database contributes to the searchability of the digital photograph. Meanwhile, the photos are converted into common file formats. The resolution is decreased using Photoshop and the files are copied onto the NAS server at Melville. These files are cross-mounted onto the production server APP1, through which the photos are displayed on the web pages. To link the metadata to the picture files, the file name of the picture must match the file name entered in the metadata record.
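Because the link between a photo and its metadata depends entirely on matching file names, a simple consistency check can catch broken links before publication. The following is a minimal sketch, not the NBL&A's actual procedure: the function name `find_mismatches` is hypothetical, and the two inputs (file names from the metadata records, and paths of the image files on the server) are assumed to have been exported as plain lists.

```python
from pathlib import Path


def find_mismatches(metadata_filenames, image_paths):
    """Compare file names in metadata records with the image files present.

    Returns two sorted lists:
      - names recorded in the metadata that have no image file
      - image files that have no metadata record
    """
    recorded = set(metadata_filenames)
    present = {Path(p).name for p in image_paths}  # ignore directory parts
    return sorted(recorded - present), sorted(present - recorded)
```

The comparison here is exact and case-sensitive, which is deliberate: whether "Bohr_A1.jpg" and "bohr_a1.jpg" count as the same file depends on the server, so the sketch flags them rather than guessing.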
  • 14. 14 6.3. Oral Histories
    Diagram 3. Business Process Map for Oral Histories
    The digitization of the oral history transcripts starts with checking the physical folders of transcripts that carry old labels. The collection is checked for uniformity of cataloguing information with the records in the NBL&A database and the ICOS records in the Horizon database. The naming convention is also checked against the existing Library of Congress authority records. The transcripts are scanned and converted into digital format using Microsoft OCR (Optical Character Recognition) software and uploaded to an FTP server. These files are in Microsoft Word format and are then passed through Dreamweaver scripts that convert them into .xhtml files. These can be uploaded to the HostA server, where the files are linked into web pages and can thus be accessed online. Oral history transcripts whose cataloguing information does not match the ICOS records are renamed with new labels and put back into the folders, to then go through the overall process.
  • 15. 15 6.4. Books
    Diagram 4. Business Process Map for Books
    Because Horizon is an integrated library system, the acquisition of books is tracked via an acquisition record made in Horizon. This record is printed out and sent to vendors for purchasing. The annual budget allocated for books is $3000; books can also be received as gifts. International standardized catalogue information for the book is downloaded from WorldCat and OCLC records in the form of .dat files. These are then imported into Horizon. Since there is a backlog of about 1000 books, they are placed on shelves in the archives and recorded in Horizon with their shelf number. Books that are taken out to be placed in the library are checked against the Horizon records for errors. A unique call record based on the author name is added for each book, which is then placed on the rack in the library.
  • 16. 16 7. Existing IT Structure: In serving the public and its member societies, the NBL&A has had a very ad-hoc IT strategy over the years. As events occurred and grants were received, the NBL&A chose different types of IT systems based on short-term needs (Table 1).
    Year | Event | Reason
    1998 | Visual Archives | Commercial
    1999 | Online Public Access Catalog | Usability
    1999 | Integrated Library System Horizon | Year 2000 bugs
    1999 | Introduction of NBL Access database | Record more information on collections
    2000 | Introduction of Finding Aids | Consortium / Grant
    2002 | Update of Visual Archives (Oracle) | Unstable, server update
    2004 | Updated Horizon to 7.3.3 | -
    2005 | Made ICOS Google searchable | Usability
    2006 | Update of Visual Archives (MYSQL) | Unstable
    2007-2009 | Oral History Interviews put online | Grant
    2008-2009 | Goudsmit project | Grant
    Table 1. Introduction of IT systems (AIP, 1998, 1999a, 1999b, 2000, 2002, 2005, 2006, 2007, 2008)
    The following paragraphs describe the IT systems the NBL&A uses daily. A complete overview can be found in the following diagram.
  • 17. 17 Diagram 5 Existing IT Structure The International Catalog of Sources is part of an integrated library system called Horizon, designed to catalog books, serials, and collections and manuscripts. The Horizon library
  • 18. 18 system runs on a dedicated server (LibServer) in Melville, NY and is searchable by a built-in search engine called Dynex. Horizon is also used for acquisitions of books, but most of its functions are unused. The photographs are the only digital assets archived separately, in the Emilio Segrè Visual Archives (ESVA), a separate website, and are sold commercially online via a software layer that is connected to an MS SQL database. The ESVA runs on a NAS server in Melville, NY. The ESVA can be searched via a quick query search and an advanced search (federated search). Oral history interviews, finding aids, newsletters, etc. are hosted on the HostA web server in Melville, NY and are an active part of the website. The website and the photographs are searchable via Google and a Google custom search. The NBL&A also employs a federated search engine called Varity. This federated search indexes the ICOS, the ESVA and the website. Google no longer indexes the ICOS; it used to, but that stopped working. The systems discussed so far are all online and accessible to the public. The NBL&A also uses a Microsoft Access database internally to record information about its collections. This MS Access database is located on a Novell server in College Park, MD. This server also permanently hosts all digital files (photos, OHI, etc.) and is also known as the “preservation server”.
    8. Fit Gap Analysis: The fit-gap analysis compares the desired situation with the current situation and describes the gap between the two. It first describes what the NBL&A wants from a system and which aspects of it the NBL&A already has, and then identifies the gaps between the two. The following is a summary of information system requirements based on an NBL&A in-house team's vision of integrating the diverse digital collections:
    - Organization: an ability to organize the stored data while preserving the complex relationships among them.
  • 19. 19 - Usability: information should be accessible to the stakeholders and visible to internet search engines. This accessibility, however, is a controlled activity; NBL&A should be able to control the flow of interaction.
    - Portability: information assets need to be standardized and in non-proprietary formats, which makes it easier to migrate them when the system is changed or expanded to include newer features.
    - Cost: the system should not be prohibitive in terms of economic and manpower costs.
    Please see Appendix A for a full description of the functional requirements as described by the NBL&A.
    The NBL&A wants an integrated system to organize, store, and disseminate the collection while preserving the complex relationships between the digital assets. The following fit-gap analysis looks at the two broad requirement categories: organization and usability.
    8.1. Organization: When discussing the organization of collections and the preservation of the complex relations among the files, we identified two implications. First, the new system should be able to store and organize different types of collections, such as flat, single-item collections and hierarchical, multiple-item collections. Currently, different types of collections and file types come with varying attributes or metadata. Larger multi-item collections are only described as a whole in a catalogue record and potentially in a finding aid (a finding aid may or may not be created for a collection). Also, such collections do not have item-level metadata (Meta-light). On the other hand, single-item collections (such as photo collections) are described at item level in the file's own metadata. Second, the new system should be able to handle varying file formats and sizes depending on the type of digital asset. Different digital assets/items (such as books, transcripts, manuscripts, photos, etc.) require different metadata and thus have different types of catalogue records. 
As Table 2, Table 3 and Table 4 show, different collections require different types of descriptions and follow different standards. The future system thus needs to be able to distinguish between different types of collections and accommodate each collection's individual needs. Thus, the system must have different templates/forms to input data about different
  • 20. 20 collections. With the various digital collections, many procedures are followed before the digital records are made accessible. The future system could simplify these pre-posting procedures and thus avoid redundancy. Since the individual collections are stored, catalogued and accessed in very different ways, it has been difficult to present an integrated search to users. Currently, different systems are used for different digital assets, and these assets do not communicate with each other, as they are treated as individual entities. This is reflected not only in the systems but also in the procedures, the policies, and the redundant work performed by organization members associated with different digital assets.
    As seen in the diagram in section 7, the NBL&A operates a separate database to record additional information regarding archival collections and to write finding aids. Moreover, this database is used for administering the use of the collections. The current structure of the database is unorganized, confusing and not scalable. Tables are not normalized in any way, making the database slow and potentially causing insertion, update and deletion anomalies. This database stands completely separate from the online systems and causes redundancies and inefficiencies in the workflow. Information has to be manually re-entered into the integrated library system. The database and the online systems are linked only by the common catalog number, which is used as a unique identification number.
    The digital assets at the NBL&A are not well integrated. HTML links are used to preserve relationships among different collections, such as Oral History Interviews 'linked' to their respective catalog records and collection-specific finding aids linked to their catalog records. In the future, the Goudsmit Digitization Project will also be linked to the existing finding aids and catalog record via HTML. 
HTML links are very cumbersome, since they have to be placed manually into the catalog records; they are error-prone and not easily transferable to new systems. Lastly, the NBL&A physically saves all of its born-digital files on a preservation server. However, the metadata is stored in the catalog records of the integrated library system, which is physically located on a separate server (LibServer) altogether. The NBL&A database also stores metadata, but only for internal use. The photos in the ESVA have their metadata stored in the SQL database on the MS SQL Server. These constraints make the current system hard to migrate to a new system.
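To make the contrast with manually placed HTML links concrete, the relationships among a catalog record, its donor and its related items could instead be stored as references in normalized tables, keyed on the catalog number that already serves as the unique identifier. The sketch below is only an illustration under stated assumptions: all table names, column names and sample values are hypothetical, and Python's built-in sqlite3 module stands in for whatever database a future system would use; it is not a description of the NBL&A's MS Access or SQL Server schema.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a small normalized schema (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the references
    conn.executescript("""
        CREATE TABLE donor (
            donor_id INTEGER PRIMARY KEY,
            name     TEXT NOT NULL
        );
        CREATE TABLE collection (
            catalog_nr TEXT PRIMARY KEY,   -- the shared unique identifier
            title      TEXT NOT NULL,
            donor_id   INTEGER REFERENCES donor(donor_id)
        );
        CREATE TABLE related_item (
            item_id    INTEGER PRIMARY KEY,
            catalog_nr TEXT NOT NULL REFERENCES collection(catalog_nr),
            kind       TEXT NOT NULL,      -- e.g. finding aid, oral history
            location   TEXT NOT NULL
        );
    """)
    # Hypothetical sample data.
    conn.execute("INSERT INTO donor VALUES (1, 'Example Donor')")
    conn.execute("INSERT INTO collection VALUES ('AR-0001', 'Example papers', 1)")
    conn.executemany(
        "INSERT INTO related_item (catalog_nr, kind, location) VALUES (?, ?, ?)",
        [("AR-0001", "finding_aid", "findingaids/example.html"),
         ("AR-0001", "oral_history", "ohi/example.html")],
    )
    conn.commit()
    return conn


def items_for(conn, catalog_nr):
    """Return (kind, location) pairs for everything linked to one catalog record."""
    rows = conn.execute(
        "SELECT kind, location FROM related_item WHERE catalog_nr = ? ORDER BY kind",
        (catalog_nr,),
    )
    return rows.fetchall()
```

With such a structure, each donor and each collection is recorded once, and the database itself rejects a related item that points at a catalog number that does not exist, which is the kind of error a hand-placed HTML link cannot guard against.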
  • 21. 21
    File type | File extensions
    Text | txt, pdf, doc, rtf, wpd, email, indd, access database logs
    Photos | tiff, jpg, pdf
    Audio | wav, mp3
    Table 2. Different file formats used by NBL&A

    Table 3. Overview of Metadata collected by NBL&A in ICOS
    ICOS Books | ICOS Archive | ICOS OHI
    Title | Name | Name
    Author, date | Author, date | Interviewer, interviewee, date
    Publisher | Description, size | Description, size
    Call No | Owner | Use and reproduction
    Description, size | Country | Owner
    ISBN | Biography / History | Country
    Added Author | Scope of Material | Notes
    Location | Notes | Added Author
    Collection | Added Author | Genre Terms
    Status | Location | Location
    Subjects | Collection | Collection
    Edition | Status | Status
    Source of Acquisition | Subjects | Subjects
  • 22. 22
    Photos | NBL&A Access DB | OHI transcript | Finding Aids
    Catalog nr | Accession Date | Name | Name
    Description | Accession Type | Copyright | Publisher, address
    Date | Accession Nr | Origin | Date
    Credit | Old accession Nr | Interviewee | Encoding information
    Names | Items in Accession | Interviewer | Location
     | Begin Location | Location, interview | Title and dates of collection
     | End Location | Date | Papers created by
     | Other Location | Abstract | Size
     | Oversize Location | Sessions | Short description
     | Main Entry | | Language
     | Member Society | | Selected search terms / subjects
     | Title | | Historical note
     | Collection ID | | Scope and content of collection
     | Description | | Organization and arrangement of collection
     | Notes | | Access restrictions
     | Begin Date | | Restrictions of use
     | End Date | | Provenance and acquisition information
     | Linear Feet | | Processing information
     | Proc Priority | | Other related materials
  • 23. 23
     | ICOS Nr | | Container list
     | Donor Name | |
     | Restrictions | |
     | Description of Restrictions | |
     | Deed sent | |
     | Notes | |
     | Date deed sent | |
     | Date deed returned | |
     | Thank you sent, date | |
    *More information is collected in different tabs.
    Table 4. Overview of Metadata used by NBL&A outside ICOS
    8.2. Usability: The second aspect of the fit-gap analysis is the usability of the system. The system's usability has implications for all of its users. Users are broadly classified as researchers, non-researchers and the staff at NBL&A. Thus, we have the general public (researchers and non-researchers), who want to search the collection, and the NBL&A employees, who manage the collection.
    For the general public it is important that the various digital assets are searchable, as a collection on the whole, from the main search engines such as Google and Yahoo. The current NBL&A website uses a total of 4 different search engines, none of which is able to present a complete overview of the collection. Users should be able to search either by general keyword or in depth using various search categories, whichever they prefer.
    For NBL&A employees the system should be standardized, easy to use, and independent of the type of digital asset. The system should allow them to manipulate collections
  • 24. 24 and disseminate them in various ways, such as exhibitions, mobile devices, photo of the month, etc. The system should have a user-friendly interface that can be used by organization members with or without technical expertise. Moreover, the system should be able to control and record which organization members get access to which types of files. This will help maintain uniformity and help track changes that are made to the various files.
    9. Literature review
    The goal of this literature review is to provide a baseline understanding of the current state of research and practice in the management of digital assets in archives and libraries, particularly regarding the integration of various digital assets.
    9.1 Standards and Policies
    In the archival environment, there is a lot of debate about standardization and policies. Some institutions are trying to set up their own standards and policies, which range from very strict, rigid Electronic Records Management Systems that require formal certification to simple policies and tips (Joerling, 2010). Here is an overview of some of the policies:
    - Trustworthy Repository Audit Certification (TRAC)
    - Trusted digital repository (TDR)
    - Open Archival Information System Reference Model (OAIS), NASA
    - Information Life cycle approach (Hodge, 2000)
    - Department of Defense 5015.2
    - Model Requirements for the Management of Electronic Records (MoReq2), EU
    - Victorian Electronic Record Strategy (VERS), Australia
    - Document Management and Electronic Archiving (DOMEA), Germany
    - Records, Document and Information Management System (RDIMS), Canada
    - International Standard Archival Authority Record (ISAAR), International Council on Archives
    The NBL&A doesn't need to comply with any governmental mandate, and therefore doesn't need to be certified. Most of these policies are very strict and would mean too much bureaucracy and red tape. 
Moreover, the NBL&A already complies with some important archival standards such as MARC and EAD and follows conventions on naming authority designed by
the Library of Congress. These developments in the archival environment are therefore of limited relevance to the NBL&A.
9.2 Adopting IT Systems
The second step of the literature review was to research new IT systems that can be used for managing digital archives. While doing so, the following different systems were encountered:
• Enterprise Content Management Systems (ECMS)
• Digital Asset Management Systems (DAMS)
• Electronic Records Management (ERM)
• Content Management Systems (CMS)
• Collection Management Systems (CMS)
• Document Management Systems (DMS)
• Integrated Library Systems (ILS)
There is a lot of overlap between these systems, and not all of them match the functional requirements set up by the NBL&A or can deal with its constraints. The following paragraphs give a quick evaluation of each type of system. The case studies that follow examine systems used by other organizations, whose approaches ranged from adapting existing library systems to developing a whole system from scratch.
First, an ECMS is defined as, “the strategies, methods and tools used to capture, manage, store, preserve, and deliver content and documents related to organizational processes. ECM tools and strategies allow the management of an organization's unstructured information, wherever that information exists” (AIIM, 2010). This encompasses document and record management, groupware, web content, and business process management. An ECMS goes well beyond the needs of the NBL&A, so we do not evaluate this tool further.
A DAMS “consists of managing tasks and decisions surrounding the ingestion, annotation, cataloguing, storage, retrieval and distribution of digital assets” (Jacobsen, Schlenker, Edwards, 2005). A wide variety of systems can perform these tasks; however, it is generally agreed that a database program lies at the center of this solution (Peterashbyhayter, 2010). Some
examples of DAMS are DSpace, Fedora, Greenstone, and EPrints. These systems match the functional requirements given by the NBL&A and therefore should be explored further.
ERM is “a computer-based facility for managing and controlling records throughout the information life cycle” (Curaconsortium, 2010). An ERM system handles everything from planning, classifying, storing, and securing records to destroying or preserving them and coordinating access. However, the functionality of this type of system is limited to managing a record as “evidence” of an event and doesn't really allow for dissemination and manipulation. A lot of organizations use ERM as a tool to track documents and comply with government regulation. Therefore this tool doesn't fit the bill for the NBL&A and doesn't need to be evaluated further.
A Content Management System can be defined in multiple ways. On the one hand, it can be a web Content Management System, which allows users without any technical knowledge to manage the content of a website. On the other hand, it can be an actual system to manage documents, for example for enterprises, media, learning, collections, mobile devices, etc. A DMS, which is a sort of Content Management System, “indexes and profiles documents based on content; controls documents using such functions as check in/checkout, version control, audit trails, and security of information; and facilitates searching by profile values or by some other hierarchical structure such as folders and files” (ischool Texas, 2010).
Lastly, a Content Management System can also be described as a Collection Management System. A Collection Management System is “a piece of software that allows collecting institutions to manage data about their collections and items they hold and are an integral part of managing the documenting collections” (Collections Australia, 2010). A Collection Management System thus describes and administers information regarding collections, donors, locations, etc.
These types of systems are mostly used by museums and archives to manage their collections. However, in recent years, due to the digitization of museums, these collection management systems have begun moving towards digital asset management and electronic records management. They keep records and catalogues of their collections, describe their collections in detail, relate items to each other and to history, etc. Some of these systems thus match the NBL&A system requirements well and should be further evaluated. A few examples of collection management systems are
CollectiveAccess, Artlid and Gallery Systems. Based on our understanding, CollectiveAccess seems more appropriate for the NBL&A.
Lastly, ILS are “enterprise resource planning systems for libraries, used to track items owned, orders made, bills paid, and patrons who have borrowed” (Wikipedia, 2010). There are many ILS, some proprietary like SirsiDynix, Newgen and Ex Libris and some open source like CDS Invenio.
9.3. Case Studies:
A couple of case studies were researched to study other organizations that use some of the systems and technologies mentioned above. They range from adapting existing library systems to developing the whole system from scratch.
At Portland State University Library (PSU) (http://vikat.pdx.edu/), the capabilities of the Integrated Library System were expanded to accommodate the hierarchical structure found in traditional archival finding aids. PSU uses Electronic Resources Management (ERM) from Innovative Interfaces Inc. Brenner, Larsen, & Weston exploited ERM's ability to “replicate the two-level hierarchical relationships between aggregators or publishers and the electronic and print resources”. The authors admitted, though, that the resource records created were not as rich as those of traditional finding aids. In the same article, the authors also describe the approaches used by the Library of Congress and the University of Washington. The Library of Congress selectively adds records of individual items from its archival collections to its OPAC (Online Public Access Catalog). That approach allows users to have complete access to the items within the library's collections. However, it hides the hierarchical relationship between items and the collections they belong to. The University of Washington adopts a different approach: it puts collection-level MARC records in its OPAC, where they are searchable like other bibliographic records.
These collection-level records are then linked to finding aids with more complete descriptions (Brenner, Larsen, & Weston, 2006).
The Washington Research Library Consortium (WRLC) (http://www.wrlc.org/) is a consortium of eight libraries in the Washington DC metropolitan area. The WRLC member libraries host quite a variety of unique special collections: manuscripts, photographs, slides, full-text documents, magazines, comic books, audio recordings, video clips, and so on. Each special
collection needs to be accessible from both WRLC's Digital and Special Collections Web site and the corresponding member library's special collections Web site. The digital objects also need to be accessible through EAD finding aids. The public Web interface needs to be simple enough for new, inexperienced users yet powerful enough for experienced power users. After evaluating and testing both commercial systems and open source software, Allison B. Zhang and Don Gourley of WRLC could not find any system that met all of their requirements. They finally decided to build a customized system by integrating the best tools available and chose Greenstone Digital Library software as a tool for presenting the digital collections. Their customization of Greenstone involved designing and creating metadata, designing a plug-in to import metadata, working with configuration files, and defining numerous macros (Zhang & Gourley, 2006).
10. Recommendations:
Based on the results of the fit gap analysis and the literature review, tentative recommendations can be drawn to address the current and future needs of the NBL&A. According to Tennant (2008), it is not the right time to choose a new library-related IT system since the market is in a state of flux. It would be better to wait for a period of time before taking the leap. However, any system should be required to use standardized formats for data description, metadata and storage. If cost is to be taken into consideration, using open source products would be the way forward. A possible set of options that can be implemented in the future² is:
• Maintenance of the current system while modifying elements such as normalization of databases.
• Implementation of a digital asset management system.
• Implementation of a collection management system.
• Implementation of an integrated library system.
² More information is provided in the presentation slides.
11. Limitations:
The initial functional requirements provided by the NBL&A were incomplete and ambiguous. During the course of the project, initial assumptions were made about the metadata structure and its implications for the future system. Certain assumptions were made about the capabilities of the existing Horizon and Verity software applications. Quantifiable costs and benefits need to be calculated for current and future systems that may be considered for implementation. The literature on this subject is limited, since not every organization documents its experiences. User experiences with the system need to be recorded in the future to make better recommendations.
Appendix A: Niels Bohr Library & Archives and Center for History of Physics Digital Assessment Project
We are looking for:
1) A system that allows us to store materials in an organized manner, while retaining complex relationships between some items;
2) A system that allows us to use and manipulate these digital items more creatively, with better searching capabilities, and to help us present these items online.
Our system requirements are:
Complexity - can handle hierarchies and relationships between digital items; store flat, single-item digital collections, in addition to collections of varying sizes that include multiple file formats (e.g. email messages with attached documents, or a folder that contains a Word document, a .PDF file and a video clip); see attached inventory for specific file types and storage needs.
Searching - will allow all levels of searching, from simple keyword searches on file names to full-text searching on OCRed documents; will allow access via Google or other search engine results; will allow commenting or other user interaction.
Permanence - that digital items are stored in a non-proprietary format, in a system that will be committed to long-term use at ACP or that can be easily migrated without losing formatting or hierarchies.
Access - that staff members, as well as outside users and library patrons, can easily access digital items without registration or log-ins; and that if needed, certain items could be restricted or shielded from outside users.
Metadata - will be able to ingest existing metadata in batches, with little or no manual re-keying.
Expansion - will expand to fit future needs of storage space, additional projects, complexity of hierarchies, and possible linking of projects.
Standards - conforms to accepted standards in the professional archival community, on all the points listed above; standards include MARC, EAD, XML.
Support - will be installed, customized, maintained and otherwise supported by AIP Web Development staff.
Cost - will not be prohibitive in costs, in either direct spending, necessary hardware to run the program, or staff time and resources to use and support the program.
Current projects
Oral history interview project: scanned and OCRed interview transcripts; full-text searchability; use of audio files and images; ability to continue to add more interviews indefinitely.
Digitized archival materials: scanned manuscript collections consisting of TIFF images of text; ability to link the digital images to an existing EAD/XML interface; ability to retain the hierarchies of the items in the collection.
Databases: the History Center is currently assembling a database of biographical information on acclaimed physicists, then linking profiles together in multiple ways - according to professional interests, research teams, educational background, etc.
Our users are:
Experienced users: Library/archives patrons, historians and other researchers who are already familiar with our catalog and resources. The public interface must be familiar and consistent with our other webpages.
New users: Items will be discovered through searches in our online catalog, through searches in Google and other search engines, links in web exhibits, newsletters, press releases, Facebook updates, etc. The interface must be intuitive, and easy to navigate back to the main catalog so the user can start a new search or browse other relevant materials.
In-house use: content will be regularly accessed by library and archives staff.
Example of different online interfaces that inspire us:
The series description and box inventories of the Joseph Cornell papers at the Smithsonian Archives of American Art. If you click on a digitized folder, this collection also has a nice interface for clicking through the digital images - http://www.aaa.si.edu/collectionsonline/cornjose/series1.htm
The interactive finding aid of the Aldo Leopold papers at the University of Wisconsin - Madison http://digicoll.library.wisc.edu/cgi/f/findaid/findaid-idx?c=wiarchives;cc=wiarchives;view=text;rgn=main;didno=uw-lib-leopoldpapers
Reference:
AIIM. 2010, April 01. What is Ecms. Retrieved from http://www.aiim.org/What-is-ECM-Enterprise-Content-Management.aspx
AIP. 1998, May. Searchable database of photo's. Retrieved from http://www.aip.org/history/newsletter/fall98/esvaweb.htm
AIP a. 1999, Spring. New integrated library system. Retrieved from http://www.aip.org/history/newsletter/spr99/ils.htm
AIP b. 1999, December. New online catalog for niels bohr library online. Retrieved from http://www.aip.org/history/newsletter/spring2000/nbl.htm
AIP. 2000, Fall. Finding aids to major collections online. Retrieved from http://www.aip.org/history/newsletter/fall2000/findaid.htm
AIP. 2002, Fall. New format: emilio segre visual archives. Retrieved from http://www.aip.org/history/newsletter/fall2002/esva.htm
AIP. 2005, Spring. Enhancing web access to library catalog. Retrieved from http://www.aip.org/history/newsletter/spring2005/catalog.htm
AIP. 2006, Spring. Improved online visual archives. Retrieved from http://www.aip.org/history/newsletter/spring2006/visualarchives.htm
AIP. 2007, Fall. Major collection of oral history interviews mounted online. Retrieved from http://www.aip.org/history/newsletter/fall2008/oral-history.html
AIP. 2008, Fall. Digitizing the samuel goudsmit papers. Retrieved from http://www.aip.org/history/newsletter/current/digitizing_goudsmit.html
AIP a. 2010, April 01. About AIP. Retrieved from http://www.aip.org/aip
AIP b. 2010, April 01. AIP: A federation of physical sciences. Retrieved from http://www.aip.org/aip/societies.html
AIP c. 2010, April 01. Annual report 2009. Retrieved from http://www.aip.org/aip/reports.html
Bak, G. Armstrong, P. 2009. Points of convergence: seamless long-term access to digital publications and archival records at library and archives Canada. Archival Science, Vol 8: p279-293.
Bearman, D. 1991.
Hypermedia and interactivity in museums: Proceedings of an international conference.
-------------- 1992. Documenting Documentation. Archivaria, 34, Summer.
Brandeis Institutional Repository Planning Documents, 2006 - 2007. (n.d.). Retrieved April 1, 2010, from Brandeis Institutional Repository: http://dcoll.brandeis.edu/handle/10192/21866
Brenner, M., Larsen, T., & Weston, C. (2006). Digital Collection Management through the Library Catalog. INFORMATION TECHNOLOGY AND LIBRARIES, 65-77.
Cohen, P. 2010. Fending off digital decay, bit by bit. New York Times, March 15. http://www.nytimes.com/2010/03/16/books/16archive.html?scp=1&sq=digital%20archives&st=cse
Collections Australia. 2010, April 01. Collection management system. Retrieved from http://www.collectionsaustralia.net/sector_info_item/7
Curaconsortium. 2010, April 01. Glossary of information management terms. Retrieved from http://www.curaconsortium.co.uk/glossary.html
Hodge, G. M. 2000, January. Best Practices for digital archiving. Retrieved from http://www.dlib.org/dlib/january00/01hodge.html
Ischool Texas. 2010, April 01. Glossary. Retrieved from www.ischool.utexas.edu/~scisco/lis389c.5/email/gloss.html
Jacobsen, J. Schlenker, T. Edwards, L. 2005. Implementing a Digital Asset Management System: For Animation, Computer Games, and Web Development. Focal Press, Burlington, MA.
Joerling, K. 2010, March 19. The Truth and consequences of dod certification. Retrieved from http://www.incontextmag.com/article/The-truth-and-consequences-of-DoD-certification
Kaplan, D. (2009). Choosing a Digital Asset Management System That's Right for You. Journal of Archival Organization, 33-40.
Kurtz, M. (2010). Dublin Core, DSpace, and a Brief Analysis of Three University Repositories. INFORMATION TECHNOLOGY AND LIBRARIES, 40-46.
NBL a. 2010, April 01. About the Niels Bohr Library & Archives. Retrieved from http://www.aip.org/history/nbl/about.html
NBL b. 2010, January 20. Niels Bohr Library & Archives and Center for History of Physics, digital assessment project. Retrieved from internal meetings.
Peterashbyhayter. 2010, April 01. Photographic glossary.
Retrieved from http://www.peterashbyhayter.co.uk/glossaryD-E.html
Wikipedia. (2010, April 01). Integrated library system. Retrieved from http://en.wikipedia.org/wiki/Integrated_library_system
Schneider, K. (2007, January 19). IT and Sympathy. Retrieved April 1, 2010, from ALA TechSource: http://www.alatechsource.org/blog/2007/01/it-and-sympathy.html
Zhang, A. B., & Gourley, D. (2006). Building Digital Collections Using Greenstone Digital Library Software. Internet Reference Services Quarterly, 11(2), 71-89.