Tahir Sandhu and Gwen Williams report on how they built a digital library on core subject-analysis principles and with Greenstone digital library software. University of Illinois at Urbana-Champaign.
The English Emblem Books Digital Library : a Final Report (2003)
LIS 450DL: Digital Libraries
Professor Bruce Schatz
13 May 2003
The English Emblem Books Digital Library: A Final Report
By:
Tahir Sandhu
Gwen Williams
email the authors at: seealso@me.com
Copyright 2003 Sandhu and Williams.
This work is covered by a creative commons license
Attribution-Noncommercial-Share Alike 3.0
http://creativecommons.org/licenses/by-nc-sa/3.0/
The Emblem Books:
The English Emblem Books are pedagogic resources printed between
the 16th and 19th centuries. Each emblem book is a unified discourse on a
particular subject that the pedagogues of the time thought appropriate for
the education of youth. Our collection includes eight books by eight
authors. Each book reflects on various aspects of a subject, such as Desire.
The leaves of each book are emblems. What makes each leaf an
emblem is its overall structure, which has three elements: motto,
pictora, and annotation. The three elements on each page or plate are
collectively known as an emblem.
The authors of these emblems organized them in two different ways.
One, authors put together a number of emblems that illuminated various
aspects of a subject, such as Desire. Such collections are known as the
"emblem books." Two, authors put together a number of emblems that each
described independently an aspect of a subject, such as Heraldry. Such
collections are known as the "books of emblems."
Our digital collection includes both types of emblems. Each emblem is
captured as a digital image representing a single page from a book of
either type. We therefore have loose leaves of digital pages that represent
the three elements of an emblem: motto, pictora, and annotation. Some of these
images are like an open book with the pictora on one side and the motto and
annotation on the other. Some of these images are simply one long page
with the three elements on it.
A Leaf from Emblem Book A Leaf from Book of Emblems
Figure A: Two Types of Emblems
Collection Development:
LIS 450DL required us to gather digital imagery if we chose to
develop a digital collection of imagery and text. We visited the
Pennsylvania State University (PSU) English Emblem Books web site and
browsed through the catalog of the English Emblem Books. We also visited
the Middlebury College Minerva Project web site to obtain images from the
Minerva Britanna book of emblems.
We downloaded about 200 images from various books, along with
a complete bibliographic citation for each image. PSU and Middlebury had
already saved these images as JPEGs with a resolution of 105 pixels per
inch. The size of these images was about 8x7 inches. To manage these
images as cover images for each bibliographic record, we resized them
to a higher resolution (200 pixels per inch) and smaller dimensions
(about 2 inches in width).
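As a rough check of the resizing described above, the following sketch converts the stated print sizes and resolutions into pixel dimensions. The 8x7 inch, 105 ppi, 200 ppi, and 2 inch figures come from the text; the helper name and the assumption that the aspect ratio was preserved are ours and purely illustrative.

```python
SOURCE_PPI = 105   # resolution of the downloaded JPEGs (from the text)
TARGET_PPI = 200   # resolution chosen for the cover images (from the text)


def to_pixels(inches, ppi):
    """Convert a print dimension in inches to pixels at a given resolution."""
    return round(inches * ppi)


# Approximate pixel size of a downloaded image (about 8x7 inches at 105 ppi).
src_w, src_h = to_pixels(8, SOURCE_PPI), to_pixels(7, SOURCE_PPI)

# Target width of about 2 inches at 200 ppi; the height follows from the
# aspect ratio, which we assume was kept unchanged.
dst_w = to_pixels(2, TARGET_PPI)
dst_h = round(src_h * dst_w / src_w)

print(f"source: {src_w}x{src_h} px, cover image: {dst_w}x{dst_h} px")
```

Under these assumptions each cover image holds fewer pixels than its source even though its nominal resolution is higher, which is why it is easier to manage alongside the record.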
Thereafter, we designed a filing system for these images to ensure
that each image has: (a) a subject assigned to it using a noun phrase,
such as "Ape Never Man," for an emblem that reflects on "how futile it is to
imitate someone;" (b) an acronym to indicate the author of the book that
originally contained the emblem; (c) a code to indicate the storage area
where we put each author's work; (d) a year to indicate the publication of
the book; and (e) a file extension (.html) to indicate the type of file in which
the emblem was stored. The name of each emblem thus
consisted of these five elements in the following order:
[Noun Phrase] [Author Initials] [Storage Code] [Date] .[File Type]
Ape_Never_Man_GW_01_1635.html
This filing convention allowed us to identify a digital emblem without
having to use application software to open the file and visually examine it.
This system also allowed us to organize the emblems according to their
"subject." We chose to use the subject as the main organizing principle
because of the interests of our audience.
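The naming convention above can be read back mechanically. The sketch below is a minimal illustration, not part of our original workflow: the regular expression, the `EmblemName` type, and the function name are assumptions, and it presumes that author initials are uppercase letters and that the storage code is numeric, as in the example file name.

```python
import re
from typing import NamedTuple


class EmblemName(NamedTuple):
    subject: str      # noun phrase, underscores turned back into spaces
    author: str       # author initials
    storage: str      # storage-area code
    year: int         # publication year of the book
    extension: str    # file type


# [Noun Phrase]_[Author Initials]_[Storage Code]_[Date].[File Type]
_NAME = re.compile(
    r"^(?P<subject>.+)_(?P<author>[A-Z]+)_(?P<storage>\d+)_(?P<year>\d{4})"
    r"\.(?P<ext>\w+)$"
)


def parse_emblem_name(filename):
    """Split a record file name into the five elements of the convention."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"not in the emblem naming convention: {filename!r}")
    return EmblemName(
        subject=match["subject"].replace("_", " "),
        author=match["author"],
        storage=match["storage"],
        year=int(match["year"]),
        extension=match["ext"],
    )


print(parse_emblem_name("Ape_Never_Man_GW_01_1635.html"))
```

Because the subject comes first in the name, an ordinary alphabetical listing of the files is already a listing by subject.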
Audience of this collection:
We believe that our collection is a useful resource for: (a) college
students studying history or literature, or both; (b) literary analysts such
as graduate students or professional writers; and (c) amateur and
professional historians such as history buffs or faculty respectively.
Most of these groups are trained to look for (a) subject, (b) author,
(c) title, or (d) publication date of an item in the library. Therefore, we decided
to keep these four elements in the filing system for digital emblems. For
instance, the naming convention noted above indicates that the "Noun
Phrase" captures the "subject" of the emblem, and at the same time the
emblem's authorial and temporal identity is given. When the audience
browses the records in our collection, they are not looking at
cryptic file names that have no meaning for them. Instead, each emblem in
our collection has a name that conveys useful information to
the audience, as the following screen shot makes explicit:
Figure B: Record List
All records in the collection are listed alphabetically and their
"subject," "author," and "publication date" are given in the name. It takes a
modest amount of time to learn what each name of the record signifies.
Once the audience groups learn this, browsing the records, we hope,
becomes a pleasure.
The Record: Bibliographic Description:
The ILCSO OPAC treats each emblem book or book of emblems as a
bibliographic entity. We, however, could not do so, because our collection
had a peculiar dimension to it: instead of dealing with individual
emblem books or books of emblems, we were to treat each digital leaf as if it
were a full bibliographic resource. It was difficult for us to copy the entire
bibliographic record from the ILCSO OPAC and assign appropriate fields to
our digital resources, which were mere pages from a book.
We therefore designed a bibliographic record for each digital image.
We described each digital image keeping in view that our audience would
need a "full bibliographic citation" in order to use our collection for education
and research purposes. Our bibliographic record thus consisted of the
necessary elements for citation: (a) author, (b) book [title], and (c) publisher
and year of publishing. These elements are depicted in the following screen
shot:
Figure C: Bibliographic Record
The pages of this record are hyperlinked to the appropriate
information in its entirety. Once you click on the page icon, the next screen
gives you the desired citation information as shown below:
Figure D: One element of the Record
This record element can be hyperlinked to three other elements: (a) "see
also references" in the top right box; (b) "special announcements" in the
bottom right; and (c) "additional element" in the bottom left.
The record element also has enough room for future inclusion of
additional information about the book in the top left (main box). We
anticipate that such bibliographic elements would be utilized for
"conceptual linking" among various resources on the web.
Subject Description: Facet Analysis:
We analyzed the subject of our resources (digital images) along three
dimensions: (a) structural components of an emblem, (b) audience needs,
and (c) inadequacy of LCSH.
As noted above, our digital images consisted of motto, pictora, and
annotations (poems in most cases). These elements collectively expressed
the subject of each digital image. These elements, in the main, are
inseparable as far as the subject of an emblem is concerned. Furthermore,
these elements need to be visually together insofar as the integrity of the
original resource is concerned, that is, we could not separate the motto, or
pictora from the annotation. They would have to be kept as one image
representing the original. We, however, described the three elements with
natural language expressions along three axes representing: (a) motto
with a noun phrase that expressed the subjects; (b) pictora with noun
descriptors that summarized the graphical elements in the pictora; and (c)
annotation with noun descriptors that summarized the subject of the
poem.
The tripartite description of the subject allowed us to design
additional access points (motto, pictora, and annotation) in case the
audience were interested in discovering a subset of emblems in our
collection with common characteristics, such as (a) mottos expressing
certain subjects: desire, sin, death, etc.; (b) pictora representing certain
graphical symbols: angels, mythical figures, instruments of torture,
landscape, etc.; and (c) annotations reflecting common subjects: love,
friendship, Christian life, pastoral governance, etc.
While assigning the three-part subject description to each digital
emblem, we were aware of two difficulties. One, our collection
of digital emblems is a sort of "infra-collection" in that our bibliographic
records for each digital image did not qualify as legitimate bibliographic
entities. Therefore, we could not assign the legal LCSH to our records and
resources: we could not assign the same subjects to our digital entities
that the ILCSO OPAC assigned to each bibliographic record, for the ILCSO
OPAC assigned the subjects to each "emblem book" or "book of emblems."
We were not cataloging the books: we were cataloging "leaves in the books."
Two, if we assigned the LCSH to our digital emblems, we would burden the
audience with browsing through the long list of records that a query on "love in
art, death in art, religion, and Christian life" (all LCSH) would yield.
As shown below, the LCSH would not yield any records from our
collection because our collection existed below the bibliographic level of the
LCSH.
Figure E: LCSH String
We thus decided that each subject string that ends with a legal LCSH
should be further organized along its "subject facets." For instance, we
organized the subject "Christian life" into facets that we believe provide a
common characteristic for assigning a subject heading to our digital emblems.
Figure F: LCSH and a list of Facets
The subject string in this browsing hierarchy is thus a legal LCSH up to the
fourth level, Christian life. Thereafter, the subject is dispersed into its
essential facets that recur in multiple emblems. All emblems having the
facet "obedience" are thus collocated as a subset in this browsing
hierarchy. The audience interested in "obedience" can thus access these
emblems directly.
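The collocation described above amounts to grouping records by the path that ends in their facet. The following sketch shows the idea only; the record names are hypothetical, the upper LCSH levels are omitted, and Greenstone builds the actual hierarchy from its own classifier files rather than from code like this.

```python
from collections import defaultdict

# Hypothetical records: (record name, subject path ending in a facet).
# Only "Christian life" and its facets are taken from the report.
records = [
    ("Emblem_One_XX_01_1635.html", ("Christian life", "Obedience")),
    ("Emblem_Two_XX_01_1635.html", ("Christian life", "Chastity")),
    ("Emblem_Three_YY_02_1686.html", ("Christian life", "Obedience")),
]


def collocate(records):
    """Group record names under the subject path they were assigned."""
    shelves = defaultdict(list)
    for name, path in records:
        shelves[path].append(name)
    return {path: sorted(names) for path, names in shelves.items()}


shelves = collocate(records)
for path, names in sorted(shelves.items()):
    print(" > ".join(path), names)
```

A reader who browses to the "Obedience" leaf sees every emblem sharing that facet, whichever book or author it came from.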
Likewise, an audience interested in emblems that pontificate on
"God as a beloved" can browse through the hierarchical list until the
desired item is encountered (see below). Note that if we had not done a
"facet analysis" to assign subject headings below the LCSH, the emblems on
"God" and "God as beloved" would fail to appear as hyperlinks on the
screen. Thus, those in our audience who have little knowledge of the LCSH would
not be able to find items.
Figure G: LCSH, Sub-divisions, and a List of Facets
It is primarily for this reason that we did "facet analysis" for each
digital emblem to determine (a) its class identity within the legal LCSH
headings, represented iconically with bookshelves in the screen
shots above and below; and (b) a descriptor that represents the facets that
most audience members are likely to use as a "browsing / search string."
The screen shot below illustrates this:
Figure H: LCSH and a Facet Leaf
We hope that our facet analysis will serve the search and browsing
requirements of two types in our audience: faculty and professional
writers on the one hand, and students and amateur historians on the other.
For instance, faculty and professional writers are likely to search (browse)
for items in a digital library that collocates items according to the LCSH,
such as the strings represented with bookshelves above. Students and
amateur historians are likely to search (browse) for items with "keywords"
that have a certain "subject context," such as "beloved," whose subject
context in the above string is as follows: God as a subject in Love in Art as
expressed in the Poetry that one finds in the English Emblems of the 16th to
19th centuries.
We believe that such subject context is absolutely necessary for any
audience trained to judge an item's relevancy for their needs not because the
item contains a "keyword," but because the item is contextualized. We have
tried to show that such contextualization can be determined by a careful
"subject/facet analysis."
The examples above suggest that a keyword search query, such as
"passionate love," would retrieve all resources
in a collection that contain the text string: passionate love. The
audience who is interested in the "subject of passionate love," in the
context of the subject string, would likely be disappointed to see items that
discuss "passionate love" from the standpoint of contemporary studies of
passionate love as a "psychosomatic or biological/hormonal/chemical"
phenomenon. To avoid such random collocation of items, we argue that the
subject context of an item should be determined on three levels:
(A): at the level of a discipline (Literature, 16th to 19th century),
which is a basic class (BC) in our subject string, as well as the
"sub-divisions" of a BC, which are the intermediary strings between the BC and
the digital emblems. The class-identity holds together all the sub-divisions
at the lower levels. The BC descriptor thus controls the collocation of
sub-divisions under a legitimate LCSH for a discipline, such as literary studies.
All sub-divisions can be assigned a new disciplinary identity by
changing only the BC description into a legitimate LCSH for another
discipline, such as history. For instance, the BC in our example can be
re-designated as "History, 16th to 19th century." Therefore, all sub-divisions in
our examples can be used as browsing (search) strings for students of
history as well. Thus, without disrupting the underlying subjects in an
LCSH, by changing the BC for one discipline into another discipline's
legitimate LCSH, the same digital resources can be repurposed for multiple
audiences in different disciplines. Subject analysis at the discipline level is,
we hope, a conceptual tool for building digital collections that can be
re-appropriated in a federated search arrangement. (Greenstone has that
power.)
(B): at the level of discourse in a discipline, which are the facets in
our subject string. "Obedience," "Devotional Literature," "Life of Christ,"
"Chastity," etc. are the various discursive topics, or discourses, pertaining
to Christian Life in literature that the discipline of literary studies has
constituted over the years. These discourses allow refining and re-filtering
of digital resources that are attached to a BC string. Each resource
becomes attached to a subject context that the audience will judge as
relevant or irrelevant. Furthermore, subject assignment at the discourse
level will allow comparing and contrasting such a list of facets before a
complete cross-disciplinary switch is made. For instance, the facet
descriptors for literary studies, when switched at the BC level with a string
in history, would facilitate a consolidated list of facets for the new display,
which can be arranged on the fly.
(C): at the level of the surface of the resource, which is the
"keyword" that may or may not appear in the actual resource, such as the
digital emblem. These surface-level noun phrases would serve a double
purpose: (a) keyword in context, that is, each phrase would have the
subject context attached to it, as noted above; and (b) random access to
that phrase, provided the phrase appears in the actual resource, as part
of full-text searches.
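The three levels above can be pictured as one small data structure in which only the basic class changes when a collection is repurposed for another discipline. This is a conceptual sketch, not something we implemented: the class and function names are hypothetical, and the order of the sub-divisions is our reading of the "beloved" example rather than a transcription of the figures.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class SubjectContext:
    discipline: str                # (A) basic class (BC)
    subdivisions: Tuple[str, ...]  # (A) strings between the BC and the facet
    facet: str                     # (B) discourse-level facet
    keyword: str                   # (C) surface-level keyword

    def browse_string(self):
        """Render the full subject context as one browsing string."""
        parts = (self.discipline, *self.subdivisions, self.facet, self.keyword)
        return " > ".join(parts)


def switch_discipline(context, new_basic_class):
    """Re-designate the BC while leaving the lower levels untouched."""
    return replace(context, discipline=new_basic_class)


literary = SubjectContext(
    discipline="Literature, 16th to 19th century",
    subdivisions=("Poetry", "Love in art"),
    facet="God",
    keyword="beloved",
)
historical = switch_discipline(literary, "History, 16th to 19th century")

print(literary.browse_string())
print(historical.browse_string())
```

The keyword "beloved" never travels alone in this arrangement: whichever discipline heads the string, the facet and sub-divisions that give the keyword its context stay attached.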
We have greatly benefited from the teachings of Professor Pauline
Cochrane and the works of the Indian classificationist, Mr. S. R. Ranganathan,
in proposing that the subject description for a digital resource, such as the
digital emblem, should not be assigned only at the "keyword-level." It
should be assigned at three levels: discipline, discourse, and surface. Such
subject description (facet analysis), we believe, will allow the full
integration of automated indexing and list preparing tools (Greenstone) in
subject cataloging. The "synthetico-analytical" work of the
human indexer will thereby render the machine the most reliable tool to retrieve
only that which is fully desired by the audience.
Implementation in Greenstone:
A complete list of bibliographic records in our Greenstone homepage
is given in Figure B (see above). The book icons lead to the following
screen in Figure I, which contains two files: the graphic image on the left, and
the html record on the right.
The image is associated as a "cover image" with the html file that
contains the bibliographic data for that image.
We made full use of the Greenstone plug-ins to avoid manual linking
of html and graphic files. We also avoided unnecessary encoding to
construct indexes and browsing hierarchies. In short, we made full use of
the Greenstone automation: we created templates for html
files and then invoked special routines in the collection configuration file,
"collect.cfg," to allow Greenstone to put together a list of the searched
items on the fly. Greenstone also established the hyperlinks from the
leaves of each record to the resource sections where we stored that
information.
Figure I: Emblem and the Record
Encoding and Metadata:
After we carefully analyzed each emblem, we created the
bibliographic record as shown above in Figure I. The record has 7
elements: "Bibliographic Record," "Motto," "Author," "Book," "Publisher,"
"Subjects," and "Pictora." All seven elements are placed in the "body" of an
"html" file, which is divided into a head with only the "title" of
the html file in it, and a body with seven "layout tables" in it. In each table,
we placed one element of the record. Each table contains: (a) title, and (b)
section. The title of the section is what appears as text on the right side of the
leaves in Figure I. The section is what appears as a note card in Figure D. If
the button "Expand Text" in Figure I is clicked, all "sections" within the
html file are displayed in the same sequence in which they appear on the right
side of Figure I, which is a "table of contents" for the entire bibliographic
record.
The conceptual anatomy of the html file and the layout tables is
represented as follows:
Figure J: Conceptual Anatomy of the Record
We wrote no html code by hand for this record layout. We used
Dreamweaver, a visual html editor. We dragged the draw tool
onto a blank html file and drew a layout table. We repeated the operation
to get seven layout tables.
In each table we inserted two cells, which Dreamweaver placed into
separate rows. We picked the color for each layout table from the color
palette and applied it to the table.
We inserted the title of the html file in the "title" window of the
Dreamweaver design and layout window.
We then split the Dreamweaver design and layout window to see the
"html codes" and the "layout elements" simultaneously.
We clicked on one layout box to see its start and end codes in the
code window. Then we enclosed the layout table with html section tags:
<Section>
layout table
</Section>
Then we manually inserted the description tags directly before the
beginning of the layout table:
<Section>
<Description>
</Description>
layout table
</Section>
Within the description tags, we manually inserted metadata tags:
<Section>
<Description>
<Metadata name="Title" mode="accumulate"></Metadata>
</Description>
layout table
</Section>
Then we enclosed the start tag <Section> through the end tag
</Description> within an html comment. We also enclosed the end tag
</Section> within an html comment. This enclosing made the section,
description, and metadata tags invisible in the browser, while the content
and the color of the layout table remained visible.
The manual and the software-generated html encoding for one layout
table looked like the following:
<!--
<Section>
<Description>
<Metadata name="Title" mode="accumulate"></Metadata>
</Description>
-->
layout table
<!--
</Section>
-->
We copied and pasted the Dreamweaver and the manual tags in the
html file six times over. We then had our master template for the
bibliographic record.
This template is essentially a series of sections in an html file. Each
section has a header and a body, and within each section header we
encoded the metadata for that section. The metadata tags for the section
on author look like the following:
<!--
<Section>
<Description>
<Metadata name="Author" mode="accumulate"></Metadata>
<Metadata name="Title" mode="accumulate"></Metadata>
</Description>
-->
Greenstone builds an index on the first pair of metadata tags
only when mode="accumulate" is specified. Greenstone takes the second
pair of metadata tags and uses it as a "title" for that section to
display along the leaves. See Figure I above.
We, in essence, divided the html file into six sections, each section
having a small index hook attached to it in the form of a pair of metadata
tags.
When we had prepared a master copy of a record for one author, we
saved it as a template for that author. Then we manually typed the
bibliographic information about each digital emblem in the index hook and
the body of each section. We saved each html file as a bibliographic record
with the naming convention described above.
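We filled in the template by hand, but the same structure could be generated by a script. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the layout table is reduced to a placeholder string, and the tag layout simply mirrors the comment-wrapped Section, Description, and Metadata tags shown above.

```python
from html import escape


def section(title, body_html, index_field=None, index_value=""):
    """Wrap one layout table in comment-hidden section and metadata tags."""
    lines = ["<!--", "<Section>", "<Description>"]
    if index_field is not None:
        # First metadata pair: the "index hook" for this section.
        lines.append(
            f'<Metadata name="{escape(index_field)}" mode="accumulate">'
            f"{escape(index_value)}</Metadata>"
        )
    # Second metadata pair: the title displayed beside the section leaf.
    lines.append(
        f'<Metadata name="Title" mode="accumulate">{escape(title)}</Metadata>'
    )
    lines += ["</Description>", "-->", body_html, "<!--", "</Section>", "-->"]
    return "\n".join(lines)


def record(page_title, sections):
    """Assemble a whole bibliographic record from (title, body, field, value)."""
    body = "\n".join(section(*parts) for parts in sections)
    return (
        "<html>\n"
        f"<head><title>{escape(page_title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>"
    )


example = record(
    "Ape_Never_Man_GW_01_1635",
    [("Author", "<table><!-- layout table --></table>", "Author", "GW")],
)
print(example)
```

Because the section tags sit inside html comments, a browser shows only the layout tables, while a plug-in that reads description tags can still find the metadata.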
Greenstone Framework, Importing, and Building a Collection:
Once we prepared all our bibliographic records and stored them in
their respective folders, we were ready to build the collection using Greenstone's
automated plug-ins, utilities, and special procedures.
We proceeded as follows:
First, we set up the Greenstone framework as explained in Witten
and Bainbridge's How to Build a Digital Library, chapter 6, pages 302-319.
We followed the instructions for building collections manually, that is, we
set up the Greenstone framework in the command line mode.
We elected to build through the command line mode rather than
through the browser for a variety of reasons. Firstly, we experimented
with building through the collector and discovered that the collection built
through the browser generated five directories, whereas the collection
built through the command line generated seven: we believed this to be a
crucial distinction and we still believe so. Secondly, building through the
command line mode offered us many advantages, including increased
flexibility in connecting with and navigating directories on the classroom
server; increased visibility during the importing and building processes, in
that we could immediately spot error messages (and the successes that
scrolled machine characters past our eyes as the HASH directories and
indexes were built in a symmetrical pattern not that dissimilar from
streaming fractals); and increased knowledge of Greenstone's building
procedures. Still another advantage of building through the command line
was access to and control over optional switches important for full
utilization of Greenstone's importing and building processes. For example,
using the command line mode for importing enabled us to add our emblem
html files and JPEGs in batches. We were able to build and re-build existing
collections without generating duplicate Greenstone archive records: this
was achieved by invoking the optional switch "-removeold" in the import
process (see Witten and Bainbridge, pages 315-316). The Witten and
Bainbridge text contains a list of other such features available for the
command line mode import and build processes.
Once you set up the Greenstone framework for building a collection
(a step that includes supplying the collection name, in our case,
"DEmblems"), the software sets up seven directories that together store: (a) source
files, (b) files in the Greenstone Archival Format, which is an XML format
that Greenstone uses to build web pages for the display, (c) indexes, (d)
images, (e) the collection logo, (f) plain text files such as hfiles, (g) perl scripts,
(h) the collection configuration file, collect.cfg, and (i) various
automatically generated directories and files, such as the hash directory
structure, the associated files directory, the fail.log, and the collection
information database. The seven directories are: "import," "archives,"
"building," "index," "etc," "images," and "perllib."
Before executing the import perl script to automatically convert the
source files into the Greenstone Archival Format (GAF), we specified the
following routines in the collect.cfg file. (See Appendix A: DEmblems
Configuration File, collect.cfg).
a. We specified the following indexes for Greenstone to build on
the metadata elements specified in each section:
i. Motto
ii. Author
iii. Book
iv. Publisher
v. Subject
vi. Pictora
b. We invoked a special routine, "-description_tags," for the html
plug-in to accumulate description tags for searching and
indexing.
c. We also invoked a special routine, "-cover_image," for the html
plug-in to associate each JPEG with the corresponding html
bibliographic record. Greenstone associates the two files based
upon a naming convention that gives the JPEG the
same prefix that the html file has. For instance, RECORD1.jpg
will be associated with RECORD1.html. Note that after all
collection building and indexing steps are completed, the
associated files are stored in the index directory's
sub-directory called "assoc."
d. We specified the four alphabetical vertical browsing lists that
Greenstone uses to build browsing on the metadata
elements "Source," "Motto," "Subject," and "Pictora." The
"Source" list was built at the document level and the other
three lists were built at the section level of the document
(recall that the metadata elements "Motto," "Subject," and
"Pictora" were specified within sections of html documents).
Specifying the four alphabetical vertical browsing lists as we
did invoked default display features of Greenstone such as the
alphabetical buckets (A-B, C, D-F, etc.) and page advancing
arrows (icon + "Matches 11-20") that respectively appear at
the top and bottom of search or browse query results.
e. We specified the hfiles that Greenstone uses to build the
two browsing hierarchies: the subject classification
explicated in the above report section, "Subject Description:
Facet Analysis," and the Biblical motto classification ordered
on the hierarchical structure of the canon itself. (See
Appendix B: hfile for DEmblems, sub.txt).
f. We specified the "sort" operations on the two browsing
hierarchies. We did not specify the "sort" operations for the
vertical browsing lists, relying instead on the default sort,
which corresponds with the metadata element particular to
each list (e.g., the AZSectionList constructed on the metadata
element "Pictora" is sorted by "Pictora").
g. We specified the "buttonname" for each of the browsing lists.
As Figure I shows, three buttonnames display on the
navigational bar in the characteristic Greenstone colors and
fonts and four buttonnames appear as linked text. The
"search," "filenames," and "subjects" buttons display as they
do because these buttonname icons came with the downloaded
Greenstone software. Our linked text buttons, "MottoAtoZ,"
"BiblicalMotto," "SubjectsAtoZ," and "PictoraAtoZ," are
names we chose specifically for our collection: as such, the macro files did
not contain such icons. We toyed with the idea of substituting
a pre-made buttonname for uniform-display purposes. For
example, we could have easily specified that "MottoAtoZ"
display as the pre-made button "phrases." However, we still
would have been left with finding pre-made buttons for
"BiblicalMotto" and for "PictoraAtoZ," not to mention finding a
second subject-related button that differentiated the
alphabetical vertical list of subjects from the classified
hierarchy. Moreover, the term "phrases" is not synonymous
with the term "MottoAtoZ." We decided that clarity of
button-naming took priority over uniform display of the buttons.
h. We specified a formatting string for each leaf in a particular
browsing list to be hyperlinked with the source html document
in that list.
i. We specified through a formatting string the navigational
buttons "Expand Text," "Detach," and "Highlight." These
buttons appear at the lower left corner of the screen when the
full record and the image are displayed side by side.
j. We specified a formatting string for each section title to
display as a heading above its particular section layout table.
When the "Expand Text" button is selected, all headings are
displayed.
k. We invoked the collection icon feature of Greenstone by
specifying the path to our Fireworks designed logo, placed in
the DEmblems images directory.
l. We specified the searchable field names for the pull-down
menu. The searchable fields correspond to the metadata
elements and indexes specified above (see a.).
m. We wrote a succinct description of the collection in the
"collectionextra" line, indicating the LIS class it was
constructed for as well as the builders of the collection. The
collectionextra line functions as part of a splash-page for the
entire collection. As for additional information on the
splash-page, Greenstone automatically generates statements
for the search and browsing features specified. Any additional
revision to the splash-page would entail manual revision of the
macros, a step we elected not to pursue at this time.
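Since the cover image association in step c. depends entirely on matching file name prefixes, it can help to check the source folder before importing. The following sketch is a pre-flight check of our own devising, not a Greenstone routine: the function name is hypothetical, and it treats only .html, .jpg, and .jpeg files and compares prefixes exactly as written.

```python
from pathlib import PurePath


def pair_cover_images(filenames):
    """Pair each html record with the JPEG that shares its file name prefix.

    Returns (paired, missing): a dict mapping prefix to (html, jpeg), and a
    sorted list of prefixes whose html record has no matching JPEG.
    """
    html = {}
    jpeg = {}
    for name in filenames:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix == ".html":
            html[path.stem] = name
        elif suffix in (".jpg", ".jpeg"):
            jpeg[path.stem] = name
    paired = {stem: (html[stem], jpeg[stem]) for stem in html if stem in jpeg}
    missing = sorted(stem for stem in html if stem not in jpeg)
    return paired, missing


paired, missing = pair_cover_images(
    ["RECORD1.html", "RECORD1.jpg", "RECORD2.html"]
)
print("paired:", paired)
print("records without a cover image:", missing)
```

Any prefix reported as missing points to a record that would be imported without its emblem image.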
Once we finalized the collect.cfg file, we executed the "import" perl script
and the "buildcol" perl script.
Greenstone parsed all source files and built a hash directory
structure for storing information about the GAF. The GAF and the hash
directory structure ensure that the human administrator knows the path
to source documents and the associated metadata that Greenstone stores
in XML in the GAF (see Appendix C: Greenstone Archival Format for
Adversity_Misery_HH_02_1686.html, an example of doc.xml).
Furthermore, the hash directory structure and the GAF ensure that the
software is able to build web pages on the fly once a search query is
executed and the desired document is clicked on.
We executed the final step by moving the building directory contents
into the index directory. The English Emblem Book Digital Library was
thus complete and available for the audience.
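The build sequence just described (import, build, then move the building directory contents into the index directory) could be scripted along the following lines. This is a hedged sketch rather than a record of what we typed: the function names are hypothetical, the "perl -S" invocation and the script file names import.pl and buildcol.pl are assumptions about the local installation, and the commands are only assembled here, not executed.

```python
import shutil
from pathlib import Path


def build_commands(collection, remove_old=True):
    """Assemble the import and build command lines for one collection."""
    import_cmd = ["perl", "-S", "import.pl"]
    if remove_old:
        # Avoids duplicate archive records when re-building a collection.
        import_cmd.append("-removeold")
    import_cmd.append(collection)
    return [import_cmd, ["perl", "-S", "buildcol.pl", collection]]


def promote_build(collection_dir):
    """Move everything from 'building' into 'index', replacing old items."""
    building = Path(collection_dir) / "building"
    index = Path(collection_dir) / "index"
    index.mkdir(exist_ok=True)
    moved = []
    for item in sorted(building.iterdir()):
        target = index / item.name
        if target.is_dir():
            shutil.rmtree(target)
        elif target.exists():
            target.unlink()
        shutil.move(str(item), str(target))
        moved.append(item.name)
    return moved


for command in build_commands("DEmblems"):
    print(" ".join(command))
```

Keeping the last step separate means a failed build never disturbs the index directory that the running library is serving.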
Possibilities for the English Emblem Books Digital Library:
There are a few select aspects of the collection that we would have
implemented had we had more time to work on the project. We have already
invested time in understanding and investigating possible solutions for
each of the following. As it ended up, the semester-hourglass beat us.
Thumbnails. Greenstone has automated procedures that generate
thumbnails for images imported. Provided the appropriate plug-in is
invoked through the configuration file, an image and its corresponding
thumbnail are automatically associated by Greenstone; a GAF file records
the association; and the hash directory structure stores each. We
investigated thumbnails because we initially wanted (a) to not re-size the
JPEGs collected from PSU and Middlebury; and (b) to specify to
Greenstone that the thumbnail should stand as a substitute for the "cover
image." Essentially we wanted to specify to Greenstone, "display the
thumbnail associated with this html in the cover image spot and make the
thumbnail link to the appropriate image."
Revisions to the macros. Customization of colors, fonts, and
navigational features is certainly possible through working with the
macro files. We have already mentioned two areas where macro-work
could have enhanced our already visually appealing
collection: creating collection-specific buttons for the navigational bar and
designing a different splash page. We were also interested in revising the
macros in such a way as to make Boolean searching across metadata
elements possible. For example, the current collection allows a search for
the subject "Obedience." The displayed results will show numerous leaves
from various books by various authors. We would like to have enabled a
Boolean search for the subject "Obedience" AND the author "Hugo."
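To make the intended behavior concrete, the following sketch shows a Boolean AND across metadata elements over a handful of in-memory records. It illustrates the query logic only and says nothing about how Greenstone's macros or indexes would implement it; the record names and the second author are hypothetical.

```python
def boolean_and(records, **criteria):
    """Return names of records matching every field=value pair given."""
    hits = []
    for record in records:
        if all(
            wanted.lower() in {value.lower() for value in record.get(field, [])}
            for field, wanted in criteria.items()
        ):
            hits.append(record["name"])
    return hits


# Hypothetical records; only "Obedience" and "Hugo" come from the example.
records = [
    {"name": "leaf_1", "Author": ["Hugo"], "Subject": ["Obedience"]},
    {"name": "leaf_2", "Author": ["Hugo"], "Subject": ["Chastity"]},
    {"name": "leaf_3", "Author": ["Another Author"], "Subject": ["Obedience"]},
]

print(boolean_and(records, Subject="Obedience"))
print(boolean_and(records, Subject="Obedience", Author="Hugo"))
```

The single-field query returns leaves from several authors, while adding the author criterion narrows the result to the one leaf that satisfies both.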
Incorporation of Strictly Textual Leaves. The emblem books and
books of emblems we collected all had digitized images of pages that were not
emblems proper, that is, PSU and Middlebury had also digitized the various
prefaces, tables of contents, dissertations, dedications, and exhortations
associated with each book. Such leaves are strictly textual matter and, as
such, were beyond the initial scope of the project, focusing as we did on the
emblem proper. But for our audience these leaves are absolutely crucial for
studying the resources. It is, for example, obvious in the "preface" by the
English translator of Hugo's Pia Desideria that this emblem book, and this
translation in particular, is intended to school 17th century English women
and children in a decidedly Christianized morality; and that it was censored by
the translator in order to cleanse the work of the "shameful" and
"ridiculous" follies attributed to monks and Jesuits in the original book:
such historical and literary discourses are of paramount concern for our
audience and their respective disciplines.
Of the three possibilities described above, incorporation of strictly
textual leaves would seem the most important if we desired to take this
collection beyond the LIS450DL classroom and into the classrooms and
desktops of our identified audience.
Bibliography:
Witten, Ian H., and David Bainbridge. How to Build a Digital Library. The
Morgan Kaufmann series in multimedia information and systems.
Amsterdam [u.a.]: Morgan Kaufmann, 2003.