
The English Emblem Books Digital Library : a Final Report (2003)



Tahir Sandhu and Gwen Williams report on how they built a digital library on core subject-analysis principles and with Greenstone digital library software. University of Illinois at Urbana-Champaign.

Published in: Technology, Education


  1. 1. LIS 450DL—Digital Libraries Professor Bruce Schatz 13 May 2003 The English Emblem Books Digital Library: A Final Report By: Tahir Sandhu Gwen Williams email the authors at: Copyright 2003 Sandhu and Williams. This work is covered by a creative commons license Attribution-Noncommercial-Share Alike 3.0
  2. 2. The Emblem Books: The English Emblem Books are pedagogic resources printed between the 16th and 19th centuries. Each emblem book is a unified discourse on a particular subject that the pedagogues of the time thought appropriate for the education of youth. Our collection includes eight books by eight authors. Each book reflects on various aspects of a subject, such as Desire. The leaves of each book are emblems. What makes a leaf an emblem is its overall structure, which has three elements: motto, pictora, and annotation. The three elements on each page or plate are collectively known as an emblem. The authors of these emblems organized them in two different ways. One, authors put together a number of emblems that illuminated various aspects of a subject, such as Desire. Such collections are known as “emblem books.” Two, authors put together a number of emblems that each independently described an aspect of a subject, such as Heraldry. Such collections are known as “books of emblems.” Our digital collection includes both types. Each emblem is captured as a digital image representing a single page from a book of either type. We therefore have loose leaves of digital pages that represent the three elements of an emblem: motto, pictora, and annotation. Some of these images are like an open book with the pictora on one side and the motto and annotation on the other. Some are simply one long page with all three elements on it. Figure A: Two Types of Emblems (A Leaf from Emblem Book; A Leaf from Book of Emblems). Collection Development: LIS 450 DL required us to gather digital imagery, if we chose to develop a digital collection of imagery and text. We visited the Pennsylvania State University (PSU) English Emblem Books web site and browsed through the catalog of the English Emblem Books. We also visited
  3. 3. the Middlebury College Minerva Project web site to obtain images from the Minerva Britanna book of emblems. We downloaded about 200 images from various books, along with a complete bibliographic citation for each image. PSU and Middlebury had already saved these images as JPEGs with a resolution of 105 pixels per inch. The size of these images was about 8x7 inches. To manage these images as cover images for each bibliographic record, we resized them to a higher resolution (200 pixels per inch) and smaller dimensions (about 2 inches in width). Thereafter, we designed a filing system for these images to ensure that each image has: (a) a subject assigned to it using a noun phrase, such as “Ape Never Man,” for an emblem that reflects on “how futile it is to imitate someone”; (b) an acronym to indicate the author of the book that originally contained the emblem; (c) a code to indicate the storage area where we put each author’s work; (d) a year to indicate the publication of the book; and (e) a file extension (.html) to indicate the type of file in which the emblem was stored. The name of each emblem thus consisted of these five elements in the following order: [Noun Phrase]_[Author Initials]_[Storage Code]_[Date].[File Type], for example, Ape_Never_Man_GW_01_1635.html. This filing convention allowed us to identify a digital emblem without having to use application software to open the file and visually examine it. This system also allowed us to organize the emblems according to their “subject.” We chose to use the subject as the main organizing principle because of the interests of our audience. Audience of this collection: We believe that our collection is a useful resource for: (a) college students studying history or literature, or both; (b) literary analysts such as graduate students or professional writers; and (c) amateur and professional historians such as history buffs or faculty respectively.
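The file-naming convention described above can be sketched as a small parser. This is only an illustration, not part of the original project: the class name `EmblemName`, the function `parse_emblem_name`, and the exact pattern (underscore separators, capital-letter initials, a numeric storage code, a four-digit year) are assumptions inferred from the single example name given in the report.

```python
import re
from typing import NamedTuple


class EmblemName(NamedTuple):
    """The five elements of a record name (hypothetical helper type)."""
    subject: str        # noun phrase, e.g. "Ape Never Man"
    author: str         # author initials, e.g. "GW"
    storage_code: str   # storage-area code, e.g. "01"
    year: int           # publication year of the book
    file_type: str      # file extension without the dot


# Assumed pattern: subject words joined by underscores, then capital-letter
# initials, a numeric storage code, a four-digit year, and an extension.
_NAME_RE = re.compile(
    r"^(?P<subject>.+)_(?P<author>[A-Z]+)_(?P<code>\d+)_(?P<year>\d{4})"
    r"\.(?P<ext>\w+)$"
)


def parse_emblem_name(filename: str) -> EmblemName:
    """Split a record file name into its five elements."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a recognised emblem record name: {filename!r}")
    return EmblemName(
        subject=match["subject"].replace("_", " "),
        author=match["author"],
        storage_code=match["code"],
        year=int(match["year"]),
        file_type=match["ext"],
    )
```

Under these assumptions, `parse_emblem_name("Ape_Never_Man_GW_01_1635.html")` separates the subject phrase from the author, storage code, and year, which is the same reading a person makes when scanning the record list by eye.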
Most of these groups are trained to look for the (a) subject, (b) author, (c) title, or (d) publication date of an item in the library. Therefore, we decided to keep these four elements in the filing system for digital emblems. For instance, in the naming convention noted above, the “Noun Phrase” captures the “subject” of the emblem, and at the same time the emblem’s authorial and temporal identity is given. When the audience browses the records in our collection, they will not be looking at cryptic file names that have no meaning for them. Instead, each emblem in our collection has a name that conveys useful information to the audience. The following screen shot makes this explicit:
  4. 4. Figure B: Record List All records in the collection are listed alphabetically and their “subject,” “author,” and “publication date” are given in the name. It takes a modest amount of time to learn what each name of the record signifies. Once the audience groups learn this, browsing the records, we hope, becomes a pleasure. The Record: Bibliographic Description: The ILCSO OPAC treats each emblem book or a book of emblems as a bibliographic entity. We, however, could not do so, because our collection had a peculiar dimension to it: instead of having to deal with individual emblem books or books of emblems, we were to treat each digital leaf as if it was a full bibliographic resource. It was difficult for us to copy the entire bibliographic record from the ILCSO OPAC and assign appropriate fields to our digital resources, which were mere pages from a book. We therefore designed a bibliographic record for each digital image. We described each digital image keeping in view that our audience would need “full bibliographic citation” in order to use our collection for education and research purposes. Our bibliographic record thus consisted of the necessary elements for citation: (a) author, (b) book [title], (c), publisher and year of publishing. These elements are depicted in the following screen shot: 4
  5. 5. Figure C: Bibliographic Record The pages of this record are hyperlinked to the appropriate information in its entirety. Once you click on the page icon, the next screen gives you the desired citation information as shown below: Figure D: One element of the Record 5
  6. 6. This record element can be hyperlinked to three other elements: (a) “see also references” in the top right box; (b) “special announcements” in the bottom right; and (c) “additional element” in the bottom left. The record element also has enough room for future inclusion of additional information about the book in the top left (main box). We anticipate that such bibliographic elements would be utilized for a “conceptual linking” among various resources on the web. Subject Description: Facet Analysis: We analyzed the subject of our resources (digital images) along three dimensions: (a) structural components of an emblem, (b) audience needs, and (c) inadequacy of LCSH. As noted above, our digital images consisted of motto, pictora, and annotations (poems in most cases). These elements collectively expressed the subject of each digital image. These elements, in the main, are inseparable as far as the subject of an emblem is concerned. Furthermore, these elements need to be visually together insofar as the integrity of the original resource is concerned; that is, we could not separate the motto or pictora from the annotation. They would have to be kept as one image representing the original. We did, however, describe the three elements with natural language expressions along three axes: (a) the motto with a noun phrase that expressed the subject; (b) the pictora with noun descriptors that summarized its graphical elements; and (c) the annotation with noun descriptors that summarized the subject of the poem.
The tripartite description of the subject allowed us to design additional access points (motto, pictora, and annotation) in case the audience were interested in discovering a subset of emblems in our collection with common characteristics, such as (a) mottos expressing certain subjects (desire, sin, death, etc.); (b) pictora representing certain graphical symbols (angels, mythical figures, instruments of torture, landscape, etc.); and (c) annotations reflecting common subjects (love, friendship, Christian life, pastoral governance, etc.). While assigning the three-part subject description to each digital emblem, we were aware of two difficulties. One, our collection of digital emblems is a sort of “infra-collection,” in that our bibliographic records for each digital image did not qualify as legitimate bibliographic entities. Therefore, we could not assign the legal LCSH to our records and resources. We could not assign the same subjects to our digital entities that the ILCSO OPAC assigned to each bibliographic record, for the ILCSO OPAC assigned the subjects to each “emblem book” or “book of emblems.” We were not cataloging the books: we were cataloging “leaves in the books.” Two, if we assigned the LCSH to our digital emblems, we would burden the audience with browsing through the long list of records that a query on “love in art, death in art, religion, and Christian life (all LCSH)” would yield.
  7. 7. As shown below, the LCSH would not yield any records from our collection because our collection existed below the bibliographic level of the LCSH. Figure E: LCSH String. We thus decided that each subject string that ends with a legal LCSH should be further organized along its “subject facets.” For instance, we organized the subject “Christian life” into facets that we believe provide a common characteristic by which to assign a subject heading to our digital emblems.
  8. 8. Figure F: LCSH and a list of Facets. The subject string in this browsing hierarchy is thus a legal LCSH up to the fourth level, Christian life. Thereafter, the subject is dispersed into its essential facets that recur in multiple emblems. All emblems having the facet “obedience” are thus collocated as a subset in this browsing hierarchy. The audience interested in “obedience” can thus access these emblems directly. Likewise, an audience interested in emblems that pontificate on “God as a beloved” can browse through the hierarchical list until the desired item is encountered (see below). Note that if we had not done a “facet analysis” to assign subject headings below the LCSH, the emblems on “God” and “God as beloved” would fail to appear as hyperlinks on the screen. Thus, audience members who have little knowledge of the LCSH would not be able to find items.
  9. 9. Figure G: LCSH, Sub-divisions, and a List of Facets. It is primarily for this reason that we did a “facet analysis” for each digital emblem to determine (a) its class identity within the legal LCSH headings, represented iconically with bookshelves in the screen shots above and below; and (b) a descriptor that represents the facets that most audience members are likely to use as a “browsing / search string.” The screen shot below illustrates this: Figure H: LCSH and a Facet Leaf
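The collocation of emblems under an LCSH string followed by a facet can be sketched as a simple index keyed on the full browsing path. This is a minimal illustration under stated assumptions: the function names, the record names, and the levels of the example path are hypothetical placeholders, since the actual hierarchy appears only in the screen shots and in the h-file.

```python
from collections import defaultdict


def build_browse_index(records):
    """Group record names by full browsing path: LCSH levels plus one facet.

    `records` is an iterable of (record_name, lcsh_path, facet) tuples.
    """
    index = defaultdict(list)
    for name, lcsh_path, facet in records:
        index[tuple(lcsh_path) + (facet,)].append(name)
    return index


def records_for_facet(index, facet):
    """Return every record whose browsing path ends in the given facet."""
    return sorted(
        name
        for path, names in index.items()
        if path[-1] == facet
        for name in names
    )


# Hypothetical path and record names, for illustration only.
LCSH_PATH = ("Level 1", "Level 2", "Level 3", "Christian life")
SAMPLE = [
    ("Record_A", LCSH_PATH, "Obedience"),
    ("Record_B", LCSH_PATH, "Chastity"),
    ("Record_C", LCSH_PATH, "Obedience"),
]
```

With this sample, `records_for_facet(build_browse_index(SAMPLE), "Obedience")` gathers the two records filed under that facet, while a lookup on the bare LCSH heading alone would match no key. That mirrors the point made above about the collection sitting below the LCSH level.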
  10. 10. We hope that our facet analysis would serve the search and browsing requirements of two types of users in our audience: faculty and professional writers on the one hand, and students and amateur historians on the other. For instance, faculty and professional writers are likely to search (browse) for items in a digital library that collocates items according to the LCSH, such as the strings represented with bookshelves above. The students and amateur historians are likely to search (browse) for items with “keywords” that have a certain “subject context,” such as “beloved,” whose subject context in the above string is as follows: God as a subject in Love in Art as expressed in the Poetry that one finds in the English Emblems of the 16th to 19th century. We believe that such subject context is absolutely necessary for any audience trained to judge an item’s relevancy for their needs not because the item contains a “keyword,” but because the item is contextualized. We have tried to show that such contextualization can be determined by a careful “subject/facet analysis.” In the examples above, a keyword search query for “passionate love” would retrieve every resource in a collection that contains the text string “passionate love.” An audience interested in the “subject of passionate love” in the context of the subject string would likely be disappointed to see items that discuss “passionate love” from the standpoint of contemporary studies of passionate love as a “psychosomatic or biological/hormonal/chemical” phenomenon. To avoid such random collocation of items, we argue that the subject context of an item should be determined on three levels: (A): at the level of a discipline (Literature, 16th to 19th century), which is a basic class (BC) in our subject string, as well as the “sub-divisions” of a BC, which are the intermediary strings between the BC and the digital emblems.
The class-identity holds together all the sub-divisions at the lower levels. The BC descriptor thus controls the collocation of sub-divisions under a legitimate LCSH for a discipline, such as literary studies. All sub-divisions can be assigned a new disciplinary identity by changing only the BC description into a legitimate LCSH for another discipline, such as history. For instance, the BC in our example can be re-designated as “History, 16th to 19th century.” All sub-divisions in our examples can then be used as browsing (search) strings for students of history as well. Thus, without disrupting the underlying subjects in an LCSH, by changing the BC for one discipline into another discipline’s legitimate LCSH, the same digital resources can be repurposed for multiple audiences in different disciplines. Subject analysis at the discipline level is, we hope, a conceptual tool for building digital collections that can be re-appropriated in a federated search arrangement. (Greenstone has that power). (B): at the level of discourse in a discipline, which are the facets in our subject string. “Obedience,” “Devotional Literature,” “Life of Christ,” “Chastity,” etc. are the various discursive topics, or discourses, pertaining
  11. 11. to Christian Life in literature that the discipline of literary studies has constituted over the years. These discourses allow refining and re-filtering of digital resources that are attached to a BC string. Each resource becomes attached to a subject context that the audience will decide as relevant or irrelevant. Furthermore, subject assignment at the discourses level will allow comparing and contrasting such a list of facets before a complete cross-disciplinary switching is made. For instance, the facet descriptors for literary studies, when switched at the BC level with a string in history, would facilitate a consolidated list of facets for the new display, which can be arranged on the fly. (C): at the level of the surface of the resource, which is the “keyword” that may or may not appear in the actual resources such as the digital emblem. These surface-level noun phrases would serve a double purpose: (a) keyword in context, that is each phrase would have the subject context attached to it, as noted above; and (b) random access to that phrase, provided such phrase appeared in the actual resource, as part of the full-text searches. We have greatly benefited from the teachings of Professor Pauline Cochrane and the works of Indian classificationist, Mr. S. R. Ranganathan to propose that the subject description for a digital resource, such as the digital emblem, should not be assigned only at the “keyword-level.” It should be assigned at three levels: discipline, discourse, and surface. Such subject description (facet analysis), we believe, will allow the full integration of automated indexing and list preparing tools (Greenstone) in subject cataloging. Therefore, the “synthetico-analytical” work of the human indexer will render the machine as the most reliable tool to retrieve only that which is fully desired by the audience. 11
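The three levels of subject description (discipline, discourse, surface) and the re-designation of the basic class can be sketched as a small data structure. This is an illustrative model only: the class `SubjectContext`, its field names, and both helper functions are assumptions made for the example, not structures used by Greenstone or by the collection itself.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class SubjectContext:
    """Three-level subject description for one resource (hypothetical model)."""
    basic_class: str                # discipline level (BC)
    subdivisions: Tuple[str, ...]   # intermediary strings under the BC
    facet: str                      # discourse level
    keywords: FrozenSet[str]        # surface level, stored in lower case


def redesignate(ctx: SubjectContext, new_basic_class: str) -> SubjectContext:
    """Switch the discipline while leaving sub-divisions and facet untouched."""
    return replace(ctx, basic_class=new_basic_class)


def matches_in_context(ctx: SubjectContext, keyword: str, basic_class: str) -> bool:
    """A keyword hit counts only when the resource sits in the requested discipline."""
    return ctx.basic_class == basic_class and keyword.lower() in ctx.keywords


emblem = SubjectContext(
    basic_class="Literature, 16th to 19th century",
    subdivisions=("Poetry", "Love in art", "God"),
    facet="beloved",
    keywords=frozenset({"passionate love"}),
)
as_history = redesignate(emblem, "History, 16th to 19th century")
```

Here `as_history` keeps the same sub-divisions and facet as `emblem`, so the browsing strings remain usable under the new discipline, while a keyword query issued from an unrelated discipline no longer matches on the text string alone.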
  12. 12. Implementation in Greenstone: A complete list of bibliographic records in our Greenstone homepage is given in Figure B (see above). The book icons yield to the following screen in Figure I that contains two files: the graphic image on the left, and the html record on the right. The image is associated as a “cover image” with the html file that contains the bibliographic data for that image in an html file. We made full use of the Greenstone plug-ins to avoid manual linking of html and graphic files. We also avoided unnecessary encoding to construct indexes and browsing hierarchies. In short, we made full use of the Greenstone automation that allowed us to create templates for html files and then invoked special routines in the collection configuration file, “collect.cfg,” to allow Greenstone to put together a list of the searched items on the fly. Greenstone also established the hyperlinks from the leaves of each record to the resource sections where we stored that information. Figure I: Emblem and the Record 12
  13. 13. Encoding and Metadata: After we carefully analyzed each emblem, we created the bibliographic record as shown above in Figure I. The record has 7 elements: “Bibliographic Record,” “Motto,” “Author,” “Book,” “Publisher,” “Subjects,” and “Pictora.” All seven elements are placed in the “body” of an “html” file, which is divided into a head containing only the “title” of the html file, and a body containing seven “layout tables.” In each table, we placed one element of the record. Each table contains: (a) a title, and (b) a section. The title of the section is what appears as text on the right side of the leaves in Figure I. The section is what appears as a note card in Figure D. If the button “Expand Text” in Figure I is clicked, all “sections” within the html file are displayed in the same sequence in which they appear on the right side of Figure I, which is a “table of contents” for the entire bibliographic record. The conceptual anatomy of the html file and the layout tables is represented as follows: Figure J: Conceptual Anatomy of the Record. We wrote no html code for this record layout. We used Dreamweaver, an object-oriented html editor. We dragged the draw tool onto a blank html file and drew a layout table. We repeated the operation to get seven layout tables.
  14. 14. In each table we inserted two cells, which Dreamweaver placed into separate rows. We picked the color for each layout table from the color palette and applied it to the table. We inserted the title of the html file in the “title” window of the Dreamweaver design and layout window. We then split the Dreamweaver design and layout window to see the “html code” and the “layout elements” simultaneously. We clicked on one layout box to see its start and end codes in the code window. Then we enclosed the layout table with html section tags: <Section> layout table </Section> Then we manually inserted the description tags directly before the beginning of the layout table: <Section> <Description> </Description> layout table </Section> Within the description tags, we manually inserted metadata tags: <Section> <Description> <Metadata name="Title" mode="accumulate"></Metadata> </Description> layout table </Section> Then we enclosed everything from the start tag <Section> through the end tag </Description> within a comment box. We also enclosed the end tag </Section> within a comment box. This enclosing made the section, description, and metadata tags invisible in the browser, while the content and the color of the layout table remained visible. The manual and the software-generated html encoding for one layout table looked like the following: <!-- <Section> <Description> <Metadata name="Title" mode="accumulate"></Metadata> </Description> --> layout table <!-- </Section> -->
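The comment-wrapped section markup described above can be sketched as a template function, together with a small reader that recovers the metadata pairs. This is a hedged illustration, not the authors' workflow (they typed the tags by hand in Dreamweaver) and not Greenstone's own parser: `render_section` and `extract_metadata` are hypothetical helpers, and the regular expression handles only the simple tag shape shown in the report.

```python
import html
import re


def render_section(metadata, body):
    """Wrap one layout table in comment-hidden Section/Description tags.

    `metadata` maps a metadata name (e.g. "Author") to its value; `body`
    is the visible html for the layout table.
    """
    tags = " ".join(
        f'<Metadata name="{html.escape(name)}" mode="accumulate">'
        f"{html.escape(value)}</Metadata>"
        for name, value in metadata.items()
    )
    return (
        f"<!-- <Section> <Description> {tags} </Description> --> "
        f"{body} "
        f"<!-- </Section> -->"
    )


# Matches only the plain form used above: a quoted name attribute first,
# any further attributes, then the value up to the closing tag.
_METADATA_RE = re.compile(
    r'<Metadata name="([^"]+)"[^>]*>(.*?)</Metadata>', re.DOTALL
)


def extract_metadata(section_html):
    """Return (name, value) pairs in the order they appear in the markup."""
    return [
        (html.unescape(name), html.unescape(value))
        for name, value in _METADATA_RE.findall(section_html)
    ]
```

For example, `render_section({"Author": "Hugo", "Title": "Author"}, "<table>...</table>")` yields one section whose first metadata pair is the indexed value and whose second pair is the section title, and `extract_metadata` reads both pairs back. Because the tags sit inside html comments, a browser shows only the table.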
  15. 15. We copied and pasted the Dreamweaver and the manual tags in the html file six times over. We then had our master template for the bibliographic record. This template is essentially a series of sections in an html file. Each section has a header and a body, and within each section header we encoded the metadata for that section. The metadata tags for the section on author look like the following: <!-- <Section> <Description> <Metadata name="Author" mode="accumulate"></Metadata> <Metadata name="Title" mode="accumulate"></Metadata> </Description> --> Greenstone builds an index on the first pair of metadata tags only when mode="accumulate" is specified. Greenstone takes the second pair of metadata tags and uses it as a “title” for that section to display along the leaves. See Figure I above. We, in essence, divided the html file into six sections, each section having a small index hook attached to it in the form of a pair of metadata tags. When we had prepared a master copy of a record for one author, we saved it as a template for that author. Then we manually typed the bibliographic information about each digital emblem in the index hook and the body of each section. We saved each html file as a bibliographic record with the naming convention described above. Greenstone Framework, Importing, and Building a Collection: Once we had prepared all our bibliographic records and stored them in their respective folders, we were ready to build the collection using Greenstone’s automated plug-ins, utilities, and special procedures. We proceeded as follows: First, we set up the Greenstone framework as explained in Witten and Bainbridge’s How to Build a Digital Library, chapter 6, pages 302-319. We followed the instructions for building collections manually, that is, we set up the Greenstone framework in the command line mode. We elected to build through the command line mode rather than through the browser for a variety of reasons.
Firstly, we experimented with building through the collector and discovered that the collection built through the browser generated five directories, whereas the collection built through the command line generated seven: we believed this to be a crucial distinction and we still believe so. Secondly, building through the command line mode offered us many advantages, including increased 15
  16. 16. flexibility in connecting with and navigating directories on the classroom server; increased visibility during the importing and building processes in that we could immediately spot error messages (and the successes that scrolled machine characters past our eyes as the HASH directories and indexes were built in a symmetry pattern not that dissimilar from streaming fractals); and increased knowledge of Greenstone’s building procedures. Still another advantage of building through the command line was access to and control over optional switches important for full utilization of Greenstone’s importing and building processes. For example, using the command line mode for importing enabled us to add our emblem htmls and JPEGs in batches. We were able to build and re-build existing collections without generating duplicate Greenstone archive records: this was achieved by invoking the optional switch “-removeold” at the import process (see Witten and Bainbridge, pages 315-316). The Witten and Bainbridge text contains a list of other such features available for the command line mode import and build processes. Once you set up the Greenstone framework for building a collection (a step that includes supplying the collection name, in our case, “DEmblems”), the software sets up seven directories to store: (a) source files, (b) files in the Greenstone Archival Format, which is an XML format that Greenstone utilized to build web pages for the display, and indexes, (d) images, (e) collection logo, (f) plain text files such as hfiles, (g) perl scripts, (h) the collection configuration file, collect.cfg, and (i) various automatically generated directories and files, such as the hash directory structure, the associated files directory, the fail.log, and the collection information database. 
The seven directories are: “import,” “archives,” “building,” “index,” “etc,” “images,” and “perllib.” Before executing the import perl script to automatically convert the source files into the Greenstone Archival Format (GAF), we specified the following routines in the collect.cfg file. (See Appendix A: DEmblems Configuration File, collect.cfg). a. We specified the following indexes for Greenstone to look for the metadata elements specified in each section: i. Motto ii. Author iii. Book iv. Publisher v. Subject vi. Pictora b. We invoked a special routine “-description_tags” for the html plug-in to accumulate description tags for searching and indexing. c. We also invoked a special routine “-cover_image” for the html plug-in to associate each JPEG with the corresponding html bibliographic record. Greenstone associates the two files based 16
  17. 17. upon the naming convention, which should give the JPEG the same prefix that the html file has. For instance, RECORD1.jpg will be associated with RECORD1.html. Note that after all collection building and indexing steps are completed, the associated files will be stored in the index directory’s sub-directory called “assoc.” d. We specified the four alphabetical vertical browsing lists that Greenstone will use to build browsing on the metadata elements “Source,” “Motto,” “Subject,” and “Pictora.” The “Source” list was built at the document level and the other three lists were built at the section level of the document (recall that the metadata elements “Motto,” “Subject,” and “Pictora” were specified within sections of html documents). Specifying the four alphabetical vertical browsing lists as we did invoked default display features of Greenstone such as the alphabetical buckets (A-B, C, D-F, etc.) and page advancing arrows (icon + “Matches 11-20”) that respectively appear at the top and bottom of search or browse query results. e. We specified the h-files that Greenstone will use to build the two browsing hierarchies: the subject classification as explicated in the above report section, “Subject Description: Facet Analysis,” and the Biblical motto classification ordered on the hierarchical structure of the canon itself. (See Appendix B: hfile for DEmblems, sub.txt). f. We specified the “sort” operations on the two browsing hierarchies. We did not specify the “sort” operations for the vertical browsing lists, relying instead on the default sort, which corresponds with the metadata element particular to each list (e.g., the AZSectionList constructed on the metadata element “Pictora” will be sorted by “Pictora”). g. We specified the “buttonname” for each of the browsing lists. As Figure I shows, three buttonnames display on the navigational bar in the characteristic Greenstone colors and fonts and four buttonnames appear as linked text. 
The “search,” “filenames,” and “subjects” buttons display as they do because these buttonname icons came with the downloaded Greenstone software. Our linked text buttons, “MottoAtoZ,” “BiblicalMotto,” “SubjectsAtoZ,” and “PictoraAtoZ,” are chosen names specific to our collection: as such, the macro files did not contain such icons. We toyed with the idea of substituting a pre-made buttonname for uniform-display purposes. For example, we could have easily specified that the “MottoAtoZ” display as the pre-made button, “phrases.” However, we still would have been left with finding pre-made buttons for “BiblicalMotto” and for “PictoraAtoZ,” not to mention finding a second subject-related button that differentiated the 17
  18. 18. alphabetical vertical list of subjects from the classified hierarchy. Moreover, the term “phrases” is not synonymous with the term “MottoAtoZ.” We decided that clarity of button-naming took priority over uniform display of the buttons. h. We specified a formatting string for each leaf in a particular browsing list to be hyperlinked with the source html document in that list. i. We specified through a formatting string the navigational buttons “Expand Text,” “Detach,” and “Highlight.” These buttons appear at the lower left corner of the screen when the full record and the image are displayed side by side. j. We specified a formatting string for each section title to display as a heading above its particular section layout table. When the “Expand Text” button is selected, all headings are displayed. k. We invoked the collection icon feature of Greenstone by specifying the path to our Fireworks-designed logo, placed in the DEmblems images directory. l. We specified the searchable field names for the pull-down menu. The searchable fields correspond to the metadata elements and indexes specified above (see a.). m. We wrote a succinct description of the collection in the “collectionextra” line, indicating the LIS class it was constructed for as well as the builders of the collection. The collectionextra line functions as part of a splash page for the entire collection. As for additional information on the splash page, Greenstone automatically generates statements for the search and browsing features specified. Any further revision to the splash page would entail manual revision of the macros, a step we elected not to pursue at this time. Once we finalized the collect.cfg file, we executed the “import” perl script and the “buildcol” perl script. Greenstone parsed all source files and built a hash directory structure for storing information about the GAF. 
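Because the cover-image association in step c depends entirely on the JPEG and the html record sharing a prefix, a check run before importing can catch mismatched names. The sketch below is an assumption-laden helper, not Greenstone's own logic: the function `pair_cover_images` is hypothetical and simply pairs files whose names differ only in the `.html` and `.jpg` extensions.

```python
from pathlib import Path


def pair_cover_images(filenames):
    """Pair each html record with the JPEG that shares its file-name prefix.

    Returns (pairs, unmatched): `pairs` maps html name to JPEG name, and
    `unmatched` lists files that have no partner under the naming rule.
    """
    by_stem = {}
    for name in filenames:
        path = Path(name)
        by_stem.setdefault(path.stem, {})[path.suffix.lower()] = name

    pairs, unmatched = {}, []
    for _stem, files in sorted(by_stem.items()):
        record, image = files.get(".html"), files.get(".jpg")
        if record and image:
            pairs[record] = image
        else:
            unmatched.extend(files.values())
    return pairs, sorted(unmatched)
```

Calling `pair_cover_images(["RECORD1.jpg", "RECORD1.html", "RECORD2.html"])` pairs the first record with its image and reports `RECORD2.html` as lacking a cover image, which is the situation that would otherwise surface only after a full import and build.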
The GAF and the hash directory structure ensure that the human administrator knows the path to source documents and the associated metadata that Greenstone stores in XML in the GAF (see Appendix C: Greenstone Archival Format for Adversity_Misery_HH_02_1686.html, an example of doc.xml). Furthermore, the hash directory structure and the GAF ensure that the software is able to build web pages on the fly once a search query is executed and the desired document is clicked-on. We executed the final step by moving the building directory contents into the index directory. The English Emblem Book Digital Library was thus complete and available for the audience. 18
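The import, build, and final move steps described above can be sketched as a short driver. This is an illustration under assumptions rather than the authors' procedure: the helper names are hypothetical, the commands are only assembled as argument lists (nothing is executed), and the `perl -S` invocation form and the `import.pl` and `buildcol.pl` script file names are assumed from common Greenstone 2 usage, while `-removeold` and the collection name come from the report.

```python
import shutil
from pathlib import Path


def build_commands(collection, remove_old=True):
    """Assemble the import and build command lines for one collection."""
    import_cmd = ["perl", "-S", "import.pl"]
    if remove_old:
        # Avoids duplicate archive records when re-importing, as noted above.
        import_cmd.append("-removeold")
    import_cmd.append(collection)
    return [import_cmd, ["perl", "-S", "buildcol.pl", collection]]


def promote_building(collection_dir):
    """Move everything from building/ into index/, replacing older copies."""
    building = Path(collection_dir) / "building"
    index = Path(collection_dir) / "index"
    index.mkdir(exist_ok=True)
    moved = []
    for item in sorted(building.iterdir()):
        target = index / item.name
        if target.is_dir():
            shutil.rmtree(target)      # drop the previous build's directory
        elif target.exists():
            target.unlink()            # drop the previous build's file
        shutil.move(str(item), str(target))
        moved.append(item.name)
    return moved
```

A caller could pass each list from `build_commands("DEmblems")` to `subprocess.run` in an environment where the Greenstone scripts are on the path, and then call `promote_building` on the collection directory to make the new build live.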
  19. 19. Possibilities for the English Emblem Books Digital Library: There are a few aspects of the collection that we would have implemented had we had more time to work on the project. We have already invested time in understanding and investigating possible solutions for each of the following. As it turned out, the semester hourglass beat us. Thumbnails. Greenstone has automated procedures that generate thumbnails for imported images. Provided the appropriate plug-in is invoked through the configuration file, an image and its corresponding thumbnail are automatically associated by Greenstone; a GAF file records the association; and the hash directory structure stores each. We investigated thumbnails because we initially wanted (a) to avoid re-sizing the JPEGs collected from PSU and Middlebury; and (b) to specify to Greenstone that the thumbnail should stand as a substitute for the “cover image.” Essentially we wanted to specify to Greenstone, “display the thumbnail associated with this html in the cover image spot and make the thumbnail link to the appropriate image.” Revisions to the macros. Customization of colors, fonts, and navigational features is certainly possible through working with the macro files. We have already mentioned two areas where macro work could have enhanced our already visually appealing collection: creating collection-specific buttons for the navigational bar and designing a different splash page. We were also interested in revising the macros in such a way as to make Boolean searching across metadata elements possible. For example, the current collection allows a search for the subject “Obedience.” The displayed results will show numerous leaves from various books by various authors. We would like to have enabled a Boolean search for the subject “Obedience” AND the author “Hugo.” Incorporation of Strictly Textual Leaves. 
The emblem books and books of emblems we collected all had digitized images of pages that were not emblems proper; that is, PSU and Middlebury had also digitized the various prefaces, tables of contents, dissertations, dedications, and exhortations associated with each book. Such leaves are strictly textual matter and, as such, were beyond the initial scope of the project, which focused on the emblem proper. But for our audience these leaves are absolutely crucial for studying the resources. It is, for example, obvious in the “preface” by the English translator of Hugo’s Pia Desideria that this emblem book, and this translation in particular, was intended to school 17th century English women and children in a decidedly Christianized morality, and that it was censored by the translator in order to cleanse the work of the ‘shameful’ and ‘ridiculous’ follies attributed to monks and Jesuits in the original book: such historical and literary discourses are of paramount concern for our audience and their respective disciplines. Of the three possibilities described above, incorporation of strictly textual leaves would seem the most important if we desired to take this
  20. 20. collection beyond the LIS450DL classroom and into the classrooms and desktops of our identified audience. Bibliography: Witten, Ian H., and David Bainbridge. How to Build a Digital Library. The Morgan Kaufmann series in multimedia information and systems. Amsterdam [u.a.]: Morgan Kaufmann, 2003. 20
  21. 21. Appendix A: DEmblems Configuration File, collect.cfg 21
  22. 22. Appendix B: hfile for DEmblems, sub.txt 22
  23. 23. Appendix C: Greenstone Archival Format for Adversity_Misery_HH_02_1686.html, an example of doc.xml 23