Transcript of "Rockymountain Wilderness Survival Library: Information Organization Project"
Jason Moore
Spring 2012
SLIS 5200 TXWI-A
Draft 4

Rockymountain Wilderness Survival Library: Information Organization System

1. Project description

1.1. Collection and information objects

This collection, the Rockymountain Wilderness Survival Library, is housed in the Philmont Training Center at the Philmont Scout Ranch in Cimarron, New Mexico. The collection consists of 1200 non-fiction books of the following types: general wilderness survival guides, field guides, regional historical accounts, handbooks, and atlases. The subject coverage in this collection is broad and includes, but is not limited to, Northern New Mexican geography, Native American history and culture, regional plant and animal identification, orienteering, cooking and food preparation, outdoor sporting, hiking, and camping. This collection is used as a supplement to official Boy Scouts of America publications for use by troop leaders and camp counselors. For this reason, items are borrowed primarily in relation to official camp programs and activities. Materials are purchased through limited grant funds, and collection growth relies solely on donations from Boy Scouts of America members and alumni.

1.2. Users’ demographics and knowledge

This collection is strictly limited to use by visiting Boy Scout troop leaders and Philmont camp counselors. These two user groups can be considered together demographically as consisting primarily of educated middle-class males of generally high socio-economic status, ranging between eighteen and forty-five years of age. Most of these users should have similar cultural backgrounds and can be considered to be comfortable with a moderately complex information retrieval system.

In order to best understand the various decisions that factor into the way in which this system is designed, the users’ level of knowledge of the four following types should be considered: general, domain, system, and information seeking.
These knowledge types represent what users bring along with them before their interaction with a system and must be recognized and accounted for in a system’s design in order to accomplish the best possible user experience.

General knowledge is defined as knowledge related to a user’s intellectual capabilities, their life experiences, and their attitudes and inclinations. Users of this system can be considered to have a high level of general knowledge because of their relatively strong educational backgrounds as well as their generally high socio-economic status.

Domain knowledge can be defined as the users’ overall level of knowledge of the subject material of the collection. This system’s users can be considered to possess a high level of domain knowledge related to the subjects covered by this collection. Since all users of this collection serve in a role of general authority and expertise, their level of understanding of these subject areas should be well above novice.

System knowledge can be defined as a broad understanding of the structures, architectures, and technical aspects of information systems. Users of this system have a mostly moderate level of knowledge in this area. A large majority of them are comfortable with high-level computing tasks such as word processing, general internet use, and navigation and configuration of an operating system’s basic graphical user interface. There are some users with a lower-than-moderate understanding of systems, but it can be safely assumed that these
users are capable of receiving assistance from more technically proficient colleagues.

Information seeking knowledge can be defined as a user’s ability to search for, retrieve, and use information. Most users of this system have a low to moderate level of information seeking knowledge. While many of these users are familiar with searching for information online, only a small subsection of them have any experience with concepts such as Boolean operators, regular expressions, or classification systems. Because of these factors, a relatively simple system that relies heavily on the users’ domain knowledge is most appropriate.

1.3. Users’ problems and questions

When interacting with the system, users look for material that is related to specific activities and programs that take place at the camp but are not sufficiently covered by material in official Boy Scouts of America publications. Because of this, many of the questions that users have are subject oriented, but users often request items from specific publishers and authors, especially in the case of field guides and historical texts. A random sampling of questions asked by users of this system is as follows:

User question 1: I am looking for a couple of books on general orienteering skills, preferably illustrated and including supplemental materials such as a map and possibly a protractor. I would like to use them in order to prepare for a map and compass activity with my scouts. Can you help me find some that I can check out?
Object attributes: Subject, Type, Illustrations, Supplemental Materials, Related Activity
Desired precision: High
Desired recall: Moderate

User question 2: I was at a meeting the other day and I heard someone mention some books about the history of Native Americans in this region, I think I heard the name Veronica Tiller mentioned as an author, I think one of them was called Tiller’s Guide to Indian Country.
I’d like to find some other books by her, but if you don’t have any, I’d like to check out a few books on that subject anyway.
Object attributes: Author, Title, Subject, Type
Desired precision: High
Desired recall: Moderate

User question 3: I’m looking for four or five field guides on poisonous plants and animals from this area. We’re planning a hike and I need something a little more specific than what is included in my Boy Scout Manual and relatively easy for the youngsters to understand. I’ve heard that Peterson Field Guides are really great. It would help if they were illustrated and small enough to carry on the trail.
Object attributes: Subject, Type, Content Difficulty, Publisher, Dimensions, Related Activity, Illustrations
Desired precision: Moderate
Desired recall: High

User question 4: I need to check out a handbook on basic shooting and archery skills. It’s for the beginner’s shooting and archery program, so I need it to be pretty basic. I’d like it to include pictures as well.
Object attributes: Subject, Type, Content Difficulty, Related Activity, Illustrations
Desired precision: Moderate
Desired recall: Low

As these questions suggest, attributes that are important to the users of this collection and that should be included in the system records are as follows: author, title, publisher, subject, type, content difficulty, illustrations, related activity, supplemental materials, and physical dimensions. It will also be necessary to include the ISBN of every title so that lost materials can be more easily replaced in the event that funding becomes available.
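
As a rough illustration only, the attributes gathered from the questions above can be pictured as a single record type. The sketch below is written in Python; the class name `BookRecord`, the helper `matches`, the field names, and every sample value are assumptions invented for this example and do not come from the system itself. The authoritative field list is the one referenced in Appendix B.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BookRecord:
    """One record per book. Field names are illustrative assumptions."""
    author: List[str]
    title: str
    publisher: str
    subject: List[str]
    type: str
    content_difficulty: str
    related_activity: List[str]
    physical_dimensions: str
    isbn: str
    # Not every book has these attributes, so they default to empty lists.
    illustrations: List[str] = field(default_factory=list)
    supplemental_materials: List[str] = field(default_factory=list)


def matches(record: BookRecord, subject: str, type_: str,
            difficulty: str, illustrated: bool) -> bool:
    """Filter in the spirit of user question 4: subject, type, difficulty, pictures."""
    return (
        subject in record.subject
        and record.type == type_
        and record.content_difficulty == difficulty
        and (bool(record.illustrations) or not illustrated)
    )


# Made-up sample record; none of these values describe a real book.
sample = BookRecord(
    author=["Example, Author"],
    title="Example Archery Handbook",
    publisher="Example Press",
    subject=["Archery", "Shooting"],
    type="Handbook",
    content_difficulty="Beginner",
    related_activity=["Archery program"],
    physical_dimensions="08X05X01",
    isbn="0000000000",  # placeholder, not a valid ISBN
    illustrations=["Photographs"],
)

print(matches(sample, "Archery", "Handbook", "Beginner", illustrated=True))
```

The point of the sketch is only that each attribute named by a user maps to one field that a query can test; it says nothing about how the actual database stores or indexes those fields.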
2. Representation of information objects

2.1. Entity level

Each record, or collection of fields taken from metadata elements, in a database represents one entity, or part, of the collection. The totality of these records, when viewed together, is a representation of the collection as a whole. For this to occur adequately, it is important to first identify the entity level, or the portion of the collection that a single record describes. This level of representation creates a relationship between the structure of the system and the information objects of which it is composed that can be standardized across the entire collection. Declaring what constitutes the entity level of a collection determines what each record represents. In the case of this collection, each record describes a single book. Since users never have a reason to specifically request any entity below this level, such as a single chapter or a single page, it is appropriate to set the baseline of what constitutes an entity in the collection at the level of an individual bound monograph.

2.2. Metadata elements and semantics

Each of the eleven attributes listed in section 1.3 translates to a single metadata element used to provide descriptions of the objects that are effective and useful to the user. The Title and Author of each book are included as elements because they are the most easily identifiable attributes that are nearly unique to a specific work. Also included are the Subject and the Type of work represented by the objects, which give insight into both their structure and intellectual content. Elements like Content Difficulty and Related Activity provide a connection between specific programs and activities at the camp and the intellectual content of the book. Other elements, such as Illustrations and Supplemental Materials, help to identify a work based on its potential use.
Publisher, ISBN, and Classification are also included as elements, both for internal purposes and for their potential to be used by patrons. For a detailed list of these metadata elements and semantics, see Appendix A.

Beyond the task of simply providing a description of the objects that make up the collection, these metadata elements should support the four essential user tasks, or activities that are necessary for a user to perform when attempting to access an item from the collection: finding, identifying, selecting, and obtaining. Finding refers to generating results that respond to the query entered by a user. This task is potentially supported by any searchable field that can be applied to multiple items, but more often is supported by content-descriptive elements such as Subject and Type. This is because most users approach the collection with some idea of what type of content they are looking for. Identifying refers to the process of determining whether or not the objects described by the records are the same as the ones sought out by the user. This task is supported by the more unique elements such as Title, Author, and, potentially, ISBN, since they are less likely to be repeated elsewhere in the collection. Selecting refers to isolating specific results that respond to the user’s particular information needs. This task is supported by content-related elements that are more specific than subject, such as Content Difficulty and Related Activity. This is because most of the users of this system are looking for items that relate to a specific program at the camp or that are at a certain difficulty level. Obtaining refers to the process of actually locating the physical object to be utilized by the patron. This task is mainly supported by the Classification element.

2.3. Record structure and specifications

Since each of the elements represented in the metadata scheme can easily translate into exactly one field, the only additional fields in the record structure are RecordID, RecordDate, and Classification (see Appendix B). This amounts to a total of fourteen fields in the record structure for this database. There are specific database management reasons for including these additional fields. The RecordID field serves as
the primary key for the database and functions as a way to uniquely identify each record in the system. This field has no use for the end user and instead supports database management work by allowing a cataloger or system administrator to pull up individual records by executing a simple query. The RecordDate field serves a similar purpose in that it is mostly administrative in nature. This field places a timestamp on each record, which provides a context for catalogers and system administrators pertaining to when a record was created. This can support a number of administrative functions, from statistical reporting to quality control. The Classification field allows the system to provide a code for the book’s physical location in the collection.

Technical specifications for the system control how data is entered into the database as well as how it is organized and stored. There are four types of technical specifications for each field in the database record structure: Field Type, Indexing, Entry Validation, and Content Validation. The Field Type specification determines what kind of data is stored by the field. Every field other than RecordID and RecordDate is a text field, meaning that the data entered into it is stored as text that has no numerical value. The RecordID field is of the field type “autonumber.” This field type is stored as an integer value that automatically advances as each new record is added to the system. The RecordDate field is of the field type “autodate,” which stores a date value retrieved from the user’s system upon creation of a record.

The Indexing specification determines the way in which each field is searched in the database. Indexing the fields allows the database’s search engine to determine what constitutes a direct match to a user’s query. The two types of indexing available for fields in this database are Word and Term indexing.
Word indexing allows for each word in an input field to be indexed separately so that when a user performs a query, the words are considered on an individual basis, without being contextually related to the rest of the data stored in the field. Term indexing indexes each full input value for a field. When this is done, full input values, such as a single subject term that may be composed of multiple words, must be matched exactly by a query in order to be returned as a result. The fields Author, Title, Subject, Type, Related Activity, Supplements, Illustrations, and Publisher are indexed using both Term and Word indexing specifications. This is due to the type of information contained in these fields and how it is input by catalogers. Each input value for these fields has the potential to contain more than one word and needs to be found through non-specific querying on occasion. The remaining fields, RecordID, RecordDate, ISBN, Content Difficulty, and Physical Dimensions, are indexed using only the Word indexing specification. This is because these fields will always contain only a single unit of information rather than a string of words. Because of this, there is only one way for a user to search these fields.

The Entry Validation specification determines the amount of content a cataloger must enter into a field and whether the value of the input is required to be unique to a single record. There are three possible states for this specification that can all be used simultaneously: Required, Single, and Unique. The Required state of entry validation necessitates that the field must have a value in order for the database record to be saved. This state ensures that important identifying fields such as Author, Title, Subject, and ISBN are always included in every record. It also forces the cataloger to provide information that is important to a majority of the collection’s patrons, such as Related Activity, Content Difficulty, and Physical Dimensions.
The Single state of entry validation limits a field’s ability to contain multiple values. This entry validation is used to ensure that a field that can have only one possible value does not contain multiple ones, which can be misleading to the user. Fields that require this validation are Title, ISBN, Type, Content Difficulty, Publisher, and Physical Dimensions. The third state of entry validation is Unique. This state makes it a requirement that no two records have the same value for the corresponding field. The only fields in this system that use this form of validation are RecordID and ISBN. These two fields are required to contain unique values for each record since they refer to discrete objects. The RecordID field is the only place where the record itself is identified as unique in the database, and the ISBN is the only piece of metadata that specifies a distinct book from the collection, since it is conceivable that two books can share the same title. The only fields in this database that do not
require any form of entry validation are Supplements and Illustrations, since not every title will have the attributes described by these fields.

The Content Validation specification allows input values to be checked against either a specified format or a list of predefined values, rejecting any input that does not meet the specification. The former is known as a mask. It is used to fit data into a strictly defined format and is applied to fields that require values to be a certain length or to be delimited in a specific way with specific characters. The only field in this database that uses the mask form of content validation is Physical Dimensions. This field is required to display the object’s dimensions in a specific format, the mask for which is ##X##X##. There are several fields in this database that utilize controlled vocabularies that are defined by validation lists. These fields are Type, Related Activity, Content Difficulty, Supplements, and Illustrations. These fields can be governed by controlled vocabularies because of the finite number of values that can be accepted as input. For a detailed list of specifications, see Appendix B.

2.4. Record content and input rules

The content of all fields is governed, to a certain extent, by input rules. These rules determine what is an acceptable way for the cataloger to input a record. The only fields in this database that are not governed by input rules are RecordID, RecordDate, Title, Subject, and Publisher. Other than RecordID and RecordDate, two fields with automated input, these fields vary too much to be restricted by rules that do not allow for these differences. The rest of the fields are governed by rules in order to provide the user with the most coherent experience possible. For fields that have only a small number of available options, it is important to establish a controlled vocabulary in order to create a one-to-one connection when the user executes a query.
The controlled vocabulary is established by creating a list of possible input values. If a cataloger enters a value that is not on the list, the record cannot be saved in the system.

Finding a chief source of information for each field is important because it ensures that the cataloger is not working in a void. The creation and maintenance of records is far more efficient when the cataloger can easily locate the necessary data when handling the object. The majority of fields in this database are sourced from the actual content of the book. This is somewhat problematic in a few cases where the value of a field must be determined using the cataloger’s discretion and interpretation of the material. This is acceptable, though, because all catalogers working on this system are required to have a high level of domain knowledge and some level of experience with the specific materials being cataloged. The catalogers of this particular collection are already staff members at the park and have more than a passing familiarity with the content of the potentially problematic fields. For a detailed list of input rules by field number, see Appendix C.

3. Access and authority control

Due to the variant nature of many metadata elements, it is necessary to have some governance over the data input and searching of a database in order to achieve the highest level of collocation and recall. Establishing this governance is known as authority control. Authority control allows for all of the known variations of subject terms, author names, and titles to be recognized as related to agreed-upon standard versions of the terms. There are two primary forms of authority control that come into use in information systems: name authority control and subject authority control.

Name authority control is the form of authority work that deals specifically with the names of people and organizations. This usually encompasses authors, publishers, performers, artists, and other figures involved in the creation of a work.
Because these names are often referred to by many variant forms (abbreviations, initials, pseudonyms, etc.), a standardized name for each entity must be established. This standardized version is referred to as the authorized name. Authorized names are stored in a separate
database, known as an authority file, which catalogers can consult in order to determine the proper name to place in the bibliographic record for a work. Within the authority file, authorized names are stored in records that contain a field for all known variant forms of the name. The bibliographic record is then linked to the authority record so that any author search carried out by an end user will automatically cross-reference variant names in order to ensure that all relevant records are recalled.

Subject authority control is the form of authority work that ensures that subject metadata contained in a database is collocated through the use of validation lists and subject authority files, or thesauri, which are structured syndetically, meaning that terms are cross-referenced to show their semantic relationships. This allows related terminology to be grouped together in a way that recognizes equivalency, hierarchy, and association. A thesaurus is used in this system to govern the input for the Subject field since it is more complex and more likely to contain complicated semantic relationships than any other field. When subject authority control is used properly, the end user should be able to more easily recall relevant records without having previous knowledge of preferred terminology or their search term’s placement in a hierarchy.

4. Representation of information content

4.1. Subject access

Metadata elements that are not related to the physical description of an information object but instead describe its intellectual content are known as subject representations. Since most end users are looking for materials based on the information they contain, subject is one of the most important access points for searching the collection. This is why it is necessary that subject access be handled well through authority control so that the end user does not have to use a process of trial and error when searching by subject before finding any relevant content.
Fields that provide subject access in this system are Subject, Type, Related Activity, and Content Difficulty.

There are several forms of subject authority control, a few of which are briefly mentioned in section three of this document. The most prevalent of these are controlled vocabularies, subject headings and terms, and classification. A controlled vocabulary assists in subject authority control in much the same way as a name authority file does in name authority control. It attempts to determine, from a wealth of variants, an authorized form of a subject term. This is achieved when the controlled vocabulary for a field is created with the domain, and the users’ knowledge of it, under consideration. The system for this collection uses two mechanisms for implementing controlled vocabularies for fields. As mentioned elsewhere in this document, one of these is a simple input validation list that forces catalogers to choose from a list of authorized terms rather than using natural language. This is implemented in a variety of subject fields. The other is the thesaurus, which is used to establish relationships between terms and their variants as well as synonyms and associated terms.

Subject headings are also used to implement subject authority control. Subject headings are maintained in a subject authority file known as a subject headings list. These headings represent a broad domain that can include many narrower subject areas. For example, in this database, the subject heading “Recreation” includes the narrower subjects of “boating,” “archery,” “fishing,” and so on. The subject heading file, much like a thesaurus, will establish these relationships while defining the narrower terms as preferred rather than “recreation,” since that is included in the Related Activity field.

Classification schemes are also used to promote collocation by physically organizing information objects by their subject metadata.
This facilitates the actual retrieval of the object from the stacks by an end user who has identified it as a satisfactory response to their query. This is achieved by identifying a number of
facets that are abbreviated and placed together to create a code that determines a material’s placement on a shelf.

All of these models are achieved through subject analysis, which is the act of analyzing the content of information objects in order to determine what they are about. Subject analysis is performed through familiarization with content, extraction of terminology directly from the content, translation of the extracted terms into terms that are validated by the controlled vocabulary, and application of input rules to ensure correct spelling, punctuation, format, etc.

4.2. Thesaurus structure

A thesaurus is used for authority control on the Subject field in this system since it is not regulated by a validation list like the other content-derived fields such as Type and Related Activity. This is because many of the subject terms used in this system are derived from natural language and the number of subjects is too vast to be regulated by a simple list. The thesaurus is also necessary because the possible relationships between the subject terms are far more complex than those in other fields. The controlled vocabulary created by the thesaurus is a list of terms that are authorized for input, together with their relationships to terms that are semantically related to them and can be used as search terms by an end user. The related terms are not authorized for input but are still recognized as valid by the search engine through cross-referencing the thesaurus.

This is achieved through the syndetic structure of the thesaurus, where the semantic relationships between authorized and non-authorized terms are established. Three semantic relationships are recognized by the thesaurus: equivalency, hierarchy, and association.
A non-authorized term is equivalent to an authorized term when they are synonyms or homonyms, or equal in meaning. Equivalency is expressed in the thesaurus with the statement “USE FOR.” For example, the authorized term “Recreation” should be used in place of the non-authorized term “Sports” since sports engaged in at the camp are more often referred to simply as recreational activities.

A non-authorized term has a hierarchical relationship to an authorized term when it is either a domain of which the authorized term is a part or part of a subdomain of an authorized term. For instance, the non-authorized term “Boating” is a broader term than the authorized term “Canoeing.” This is because canoeing is the only specific type of boating activity that is engaged in at the camp. The term “Boating” in this example would be notated in the thesaurus with BT, for broader term, in that it is a broader term than “Canoeing.” If the relationship were reversed, the term would be notated with NT, for narrower term.

Two authorized terms can be related as well in a way that is neither synonymous nor hierarchical. This is an associative relationship. These terms share some characteristics but cannot be considered to mean exactly the same thing. A good example of an associative relationship between terms in this system is the two authorized terms “Safety” and “Emergency Preparation.” These terms clearly do not mean the same thing but are related in that they share many qualities. The thesaurus states that these two terms are related with the reciprocal notation RT, for related term.

The domain of the thesaurus, or what it covers, is the different wilderness survival related areas that are covered by the books in the collection, such as camping or botany. Since there are no specific limitations to these subjects, the scope is the same as the domain.
The exhaustivity of the thesaurus, or how many different subject terms are provided, is low in that it only summarizes the main topics of each book rather than all sub-topics or the topics of individual sub-units of each work. This is because users search the system looking for titles that cover mostly individual subjects rather than broad arrays of sub-topics. Use of the thesaurus allows for higher precision in subject searching as well as the possibility for higher recall.
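
The three relationships described in this section can be sketched as a small lookup structure. The following Python example is a minimal illustration, not the system’s actual thesaurus: the class names `Term` and `Thesaurus`, the method `search_terms`, and the decision to route a non-authorized broader term to its authorized narrower term are all assumptions made for this sketch. The sample terms are the ones used as examples above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Term:
    """An authorized subject term and its cross-references."""
    name: str
    used_for: List[str] = field(default_factory=list)  # equivalent, non-authorized
    broader: List[str] = field(default_factory=list)   # BT
    narrower: List[str] = field(default_factory=list)  # NT
    related: List[str] = field(default_factory=list)   # RT, other authorized terms


class Thesaurus:
    def __init__(self, terms: List[Term]) -> None:
        self.by_name: Dict[str, Term] = {t.name.lower(): t for t in terms}
        # Non-authorized entry terms mapped to the authorized terms they lead to.
        self.lead_in: Dict[str, Set[str]] = {}
        for t in terms:
            for variant in t.used_for + t.broader + t.narrower:
                if variant.lower() not in self.by_name:
                    self.lead_in.setdefault(variant.lower(), set()).add(t.name)

    def search_terms(self, query: str, include_related: bool = False) -> Set[str]:
        """Return the authorized terms that a user's search term resolves to."""
        key = query.strip().lower()
        if key in self.by_name:
            found = {self.by_name[key].name}
        else:
            found = set(self.lead_in.get(key, set()))
        if include_related:
            for name in list(found):
                found.update(self.by_name[name.lower()].related)
        return found


thesaurus = Thesaurus([
    Term("Recreation", used_for=["Sports"]),
    Term("Canoeing", broader=["Boating"]),
    Term("Safety", related=["Emergency Preparation"]),
    Term("Emergency Preparation", related=["Safety"]),
])

print(thesaurus.search_terms("Sports"))
print(thesaurus.search_terms("Boating"))
print(sorted(thesaurus.search_terms("Safety", include_related=True)))
```

Under these assumptions, a user who searches for “Sports” or “Boating” is led to records indexed under “Recreation” or “Canoeing” without needing to know the preferred terminology in advance, which is the behavior the syndetic structure is meant to provide.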
4.3. Classification scheme

Classification of physical materials allows for them to be grouped together on the shelf in a way that maximizes collocation of the aspects of the material that are most important for browsing by users of a collection. If the materials in the collection are classified well, they will be placed in a way that allows a user to discover more relevant material upon approaching the shelf to retrieve a specific item. It also allows the user to find the specific item within the collection rather than having to search through an unorganized array of items. This makes classification perhaps the single most important part of the organization system being implemented.

There are two different types of classification schemes: hierarchical and faceted. Hierarchical schemes place materials in categories and sub-categories based on their subject material. These schemes, the most predominant of which are the Library of Congress Classification scheme and the Dewey Decimal System, are best suited for large collections with a complex array of hierarchical subjects, such as Chemistry, which is a sub-category of Science. This would be represented in the classification scheme where the first part of the code would represent the highest hierarchical level, Science, followed by a portion that would represent the sub-category, Chemistry. The other type of scheme, faceted, uses portions of the code to determine specific aspects of the bibliographic metadata for the object that are not hierarchical. A faceted classification scheme that includes chemistry would not place it under Science because, perhaps, it is a scheme for a much less complex collection that is dedicated to the domain of science. This would render the higher level of classification unnecessary.

The classification scheme for this collection is a faceted one that includes facets for Related Activity, Author, and Title.
This scheme is used because most users look for material that is related to an activity at the camp, so it makes sense that all materials related to each activity should be grouped together on the shelf. Since each title has the potential to be related to more than one activity, the first activity listed is considered the primary activity and is used for this facet. Beyond that, users are comfortable with books being arranged alphabetically by author, so that concept is retained in the scheme. For the third facet, each title is arranged alphabetically by title. This represents a relatively simple classification scheme that works for the users of this collection.

The facets are derived by using codes that are created by following relatively simple rules, which are detailed in full in Appendix E. The first facet, related activity, is derived from the first listed term in the Related Activity field of the bibliographic record. The code is then created by taking the first three letters of the first word. The second facet is derived from the author’s name and is generally taken from the first three letters of their last name. The last facet is taken from the Title field in the bibliographic record and uses the first three letters of the title, unless the title begins with a number, which is omitted. The facets are separated by hyphens. A unique number is appended to the end of the classification code to maintain each specific item’s individual status within the collection. This number is taken from the RecordID field in the bibliographic record and is preceded in the code by a colon.

The following example code is derived from the book Be Expert with Map and Compass. The primary related activity for this book is orienteering classes, so the first facet of the code is ORI. The author’s last name is Kjellstrom, so the second facet is KJE. The third facet, derived from the title as outlined above, is BEE. With a RecordID of 2, the final code looks like this: ORI-KJE-BEE:2

5. Name authority control

Name authority control is the process used to standardize names of individuals and organizations during the design of an information system. The need for name authority control arises out of the problem that many authors, as well as organizations responsible for the creation of information objects, do not always go by the same exact name. This is because these entities sometimes abbreviate their names, have the spelling of their names changed through translation and misspelling, use titles before or after their proper names, change the word order of their names, work under pseudonyms, or change their names altogether. This problem results in many authors and organizations being represented by many variant names. Because of this, both technical and end users of information systems are faced with the difficult challenge of figuring out which name to use when searching the system or when inputting records. Without any standardization in place to help with this, a search for an author’s or organization’s name may have significantly low recall, especially if the user is looking for a name that has many variations.

Name authority control in this system is accomplished through the creation of a separate database, called a name authority file, that contains a single record for each individual or organization under its control. Both the Author field and the Publisher field in the main database are under control of the name authority file. Many of the materials in this system are authored by people who use titles, people with foreign names that have undergone translation, and people whose names have been abbreviated. Many users of the system will already attempt to search for author names in the correct order of entry, but this is not true for every user, so the name authority file must include variant orders for names as well. Much of the material in this collection is not current, with publication dates that are decades old. The publishers of much of the material are small organizations that may not currently be active and, if active, may not be using the same name as the one printed in the material itself.
This is why the name authority file covers the many possible variant names of the organizations responsible for publishing the material in this collection.

Each record in the name authority file contains five fields: RecordID, RecordDate, AuthorizedName, VariantNames, and SourcesUsed. RecordID and RecordDate, much like in the main database for the system, are included for administrative purposes. AuthorizedName establishes the only form of the name to be used in the main database for the corresponding field. This ensures that the name is fully standardized, since no other form can be used when the technical user is inputting records. VariantNames includes a list of all known variations of the author’s name. What constitutes an actual variation to be placed in the file is determined by the rules outlined in Appendix F. SourcesUsed provides documentation of all sources referred to when determining the authorized name as well as when collecting and adding variant names.

End users of the system may reference the name authority file when an author search returns either insufficient or inaccurate results, in order to determine if they are using a variant and unauthorized form of the author’s name. If a search of the name authority file reveals that this is the case, the user can resubmit their search of the main database using the authorized form of the author’s name. This increases recall in situations where a search returns little to no results due to the use of an unauthorized name, while improving precision in situations where a search returns too many or inaccurate results for authors whose authorized name in the system matches an unauthorized name of a different author. The same is true in rare instances when a user searches the system by publisher rather than author.

6. System evaluation and development

6.1. Performance test

Performance testing occurred in an office environment under direct observation. A subject was selected who fit the profile of a typical user of this system.
After selection, the subject was given a preliminary interview to collect his demographic information as well as his knowledge of and comfort with information retrieval systems and library environments. The subject is a former Boy Scout in his early thirties. He comes from an upper middle-class background and considers himself fairly comfortable with information systems and seeking behaviors. His level of general knowledge is high due to his cultural background as well as his extensive education. His domain knowledge level is high due to his history with the Boy Scouts and other subject areas related to the collection, and his system knowledge is slightly above average. The user was given a briefing on the history of the collection and was provided with general information about the camp and its programs. He was also provided access to the system’s thesaurus as well as its name authority file. Some slight explanation was needed concerning the use of these tools, as well as a brief introduction to the mechanics of the system’s search form. After this, the user was given a print-out including the four user questions identified in section 1.3 of this document. After the user performed searches to find materials corresponding to the questions listed, a brief spoken interview was conducted in order to assess the effectiveness of the system.

User question 1: I am looking for a couple of books on general orienteering skills, preferably illustrated and including supplemental materials such as a map and possibly a protractor. I would like to use them in order to prepare for a map and compass activity with my scouts. Can you help me find some that I can check out?
Object attributes: Subject, Type, Illustrations, Supplemental Materials, Related Activity
Desired precision: High
Desired recall: Moderate
Probable precision: High
Probable recall: Low
Query formulation (n): First attempt- Field: Subject, Input: Orienteering AND Field: Supplemental Materials, Input: Maps, Instruments; Second attempt- Field: Related Activity, Input: Orienteering Classes
Analysis of results: The first search performed by the user only returned one result due to the fact that only one book in the system provides the type of supplemental materials asked for in the question. This is a result of there only being records for ten items in the system.
Taking this into consideration, the user decided he would like to search for books that fit the other criteria. Since the question references classes on orienteering, the user decided to search by the program in the system. This query returned two results. While this does not drastically increase the recall, it does show better performance.

User question 2: I was at a meeting the other day and I heard someone mention some books about the history of Native Americans in this region. I think I heard the name Veronica Tiller mentioned as an author, and I think one of them was called Tiller’s Guide to Indian Country. I’d like to find some other books by her, but if you don’t have any, I’d like to check out a few books on that subject anyway.
Object attributes: Author, Title, Subject, Type
Desired precision: High
Desired recall: Moderate
Probable precision: High
Probable recall: Low
Query formulation (n): First attempt- Field: Author, Input: Tiller, Veronica; Second attempt- Field: Author, Input: Velarde, Veronica; Third attempt- Field: Subject, Input: History
Analysis of results: Since the user forgot to consult the name authority file before performing an author search, he used an unauthorized form of the author’s name. Because of this, the system returned no results for his search. He then verified the authorized version of the author’s name and got one result. The title in the system is not the same one in the question, but since the one title cataloged still fit the requirements of the question, it was considered an adequate result. The user then decided to attempt a subject search to find some similar titles. The subject search returned two results, one of which was the book that had been requested already.
This may not have boosted the number of results to the desired level of recall, but it was deemed acceptable by the user.

User question 3: I’m looking for four or five field guides on poisonous plants and animals from this area. We’re planning a hike and I need something a little more specific than what is included in my Boy Scout Manual and relatively easy for the youngsters to understand. I’ve heard that Peterson Field Guides are really great. It would help if they were illustrated and small enough to carry on the trail.
Object attributes: Subject, Type, Content Difficulty, Publisher, Dimensions, Related Activity, Illustrations
Desired precision: Moderate
Desired recall: High
Probable precision: Moderate
Probable recall: Low
Query formulation (n): Field: Type, Input: Field Guide; Field: Content Difficulty, Input: Intermediate
Analysis of results: The user’s query only returned one result, but it was exactly what he was looking for. However, it was decided upon further analysis that the Content Difficulty field could present problems for recall, since the other field guide title in the system is labeled as Advanced difficulty but could potentially have been useful to the user. This title was not returned as a search result and would only have come back if that field were left out of the query or if the search were performed for titles that have an advanced content difficulty level. If the query were performed the latter way, the user would not be able to find titles that are listed as basic or intermediate.

User question 4: I need to check out a handbook on basic shooting and archery skills. It’s for the beginner’s shooting and archery program, so I need it to be pretty basic. I’d like it to include pictures as well.
Object attributes: Subject, Type, Content Difficulty, Related Activity, Illustrations
Desired precision: Moderate
Desired recall: Low
Probable precision: Moderate
Probable recall: Low
Query formulation (n): Field: Subject, Input: Archery; Field: Related Activity, Input: Recreation Programs; Field: Content Difficulty, Input: basic; Field: Illustrations, Input: Color Photographs
Analysis of results: The user’s initial query returned only one result, but since that is what is requested in the question, this was considered adequate. Furthermore, the result was a perfect match for the requested material.
The only problem identified was the ambiguity of the Related Activity field value “Recreation Programs.” This name could potentially cause confusion for a user who is unaware that the archery and shooting classes are part of a broader category. However, this would be apparent to a user who is familiar with the structure of the camp.

While the performance test served to identify a number of areas in need of improvement, it also highlighted some aspects of the system that worked well in helping the user to find and identify the proper materials to answer the provided user questions. Potential problems that were identified included issues with the Related Activities and Content Difficulty fields. The Related Activities field has values on its validation list that were potentially unclear or too broad for some searches. For instance, User question 4 requests material related to the archery program at the camp. This program is considered part of the broader recreational programs sector of related activities. While a user familiar with the camp’s programs and how they are organized would know this, it would maximize usability of the system to provide a list of all specific programs. The Content Difficulty field proved to be especially problematic, though, since the nature of a title’s difficulty level is highly subjective. This could lead a user to miss potentially adequate materials when they are searching based on this field.

The test identified that both the name authority file and the thesaurus proved very beneficial to the user, since some of the subject terms used in the system may not perfectly match a user’s information seeking behaviors or their personal vocabulary regarding possible subject queries, and because one of the questions uses an alternate form of one of the author names used in the system. The Type field worked well, since many users of the system look for books by this criterion and the user knew that searching by this field would help him to find the specific kind of material requested.
The user reported that the overall use of the system was fairly easy but suggested that problems with the Content Difficulty field be addressed above all else. He thought that it was a potentially helpful feature but that it may be more useful to have it split into two fields that the user could rank in importance. He suggested that the fields be labeled with something like “Content Difficulty First Choice” and “Content Difficulty Second Choice” so that if a user does not retrieve adequate results, they can broaden their search by adding the secondary difficulty level. He felt that the Related Activities field could be broadened to include other more specific programs for the sake of searchability. He also suggested that a few more subject terms be added to the system for the sake of clarification.

6.2. Change and development

Based on the results of performance testing, a number of potential changes to the system have been identified. It has been decided that the Content Difficulty field will be split into primary and secondary fields, with the option of searching on only the primary field or searching on both. This gives the user the ability to broaden or narrow their search at will, allowing for the desired amount of recall in search results. The suggestion of broadening options for the Related Activities field was decided against because it would unnecessarily complicate the classification scheme for the collection. If too many specific programs were identified, there would be entire sections of the collection that would have little to no materials available. In light of this, the related activity Orienteering Classes was judged potentially too narrow, but it was kept since it does not easily fit under a broader program.

It has also been noted that it may not be necessary to specify the specific types of illustrations present in a work. Users are not likely to search with that level of specificity.
During the design process, it was noticed that it would not be possible to execute a query searching for a title that simply includes illustrations. The system requires the user to select a specific kind of illustration. This could have been avoided by splitting the Illustrations field into two separate fields, much like the proposed new Content Difficulty fields. The user would then have the ability to use one field to request that illustrations be present and another field to specify the type of illustration, if this is an important detail. A minor problem was recognized in the masked format of the Physical Dimensions field. It is not possible, based on the way that input validation is set up, to use fractional values. This requires that the user round the value up in order for it to fit inside the input mask. It may be worth considering a change in the validation for this field to allow for decimals.

If this project had access to more flexible and dynamic software as well as the funds to hire competent developers and programmers, many other changes would have been considered. For instance, it would be useful to be able to integrate the thesaurus and the name authority file into the system to allow for automatic input validation. It would also be beneficial to have more leverage over design of the user interface, especially regarding the search form.
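
The first-choice and second-choice Content Difficulty search discussed in section 6.2 could behave roughly as in the Python sketch below. This is only an illustration under stated assumptions and is not part of the textbase software: the function name `search_by_difficulty`, its parameters, and the sample records are all hypothetical, and it reads the proposal as two search-form fields applied to the single Content Difficulty value stored in each record.

```python
def search_by_difficulty(records, first_choice, second_choice=None):
    """Return records matching the first-choice difficulty, optionally
    followed by records matching the second-choice difficulty.

    Hypothetical helper: each record is a dict with a "Content Difficulty"
    key holding one of "Basic", "Intermediate", or "Advanced".
    """
    primary = [r for r in records if r["Content Difficulty"] == first_choice]
    if second_choice is None or second_choice == first_choice:
        # Narrow search: only the first-choice level is returned.
        return primary
    # Broadened search: first-choice matches are listed before second-choice ones.
    secondary = [r for r in records if r["Content Difficulty"] == second_choice]
    return primary + secondary


# Hypothetical records, invented only to exercise the function.
sample_records = [
    {"Title": "Hypothetical Field Guide A", "Content Difficulty": "Intermediate"},
    {"Title": "Hypothetical Field Guide B", "Content Difficulty": "Advanced"},
    {"Title": "Hypothetical Handbook C", "Content Difficulty": "Basic"},
]

narrow = search_by_difficulty(sample_records, "Intermediate")
broad = search_by_difficulty(sample_records, "Intermediate", "Advanced")
```

Here the narrow search returns only the Intermediate title, while the broadened search also returns the Advanced title that the user in question 3 would otherwise have missed.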
Moore/ TXWI-A/pg.13

Appendix A. Metadata elements and semantics

No.  Element name           Semantics
1    Title                  The work’s proper name, as given on the title page
2    Author                 The creator of the work
3    Type                   The kind of work that is represented, such as field guide, handbook, atlas, etc.
4    Subject                Topic or topics covered by the work
5    Content Difficulty     Level of difficulty of the work’s subject matter: basic, intermediate, or advanced
6    Illustrations          Images printed in and as a part of the work
7    Supplemental Material  Materials that are included with the work but are not bound with the volume
8    Related Activity       Official camp program or activity that is related to the work in a significant way
9    Physical Dimensions    Measurements, in inches, of the work’s length, width, and depth
10   Publisher              The entity responsible for the printing, binding, and distribution of the work
11   ISBN                   The International Standard Book Number
12   Classification         The call number denoting the item’s physical location in the collection

Appendix B. Record structure and specifications

1. Record structure specifications

No.  Field name           Field type   Indexing     Entry validation   Content validation
1    RecordID             Autonumber   Term         None               None
2    RecordDate           Autodate     Term         None               None
3    Author               Text         Term, Word   Required           None
4    Title                Text         Term, Word   Required, Single   None
5    ISBN                 Text         Word         Required, Single   None
6    Subject              Text         Term, Word   Required           None
7    Type                 Text         Term, Word   Single             List box
8    Related Activity     Text         Term, Word   Required           List box
9    Content Difficulty   Text         Word         Required, Single   List box
10   Supplements          Text         Term, Word   None               List box
11   Illustrations        Text         Term, Word   None               List box
12   Publisher            Text         Term, Word   Single             None
13   Physical Dimensions  Text         Word         Required, Single   Mask
14   Classification       Text         Word         Required, Single   None

2. Textbase structure

Textbase: C:Usersjason w mooreDocumentsSPRING 2012 5200 DB FILESjwm1
Created: 2/23/2012 2:43:50 AM
Modified: 2/23/2012 2:43:50 AM
Field Summary:
1. RecordID: Automatic Number(next avail=1, increm=1), Term
2. RecordDate: Automatic Date(Both Date and Time, When Created), Term
3. Author: Text, Term & Word
   Validation: required
4. Title: Text, Term & Word
   Validation: required, single-only
5. ISBN: Text, Word
   Validation: required, single-only
6. Subject: Text, Term & Word
   Validation: required
7. Type: Text, Term & Word
   Validation: required, valid-list
8. Related Activity: Text, Term & Word
   Validation: required, valid-list
9. Content Difficulty: Text, Word
   Validation: required, single-only, valid-list
10. Supplements: Text, Term & Word
    Validation: valid-list
11. Illustrations: Text, Term & Word
    Validation: valid-list
12. Publisher: Text, Term & Word
    Validation: single-only
13. Physical Dimensions: Text, Word
    Validation: required, single-only, mask ##X##X##
14. Classification: Text, Word
    Validation: required, single-only

Appendix C. Record content and input rules

Field # 1
Field Name: RecordID
Semantics: Unique identifying number for each record
Chief Source of Information: System
Input Rules: None

Field # 2
Field Name: RecordDate
Semantics: Timestamp of the date of record creation
Chief Source of Information: System
Input Rules: None

Field # 3
Field Name: Author
Semantics: Creator of the work
Chief Source of Information: Title page of the book
Input Rules: The author’s name should be input in the following format: LastName, FirstName, MiddleInitial. If a text has multiple authors, only include the first one listed.
Example: Kjellstrom, Bjorn

Field # 4
Field Name: Title
Semantics: The proper name associated with the work
Chief Source of Information: Title page of the book
Input Rules: Use capitalization for all words in the title besides articles. Do not include leading articles in the title field. Subtitles should follow a colon and use the same capitalization rules.
Example: Be expert with map and compass

Field # 5
Field Name: ISBN
Semantics: International Standard Book Number
Chief Source of Information: Copyright page of the book
Input Rules: This field is always either a 10 or 13 digit number.
Example: 1446544133 or 978-1446544136

Field # 6
Field Name: Subject
Semantics: Topic or topics covered by the work
Chief Source of Information: The content of the book
Input Rules: First letter of each subject term should be capitalized. There is no limit to the number of subject terms allowed in a record. Refer to the thesaurus in Appendix D for the list of authorized terms.
Example: Orienteering

Field # 7
Field Name: Type
Semantics: The type of work
Chief Source of Information: The content of the book
Input Rules: Must be selected from the controlled vocabulary
Example: Handbook

Field # 8
Field Name: Related Activity
Semantics: Corresponding camp program or activity
Chief Source of Information: The content of the book
Input Rules: Must be selected from the controlled vocabulary
Example: Orienteering Classes

Field # 9
Field Name: Content Difficulty
Semantics: Level of difficulty of the work’s subject matter
Chief Source of Information: The content of the book
Input Rules: Must be selected from the controlled vocabulary
Example: Intermediate

Field # 10
Field Name: Supplements
Semantics: Materials that are included with the work but are not bound with the volume
Chief Source of Information: The container
Input Rules: Must be selected from the controlled vocabulary
Example: Map, Compass

Field # 11
Field Name: Illustrations
Semantics: Images printed in, or as a part of, the work
Chief Source of Information: The content of the book
Input Rules: Must be selected from the controlled vocabulary
Example: Diagrams

Field # 12
Field Name: Publisher
Semantics: The entity responsible for the printing, binding, and distribution of the work
Chief Source of Information: The copyright page of the book
Input Rules: Capitalize the publisher’s name. Do not include corporate marks or geographical information.
Example: American Orienteering Service

Field # 13
Field Name: Physical Dimensions
Semantics: Measurements, in inches, of the work’s length, width, and depth
Chief Source of Information: The container
Input Rules: Must be input in the following format: inchesXinchesXinches
Example: 3X7X1

Field # 14
Field Name: Classification
Semantics: Code for location of physical item in collection
Chief Source of Information: Multiple, refer to Appendix E
Input Rules: Refer to Appendix E
Example: ORI-KJE-BEE:2
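
The input rules for the ISBN and Physical Dimensions fields can be illustrated with a small Python sketch. It is a hedged approximation written for this appendix, not the validation performed by the textbase software: the function names are hypothetical, the ##X##X## mask is assumed to accept one or two digits per dimension (so that the example 3X7X1 passes), hyphens in an ISBN are assumed to be ignorable, and only the length and character set of the ISBN are checked, not its check digit.

```python
import re

# Assumed reading of the ##X##X## mask: one or two digits per dimension.
DIMENSIONS_PATTERN = re.compile(r"[0-9]{1,2}X[0-9]{1,2}X[0-9]{1,2}")


def is_valid_dimensions(value):
    """Check the inchesXinchesXinches format, whole numbers only."""
    return DIMENSIONS_PATTERN.fullmatch(value) is not None


def is_valid_isbn_length(value):
    """Check that an ISBN has 10 or 13 characters once hyphens are removed."""
    chars = value.replace("-", "")
    if len(chars) == 13:
        return re.fullmatch(r"[0-9]{13}", chars) is not None
    if len(chars) == 10:
        # The final character of a 10-character ISBN may be the check character X.
        return re.fullmatch(r"[0-9]{9}[0-9Xx]", chars) is not None
    return False
```

With these assumptions, the examples given above ("3X7X1", "1446544133", and "978-1446544136") are accepted, while a fractional value such as "3.5X7X1" is rejected, which mirrors the rounding problem noted in section 6.2.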

Appendix E. Classification scheme

1. Scheme

Related Activity                      Author              Title
Emergency Preparation Classes - EME   See Notation Rules  See Notation Rules
History and Culture Classes - HIS
Orienteering Classes - ORI
Recreational Programs - REC

2. Notation rules

Facet name: Related Activity
Chief source of information: Taken from the Related Activity field in the bibliographic record. This field is at the discretion of the cataloger and is based on the most prevalent subject terms for the title.
Notation rules: Abbreviated form of the activity’s name. See the Scheme table above.

Facet name: Author
Chief source of information: The title page of the book provides the author’s name.
Notation rules: Use the first three letters of the author’s last name.

Facet name: Title
Chief source of information: The title page, front cover, or copyright page of the work provides the title.
Notation rules: Use the first three letters of the title excluding common stop-words like “the,” “and,” etc.

3. Rule for unique number

The unique number is taken from the auto-numbered RecordID field in the bibliographic record.

4. Example

Title: Be Expert with Map and Compass
Author: Bjorn Kjellstrom
RecordID: 2
Related Activity: Orienteering Classes
Classification: ORI-KJE-BEE:2
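
The notation rules above can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used by the cataloger: the function names are hypothetical, the stop-word set is an assumption (Appendix E names only “the” and “and”), the author is assumed to be in the inverted order used in the Author field, and purely numeric words are dropped to reflect the rule in section 4.3 about titles that begin with a number.

```python
import re

# Activity codes copied from the Scheme table in Appendix E.
ACTIVITY_CODES = {
    "Emergency Preparation Classes": "EME",
    "History and Culture Classes": "HIS",
    "Orienteering Classes": "ORI",
    "Recreational Programs": "REC",
}

# Assumed stop-word list; only "the" and "and" are named in the notation rules.
STOP_WORDS = {"a", "an", "the", "and"}


def first_three_letters(text):
    """Return the first three letters of text, uppercased, ignoring non-letters."""
    letters = re.sub(r"[^A-Za-z]", "", text)
    return letters[:3].upper()


def classification_code(related_activities, author, title, record_id):
    """Build a call number of the form ACT-AUT-TIT:RecordID.

    related_activities: list of activity names; the first is the primary one.
    author: name in inverted order, for example "LastName, FirstName".
    """
    activity = ACTIVITY_CODES[related_activities[0]]
    last_name = author.split(",")[0]
    # Drop stop-words and purely numeric words before taking the title letters.
    words = [w for w in title.split()
             if w.lower() not in STOP_WORDS and not w.isdigit()]
    title_part = first_three_letters("".join(words))
    return f"{activity}-{first_three_letters(last_name)}-{title_part}:{record_id}"


example_code = classification_code(
    ["Orienteering Classes"], "Kjellstrom, Bjorn",
    "Be Expert with Map and Compass", 2)
```

For the example record in this appendix, `example_code` is "ORI-KJE-BEE:2".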

Appendix F. Name authority file

1. Record structure specifications

No.  Field name      Field type   Indexing      Entry validation
1    RecordID        Autonumber   Term          --
2    RecordDate      Autodate     Term          --
3    AuthorizedName  text         Term & Word   Required, Single
4    VariantNames    text         Term & Word   --
5    SourcesUsed     text         Term & Word   --

2. Textbase structure

Textbase: C:Usersjason w mooreDocumentsNameAutho
Created: 5/2/2012 10:23:43 PM
Modified: 5/2/2012 10:23:43 PM
Field Summary:
1. RecordID: Automatic Number(next avail=6, increm=1), Term
2. RecordDate: Automatic Date(Both Date and Time, When Created), Term
3. AuthorizedName: Text, Term & Word
   Validation: required, single-only
4. VariantNames: Text, Term & Word
5. SourcesUsed: Text, Term & Word
Log file enabled, showing RecordID
Leading articles: a an the
Stop words: a an and by for from in of the to
XML Match Fields:
1. RecordID
Textbase Defaults:
Default indexing mode: SHARED IMMEDIATE
Default sort order: <none>
Textbase passwords:
Master password = 0 Access passwords:
No Silent password

3. Record content and input rules

Field # 3
Field Name: AuthorizedName
Semantics: The only form of an author’s name authorized to be used in the system.
Input rules: Use the name that is most commonly cited. If it can be verified, use the most recent form of the name. Do not use pseudonyms or nicknames (Example: W.H. “Chip” Gross); enter these forms of an author’s name in the VariantNames field. If an author’s name has changed, use the most recent name and place any previous forms of the name in the VariantNames field. Enter names with normal casing. Do not enter in all lowercase or all uppercase characters. Enter names in inverted order (Last name, First name Middle Initial). Do not use titles. Credentials may be used if placed at the end of the name as it is entered.
Example: Tiller, Veronica E. Velarde

Field # 4
Field Name: VariantNames
Semantics: All known variant forms of an author’s name or alternative names used by the author in publication.
Input rules: Variants can include alternate order (normal order as opposed to inverted order), abbreviated forms of names, alternate spellings, pseudonyms, and nicknames. Enter with the same spelling as the source. Use normal casing, not all lowercase or uppercase. Enter both orders for every name. Press F7 to create a new entry.
Example: Veronica Tiller

Field # 5
Field Name: SourcesUsed
Semantics: Resource consulted to verify the authorized name.
Input rules: Use the full title of the resource, a comma, and the year of access.
Example: Library of Congress Name Authority File, 2012

4. Sample records

RecordID 1
RecordDate 5/2/2012 22:47:30
AuthorizedName Tiller, Veronica E. Velarde
VariantNames Velarde, Veronica E.; Tiller, Veronica; Veronica Tiller; Velarde, Veronica; Veronica Velarde
SourcesUsed Library of Congress Name Authority File, 2012
$
RecordID 2
RecordDate 5/2/2012 22:49:39
AuthorizedName Kjellstrom, Bjorn
VariantNames Kjellstrom, Gosta Ambjorn; Bjorn Kjellstrom; Gosta Ambjorn Kjellstrom
SourcesUsed Library of Congress Name Authority File, 2012
$
RecordID 3
RecordDate 5/2/2012 23:09:44
AuthorizedName Dalrymple, Byron
VariantNames Byron William Dalrymple; Byron W. Dalrymple; Dalrymple, Byron W.; Dalrymple, Byron William
SourcesUsed Library of Congress Name Authority File, 2012
$
RecordID 4
RecordDate 5/2/2012 23:15:51
AuthorizedName Gross, W. H.
VariantNames Gross, Chip; Chip Gross; Gross, Warren; Warren Gross; W.H. "Chip" Gross; Warren H. Gross; Gross, Warren H.
SourcesUsed Library of Congress Name Authority File, 2012
$
RecordID 5
RecordDate 5/2/2012 23:31:18
AuthorizedName Grubbs, Bruce
VariantNames Bruce Grubbs; Broce O. Grubbs; Grubbs, Bruce O.
SourcesUsed Library of Congress Name Authority File, 2012
$
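
The way a user moves from a variant name to the authorized form, as described in section 5, can be sketched in Python. This is an illustrative assumption about how the lookup could be automated, not a feature of the current system, in which the name authority file is consulted by hand. The function names are hypothetical, the matching is assumed to ignore case and extra spaces, and the data is copied from sample records 1 and 2 above.

```python
# Two entries copied from the sample records (RecordID 1 and 2).
AUTHORITY_RECORDS = [
    {
        "AuthorizedName": "Tiller, Veronica E. Velarde",
        "VariantNames": ["Velarde, Veronica E.", "Tiller, Veronica",
                         "Veronica Tiller", "Velarde, Veronica",
                         "Veronica Velarde"],
    },
    {
        "AuthorizedName": "Kjellstrom, Bjorn",
        "VariantNames": ["Kjellstrom, Gosta Ambjorn", "Bjorn Kjellstrom",
                         "Gosta Ambjorn Kjellstrom"],
    },
]


def normalize(name):
    """Lowercase a name and collapse runs of whitespace (assumed matching rule)."""
    return " ".join(name.lower().split())


def build_index(records):
    """Map every authorized and variant form to its authorized name."""
    index = {}
    for record in records:
        authorized = record["AuthorizedName"]
        index[normalize(authorized)] = authorized
        for variant in record["VariantNames"]:
            index[normalize(variant)] = authorized
    return index


def authorized_form(name, index):
    """Return the authorized name for any known form, or None if unknown."""
    return index.get(normalize(name))


name_index = build_index(AUTHORITY_RECORDS)
resolved = authorized_form("Tiller, Veronica", name_index)
```

In this sketch, the unauthorized form used in the first attempt of user question 2, "Tiller, Veronica", resolves to "Tiller, Veronica E. Velarde", and a name that appears in no record returns None.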