An analysis of web searches in a south african(final)

  • I wanted to find out whether current theories of information seeking are valid for our local University of Cape Town Libraries OPAC. We have all read and heard the arguments for and against maintaining the traditional second-generation catalogues, but these very catalogues are the basis for many of our discovery tools, products we all use such as Primo and Encore. Are systems librarians wasting their time setting up complicated indexes and search options when one simple search box could suffice? Are cataloguers adding superfluous subject headings and other metadata entries? My paper analyses the use of an academic library catalogue in a South African context in an attempt to understand the information-seeking behaviour of our users and to examine whether that behaviour matches trends reported in the literature. On Tuesday, Michael Stephens said in his Technolust workshop, “we don’t want to talk about SEARCH”. So here I am doing just that. He also said that, for academic libraries, print collections are no longer our core business. Both these points are true to a certain extent, and I hope that this paper will uncover some of the problems we still face with regard to accessing scholarly collections. This is not one of Stephen Abram’s 50 reasons not to change but an attempt to find solutions.
  • I will begin by explaining why I attempted this study. I will then explain the methods I used to extract the data and, after presenting the results, attempt to interpret them. I will finish off with a discussion and conclusion, and maybe some more questions!
  • Online public access catalogues were originally designed for seeking print materials, and access to physical collections in support of academic study is still important at the University of Cape Town Libraries, a fact borne out by circulation and OPAC query statistics.
  • Despite the growing number of electronic resources, there is still a significant reliance on print and physical collections in some South African academic libraries, which could be a caution that we need to take the print collection and its intellectual access more seriously for the foreseeable future. While circulation figures initially rose in the mid 2000s, they showed a very slight decline in the latter half of the decade. Over the past six years OPAC query statistics have remained fairly constant. The lowish 2011 figure may be due to the major building operations in the Main Library during that year.
  • This is a comparison over six years of the activity in the UCT OPAC. Overall, there has been a very slight decrease in the total number of OPAC queries from 2006 to 2011, a trend nevertheless to be expected during a period of accelerated electronic resource acquisition at the University of Cape Town.
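The size of that change can be checked with a few lines of arithmetic. The sketch below uses the annual Web OPAC transaction totals shown on slide 3 of this presentation; the variable and function names are illustrative only and are not part of the original study.

```python
# Annual Web OPAC transaction totals, as shown on slide 3 of this presentation.
opac_transactions = {
    2006: 1_769_627,
    2007: 1_646_879,
    2008: 1_557_088,
    2009: 1_545_516,
    2010: 1_615_857,
    2011: 1_537_283,
}


def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100


years = sorted(opac_transactions)

# Change from each year to the next.
yearly_change = {
    year: percent_change(opac_transactions[prev], opac_transactions[year])
    for prev, year in zip(years, years[1:])
}

# Change from the first year in the series to the last.
overall_change = percent_change(
    opac_transactions[years[0]], opac_transactions[years[-1]]
)

for year, change in yearly_change.items():
    print(f"{year}: {change:+.1f}% on the previous year")
print(f"{years[0]} to {years[-1]}: {overall_change:+.1f}% overall")
```

Whether a change of this size counts as slight is a judgment for the reader; the calculation only makes the figures behind the statement explicit.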
  • Previous studies have questioned the role, effectiveness and relevance of the online catalogue and have argued that the importance of the OPAC will decrease. A 2005 OCLC study reported that students do not begin with the library catalogue, a large percentage preferring to start with search engines. However, in a survey of library user studies, Thomas Mann found that, although they start their searches on the internet, most users are actually using the physical library and, more importantly, the print sources in it. The information-seeking behaviour of library users has changed over the last decade, and we are noting the principle of “least effort” as a characteristic of searchers who prefer keywords over subjects. Instant gratification has become an expectation even of people outside the Google generation. In her report for the Library of Congress, Karen Calhoun recommends that we “abandon the attempt to do comprehensive subject analysis manually with LCSH in favour of subject keywords”, a suggestion rejected by Thomas Mann as having “serious negative consequences for the capacity of research libraries to promote scholarly research”. Mann makes an important distinction between scholars and “information seekers” and argues that research libraries would be doing their users a disservice by abandoning subject analysis. Michael Gorman has also opposed the Calhoun recommendation, stating that it would be a “scholarly catastrophe”, and Sanford Berman, despite his long-standing criticism of Library of Congress Subject Headings, has called the idea “bibliobarbarism”, arguing that proper subject analysis is fundamental to fully exposing a library’s collections. So, are South African academic users following the trends reported in the literature? I wanted to see what types of searches were being done and, more importantly, how much subject searching was being done.
  • UCT’s Web OPAC was implemented in 1999 using the Aleph® ILS from Ex Libris and, although UCT is part of a consortium, it has a separate catalogue and maintains its own indexes and search options. Other OPAC user studies in the literature have relied almost exclusively on Transactional Log Analysis (TLA), the analysis of actual web logs. This method has certain disadvantages. It needs system tools to clean the data and, because of the vast volume of information in a web log, the usual methodology is to turn data logging on to record OPAC transactions for a relatively short period, usually days or perhaps weeks. Using samples potentially introduces variables which could affect the validity of the findings. In contrast, since 2006 Calico’s Aleph® system has been able to store records of OPAC activities in Oracle® data tables (the Z69 OPAC Events table), held permanently on the database. Six years of search data are currently available, allowing the entire data set to be analysed over a substantially longer period than in studies using TLA. No attempt was made to quantify the number of “hits” in the following reports, although it is possible to do so. Nor was there an attempt to study or measure search success or user satisfaction. The success of a search cannot always be measured by the presence or absence of results, and the difficulties of accurately assessing success have been amply documented in the literature. For instance, the assumption that a zero result means an unsuccessful search is not always true. This is unlike Google, which feels honour bound to give you “something” even if it isn’t what you want!
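As a rough illustration of why stored event records are easier to analyse than raw web logs, the sketch below counts events by year and type with a single SQL query. It is an assumption-laden stand-in, not the study's actual report: the table `opac_events`, its columns `event_year` and `event_type`, and the sample rows are all hypothetical, and an in-memory SQLite database is used in place of Oracle. The real Z69 column names are not given in this talk.

```python
import sqlite3

# Hypothetical stand-in for an OPAC events table. The real Z69 schema is not
# described in the talk, so the table and column names here are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE opac_events (event_year INTEGER, event_type TEXT)")

# Made-up sample rows, for illustration only.
sample_rows = [
    (2006, "SEARCH"), (2006, "SEARCH"), (2006, "SCAN"),
    (2011, "SEARCH"), (2011, "SEARCH"), (2011, "SEARCH"),
    (2011, "SCAN"), (2011, "HELP"),
]
conn.executemany("INSERT INTO opac_events VALUES (?, ?)", sample_rows)

# One aggregate query replaces the log cleaning and sampling that TLA needs.
query = """
    SELECT event_year, event_type, COUNT(*) AS total
    FROM opac_events
    GROUP BY event_year, event_type
    ORDER BY event_year, event_type
"""
counts = conn.execute(query).fetchall()

for year, event_type, total in counts:
    print(year, event_type, total)

conn.close()
```

Because every event is already a row, the whole period can be summarised at once rather than sampled for a few days or weeks.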
  • In Aleph®, “Search” commands are keyword searches while “Scan” commands are browse searches from alphabetical lists. This is typical of a second-generation catalogue, where a distinction is made between querying and browsing. The study also attempted to measure the use of the Help function and of tools such as My Library Card, which allows self mediated transactions in the OPAC. For the purposes of this paper, the SDI profile and Z39.50 server requests have been ignored, as SDI has never been fully implemented and Z39.50 requests are recent additions to the table, not available before 2009. The Refine, Save and Cross sets options have also been disregarded, as these actions are performed after the Search is done.
  • The first report attempted to find out which types of searches (Keyword and Browse) were used the most and whether there was any change during the period studied. All the indexes available as browse searches were listed and sorted by frequency of use. Keyword searches were also analysed and ranked by order of use. The final report covered self mediated services in the OPAC and the Help function.
  • Types of OPAC searches. The report attempted to find out which types of searches, both Scan (browse) and Search (keyword), were used the most and whether there was any change during the period studied. There is a clear preference for Searching over Browsing and, more specifically, as we shall see later on, for Basic searching. This may support the principle of “least effort”. However, the Basic Search is the OPAC’s default, which may account for the large figure of around one million Search commands per year, compared with 250 000 Browse commands in the same period. Browse searching typically requires the user to have prior knowledge of the vocabulary, especially for subject searching, where familiarity with Library of Congress Subject Headings is necessary. The Google generation has “a strong preference for expressing themselves in natural language”, but OPACs are not able to help users convert natural language queries into the formal terms required by the system. From 2006 to 2011 there was a marked decrease in the use of the Browse search and an increase in general keyword search use. Keyword searches account for the largest percentage (over 85 percent in 2011) of searches in the OPAC.
  • Browse searches. In order to discover how much subject searching is done as opposed to “known item” searching, three groupings were first made: “Subject searches”, “Known Item searches” and “Qualification metadata searches”. For Subject searches, the LCSH, MeSH, local thesaurus subject headings, subject subdivisions and Keywords from subject headings were counted. Aleph® allows the keyword indexes to be browsed in a similar manner to a Headings list; therefore, even though Keywords from subject is a keyword index, its use in this context would imply that the user was aware of the ability to search subject headings. For Known Item searches, title, author, corporate author, etc. were grouped. The third category, called “Qualification metadata” for the purpose of this study, is based on Karen Markey’s suggestion to add qualifiers or further attributes to OPAC design to assist users in their choice of documents. The difficulty was to decide where to place general keywords (WRD) in these three categories. An eyeball analysis of the search text entered by users showed a mixture of keywords, making it impossible to predict what type of search (author, title or subject, for instance) was intended. Therefore a fourth category was created for general keyword. As we shall see, the small number of general keyword searches was not enough to skew the browse search results.
  • Title and author searches are the most used, a result also found in similar OPAC studies. Shelf Mark searches are the fourth most used browse search, which was surprising. A possible explanation is that users use this as a quick way to check availability once they have written down the code and not found the item on the shelves.
  • In the present study, title searches account for more than double any other type of browse search. Even though Subject (SUB) searches are the third most popular browse search, when all the searches are grouped in the manner described earlier, subject searching forms only 16 percent of the overall total. Known Item searches amount to 76 percent and Qualification metadata to eight percent. The low rate of use of several of the options, such as DDC, imprint, LC subdivision, language code and publisher, may indicate that users are unfamiliar with these concepts in a retrieval strategy. The fourth category, general keyword, was insignificant and not enough to skew the browse search results.
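The grouping can be approximated from the top-ten browse figures shown on slide 11. The sketch below is an illustration under stated assumptions, not the study's own report: the percentages are taken from that slide, but the assignment of Shelf mark and Course code to “Qualification metadata” and of System number to “Known Item” is a guess, since the talk does not list the full mapping.

```python
# Share of all browse searches per index, from the top-ten table on slide 11.
top_ten_browse = {
    "Title": 44.3,
    "Author": 21.4,
    "Subject": 14.7,
    "Shelf mark": 7.1,
    "Journal title": 6.9,
    "Author & title": 1.4,
    "LC subject": 1.1,
    "Course code": 0.9,
    "System number": 0.3,
    "ISBN": 0.3,
}

# ASSUMPTION: this index-to-category mapping is a guess for illustration.
# The talk names the categories but does not give the complete assignment.
category_of = {
    "Title": "Known Item",
    "Author": "Known Item",
    "Journal title": "Known Item",
    "Author & title": "Known Item",
    "System number": "Known Item",
    "ISBN": "Known Item",
    "Subject": "Subject",
    "LC subject": "Subject",
    "Shelf mark": "Qualification metadata",
    "Course code": "Qualification metadata",
}

# Sum the percentage shares of the indexes in each category.
shares = {}
for index_name, percent in top_ten_browse.items():
    category = category_of[index_name]
    shares[category] = shares.get(category, 0.0) + percent

for category, percent in sorted(shares.items()):
    print(f"{category}: {percent:.1f}%")
```

Under this assumed mapping the top ten indexes alone give roughly 74.6 percent Known Item, 15.8 percent Subject and 8.0 percent Qualification metadata, close to the reported 76, 16 and 8 percent; the remainder would come from indexes outside the top ten.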
  • The difficulty of Subject searching particularly in a Browse search has been well documented and there is clear evidence here, showing that users are not aware of the structure and purpose of a Browse index, nor of the role of controlled vocabulary in the OPAC.
  • Keyword searches. In the same way as in the previous report for Browse searches, all the indexes available as keyword searches were listed and sorted by frequency of use. The same three categories were used again, i.e. “Subject searches”, “Known Item searches” and “Qualification metadata searches”, together with General Keywords.
  • Not surprisingly, general keywords were used the most (60.8 percent). Almost 13 percent of the searches were title keyword searches and 12 percent were author keyword searches, while only three percent were subject keyword searches.
  • The preference for general keyword (almost 61 percent) is not surprising, as the results follow findings in other similar studies. For this type of search, at a purely systems level, we are not able even to guess what the user was searching for. Catalogue use studies have confirmed the use of the OPAC primarily as a finding aid. This study confirms that, in the case of keyword searches, 30 percent are for known items. However, when general keywords are analysed, there is no way of knowing the intended outcome of the search, making it difficult to determine whether users were searching for subject terms or known items. Still, the statistics show that deliberate subject searching is the least used type of search.
  • The final report attempted to quantify the use of Help and “My Library Card”, the OPAC term for the user’s library account. Loan history, fines, current loans and holds lists are accessible to the user, who can also change certain personal information such as passwords and contact details and renew items on loan. The results reflect a growing use of self mediated services. Use of “My Library Card” and online renewals increased over the five year period, a trend one might expect in a user population increasingly comfortable with technology. However, with a student population of over 23 000 and a staff of 4 500, one would expect this service to be used more. One of the obstacles to its accessibility may be that the OPAC login is not the same as the campus network login at UCT. The use of Help in 2010 is almost double that of the previous year. This may be because Aleph® was upgraded to a new version at the beginning of 2010 and has different features from previous versions.
  • The results raise an important question. Is subject searching not important to the user, or simply too difficult? If the argument that subject analysis is important for scholarship is valid, the results of this study are cause for concern. Subject searching is the least frequent type of search, supporting the views in current literature that searches for known items are easier for the user. In support of subject analysis, Thomas Mann states: “Cataloguing and classification do provide the recognition mechanisms that scholarship requires for systematic literature retrieval in book collections.” As a performance indicator desirable in an information literate student, the Association of College and Research Libraries has as one of the outcomes of its information literacy standards for higher education: “Selects controlled vocabulary specific to the discipline or information retrieval source”. OPAC system design seems to place too much of a “cognitive load” on the user. Should we recognise today’s “scholarly population” and simplify the OPAC with their needs in mind? Karen Markey makes three suggestions with regard to the OPAC’s redesign. The first throws out “outdated Boolean based catalogs” in favour of post Boolean ones. Secondly, Markey suggests other enhancements to the bibliographic record: tables of contents, book indexes and abstracts. Her third recommendation is the addition of metadata which would qualify the record with attributes such as discipline, intended audience, historical period and genre. Adding user defined metadata is already possible in the form of tagging in some databases. Tagging could provide an alternative to formal subject analysis, a fact that many library system vendors have recognised.
  • The two opposing themes in the literature about the future of the OPAC have implications for this paper. On the one hand there are allegations that the OPAC has had its day: the “lipstick on a pig” concept, first coined by North Carolina State University's Andrew Pace, argues that efforts should be concentrated on developing Web 2.0 tools. The other view suggests that the lessons learned from OPAC studies should be used to improve current OPAC developments. Both these ideas have been incorporated to some degree in Discovery Tools such as Primo, in use at UCT Libraries. Costly software and human intellectual input make catalogues very expensive to maintain. The task should be to understand the needs of our users and analyse their information-seeking behaviour. We should be giving more thought to Ian Rowlands’ idea of a “library having a department devoted to the evaluation of the user”. This paper has attempted to investigate the “user-catalogue interaction” from a systems perspective, but the cognitive processes in the user’s query formulation also need to be understood. A future study could look more closely at the actual search terms used and investigate the appropriateness of the indexes in which searches are being done. Until the OPAC is redesigned with the user’s needs in mind, its present drawbacks need to be overcome by librarians through thorough user instruction and training and an ongoing dialogue with their users. I came across a lovely term the other day, that of “the Hacker Ethics”, which describes the developing generation of people who play, explore and make social connections, overturning centuries of values, selfless work, duty and guilt (do you hear Librarian here?). The word “to hack” means to tinker or rework, and the author Woody Evans, in his book Building Library 3.0, recommends that libraries become outposts of “meta-hacking”.
So with a hacker-informed ethos, are we not supporting an information-skimming, shallow-reading culture of information seeking? As librarians we can’t stop change, but maybe we can ensure that some of our values get incorporated into the new world that awaits us. We cannot “half-serve” our patrons; we have to meet them half way. But how do we at the same time ensure that we do not commit “bibliobarbarism”, as coined by that bibliographic warrior Sanford Berman?
  • References
    Anatell, K. & Huang, J. 2008. Subject searching success: transaction logs, patron perceptions and implications for library instruction. Reference & user services quarterly. 48(1):68-76. [Online]. Available: [2010, September 25].
    Association of College and Research Libraries. 2000. Information literacy competencies for higher education. [Online]. Available: [2010, November 21].
    Asunka, S. et al. 2008. Understanding academic information seeking habits through analysis of web server files: the case of the teachers college web site. Journal of academic librarianship. 33(1):33-45. [Online]. Available from EBSCOHost: Academic Search Premier at [2010, September 25].
    Bates, M.J. 2003. Task force recommendation 2.3 research and design review: improving user access to library catalog and portal information: final report (version 3). Prepared for the Library of Congress. [Online]. Available: [2010, October 31].
    Berman, S. 2006. Quoted by Norman Oder. The end of LC subject headings? Report on catalog revamp provokes strong reactions. Library journal. 131(9):14-15. [Online]. Available at EBSCOHost: Academic Search Premier at [2010, July 23].
    Breeding, M. 2007. The birth of a new generation of library interfaces. Computers in libraries. 27(9):34-37. [Online]. Available from EBSCOHost: Academic Search Premier at [2010, October 23].
    Calhoun, K. 2006. The changing nature of the catalog and its integration with other discovery tools: Final report: March 17, 2006. [Online]. Available: [2010, August 12].
    Chudnov, D. 2006. On the clarifying of a few things. One big library. [Online]. Available: [2010, November 1].
    Davidson, L.A. 1999. Libraries and their OPACs lose out to the competition. Library computing. 18(4):279-283. [Online]. Available from Proquest: Proquest Education Journals at [2010, October 23].
    de Jager, K. 2007. Opening the library catalogue up to the web: a view from South Africa. Information development. 23(1):48-53. [Online]. Available from SwetsWise at [2010, September 27].
    De Rosa, C. and others. 2005. Perceptions of libraries and information resources: a report to the OCLC membership. Dublin, Ohio: OCLC Online Computer Library Center. [Online]. Available: [2010, September 27].
    Dempsey, L. 2006. The library catalogue in the new discovery environment: some thoughts. Ariadne. 48. [Online]. Available: [2010, October 23].
    Ex Libris. 2009. Z69 opac events. Version 20.0 April 30, 2009. [Online]. Available from Ex Libris Documentation Center at [2010, November 4].
    Evans, W. 2009. Building library 3.0: issues in creating a culture of participation. Oxford: Chandos.
    Gorman, M. 2006. Quoted by Norman Oder. The end of LC subject headings? Report on catalog revamp provokes strong reactions. Library journal. 131(9):14-15. [Online]. Available at EBSCOHost: Academic Search Premier at [2010, July 23].
    Hancock-Beaulieu, M. 1989. Online catalogues: a case for the user. In The online catalogue: developments and directions. C.R. Hildreth, Ed. London: Library Association. 25-46.
    Hildreth, C.R. 1989. General introduction; OPAC research: laying the groundwork for future OPAC design. In The online catalogue: developments and directions. C.R. Hildreth, Ed. London: Library Association. 1-24.
    Jansen, B.J. 2006. Search log analysis: what it is, what’s been done, how to do it. Library & information science research. 28:407-432. [Online]. Available from Elsevier ScienceDirect at [2010, September 25].
    Kurth, M. 1993. The limits and limitations of transactional log analysis. Library hi tech. 11(20):98-104.
    Lau, E.P. and Goh, D.H. 2006. In search of query patterns: a case study of a university OPAC. Information processing and management. 42:1316-1329. [Online]. Available from Elsevier ScienceDirect at [2010, September 25].
    Lynch, C.A. 1989. Applications of performance and usage data for online catalogues. In The online catalogue: developments and directions. C.R. Hildreth, Ed. London: Library Association. 127-141.
    Mann, T. 2005. Survey of library user studies. [Online]. Available: [2010, November 2].
    Mann, T. 2008a. The changing nature of the catalog and its integration with other discovery tools. Final report: March 17, 2006. Prepared for the Library of Congress by Karen Calhoun. A critical review. Journal of library metadata. 8(2):169-180. [Online]. Available: [2010, July 21].
    Mann, T. 2008b. Will Google’s keyword searching eliminate the need for LC cataloguing and classification? Journal of library metadata. 8(2):159-168. [Online]. Available: [2010, July 21].
    Markey, K. 2007. The online library catalog: paradise lost and paradise regained? D-Lib magazine. 13(1/2):1-15. [Online]. Available: [2010, September 25].
    Ortiz-Repiso, V. and others. 2006. How researchers are using the OPAC of the Spanish Council for Scientific Research Library Network. The electronic library. 24(2):190-211. [Online]. Available from Emerald at [2010, September 25].
    Papadakis, I., Stefanidakis, M. and Tzali, A. 2009. Semantic navigating an OPAC by subject headings meta-information. The electronic library. 27(5):779-790. [Online]. Available from Emerald at [2010, September 25].
    Peters, T.A. 1989. When smart people fail: an analysis of the transactional log of an online public access catalog. Journal of academic librarianship. 15(5):267-273. [Online]. Available at EBSCOHost: Academic Search Premier at [2010, September 25].
    Rowlands, I. and others. 2008. The Google generation: the information behaviour of the researcher of the future. Aslib proceedings. New information perspectives. 60(4):290-310. [Online]. Available from Emerald at [2010, September 25].
    Tague, J.M. 1989. Negotiation at the OPAC interface. In The online catalogue: developments and directions. C.R. Hildreth, Ed. London: Library Association. 47-60.
    Tennant, R. 2004. A bibliographic metadata infrastructure for the 21st century. Library hi tech. 22(2):175-181. [Online]. Author’s version available: [2010, October 24].
    Tennant, R. 2005. “Lipstick on a pig”. Library journal. 130(7):34. [Online]. Available at EBSCOHost: Academic Search Premier at [2010, October 23].
    Tennant, R. 2007. Demise of the local catalog. Library journal. 132(12):26. [Online]. Available at EBSCOHost: Academic Search Premier at [2010, October 23].
    Villén-Rueda, L., Senso, J.A. & de Moya-Anegón, F. 2007. The use of OPAC in a large academic library: a transactional log analysis study of subject searching. Journal of academic librarianship. 33(3):327-337. [Online]. Available at [2010, September 25].


  • 1.
      • Introduction and Background
      • Methodology and Interpretation of the results
      • Discussion and Conclusion
  • 2. Library Circulation statistics and OPAC transactions are not significantly decreasing
  • 3. Circulation of items 2000-2010

      Year    Number of loans
      2000    373 773
      2001    420 622
      2002    501 339
      2003    563 122
      2004    542 154
      2005    508 255
      2006    444 740
      2007    436 320
      2008    415 995
      2009    410 616
      2010    424 703
      2011    381 815

    Web OPAC transactions 2006-2010

      Year    OPAC transactions
      2006    1 769 627
      2007    1 646 879
      2008    1 557 088
      2009    1 545 516
      2010    1 615 857
      2011    1 537 283
  • 4. Total number of OPAC transactions 2006-2011
  • 5.
      • Why study OPAC use? Questionable role of the OPAC in terms of relevance, use and value
      • Are there new ways of information seeking and are they changing the way patrons are searching the OPAC?
      • How seriously should we consider calls to abandon LCSH cataloguing?
      • What about the “classical functions of bibliographic control”?
      • Are South African students following the same searching behaviour patterns shown elsewhere?
  • 6.
      • UCT implemented the Web OPAC in 1999 (Aleph® ILS system from Ex Libris)
      • Since 2006 OPAC search records have been stored as Oracle tables
      • Transactional Log Analysis (TLA) was rejected as a tool for data analysis in favour of SQL and other reporting tools
      • No attempt was made to study or measure search success, nor measure user satisfaction
  • 7. Events that are registered in the Z69 (Web OPAC events) Oracle table (Ex Libris, 2009):
      • Search Command - Multi field (find-a)
      • Search Command - Basic search (find-b)
      • Search Command - CCL (find-c)
      • Search Command - Advanced (find-d)
      • Search Command - Multi base (find-m)
      • Scan
      • Refine Search
      • Cross sets
      • My Library Card
      • Help
      • SDI Profile
      • Save
      • Z39 Server Search request
      • Z39 Server scan request
    Search = Keyword search
    Scan = Alphabetical Browse search
  • 8. Description of the 4 Reports:
      1. Types of OPAC searches
      2. Browse Searches
      3. Keyword Searches
      4. Self mediated services in the OPAC (My Library Card) and the Help function
  • 9. Searching and Browsing, 2006 vs 2011
      2006: Browse 24%, Keyword Searches 76%
      2011: Browse 14%, Keyword Searches 86%
  • 10. Types of browse search
      Legend: Subjects; Known Items; Qualification Metadata; General Keywords
      Indexes: Title; Author; ISSN; ISBN; Journal title; Author & title; Corporate authors; Keywords from author; System number; Imprint; Words from title; Series; Publisher; Corporate authors; Keywords from author; Place of publication; Keywords from place of publication; Keywords from publisher; MeSH subjects, Subject, LC subject, Keywords from subject, Local thesaurus, LC subject subdivision; Course code; Location; Department; Shelf mark; Keywords from language code; Keywords from year; Dewey classification number; General keyword
  • 11. Top ten Web OPAC Browse searches

      Type of browse search    No.        %
      Title                    745 910    44.3
      Author                   360 174    21.4
      Subject                  248 010    14.7
      Shelf mark               120 180     7.1
      Journal title            115 976     6.9
      Author & title            24 050     1.4
      LC subject                18 656     1.1
      Course code               14 834     0.9
      System number              5 450     0.3
      ISBN                       5 314     0.3
  • 12. OPAC Browse Searches
      Known Item Searches 76%
      Subject Searches 16%
      Qualification metadata 8%
      General keyword 0%
  • 13. Actual Subject Browse searches in the OPAC showing inappropriate keyword and Boolean searching:
      "icts impact" AND "user"
      "information technology impact" AND "libraries"
      "internet" AND "information user"
      "internet" AND "library"
      "is branding evil"
      "issues in Diagnosis"
      "jazz" and "south africa"
      "jim goes to joburg"
  • 14. Type of keyword search
      Legend: 1. Subjects; 2. Known Items; 3. Qualification metadata; 4. General keywords
      Indexes: Words; W-titles; W-authors; ISSN; ISBN; Barcode; W-series; W-publishers; W-Unif.Titles; W-place of publ; W-subjects; W-ToC; W-sublib.; W-year; W-format; W-language code; W-theses; W-notes; W-material type; W-collection; W-shelf
  • 15. Top ten web OPAC Keyword searches

      Type of keyword search    Total        Percentage
      Words                     3 989 423    60.8
      W-titles                    844 494    12.9
      W-authors                   809 381    12.3
      W-subjects                  215 963     3.3
      ISSN                        193 505     2.9
      W-sublib.                   119 463     1.8
      W-year                       80 819     1.2
      W-format                     77 804     1.2
      W-language code              77 027     1.2
      ISBN                         73 683     1.1
  • 16. OPAC Keyword Searches
      General Keywords 61%
      Known Item Searches 30%
      Qualification Metadata 6%
      Subject searches 3%
  • 17. Self Mediated services (bar chart: use of the Help function and My Library Card per year, 2006 to 2011; vertical axis from 0 to 35,000)
  • 18.
      • The study supports the trends in the literature which show decreasing use of subject searching in favour of keywords
      • What is the role and importance of subject searching? For whom?
      • OPAC is rigid and unforgiving for untrained searchers
      • OPACs still reflect 1.0 design in interface and ability
      • Solutions?
  • 19.
      • User studies
      • User instruction
      • “Hacker ethics” (Evans, W. 2009)
      • Bibliobarbarism? (Berman, S. 2006)