
Refining the GKEY search for better keyword relevance

Presentation given at the Ex Libris Users of North America conference, 5/11/10. It focuses on revisions to keyword weighting that increase the relevance of search results.

Notes
  • Moby Dick: the novel itself does not appear in the results list until #11, and then again at #17, #42, and #56. The top ten results were all commentaries, and several did not include the terms “moby dick” in the title.
  • Over time, we had drifted quite far from the system defaults, in which selected fields are set to 100.
  • Equal weights: this makes logical sense; if you give everything the same weight, it defeats the purpose of relevance ranking. Using 500: Ex Libris advises against using 500 too often. It is the highest weight, and if you use it on fields that do not occur in many records, you end up with skewed search results. Our results became strange whenever more than a couple of fields had a weight of 500. In our final implementation, nothing has a weight of 500; the highest is 395.
  • Fields with higher distribution should get higher weights; otherwise, you end up badly skewing the search results. Case in point: field 600a. It currently has a weight of 500; someone in the past evidently thought that name subjects were very important and deserved the highest weight possible. But the field distribution shows that only about 15% of bib records even have a 600a! Even without testing, this suggests that 600a is currently overweighted.
  • Proposal 2: all series title fields have the same weight, all main author fields have the same weight, etc.; the order is unchanged, with title most important. Proposal 3: the highest weights go to the most important title fields, then the most important author fields, etc., so the weights do not simply ascend from the bottom. For instance, the author’s name would rank above an analytic title.
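The distribution principle, and the 600a case in the notes, can be checked mechanically before any test searches are run. This is a minimal sketch, assuming field counts have been exported from the catalog as one set of field/subfield tags per bib record; the function names and the sample records and weights are hypothetical, and the check covers only distribution, not the authors' second principle of ranking fields by type.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def field_distribution(records: List[Set[str]]) -> Dict[str, float]:
    """Fraction of bib records that contain each field/subfield tag.

    Each record is the set of tags it contains (e.g. {"245a", "650a"}),
    so a tag repeated within one record is counted once for that record.
    """
    counts = Counter(tag for tags in records for tag in tags)
    return {tag: n / len(records) for tag, n in counts.items()}


def weight_inversions(weights: Dict[str, int], dist: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (rarer, commoner) pairs in which the rarer field has the higher weight.

    These pairs break the rule "fields with higher distribution should have
    higher weights". Tags missing from `dist` are treated as 0% distribution.
    """
    flagged = []
    for a, b in combinations(sorted(weights), 2):
        # Orient the pair so `rare` is the less widely distributed field.
        rare, common = (a, b) if dist.get(a, 0.0) < dist.get(b, 0.0) else (b, a)
        if dist.get(rare, 0.0) < dist.get(common, 0.0) and weights[rare] > weights[common]:
            flagged.append((rare, common))
    return flagged


# Hypothetical sample: four records, with 600a present in only one of them.
records = [{"245a", "100a", "650a"}, {"245a", "650a"}, {"245a", "600a"}, {"245a", "100a"}]
dist = field_distribution(records)
print(dist["600a"])  # 0.25
print(weight_inversions({"245a": 395, "650a": 200, "600a": 500}, dist))
# [('600a', '245a'), ('600a', '650a')]
```

In the sample, 600a carries a weight of 500 but appears in only a quarter of the records, so it is flagged against both more common fields, much as the distribution figures flagged the real 600a.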
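The weighting proposals amount to different ways of turning a ranked list of field/subfield combinations into numbers. This is a minimal sketch, assuming a top weight of 395 (the highest weight in the final implementation, per the notes) and the 5-unit step from slide 10. Under those assumptions, Proposal 1's ladder of 76 combinations bottoms out at 395 - (75 × 5) = 20. The category-interleaving reading of Proposal 3 is our interpretation of the note above, and the function names, placeholder tags, and within-category rankings are hypothetical.

```python
from typing import Dict, List, Optional


def proposal_1(ranked: List[str], top: int = 395, step: int = 5) -> Dict[str, int]:
    """Ascending from the bottom: `ranked` runs least to most important,
    and each field gets its own weight, `step` units above the previous one."""
    lowest = top - step * (len(ranked) - 1)
    if lowest < 0:
        raise ValueError("too many fields for this top weight and step")
    return {tag: lowest + step * i for i, tag in enumerate(ranked)}


def proposal_2(ranked_tiers: List[List[str]], top: int = 395, step: int = 5) -> Dict[str, int]:
    """Like Proposal 1, but all fields in one tier (e.g. all series title
    fields) share a weight; tiers run least to most important."""
    lowest = top - step * (len(ranked_tiers) - 1)
    if lowest < 0:
        raise ValueError("too many tiers for this top weight and step")
    return {tag: lowest + step * i for i, tier in enumerate(ranked_tiers) for tag in tier}


def proposal_3(by_category: Dict[str, List[str]], category_order: List[str],
               top: int = 395, step: int = 5, limit: Optional[int] = 30) -> Dict[str, int]:
    """Interleave categories from the top down: the best title field, the best
    author field, and so on, then each category's second-best field.
    Lists in `by_category` run most to least important; only the first
    `limit` fields in that order are weighted (limit=None weights all of them)."""
    depth = max((len(fields) for fields in by_category.values()), default=0)
    order = [by_category[cat][i]
             for i in range(depth)
             for cat in category_order
             if i < len(by_category.get(cat, []))]
    if limit is not None:
        order = order[:limit]
    if top - step * (len(order) - 1) < 0:
        raise ValueError("too many fields for this top weight and step")
    return {tag: top - step * i for i, tag in enumerate(order)}


ranked_76 = [f"field{i}" for i in range(76)]  # placeholders for the 76 combinations
weights_1 = proposal_1(ranked_76)
print(weights_1["field0"], weights_1["field75"])  # 20 395

# Hypothetical within-category rankings (740a = analytic title).
weights_3 = proposal_3({"title": ["245a", "240a", "740a"], "author": ["100a", "700a"]},
                       ["title", "author"])
print(weights_3)  # {'245a': 395, '100a': 390, '240a': 385, '700a': 380, '740a': 375}
```

In the interleaved scheme the main author (100a) lands above the analytic title (740a), matching the example in the note; setting `limit=None` weights every field, which is one way to read Proposal 5.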

    1. REFINING THE GKEY SEARCH FOR BETTER KEYWORD RELEVANCE
       Jorge Cruz, Lynn Murray, and Sarah Haight Sanabria, Southern Methodist University
    2. The Problem
    5. Other Problem Searches
       • Little Women
       • Holy Bible
       • New York Times
    6. About the GKEY search
       • MARC fields can be given weights of 0-500
       • Weights are assigned to field/subfield combinations (245a, 245b, etc.)
       • We use automatic “AND” with relevance ranking
    7. What we tried
       • Restoring system defaults
       • Implementing suggested revisions from a past review of keyword weights
    8. What didn’t work…
       • Giving multiple (or all) fields the same weight, which seemed to make the results equally irrelevant
       • Using 500 indiscriminately
    9. Developing our scheme
       • We created the following principles to guide our keyword weight creation:
          ◦ Fields with higher distribution in the database should have higher weights
          ◦ Rank the fields by type: title, author, subject, contents
          ◦ Within title: proper, uniform, alternate, analytic, preceding/succeeding, series
    10. 5 Proposals
        • Ranked 76 field/subfield combinations from least to most important
           ◦ Proposal 1: Ascending from bottom, 5-unit increments
           ◦ Proposal 2: Fields of the same value have the same weight, still ascending from bottom
    11. 5 Proposals (continued)
           ◦ Proposal 3: 30 fields are weighted, most important fields in each category weighted heaviest
           ◦ Proposal 4: 30 fields are weighted, ascending from bottom
           ◦ Proposal 5: All fields are weighted, most important fields in each category weighted heaviest
    12. Testing the proposals
        • Our test searches:
           ◦ Moby Dick
           ◦ Little Women
           ◦ Oil gas forms
           ◦ Death penalty
           ◦ Presbyterian church
           ◦ Texas law forms
           ◦ Texas legal forms
           ◦ Holy Bible
           ◦ New York Times
    13. Results
        • Proposal 1 performed best, although we had expected Proposal 5 to perform best.
    14. The new keyword search
    15. What we learned
        • It is best to give each field/subfield a unique weight.
        • Fields with no assigned weight sometimes carry more weight than weighted fields.
        • Tailor your test searches to your collection.
        • 5-unit increments between weights are fine.
    16. Know when to give up the quest
    17. Questions?
        • Slides and Excel spreadsheets are online at the conference website.
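Slide 6 lists the moving parts: weights of 0-500 on field/subfield combinations, automatic AND, and relevance ranking. The toy model below is a minimal sketch of how such weights could combine into a ranking; it is not Aleph's actual GKEY scoring formula, and the `Record`, `score`, and `rank` names and the sample records are hypothetical. It also gives unweighted fields zero weight, whereas the authors observed that fields with no assigned weight sometimes outweigh weighted ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    record_id: str
    fields: Dict[str, str]  # field/subfield tag (e.g. "245a") -> text


def score(record: Record, query: str, weights: Dict[str, int]) -> Optional[int]:
    """Toy relevance score: for each query term, add the weight of every
    field/subfield in which the term occurs.

    Returns None if any term is missing from the record (automatic AND).
    Unweighted fields contribute 0 here, which is a simplification.
    """
    total = 0
    for term in query.lower().split():
        hits = [tag for tag, text in record.fields.items() if term in text.lower().split()]
        if not hits:
            return None  # automatic AND: the record is not retrieved at all
        total += sum(weights.get(tag, 0) for tag in hits)
    return total


def rank(records: List[Record], query: str, weights: Dict[str, int]) -> List[str]:
    """IDs of retrieved records, highest score first; ties are broken by ID."""
    scored = [(score(r, query, weights), r.record_id) for r in records]
    retrieved = [(s, rid) for s, rid in scored if s is not None]
    retrieved.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rid for _, rid in retrieved]


novel = Record("novel", {"245a": "Moby Dick", "100a": "Melville, Herman"})
commentary = Record("commentary", {"245a": "Critical essays", "650a": "Moby Dick"})
print(rank([commentary, novel], "moby dick", {"245a": 395, "650a": 100}))  # ['novel', 'commentary']
print(rank([commentary, novel], "moby dick", {"245a": 100, "650a": 100}))  # ['commentary', 'novel']
```

With a heavy 245a weight, the novel outranks a commentary that mentions it only as a subject. With equal weights, both score 200 and their order falls back to the tie-breaker, which is the “equally irrelevant” effect described on slide 8.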
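The slides do not say how the five proposals were scored against the test searches. One simple, standard option is to record where the first relevant record appears for each search (the Moby Dick note reports the novel first at #11) and to average the reciprocal ranks across searches, i.e. mean reciprocal rank. This is a minimal sketch, assuming each proposal's result lists have already been captured and the relevant record IDs judged by hand; the function names and sample IDs are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def first_relevant_rank(results: List[str], relevant: Set[str]) -> Optional[int]:
    """1-based position of the first relevant record, or None if none was retrieved."""
    for position, record_id in enumerate(results, start=1):
        if record_id in relevant:
            return position
    return None


def mean_reciprocal_rank(runs: Dict[str, Tuple[List[str], Set[str]]]) -> float:
    """Average of 1/rank over all test searches; a search with no relevant hit scores 0."""
    total = 0.0
    for results, relevant in runs.values():
        position = first_relevant_rank(results, relevant)
        total += 1.0 / position if position is not None else 0.0
    return total / len(runs) if runs else 0.0


# Hypothetical run mirroring the Moby Dick problem: ten commentaries, then the novel.
runs = {
    "moby dick": ([f"commentary{i}" for i in range(1, 11)] + ["novel"], {"novel"}),
    "little women": (["lw-novel", "lw-film"], {"lw-novel"}),
}
print(first_relevant_rank(*runs["moby dick"]))  # 11
print(round(mean_reciprocal_rank(runs), 3))  # 0.545
```

A higher average means relevant records surface earlier, so running the same searches under each proposal yields one comparable number per scheme; it only looks at the first relevant hit, though, so it complements rather than replaces reading the result lists.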
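Slide 15's lesson about unique weights, together with the warning about overusing 500, can be turned into a quick sanity check to run on a weight table before loading it. This is a minimal sketch; `check_weights` is a hypothetical helper, the 0-500 range comes from slide 6, and flagging every use of 500 is stricter than the authors' advice, which is only to avoid using it often.

```python
from collections import defaultdict
from typing import Dict, List


def check_weights(weights: Dict[str, int], ceiling: int = 500) -> List[str]:
    """List problems in a field/subfield weighting scheme."""
    problems = []
    tags_by_weight: Dict[int, List[str]] = defaultdict(list)
    for tag, weight in weights.items():
        tags_by_weight[weight].append(tag)
        if not 0 <= weight <= ceiling:
            problems.append(f"{tag}: weight {weight} is outside 0-{ceiling}")
        elif weight == ceiling:
            problems.append(f"{tag}: uses the maximum weight {ceiling}; use it sparingly")
    # Shared weights make the affected fields indistinguishable for ranking.
    for weight, tags in sorted(tags_by_weight.items()):
        if len(tags) > 1:
            problems.append(f"weight {weight} is shared by {', '.join(sorted(tags))}")
    return problems


print(check_weights({"245a": 395, "100a": 390, "650a": 390}))
# ['weight 390 is shared by 100a, 650a']
```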
