Content Findability in a Portable Content World

What makes information worth finding? A discussion of the joys and perils of subject access, by Lise Kreps, taxonomist and librarian. Presented at the March 2008 Content Convergence and Integration Conference in Vancouver Canada. For more information, see my website, www.relevantinfoservices.com.

Published in: Economy & Finance, Education
  • Transcript

    • 1. Content Findability in a Portable Content World Lise Kreps, M.S.L.S. Relevant Information Services [email_address]
    • 2. Prologue: Who am I?
      • Master of Library Science, 1987
        • Academic & public librarian
        • Taught at University of Washington’s iSchool
      • 20 years in technical documentation, usability and e-commerce
      • Software manual & online Help indexing
      • Cataloging books, images, and audio
        • Amazon, Microsoft, Corbis, National Public Radio
    • 3. Content Findability in a Portable Content World
      • Act I: What makes information worth finding?
      • Act II: What’s it all about?
      • Act III: Too much of a good thing?
      • Act IV: Can’t machines do this?
      • Act V: What’s findability worth to you?
    • 4. Act I: What makes information worth finding?
      • It satisfies my need well enough
      • Not more trouble than it’s worth to get it
      • Sounds simple, eh?
      • I know what I want, so why doesn’t it just magically appear?
      • Let’s look more closely…
    • 5. The information satisfies my need well enough
      • The info I need, not what someone else needs
      • …for my specific purpose
      • …at this particular time
      • …in my particular context
    • 6. The information satisfies my need well enough
      • Contains enough useful info to be worth my while
      • From a source I trust
      • In language or style appropriate for my need
      • In a format I can use for this need
      • No legal or financial barriers to my using it
    • 7. The information satisfies my need well enough
      • I didn’t miss anything too important
      • In the Library & Information Science world, this is called Recall:
      • Recall = number of relevant items retrieved ÷ total number of relevant items available
      • “Do I think I got enough of the good stuff that’s probably out there?”
    • 8. Not more trouble than it’s worth to get the information
      • Understands my question
        • Search interpreter (human or computer) speaks my language at my level
        • Doesn’t make me guess or learn its terminology
        • Doesn’t give me results that seem unrelated to my question
    • 9. Not more trouble than it’s worth
      • Helps me make good choices
        • I can tell what each menu item means and how it differs from its neighbours
        • Asks me to clarify my intention (“did you mean… or …”)
        • Shows other useful search terms within good items I find
        • Offers useful ways to change and narrow my search
        • Let’s shop for shoes on LandsEnd …
          • And narrow our choices by
            • Women’s
              • Sandals
              • Leather etc.
    • 10. Not more trouble than it’s worth
      • Most of all: I didn’t have to wade through too much JUNK
      • In the InfoSci world, this is called Precision:
      • Precision = number of relevant items retrieved ÷ total number of items retrieved
      • “Out of all the stuff I got, how much of it was what I really wanted?”
      • “Info-Noise”: the biggest barrier to findability
    • 11. Retrieval Effectiveness
      • High Recall + High Precision =
      • High Information Retrieval Effectiveness =
      • I found all of and only the info worth finding
      • But here are the Gotchas…
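
The Recall and Precision ratios defined on slides 7 and 10 can be checked on a toy example. This is a minimal sketch, not part of the original talk; the item names and the two function names are hypothetical.

```python
def recall(retrieved, relevant):
    """Share of all relevant items that the search actually returned."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Share of the returned items that are actually relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical collection: 4 items are relevant, the search returned 5 items.
relevant_items = {"doc1", "doc2", "doc3", "doc4"}
retrieved_items = ["doc1", "doc2", "doc7", "doc8", "doc9"]

print(recall(retrieved_items, relevant_items))     # 2 of the 4 relevant items were found
print(precision(retrieved_items, relevant_items))  # 2 of the 5 results are relevant
```

In this example the searcher missed half of the good material and had to wade through three irrelevant results to find the other half.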
    • 12. Gotchas…
      • #1: You can’t have it both ways.
        • Recall and Precision are inversely related:
        • Better Recall = worse Precision, and vice versa
      • #2: You don’t know what you’re missing.
        • In the real world, Recall is hard to assess;
        • you may never know what relevant information you didn’t find.
    • 13. Gotchas…
      • #3: Size matters.
        • The way you seek information changes depending on how much info you think you’re dealing with
      • #4: You don’t know what you want...
        • Until you know what your choices are.
          • Finding out what’s available redefines your information need.
        • Good searching is iterative
    • 14. What’s a searcher to do?
      • In the InfoGlut world, we care most about Precision
      • Make educated guesses about
        • what is in the collection
        • whether we’re missing something important
      • Refine our search strategies, until we
      • Get “enough” good results for this information need…and then we quit
    • 15. What’s a content producer to do?
      • Make your content very smart about how it presents itself
      • So it is findable in the contexts where it is most useful, and
      • It doesn’t become more of the InfoNoise
    • 16. Act II: What’s it all about?
    • 17. Tags
      • Smart content uses metadata tags, or database fields
      • Tags contain information such as the content’s
        • Creator
        • Date of creation
        • Format
        • Location or destination
        • Title
        • Subject area
      • Title might not be descriptive, e.g.
        • Metaphoric or idiomatic: “Your new bundle of joy”
        • Generated from filename: 2008030601.jpg
    • 18. Subject area categorization
      • “Aboutness”
      • Sometimes called “keywords”
      • Often the most useful “findability” access point
      • Subjective: aboutness is in the eye of the beholder
      • Each item is usually “about” multiple subjects
      • What do you think this image is about?
    • 19. What do the Corbis catalogers think it’s about?
      • About 20 keywords, for
      • “Foreground” subjects
        • Fishing boat
        • Harbor
        • Ocean
      • Implied subjects
        • Marine scenes
        • Industry
        • Travel
      • Geographic location
        • Goose Cove
        • Newfoundland
        • Canada
      • Image composition attributes
        • Nobody
        • Reflection
      • “Emotional” attributes
        • Serenity
        • Simplicity
    • 20. Weighting
      • Are all these keywords equally important?
      • Which are most important?
      • Best practice: “weight” each “aboutness” tag
      • Items with “high aboutness” get ranked higher in search results for that keyword
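
One way to picture weighted “aboutness” tags is a catalog in which every keyword carries a number, and search results are sorted by that number. The sketch below assumes a 0-to-1 weight scale and made-up image names; neither comes from the talk or from any real cataloging system.

```python
# Hypothetical catalog: each item maps keywords to an "aboutness" weight (0 to 1).
catalog = {
    "harbor.jpg": {"fishing boat": 0.9, "harbor": 0.8, "serenity": 0.3},
    "regatta.jpg": {"sailboat": 0.9, "harbor": 0.4},
    "market.jpg": {"fish market": 0.9, "fishing boat": 0.2},
}


def search(keyword):
    """Return items tagged with the keyword, highest weight first."""
    hits = [(tags[keyword], item) for item, tags in catalog.items() if keyword in tags]
    return [item for weight, item in sorted(hits, reverse=True)]


print(search("harbor"))        # harbor.jpg ranks above regatta.jpg
print(search("fishing boat"))  # harbor.jpg ranks above market.jpg
```

Both images are tagged “harbor”, but the one where the harbor is a foreground subject comes first.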
    • 21. Size (still) matters
      • If you had just one bookcase, you could organize it by colour like this
      • But imagine if a public library was like this Unshelved comic
      • As collection size increases, you need an increasingly complex system of subject categorization
    • 22. Size (still) matters
      • The Problem of the World’s Biggest Bookstore (Amazon.com)
        • Subject headings converge from multiple sources
          • Publishers of all sizes
          • Library of Congress
          • Users’ tags
        • “Education” is okay for some small publishers, but
        • Not usually specific enough for the Amazon Universe
    • 23. What words should I use for “aboutness” keywords?
      • Prominent or frequent location
        • Featured in title, headings, description or summary
        • Appears frequently in the content
        • “Foreground” or main subject of image
      • High semantic value
        • Nouns (“snow”)
        • Gerund verbs for activities (“skiing”)
        • Short modifier phrases (“cross-country skis”)
    • 24. What words should I use for “aboutness” keywords?
      • Differentiates this content from other content
      • Users want this keyword
        • Appears frequently in user search logs
        • Often suggested by users
      • Similar content uses this keyword
        • Competitors’ websites
        • Published thesauri
    • 25. Which keywords have I already used?
      • You need to be able to
        • Browse all keywords as alphabetical list
        • Use this list when tagging new content
        • Edit the keywords -- both in the list and in the content
      • Tagging consistently increases Precision
    • 26. Big 3 Problems Inherent In Language
      • OK now I have a list. Am I done yet?
        • No. Uncontrolled lists like this, and folksonomies also, do not handle the…
      • Big 3 Problems Inherent In Language:
        • Equivalent relationships
        • Homonyms (look the same but aren’t)
        • Hierarchical relationships and other related concepts
    • 27. Equivalent relationships
      • In a book index, these are “See” references
      • Decide on a preferred form of the keyword, and lead the other forms to that
      • Word variations and synonyms are the most common
    • 28. Equivalent relationships
      • Word variations
        • Spelling variations
          • color = colour
          • Chanuka = Hanukkah
        • Word ending variations (word stemming)
          • Canad* = Canada, Canadian
          • Immigra* = immigrant, immigrate, immigration
        • Plural and tense variations
          • goose = geese
          • run = ran, running
    • 29. Equivalent relationships
      • Synonyms
        • baby = infant
        • purchasing = buying
        • pupil = student – but not if it’s Pupil (Eye)
      • Equivalency control increases Recall
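
The “See” references on slides 27 to 29 amount to a lookup table from each variant to its preferred term. Here is a minimal sketch, assuming items are indexed only under preferred terms; the table, the index and the item names are hypothetical.

```python
# Hypothetical "See" references: each variant or synonym points to a preferred term.
PREFERRED = {
    "colour": "color",
    "chanuka": "hanukkah",
    "geese": "goose",
    "infant": "baby",
    "buying": "purchasing",
}

# Hypothetical index: items are filed under preferred terms only.
INDEX = {
    "color": ["paint-guide"],
    "baby": ["crib-safety", "first-foods"],
}


def normalize(term):
    """Map a keyword to its preferred form; unknown terms pass through unchanged."""
    key = term.strip().lower()
    return PREFERRED.get(key, key)


def search(term):
    """Look up items under the preferred form of the searcher's term."""
    return INDEX.get(normalize(term), [])


print(search("Colour"))  # finds the item filed under "color"
print(search("infant"))  # finds the items filed under "baby"
```

Without the table, a search for “infant” would return nothing even though two relevant items exist, which is why equivalency control raises Recall.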
    • 30. Homonyms
      • Words that are spelled alike but have different meanings
      • Disambiguate by appending clarifier terms
        • Turkey (Bird), Turkey (Meat), or Turkey (Country)
        • Play (Dramatic work), Play (Imaginative activity), or Play (Sports activity)
      • Can also ask searcher to choose one (“Did you mean… or …”)
      • Homonym control increases Precision
      • Now you have a “Controlled Vocabulary”
      • But you’re still not done yet…
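
The “Did you mean… or …” prompt can be driven directly by the clarifier terms. The following sketch assumes a small hand-built table of ambiguous words; the table and function names are illustrative only.

```python
# Hypothetical controlled vocabulary: ambiguous words list their qualified senses.
SENSES = {
    "turkey": ["Turkey (Bird)", "Turkey (Meat)", "Turkey (Country)"],
    "play": ["Play (Dramatic work)", "Play (Imaginative activity)", "Play (Sports activity)"],
}


def interpret(query):
    """Return the query's possible senses, or the query itself if it is unambiguous."""
    return SENSES.get(query.strip().lower(), [query])


def did_you_mean(query):
    """Build a clarification prompt for ambiguous queries; return None otherwise."""
    senses = interpret(query)
    if len(senses) == 1:
        return None
    return "Did you mean " + " or ".join(senses) + "?"


print(did_you_mean("Turkey"))
print(did_you_mean("harbor"))  # None: no clarification needed
```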
    • 31. Hierarchies
      • Grouping related categories together, e.g.
        • Restaurant menus
        • Yellow Pages
        • File folders in hanging files in filing cabinets
        • Command menus in software applications
        • “Browse” trees on e-commerce websites
      • Especially handy for browsing if you’re not sure how to describe (or spell) what you want
    • 32. Hierarchies
      • Broader/narrower (parent/child) concepts
          • Instrumental music > Piano sonatas
          • Canada > British Columbia > Vancouver > Burnaby > Capitol Hill
        • Broader term should retrieve all its child terms
          • Birds = Sparrows + Penguins + Ostriches etc.
        • Or, if too many results, narrow search by selecting one or more child terms
        • Child term may have multiple parents (polyhierarchy)
        • In a book index, these are “main entries” and “subentries”
      • Controlled Vocabulary + Hierarchy = Taxonomy
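
The rule that a broader term should retrieve all its child terms can be sketched as a walk down the hierarchy before lookup. The terms “Flightless animals” and all item names below are invented to show polyhierarchy; they are not from the talk.

```python
# Hypothetical taxonomy: each term lists its narrower (child) terms.
NARROWER = {
    "Birds": ["Sparrows", "Penguins", "Ostriches"],
    "Flightless animals": ["Penguins", "Ostriches"],  # polyhierarchy: shared children
}

# Hypothetical items, each filed under its most specific keyword.
ITEMS = {
    "Birds": ["birdwatching-basics"],
    "Sparrows": ["sparrow-song"],
    "Penguins": ["antarctic-colony"],
    "Ostriches": ["ostrich-farm"],
}


def expand(term, seen=None):
    """Return the term plus all of its descendants, each listed once."""
    if seen is None:
        seen = []
    if term not in seen:
        seen.append(term)
        for child in NARROWER.get(term, []):
            expand(child, seen)
    return seen


def search(term):
    """Retrieve items tagged with the term or with any narrower term."""
    results = []
    for t in expand(term):
        for item in ITEMS.get(t, []):
            if item not in results:
                results.append(item)
    return results


print(search("Birds"))               # items for Birds and all three child terms
print(search("Flightless animals"))  # only the penguin and ostrich items
```

Searching a child term such as “Penguins” returns only its own items, which is the narrowing step the slide describes.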
    • 33. Hierarchies
      • Related concepts (“cousins”)
        • Not (usually) broader/narrower
        • In a book index, these are “See also”s
          • Parenting See also Child development
          • School supplies See also Office supplies
        • Searching one keyword should suggest the other keyword but not automatically retrieve its results
      • Taxonomy + Related concepts = Thesaurus
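
The difference between “retrieve” and “suggest” is easy to show in code: related terms are returned alongside the results, not merged into them. A minimal sketch with hypothetical item names, storing each “See also” in one direction only:

```python
# Hypothetical "See also" references between cousin concepts.
RELATED = {
    "Parenting": ["Child development"],
    "School supplies": ["Office supplies"],
}

# Hypothetical items filed under each keyword.
ITEMS = {
    "Parenting": ["bedtime-routines"],
    "Child development": ["language-milestones"],
}


def search(term):
    """Return the term's own items, plus related terms offered as suggestions."""
    return {
        "results": ITEMS.get(term, []),
        "see_also": RELATED.get(term, []),
    }


print(search("Parenting"))
```

The “Child development” item does not appear in the results; the searcher only sees the suggestion and decides whether to follow it.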
    • 34. Hierarchies
      • Scope note
        • Tells taggers and searchers where this concept stops and other concepts begin
        • Often differentiates related concepts
          • Medieval. Use for European history of the 5th through 15th centuries. For earlier periods, consider Classical antiquity. For later periods, consider Renaissance.
          • Office supplies. Use only for business contexts. For educational or home contexts, use School supplies.
        • Thesaurus + Scope notes + further rules for how to apply your keywords = Ontology
    • 35. Act III: Too much of a good thing?
    • 36. The Perils of Polylingual Hierarchy, Or, Will It Play in Paris?
      • “Grandchild” terms may not always fit the “grandparent” categories
      • Hierarchies are different in other languages
      • Two real-life examples
        • Part I: “Not the Comfy Chair!”
        • Part II: The Event of Dessert
    • 37. “Not the Comfy Chair!”
      • Furniture
        • Chairs
          • Dining chairs
          • Armchairs
      • But what about Dentist chairs? Electric chairs? Thrones? They’re chairs but not domestic Furniture
    • 38. “Not the Comfy Chair!”
      • In European languages, there is no general “Chair”
        • Only approximately “Comfy chair” and “Uncomfy chair”
        • “Comfy” may have arms, upholstery, be in living room
        • “Uncomfy” may be armless, unupholstered, in kitchen
        • But not always…
      • So is this chair comfy?
    • 39. The Event of Dessert
      • Wedding cake is not really a Dessert
      • Brownies may be Snacks
      • Ice cream cones are definitely Snacks
      • Civilised countries that have cake at Tea time prefer cheese or fruit for dessert
    • 40. The Event of Dessert
      • Foods
        • Dessert
          • Cake
            • Wedding cake?
          • Ice cream
            • Ice cream cones?
          • Brownies?
          • Cheese??
            • Quark???
    • 41. The Event of Dessert
      • Quark is a soft cheese
      • In Germany they eat Quark with Fruit for Dessert…
      • …but they also eat Quark on Toast for Breakfast.
    • 42. The Event of Dessert
      • Dessert moved from a Food to an Event
        • Marked by tableware (plate, fork)
      • The sweet Dessert foods moved to Foods > Sweets
      • Here we have Brownie as Snack…
      • …and here we have Brownie as Dessert
      • Dessert foods without tableware are Snacks
        • Snacks and Breakfast are also Events
    • 43. How deep should I get?
      • How far down the hierarchy should I go on making narrower terms?
      • In the InfoSci world, this is called Specificity
      • Keywords should be specific enough to
        • Make useful distinctions between items
        • Accurately describe each item
        • Not split hairs and make useless distinctions
      • Specificity increases Precision
    • 44. How deep should I get?
      • The level of specificity you need depends on
        • Subject domain (type of information)
          • More technical domains -> more specificity
        • Collection size (size matters again!)
          • As your collection grows -> more specific keywords
          • Otherwise, too many items share a particular keyword
            • Precision plummets
          • Often a problem with folksonomies, which tend to use a lot of broad keywords
    • 45. The more the merrier?
      • If a few keywords are good for findability, then a lot of keywords must be better, right?
      • In the InfoSci world, this is called Exhaustivity:
        • How many different keywords each item gets
      • Tip: when you’re stretching for keywords that aren’t much “about” this item, it’s time to stop plastering keywords
        • Otherwise, your Precision tanks
      • Again, often a problem with folksonomies
    • 46. Will my keywords play nicely with others?
      • Good “aboutness” keywords from a consistent taxonomy can integrate your content very well
      • This is the biggest reason to invest in a taxonomy
      • You can use your “aboutness” tags to automatically:
        • Retrieve relevant images to accompany text
        • Suppress inappropriate images
        • Suggest appropriate products to accompany an article
        • Offer highly-relevant related topics
        • Sensitively combine professional- with user-generated content
        • And that’s not all…
    • 47. Will my keywords play nicely with others?
      • You can use your “aboutness” tags to automatically:
        • Generate your website’s browse menus
        • Categorize search results into narrower categories
        • Increase users’ “personalization” experience, by offering them tags to
          • Identify their personal attributes in their website profiles
          • Subscribe to feeds of new content that will interest them
          • Suggest topics most relevant to them
        • Make your “tag cloud” retrieve more relevant results
        • Improve findability!
    • 48. What happens when worlds collide?
      • In the Portable Content world, collections of different content often merge
        • Acquiring new collections
        • Selling your collection to others
        • Continuously incorporating new content from suppliers or users
    • 49. What happens when worlds collide?
      • Keywords from different content collections often don’t merge well, because the collections differ in:
        • Uncontrolled vs. controlled vocabularies
          • Preferred term to use
          • Synonym control
          • Word variation control
        • Hierarchy construction
          • Decisions about what belongs with what
    • 50. What happens when worlds collide?
      • Keywords from different content collections often don’t merge well, because the collections differ in:
        • Specificity levels
          • Collection size
          • Subject area domain
        • Exhaustivity
          • Tagging quality standards
      • And when your collection grows, your own keywords will need to get more specific
    • 51. What’s a content producer to do?
      • Accept others’ keywords in special metadata tags
        • Review these and “map” them to your keywords
      • Understand that your keywords will need to change
      • Publish your keywords to your suppliers
      • Expose your keywords to your users as suggestions for their own tagging
    • 52. What’s a content producer to do?
      • Standardize on a publicly-available thesaurus, e.g.
        • Library of Congress subject headings
        • National Library of Medicine’s subject headings (MeSH)
        • Getty Art & Architecture Thesaurus
        • If you can find one that matches your content and meets your users’ search needs
          • Your competitors won’t sell you theirs
    • 53. Act IV: Can’t machines do this?
    • 54. If I can search the full text, why bother with keywords?
      • Full-text search generally can’t cope with
        • Synonyms
        • Homonyms
        • Trivial occurrences (low “aboutness”)
        • Inferences (high “aboutness” not explicit in the text), e.g.
          • Intended audience
          • Prerequisite knowledge
            • African rainforest animals (Which countries? Which animals?)
        • Non-textual content (images, audio, video, etc.)
      • Online Help indexes vs. full-text search
        • Indexes are selective
          • Like travel guide rather than phone book
        • In my usability tests, users got to the best answers faster via the index
    • 55. Can computers automatically generate keywords and tag content?
      • Automatic tagging uses linguistic analysis and mathematical rules to churn through text and try to
        • Comprehend all the “aboutnesses” and
        • Categorize the content
      • A huge, complex and difficult field of Information Science
        • Analysis rules are different for each subject area domain
        • There are no “magic” fits-all, out-of-the-box solutions
      • How do they do it? Here are a couple of methods
    • 56. Term Frequency-Inverse Document Frequency (TF-IDF)
      • Term occurs frequently within a document = more “aboutness”, right?
      • But if that term occurs in lots of your documents, it’s not a good discriminator for finding the most relevant documents; e.g.
        • “Software” may be
          • rare on a travel website, but
          • on nearly every page of a technical publisher’s website
      • TF-IDF is a statistical weighting method
      • Determines a term’s relative importance within a document and within the collection of documents
        • If term is frequent in doc AND rare in collection, then
        • High TF-IDF (high “aboutness” and good discriminator)
      • For each term, calculates numerical values (vector space)
      • Analyses compare these values to other documents’ values to retrieve similar documents
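
A small worked example may make the weighting concrete. The sketch below uses one common textbook formulation (term count divided by document length, times the natural log of collection size over the number of documents containing the term); real systems use many variants, and the three toy documents are invented for illustration.

```python
import math

# Hypothetical three-document collection, already split into lowercase words.
docs = {
    "ski-trip": "snow skiing snow resort travel".split(),
    "beach-trip": "beach travel resort sun".split(),
    "software-review": "software travel booking software".split(),
}


def tf(term, words):
    """Term frequency: how much of this document the term accounts for."""
    return words.count(term) / len(words)


def idf(term):
    """Inverse document frequency: higher when fewer documents contain the term."""
    containing = sum(1 for words in docs.values() if term in words)
    return math.log(len(docs) / containing) if containing else 0.0


def tf_idf(term, doc_id):
    return tf(term, docs[doc_id]) * idf(term)


for term in ("snow", "resort", "travel"):
    print(term, round(tf_idf(term, "ski-trip"), 3))
```

“Travel” appears in every document, so its weight is zero no matter how often one document uses it. “Snow” is frequent in the ski document and absent elsewhere, so it scores highest there.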
    • 57. Term Frequency-Inverse Document Frequency (TF-IDF)
      • Some drawbacks of TF-IDF
        • Needs lots of sophisticated programming to be smart about your particular content collection and subject area domain
          • Dictionaries of stop words, word variants, phrases, synonyms, etc.
        • Can miss a low-occurring term that is crucial to this document, e.g.
          • Article on baby-proofing your home
          • Mentions “danger,” “injuries,” and “stairs” only once, but they are the main point of the article
          • Does not mention “safety” or “falling” – important concepts to capture
          • Does not mention “toddlers” or “parenting” – the implied subject and audience of the article
          • A human indexer would immediately catch these
    • 58. Inductive Learning Algorithms
      • Humans teach computer what “aboutness” looks like
      • For each term, professional experts hand-tag a set of “good example” docs
      • Use mathematical linguistic analysis to compare new docs to “good example” docs for similarity and categorization
      • More expensive than straight TF-IDF, but
      • Better results
      • Good at capturing concrete concepts (“auto repair”)
      • Poor at capturing implied concepts (intended audience)
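
The “good example” approach can be illustrated with a deliberately simple stand-in: merge the hand-tagged examples for each keyword into a word-count profile, then assign a new document to the profile it most resembles by cosine similarity. This is only a sketch of the general idea, not the method of any particular product; the example texts and keywords are hypothetical.

```python
import math
from collections import Counter

# Hypothetical hand-tagged "good example" documents for two keywords.
EXAMPLES = {
    "auto repair": [
        "replace the brake pads and check the engine oil",
        "engine repair and tire rotation at the garage",
    ],
    "baking": [
        "mix flour sugar and butter then bake the cake",
        "knead the bread dough and bake in a hot oven",
    ],
}


def vector(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors (0 means no shared words)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# "Training": merge each keyword's examples into one word-count profile.
PROFILES = {
    keyword: sum((vector(text) for text in texts), Counter())
    for keyword, texts in EXAMPLES.items()
}


def categorize(text):
    """Assign the keyword whose example profile is most similar to the text."""
    v = vector(text)
    return max(PROFILES, key=lambda keyword: cosine(v, PROFILES[keyword]))


print(categorize("check the oil before engine repair"))
print(categorize("bake bread with flour and butter"))
```

A classifier like this only sees words that are actually on the page, which is consistent with the slide’s point that implied concepts such as intended audience are hard to capture.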
    • 59. Does clickstream = “aboutness”?
      • Are pages “about” the same thing if users:
        • Click through them in the same sequence?
          • “Customers who viewed this also viewed…”
        • Buy the same items on them?
          • “Customers who bought this also bought…”
        • Write reviews of them?
        • Add them to a “recommended” list?
        • Tag them with the same tag?
      • Let’s look at Amazon again…
        • How relevant is each of this item’s “related” items?
        • How relevant are the professional and user tags?
    • 60. Does clickstream = “aboutness”?
      • Info that users consciously connect is more likely to be related than passive clickstream trails
        • The more effort they put into categorization, the better the categorization is likely to be
      • But “Related to” ≠ “about” the same subject
      • Connections or tags very useful to one person ≠ tags useful to everyone; e.g. users’ personal tags like
        • “Me” or “Home” or “Cynthia” (recipient of this gift)
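
A “customers who viewed this also viewed” list is, at its simplest, a count of which pages turn up in the same sessions. The sketch below assumes three made-up sessions and is not a description of how Amazon or any other site computes its recommendations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical viewing sessions: each list holds the pages one user viewed.
sessions = [
    ["crib", "stroller", "baby-monitor"],
    ["crib", "stroller"],
    ["crib", "gift-wrap"],
]

# Count how often each pair of pages appears in the same session.
pair_counts = Counter()
for pages in sessions:
    for a, b in combinations(sorted(set(pages)), 2):
        pair_counts[(a, b)] += 1


def also_viewed(page, top=3):
    """Pages most often viewed in the same session as the given page."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == page:
            scores[b] += n
        elif b == page:
            scores[a] += n
    return [p for p, _ in scores.most_common(top)]


print(also_viewed("crib"))
```

The gift-wrap page shows up as “related” to the crib because one shopper viewed both, which illustrates the slide’s warning that “related to” is not the same as “about” the same subject.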
    • 61. Act V: What’s findability worth to you?
    • 62. Human effort is expensive
      • Effort “up front”: expense to producer
        • Creating and maintaining thesaurus
        • Tagging by trained staff
        • And/or designing (and redesigning) smart automatic categorization and retrieval systems
        • Better findability = happier customers
    • 63. Human effort is expensive
      • How much money are you willing to put into your content’s findability?
        • Not all content is equally worth the money
          • Professionally- vs. user-generated content
          • “Personalized” content vs. general content
          • New content vs. old
          • “Push” content you most want to sell
        • Be careful of “pushing” irrelevant content and
        • Losing your customers’ confidence
    • 64. Human effort is expensive
      • Effort “at the end”: expense to customers
        • Users do the tagging: e.g. folksonomies
        • Users slog through junk (low Precision)
        • Only works in contexts where users are motivated enough to volunteer their time and effort
      • Find your balance
    • 65. Epilogue: Q & A
      • Questions? Comments?
      • Further questions or comments? Want a copy of this presentation? Email [email_address]
      • Thank you!
