Making Search Better: Managing Relevancy

Published in: Technology
  • Copyright (2012) Miles Kehoe/New Idea Engineering Inc

    1. 1. Join us right after the event at Firehouse Grill for a free drink, kindly provided by AvePoint and Rackspace! 1765 East Bayshore Road, East Palo Alto, CA 94303 (next to Nordstrom Rack). Drinks to be provided by…
    2. 2. Miles Kehoe, New Idea Engineering, Inc., miles.kehoe@ideaeng.com, 408-446-3460 / 408-828-4592; Mark Bennett, New Idea Engineering, Inc., mbennett@ideaeng.com, 408-446-3460 / 408-829-6513; Chris Fernandez, New Idea Engineering, Inc., chris.fernandez@ideaeng.com, 408-446-3460 / 650-279-8343; Evan Sayer, New Idea Engineering, Inc., Evan.sayer@ideaeng.com, 408-446-3460 /
    3. 3. Agenda: Introductions/Objectives; Challenges of Enterprise Search; Improving Search Quality; Pressure Points of Search; Improving Relevancy; Key Take-Aways
    4. 4. New Idea Engineering Inc., founded in 1996 ◦ The Business & Technology of Search. Vendor-neutral search expertise: ◦ Commercial and Open Source Projects ◦ Evaluation, Selection, Implementation, Ongoing Mgmt. Search analysts, developers, consultants, authors: ◦ Blog: www.enterprisesearchblog.com ◦ Presentations at SPTechCon / ESS / KMWorld / … ◦ Authors of Professional Microsoft Search
    5. 5. …content, intent, relevancy
    6. 6. Different people have different intents. There is rarely a single ‘right answer’. Intents often require multiple iterations.
    7. 7. …the big picture
    8. 8. Some Assembly Expected – often Required. The Platform: Most search platforms are general purpose. Vendors expect you to customize the platform. The OOB platform works*. The Demo: Sales staff work from a script. Demos are scripted and use easy, clean data. “That’s the product I want!” A POC using your content is the best criterion. *“Any sufficiently advanced demo is indistinguishable from product”
    9. 9. User Requirements Drive Product Capability. Early search: exact match. Users pushed for (and got): Plurals/Stemming; Soundex; Document filters; Fielded search; Boolean operators & advanced query syntax; Synonyms and thesaurus support; Security; Facets; Navigation; …
    10. 10. But Search is Tough to Get Right. Search touches everything: all repositories/formats; structured and unstructured; has to respect security; expected to co-exist with other repositories; many ‘near duplicate’ documents; content changing in real time. And it looks so easy on the ’net & in the demo - and it has to have sub-second response!
    11. 11. Enterprise Search 101: A Primer
    12. 12. Simple Search Model. The search user sees search like this: query “health”; Search Kernel; Result List (Did you mean: Health Services; 1. Health and smoking; 2. High risk behaviors; …). As a search manager, you see the magic behind the curtain: Crawler/Indexer; Search Index.
    13. 13. Anatomy of a search engine. Inverted ‘Table Like’ Structure. Word List: apple, autonomy, cisco, digital, google, hewlett, intel, microsoft. Extracted fields (metadata) for documents in the Content Repository: New printer ships | Jan | 20K | PressRel.TXT; iPhone top hit | Dec | 12K | Music.PDF; 8 core chip | Apr | 128K | News.DOCX; Snow Leopard | Dec | 8K | Jdbc:12345; Win 8 Launch | Aug | 39K | Info.XPS
    14. 14. Site Design ◦ Good planning is a key step in search! Indexing ◦ Improve content prior to/during indexing. Query Pre-Processing ◦ Enhance the user query. Maximize Kernel Capability ◦ Improve content prior to indexing. Post Processing ◦ Enhance result set.
    15. 15. Site Design: Part 1 ◦ ◦ ◦ http://sales/eastern_region http://sales/western_region http://sales/international
    16. 16. Site Design: Part 2 ◦ ◦ ◦ ◦ ◦ ◦
    17. 17. The indexing process. Start with a ‘key’: a file name, URL, DB row. Process the document: fetch the document; identify the format; convert into text; recognize the language; apply indexed synonyms; extract entities. Feed the doc into the indexer.
    18. 18. Enhance the quality of the content ◦ Improve content prior to indexing ◦ Add synonym terms for indexing ◦ Provide/extract entities for facets ◦ Add content where it makes sense (Context of user and document) ◦ How? FS4SP pipeline; IDOL IDX; Exalead API… ◦ Remember: speed is of the essence!
    19. 19. Enhance the quality of the query ◦ Autocomplete ◦ Evaluate/parse the query ▪ Recognize any special format (name, product number) ▪ Inappropriate vocabulary? ◦ Expand query using platform query language ◦ Is it a best bet term? Think ‘Siri’
    20. 20. Note: Scriptaculous.js includes this style of Autocomplete code. Pingar also includes this in their commercial library of tools.
    21. 21. Use all that your platform supports: ◦ Spell check (custom dictionary) ◦ Thesaurus/Synonyms ◦ Stemming/Plurals ◦ Fielded search/relevance. Call out to likely federated sources.
    22. 22. Processing results before the user sees them ◦ Eliminate or collapse duplicate documents ◦ Insert best bets, spell suggests ◦ Don’t hesitate to insert your own smarts ◦ Assemble sources (tabular view, clusters) ◦ If it looks like: ▪ a person name, show contact info ▪ a part number, show a link to the product page ▪ a common term, offer to disambiguate
    23. 23. …the quality improvement cycle
    24. 24. Relevancy: The Process (cycle diagram): Identify the problem; Test Results; Diagnose the problem; Tune Docs/Search
    25. 25. Identifying Problem Queries ◦ ◦ ◦ ◦ ◦ ◦
    26. 26. Diagnosing the Problem ◦ ◦ ◦ ◦ ◦
    27. 27. Emulate Google.com; Simplify search form; Use the context; Regularly monitor and adjust search; Actively encourage user feedback; Encourage and recognize good metadata @ content creation
    28. 28. Add quality metadata tools. Wide range of tools. Costs vary based on solution ◦ Use ‘behavior-based’ metadata ◦ Use custom-created taxonomies ◦ Use automated tools
    29. 29. Some solutions from NIE Partners. Pingar: Automated taxonomy, entity extraction, summarization and more. WAND: Vertical market human-created taxonomies. Concept Searching: Seeded automatic taxonomy with human validation. Custom Taxonomies: Expert creates custom taxonomy for your company.
    30. 30. Key Take-Aways: Quantify the metric for your site(s); Encourage tagging at content creation; Keep search and results simple; Use the context (query cooking); Facilitate and take action on feedback; Ongoing monitoring and tuning is required
    31. 31. Miles Kehoe, New Idea Engineering, Inc., miles.kehoe@ideaeng.com, 408-446-3460 / 408-828-4592; Chris Fernandez, New Idea Engineering, Inc., chris.fernandez@ideaeng.com, 408-446-3460 / 650-279-8343; Mark Bennett, New Idea Engineering, Inc., mbennett@ideaeng.com, 408-446-3460 / 408-829-6513
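The inverted, table-like structure described on slide 13 can be illustrated with a minimal Python sketch. This is not how any particular search platform stores its index; the function names (`tokenize`, `build_inverted_index`, `search`) are hypothetical, and the sample documents simply reuse three titles from the slide as stand-in text.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each word to the set of document keys that contain it."""
    index = defaultdict(set)
    for key, text in docs.items():
        for token in tokenize(text):
            index[token].add(key)
    return index


def search(index, query):
    """Return the keys of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Stand-in content: titles taken from slide 13, keyed by file name.
docs = {
    "PressRel.TXT": "New printer ships",
    "Music.PDF": "iPhone top hit",
    "News.DOCX": "8 core chip",
}
index = build_inverted_index(docs)
print(search(index, "core chip"))
```

A real engine would also store positions, field names and weights alongside each posting; the sketch keeps only the word-to-document mapping.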

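The query pre-processing ideas on slide 19 (recognize a special format, expand the query, check for a best bet) might look roughly like the following Python sketch. Everything specific here is an assumption made for illustration: the part-number pattern, the synonym table, the best-bet table and its URL are all hypothetical, and a real deployment would express the expansion in the platform's own query language.

```python
import re

# Hypothetical part-number format: two letters, a dash, four digits.
PART_NUMBER = re.compile(r"^[A-Z]{2}-\d{4}$")

# Hypothetical thesaurus and best-bet tables.
SYNONYMS = {"laptop": ["notebook"], "vacation": ["pto", "leave"]}
BEST_BETS = {"vacation": "http://hr/vacation_policy"}


def preprocess(query):
    """Classify the query and expand each keyword with its synonyms."""
    q = query.strip()
    plan = {"original": q, "kind": "keyword", "terms": [], "best_bet": None}

    # Special format: route part numbers to an exact lookup, no expansion.
    if PART_NUMBER.match(q.upper()):
        plan["kind"] = "part_number"
        plan["terms"] = [[q.upper()]]
        return plan

    # Each inner list is an OR-group: the word itself plus its synonyms.
    for word in q.lower().split():
        plan["terms"].append([word] + SYNONYMS.get(word, []))

    # A best bet applies only when the whole query matches a known term.
    plan["best_bet"] = BEST_BETS.get(q.lower())
    return plan


print(preprocess("laptop bag"))
```

Returning a plan rather than a query string keeps the sketch independent of any one engine's syntax.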
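Two of the post-processing steps from slide 22, collapsing duplicate documents and inserting a best bet, can be sketched in Python as below. The result shape (dicts with `url` and `body` keys) and the function names are assumptions for illustration. Note that the fingerprint only catches documents that are identical after case and whitespace normalization; true near-duplicate detection would need a fuzzier technique such as shingling.

```python
import hashlib


def fingerprint(text):
    """Hash the text after normalizing case and whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def collapse_duplicates(results):
    """Keep the first hit per fingerprint and record later copies under it."""
    seen = {}
    collapsed = []
    for hit in results:
        fp = fingerprint(hit["body"])
        if fp in seen:
            seen[fp]["duplicates"].append(hit["url"])
        else:
            entry = dict(hit, duplicates=[])
            seen[fp] = entry
            collapsed.append(entry)
    return collapsed


def insert_best_bet(results, best_bet):
    """Put the best bet first and drop any organic hit with the same URL."""
    if best_bet is None:
        return results
    rest = [r for r in results if r["url"] != best_bet["url"]]
    return [best_bet] + rest


hits = [
    {"url": "a", "body": "Health  and Smoking"},
    {"url": "b", "body": "health and smoking"},
    {"url": "c", "body": "High risk behaviors"},
]
collapsed = collapse_duplicates(hits)
final = insert_best_bet(collapsed, {"url": "c", "body": "High risk behaviors"})
print([r["url"] for r in final])
```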