Copyright (2012) Miles Kehoe/New Idea Engineering, Inc.
1. Join us right after the event at Firehouse Grill for a free drink, kindly provided by AvePoint and Rackspace! 1765 East Bayshore Road, East Palo Alto, CA 94303 (next to Nordstrom Rack). Drinks to be provided by…..
2. New Idea Engineering, Inc. (408-446-3460)
   Miles Kehoe: 408-828-4592
   Mark Bennett: 408-829-6513
   Chris Fernandez: 650-279-8343
   Evan Sayer
3. Agenda
   Introductions/Objectives
   Challenges of Enterprise Search
   Improving Search Quality
   Pressure Points of Search
   Improving Relevancy
   Key Take-Aways
4. New Idea Engineering, Inc., founded in 1996
      ◦ The Business & Technology of Search
   Vendor-neutral search expertise:
      ◦ Commercial and Open Source Projects
      ◦ Evaluation, Selection, Implementation, Ongoing Management
   Search analysts, developers, consultants, authors:
      ◦ Blog: www.enterprisesearchblog.com
      ◦ Presentations at SPTechCon / ESS / KMWorld / …
      ◦ Authors of Professional Microsoft Search
5. …content, intent, relevancy
6. Different people have different intents. There is rarely a single ‘right answer’. Intents often require multiple iterations.
7. …the big picture
8. Some Assembly Expected – often Required
   The Platform
      ◦ Most search platforms are general purpose
      ◦ Vendors expect you to customize the platform
      ◦ The OOB platform works*
   The Demo
      ◦ Sales staff work from a script
      ◦ Demos are scripted and use easy, clean data
      ◦ “That’s the product I want!”
      ◦ A POC using your content is the best criterion
   *“Any sufficiently advanced demo is indistinguishable from product”
10. But Search is Tough to get Right
   Search touches everything:
      ◦ All repositories/formats
      ◦ Structured and unstructured
      ◦ Has to respect security
      ◦ Expected to co-exist with other repositories
      ◦ Many ‘near duplicate’ documents
      ◦ Content changing in real time
   And it looks so easy on the ’net and in the demo, and it has to have sub-second response!
11. Enterprise Search 101: A Primer
12. Simple Search Model
   The search user sees search like this:
      ◦ Query: health
      ◦ Search Kernel
      ◦ Result List: Did you mean Health Services; 1. Health and smoking; 2. High risk behaviors; …
   As a search manager, you see the magic behind the curtain:
      ◦ Crawler/Indexer
      ◦ Search Index
13. Anatomy of a search engine
   Inverted ‘table-like’ structure; word list for extracted fields (metadata)
   Word list: apple, autonomy, cisco, digital, google, hewlett, intel, microsoft
   Content repository rows:
      ◦ New printer ships, Jan, 20K, PressRel.TXT
      ◦ iPhone top hit, Dec, 12K, Music.PDF
      ◦ 8 core chip, Apr, 128K, News.DOCX
      ◦ Snow Leopard, Dec, 8K, Jdbc:12345
      ◦ Win 8 Launch, Aug, 39K, Info.XPS
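As a rough illustration of the inverted structure on this slide, here is a minimal Python sketch. The document keys and titles are borrowed from the slide's sample rows; the `build_inverted_index` helper and the whitespace tokenizer are assumptions made for illustration, not how any particular engine builds its index.

```python
from collections import defaultdict

# Sample rows borrowed from the slide: document key -> title text.
DOCS = {
    "PressRel.TXT": "New printer ships",
    "Music.PDF": "iPhone top hit",
    "News.DOCX": "8 core chip",
}

def build_inverted_index(docs):
    """Map each lowercased word to the sorted list of documents containing it."""
    index = defaultdict(set)
    for key, text in docs.items():
        for word in text.lower().split():
            index[word].add(key)
    return {word: sorted(keys) for word, keys in index.items()}

index = build_inverted_index(DOCS)
print(index["chip"])  # documents whose title contains "chip"
```

A query then becomes a lookup by word rather than a scan of every document, which is what makes sub-second response feasible.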
14. Site Design
      ◦ Good planning is a key step in search!
   Indexing
      ◦ Improve content prior to/during indexing
   Query Pre-Processing
      ◦ Enhance the user query
   Maximize Kernel Capability
      ◦ Improve content prior to indexing
   Post Processing
      ◦ Enhance result set
15. Site Design: Part 1
   ◦ http://sales/eastern_region
   ◦ http://sales/western_region
   ◦ http://sales/international
16. Site Design: Part 2
17. The indexing process
   Start with a ‘key’: a file name, URL, or DB row
   Process the document:
      ◦ fetch the document
      ◦ identify the format
      ◦ convert into text
      ◦ recognize the language
      ◦ apply indexed synonyms
      ◦ extract entities
   Feed the doc into the indexer
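The indexing steps above can be sketched as a chain of small stages, each taking and returning a document record. This is only a sketch under stated assumptions: the in-memory `REPOSITORY`, the stage functions, and the extension-based format check are all hypothetical stand-ins, and only the first three processing steps are shown.

```python
# Hypothetical in-memory "repository": key -> raw bytes.
REPOSITORY = {"PressRel.TXT": b"New  printer\nships in January"}

def fetch(doc):
    """Load the raw bytes for the document's key."""
    doc["raw"] = REPOSITORY[doc["key"]]
    return doc

def identify_format(doc):
    """Naive stand-in: trust the file extension."""
    key = doc["key"]
    doc["format"] = key.rsplit(".", 1)[-1].lower() if "." in key else "unknown"
    return doc

def convert_to_text(doc):
    """Only plain text is handled here; whitespace is normalized."""
    doc["text"] = " ".join(doc["raw"].decode("utf-8", errors="replace").split())
    return doc

STAGES = [fetch, identify_format, convert_to_text]

def process(key, stages=STAGES):
    """Start from a key and run every stage in order."""
    doc = {"key": key}
    for stage in stages:
        doc = stage(doc)
    return doc

doc = process("PressRel.TXT")
print(doc["format"], "|", doc["text"])
```

Language recognition, synonym expansion, and entity extraction would slot in as further entries in `STAGES` before the record is handed to the indexer.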
18. Enhance the quality of the content
   ◦ Improve content prior to indexing
   ◦ Add synonym terms for indexing
   ◦ Provide/extract entities for facets
   ◦ Add content where it makes sense (context of user and document)
   ◦ How? FS4SP pipeline; IDOL IDX; Exalead API…
   ◦ Remember: speed is of the essence!
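A minimal sketch of index-time enrichment, assuming a tiny synonym table and a made-up part-number pattern. None of this reflects the FS4SP, IDOL, or Exalead APIs; the `enrich` function, `SYNONYMS`, and `PART_NUMBER` are hypothetical names used only to show the idea of adding synonym terms and facet values before a document is indexed.

```python
import re

# Hypothetical synonym table and part-number pattern; both are illustrative only.
SYNONYMS = {"laptop": ["notebook"], "hr": ["human resources"]}
PART_NUMBER = re.compile(r"\b[A-Z]{2}-\d{4}\b")

def enrich(doc):
    """Return a copy of doc with index-time synonyms and extracted entities added."""
    words = re.findall(r"[a-z0-9]+", doc["text"].lower())
    synonyms = sorted({syn for w in words for syn in SYNONYMS.get(w, [])})
    parts = sorted(set(PART_NUMBER.findall(doc["text"])))
    return {**doc, "synonyms": synonyms, "facets": {"part_number": parts}}

enriched = enrich({"key": "spec.txt",
                   "text": "The laptop dock AB-1234 ships with HR approval."})
print(enriched["synonyms"], enriched["facets"])
```

Because this work runs once per document at index time, it should stay cheap; a dictionary lookup and a single regular-expression pass are in keeping with the "speed is of the essence" bullet.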
19. Enhance the quality of the query
   ◦ Autocomplete
   ◦ Evaluate/parse the query: recognize any special format (name, product number); inappropriate vocabulary?
   ◦ Expand query using platform query language
   ◦ Is it a best bet term? Think ‘Siri’
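One way to picture query pre-processing is a small classifier that runs before the query reaches the search kernel. The sketch below is an assumption-laden illustration: the patterns, the `BEST_BETS` table, its URL, and the `classify_query` function are all invented for this example.

```python
import re

# Hypothetical patterns and best-bet table, for illustration only.
PART_NUMBER = re.compile(r"^[A-Z]{2}-\d{4}$", re.IGNORECASE)
PERSON_NAME = re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+$")
BEST_BETS = {"holidays": "http://intranet/hr/holidays"}  # made-up URL

def classify_query(query):
    """Decide how a raw query should be routed before it is searched."""
    q = query.strip()
    if PART_NUMBER.match(q):
        return {"type": "part_number", "query": q.upper()}
    if q.lower() in BEST_BETS:
        return {"type": "best_bet", "query": q, "url": BEST_BETS[q.lower()]}
    if PERSON_NAME.match(q):
        return {"type": "person", "query": q}
    return {"type": "keyword", "query": q}

for sample in ["ab-1234", "Holidays", "Miles Kehoe", "health"]:
    print(sample, "->", classify_query(sample)["type"])
```

The returned type can then drive a fielded search, a best-bet insert, or a plain keyword query written in the platform's own query language.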
20. Note: Scriptaculous.js includes this style of Autocomplete code. Pingar also includes this in their commercial library of tools.
21. Use all that your platform supports:
      ◦ Spell check (custom dictionary)
      ◦ Thesaurus/Synonyms
      ◦ Stemming/Plurals
      ◦ Fielded search/relevance
   Call out to likely federated sources
22. Processing results before the user sees them
   ◦ Eliminate or collapse duplicate documents
   ◦ Insert best bets, spelling suggestions
   ◦ Don’t hesitate to insert your own smarts
   ◦ Assemble sources (tabular view, clusters)
   ◦ If it looks like: a person name, show contact info; a part number, show a link to the product page; a common term, offer to disambiguate
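The first two post-processing bullets can be sketched in a few lines. This is a simplified illustration, assuming duplicates can be detected by a normalized title; real near-duplicate detection is usually more involved. The function names, sample hits, and URLs are hypothetical.

```python
def collapse_duplicates(results):
    """Keep the first (highest-ranked) hit for each normalized title."""
    seen, kept = set(), []
    for hit in results:
        fingerprint = " ".join(hit["title"].lower().split())
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept.append(hit)
    return kept

def insert_best_bet(results, best_bet):
    """Put an editorially chosen hit first, if one exists for the query."""
    return ([best_bet] if best_bet else []) + results

# Made-up result list with one near-duplicate title.
hits = [
    {"title": "Health Services", "url": "http://intranet/a"},
    {"title": "health  services", "url": "http://intranet/b"},
    {"title": "Health and smoking", "url": "http://intranet/c"},
]
best_bet = {"title": "Benefits portal", "url": "http://intranet/benefits"}
page = insert_best_bet(collapse_duplicates(hits), best_bet)
print([hit["title"] for hit in page])
```

Collapsing before inserting the best bet keeps the editorial hit from being dropped if its title happens to match an organic result.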
23. …the quality improvement cycle
24. Relevancy: The Process
   ◦ Identify the problem
   ◦ Test Results
   ◦ Diagnose the problem
   ◦ Tune Docs/Search
25. Identifying Problem Queries
26. Diagnosing the Problem
27. Emulate Google.com
   Simplify search form
   Use the context
   Regularly monitor and adjust search
   Actively encourage user feedback
   Encourage and recognize good metadata @ content creation
28. Add quality metadata tools
   Wide range of tools
   Costs vary based on solution
      ◦ Use ‘behavior-based’ metadata
      ◦ Use custom-created taxonomies
      ◦ Use automated tools
29. Some solutions from NIE Partners
   Pingar: Automated taxonomy, entity extraction, summarization and more
   WAND: Vertical market human-created taxonomies
   Concept Searching: Seeded automatic taxonomy with human validation
   Custom Taxonomies: Expert creates custom taxonomy for your company
30. Key Take-Aways
   Quantify the metric for your site(s)
   Encourage tagging at content creation
   Keep search and results simple
   Use the context (query cooking)
   Facilitate and take action on feedback
   Ongoing monitoring and tuning is required
31. New Idea Engineering, Inc. (408-446-3460)
   Miles Kehoe: 408-828-4592
   Chris Fernandez: 650-279-8343
   Mark Bennett: 408-829-6513