Designing and Implementing Search Solutions

Presentation at Gothenburg University, the 8th of March 2012 by Svetoslav Marinov, Tobias Berg and Björn Klockljung-Johansson.

  1. Implementing and designing search solutions. Gothenburg University, Gothenburg, 2012-03-08. © Findwise 2012
  2. Agenda • Introduction to Findwise • Technical approach • DIY UX design • Research
  3. About Findwise • Founded in 2005 • Offices in Sweden, Denmark, Norway and Poland • 72 employees (February 2012) • Our objective is to be a leading provider of Findability solutions, utilising the full potential of search technology to create customer business value
  4. Technology independent • Creating search-driven Findability solutions based on market-leading commercial and open source search technology platforms: Autonomy IDOL • Microsoft (SharePoint and FAST Search products) • Google GSA • IBM ICA/OmniFind • LucidWorks • Apache Lucene/Solr (open source) • and more…
  5. Findability challenges • Employee productivity (DN article, March 2011): “The effort to find the right information costs an average company 80,000 SEK per employee and year” • Customer service quality and efficiency (Accenture report, March 2011): “69% of agents don’t have answers to help service customers” • E-commerce conversion rate (Google survey, December 2010): “77% of those surveyed used search within an e-commerce website to find products”
  6. Information overload?
  7. A search engine alone is not enough
  8. Technical approach
  9. RE-USE
  10. STANDARD
  11. Standard architecture
  12. Search core
  13. Search core – overview (diagram: documents pass through tokenization, stemming and stop-word removal into an inverted index)
      Document 1 – Title: Brown fox; Content: The quick brown fox jumps over the lazy dog; Author: Tobias Berg
      Document 2 – Title: My dog; Content: My old dog cannot jump anymore; Author: Svetoslav Marinov
      Inverted index (term → documents): fox → 1; jump → 1, 2; lazy → 1; dog → 1, 2; tobias → 1; berg → 1; …
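The inverted index in the diagram can be reproduced with a short Java sketch. It assumes a hypothetical five-word stop-word list and a toy suffix-stripping rule in place of a real stemmer such as Porter or Snowball, and it indexes every field of a document into one term → document-ids map. Class and method names are illustrative, not the API of any search engine.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Hypothetical, deliberately tiny stop-word list (illustration only).
    static final Set<String> STOP_WORDS = Set.of("the", "over", "my", "a", "of");

    // Toy suffix-stripping rule, NOT a real stemmer such as Porter:
    // drop a trailing "s" unless it follows a vowel or another "s" ("jumps" -> "jump", "tobias" unchanged).
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s")
                && "aeious".indexOf(token.charAt(token.length() - 2)) < 0) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Tokenization (split on non-letters/digits), lowercasing, stop-word removal, stemming.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                terms.add(stem(raw));
            }
        }
        return terms;
    }

    // Builds term -> sorted document ids from all fields of every document.
    static Map<String, SortedSet<Integer>> build(Map<Integer, List<String>> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> doc : docs.entrySet()) {
            for (String fieldValue : doc.getValue()) {
                for (String term : analyze(fieldValue)) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    // The two documents from the slide: title, content, author.
    static Map<Integer, List<String>> sampleDocs() {
        Map<Integer, List<String>> docs = new LinkedHashMap<>();
        docs.put(1, List.of("Brown fox", "The quick brown fox jumps over the lazy dog", "Tobias Berg"));
        docs.put(2, List.of("My dog", "My old dog cannot jump anymore", "Svetoslav Marinov"));
        return docs;
    }

    public static void main(String[] args) {
        build(sampleDocs()).forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```

Running it yields the postings shown on the slide (fox → [1], jump → [1, 2], dog → [1, 2], tobias → [1], …) plus entries for the terms the slide elides, such as quick, old and marinov.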
  14. Relevancy (diagram: the set of retrieved documents overlapping the set of relevant documents) • Precision – what fraction of the retrieved documents are relevant? • Recall – what fraction of the relevant documents were retrieved?
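Both measures are simple set ratios. The sketch below computes them for a hypothetical result list and hypothetical relevance judgements; the document ids are made up for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class PrecisionRecall {
    // Number of documents that are both retrieved and relevant.
    static int overlap(Set<String> retrieved, Set<String> relevant) {
        Set<String> both = new HashSet<>(retrieved);
        both.retainAll(relevant);
        return both.size();
    }

    // precision = |retrieved ∩ relevant| / |retrieved| (returns 0 here when nothing is retrieved)
    static double precision(Set<String> retrieved, Set<String> relevant) {
        return retrieved.isEmpty() ? 0.0 : (double) overlap(retrieved, relevant) / retrieved.size();
    }

    // recall = |retrieved ∩ relevant| / |relevant| (returns 0 here when nothing is relevant)
    static double recall(Set<String> retrieved, Set<String> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) overlap(retrieved, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        Set<String> retrieved = Set.of("d1", "d2", "d3", "d4"); // hypothetical result list
        Set<String> relevant = Set.of("d1", "d2", "d5");        // hypothetical relevance judgements
        System.out.printf(Locale.ROOT, "precision=%.2f recall=%.2f%n",
                precision(retrieved, relevant), recall(retrieved, relevant));
        // prints "precision=0.50 recall=0.67"
    }
}
```

Two of the four retrieved documents are relevant (precision 0.50), and two of the three relevant documents were retrieved (recall 0.67).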
  15. Relevancy • Goal: improve precision without sacrificing recall • Recall – find everything related to the query: lemmatization, synonyms, wildcards, anti-phrasing, or-operator • Precision – find only entities related to the query: exact word matching, exact phrase matching, and-operator
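The and-operator and or-operator map directly onto set operations over the postings lists of an inverted index: AND intersects them, OR unions them. The sketch reuses the postings from the slide 13 diagram; the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class BooleanQuerySketch {
    static SortedSet<Integer> ids(Integer... ids) {
        return new TreeSet<>(Arrays.asList(ids));
    }

    // Postings copied from the inverted index diagram on slide 13.
    static Map<String, SortedSet<Integer>> sampleIndex() {
        return Map.of("fox", ids(1), "jump", ids(1, 2), "lazy", ids(1), "dog", ids(1, 2),
                "tobias", ids(1), "berg", ids(1));
    }

    // AND: intersect postings, so every term must match (favours precision).
    static SortedSet<Integer> and(Map<String, SortedSet<Integer>> index, List<String> terms) {
        if (terms.isEmpty()) {
            return new TreeSet<>();
        }
        SortedSet<Integer> result = new TreeSet<>(index.getOrDefault(terms.get(0), ids()));
        for (String term : terms.subList(1, terms.size())) {
            result.retainAll(index.getOrDefault(term, ids()));
        }
        return result;
    }

    // OR: union postings, so any term may match (favours recall).
    static SortedSet<Integer> or(Map<String, SortedSet<Integer>> index, List<String> terms) {
        SortedSet<Integer> result = new TreeSet<>();
        for (String term : terms) {
            result.addAll(index.getOrDefault(term, ids()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<Integer>> index = sampleIndex();
        List<String> query = List.of("lazy", "dog");
        System.out.println("AND " + and(index, query)); // AND [1]
        System.out.println("OR  " + or(index, query));  // OR  [1, 2]
    }
}
```

For the query lazy dog, AND returns only document 1 while OR also returns document 2, which is why the or-operator tends to favour recall and the and-operator precision.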
  16. Search core – relevance score • TF/IDF • Field length • Field weight (Title ×2, Author ×4, Content ×1) • Freshness • …
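The factors on this slide can be combined in many ways. The sketch below shows one simple combination for illustration, not the formula of any particular engine. For each query term and field it multiplies the field weight from the slide (Title ×2, Author ×4, Content ×1), the term frequency, a smoothed IDF of ln(1 + N/df) and a length norm of 1/√(field length). Freshness is left out, and tokenization here is plain lowercasing without stemming.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FieldWeightedScoring {
    // Field weights from the slide: Title *2, Author *4, Content *1.
    static final Map<String, Double> FIELD_WEIGHTS = Map.of("title", 2.0, "author", 4.0, "content", 1.0);

    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    static Map<String, String> doc(String title, String content, String author) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("title", title);
        d.put("content", content);
        d.put("author", author);
        return d;
    }

    static List<Map<String, String>> sampleDocs() {
        return List.of(
                doc("Brown fox", "The quick brown fox jumps over the lazy dog", "Tobias Berg"),
                doc("My dog", "My old dog cannot jump anymore", "Svetoslav Marinov"));
    }

    // Document frequency: number of documents containing the term in any field.
    static int docFreq(List<Map<String, String>> docs, String term) {
        int df = 0;
        for (Map<String, String> d : docs) {
            if (d.values().stream().anyMatch(v -> tokens(v).contains(term))) {
                df++;
            }
        }
        return df;
    }

    // Sum over query terms and fields of: fieldWeight * tf * idf * lengthNorm.
    static double score(List<Map<String, String>> docs, Map<String, String> d, List<String> queryTerms) {
        double total = 0.0;
        for (String term : queryTerms) {
            int df = docFreq(docs, term);
            if (df == 0) {
                continue;
            }
            // Smoothed IDF (one common variant): stays positive even when every document has the term.
            double idf = Math.log(1.0 + (double) docs.size() / df);
            for (Map.Entry<String, String> field : d.entrySet()) {
                List<String> toks = tokens(field.getValue());
                if (toks.isEmpty()) {
                    continue;
                }
                long tf = toks.stream().filter(term::equals).count();
                // Length norm: a match in a short field counts more than one in a long field.
                double lengthNorm = 1.0 / Math.sqrt(toks.size());
                total += FIELD_WEIGHTS.getOrDefault(field.getKey(), 1.0) * tf * idf * lengthNorm;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = sampleDocs();
        for (int i = 0; i < docs.size(); i++) {
            System.out.printf(Locale.ROOT, "doc %d: %.3f%n", i + 1, score(docs, docs.get(i), List.of("dog")));
        }
    }
}
```

For the query dog, document 2 scores higher than document 1 because the term appears in its short, double-weighted title, while document 1 only mentions it once in a nine-word content field.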
  17. Search core (diagram: {query} → find matching documents → score documents → {result}) • Optimized for full-text search • Sub-second responses • Tunable relevance • Scalable • Configurable & extendable
  18. Standard architecture
  19. Connectors
  20. Connectors – fetch data (diagram: each database is read by a database connector)
      Product table: Id | Product name | Description | Price
        1 | Wheel | Makes the bus go round round round | 45
        2 | Window | A shield of glass | 12
      Book table: Id | Book name | Abstract | Author
        1 | Ulysses | Irish novel | James Joyce
        2 | Crime and Punishment | Russian novel | Dostoevsky, Fyodor
  21. Connector framework – code example:
        public void execute() {
            // Insert code to fetch content
        }
        public void interrupt() {
            // Insert code to handle interrupt signal
        }
        public void init() {
            // Insert code to initialize connector
        }
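A runnable version of the three connector methods might look like the following. The `Connector` interface, the `ProductTableConnector` class and the in-memory rows (taken from the product table on slide 20) are assumptions for illustration. A real connector would query the database in `execute()` and hand documents to the pipeline instead of a `Consumer`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ConnectorSketch {
    // Hypothetical interface mirroring the three methods on the slide.
    public interface Connector {
        void init();
        void execute();
        void interrupt();
    }

    // Hypothetical connector over the product table from slide 20, held in memory instead of a database.
    public static class ProductTableConnector implements Connector {
        private final Consumer<Map<String, String>> sink; // stand-in for "send to pipeline/index"
        private volatile boolean interrupted;
        private List<String[]> rows = List.of();

        public ProductTableConnector(Consumer<Map<String, String>> sink) {
            this.sink = sink;
        }

        @Override
        public void init() {
            // A real connector would open a database connection and prepare a query here.
            rows = List.of(
                    new String[] {"1", "Wheel", "Makes the bus go round round round", "45"},
                    new String[] {"2", "Window", "A shield of glass", "12"});
            interrupted = false;
        }

        @Override
        public void execute() {
            for (String[] row : rows) {
                if (interrupted) {
                    return; // stop between documents once an interrupt has been requested
                }
                // Map one table row to one search document (field names are illustrative).
                Map<String, String> doc = new LinkedHashMap<>();
                doc.put("id", row[0]);
                doc.put("name", row[1]);
                doc.put("description", row[2]);
                doc.put("price", row[3]);
                sink.accept(doc);
            }
        }

        @Override
        public void interrupt() {
            interrupted = true;
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> fetched = new ArrayList<>();
        Connector connector = new ProductTableConnector(fetched::add);
        connector.init();
        connector.execute();
        fetched.forEach(System.out::println);
    }
}
```

The `volatile` flag lets `interrupt()` be called from another thread, for example a scheduler stopping a long crawl, and makes `execute()` stop between documents.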
  22. Connector frameworks • Existing connectors • Re-usable • Configuration interfaces • Standardized implementation • Examples: http://incubator.apache.org/connectors/, http://code.google.com/p/google-enterprise-connector-manager/
  23. Standard architecture
  24. Pipeline
  25. Pipeline – overview • PDF/Office → text • Lemmatization • Language identification • NER (named entity recognition) • Phonetic search • Keyword extraction • External calls • …
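Phonetic search usually means indexing a phonetic key next to the original term, so that names that sound alike match each other. The sketch below computes one classic key, American Soundex. The choice of Soundex is an assumption (the slide does not name an algorithm), and this version only handles the ASCII letters A–Z, so characters such as ö are dropped.

```java
import java.util.Locale;

public class SoundexSketch {
    // American Soundex digit for a letter; '0' means "not coded" (vowels, Y, H, W).
    static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V':
                return '1';
            case 'C': case 'G': case 'J': case 'K': case 'Q': case 'S': case 'X': case 'Z':
                return '2';
            case 'D': case 'T':
                return '3';
            case 'L':
                return '4';
            case 'M': case 'N':
                return '5';
            case 'R':
                return '6';
            default:
                return '0';
        }
    }

    // First letter plus up to three digits, padded with zeros.
    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char last = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char digit = code(c);
            // Skip uncoded letters and repeats of the previous code.
            if (digit != '0' && digit != last) {
                out.append(digit);
            }
            // H and W do not separate letters with the same code; vowels do.
            if (c != 'H' && c != 'W') {
                last = digit;
            }
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert") + " " + soundex("Rupert"));    // R163 R163
        System.out.println(soundex("Marinov") + " " + soundex("Marinoff")); // M651 M651
    }
}
```

A pipeline stage could store this key in a separate field and match query terms against it, so a search for Marinoff also finds Marinov.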
  26. Pipeline framework – code example:
        protected void addAction(Document doc) throws PipelineException {
            // Insert code
            doc.addField("Title", "Hello world!");
        }
        protected void updateAction(Document doc) throws PipelineException {
            // Insert code
            addAction(doc);
        }
        protected void deleteAction(Document doc) throws PipelineException {
            // Insert code
        }
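For the stage code on this slide to compile, the framework types it relies on have to exist. The sketch below invents minimal stand-ins (`Document`, `PipelineException`, `Stage`, `Action`). They are hypothetical and are not the API of Findwise Hydra or any other pipeline framework. The framework-side `process()` method dispatches each incoming action to the matching hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PipelineStageSketch {
    // Hypothetical exception type named after the one on the slide.
    public static class PipelineException extends Exception {
        public PipelineException(String message) {
            super(message);
        }
    }

    // Hypothetical minimal document: an ordered map of field name -> value.
    public static class Document {
        private final Map<String, String> fields = new LinkedHashMap<>();

        public void addField(String name, String value) {
            fields.put(name, value);
        }

        public String getField(String name) {
            return fields.get(name);
        }
    }

    public enum Action { ADD, UPDATE, DELETE }

    // Hypothetical base class: the framework calls process(), subclasses implement the three hooks.
    public abstract static class Stage {
        public final void process(Action action, Document doc) throws PipelineException {
            switch (action) {
                case ADD: addAction(doc); break;
                case UPDATE: updateAction(doc); break;
                case DELETE: deleteAction(doc); break;
                default: throw new PipelineException("Unknown action: " + action);
            }
        }

        protected abstract void addAction(Document doc) throws PipelineException;
        protected abstract void updateAction(Document doc) throws PipelineException;
        protected abstract void deleteAction(Document doc) throws PipelineException;
    }

    // The stage from the slide: adds a Title field and treats updates like adds.
    public static class HelloWorldStage extends Stage {
        @Override
        protected void addAction(Document doc) {
            doc.addField("Title", "Hello world!");
        }

        @Override
        protected void updateAction(Document doc) {
            addAction(doc);
        }

        @Override
        protected void deleteAction(Document doc) {
            // Nothing to enrich when a document is being deleted.
        }
    }

    public static void main(String[] args) throws PipelineException {
        Document doc = new Document();
        new HelloWorldStage().process(Action.ADD, doc);
        System.out.println(doc.getField("Title")); // Hello world!
    }
}
```

Keeping the dispatch in a `final` method of the base class means each stage only decides what to do per action, while the framework owns the control flow.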
  27. NLP tools and approaches • Open source: GATE, OpenNLP, UIMA, Stanford NLP, Mallet, Apache Mahout • Proprietary: IBM LanguageWare • Own components, e.g. KeywordExtraction Service, LanguageIdentify • POS taggers: Hunpos, OpenNLP, Mallet • Dependency parsers: MaltParser, Stanford Parser • NER: rule-based + statistical models • Document summarization • Document clustering
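Language identification can be roughly approximated by counting how many stop words from each candidate language appear in a text. The sketch below is a deliberately simple stand-in, not the LanguageIdentify component: the ten-word English and Swedish stop-word lists are assumptions, and production identifiers more commonly score character n-grams, which also work on short texts with few stop words.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class StopWordLanguageId {
    // Hypothetical ten-word stop-word samples per language (insertion order = tie-break order).
    static final Map<String, Set<String>> STOP_WORDS = new LinkedHashMap<>();

    static {
        STOP_WORDS.put("en", Set.of("the", "and", "of", "to", "is", "in", "that", "it", "for", "with"));
        STOP_WORDS.put("sv", Set.of("och", "att", "det", "som", "en", "är", "på", "i", "för", "med"));
    }

    // Returns the language whose stop words occur most often, or "unknown" if none occur.
    static String identify(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> language : STOP_WORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (language.getValue().contains(token)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                best = language.getKey();
                bestHits = hits;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(identify("The effort to find the right information is costly"));  // en
        System.out.println(identify("Det är svårt att hitta rätt information i företaget")); // sv
    }
}
```

In a pipeline, the detected code would typically be written to a language field so that later stages can pick a language-specific lemmatizer or stop-word list.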
  28. Pipeline – configuration example
  29. Pipeline frameworks • Re-usable stages • Configuration interface • Focus on task • Examples: http://www.openpipeline.com/, Findwise Hydra, http://www.pypes.org/
  30. Putting it all together
  31. What the frell is UX design?
  32. What the frell is UX design? • Interaction design • Usability engineering • Information architecture • Visual design
  33. Findwise UX design principles • Users want results • Dialogue not monologue • Participation builds trust • Answer frequent questions • Simple but powerful
  34. Users want results
  35. Dialogue not monologue
  36. Participation builds trust
  37. Answer frequent questions
  38. Simple but powerful
  39. Findwise UX design principles • Users want results • Dialogue not monologue • Participation builds trust • Answer frequent questions • Simple but powerful
  40. DIY UX design
  41. DIY UX design • Design research • Analytics • Usability tests • Iterate!
  42. Design research • Be easy to reach and keep in contact • Let user requests guide you when prioritizing new features • Listen and try to discover the underlying problem • Try to find out what users need, not what they say they want
  43. Analytics • Web analytics • Search analytics • A/B testing
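Search analytics often starts with two questions about the query log: which queries are most frequent, and which return nothing. The sketch below assumes a hypothetical log of (query, hit count) entries and normalizes queries by trimming, lowercasing and collapsing whitespace before counting; the sample queries are made up.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class QueryLogSketch {
    // Hypothetical log entry: the raw query string and the number of hits it returned.
    public static final class LogEntry {
        final String query;
        final int hits;

        public LogEntry(String query, int hits) {
            this.query = query;
            this.hits = hits;
        }
    }

    // Trim, lowercase and collapse whitespace so trivial variants count as one query.
    static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // The k most frequent queries; ties are broken alphabetically to keep the output deterministic.
    static List<String> topQueries(List<LogEntry> log, int k) {
        Map<String, Long> counts = log.stream()
                .collect(Collectors.groupingBy(e -> normalize(e.query), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Distinct queries that returned no results.
    static SortedSet<String> zeroHitQueries(List<LogEntry> log) {
        SortedSet<String> result = new TreeSet<>();
        for (LogEntry e : log) {
            if (e.hits == 0) {
                result.add(normalize(e.query));
            }
        }
        return result;
    }

    static List<LogEntry> sampleLog() {
        return List.of(
                new LogEntry("Solr", 120), new LogEntry(" solr ", 120), new LogEntry("solr", 118),
                new LogEntry("Lucene", 40), new LogEntry("lucene", 40),
                new LogEntry("vacation policy", 0), new LogEntry("Vacation  policy", 0),
                new LogEntry("findability", 12));
    }

    public static void main(String[] args) {
        List<LogEntry> log = sampleLog();
        System.out.println("Top queries: " + topQueries(log, 2)); // Top queries: [solr, lucene]
        System.out.println("Zero hits:   " + zeroHitQueries(log)); // Zero hits:   [vacation policy]
    }
}
```

Zero-hit queries such as vacation policy are good candidates for synonyms, spelling suggestions or new content, while the top queries show where tuning relevance pays off most.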
  44. Usability tests • Test early, test often • Use sketches, paper prototypes, static prototypes and working prototypes! • Create real tasks or problems • Don’t ask users how they would want it • Test on friends and family or colleagues
  45. Iterate!
  46. Why UX design? • Improved requirements • Better feedback • Eliminate bias • Less development time
  47. Summary • Listen & try to discover the underlying problem • Search analytics – top queries • Do usability tests early & often • Iterate!
  48. Research • Collaboration with universities: GU, Borås, KTH, Copenhagen U. • EU projects: RUSHES • Master’s thesis supervision: Chalmers, KTH, Lund
  49. Master’s thesis projects • A way to test ideas • A way to recruit people • A way to cooperate with universities • Keyword extraction • Document clustering • NER • Document summarization • Extracting structural information from text • Query log analysis
  50. Resources – books • The Design of Everyday Things • Don’t Make Me Think • Search Analytics for Your Site • ManifoldCF in Action • Taming Text
  51. Shameless plug • twitter.com/findwise • slideshare.net/findwise • findabilityblog.se • findwise.com
  52. Thanks! • Tobias Berg, tobias.berg@findwise.com • Björn Klockljung Johansson, bjorn.klockljung.johansson@findwise.com • Svetoslav Marinov, svetoslav.marinov@findwise.com
