
Google Is a Two Page Site

My talk on search and the Sitecore.ContentSearch API at the 2015 SUGNordic in Malmö.

Published in: Technology

  1. 1. Google Is Just a Two Page Site: Relevant Results with Sitecore.ContentSearch. Martina Helene Welander, Technical Consulting Engineer, Sitecore
  2. 2. Speaker • Technical Consulting Engineer at Sitecore • Community and Information Enthusiast • Ecosystem Sites with Dnepropetrovsk Team Martina Helene Welander
  3. 3. Hi! • Martina Welander • Technical Consulting Engineer • Ecosystem sites • mhwelander.net / @mhwelander
  4. 4. Speaker • Technical Consulting Engineer at Sitecore • Community and Information Enthusiast • Ecosystem Sites with Dnepropetrovsk Team • @mhwelander / mhwelander.net Martina Helene Welander
  5. 5. Speaker
  6. 6. In the direction of awesome, that’s where
  7. 7. …let’s do search!
  8. 8. Can haz knowledge?
  9. 9. Google Is Just a Two Page Site: Relevant Results with Sitecore.ContentSearch. Martina Helene Welander, Technical Consulting Engineer, Sitecore
  10. 10. “Google is simply a search box with a second page of results. And those results are from other sites!”
  11. 11. Lalala hello world examples lalala ten items in my tree!
  12. 12. Sitecore.ContentSearch 101
  13. 13. Sitecore 7
  14. 14. Search and index ALL the items * *
  15. 15. Search API (LINQ-based) Search Technology Provider (DLLs and Configuration) Search Technology API and Indexes IEnumerable<DocSearchResult>
  16. 16. var index = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_master_index"); using (var context = index.CreateSearchContext()) { var query = context.GetQueryable<ResultItem>().Where(x => x.Title == "Hej"); var executedResults = query.GetResults(); myModel.myList = executedResults.Hits.Select(x => x.Document).ToList(); }
  17. 17. Where Sitecore adds value • Source content to index to strongly typed object – and back again! • You can actually index anything • Provider model – Solr, Lucene, Elastic Search, Azure Search • Provider-agnostic LINQ-based search API • Highly configurable
  18. 18. Sitecore.ContentSearch is an API
  19. 19. Where should I focus my efforts?
  20. 20. CONFIIIIIG!
  21. 21. Crawlers • Mappers • Converters: Sitecore Field -> Index Field -> Object Property • Analyzers: Sitecore Field -> Searchable Data • Analyzer Wrappers
  22. 22. Back to Plain Ol’ Search Actually kind of difficult
  23. 23. It’s all about the Pentiums analyzers (Tokenizers and Filters)
  24. 24. Tokenizers
  25. 25. Hello my name is Martina “Hello”, “my”, “name”, “is”, “Martina”
  26. 26. Types of Tokenizer • StandardTokenizer: “My name is Martina” -> “My”, “name”, “is”, “Martina” • KeywordTokenizer: “My name is Martina” -> “My name is Martina” • N-Gram Tokenizer (Min 4, Max 5): “sitecore” -> “site”, “itec”, “teco”, “ecor”, “core”, “sitec”, “iteco” … etc
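
The three tokenizer behaviours on the slide above can be sketched in a few lines of Python. This is an illustration only, not Lucene or Solr code: the function names are made up for this example, the "standard" tokenizer is approximated by a simple word-character split, and the order in which a real n-gram tokenizer emits its grams may differ.

```python
import re


def standard_tokenize(text):
    """Rough stand-in for a standard tokenizer: keep runs of word characters."""
    return re.findall(r"\w+", text)


def keyword_tokenize(text):
    """Treat the whole input as a single token."""
    return [text]


def ngram_tokenize(text, min_size, max_size):
    """Emit every substring whose length is between min_size and max_size."""
    grams = []
    for size in range(min_size, max_size + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


if __name__ == "__main__":
    print(standard_tokenize("My name is Martina"))
    print(keyword_tokenize("My name is Martina"))
    print(ngram_tokenize("sitecore", 4, 5))
```

With a minimum of 4 and a maximum of 5, every gram is four or five characters long, so a single eight-letter word already produces nine tokens. That growth is worth keeping in mind when choosing n-gram sizes.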
  27. 27. Filters
  28. 28. Examples of Filters • Standard Filter • (Snowball) Porter Stem Filter • Stop Filter • Synonym Filter • Keep Words Filter • Pattern Replace Filter ORDER MATTERS!
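
To make "ORDER MATTERS!" concrete, here is a minimal Python sketch of a filter chain. The stop-word list, the function names and the case-sensitive stop filter are all assumptions for illustration; real provider filters are configurable and may ignore case.

```python
# Hypothetical stop-word list, lower-case only.
STOP_WORDS = {"the", "is", "a"}


def lowercase_filter(tokens):
    """Lower-case every token."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Drop tokens that exactly match a stop word (case-sensitive here)."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(tokens, filters):
    """Run the tokens through each filter in the order given."""
    for apply_filter in filters:
        tokens = apply_filter(tokens)
    return tokens


if __name__ == "__main__":
    tokens = ["The", "index", "is", "stale"]
    print(analyze(tokens, [lowercase_filter, stop_filter]))
    print(analyze(tokens, [stop_filter, lowercase_filter]))
```

Lower-casing first lets the stop filter catch "The". Running the stop filter first lets "The" slip through, and it is only lower-cased afterwards, so "the" ends up in the index.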
  29. 29. Indexing Process
  30. 30. Index
  31. 31. Query
  32. 32. Results
  33. 33. “name” ”Hello” “Hello, my name is Martina” “Martina” “my” Rebuild when analyser changes! Contains(“Hello, my name is Martina”)
  34. 34. Configuring a custom analyzer
  35. 35. Lucene – What does it look like?
  36. 36. Solr – What does it look like? <fieldType name="text" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StandardFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.StopFilterFactory"/> <filter class="solr.EnglishPorterFilterFactory"/> </analyzer> </fieldType>
  37. 37. Previewing and help
  38. 38. 6492 12:54:21 INFO ExecuteQueryAgainstLucene (sitecore_master_index): content:make~0.7 title:make~0.7 content:new~0.7 title:new~0.7 content:item~0.7 title:item~0.7 - Filter : Debugging A Lucene-Based ContentSearch In Sitecore - Dan Cruickshank
  39. 39. My Super-Duper Analyzer
  40. 40. …which isn’t very special at all • Standard analyser • Standard filter • Porter Stem Filter • StopWords Filter • Synonym Filter (EXM / ECM, PXM / APS)* • Lowercase filter
  41. 41. The Query
  42. 42. What makes something relevant? (tf.idf) • tf – term frequency • idf – inverse document frequency • coord – number of query terms found in the document • fieldNorm – field length normalisation
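
The four factors on the slide above can be combined into a toy scoring function. The sketch below is loosely modelled on Lucene's classic TF-IDF similarity and leaves out query normalisation and boosts. It is an assumption-laden illustration, not the exact formula a given provider version uses, and the sample documents are invented.

```python
import math


def score(query_terms, doc_tokens, all_docs):
    """Toy relevance score: coord * sum(tf * idf^2 * fieldNorm)."""
    if not query_terms or not doc_tokens:
        return 0.0
    num_docs = len(all_docs)
    # fieldNorm: shorter fields score higher for the same match.
    field_norm = 1.0 / math.sqrt(len(doc_tokens))
    total = 0.0
    matched = 0
    for term in query_terms:
        freq = doc_tokens.count(term)
        if freq == 0:
            continue
        matched += 1
        doc_freq = sum(1 for doc in all_docs if term in doc)
        tf = math.sqrt(freq)                             # term frequency
        idf = 1.0 + math.log(num_docs / (doc_freq + 1))  # rarer terms weigh more
        total += tf * idf * idf * field_norm
    coord = matched / len(query_terms)                   # share of query terms found
    return coord * total


if __name__ == "__main__":
    docs = [
        ["xdb", "scaling", "options"],
        ["engagement", "plan"],
        ["xdb", "overview"],
    ]
    for doc in docs:
        print(doc, round(score(["xdb", "scaling"], doc, docs), 3))
```

A document that matches both query terms outranks one that matches only one of them, and a document with no matching terms scores zero.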
  43. 43. My fields • Title • Text • Byline • Keywords • Product
  44. 44. context.GetQueryable<ResultItem>() .Where(…)
  45. 45. .Filter() vs .Where()
  46. 46. #1 – Find me a match • Equals() • Contains() • StartsWith() .Where(x => x.ResultsTitle.Contains("scaling")) .Where(x => x["scaling"].Contains("scaling")) .Match() .EndsWith()
  47. 47. #2 – Slop and fuzziness! • Like() • Fuzzy search – fuzziness factor (float) • Phrase search – slop (int)
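
Fuzzy matching is usually explained in terms of edit distance. The Python sketch below computes the Levenshtein distance between two terms and turns it into a similarity between 0 and 1. The normalisation by the longer term is an assumption chosen for simplicity, not necessarily how the search provider derives its fuzziness factor. Phrase slop is not covered here.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical (assumed normalisation)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    print(edit_distance("scaling", "scalling"))
    print(similarity("scaling", "scalling"))
```

Under this normalisation, a misspelling such as "scalling" sits one edit away from "scaling" and would clear a minimum similarity of 0.7.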
  48. 48. #3 – I love you, PredicateBuilder Expression<Func<ResultItem, bool>> predicate = PredicateBuilder.False<ResultItem>(); foreach (var word in list) { predicate = predicate.Or(x => x.Title.Contains(word)); } False for ‘OR’, True for ‘AND’
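
The "False for OR, True for AND" rule is easier to see outside LINQ. The Python sketch below mimics the predicate-building pattern with plain functions. None of these names come from Sitecore or from PredicateBuilder; they are hypothetical stand-ins.

```python
def always_false(item):
    """Neutral seed for an OR chain."""
    return False


def always_true(item):
    """Neutral seed for an AND chain."""
    return True


def either(left, right):
    """Combine two predicates with OR."""
    return lambda item: left(item) or right(item)


def title_contains(word):
    """Predicate factory: a case-insensitive substring test on the title."""
    return lambda item: word.lower() in item["title"].lower()


def any_word(words, seed=always_false):
    """OR together one title predicate per word, starting from the seed."""
    predicate = seed
    for word in words:
        predicate = either(predicate, title_contains(word))
    return predicate


if __name__ == "__main__":
    items = [{"title": "xDB Scaling"}, {"title": "Engagement plans"}]
    matches = any_word(["scaling", "options"])
    print([item["title"] for item in items if matches(item)])
```

Seeding an OR chain with the always-true predicate makes every item match, because `True or anything` is always true. That is why the OR case has to start from false.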
  49. 49. #4 – Boost • At query time • At index time (type or field) • Rules-based
  50. 50. BOOST
  51. 51. BOOST
  52. 52. ~1000 real items storageType=“true”
  53. 53. Attempt #1: EVERYTHING
  54. 54. If the title… • Like phrase (with slop) • Contains phrase • Starts with phrase • Equals phrase If the content… • Like phrase (with slop) • Contains phrase • Starts with phrase • Equals phrase
  55. 55. Search: xDB Scaling
  56. 56. Search: Managing engagement plans
  57. 57. Search: Create engagement plans
  58. 58. A couple of important lessons • Whole Phrases vs Individual Terms • Boost() • Contains() / Equals()
  59. 59. Attempt #2: Phrase and terms
  60. 60. “engagement plan setup” OR “engagement” OR “plan” OR “setup”
  61. 61. “engagement” AND “plan” OR “engagement” AND “setup” OR “plan” AND “setup” OR “engagement” AND “plan” AND “setup”
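
The AND combinations on the slide above do not have to be written by hand. A small Python sketch using `itertools.combinations` can generate them from the search phrase. The helper names and the query-string output format are made up for this example; in a real implementation the groups would feed into the predicate-building code rather than a string.

```python
from itertools import combinations


def term_combinations(phrase, min_terms=2):
    """Return every group of at least min_terms terms, keeping phrase order."""
    terms = phrase.split()
    groups = []
    for size in range(min_terms, len(terms) + 1):
        groups.extend(combinations(terms, size))
    return groups


def to_query_string(groups):
    """Render the groups as (a AND b) OR (a AND c) ... for inspection."""
    return " OR ".join("(" + " AND ".join(group) + ")" for group in groups)


if __name__ == "__main__":
    groups = term_combinations("engagement plan setup")
    print(to_query_string(groups))
```

The number of groups grows quickly with the length of the phrase, so a cap on the number of terms is probably sensible for long queries.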
  62. 62. Needs more boost
  63. 63. Attempt #3: Favouring titles
  64. 64. Sitecore 7 ContentSearch Tips - Matt Burke “Finding a user’s search term in the title or keywords of a document is probably more relevant than one where the term is only in the body”
  65. 65. My work in progress
  66. 66. If nothing is working, you probably didn’t rebuild your index
  67. 67. Search: xDB Scaling
  68. 68. Search: Manage engagement plans
  69. 69. Search: Create engagement plans
  70. 70. // TODO: On the plane home • Keywords • Location • Pinning exact title matches – “scaling” • Expected search phrases with boost – e.g. “scaling xDB”, “xDB scaling”, “xDB scaling options” xDB • Key Behaviour Cache – developer or editor? • Common searches
  71. 71. It’s not all queries and indexes • Vague titles are a bit of a nightmare • Review use of keywords in content • “I would never search for that!” • Continuous user testing and tuning
  72. 72. What I learned • It isn’t magic • Get to know the provider • Content and content structure matter • Search is actually quite hard
  73. 73. Thanks to our… Organizers &… Sponsor
