1.
Google Is Just a Two-Page Site
Relevant Results with Sitecore.ContentSearch
Martina Helene Welander
Technical Consulting Engineer, Sitecore
4.
Speaker
• Technical Consulting Engineer at Sitecore
• Community and Information Enthusiast
• Ecosystem Sites with Dnepropetrovsk Team
• @mhwelander / mhwelander.net
Martina Helene Welander
15.
• Search API (LINQ-based)
• Search Technology Provider (DLLs and configuration)
• Search Technology API and Indexes
• Results returned as IEnumerable<DocSearchResult>
16.
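using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq; // GetResults() extension method

// ResultItem: your own result class (e.g. derived from SearchResultItem) with a Title property
var index = ContentSearchManager.GetIndex("sitecore_master_index");
using (var context = index.CreateSearchContext())
{
    // Build the LINQ query; it is translated for the provider (Solr, Lucene...)
    var query = context.GetQueryable<ResultItem>().Where(x => x.Title == "Hej");
    // Execute: returns hits plus metadata such as score and total count
    var executedResults = query.GetResults();
    // Materialise inside the using block, before the context is disposed
    myModel.myList = executedResults.Hits.Select(x => x.Document).ToList();
}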
17.
Where Sitecore adds value
• Source content to index to strongly typed object – and back again!
• You can actually index anything
• Provider model – Solr, Lucene, Elasticsearch, Azure Search
• Provider-agnostic LINQ-based search API
• Highly configurable
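The "provider-agnostic LINQ-based search API" point rests on IQueryable<T>: a query is built as an expression tree, and the provider behind the queryable decides how to run it. The sketch below is a self-contained illustration of that idea, not Sitecore code. Doc, TitleContains and RunInMemory are hypothetical names, and the in-memory LINQ to Objects provider stands in for the Solr or Lucene provider that would sit behind context.GetQueryable<T>().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document type standing in for a search result class.
public class Doc
{
    public string Title { get; set; } = "";
}

public static class ProviderAgnosticSketch
{
    // Written once against IQueryable<Doc>: the Where clause becomes an
    // expression tree, and whichever provider owns `source` executes it.
    public static IQueryable<Doc> TitleContains(IQueryable<Doc> source, string term) =>
        source.Where(d => d.Title.Contains(term));

    // Here LINQ to Objects (AsQueryable) plays the part that a search
    // provider would play behind Sitecore's GetQueryable<T>().
    public static List<string> RunInMemory(IEnumerable<Doc> docs, string term) =>
        TitleContains(docs.AsQueryable(), term).Select(d => d.Title).ToList();
}
```

Against a real index the same Where clause is translated into the provider's own query syntax, so a .NET method that works in memory is not guaranteed to be supported by every provider.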
25.
“Hello my name is Martina” -> “Hello”, “my”, “name”, “is”, “Martina”
26.
Types of Tokenizer
StandardTokenizer
“My name is Martina” -> “My”, “name”, “is”, “Martina”
KeywordTokenizer
“My name is Martina” -> “My name is Martina”
N-Gram Tokenizer (Min 4, Max 5)
“sitecore” -> “site”, “itec”, “teco”, “ecor”, “core”, “sitec”, “iteco”, “tecor”, “ecore”
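To make the three tokenizers concrete, here is a simplified C# sketch. It is not Lucene's implementation: the "standard" version just splits on anything that is not a letter or digit (Lucene's StandardTokenizer follows Unicode word-break rules), and the n-grams are listed by length, then position, to match the slide, which may not be the order a given Lucene version emits. TokenizerSketch and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Simplified tokenizers for illustration only; not Lucene's implementations.
public static class TokenizerSketch
{
    // Rough stand-in for StandardTokenizer: split on non-letter/digit chars.
    public static List<string> Standard(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (var c in text)
        {
            if (char.IsLetterOrDigit(c)) current.Append(c);
            else if (current.Length > 0) { tokens.Add(current.ToString()); current.Clear(); }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }

    // KeywordTokenizer: the whole input becomes a single token.
    public static List<string> Keyword(string text) => new List<string> { text };

    // N-gram tokenizer: every substring whose length is between min and max.
    public static List<string> NGrams(string text, int min, int max)
    {
        var grams = new List<string>();
        for (int len = min; len <= max; len++)
            for (int start = 0; start + len <= text.Length; start++)
                grams.Add(text.Substring(start, len));
        return grams;
    }
}
```

N-grams are what let a search for "core" match a document containing "sitecore", at the cost of a much larger index.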
42.
What makes something relevant? (tf.idf)
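• tf – term frequency (how often the term occurs in the document)
• idf – inverse document frequency (how rare the term is across all documents)
• coord – number of query terms found in the document
• fieldNorm – field-length normalisation (a match in a short field counts for more)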
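These four factors come from Lucene's classic TF-IDF scoring. Below is a rough single-field sketch, assuming the classic definitions tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), coord = matched query terms / all query terms and fieldNorm = 1 / √(field length), with each matched term contributing tf · idf² · fieldNorm. queryNorm (identical for every document in one query), boosts and Lucene's lossy one-byte norm encoding are left out. ClassicScoreSketch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified take on Lucene's classic (TF-IDF) scoring for one field.
public static class ClassicScoreSketch
{
    static double Tf(int freq) => Math.Sqrt(freq);

    static double Idf(int docFreq, int numDocs) =>
        1.0 + Math.Log(numDocs / (double)(docFreq + 1));

    static double FieldNorm(int fieldLength) => 1.0 / Math.Sqrt(fieldLength);

    public static double Score(string[] queryTerms, string[] doc, List<string[]> corpus)
    {
        double sum = 0;
        int matched = 0;
        foreach (var term in queryTerms)
        {
            int freq = doc.Count(t => t == term);
            if (freq == 0) continue;
            matched++;
            int docFreq = corpus.Count(d => d.Contains(term));
            double idf = Idf(docFreq, corpus.Count);
            sum += Tf(freq) * idf * idf * FieldNorm(doc.Length);
        }
        // coord: reward documents that match more of the query terms
        double coord = matched / (double)queryTerms.Length;
        return coord * sum;
    }
}
```

With the corpus "scaling xdb options", "xdb overview", "install guide" and the query "scaling xdb", the first document ranks highest: it matches both terms, and "scaling" is rarer than "xdb", so it carries a larger idf.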
43.
My fields
• Title
• Text
• Byline
• Keywords
• Product
48.
#3 – I love you, PredicateBuilder
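using System;
using System.Linq.Expressions;
using Sitecore.ContentSearch.Linq.Utilities; // PredicateBuilder

// Seed with False when OR-ing: False OR a OR b == a OR b.
// (A True seed would make the whole predicate match every item.)
Expression<Func<ResultItem, bool>> predicate =
    PredicateBuilder.False<ResultItem>();
foreach (var word in list) // list: the user's search terms
{
    predicate = predicate.Or(x => x.Title.Contains(word));
}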
False for ‘OR’,
True for ‘AND’
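Why the seed matters can be checked without Sitecore. The sketch below uses a hypothetical MiniPredicateBuilder (not Sitecore's class) that combines expressions with Expression.Invoke, which is enough for LINQ to Objects; a search provider may instead need the combined lambdas rewritten to share one parameter. Seeding an OR chain with False returns only titles containing one of the words, seeding it with True returns everything, and AND chains start from True.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical, minimal stand-in for a predicate builder; not Sitecore's code.
public static class MiniPredicateBuilder
{
    public static Expression<Func<T, bool>> True<T>() => x => true;
    public static Expression<Func<T, bool>> False<T>() => x => false;

    // left OR right, evaluated against left's parameter
    public static Expression<Func<T, bool>> Or<T>(
        Expression<Func<T, bool>> left, Expression<Func<T, bool>> right) =>
        Expression.Lambda<Func<T, bool>>(
            Expression.OrElse(left.Body, Expression.Invoke(right, left.Parameters)),
            left.Parameters);

    // left AND right, evaluated against left's parameter
    public static Expression<Func<T, bool>> And<T>(
        Expression<Func<T, bool>> left, Expression<Func<T, bool>> right) =>
        Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(left.Body, Expression.Invoke(right, left.Parameters)),
            left.Parameters);

    // OR one "title contains word" clause per word onto the given seed.
    public static List<string> MatchAny(
        IEnumerable<string> titles, IEnumerable<string> words, Expression<Func<string, bool>> seed)
    {
        var predicate = seed;
        foreach (var word in words)
            predicate = Or(predicate, t => t.Contains(word));
        return titles.AsQueryable().Where(predicate).ToList();
    }
}
```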
49.
#4 – Boost
• At query time
• At index time (type or field)
• Rules-based
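A query-time boost scales how much a matching clause contributes to the score. The sketch below strips that idea down to its simplest form: each field that contains the term adds its boost. The field names and boost values are hypothetical, and real scoring also folds in tf, idf and norms.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: hypothetical fields and boosts, no tf/idf/norms.
public static class BoostSketch
{
    static readonly Dictionary<string, double> FieldBoosts = new Dictionary<string, double>
    {
        ["Title"] = 5.0,
        ["Keywords"] = 3.0,
        ["Text"] = 1.0,
    };

    public static double Score(Dictionary<string, string> doc, string term)
    {
        double score = 0;
        foreach (var pair in FieldBoosts)
        {
            // Each field that contains the term adds that field's boost.
            if (doc.TryGetValue(pair.Key, out var value) &&
                value.Contains(term, StringComparison.OrdinalIgnoreCase))
                score += pair.Value;
        }
        return score;
    }
}
```

A document with the term in its title then outranks one that only mentions it in the body text.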
54.
If the title…
• Like phrase (with slop)
• Contains phrase
• Starts with phrase
• Equals phrase
If the content…
• Like phrase (with slop)
• Contains phrase
• Starts with phrase
• Equals phrase
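Equals, starts with and contains map directly onto string comparisons; "like phrase (with slop)" is the less obvious rule. Slop is how far the terms may drift from the exact phrase and still match. This simplified sketch only accepts the terms in order and counts slop as the total number of extra words between them; Lucene's sloppy phrase matching also accepts reordered terms at a higher cost, which is not modelled here. SlopSketch is a hypothetical name.

```csharp
using System;

// Simplified, in-order-only "phrase with slop" check.
public static class SlopSketch
{
    public static bool MatchesWithSlop(string[] words, string[] phrase, int slop)
    {
        if (phrase.Length == 0) return false;
        for (int start = 0; start < words.Length; start++)
        {
            if (!Eq(words[start], phrase[0])) continue;
            int pos = start;
            bool found = true;
            // Take the nearest next occurrence of each term; for a fixed
            // start this gives the smallest total gap.
            for (int i = 1; i < phrase.Length && found; i++)
            {
                int next = Array.FindIndex(words, pos + 1, w => Eq(w, phrase[i]));
                if (next < 0) found = false; else pos = next;
            }
            // Extra words between the terms = span length minus phrase length
            if (found && (pos - start) - (phrase.Length - 1) <= slop) return true;
        }
        return false;
    }

    static bool Eq(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
}
```

For example, "scaling xDB" does not match "options for scaling your xdb install" with slop 0, but does with slop 1.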
64.
Sitecore 7 ContentSearch Tips
- Matt Burke
“Finding a user’s search term in the title or keywords of a document is probably more relevant than one where the term is only in the body”
70.
// TODO: On the plane home
• Keywords
• Location
• Pinning exact title matches – “scaling”
• Expected search phrases with boost – e.g. “scaling xDB”, “xDB scaling”, “xDB scaling options”
• Key Behaviour Cache – developer or editor?
• Common searches
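"Pinning exact title matches" can be done as a re-ordering step after the search has run. A minimal sketch, assuming each hit already carries the provider's score: titles that exactly equal the search phrase (ignoring case and surrounding whitespace) go first, and everything else keeps its score order. Hit and PinExactTitles are hypothetical names; boosting exact title matches heavily inside the query would be an alternative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: a title plus the score the provider returned.
public class Hit
{
    public Hit(string title, double score) { Title = title; Score = score; }
    public string Title { get; }
    public double Score { get; }
}

public static class PinningSketch
{
    // Exact title matches first (true sorts above false when descending),
    // then the original relevance order within each group.
    public static List<Hit> PinExactTitles(IEnumerable<Hit> hits, string phrase) =>
        hits.OrderByDescending(h => string.Equals(h.Title.Trim(), phrase.Trim(), StringComparison.OrdinalIgnoreCase))
            .ThenByDescending(h => h.Score)
            .ToList();
}
```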
71.
It’s not all queries and indexes
• Vague titles are a bit of a nightmare
• Review use of keywords in content
• “I would never search for that!”
• Continuous user testing and tuning
72.
What I learned
• It isn’t magic
• Get to know the provider
• Content and content structure matter
• Search is actually quite hard