Spcua 2013 Alexey Kozhemiakin Enterprise Search



English version of my slides from SPCUA 2013





    Presentation Transcript

    • May 22nd 2013, Kiev
      • Enterprise search portals
      • SharePoint 2013
      • Alexey Kozhemiakin
    • or “How to make a cool search”
    • Who’s speaking to you?
      • Solution Architect @epam
      • Focusing on search
      • SharePoint Search FAST/2010/2013
      • Apache Lucene, Solr, Elasticsearch, Oracle Endeca…
      • http://powersearching.wordpress.com
    • Agenda
      • Enterprise Search Portal
      • Insight into SP2013 Search
      • Key changes from SP2010
      • A bit of magic – relevancy calculation
      • Search governance, useful hints & tips
    • Key search patterns
      • I know what I’m searching for and where to find it
      • I know what I’m searching for but don’t know where to find it
      • I don’t know what I’m searching for
      • Source: http://aghy.hu/AghyBlog_EN/Lists/Posts/Post.aspx?ID=199
    • Enterprise Search Portal
      • Demand:
        • Fast-growing enterprises
        • Zoo of internal systems
      • Solution:
        • “Google” inside the enterprise
      • Quick wins for business:
        • Single point of smart search and information retrieval
        • Less time spent searching by each employee
        • Better internal communication and simpler reuse of content
    • But after deployment…
      • «…Search sucks»
      • Out-of-the-box search knows nothing about you
      • Typical «buts»:
        • «…Microsoft takes care of a decent search algorithm»
        • «…we’re not sure we can do better»
        • «…we don’t need search, everybody knows where the content is»
        • «…make our search like Facebook/Google/Bing» (instead of requirements)
    • Why it’s hard
      • Ambiguous short queries
      • Unstructured, unoptimized content
      • Content users and content creators have different active vocabularies
      • Limited resources ($), while internet search has:
        • Automated and manual testing of search quality (assessors)
        • Continuous improvement
    • Search architecture in SP2013
    • Search is a two-phase process
      • Matching – all docs with keywords
        • Linguistics: stemming, phonetics
        • Synonyms
      • Ranking
        • «Features»
          • TF-IDF, BM25
          • Field weights
          • File type
          • Modification date
          • Popularity
          • …
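
The two phases can be sketched on a toy in-memory corpus. This is a minimal illustration, assuming whitespace tokenization and the textbook BM25 formula with k1 = 1.2 and b = 0.75; the function names and sample documents are hypothetical, and this is not how SharePoint implements matching or ranking.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real linguistics.
    return text.lower().split()

def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return (doc_index, score) pairs for matching docs, best first."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    terms = set(tokenize(query))
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many docs each query term occurs.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in terms}
    results = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        # Phase 1, matching: keep only docs containing at least one query term.
        if not any(tf[t] for t in terms):
            continue
        # Phase 2, ranking: sum the BM25 contribution of each matching term.
        score = 0.0
        for t in terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        results.append((i, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

docs = ["sharepoint search ranking", "search tips for sharepoint admins", "holiday schedule"]
ranked = bm25_rank("search", docs)  # the third document never reaches the ranking phase
```

The other features on the slide (field weights, file type, modification date, popularity) would be added on top of this text score in a real ranker.
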
    • Ranking in FAST
      • Linear combination of features
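
A linear combination of features is simply a weighted sum. The sketch below assumes hypothetical feature names and weights chosen for illustration; they are not FAST's actual values.

```python
def linear_rank(features, weights):
    """Weighted sum of feature values; features without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Hypothetical weights and feature values, for illustration only.
WEIGHTS = {"term_match": 1.0, "freshness": 0.5, "static_rank": 0.8, "proximity": 0.3}
doc_features = {"term_match": 2.0, "freshness": 1.0, "static_rank": 0.5, "proximity": 1.0}

score = linear_rank(doc_features, WEIGHTS)  # 2.0 + 0.5 + 0.4 + 0.3, about 3.2
```

Because the model is linear, each term of the sum can be inspected separately to see how much a single feature contributes to the final rank.
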
    • Ranking in FAST
      • Impact of each component on the final rank
      • [Chart: rank contribution (axis 0 to 8000) for the 1st to 4th results; series: term:fast, term:search, freshness, static rank, proximity]
    • Migration FAST -> SP2013
    • Ranking in SP2013
    • Ranking in SP2013
      • Default Relevancy Model
      • Two neural networks
      • Freshness is not included in ranking
      • Features (Type: Instance)
        • BM25: BM25
        • Static: UrlDepth
        • BucketedStatic: InternalFileType
        • BucketedStatic: Language
        • Static: ClickDistance
        • Static: QueryLogClicks
        • Static: QueryLogSkips
        • Static: LastClicks
        • Static: EventRate
        • MinSpan - soft: Title
        • MinSpan - soft: Title
        • MinSpan - soft: Title
        • MinSpan - soft: Content
    • Ranking in SP2013
      • Default relevancy model
    • Explain rank
      • /_layouts/15/explainrank.aspx
      • rankdetail property
    • Explain rank
      • Manual validation in Excel
    • Search Governance
      1. Search analytics
      2. Fine tuning and adaptation
      3. Regular testing
      4. Security assessment
      5. Promotion within the company
      6. Content optimization and basic SEO
    • 1. Search analytics
      • Search analytics
      • Search analytics
      • Search analytics
      • Obey! Use search analytics
    • 1. Search analytics
      • OOTB in SP2013
        • Most popular queries
        • «No results/abandoned» queries
      • 3rd-party tools (Google Analytics, Omniture, WebTrends)
      • Measure search quality (!)
        • % of searches with a click on results
        • Which results are clicked
        • Returns to the results page after a click
        • Session analysis
        • Query segmentation
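
The reports listed above can be approximated from a raw query log. The following is a minimal sketch, assuming a hypothetical log format (a list of dicts with `query`, `results`, and `clicks` keys); a real SharePoint or web-analytics export will look different.

```python
from collections import Counter

def summarize(log):
    """log: list of dicts with 'query' (str), 'results' (int), 'clicks' (int)."""
    def norm(entry):
        return entry["query"].strip().lower()

    queries = Counter(norm(e) for e in log)
    # Queries that returned nothing at all.
    no_results = {norm(e) for e in log if e["results"] == 0}
    # Queries that returned results but got no click (abandoned).
    abandoned = {norm(e) for e in log if e["results"] > 0 and e["clicks"] == 0}
    with_click = sum(1 for e in log if e["clicks"] > 0)
    return {
        "top_queries": queries.most_common(3),
        "no_results": sorted(no_results),
        "abandoned": sorted(abandoned),
        "click_through_rate": with_click / len(log) if log else 0.0,
    }

sample_log = [
    {"query": "Vacation policy", "results": 12, "clicks": 1},
    {"query": "vacation policy", "results": 12, "clicks": 0},
    {"query": "vpn install", "results": 0, "clicks": 0},
    {"query": "salary grid", "results": 5, "clicks": 2},
]
report = summarize(sample_log)
```

Tracking `click_through_rate` over time gives a first, rough measure of search quality before moving on to session analysis.
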
    • Query segmentation
      • Analyze and improve not only the top N queries, but whole classes of queries
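
One simple way to work with classes of queries is to assign each query to a segment by keyword and compare click-through rates per segment. The segment names and marker words below are assumptions for illustration; each portal will need its own.

```python
from collections import defaultdict

# Hypothetical segments: (name, substrings that identify the segment).
SEGMENTS = [
    ("how-to", ("how to",)),
    ("software", ("download", "install", "software")),
    ("policy", ("policy",)),
]

def segment_of(query):
    """Return the first segment whose marker occurs in the query, else 'other'."""
    q = query.lower()
    for name, markers in SEGMENTS:
        if any(marker in q for marker in markers):
            return name
    return "other"

def ctr_by_segment(log):
    """log: list of (query, clicked) pairs -> {segment: click-through rate}."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [searches, searches with a click]
    for query, clicked in log:
        bucket = totals[segment_of(query)]
        bucket[0] += 1
        bucket[1] += int(clicked)
    return {seg: clicks / n for seg, (n, clicks) in totals.items()}

segment_log = [
    ("how to install vpn", True),
    ("how to book a room", False),
    ("download office", True),
    ("vacation policy", False),
]
rates = ctr_by_segment(segment_log)
```

A segment with a low rate points at a whole class of queries to fix, for example with a query rule, rather than at one query.
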
    • 2. Fine tuning
      • Authoritative Pages
        • Quick win – content source priority
      • Query Rules
        • Smart search for users
      • Synonyms
        • Separate mapping file
        • Expansion only
        • Term set synonyms are NOT working
      • Relevancy models
    • Authoritative Pages
      • Impact ClickDistance
      • ClickDistance and UrlDepth have a high impact on the total score (see explain rank)
      • Configured in CA or via CSOM
    • Query Rules (Rule + Action)
      • The tool to make search smarter
      • Interactive feedback to user queries
      • Post-processing of queries
      • Leverage navigational queries
      • …
    • Conditions for Query Rules
      • Query Matches Keyword Exactly
      • Advanced Query Text Match
      • Query Matches Dictionary Exactly
      • Query Contains Action Term
      • Query More Common in Source
      • Result Type Commonly Clicked
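
The rule + action idea can be modeled as a list of (condition, action) pairs. This is a conceptual sketch only: the two condition helpers loosely mirror «Query Matches Keyword Exactly» and «Query Contains Action Term», the action strings are placeholders, and none of it uses the SharePoint API.

```python
def exact_keyword(keywords):
    """Condition: the whole query equals one of the keywords (case-insensitive)."""
    wanted = {k.lower() for k in keywords}
    return lambda query: query.strip().lower() in wanted

def contains_action_term(terms):
    """Condition: any word of the query is one of the action terms."""
    wanted = {t.lower() for t in terms}
    return lambda query: any(word in wanted for word in query.lower().split())

# Hypothetical rules: condition -> description of the action to take.
RULES = [
    (exact_keyword(["vacation", "holidays"]), "promote: HR vacation policy page"),
    (contains_action_term(["download", "install"]), "result block: software catalog"),
]

def actions_for(query, rules=RULES):
    """Return the actions of every rule whose condition matches the query."""
    return [action for condition, action in rules if condition(query)]

fired = actions_for("install visio")  # matches the action-term rule only
```
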
    • Actions for Query Rules
      • Create and display a result block
      • Change ranked search results
      • Best Bets
      • XRANK
        • Adds to the total rank
        • Not explained in rankdetail
        • How to choose the correct value?
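
A small sketch of what an XRANK rewrite looks like, assuming only the constant-boost parameter `cb` of the KQL `XRANK` operator. The helper names are hypothetical, and `boosted_score` is a simplified model of the additive behavior described on the slide, not SharePoint's actual computation.

```python
def with_xrank(query, boost_expression, constant_boost):
    """Wrap a KQL query so items matching `boost_expression` get extra rank."""
    return f"({query}) XRANK(cb={constant_boost}) {boost_expression}"

def boosted_score(base_score, matches_boost_expression, constant_boost):
    """Simplified additive model: the boost is added on top of the base rank."""
    return base_score + (constant_boost if matches_boost_expression else 0.0)

kql = with_xrank("search architecture", "filetype:pptx", 100)
# kql == "(search architecture) XRANK(cb=100) filetype:pptx"
```

Because the boost is additive, a sensible value depends on the scale of the base scores seen in rankdetail: a boost far above that scale forces matching items to the top, a boost far below it changes nothing.
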
    • Templates for Query Rules
      • Typical navigational keywords from our portal
        • Software, soft, download, install
        • How to
        • Policy, Blog
        • Portal
        • Music, Video
        • Presentation, Documents, Report
        • Training, tutorial
        • Book, ebook
      • You will have different ones!
    • Custom Rank Models
      • Collect query judgments
      • Tune neural network coefficients using machine learning
        • Gradient Descent, LambdaRank
      • Microsoft.Office.Server.Search.RankerTuning
    • Custom Rank Models
      • Manually modify only a new or very simple model (not the default one!)
      • A/B testing of weights
      • Measure, measure: Precision, NDCG
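
Both metrics can be computed from graded query judgments. The sketch below assumes each result list is given as relevance grades in ranked order (0 = not relevant) and that the ideal ordering is derived from the same judged list; the function names are illustrative.

```python
import math

def precision_at_k(relevances, k):
    """Share of the top-k positions holding a relevant result (grade > 0)."""
    if k <= 0:
        return 0.0
    return sum(1 for r in relevances[:k] if r > 0) / k

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: grades discounted by log2 of the position."""
    return sum(r / math.log2(pos + 2) for pos, r in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the best possible ordering of the same grades."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0

# Grades of the top results returned by two hypothetical rank models for one query.
model_a = [3, 0, 2, 1]
model_b = [3, 2, 1, 0]
better = "B" if ndcg_at_k(model_b, 4) > ndcg_at_k(model_a, 4) else "A"
```

For an A/B test of weights, the same comparison would be averaged over the whole set of judged queries rather than a single one.
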
    • Custom Rank Models
      • Example of a simple model – people search
    • 3. Search quality testing
      • Why is it needed? It’s your compass.
      • «Unit testing»
      • Periodic manual testing
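
«Unit testing» of search can be as simple as a golden set of queries with the document each one must return near the top. In this sketch `search` is any callable returning a ranked list of URLs; the stub index and the URLs are made up for the example, and a real check would call the portal's search endpoint instead.

```python
def failing_cases(search, golden_set, k=3):
    """Return the queries whose expected URL is missing from the top k results."""
    failures = []
    for query, expected_url in golden_set:
        if expected_url not in search(query)[:k]:
            failures.append(query)
    return failures

# Stub standing in for the real search service (hypothetical data).
FAKE_INDEX = {
    "vacation policy": ["/hr/vacation-policy", "/hr/faq"],
    "vpn": ["/it/news", "/it/blog", "/it/wiki", "/it/vpn-setup"],
}

def fake_search(query):
    return FAKE_INDEX.get(query, [])

GOLDEN_SET = [
    ("vacation policy", "/hr/vacation-policy"),
    ("vpn", "/it/vpn-setup"),
]

broken = failing_cases(fake_search, GOLDEN_SET)  # "vpn": expected page is only 4th
```

Running such a check after every change to rank models, query rules, or synonyms shows quickly whether a tweak for one query broke another.
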
    • 4. Security «audit»
      • Search reveals breaches in security
      • Security by obscurity
      • Examples of queries:
        • «confidential»
        • Salaries, performance reviews
      • Solution – automatic monitoring of sensitive queries
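
Automatic monitoring can start as a scan of the query log for sensitive terms. The sketch assumes a hypothetical log of (user, query, result_count) tuples and an illustrative term list; it flags only queries that actually returned results, since those are the ones where a user may have seen something they should not.

```python
# Illustrative term list; each organization will maintain its own.
SENSITIVE_TERMS = ("confidential", "salary", "salaries", "performance review")

def flag_sensitive(log, terms=SENSITIVE_TERMS):
    """log: list of (user, query, result_count). Return entries needing review."""
    flagged = []
    for user, query, result_count in log:
        q = query.lower()
        if result_count > 0 and any(term in q for term in terms):
            flagged.append((user, query, result_count))
    return flagged

audit_log = [
    ("anna", "Confidential roadmap", 4),
    ("boris", "salaries 2013", 0),
    ("olga", "cafeteria menu", 9),
]
to_review = flag_sensitive(audit_log)  # only anna's query matched and returned results
```
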
    • 5. Adoption of content
      • Work with departments
        • Get their help with monitoring their own search queries
        • Guidelines for formatting content
      • Basic SEO
        • Titles
        • Friendly URLs
        • Custom meta tags <meta name=…
          • Title, description
          • Custom tags automatically appear in crawled properties
    • 6. Promotion within the company
      • Image – «you will find everything here»
      • Integrate with other portals
        • Offer search as a service
        • «Global search» widget
      • Badges, gamification
    • Promotion
      • Social Best Bets
    • Semantic search
      • Cannot be solved in general
      • Analytics + fine tuning
        • See practices above
      • NLP – question answering
        • Rocket science
        • English only
        • Part-of-speech tagging, dependency parsing
        • Stanford NLP, OpenNLP, IR
    • «References»
      • Patents - http://goo.gl/20sbR
      • Explain Rank page - http://goo.gl/o3ZmN
      • How SP2013 relevancy models work - http://goo.gl/arf0P
      • MS Enterprise Search approach - http://goo.gl/x8SDO
      • Customizing ranking models in SP 2013 - http://goo.gl/lBJAp
    • Thanks
      • Skype: Alexey_Kozhemiakin
      • Email: Alexey.Kozhemiakin@gmail.com
      • Blog: http://powersearching.wordpress.com