Look at what users need and which challenges arise in their daily work. Which technical possibilities and limitations exist? What goals does the business have?
Development Environment
• OS: Windows Server 2008 R2 SP1
• CPU: 4 cores
• Memory: 8 GB -> 16 GB
• Disk: Fast disks
• Visual Studio 2012
• SQL Server 2012 (Max server memory: 1500 MB)
Dedicated search farm for 40 million searchable items and 10 queries per second
• Front end server to host your search UI
• One index server per 10 million items: 20 million items, 30 million items, 40 million items
• Server to host crawling, analytics processing, Central Administration and other SharePoint application services
• Query and results processing, search administration, document processing
• Database server
• Load balanced front end and redundant admin and query processing
• Index replicas for redundancy and increased throughput
• Extra crawl component per 20 M items and redundancy
• Cluster or mirror the database server for fault tolerance
• Multiple data centers for disaster scenarios
• For advanced query and result processing, put Comperio Front between your search center and REST API
• For advanced content enrichment, deploy your content enrichment web services
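The sizing rules of thumb in these notes can be turned into a small estimate. The sketch below is only an illustration: the function name is hypothetical, and the default of two replicas per index partition is an assumption drawn from the redundancy bullet, not a documented formula.

```python
import math

# From the notes: one index server per 10 million items.
ITEMS_PER_INDEX_PARTITION = 10_000_000
# From the notes: extra crawl component per 20 M items.
ITEMS_PER_CRAWL_COMPONENT = 20_000_000


def estimate_search_topology(searchable_items: int, index_replicas: int = 2) -> dict:
    """Rough component counts for a dedicated search farm.

    index_replicas=2 is an assumption (one extra replica per partition
    for redundancy); adjust it to your own availability requirements.
    """
    partitions = max(1, math.ceil(searchable_items / ITEMS_PER_INDEX_PARTITION))
    crawl_components = max(1, math.ceil(searchable_items / ITEMS_PER_CRAWL_COMPONENT))
    return {
        "index_partitions": partitions,
        "index_components": partitions * index_replicas,
        "crawl_components": crawl_components,
    }


if __name__ == "__main__":
    print(estimate_search_topology(40_000_000))
```

For 40 million items this gives four index partitions, which lines up with the Index-0 to Index-3 labels in the farm diagram.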
Funding
• System requirements have increased
• Infrastructure investments are massive
• There must be a significant PAIN to solve
Time
• To analyse requirements
• To purchase and set up the infrastructure
• To get to know all the new stuff
• To build and deploy your customizations
Documentation
• We were early adopters -> not much to find on Google, MSDN or TechNet
Network
• Knowing someone who knows something…
Automation
• You will need to re-install SharePoint
• You will re-deploy your solutions
• AutoSPInstaller, custom cmdlets and scripts
Performance
CPU
• Increased from 4 > 8 cores on dev env
Memory
• Increased from 8 GB > 16 GB on dev env (paging)
• Increased from 16 GB per SQL Server to 16 GB per database instance
Disk IO
• You need enough disk spindles to handle the IO
• You need to configure your SAN correctly
• Opt out of dynamic disk solution
Load balancer
• Turn off sticky sessions and trust the distributed cache
• Test and tune timeouts
Distributed cache
• Configure enough memory
Anti virus
• Turn it off
• Exclude the index folder etc.
The purpose of the search capacity test is to validate the documented and undocumented soft boundaries in Microsoft SharePoint Server 2013, with focus on
• maximum number of documents in a search partition
• maximum number of documents in a crawl database
• architecture for crawling a large number of file shares
• getting an initial picture of search and crawl performance
Crawled 30 million documents from file shares via symbolic links on the crawler server. Tested 20,000 searches per day and used the top 300 search queries from search statistics. 4-server farm with 2 index partitions, 2 crawl components and 1 crawl database.
The slide shows actual numbers with 31 million items indexed.
Display templates control which managed properties are shown in the search results, and how they appear in the Web Part. Each display template is made of two files: an HTML version of the display template that you can edit in your HTML editor, and a .js file that SharePoint uses.
• Control templates determine the overall structure of how the results are presented. Includes lists, lists with paging, and slide shows.
• Item templates determine how each result in the set is displayed. Includes images, text, video, and other items.
• Group templates are specific to search results and are used for the HTML surrounding grouped items.
• Hover templates are used for presenting more information on a search result hit. An item template and a hover template are connected.
How display templates are structured
• Control
• Group
• Item
API
• Simple interface for querying SharePoint without needing the SharePoint libraries
• Easy to test and consume
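As an illustration of how little is needed to consume the API, the sketch below builds a query URL for the SharePoint 2013 Search REST endpoint (`/_api/search/query` with the `querytext` and `rowlimit` parameters). The helper name and the site URL are hypothetical, and the sketch only builds the URL: sending the request and authenticating depend on your environment.

```python
from urllib.parse import quote


def build_search_url(site_url: str, query: str, row_limit: int = 10) -> str:
    """Build a GET URL for the SharePoint 2013 Search REST endpoint."""
    # The query text is wrapped in single quotes, so embedded single
    # quotes are doubled before percent-encoding.
    escaped = query.replace("'", "''")
    return (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext='{quote(escaped)}'&rowlimit={row_limit}"
    )


if __name__ == "__main__":
    # intranet.example.com is a placeholder host, not from the slides.
    print(build_search_url("https://intranet.example.com/", "enterprise search", 5))
```

Because the result is a plain URL, it can be pasted into a browser or any HTTP client for testing.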
Query rule conditions
• Query matches string exactly
• Query contains string
• Query matches dictionary exactly
• Query more common in source
• Result type commonly clicked
• Advanced query matching
What should be indexed?
• Content sources and start addresses
• Content types / file types
• Crawl rules for exclusions
What parts of the indexed content should be searchable?
• Full-text index
• Fielded search
• Refiners
What should be displayed?
• In search suggestions
• In search results
• In search flyouts
Transcript of "Share point 2013 enterprise search (public)"
What we have learned about SharePoint 2013 and Enterprise Search
Petter Skodvin-Hvammen
Tallak Hellebust
Agenda
• How to run a successful search project
• Architecture and infrastructure learnings
• User experience and search customizations
• How can you crawl thousands of file shares
• Discover associations and enrich indexed content
• What about search relevancy?
Sprint 0 – goal
Best Solution
• Business Goals
• User Needs
• Technology
Sprint 0 – process
Analysis
• User interviews
• Stakeholder interviews
• Search logs
• Existing work and documentation
Technology Assessment
• Sources
• Information model
• Technology components
• Architecture
• Scaling
Concept Development
• Problem solving
• Information modus
• Mockups
• Clickable concept demo
• Best practices
• Concept testing
Enterprise Strategy
• Information Marketplace
• Achieving business goals
Final Report
• Presentations
• Recommendations
• Project plan
• Quickwins
How to run a successful search project
• Sprint 0
• Planning
• Development
• Testing
• Demo
• Deployment
One sprint ahead
UX track: UX (Sprint 0), UX (Sprint 2), UX (Sprint 3), UX (Sprint 4), UX (Sprint n+1)
Development track: Sprint 1, Sprint 2, Sprint 3, Sprint n
• Let the UX work be one sprint ahead of the technical team
• Produce a clickable prototype each sprint
• The prototype is a visual presentation of the product backlog
• The technical team implements the prototype in the next sprint
Infrastructure Needs
Is Microsoft moving into the server hardware business?
[Diagram: search farm for 40 million documents and 10 queries per second]
• Front end servers (x2): WFE, Query, Admin, FRONT
• Index servers: Index-0, Index-1, Index-2 and Index-3, each with Doc Proc and Enrichment, in two replicas
• Application servers (x2): Crawling, Analytics, Doc Proc, Enrichment; Central Admin on one of them
• SQL Server (x2): Admin DB, Analytics DB, Crawl DB, Link DB, other SP DBs
Infrastructure Investments
What                Spec              Count   Total
SharePoint Server   Virtual Machine   12      12 VMs
CPU                 8 cores           12      96 cores
Memory              16 GB             12      192 GB
System Disk         150 GB            12      1.8 TB
Data Disk           450 GB            12      5.4 TB
Disk IO             200 (Indexer)     10      2 000 IOPS
Also required
• Physical Servers
• Database Servers
• Load Balancer
• SAN or local disk arrays
• Domain Controller
• Other networking
Licenses for
• SharePoint Server
• SQL Server
• Windows Server
• CALs/eCALs
• Visual Studio
• Comperio FRONT
Other environments
• UAT Env
• QA/Test Env
• Dev Envs
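The totals in the table are plain multiplications of the per-VM spec by the VM count. A minimal sketch for redoing the arithmetic with your own numbers follows; the variable and function names are illustrative only.

```python
# Per-VM spec and counts as listed in the table above.
VM_COUNT = 12
PER_VM = {"cpu_cores": 8, "memory_gb": 16, "system_disk_gb": 150, "data_disk_gb": 450}
INDEXER_COUNT = 10
IOPS_PER_INDEXER = 200


def farm_totals(vm_count: int = VM_COUNT) -> dict:
    """Multiply each per-VM figure by the number of VMs."""
    totals = {name: value * vm_count for name, value in PER_VM.items()}
    totals["indexer_iops"] = INDEXER_COUNT * IOPS_PER_INDEXER
    return totals


if __name__ == "__main__":
    for name, value in farm_totals().items():
        print(f"{name}: {value}")
```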
We have learned that…
You will need
• Funding!
• Time
• Documentation
• Network
• To automate
Performance will get you
• Add more CPU
• Add more Memory
• Optimize Disk IO
• Balance load wisely
• Tune Distributed cache
• Know your Anti virus
Capacity Test Findings
• Crawl rate declines 1% per million items indexed
• Query latency increases exponentially from 12 million items indexed per partition
• Database latency insignificant during crawling
• Successfully crawled file shares via symbolic directory links
• Disk space usage significantly lower than expected
Disk Space Usage
Server                                                     System Volume (C:), GB      Data Volume (E:), GB
                                                           Used   Free   Capacity      Used   Free   Capacity
Admin, Crawler, Content Processing, Analytics Processing   33.3   116    149           42     807    849
Query Processing, Index Partition 0                        34.4   115    149           270    579    849
Query Processing, Index Partition 1                        34.5   115    149           268    581    849
Crawler, Content Processing, Analytics Processing          34.5   115    149           55     794    849
The table above shows measured disk space usage for 31 million items indexed.
Disk volume         Total
Number of servers   4
Data                52 MB
Index               1 077 248 MB
Logs                24 576 MB
Total               1 101 876 MB (1 076 GB)
We reduced the data volume from 850 GB to 450 GB. Huge savings in storage costs!
Database Space Usage
Database                                   Capacity Test
Number of searchable items (in millions)   30
Search Service Application                 156 MB
Analytics Reporting                        6 MB
Crawl Store                                19 151 MB
Links Store                                24 316 MB
Total                                      43 628 MB (43 GB)
The table shows measured database space usage for 31 million items indexed.
FRONT Search
• Advanced query and result processing
• Highly customizable business logic represented through reusable tasks and flows
• Lightweight development environment
• Lightweight deployment
• Fully integrated with SharePoint result presentation and display templates
• Fully integrated with SharePoint security
FRONT Search in SP2013
• Front webpart – Handles communication between Front and UI
• Front app – Handles claims security
• Front webservice – Flow engine
FRONT Search <=> Query rules
FRONT Search
• Conditions
– Analyze request
– Full flexibility
• Tasks (Actions)
– Change query model
– Perform parallel queries
– Full flexibility
• Publishing
– Special conditions case
• Result processing
– Analyze result from a query
– Perform new queries based on result
– Change order/grouping/content of result
Query rules
• Conditions
– Analyze query
– Six types
• Actions
– Add promoted result
– Add result block
– Change query
• Publishing
– When is the rule active
FRONT Search <=> Result sources
FRONT Search
• Source system
– SP 2013
– SP 2010
– FAST ESP
– Lucene/Solr
– …
• Query transformation
– Full control of query model
Result sources
• Source system
– Local SP 2013 index
– Remote SP 2013 index
– OpenSearch
• Query transformation
– Subset of content
[Diagram: search architecture]
• Legend: Public API, Custom components, Unit of scale/role boundary
• Content sources: HTTP, File shares, SharePoint, User profiles, Lotus Notes, Documentum, Exchange folders, Custom - BCS
• Databases: Crawl, Link, Analytics Reporting, Admin
Search UX examples have been removed from the presentation to preserve client IP.
Please contact Petter or Tallak if you would like to discuss search user experience.
How do you index millions of documents in thousands of file shares in hundreds of locations?
Bonus! Support governance and operations
Challenges
• Max 50 content sources per service application
• Max 100 start addresses per content source
• Max 20 concurrent crawls per service application
• Limit bandwidth usage for specific server locations
• Limit crawler impact within local business hours
• Grant read access to crawler per file share
• Avoid token bloat issues with more than 1000 groups per account
• Manage indexing and crawling of each file
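The first two limits cap how many file shares can be registered directly as start addresses. The sketch below works through that arithmetic; the function names and the 8 000-share example are illustrative assumptions, not figures from the project.

```python
import math

# Limits quoted in the list above.
MAX_CONTENT_SOURCES = 50
MAX_START_ADDRESSES_PER_SOURCE = 100


def content_sources_needed(share_count: int) -> int:
    """Content sources required if every share is its own start address."""
    return math.ceil(share_count / MAX_START_ADDRESSES_PER_SOURCE)


def fits_as_start_addresses(share_count: int) -> bool:
    """True if the shares fit within one service application's limits."""
    return content_sources_needed(share_count) <= MAX_CONTENT_SOURCES


if __name__ == "__main__":
    shares = 8_000  # hypothetical number of file shares
    print(content_sources_needed(shares), fits_as_start_addresses(shares))
```

With one start address per share, 50 x 100 = 5 000 shares is the ceiling, which is why thousands of shares call for a different structure.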
A Proven Approach
• Symbolic links in smart folder structure
\\impact\files\source\impact\account\symlink
• Content Sources per region with smart start addresses
file://impact/files/source/impact
• Content Enrichment to fix file paths in results
• Custom application for managing file shares and granting access to crawler
• Host aliases for crawler impact
• Custom timer job that synchs custom lists from custom app
• Custom timer job that creates/removes symbolic links
• Custom list: Locations
– Map server prefix to content source
– Map location to schedule and impact
• Custom list: File shares
– Map share to crawl account
– Map UNC to symlink
– Map share specific
Example Solution
Files in Norway
• Incremental Crawl every 6 hours
• Start address: file://default/files/norway/default
Files in India
• Incremental Crawl every night at 21:00 IST
• Start address: file://reduced/files/india/reduced
Custom list: File Shares
• UNC Path: \\osl-file01\share1\hr
• Crawl Account: user2
• Symlink: files/norway/default/user2/symlink3
Custom list: Locations
• Server Prefix: osl
• Content Source: norway
• Crawler Impact: default
Crawler Impact Rules
• Server name: default
• Server name: reduced wait 60 secs
Crawl Rules
• file://*/user1/* account=user1
• file://*/user2/* account=user2
Folders
• files/norway/default/user1/symlink1
• files/norway/default/user1/symlink2
• files/norway/default/user2/symlink3
• files/india/reduced/user1/symlink4
• files/india/reduced/user1/symlink5
• files/india/reduced/user2/symlink6
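The naming convention in this example can be expressed as a few pure functions. The sketch below is not the project's timer job: the class and function names are hypothetical, and it only derives paths and matches crawl rules, leaving the actual creation of symbolic links out.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Location:
    server_prefix: str   # e.g. "osl"
    content_source: str  # e.g. "norway"
    crawler_impact: str  # host alias, e.g. "default" or "reduced"


@dataclass
class FileShare:
    unc_path: str        # e.g. r"\\osl-file01\share1\hr"
    crawl_account: str   # e.g. "user2"
    symlink_name: str    # e.g. "symlink3"


def find_location(share: FileShare, locations: list) -> Optional[Location]:
    """Pick the location whose server prefix matches the UNC server name."""
    server = share.unc_path.lstrip("\\").split("\\")[0]
    for location in locations:
        if server.startswith(location.server_prefix):
            return location
    return None


def start_address(location: Location) -> str:
    """Start address of the content source for a location."""
    return (f"file://{location.crawler_impact}/files/"
            f"{location.content_source}/{location.crawler_impact}")


def symlink_folder(share: FileShare, location: Location) -> str:
    """Folder where the symbolic link for a share should live."""
    return (f"files/{location.content_source}/{location.crawler_impact}/"
            f"{share.crawl_account}/{share.symlink_name}")


def crawl_account_for(url: str, rules: list) -> Optional[str]:
    """Return the account of the first crawl rule whose pattern matches."""
    for pattern, account in rules:
        if fnmatchcase(url, pattern):
            return account
    return None
```

Because the crawl account is part of the folder path, two wildcard crawl rules are enough to select the right account for every share under a start address.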
Discover associations in your indexed data using custom entity extractors
Explore how your indexed data is associated with terms often used by your business
• Examples
– Organization
– Projects
– Customers
– Products
Add metadata or clean up your indexed data using custom content enrichment
• Based on where the items are located, add info about
– Department
– Information owner
– Security classification
• Lookup name based on user account
• Remove company name from title for all web pages
• Normalize names
• Normalize phone numbers
• Fix search result link
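Two of the clean-up steps above are easy to show as plain functions. This is a sketch of the kind of logic an enrichment service might apply, not the code used in the project: the function names, the title separators, the company name "Contoso" and the default country code 47 are all assumptions made for illustration.

```python
import re


def strip_company_from_title(title: str, company: str) -> str:
    """Remove a trailing ' - Company', ' | Company' or ': Company' suffix."""
    pattern = re.compile(rf"\s*[-|:]\s*{re.escape(company)}\s*$", re.IGNORECASE)
    return pattern.sub("", title).strip()


def normalize_phone(raw: str, default_country_code: str = "47") -> str:
    """Normalize a phone number to '+<country code><digits>'.

    Numbers without an international prefix are assumed to belong to
    default_country_code (an assumption of this sketch).
    """
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    if digits.startswith("00"):
        return "+" + digits[2:]
    return f"+{default_country_code}{digits}"


if __name__ == "__main__":
    print(strip_company_from_title("Travel policy - Contoso", "Contoso"))
    print(normalize_phone("22 33 44 55"))
```

Normalizing values at index time means the same person or number matches regardless of how it was typed in the source document.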
Synchronize Terms with Search Spelling and Synonyms Dictionaries
[Diagram labels]
• «Custom Timer Job» Synchronize Spelling Inclusion
• «Custom Timer Job» Synchronize Thesaurus
• SSA
How fast can you find what you are searching for?
- Relevancy - Recall - Precision -
• What should be indexed?
• What should be searchable?
• What should be displayed?
• How to weight a managed property?
• How to change ranking model?
• How to tune ranking?
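Recall and precision have standard definitions that are simple to compute once you have a set of results judged relevant for a query. The sketch below uses those textbook definitions; the function name and the sample document IDs are made up for illustration.

```python
def precision_recall(retrieved: list, relevant: list) -> tuple:
    """Return (precision, recall) for one query.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant results
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical document IDs: four results returned, three judged relevant.
    print(precision_recall(["a", "b", "c", "d"], ["a", "c", "e"]))
```

What gets indexed and made searchable mostly affects recall, while property weighting and ranking changes mostly affect how precise the top results are.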
Managed Property Weighting
These are not ordered by importance!
Change Ranking Model
• The default ranking model in SP 2013 did not fit us!
– PowerPoints always won
– Complete matches in site titles and document titles were outranked by number of partial matches in body
– Community sites were weighted lower than discussions and posts
We replaced the SP 2013 ranking model with the SP 2010 ranking model
Tune Ranking Model
Microsoft will soon release a tool for tuning ranking models!
1. Select ranking model to tune
2. Select result source to search
3. Add judgement sets
4. Add queries to judgement sets
5. Run queries and evaluate results
6. Add and tune features
7. Save and publish model
Petter Skodvin-Hvammen
psh@adgruppen.no
@pettersh
Tallak Hellebust
tallak.email@example.com
@titakker
THE END