SharePoint 2007 Best Practices Final


Delivered at the SharePoint Best Practices Conference in La Jolla, CA, Feb 6 to 9

Published in: Technology

  1. 1. Enterprise Search ITP278
  2. 2.  Marianne Sweeny  Ascentium    Director of Search Services, Web producer at Microsoft for 7+ years, pointy-head not propeller-head
  3. 3. Agenda  Introduction  MOSS 2007 Search  Configuring MOSS Search  Here There Be Dragons  Resources  Appendix
  4. 4. Introduction • July 2008: Google acknowledges that its spiders have found 1 trillion unique URLs on the Web • 2000: 1 billion pages • 1999: 26 million pages
  5. 5. There is No Magic Bullet • Susan Feldman (IDC), Enterprise Search Summit West 2008 – Employees average 3.5 hours/week searching – Cost = $5000 per employee per year • There can be no “silver bullet” solution for finding information – Customers don't know what they don't know – “Google experience” is finding what they want/need in the first few pages and not necessarily Google itself – Enterprises have different lines of business and different information types • Search of tomorrow is here today – Personalized to the device and user – Contextual – Flexible – Secure – Adaptable
  6. 6. Search Index: A Different Kind of Database Search Engine Index SQL Server Index
  7. 7. Web Search and Enterprise Search
     Web Search
     • Publishers want their content to be found
     • Anarchistic publishing model = “anyone, anywhere, any time”
     • Unlimited document set
     • No real standards or code, more like guidelines
     • No central authority
     • Spam
     • Commercialization
     • Technology is agnostic
     • Has to work the same for everyone worldwide
     • No shared understanding
     Enterprise Search
     • Publishers do not think about document discoverability
     • Controlled corpus of documents
     • Standards and practices in place
     • No spam
     • Users and authors generally share contextual understanding
     • Customized tagging or metadata
     • Can customize search technology to enterprise themes and concepts
     Successful enterprise search efforts target corpuses of information and set search scopes appropriately. I&KM pros are wise to study information worker context before trying to “Google-ize” their enterprises. (Forrester Search Wave Q2 2008)
  8. 8. Advanced Search • Few customers use it and those that do are disappointed • Boolean or SQL operators work sporadically • Confusing message – What is “regular” search… not as effective? • Search has progressed beyond the stages of Advanced – Filters – Facets – Context
  9. 9. MOSS 2007 Search  Query engine breaks the search terms down  Index engine stores the properties  Content index stores the text
  10. 10. Better Than Ever
      MOSS 2007
      • Relevance customizable to the enterprise content
      • Automated metadata extraction
      • Enhanced text analysis
      • Fully integrated admin experience between Windows SharePoint Services v3 and MOSS 2007
      • Single search system and index per server farm
      • Custom content groups, Best Bets, scheduling are now shared services
      • Scopes can be tied to document properties
      • Improved control over indexing
      SharePoint 2003
      • Relevance keyed on numeric values derived solely from document text
        – Collection frequency
        – Term frequency
        – Document length
        – Term position
      • Different systems between Windows SharePoint Services and SharePoint Portal Server
      • Multiple indexes
      • Custom content groups, Best Bets, scheduling configurations are portal-based
      • Scopes tied to content sources
      • Index propagated at completion of master crawl only
  11. 11. Simplified Administration UI Search settings page at the SSP level Managing crawls • Content sources • Explicit SharePoint Content Source Type • Content source for Business Data (Enterprise CAL) Crawl logs • Snapshot of crawled content in your index – lists all documents found in the content source and their status • Filters by date, site, etc. • Summary by host name (number of successes, errors, and warnings) Crawl rules • Included and excluded rules • Ability to pre-test crawl rules • Easy to change order of crawl rules Managing scopes • Scopes decoupled from content sources • Scopes can span multiple content sources • Scope by Property, Site, Content Source, and URL
  12. 12. Indexing Performance Improvements • Search is a shared service – Unified WSS and MOSS search for 1 index per SSP – Crawls, content sources, crawl rules, schema, shared scopes, etc. are administered centrally at the shared service level – Scopes and best bets can also be administered at the consuming sites • Crawl to small indexes that are then consolidated at scheduled times into a “master merge” • Content index that holds text of pages with Property store that holds other document values • Propagate data incrementally as it is being indexed to the query servers – Propagation starts within 30 seconds of the first shadow index written – No need to wait till the end of the crawl for information to be available in queries – No propagation of properties • Single item add/removal without re-indexing entire corpus with continuous propagation – Change Log Crawl: detects which items have changed within a WSS or a MOSS 2007 site and crawls only those items – Security Change Only Crawl: no need to fully index all the content of a site when permissions on this site have changed
  13. 13. Relevance: Types  Dynamic ranking = relevance impacted by query term – Frequency – Location in document – Appearance in link text – Appearance in URL  Static ranking = relevance independent of customer query – URL Depth – Click Distance – Authority/Demoted site – Change property weights – Language of customer (browser setting) – Document type: HTML files, PPT, Word docs, emails , XML files, Excel spreadsheets, Plain text, List items
  14. 14. Relevance: Enhancements • Manually assign synonyms and editorialized results to keywords – Use search logs to detect popular searches, low click-through from results, or 0-result queries • Search Alerts – User can subscribe to receive email when results change • File type filtering – Some file types are deemed more relevant (e.g. HTML, DOC) than others (XML, TXT) – Supports 220 file types, Microsoft and non-Microsoft applications • Property weights * – Assign different weights to properties so that important properties such as 'Title' have a bigger influence on ranking – Change default property weights through the Schema Object Model – Note: The weights used in the product were carefully tested. Changes to the weights may also have a negative effect on relevance * Marcy Tobin wants me to tell you that this is not a trivial undertaking
  15. 15. MOSS 2007 Faceted Search Facets are predetermined content categories presented to the customer to narrow search results •Can be presented pre- or post- query •Used for Advanced search Empowers customer to most effectively refine their search Filters results by predetermined categories
  16. 16. Federated Search  Import or export federated locations using Federated Location Definition (.FLD) files  Incorporates results from outside content sources that subscribe to OpenSearch 1.1  Passes the query into the subscribed resource and returns results into single interface  Relevance calculation done according to originating resource criteria, not MOSS 2007 criteria  Pre-defined FLD files found at s/federated.aspx#fscp  Can develop own FLD files if destination subscribes to OpenSearch 1.1 – Day Software has developed a standard connector for LiveLink ECM
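The query hand-off to an OpenSearch 1.1 source can be sketched as a URL template substitution. This is a minimal illustration, not MOSS code: the `{searchTerms}`, `{startIndex}` and `{count}` placeholders come from the OpenSearch 1.1 template syntax, while the function name and the `search.example.com` endpoint are hypothetical.

```python
from urllib.parse import quote


def fill_opensearch_template(template: str, query: str,
                             start_index: int = 1, count: int = 10) -> str:
    """Substitute OpenSearch 1.1 URL template parameters with concrete values."""
    values = {
        "searchTerms": quote(query, safe=""),  # URL-encode the user's query text
        "startIndex": str(start_index),
        "count": str(count),
    }
    url = template
    for name, value in values.items():
        # OpenSearch marks optional parameters with a trailing "?", e.g. {count?}.
        url = url.replace("{" + name + "}", value)
        url = url.replace("{" + name + "?}", value)
    return url


# Hypothetical federated location, used only for illustration.
TEMPLATE = "http://search.example.com/results?q={searchTerms}&start={startIndex?}&format=rss"

if __name__ == "__main__":
    print(fill_opensearch_template(TEMPLATE, "best practices"))
```

The remote source ranks its own results, which is why the slide notes that relevance follows the originating resource's criteria rather than MOSS 2007's.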
  17. 17. People Search Build and publish rich personal profiles  Customize personal profile attributes  Populate personal profiles using information from Active Directory, other LDAP directories, or Line-of Business systems  Control access to information using security and privacy controls  Generate and display organizational charts based on directory information  Publish personal profiles using MOSS My Sites Identify people who can help  Find people based on keyword matches with MOSS personal profiles  Find people in line-of-business systems  Filter results by common attributes such as Job Title or Department  Find “in-common” connections, including managers, site memberships, distribution lists, and colleagues  Group results by social distance  Subscribe to People Alerts
  18. 18. People Search Results Page Find people by project, expertise or… Filter by relevant attributes Contact information & online availability
  19. 19. LOB Applications with BDC Extracts data from line-of- business, CRM, and other 3rd Party data stores  Caches for indexing by search service  Searches any data source accessible through or Web Services  Uses Live Communication Server for connectivity options Aggregated into a single application
  20. 20. FAST ESP Technology FAST is a sophisticated search engine tailor-made for ecommerce and help desk  Uses sophisticated linguistic processing  Searches structured and unstructured content  Indexing Process: Conversion-language detection-synonyms-spell check- external call outs-entity extraction-categorization-vectorization-custom navigation-normalizer-alerting-indexing Why is it Unique  Auto Classification  Advanced Linguistics: text mining for concept and relationship mapping  Recall: Lemmatization, synonym expansion, wildcards, anti-phrasing, phonetic search  Precision: Exact word matching, exact phrase matching, proximity, tokenization  Location aware results (retail and news) – excellent for mobile search  Recommendation engine  Increased capacity:100-200 million documents on 1 server and 150 million q/second
  21. 21. Custom Results  Search Scopes  Allow users to refine search through filtering  Define content resources and map to business rules/key concepts  Focused content = shared understanding = more precise results  Duplicate results filtering  Collapsing duplicates from same directory or site to leave more room for other relevant results  Less favoritism, more results on desired page 1  Definitions  Automatically extract “definitions” from indexed content and display them as matches directly on the results page  A web property on the Search Best Bets web part (can turn on/off display of definition)  Returned in the Query Object Model  Can not be edited  Best Bets  Editorially assigned results based on these key concepts assigned to selected query terms  Can be many-to-many
  22. 22. Scalability • No physical limit for the maximum number of documents in one index • Recommended limit is 50 million documents per indexer • A document is anything from a Word or PowerPoint file to a web page, an individual SharePoint list item, one people entry, or an SAP customer record • Large and small documents count the same • The 'average document size' depends on the corpus mix, e.g. heavy use of WSS 3.0 lists versus limited use • Dependent on supporting hardware
  23. 23. Security  Query time stripping – customer only sees those results that they have permission to view  Support for pluggable authentication for content in SharePoint Server and WSS 3.0 Sites  Implements ASP.NET 2.0 authentication model  Minimum crawler permission is “Full Read”  Still provides the same security trimming functionality  Automatically configured for new sites  Search visibility options  Prevent sites/lists appearing in search results at a site/list level  “Security only” crawl for single item add/removal
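Query-time stripping can be pictured as a filter applied to the result set just before display. The sketch below is a simplified model under stated assumptions: each indexed item carries the set of groups allowed to read it (captured at crawl time), and the `IndexedItem` and `trim_results` names are hypothetical, not part of the MOSS object model.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedItem:
    url: str
    # Groups allowed to read the item, assumed to be recorded when it was crawled.
    allowed_groups: set = field(default_factory=set)


def trim_results(results, user_groups):
    """Keep only the items whose ACL shares at least one group with the user."""
    user_groups = set(user_groups)
    return [item for item in results if item.allowed_groups & user_groups]


if __name__ == "__main__":
    hits = [
        IndexedItem("http://portal/hr/salaries.xlsx", {"HR"}),
        IndexedItem("http://portal/news.aspx", {"HR", "Staff"}),
    ]
    for item in trim_results(hits, ["Staff"]):
        print(item.url)
```

Because the filter depends on stored permissions, a permission change only needs the stored ACLs refreshed, which is the idea behind the “security only” crawl.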
  24. 24. Search Analytics  Export search logs to Excel  Query terms  Page views  Number of results returned  Volume trends  Query success: can define success for certain query terms  Report Center  Access to MOSS 2007 BI features  Filters data for permissions and relevance  Key Performance Indicators [KPI]  Create a KPI list or other measures of success  Default KPIs exist in OOB deployment  KPI information can be drawn from MOSS 2007 data sources: SharePoint lists, Excel workbooks, SQL Server 2005 Analysis Services, manually entered information
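Once the search logs are exported, a short script can flag queries that deserve a Best Bet or synonym. The sketch below assumes each exported row has been reduced to a (query, number of results returned, clicked) tuple; that row layout, the thresholds, and the function name are illustrative assumptions rather than the actual export format.

```python
from collections import Counter


def best_bet_candidates(log_rows, min_searches=2, max_click_through=0.25):
    """Return (query, searches, click-through rate) for weak, repeated queries."""
    searches = Counter()
    clicks = Counter()
    zero_results = set()
    for query, result_count, clicked in log_rows:
        key = query.strip().lower()  # fold case so variants count together
        searches[key] += 1
        if clicked:
            clicks[key] += 1
        if result_count == 0:
            zero_results.add(key)

    candidates = []
    for key, total in searches.most_common():
        if total < min_searches:
            continue  # ignore one-off queries
        click_through = clicks[key] / total
        if key in zero_results or click_through <= max_click_through:
            candidates.append((key, total, click_through))
    return candidates


if __name__ == "__main__":
    rows = [
        ("vacation policy", 0, False),
        ("Vacation Policy", 0, False),
        ("expense form", 12, True),
        ("expense form", 12, True),
    ]
    print(best_bet_candidates(rows))
```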
  25. 25. Configuring MOSS 2007 Search
  26. 26. Search Roadmap • Useful participants – Content creators – Information Architect/User Experience Architect – Taxonomist • Define key enterprise themes in content – Map existing content to these themes – Create filters and scopes to map for themes • Get as much customer data as possible to find search pain points – Review search logs and customer feedback mechanisms – What are they trying to find – What terms are they using • Assemble a cross-functional team to: – Assign relevance weighting that makes sense for the customer behavior and the corpus – Develop Best Bets for searches with 0 results – Create editorial guidelines and tools that enforce strong metadata standards across the enterprise – Develop controlled vocabulary that best describes enterprise key concepts and themes and is used as a foundation for meaningful metadata and facets – Design a structure that leverages the structural elements like URL depth and click distance
  27. 27. Pareto's Principle • Known as the 80/20 rule • Named after late 19th century economist Vilfredo Pareto • 20% of your content is answering 80% of your searches • Not an excuse to stop optimizing at the top 20% • Don't forget the Long Tail
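Whether the 80/20 split actually holds for a given corpus can be checked from click data. A minimal sketch, assuming a mapping from document URL to the number of result clicks it received (the data shape and the function name are hypothetical):

```python
def head_coverage(click_counts, head_fraction=0.2):
    """Share of all clicks that land on the most-clicked fraction of documents."""
    if not click_counts:
        return 0.0
    counts = sorted(click_counts.values(), reverse=True)
    head_size = max(1, round(len(counts) * head_fraction))
    return sum(counts[:head_size]) / sum(counts)


if __name__ == "__main__":
    clicks = {"a": 50, "b": 30, "c": 5, "d": 5, "e": 2,
              "f": 2, "g": 2, "h": 2, "i": 1, "j": 1}
    print(head_coverage(clicks))
```

The remainder (one minus the returned value) is the long tail that still needs scopes, metadata, and synonyms to be findable.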
  28. 28. Define Content • Define content scopes – Segment content into logical groups – Create scope rules based on address, property query, or content source – At the SSP level or individual site level – SSP-level scopes are shared among all sites that use the SSP • Select Authority resources • Define special terms if needed – Terms or language proprietary to the enterprise, e.g. “goat rodeo” – Provides additional clarification for searcher – Use synonym mapping for term variants, e.g. C# and Csharp – Two information points can be displayed for a special term: definition of the term and Best Bet
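How the three rule types combine into a scope can be sketched as follows. This is a simplified model, not the MOSS implementation: it assumes each rule has an include, require, or exclude behavior, that an item is in scope when it matches no exclude rule, every require rule, and at least one include rule, and that items are plain dictionaries. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeRule:
    kind: str                  # "address", "property", or "content_source"
    value: str                 # URL prefix, "Property=Value", or source name
    behavior: str = "include"  # "include", "require", or "exclude"


def rule_matches(rule, item):
    """Check one rule against an item described as a plain dictionary."""
    if rule.kind == "address":
        return item["url"].lower().startswith(rule.value.lower())
    if rule.kind == "content_source":
        return item["content_source"] == rule.value
    if rule.kind == "property":
        name, _, wanted = rule.value.partition("=")
        return item.get("properties", {}).get(name) == wanted
    raise ValueError("unknown rule kind: " + rule.kind)


def in_scope(rules, item):
    """Apply exclude rules first, then require rules, then include rules."""
    if any(r.behavior == "exclude" and rule_matches(r, item) for r in rules):
        return False
    if not all(rule_matches(r, item) for r in rules if r.behavior == "require"):
        return False
    includes = [r for r in rules if r.behavior == "include"]
    return any(rule_matches(r, item) for r in includes) if includes else True


HR_POLICIES = [
    ScopeRule("address", "http://portal/hr/"),
    ScopeRule("property", "ContentType=Policy", "require"),
    ScopeRule("address", "http://portal/hr/archive/", "exclude"),
]
```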
  29. 29. Designate Authority Sites  Hilltop Algorithm  Quality of links more important than quantity of links  Segmentation of corpus into broad topics  Selection of authority sources within these topic areas  Pre-query calculation applied at query time  Topic Sensitive Page Rank  Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank  Pre-query calculation of factors based on subset of corpus – Context of term use in document – Context of term use in history of queries – Context of term use by user submitting query
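Designating authority sites matters because click distance is measured from them. A minimal sketch, assuming click distance is the fewest link hops from any authority page and that the link graph is available as a dictionary of page to outgoing links (both the data shape and the function name are illustrative):

```python
from collections import deque


def click_distances(links, authority_pages):
    """Breadth-first search outward from the authority pages.

    Returns a mapping of page -> fewest clicks from any authority page.
    Pages that cannot be reached are absent from the result.
    """
    distances = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in distances:
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances


if __name__ == "__main__":
    site_links = {
        "/": ["/hr", "/it"],
        "/hr": ["/hr/benefits"],
        "/hr/benefits": ["/hr/benefits/dental.doc"],
    }
    print(click_distances(site_links, ["/"]))
```

With no authority pages configured the result is empty, which mirrors the warning later in the deck that the benefits of click distance are missed when authority sites are not set.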
  30. 30. Educate: Structural Influences • File Type Bias – In order of relevancy (highest to lowest): HTML Web pages, PowerPoint presentations, Word documents, Emails, XML files, Excel spreadsheets, Plain text files, List items • Auto Language Detect – Foreign language results are less relevant than results in user's language – English language is always considered as relevant as user's language • URL Depth and Click Distance – Short URLs are like prime real estate – Items with shorter URLs are considered more relevant than items placed in longer URLs – The level is determined by reviewing the number of slash (“/”) characters in the URL – Keywords separated by hyphens in the URL are good
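The slash-counting rule for URL depth is easy to reproduce when auditing a site structure. The sketch below assumes only slashes in the path portion count (not the two in `http://`) and that a trailing slash does not add a level; the function name is hypothetical.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Count the "/" characters in the path portion of a URL."""
    path = urlparse(url).path
    return path.rstrip("/").count("/")


if __name__ == "__main__":
    urls = [
        "http://portal/hr/benefits/2008/plans/dental.docx",
        "http://portal/hr/dental-plan.docx",
        "http://portal/",
    ]
    # Shallowest first: the order in which URL depth favors these items.
    for url in sorted(urls, key=url_depth):
        print(url_depth(url), url)
```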
  31. 31. Educate: Content Influences  Anchor Link Text  Search indexes the anchor text from the following elements: – HTML anchor elements – SharePoint Services link lists – SharePoint Portal Server 2003 listings – Word 2007, Excel 2007, and PowerPoint 2007 hyperlinks  Any file types handled by installed 3rd party iFilter components which emit hyperlinks  Metadata extraction  Shadow title detection is provided within the body of the item – Primarily based on text formatting features – Shadow title is added automatically to the document – Weighted the same as the original title – Only for Microsoft Office file types  Auto Description text  Optimized URLs  Enterprise Search checks URL matching at query time:  If query matches to the host name of a page in the index it will display as the first result
  32. 32. Enhanced Search Results Site Actions >> Site Settings >> Modify All Site Settings >> Site Collection Administration (Select Keywords) >> Manage Keywords >> “Add Keyword” >> Synonym Mapping Best Bets
  33. 33. Hardware Considerations  Dedicated crawl-target servers for large sites  Separate SQL Server instance for Search  Fast disk for SQL, fast CPU for Indexer, more memory  Dedicated Web Front End Server for crawling  Separate indexer machine  In most cases, your search index is on its own server
  34. 34. Indexing Configuration • Use dedicated web front ends for crawling large farms/sites • Upgrade WSS 2003 sites to WSS 2007 sites to index them faster • Define Crawler Impact Rules to avoid site overload • Schedule for off-hours crawling where appropriate • Balance results freshness with load on servers • Consider using single content access account per region • Regularly clean up and review – Crawl rules – Property and schema – Best Bets / keywords
  35. 35. Customizing Results Display
      To access the XSL property of the Search Core Results Web Part:
      1. In your browser, navigate to the results page URL: http://<ServerName>/SearchCenter/Pages/results.aspx
      2. Click the Site Actions link, and then click Edit Page.
      3. In the Search Core Results Web Part, click the edit down arrow to display the Web Part menu, and then click Modify Shared Web Part. This opens the Search Core Results Web Part tool pane.
      4. Click Data Form Web Part to display the XSL Editor node.
      5. Click the Source Editor button.
      6. This opens the Text Entry window for the Web Part's XSL property. You can modify the XSLT directly in this window; however, you may find it easier to copy the code to a file. You can then edit that file using an application such as Visual Studio 2005.
      7. After you have finished editing the file, you can copy the modified code back into the Text Entry window and save your changes to the Search Core Results Web Part.
  36. 36. Here There Be Dragons
  37. 37. Dragons 1  Note the infrastructure update where Microsoft rolled the features of Search Server 2008 into MOSS 2007 that includes federated search ability, and a unified administration dashboard.  Read more here: ng-availability-of-infrastructure-updates.aspx  Also please note that it is *not* an easy installation, and that users *must* read the entire documentation for it before upgrading their portal.  More people destroy their portal than upgrade it due to not reading the documentation and installing the prerequisite patches  Must ensure a schedule for the incremental crawl to catch additions to the document set  Must turn on PDF indexer and stemming
  38. 38. Dragons 2 • Use the Web part to accommodate wildcard search – Found here: /09/new-web-part-for-wildcard-search-in-enterprise- search.aspx • Use of special characters in the thesaurus can lead to highly irrelevant results and impact “did you mean” capabilities • The Expert search capacity is predicated on the My Sites profile – Employee participation critical to optimal functionality • Benefits of click-distance are missed if Authority sites are not configured
  39. 39. Dragons 3 • The value of statistical ranking can vary from the partial indexes to the master merge index • Without authoritative sites configured in the relevance settings, the benefits of click-distance are missed • Results delayed from servers without Internet connections • Backward compatibility – Custom applications using SharePoint 2003 administrative object model must be rewritten to use MOSS 2007 object model – Index files, scopes, search alerts, filters, word breakers, thesaurus files not upgraded
  40. 40. Resources  Microsoft Enterprise Search website  Webcast: Installing and Configuring Search in MOSS 2007 US&EventID=1032325467&CountryCode=US  Tune Search server 2008 search-server-express-2008-etc/  Configuring MOSS 2007 Search (Cale Hoopes) appliance.html  MOSS Developer Center on MSDN  MOSS 2007 Software Developers Kit us/library/ms550992.aspx  MOSS 2007 on TechNet us/library/3e3b8737-c6a3-4e2c-a35f-f0095d952b781033.mspx  Search Optimization for a MOSS 2007 Content Management site:  Faceted Search from the Microsoft SharePoint Team Blog
  41. 41. More Resources  Enterprise search blog  MOSS BDC Search moss-2007-business-data-catalog-search-excel-services-sql-analysis- services.aspx  Find it All with SharePoint Enterprise Search us/magazine/cc162512.aspx  Google Enterprise Connector for MOSS 2007 min/sharepoint_connector.html  Ontologica Search for MOSS 2007  Michael Gannotti on SharePoint ame=Search%20Technologies  Sitemap.xml Generator:  SEO Advice from a Propellerhead for … :
  42. 42. Even More Resources  MOSS 2007 Administrator Documentation moss-2007-wss-v3/  SharePoint Search linkshttp://www.virtual-  All About SharePoint : S.S. Ahmed sharepoint-search-part-1.aspx  Working with MOSS search - creating scopes sharepoint-search-part-2.aspx  MOSS 2007 search customization customization.aspx  MOSS 2007 Search & Indexing search-and-indexing.aspx  Create a custom Search Page connect-a-custom-search-page-to-a-custom-search-scope.aspx
  43. 43. Appendix
  44. 44. Auto Classification Products  Concept Searching  Auto-classifies documents for MOSS 2007  Uses established probabilistic methods to distinguish multiword concepts and weight by importance (relevance)  Extracts concepts and weights their relevance to searcher query – Presents for search refinement  (insider trading)  Integration with MOSS  Extracts metadata and compound terms  Incorporates with existing taxonomy if one exists  Appends metadata and stores as MOSS property  Part of the main MOSS index  Uses standard MOSS administration features
  45. 45. Adjusting Relevance: Property weights
      • Assign different weights to properties so that certain properties such as 'Title' have a bigger influence on ranking
      • Change default property weights through the Schema Object Model

        using System;
        using Microsoft.Office.Server.Search.Administration;

        // appGuid, property and weight are assumed to be supplied by the caller;
        // the slide does not show where they are defined.
        Ranking ranking = new Ranking(SearchContext.GetContext(appGuid));

        // Dump parameters, looking each one up by name
        foreach (RankingParameter param in ranking.RankingParameters)
        {
            RankingParameter lookedup = ranking.RankingParameters[param.Name];
            Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
        }

        // Lookup by index
        for (int i = 0; i < ranking.RankingParameters.Count; i++)
        {
            RankingParameter param = ranking.RankingParameters[i];
            Console.WriteLine(param.Name + ": " + param.Value);
        }

        // Setting the weight of property 'property' to 'weight'
        ranking.RankingParameters[property].Value = float.Parse(weight);
        ranking.StartRankingUpdate(RankingUpdateType.ClickDistanceUpdate);

        // Poll once per second until the ranking update has finished
        Console.Write("Updating ..");
        while (ranking.Status != RankingUpdateStatus.Idle)
        {
            Console.Write('.');
            System.Threading.Thread.Sleep(1000);
        }
        Console.WriteLine("Done.");

      Remember that Marcy Tobin wants me to let you know that this is not a trivial matter and she knows of what she speaks.
  46. 46. Push/Pull Data to Users • Alerts – Same alerting infrastructure for WSS and MOSS: Timer service is used to handle all alert notifications – Frequency can be set to Daily/Weekly: notifications for search alerts will be sent according to the creation time – 'Alert Me' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part – A rollup of all of a user's alerts for a site collection: http://<sitecollection>/_layouts/MySubs.aspx – Alert “gotchas”: no “My Alerts Summary” web part; no upgrade path from SPS2003 alerts to MOSS 2007 alerts except for WSS alert types • RSS Feeds – Ability to subscribe to an RSS feed on the search results – 'RSS' link can be added/removed using a web part property on the Search Action Links web part and on the Search Core Results web part
  47. 47. Protocol Handlers  Connects to a content source and enumerates the documents  Ships with support for  Web Content, NTFS File Shares, Exchange Public Folders, Lotus Notes Databases, SharePoint Content, SharePoint profiles, and Business Data Catalog  Partners providing support for  Documentum, Hummingbird, OpenText, FileNet, Interwoven, and others  us/spssdk/html/_introduction_to_a_protocol_handl er.asp?frame=true
  48. 48. The Query object model

        using Microsoft.Office.Server.Search.Query;

        // site and strQuery are assumed to be supplied by the caller.
        KeywordQuery request = new KeywordQuery(site);
        request.QueryText = strQuery;
        request.ResultTypes |= ResultType.RelevantResults;
        // If we want to get more than one result table:
        // request.ResultTypes |= ResultType.SpecialTermResults;

        // Setting optional parameters on the Query object
        request.RowLimit = 10;
        request.StartRow = 0;
        request.KeywordInclusion = KeywordInclusion.AllKeywords;

        // Executing the query
        ResultTableCollection results = request.Execute();
  49. 49. Metadata Property Mapping  Crawled properties  Emitted by iFilters and Protocol Handlers  Identified by a property set (GUID) and property ID (name or numeric ID)  Managed properties  Mapping target for crawled properties (many-to- many)  Identified by internal ID  Friendly name used in queries – Can be used in the query with property: Value