Carlos Valcarcel: Architecture-Fast Search Server 2010 For SharePoint


Published in: Technology
  • Tony Hart & Mark Stone are working on user context. Keyword: for a manual approach. Advanced: Rank Profiles that are contextually aware.
  • Overview of MSFT’s Search Solution (MSW is MSFT’s Intranet)
  • Can you please add a one-line description of each type of plugin in this slide, and how you can enable/disable individual plugins?
  • This slide provides a more detailed architecture of content processing in FS14 and how the content processing maps to concepts in SS14. There is a schema object model that defines the schema of the properties. As in SS14, there are crawled properties (CP) and managed properties (MP), and they can be managed through either the UI/PowerShell or the schema OM. Updates to the schema are stored in the config server, and update tools are used to batch the update process. The document processing pipeline reads the schema information and performs activities such as identifying new CPs, mapping MPs to crawled properties, and extracting information from documents and managed properties that is used for deep navigation, sorting by custom properties, etc.

    1. 1. Architecture: Fast Search Server 2010 for SharePoint<br />SharePoint Saturday<br />Carlos Valcarcel<br />Fast Technology Specialist, Fast, A Microsoft Subsidiary<br />
    2. 2. Demo: Fast Search Server 2010<br />FAST: A Brief Time of History<br />SharePoint 2010<br />Search features<br />Fast Search Server 2010<br />Features<br />Architecture<br />Why Fast Search Server instead of SharePoint search?<br />Agenda<br />
    3. 3. MSW – Microsoft Internal Web Site<br />demo<br />
    4. 4. You’ve probably heard it all before.<br />Fast was founded in 1997; it was 11 when the acquisition completed (2008).<br /> – still an active site!<br />Sold by Fast to Overture, then Overture bought by Yahoo!<br />Fast invested in enterprise search<br />Our flagship product, ESP, powers some of the largest sites on the web<br />Dell, Best Buy, Scirus (Reed Elsevier), Financial Times, Oodle, Rakuten<br />When we OEM’ed our product:<br />Documentum<br />Dell Message One (Email/eDiscovery)<br />CommVault<br />EMC Centera<br />MatterSpace®<br />Fast: A Brief Time of History<br />Where did Fast come from?<br />
    5. 5. Linear scalability<br />Support for more languages<br />Better relevancy<br />Support for 100 million documents per farm<br />Federated results on one page (OpenSearch compliant)<br />Navigators (navigator counts not displayed)<br />Users can tag documents<br />SharePoint follows clicks to boost relevancy<br />Auto detect languages in documents<br />User can increase boosting based on language<br />Query completion<br />Did you mean…?<br />Sub-second response time<br />Synonym support (called Aliases)<br />Phonetic matching (Sharten Mickleson → Kjartan Mikkelsen)<br />Native 64-bit deployment<br />Scaling along all dimensions<br />Query processing across multiple servers<br />Search dashboard<br />Adding content<br />Crawl rules<br />PowerShell has 128 cmdlets for search, so everything you want to do for search can now be scripted.<br />Merges results from multiple nodes<br />SharePoint 2010 Search<br />A Brief Look: Great New Features! Less Filling! Secret Ingredients from Norway!<br />
    6. 6. Almost everything available in SharePoint 2010<br />Lemmatization/Stemming<br />Document Thumbnail and Preview<br />Visual Best Bets<br />People Search with phonetic search<br />Federated Search (OpenSearch)<br />Single search (federated) across all content<br />Relevancy per audience<br />Custom GUI per audience is possible<br />Location, Language, Role, and Search aware<br />Document boosting and blocking (click-through relevancy)<br />Document processing pipeline<br />Synonyms<br />Secure Search<br />Dynamic navigators (OOTB and custom)<br />Taxonomy<br />Breadcrumb navigation<br />Fast Search Server for 2010<br />The Future of SharePoint Search: More and Better (did I mention with Secret Ingredients from Norway?)<br />
    7. 7. The GUI: Enhancing the Search Experience<br />You’ve Got Your Search in My Collaboration Platform!<br />FS4SP<br />
    8. 8. User Interface is visual and actionable<br />Visual and conversational interaction with precise control<br />Deep Refinement<br />Thumbnails<br />Sort on any field<br />Similar Results<br />Previews<br />Built on SharePoint Search Center<br />Leverages all of the innovations in SharePoint<br />Open Web Parts, Federation, query suggestions, related queries, Did you mean?<br />Visual results connect users with content<br />Thumbnails for Word and PowerPoint<br />Visual Best Bets highlight premium content <br />Preview in browser without leaving the results<br />
    9. 9. Map metadata to Managed Properties <br />Automatic association of metadata to content<br />Crawled Properties<br />Standard document metadata discovered by the crawler or extracted from the full text by the FAST Content Processing Pipeline.<br />Managed Properties <br />Map one or more Crawled Properties to a single field. Enables sorting, refinement, relevance tuning and fielded searching.<br />Maps automatically or through Central Administration or PowerShell<br />Any data can be found!!<br />Index Profile<br />Managed Properties<br />
    10. 10. How does it work?<br />Put your terms in the out of the box extraction dictionaries by modifying an XML file<br />Map the crawled property to a managed property<br />Index your content<br />Modify refinement panel web part<br />Example: Create a custom entity extractor<br />Customized Extraction Dictionary<br />
    11. 11. How does it work?<br />Built on a SharePoint List or custom extractor<br />Edit the Search Center Results Page<br />Modify the shared web part by adding tags to the refinement panel XML<br />Create your own labels<br />Save and Publish<br />Custom Collections<br />Add refiners to user interface<br />
    12. 12. Quickly build a contextual experience<br />User based tools for creating results that are relevant to your users<br />One-way synonyms<br />Keywords map to other terms<br />Two-way synonyms<br />Keywords become equivalent to other terms<br />Best Bets<br />Highlights key resources that are always relevant to a keyword<br />Visual Best Bets<br />Extend Best Bets with pictures, video, Silverlight controls<br />Document Promotion / Demotion<br />Tailor specific document relevancy<br />Pick the right ingredients <br />Match the proper terms and contexts to boost relevancy for targeted users to ensure your users are always finding the right content<br />Create new user contexts<br />Site administrators create contexts based on user profiles to deliver relevant results to the right audiences<br />Create new keywords<br />Site Administrators have powerful and simple tools to configure the search experience for groups of users<br />
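The difference between one-way and two-way synonyms on this slide can be sketched in a few lines of Python. This is an illustrative model only, not the FS4SP keyword management API: the function `expand_query`, its data structures, and the sample terms are all hypothetical.

```python
def expand_query(term, one_way, two_way):
    """Return the set of terms a query term would match.

    one_way: dict mapping a keyword to the terms it expands to
             (the expansion is not applied in reverse).
    two_way: list of sets whose members are treated as equivalent.
    Both structures are hypothetical, for illustration only.
    """
    term = term.lower()
    expanded = {term}
    # One-way: the keyword maps to other terms, never the other way round.
    expanded.update(one_way.get(term, ()))
    # Two-way: every member of a group matches every other member.
    for group in two_way:
        if term in group:
            expanded.update(group)
    return expanded


one_way = {"real estate": {"reit"}}
two_way = [{"erp", "enterprise resource planning"}]

print(sorted(expand_query("real estate", one_way, two_way)))
print(sorted(expand_query("REIT", one_way, two_way)))
print(sorted(expand_query("ERP", one_way, two_way)))
```

In this sketch a search for "real estate" also matches "reit", but a search for "REIT" does not pull in "real estate", while either spelling of the ERP group matches the other.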
    13. 13. Deliver results that are contextually relevant<br />with search that understands your business and role<br />Role-specific relevance<br />Targeted Best Bets / Visual Best Bets<br />Business driven refinement<br />“What should I know about selling ERP?”<br />- Alan Brewer, Sales Lead<br />“What should I know about implementing ERP?”<br />- Renee Lo, Consultant<br />
    14. 14. Rank Profiles<br />Tune relevancy without impacting the default algorithm<br />Out of the box relevancy<br />Tuned for a great general productivity experience; relevancy improves with click-throughs and link text analysis. <br />Extend the default algorithms<br />Create new default relevancy models. Blend static and dynamic ranking parameters to instantly improve search results.<br />
    15. 15. How to create a Rank Profile<br />IT Pros are empowered to create new profiles quickly<br />Rank Profiles created in PowerShell by extending the default relevancy algorithm…<br />… and are exposed in the user interface by modifying the sorting web part. <br />
    16. 16. Back End Processing Tasks:<br />Load content from many different places<br />Out of the box connectors for SharePoint, Exchange public folders, and shared files<br />SharePoint Designer to configure connection to customer portfolio/holdings database<br />Create custom metadata with content processing pipeline<br />Names of holdings, offerings, key concepts, companies, people<br />Synonyms for key concepts (real estate ~ REIT)<br />Roll-ups configured with optional results collapsing stage<br />Create custom relevance profile<br />Designers can stylize the User Interface<br />Apply styles to web parts<br />Federation, People Search, Search actions<br />Build custom web parts for visual navigation<br />Use SharePoint workflows to perform business specific actions<br />Leveraging the platform to build applications<br />Putting together all of the pieces to build search-driven applications<br />
    17. 17. Simplified, powerful administration<br />A high-end enterprise search solution that’s easy to deploy and manage<br />Manage efficiently with full support for Microsoft System Center and PowerShell scripting to automate tasks<br />Deploy easily using wizard-driven installation, a topology designer, and native support for 64-bit virtualization<br />Streamline administration with a simplified admin console that helps you manage search services across your enterprise<br />
    18. 18. Architecture<br />FS4SP<br />
    19. 19. Microsoft’s 2010 Dog-Food Farm<br />Description: Team Collaboration Portal & Social Networking<br />Day to day work and internal experiments <br />Data Set: <br />Workload: <br />Search Full Crawl generating ~75%<br />
    20. 20. FAST Search for SharePoint Scaleout<br />Back-end with extreme and flexible scale out options<br />Scale-out multiple “dimensions”<br />Query Volume<br />Content Volume<br />Indexing freshness<br />Redundancy options<br />Search<br />Indexing<br />Performance targets*<br />30M Docs/node<br />50 QPS/node<br />35 docs/sec<br />Query Volume<br />Search and Indexing<br />Query and Result Processing<br />Content Volume<br />No theoretical upper bounds!<br />Crawling and Content <br />Processing<br />*Depends on content and hardware specifics<br />
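As a rough worked example of the performance targets on this slide, the sketch below turns the per-node figures (30M docs/node, 50 QPS/node) into a node count. The layout model, with content partitions multiplied by query replicas, and the names `estimate`, `partitions`, and `replicas` are assumptions made for illustration, not FS4SP sizing guidance; as the slide notes, real numbers depend on content and hardware.

```python
import math

# Per-node targets quoted on the slide (they depend on content and hardware).
DOCS_PER_NODE = 30_000_000
QPS_PER_NODE = 50


def estimate(total_docs, peak_qps):
    """Rough sizing sketch; the layout model is an assumption, not FS4SP guidance."""
    partitions = math.ceil(total_docs / DOCS_PER_NODE)  # scale with content volume
    replicas = math.ceil(peak_qps / QPS_PER_NODE)       # scale with query volume
    return {
        "partitions": partitions,
        "replicas": replicas,
        "search_nodes": partitions * replicas,
    }


print(estimate(total_docs=100_000_000, peak_qps=120))
```

Under these assumptions, 100 million documents at a peak of 120 queries per second would need 4 partitions and 3 replicas, or 12 search nodes in total.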
    21. 21. SharePoint Server(s)<br />FAST Search Server 2010<br />FAST Server(s)<br />Summary of architectural components<br />Other Server(s)<br />Site Collection Level Admin UI<br /><ul><li>Keyword Management</li><li>User Context Management</li><li>Site Promotion/Demotion</li></ul>PowerShell<br /><ul><li>Schema configuration</li><li>Admin configuration</li><li>Deployment configuration</li></ul>Central Administration UI <br /><ul><li>Property mapping</li><li>Property extraction</li><li>Spell-checking</li></ul>Administration and Schema Object Model<br />Advanced Content Processing<br />Linguistics<br />Web<br />Link<br />Analysis<br />Connectors<br /><ul><li>SharePoint</li><li>File Traverser</li><li>Web</li><li>BDC</li><li>Exchange</li><li>Notes</li><li>Documentum</li></ul>Security Access Module<br />Indexing<br />SharePoint Front-end<br />Custom Front-End<br />Query Object Model<br />Query and Result Processing<br />Search<br />Core <br />Query Web Service<br />Connectors<br /><ul><li>Web Crawler</li><li>JDBC</li></ul>Federation Object Model<br />Monitoring Services<br />Content<br />Microsoft System Center Operations Manager<br />OpenSearch or Other Sources<br />People Search<br />
    35. 35. Search LOB Systems via BDC/BCS<br />Enhance SharePoint platform capabilities with out-of-box features, services, and tools that streamline development of solutions with deep integration of External Data and Services. <br />Office Apps<br />Cache<br />Offline Operations<br />BDC Client Runtime<br />SharePoint<br />SPD<br />Design<br />Tools<br />VSTO<br />Web 2.0<br />LOB<br />Siebel<br />SAP<br />Dynamics<br />
    36. 36. Document Processing Pipeline Stages<br />Default<br />Optional<br />Format Conversion<br />iFilters, OutSideIn<br />Language detection and encoding<br />Lemmatizer<br />Linguistics normalization<br />Tokenizer<br />Word breaking<br />Entity Extraction<br />Persons, companies, locations, email, date/time, URL, prices, file names<br />DateTimeNormalizer<br />Date normalization<br />Vectorizer<br />Create document vector for similarity searching<br />WebAnalyzer<br />Anchor text and link cardinality analysis<br />PropertiesMapper<br />Map to crawled properties<br />PropertiesReporter<br />Report detected properties<br />XML Properties mapper<br />Offensive Content Filter<br />Verbatim extractor<br />Loads dictionary for custom extraction, e.g. product names<br />Field Collapsing<br />Mapper<br />…<br />Configurable Stages<br />EntityExtraction<br />Language Detection<br />Format Conversion<br />The different plug-ins can be configured either from the UI or from config files<br />
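The idea of a document passing through default stages, with optional stages switched on as needed, can be sketched as a chain of functions. This is a toy model, not the FS4SP pipeline: the stage functions, the document dictionary, and the blocked-word list are all hypothetical and far simpler than the real stages named on the slide.

```python
import re


def tokenizer(doc):
    # Word breaking: split the body into lower-cased word tokens.
    doc["tokens"] = re.findall(r"\w+", doc["body"].lower())
    return doc


def email_extractor(doc):
    # A toy entity extractor: pull e-mail addresses out of the body.
    doc["emails"] = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", doc["body"])
    return doc


def offensive_content_filter(doc, blocked=frozenset({"darn"})):
    # Optional stage: flag documents containing a blocked token.
    doc["offensive"] = any(t in blocked for t in doc["tokens"])
    return doc


def run_pipeline(doc, stages):
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


default_stages = [tokenizer, email_extractor]
with_optional = default_stages + [offensive_content_filter]

doc = run_pipeline({"body": "Contact renee.lo@example.com about ERP."}, with_optional)
print(doc["tokens"], doc["emails"], doc["offensive"])
```

Enabling or disabling a stage in this sketch is just a matter of which functions appear in the list, which mirrors the default/optional split on the slide.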
    37. 37. Content Processing and Schema<br />Admin UI<br />Schema CmdLets<br />Custom Client<br />Extracted document attributes reported as Crawled Properties<br />Crawled Properties mapped to Managed Properties<br />Characteristics are defined for Managed Properties, e.g. <br />Refiners<br />Sorting<br />Queryable<br />Type<br />Definition and mapping done via UI or PowerShell<br />Schema Object Model<br />Update configuration<br />Schema Service (hosted in IIS)<br />Report discovered crawled properties<br />Update Tools<br />Persistence<br />Property backend<br />bliss<br />psctrl<br />configserver<br />Alert pipeline of updated schema<br />Document Processing Pipeline<br />PropertiesMapper<br />PropertiesReporter<br />
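The mapping of crawled properties to managed properties can be pictured as a small lookup, sketched here in Python. The property names, the `MAPPINGS` table, and the "first non-empty value wins" rule are assumptions made for illustration; the actual mapping is defined through the UI, PowerShell, or the schema object model as described above.

```python
# Hypothetical mapping: managed property -> crawled properties, in priority order.
MAPPINGS = {
    "author": ["doc_author", "mail_from"],
    "title": ["doc_title", "mail_subject"],
}


def map_properties(crawled, mappings):
    """Build managed properties from crawled ones.

    Takes the first non-empty crawled value per managed property; that
    precedence rule is an assumption of this sketch.
    """
    managed = {}
    for managed_name, crawled_names in mappings.items():
        for name in crawled_names:
            value = crawled.get(name)
            if value:
                managed[managed_name] = value
                break
    return managed


crawled = {"mail_from": "Alan Brewer", "doc_title": "Selling ERP", "unmapped": "x"}
print(map_properties(crawled, MAPPINGS))
```

Crawled properties with no mapping (such as `unmapped` here) are simply left out of the managed set, which is why only mapped fields are available for sorting, refinement, and fielded search.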
    38. 38. Pipeline Extensibility API<br />Motivation<br />Straightforward way to add text analysis functionality<br />Flexibility and supportability<br />Example uses<br />Sentiment analysis<br />Translation<br />Auto-Classification<br />Mechanism<br />Just before Mapper<br />“any” binary<br />Runs in sandbox with timeout<br />Mapper<br />Extensibility<br />…<br />Standard processing<br />
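The mechanism on this slide, running "any" binary with a timeout, can be illustrated with a minimal Python sketch. It only shows the timeout part; it does not sandbox anything, and passing the text over stdin/stdout is an assumption of the sketch rather than the way FS4SP exchanges data with the program. The function name `run_stage` is hypothetical.

```python
import subprocess
import sys


def run_stage(command, text, timeout_seconds=5.0):
    """Run an external program on a document's text, giving up after a timeout.

    Returns the program's stdout, or None if it failed or ran too long.
    Passing text over stdin/stdout is an assumption made for this sketch.
    """
    try:
        result = subprocess.run(
            command,
            input=text,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None  # the stage is skipped rather than stalling the pipeline
    if result.returncode != 0:
        return None
    return result.stdout


# Stand-in for "any" binary: a tiny Python one-liner that upper-cases its input.
upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_stage(upper, "sentiment: positive"))
```

Treating a timeout or a non-zero exit code as "no result" keeps one misbehaving custom stage from blocking standard processing, which is the supportability point the slide makes.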
    39. 39. Yeah, So What?<br />100 million documents per farm<br />Refiners: only uses the first 1000 results<br />Search is restricted to one farm<br />Tell Me Something Awesome<br />SharePoint 2010<br />Fast Search Server 2010<br />40 Million Documents per server<br />Refiners: exact count from the entire result set<br />Content can be indexed and searched across farms<br />3.6 TB of disk space per server (so far!) and support for NAS and SANs.<br />Full support for VMs (Hyper-V and VMware)<br />
    40. 40. There is nothing wrong with SharePoint!<br />SharePoint brings together a number of collaborative technologies that would otherwise not play well together<br />As SharePoint adoption spreads the need for enterprise search only increases<br />Search today is where RDBMSs were over 20 years ago<br />Let me say that again: there is nothing wrong with SharePoint!<br />Is Something Wrong With SharePoint?<br />
    41. 41. The Present<br />SharePoint 2010 search addresses a host of previous issues<br />No migration path from SP 2010 to Fast Search 2010<br />The Future<br />Where do you think Fast Search Server will be in 3 years (the next release of SharePoint)?<br />Why Fast Search Instead of SharePoint Search?<br />
    42. 42. You’ve Got Questions<br />I’ve probably got answers…<br />Q and A<br />
    43. 43. Demo: Fast Search Server 2010<br />FAST: A Brief Time of History<br />SharePoint 2010<br />Search features<br />Fast Search Server 2010<br />Features<br />Architecture<br />Why Fast Search Server instead of SharePoint search?<br />Agenda<br />
    44. 44. The organizers of SharePoint Saturday<br />To all of you for attending!<br />Thanks<br />
    45. 45. Capacity Planning White Paper<br /> <br />RSS: FAST Search Server 2010 for SharePoint Newly Published Content<br />If you bookmark only one RSS feed for Fast Search Server 2010 this is the one:<br />Documentation<br />TechNet: <br />MSDN Blogs<br />Enterprise Search:<br />Steve Nicolaou, Fast Architect:<br />Jørgen's FAST Search Blog: <br />Dark Corners: <br />Enterprise Search User Group<br />Second Wednesday of every month! You missed July! Don’t miss August!<br />Case Study: Search and the FBI Sentinel Program <br />Author: Marti Hearst, Search User Interfaces (<br />Next Generation Tools: Content Transformation Service/Interaction Management Service<br />References<br />