2. Microsoft’s Search Vision & Strategy
Search is Everywhere
Desktop, Enterprise, Internet, Devices
Big Bet
Enterprise: Employee Productivity; eDiscovery; People & Expertise; Connect to all your Content; Interactive Visual Search; Competitive Intelligence; Personalization; …
Internet: Consumer Portals / Partner Portals; High Value Search; Marketing / B2B / …; Monetization; Research Portals; 360° customer views; Social Networks
Best of Microsoft - Best of SharePoint - Best of High End Search
3. Introducing FAST Search for SharePoint
OOB User Experience
Tailoring General Productivity
Search Platform and Architecture
Search Driven Applications
Deployment and Administration
Summary and Resources
4. Microsoft Enterprise Search
The 2010 Wave
General productivity search Customized productivity search
Light customization and search driven applications
Common across the product line
• UI Framework
• Connector Framework (BDC)
• Social search features and integration
• APIs and developer experience
• SharePoint platform integration
• Admin & deployment capabilities
• End user and site administrator enablement
• Operations advantages (SCOM, scripting)
5. SharePoint vs. FAST Search for SharePoint
• User Interface
• Central Administration
• Crawler and Connector
• Query and Result Processing
• Content Processing Pipeline
• Customizability and Scalability
8. Get better answers, faster
using a visual, interactive search experience
Query completion
Sorting on any property
Related searches & people
Document thumbnails
Scrolling previews
Read in Office Web Apps
Federated results
9. Connect with people and expertise
and streamline how you find and collaborate with others
Filter by title, expertise & other attributes
Phonetic name lookup
Expertise matching
Real-time presence
Org browsing
Find recent content
10. Deep Refinement and Sorting
Enables precise control of results
Out of the Box
Enables conversational experience across all of the results
You will never miss any content
Enabling better findability and exploration
Discover non-obvious relationships across the entire result set
Exact counts show relative weight
Provides analytic view of your results
Indicates priority and importance
The right lever to slice and dice your content
Sorting Options
Sort on any field
Empower the user to use the relevance model that best fits their needs
Rearrange the result set to meet specific criteria
Alphabetical, numeric, and date
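The refinement and sorting ideas on this slide can be sketched in a few lines of Python. This is only an illustration of the concept, not how FAST Search computes refiners; the result set, field names, and helper functions are all hypothetical.

```python
from collections import Counter

# Hypothetical result set; field names and values are illustrative only.
results = [
    {"title": "ERP rollout plan", "author": "Renee Lo", "filetype": "docx", "modified": "2010-03-14"},
    {"title": "ERP sales deck", "author": "Alan Brewer", "filetype": "pptx", "modified": "2010-02-01"},
    {"title": "ERP budget", "author": "Renee Lo", "filetype": "xlsx", "modified": "2010-01-20"},
    {"title": "Consulting rates", "author": "Alan Brewer", "filetype": "docx", "modified": "2009-12-05"},
]

def refiner_counts(items, field):
    """Count every value of `field` across the whole result set (exact, not sampled)."""
    return Counter(item[field] for item in items)

def refine(items, field, value):
    """Narrow the result set to items whose `field` equals `value`."""
    return [item for item in items if item[field] == value]

def sort_results(items, field, descending=False):
    """Sort on any field; ISO date strings sort correctly as plain strings."""
    return sorted(items, key=lambda item: item[field], reverse=descending)

author_counts = refiner_counts(results, "author")
docx_only = refine(results, "filetype", "docx")
newest_first = sort_results(results, "modified", descending=True)
```

Because the counts are taken over the entire result set rather than the first page of hits, the numbers next to each refiner value reflect the real weight of that value.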
13. Customize search to meet your business needs
Deliver results that are contextually relevant
Search in the language of your business
Tune relevancy to improve accuracy
Create structure from unstructured content
Configure the UI to extend your application
Similarity Search
15. Visual Best Bets
Identify static content that is always relevant
Visual Notification
Set Vertical Orientation
Visual Best Bets
Built on SharePoint Keywords
Matches keywords and synonyms that are contextually relevant to users. Include banners, videos, external websites.
Easy and quick to set up
Point and click setup for site admins. Set and forget with content expiration dates. Web Parts allow for easy page customization.
16. Audience-specific search experiences
Use User Context to meet the needs of diverse groups
Renee Lo, Engineering, Contoso Consulting: "What should I know about implementing ERP?"
Alan Brewer, Sales Manager, Contoso Consulting: "What should I know about selling ERP consulting?"
User context
Information context, Social context, Application context
Username & Group Memberships, Location, Languages
Business Unit, Department, Team, Time of Day
Preferred Sites, SharePoint Audiences, Interests & Current Projects, Context of Current Task
17. Quickly build a contextual experience
User-based tools for creating results that are relevant to your users
One-way synonyms
Keywords map to other terms
Two-way synonyms
Keywords become equivalent to other terms
Best Bets
Highlights key resources that are always relevant to a keyword
Visual Best Bets
Extend Best Bets with pictures, video, Silverlight controls
Document Promotion / Demotion
Tailor specific document relevancy
Pick the right ingredients
Administrators have powerful and simple tools to configure the search experience for targeted users
User contexts
Site Administrators create contexts based on user profiles to deliver relevant results to the right audiences for groups of users
Create new keywords
Match the proper terms and contexts to boost relevancy and ensure your users are always finding the right content
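The difference between one-way and two-way synonyms is easy to show in code. The following is a minimal sketch of the idea only; the function, the table layout, and the "car"/"sedan" pair are hypothetical, and this is not the SharePoint keyword API.

```python
def expand_query(term, one_way, two_way):
    """Return the set of terms to search for, given synonym tables."""
    terms = {term}
    # One-way: the keyword maps to other terms, but not the reverse.
    terms.update(one_way.get(term, ()))
    # Two-way: every term in a group is equivalent to every other term.
    for group in two_way:
        if term in group:
            terms.update(group)
    return terms

# Hypothetical synonym tables for illustration.
one_way = {"car": {"sedan"}}
two_way = [{"real estate", "reit"}]

car_terms = expand_query("car", one_way, two_way)
sedan_terms = expand_query("sedan", one_way, two_way)
reit_terms = expand_query("reit", one_way, two_way)
```

Searching for "car" also searches for "sedan", but searching for "sedan" does not pull in "car". With the two-way group, either term finds content for both.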
18. Search in the language of your business
Identify what is important to improve the search experience
Use language that has specific meaning to your business
Users can quickly refine content using familiar terms
Build confidence that you found the correct answers the first time
Leverage corporate knowledge to make content findable
Corporate taxonomies
Business terminology
Product names
Acronyms
Define custom rules to identify unique terms
Handle complex terms such as part numbers or forms
Searching for "XXX 123 abc" finds "XXX-123-abc" and "GG^XXX-123-abc_HH"
[Word cloud: mobile workforce, direct mail, revenue, merger, communications, Taxonomy, cloud computing, supply chain, audit, Productivity, best practices, XML, archive, acquisition, storage, cost savings, Social Media, Profit, Strategy, Development, customer relations, market share, IP Telephony, quality, SOX compliance, Competition, risk, target markets, part numbers, brand management, Global presence, Disaster Recovery]
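One way to picture the part-number behavior is as tokenization on separators followed by a contiguous token match. The sketch below assumes that model; it is an illustration in Python, not the product's actual word-breaking rules, and the function names are hypothetical.

```python
import re

def tokens(text):
    """Split on anything that is not a letter or digit and lowercase the parts."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def matches(query, document_text):
    """True if the query tokens occur as a contiguous run in the document tokens."""
    q, d = tokens(query), tokens(document_text)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

hit_plain = matches("XXX 123 abc", "XXX-123-abc")
hit_embedded = matches("XXX 123 abc", "GG^XXX-123-abc_HH")
miss = matches("XXX 123 abc", "XXX-124-abc")
```

Both the query and the content are broken on the same separators, so the spaces, hyphens, caret, and underscore stop mattering once the text is tokenized.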
22. FAST Search for SharePoint
High Level Architecture
[Diagram labels: FAST Query SSA; FAST Connector SSA; …]
23. Extensible Content Processing
Enables search that has a deep understanding of your information
[Pipeline stages shown: Format Converter, Language Detector, Word Breaker, Lemmatizer, Entity Extractor, Date/Time Normalizer, Vectorizer, Web Analyzer, Properties Mapper, …]
24. How does the pipeline work?
A systematic approach to interpreting your content
Sequential stages perform specific tasks while ingesting content
Breaks down content to the smallest addressable chunks to build meaning
Understands file encoding, data formats, and written languages
Supports 400+ file formats, 80+ languages
Process your content to make it searchable
Normalizes content so that a consistent relevancy model can be applied
Identifies structured and unstructured metadata in your content
Maps document metadata to SharePoint Crawled Properties
Pipeline stages (descriptions recovered from overlapping slide text):
Language and Encoding Detection: Identifies the written language and the specific encoding of the document.
Format Conversion: Converts the native file formats to a plain text representation.
Tokenization: Breaks the text into words. Custom word breakers can handle idioms, part numbers or telephone numbers.
Lemmatization: Identifies root terms so that equivalent words can be found by a single lemma. For English, knows that run, running and ran are equivalent. Understands language specific grammar. Also applies to phrases.
Entity Extraction: Finds terms and maps them to predefined categories. Out of the box dictionaries for People, Companies and Locations, but can be extended to any category.
Date and Time Normalization: Maps all dates and times to a standard representation. For example, 14-Mar-10 is equivalent to March 14, 2010.
Document Vector: Creates a representation of the document that reflects important concepts and their frequency of occurrence. Used to find similar documents.
Web Link Analysis: Analyzes hyperlinks, extracting anchor text, which reinforces the authority ranking of the document.
Map Crawled Properties: Maps document metadata to SharePoint Crawled Properties.
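To make the "sequential stages" idea concrete, here is a toy pipeline in Python. It is a sketch under stated assumptions, not the FAST implementation: the stage functions, the tiny lemma dictionary, and the assumed date format (day-month-two-digit-year) are all hypothetical and chosen only to mirror the examples on the slide.

```python
import re
from datetime import datetime

# Toy lemma dictionary for illustration; a real lemmatizer is language aware.
LEMMAS = {"running": "run", "ran": "run"}

def tokenize(doc):
    """Break the text into word-like chunks, keeping hyphenated tokens whole."""
    doc["tokens"] = re.findall(r"[A-Za-z0-9-]+", doc["text"])
    return doc

def lemmatize(doc):
    """Map each token to its root form where the dictionary knows one."""
    doc["lemmas"] = [LEMMAS.get(t.lower(), t.lower()) for t in doc["tokens"]]
    return doc

def normalize_dates(doc):
    """Rewrite tokens such as 14-Mar-10 into ISO form (2010-03-14)."""
    dates = []
    for token in doc["tokens"]:
        try:
            dates.append(datetime.strptime(token, "%d-%b-%y").date().isoformat())
        except ValueError:
            pass  # not a date in the assumed format
    doc["dates"] = dates
    return doc

# Stages run strictly in order; each one adds to the same document record.
PIPELINE = [tokenize, lemmatize, normalize_dates]

def process(text):
    doc = {"text": text}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

doc = process("She ran the audit on 14-Mar-10 and is running it again")
```

Each stage only reads what earlier stages produced, which is why the order matters: lemmatization and date normalization both depend on tokenization having run first.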
25. Extending Pipeline capabilities
Straightforward way to add custom text analysis functionality
Configure Optional Processing Steps
XML Properties mapper
Offensive Content Filter
Field Collapsing
Verbatim (wholeword) extractor
Use a dictionary for custom extraction
Pipeline Extensibility
Calls external applications for custom item processing
Add Custom Processing
Pipeline Extensibility is a specially defined stage that takes a set of crawled properties, as flat text, as input and maps output to another crawled property
Sandboxed execution
Executable arguments and temporary files are automatically handled with timeouts.
Runs just before the Crawled Property Mapper, providing accessibility within SharePoint
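An external program for this stage could look roughly like the Python sketch below: it reads crawled properties from an input file and writes a new crawled property to an output file. The XML layout, the `wordcount` property name, and the command-line convention are assumptions made for illustration; check the product documentation for the real input/output schema before relying on any of it.

```python
import sys
import xml.etree.ElementTree as ET

def process_item(input_path, output_path):
    """Read crawled properties as flat text and emit one derived property."""
    # Assumed input layout (hypothetical):
    # <Document><CrawledProperty propertyName="...">text</CrawledProperty></Document>
    root = ET.parse(input_path).getroot()
    text = " ".join((cp.text or "") for cp in root.iter("CrawledProperty"))

    # Write the result as a new crawled property; "wordcount" is a made-up name.
    out_root = ET.Element("Document")
    out_prop = ET.SubElement(out_root, "CrawledProperty", propertyName="wordcount")
    out_prop.text = str(len(text.split()))
    ET.ElementTree(out_root).write(output_path, encoding="utf-8", xml_declaration=True)

def main(argv):
    """Entry point: expects an input file path and an output file path."""
    if len(argv) != 3:
        print("usage: stage.py <input.xml> <output.xml>")
        return 1
    process_item(argv[1], argv[2])
    return 0

if __name__ == "__main__":
    main(sys.argv)
```

Keeping the program a plain file-in, file-out executable fits the sandboxed model described on the slide, where the pipeline manages arguments, temporary files, and timeouts.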
26. Powerful Entity Extraction
Enables search-driven navigation that is relevant to your business
PRODUCT
CONCEPT
COMPANY
27. Tune relevancy to improve accuracy
Changing content and user needs require a flexible solution
Start with great relevance OOB
Tuned for great general productivity experience
Automatically improves relevancy with social click-throughs and link text analysis
Create new relevance models
Multiple Rank Profiles
Blend static and dynamic ranking parameters to instantly improve search results
Create with simple PowerShell commands
Expose as new sorting options
[Screenshot callouts: Standard Sorting Options; Custom Rank Profiles]
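The idea of blending static (query-independent) and dynamic (query-dependent) ranking parameters can be illustrated with a weighted sum. This Python sketch is not the FAST rank formula; the profile names, signal names, weights, and scores are all hypothetical.

```python
def blended_score(dynamic_score, static_signals, weights):
    """Weighted sum of a query-dependent score and query-independent signals."""
    score = weights["dynamic"] * dynamic_score
    for name, value in static_signals.items():
        score += weights.get(name, 0.0) * value
    return score

# Two hypothetical rank profiles that weight the same signals differently.
PROFILES = {
    "default": {"dynamic": 1.0, "freshness": 0.2, "click_popularity": 0.3},
    "freshness_first": {"dynamic": 0.5, "freshness": 1.0, "click_popularity": 0.1},
}

docs = [
    {"id": "old-popular", "dynamic": 0.9, "static": {"freshness": 0.1, "click_popularity": 0.8}},
    {"id": "new-obscure", "dynamic": 0.6, "static": {"freshness": 0.9, "click_popularity": 0.1}},
]

def rank(items, profile_name):
    """Order documents by blended score under the chosen profile."""
    w = PROFILES[profile_name]
    return sorted(items, key=lambda d: blended_score(d["dynamic"], d["static"], w), reverse=True)

default_order = [d["id"] for d in rank(docs, "default")]
fresh_order = [d["id"] for d in rank(docs, "freshness_first")]
```

Switching profiles reorders the same result set, which is what makes a rank profile usable as an extra sorting option for end users.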
30. Robust query language
Use FAST Query Language (FQL) for precise query development
FQL provides a robust and expressive query language
Wildcard support - *, ?
Numeric Data types (Integer, Float, Decimal, Datetime)
Direct field access (e.g., title:othello, author:shakespeare)
Operators
Numeric (COUNT, RANGE, <, <=, >, >=)
Boolean (AND, OR, ANY, NOT)
Rank (RANK, XRANK)
Proximity (NEAR, ONEAR)
Sorting (SORT, SORTFORMULA)
String (operator support for strings)
Boundary (starts-with, ends-with, equals)
Filter
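FQL expressions are plain strings, so they can be assembled with small helpers. The Python sketch below assumes FQL's prefix-style operator notation and quoted field values as I understand them; the helper names and the `body` field are hypothetical, and the generated strings should be checked against the FQL reference before use.

```python
def field(name, value):
    """Direct field access with a quoted value, e.g. title:"othello"."""
    return f'{name}:"{value}"'

def op(name, *operands, **params):
    """Prefix-style operator call: name(operand, ..., key=value)."""
    args = list(operands) + [f"{k}={v}" for k, v in params.items()]
    return f"{name}({', '.join(args)})"

# Boolean AND over two fielded terms (the example fields from the slide).
query = op("and", field("title", "othello"), field("author", "shakespeare"))

# XRANK: boost hits that also match a second expression ("body" is a made-up field).
boosted = op("xrank", query, field("body", "tragedy"), boost=500)

# Proximity: two terms within a given distance of each other.
nearby = op("near", '"moor"', '"venice"', N=5)
```

Building the string from nested calls keeps the parentheses balanced, which is the most common mistake when writing long FQL expressions by hand.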
32. Search Driven Applications
Meet all the search application needs you have across your business
Sales: 360° Customer Insight
Services: Knowledge Browser
Marketing: Competitive Intelligence
Research & Development: Innovation Portal
Support: Call Center Advisor
Operations: Systems/Logistics Portal
Legal, HR, IT, Finance, ……
“How do I support the unique search needs of teams and work that impact our business?”
To do so, you need a search platform that has
• A deep understanding of your information
• Flexible relevance to meet diverse needs
• A customizable UX to
34. [Screenshot callouts]
Top information from Woodgrove…new market view report to send to clients
Set of Customers to explore, with rollup
Experts to help, with availability and rating
View of information across different pivots, with drilldown
Immediate actions on selected items
News and external opinion to monitor and send to clients
Drilldown to single view with all clues about a customer: portfolio, holdings, communications, annual and quarterly customer plans, etc…
35. How would you create this?
Content Crawling: bring in data from lots of places
OOB connectors to SharePoint (reports, account documents), Exchange public folders, shared files;
BDC with customization in SPD (no code) for customer portfolio/holdings
Content processing: creating metadata
Names of holdings, offerings, key concepts, companies, people
Synonyms for key concepts (real estate ~ REIT)
OOB web parts configured for style
Federation, People Search, Search actions
Custom web parts for visual navigation
Roll-up configured via results collapsing
Custom relevance profile
SharePoint workflows for act-on-selected-items
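The roll-up via results collapsing mentioned above can be pictured as grouping hits on one field and keeping the best hit per group. This Python sketch only illustrates the idea; the function, field names, and sample hits are hypothetical and the input is assumed to be sorted by rank already.

```python
def collapse(results, field):
    """Keep the first (highest-ranked) hit per field value and count the group size."""
    rolled = {}
    for hit in results:  # results are assumed to arrive in rank order
        key = hit[field]
        if key not in rolled:
            rolled[key] = {"top": hit, "count": 0}
        rolled[key]["count"] += 1
    return list(rolled.values())

# Hypothetical ranked hits for a customer view.
hits = [
    {"customer": "Woodgrove", "title": "Annual plan"},
    {"customer": "Contoso", "title": "Portfolio summary"},
    {"customer": "Woodgrove", "title": "Quarterly plan"},
    {"customer": "Woodgrove", "title": "Holdings report"},
]

rollup = collapse(hits, "customer")
```

The result shows one row per customer with a count, and the user can drill down from that row to the full set of documents for the customer.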
41. Secure, unified access to information
Index or federate with content, applications, and services
[Diagram labels: User Experience; Search; Index; OpenSearch Federation; Indexing Connectors; Enterprise Content; Business Applications; Information Services]
43. FAST Search HW – Best Practices
Admin / Processing Server
CPU: 2 x 2GHz+ (Quad/six core)
Memory: 24-48 GB
Disk: 2 x 300 GB, SAS, 10K RPM (RAID 1)
Storage Server
CPU: 2 x 2GHz+ (Quad/six core)
Memory: 24-48 GB
Disk alternatives:
1.0 TB: 8 x 300 GB, SAS, 10K RPM (RAID 10)
1.8 TB: 8 x 300 GB, SAS, 10K RPM (RAID 5)
3.6 TB: 16 x 300 GB, SAS, 10K RPM (RAID 5+0)
New: 7.2 TB: 16 x 600 GB, SAS, 10K RPM (RAID 5+0)
SAN: Configured for “database performance”
44. FAST Search – Main Components
SharePoint Crawler
Capacity: ~30 million items per crawler node; SQL Server needs to be scaled for high IO
[Diagram labels: SP Crawl; People Crawl; Crawl DB]
Web Analyzer
CPU/disk footprint can vary by a factor of 10 depending on the content:
- number of links
- length of links
- internal cross link ratio
Average capacity: ~30 million items per web analyzer node
Can be deployed with the Indexer in normal scenarios
[Diagram label: FAST-WA-1]
Indexer/Search node
Two supported models:
- Normal mode: ~15 million items per node, ~25 QPS
- High Density Mode: ~40 million items per node, ~7 QPS
[Diagram label: FAST-FSTIDX-11]
45. Rows and Columns
Columns give you more indexing
Need more Doc Processors and Content Distributor roles
Rows give you more query and redundancy
More Query roles
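A back-of-the-envelope sizing helper shows how rows and columns combine. The per-node capacities come from the Main Components slide (~15 million items and ~25 QPS in normal mode, ~40 million items and ~7 QPS in high density mode); everything else in this Python sketch is an assumption, in particular that query capacity scales linearly with the number of rows and that two rows are the minimum for redundancy.

```python
import math

# Per-node figures taken from the slide; treat them as rough planning numbers.
ITEMS_PER_NODE = {"normal": 15_000_000, "high_density": 40_000_000}
QPS_PER_ROW = {"normal": 25, "high_density": 7}

def size_farm(total_items, target_qps, mode="normal", redundant=True):
    """Estimate index/search columns (content volume) and rows (query load, redundancy)."""
    columns = math.ceil(total_items / ITEMS_PER_NODE[mode])
    rows = math.ceil(target_qps / QPS_PER_ROW[mode])
    if redundant:
        rows = max(rows, 2)  # assumption: a second row provides failover
    return {"columns": columns, "rows": rows, "index_search_nodes": columns * rows}

medium = size_farm(45_000_000, 20)
large = size_farm(100_000_000, 40, redundant=False)
```

For 45 million items the sketch yields three columns, and adding a second row doubles the index/search node count to six. Real deployments also need document processors, content distributors, and web analyzers sized separately.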
46. FAST Search – Pilot/Dev Deployment
SP2010 Farm: All roles
FAST Search for SP 2010 Farm: All roles
47. FAST Search – Extra Small Farm
SP2010 Farm
Server: Web Front End, Query, SP Crawl, People Crawl, SQL Server
Server: Web Front End, Query, SP Crawl, People Crawl, SQL Server
FAST Search for SharePoint 2010 Farm
Server: Admin, Index (Search), Content Distributor, Indexing Dispatcher, Web Analyzer, 4 Docprocs+
Server: (Index) Search, Content Distributor, Indexing Dispatcher, Web Analyzer, 4 Docprocs+
48. FAST Search – Small Deployment
SP2010 Farm FAST Search for SharePoint 2010 Farm
*
Admin Index (Search)
Web Front End Web Front End Content Distributor Content Distributor
Query Query Indexing Dispatcher Indexing Dispatcher
12 Docprocs+ 12 Docprocs+
Web Analyzer Web Analyzer
QR Server
* *
SP Crawl SP Crawl
People Crawl People Crawl
(Index) Search
QR Server
Search Admin DB
Crawl DB
SharePoint
SQL 2008 Cluster
Note: Servers marked with * are only needed for high availability
49. FAST Search – Medium Deployment
SP2010 Farm FAST Search for SharePoint 2010 Farm
Admin Index (Search) Index (Search) Index (Search)
WFE WFE Content Distributor Content Distributor Web Analyzer Web Analyzer
Query SSA Query SSA Web Analyzer Web Analyzer Indexing Dispatcher Indexing Dispatcher
12 Docprocs+ 12 Docprocs+ 12 Docprocs+ 12 Docprocs+
SP Crawl SP Crawl (Index) Search (Index) Search (Index) Search
People Crawl People Crawl QR Server QR Server QR Server
Search Admin DB
Crawl DB
SharePoint DB
SQL 2008 Cluster
50. FAST Search – Large Deployment
SP2010 Farm
Web Front End Web Front End
SP Crawl SP Crawl
Query Query
People Crawl People Crawl
Search Admin DB
Crawl DB
SharePoint
SQL 2008 Cluster
FAST Search for SharePoint 2010 Farm
Admin Index (Search) Index (Search) Index (Search) Index (Search) Index (Search) Index (Search)
ConfigServer Content Distributor Indexing Dispatcher Indexing Dispatcher Web Analyzer Web Analyzer Web Analyzer
Content Distributor Web Analyzer Web Analyzer Web Analyzer 12 Docprocs+ 12 Docprocs+ 12 Docprocs+
Web Analyzer 12 Docprocs+ 12 Docprocs+ 12 Docprocs+
12 Docprocs+
(Index) Search (Index) Search (Index) Search (Index) Search (Index) Search (Index) Search
QR Server QR Server QR Server QR Server QR Server QR Server
52. Tools – QR Server
Neil Richard’s Blog Enabling the QR Server Blog Post - http://tinyurl.com/3b9ren4
53. FAST Search for SharePoint Query Tool
http://fastforsharepoint.codeplex.com/
Connect to web app running FAST SSA (SP box)
Use it to test FQL
54. Useful Resources
FAST University Training
MSDN & TechNet
Blogs
Leonardo De Souza’s Blog
http://searchunleashed.wordpress.com
Thomas Svensen’s Blog
http://blogs.msdn.com/b/thomsven/
Comperio Search Nuggets
http://nuggets.comperiosearch.com/
Books
Editor's Notes
Example and a picture
Tony Hart & Mark Stone are working on user context
Keyword – for a manual approach
Advanced - (Rank Profiles that are contextually aware)
Structure under the hood within SharePoint, done in layers
Content: collected, processed, indexed
Queries: federated, processed, searched, results passed back
Also, have developed a structure where people search is different. Uses profile store structure, relevance tuned for people, and lots of cool stuff.
Leveraged these layers to ADD FAST to SharePoint
Shunt content, federate queries
Remember, will see this in the user experience, as IT Pros, as Developers
See this in admin: FAST connector
- Query Pipeline – define. OM path used when the user sends a query and results are returned.
- Web Service enh: (FAST) Refinement data; Query Suggestions; Click logging; SQL/FQL Syntax; Query Options (Phonetic, nickname, rel model)
Time: 1 minute
Speaker notes:
Patent searches typically involve very precise query terms. Note the use of fielded search and Boolean operators. Sophisticated queries are not required, though.
Note the use of visual refiners as bar charts.
Also application code to Save Query, etc…
And the rich search results with structured patent field attributes like Pub#, Assignee, Inventor, …
Using new Services architecture
Blue is core SharePoint
Orange is what is added with FAST
Include both quad + six cores (FS14 loves multicores). Disk setups: There are now 1TB 7200 RPM 3.5” SAS disks out that are quite attractively priced, not much more than the 300GBs. We are soon testing e.g. a Dell R510 with 12x1TB in RAID10 (since we do not need 10+TB disk space anyhow). Since FS14 is less IOPS demanding than ESP, they should be able to hold up. I have a setup in RAID50@NPG2 now (12x2TB, 20TB effective) for the “how far can we get” testing, and it’s more the CPU rather than IOPS that holds back on performance. RAID 10 should be even better for perf. These disks (at least when combined with the Dell H700 RAID controller) also have an amazing read/write bandwidth in RAID, e.g. ~900MB/s sustained for bulk reads/writes on the RAID50 above.
MAIN COMPONENTS, NEXT SLIDE SHOWS HOW TO SCALE OUT
Web Analyzer: Explain that it’s there for improved relevancy (The Page rank algorithm, etc.)
Thomas M
Notice: Two alternatives on SP side:
Replicated components for HA
Split SP/SQL environment for better performance
Notice:
Scaling of the main components is based on QPS and content volume.
Scaling of People Crawl component is based on QPS and # of people (employees).
FAST farm can potentially be 1 (all) + 1 (indexer/search), but not good for scaling out and can give poor performance if heavy work on Web Analyzer process
Ref people crawl/query SSA: These cannot be split, but can be scaled across multiple servers. The scaling for these is fully independent of the content volume on the FS14 backend, and should thus be scaled separately. Use (or point to) guidance from Search Server for scaling these along the two orthogonal axes: QPS and number of people.
Additional servers on the Crawler side are only needed to ensure feeding redundancy.
Notes:
Here you actually start to need 2x crawl in SP not only for redundancy, but also for network throughput (unless they have 10Gbit crawl <-> CD)
3 Col setup needs more than 1Gbit/s of data to utilize the system during feeds.
“scale out” strategy: columns for data volume and rows for redundancy/query volume
3 columns for 45M docs
1 adm
1 WA
Database nodes represent databases and dependent on the throughput of the database, this can run on one database server.
SQL server needs to be scaled for increased IO
# of Web Analyzers depends on amount of clicks, 2 or 3 needed for 100M index