SlideShare a Scribd company logo
1 of 1
Download to read offline
Design Search Architectures for Microsoft SharePoint Server 2010
This poster will help you go through the initial design steps to determine a basic design for a Microsoft® SharePoint® Server 2010 search
architecture. Each successive step will help you refine the initial design by identifying key design drivers, starting with business requirements                                                                                                                                                                                                                                                                                                                                                    Search model 4 of 4
and metrics. You can use the outputs of each step to inform the next set of questions. After going through each step in this poster, you will be
able to map business requirements and key performance metrics to a baseline search architecture.


  1                                                                                                                                2                                                                                                                                                                                      3                                                                                                                                                        4
Identify corpus volume and key                                                                                                 Map key metrics to logical topology                                                                                                                                                     Map logical topology to a physical                                                                                                                      Stage, test, and iterate
performance metrics                                                                                                            and search components                                                                                                                                                                   architecture design                                                                                                                                     Now that you have an initial search architecture design, deploy to a staging
                                                                                                                                                                                                                                                                                                                                                                                                                                                                               environment, and then test to identify weak points in the design. You can then change
                                                                                                                               In this step, map key performance metrics and business requirements to specific logical                                                                                                 In this step, you will see how logical topology requirements map to hardware and physical
The number of items (sites, lists, items in document libraries, etc.) in the organization                                                                                                                                                                                                                                                                                                                                                                                      the design to resolve issues before you deploy to a production environment.
                                                                                                                               topology choices.                                                                                                                                                                       architecture design considerations. Use the information in the table below to determine the
plays a key role in determining architectural requirements for search.
                                                                                                                                                                                                                                                                                                                       physical servers and topology you will need to support the logical topology requirements
                                                                                                                               Use this table to decide how to distribute logical components to support the performance                                                                                                you identified in Step 2.
The following table describes how the number of items you plan to crawl affects design
decisions. Use this information to determine a starting-point architecture. For examples                                       metrics and other requirements you identified in Step 1.                                                                                                                                Hardware Logical                             # of items (in millions):                Physical topology considerations                                                                                                      Design and redesign
                                                                                                                                                                                                                                                                                                                       component component                                                                                                                                                                                                         based on test results
of starting-point architectures, see Poster 3 in this 4-part series.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Design
                                                                                                                                To satisfy this    Take these actions                                                                                                                                                                               <10               10-40     40-60     60-80    80-100
                                                                                                                                metric                                                                                                                                                                                 Query         Query          Shared with       2-4       5-6       7-8      9-10   The number of query servers is dependent on the volume
The starting point architecture you select in this step may change depending on your                                                                                                                                                                                                                                   server        server role    crawl server                                          of queries, the number of items in the corpus, and
requirements in the subsequent steps.                                                                                           Full crawl time    Add crawl servers, crawlers, and crawl databases.
                                                                                                                                                                                                                                                            Crawl                                                                                   role, or 1-2                                          redundancy and availability requirements.
                                                                                                                                and result                                                                                                                  server                                                                                  independent
                                                                                                                                freshness          Each crawl database can contain content from independent                                                                                                                                         query servers                                                                                                                                                                                                    Stage the test
Number of Items                       Starting point architecture                                                                                  sources. Each crawl database can have several crawlers                              Admin          Crawler                            Crawl db
                                                                                                                                                                                                                                                                                                                                                                                                             The query server role can run on the same server with any                                                                                              architecture and
                                                                                                                                                   associated with it, and those crawlers can be distributed                                                                                                                                                                                                 other SharePoint services or by itself. If you expect a large                        Load-test the                                                    implement design
0-1 million                           Limited deployment                                                                                           among multiple crawl servers. The more crawlers you use,                                                                                                                                                                                                  volume of query traffic, and require low query latency,                             design changes                                                         changes
                                                                                                                                                   the more parallel crawl processes can be performed.                                                                                                                                                                                                       consider deploying at least one dedicated query server.
1-10 million                          Small farm topology                                                                                                                                                                                                   Crawl                                   Crawl
                                                                                                                                                                                                                                                            server

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Test                             Stage
                                                                                                                                                                                                                                                                                                    server
                                                                                                                                                                                                                                                                                                                                     Index                                                                   Each index partition contains a discrete portion of the
10-20 million                         Medium shared farm topology                                                                                  If you have several content locations to crawl, multiple
                                                                                                                                                                                                                                      Admin         Crawler        Crawler              Crawler        Crawler                       partition (2                                                            corpus, and can contain up to 10 million items. Each index
                                                                                                                                                   crawlers and associated crawl databases enable you to
20-40 million                         Medium dedicated farm topology                                                                                                                                                                                                                                                                 per query                                                               partition can be “mirrored” using query components (see
                                                                                                                                                   crawl them concurrently. Because each crawler runs parallel                                                                                                                       server)                                                                 below). We recommend that you deploy one query server
40-100 million                        Large dedicated farm topology                                                                                to the others, overall crawl time is reduced as you add                                                                                                                                                                                                   for every two index partitions you add.
                                                                                                                                                   crawlers.
                                                                                                                                                                                                                                              Crawl db                                   Crawl db

                                                                                                                                                                                                                                                                                                                                     Query          1-4               4-8       10-12     14-16    18-20     Query components are mirror copies of a given index

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Stage
                                                                                                                                                                                                                                                                                                                                     component                                                               partition. Query components associated with the same
You now have to determine the importance and relative priority of performance
                                                                                                                                                                                                                                                                                                                                                                                                             index partition can be distributed among several query
requirements for the environment.                                                                                               Time required     If query latency or low query throughput is caused by high                                                                                                                                                                                                 servers for redundancy and to improve query
                                                                                                                                for results to be peak query load, deploy new query servers and multiple               Query server                            Query server                    Query server                                                                                                  performance.
The following table lists the different variables that compose the “big picture” of overall                                     returned          query components for existing index partitions across query                                                                                                                                                                                                Generally, two query components for a given index                  Using the decisions that you made by following the steps in
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Web server                                 Web server
performance. The relative importance of these variables in the environment is                                                                     servers.                                                                                                                                                                                                                                                   partition, each hosted on a different query server, are            this poster to drive the initial design, deploy the initial farm
                                                                                                                                                                                                                                                                                                                                                                                                             sufficient to fulfill performance and redundancy                   design to a staging environment. You should build the
generally driven by a service level agreement (SLA). Similarly, the SLA affects certain                                                                                                                                                    Index partition 1                                       Index partition 3
                                                                                                                                                   Each index partition can contain up to ~10 million items. If                                                                                                                                                                                              requirements.                                                      staging environment to exactly correspond to the production
design considerations.                                                                                                                                                                                                  Query comp 1                           Query comp 1m                       Query comp 3
                                                                                                                                                                                                                                                                                                                       Crawl         Crawl          Shared with      1-2        2         3        3-4       The crawl server role can run on the same server with any          design to ensure that test results accurately reflect the
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Query server                                Query server

                                                                                                                                                   you have more than 10 million items per index partition,            Index partition 3                                       Index partition 2
                                                                                                                                                                                                                                                                                                                       server        server role    query role, or 1                                         other SharePoint services or by itself. If you expect to crawl     behavior of the production environment.
                                                                                                                                                   add index partitions and distribute their query components                                                                                                                                                                                                                                                                                                                                                             Index partition 1
For example, if you know that query results must be returned in <1 second on average,                                                              across multiple query servers.
                                                                                                                                                                                                                       Query comp 3m                           Query comp 2m                        Query comp 2
                                                                                                                                                                                                                                                                                                                                                    independent                                              a large volume of content, plan to crawl content across a
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Query component 1                         Query component 1m
low query latency is a key requirement. If you also expect a large average volume of                                                                                                                                                                                                                                                                crawl server                                             variety of sources, or require that crawling takes place
                                                                                                                                                                                                                                                                                                                                                                                                             while queries are being performed, consider deploying at                                                                                                     Index partition 2

concurrent queries, these two factors together suggest the need for multiple query                                                                 If query latency or low query throughput is caused by
                                                                                                                                                                                                                                                         Crawl db                         Crawl db                                                                                                           least one dedicated crawl server.                                                                                                      Query component 2m                          Query component 2

servers and possibly several index partitions. If these factors are more important than                                                            database load, isolate the property database from crawl
                                                                                                                                                                                                                                                        Property db                                                                                                                                                                                                                                                                                          Crawl server                               Crawl server
                                                                                                                                                   databases by moving it to a separate database server. If you
crawl speed (for example, if the content to be crawled is fairly small in volume), you                                                                                                                                                               Search admin db                                                                 Crawler      1-2                 1-4       4         6        6-8       You can have as many crawlers on a given crawl server as
                                                                                                                                                   have over 25 million items in your corpus, or a large volume
should allocate more resources to the query role than the crawl server role.                                                                       of metadata, you may need to add another property
                                                                                                                                                                                                                                                                                                                                     (2 per crawl                                                            resources permit, but we recommend two per crawl
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Admin     Crawler                    Crawler
                                                                                                                                                                                                                                                                                                                                     server)                                                                 server. If you have a variety of content sources, you can
This metric                                              Is affected by these factors                                                              database.                                                                                                                                                                                                                                                 add crawlers and crawl databases and dedicate them to
Full crawl time and result freshness                     ·      Number of data sources                                          Availability of    Deploy redundant query servers, redundant index                                                                                                                                                                                                           specific sources.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 All other farm

                                                         ·      Data source response time                                       query              partitions and query components, and use clustered or                                                                                                                                                                                                     Each crawler on a given crawl server should be associated                                                                                           databases

                                                                                                                                functionality      mirrored database servers to host crawl and property                                                                                                                                                                                                      with a separate crawl database. For example, if you have                                                                                            Property db                               Crawl db
                                                         ·      Size and type of files                                                                                                                                                                              Index partition 1

                                                                                                                                                   databases.                                                                                                                                                                                                                                                two crawl servers and four crawlers, you should have two                                                                                            Search admin db
                                                         ·
                                                                                                                                                                                                                                         Query component 1                              Query component 1m
                                                                Network bandwidth                                                                                                                                                                                                                                                                                                                            crawl databases. See the example diagram below for
                                                         ·
                                                                                                                                                                                                                                                                   Index partition 2
                                                                Query load while crawling                                                                                                                                                                                                                                                                                                                    details.
                                                                                                                                                                                                                                        Query component 2m                              Query component 2

Time required for results to be returned                 ·    Number of concurrent user queries                                                                                                                                                                                                                        Database      Crawl          1                 1-2       2         3        3-4       The crawl database contains crawled content, and should
                                                                                                                                Availability of    Use multiple crawlers on redundant crawl servers, and add                                                                                                           server        database                                                                be maintained on a separate hard disk from the property
                                                         ·    Number of applications using Search                               content crawling   crawl databases. Crawlers associated with a given crawl                                                                                                                                                                                                   database as a best practice to prevent I/O contention. If

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Test
                                                                                                                                                                                                                                                            Crawl                                   Crawl
                                                                                                                                and indexing       database can be distributed across crawl servers for                                                     server                                  server                                                                                                   the crawl window overlaps with times when users are
Availability of query functionality                      ·    Hardware availability                                                                                                                                                                                                                                                                                                                          querying, or several crawlers are connected to a crawl
                                                                                                                                functionality      availability and load distribution.
Availability of content crawling and indexing            ·    Hardware availability
                                                                                                                                                                                                                                       Admin         Crawler        Crawler              Crawler        Crawler                                                                                              database, consider deploying the crawl database to a
                                                                                                                                                                                                                                                                                                                                                                                                             separate database server. You can also have multiple crawl
functionality                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Conduct load testing against the staged deployment. Load
                                                                                                                                                                                                                                                                                                                                                                                                             databases with different crawlers connected to them.                                                                                            Web server                                 Web server
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                testing will reveal any weak points in the design.
                                                                                                                                                                                                                                                 Crawl db                               Crawl db
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                In this example, test results reveal that when an expected                    Query server                               Query server
                                                                                                                                                                                                                                                                                                                                     Property       1                 1-2       2         3        3-4       The property database contains metadata for all crawled
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                volume of queries and other farm activity occur at the
                                                                                                                                                                                                                                                                                                                                     database                                                                content. You may need more than one property database
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                same time that crawling occurs, query latency increases to
                                                                                                                                                                                                                                                                                                                                                                                                             per 25 million items if there is a large amount of metadata                                                                                                  Index partition 1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                an unacceptable level because of resource contention on
                                                                                                                                                                                                                                                                                                                                                                                                             associated with crawled content.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                the database server hosting the crawl database and other             Query component 1                          Query component 1m


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                farm databases.                                                                           Index partition 2
                                                                                                                                                                                                                                                                                                                                     Search         1                 1         1         1        1         Only one Search Administration database is required per                                                                                Query component 2m                          Query component 2
                                                                                                                                                                                                                                                                                                                                     Admin                                                                   farm.                                                              This bottleneck is revealed through observation of
                                                                                                                                                                                                                                                                                                                                     database                                                                                                                                                                                                                Crawl server                               Crawl server
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                unacceptably high disk I/O and large disk queue lengths
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                on the database server that is hosting the crawl database.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Admin     Crawler                    Crawler




 Example – Company A                                                                                                                   Example – Company A                                                                                                                                                                      Example – Company A                                                                                                                                                                                                              All other farm
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 databases



 Step 1                                                                                                                                Step 2                                                                                                                                                                                   Step 3                                                                                                                                                                                                                           Property db

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Search admin db
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Crawl db




 In this example, Company A’s corpus contains 10-20          Requirements:                                                             Company A’s requirements from Step 1 were                                                                                                                                                In this step, Company A’s starting-point physical
 million items. Based on this corpus size, the best                                                                                    primarily low query latency and fast crawl speed.                                                                                                                                        architecture is revised to support the logical
                                                             · Low query latency                                                                                                                                                                                                                                                                                                                                   Web server                             Web server
 starting point architecture is the medium shared                                                                                                                                                                                                                                                                               topology requirements that were identified in step
                                                             · Fast crawl speed                                                                                                                                                                                  Web server

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Iterate
                                                                                                                                                                                                                 Web server
 search topology from Poster 3.                                                                                                        The number of items in the corpus (<20 million)                            Query server                                   Query server                                                   2.
                                                             · Concurrent crawls of
                                                                                                                                       requires a minimum of two index partitions, which
                                                               different content sources
 Company A’s business requirements include a very                                                                                      can effectively contain up to 10 million items each.                                  Index partition 1
                                                                                                                                                                                                                                                                                                                                To ensure adequate query throughput, the query
                                                                                                                                                                                                                                                                                                                                                                                                                    Query server                           Query server

 low average query latency, with search results              Number of items:                                                          If substantial growth of the corpus is expected,                Query component 1                          Query component 1m
                                                                                                                                                                                                                                                                                                                                role has been deployed to twp separate query
 returned in less than one second on average.                <20 million                                                               more index partitions can be added to anticipate a                                    Index partition 2
                                                                                                                                                                                                                                                                                                                                servers, isolating query functionality from Web                                                 Index partition 1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                To resolve the problem that testing revealed, a third
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Web server                                  Web server
                                                             Starting point architecture:                                              higher volume of items.                                         Query component 2m                          Query component 2                                                            server demand and maintaining redundancy of                                Query component 1                        Query component 1m
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                database server is added, and a second crawl database is
 Content to be crawled spans multiple content                Medium shared search topology                                                                                                                                                                                                                                      the query server role. This will help to ensure low                                                                                             added to the new server. Two new crawlers are added,
                                                                                                                                                                                                                                                                                                                                                                                                                                Index partition 2
 locations, some of which may be across low-                                                                                           To help ensure low query latency, the query                                                  Query                                          Query
                                                                                                                                                                                                                                                                                                                                query latency, and make it easier to scale out                                                                                                  one on each crawl server, to crawl content for the new                        Query server                                Query server
                                                                       Web server                               Web server                                                                                                        server role                                    server role                                                                                                               Query component 2m                       Query component 2
 bandwidth WAN connections.                                            Query server                             Query server           server role should be separated from the Web                                                                                                                                             this role to meet future needs without affecting                                                                                                crawl database.
                                                                                                                                       server role. Multiple index partitions and query                             Crawl server                                 Crawl server                                                   Web server performance.                                                            Crawl server                          Crawl server                                                                                                      Index partition 1
 A high peak query load is not expected, but content                                  Index partition 1                                components are needed.                                                                                                                                                                                                                                                                                                                   After changes have been made to resolve the problem
 freshness is important. This means that crawl                  Query component 1                         Query component 1m                                                                                                                                                                                                    Two crawl servers are required for redundancy                                                                                                   revealed in testing, test again to see whether the changes           Query component 1                          Query component 1m

                                                                                                                                                                                                            Admin     Crawler       Crawler         Crawler        Crawler
 speeds must be fast, and it is important that overall                               Index partition 2                                 Because Company A requires high crawl speed                                                                                                                                              and to help ensure that crawl speed is fast                                Admin     Crawler    Crawler        Crawler     Crawler              actually resolved the problem and also to identify any                                    Index partition 2


 crawling is not delayed if some content locations are                                                                                 and daily crawls to maximize freshness of query                                                                                                                                          enough to maintain content freshness. A crawler                                                                                                 other problems that may have been masked by issues that             Query component 2m                           Query component 2
                                                                Query component 2m                         Query component 2
 slow to respond.                                                                                                                      results, multiple crawl servers and crawlers are                                                                                                                                         is added to each of the two crawl servers to                                                                                                    have now been resolved.                                                      Crawl server
                                                                                                                                                                                                                           Crawl db                                                                                                                                                                                                                                                                                                                                                                     Crawl server

                                                                      Crawl server                              Crawl server
                                                                                                                                       needed. Because some of the content locations                                                                                                                                            accommodate multiple content locations, and a                                           Crawl db

                                                                                                                                       may be slow to respond, additional crawlers and a                                                                             Crawl db                                                   new crawl database is added on its own                                                                                       Crawl db




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           DRAFT
                                                                                                                                                                                                                        All other farm                                                                                                                                                                                                                                                                                                               Admin      Crawler     Crawler           Crawler     Crawler
                                                                                                                                       new crawl database should be added to increase                                   databases                                                                                               database server to service the new crawlers.                                           All other farm
                                                                                                                                                                                                                                                                                                                                                                                                                       databases
                                                                  Admin     Crawler                       Crawler                      parallelization of the crawl process. This will help                             Property db                                                                                                                                                                                    Property db
                                                                                                                                       to ensure that crawling of all content locations                                 Search admin db
                                                                                                                                                                                                                                                                                                                                In this particular case, the volume of metadata
                                                                                                                                                                                                                                                                                                                                                                                                                       Search admin db
                                                                                                                                       does not take more than 24 hours.                                                                                                                                                        associated with crawled content is estimated to                                                                                                                                                                                   Crawl db

                                                                                                                                                                                                                                                                                                                                be relatively small, so one property database                                                                                                 This document supports a preliminary release of                                                                              Crawl db
                                                                           All other farm
                                                                           databases
                                                                                                                    Crawl db
                                                                                                                                                                                                                                                                                                                                should be sufficient. However, if the volume of                   As shown in the example above, crawl databases should always run             a software program that bears the project code                                    All other farm
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 databases
                                                                                                                                                                                                                                                                                                                                metadata increases over time, or if the volume of                 on separate database servers from the property database to prevent
                                                                           Property db
                                                                                                                                                                                                                                                                                                                                the corpus approaches ~25 million items, another                  resource contention from queries when crawling is taking place.             name Microsoft® SharePoint® Server 2010 Beta.                                      Property db
                                                                           Search admin db
                                                                                                                                                                                                                                                                                                                                property database on an additional database                       In environments where crawl freshness is a priority, the crawl
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   © 2009 Microsoft Corporation. All rights                                      Search admin db

                                                                                                                                                                                                                                                                                                                                server will be needed to maintain adequate query                  database should generally be hosted on its own database server.                  reserved. To send feedback about this
                                                                                                                                                                                                                                                                                                                                throughput.                                                                                                                                          documentation, please write to us at
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         ITSPdocs@microsoft.com.

More Related Content

Similar to Search model 4 of 4 farm-level design

Jeff.robinson
Jeff.robinsonJeff.robinson
Jeff.robinsonNASAPMC
 
Jeff.robinson
Jeff.robinsonJeff.robinson
Jeff.robinsonNASAPMC
 
4+1view architecture
4+1view architecture4+1view architecture
4+1view architecturedrewz lin
 
4+1view architecture
4+1view architecture4+1view architecture
4+1view architectureTot Bob
 
Software design
Software designSoftware design
Software designambitlick
 
Unit 3 3 architectural design
Unit 3 3 architectural designUnit 3 3 architectural design
Unit 3 3 architectural designHiren Selani
 
Architecture solution architecture method
Architecture solution architecture methodArchitecture solution architecture method
Architecture solution architecture methodChris Eaton
 
FBSIC Functionalities Matrix
FBSIC Functionalities MatrixFBSIC Functionalities Matrix
FBSIC Functionalities MatrixFernando Gil
 
Enterprise Architecture using TOGAF 's ADM - Architecture Delivery Method (...
Enterprise Architecture using TOGAF 's ADM - Architecture Delivery Method (...Enterprise Architecture using TOGAF 's ADM - Architecture Delivery Method (...
Enterprise Architecture using TOGAF 's ADM - Architecture Delivery Method (...Chandrashekhar More
 
Design Principles of Advanced Task Elicitation Systems
Design Principles of Advanced Task Elicitation SystemsDesign Principles of Advanced Task Elicitation Systems
Design Principles of Advanced Task Elicitation SystemsProf. Dr. Alexander Maedche
 
Objectives, methodology, flowchart, and delivarable
Objectives, methodology, flowchart, and delivarableObjectives, methodology, flowchart, and delivarable
Objectives, methodology, flowchart, and delivarableFatini Fatini
 
Lecture Notes in Computer Science:
Lecture Notes in Computer Science:Lecture Notes in Computer Science:
Lecture Notes in Computer Science:butest
 
Enterprise search in SharePoint 2013 - Sydney 15th of January 2013
Enterprise search in SharePoint 2013 - Sydney 15th of January 2013Enterprise search in SharePoint 2013 - Sydney 15th of January 2013
Enterprise search in SharePoint 2013 - Sydney 15th of January 2013Findwise
 
Software Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction TechniquesSoftware Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction Techniquesaliraza786
 
Personality attrib software_arch
Personality attrib software_archPersonality attrib software_arch
Personality attrib software_archAnil Sharma
 
Scalable Keyword Cover Search using Keyword NNE and Inverted Indexing
Scalable Keyword Cover Search using Keyword NNE and Inverted IndexingScalable Keyword Cover Search using Keyword NNE and Inverted Indexing
Scalable Keyword Cover Search using Keyword NNE and Inverted IndexingIRJET Journal
 
Alm assessment poster en
Alm assessment poster enAlm assessment poster en
Alm assessment poster enreidca
 
Ba course content intensive
Ba course content intensiveBa course content intensive
Ba course content intensiveCGI Federal
 

Similar to Search model 4 of 4 farm-level design (20)

Jeff.robinson
Jeff.robinsonJeff.robinson
Jeff.robinson
 
Jeff.robinson
Jeff.robinsonJeff.robinson
Jeff.robinson
 
4+1view architecture
4+1view architecture4+1view architecture
4+1view architecture
 
4+1view architecture
4+1view architecture4+1view architecture
4+1view architecture
 
Software design
Software designSoftware design
Software design
 
Sdd template
Sdd templateSdd template
Sdd template
 
Unit 3 3 architectural design
Unit 3 3 architectural designUnit 3 3 architectural design
Unit 3 3 architectural design
 
Architecture solution architecture method
Architecture solution architecture methodArchitecture solution architecture method
Architecture solution architecture method
 
FBSIC Functionalities Matrix
FBSIC Functionalities MatrixFBSIC Functionalities Matrix
FBSIC Functionalities Matrix
 
Enterprise Architecture using TOGAF 's ADM - Architecture Delivery Method (...
Enterprise Architecture using TOGAF 's ADM - Architecture Delivery Method (...Enterprise Architecture using TOGAF 's ADM - Architecture Delivery Method (...
Enterprise Architecture using TOGAF 's ADM - Architecture Delivery Method (...
 
Design Principles of Advanced Task Elicitation Systems
Design Principles of Advanced Task Elicitation SystemsDesign Principles of Advanced Task Elicitation Systems
Design Principles of Advanced Task Elicitation Systems
 
Objectives, methodology, flowchart, and delivarable
Objectives, methodology, flowchart, and delivarableObjectives, methodology, flowchart, and delivarable
Objectives, methodology, flowchart, and delivarable
 
Lecture Notes in Computer Science:
Lecture Notes in Computer Science:Lecture Notes in Computer Science:
Lecture Notes in Computer Science:
 
Enterprise search in SharePoint 2013 - Sydney 15th of January 2013
Enterprise search in SharePoint 2013 - Sydney 15th of January 2013Enterprise search in SharePoint 2013 - Sydney 15th of January 2013
Enterprise search in SharePoint 2013 - Sydney 15th of January 2013
 
Software Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction TechniquesSoftware Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction Techniques
 
Green EA
Green EAGreen EA
Green EA
 
Personality attrib software_arch
Personality attrib software_archPersonality attrib software_arch
Personality attrib software_arch
 
Scalable Keyword Cover Search using Keyword NNE and Inverted Indexing
Scalable Keyword Cover Search using Keyword NNE and Inverted IndexingScalable Keyword Cover Search using Keyword NNE and Inverted Indexing
Scalable Keyword Cover Search using Keyword NNE and Inverted Indexing
 
Alm assessment poster en
Alm assessment poster enAlm assessment poster en
Alm assessment poster en
 
Ba course content intensive
Ba course content intensiveBa course content intensive
Ba course content intensive
 

More from Ard van Someren

Het outsourcing congres 20011 - Documents in/to The Cloud, do not forget unst...
Het outsourcing congres 20011 - Documents in/to The Cloud, do not forget unst...Het outsourcing congres 20011 - Documents in/to The Cloud, do not forget unst...
Het outsourcing congres 20011 - Documents in/to The Cloud, do not forget unst...Ard van Someren
 
Productbrochure Ricoh WerkBOX
Productbrochure Ricoh WerkBOXProductbrochure Ricoh WerkBOX
Productbrochure Ricoh WerkBOXArd van Someren
 
Onbeperkt archiveren en opslaan van documenten en informatie v4.1pub avs
Onbeperkt archiveren en opslaan van documenten en informatie v4.1pub   avsOnbeperkt archiveren en opslaan van documenten en informatie v4.1pub   avs
Onbeperkt archiveren en opslaan van documenten en informatie v4.1pub avsArd van Someren
 
Dspinformatielogistiekmei2010avsv01
Dspinformatielogistiekmei2010avsv01Dspinformatielogistiekmei2010avsv01
Dspinformatielogistiekmei2010avsv01Ard van Someren
 
IDRM2 informatie beheer en onderhoud
IDRM2  informatie beheer en onderhoudIDRM2  informatie beheer en onderhoud
IDRM2 informatie beheer en onderhoudArd van Someren
 
Upgrade services share_pointserver
Upgrade services share_pointserverUpgrade services share_pointserver
Upgrade services share_pointserverArd van Someren
 
Upgrade approaches share_pointproducts
Upgrade approaches share_pointproductsUpgrade approaches share_pointproducts
Upgrade approaches share_pointproductsArd van Someren
 
Topologies share pointserver2010
Topologies share pointserver2010Topologies share pointserver2010
Topologies share pointserver2010Ard van Someren
 
Svs singlefarm sharepointproducts2010
Svs singlefarm sharepointproducts2010Svs singlefarm sharepointproducts2010
Svs singlefarm sharepointproducts2010Ard van Someren
 
Svs crossfarm sharepointproducts2010
Svs crossfarm sharepointproducts2010Svs crossfarm sharepointproducts2010
Svs crossfarm sharepointproducts2010Ard van Someren
 
Search model 1 of 4 search technologies
Search model 1 of 4   search technologiesSearch model 1 of 4   search technologies
Search model 1 of 4 search technologiesArd van Someren
 
Oit2010 model extranet_topologies
Oit2010 model extranet_topologiesOit2010 model extranet_topologies
Oit2010 model extranet_topologiesArd van Someren
 
Hosting share pointproducts2010
Hosting share pointproducts2010Hosting share pointproducts2010
Hosting share pointproducts2010Ard van Someren
 
Deployment share point2010products
Deployment share point2010productsDeployment share point2010products
Deployment share point2010productsArd van Someren
 
Content deployment sharepointserver2010
Content deployment sharepointserver2010Content deployment sharepointserver2010
Content deployment sharepointserver2010Ard van Someren
 
Choose a tool for business intelligence in share point 2010
Choose a tool for business intelligence in share point 2010Choose a tool for business intelligence in share point 2010
Choose a tool for business intelligence in share point 2010Ard van Someren
 
Business connectivityservices model
Business connectivityservices modelBusiness connectivityservices model
Business connectivityservices modelArd van Someren
 
Business productivity at its best whitepaper
Business productivity at its best whitepaperBusiness productivity at its best whitepaper
Business productivity at its best whitepaperArd van Someren
 
Share point 2010 ard van someren working version
Share point 2010  ard van someren working versionShare point 2010  ard van someren working version
Share point 2010 ard van someren working versionArd van Someren
 

More from Ard van Someren (20)

Het outsourcing congres 20011 - Documents in/to The Cloud, do not forget unst...
Het outsourcing congres 20011 - Documents in/to The Cloud, do not forget unst...Het outsourcing congres 20011 - Documents in/to The Cloud, do not forget unst...
Het outsourcing congres 20011 - Documents in/to The Cloud, do not forget unst...
 
Productbrochure Ricoh WerkBOX
Productbrochure Ricoh WerkBOXProductbrochure Ricoh WerkBOX
Productbrochure Ricoh WerkBOX
 
Onbeperkt archiveren en opslaan van documenten en informatie v4.1pub avs
Onbeperkt archiveren en opslaan van documenten en informatie v4.1pub   avsOnbeperkt archiveren en opslaan van documenten en informatie v4.1pub   avs
Onbeperkt archiveren en opslaan van documenten en informatie v4.1pub avs
 
Dspinformatielogistiekmei2010avsv01
Dspinformatielogistiekmei2010avsv01Dspinformatielogistiekmei2010avsv01
Dspinformatielogistiekmei2010avsv01
 
IDRM2 informatie beheer en onderhoud
IDRM2  informatie beheer en onderhoudIDRM2  informatie beheer en onderhoud
IDRM2 informatie beheer en onderhoud
 
Upgrade services share_pointserver
Upgrade services share_pointserverUpgrade services share_pointserver
Upgrade services share_pointserver
 
Upgrade approaches share_pointproducts
Upgrade approaches share_pointproductsUpgrade approaches share_pointproducts
Upgrade approaches share_pointproducts
 
Topologies share pointserver2010
Topologies share pointserver2010Topologies share pointserver2010
Topologies share pointserver2010
 
Svs singlefarm sharepointproducts2010
Svs singlefarm sharepointproducts2010Svs singlefarm sharepointproducts2010
Svs singlefarm sharepointproducts2010
 
Svs crossfarm sharepointproducts2010
Svs crossfarm sharepointproducts2010Svs crossfarm sharepointproducts2010
Svs crossfarm sharepointproducts2010
 
Search model 1 of 4 search technologies
Search model 1 of 4   search technologiesSearch model 1 of 4   search technologies
Search model 1 of 4 search technologies
 
Oit2010 model extranet_topologies
Oit2010 model extranet_topologiesOit2010 model extranet_topologies
Oit2010 model extranet_topologies
 
Oit2010 model databases
Oit2010 model databasesOit2010 model databases
Oit2010 model databases
 
Hosting share pointproducts2010
Hosting share pointproducts2010Hosting share pointproducts2010
Hosting share pointproducts2010
 
Deployment share point2010products
Deployment share point2010productsDeployment share point2010products
Deployment share point2010products
 
Content deployment sharepointserver2010
Content deployment sharepointserver2010Content deployment sharepointserver2010
Content deployment sharepointserver2010
 
Choose a tool for business intelligence in share point 2010
Choose a tool for business intelligence in share point 2010Choose a tool for business intelligence in share point 2010
Choose a tool for business intelligence in share point 2010
 
Business connectivityservices model
Business connectivityservices modelBusiness connectivityservices model
Business connectivityservices model
 
Business productivity at its best whitepaper
Business productivity at its best whitepaperBusiness productivity at its best whitepaper
Business productivity at its best whitepaper
 
Share point 2010 ard van someren working version
Share point 2010  ard van someren working versionShare point 2010  ard van someren working version
Share point 2010 ard van someren working version
 

Recently uploaded

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Search model 4 of 4 farm-level design

  • 1. Design Search Architectures for Microsoft SharePoint Server 2010 This poster will help you go through the initial design steps to determine a basic design for a Microsoft® SharePoint® Server 2010 search architecture. Each successive step will help you refine the initial design by identifying key design drivers, starting with business requirements Search model 4 of 4 and metrics. You can use the outputs of each step to inform the next set of questions. After going through each step in this poster, you will be able to map business requirements and key performance metrics to a baseline search architecture. 1 2 3 4 Identify corpus volume and key Map key metrics to logical topology Map logical topology to a physical Stage, test, and iterate performance metrics and search components architecture design Now that you have an initial search architecture design, deploy to a staging environment, and then test to identify weak points in the design. You can then change In this step, map key performance metrics and business requirements to specific logical In this step, you will see how logical topology requirements map to hardware and physical The number of items (sites, lists, items in document libraries, etc.) in the organization the design to resolve issues before you deploy to a production environment. topology choices. architecture design considerations. Use the information in the table below to determine the plays a key role in determining architectural requirements for search. physical servers and topology you will need to support the logical topology requirements Use this table to decide how to distribute logical components to support the performance you identified in Step 2. The following table describes how the number of items you plan to crawl affects design decisions. Use this information to determine a starting-point architecture. For examples metrics and other requirements you identified in Step 1. Hardware Logical # of items (in millions): Physical topology considerations Design and redesign component component based on test results of starting-point architectures, see Poster 3 in this 4-part series. Design To satisfy this Take these actions <10 10-40 40-60 60-80 80-100 metric Query Query Shared with 2-4 5-6 7-8 9-10 The number of query servers is dependent on the volume The starting point architecture you select in this step may change depending on your server server role crawl server of queries, the number of items in the corpus, and requirements in the subsequent steps. Full crawl time Add crawl servers, crawlers, and crawl databases. Crawl role, or 1-2 redundancy and availability requirements. and result server independent freshness Each crawl database can contain content from independent query servers Stage the test Number of Items Starting point architecture sources. Each crawl database can have several crawlers Admin Crawler Crawl db The query server role can run on the same server with any architecture and associated with it, and those crawlers can be distributed other SharePoint services or by itself. If you expect a large Load-test the implement design 0-1 million Limited deployment among multiple crawl servers. The more crawlers you use, volume of query traffic, and require low query latency, design changes changes the more parallel crawl processes can be performed. consider deploying at least one dedicated query server. 1-10 million Small farm topology Crawl Crawl server Test Stage server Index Each index partition contains a discrete portion of the 10-20 million Medium shared farm topology If you have several content locations to crawl, multiple Admin Crawler Crawler Crawler Crawler partition (2 corpus, and can contain up to 10 million items. Each index crawlers and associated crawl databases enable you to 20-40 million Medium dedicated farm topology per query partition can be “mirrored” using query components (see crawl them concurrently. Because each crawler runs parallel server) below). We recommend that you deploy one query server 40-100 million Large dedicated farm topology to the others, overall crawl time is reduced as you add for every two index partitions you add. crawlers. Crawl db Crawl db Query 1-4 4-8 10-12 14-16 18-20 Query components are mirror copies of a given index Stage component partition. Query components associated with the same You now have to determine the importance and relative priority of performance index partition can be distributed among several query requirements for the environment. Time required If query latency or low query throughput is caused by high servers for redundancy and to improve query for results to be peak query load, deploy new query servers and multiple Query server Query server Query server performance. The following table lists the different variables that compose the “big picture” of overall returned query components for existing index partitions across query Generally, two query components for a given index Using the decisions that you made by following the steps in Web server Web server performance. The relative importance of these variables in the environment is servers. partition, each hosted on a different query server, are this poster to drive the initial design, deploy the initial farm sufficient to fulfill performance and redundancy design to a staging environment. You should build the generally driven by a service level agreement (SLA). Similarly, the SLA affects certain Index partition 1 Index partition 3 Each index partition can contain up to ~10 million items. If requirements. staging environment to exactly correspond to the production design considerations. Query comp 1 Query comp 1m Query comp 3 Crawl Crawl Shared with 1-2 2 3 3-4 The crawl server role can run on the same server with any design to ensure that test results accurately reflect the Query server Query server you have more than 10 million items per index partition, Index partition 3 Index partition 2 server server role query role, or 1 other SharePoint services or by itself. If you expect to crawl behavior of the production environment. add index partitions and distribute their query components Index partition 1 For example, if you know that query results must be returned in <1 second on average, across multiple query servers. Query comp 3m Query comp 2m Query comp 2 independent a large volume of content, plan to crawl content across a Query component 1 Query component 1m low query latency is a key requirement. If you also expect a large average volume of crawl server variety of sources, or require that crawling takes place while queries are being performed, consider deploying at Index partition 2 concurrent queries, these two factors together suggest the need for multiple query If query latency or low query throughput is caused by Crawl db Crawl db least one dedicated crawl server. Query component 2m Query component 2 servers and possibly several index partitions. If these factors are more important than database load, isolate the property database from crawl Property db Crawl server Crawl server databases by moving it to a separate database server. If you crawl speed (for example, if the content to be crawled is fairly small in volume), you Search admin db Crawler 1-2 1-4 4 6 6-8 You can have as many crawlers on a given crawl server as have over 25 million items in your corpus, or a large volume should allocate more resources to the query role than the crawl server role. of metadata, you may need to add another property (2 per crawl resources permit, but we recommend two per crawl Admin Crawler Crawler server) server. If you have a variety of content sources, you can This metric Is affected by these factors database. add crawlers and crawl databases and dedicate them to Full crawl time and result freshness · Number of data sources Availability of Deploy redundant query servers, redundant index specific sources. All other farm · Data source response time query partitions and query components, and use clustered or Each crawler on a given crawl server should be associated databases functionality mirrored database servers to host crawl and property with a separate crawl database. For example, if you have Property db Crawl db · Size and type of files Index partition 1 databases. two crawl servers and four crawlers, you should have two Search admin db · Query component 1 Query component 1m Network bandwidth crawl databases. See the example diagram below for · Index partition 2 Query load while crawling details. Query component 2m Query component 2 Time required for results to be returned · Number of concurrent user queries Database Crawl 1 1-2 2 3 3-4 The crawl database contains crawled content, and should Availability of Use multiple crawlers on redundant crawl servers, and add server database be maintained on a separate hard disk from the property · Number of applications using Search content crawling crawl databases. Crawlers associated with a given crawl database as a best practice to prevent I/O contention. If Test Crawl Crawl and indexing database can be distributed across crawl servers for server server the crawl window overlaps with times when users are Availability of query functionality · Hardware availability querying, or several crawlers are connected to a crawl functionality availability and load distribution. Availability of content crawling and indexing · Hardware availability Admin Crawler Crawler Crawler Crawler database, consider deploying the crawl database to a separate database server. You can also have multiple crawl functionality Conduct load testing against the staged deployment. Load databases with different crawlers connected to them. Web server Web server testing will reveal any weak points in the design. Crawl db Crawl db In this example, test results reveal that when an expected Query server Query server Property 1 1-2 2 3 3-4 The property database contains metadata for all crawled volume of queries and other farm activity occur at the database content. You may need more than one property database same time that crawling occurs, query latency increases to per 25 million items if there is a large amount of metadata Index partition 1 an unacceptable level because of resource contention on associated with crawled content. the database server hosting the crawl database and other Query component 1 Query component 1m farm databases. Index partition 2 Search 1 1 1 1 1 Only one Search Administration database is required per Query component 2m Query component 2 Admin farm. This bottleneck is revealed through observation of database Crawl server Crawl server unacceptably high disk I/O and large disk queue lengths on the database server that is hosting the crawl database. Admin Crawler Crawler Example – Company A Example – Company A Example – Company A All other farm databases Step 1 Step 2 Step 3 Property db Search admin db Crawl db In this example, Company A’s corpus contains 10-20 Requirements: Company A’s requirements from Step 1 were In this step, Company A’s starting-point physical million items. Based on this corpus size, the best primarily low query latency and fast crawl speed. architecture is revised to support the logical · Low query latency Web server Web server starting point architecture is the medium shared topology requirements that were identified in step · Fast crawl speed Web server Iterate Web server search topology from Poster 3. The number of items in the corpus (<20 million) Query server Query server 2. · Concurrent crawls of requires a minimum of two index partitions, which different content sources Company A’s business requirements include a very can effectively contain up to 10 million items each. Index partition 1 To ensure adequate query throughput, the query Query server Query server low average query latency, with search results Number of items: If substantial growth of the corpus is expected, Query component 1 Query component 1m role has been deployed to twp separate query returned in less than one second on average. <20 million more index partitions can be added to anticipate a Index partition 2 servers, isolating query functionality from Web Index partition 1 To resolve the problem that testing revealed, a third Web server Web server Starting point architecture: higher volume of items. Query component 2m Query component 2 server demand and maintaining redundancy of Query component 1 Query component 1m database server is added, and a second crawl database is Content to be crawled spans multiple content Medium shared search topology the query server role. This will help to ensure low added to the new server. Two new crawlers are added, Index partition 2 locations, some of which may be across low- To help ensure low query latency, the query Query Query query latency, and make it easier to scale out one on each crawl server, to crawl content for the new Query server Query server Web server Web server server role server role Query component 2m Query component 2 bandwidth WAN connections. Query server Query server server role should be separated from the Web this role to meet future needs without affecting crawl database. server role. Multiple index partitions and query Crawl server Crawl server Web server performance. Crawl server Crawl server Index partition 1 A high peak query load is not expected, but content Index partition 1 components are needed. After changes have been made to resolve the problem freshness is important. This means that crawl Query component 1 Query component 1m Two crawl servers are required for redundancy revealed in testing, test again to see whether the changes Query component 1 Query component 1m Admin Crawler Crawler Crawler Crawler speeds must be fast, and it is important that overall Index partition 2 Because Company A requires high crawl speed and to help ensure that crawl speed is fast Admin Crawler Crawler Crawler Crawler actually resolved the problem and also to identify any Index partition 2 crawling is not delayed if some content locations are and daily crawls to maximize freshness of query enough to maintain content freshness. A crawler other problems that may have been masked by issues that Query component 2m Query component 2 Query component 2m Query component 2 slow to respond. results, multiple crawl servers and crawlers are is added to each of the two crawl servers to have now been resolved. Crawl server Crawl db Crawl server Crawl server Crawl server needed. Because some of the content locations accommodate multiple content locations, and a Crawl db may be slow to respond, additional crawlers and a Crawl db new crawl database is added on its own Crawl db DRAFT All other farm Admin Crawler Crawler Crawler Crawler new crawl database should be added to increase databases database server to service the new crawlers. All other farm databases Admin Crawler Crawler parallelization of the crawl process. This will help Property db Property db to ensure that crawling of all content locations Search admin db In this particular case, the volume of metadata Search admin db does not take more than 24 hours. associated with crawled content is estimated to Crawl db be relatively small, so one property database This document supports a preliminary release of Crawl db All other farm databases Crawl db should be sufficient. However, if the volume of As shown in the example above, crawl databases should always run a software program that bears the project code All other farm databases metadata increases over time, or if the volume of on separate database servers from the property database to prevent Property db the corpus approaches ~25 million items, another resource contention from queries when crawling is taking place. name Microsoft® SharePoint® Server 2010 Beta. Property db Search admin db property database on an additional database In environments where crawl freshness is a priority, the crawl © 2009 Microsoft Corporation. All rights Search admin db server will be needed to maintain adequate query database should generally be hosted on its own database server. reserved. To send feedback about this throughput. documentation, please write to us at ITSPdocs@microsoft.com.