PAGE 1 of 17 (877)-476-5973 www.polyserve.com
White Paper
Scalable Shared Databases for SQL Server 2005
Achieving Linear Scalability for Scale-out Reporting using
SQL Server 2005 Enterprise Edition
Abstract: Microsoft SQL Server™ 2005 Enterprise Edition supports scale-out
reporting through scalable shared databases. Scale-out reporting enables multiple
SQL Server 2005 systems to attach a read-only copy of the same database.
When deployed using PolyServe’s Database Utility™ for SQL Server, Enterprises
reduce report completion times by up to 16x. The solution reduces storage
complexity, simplifying SQL Server scale-out for complex, off-hours reporting
workloads. The PolyServe solution enables rapid transformation of OLTP to read-
only data warehousing for scale-out and back again to OLTP—in seconds.
This proof of concept (POC) demonstrates PolyServe’s solution for scalable shared
databases. The POC consists of a 4-node PolyServe Matrix Server cluster running
PolyServe’s Database Utility for SQL Server and Microsoft SQL Server 2005
Enterprise Edition, connected to a SAN.
Contents
Scalable Shared Databases for SQL Server 2005
  Achieving Linear Scalability for Scale-out Reporting using SQL Server 2005 Enterprise Edition
Introduction
  Single System Performance Limits
  Data Warehousing Challenges
Introducing the Scalable Shared Database
  Analysis
Another Model for Scale Out: Shared Data
  Storage Management
  Concurrent Scalability—for Scale Out
Proof of Concept Results
  Data Center Use Case
  Configuration Overview
  Database Configuration Overview
  Performance Results
Conclusions
The Database Utility Overview
The Database Utility™ for SQL Server Components
PolyServe’s Cluster Volume Manager
Summary
Introduction
With production databases deployed on Microsoft SQL Server 2005 growing rapidly,
timely and scalable reporting has become a business-critical need among
Enterprises.
Today, the size of databases deployed on individual SQL Server systems is often
measured in the hundreds of Gigabytes to Terabytes. For larger databases, data
warehouse preparation is challenging and time-consuming.
Further, companies face time-constrained reporting windows where there is
effectively less time and (as the database grows) less available computing power to
complete resource-intensive reporting jobs within an off-hours reporting window.
Single System Performance Limits
When the power of SQL Server 2005 is combined with modern, industry-standard
x64 servers and Storage Area Networks (SANs), Enterprises are provided with a
robust platform for deploying mission-critical databases at optimal
price-performance. For single-system scalability, SQL Server 2005 includes numerous
performance and memory management advancements that make fuller use of the
resources within a single system.
For Online Transaction Processing (OLTP)—where queries tend to be shorter and
less resource-intensive—a single server often provides adequate bandwidth.
However, more complex workloads—such as reporting, ad hoc queries, and data
warehouse preparation—often require more throughput than a single server can
provide. Scanning tables, sorting large amounts of data, and running multiple
reporting jobs concurrently—all against the same database—these are resource-
intensive tasks that can easily overburden a single server system. For these
workloads, the server is often the bottleneck.
Consider a business requirement to execute 8 reports against a 300GB table used
primarily for OLTP. The indexes have been optimized for OLTP. The query plans
generated for the reports are based upon full table scans. If such reporting saturates
a single server, the only way to complete the 8 reporting jobs in less time is to scale
out to several servers, running multiple reports concurrently across multiple servers.
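The scale-out idea in this example can be sketched in a few lines. The Python below is only an illustration and not part of the PolyServe product: the server and report names are hypothetical, and it assumes every report costs about the same. It assigns reports to servers round-robin and counts how many reports each server must run back to back.

```python
def assign_reports(reports, servers):
    """Assign reports to servers round-robin; returns {server: [reports]}."""
    plan = {server: [] for server in servers}
    for i, report in enumerate(reports):
        plan[servers[i % len(servers)]].append(report)
    return plan


def sequential_rounds(plan):
    """Number of reports the busiest server runs one after another."""
    return max(len(jobs) for jobs in plan.values())


if __name__ == "__main__":
    reports = [f"report_{i}" for i in range(1, 9)]   # the 8 reports in the example
    servers = ["node1", "node2", "node3", "node4"]   # hypothetical server names
    plan = assign_reports(reports, servers)
    for server, jobs in plan.items():
        print(server, jobs)
    print("rounds per server:", sequential_rounds(plan))
```

With 8 reports on 4 servers, each server runs 2 reports, so the elapsed time approaches one quarter of the single-server time only if storage can keep all servers supplied with data.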
Data Warehousing Challenges
Data warehouses and data marts often start small and simple (with a fact table or
two and a few dimension tables). If successful, these small data warehouses may
grow across an organization over time, transforming into corporate-wide repositories
used for business intelligence and senior management decision support.
One goal behind transforming a large read-write database into a read-only data
warehouse is to offload performance-intensive reporting functions to another server.
Another overarching goal of a data warehouse is to maintain fresh data. Every
second spent on data warehouse preparation means less time spent on reporting—
and progressively less up-to-date data.
From a performance perspective, it does not take long for multiple reporting
operations to overburden servers and direct-attached storage during the off-hours
data warehouse preparation and reporting usage periods.
When faced with a fixed amount of time and fixed amount of bandwidth, the scalable
shared database is a revolutionary breakthrough.
Introducing the Scalable Shared Database
Microsoft SQL Server™ 2005 Enterprise Edition supports scale-out reporting through
Scalable Shared Databases. Scale-out reporting enables multiple SQL Server 2005
instances to attach a read-only version of the database.
The Microsoft KB article describing this feature focuses on use cases for reporting and data warehouse activity.
In summary, the implementation described in the Microsoft KB article recommends
the following configuration guidelines:
• Read Only NTFS Volumes. A Scalable Database must reside in a read-only
volume or set of volumes. Note: this POC validates using Scalable Shared
Databases on the PSFS (PolyServe File System), an NTFS-compatible file
system built using Microsoft’s Installable File System (IFS) Kit.
• Private tempdb. The Scalable Database must be attached to an instance
that has a private tempdb.
• SAN. The database must reside on a read-only volume configured using the
diskpart.exe utility from the database server attached to a Storage Area
Network. This POC obviates the need for this function.
• Windows Version Requirement. Scalable Shared Databases are supported
only on Microsoft Windows Server 2003 Service Pack 1 or later.
The non-shared data approach described in the KB article documents how to
maintain an updated copy of the production database. To use this copy of the
database for scale-out reporting, a new read-only volume must be created and
managed using DISKPART.exe. To do so, a volume and its database copy must be
unmounted and remounted in read-only mode on all the reporting servers.
Conversely, the volume must then be unmounted from all the reporting servers and
re-mounted in read-write mode to refresh its contents (i.e., bring the database copy
up to date with the production database). This cycle must be repeated for each
reporting exercise. To say the least, it’s a complicated process.
Microsoft Corporation has tested this approach scaling out to 8 nodes.
Analysis
The approach covered in the KB article is based on using read-only volumes
containing replicated database copies for reporting. This approach is functional
but, given the operational overhead, likely of limited practical use.
An essential premise behind scalable shared databases is that the production
database is large and growing and cannot be easily serviced by a single server for
reporting or ad hoc query purposes.
There are still challenges with scale out:
• Each database requires a replica or snapshot copy to manage. This
creates storage management overhead: there is more storage
and data to manage, across more logical management points, for both the
storage operator and the DBA.
• Costly processing refreshes. Refreshing the replicated database affects
the production database.
• Challenging to administer free space. There will be space management
tasks for 1 large production database and its large replica.
• Challenging to maintain. Since the database is large, it would likely reside
in several volumes, each of which must be individually managed. Mounting
and unmounting volumes on each reporting server also creates
administration overhead.
In the non-shared data approach, the volumes used for the replicated reporting
database should be dedicated exclusively to reporting, since the volumes will be
routinely unmounted and re-mounted in read-only mode. This will likely increase the
total number of volumes to manage, per cluster.
Another Model for Scale Out: Shared Data
Another deployment option for scaling out databases on SQL Server 2005 is shared
data. Through PolyServe’s Database Utility, all servers have read-write access to all
storage. This simplifies storage management, shortens data warehouse preparation,
and makes scaling out and back for reporting an easy operation.
To fully exploit scale-out databases, the environment should support scaling out to
many nodes without replication, without explicit creation of read-only volumes,
and without mount/unmount operations. The PolyServe solution provides such an
environment.
Storage Management
Storage management is greatly simplified because all servers can “see” all storage.
There are few volumes to manage, and in the utility model storage does not have to
be reconfigured to support scalable shared databases. Once provisioned, databases
can be easily moved from server to server, in support of scale out and scale back
operations.
Concurrent Scalability—for Scale Out
Through a simple script operation, all SQL Server systems may attach the database
as read-only. Reports or ad hoc queries can safely run on a server without affecting
the performance of reports or queries running on other servers.
In effect, reports are isolated to a given server. More servers can be added as
needed, so that more reports run concurrently against the same data.
Proof of Concept Results
As summarized above, SQL Server 2005 enables attaching a database in a read-
only volume to more than one server. With PolyServe, there is no manual volume
creation required. Attaching the read-only database to multiple servers can be
automated through scheduling, or performed manually through a few easy steps.
This vastly simplifies the job for the DBA. Now, instead of having to create, mount
and un-mount multiple volumes to each new server in the cluster, a simple file group
permissions operation is sufficient. Thus, the production database itself can be
scaled out (mounted by multiple servers) in support of reporting operations.
The benefits of Scalable Shared Database in the PolyServe Database Utility for SQL
Server:
• No stale data: work in near-real-time. The database can be kept up-to-date
since neither replication copies nor snapshots are required. Replication
and snapshots are both supported in the PolyServe solution, but they are
needed only when the reporting job takes place on a remote cluster.
• Single-touch scale-out. Since this solution is based on PolyServe’s solution
for Windows Server, the database can be scaled out to as many as 16 servers.
• No need for operator intervention to perform complicated mount/unmount
operations
• Rapid transformation from OLTP to Scalable Shared Database for reporting
operations, and back again
o Transform from OLTP to Scalable Shared Database on 16 servers in
less than 60 seconds
o Scale back to read-write mode in less than 30 seconds
o Reporting can begin within seconds on current data on up to 16
servers
Figure 1: Proof of Concept Main Table with 1,000,000 rows
• Volumes can be shared—and do not have to be dedicated to the scale-out
database
• Tempdb databases for all servers can reside in the same high-performance
volume as the production database
• No need to burden internal drives and other server resources. Since
operations are automated, there is less burden on each server hosting the
scale-out database.
• No “volume sprawl”. Since all pre-defined servers can share volumes in the
SAN, there are fewer volumes to create, manage, and back up.
• Single point of backup. During the reporting window, one or more nodes in
the cluster may be dedicated to backing up the production database while
reporting jobs occur in parallel.
Data Center Use Case
Most data centers dedicate certain hours of operation to reporting, ad hoc queries, and
data warehouse preparation. With PolyServe’s Utility approach to Scalable Shared
Databases, not only is the transformation of the same database from OLTP to
reporting performed in a matter of seconds, but the degree of scalability can be up to
16 servers. With 16 servers working on a set of reports, or performing intensive data
warehouse preparation, the resultant window is reduced significantly, freeing time to
perform important database maintenance operations.
Figure 2: Proof of Concept Main Table File Properties
For example, consider a data center that currently performs, say, 50 reports each
night in a reporting window of 4 hours. With the PolyServe approach to Scalable Shared
Database, the reporting could be reduced to as little as approximately 1/16th the time,
or roughly 15 minutes. With roughly 3 hours and 45 minutes of newfound “free
time”, it is then possible to scale back to OLTP mode to perform maintenance tasks
such as index reorganization or statistics updating.
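The arithmetic behind this example is easy to check. The short Python sketch below assumes the ideal case in which the reporting work divides evenly across servers with no storage contention; the function names are illustrative only.

```python
def reduced_window_minutes(window_hours, n_servers):
    """Reporting window if the work divides evenly across n_servers (ideal case)."""
    return window_hours * 60 / n_servers


def freed_minutes(window_hours, n_servers):
    """Time released for maintenance within the original window."""
    return window_hours * 60 - reduced_window_minutes(window_hours, n_servers)


if __name__ == "__main__":
    print(reduced_window_minutes(4, 16))  # 15.0 minutes
    print(freed_minutes(4, 16))           # 225.0 minutes, i.e. 3 hours 45 minutes
```

Real workloads will land somewhere above this lower bound, since reports rarely divide perfectly evenly and storage bandwidth is shared.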
Configuration Overview
The Proof of Concept consisted of a 4-node PolyServe Matrix Server cluster for
Windows, running SQL Server 2005 Enterprise Edition, attached to a Fibre Channel
SAN. The servers were commodity dual-processor systems configured with 2GB of
physical memory. A PolyServe cluster filesystem was created in a high-performance
cluster volume, created with Matrix Volume Manager, and mounted as drive “S:”.
Database Configuration Overview
The database used for the Scalable Shared Database was called debit. The debit
database had a single table, called card, which held 100,000,000 credit card
transactions (see Figure 1). The properties of the primary file for the debit database
are shown in Figure 2. The card schema is depicted in Figure 3.
The PolyServe cluster filesystem mounted as S: contained directories for
both the scale-out database and all of the tempdb databases required by the
scale-out servers; this is a simple way to deploy Scalable Shared Databases.
Figure 4 shows how easy it is to have many tempdb databases located in a single,
high performance PolyServe cluster filesystem in support of a large scale-out
configuration. In the example, 4 directories were created, each named for one of
the servers (e.g., tmr6s3) used in the Proof of Concept. The total space consumed by
the card table was roughly 4.3GB.
Performance Results
Using the card table described above, a set of queries was constructed to
simulate a reporting workload that tests the Scalable Shared Database feature in the
Database Utility model. The queries were constructed to stress all aspects of query
processing.
Figure 3: Card Table Attributes
These workloads include:
• Physical Disk I/O
o The Query Plans involved full table scans of 100,000,000 rows
o A full scan of the card table required 4.3GB of physical disk reads
• Processor and Memory Utilization
o Data filtering (processing the WHERE predicate)
o Sorting
o Grouping
o Aggregation
The measured test consisted of executing 8 consecutive ad hoc queries based on the
example listed in the box below. As mentioned above, the card table was 4.3GB so the
total reporting workload consisted of 34.4GB of sequential I/O. Randomization of the
queries was achieved by plugging in different values for m and n in the BETWEEN
clause. There were approximately 1,000,000 unique vendors stored in the vendor_id
column, and 26 transaction types stored in the trantype column. On average, randomly
assigning values for the BETWEEN clause rendered a dataset of approximately 13,000,000
rows for the sorting, grouping and aggregation tasks.
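To make the shape of this workload concrete, the following Python sketch runs the same style of query against a tiny synthetic card table in an in-memory SQLite database. It is an illustration only, not the POC harness: SQLite stands in for SQL Server, the column types are assumptions, the 26 transaction types are assumed to be encoded as integers 1 to 26, and the row and vendor counts are scaled far down.

```python
import random
import sqlite3

# Same structure as the report query in this paper, with m and n as bound parameters.
REPORT_QUERY = """
SELECT vendor_id, avg(amt) AS avamt FROM card
WHERE trantype BETWEEN ? AND ?
GROUP BY vendor_id
ORDER BY avamt DESC;
"""


def build_sample_card_table(conn, n_rows=10_000, n_vendors=100, n_trantypes=26, seed=0):
    """Create a small synthetic card table (column types are assumptions)."""
    rng = random.Random(seed)
    conn.execute("CREATE TABLE card (vendor_id INTEGER, trantype INTEGER, amt REAL)")
    rows = [
        (rng.randrange(n_vendors),
         rng.randrange(1, n_trantypes + 1),
         round(rng.uniform(1, 500), 2))
        for _ in range(n_rows)
    ]
    conn.executemany("INSERT INTO card VALUES (?, ?, ?)", rows)


def random_bounds(rng, n_trantypes=26):
    """Pick m <= n for the BETWEEN clause."""
    m, n = sorted(rng.randrange(1, n_trantypes + 1) for _ in range(2))
    return m, n


def run_report(conn, m, n):
    """Run one report; returns (vendor_id, average amount) rows, largest average first."""
    return conn.execute(REPORT_QUERY, (m, n)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_sample_card_table(conn)
    rng = random.Random(1)
    for _ in range(8):  # 8 queries, as in the measured test
        m, n = random_bounds(rng)
        result = run_report(conn, m, n)
        print(f"trantype BETWEEN {m} AND {n}: {len(result)} vendor groups")
```

Each run filters on the WHERE predicate, groups by vendor, aggregates, and sorts, which are the same processing stages the measured queries were designed to stress.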
After the baseline results were collected from 1 server, the PolyServe sql_scale.exe
command was executed to prepare the Scalable Shared Database on 2 servers. The
same 8 queries were then executed 4 per server and complete times measured. The
test was then scaled out to 4 servers where 2 of the 8 reports were executed on
each server and complete times measured.
SELECT vendor_id, avg(amt) avamt FROM card
WHERE trantype BETWEEN m AND n
GROUP BY vendor_id
ORDER BY avamt desc;
Figure 4: Simplicity of using PolyServe Matrix Server for tempdb requirements
Between each server count test, the database was transformed from scale-out mode
to OLTP mode and a few more rows were inserted in order to validate the
transformation from Scalable Shared Database mode to OLTP mode. The database
was then transformed once again to a scale-out database on the next number of
servers to be tested (e.g., scaling from 2 to 4 servers). At most 58 seconds
transpired between measured executions of the query set. That is, no more than 58
seconds was required for the following tasks between measured runs of the
benchmark:
1. Scaling back the database from scale-out read-only mode to OLTP read/write
mode
2. Executing a small number of insertions into the card table
3. Scaling out the database to the next number of nodes (e.g., from 2 to 4)
The task of scaling out the database includes the startup time for all the instances.
Incidentally, with PolyServe’s solution, the instances can be started in parallel on all
servers so the preparation time of the Scalable Shared Database was standardized.
Measured Results
The baseline job complete time
for the 8 queries executed on 1
server was 48 minutes. After
scaling out with the
sql_scale.exe tool and running
4 queries on each of 2 servers,
the job complete time was
reduced to 24 minutes – linear
scalability. Finally, the
sql_scale.exe tool was used to
scale-out to 4 servers. The
same 8 queries were once
again executed, 2 per server.
The job complete time was 14
minutes as depicted in Figure 6.
All told, the scalability from 1 to
4 nodes was 86%.
Figure 6: Scalability of the Scalable Shared Database on PolyServe Matrix Server
(chart of Reporting Workload Complete Times, in minutes, against Number of Servers: 1, 2, 4)
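The scalability figures follow directly from the measured complete times. As a quick check, the Python sketch below computes speedup and scaling efficiency (speedup divided by server count) from the 48, 24, and 14 minute results reported above; the function names are illustrative.

```python
def speedup(baseline_minutes, scaled_minutes):
    """How many times faster the scaled-out run finished."""
    return baseline_minutes / scaled_minutes


def scaling_efficiency(baseline_minutes, scaled_minutes, n_servers):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup(baseline_minutes, scaled_minutes) / n_servers


if __name__ == "__main__":
    measured = {1: 48, 2: 24, 4: 14}  # servers -> minutes, from the measured results
    for n, minutes in measured.items():
        s = speedup(measured[1], minutes)
        e = scaling_efficiency(measured[1], minutes, n)
        print(f"{n} server(s): {minutes} min, speedup {s:.2f}x, efficiency {e:.0%}")
```

At 4 servers the speedup is 48/14, about 3.43x, which is roughly 86% of the ideal 4x. Perfectly linear scaling would have completed the job in 12 minutes.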
Linear Scalability Requires a Balanced Hardware Configuration
Given the architecture of the Scalable Shared Database, the only component that
can affect scalability is storage bandwidth, as documented in this Proof of Concept at
4 servers. The storage allocated to the small Matrix Server test cluster from the SAN
was sufficient to sustain increased I/O demand from 1 to 2 servers, but I/O latency
increased from 2 to 4 servers. Thus, scalability was slightly affected. Adding disk
capacity in the PolyServe Database Utility for SQL Server is a non-intrusive
administrative action so this sort of bottleneck is simple to remedy.
Conclusions
Given ample SAN resources, PolyServe’s implementation of the Scalable Shared
Database delivers linear scalability for SQL Server 2005.
To achieve linear scalability, an appropriate amount of both system and storage
bandwidth must be available to the reporting operations. An individual array,
depending on the vendor, may provide sufficient bandwidth for some applications.
For others, more storage bandwidth may be required.
The Scalable Shared Database option, combined with PolyServe Matrix Server for
Windows Server, included with the Database Utility for SQL Server, provides an easy
way to achieve linear system scalability.
To achieve cost-effective storage bandwidth scalability through software—to scale-
out storage—PolyServe’s Cluster Volume Manager (CVM) may be utilized.
The Database Utility Overview
Built on SQL Server and Windows Server, PolyServe’s Database Utility for SQL
Server is deeply integrated with both Windows Server and SQL Server. One of the
core products provided with the Database Utility for SQL Server is Matrix Server for
Windows Server.
Matrix Server provides the underlying technology enabling this form of scale-out
reporting. This technology provides the building blocks for shared data. Shared data
means all servers can safely share storage and data on the SAN. PolyServe’s
NTFS-compatible Cluster File System, included with Matrix Server, and designed
using Microsoft’s Installable File System Kit, supports concurrent, direct, read/write
activity through a cache-coherent cluster file system (PSFS). This enables a simple,
one-touch operation for mounting the shared database across multiple servers in the
cluster simultaneously.
Shared data also means all the databases are stored in a single place that can be
accessed by all of the servers simultaneously. This means moving any SQL
database (read-only or read-write) from one server to another can be performed
rapidly, and with minimal storage operator or DBA intervention.
In short, shared data brings the following fundamental benefits:
• A single pool of servers: In this approach, you no longer think of installing a
database on any particular server. Instead, you install into the cluster, and the
database can then run on any server in the cluster. This allows an
administrator to move a database from one server to another in order to
rebalance load and maintain an appropriate level of utilization—without any
need to copy or migrate data. It also means if any server fails, the databases
it had been running can immediately and automatically be restarted on other
machines, ensuring high availability.
• A single pool of storage: Similarly, there is no need to manage storage on a
server-by-server basis and no requirement to have a backup job per server or
separate monitoring for each server’s free space pool. In addition, because
the servers are connected to storage over a SAN, it is easy to provision more
storage when required—but that is only done when the environment as a
whole needs more space.
• Flexibility: A shared data cluster can be formed from servers you already
own. There is no need to buy matched servers; you can mix servers from
different vendors, with different processor types and speeds, different
numbers of processors and different memory configurations. You can even
mix Windows 2000 and Windows Server 2003 in the same cluster, and use
the shared storage pool to migrate databases to Windows Server 2003
without having to copy data.
• Easy scalability: It is easy to add servers to a cluster if overall demand
grows, and databases can be shifted in a matter of seconds onto newly
added servers with no need for data copying. (Conversely, if workloads drop,
you can shift databases off some of the servers and remove them for
repurposing.) Databases receive full native performance with no virtualization
overhead or virtualization limits, so if a database requires the full speed and
capacity of a large server, it can have it. Once a cluster is created, adding
Scalable Shared Databases becomes as easy as right-clicking and rehosting
a given read-only database to any other server in the cluster.
• Easy high availability: Because all servers have access to all databases,
high availability becomes easy to implement with no requirement for doubling
up hardware. Simply by specifying where a database should be restarted in
the event of failure—from among any of the servers in the cluster—you can
ensure the database will remain available. The shared storage pool requires
none of the complicated and brittle configuration on a server-pair-by-server-
pair basis that traditional failover clustering can entail.
The following section describes the SQL Server-specific components included with
PolyServe Database Utility for SQL Server.
The Database Utility™ for SQL Server Components
The final necessary ingredient to both Scalable Shared Databases and simplified
management of SQL Server is SQL Server integration. The Database Utility provides
an integration layer that applies the core capabilities of Matrix Server and the Cluster
Volume Manager to SQL Server.
This layer, called the Database Utility for SQL Server, includes:
• A SQL Server Health Monitor that periodically probes SQL Server instances
within the cluster to ensure that client requests are being successfully
handled. This will detect if a SQL Server instance hangs, even if the
operating system and hardware remain healthy.
• A SQL Instance Virtualizer that allows the creation of a “Virtual SQL
Server.” A virtual SQL Server is an adaptation of the Matrix Server virtual
host concept. It consists of a virtual IP address that clients use to connect, a
specified primary server in the cluster, a prioritized list of backup servers, and
from one to 16 associated SQL Server instances. If a server fails, the
Virtualizer will move the virtual IP address to the top server in the list that is
capable of hosting the virtual SQL server; it then restarts the database
instances in the new location. Note that, unlike server virtualization, the SQL
instance Virtualizer component itself is not active during normal operation.
Thus there is no performance penalty; SQL runs at full native speed, with
native access to all hardware resources. The Virtualizer also ensures that a
given database is only ever accessed from one server at a time, since SQL
Server is not designed to allow multiple server concurrent access.
• A SQL Server Registry Replicator. SQL Server stores some configuration
information in the Windows registry, which is located on each server’s
individual C: drive. To ensure this information is available to other servers in
the cluster if a database instance needs to move, the Solution Pack includes
a component that automatically replicates relevant registry information into
the cluster-wide shared storage.
• A SQL Installation and hotfix Updater agent. To simplify installation of
SQL Server and of hotfixes across multiple servers, the Solution Pack
includes a push installer agent. The administrator simply places the
appropriate installation packages in the shared file system, then uses the
push installer agent to perform installations or hotfix updates across the entire
cluster—what PolyServe calls one-click maintenance.
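The failover rule described for the SQL Instance Virtualizer (move to the highest-priority server in the list that is capable of hosting the virtual SQL Server) can be sketched as follows. This is a hypothetical illustration in Python, not PolyServe code: the function name, the server names, and the representation of "capable" as a set of healthy servers are all assumptions.

```python
def choose_failover_target(primary, backups, healthy):
    """Return the first server, in priority order, able to host the virtual SQL Server.

    primary: name of the preferred server
    backups: backup server names, highest priority first
    healthy: set of servers currently able to host the instance(s)
    Returns None if no candidate is available.
    """
    for server in [primary, *backups]:
        if server in healthy:
            return server
    return None


if __name__ == "__main__":
    # Hypothetical cluster: node1 has failed, node2 and node3 remain healthy.
    target = choose_failover_target("node1", ["node2", "node3"], {"node2", "node3"})
    print(target)  # node2
```

In the product, the step after choosing a target is to move the virtual IP address there and restart the associated SQL Server instances; the sketch covers only the selection.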
The Database Utility for SQL Server thus provides an easy way to move SQL Server
instances within a cluster to improve resource utilization, an easy path to complete
high availability for all instances in the cluster, and a simple way of performing typical
SQL Server installation and maintenance tasks, all leveraging the core capabilities of
Matrix Server and Volume Manager. The shared data pool provided by Matrix Server
is used to store SQL databases and log files in a single location, accessible to all
servers. The Matrix Volume Manager allows this storage pool to be spread across
space on multiple storage arrays. Matrix Server’s high-availability and application-
control engine ensures that databases remain available regardless of server failures
and, through Dynamic Re-Hosting, allows administrators to adjust work assignments
to maximize utilization. Matrix Manager provides a single point for monitoring and
controlling these elements.
PolyServe’s Cluster Volume Manager
PolyServe’s Windows-based Cluster Volume Manager enables multiple storage
arrays (or storage devices within an array) to be grouped together into scalable,
high-speed, storage pools—available to all SQL Server systems in the cluster.
Matrix Server provides the foundation for the Database Utility for SQL Server by
allowing a set of servers to access shared file systems simultaneously and thus be
managed as a single unit. The Cluster Volume Manager provides an analogous
capability for storage. The CVM allows disk space from multiple storage arrays to be
used and managed as a single pool.
• With the CVM, an administrator can create a single volume from free space
on multiple arrays (or in multiple LUNs on a single array). In this way a file
system can make use of whatever storage is available.
• The CVM also allows a volume to be striped across multiple arrays. This can
improve I/O rates by aggregating the performance of the arrays, which is
especially useful for sequential workloads frequently associated with data
warehouses.
• Through concatenation or striping, CVM permits the construction of huge file
systems that exceed the typical 2TB limit on the size of individual LUNs.
• In some environments, it is customary to configure every storage array with a
set of LUNs of a fixed size—say, 30 gigabytes. The CVM allows a server
administrator to construct file systems of whatever sizes are desired using
these fixed LUNs.
• Finally, if a file system is becoming too full, the CVM can be used to expand
the file system, without taking the cluster down, using free space from any
array accessible to the cluster.
If storage ever becomes a bottleneck, as it did in the POC, the Volume Manager
enables multiple storage devices (controllers, cabinets, or arrays) to be aggregated
into high-performance shared storage pools. Once these pools are created, the
Volume Manager can stripe a volume across the LUNs in a pool, effectively
aggregating the bandwidth of the underlying devices. Thus, if a single shared pool
were composed of two LUNs, each located on a separate array, the logical shared
volume could span both arrays, aggregating the performance of each array (across
spindles, controllers, and paths between the cluster and the storage).
Like all other aspects of the Database Utility, the CVM is managed cluster-wide from
a single control point. Thus, just as Matrix Server allows servers’ data sets to be
handled in a unified way, the Volume Manager allows the physical storage resources
associated with a cluster to be used and managed as a single entity.
Summary
Microsoft SQL Server™ 2005 Enterprise Edition now supports scale-out reporting
through Scalable Shared Databases. Scale-out reporting enables multiple SQL
Server 2005 systems to attach a read-only copy of the same database.
When deployed using PolyServe’s Database Utility™ for SQL Server, enterprises
may achieve linear scalability of SQL Server for reporting, reduce report completion
times by up to 16x, and eliminate manual storage configuration processes.
This proof of concept (POC) demonstrates PolyServe’s solution for the Scalable
Shared Database. The POC consists of a 4-node PolyServe Matrix Server cluster
running PolyServe’s Database Utility for SQL Server and Microsoft SQL Server 2005
Enterprise Edition, connected to a SAN.
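The scalability figures quoted in this paper can be reproduced with a short calculation. The sketch below uses the job completion times reported in the POC's measured results (48 minutes on 1 server, 24 minutes on 2, 14 minutes on 4) and the 4-hour reporting window from the data center use case; the function names are illustrative only. Note that the 16x figure is the best case under linear scaling to 16 servers, whereas the POC itself measured up to 4 servers.

```python
def speedup(baseline_minutes: float, scaled_minutes: float) -> float:
    """How many times faster the scaled-out run is than the 1-server baseline."""
    return baseline_minutes / scaled_minutes


def efficiency(baseline_minutes: float, scaled_minutes: float, servers: int) -> float:
    """Fraction of ideal (linear) speedup achieved: 1.0 means perfectly linear."""
    return speedup(baseline_minutes, scaled_minutes) / servers


if __name__ == "__main__":
    # Job completion times from the POC, keyed by server count.
    poc_minutes = {1: 48, 2: 24, 4: 14}
    baseline = poc_minutes[1]
    for servers, minutes in poc_minutes.items():
        s = speedup(baseline, minutes)
        e = efficiency(baseline, minutes, servers)
        print(f"{servers} server(s): {minutes} min, speedup {s:.2f}x, efficiency {e:.0%}")

    # Best case for the use case: a 4-hour window under linear scaling to 16 servers.
    window_minutes = 4 * 60
    print(f"ideal 16-server window: {window_minutes / 16:.0f} min")
```

The 2-server run is exactly linear, and the 4-server run works out to roughly 86% of linear, matching the figure reported with the measured results.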
  • 1. PAGE 1 of 17 (877)-476-5973 www.polyserve.com White Paper Scalable Shared Databases for SQL Server 2005 Achieving Linear Scalability for Scale-out Reporting using SQL Server 2005 Enterprise Edition Abstract: Microsoft SQL Server™ 2005 Enterprise Edition supports scale-out reporting through scalable shared databases. Scale-out reporting enables multiple SQL Server 2005 systems to attach a read-only copy of the same database. When deployed using PolyServe’s Database Utility™ for SQL Server, Enterprises reduce report completion times by up to 16x. The solution reduces storage complexity, simplifying SQL Server scale-out for complex, off-hours reporting workloads. The PolyServe solution enables rapid transformation of OLTP to read- only data warehousing for scale-out and back again to OLTP—in seconds. This proof of concept (POC) demonstrates PolyServe’s solution for scalable shared databases. The POC consists of a 4-node PolyServe Matrix Server cluster running PolyServe’s Database Utility for SQL Server and Microsoft SQL Server 2005 Enterprise Edition, connected to a SAN.
  • 2. WhWr PAGE 1 of 17 (877)-476-5973 www.polyserve.com White Paper Scalable Shared Databases for SQL Server 2005.....................................................1 Achieving Linear Scalability for Scale-out Reporting using SQL Server 2005 Enterprise Edition ....... 1 Introduction ...............................................................................................................2 Single System Performance Limits ..................................................................................................... 2 Data Warehousing Challenges ........................................................................................................... 3 Introducing the Scalable Shared Database................................................................3 Analysis............................................................................................................................................... 4 Another Model for Scale Out: Shared Data................................................................5 Storage Management.......................................................................................................................... 5 Concurrent Scalability—for Scale Out................................................................................................. 6 Proof of Concept Results...........................................................................................6 Data Center Use Case........................................................................................................................ 7 Configuration Overview....................................................................................................................... 8 Database Configuration Overview ...................................................................................................... 
8 Performance Results........................................................................................................................... 8 Conclusions.............................................................................................................11 The Database Utility Overview ................................................................................12 The Database Utility™ for SQL Server Components ...............................................13 PolyServe’s Cluster Volume Manager .....................................................................14 Summary.................................................................................................................15
  • 3. WhWr PAGE 2 of 17 (877)-476-5973 www.polyserve.com White Paper Introduction With rapidly growing production databases deployed on Microsoft SQL Server 2005, the need for timely and scalable reporting operations has become a business-critical need among Enterprises. Today, the size of databases deployed on individual SQL Server systems is often measured in the hundreds of Gigabytes to Terabytes. For larger databases, data warehouse preparation is challenging and time-consuming. Further, companies face time-constrained reporting windows where there is effectively less time and (as the database grows) less available computing power to complete resource-intensive reporting jobs within an off-hours reporting window. Single System Performance Limits When the power of SQL Server 2005 is combined with modern, industry-standard x64 servers and Storage Area Networks (SANs), Enterprises are provided with a robust platform for deploying mission-critical databases, at an optimal price- performance. For single system scalability, SQL Server 2005 possesses numerous performance and memory management advancements to exploit resource utilization within a single system. For Online Transaction Processing (OLTP)—where queries tend to be shorter and less resource-intensive—a single server often provides adequate bandwidth. However, more complex workloads—such as reporting, ad hoc queries, and data warehouse preparation—often require more throughput than a single server can provide. Scanning tables, sorting large amounts of data, and running multiple reporting jobs concurrently—all against the same database—these are resource- intensive tasks that can easily overburden a single server system. For these workloads, the server is often the bottleneck. Consider a business requirement to execute 8 reports against a 300GB table used primarily for OLTP. The indexes have been optimized for OLTP. The query plans generated for the reports are based upon full table scans. 
If such reporting saturates a single server, the only way to complete the 8 reporting jobs in less time is to scale out to several servers, running multiple reports concurrently across multiple servers.
  • 4. WhWr PAGE 3 of 17 (877)-476-5973 www.polyserve.com White Paper Data Warehousing Challenges Data warehouses and data marts often start small and simple (with a fact table or two and a few dimension tables). If successful, these small data warehouses may grow across an organization over time, transforming into corporate-wide repositories used for business intelligence and senior management decision support. One goal behind transforming a large read-write database to a read-only data warehouses is to offload performance-intensive reporting functions to another server. Another overarching goal of a data warehouse is to maintain fresh data. Every second spent on data warehouse preparation means less time spent on reporting— and progressively less up-to-date data. From a performance perspective, it does not take long for multiple reporting operations to overburden servers and direct-attached storage during the off-hours data warehouse preparation and reporting usage periods. When faced with a fixed amount of time and fixed amount of bandwidth, the scalable shared database is a revolutionary breakthrough. Introducing the Scalable Shared Database Microsoft SQL Server™ 2005 Enterprise Edition supports scale-out reporting through Scalable Shared Databases. Scale-out reporting enables multiple SQL Server 2005 instances to attach a read-only version of the database. The KB article focuses on use-cases for reporting and data warehouse activity. In summary, the implementation described in the Microsoft KB article recommends the following configuration guidelines: • Read Only NTFS Volumes. A Scalable Database must reside in a read-only volume or set of volumes. Note: this POC validates using Scalable Shared Databases on the PSFS (PolyServe File System), an NTFS-compatible file system built using Microsoft’s Installable File System (IFS) Kit. • Private tempdb. The Scalable Database must be attached to an instance that has private tempdb. • SAN. 
The database must reside on a read-only volume configured using the diskpart.exe utility from the database server attached to a Storage Area Network. This POC obviates the need for this function. • Windows Version Requirement. Scalable Shared Databases are supported only on Microsoft Windows Server 2003 Service Pack 1 or later.
  • 5. WhWr PAGE 4 of 17 (877)-476-5973 www.polyserve.com White Paper The non-shared data approach described in the KB article documents how to maintain an updated copy of the production database. To use this copy of the database for scale-out reporting, a new read-only volume must be created and managed using DISKPART.exe. To do so, a volume and its database copy must be unmounted and remounted in read-only mode on all the reporting servers. Conversely, the volume must then be unmounted from all the reporting servers and re-mounted in read-write mode to refresh its contents (i.e., bring the database copy up to date with the production database). This cycle must be repeated for each reporting exercise. To say the least, it’s a complicated process. This approach has been tested with scaling out to 8 nodes by Microsoft Corporation. Analysis The approach covered in the KB article is based on using read-only volumes containing replicated database copies for reporting. This is a functional approach. But, given the operational overhead, likely not entirely useful. An essential premise behind scalable shared databases is that the production database is large and growing and cannot be easily serviced by a single server for reporting or ad hoc query purposes. There are still challenges with scale out: • Each database requires a replicate or snapshot copy to manage. This creates storage management overhead. There is effectively more storage and data to manage, across more logical management points—for both the storage operator and DBA. • Costly processing refreshes. Refreshing the replicated database affects the production database. • Challenging to administer free space. There will be space management tasks for 1 large production database and its large replica. • Challenging to maintain. Since the database is large, it would likely reside in several volumes, each of which must be individually managed. 
Mounting and unmounting volumes on each reporting server also creates administration overhead.
  • 6. WhWr PAGE 5 of 17 (877)-476-5973 www.polyserve.com White Paper In the non-shared data approach, the volumes used for the replicated reporting database should be dedicated exclusively to reporting, since the volumes will be routinely unmounted and re-mounted in read-only mode. This will likely increase the total number of volumes to manage, per cluster. Another Model for Scale Out: Shared Data Another deployment option for scaling out databases on SQL Server 2005 is shared data. Through PolyServe’s Database Utility, all servers have read-write access to all storage. This makes storage management, data warehouse preparation time, and scale out and back for reporting, an easy operation. To fully exploit the scale-out databases, an environment that supports scaling out to many nodes without replication and without explicit volume creation (as read-only), or mount/unmount operations, may be supported using the PolyServe solution. Storage Management Storage management is greatly simplified because all servers can “see” all storage. There are few volumes to manage, and storage does not have to be reconfigured. In the utility model, storage does not have to be reconfigured to be used in support of
  • 7. WhWr PAGE 6 of 17 (877)-476-5973 www.polyserve.com White Paper scalable shared databases. Once provisioned, databases can be easily moved from server to server, in support of scale out and scale back operations. Concurrent Scalability—for Scale Out Through a simple script operation, all SQL Server systems may attach the database as read-only. Reports or ad hoc queries can safely run on a server without affecting the performance of reports or queries running on other servers. In effect, reports are isolated to a given server. More servers can be added as needed, running concurrently across more servers—against the same data. Proof of Concept Results As summarized above, SQL Server 2005 enables attaching a database in a read- only volume to more than one server. With PolyServe, there is no manual volume creation required. Attaching the read-only database to multiple servers can be automated through scheduling, or performed manually through a few easy steps. This vastly simplifies the job for the DBA. Now, instead of having to create, mount and un-mount multiple volumes to each new server in the cluster, a simple file group permissions operation is sufficient. Thus, the production database itself can be scaled out (mounted by multiple servers) in support of reporting operations. The benefits of Scalable Shared Database in the PolyServe Database Utility for SQL Server: • No stale data—work in near-real-time. The database can be kept up-to-date since no copying via replication nor are snapshots required. While replication and snapshots are both supported in the PolyServe solution, there is no need for replication, or snapshots—unless the reporting job takes place on a remote cluster. • Single-touch scale-out. Since this solution is based on PolyServe’s solution for Windows Server, the degree of scale-out is 16 servers. 
• No need for operator intervention to perform complicated mount/unmount operations • Rapid transformation from OLTP to Scalable Shared Database for reporting operations, and back again o Transform from OLTP to Scalable Shared Database on 16 servers in less than 60 seconds o Scale back to read-write mode in less then 30 seconds
  • 8. WhWr PAGE 7 of 17 (877)-476-5973 www.polyserve.com White Paper Figure 1: Proof of Concept Main Table with 1,000,000 rows o Reporting can begin within seconds on current data on up to 16 servers • Volumes can be shared—and do not have to be dedicated to the scale-out database • Tempdb databases for all servers can reside in the same high-performance volume as the production database • No need to burden internal drives and other server resources. Since operations are automated, there is less burden on each server hosting the scale-out database. • No “volume sprawl”. Since all pre-defined servers can share volumes in the SAN, there are fewer volumes to create, manage, and backup. • Single point of backup. During the reporting window, one or more nodes in the cluster may be dedicated to backing up the production database while reporting jobs occur in parallel. Data Center Use Case Most datacenters dedicate certain hours of operation to reporting, ad hoc queries, and data warehouse preparation. With PolyServe’s Utility approach to Scalable Shared Databases, not only is the transformation of the same database from OLTP to reporting performed in a matter of seconds, but the degree of scalability can be up to 16 servers. With 16 servers working on a set of reports, or performing intensive data warehouse preparation, the resultant window is reduced significantly freeing time to perform important database maintenance operations. For example, consider a data center that currently performs, say, 50 reports each night in a reporting window of 4 hours. With the PolyServe approach to Scalable Shared
  • 9. WhWr PAGE 8 of 17 (877)-476-5973 www.polyserve.com White Paper Figure 2: Proof of Concept Main Table File Properties Database, the reporting could be reduced to as little as approximately 1/16th the time, or roughly 15 minutes. With roughly 3 hours and 45 minutes of new found “free time”, it is then possible to scale back to OLTP mode to perform maintenance tasks such as index reorganization or statistics updating. Configuration Overview The Proof of Concept consisted of a 4-node PolyServe Matrix Server cluster for Windows, running SQL Server 2005 Enterprise Edition, attached to a Fiber Channel SAN. The servers were commodity dual-processor systems configured with 2GB physical memory. A PolyServe cluster filesystem was created in a high-performance cluster volume, created with Matrix Volume Manager, and mounted as drive “S:” Database Configuration Overview The database used for the Scalable Shared Database was called debit. The debit database had a single table, called card, which had 100,000,000 credit card transactions (see Figure 1). The properties of the primary file for the debit table is shown to the right, in Figure 2. The card schema is depicted in Figure 3, on the following page. The PolyServe cluster filesystem mounted as S: contained directories for both the scale-out database and all of the tempdb databases required by the scale-out servers; this is a simple way to deploy Scalable Shared Databases. Figure 4 shows, simply, how easy it is to have many tempdb databases located in a single, high performance PolyServe cluster filesystem in support of a large scale-out configuration. In the example, 4 directories were created named for each of the servers (e.g., tmr6s3) used in the Proof of Concept. The total space consumed by the card table was roughly 4.3GB. 
Performance Results
Using the card table described above, a set of queries was constructed to simulate a reporting workload that tests the Scalable Shared Database feature in the Database Utility model. The queries were constructed to stress all aspects of query processing.
These workloads include:
• Physical Disk I/O
o The query plans involved full table scans of 100,000,000 rows
o A full scan of the card table required 4.3GB of physical disk reads
• Processor and Memory Utilization
o Data filtering (processing the WHERE predicate)
o Sorting
o Grouping
o Aggregation

Figure 3: Card Table Attributes

The measured test consisted of executing 8 consecutive ad hoc queries based on the following example:

SELECT vendor_id, avg(amt) avamt FROM card WHERE trantype BETWEEN m AND n GROUP BY vendor_id ORDER BY avamt desc;

As mentioned above, the card table was 4.3GB, so the total reporting workload consisted of 34.4GB of sequential I/O (8 full scans of 4.3GB each). Randomization of the queries was achieved by plugging in different values for m and n in the BETWEEN clause. There were approximately 1,000,000 unique vendors stored in the vendor_id column, and 26 transaction types stored in the trantype column. On average, randomly assigning values for the BETWEEN clause produced a dataset of approximately 13,000,000 rows for the sorting, grouping, and aggregation tasks.

After the baseline results were collected from 1 server, the PolyServe sql_scale.exe command was executed to prepare the Scalable Shared Database on 2 servers. The same 8 queries were then executed, 4 per server, and completion times were measured. The test was then scaled out to 4 servers, where 2 of the 8 reports were executed on each server and completion times were measured. Between each server count test, the database was transformed from scale-out mode to OLTP mode and a few more rows were inserted in order to validate the
transformation from Scalable Shared Database mode to OLTP mode. The database was then transformed once again to a scale-out database on the next number of servers to be tested (e.g., scaling from 2 to 4 servers).

Figure 4: Simplicity of using PolyServe Matrix Server for tempdb requirements

At most 58 seconds elapsed between measured executions of the query set. That is, no more than 58 seconds was required for the following tasks between measured runs of the benchmark:
1. Scaling back the database from scale-out read-only mode to OLTP read/write mode
2. Executing a small number of insertions into the card table
3. Scaling out the database to the next number of nodes (e.g., from 2 to 4)
The task of scaling out the database includes the startup time for all the instances. Incidentally, with PolyServe’s solution, the instances can be started in parallel on all servers so the preparation time of the Scalable Shared Database was standardized.

Measured Results
The baseline job completion time for the 8 queries executed on 1 server was 48 minutes. After scaling out with the sql_scale.exe tool and running 4 queries on each of 2 servers, the job completion time was reduced to 24 minutes, which is linear scalability. Finally, the sql_scale.exe tool was used to scale out to 4 servers. The same 8 queries were once again executed, 2 per server. The job completion time was 14 minutes, as depicted in Figure 6. All told, the scalability from 1 to 4 nodes was 86%: a speedup of roughly 3.4x (48 minutes / 14 minutes) against an ideal of 4x.
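The report query and the way the 8 queries were spread across servers can be tried out at small scale. The following is a minimal sketch, not the Proof of Concept harness: it uses Python's built-in sqlite3 module in place of SQL Server, a tiny synthetic card table, and assumed column types (the real schema is in Figure 3 and is not reproduced here). The server names, the integer encoding of trantype, and the helper names build_card_table, run_report, and assign_queries are all hypothetical.

```python
import random
import sqlite3

# The report query from the paper, with m and n as bound parameters.
REPORT_SQL = """
SELECT vendor_id, avg(amt) AS avamt
FROM card
WHERE trantype BETWEEN ? AND ?
GROUP BY vendor_id
ORDER BY avamt DESC
"""


def build_card_table(conn, rows=10_000, vendors=100, trantypes=26, seed=1):
    """Create a small synthetic card table (column types are assumptions)."""
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE card (vendor_id INTEGER, trantype INTEGER, amt REAL)"
    )
    data = [
        (rng.randrange(vendors), rng.randint(1, trantypes),
         round(rng.uniform(1.0, 500.0), 2))
        for _ in range(rows)
    ]
    conn.executemany("INSERT INTO card VALUES (?, ?, ?)", data)


def run_report(conn, m, n):
    """Run one report for transaction types m through n, inclusive."""
    return conn.execute(REPORT_SQL, (m, n)).fetchall()


def assign_queries(params, servers):
    """Deal the parameter pairs out to the servers round-robin."""
    return {s: params[i::len(servers)] for i, s in enumerate(servers)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_card_table(conn)
    rng = random.Random(2)
    # 8 randomized (m, n) pairs, as in the measured test.
    params = [tuple(sorted(rng.sample(range(1, 27), 2))) for _ in range(8)]
    plan = assign_queries(params, ["server1", "server2", "server3", "server4"])
    for server, pairs in plan.items():
        for m, n in pairs:
            # Show only the top vendor by average amount for each report.
            print(server, (m, n), run_report(conn, m, n)[:1])
```

With 4 server names, assign_queries hands each server 2 of the 8 parameter pairs, mirroring the 2-per-server split of the 4-server run; with 2 names it yields the 4-per-server split.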
Figure 6: Scalability of the Scalable Shared Database on PolyServe Matrix Server (chart of Reporting Workload Complete Times; x-axis: Number of Servers, 1, 2, and 4; y-axis: Minutes, 0 to 60)

Linear Scalability Requires a Balanced Hardware Configuration
Given the architecture of the Scalable Shared Database, the only component that can affect scalability is storage bandwidth, as documented in this Proof of Concept at 4 servers. The storage allocated to the small Matrix Server test cluster from the SAN was sufficient to sustain increased I/O demand from 1 to 2 servers, but I/O latency increased from 2 to 4 servers. Thus, scalability was slightly affected. Adding disk capacity in the PolyServe Database Utility for SQL Server is a non-intrusive administrative action, so this sort of bottleneck is simple to remedy.

Conclusions
Given ample SAN resources, PolyServe’s implementation of the Scalable Shared Database delivers linear scalability for SQL Server 2005. To achieve linear scalability, an appropriate amount of both system and storage bandwidth must be available to the reporting operations. An individual array, depending on the vendor, may provide sufficient bandwidth for some applications. For others, more storage bandwidth may be required. The Scalable Shared Database option, combined with PolyServe Matrix Server for Windows Server, included with the Database Utility for SQL Server, provides an easy way to achieve linear system scalability. To scale out storage bandwidth cost-effectively through software, PolyServe’s Cluster Volume Manager (CVM) may be utilized.
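The completion times reported under Measured Results can be turned into speedup, scaling efficiency, and an average aggregate read rate, which is one way to see where storage starts to matter. The following sketch uses only figures stated in the paper (48, 24, and 14 minutes; 8 scans of 4.3GB). The function names are illustrative, the read rate assumes 1GB = 1000MB, and it further assumes that every scan is served from disk rather than from cache.

```python
# Completion times in minutes from Measured Results, keyed by server count.
MEASURED_MINUTES = {1: 48, 2: 24, 4: 14}

# Total workload: 8 full scans of the 4.3GB card table.
TOTAL_IO_GB = 8 * 4.3


def speedup(baseline_minutes, minutes):
    """How many times faster a run is than the 1-server baseline."""
    return baseline_minutes / minutes


def efficiency(baseline_minutes, minutes, servers):
    """Speedup as a fraction of the ideal (linear) speedup."""
    return speedup(baseline_minutes, minutes) / servers


def avg_read_rate_mb_s(total_gb, minutes):
    """Average aggregate read rate, assuming 1GB = 1000MB and no caching."""
    return total_gb * 1000 / (minutes * 60)


if __name__ == "__main__":
    base = MEASURED_MINUTES[1]
    for servers, minutes in sorted(MEASURED_MINUTES.items()):
        print(
            f"{servers} server(s): "
            f"speedup {speedup(base, minutes):.2f}x, "
            f"efficiency {efficiency(base, minutes, servers):.0%}, "
            f"avg read rate {avg_read_rate_mb_s(TOTAL_IO_GB, minutes):.1f} MB/s"
        )
```

For the 4-server run this reproduces the 86% figure quoted above; the 2-server run comes out at exactly 100%, matching the paper's description of it as linear.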
The Database Utility Overview
Built on SQL Server and Windows Server, PolyServe’s Database Utility for SQL Server is deeply integrated with both Windows Server and SQL Server. One of the core products provided with the Database Utility for SQL Server is Matrix Server for Windows Server. Matrix Server provides the underlying technology enabling this form of scale-out reporting. This technology provides the building blocks for shared data.

Shared data means all servers can safely share storage and data on the SAN. PolyServe’s NTFS-compatible Cluster File System, included with Matrix Server, and designed using Microsoft’s Installable File System Kit, supports concurrent, direct, read/write activity through a cache-coherent cluster file system (PSFS). This enables a simple, one-touch operation for mounting the shared database across multiple servers in the cluster simultaneously. Shared data also means all the databases are stored in a single place that can be accessed by all of the servers simultaneously. This means moving any SQL database (read-only or read-write) from one server to another can be performed rapidly, and with minimal storage operator or DBA intervention.

In short, shared data brings the following fundamental benefits:
• A single pool of servers: In this approach, you no longer think of installing a database on any particular server. Instead, you install into the cluster, and the database can then run on any server in the cluster. This allows an administrator to move a database from one server to another in order to rebalance load and maintain an appropriate level of utilization, without any need to copy or migrate data. It also means if any server fails, the databases it had been running can immediately and automatically be restarted on other machines, ensuring high availability.
• A single pool of storage: Similarly, there is no need to manage storage on a server-by-server basis and no requirement to have a backup job per server or separate monitoring for each server’s free space pool. In addition, because the servers are connected to storage over a SAN, it is easy to provision more storage when required, but that is only done when the environment as a whole needs more space.
• Flexibility: A shared data cluster can be formed from servers you already own. There is no need to buy matched servers; you can mix servers from different vendors, with different processor types and speeds, different numbers of processors and different memory configurations. You can even mix Windows 2000 and Windows Server 2003 in the same cluster, and use
the shared storage pool to migrate databases to Windows Server 2003 without having to copy data.
• Easy scalability: It is easy to add servers to a cluster if overall demand grows, and databases can be shifted in a matter of seconds onto newly added servers with no need for data copying. (Conversely, if workloads drop, you can shift databases off some of the servers and remove them for repurposing.) Databases receive full native performance with no virtualization overhead or virtualization limits, so if a database requires the full speed and capacity of a large server, it can have it. Once a cluster is created, adding Scalable Shared Databases becomes as easy as right-clicking and rehosting a given read-only database to any other server in the cluster.
• Easy high availability: Because all servers have access to all databases, high availability becomes easy to implement with no requirement for doubling up hardware. Simply by specifying where a database should be restarted in the event of failure, from among any of the servers in the cluster, you can ensure the database will remain available. The shared storage pool requires none of the complicated and brittle configuration on a server-pair-by-server-pair basis that traditional failover clustering can entail.

The following section describes the SQL Server-specific components included with PolyServe Database Utility for SQL Server.

The Database Utility™ for SQL Server Components
The final necessary ingredient to both Scalable Shared Databases and simplified management of SQL Server is SQL Server integration. The Database Utility provides an integration layer that applies the core capabilities of Matrix Server and the Cluster Volume Manager to SQL Server.
This layer, called the Database Utility for SQL Server, includes:
• A SQL Server Health Monitor that periodically probes SQL Server instances within the cluster to ensure that client requests are being successfully handled. This will detect if a SQL Server instance hangs, even if the operating system and hardware remain healthy.
• A SQL Instance Virtualizer that allows the creation of a “Virtual SQL Server.” A virtual SQL Server is an adaptation of the Matrix Server virtual host concept. It consists of a virtual IP address that clients use to connect, a specified primary server in the cluster, a prioritized list of backup servers, and from one to 16 associated SQL Server instances. If a server fails, the Virtualizer will move the virtual IP address to the top server in the list that is capable of hosting the virtual SQL server; it then restarts the database instances in the new location. Note that, unlike server virtualization, the SQL
instance Virtualizer component itself is not active during normal operation. Thus there is no performance penalty; SQL runs at full native speed, with native access to all hardware resources. The Virtualizer also ensures that a given database is only ever accessed from one server at a time, since SQL Server is not designed to allow concurrent access from multiple servers.
• A SQL Server Registry Replicator. SQL Server stores some configuration information in the Windows registry, which is located on each server’s individual C: drive. To ensure this information is available to other servers in the cluster if a database instance needs to move, the Solution Pack includes a component that automatically replicates relevant registry information into the cluster-wide shared storage.
• A SQL Installation and hotfix Updater agent. To simplify installation of SQL Server and of hotfixes across multiple servers, the Solution Pack includes a push installer agent. The administrator simply places the appropriate installation packages in the shared file system, then uses the push installer agent to perform installations or hotfix updates across the entire cluster, which PolyServe calls one-click maintenance.

The Database Utility for SQL Server thus provides an easy way to move SQL Server instances within a cluster to improve resource utilization, an easy path to complete high availability for all instances in the cluster, and a simple way of performing typical SQL Server installation and maintenance tasks, all leveraging the core capabilities of Matrix Server and Volume Manager. The shared data pool provided by Matrix Server is used to store SQL databases and log files in a single location, accessible to all servers. The Matrix Volume Manager allows this storage pool to be spread across space on multiple storage arrays.
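The failover rule described for the SQL Instance Virtualizer (a virtual IP address, a primary server, a prioritized backup list, and 1 to 16 instances) can be sketched as a small data structure plus a selection function. This is an illustration of the rule as stated in the text, not PolyServe's implementation: the class name VirtualSqlServer, the function choose_host, the server names, and the idea of passing in a set of "capable" servers are all assumptions, and the sketch assumes the primary is preferred whenever it is capable.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualSqlServer:
    """Hypothetical model of a virtual SQL Server definition."""
    virtual_ip: str
    primary: str
    backups: list = field(default_factory=list)    # highest priority first
    instances: list = field(default_factory=list)  # SQL Server instance names

    def __post_init__(self):
        # The text allows from one to 16 associated SQL Server instances.
        if not 1 <= len(self.instances) <= 16:
            raise ValueError("a virtual SQL Server needs 1 to 16 instances")


def choose_host(vs, capable):
    """Return the first server, in priority order, that can host vs.

    `capable` is the set of servers currently able to host the virtual
    SQL Server; None is returned when no listed server qualifies.
    """
    for server in [vs.primary, *vs.backups]:
        if server in capable:
            return server
    return None


if __name__ == "__main__":
    vs = VirtualSqlServer(
        virtual_ip="192.0.2.10",  # documentation-range address, illustrative
        primary="node1",
        backups=["node2", "node3"],
        instances=["inst1"],
    )
    print(choose_host(vs, {"node1", "node2", "node3"}))  # all servers healthy
    print(choose_host(vs, {"node2", "node3"}))           # node1 has failed
```

In this model a failure of the primary simply removes it from the capable set, and the virtual IP address and instances would follow whichever server choose_host returns.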
Matrix Server’s high-availability and application-control engine ensures that databases remain available regardless of server failures and, through Dynamic Re-Hosting, allows administrators to adjust work assignments to maximize utilization. Matrix Manager provides a single point for monitoring and controlling these elements.

PolyServe’s Cluster Volume Manager
PolyServe’s Windows-based Cluster Volume Manager enables multiple storage arrays (or storage devices within an array) to be grouped together into scalable, high-speed storage pools that are available to all SQL Server systems in the cluster. Matrix Server provides the foundation for the Database Utility for SQL Server by allowing a set of servers to access shared file systems simultaneously and thus be managed as a single unit. The Cluster Volume Manager provides an analogous capability for storage. The CVM allows disk space from multiple storage arrays to be used and managed as a single pool.
• With the CVM, an administrator can create a single volume from free space on multiple arrays (or in multiple LUNs on a single array). In this way a file system can make use of whatever storage is available.
• The CVM also allows a volume to be striped across multiple arrays. This can improve I/O rates by aggregating the performance of the arrays, which is especially useful for sequential workloads frequently associated with data warehouses.
• Through concatenation or striping, CVM permits the construction of huge file systems that exceed the typical 2TB limit on the size of individual LUNs.
• In some environments, it is customary to configure every storage array with a set of LUNs of a fixed size, say, 30 gigabytes. The CVM allows a server administrator to construct file systems of whatever sizes are desired using these fixed LUNs.
• Finally, if a file system is becoming too full, the CVM can be used to expand the file system, without taking the cluster down, using free space from any array accessible to the cluster.

If the storage ever becomes a bottleneck, as it did in the POC, the Volume Manager enables multiple storage devices (controllers, cabinets, or arrays) to be aggregated together into high-performance shared storage pools. Once these pools are created, the volume manager can be used to stripe across each volume (LUN) to effectively aggregate the bandwidth across each device. Thus, if a single shared pool was composed of two LUNs, each located on a separate array, the logical, shared volume could span both arrays, aggregating the performance of each array (across spindles, controllers, and paths to the cluster attached to the storage). Like all other aspects of the Database Utility, the CVM is managed cluster-wide from a single control point.
Thus, just as Matrix Server allows servers’ data sets to be handled in a unified way, the Volume Manager allows the physical storage resources associated with a cluster to be used and managed as a single entity.

Summary
Microsoft SQL Server 2005™ Enterprise Edition now supports scale-out reporting through Scalable Shared Databases. Scale-out reporting enables multiple SQL Server 2005 systems to mount a read-only copy of a database. When deployed using PolyServe’s Database Utility™ for SQL Server, enterprises may achieve linear scalability of SQL Server for reporting, reduce reporting times by up to 16x, and eliminate manual storage configuration processes.
This proof of concept (POC) demonstrates PolyServe’s solution for the Scalable Shared Database. The POC consists of a 4-node PolyServe Matrix Server cluster running PolyServe’s Database Utility for SQL Server and Microsoft SQL Server 2005 Enterprise Edition, connected to a SAN.