Project REAL: SQL Server 2005 Storage
Management and Availability
SQL Server Technical Article
Author: Daren Bieniek, Solid Quality Learning
Technical Reviewers: Douglas McDowell, Solid Quality Learning,
Robert McPhail, EMC,
Brian Martin, EMC
Partner: EMC
Published: March 2006
Applies To: SQL Server 2005
Summary: A discussion of the data storage hardware and techniques used in the
business intelligence reference implementation called Project REAL. Topics such as
partitioning, storage and data management, data migration, and backup/recovery are
presented as they pertain to a multi-terabyte data warehouse.
Copyright
This is a preliminary document and may be changed substantially prior to final commercial release of the software
described herein.
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed
as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted
to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented
after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR
STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright,
no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form
or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express
written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering
subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the
furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual
property.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos,
people, places, and events depicted herein are fictitious, and no association with any real company, organization, product,
domain name, e-mail address, logo, person, place, or event is intended or should be inferred.
© 2006 Microsoft Corporation. All rights reserved.
Microsoft, Visual Basic, Visual Source Safe, and Visual Studio are either registered trademarks or trademarks of Microsoft
Corporation in the United States and/or other countries.
EMC, EMC², CLARiiON, ControlCenter, Navisphere, Symmetrix, and "where information lives" are registered trademarks and EMC NetWorker is a trademark of EMC Corporation.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Table of Contents
Project REAL: SQL Server 2005 Storage Management and Availability
   Project REAL Overview
      Introduction
      Overview of configurations and tests
   Data for Project REAL
      Description of the data
   Physical Components of the Implementation
      Server systems: distributed and consolidated
      Storage configurations
      The challenges faced, specific to storage
      The solution
   Table Partitioning
      Overview
      Benefits
   Cube (Measure Group) Partitioning
   Data Migration
      Overview
      Migrating partitions with SQL
      Migrating partitions / LUNs within the storage array
   Backup Methodology
      Variations on recovery
   Storage Hardware
      EMC Symmetrix
      EMC CLARiiON
      EMC CLARiiON Disk Library
   References
Project REAL: SQL Server 2005 Storage
Management and Availability
Project REAL Overview
Project REAL is a cooperative endeavor between Microsoft and a number of its partners
in the BI area. These partners include (in alphabetical order): Apollo Data Technologies,
EMC, Intellinet, Panorama, ProClarity, Scalability Experts, and Unisys. The business
scenario for Project REAL and the source data set were graciously provided by Barnes &
Noble.
Introduction
Project REAL is an effort to discover best practices for creating business intelligence (BI)
applications that are based on Microsoft®
SQL Server™ 2005. In Project REAL this is
being done by creating reference implementations based on actual customer scenarios.
This means that customer data is brought in-house and is used to work through the
same issues that customers face during deployment. These issues include:
• Design of schemas — both relational schemas and those used in Analysis Services.
• Implementation of data extraction, transformation, and loading (ETL) processes.
• Design and deployment of client front-end systems, both for reporting and for
interactive analysis.
• Sizing of systems for production.
• Management and maintenance of the systems on an ongoing basis, including
incremental updates to the data.
By working with real deployment scenarios, a complete understanding is gained of how
to implement a BI system using SQL Server BI tools. Our goal is to attempt to address
the full gamut of concerns that a company wishing to analyze potentially large data sets
would face during its own real-world deployment.
Project REAL will result in a number of papers, tools and samples over its lifetime. To
find the latest information, visit the following site:
http://www.microsoft.com/sql/bi/ProjectREAL
For purposes of Project REAL, B&N delivered their existing data warehouse databases
plus three months of daily incremental update data. This way the Project REAL systems
can be operated to simulate ongoing usage of the system. The data has been "masked"
for Project REAL to protect the confidentiality of information. The B&N databases,
originally developed using SQL Server 2000, were converted to SQL Server 2005 and
adapted in various ways that would illustrate best practices for SQL Server 2005. Note
that B&N chose to do their ETL work using prerelease versions of SQL Server
Integration Services (SSIS) because of its new design features and high performance.
The packages developed for use at B&N formed the basis for the ETL work in Project
REAL, but again, the packages have been adapted to show additional best practices and
features.
Note: Data Transformation Services (DTS) in SQL Server 2000 has been redeveloped
into a new ETL facility called SQL Server Integration Services (SSIS).
Overview of configurations and tests
Project REAL is not a single system implementation — it is a set of architectural variants
that show the merits and performance of different configurations. Regardless of the
physical implementation, there are certain logical and software components that will
always be present. As illustrated in Figure 1, these include:
• The "source" database from which data is extracted to feed the data warehouse.
• The ETL packages and SSIS service that integrate data from the data source into
the relational warehouse database.
• The relational data warehouse database and service which are used for relational
reporting and as a source for creating cubes, and provide long-term highly reliable
storage.
• The Analysis Services database and service for analytical queries, for reporting, and
for data mining.
• The Reporting Services service provides a vehicle for distributing reports in many
formats to a wide audience, using a report cache for high performance.
• There may be multiple instances of Web servers, allowing a 3-tier implementation
for intranet, extranet, or public Internet scenarios.
• Client emulation facilities allow a workload to be created that emulates the behavior
of a large number of users accessing the system.
Figure 1: High-level architecture
These components can be implemented in a variety of physical implementations, which
will have differing advantages in terms of performance, ease of deployment, client
connectivity, security, and cost:
• Consolidated vs. distributed server architecture. Both consolidated and distributed
architectures will be tested since it is possible to run each principal service on a
separate server, or to consolidate services onto a single larger server. The goal of
Project REAL is to reflect actual customer choices and tradeoffs. Unisys provided a
variety of servers so that both configurations could be tested and the approximately
40 TB of storage that EMC provided allows us to provision both configurations at the
same time.
• 32-bit and 64-bit performance. 32-bit vs. 64-bit performance will be compared and
evaluated for the consolidated scenario.
• Client connectivity scenarios. Clients may exist on the corporate network (intranet
scenario), may tunnel in through firewalls but still use corporate security (extranet
scenario), or may exist on the public Internet accessing BI services (Internet
scenario). These have various access and configuration concerns. In Project REAL all
three can be implemented.
Data for Project REAL
Description of the data
Barnes and Noble keeps their data warehouse in several related databases. For
purposes of Project REAL, the primary ones of interest are a Sales database and an
Inventory database. The data from those two databases is combined into one larger
database called REAL_Warehouse for Project REAL. A star schema is used, with a set of
dimension tables that define the entities in the warehouse and fact tables with
measures for each subject area. An excellent overview of the subject of dimensional
modeling of this sort can be found in Ralph Kimball's book “The Data Warehouse
Toolkit: The Complete Guide to Dimensional Modeling”
(http://www.kimballgroup.com/html/books.html).
The primary subject areas are store sales (Tbl_Fact_Store_Sales_YYYY_MM_DD tables),
store inventory (Tbl_Fact_Store_Inventory_YYYY_MM_DD tables) and distribution
center (DC) inventory (Tbl_Fact_DC_Inventory_YYYY_MM_DD tables). These fact areas
are divided into weekly partition tables, each tagged with the closing date of the
partition. There are 15 dimension tables in all, the primary ones representing:
• Buyers (Tbl_Dim_Buyer, 584 rows)
• Customers (Tbl_Dim_Customer, 5,636,425 rows)
• Time (Tbl_Dim_Date, 6,615 rows, because it is pre-populated to the year 2020)
• Vendors (Tbl_Dim_DC_Vendor, 12,686 rows)
• Products (Tbl_Dim_Item, 6,975,084 rows)
• Stores (Tbl_Dim_Store, 4,127 rows)
• Employees (Tbl_Dim_Store_Employee, 30,226 rows)
Table 1 shows the initial space used in the REAL_Warehouse database, before applying
any incremental updates. Because weekly partitions are maintained for the sales and
inventory fact tables, the number of rows per table will vary, but an average is given.
Note: The table sizes will increase as indexes are added in support of relational
queries. Creating cubes over the fact tables does not require indexes on the tables.
                          Tables   Rows            Size (MB)   Bytes/Row   Rows/Table
Dimension tables          15       12,666,277      6,420       n/a         n/a
DC Inventory facts        18       54,405,164      4,363       84          3,022,509
Store Inventory facts     53       8,630,298,635   435,983     53          162,835,823
Store Sales facts         157      1,366,052,628   192,354     148         8,700,972
Division Strategy facts   1        33,417,014      2,013       63          33,417,014
Table 1: Initial data warehouse statistics
At B&N, incremental updates are pulled using SSIS from two Oracle databases — one
for point-of-sale transactions and one for inventory management — and retained in a
staging database before being integrated (again using SSIS) into the warehouse
databases. For Project REAL a copy of the staged incremental updates has been
forwarded to Microsoft. There are three months of daily incremental updates available
for the project.
Updates to various tables in the source databases are captured as rows with a date
stamp indicating when the modification occurred. There is one table for each data type
that is fed in — stores (BN_Store_Full), buyers (IMM_Buyer_Full), sales transactions
(SRS_Sales_Trans_Full), etc. Table 2 shows the amount of data available for
incremental updates.
                          Tables   Rows          Size (MB)   Bytes/Row
Dimension data            7        12,050,392    8,734       n/a
DC Inventory facts        1        298,496,583   31,525      111
Store Inventory facts     1        294,776,968   65,713      234
Store Sales facts         1        148,801,022   29,129      205
Division Strategy facts   1        6,782,314     517         80
Table 2: Incremental update source data statistics
It should be noted that the initial data set contained three years of sales data, one year
of store inventory data, and just three months of DC inventory data. This means that
when three months of incremental update data are added to the system, the data
volume of the store inventory will grow proportionally more than the sales data, and the
DC inventory can be expected to double in volume.
To support the ETL process, EMC provided sufficient storage to support the creation of
restartable disk images of the data warehouse. On the EMC® Symmetrix® DMX1000,
Business Continuance Volumes or BCVs were used to accomplish this. EMC Replication
Manager was used to script BCV management during the ETL process. It should be
noted that multiple BCVs could be created to manage various states of instant
recoverability.
Before any ETL process begins, the Production data warehouse has already been
synchronized with the BCVs and then fractured or split to protect the BCV copy. The
utilization of these images was straightforward; the high-level flow of this ETL process is outlined below.
This is just one example of how the disk based replications can be used during the data
warehouse update. Often in recovery scenarios, you will first want to examine the
problem before taking action. For manual recovery, the EMC Replication Manager UI is
used. Replication Manager automates mounting, dismounting, scheduling, and
expiration of replicas and provides an easy to use graphical interface that can be used
to resynchronize the production warehouse with the BCVs. This can be performed after
successful manual verification of the warehouse update.
Using this approach the data warehouse can be quickly restored to a state prior to the
beginning of the ETL process.
In addition to using EMC Replication Manager for managing the synchronization of the
BCVs, it can also be used to mount the BCVs to another host, which further enhances
the utilization of BCVs. After being mounted to another host they can be used for
processes such as data integrity checks, running backups, running various tests, or as
another data copy for report generation. Mounting BCVs to a “mount” host makes all
this possible without taking valuable cycles away from the production hosts.
Additional BCVs could also be created for other purposes. For example, two BCVs could
be used in the following weekly timeline.
Friday: BCV1 and BCV2 are in sync with the primary (STD). Just before the data
warehouse load process begins on Friday night, BCV1 is broken from the set. BCV1 is
then mounted to a different SQL Server (used for backup purposes) to allow a backup
of the database to tape or other device such as an EMC CLARiiON®
Disk Library. If the
backup is to tape and the data warehouse is large, the backup could take several hours
or even days to complete.
Saturday & Sunday: The data warehouse load completes without any failures. BCV2
is then split from the primary, so that it can represent a snapshot of the post load data
warehouse.
Late Sunday: BCV2 is mounted to the alternate SQL host and it is then backed up as
the post ETL data warehouse.
Monday through Wednesday: Manual verification of the data warehouse is
completed and the new data is considered to be acceptable. If the data had been found
to have problems, many options would be available, from reloading a few erroneous
records, to reverting the entire data warehouse back to the state of BCV1.
Wednesday: Since we did not have any problems this week, we are now able to begin
resync’ing BCV1 with the primary.
Monday through Friday: Our data warehouse consists of both planned and actual data. The actuals are loaded on the weekends, but the planned data can be changed at any time. Therefore, we leave BCV2 split from the primary until Friday morning, just in case someone does something to the data warehouse and we need to revert back to the post-load version.
Friday: BCV2 is brought into sync with the others and everything is now ready again
for our next weekend load.
[Figure: Weekly BCV timeline. Panels show the standard volume (STD), BCV1, and BCV2 across the week: DW load occurring with backup of BCV1 (Friday night and Saturday); DW load complete and backup of BCV2 (Sunday); normal weekday usage with BCVs standing by (Monday through Wednesday); DW load OK'd and BCV1 resync started (Wednesday), BCV1 resync complete (Thursday); BCV2 resync started and completed (Friday evening); ready to start the next load.]
These BCVs do not necessarily have to be full copies (mirrors). The BCVs could use copy-on-first-write to store only the differential, which, depending upon many factors such as overall database size and update rate, could provide adequate performance at a much lower cost. These two types could even be mixed, so that BCV1 is a copy-on-first-write BCV while BCV2 is a mirror, or vice versa. More on BCV types later.
Additionally, BCVs were used as part of Project REAL to make data available to multiple
hosts for testing of the various configurations. For example, a BCV containing a
database could be attached to BI-REAL-DW to test the build of the AS cubes using BI-
REAL-AS as part of the distributed scenario. Afterwards, the same BCV could be
dismounted from BI-REAL-DW and mounted to BI-REAL-ES32 for the same test. In this
way the IO portion of the performance equation is held constant.
There are two primary types of BCVs: the mirror and the copy-on-first-write. The
discussions so far have focused mostly on the mirror and/or split mirror type of BCV.
Using the mirror type of BCV, a complete copy of the data is made and then is either
kept in sync or split to create a snapshot or version of the storage as it looked at the
time the split occurred. Thus the name split mirror. This is a good option especially if
you plan on mounting the BCV on another host as a second reporting database or even
to run backups, but it is the most expensive option since it doubles the disk space
requirements. Determining which technology to use is influenced by a variety of factors
including data change rates, performance requirements, and economics.
The other type of BCV uses a copy-on-first-write methodology to produce a snapshot, in
much the same way that SQL Server 2005 database snapshots work. At the moment the "snapshot" is taken, the storage array begins using a procedure called copy-on-first-write. In other words, if any of the sectors are about to change and therefore vary
from the “snapshot”, that sector is first copied to another storage area and then the
change to the original sector is allowed. In this way, the copy of the original sectors
could be overlaid on top of the current state of the storage to produce the snapshot.
The copy of the sector that is made can be either stored in the solid state cache of the
storage or placed on an array, depending upon the settings. Which type of BCV is the
right type to use varies based on the situation and the percent of the data that will
likely change.
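To make the analogy concrete, a SQL Server 2005 database snapshot applies the same copy-on-first-write idea at the database-file level. The following is a minimal sketch only; the snapshot name, logical file name, and path are illustrative and assume a single data file, which is not necessarily how Project REAL was configured.

-- Minimal sketch of a SQL Server 2005 database snapshot (copy-on-first-write
-- at the database-file level). Names and paths are illustrative only.
CREATE DATABASE REAL_Warehouse_PreETL_Snapshot
ON ( NAME = REAL_Warehouse_Data,                        -- logical data file name
     FILENAME = 'H:\Snapshots\REAL_Warehouse_Data.ss' ) -- sparse file for changed pages
AS SNAPSHOT OF REAL_Warehouse;

-- If the ETL run fails, the database can be reverted to the snapshot:
-- RESTORE DATABASE REAL_Warehouse
--     FROM DATABASE_SNAPSHOT = 'REAL_Warehouse_PreETL_Snapshot';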
One note about methods of taking snapshots: there are two basic methodologies used in the industry. One is copy-on-write, as was just discussed. Another
type just writes the new (changed) sector to a separate area, instead of copying the
original to the new area and replacing it with the changed sector. At first glance, it may
seem as if the latter method is more efficient and it is more efficient when it comes to
writes. However, data is often read many more times than it is written. Therefore, it is
more important to optimize the reads than it is the writes. So which method is a better
read performer? That depends upon which version of the data will be read more. In
the end, it comes down to the old concept of fragmentation. In most cases the current
version of the data will be used by almost all reads and the snapshot will get minimal
activity. In this case, if the latter method of writing the changed sector to another area
was used, it would cause the data to appear fragmented. Instead of the disk being able
to just read sectors 1500 to 1510, it would also need to fetch the replacement for 1505
and this induces overhead. By using the copy-on-write method, no fragmentation is
created in the current version of the data.
Physical Components of the Implementation
Up to now the Project REAL implementation has been discussed from a logical point of
view — the software systems and the data they support. Now we turn to the physical
implementation — the server, storage, and network configurations. Many parts of the
system were implemented multiple ways to help understand the tradeoffs in various
approaches.
Server systems: distributed and consolidated
One of the most common questions about Integration Services, Reporting Services, and
Analysis Services is whether the service should be run on the same server as the
relational database or a different one. While there are some "rules of thumb" that have
been used in the past to answer this question, there never has been a single correct
answer, and with SQL Server 2005 the parameters may have changed. Therefore, one
aspect of Project REAL is to deploy into each of those configurations, and to explore the
tradeoffs in this context.
In order to support the objectives of Project REAL, Unisys provided significant hardware
resources to facilitate exploration and testing. Table 3 lists the primary servers. The
complete set of servers provides flexibility for testing and is not reflective of the typical
customer environment. For each architectural scenario, the benefits and best practices
will be explored and shared via the Project REAL experience.
There are four servers for the distributed environment, so that each machine can
support one of the major services: the relational data warehouse, Integration Services,
Analysis Services, and Reporting Services. The configurations of these machines were
chosen to represent typical "commodity" servers. In addition, there is one machine
whose sole purpose is to provide the source of the incremental update data feed.
To support the consolidated environment, Unisys has provided two ES7000 servers —
one 32-bit and one 64-bit. Each of them will be tested in the consolidated role
independently. In the consolidated role, one system will run the relational data
warehouse, Integration Services, Analysis Services, and Reporting Services.
Role                               Server Name    Model        CPU           Cache   Memory
Data source                        BI-REAL-DS                  4x 700 MHz            4 GB
Distributed relational DW          BI-REAL-DW     ES3040L      4x 2.2 GHz    2 MB    8 GB
Distributed Integration Services   BI-REAL-IS     ES3040L      4x 2.2 GHz    2 MB    4 GB
Distributed Analysis Services      BI-REAL-AS     ES3040L      4x 2.2 GHz    2 MB    4 GB
Distributed Reporting Services     BI-REAL-RS     ES3040L      4x 2.2 GHz    2 MB    4 GB
Consolidated 32-bit server         BI-REAL-ES32   ES7000       16x 2.2 GHz   2 MB    32 GB
Consolidated 64-bit server         BI-REAL-ES64   ES7000-420   16x 1.3 GHz   3 MB    32 GB
Table 3: Servers provided by Unisys
It is assumed that there will be tradeoffs in performance, manageability, and reliability
between these configurations. For example:
• Does the system perform better if each service has its own dedicated server, or is it
better to share a larger pool of processors?
• Does the network slow down interactions between the services, or are other factors
more significant?
• Is it easier to maintain a single large server or multiple small ones?
• Are there more failures when more systems are in the configuration? Are failures
easier to isolate?
• What is the price/performance tradeoff between these configurations?
The results of these comparisons will be reported in a paper tentatively titled "Project
REAL: Performance Results and Architectural Comparisons" to be published in the
future.
It should be noted that the distributed / consolidated comparison is not the same as a
scale-out vs. scale-up comparison. These latter terms have to do with multiple servers
cooperating in a single role vs. a single larger server fulfilling the role. In a BI
implementation, these concepts tend to be more relevant in the front-end systems:
multiple reporting servers or analysis servers supporting end-user queries. These will be
more thoroughly discussed in a paper tentatively titled "Implementing Secure Web
Applications on the SQL Server BI Platform" to be published in the future. For a
complete description of the servers used in Project REAL, see the upcoming paper
tentatively titled "Project REAL: Architecture of the Hardware Environment."
Storage configurations
A properly designed storage system is necessary for assuring successful large BI
deployments. EMC invested significant storage resources to help explore various
designs and work toward establishing best practices specifically for BI environments.
This is an ongoing process and papers such as this one, written early in the testing
phase, could be revised or replaced by future papers, as testing continues.
The equipment provided by EMC includes both Symmetrix storage (DMX1000-P) and
CLARiiON storage (CX700), a CLARiiON Disk Library (CDL300) for backups, and two 2-Gb
Gb Fibre Channel switches. In addition, Emulex has provided LP10000 Host Bus
Adapters (HBAs) for connecting the servers to the switch.
Figure 2 is a high-level illustration of the storage area network and demonstrates the
high availability design. There are redundant paths from each server to the disks in
each storage array. Each of the storage systems (the DMX1000-P and the CX700),
which contain many physical disks, is configured as a number of logical volumes.
Through the switches, each volume is exposed to one of the servers. On the server the
volume is made available via a drive letter (e.g., the H: drive) or a mount point (e.g.,
C:\mount\DW4). The figure is a high-level illustration of the multiple paths available
from the hosts to the storage. Each server can access many logical volumes
simultaneously.
Figure 2: Storage connectivity
The logical volumes are designed for optimal performance in a particular role in the
overall system — for example, a volume that will hold relational log files is configured
for sequential write performance. There are a number of storage roles in the overall
system:
• Source data for incremental updates (relational)
• Relational data warehouse
• Relational database log files
• Analysis Services cubes
• Report databases (relational)
EMC has provided sufficient storage so that multiple copies of the data can be retained
while different configurations are experimented with. For example, cubes created with
different aggregation designs can be stored, which will have different performance
characteristics even though they have the same underlying data. In addition, the ability
to keep data for various roles on both Symmetrix and CLARiiON storage is available, so
that both storage systems can be tested with each server configuration (consolidated
and distributed).
The cross-product of the above data roles, storage systems, and multiple server
configurations means that the overall needs of Project REAL are much larger than a
typical BI system with this amount of data. Any single configured scenario, however, is
indicative of the type of configuration that would be in a typical deployment.
As is typical with storage systems of this class, the logical volumes are configured using
some form of RAID depending on the role. This is done to improve reliability, and
means that the sum of the logical volume sizes is less than the physical storage
available. The available space provided by EMC is:
System                 Physical Space
Symmetrix (DMX1000)    20 TB
CLARiiON (CX700)       24 TB
The challenges faced, specific to storage
Whenever clients begin looking at their storage requirements, they usually speak in
terms of storage size, not performance. In fact many times they have no idea what
their storage performance needs are. This was the case with Project REAL. When
clients are unsure of their storage performance requirements, the person designing the
storage has to try to elicit those requirements and then design for what they believe will
be adequate. Ideally, when clients request storage they will have the following
information for each type of item being stored:
• Total storage size needed now and growth projections
• Peak (including when the peaks occur), sustained, and average bandwidth
requirements (MB/s) for both Read and Write, with future projections
• Peak (including when the peaks occur), sustained, and average throughput in IOps
(IOs per second) required for both Read and Write, with future projections
• Fault tolerance needs
• Backup and recovery, including rapid recovery, needs
Knowing these things allows a storage solution to be designed to meet the client’s
needs.
A requirement for this project that differs from a regular implementation is that many
configurations and scenarios are to be tested and contrasted with each other. This
requires the ability to store multiple copies of the data and the ability to quickly make
those copies available to whichever hosts need to use them at that time. This was
accomplished through the use of several EMC technologies including Replication
Manager. The testing of configurations is an ongoing process and papers such as this
one, which are written early in this testing, could be revised or replaced by future
papers when new data is available.
The solution
Since little was known about what was needed on the performance front for either
throughput or IOps, a solution was presented that would have strong performance, with
the understanding that once more is known about the performance characteristics the
solution could be altered.
What was known was that about 3 TB of source data would be used as the basis for the
system. It was also known that fault tolerance would be needed, as well as a backup
and recovery system. Additionally, to test the various configurations the disks and/or
files on them would have to be made available to different machines for different tests.
Knowing that a total of 3 TB of data would be used for the source tells very little about
how much storage would be needed for the Relational Data warehouse and the MOLAP
storage for the Analysis Services cubes. Once the base relational table design is known
the table size can be approximated, but the size of other elements such as indices is
still unknown. On the MOLAP front, the size of the cubes can vary substantially based
on the number of aggregations performed and other factors. Also, since part of the
scheduled testing is to test the impact of different aggregation levels on cube build and
query performance, multiple cubes will be maintained on the storage, simultaneously.
To facilitate these unusual needs EMC delivered a Symmetrix (DMX1000-P), a CLARiiON
(CX700), and a CLARiiON Disk Library (Model DL310).
As the project continues and begins the performance testing stage, various EMC
software products, such as EMC ControlCenter®
(Performance Manager) and EMC
Navisphere®
Manager (Navisphere Analyzer), will be used to monitor the performance
of the storage and provide for adjustments wherever they might be needed.
Table Partitioning
Before reading the rest of this document, a few concepts need to be understood. One of those is partitioning.
Overview
There are two types of partitioning that can be done to a table: horizontal and vertical. Either way, partitioning involves breaking a table down into smaller, more manageable, and usually faster pieces. Horizontal partitioning breaks the table apart into sets of rows; vertical partitioning breaks it into sets of columns. Vertical partitioning is usually used
to put frequently accessed columns of a table into Part A and the, usually larger, less
frequently used columns into Part B. The parts are actually separate tables and need to
be joined for the data from both tables to be used; therefore the primary key has to be
carried in both tables. A similar but not exact version of vertical partitioning is done
when you use text columns in tables. The actual text of the text column is stored
separately. Horizontal partitioning is the primary method used by most who
implement large databases or other scenarios that might benefit from it. Therefore, it
will be the type of partitioning that will be used for the rest of this paper.
Horizontal partitioning involves breaking the table into sets of rows. This can be done
in many ways. Two of the most common ways are by either using ranges for the
partitions or a modulus function on an Identity type value. The modulus function
produces a kind of round robin placement of rows between the partitions. This does not
provide the type of benefits that are needed so it will not be discussed further. The
other is partitioning by ranges. These ranges can be created against any single column
in SQL Server 2005. For Project REAL, the tables were partitioned by calendar weeks.
There are no defined limitations that say that your partitioning ranges have to be equal,
but it is usually done that way for simplicity’s sake. For example it would be perfectly
valid to have 3 partitions in a table where partition 1 spanned 1/1/2004 to 6/30/2004,
partition 2 spanned 7/1/2004 to 7/31/2004, and partition 3 spanned 8/1/2004 to
8/7/2004. To learn more about how partitioning was implemented for Project REAL
read the paper entitled "Project REAL: Data Lifecycle – Partitioning," and for more information about partitioning in SQL Server 2005 read the papers entitled "Partitioned
Tables and Indexes in SQL Server 2005” and “Strategies for Partitioning Relational Data
Warehouses in Microsoft SQL Server” all of which are referenced at the end of this
paper.
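As a hedged illustration of what weekly range partitioning looks like in SQL Server 2005, the following sketch creates a partition function, a partition scheme, and a fact table on that scheme. The object, column, and filegroup names are hypothetical and are not the actual Project REAL definitions; the real implementation is documented in the papers referenced above.

-- Illustrative sketch of weekly range partitioning (not the Project REAL code).
-- RANGE RIGHT means each boundary date begins a new weekly partition.
CREATE PARTITION FUNCTION pf_SalesWeekly (datetime)
AS RANGE RIGHT FOR VALUES ('2004-01-04', '2004-01-11', '2004-01-18');

-- Map each partition to a filegroup (assumed to exist already); three
-- boundaries produce four partitions, so four filegroups are listed.
CREATE PARTITION SCHEME ps_SalesWeekly
AS PARTITION pf_SalesWeekly
TO (FG_2003_W53, FG_2004_W01, FG_2004_W02, FG_2004_W03);

-- The fact table is created directly on the partition scheme.
CREATE TABLE Tbl_Fact_Store_Sales
(
    Date_ID     datetime NOT NULL,
    Store_ID    int      NOT NULL,
    Item_ID     int      NOT NULL,
    Sales_Amt   money    NOT NULL
)
ON ps_SalesWeekly (Date_ID);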
Benefits
All of the benefits of partitioning involve being able to work on a table in smaller
chunks, rather than an entire monolithic piece. This provides several performance and
management benefits such as:
• Increased query performance
• Increased maintenance performance
• Increased availability
• Easier to manage backup and restore methodologies
• The ability to migrate data to less expensive disks
For example, the data in the sales fact table would become read only after 8 weeks.
With the data partitioned into weekly chunks, doing something as simple as an index rebuild becomes much more manageable. In a worst-case scenario, where an index would need to be rebuilt for the most recent 8 weeks of data out of the 157 weeks stored, partitioning would yield a 95% reduction in the amount of IO needed, as can be seen in the table below.
               # of Partitions   Size (GB)   IO Reduction
Entire table   N/A               2,000.0     0%
1 Partition    1                 12.7        99%
8 Partitions   8                 101.9       95%
Additionally, locks would only be maintained on the partitions whose indexes are being rebuilt.
This same type of gain can be had by other operations such as queries and backups.
The increased query and maintenance performance come from the smaller units of work
that need to be done.
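For instance, the index rebuild described above can be limited to individual partitions. The statement below is a hedged sketch; the index, table, and partition function names are the hypothetical ones from the earlier sketch, not Project REAL objects.

-- Rebuild the index for a single (recent, writable) partition instead of the
-- whole multi-terabyte table. Partition 157 here stands for the newest week.
ALTER INDEX IX_Fact_Store_Sales_Date
ON Tbl_Fact_Store_Sales
REBUILD PARTITION = 157;

-- $PARTITION maps a value to its partition number, which is useful for
-- finding which partitions hold the last 8 weeks of data.
SELECT $PARTITION.pf_SalesWeekly('2005-10-01') AS partition_number;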
Cube (Measure Group) Partitioning
Cubes (actually measure groups in Analysis Services 2005) are somewhat similar to
tables when it comes to partitioning. They reap the same benefits as tables from
having more manageable pieces rather than a single large piece. One advantage that
partitioned measure groups have over tables is that the partitions of a cube can be
distributed amongst a group of servers, thereby dividing the workload among them,
using remote partitions. For more information on cube (measure group) partitioning
refer to the paper entitled "Project REAL: Data Lifecycle – Partitioning," referenced at
the end of this paper.
Data Migration
Overview
The common practice to date has been for companies to use a sliding window of
storage. This means that they keep their history up to a certain depth, such as 3 years,
which was the case at B&N. Because of regulatory issues and the recognition that
having more history can be substantially beneficial to analysis, companies sometimes
keep their data even longer, but in a lower performance fashion. For example some
companies move their data to tape or other off-line media. The newest trend is for
companies to keep all of their history, but to move it from the more expensive, high-performance storage to less expensive, higher-density, lower-performance storage. The
latter of these methods is being used for Project REAL. In the primary approach used in
this project, there were three tiers of storage for aging data.
1. Tier one demands the highest performance. For this tier 15,000 rpm 72 GB Fibre
Channel drives in a RAID 10 configuration were used. This yields the highest
performance for both reads and writes. The data stored on tier 1 will be the most
frequently queried and will also be heavily written to during ETL processing. In fact
it is the only tier that will contain partitions that can be written to. All other tiers
will contain read only partitions.
2. Tier two is on higher density slower drives such as 10,000 rpm 146 GB Fibre
Channel drives in a RAID 5 configuration. This provides higher density storage for
the partitions, which are read only, by the time they make it to this level. Because
they are read only RAID 5 provides adequate performance at a cost that is
appreciably lower than RAID 10. Additionally, the partitions stored here are used
much less frequently than those on tier one.
3. Tier three is stored on the highest density drives which are 5,400 rpm 320 GB ATA
drives in a RAID 5 configuration. This minimizes the cost of keeping historical data
that is infrequently queried and needs to be kept on-line. Again, these partitions
are read only, which means that RAID 5 is adequate for this level. As drive sizes
continue to go up and cost per GB continues to drop, this tier will become a
permanent storage point for historical data. The day of the 1 TB hard drive is just
around the corner.
In other relational data warehouse testing, very good performance was achieved using concatenated RAID 5. For sequential reads, which make up the bulk of I/O against the warehouse, the RAID 5 write penalty does not apply. The results showed that in some cases a two-tier storage approach may be sufficient.
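From SQL Server's point of view, a storage tier is simply a set of filegroups whose files live on volumes carved from that tier. The following sketch shows the idea; the database, filegroup, and file names and the drive paths are illustrative assumptions, not the Project REAL layout.

-- Tier 1 (RAID 10, 15K rpm): filegroup for a current, writable weekly partition.
ALTER DATABASE REAL_Warehouse ADD FILEGROUP FG_2005_W38;
ALTER DATABASE REAL_Warehouse ADD FILE
    ( NAME = REAL_2005_W38,
      FILENAME = 'H:\Tier1_RAID10\REAL_2005_W38.ndf',
      SIZE = 10GB )
TO FILEGROUP FG_2005_W38;

-- Tier 2 (RAID 5, 10K rpm): filegroup for an older, read-only weekly partition.
ALTER DATABASE REAL_Warehouse ADD FILEGROUP FG_2004_W26;
ALTER DATABASE REAL_Warehouse ADD FILE
    ( NAME = REAL_2004_W26,
      FILENAME = 'I:\Tier2_RAID5\REAL_2004_W26.ndf',
      SIZE = 10GB )
TO FILEGROUP FG_2004_W26;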
[Figure: "Aging Ranges Across Storage Platforms" showing weekly partitions (wk1 2003 through wk40 2005) moving from primary storage to secondary storage as usage decreases.]
Figure 3: Data Migration
Migrating partitions with SQL
To migrate partitions from one logical volume to another requires several steps.
Here is an excerpt from the “Project REAL: Data Lifecycle – Partitioning” paper written
by Erin Welker.
The following gives a high-level overview of the steps used to perform the
movement of aged data. Initially, this process appears to be somewhat complex,
particularly in a situation such as ours in Project REAL, where the Sales partitioned table has over 150 partitions! Since all of these steps are metadata operations, except for the data movement step, it actually runs very quickly.
1. Create a new partition scheme, based on the existing partition function that
exactly duplicates the existing partition scheme except for the moving
partition or partitions. The moving partition boundary in the partition scheme
definition will indicate a filegroup on less expensive disk.
2. Create a new partitioned table on top of the new partition scheme.
3. Iterate through each partition and switch from the old partition to the same
partition number in the new partition (both partitioned tables use the same
partition function) until the moving partition is reached. (In that paper's accompanying figure, shaded boxes refer to populated partitions and white boxes indicate an empty partition.)
4. The moving partition needs to be explicitly copied, since the data is moving.
This can be done by copying the data directly from the old partition to the
new one, using an INSERT INTO...SELECT or we can SELECT INTO an
external table that resides on the same filegroup as the destination partition.
As in the initial load, the SELECT INTO performed far better than the INSERT
INTO so we chose the former method.
5. When using the SELECT INTO method, we then need to switch the external
table into its ultimate destination in the new partitioned table.
6. Now we iterate through the remaining partitions in the current partition
scheme and switch out the partitions to the new partitioned table as we did
in step #3.
7. We clean up by deleting the old partitioned table and partitioning scheme,
and renaming the new partitioned table to the original partitioned table
name.
For more information about this method, including code samples, please refer to the
“Project REAL: Data Lifecycle – Partitioning” white paper.
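As a rough sketch of steps 3 through 5 (not the code from that paper), the pattern in SQL Server 2005 looks roughly like the following. All object names, boundary values, and partition numbers are hypothetical, and the staging table must also match the destination's column definitions and indexes for the switch to succeed.

-- Step 3: metadata-only switch of an unchanged partition into the new table.
ALTER TABLE Tbl_Fact_Store_Sales
    SWITCH PARTITION 5 TO Tbl_Fact_Store_Sales_New PARTITION 5;

-- Step 4: physically copy the moving partition. SELECT INTO creates the
-- staging table on the default filegroup, so the destination filegroup is
-- assumed to have been made the default (or use INSERT INTO a table
-- pre-created on that filegroup).
SELECT *
INTO   Stage_Store_Sales_Wk26
FROM   Tbl_Fact_Store_Sales
WHERE  $PARTITION.pf_SalesWeekly(Date_ID) = 6;

-- Step 5: a CHECK constraint matching the partition's boundaries is required
-- before the staging table can be switched into the new partitioned table.
ALTER TABLE Stage_Store_Sales_Wk26 WITH CHECK
    ADD CONSTRAINT CK_Wk26_Range
    CHECK (Date_ID >= '2004-06-27' AND Date_ID < '2004-07-04');

ALTER TABLE Stage_Store_Sales_Wk26
    SWITCH TO Tbl_Fact_Store_Sales_New PARTITION 6;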
Migrating partitions / LUNs within the
storage array
As can be seen, there are a substantial number of steps that need to be performed to
migrate partitions using SQL Server 2005 on its own. EMC offers a storage system
based feature that allows for the seamless background moving of logical volumes within
the same storage system.
The key to making this work is using storage aligned partitions. This means that a
single logical volume contains a single partition or time slice of data. After a partition is
marked as read only, and is ready to be aged, a command is issued directly to the
storage array to begin migrating the logical volume from the high performance storage
to the medium density storage. This move is performed at a low priority so it can be
completed with negligible impact to host performance, using only storage array
resources when they are not otherwise being utilized by higher priority tasks, such as
executing queries. After this migration is completed, all host access to the logical
volume is now directed to the medium density storage, completely and invisibly to SQL
Server. Therefore, there are no changes to the partition function or scheme, no
SELECT…INTOs or anything else to accomplish data migration. Also, this is a purely
SAN based operation, which means no additional stress on the servers or LAN. In
addition, should a particular partition later become more important to the business, it
can be migrated back to higher performance storage to satisfy more aggressive queries
or updates.
[Figure: "In Place Filegroup Migration" showing last quarter's data aged to higher density storage and older data aged to the highest density storage after one year. LUN migration reduces ETL process complexity and increases storage and database availability.]
Figure 4: LUN / Partition Migration
Finally, the discussion in this section has thus far been about migrating the relational data, but what about the MOLAP cube data? It is possible to migrate data using Analysis Services partitions; however, it is neither easy nor quick. The supported way
to do this with Analysis Services involves dropping the partition from the Measure
Group, adding the partition back into the Measure Group with the altered storage path,
and reprocessing that partition. Using EMC LUN migration, the same task could be
performed without any impact to Analysis Services, the server, LAN, etc. This would be
almost exactly the same type of operation as it is with the SQL Server relational engine.
Backup Methodology
EMC NetWorker™, combined with a CLARiiON Disk Library model CDL300, provided the
backup solution for Project REAL. For each server, NetWorker schedules were created
and enabled. For the OS and non-relational data, backups are run nightly. For the
databases on each server, a full backup is run each Saturday night and incremental
backups are run Sunday through Friday night. Currently these backups are run directly
from the live data, but future plans are to modify the backups so that they are made
from the split BCV mounted to a different host.
Additionally, there are plans to modify the backups so that they take advantage of the
filegroup/partition level capabilities of SQL Server 2005. This should dramatically
reduce the amount of data that needs to be backed up as part of a single session, since
most of the data is read only. The concept here is that only the most recent 8 weeks of
fact data, the dimensional data, and any supporting data would need to be backed up
each time. Since the only data of any reasonable size is the dimension and fact data, a
quick analysis and extrapolation can be done to show what the impact of this type of
backup strategy would be. The following table shows the per week size of each of the
fact tables and total dim tables’ size (which are not week based, but will grow over
time). It also shows the sizes of each table extrapolated out over 3 and 5 years.
                          Avg. Weekly      Size of 8-week   At         At          At
                          Partition (MB)   backup (MB)      1 year     3 years     5 years
Dimension tables          N/A              N/A              4,600      6,420       8,988
DC Inventory facts        242              1,939            12,847     38,540      64,233
Store Inventory facts     8,226            65,809           435,983    1,307,949   2,179,915
Store Sales facts         1,225            9,801            64,935     194,804     324,674
# of Partitions           N/A              8                53         159         265
Total Size (MB)           N/A              77,549           518,364    1,541,293   2,568,822
% to be backed up         N/A              100%             16%        5%          3%
Approximate backup
space saved / year (TB)   N/A              N/A              N/A        72          123
As you can see from the table, using the backup strategy outlined above, we would be
able to reduce the overall size of our weekly backups at the 3 and 5 year marks by 95%
and 97%, respectively. Finally, if we were to use a basic premise that we were keeping
all weekly backups for historical purposes (regulatory or otherwise), then we would be
able to reduce the amount of backup space used by 72 to 123 TB per year.
Since this scenario has not been implemented yet, the full extent of the limitations
surrounding it has not yet been determined.
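A minimal sketch of what such a filegroup-level backup might look like follows, assuming the older weekly partitions live on filegroups that have been marked read-only and backed up once. The database, filegroup, and path names are illustrative, not the Project REAL configuration.

-- One-time: mark an aged weekly filegroup read-only and back it up once.
ALTER DATABASE REAL_Warehouse MODIFY FILEGROUP FG_2005_W30 READ_ONLY;
BACKUP DATABASE REAL_Warehouse
    FILEGROUP = 'FG_2005_W30'
TO DISK = 'E:\Backup\REAL_Warehouse_FG_2005_W30.bak';

-- Weekly: back up only the primary filegroup and the writable (recent)
-- filegroups, instead of the full multi-terabyte database.
BACKUP DATABASE REAL_Warehouse
    FILEGROUP = 'PRIMARY',
    FILEGROUP = 'FG_2005_W38',
    FILEGROUP = 'FG_2005_W39'
TO DISK = 'E:\Backup\REAL_Warehouse_Active_FG.bak';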
Between the filegroup/partitioning capabilities of SQL Server, the EMC BCV options
including split mirror and copy-on-first-write, other EMC technology options such as
EMC Snap and Clone, the CLARiiON Disk Library, and EMC NetWorker, the number of
possibilities is substantial. Therefore, it is beyond the scope of Project REAL to test all
of these permutations. Additionally, which technologies are used and how they are
used is completely dependent upon the goals of the project.
Variations on recovery
Many times recovery is discussed as a single type of event, when in reality there are
many different types of events that can require different types of recovery. These are
broken down into two primary categories:
• Non-catastrophic data integrity events – These events include user corruption of
data, hardware failure or hardware corruption of data, and many other events that
affect the data integrity of a portion of the data, but do not destroy the entire set of
data or geographic area.
• Catastrophic data integrity events – These events include a substantial fire, flood,
hurricane and many other events that effectively destroy the vast majority of the
data and the geographical area, such as the entire building where the servers and
storage were housed.
Traditionally, recovery from these types of events involves a similar set of actions,
which is normally some sort of restoration from slow backup media, such as tape. This
type of recovery can take anywhere from hours to days or weeks, depending on the
amount of data corrupted.
By utilizing the capabilities of the Symmetrix, CLARiiON, and CLARiiON Disk Library, recovery from a non-catastrophic event can be performed in minutes or even seconds. Some of the methods of recovery from non-catastrophic events have already been utilized
utilized during Project REAL. Further methods of recovery will be utilized as Project
REAL’s lifecycle continues.
The ability to recover from catastrophic events is also supported but the testing of such
a recovery is not currently scheduled.
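As one concrete example of the non-catastrophic case on the SQL Server side, a filegroup-level backup strategy like the one sketched in the backup section allows a single damaged filegroup to be restored without touching the rest of the warehouse. The following is a hedged sketch assuming SQL Server 2005 Enterprise Edition online restore and the illustrative names used earlier; the full log backup chain, including a tail-log backup, must be applied before recovery completes.

-- Restore only the damaged filegroup from its filegroup backup.
RESTORE DATABASE REAL_Warehouse
    FILEGROUP = 'FG_2005_W30'
FROM DISK = 'E:\Backup\REAL_Warehouse_FG_2005_W30.bak'
WITH NORECOVERY;

-- Roll the filegroup forward with the subsequent log backups, then recover.
RESTORE LOG REAL_Warehouse
FROM DISK = 'E:\Backup\REAL_Warehouse_Log.trn'
WITH RECOVERY;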
Storage Hardware
This section discusses the various storage hardware components that are being used.
It also includes some additional background information that is helpful for understanding the tiered storage concepts and data migration at a hardware / RAID array level.
EMC Symmetrix
The EMC Symmetrix DMX™ 1000 is part of the Symmetrix DMX-2 storage family which
was the latest DMX family at the onset of Project REAL. Since then EMC has introduced
the next generation of Symmetrix arrays, the DMX-3. DMX storage arrays are the
industry standard for 24 x 7 availability. They provide non-disruptive everything (upgrades, operation, and service) and can replicate any amount of data, internally or to another DMX, any time, anywhere, across any distance, to multiple locations, without impact to service levels. The DMX provides up to 512 GB of mirrored global cache and can support up to 960 Fibre Channel drives of varying size and speed, including 300 GB 10,000 rpm, 73 GB and 146 GB 15,000 rpm, and 500 GB 7,200 rpm drives, providing for in-the-box tiered storage. The DMX1000 provided for Project REAL contains 144 x 146
GB 10,000 rpm Fibre Channel drives and 64 GB of cache. The disks for Project REAL
were configured using RAID 1 (mirrored) and RAID 5.
EMC CLARiiON
The EMC CLARiiON CX700 is designed for environments requiring high capacities and
maximum application performance. The CX700 can support up to 240 hard drives of
varying types. It supports 10,000 and 15,000 rpm Fibre Channel drives, 7,200 and
10,000 rpm SATA drives, and 5,400 and 7,200 rpm ATA drives. This mix allows the
CX700 to have a mixture of storage performance and density (as is used for tiered data
migration) all within a single enclosure. For Project REAL the CX700 has 8 GB of
system memory that is primarily used for caching, and the following drives:
Quantity   Size     RPM      Type
30         73 GB    15,000   Fibre Channel
75         146 GB   10,000   Fibre Channel
30         300 GB   10,000   Fibre Channel
30         320 GB   5,400    ATA
EMC CLARiiON Disk Library
The EMC CLARiiON DL310 Disk Library provides the benefits of disk-based
backup/restore in a simple to deploy, easy to use solution that requires no changes to
an environment or processes. The CLARiiON DL310 Disk Library emulates leading tape
solutions and is qualified with most leading backup applications. The DL310 can house
up to 45 drives for a usable drive capacity of 12.5 TB. Additionally the DL310 provides
data compression of up to 3:1 which results in a maximum compressed capacity of 37
TB. For Project REAL the DL310 currently has a total of 45 x 320 GB 5,400 rpm ATA
drives.
Types of RAID arrays used, and why
Since various levels of RAID arrays were used, a brief discussion of RAID levels is
warranted here. There are four RAID levels primarily used today.
• RAID 0 – Striping
• RAID 1 – Mirroring
• RAID 3 and 5 – Striping, as in RAID 0, but with an extra disk added for parity.
• RAID 10 (0 + 1, 1 + 0) – Known as a mirrored stripe or striped mirror; combines RAID 0 and RAID 1 into a single array.
RAID 0 is the fastest of all RAID levels, but it has no fault tolerance. Fault tolerance
means that if a drive in the array fails, the data can still be recovered. RAID 0 simply
stripes the data being written across all of the disks in the array at a predetermined
amount per disk. For example, if 8 disks are used and the setup for the stripe is to put
8 KB per disk, then a stripe would be 64 KB (8 KB x 8 disks). When an extent (which happens to be 64 KB) is written starting at the beginning of the first disk, 1/8th of the extent would be written to each disk in parallel. The speed advantage
is a near linear growth curve in both read and write speed as you add more drives.
However, since there is no fault tolerance, if any drive in the array is lost, then all of
the data on all drives is unrecoverable! Because of this lack of fault tolerance RAID 0 is
seldom used on production systems and was not used as part of project REAL.
RAID 1 by definition must have two and only two drives in an array. The host will see these
drives as a single unit with the available space of one drive. This is because everything
that is written to the first drive is also written to the second drive, making the second
drive an exact replica of the first; thus the nickname mirroring. In this case, either
drive could fail and the data would still be available from the other drive to the host.
RAID 3 and RAID 5 are the same except that RAID 3 uses a dedicated parity disk, while
RAID 5 distributes the parity across the disks. The difference between the two levels is
not pertinent to this paper; therefore, wherever RAID 3 or RAID 5 is discussed, the
material applies to both levels unless otherwise noted.
RAID 5 is similar to RAID 0 in that data is striped across all of the drives in the array,
but it adds an additional drive to make the array wide enough to store both the data
and the parity. So, in the RAID 0 example, an extent would be written as before, but
8 KB of parity information would also be written to a 9th drive. Any one of the drives in
the array could be lost, and the combination of the remaining drives and the parity data
could be used to reconstruct the full stripe. The RAID 5 calculation for the number of
drives required is therefore usually referred to as N+1: if you intend to store 320 GB of
data on drives that yield 40 GB of storage each, you would need 8 drives (N) to store
the data and an additional drive (+1) for parity, for a total of 9 (8+1, or N+1) drives
as RAID 5.
The added overhead of the parity can make RAID 5 one of the slowest RAID levels when
it comes to writes. This is because RAID 5 usually needs 4 physical IO operations for
every logical write: the old data and the old parity must be read, and the new data and
the new parity must be written. This can sometimes be reduced to 2 IO operations if
either the data at the write location is already cached or a full stripe can be written.
For this reason, a RAID 5 array is only recommended in situations where minimal writes
will occur or where the system's performance does not depend on the write speed of
this array.
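As a rough illustration of that write penalty, the sketch below counts physical IOs for a single logical write under a few simplifying assumptions; it is not a description of any particular array's firmware. A small random write pays the full read-modify-write cost, a cached write skips the reads, and a full-stripe write only adds one parity write.

```python
def raid5_physical_ios(chunks_written: int, data_disks: int, cached: bool = False) -> int:
    """Rough physical IO count for one logical write to a RAID 5 set."""
    if chunks_written >= data_disks:
        # Full-stripe write: parity is computed from the data already in hand,
        # so only the data chunks plus one parity chunk are written.
        return data_disks + 1
    if cached:
        # Old data and old parity already in cache: write new data + new parity.
        return chunks_written + 1
    # Read old data, read old parity, write new data, write new parity.
    return 2 * chunks_written + 2

print(raid5_physical_ios(1, data_disks=8))               # 4
print(raid5_physical_ios(1, data_disks=8, cached=True))  # 2
print(raid5_physical_ios(8, data_disks=8))               # 9 (full stripe)
```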
When it comes to reads, RAID 5 performs well and can compete with RAID 0 for speed.
RAID 10 is a combination of RAID 0 and RAID 1, which is why it is sometimes referred
to as RAID 0+1, RAID 1+0, or RAID 10. It can be thought of as a set or array (RAID 0)
of mirrors (RAID 1). In practice, there is a substantial difference between an array of
mirrors and a mirrored array, and you cannot rely on designators like 10, 0+1, or 1+0
to tell you which one you are implementing, because different vendors use the
notations to mean different things. For one vendor, 1+0 might mean an array of
mirrors, while another vendor considers it to be a mirrored array. Therefore, if it is
important to get a specific implementation of RAID 10 (and it should be), the vendor's
implementation should be verified. The differences between the implementations are
too numerous to cover in this paper; however, it is undisputed that an array of mirrors
is more fault tolerant than a mirrored array, which is why EMC implements RAID 10 as
an array of mirrors.
Returning to the example of writing a 64 KB extent at 8 KB per drive, here is how
RAID 10 works. The calculation for determining the number of drives is Nx2, or 2N,
meaning that double the number of drives is necessary to store the data. Using the
40 GB drives from before and still storing 320 GB of data, 8 drives would be needed to
store the data and 8 more drives to store the mirror, for a total of 16 (8+8, or 8x2)
drives in the RAID 10 array. The 64 KB extent would be broken down into 8 x 8 KB
chunks, and each 8 KB chunk would be written to one of the 8 disks and its associated
mirror. Because of the dramatic increase in the number of drives (2N) and the
associated increase in price, many people decide to go with RAID 5 instead. This is
usually a mistake because of how slow RAID 5 is with writes; they end up with an array
that underperforms and typically upgrade to a RAID 10 array later.
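The drive-count arithmetic behind that tradeoff can be restated in a short sketch. This simply encodes the N+1 and 2N formulas from the text; the 40 GB drive size and 320 GB target are the running example, not actual Project REAL sizing.

```python
import math

def drives_needed(data_gb: float, drive_gb: float, raid_level: str) -> int:
    """Drives required to hold data_gb of usable space at the given RAID level."""
    n = math.ceil(data_gb / drive_gb)   # drives needed for the data alone
    if raid_level == "RAID5":
        return n + 1                    # N+1: one extra drive's worth of parity
    if raid_level == "RAID10":
        return 2 * n                    # 2N: every data drive is mirrored
    raise ValueError(f"unsupported RAID level: {raid_level}")

print(drives_needed(320, 40, "RAID5"))   # 9
print(drives_needed(320, 40, "RAID10"))  # 16
```

The raw drive count is only half of the comparison, of course; the write-penalty difference discussed above is usually what decides between the two layouts.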
Types of drives used, and why
Three types of drives were used in various places throughout Project REAL. They are
as follows:
• 15,000 rpm Fibre Channel hard drives – These drives provide the highest
performance at the highest cost. They are used primarily for areas that require
high-performance reads and writes.
• 10,000 rpm Fibre Channel hard drives – These provide excellent performance at a
cost lower than their 15,000 rpm cousins. Larger capacities are offered in the
10,000 rpm range than in the 15,000 rpm range, increasing data density per disk.
These drives can be used in areas that require moderate read and write rates.
• 5,400 rpm ATA hard drives – These drives provide acceptable performance for
low-demand situations and have the lowest cost. They come in very large units
(those used for Project REAL were 320 GB per disk) and provide the highest data
density per disk. These drives are recommended for areas that require more space
than performance, including areas where infrequently used, almost exclusively
read-only data must remain readily available.
References
Various other papers are referenced throughout this whitepaper. They are, in no
particular order:
“Project REAL Technical Overview” by Len Wyatt
“Project REAL: Data Lifecycle—Partitioning” by Erin Welker
“Partitioned Tables and Indexes in SQL Server 2005” by Kimberly L. Tripp
“Strategies for Partitioning Relational Data Warehouses in SQL Server” by Gandhi
Swaminathan
Also, one book was mentioned:
Ralph Kimball's book “The Data Warehouse Toolkit: The Complete Guide to Dimensional
Modeling” (http://www.kimballgroup.com/html/books.html).
Project REAL will result in a number of papers, tools and samples over its lifetime. To
find the latest information, visit the following site:
http://www.microsoft.com/sql/bi/ProjectREAL

More Related Content

What's hot

Document 742060.1
Document 742060.1Document 742060.1
Document 742060.1
juniags
 
Nationwide_Ensures_Basel_II_Compliance
Nationwide_Ensures_Basel_II_ComplianceNationwide_Ensures_Basel_II_Compliance
Nationwide_Ensures_Basel_II_ComplianceAndrew Painter
 
Cluster arch
Cluster archCluster arch
oracle-database-editions-wp-12c-1896124
oracle-database-editions-wp-12c-1896124oracle-database-editions-wp-12c-1896124
oracle-database-editions-wp-12c-1896124Arjun Sathe
 
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...
Doina Draganescu
 
Sql server 2012_and_intel_e7_processor_more_capability_and_higher_value_for_m...
Sql server 2012_and_intel_e7_processor_more_capability_and_higher_value_for_m...Sql server 2012_and_intel_e7_processor_more_capability_and_higher_value_for_m...
Sql server 2012_and_intel_e7_processor_more_capability_and_higher_value_for_m...Dr. Wilfred Lin (Ph.D.)
 
Hol311 Getting%20 Started%20with%20the%20 Business%20 Data%20 Catalog%20in%20...
Hol311 Getting%20 Started%20with%20the%20 Business%20 Data%20 Catalog%20in%20...Hol311 Getting%20 Started%20with%20the%20 Business%20 Data%20 Catalog%20in%20...
Hol311 Getting%20 Started%20with%20the%20 Business%20 Data%20 Catalog%20in%20...LiquidHub
 
owb-11gr2-new-features-summary-129693
owb-11gr2-new-features-summary-129693owb-11gr2-new-features-summary-129693
owb-11gr2-new-features-summary-129693Carnot Antonio Romero
 
Adv 64 Bit Env
Adv 64 Bit EnvAdv 64 Bit Env
Adv 64 Bit EnvRaghu veer
 
MicroStrategy 10.2 New Features
MicroStrategy 10.2 New FeaturesMicroStrategy 10.2 New Features
MicroStrategy 10.2 New Features
BiBoard.Org
 
Windows server 2012_r2_datasheet
Windows server 2012_r2_datasheetWindows server 2012_r2_datasheet
Windows server 2012_r2_datasheet
Rolando Diaz Duarte
 
Sap business-object-universe-idt-lab-i
Sap business-object-universe-idt-lab-iSap business-object-universe-idt-lab-i
Sap business-object-universe-idt-lab-iAmit Sharma
 
Advanced ETL MS SSIS 2012 & Talend
Advanced ETL  MS  SSIS 2012 & Talend Advanced ETL  MS  SSIS 2012 & Talend
Advanced ETL MS SSIS 2012 & Talend Sunny U Okoro
 
DB Optimizer Datasheet - Automated SQL Profiling & Tuning for Optimized Perfo...
DB Optimizer Datasheet - Automated SQL Profiling & Tuning for Optimized Perfo...DB Optimizer Datasheet - Automated SQL Profiling & Tuning for Optimized Perfo...
DB Optimizer Datasheet - Automated SQL Profiling & Tuning for Optimized Perfo...
Embarcadero Technologies
 
Big_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperBig_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperScott Gray
 
( 5 ) Office 2007 Create A Business Data Catolog
( 5 ) Office 2007   Create A Business Data Catolog( 5 ) Office 2007   Create A Business Data Catolog
( 5 ) Office 2007 Create A Business Data CatologLiquidHub
 
New features luciad_fusion_v2019.0
New features luciad_fusion_v2019.0New features luciad_fusion_v2019.0
New features luciad_fusion_v2019.0
Nimas Hayu Merlina Anggarini
 
A introduction to oracle data integrator
A introduction to oracle data integratorA introduction to oracle data integrator
A introduction to oracle data integratorchkamal
 
How to creating a universe with the bo xi 4.0 information design tool
How to creating a universe with the bo xi 4.0 information design toolHow to creating a universe with the bo xi 4.0 information design tool
How to creating a universe with the bo xi 4.0 information design tool
TL Technologies - Thoughts Become Things
 

What's hot (19)

Document 742060.1
Document 742060.1Document 742060.1
Document 742060.1
 
Nationwide_Ensures_Basel_II_Compliance
Nationwide_Ensures_Basel_II_ComplianceNationwide_Ensures_Basel_II_Compliance
Nationwide_Ensures_Basel_II_Compliance
 
Cluster arch
Cluster archCluster arch
Cluster arch
 
oracle-database-editions-wp-12c-1896124
oracle-database-editions-wp-12c-1896124oracle-database-editions-wp-12c-1896124
oracle-database-editions-wp-12c-1896124
 
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...6. real time integration with odi 11g & golden gate 11g & dq 11g   20101103 -...
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -...
 
Sql server 2012_and_intel_e7_processor_more_capability_and_higher_value_for_m...
Sql server 2012_and_intel_e7_processor_more_capability_and_higher_value_for_m...Sql server 2012_and_intel_e7_processor_more_capability_and_higher_value_for_m...
Sql server 2012_and_intel_e7_processor_more_capability_and_higher_value_for_m...
 
Hol311 Getting%20 Started%20with%20the%20 Business%20 Data%20 Catalog%20in%20...
Hol311 Getting%20 Started%20with%20the%20 Business%20 Data%20 Catalog%20in%20...Hol311 Getting%20 Started%20with%20the%20 Business%20 Data%20 Catalog%20in%20...
Hol311 Getting%20 Started%20with%20the%20 Business%20 Data%20 Catalog%20in%20...
 
owb-11gr2-new-features-summary-129693
owb-11gr2-new-features-summary-129693owb-11gr2-new-features-summary-129693
owb-11gr2-new-features-summary-129693
 
Adv 64 Bit Env
Adv 64 Bit EnvAdv 64 Bit Env
Adv 64 Bit Env
 
MicroStrategy 10.2 New Features
MicroStrategy 10.2 New FeaturesMicroStrategy 10.2 New Features
MicroStrategy 10.2 New Features
 
Windows server 2012_r2_datasheet
Windows server 2012_r2_datasheetWindows server 2012_r2_datasheet
Windows server 2012_r2_datasheet
 
Sap business-object-universe-idt-lab-i
Sap business-object-universe-idt-lab-iSap business-object-universe-idt-lab-i
Sap business-object-universe-idt-lab-i
 
Advanced ETL MS SSIS 2012 & Talend
Advanced ETL  MS  SSIS 2012 & Talend Advanced ETL  MS  SSIS 2012 & Talend
Advanced ETL MS SSIS 2012 & Talend
 
DB Optimizer Datasheet - Automated SQL Profiling & Tuning for Optimized Perfo...
DB Optimizer Datasheet - Automated SQL Profiling & Tuning for Optimized Perfo...DB Optimizer Datasheet - Automated SQL Profiling & Tuning for Optimized Perfo...
DB Optimizer Datasheet - Automated SQL Profiling & Tuning for Optimized Perfo...
 
Big_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperBig_SQL_3.0_Whitepaper
Big_SQL_3.0_Whitepaper
 
( 5 ) Office 2007 Create A Business Data Catolog
( 5 ) Office 2007   Create A Business Data Catolog( 5 ) Office 2007   Create A Business Data Catolog
( 5 ) Office 2007 Create A Business Data Catolog
 
New features luciad_fusion_v2019.0
New features luciad_fusion_v2019.0New features luciad_fusion_v2019.0
New features luciad_fusion_v2019.0
 
A introduction to oracle data integrator
A introduction to oracle data integratorA introduction to oracle data integrator
A introduction to oracle data integrator
 
How to creating a universe with the bo xi 4.0 information design tool
How to creating a universe with the bo xi 4.0 information design toolHow to creating a universe with the bo xi 4.0 information design tool
How to creating a universe with the bo xi 4.0 information design tool
 

Viewers also liked

Vfrench slides persuasive
Vfrench slides persuasiveVfrench slides persuasive
Vfrench slides persuasiveValonnaFrench
 
06 01 -13
06   01 -1306   01 -13
06 01 -13kevkel4
 
Type of Speeches
Type of SpeechesType of Speeches
Type of Speeches
Zwan Tb
 
Persuasive Speech
Persuasive SpeechPersuasive Speech
Persuasive Speechjillcam10
 
public speaking presentation UiTM students
public speaking presentation UiTM studentspublic speaking presentation UiTM students
public speaking presentation UiTM students
omarsyed
 
Making Persuasive Presentations
Making Persuasive PresentationsMaking Persuasive Presentations
Making Persuasive Presentations
UNIVERISTY STUDENT
 
My persuasive powerpoint
My persuasive powerpointMy persuasive powerpoint
My persuasive powerpointmaureensikora
 

Viewers also liked (7)

Vfrench slides persuasive
Vfrench slides persuasiveVfrench slides persuasive
Vfrench slides persuasive
 
06 01 -13
06   01 -1306   01 -13
06 01 -13
 
Type of Speeches
Type of SpeechesType of Speeches
Type of Speeches
 
Persuasive Speech
Persuasive SpeechPersuasive Speech
Persuasive Speech
 
public speaking presentation UiTM students
public speaking presentation UiTM studentspublic speaking presentation UiTM students
public speaking presentation UiTM students
 
Making Persuasive Presentations
Making Persuasive PresentationsMaking Persuasive Presentations
Making Persuasive Presentations
 
My persuasive powerpoint
My persuasive powerpointMy persuasive powerpoint
My persuasive powerpoint
 

Similar to project_real_wp

Application andmulti servermanagementdba-introwhitepaper
Application andmulti servermanagementdba-introwhitepaperApplication andmulti servermanagementdba-introwhitepaper
Application andmulti servermanagementdba-introwhitepaperKlaudiia Jacome
 
Sql interview question part 10
Sql interview question part 10Sql interview question part 10
Sql interview question part 10
kaashiv1
 
Sql Server 2014 Platform for Hybrid Cloud Technical Decision Maker White Paper
Sql Server 2014 Platform for Hybrid Cloud Technical Decision Maker White PaperSql Server 2014 Platform for Hybrid Cloud Technical Decision Maker White Paper
Sql Server 2014 Platform for Hybrid Cloud Technical Decision Maker White Paper
David J Rosenthal
 
Sql business intelligence
Sql business intelligenceSql business intelligence
Sql business intelligence
Sqlperfomance
 
Using ssr swithsqlexpress
Using ssr swithsqlexpressUsing ssr swithsqlexpress
Using ssr swithsqlexpress
guestc3c6593
 
The Shortcut Guide to SQL Server Infrastructure Optimization
The Shortcut Guide to SQL Server Infrastructure OptimizationThe Shortcut Guide to SQL Server Infrastructure Optimization
The Shortcut Guide to SQL Server Infrastructure Optimizationwebhostingguy
 
The Shortcut Guide to SQL Server Infrastructure Optimization
The Shortcut Guide to SQL Server Infrastructure OptimizationThe Shortcut Guide to SQL Server Infrastructure Optimization
The Shortcut Guide to SQL Server Infrastructure Optimizationwebhostingguy
 
Paul Anderson (brief)
Paul Anderson (brief)Paul Anderson (brief)
Paul Anderson (brief)Paul Anderson
 
Microsoft India - System Center Controlling Costs and Driving Agility Whitepaper
Microsoft India - System Center Controlling Costs and Driving Agility WhitepaperMicrosoft India - System Center Controlling Costs and Driving Agility Whitepaper
Microsoft India - System Center Controlling Costs and Driving Agility Whitepaper
Microsoft Private Cloud
 
How to Cure SharePoint Headaches with GSX - Monitor, Measure, Manage - From A...
How to Cure SharePoint Headaches with GSX - Monitor, Measure, Manage - From A...How to Cure SharePoint Headaches with GSX - Monitor, Measure, Manage - From A...
How to Cure SharePoint Headaches with GSX - Monitor, Measure, Manage - From A...David J Rosenthal
 
Power pivot planning_and_deployment_whitepaper
Power pivot planning_and_deployment_whitepaperPower pivot planning_and_deployment_whitepaper
Power pivot planning_and_deployment_whitepaperKlaudiia Jacome
 
Resume_Tushar_Kanti_DBA
Resume_Tushar_Kanti_DBAResume_Tushar_Kanti_DBA
Resume_Tushar_Kanti_DBATushar Kanti
 
Microsoft sql server 2008 r2 business intelligence
Microsoft sql server 2008 r2 business intelligenceMicrosoft sql server 2008 r2 business intelligence
Microsoft sql server 2008 r2 business intelligenceKlaudiia Jacome
 
Microsoft sql server 2008 r2 business intelligence
Microsoft sql server 2008 r2 business intelligenceMicrosoft sql server 2008 r2 business intelligence
Microsoft sql server 2008 r2 business intelligenceKlaudiia Jacome
 
Sql Server 2008 Product Overview
Sql Server 2008 Product OverviewSql Server 2008 Product Overview
Sql Server 2008 Product OverviewIsmail Muhammad
 
Sql server mission_critical_performance_tdm_white_paper
Sql server mission_critical_performance_tdm_white_paperSql server mission_critical_performance_tdm_white_paper
Sql server mission_critical_performance_tdm_white_paper
Satishbabu Gunukula
 

Similar to project_real_wp (20)

Application andmulti servermanagementdba-introwhitepaper
Application andmulti servermanagementdba-introwhitepaperApplication andmulti servermanagementdba-introwhitepaper
Application andmulti servermanagementdba-introwhitepaper
 
Sql interview question part 10
Sql interview question part 10Sql interview question part 10
Sql interview question part 10
 
Ebook10
Ebook10Ebook10
Ebook10
 
Sql Server 2014 Platform for Hybrid Cloud Technical Decision Maker White Paper
Sql Server 2014 Platform for Hybrid Cloud Technical Decision Maker White PaperSql Server 2014 Platform for Hybrid Cloud Technical Decision Maker White Paper
Sql Server 2014 Platform for Hybrid Cloud Technical Decision Maker White Paper
 
Sql business intelligence
Sql business intelligenceSql business intelligence
Sql business intelligence
 
SAP BODS 4.2
SAP BODS 4.2 SAP BODS 4.2
SAP BODS 4.2
 
Using ssr swithsqlexpress
Using ssr swithsqlexpressUsing ssr swithsqlexpress
Using ssr swithsqlexpress
 
The Shortcut Guide to SQL Server Infrastructure Optimization
The Shortcut Guide to SQL Server Infrastructure OptimizationThe Shortcut Guide to SQL Server Infrastructure Optimization
The Shortcut Guide to SQL Server Infrastructure Optimization
 
The Shortcut Guide to SQL Server Infrastructure Optimization
The Shortcut Guide to SQL Server Infrastructure OptimizationThe Shortcut Guide to SQL Server Infrastructure Optimization
The Shortcut Guide to SQL Server Infrastructure Optimization
 
KarenResumeDBA
KarenResumeDBAKarenResumeDBA
KarenResumeDBA
 
KarenResumeDBA
KarenResumeDBAKarenResumeDBA
KarenResumeDBA
 
Paul Anderson (brief)
Paul Anderson (brief)Paul Anderson (brief)
Paul Anderson (brief)
 
Microsoft India - System Center Controlling Costs and Driving Agility Whitepaper
Microsoft India - System Center Controlling Costs and Driving Agility WhitepaperMicrosoft India - System Center Controlling Costs and Driving Agility Whitepaper
Microsoft India - System Center Controlling Costs and Driving Agility Whitepaper
 
How to Cure SharePoint Headaches with GSX - Monitor, Measure, Manage - From A...
How to Cure SharePoint Headaches with GSX - Monitor, Measure, Manage - From A...How to Cure SharePoint Headaches with GSX - Monitor, Measure, Manage - From A...
How to Cure SharePoint Headaches with GSX - Monitor, Measure, Manage - From A...
 
Power pivot planning_and_deployment_whitepaper
Power pivot planning_and_deployment_whitepaperPower pivot planning_and_deployment_whitepaper
Power pivot planning_and_deployment_whitepaper
 
Resume_Tushar_Kanti_DBA
Resume_Tushar_Kanti_DBAResume_Tushar_Kanti_DBA
Resume_Tushar_Kanti_DBA
 
Microsoft sql server 2008 r2 business intelligence
Microsoft sql server 2008 r2 business intelligenceMicrosoft sql server 2008 r2 business intelligence
Microsoft sql server 2008 r2 business intelligence
 
Microsoft sql server 2008 r2 business intelligence
Microsoft sql server 2008 r2 business intelligenceMicrosoft sql server 2008 r2 business intelligence
Microsoft sql server 2008 r2 business intelligence
 
Sql Server 2008 Product Overview
Sql Server 2008 Product OverviewSql Server 2008 Product Overview
Sql Server 2008 Product Overview
 
Sql server mission_critical_performance_tdm_white_paper
Sql server mission_critical_performance_tdm_white_paperSql server mission_critical_performance_tdm_white_paper
Sql server mission_critical_performance_tdm_white_paper
 

project_real_wp

  • 1. Project REAL: SQL Server 2005 Storage Management and Availability SQL Server Technical Article Author: Daren Bieniek, Solid Quality Learning Technical Reviewers: Douglas McDowell, Solid Quality Learning, Robert McPhail, EMC, Brian Martin, EMC Partner: EMC Published: March 2006 Applies To: SQL Server 2005 Summary: A discussion of the data storage hardware and techniques used in the business intelligence reference implementation called Project REAL. Topics such as partitioning, storage and data management, data migration, and backup/recovery are presented as they pertain to a multi-terabyte data warehouse.
  • 2. Copyright This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein. The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred. © 2006 Microsoft Corporation. All rights reserved. Microsoft, Visual Basic, Visual Source Safe, and Visual Studio are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. EMC, EMC2 , CLARiiON, ControlCenter, Navisphere, Symmetrix, and where information lives are registered trademarks and EMC NetWorker is a trademark of EMC Corporation The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
  • 3. Table of Contents Project REAL: SQL Server 2005 Storage Management and Availability............. 1 Project REAL Overview................................................................................... 1 Introduction ............................................................................................ 1 Overview of configurations and tests........................................................... 2 Data for Project REAL .................................................................................... 3 Description of the data.............................................................................. 3 Physical Components of the Implementation ..................................................... 8 Server systems: distributed and consolidated............................................... 8 Storage configurations.............................................................................10 The challenges faced, specific to storage ....................................................12 The solution ...........................................................................................12 Table Partitioning.........................................................................................13 Overview ...............................................................................................13 Benefits.................................................................................................14 Cube (Measure Group) Partitioning .................................................................15 Data Migration.............................................................................................15 Overview ...............................................................................................15 Migrating partitions with SQL ....................................................................16 Migrating partitions / LUNs within the storage array .....................................17 Backup Methodology ....................................................................................18 Variations on recovery .............................................................................20 Storage Hardware ........................................................................................21 EMC Symmetrix ......................................................................................21 EMC CLARiiON ........................................................................................21 EMC CLARiiON Disk Library.......................................................................21 References..................................................................................................25
  • 4. Project REAL: SQL Server 2005 Storage Management and Availability Project REAL Overview Project REAL is a cooperative endeavor between Microsoft and a number of its partners in the BI area. These partners include (in alphabetical order): Apollo Data Technologies, EMC, Intellinet, Panorama, ProClarity, Scalability Experts, and Unisys. The business scenario for Project REAL and the source data set were graciously provided by Barnes & Noble. Introduction Project REAL is an effort to discover best practices for creating business intelligence (BI) applications that are based on Microsoft® SQL Server™ 2005. In Project REAL this is being done by creating reference implementations based on actual customer scenarios. This means that customer data is brought in-house and is used to work through the same issues that customers face during deployment. These issues include: • Design of schemas — both relational schemas and those used in Analysis Services. • Implementation of data extraction, transformation, and loading (ETL) processes. • Design and deployment of client front-end systems, both for reporting and for interactive analysis. • Sizing of systems for production. • Management and maintenance of the systems on an ongoing basis, including incremental updates to the data. By working with real deployment scenarios, a complete understanding is gained of how to implement a BI system using SQL Server BI tools. Our goal is to attempt to address the full gamut of concerns that a company wishing to analyze potentially large data sets would face during its own real-world deployment. Project REAL will result in a number of papers, tools and samples over its lifetime. To find the latest information, visit the following site: http://www.microsoft.com/sql/bi/ProjectREAL For purposes of Project REAL, B&N delivered their existing data warehouse databases plus three months of daily incremental update data. This way the Project REAL systems can be operated to simulate ongoing usage of the system. The data has been "masked" for Project REAL to protect the confidentiality of information. The B&N databases, originally developed using SQL Server 2000, were converted to SQL Server 2005 and adapted in various ways that would illustrate best practices for SQL Server 2005. Note that B&N chose to do their ETL work using prerelease versions of SQL Server Integration Services (SSIS) because of its new design features and high performance. The packages developed for use at B&N formed the basis for the ETL work in Project
  • 5. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc 2 REAL, but again, the packages have been adapted to show additional best practices and features. Note: Data Transformation Services (DTS) in SQL Server 2000 has been redeveloped into a new ETL facility called SQL Server Integration Services (SSIS). Overview of configurations and tests Project REAL is not a single system implementation — it is a set of architectural variants that show the merits and performance of different configurations. Regardless of the physical implementation, there are certain logical and software components that will always be present. As illustrated in Figure 1, these include: • The "source" database from which data is extracted to feed the data warehouse. • The ETL packages and SSIS service that integrate data from the data source into the relational warehouse database. • The relational data warehouse database and service which are used for relational reporting and as a source for creating cubes, and provide long-term highly reliable storage. • The Analysis Services database and service for analytical queries, for reporting, and for data mining. • The Reporting Services service provides a vehicle for distributing reports in many formats to a wide audience, using a report cache for high performance. • There may be multiple instances of Web servers, allowing a 3-tier implementation for intranet, extranet, or public Internet scenarios. • Client emulation facilities allow a workload to be created that emulates the behavior of a large number of users accessing the system. Figure 1: High-level architecture
  • 6. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc These components can be implemented in a variety of physical implementations, which will have differing advantages in terms of performance, ease of deployment, client connectivity, security, and cost: • Consolidated vs. distributed server architecture. Both consolidated and distributed architectures will be tested since it is possible to run each principle service on a separate server, or to consolidate services onto a single larger server. The goal of Project REAL is to reflect actual customer choices and tradeoffs. Unisys provided a variety of servers so that both configurations could be tested and the approximately 40 TB of storage that EMC provided allows us to provision both configurations at the same time. • 32-bit and 64-bit performance. 32-bit vs. 64-bit performance will be compared and evaluated for the consolidated scenario. • Client connectivity scenarios. Clients may exist on the corporate network (intranet scenario), may tunnel in through firewalls but still use corporate security (extranet scenario), or may exist on the public Internet accessing BI services (Internet scenario). These have various access and configuration concerns. In Project REAL all three can be implemented. Data for Project REAL Description of the data Barnes and Noble keeps their data warehouse in several related databases. For purposes of Project REAL, the primary ones of interest are a Sales database and an Inventory database. The data from those two databases is combined into one larger database called REAL_Warehouse for Project REAL. A star schema is used, with a set of dimension tables that define the entities in the warehouse and fact tables with measures for each subject area. An excellent overview of the subject of dimensional modeling of this sort can be found in Ralph Kimball's book “The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling” (http://www.kimballgroup.com/html/books.html). The primary subject areas are store sales (Tbl_Fact_Store_Sales_YYYY_MM_DD tables), store inventory (Tbl_Fact_Store_Inventory_YYYY_MM_DD tables) and distribution center (DC) inventory (Tbl_Fact_DC_Inventory_YYYY_MM_DD tables). These fact areas are divided into weekly partition tables, each tagged with the closing date of the partition. There are 15 dimension tables in all, the primary ones representing: • Buyers (Tbl_Dim_Buyer, 584 rows) • Customers (Tbl_Dim_Customer, 5,636,425 rows) • Time (Tbl_Dim_Date, 6,615 rows, because it is pre-populated to the year 2020) • Vendors (Tbl_Dim_DC_Vendor, 12,686 rows) • Products (Tbl_Dim_Item, 6,975,084 rows) • Stores (Tbl_Dim_Store, 4,127 rows) • Employees (Tbl_Dim_Store_Employee, 30,226 rows) 3
  • 7. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc 4 Table 1 shows the initial space used in the REAL_Warehouse database, before applying any incremental updates. Because weekly partitions are maintained for the sales and inventory fact tables, the number of rows per table will vary, but an average is given. Note: The table sizes will increase as indexes are added in support of relational queries. Creating cubes over the fact tables does not require indexes on the tables. Tables Rows Size (MB) Bytes/Row Rows/Table Dimension tables 15 12,666,277 6,420 n/a n/a DC Inventory facts 18 54,405,164 4,363 84 3,022,509 Store Inventory facts 53 8,630,298,635 435,983 53 162,835,823 Store Sales facts 157 1,366,052,628 192,354 148 8,700,972 Division Strategy facts 1 33,417,014 2,013 63 33,417,014 Table 1: Initial data warehouse statistics At B&N, incremental updates are pulled using SSIS from two Oracle databases — one for point-of-sale transactions and one for inventory management — and retained in a staging database before being integrated (again using SSIS) into the warehouse databases. For Project REAL a copy of the staged incremental updates has been forwarded to Microsoft. There are three months of daily incremental updates available for the project. Updates to various tables in the source databases are captured as rows with a date stamp indicating when the modification occurred. There is one table for each data type that is fed in — stores (BN_Store_Full), buyers (IMM_Buyer_Full), sales transactions (SRS_Sales_Trans_Full), etc. Table 2 shows the amount of data available for incremental updates. Tables Rows Size Bytes/Row Dimension data 7 12,050,392 8,734 n/a DC Inventory facts 1 298,496,583 31,525 111 Store Inventory facts 1 294,776,968 65,713 234 Store Sales facts 1 148,801,022 29,129 205 Division Strategy facts 1 6,782,314 517 80 Table 2: Incremental update source data statistics It should be noted that the initial data set contained three years of sales data, one year of store inventory data, and just three months of DC inventory data. This means that when three months of incremental update data are added to the system, the data volume of the store inventory will grow proportionally more than the sales data, and the DC inventory can be expected to double in volume. To support the ETL process, EMC provided sufficient storage to support the creation of restartable disk images of the data warehouse. On the EMC® Symmetrix® DMX1000,
  • 8. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc Business Continuance Volumes or BCVs were used to accomplish this. EMC Replication Manager was used to script BCV management during the ETL process. It should be noted that multiple BCVs could be created to manage various states of instant recoverability. Before any ETL process begins, the Production data warehouse has already been synchronized with the BCVs and then fractured or split to protect the BCV copy. The utilization of these images was straight forward. The following is a high-level diagram of this ETL process. This is just one example of how the disk based replications can be used during the data warehouse update. Often in recovery scenarios, you will first want to examine the problem before taking action. For manual recovery, the EMC Replication Manager UI is used. Replication Manager automates mounting, dismounting, scheduling, and expiration of replicas and provides an easy to use graphical interface that can be used to resynchronize the production warehouse with the BCVs. This can be performed after successful manual verification of the warehouse update. Using this approach the data warehouse can be quickly restored to a state prior to the beginning of the ETL process. 5
  • 9. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc 6 In addition to using EMC Replication Manager for managing the synchronization of the BCVs, it can also be used to mount the BCVs to another host, which further enhances the utilization of BCVs. After being mounted to another host they can be used for processes such as data integrity checks, running backups, running various tests, or as another data copy for report generation. Mounting BCVs to a “mount” host makes all this possible without taking valuable cycles away from the production hosts. Additional BCVs could also be created for other purposes. For example, 2 BCVs could be used in the following weekly timeline. Friday: BCV1 and BCV2 are in sync with the primary (STD). Just before the data warehouse load process begins on Friday night, BCV1 is broken from the set. BCV1 is then mounted to a different SQL Server (used for backup purposes) to allow a backup of the database to tape or other device such as an EMC CLARiiON® Disk Library. If the backup is to tape and the data warehouse is large, the backup could take several hours or even days to complete. Saturday & Sunday: The data warehouse load completes without any failures. BCV2 is then split from the primary, so that it can represent a snapshot of the post load data warehouse. Late Sunday: BCV2 is mounted to the alternate SQL host and it is then backed up as the post ETL data warehouse. Monday through Wednesday: Manual verification of the data warehouse is completed and the new data is considered to be acceptable. If the data had been found to have problems, many options would be available, from reloading a few erroneous records, to reverting the entire data warehouse back to the state of BCV1. Wednesday: Since we did not have any problems this week, we are now able to begin resync’ing BCV1 with the primary. Monday though Friday: Our Data warehouse consists of both planned and actuals data. The actuals are loaded on the weekends, but the planned data can be changed at any time. Therefore, we leave BCV2 split from the primary until Friday morning, just in case someone does something to the data warehouse and we needed to revert back to the post load version. Friday: BCV2 is brought into sync with the others and everything is now ready again for our next weekend load.
  • 10. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc FridayWednesday Monday - WednesdayFriday Night & Sat STD BCV1 BCV2 SQL SQL2 STD BCV1 BCV2 SQL SQL2 STD BCV1 BCV2 SQL SQL2 STD BCV1 BCV2 SQL SQL2 STD BCV1 BCV2 SQL SQL2 STD BCV1 BCV2 SQL SQL2 STD BCV1 BCV2 SQL SQL2 DW Load Occurring Ready to Start next Load Backup BCV1 Occurring Normal Weekday Usage BCVs Standing By DW Load OK’d BCV1 Resync Started Thursday BCV2 Standing ByBCV2 Standing By BCV1 Resync Complete Sunday DW Load Complete Backup BCV2 Occurring Friday Evening BCV2 Resync Started STD BCV1 BCV2 SQL SQL2 BCV2 Resync Complete Friday EveningFridayWednesday Monday - WednesdayFriday Night & Sat STD BCV1BCV1 BCV2BCV2 SQL SQL2 STD BCV1BCV1 BCV2BCV2 SQL SQL2 STD BCV1BCV1 BCV2BCV2 SQL SQL2 STD BCV1BCV1 BCV2BCV2 SQL SQL2 STD BCV1BCV1 BCV2BCV2 SQL SQL2 STD BCV1BCV1 BCV2BCV2 SQL SQL2 STD BCV1BCV1 BCV2BCV2 SQL SQL2 DW Load Occurring Ready to Start next Load Backup BCV1 Occurring Normal Weekday Usage BCVs Standing By DW Load OK’d BCV1 Resync Started Thursday BCV2 Standing ByBCV2 Standing By BCV1 Resync Complete Sunday DW Load Complete Backup BCV2 Occurring Friday Evening BCV2 Resync Started STD BCV1BCV1 BCV2BCV2 SQL SQL2 BCV2 Resync Complete Friday Evening These BCVs do not necessarily have to be full copies (mirrors). The BCVs could use copy-on-first-write to store only the differential, which depending upon many factors such as overall DB size and update rate could provide adequate performance at a much lower cost. These 2 types could even be mixed so that BCV1 could be copy-on-first- write BCV, while BCV2 is a mirror, or vice versa. More on BCV types later; Additionally, BCVs were used as part of project REAL to make data available to multiple hosts for testing of the various configurations. For example, a BCV containing a database could be attached to BI-REAL-DW to test the build of the AS cubes using BI- REAL-AS as part of the distributed scenario. Afterwards, the same BCV could be dismounted from BI-REAL-DW and mounted to BI-REAL-ES32 for the same test. In this way the IO portion of the performance equation is held constant. There are 2 primary types of BCVs, the mirror and the copy-on-first-write. The discussions so far have focused mostly on the mirror and/or split mirror type of BCV. Using the mirror type of BCV, a complete copy of the data is made and then is either kept in sync or split to create a snapshot or version of the storage as it looked at the time the split occurred. Thus the name split mirror. This is a good option especially if you plan on mounting the BCV on another host as a second reporting database or even to run backups, but it is the most expensive option since it doubles the disk space 7
  • 11. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc 8 requirements. Determining which technology to use is influenced by a variety factors including data change rates, performance requirements, and economics. The other type of BCV uses a copy-on-first-write methodology to produce a snapshot, in much the same way that SQL Server 2005 database snapshots work. At the moment that the “snapshot” is taken, the storage then knows to do a procedure called copy-on- first-write. In other words, if any of the sectors are about to change and therefore vary from the “snapshot”, that sector is first copied to another storage area and then the change to the original sector is allowed. In this way, the copy of the original sectors could be overlaid on top of the current state of the storage to produce the snapshot. The copy of the sector that is made can be either stored in the solid state cache of the storage or placed on an array, depending upon the settings. Which type of BCV is the right type to use varies based on the situation and the percent of the data that will likely change. One note about methods of taking snap shots is that there are 2 basic methodologies that are used in the industry. One is a copy-on-write as was just discussed. Another type just writes the new (changed) sector to a separate area, instead of copying the original to the new area and replacing it with the changed sector. At first glance, it may seem as if the latter method is more efficient and it is more efficient when it comes to writes. However, data is often read many more times than it is written. Therefore, it is more important to optimize the reads than it is the writes. So which method is a better read performer? That depends upon which version of the data will be read more. In the end, it comes down to the old concept of fragmentation. In most cases the current version of the data will be used by almost all reads and the snapshot will get minimal activity. In this case, if the latter method of writing the changed sector to another area was used, it would cause the data to appear fragmented. Instead of the disk being able to just read sectors 1500 to 1510, it would also need to fetch the replacement for 1505 and this induces overhead. By using the copy-on-write method, no fragmentation is created in the current version of the data. Physical Components of the Implementation Up to now the Project REAL implementation has been discussed from a logical point of view — the software systems and the data they support. Now, to look at the physical implementation — the server, storage, and network configurations. Many parts of the system were implemented multiple ways to help understand the tradeoffs in various approaches. Server systems: distributed and consolidated One of the most common questions about Integration Services, Reporting Services, and Analysis Services is whether the service should be run on the same server as the relational database or a different one. While there are some "rules of thumb" that have
  • 12. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc been used in the past to answer this question, there never has been a single correct answer, and with SQL Server 2005 the parameters may have changed. Therefore, one aspect of Project REAL is to deploy into each of those configurations, and to explore the tradeoffs in this context. In order to support the objectives of Project REAL, Unisys provided significant hardware resources to facilitate exploration and testing. Table 3 lists the primary servers. The complete set of servers provides flexibility for testing and is not reflective of the typical customer environment. For each architectural scenario, the benefits and best practices will be explored and shared via the Project REAL experience. There are four servers for the distributed environment, so that each machine can support one of the major services: the relational data warehouse, Integration Services, Analysis Services, and Reporting Services. The configurations of these machines were chosen to represent typical "commodity" servers. In addition, there is one machine whose sole purpose is to provide the source of the incremental update data feed. To support the consolidated environment, Unisys has provided two ES7000 servers — one 32-bit and one 64-bit. Each of them will be tested in the consolidated role independently. In the consolidated role, one system will run the relational data warehouse, Integration Services, Analysis Services, and Reporting Services. Role Server Name Model CPU Cache Memory Data source BI-REAL-DS 4x 700 MHz 4 GB Distributed relational DW BI-REAL-DW ES3040L 4x 2.2 GHz 2 MB 8 GB Distributed Integration Services BI-REAL-IS ES3040L 4x 2.2 GHz 2 MB 4 GB Distributed Analysis Services BI-REAL-AS ES3040L 4x 2.2 GHz 2 MB 4 GB Distributed Reporting Services BI-REAL-RS ES3040L 4x 2.2 GHz 2 MB 4 GB Consolidated 32-bit server BI-REAL-ES32 ES7000 16x 2.2 GHz 2 MB 32 GB Consolidated 64-bit server BI-REAL-ES64 ES7000-420 16x 1.3 GHz 3 MB 32 GB Table 3: Servers provided by Unisys It is assumed that there will be tradeoffs in performance, manageability, and reliability between these configurations. For example: • Does the system perform better if each service has its own dedicated server, or is it better to share a larger pool of processors? • Does the network slow down interactions between the services, or are other factors more significant? • Is it easier to maintain a single large server or multiple small ones? • Are there more failures when more systems are in the configuration? Are failures easier to isolate? • What is the price/performance tradeoff between these configurations? 9
  • 13. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc 10 The results of these comparisons will be reported in a paper tentatively titled "Project REAL: Performance Results and Architectural Comparisons" to be published in the future. It should be noted that the distributed / consolidated comparison is not the same as a scale-out vs. scale-up comparison. These latter terms have to do with multiple servers cooperating in a single role vs. a single larger server fulfilling the role. In a BI implementation, these concepts tend to be more relevant in the front-end systems: multiple reporting servers or analysis servers supporting end-user queries. These will be more thoroughly discussed in a paper tentatively titled "Implementing Secure Web Applications on the SQL Server BI Platform" to be published in the future. For a complete description of the servers used in Project REAL, see the upcoming paper tentatively titled "Project REAL: Architecture of the Hardware Environment." Storage configurations A properly designed storage system is necessary for assuring successful large BI deployments. EMC invested significant storage resources to help explore various designs and work toward establishing best practices specifically for BI environments. This is an ongoing process and papers such as this one, written early in the testing phase, could be revised or replaced by future papers, as testing continues. The equipment provided by EMC includes both Symmetrix storage (DMX1000-P) and CLARiiON storage (CX700), a CLARiiON Disk Library (CDL300) for backups, and two 2 Gb Fibre Channel switches. In addition, Emulex has provided LP10000 Host Bus Adapters (HBAs) for connecting the servers to the switch. Figure 2 is a high-level illustration of the storage area network and demonstrates the high availability design. There are redundant paths from each server to the disks in each storage array. Each of the storage systems (the DMX1000-P and the CX700), which contain many physical disks, is configured as a number of logical volumes. Through the switches, each volume is exposed to one of the servers. On the server the volume is made available via a drive letter (e.g., the H: drive) or a mount point (e.g., C:mountDW4). The figure is a high-level illustration of the multiple paths available from the hosts to the storage. Each server can access many logical volumes simultaneously.
  • 14. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc Figure 2: Storage connectivity The logical volumes are designed for optimal performance in a particular role in the overall system — for example, a volume that will hold relational log files is configured for sequential write performance. There are a number of storage roles in the overall system: • Source data for incremental updates (relational) • Relational data warehouse • Relational database log files • Analysis Services cubes • Report databases (relational) EMC has provided sufficient storage so that multiple copies of the data can be retained while different configurations are experimented with. For example, cubes created with different aggregation designs can be stored, which will have different performance characteristics even though they have the same underlying data. In addition, the ability to keep data for various roles on both Symmetrix and CLARiiON storage is available, so that both storage systems can be tested with each server configuration (consolidated and distributed). The cross-product of the above data roles, storage systems, and multiple server configurations means that the overall needs of Project REAL are much larger than a typical BI system with this amount of data. Any single configured scenario, however, is indicative of the type of configuration that would be in a typical deployment. As is typical with storage systems of this class, the logical volumes are configured using some form of RAID depending on the role. This is done to improve reliability, and means that the sum of the logical volume sizes is less than the physical storage available. The available space provided by EMC is: System Physical Space Symmetrix (DMX1000) 20 TB CLARiiON (CX700) 24 TB 11
  • 15. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc 12 The challenges faced, specific to storage Whenever clients begin looking at their storage requirements, they usually speak in terms of storage size, not performance. In fact many times they have no idea what their storage performance needs are. This was the case with Project REAL. When clients are unsure of their storage performance requirements, the person designing the storage has to try to elicit those requirements and then design for what they believe will be adequate. Ideally, when clients request storage they will have the following information for each type of item being stored; • Total storage size needed now and growth projections • Peak (including when the peaks occur), sustained, and average bandwidth requirements (MB/s) for both Read and Write, with future projections • Peak (including when the peaks occur), sustained, and average throughput in IOps (IOs per second) required for both Read and Write, with future projections • Fault tolerance needs • Backup and recovery, including rapid recovery, needs Knowing these things allows a storage solution to be designed to meet the client’s needs. A requirement for this project that differs from a regular implementation is that many configurations and scenarios are to be tested and contrasted with each other. This requires the ability to store multiple copies of the data and the ability to quickly make those copies available to whichever hosts need to use them at that time. This was accomplished through the use of several EMC technologies including Replication Manager. The testing of configurations is an ongoing process and papers such as this one, which are written early in this testing, could be revised or replaced by future papers when new data is available. The solution Since little was known about what was needed on the performance front for either throughput or IOps, a solution was presented that would have strong performance, with the understanding that once more is known about the performance characteristics the solution could be altered. What was known was that about 3 TB of source data would be used as the basis for the system. It was also known that fault tolerance would be needed, as well as a backup and recovery system. Additionally, to test the various configurations the disks and/or files on them would have to be made available to different machines for different tests.
  • 16. Project REAL SQL Server 2005 Storage Management and Availability altered H&F3.doc Knowing that a total of 3 TB of data would be used for the source tells very little about how much storage would be needed for the Relational Data warehouse and the MOLAP storage for the Analysis Services cubes. Once the base relational table design is known the table size can be approximated, but the size of other elements such as indices is still unknown. On the MOLAP front, the size of the cubes can vary substantially based on the number of aggregations performed and other factors. Also, since part of the scheduled testing is to test the impact of different aggregation levels on cube build and query performance, multiple cubes will be maintained on the storage, simultaneously. To facilitate these unusual needs EMC delivered a Symmetrix (DMX1000-P), a CLARiiON (CX700), and a CLARiiON Disk Library (Model DL310). As the project continues and begins the performance testing stage, various EMC software products, such as EMC ControlCenter® (Performance Manager) and EMC Navisphere® Manager (Navisphere Analyzer), will be used to monitor the performance of the storage and provide for adjustments wherever they might be needed. Table Partitioning Before other parts of this document are read, a few things need to be understood first. One of those things is partitioning. Overview There are 2 types of partitioning that can be done to a table, horizontal and vertical. Either way they involve breaking a table down into smaller more manageable and usually faster pieces. Horizontal partitioning involves breaking the table apart into sets of rows and vertical partitioning is sets of columns. Vertical partitioning is usually used to put frequently accessed columns of a table into Part A and the, usually larger, less frequently used columns into Part B. The parts are actually separate tables and need to be joined for the data from both tables to be used; therefore the primary key has to be carried in both tables. A similar but not exact version of vertical partitioning is done when you use text columns in tables. The actual text of the text column is stored separately. Horizontal partitioning is the primary method used by most who implement large databases or other scenarios that might benefit from it. Therefore, it will be the type of partitioning that will be used for the rest of this paper. Horizontal partitioning involves breaking the table into sets of rows. This can be done in many ways. Two of the most common ways are by either using ranges for the partitions or a modulus function on an Identity type value. The modulus function produces a kind of round robin placement of rows between the partitions. This does not provide the type of benefits that are needed so it will not be discussed further. The other is partitioning by ranges. These ranges can be created against any single column in SQL Server 2005. For Project REAL, the tables were partitioned by calendar weeks. There are no defined limitations that say that your partitioning ranges have to be equal, but it is usually done that way for simplicity’s sake. For example it would be perfectly 13
To learn more about how partitioning was implemented for Project REAL, read the paper entitled "Project REAL: Data Lifecycle – Partitioning." For more information about partitioning in SQL Server 2005, read the papers entitled "Partitioned Tables and Indexes in SQL Server 2005" and "Strategies for Partitioning Relational Data Warehouses in Microsoft SQL Server," all of which are referenced at the end of this paper.

Benefits

All of the benefits of partitioning come from being able to work on a table in smaller chunks rather than as one monolithic piece. This provides several performance and management benefits, such as:

• Increased query performance
• Increased maintenance performance
• Increased availability
• Easier-to-manage backup and restore methodologies
• The ability to migrate data to less expensive disks

For example, the data in the sales fact table becomes read only after 8 weeks. With the data partitioned into weekly chunks, something as simple as an index rebuild becomes much more manageable. In a worst-case scenario, where indexes would need to be rebuilt for 8 weeks of data out of the 157 weeks stored, partitioning yields a 95% reduction in the amount of I/O needed (101.9 GB versus 2,000 GB, roughly 5% of the I/O), as shown in the following table.

                    # of Partitions    Size (GB)    I/O Reduction
Table               N/A                2,000.0      0%
1 Partition         1                  12.7         99%
8 Partitions        8                  101.9        95%

Additionally, locks are maintained only on the partitions whose indexes are being rebuilt. The same type of gain applies to other operations, such as queries and backups. The increased query and maintenance performance comes from the smaller units of work that need to be done.
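As a minimal sketch of rebuilding an index one partition at a time (the table name, index name, and partition number are hypothetical):

-- Rebuild only partition 155 of the index, leaving the other 156 partitions
-- untouched; this requires far less I/O than rebuilding the index across the
-- entire table.
ALTER INDEX IX_Sales_Fact_SaleDate
ON dbo.Sales_Fact
REBUILD PARTITION = 155;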
Cube (Measure Group) Partitioning

Cubes (actually measure groups in Analysis Services 2005) are somewhat similar to tables when it comes to partitioning. They reap the same benefits as tables from having more manageable pieces rather than a single large piece. One advantage that partitioned measure groups have over tables is that the partitions of a cube can be distributed among a group of servers by using remote partitions, thereby dividing the workload among the servers. For more information on cube (measure group) partitioning, refer to the paper entitled "Project REAL: Data Lifecycle – Partitioning," referenced at the end of this paper.

Data Migration

Overview

The common practice to date has been for companies to use a sliding window of storage, meaning that they keep their history to a certain depth, such as 3 years, which was the case at B&N. Because of regulatory issues, and the recognition that having more history can be substantially beneficial to analysis, companies sometimes keep their data even longer, but in a lower performance fashion; for example, some companies move older data to tape or other offline media. The newest trend is for companies to keep all of their history but to move it from more expensive, high-performance storage to less expensive, higher-density, lower-performance storage. The latter method is being used for Project REAL. In the primary approach used in this project, there were three tiers of storage for aging data. (A sketch of how these tiers can map to SQL Server filegroups follows the list.)

1. Tier one demands the highest performance. For this tier, 15,000 rpm 73 GB Fibre Channel drives in a RAID 10 configuration were used. This yields the highest performance for both reads and writes. The data stored on tier 1 is the most frequently queried and is also heavily written to during ETL processing. In fact, it is the only tier that contains partitions that can be written to; all other tiers contain read-only partitions.

2. Tier two is on higher density, slower drives: 10,000 rpm 146 GB Fibre Channel drives in a RAID 5 configuration. This provides higher density storage for the partitions, which are read only by the time they reach this tier. Because they are read only, RAID 5 provides adequate performance at a cost appreciably lower than RAID 10. Additionally, the partitions stored here are used much less frequently than those on tier 1.

3. Tier three is stored on the highest density drives: 5,400 rpm 320 GB ATA drives in a RAID 5 configuration. This minimizes the cost of keeping historical data that is infrequently queried but needs to be kept online. Again, these partitions are read only, which means that RAID 5 is adequate for this tier. As drive sizes continue to go up and cost per GB continues to drop, this tier will become a permanent storage point for historical data. The day of the 1 TB hard drive is just around the corner.
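In SQL Server terms, each tier typically corresponds to one or more filegroups whose files are placed on LUNs built from the appropriate class of drives. The following is a minimal sketch; the database name, filegroup names, drive letters, and sizes are hypothetical and do not describe the Project REAL layout.

-- Hypothetical filegroups for the three storage tiers. Drive letters E:, F:,
-- and G: stand in for LUNs carved from RAID 10 Fibre Channel, RAID 5 Fibre
-- Channel, and RAID 5 ATA storage, respectively.
ALTER DATABASE REAL_Warehouse ADD FILEGROUP FG_Tier1_Current;
ALTER DATABASE REAL_Warehouse ADD FILEGROUP FG_Tier2_Recent;
ALTER DATABASE REAL_Warehouse ADD FILEGROUP FG_Tier3_History;

ALTER DATABASE REAL_Warehouse
ADD FILE (NAME = Tier1_Data, FILENAME = 'E:\SQLData\Tier1_Data.ndf', SIZE = 100GB)
TO FILEGROUP FG_Tier1_Current;

ALTER DATABASE REAL_Warehouse
ADD FILE (NAME = Tier2_Data, FILENAME = 'F:\SQLData\Tier2_Data.ndf', SIZE = 500GB)
TO FILEGROUP FG_Tier2_Recent;

ALTER DATABASE REAL_Warehouse
ADD FILE (NAME = Tier3_Data, FILENAME = 'G:\SQLData\Tier3_Data.ndf', SIZE = 1000GB)
TO FILEGROUP FG_Tier3_History;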
In other relational data warehouse testing, very good performance was achieved using concatenated RAID 5. For sequential reads, which make up the bulk of the I/O against the warehouse, the RAID 5 write penalty does not apply. The results showed that in some cases a two-tier storage approach may be sufficient.

Figure 3: Data Migration. Aging ranges across storage platforms: partitions move from primary storage to secondary storage as usage decreases.

Migrating partitions with SQL

Migrating partitions from one logical volume to another requires several steps. Here is an excerpt from the "Project REAL: Data Lifecycle – Partitioning" paper written by Erin Welker, giving a high-level overview of the steps used to move aged data. Initially, this process appears somewhat complex, particularly in a situation such as ours in Project REAL, where the Sales partitioned table has over 150 partitions. However, since all of the steps except the data movement step are metadata operations, the process actually runs very quickly.

1. Create a new partition scheme, based on the existing partition function, that exactly duplicates the existing partition scheme except for the moving partition or partitions. The moving partition boundary in the new partition scheme definition indicates a filegroup on less expensive disk.

2. Create a new partitioned table on top of the new partition scheme.

3. Iterate through each partition and switch it from the old partitioned table to the same partition number in the new partitioned table (both partitioned tables use the same partition function) until the moving partition is reached. (In the original paper's illustration, shaded boxes represent populated partitions and white boxes represent empty partitions.)

4. The moving partition needs to be explicitly copied, since its data is actually moving. This can be done by copying the data directly from the old partition to the new one using INSERT INTO...SELECT, or by using SELECT INTO to create an external table that resides on the same filegroup as the destination partition. As in the initial load, SELECT INTO performed far better than INSERT INTO, so we chose that method.

5. When using the SELECT INTO method, we then need to switch the external table into its ultimate destination in the new partitioned table.

6. Now we iterate through the remaining partitions in the current partition scheme and switch them out to the new partitioned table as we did in step 3.

7. We clean up by dropping the old partitioned table and partition scheme, and renaming the new partitioned table to the original partitioned table name.

For more information about this method, including code samples, please refer to the "Project REAL: Data Lifecycle – Partitioning" white paper.
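A highly abbreviated sketch of steps 3 through 5 follows; the table names, filegroup name, dates, and partition numbers are hypothetical, and the referenced paper contains the complete production code.

-- Step 3 (one iteration): switch a read-only partition from the old
-- partitioned table into the same partition number of the new one.
-- SWITCH is a metadata-only operation; no data is copied.
ALTER TABLE dbo.Sales_Fact
SWITCH PARTITION 9 TO dbo.Sales_Fact_New PARTITION 9;

-- Step 4: SELECT...INTO creates its target table on the database's default
-- filegroup, so point the default at the destination filegroup first, then
-- copy the moving week's rows into a staging table.
ALTER DATABASE REAL_Warehouse MODIFY FILEGROUP FG_Tier2_Recent DEFAULT;

SELECT *
INTO   dbo.Sales_Fact_Moving
FROM   dbo.Sales_Fact
WHERE  Sale_Date >= '2005-06-27' AND Sale_Date < '2005-07-04';

-- Step 5: add a CHECK constraint that matches the target partition's
-- boundaries (matching indexes are also required), then switch the staging
-- table into the new partitioned table.
ALTER TABLE dbo.Sales_Fact_Moving
ADD CONSTRAINT CK_Moving_Week
CHECK (Sale_Date >= '2005-06-27' AND Sale_Date < '2005-07-04');

ALTER TABLE dbo.Sales_Fact_Moving
SWITCH TO dbo.Sales_Fact_New PARTITION 10;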
Migrating partitions / LUNs within the storage array

As can be seen, a substantial number of steps must be performed to migrate partitions by using SQL Server 2005 on its own. EMC offers a storage-system-based feature that allows logical volumes to be moved seamlessly in the background within the same storage system. The key to making this work is using storage-aligned partitions, meaning that a single logical volume contains a single partition, or time slice, of data. After a partition is marked as read only and is ready to be aged, a command is issued directly to the storage array to begin migrating the logical volume from the high-performance storage to the medium-density storage. This move is performed at a low priority so that it completes with negligible impact on host performance, using storage array resources only when they are not otherwise being utilized by higher priority tasks, such as executing queries. After the migration is completed, all host access to the logical volume is directed to the medium-density storage, completely invisibly to SQL Server. Therefore, no changes to the partition function or scheme, no SELECT…INTO operations, and nothing else is required to accomplish the data migration. Also, this is a purely SAN-based operation, which means no additional stress on the servers or the LAN. In addition, should a particular partition later become more important to the business, it can be migrated back to higher performance storage to satisfy more aggressive queries or updates.
Figure 4: LUN / Partition Migration (in-place filegroup migration). Last-quarter data is aged to higher density storage; older data is aged to the highest density storage after one year. LUN migration reduces ETL process complexity and increases storage and database availability.

Finally, all of the discussion in this section has thus far been about migrating the relational data, but what about the MOLAP cube data? It is possible to migrate data using Analysis Services partitions, but it is neither easy nor quick. The supported way to do this with Analysis Services involves dropping the partition from the measure group, adding the partition back into the measure group with the altered storage path, and reprocessing that partition. Using EMC LUN migration, the same task can be performed without any impact to Analysis Services, the server, the LAN, and so on. This is almost exactly the same type of operation as it is with the SQL Server relational engine.

Backup Methodology

EMC NetWorker™, combined with a CLARiiON Disk Library (model DL310), provided the backup solution for Project REAL. For each server, NetWorker schedules were created and enabled. For the operating system and non-relational data, backups are run nightly. For the databases on each server, a full backup is run each Saturday night and incremental backups are run Sunday through Friday nights.
Currently, these backups are run directly from the live data, but future plans are to modify the backups so that they are made from a split BCV mounted to a different host. Additionally, there are plans to modify the backups so that they take advantage of the filegroup/partition-level capabilities of SQL Server 2005 (a brief sketch of this approach appears at the end of this section). This should dramatically reduce the amount of data that needs to be backed up in a single session, since most of the data is read only. The concept is that only the most recent 8 weeks of fact data, the dimensional data, and any supporting data would need to be backed up each time.

Since the only data of any significant size is the dimension and fact data, a quick analysis and extrapolation can show the impact of this type of backup strategy. The following table shows the per-week size of each fact table and the total size of the dimension tables (which are not week based but will grow over time). It also shows these sizes extrapolated out to 1, 3, and 5 years.

                               Avg. Weekly      8-Week        At          At           At
                               Partition (MB)   Backup (MB)   1 Year      3 Years      5 Years
Dimension tables               N/A              N/A           4,600       6,420        8,988
DC Inventory facts             242              1,939         12,847      38,540       64,233
Store Inventory facts          8,226            65,809        435,983     1,307,949    2,179,915
Store Sales facts              1,225            9,801         64,935      194,804      324,674
# of partitions                N/A              8             53          159          265
Total size (MB)                N/A              77,549        518,364     1,541,293    2,568,822
% to be backed up              N/A              100%          16%         5%           3%
Approx. backup space saved
per year (TB)                  N/A              N/A           N/A         72           123

As the table shows, using the backup strategy outlined above, the overall size of the weekly backups at the 3- and 5-year marks would be reduced by 95% and 97%, respectively. Finally, under the basic premise that all weekly backups are kept for historical purposes (regulatory or otherwise), the amount of backup space used would be reduced by 72 to 123 TB per year. Since this scenario has not been implemented yet, the full extent of its limitations has not yet been determined. Between the filegroup/partitioning capabilities of SQL Server, the EMC BCV options (including split mirror and copy-on-first-write), other EMC technology options such as EMC Snap and Clone, the CLARiiON Disk Library, and EMC NetWorker, the number of possibilities is substantial. It is therefore beyond the scope of Project REAL to test all of these permutations. Additionally, which technologies are used, and how, is completely dependent on the goals of the project.
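As a minimal sketch of the filegroup-level approach described above, the following backs up only the primary filegroup and the filegroup holding the still-writable weeks; the database name, filegroup names, and backup path are hypothetical.

-- Back up only the primary filegroup (metadata, dimension, and supporting
-- tables) and the filegroup holding the most recent, still-writable weeks.
-- The read-only filegroups already have their own, unchanging backups.
BACKUP DATABASE REAL_Warehouse
    FILEGROUP = 'PRIMARY',
    FILEGROUP = 'FG_Tier1_Current'
TO DISK = 'V:\Backups\REAL_Warehouse_Current_FG.bak'
WITH INIT, NAME = 'Weekly filegroup backup of writable data';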
Variations on recovery

Recovery is often discussed as a single type of event, when in reality there are many different types of events that can require different types of recovery. These break down into two primary categories:

• Non-catastrophic data integrity events – These include user corruption of data, hardware failure or hardware corruption of data, and many other events that affect the integrity of a portion of the data but do not destroy the entire data set or the geographic area.

• Catastrophic data integrity events – These include a substantial fire, flood, hurricane, or other event that effectively destroys the vast majority of the data and the geographical area, such as the entire building where the servers and storage are housed.

Traditionally, recovery from both types of events involves a similar set of actions, normally some sort of restoration from slow backup media such as tape. This type of recovery can take anywhere from hours to days or weeks, depending on the amount of data corrupted. By utilizing the capabilities of the Symmetrix, the CLARiiON, and the CLARiiON Disk Library, recovery from a non-catastrophic event can be performed in minutes or even seconds. Some of the methods of recovery from non-catastrophic events have already been utilized during Project REAL, and further methods will be utilized as Project REAL's lifecycle continues. The ability to recover from catastrophic events is also supported, but the testing of such a recovery is not currently scheduled.
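On the SQL Server side, the partitioned, filegroup-based layout also lends itself to restoring only the damaged portion of the data after a non-catastrophic event. The following is a minimal sketch of a filegroup-level restore, reusing the hypothetical names from the backup example and assuming SQL Server 2005 Enterprise Edition, which allows the rest of the database to remain online during the restore.

-- Restore only the damaged filegroup from its backup, then bring it back
-- to a consistent point by applying subsequent log backups.
RESTORE DATABASE REAL_Warehouse
    FILEGROUP = 'FG_Tier1_Current'
FROM DISK = 'V:\Backups\REAL_Warehouse_Current_FG.bak'
WITH NORECOVERY;

RESTORE LOG REAL_Warehouse
FROM DISK = 'V:\Backups\REAL_Warehouse_Log.trn'
WITH RECOVERY;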
Storage Hardware

This section discusses the various storage hardware components being used. It also includes some additional background information that is helpful for understanding the tiered storage concepts and data migration at a hardware/RAID array level.

EMC Symmetrix

The EMC Symmetrix DMX™ 1000 is part of the Symmetrix DMX-2 storage family, which was the latest DMX family at the onset of Project REAL. Since then, EMC has introduced the next generation of Symmetrix arrays, the DMX-3. DMX storage arrays are the industry standard for 24 x 7 availability. They provide non-disruptive upgrades, operation, and service, and they can replicate any amount of data, internally or to another DMX, at any time, across any distance, to multiple locations, without impact to service levels. The DMX provides up to 512 GB of mirrored global cache and can support up to 960 Fibre Channel drives of varying size and speed, including 300 GB 10,000 rpm, 73 GB and 146 GB 15,000 rpm, and 500 GB 7,200 rpm drives, providing for in-the-box tiered storage. The DMX1000 provided for Project REAL contains 144 x 146 GB 10,000 rpm Fibre Channel drives and 64 GB of cache. The disks for Project REAL were configured using RAID 1 (mirrored) and RAID 5.

EMC CLARiiON

The EMC CLARiiON CX700 is designed for environments requiring high capacities and maximum application performance. The CX700 can support up to 240 hard drives of varying types. It supports 10,000 and 15,000 rpm Fibre Channel drives, 7,200 and 10,000 rpm SATA drives, and 5,400 and 7,200 rpm ATA drives. This mix allows the CX700 to offer a mixture of storage performance and density (as is used for tiered data migration) within a single enclosure. For Project REAL, the CX700 has 8 GB of system memory, primarily used for caching, and the following drives:

Quantity    Size       RPM        Type
30          73 GB      15,000     Fibre Channel
75          146 GB     10,000     Fibre Channel
30          300 GB     10,000     Fibre Channel
30          320 GB     5,400      ATA

EMC CLARiiON Disk Library

The EMC CLARiiON DL310 Disk Library provides the benefits of disk-based backup and restore in a simple-to-deploy, easy-to-use solution that requires no changes to an environment or its processes. The CLARiiON DL310 Disk Library emulates leading tape solutions and is qualified with most leading backup applications. The DL310 can house up to 45 drives for a usable drive capacity of 12.5 TB. Additionally, the DL310 provides data compression of up to 3:1, which results in a maximum compressed capacity of about 37 TB. For Project REAL, the DL310 currently has a total of 45 x 320 GB 5,400 rpm ATA drives.
Types of RAID arrays used, why

Since various levels of RAID arrays were used, a brief discussion of RAID levels is warranted here. There are four levels of RAID arrays in primary use today:

• RAID 0 – Striping
• RAID 1 – Mirroring
• RAID 3 and 5 – Striping, as in RAID 0, but with an extra disk added for parity
• RAID 10 (0+1, 1+0) – Known as a mirrored stripe or a striped mirror, this level combines RAID 0 and RAID 1 in a single array

RAID 0 is the fastest of all RAID levels, but it has no fault tolerance. Fault tolerance means that if a drive in the array fails, the data can still be recovered. RAID 0 simply stripes the data being written across all of the disks in the array, a predetermined amount per disk. For example, if 8 disks are used and the stripe is set up to put 8 KB on each disk, then a full stripe is 64 KB (8 KB x 8 disks). When a 64 KB extent is written, one eighth of the extent is written to each disk in parallel. The speed advantage is a near-linear growth in both read and write speed as more drives are added. However, since there is no fault tolerance, if any drive in the array is lost, all of the data on all drives is unrecoverable. Because of this lack of fault tolerance, RAID 0 is seldom used on production systems and was not used as part of Project REAL.

RAID 1 by definition must have two, and only two, drives in an array. The host sees these drives as a single unit with the available space of one drive, because everything that is written to the first drive is also written to the second drive, making the second an exact replica of the first; thus the nickname mirroring. In this case, either drive can fail and the data is still available to the host from the other drive.

RAID 3 and RAID 5 are the same except that RAID 3 has a dedicated parity disk, whereas in RAID 5 the parity is distributed across the disks. The difference between the two is not pertinent to this paper; therefore, wherever RAID 3 or RAID 5 is discussed, the material applies to both levels unless otherwise noted. RAID 5 is similar to RAID 0 in that data is striped across all of the drives in the array, but it adds an additional drive to make the array wide enough to store both the data and the parity. In the RAID 0 example, if an extent were being written it would be written as before, but 8 KB of parity information would also be written to a ninth drive. Any one of the drives in the array can be lost, and the combination of the remaining drives and the parity data can be used to reconstruct the full stripe. Therefore, the RAID 5 calculation for the number of drives required is usually referred to as N+1.
For example, if you intend to store 320 GB of data on drives that yield 40 GB of usable storage each, you need 8 drives (N) to store the data and one additional drive (+1) for parity, for a total of 9 (8+1, or N+1) drives in the RAID 5 array.

The added overhead of the parity can make RAID 5 one of the slowest RAID levels when it comes to writes. This is because RAID 5 usually needs 4 physical I/O operations for every logical write. This can sometimes be reduced to 2 I/O operations if either the data where the write will occur is already cached or a full stripe can be written. For this reason, a RAID 5 array is recommended only in situations where minimal writes will occur or where the system's performance does not depend on the write speed to that array. When it comes to reads, RAID 5 performs well and can compete with RAID 0 for speed.

RAID 10 is a combination of RAID 0 and RAID 1, which is why it is sometimes referred to as RAID 0+1, RAID 1+0, or RAID 10. It can be thought of as a set or array (RAID 0) of mirrors (RAID 1). Actually, there is a substantial difference between an array of mirrors and a mirrored array, and you cannot rely on designators like 10, 0+1, or 1+0 to be sure of which you are implementing, because different vendors use the notations to mean different things. For one vendor, 1+0 might mean an array of mirrors, while another vendor considers it to be a mirrored array. Therefore, if getting a specific implementation of RAID 10 is important, and it should be, the vendor's implementation should be verified. The differences between the implementations are too numerous to cover in this paper; however, it is undisputed that an array of mirrors is more fault tolerant than a mirrored array. For this reason, EMC implements RAID 10 as an array of mirrors.

In the example of writing a 64 KB extent to drives at 8 KB each, here is how RAID 10 would work. First, the calculation for determining the number of drives is N x 2, or 2N, which means that double the number of drives is necessary to store the data. Using the 40 GB drives from before and still storing 320 GB of data, 8 drives would be needed to store the data and 8 more drives to store the mirror; therefore, 16 (8+8, or 8x2) drives in total would be needed for the RAID 10 array. Returning to the 64 KB example, the 64 KB would be broken into 8 x 8 KB chunks, and each 8 KB chunk would be written to one of the 8 disks and to its associated mirror.

Because of the dramatic increase in the number of drives (2N) and the associated increase in price, many people decide to go with RAID 5 instead. For write-intensive workloads this is usually a mistake because of how slow RAID 5 is with writes; they end up with an array that underperforms and usually end up upgrading to a RAID 10 array.
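To make the cost/performance trade-off concrete, here is a short worked comparison based on the N+1 and 2N drive counts above. It assumes roughly 150 random I/O operations per second per drive, an illustrative figure only and not a measured Project REAL value, and uses a write penalty of 4 for RAID 5 and 2 for RAID 10 (one write to each side of the mirror).

\[
\text{RAID 5 } (N = 8):\quad 9 \text{ drives},\qquad \frac{9 \times 150}{4} \approx 337 \text{ random writes per second}
\]
\[
\text{RAID 10 } (N = 8):\quad 16 \text{ drives},\qquad \frac{16 \times 150}{2} = 1200 \text{ random writes per second}
\]

Under these assumptions, RAID 10 uses roughly 78 percent more drives but delivers roughly 3.5 times the random write capacity, which is why RAID 10 is used for the writable tier in this project and RAID 5 is reserved for the read-only tiers.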
Types of drives used, why

Three types of drives were used in various places throughout Project REAL:

• 15,000 rpm Fibre Channel hard drives – These drives provide the highest performance at the highest cost. They are used primarily for areas that require high-performance reads and writes.

• 10,000 rpm Fibre Channel hard drives – These provide excellent performance at a lower cost than their 15,000 rpm cousins. Larger capacities are offered in the 10,000 rpm range than in the 15,000 rpm range, which increases data density per disk. These drives can be used in areas that require moderate read and write rates.

• 5,400 rpm ATA hard drives – These drives provide acceptable performance for low-demand situations and have the lowest cost. They come in very large capacities (those used for Project REAL were 320 GB per disk) and provide the highest data density per disk. These drives are recommended for areas that need more space than performance, including areas where infrequently used data must be readily available and is almost exclusively read.
References

Various other papers are referenced throughout this white paper. They are, in no particular order:

• "Project REAL Technical Overview" by Len Wyatt
• "Project REAL: Data Lifecycle – Partitioning" by Erin Welker
• "Partitioned Tables and Indexes in SQL Server 2005" by Kimberly L. Tripp
• "Strategies for Partitioning Relational Data Warehouses in Microsoft SQL Server" by Gandhi Swaminathan

Also, one book was mentioned: Ralph Kimball's "The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling" (http://www.kimballgroup.com/html/books.html).

Project REAL will result in a number of papers, tools, and samples over its lifetime. To find the latest information, visit the following site: http://www.microsoft.com/sql/bi/ProjectREAL