This document discusses SQL Server table partitioning and provides guidance on when it is helpful to use partitioning. It describes the key concepts of partitioning such as partition functions, ranges, schemes and switching partitions. It also outlines some of the fine print around limitations, parallelism, locking and maintenance. The document concludes that the client should use partitioning if their workload exhibits queries by region, they can optimize queries for it, have the disk and memory resources to support it and can test it adequately.
Introduction to SQL Server Partitioning
1. This work is by Kendra Little and is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License
Kendra Little
3. Index
1. A sample case.
2. What is partitioning?
3. When is partitioning helpful?
4. What’s the fine print?
5. Revisiting our sample case.
You are here
5. Index
2. What is partitioning? (You are here)
6. All tables have at least one partition.
“In SQL Server, all tables and indexes in a database are considered partitioned, even if they are made up of only one partition. Essentially, partitions form the basic unit of organization in the physical architecture of tables and indexes. This means that the logical and physical architecture of tables and indexes comprised of multiple partitions mirrors that of single-partition tables and indexes.”
…Partitioned Table and Index Concepts (msdn)
7. “Partitioning” actually means “horizontal partitioning”
Horizontal partitioning takes groups of rows in a single table and allocates them in semi-independent physical sections.
SQL Server’s horizontal partitioning is RANGE based.
8. Horizontal ranges are based on a partition key.
A single column in the table. Just one!
Use a computed column if you must (it has to be persisted), but make sure it performs well as a criterion and works for joins.
Typically a date or integer value
Consider:
• A column you will join on
• A column you can always use as a criterion
I must choose wisely.
9. Ranges of data are defined by a partition function which uses the key.
The partition function defines your boundary points and can use either RANGE LEFT or RIGHT.
LEFT: the first value is an UPPER boundary point in partition #1
RIGHT: the first value is a LOWER boundary point in partition #2
Keep to the right. It’s easier.
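The LEFT/RIGHT rule can be modeled outside the database. The sketch below is an illustration in Python, not SQL Server code: `partition_number` is a hypothetical helper and the two boundary dates are example values. It treats the boundary points as a sorted list and uses binary search to decide which side of a boundary an exact match lands on.

```python
from bisect import bisect_left, bisect_right
from datetime import date

def partition_number(value, boundaries, side="RIGHT"):
    """Return the 1-based partition for value, given sorted boundary points.

    Hypothetical helper for illustration only; SQL Server does this internally.
    """
    if side == "RIGHT":
        # A boundary value is the LOWER bound of the partition to its right.
        return bisect_right(boundaries, value) + 1
    if side == "LEFT":
        # A boundary value is the UPPER bound of the partition to its left.
        return bisect_left(boundaries, value) + 1
    raise ValueError("side must be 'LEFT' or 'RIGHT'")

boundaries = [date(2008, 1, 1), date(2009, 1, 1)]  # example values, assumed

# The same boundary value lands in different partitions under each rule.
on_boundary_right = partition_number(date(2008, 1, 1), boundaries, "RIGHT")
on_boundary_left = partition_number(date(2008, 1, 1), boundaries, "LEFT")
```

Either rule turns N boundary points into N + 1 partitions; the only thing that changes is where the boundary value itself lives.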
10. RIGHT based partition function for Doll Orders keyed on OrderDate
Boundary points: 1/1/2008, 1/1/2009, 1/1/2010, 1/1/2011
Partitions: Partition 1, Partition 2, Partition 3, Partition 4, Partition 5
11. RIGHT based partition function keyed on PartName (effectively LIST)
Boundary Point 1: BODY
Boundary Point 2: SHOE
Partitions: Partition 1, Partition 2, Partition 3
Question: how do we get rows into Partition 1?
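One way to reason about the question: under RANGE RIGHT, partition 1 holds every value that sorts before the first boundary point. Here is a minimal Python sketch of that idea. It assumes plain code-point string comparison (SQL Server compares using the column's collation, so real results may differ), and the part names ARM and HAT are hypothetical values added only to exercise the ranges.

```python
from bisect import bisect_right

BOUNDARIES = ["BODY", "SHOE"]  # boundary points from the slide

def part_partition(part_name):
    """1-based partition under RANGE RIGHT; illustrative helper, not SQL Server code."""
    return bisect_right(BOUNDARIES, part_name) + 1

# 'ARM' and 'HAT' are hypothetical part names used only to exercise the ranges.
placements = {name: part_partition(name) for name in ["ARM", "BODY", "HAT", "SHOE"]}
```

In this model a row reaches partition 1 only when its PartName sorts before BODY. A value that falls between the two boundary points shares a partition with BODY, which is why the LIST behavior is only "effective" when the set of key values is known in advance.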
12. Filegroups are mapped to the partition function using a partition scheme.
Boundary points: 1/1/2008, 1/1/2009, 1/1/2010, 1/1/2011
Partitions: Partition 1 (compressed), Partition 2 (compressed), Partition 3, Partition 4, Partition 5
Filegroups: FG_A, FG_B, FG_C, FG_D
Callout: Slow, read-only
13. Objects are created on the partition scheme.
Table (and indexes)
• Created on partition scheme.
Partition Scheme
• Maps partitions defined by the partition function to physical filegroups
Partition Function
• Boundary points
• Defines ranges
• Defines an algorithm the engine will use to know where to put rows
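The three layers can be sketched as a tiny routing model. This is an illustration in Python rather than SQL Server syntax. The boundary dates and filegroup names come from the slides, but the assignment of partitions to filegroups is an assumption made up for the example, and `filegroup_for` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import date

# Partition function: boundary points (RANGE RIGHT), as on the Doll Orders slide.
BOUNDARIES = [date(2008, 1, 1), date(2009, 1, 1), date(2010, 1, 1), date(2011, 1, 1)]

# Partition scheme: one filegroup per partition, in partition order.
# ASSUMPTION: this particular assignment is invented; the slides do not show it.
SCHEME = ["FG_A", "FG_A", "FG_B", "FG_C", "FG_D"]

def filegroup_for(order_date):
    """Route a row: the function picks the partition, the scheme picks the filegroup."""
    partition = bisect_right(BOUNDARIES, order_date) + 1  # 1-based
    return partition, SCHEME[partition - 1]

# A scheme must name a filegroup for every partition the function defines.
assert len(SCHEME) == len(BOUNDARIES) + 1

routed = filegroup_for(date(2010, 12, 30))
```

The table itself never names a filegroup in this model. It only knows the scheme, which is what lets the same partition function be reused with different physical layouts.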
14. Indexes can be created on the partition scheme. Or not.
Aligned indexes
•Located on your partitioning scheme (or an identical partitioning scheme)
•Must contain the partitioning key.
•If the partitioning key is not specified, it will be added for you. Note: this affects your primary key for the table!
•Indexes are aligned by default unless it is otherwise specified at creation time.
•Perform better for aggregations and when partition elimination can be used.
Non-aligned indexes
•Physically located elsewhere: either non-partitioned or on a non-identical partitioning scheme
•May perform better with single-record lookup
•Allow unique indexes that do not contain the partitioning key
•However, the presence of these precludes partition switching!
15. Switching
• Requires all indexes to be aligned.
• Compatible with filtered indexes
• Data may be switched in or out only within the same filegroup.
• Is a metadata-only operation requiring a schema modification lock. This can be blocked by DML operations, which require a schema stability lock.
• Is an exceptionally fast way to load or remove a large amount of data from a table!
18. Creating the partition scheme
The partition scheme can map each partition to a specific filegroup, or all partitions to a single filegroup such as PRIMARY.
Where the rubber meets the road.
19. Query FGs mapped to the partition function via the partition scheme
This gets a little complicated.
20. Creating a table on the partition scheme and adding some rows.
A partitioned heap: you can totally do that.
21. Let’s have a look at that heap.
We’ll use this query again, but not show it on every slide for obvious reasons.
25. Switching in!
Don’t forget to drop ordersDaily20101230: your staging table is still there, it’s just empty now.
And you’re gonna have to rebuild that non-aligned NC if you want it back.
26. Index
3. When is partitioning helpful? (You are here)
27. Is maintenance a significant problem for availability?
YES
• Partitioning may be what you are looking for.
• Keep checking other factors.
NO
• You may have other reasons to partition, but one of its big benefits is to help with this.
Maintenance includes index rebuilds, loading data, and deleting data.
28. Are query patterns defined by regions?
YES
• Finding regions of data which are queried together and have a good partitioning key is important to good query performance.
• This is the basis of partition elimination.
NO
• You may not have a good partitioning key.
• Keep looking at the query patterns for your workload and evaluating different partitioning keys.
Data regions may be dates, integers, codes
29. Can applications and queries be optimized for partitioning?
YES
• This means you will be able to rewrite some queries and procedures as needed to take advantage of partition elimination.
NO
• If you do not have the ability to tune user and application queries, some will likely perform very poorly.
Some assembly required.
30. Do you have resources to support the partitioned system?
• Can your disk configuration be optimized?
• Is enough buffer pool available for what will need to be read into memory concurrently?
• Will you be able to tune and configure parallelism appropriately for the workload?
• Do you have a system you can test with a production-like workload, or a suitable rollback plan?
31. Index
4. What’s the fine print? (You are here)
33. Support for HOW MANY partitions?
15,000 partitions are available in SQL 2008 with SP2 applied.
SQL Server 2005, 2008 before SP2, and 2008 R2 (for now) are limited to 1,000 partitions. This is less than 3 years for daily partitioning.
What problems could happen with lots of partitions?
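The "less than 3 years" figure is simple arithmetic, sketched here in Python. The helper name is hypothetical, and the model assumes exactly one partition per calendar day with nothing reserved for empty boundary partitions.

```python
def years_of_daily_partitions(partition_limit, days_per_year=365.25):
    """How many years of one-partition-per-day data fit under a partition limit."""
    return partition_limit / days_per_year

years_at_1000 = years_of_daily_partitions(1_000)    # roughly 2.7 years
years_at_15000 = years_of_daily_partitions(15_000)  # roughly 41 years
```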
34. Parallelism
In 2005, a query touching more than one partition typically had only one thread per partition.
In 2008, the Partitioned Table Parallelism improvement allows multiple threads to be used on each partition for parallel plans.
Figure labels: Partition 1! Partition 1! Partition 2! Partition 2! Partition 3! Partition 3!
35. Lock escalation AUTO
Lock escalation can be set to AUTO for a table. If the table is partitioned, locks will escalate to the partition level rather than the table level.
What’s awesome: greater concurrency!
Partition level deadlocks are not awesome. Test your workload (like with any feature).
36. Partition aware seeks
In SQL 2008, the optimizer has been made more clever and has a greater chance at achieving partition elimination. This has been done by:
• Changing the internal representation of a partitioned table to be more optimized for seeking on the PartitionID (even when the table’s CX is on another column)
• Adding a “skip scan” operation to allow the optimizer greater flexibility.
More optimized optimizin.
37. Be careful with your statistics
Statistics are not maintained per partition; they are maintained for the entire index or column. Since the histogram is limited to 200 steps, the statistics can become inaccurate, and on very large tables they may take a long time to update.
Filtered statistics can be used to help with this in 2008: you can create new filtered statistics for your new partition.
This sounds like work.
38. Index rebuilds and compression
Individual partitions cannot be rebuilt online.
The entirety of a partitioned index can be rebuilt online.
Individual partitions can be compressed.
For fact tables with archive data, older partitions can be rebuilt once with compression. Their filegroups can then be made read-only.
I’d better check my maintenance jobs.
39. Switching Feature Compatibility
• Works with replication in 2008 and later
• Some subscribers can have the partitioning scheme, others don’t have to
• This means you can have some subscribers on Standard.
• Works with Change Data Capture (with some special steps)
• Does not work with Change Tracking
@SQLFool replicates her partitioned tables, check out her blog.
40. Index
1. A sample case.
2. What is partitioning?
3. When is partitioning helpful?
4. What’s the fine print?
5. Revisiting our sample case. You are here
42. This work is by Kendra Little and is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License
Resources/ Contact
There is a very large amount of documentation online for horizontal table partitioning. Get my recommendations here:
http://littlekendra.com/resources/partition/
This presentation would not have been possible without whitepapers and blogs by Kimberly Tripp, Michelle Ufford, and Ron Talmage.
• Twitter: @kendra_little
• Email: littlekendra@gmail.com
• LinkedIn: http://www.linkedin.com/in/kendralittle
10 years working with SQL Server
Started with tsql queries for reporting and analytics
then lab administration
then database administration
Some systems engineering expertise, but mostly specialize in SQL Server layer and above.
Major interests: Performance Tuning and Reporting
Minor interests: Process, Communication Patterns
http://www.flickr.com/photos/lara604/3163790771/sizes/l/
Sample client has a reporting system containing both fact and dimension tables.
Fact tables are up to 300GB in size (including all indexes) with up to 1 billion rows.
Dimension tables are up to 200GB in size (including all indexes) with up to 200 million rows.
A middle tier application has been designed to dynamically create and execute queries to run reports custom-designed by clients.
As of SQL 2005, everything is partitioned!
“The key to determining whether a table is partitioned is the table (or index) data_space_id in the sys.indexes catalog view, and whether it has an associated partition scheme in the sys.data_spaces catalog view. All tables that are placed on a partition scheme will have 'PS ' (for partition scheme) as the type for their data_space_id in sys.data_spaces.”
from Ron Talmage: “Partitioned Table and Index Strategies Using SQL Server 2008” http://msdn.microsoft.com/en-us/library/dd578580.aspx
Horizontal partitioning takes groups of rows in a single table and allocates them in semi-independent physical sections.
Forms of partitioning in other products include:
List Partitioning: an explicit list of key values is specified for each partition. (Postgres, MySQL, Oracle) Note: this can be effectively done in SQL Server with RANGE.
Hash Partitioning: a function is defined with an expression that evaluates values in rows to be inserted in the table. (MySQL, Oracle)
Interval Partitioning: similar to range, but new partitions are automatically created (Oracle)
Composite Partitioning: combinations of the above (Oracle)
From “Partitioned Tables and Indexes in SQL Server 2005” (http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx)
Note Using the datetime data type does add a bit of complexity here, but you need to make sure you set up the correct boundary cases. Notice the simplicity with RIGHT because the default time is 12:00:00.000 A.M. For LEFT, the added complexity is due to the precision of the datetime data type. The reason that 23:59:59.997 MUST be chosen is that datetime data does not guarantee precision to the millisecond. Instead, datetime data is precise within 3.33 milliseconds. In the case of 23:59:59.999, this exact time tick is not available and instead the value is rounded to the nearest time tick that is 12:00:00.000 A.M. of the following day. With this rounding, the boundaries will not be defined properly. For datetime data, you must use caution with specifically supplied millisecond values.
Note Partitioning functions also allow functions as part of the partition function definition. You may use DATEADD(ms,-3,'20010101') instead of explicitly defining the time using '20001231 23:59:59.997'.
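The rounding described in the note can be approximated in a few lines. This is a Python model for illustration, not SQL Server's implementation: `round_to_datetime_tick` is a hypothetical helper, and it assumes that datetime stores 1/300-second ticks and that exact halves round up.

```python
from datetime import datetime, timedelta

def round_to_datetime_tick(value):
    """Approximate how the legacy datetime type rounds a millisecond-precision value.

    Illustrative model only: datetime stores 1/300-second ticks, which display
    as .000, .003 and .007. Assumes the input carries whole milliseconds.
    """
    ms = value.microsecond // 1000
    ticks = (ms * 3 + 5) // 10          # nearest 1/300 s (ASSUMPTION: halves round up)
    shown_ms = (ticks * 10 + 1) // 3    # 299 ticks -> 997 ms, 300 ticks -> 1000 ms
    return value.replace(microsecond=0) + timedelta(milliseconds=shown_ms)

too_late = round_to_datetime_tick(datetime(2000, 12, 31, 23, 59, 59, 999000))
last_tick = round_to_datetime_tick(datetime(2000, 12, 31, 23, 59, 59, 997000))
```

In this model `too_late` rolls over to midnight on 1 January 2001, while `last_tick` stays on 31 December. That is the reason a LEFT boundary has to use .997 rather than .999.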
Each doll represents 100 million rows, each recording an order
http://www.flickr.com/photos/lara604/3163790401/sizes/l/in/photostream/
Query largely from Ron Talmage: “Partitioned Table and Index Strategies Using SQL Server 2008” http://msdn.microsoft.com/en-us/library/dd578580.aspx
Note that the partition scheme isn’t specified– it defaults
http://msdn.microsoft.com/en-us/library/ms177411.aspx
“If you frequently run queries that involve an equi-join between two or more partitioned tables, their partitioning columns should be the same as the columns on which the tables are joined. Additionally, the tables, or their indexes, should be collocated. This means that they either use the same named partition function, or they use different ones that are essentially the same, in that they:
Have the same number of parameters that are used for partitioning, and the corresponding parameters are the same data types.
Define the same number of partitions.
Define the same boundary values for partitions.
In this way, the SQL Server query optimizer can process the join faster, because the partitions themselves can be joined. If a query joins two tables that are not collocated or are not partitioned on the join field, the presence of partitions may actually slow down query processing instead of accelerate it.”
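The three conditions in the quote can be written down as a simple check. This is a Python sketch: the `PartitionFunction` class and the function names are hypothetical stand-ins for partition function metadata, and it tests only the three conditions the passage lists.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class PartitionFunction:
    """Hypothetical stand-in for a partition function's metadata."""
    name: str
    parameter_types: Tuple[str, ...]
    boundaries: Tuple[object, ...]

    @property
    def partition_count(self):
        return len(self.boundaries) + 1

def essentially_same(f, g):
    """Apply the three conditions from the quoted passage."""
    return (
        f.parameter_types == g.parameter_types       # same number and types of parameters
        and f.partition_count == g.partition_count   # same number of partitions
        and f.boundaries == g.boundaries             # same boundary values
    )

# Example functions; names and boundary values are invented for illustration.
orders = PartitionFunction("pf_orders", ("datetime",), ("20090101", "20100101"))
details = PartitionFunction("pf_details", ("datetime",), ("20090101", "20100101"))
returns = PartitionFunction("pf_returns", ("datetime",), ("20090101", "20100701"))
```

Here `orders` and `details` would count as collocated even though they are different named functions, while `returns` would not because one boundary value differs.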
Why so many? If you are using daily partitioning for a fact table, 1K partitions limits you to less than three years.
Warning: large numbers of filegroups can affect recovery time.
See http://blogs.msdn.com/b/sqlserverstorageengine/archive/2007/04/22/how-having-too-many-filegroups-can-affect-recovery-time.aspx?wa=wsignin1.0
http://msdn.microsoft.com/en-us/library/ms345599.aspx
“In SQL Server 2008, the internal representation of a partitioned table is changed so that the table appears to the query processor to be a multicolumn index with PartitionID as the leading column. PartitionID is a hidden computed column used internally to represent the ID of the partition containing a specific row. For example, assume the table T, defined as T(a, b, c), is partitioned on column a, and has a clustered index on column b. In SQL Server 2008, this partitioned table is treated internally as a nonpartitioned table with the schema T(PartitionID, a, b, c) and a clustered index on the composite key (PartitionID, b). This allows the query optimizer to perform seek operations based on PartitionID on any partitioned table or index.
Partition elimination is now done in this seek operation.
In addition, the query optimizer is extended so that a seek or scan operation with one condition can be done on PartitionID (as the logical leading column) and possibly other index key columns, and then a second-level seek, with a different condition, can be done on one or more additional columns, for each distinct value that meets the qualification for the first-level seek operation. That is, this operation, called a skip scan, allows the query optimizer to perform a seek or scan operation based on one condition to determine the partitions to be accessed and a second-level index seek operation within that operator to return rows from these partitions that meet a different condition.”
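The two-level idea in the quote can be illustrated with a toy model. This is Python, not the engine's algorithm: the boundary values, sample rows and helper names are all assumptions. The table T(a, b, c) is modeled as a sorted list keyed on (PartitionID, b), and the query filters on a range of a plus an exact value of b.

```python
from bisect import bisect_left, bisect_right

BOUNDARIES = [100, 200]  # partition function on column a, RANGE RIGHT (assumed values)

def partition_id(a):
    """1-based partition for a value of column a."""
    return bisect_right(BOUNDARIES, a) + 1

def build_index(rows):
    """Model the clustered index as sorted (PartitionID, b, a, c) tuples."""
    return sorted((partition_id(a), b, a, c) for a, b, c in rows)

def skip_scan(index, a_low, a_high, b_value):
    """First level: pick partitions from the range on a. Second level: seek b in each."""
    found = []
    for pid in range(partition_id(a_low), partition_id(a_high) + 1):
        i = bisect_left(index, (pid, b_value))  # seek straight to (pid, b_value)
        while i < len(index) and index[i][:2] == (pid, b_value):
            _, b, a, c = index[i]
            if a_low <= a <= a_high:  # residual check inside the edge partitions
                found.append((a, b, c))
            i += 1
    return found

rows = [(50, 7, "x"), (150, 7, "y"), (150, 8, "z"), (250, 7, "w")]  # sample data
result = skip_scan(build_index(rows), a_low=120, a_high=260, b_value=7)
```

Partition 1 is never touched in this run: the range on a eliminates it before any seek on b happens.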