Large scale SQL Server best practices
1 - Consider partitioning large fact tables
Consider partitioning fact tables that are 50 to 100 GB or larger.
Partitioning can provide manageability benefits and often performance benefits:
Faster, more granular index maintenance.
More flexible backup / restore options.
Faster data loading and deleting.
Faster queries when restricted to a single partition.
Typically, partition the fact table on the date key, which enables the sliding window technique and partition elimination.
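Here is a minimal sketch of partitioning on the date key. It is written in Python only so that it can run anywhere. It builds monthly YYYYMMDD boundary keys, emits the matching T-SQL partition function, and reproduces which partition a given key lands in. The sketch assumes a RANGE RIGHT function, where each boundary value starts a new partition. The function name pf_FactSales_DateKey and the monthly grain are hypothetical.

```python
from bisect import bisect_right


def month_start_keys(first_year: int, first_month: int, count: int) -> list[int]:
    """YYYYMMDD keys for the first day of `count` consecutive months."""
    keys = []
    year, month = first_year, first_month
    for _ in range(count):
        keys.append(year * 10000 + month * 100 + 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


def partition_function_ddl(name: str, boundaries: list[int]) -> str:
    """T-SQL for a RANGE RIGHT function: each boundary key starts a new partition."""
    values = ", ".join(str(b) for b in boundaries)
    return f"CREATE PARTITION FUNCTION {name} (int) AS RANGE RIGHT FOR VALUES ({values});"


def partition_number(boundaries: list[int], date_key: int) -> int:
    """1-based partition holding `date_key`, mirroring $PARTITION for a RANGE RIGHT function."""
    return bisect_right(boundaries, date_key) + 1


bounds = month_start_keys(2006, 1, 4)  # 20060101, 20060201, 20060301, 20060401
print(partition_function_ddl("pf_FactSales_DateKey", bounds))
print(partition_number(bounds, 20060215))  # 3: February, since partition 1 holds keys before 20060101
```

With n boundaries there are n + 1 partitions. Under RANGE RIGHT, a key equal to a boundary belongs to the partition on its right.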
2 - Build clustered index on date key of fact table
This supports efficient queries that populate cubes or retrieve a historical data slice.
If you load data in a batch window, set ALLOW_ROW_LOCKS = OFF and ALLOW_PAGE_LOCKS = OFF on the clustered index of the fact table. This helps speed up table scans at query time and helps avoid excessive locking activity during large updates.
Build a nonclustered index on each foreign key. This helps "pinpoint queries" that extract rows based on a selective dimension predicate.
Use filegroups for administration requirements such as backup / restore, partial database availability, etc.
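The index layout for this step might look like the following T-SQL. The Python helper below only assembles the statements. The table dbo.FactSales, its foreign-key columns, and the partition scheme ps_FactSales are hypothetical placeholders, and the scheme is assumed to exist already.

```python
def fact_index_ddl(table: str, date_col: str, fk_cols: list[str], scheme: str) -> list[str]:
    """T-SQL for a date-key clustered index plus one nonclustered index per foreign key."""
    short = table.split(".")[-1]
    statements = [
        # Clustered index on the date key, aligned with the partition scheme.
        # Row and page locks are disabled, so only table-level locks are taken;
        # this suits a table that is loaded in a batch window.
        f"CREATE CLUSTERED INDEX CIX_{short}_{date_col} ON {table} ({date_col}) "
        f"WITH (ALLOW_ROW_LOCKS = OFF, ALLOW_PAGE_LOCKS = OFF) ON {scheme} ({date_col});"
    ]
    for col in fk_cols:
        # One nonclustered index per foreign key for selective "pinpoint" queries.
        statements.append(
            f"CREATE NONCLUSTERED INDEX IX_{short}_{col} ON {table} ({col}) ON {scheme} ({date_col});"
        )
    return statements


for stmt in fact_index_ddl("dbo.FactSales", "DateKey", ["ProductKey", "CustomerKey"], "ps_FactSales"):
    print(stmt)
```

Check one trade-off before adopting the lock options. With ALLOW_PAGE_LOCKS = OFF, the index cannot be reorganized with ALTER INDEX ... REORGANIZE, so maintenance must use REBUILD instead.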
3 - Choose partition grain carefully
Most customers use month, quarter, or year.
For efficient deletes, you must delete one full partition at a time.
It is faster to load a complete partition at a time.
Daily partitions for daily loads may be an attractive option. However, keep in mind that a table can have a maximum of 1,000 partitions (the default limit before SQL Server 2012; SQL Server 2012 and later allow up to 15,000).
If you need parallelism (assuming MAXDOP = 4 or larger), avoid a partition design in which frequent queries touch only 2 or 3 partitions.
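This sketch shows how the grain interacts with the partition limit. It counts the data partitions a retention period needs at each grain, then adds two empty partitions for the ends of the sliding window. The three-year retention period is an illustrative assumption. MAX_PARTITIONS uses the 1,000 limit quoted on this slide.

```python
from datetime import date

MAX_PARTITIONS = 1000  # limit quoted on this slide; SQL Server 2012+ raises it to 15,000


def data_partitions(start: date, end: date, grain: str) -> int:
    """Number of partitions needed to hold data from `start` to `end` (inclusive)."""
    if grain == "day":
        return (end - start).days + 1
    if grain == "month":
        return (end.year - start.year) * 12 + (end.month - start.month) + 1
    if grain == "quarter":
        return (end.year - start.year) * 4 + (end.month - 1) // 3 - (start.month - 1) // 3 + 1
    if grain == "year":
        return end.year - start.year + 1
    raise ValueError(f"unknown grain: {grain}")


def fits(start: date, end: date, grain: str, empty_ends: int = 2) -> bool:
    """True if the data partitions plus the empty sliding-window partitions stay within the limit."""
    return data_partitions(start, end, grain) + empty_ends <= MAX_PARTITIONS


start, end = date(2006, 1, 1), date(2008, 12, 31)  # hypothetical three-year retention
for grain in ("day", "month", "quarter", "year"):
    print(grain, data_partitions(start, end, grain), fits(start, end, grain))
# prints:
# day 1096 False
# month 36 True
# quarter 12 True
# year 3 True
```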
4 - Design dimension tables appropriately
Use integer surrogate keys for all dimensions other than the Date dimension. Use the smallest possible integer type for the dimension surrogate keys; this helps keep the fact table narrow.
Use a meaningful integer date key derived from the DATETIME value (for example, 20060215 for February 15, 2006).
Don't use a surrogate key for the Date dimension. With a meaningful key, it is easy to write queries that put a WHERE clause on this column, which allows partition elimination on the fact table.
Build a clustered index on the surrogate key of each dimension table. Build a nonclustered index on the business key (potentially combined with a row-effective date) to support surrogate key lookups during loads.
Build nonclustered indexes on other frequently searched dimension columns.
Avoid partitioning dimension tables.
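The sketch below covers the two key rules. It derives the YYYYMMDD date key from a date or datetime value, and it picks the smallest SQL Server integer type that can hold a dimension's surrogate keys. It assumes surrogate keys start at 1 (for example, an IDENTITY(1,1) column), so only the positive range of each type is used. The function names are hypothetical.

```python
from datetime import date, datetime
from typing import Union

# Largest positive value of each SQL Server integer type, smallest type first.
SQL_INT_TYPES = [
    ("tinyint", 255),
    ("smallint", 32_767),
    ("int", 2_147_483_647),
    ("bigint", 9_223_372_036_854_775_807),
]


def date_key(value: Union[date, datetime]) -> int:
    """Meaningful YYYYMMDD key; the time part of a datetime is dropped."""
    return value.year * 10000 + value.month * 100 + value.day


def key_to_date(key: int) -> date:
    """Inverse of date_key, handy for checking keys."""
    return date(key // 10000, key // 100 % 100, key % 100)


def smallest_key_type(max_members: int) -> str:
    """Smallest integer type for surrogate keys 1..max_members (assumes keys start at 1)."""
    for type_name, max_value in SQL_INT_TYPES:
        if max_members <= max_value:
            return type_name
    raise ValueError("too many members for any SQL Server integer type")


print(date_key(datetime(2006, 2, 15, 13, 45)))  # 20060215
print(smallest_key_type(200), smallest_key_type(30_000), smallest_key_type(5_000_000))
# tinyint smallint int
```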
5 - Write effective queries for partition elimination
Whenever possible, place a query predicate (WHERE condition) directly on the partitioning key (the Date dimension key) of the fact table.
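The effect of a direct predicate on the partitioning key can be modeled by asking which partitions a date-key range overlaps. This hypothetical helper assumes monthly RANGE RIGHT boundaries and a literal BETWEEN predicate on DateKey. It does not model predicates that reach DateKey only through a join to the Date dimension; the optimizer may not use those for elimination as reliably.

```python
from bisect import bisect_right


def partitions_touched(boundaries: list[int], lo: int, hi: int) -> list[int]:
    """1-based partitions a RANGE RIGHT table must read for WHERE DateKey BETWEEN lo AND hi."""
    if lo > hi:
        return []
    first = bisect_right(boundaries, lo) + 1
    last = bisect_right(boundaries, hi) + 1
    return list(range(first, last + 1))


# Hypothetical monthly boundaries for 2006: 12 boundaries -> 13 partitions.
bounds = [2006 * 10000 + month * 100 + 1 for month in range(1, 13)]

# Models a query such as:
#   SELECT SUM(f.SalesAmount) FROM dbo.FactSales AS f
#   WHERE f.DateKey BETWEEN 20060210 AND 20060315;
print(partitions_touched(bounds, 20060210, 20060315))  # [3, 4]: February and March only
```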
6 - Use Sliding Window technique to maintain data
Maintain a rolling time window for online access to the fact tables: load the newest data and unload the oldest data.
Always keep empty partitions at both ends of the partition range. This guarantees that the partition split (before loading new data) and the partition merge (after unloading old data) do not incur any data movement.
Avoid splitting or merging populated partitions. This can be extremely inefficient: it may generate as much as 4 times more log and cause severe locking.
Create the load staging table in the same filegroup as the partition you are loading.
Create the unload staging table in the same filegroup as the partition you are deleting.
Loading the newest full partition at one time is fastest. However, this is only possible when the partition size matches the data load frequency (for example, one partition per day and one load per day).
If the partition size doesn't match the data load frequency, load the latest partition incrementally.
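The sliding window can be modeled as a list of boundaries plus a row count per partition. This toy model is not SQL Server itself. It refuses any split or merge that would touch a populated partition, so running one monthly slide through it shows why an empty partition is needed at each end. The class, boundary values, and row counts are illustrative assumptions. Each method's comment names the T-SQL statement it stands in for.

```python
from bisect import bisect_right, insort


class SlidingWindow:
    """Toy model of a RANGE RIGHT partitioned fact table: row counts per partition only."""

    def __init__(self, boundaries: list[int]):
        self.boundaries = sorted(boundaries)
        self.rows = [0] * (len(self.boundaries) + 1)  # n boundaries -> n + 1 partitions

    def split(self, boundary: int) -> None:
        # Stands in for ALTER PARTITION FUNCTION ... SPLIT RANGE (boundary); in SQL Server,
        # ALTER PARTITION SCHEME ... NEXT USED must name a filegroup for the new partition first.
        if boundary in self.boundaries:
            raise ValueError(f"boundary {boundary} already exists")
        i = bisect_right(self.boundaries, boundary)  # 0-based partition being divided
        if self.rows[i]:
            raise RuntimeError(f"split would move rows out of populated partition {i + 1}")
        insort(self.boundaries, boundary)
        self.rows.insert(i, 0)

    def merge(self, boundary: int) -> None:
        # Stands in for ALTER PARTITION FUNCTION ... MERGE RANGE (boundary).
        i = self.boundaries.index(boundary)  # partitions i and i + 1 (0-based) become one
        if self.rows[i] or self.rows[i + 1]:
            raise RuntimeError(f"merge at {boundary} would move rows between partitions")
        del self.boundaries[i]
        del self.rows[i + 1]

    def switch_in(self, partition: int, rows: int) -> None:
        # Stands in for ALTER TABLE <stage> SWITCH TO <fact> PARTITION n; the target must be empty.
        if self.rows[partition - 1]:
            raise RuntimeError(f"partition {partition} is not empty")
        self.rows[partition - 1] = rows

    def switch_out(self, partition: int) -> int:
        # Stands in for ALTER TABLE <fact> SWITCH PARTITION n TO <unload stage>.
        rows, self.rows[partition - 1] = self.rows[partition - 1], 0
        return rows


# Jan..Mar 2006 loaded; partition 1 (< 20060101) and partition 5 (>= 20060401) stay empty.
w = SlidingWindow([20060101, 20060201, 20060301, 20060401])
for partition, rows in [(2, 1000), (3, 1100), (4, 1200)]:
    w.switch_in(partition, rows)

w.split(20060501)           # divide the empty right-end partition: April gets partition 5
w.switch_in(5, 1300)        # load April from its staging table
unloaded = w.switch_out(2)  # unload January (1000 rows) to the unload staging table
w.merge(20060101)           # partitions 1 and 2 are both empty, so no data moves
print(w.boundaries, w.rows)
# [20060201, 20060301, 20060401, 20060501] [0, 1100, 1200, 1300, 0]
```

After the slide, there is again exactly one empty partition at each end, ready for the next month.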
7 - Efficiently load the initial data
Use the SIMPLE or BULK_LOGGED recovery model during the initial data load.
Create the partitioned fact table with the clustered index.
Create non-indexed staging tables for each partition, and separate source data files for populating each partition.
Load each staging table from its source file (these loads can run in parallel).
Build a clustered index on each staging table that matches the fact table's clustered index, then create the appropriate CHECK constraints.
SWITCH each staging table into its partition of the partitioned table.
Build nonclustered indexes on the partitioned table.
It is possible to load 1 TB in under an hour on a 64-CPU server with a SAN capable of 14 GB/sec throughput (non-indexed table).
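The CHECK constraint on each staging table must match the range of the partition it will be switched into, or the SWITCH fails. The sketch below derives that constraint from RANGE RIGHT boundaries and emits the two statements needed per staging table. The dbo.FactSales_Stage_<n> naming convention and the DateKey column are hypothetical.

```python
def staging_check(boundaries: list[int], partition: int, col: str = "DateKey") -> str:
    """CHECK predicate matching 1-based `partition` of a RANGE RIGHT partition function."""
    parts = []
    if partition > 1:
        parts.append(f"{col} >= {boundaries[partition - 2]}")
    if partition <= len(boundaries):
        parts.append(f"{col} < {boundaries[partition - 1]}")
    if partition > 1:
        # Under RANGE RIGHT, NULL keys belong to partition 1, so exclude them elsewhere.
        parts.append(f"{col} IS NOT NULL")
    return " AND ".join(parts)


def load_statements(fact: str, boundaries: list[int], partition: int, col: str = "DateKey") -> list[str]:
    """Constraint + SWITCH for one staging table (hypothetical <fact>_Stage_<n> naming)."""
    stage = f"{fact}_Stage_{partition}"
    name = "CK_" + stage.split(".")[-1]
    return [
        # WITH CHECK validates existing rows, so the constraint is trusted and SWITCH can rely on it.
        f"ALTER TABLE {stage} WITH CHECK ADD CONSTRAINT {name} "
        f"CHECK ({staging_check(boundaries, partition, col)});",
        # Metadata-only move; the staging table must sit on the target partition's filegroup
        # and have the same columns and clustered index as the fact table.
        f"ALTER TABLE {stage} SWITCH TO {fact} PARTITION {partition};",
    ]


bounds = [20060101, 20060201, 20060301]
for stmt in load_statements("dbo.FactSales", bounds, 2):
    print(stmt)
```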
8 - Efficiently delete old data
Use partition switching whenever possible.
To delete millions of rows from nonpartitioned, indexed tables:
Avoid DELETE FROM ... WHERE ...
It causes huge locking and logging issues.
Rollback is long if the delete is canceled.
It is usually faster to:
INSERT the records to keep into a non-indexed table.
Create the index(es) on that table.
Rename the new table to replace the original.
Another alternative is to update the rows to mark them as deleted, then delete them later during a non-critical time.
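The keep-and-swap sequence can be run end to end with Python's built-in sqlite3 module standing in for SQL Server. SQLite does not reproduce SQL Server's locking and logging behavior, so this sketch only demonstrates the order of operations. In SQL Server, the steps would typically be INSERT ... SELECT (or SELECT ... INTO) into a heap, then CREATE INDEX, then sp_rename. Table and index names are hypothetical, and keep_where is assumed to be trusted SQL text.

```python
import sqlite3


def purge_by_copy(conn: sqlite3.Connection, table: str, keep_where: str, index_ddl: list[str]) -> None:
    """Keep-and-swap delete: copy the rows to keep, index the copy, then swap table names.

    keep_where and index_ddl are trusted SQL text; {t} in index_ddl is replaced by the new table name.
    """
    new, old = f"{table}_keep", f"{table}_old"
    # 1. Copy the surviving rows into a new, non-indexed table.
    conn.execute(f"CREATE TABLE {new} AS SELECT * FROM {table} WHERE {keep_where}")
    # 2. Build the indexes only after the copy is complete.
    for ddl in index_ddl:
        conn.execute(ddl.format(t=new))
    # 3. Swap names so the new table replaces the original, then drop the original.
    conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
    conn.execute(f"ALTER TABLE {new} RENAME TO {table}")
    conn.execute(f"DROP TABLE {old}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date_key INTEGER, amount REAL)")
conn.execute("CREATE INDEX ix_sales_date ON sales (date_key)")
rows = [(20050101 + i, 1.0) for i in range(10)] + [(20060101 + i, 2.0) for i in range(5)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
conn.commit()

purge_by_copy(conn, "sales", "date_key >= 20060101", ["CREATE INDEX ix_{t}_date ON {t} (date_key)"])
print(conn.execute("SELECT COUNT(*), MIN(date_key) FROM sales").fetchone())  # (5, 20060101)
```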
9 - Manage statistics manually
Statistics on partitioned tables are maintained for the table as a whole.
Manually update statistics on large fact tables after loading new data.
Manually update statistics after rebuilding an index on a partition.
If you regularly update statistics after periodic loads, you may turn off autostats on that table.
Updating statistics right after each load is important for optimizing queries that read only the newest data, because that data is not yet reflected in the existing statistics.
Updating statistics on small dimension tables after incremental loads may also help performance. Use the FULLSCAN option when updating statistics on dimension tables for more accurate query plans.
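A post-load maintenance script that follows these rules could be generated like this. The statements themselves (sp_autostats, ALTER INDEX ... REBUILD PARTITION, UPDATE STATISTICS ... WITH FULLSCAN) are standard T-SQL. The table and index names, and the choice to disable autostats, are assumptions for illustration.

```python
from typing import Optional


def post_load_stats(
    fact: str,
    dims: list[str],
    rebuilt: Optional[tuple[str, int]] = None,
    disable_autostats: bool = False,
) -> list[str]:
    """T-SQL to run after each periodic load (all object names are placeholders)."""
    statements = []
    if disable_autostats:
        # Effectively a one-time setting; only appropriate if this script runs after every load.
        statements.append(f"EXEC sp_autostats '{fact}', 'OFF';")
    if rebuilt is not None:
        index_name, partition = rebuilt
        statements.append(f"ALTER INDEX {index_name} ON {fact} REBUILD PARTITION = {partition};")
    # Statistics describe the whole table, so refresh them after the load or partition rebuild.
    statements.append(f"UPDATE STATISTICS {fact};")
    # Dimension tables are small, so a full scan is affordable and gives more accurate plans.
    statements.extend(f"UPDATE STATISTICS {d} WITH FULLSCAN;" for d in dims)
    return statements


script = post_load_stats(
    "dbo.FactSales",
    ["dbo.DimProduct", "dbo.DimCustomer"],
    rebuilt=("CIX_FactSales_DateKey", 5),
    disable_autostats=True,
)
print("\n".join(script))
```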
10 - Consider efficient backup strategies
Backing up the entire database may take a significant amount of time for a very large database.
For example, backing up a 2 TB database to a 10-spindle RAID-5 disk on a SAN may take 2 hours (at a rate of 275 MB/sec).
Snapshot backup using SAN technology is a very good option.
Reduce the volume of data that must be backed up regularly:
Mark the filegroups for the historical partitions as READ ONLY.
Perform a filegroup backup once, when a filegroup becomes read-only.
Perform regular backups only on the read / write filegroups.
Note that RESTOREs of the read-only filegroups cannot be performed in parallel.
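The backup-time figure is simple arithmetic, and the same function shows why backing up only the read/write filegroups helps. The sketch also emits the one-time and routine filegroup backup statements. The database name, filegroup names, backup directory, and the 200 GB read/write size are hypothetical.

```python
import ntpath  # Windows-style path joining, regardless of the OS running this sketch


def backup_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours to stream `size_gb` (decimal GB) at a sustained `rate_mb_per_s`."""
    return size_gb * 1000 / rate_mb_per_s / 3600


def filegroup_backup_plan(db: str, historical_fgs: list[str], backup_dir: str) -> list[str]:
    statements = []
    for fg in historical_fgs:
        # One-time steps when a historical filegroup stops changing.
        # Changing a filegroup to read-only requires exclusive access to the database.
        statements.append(f"ALTER DATABASE {db} MODIFY FILEGROUP {fg} READ_ONLY;")
        path = ntpath.join(backup_dir, f"{db}_{fg}.bak")
        statements.append(f"BACKUP DATABASE {db} FILEGROUP = '{fg}' TO DISK = '{path}';")
    # Routine job: back up only the filegroups that still change (a partial backup).
    path = ntpath.join(backup_dir, f"{db}_rw.bak")
    statements.append(f"BACKUP DATABASE {db} READ_WRITE_FILEGROUPS TO DISK = '{path}';")
    return statements


print(f"{backup_hours(2000, 275):.2f} h")  # 2.02 h for the slide's 2 TB at 275 MB/sec
print(f"{backup_hours(200, 275):.2f} h")   # 0.20 h if only a hypothetical 200 GB is read/write
for stmt in filegroup_backup_plan("SalesDW", ["FG2005", "FG2006"], "X:\\Backup"):
    print(stmt)
```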
Reference
MSDN Blogs
SQLCAT
