Improving Production SQL Server Operations
July 15, 2002
Avoiding common problems that cause unexpected SQL Server outages leads to great performance and
availability in your production SQL Server environment. Examples include detecting capacity issues
and data anomalies and avoiding surprise availability loss.
Ensuring SQL Server performance and availability requires that DBAs implement processes and tools
to effectively operate and support SQL Server. For guidance, DBAs can use the Microsoft Operations
Framework (MOF). MOF is a comprehensive resource providing SQL Server DBAs with guidance in the
form of white papers, operations guides, assessment tools, best practices, case studies, templates,
support tools, and services. The purpose of this white paper is to outline some of the MOF concepts
and processes and the NetIQ SQL management tools that can be quickly implemented in a SQL Server
environment to provide immediate results in production operations.
Contents
• Improving Production Operations of SQL Server
• Service-Level Expectations
• Best Practices
• Operations Management
• Support Management
• Change Management
• Capacity Management
• For More Information
Improving Production Operations of SQL Server
Microsoft SQL Server provides a reliable and scalable database solution. However, unless you apply
some basic processes to managing your SQL Server environment, it is likely you will encounter
common problems, unpredictable performance, unexpected hardware upgrades, and the resultant
For the DBA, applying basic processes and solutions to improve SQL Server availability and
performance produces happier users and a more stable, predictable work environment. Of course,
satisfied users and stability also help the organization the DBA supports. During database outages,
user productivity suffers, as does the credibility of and confidence in the corporate IT department.
The Microsoft Operations Framework (MOF) provides technical guidance on effectively operating
and supporting Microsoft technologies, including SQL Server. MOF is extremely comprehensive,
offering white papers, operations guides, assessment tools, best practices, case studies, templates,
support tools and services. This extensive knowledge set helps address the people, process,
technology, and management issues related to maintaining high availability and performance. Using
examples from the NetIQ SQL Management Suite, this white paper outlines several of the MOF best
practices for SQL Server, and shows you how to quickly improve production SQL Server operations.
Service-Level Expectations
The first step toward improving production SQL Server operations is identifying the level of
availability and performance SQL Server users need to keep the business running. Then, you need to
determine if that level of service is possible with the systems, processes, and budget available.
Service Level Agreements (SLAs) have grown in popularity and evolved since they were first
introduced in the 1960s. They began as a way for IT departments to measure technical services such
as data center uptime and have become the much broader-reaching SLAs of today. Now, SLAs are
developed through detailed back-and-forth discussions between IT departments and business
customers. Customers often use outside sources for research, benchmarking, and final negotiations.
For a quicker solution, we recommend developing service-level expectations. Even if you do not have
the time and resources to develop formal SLAs, you can identify key metrics to determine if
SQL Server availability and performance meet the needs of your business users.
Service-level expectation criteria include:
• Acceptable transaction response times
• Daily/weekly maintenance time
• Acceptable recovery times
• Acceptable information loss (if any)
• Fault reporting and resolution system
• Possible budgetary impact
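The measurable criteria in the list above can be written down as data and compared with what you actually observe. The following Python sketch is one possible way to do that. The class names, field names, and units are hypothetical and chosen only for illustration; they do not come from MOF or from any NetIQ product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevelExpectation:
    """Hypothetical targets agreed with business users."""
    max_response_seconds: float          # acceptable transaction response time
    max_weekly_maintenance_hours: float  # weekly maintenance time
    max_recovery_minutes: float          # acceptable recovery time
    max_data_loss_minutes: float         # acceptable information loss (0 = none)


@dataclass(frozen=True)
class MeasuredService:
    """Values observed during one reporting period (same units as above)."""
    response_seconds: float
    weekly_maintenance_hours: float
    recovery_minutes: float
    data_loss_minutes: float


def unmet_expectations(target, measured):
    """Return the names of the criteria whose measured value exceeds its target."""
    checks = [
        ("response time", measured.response_seconds, target.max_response_seconds),
        ("maintenance time", measured.weekly_maintenance_hours,
         target.max_weekly_maintenance_hours),
        ("recovery time", measured.recovery_minutes, target.max_recovery_minutes),
        ("information loss", measured.data_loss_minutes, target.max_data_loss_minutes),
    ]
    return [name for name, actual, limit in checks if actual > limit]
```

An empty result means every expectation was met for the period. Any names returned are the areas to raise with business users or to target for improvement.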
Once you determine acceptable criteria against which you want to measure expectations, there are
many automated monitoring and management tools to help track availability and performance, such
as NetIQ AppManager. These solutions constantly monitor the SQL Server environment, and alert
you when exception conditions occur.
NetIQ AppManager automatically runs user transactions at regular intervals and tracks response times.
The spikes in the above screenshot highlight exceptions to normal response times.
Best Practices
After service-level expectations are identified in your production operation, focus improvement
efforts on the following four key areas:
• Operations management – Managing current SQL Server performance and tracking values over
a longer period to identify trends
• Support management – Troubleshooting and avoiding data problems
• Change management – Identifying changes made to systems and how changes differ from
systems in a model or golden state
• Capacity management – Ensuring current systems are optimized and accurately predicting when
new systems are necessary
You can improve production operations by implementing best practices in any or all of these areas.
Operations Management
Operations management involves ensuring SQL Server runs at optimal performance. It is important to
monitor performance from both a short-term and a long-term perspective.
To address immediate problems, you need to monitor and know about outages and trouble areas
before users experience them. You need real-time, immediate monitoring of the following SQL Server
performance areas:
• Buffer cache hit ratio
• CPU and I/O usage
• Full scans
• Lock waits
• Memory usage
• Network activity
• Oldest open transactions
• Page life expectancy
• Page lookups and requests
• Page splits
• Physical disk activity
• Procedure cache hit ratios
• Procedure cache sizes
• Read ahead pages
• Replication latency and transaction counts
• Server response times
• SQL batches
• SQL compilations and recompilations
• Table lock escalations
• Tempdb usage
• Work files and work tables
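Some of the values listed above are ratios that have to be derived from raw counters. As a hedged illustration, the Python sketch below computes the buffer cache hit ratio from rows shaped like those returned by the `sys.dm_os_performance_counters` view, which is found in SQL Server versions later than the ones current when this paper was written. The query is shown for context only, and the function name is hypothetical. The sketch does not connect to a server.

```python
# Query a DBA could run to fetch the two raw counters. Shown for context only.
COUNTER_QUERY = """
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Buffer cache hit ratio%'
"""


def buffer_cache_hit_ratio(rows):
    """Compute the hit ratio in percent from (counter_name, cntr_value) rows.

    The raw ratio counter is only meaningful when divided by its matching
    "base" counter. Counter names are stripped because the view pads them
    with trailing spaces. Returns None when a counter is missing or the
    base is zero.
    """
    values = {name.strip().lower(): value for name, value in rows}
    hits = values.get("buffer cache hit ratio")
    base = values.get("buffer cache hit ratio base")
    if hits is None or not base:
        return None
    return 100.0 * hits / base
```

Storing the computed percentage with a timestamp at each sampling interval gives you the history needed for the longer-term view.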
Tracking these values over time allows you to determine if a short-term anomaly is really an
exception to your performance record or a regular occurrence at a specific time. For example, does a
spike in I/O usage on Tuesday morning represent a one-time exception or an event that occurs every
Tuesday morning?
Tracking performance over time also helps when an alternate DBA needs to work on an unfamiliar
system. Instead of having to guess if current system loads are normal, this DBA can use real historical
information to recognize normal conditions versus exceptions that need to be addressed.
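One simple way to separate a one-time exception from a regular occurrence is to group historical samples by day of week and hour and check how often each slot is unusually high. The Python sketch below is a minimal illustration under stated assumptions: the function name, the "twice the median" spike rule, and the 75 percent cutoff are arbitrary choices for the example, not values taken from this paper or from any monitoring product.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def recurring_spike_slots(samples, factor=2.0, min_share=0.75):
    """Find (weekday, hour) slots in which a metric is high most of the time.

    samples: iterable of (datetime, value) pairs for a single metric.
    A sample counts as a spike when it exceeds factor * overall median.
    A slot is reported when it has more than one sample and at least
    min_share of its samples are spikes. Weekday 0 is Monday.
    """
    samples = list(samples)
    if not samples:
        return []
    baseline = median(value for _, value in samples)
    flags_by_slot = defaultdict(list)
    for when, value in samples:
        flags_by_slot[(when.weekday(), when.hour)].append(value > factor * baseline)
    return sorted(
        slot
        for slot, flags in flags_by_slot.items()
        if len(flags) > 1 and sum(flags) / len(flags) >= min_share
    )
```

A spike that appears in the returned slots is part of the normal pattern for that time of week. A spike in a slot that is not returned is more likely a genuine exception that needs attention.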
NetIQ DiagnosticManager stores data for more than 20 specific performance areas, and can give
you either a real-time view or a view of performance over time.
Support Management
Support management requires the management of data issues. DBAs commonly receive requests for
tracing and tracking changes to data. For example, “Who released that purchase order?” or “Is it a
user or an application that is entering this incorrect data?”
A common way to determine the cause of data changes is to investigate stored procedures/triggers to
try to pinpoint where the application changed data in a specific way, as well as the events leading to
the user or application making the changes. This in-depth investigation can be challenging, especially
in complex environments or for complex applications.
Another common situation involves the discovery of seemingly small changes that have huge
impacts on both database and user productivity. For example, a company updates its customer
database from ZIP Codes to ZIP+4 with an update procedure but forgets that its customer database
also includes Canadian customers. The minor update effectively eliminates the company's ability to
ship to thousands of customers. The company restores its backup from the previous evening and asks
users to re-key any data entered throughout the day.
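A mistake like the ZIP+4 update can often be caught before it is committed by checking how many rows a change touches against how many it should touch. The Python sketch below illustrates the idea with the standard-library `sqlite3` module, which stands in for SQL Server only so that the example runs anywhere. The `customers` table, its columns, the `-0000` suffix, and both function names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory customer table with US and Canadian rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT, postal_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (country, postal_code) VALUES (?, ?)",
        [("US", "77056"), ("US", "10001"), ("CA", "K1A 0B1")],
    )
    conn.commit()
    return conn


def guarded_update(conn, update_sql, expected_count_sql):
    """Apply update_sql only if it touches exactly the expected number of rows.

    expected_count_sql must return a single number: how many rows the change
    is supposed to affect. If the UPDATE touches a different number of rows,
    it is rolled back and the data is left unchanged. Returns True when the
    change was committed.
    """
    expected = conn.execute(expected_count_sql).fetchone()[0]
    cursor = conn.execute(update_sql)  # runs inside an implicit transaction
    if cursor.rowcount != expected:
        conn.rollback()
        return False
    conn.commit()
    return True
```

With the demo table, an UPDATE without a `WHERE country = 'US'` clause touches three rows while only two are expected, so it is rolled back and the Canadian postal code survives. The same check on SQL Server would use an explicit transaction and the affected row count reported by the server.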
Using the SQL Server transaction log can help you avoid scenarios like the one described above.
NetIQ RecoveryManager (powered by Lumigent Log Explorer) is the only tool that exposes the
contents of the transaction log to the SQL DBA. RecoveryManager allows you to research the source
of data changes and back out transactions in error while the database is online, keeping SQL Server
available while fixing problems that would otherwise be time- and resource-intensive.
NetIQ RecoveryManager makes data in the SQL Server transaction log useful. It provides online
undo capability for individual transactions, restores dropped tables, and permits investigation of
data change history by users and applications.
Change Management
In most organizations, multiple people have access to servers. Multiple DBAs might share
responsibility for a computer. Even more common, the DBA has to share access to a computer with
the Windows or network administrator. In these environments, it is vital to know what changes are
made to SQL Servers and how those changes compare to a known-good working state. When
changes are not tracked, it is easy for problems to occur. For example, a DBA and a network
administrator share the administration of a SQL Server database server. The network administrator
decides to make the server a backup domain controller (BDC) as a precaution in case there is a
problem with the network's primary domain controller (PDC). SQL Server performance remains fine
until the PDC fails and the BDC is promoted, at which point the server must also carry the domain
controller workload and SQL Server performance degrades. Being aware of this type of change in a
production SQL Server environment is definitely a best practice.
SQL Server areas for which to maintain change histories include the following:
• Environment (hardware and operating system)
• Application software and services
• Database schema (tables, columns, indexes)
• Security (permissions, groups, roles)
• SQL Server configuration settings
NetIQ ConfigurationManager keeps a history of more than 200 configuration settings affecting
performance. These settings cover the hardware, the operating system, and the SQL Server
instances themselves.
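Comparing a server with a known-good state comes down to comparing two snapshots of settings. The Python sketch below shows the idea in its simplest form. The function name and the setting names used in the usage note are hypothetical, and the sketch says nothing about how any particular product collects or stores its snapshots.

```python
def diff_against_baseline(baseline, current):
    """Compare a current settings snapshot with a known-good baseline.

    Both arguments map a setting name to its value. Returns a dict that maps
    each differing setting name to a (baseline_value, current_value) pair.
    A setting that is absent from one side is reported with None on that
    side, so the sketch assumes no setting legitimately has the value None.
    """
    changes = {}
    for name in sorted(set(baseline) | set(current)):
        old, new = baseline.get(name), current.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes
```

For example, if the baseline records a hypothetical `server role` of `member server` and the current snapshot records `backup domain controller`, the result contains that one entry, which is exactly the kind of change the DBA in the scenario above needed to know about.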
Capacity Management
With the constant demands and everyday fire fighting involved in database administration, capacity
management and planning are often ignored. Unfortunately, ignoring capacity management usually
results in unpredicted and unbudgeted hardware requests.
Ensuring the optimized use of current servers is the first best practice in capacity management. Are
stored procedures or triggers consistently performing poorly? If so, direct your performance tuning
efforts towards them to get the best return from available hardware.
Of course, even with tuning, you need to address database growth. Businesses today have an
insatiable need for data, and they want immediate access to that data. To understand how
SQL Servers are growing, you need to gather long-term growth statistics on:
• Database growth
• Table growth
• Index growth
• Table fragmentation
Understanding how table and index growth affect overall database growth over time makes it
easier to predict when hardware upgrades will be required.
DiagnosticManager highlights the databases with the greatest growth both in space and rows.
Within each database, the product displays the tables that have been responsible for that growth.
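Once long-term growth statistics are available, even a straight-line fit gives a rough estimate of when current storage will run out. The Python sketch below is a minimal illustration that assumes growth is roughly linear. The function name and the units (days and megabytes) are hypothetical, and real growth is often seasonal or accelerating, so treat the result as a first approximation only.

```python
def days_until_full(history, capacity_mb):
    """Estimate the days remaining until a database reaches capacity_mb.

    history: list of (day_number, size_mb) measurements containing at least
    two different day numbers. Fits a least-squares straight line through
    the points and extrapolates it. Returns None when the fitted trend is
    flat or shrinking, and 0.0 when the fitted line has already reached
    capacity by the last measured day.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_day = sum(day for day, _ in history) / n
    mean_size = sum(size for _, size in history) / n
    spread = sum((day - mean_day) ** 2 for day, _ in history)
    if spread == 0:
        raise ValueError("measurements must span more than one day")
    covariance = sum((day - mean_day) * (size - mean_size) for day, size in history)
    slope = covariance / spread  # growth in MB per day
    if slope <= 0:
        return None
    intercept = mean_size - slope * mean_day
    full_day = (capacity_mb - intercept) / slope
    last_day = max(day for day, _ in history)
    return max(0.0, full_day - last_day)
```

Running the same calculation per table or per index, rather than per database, shows which objects are driving the growth and therefore where archiving or tuning would buy the most time.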
The best practices covered in this white paper were:
• Determine realistic user expectations
• Let tools automatically handle monitoring performance and availability
• Be able to quickly recover from outages
• Track changes to your environment
• Capture metrics for accurate capacity planning
Implementing these best practices through processes and tools will help you improve SQL Server
production operations. You will achieve higher availability and performance, while spending less
time on routine and mundane issues.
For More Information
NetIQ’s SQL Management Suite is the industry’s most comprehensive solution dedicated to
improving the production operations of Microsoft SQL Server. With the Suite, DBAs and IT
professionals can improve performance and availability with automated operations, in-depth
diagnosis, granular data recovery and configuration management.
The SQL Management Suite consists of four products:
AppManager for SQL Server is the most widely adopted solution for automatically managing
distributed SQL Servers from a central easy-to-use console. AppManager allows you to optimize
performance, run pre-packaged management reports, and ensure availability through automated
event detection and correction.
DiagnosticManager for SQL Server provides real-time performance and status information,
enabling administrators to quickly diagnose and correct SQL Server problems. You will be able to
quickly identify the root causes of problems and take action, reducing downtime and improving
RecoveryManager for SQL Server provides real-time recovery using SQL Server transaction log
analysis. DBAs can research the source of data changes and back out transactions in error, keeping
SQL Server available.
ConfigurationManager for SQL Server provides comprehensive change history and configuration
reporting for SQL Servers. By notifying DBAs and NT administrators of changes to key database and
system configuration settings, ConfigurationManager helps reduce downtime caused by less-than-
optimal change control procedures.
The products are available both as individual products and as a bundled suite.
For more information on the SQL Management Suite and to download a free 30-day trial, please visit
For more information on the Microsoft Operations Framework, please visit