References in this publication to IBM products, programs, or services do
not imply that IBM intends to make these available in all countries in which
IBM operates. Any reference to an IBM product, program, or service is
not intended to state or imply that only that IBM product, program, or
service may be used. Any functionally equivalent product, program, or
service that does not infringe any of the intellectual property rights of IBM
may be used instead of the IBM product, program, or service. The
evaluation and verification of operation in conjunction with other products,
except those expressly designated by IBM, are the responsibility of the user.
IBM may have patents or pending patent applications covering subject
matter in this document. The furnishing of this document does not give you
any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
208 Harbor Drive
Stamford, Connecticut 06904
AIX, DB2, NUMA-Q, and QMF are trademarks of International Business Machines Corporation.
Microsoft, Windows, and Windows NT are registered trademarks of Microsoft Corporation.
Java and all Java-based trademarks and logos, and Solaris, are trademarks of
Sun Microsystems, Inc. in the United States, other countries, or both.
Other company, product, and service names used in this publication may be
trademarks or service marks of others.
Large-Scale Warehouse Challenges
Today, database management personnel are facing increasing challenges.
While they want to deliver information to end users as quickly as possible,
they are finding that it takes an enormous amount of resources to be
responsive to the growing number of users demanding information. Even
worse, the users’ needs for information continue to change and evolve, as
new information becomes available.
EMPOWERING USERS WITH INFORMATION
Innovative data mining, information queries, and trend analysis techniques
are providing companies with a much-needed competitive advantage and are
enabling radical breakthroughs within industries. As a result, there is great
demand for rapid, innovative query capabilities against increasingly large,
mission-critical data warehouses.
In the past, end users typically called upon their Information Systems (IS)
department and asked for a report on the market, sales, or inventory. IS
generated the report on the user’s behalf and delivered it, usually days or
even weeks later. These reports were static and did not
allow the user to change the selection criteria for the reports. Sometimes,
requirements were not communicated properly, and users waited weeks
for a report that did not answer their business questions.
In the late 1980s, desktop query and reporting tools entered the market,
allowing end users to perform their own queries against the corporate or
departmental database. This immediately gave the companies using these
applications a competitive advantage: while end users at other companies
waited weeks for a report before they could make a decision, companies that
allowed their business analysts and key managers to query the database
directly were making decisions in minutes.
The steady increase of end users performing queries against the corporate
database presents a huge challenge for database administrators. As the
number of users performing their own queries increases, system response
times may lengthen because of increased contention. Large-scale data
warehouses that provide breakthrough business value pose a challenge:
How can a company’s data warehouse continue to provide quick response
time across large amounts of data to an ever-increasing number of
end users tapping the power of ad hoc query tools on their desktops?
One solution to this problem has been to add more hardware. The new
symmetric multi-processing (SMP) and massively parallel processing
(MPP) systems available in recent years can, in part, help handle the
increased load or help spread the load over several machines to improve
query performance. This is the direction many leading-edge organizations
have taken with their Decision Support Systems (DSS).
HIGH AVAILABILITY AND STABILITY
Even with the addition of new, more powerful hardware systems, users
may still unknowingly submit “runaway” queries. When users submit these
queries during peak business hours, they can bring even the largest system
to a crawl.
With the advancement in query tools, it is now possible for every business
analyst in a company to quickly generate a query without knowing anything
about the back-end database or about SQL. When too many end users
submit complex “runaway” queries to a database running on a large MPP
system at the same time, they can bring even a multi-terabyte system to
its knees. The drop in response time is due primarily to poor query
management: the database’s load management capability is overwhelmed
because all the queries reach the data warehouse at the same time. If query
submission were controlled, response time would not be significantly
affected.
CORPORATE ASSET PROTECTION
All of these trends point to the need to grow and protect the data
warehouse as the vital corporate asset that it is. The robustness,
availability, performance, and security of large data warehouses are of
paramount importance since they enable radical business breakthroughs
while maintaining competitive advantage in the marketplace.
DB2 Query Patroller Developed To Manage
DB2 Query Patroller greatly improves the scalability of a data warehouse
by allowing hundreds of users to safely submit queries on multi-terabyte
class systems. The product is a true three-tier architecture solution. Its
components span the client/server environment to better manage and
control all aspects of query submission.
DB2 Query Patroller acts as an agent on behalf of the end user. It
prioritizes and schedules queries so that query completion is more
predictable and computer resources are efficiently utilized. After an end
user submits a query, DB2 Query Patroller frees up the user’s desktop so
they can perform other work, or even submit additional queries, while
waiting for the original query results. DB2 Query Patroller is integrated
with the DB2 optimizer and performs cost analysis on queries and then
schedules and dispatches those queries so that the load is balanced across
the database partitions.
DB2 Query Patroller sets individual user and group priorities, as well as
user query limits. This enables the data warehouse to deliver the needed
results to its most important users as quickly as possible. It also has the
ability to limit usage of the system by stopping those “runaway” queries
before they even start. If desired, an end user can choose to receive e-mail
notification of scheduled query completion or query failure.
DB2 Query Patroller consists of several components running on the database
server and on end users’ desktops, each with a specific task in providing
query and resource management.
Figure 1 – DB2 Query Patroller Overview (components shown: end user with DB2 CAE, Query Patroller Tracker, Query Patroller Server, and DB2 EE with an agent on each node; platforms: AIX, Solaris, NT, 2000, HP-UX, NUMA-Q)
Query Patroller Server
The Server is the core component of DB2 Query Patroller. It provides an
environment for storing user profiles, storing system parameters,
maintaining job lists, scheduling queries and storing node information. The
Server component executes on a node with the DB2 database server.
The Administrator component gives a DBA or system administrator the
tools needed to manage the DB2 Query Patroller environment. Its Java
interface provides menus to configure user profiles, system parameters,
and node parameters.
The system administrator can set up system-wide, partition, user, or group
level thresholds for governing the data warehouse, including:
• Maximum number of queries running on the system at any given time.
• Maximum cost threshold for the entire system. The cumulative cost of all
queries running cannot exceed this number.
• Maximum cost threshold for each defined user or group in the system.
• Maximum number of jobs a user can initiate. This value can be configured
differently for each of your users or groups.
• Specific amount of time to retain temporary result tables. When DB2 Query
Patroller takes control of a query, a temporary table is created in the
database to store the query results. DB2 Query Patroller will automatically
clean up these tables after the period of time specified by the administrator.
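Taken together, the thresholds listed above amount to an admission check applied when a query arrives. The Python sketch below is a hypothetical illustration, not Query Patroller's implementation or API: the `Limits` and `Running` classes and the `decide` function are invented names, costs are assumed to be optimizer estimates in a single unit, users without a configured limit are assumed to be unrestricted, and a query blocked by a concurrency or cumulative-cost limit is assumed to wait in the queue.

```python
from dataclasses import dataclass, field

# Hypothetical model of the governing thresholds; not a Query Patroller API.
@dataclass
class Limits:
    max_running_queries: int                            # system-wide concurrency cap
    max_total_cost: float                               # cap on cumulative cost of running queries
    max_query_cost: dict = field(default_factory=dict)  # per-user cost threshold
    max_user_jobs: dict = field(default_factory=dict)   # per-user job limit

@dataclass
class Running:
    costs: list = field(default_factory=list)           # estimated costs of queries now running
    jobs: dict = field(default_factory=dict)            # user -> number of running jobs

def decide(user: str, cost: float, limits: Limits, running: Running) -> str:
    """Classify a newly submitted query as 'run', 'queue', or 'hold'."""
    unlimited = float("inf")  # users without a configured limit are unrestricted
    # Over the user's cost threshold: hold the query for later scheduling.
    if cost > limits.max_query_cost.get(user, unlimited):
        return "hold"
    # Otherwise, wait in the queue while any system or user limit is reached.
    if len(running.costs) >= limits.max_running_queries:
        return "queue"
    if sum(running.costs) + cost > limits.max_total_cost:
        return "queue"
    if running.jobs.get(user, 0) >= limits.max_user_jobs.get(user, unlimited):
        return "queue"
    return "run"

limits = Limits(max_running_queries=2, max_total_cost=1000.0,
                max_query_cost={"analyst": 500.0}, max_user_jobs={"analyst": 1})
running = Running(costs=[300.0], jobs={"analyst": 1})
print(decide("analyst", 800.0, limits, running))  # hold: over the analyst's cost threshold
print(decide("analyst", 100.0, limits, running))  # queue: the analyst already has 1 job running
print(decide("manager", 100.0, limits, running))  # run: every limit is satisfied
```

Checking the per-user cost threshold first mirrors the behavior described later in this paper: an expensive query is held for later scheduling regardless of current load, while the other limits only delay a query until capacity frees up.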
Query Patroller also allows users to share result sets so that a query
can be executed once and all authorized users can reuse the result set.
Query Patroller Agent
The agent component of DB2 Query Patroller resides on each of the
database server nodes. It processes the database requests on behalf of the
Query Patroller Server and gathers resource utilization statistics to allow for
query workload balancing, as well as monitoring of the resource utilization
of each partition.
On a uni-processor or non-clustered SMP machine, the agent and server
components run on the same machine. On an MPP or clustered SMP machine,
the server runs on one node and the agents run on all of the database nodes.
The Query Monitor component of DB2 Query Patroller provides both the
administrators and end users with a Java-based interface for viewing
and managing their queries. The Query Monitor component enables end
users to view a job’s status, submit and cancel queries, and drop result
tables. End users can display information only for their own queries and
jobs running on the system, while administrators can use the Query Monitor
to manage all queries in the system.
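The visibility rule for the job list can be stated in a few lines of code. This Python sketch is hypothetical: the `Job` fields and status strings are invented for illustration, and the `is_admin` flag stands in for however Query Patroller actually distinguishes administrators from end users.

```python
from dataclasses import dataclass

@dataclass
class Job:
    seq: int      # job sequence number
    user: str     # ID of the submitting user
    status: str   # illustrative status text

def visible_jobs(jobs, viewer: str, is_admin: bool) -> list:
    """Administrators see every job; end users see only their own."""
    return [job for job in jobs if is_admin or job.user == viewer]

job_list = [Job(1, "alice", "Running"), Job(2, "bob", "Queued"), Job(3, "alice", "Done")]
print([job.seq for job in visible_jobs(job_list, "alice", is_admin=False)])  # [1, 3]
print([job.seq for job in visible_jobs(job_list, "dba", is_admin=True)])     # [1, 2, 3]
```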
Figure 2 – DB2 Query Monitor
The DB2 Query Monitor job list maintains the queries submitted by end
users. The job list contains information about each query submitted through
Query Patroller. The system administrator or end user is able to use the job
list to view information for the queries in the system including:
• Job sequence number
• Query priority
• Query status
• Query source
• Node on which the query was submitted
• Type of application submitting the query
• ID of user submitting the query
• Date and time the query was submitted
Users and administrators may also view more detailed information on any
of the queries listed in the job list table, such as query run time, cost of the
query, and the SQL statement.
The Query Enabler component of DB2 Query Patroller executes inside
the DB2 client. This component intercepts dynamic SQL statements being
passed to DB2 from any front-end query tool. Query Enabler interacts
with other DB2 Query Patroller components and with the user to execute
or schedule the query and to return results from previously completed
queries.
Query Enabler intercepts queries submitted by end users. If matching
queries exist on the Query Patroller Server, Query Enabler provides the
end user with a display of those queries and prompts the user to indicate
whether or not a new result set should be returned. Whenever a user wants
to submit a query, Query Enabler provides the option to set scheduled run
times or to submit and wait for the results. If the end user does not want to
wait for the query results, Query Enabler releases the desktop application
and passes the query to the DB2 Query Patroller Server. Query Patroller
then takes control of the query and runs it in the background on behalf of
the end user. The next time the user submits that same query, the result set
for that query will be returned to the application.
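Result-set reuse and the retention period for temporary result tables can be modeled together as a cache of completed results with an expiry time. The following Python sketch assumes a simple matching rule (identical SQL text after collapsing whitespace and ignoring case); this paper does not say how Query Patroller decides that two queries match, and `ResultStore` and its methods are hypothetical names. An injectable clock keeps the example deterministic.

```python
import time

class ResultStore:
    """Hypothetical store of completed result sets, kept for a retention period."""

    def __init__(self, retention_seconds: float, clock=time.monotonic):
        self.retention = retention_seconds
        self.clock = clock       # injectable so the example is deterministic
        self._results = {}       # normalized SQL -> (completion time, rows)

    @staticmethod
    def normalize(sql: str) -> str:
        # Assumed matching rule: collapse whitespace and ignore case.
        # (A simplification: it also ignores case inside string literals.)
        return " ".join(sql.split()).lower()

    def save(self, sql: str, rows: list) -> None:
        self._results[self.normalize(sql)] = (self.clock(), rows)

    def purge_expired(self) -> None:
        # Drop result sets older than the retention period, analogous to
        # cleaning up expired temporary result tables.
        now = self.clock()
        self._results = {key: entry for key, entry in self._results.items()
                         if now - entry[0] <= self.retention}

    def lookup(self, sql: str):
        """Return a stored result set for a matching query, or None."""
        self.purge_expired()
        entry = self._results.get(self.normalize(sql))
        return entry[1] if entry is not None else None

now = [0.0]
store = ResultStore(retention_seconds=3600, clock=lambda: now[0])
store.save("SELECT region, SUM(amount) FROM sales GROUP BY region", [("East", 10)])
print(store.lookup("select region,  SUM(amount) from sales group by region"))  # [('East', 10)]
now[0] = 7200.0  # two hours later, past the one-hour retention period
print(store.lookup("SELECT region, SUM(amount) FROM sales GROUP BY region"))   # None
```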
Query Enabler can also run in a silent mode, in which end users do not
interact with Query Enabler at all; they keep working as they do today,
with their end user tools submitting queries directly to the server. This
also enables 3-tier or n-tier applications to
utilize Query Patroller without the need for additional software on the
DB2 Query Patroller Tracker
Figure 3 – DB2 Query Patroller Tracker
The DB2 Query Patroller Tracker product enables a user to manage the
databases by displaying usage history in a graphical, user-friendly format.
It provides two key features that support system administrators in
managing the database. First, it gives the system administrator the ability
to monitor the database load and activity over time. Second, it provides
the administrator with details on table and column access to assist in tuning
the system. The Query Patroller server stores the historical information in
DB2 tables so that administrators can drill down into whatever aspects of
database usage they choose, using the query tool of their choice.
DB2 Query Patroller - Brings it all together in one product
SETTING THE STANDARD FOR ROBUST QUERY
To understand DB2 Query Patroller functionality and how it differs from
query tool management systems, it is necessary to understand the problems
each tool is designed to solve. At present, four technologies are available to
manage queries and resources:
• Ad hoc query tools
• Three-tier proprietary tools
• Server-based query and resource managers
• Hardware resource managers
Each has its own strengths that make it appropriate for particular types of
query situations. Figure 4 divides these approaches into four classes,
based on whether they provide resource load leveling and whether they
manage the query before it runs against the database.
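Figure 4 – Classes of Query and Resource Management (axes: managed versus not managed queries, and resource load balancing; quadrants: Proprietary Query Tools (managed, no load balancing), DB2 Query Patroller (managed, with load balancing), Ad Hoc Query Tools (not managed, no load balancing), Hardware Load Leveling (not managed, with load balancing))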
Ad Hoc Query Tools
Ad hoc query tools do a good job of allowing the end user to directly ask
questions of the database without having to go to IS personnel every time
they have a need for additional information. Generally, with relatively
small databases and few users, there is little need for query and resource
management. However, as databases grow larger and the number of users
increases, the strain on the data warehouse becomes evident. In some cases,
a query tool may have a rudimentary scheduling facility. However, that
requires the user to keep their PC powered on overnight to schedule the
query or puts the burden on the end user to share resources in good faith
with other users. QMF for Windows is a unique exception that provides
predictive query governing. However, it requires managing a distributed
environment rather than having all query governing centrally managed by the server.
Hardware Load Levelers
Some database management system and MPP hardware vendors offer
system software products that spread the query load across the
database-specified nodes. Queries are routed to free nodes for processing.
Although this makes good use of the hardware resources, it does
not consider the type of query being submitted. Any query that comes into
this type of system is run immediately, regardless of the time or cost that
the query will consume.
Proprietary Query Management Tools
Three-tier query tools have a server component that provides some
capabilities for scheduling queries. This component releases the desktop
and submits the query at the pre-scheduled time on behalf of the end user.
Typically, these types of tools work only with their own front end and
provide a canned query interface for end users. This suits environments in
which many users submit highly predictable, optimized queries, but it is less
adaptable for ad hoc querying.
queries. Three-tier query tools provide little user prioritization and
DB2 Query Patroller All Management in One Tool
The first three categories of query and resource management tools fail to
provide end users with acceptable query response times and IS with the
control they need. The DB2 Query Patroller product addresses these
challenges. DB2 Query Patroller is the only product of its kind on the
market today that controls and monitors queries. DB2 Query Patroller
works with dynamic SQL query tools to prioritize and schedule user
queries based on user profiles and cost analysis performed on each query.
Large queries are put on hold and can then be scheduled for a later time
during off-peak hours. Queries with high priority (based on user profiles)
are promoted to the top of the schedule. In addition, DB2 Query Patroller
monitors resource utilization statistics to determine which partitions are the
least busy and provides load distribution functionality that evens out the
workload across the system.
HIGH AVAILABILITY AND STABILITY
Proactive Query Capture
At the core of DB2 Query Patroller’s breakthrough functionality is its
ability to proactively capture queries. Query Patroller’s proactive
approach to query management helps it guarantee the high levels of
availability and stability required in a mission-critical data warehouse. As
queries are submitted against the data warehouse, Query Patroller traps the
queries, assesses their cost, and prioritizes their execution. Without this
proactive query trap, users could submit “runaway” queries that
compromise the system availability and IS could only report in retrospect
why the system failed. DB2 Query Patroller serves as a vigilant eye over
vital corporate data warehouses. Since the queries are captured, should the
database server fail for any reason, these queries will be automatically
restarted by Query Patroller on behalf of the end user.
Robust Query Termination
The proactive query capture approach is complemented by Query
Patroller’s ability to terminate queries effectively. Many ad hoc query
tools give end users a
terminate option, but in reality the query is just terminated on the client
workstation. The processes already started on the database server may
not be terminated. If the user assumes that the query has been cancelled
they might be more likely to submit other queries and repeat the same
submit and terminate process. The end result could be that the server gets
bogged down with multiple orphan queries that continue to run, wasting
valuable resources. DB2 Query Patroller addresses this problem by truly
terminating the query: both the end user workstation and the database
server are released from a terminated query. As a result, processing cycles
on the database server go only to queries that are still needed, and the
system administrator no longer has to monitor and kill orphan queries.
Frees User Desktop, Improving Productivity
Typically with most front-end query tools, after a user submits a query, the
user’s application is “hung up” in a “pending output” state until the results
of the query return. Users must wait until the query completes for their
desktops to become available, which can greatly reduce their productivity.
Users need to be able to perform other tasks, even submit additional
queries, while earlier queries run in the background.
In many cases, users don’t need their query results back until the next day
or the following Monday morning. Thus, instead of submitting a query for
immediate execution, the query could be scheduled for a later time when
the system load may be lower. DB2 Query Patroller frees up the user
application and improves user productivity by allowing the user to submit
and schedule queries based on their response requirements.
Query Cost Analysis
DB2 Query Patroller is integrated with the DB2 optimizer, which performs cost
analysis of each query entering the system to determine the static cost of
the query. Query Patroller enables the system administrator to modify a
user’s profile and specify a query cost threshold for each user or group.
After completing cost analysis, DB2 Query Patroller compares the returned
value to the value in the user profile. If the returned value exceeds the user
threshold, DB2 Query Patroller places the query on hold so that the query
can run at a later time. Query Patroller also notifies the end user that their
query is on hold for future execution.
The majority of ad hoc query tools do not take into account a user’s
priority with respect to other users submitting queries into the system. For
example, a CEO may need a report right away for a meeting, but the
system is so overloaded that the query does not complete in time. If the
CEO’s priority class is high, the query
request would automatically move to the top of the query submission
queue and be executed immediately.
DB2 Query Patroller provides an environment that facilitates the prioritized
completion of queries. It maintains a user profile for each user that submits
queries into the system. The user profile defines a priority class, which
identifies the relative priority a user has when submitting a query into the
database. A higher priority class places the user’s query closer to the top of
the query submission queue.
The system administrator sets individual user and group priorities, thus
enabling the data warehouse to deliver the needed results to your most
important users as quickly as possible.
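The priority-class ordering described above behaves like a priority queue. In this minimal Python sketch (a hypothetical illustration; `SubmissionQueue` is not a Query Patroller interface), larger integers are assumed to mean higher priority classes, and queries in the same class are dispatched in submission order, with the job sequence number as the tie-breaker.

```python
import heapq
import itertools

class SubmissionQueue:
    """Hypothetical query submission queue ordered by priority class."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count(1)  # job sequence numbers in submission order

    def submit(self, user: str, priority_class: int, sql: str) -> int:
        job = next(self._seq)
        # heapq is a min-heap, so the priority is negated to pop the highest
        # class first; the sequence number keeps equal classes first-come,
        # first-served.
        heapq.heappush(self._heap, (-priority_class, job, user, sql))
        return job

    def next_job(self):
        """Remove and return (job, user, sql) for the next query to dispatch."""
        _, job, user, sql = heapq.heappop(self._heap)
        return job, user, sql

queue = SubmissionQueue()
queue.submit("analyst1", 10, "SELECT COUNT(*) FROM sales")
queue.submit("analyst2", 10, "SELECT AVG(amount) FROM sales")
queue.submit("ceo", 90, "SELECT region, SUM(amount) FROM sales GROUP BY region")
print([queue.next_job()[1] for _ in range(3)])  # ['ceo', 'analyst1', 'analyst2']
```

Because the sequence numbers are unique, heap comparisons never fall through to the user or SQL fields, so ordering depends only on priority class and arrival.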
DB2 Query Patroller also enables the system administrator to limit the
number of queries that each individual user can simultaneously submit. This
feature gives other users the opportunity to have their queries processed in
a timely fashion.
Ideally, query workload should be balanced across available resources.
However, in an MPP environment, ad hoc query tools may submit queries
to only one or two nodes, overloading some nodes and underutilizing
others. In comparison, a server-based query manager can balance node
utilization more intelligently, preventing bottlenecks at nodes that would
otherwise be bombarded with ad hoc query requests.
DB2 Query Patroller provides the system administrator with the ability to
set system and user parameters to govern the queries entering the database.
The system administrator may specify the maximum number of concurrent
queries for each user or group, for each node, and for the entire system.
DB2 Query Patroller provides load leveling across MPP hardware
environments and clustered servers. By tracking node or server utilization,
Query Patroller routes queries to idle nodes or servers and spreads the
query load across the system.
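Load leveling of this kind can be sketched as picking the least busy node that still has room under its per-node query limit. The Python example below is an illustration under stated assumptions: the utilization figures stand in for whatever statistics the agents actually gather, `choose_node` is a hypothetical name, and ties are broken by the number of queries already running on each node.

```python
def choose_node(utilization: dict, running: dict, max_per_node: int):
    """Return the least busy node that is below its query limit, or None.

    utilization: node -> busy fraction (0.0 to 1.0), assumed to come from agents.
    running: node -> number of queries currently running there.
    """
    candidates = [node for node in utilization if running.get(node, 0) < max_per_node]
    if not candidates:
        return None  # every node is at its limit; the query waits
    # Lowest utilization wins; ties go to the node running fewer queries.
    return min(candidates, key=lambda node: (utilization[node], running.get(node, 0)))

load = {"node0": 0.92, "node1": 0.35, "node2": 0.35, "node3": 0.10}
active = {"node0": 4, "node1": 1, "node2": 0, "node3": 3}
print(choose_node(load, active, max_per_node=3))  # node2
```

Here node3 reports the lowest utilization but is already at its per-node limit, so the query goes to node2, which ties with node1 on utilization but has fewer queries running.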
Data Warehouse Optimization
DB2 Query Patroller enables system administrators to monitor the database
load by providing access to the following information:
• What tables are being accessed for all jobs
• Columns accessed for each table
• Number of rows returned, by table, for all jobs
• Detailed view of job activity over time
• Historical view of job activity
DB2 Tracker displays this information in an easy to view format by
determining the total number of tables accessed in the database, and
calculating the total number of times each specific table is accessed. For
each table displayed, the user is able to drill down to view the columns
accessed for queries against that table. This enables the administrator to
decide if new indexes should be created on the columns used most in the
table, if an Automatic Summary Table (AST) may improve performance,
or if certain tables should be considered for archival.
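The table and column counts described above can be illustrated with a simple aggregation over job history. In this hypothetical Python sketch, each job is represented as a mapping from table name to the set of columns it referenced; the real Tracker keeps its history in DB2 tables whose layout is not described here. Frequently referenced columns are the natural candidates for new indexes or an AST.

```python
from collections import Counter, defaultdict

# Hypothetical job history: each job maps a table name to the columns it used.
job_history = [
    {"SALES": {"REGION", "AMOUNT"}, "STORES": {"STORE_ID"}},
    {"SALES": {"REGION", "SALE_DATE"}},
    {"SALES": {"AMOUNT"}, "CUSTOMERS": {"CUST_ID", "SEGMENT"}},
]

def access_summary(jobs):
    """Count jobs touching each table and, per table, each column."""
    table_hits = Counter()
    column_hits = defaultdict(Counter)
    for job in jobs:
        for table, columns in job.items():
            table_hits[table] += 1
            column_hits[table].update(columns)
    return table_hits, column_hits

tables, columns = access_summary(job_history)
print(len(tables))                     # 3 distinct tables accessed
print(tables.most_common(1))           # [('SALES', 3)]
print(sorted(columns["SALES"].items()))
# [('AMOUNT', 2), ('REGION', 2), ('SALE_DATE', 1)]
```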
DB2 Query Patroller also provides robust chargeback mechanisms.
Administrators can track usage by user, group, client hostname or
application submitting the query. The resources that can be accumulated
for chargeback include elapsed query execution time, rows returned, or
query cost. All of this information is stored in DB2 tables, which allows
chargeback reporting with any query tool.
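Because the usage data is kept in ordinary tables, a chargeback report is essentially a grouped sum. The Python sketch below uses invented field names (`user`, `group`, `elapsed_s`, `rows`, `cost`) purely for illustration; it is not Query Patroller's table layout. The same function could group by client hostname or application if those fields were recorded.

```python
from collections import defaultdict

# Hypothetical completed-job records; the field names are illustrative only.
completed_jobs = [
    {"user": "alice", "group": "finance", "elapsed_s": 120.0, "rows": 5000, "cost": 850.0},
    {"user": "bob", "group": "finance", "elapsed_s": 30.0, "rows": 200, "cost": 90.0},
    {"user": "carol", "group": "sales", "elapsed_s": 600.0, "rows": 12000, "cost": 4100.0},
]

METRICS = ("elapsed_s", "rows", "cost")

def chargeback(jobs, by: str) -> dict:
    """Total elapsed time, rows returned, and query cost per value of `by`."""
    totals = defaultdict(lambda: {"elapsed_s": 0.0, "rows": 0, "cost": 0.0})
    for job in jobs:
        bucket = totals[job[by]]
        for metric in METRICS:
            bucket[metric] += job[metric]
    return dict(totals)

print(chargeback(completed_jobs, by="group")["finance"])
# {'elapsed_s': 150.0, 'rows': 5200, 'cost': 940.0}
```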
If you are interested in learning more about DB2 Query Patroller and the
other products in the DB2 UDB database family, please contact your local
IBM representative or visit our Web sites at: