Top “n” Projects Dec. 7, 2007
1. Introduction
Top “n” queries are those using the “top n”, “first n” (as in Ingres), “fetch first n rows”,
and similar features of the various RDBMS products. They are common in decision support
applications and have spawned a significant amount of activity in both the commercial
DBMS and research communities.
This document describes current Ingres support for “first n” and outlines several potential
development directions to enhance Ingres support of “first n”.
2. Current Ingres “first n” Support
The optional “first n” clause (where “n” is a positive integer) was added to the select list
in Ingres in 2000 (actually, it was added informally earlier than that, but was made an
official feature in 2000). Very simply, it passes the “n” value to QEF in the top level
action header of the query plan to direct qea_fetch() to return only the first n rows from
the result set.
It was originally coded because of potential applicability in the TPC C benchmarks that
we were running at the time and because of its utility to application designers and
developers in limiting the size of a result set (I’ve used it countless times for exactly this
purpose).
It was implemented as simply as possible with no concern for optimization potential or
broader application utility. Indeed, in all my discussions with clients, I have always
pointed out that it simply takes the top n rows from the result set. If the query plan is
inherently expensive with blocking operators (e.g. sorts or aggregation), “first n” will not
make it any faster.
Additionally, to avoid complications or inexplicable side effects, its use is limited to the
outermost select of a query and the left most select of a union. While it is allowed in a
“create table as select …”, it is not permitted in a view definition.
3. “top n” and the SQL Standard
In recognition of the fact that most vendors already had “top n” support of one sort or
another, the standards committee have recently (last year) added it to the SQL standard.
The approach is more ambitious than most (all?) current implementations and permits
both “fetch first n rows” (yes, I know – it’s ugly) and “order by” at the end of each
“query expression” in a query. That means you can code them after each select in a union
and even after subqueries nested in a query (most typically, in a derived table). The
optional pairing of “order by” clauses with “fetch first” clauses is done to allow the
queries to produce deterministic results.
Separate features are defined that allow an implementation to support only a single “fetch
first” (as in Ingres), multiple “fetch first”s in <query expression>s (one per union) or
even “fetch first”s in subqueries. A separate feature also defines whether an
implementation permits “fetch first” in a view definition. Moreover, the “n” value can
now be a variable (host language variable or procedure parameter) and that capability is
defined as yet another distinct feature.
To complement the “fetch first” clause, an “offset n” clause has also just been introduced
to the standard. “fetch first n” defines how many rows will be returned and “offset n”
defines where in the result set the returned rows start. “fetch first”, “offset” and “order
by” clauses are all optional and can be used in any combination with one another. Feature
codes analogous to those for “fetch first n” are defined for “offset n” to limit its use in a
query.
4. Ingres and Standards Syntax
4.1 Present Ingres Support
Ingres already supports the “fetch first n rows” and “offset n” clauses in changes
submitted to the main codeline. They are only permitted once in a query – the most
restrictive feature as defined in the standard. Again, no attempt has been made to perform
any optimization in implementing these features. However, they do satisfy the standard.
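As a sanity check on the combined semantics, the two clauses together amount to a slice of the (already ordered) result set. A minimal sketch in Python (purely illustrative; the function name is mine, not Ingres code):

```python
def apply_offset_fetch(rows, offset=0, fetch_first=None):
    """Model "offset n" / "fetch first n rows" as a slice of an
    already-ordered result set (illustration only)."""
    end = None if fetch_first is None else offset + fetch_first
    return rows[offset:end]

# skip the first 3 rows of the ordered result, then return the next 4
print(apply_offset_fetch(list(range(1, 11)), offset=3, fetch_first=4))  # [4, 5, 6, 7]
```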
4.2 Extensions to Ingres Support
Support of multiple “fetch first” and “offset” clauses in a single query is possible in
Ingres. This might be useful in at least the union case. Implementation in OPF and QEF
would be straightforward and would likely take no more than a day or so. Implementation
in the grammar would likely be more problematic – in particular if multiple “order by”
clauses are also to be supported. Changes to the parse tree structure would be required
that, along with grammar changes, would likely take at least a week.
Supporting parameters for the “n” value would provide challenges of a different sort,
though not insurmountable. As it currently stands, parameters to Ingres queries are
always coded in contexts that result in their materialization in ADF CXs. In the case of
“top n” parameters, the parameters would fill in values in query plan action headers (the
“fetch first” and “offset” values) and aren’t used in the context of CXs. Parameter
descriptors and values are made available through a more general interface to QEF (based
in the QEF_RCB), so we are not bound to fake a CX to materialize them. Accordingly,
flags set in the action headers would probably suffice to indicate that the values must be
materialized from the parameter infrastructure, rather than directly from action header
fields. This would probably involve a day or so of work in each of PSF, OPF and QEF.
5. Optimization for “top n”
A query plan optimized to materialize all rows of a result set may not offer the most
efficient execution of a “top n” version of the same query. DB2 has long supported an
“optimize for n rows” hint that causes the query plan to be built on the assumption that
the result set will only contain “n” rows.
“top n” optimization can be as simple as assuming that each node, table/index access or
join, will only produce n rows. However, in the current climate of data warehouse
applications and the prevalence of “top n” queries in various TPC benchmarks, more
sophisticated analysis of a “top n” query is required for building an optimal plan.
5.1 Carey, Kossmann Approach
One of the earlier “top n” optimization papers was “On Saying ‘Enough Already’ in
SQL”, presented at SIGMOD 1997. It describes the problem nicely and proposes a
simple framework for handling some “top n” queries more efficiently. Their target is
select-project-join queries, typically with an order by clause to give some meaning to the
“top n” result. Without an “order by”, the results of a “top n” query are non-deterministic,
as noted in section 3, above.
Their fundamental idea is to introduce a “stop after n” operator into strategic locations in
the query plan to prevent the materialization of all rows in both the final result set and at
intermediate places in the query plan. The purpose of the “stop after” operators is to
reduce the number of rows passing through the query plan to a minimum. Costs are
associated with the new operator so that optimization of “top n” queries is sensitive to
potential savings of using the “stop after” operator.
Needless to say, the value of “n” in the “stop after” operator will depend on the query
itself. If a join can multiply the number of instances of a row in the result set, the “stop
after” value might be smaller than “n” in the “top n” syntax. On the other hand, if
predicates can remove rows from the result set (so-called “reductive” predicates), the
“stop after” value may have to be larger.
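This cardinality reasoning can be made concrete with a rough back-of-the-envelope estimate. The sketch below is illustrative only (the function name and parameters are mine, not from the paper): a reductive predicate with selectivity s < 1 inflates the required “stop after” value, while a multiplying join with fanout f > 1 shrinks it.

```python
import math

def stop_after_cardinality(n, selectivity=1.0, fanout=1.0):
    """Rough estimate of how many rows a "stop after" operator must
    pass so that n rows survive the operators above it.

    A predicate keeping a fraction `selectivity` of its input needs
    roughly n / selectivity rows; a join producing `fanout` output
    rows per input row needs only n / fanout."""
    return math.ceil(n / (selectivity * fanout))

# a reductive predicate keeping half the rows needs twice as many inputs
print(stop_after_cardinality(10, selectivity=0.5))  # 20
# a join that doubles each row needs only half as many inputs
print(stop_after_cardinality(10, fanout=2.0))       # 5
```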
The paper talks about reductive and non-reductive predicates. It also discusses
conservative and aggressive placement of the operators. A conservative placement is
guaranteed to return the required rows, and possibly more. An aggressive placement may
not return enough rows and, therefore, may require the query to be restarted. In general,
all the approaches to “top n” optimization incorporate the potential need to restart a
query.
Another interesting technique the paper discusses is a “sort stop” variant on the “stop
after” operator. Any sort operation must clearly consume all input rows, even if it is only
required to return the “top n” sorted rows. However, a very simple variation to the Ingres
heap sort algorithms would permit us to only keep the first “n” rows in the sort structures
at any point in time. At the start of the sort, the first “n” rows would be loaded into the
heap structure. But only those remaining rows whose sort key was in the first “n” would
be added to the heap (replacing one already in the heap). All other rows would simply be
discarded. For typical values of “n” this could always be done in the QEF memory sort.
Apparently Oracle incorporates this optimization in its “top n” processing.
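The heap variation described above can be sketched in a few lines of Python (a stand-in for the QEF memory sort; names are mine). For an ascending “order by”, the “top n” rows are the n smallest keys, so a max-heap of size n lets every worse row be discarded on sight:

```python
import heapq

def top_n_sort(rows, n, key):
    """Bounded "top n" sort: keep at most n rows in memory.

    A max-heap (min-heap over negated keys) holds the n smallest keys
    seen so far; any row worse than the current heap maximum is
    discarded immediately, as in the "sort stop" variant."""
    heap = []  # entries: (-key, sequence number, row); sequence breaks ties
    for seq, row in enumerate(rows):
        k = key(row)
        if len(heap) < n:
            heapq.heappush(heap, (-k, seq, row))
        elif k < -heap[0][0]:  # better than the worst row currently kept
            heapq.heapreplace(heap, (-k, seq, row))
    # unload the heap, returning rows in ascending key order
    return [row for _, _, row in sorted(heap, key=lambda e: -e[0])]

print(top_n_sort([5, 1, 4, 2, 3], n=3, key=lambda r: r))  # [1, 2, 3]
```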
Finally, another Carey/Kossmann paper (“Reducing the Braking Distance of an SQL
Query Engine” from VLDB 1998) supplements these ideas with a technique using range
partitioning to split rows into partitions designed to capture the “top n” in one or more
materialized partitions, while discarding the remaining rows.
5.2 Donjerkovic, Ramakrishnan Approach
“Probabilistic Optimization of Top N Queries” (presented at VLDB 1999) builds on the
results of Carey & Kossmann. Rather than introduce explicit “stop after” operators, their
paper recommends introducing a “cutoff predicate” to strategic locations in the query
plan. The cutoff predicate compares the order by column (which determines the “top n”)
to an appropriately estimated value of the column. The cutoff value is chosen to assure
that the “top n” rows are produced, though again, a restart operator is required in the
event that the estimated cutoff value is too restrictive.
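The cutoff-and-restart control flow can be sketched as follows (illustrative only; the list of successively looser cutoff values stands in for the histogram-based estimator described in the paper):

```python
def top_n_with_cutoff(rows, n, key, cutoffs):
    """Try each estimated cutoff in turn; restart with a looser one
    whenever the cutoff predicate lets too few rows through."""
    for cutoff in cutoffs:
        kept = [r for r in rows if key(r) <= cutoff]  # cutoff predicate
        if len(kept) >= n:                            # enough survivors?
            return sorted(kept, key=key)[:n]
        # too few rows: discard the work and restart with a looser cutoff
    return sorted(rows, key=key)[:n]                  # last resort: full sort

print(top_n_with_cutoff([10, 3, 7, 1, 9, 2], 3, lambda r: r, cutoffs=[2, 5]))  # [1, 2, 3]
```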
Given that the “top n”-ness of a query is optimized by means of introduced cutoff
predicates in this approach, they also introduce the idea of a more holistic approach to
optimization that not only uses statistics and histograms to predict the numbers of
qualified rows at each step of execution, but also uses probability to assess the risk of
choosing incorrect cutoff values (resulting in restarts) and to incorporate restart risk into
the overall plan cost estimate.
Because of the acknowledged importance of statistical accuracy, the paper also
introduces the idea of evaluating the “quality” of histograms. Resulting maximum error
estimates in the histograms are then also incorporated in the probabilities used in plan
cost estimation.
The paper also discusses a couple of specialized “top n” queries. Queries in which the
ranked attribute is a count (i.e., computed as “count(*)”) can use histograms on the
counted column, along with histogram error estimates as described earlier, to predict the
values that will produce the “top n” counts. This is risky business, though, and the values
chosen must be the right ones (i.e., the candidate set of values must be chosen
conservatively) since there is no verification process that the result is correct other than to
materialize the entire result set.
5.3 Ilyas, Shah, et al. Approach
A more recent paper (“Rank-aware Query Optimization” presented at SIGMOD 2004)
addresses a subset of the ranking problem, with rank expressions (the thing in the “order
by” clause) consisting of values from more than one table. It proposes the use of
specialized “rank joins” that discard rows more quickly based on their rank scores. As
well as the modified join algorithms, the paper also describes techniques for
incorporating the new joins as alternate strategies that can be integrated into the overall
query optimization process.
6. TPC Benchmarks and Classes of “top n” Queries
There are several categories of “top n” queries, some far easier than others to optimize.
Each of these categories is represented in the TPC benchmarks and so deserves some
consideration. The “easy” “top n” queries are single table queries that order on a single
column. These could be handled efficiently with the “top n” sort described earlier.
Adding joins to a “top n” query, even if the “order by” is on a single column, can
complicate the optimization considerably. The multiplicative and/or reductive
characteristics of the join predicates must then be accounted for in order to estimate the
number of rows to input to the joins before the stop or cutoff should take place. Blocking
joins such as sort/merge and hash may be good for retrieving all rows of the result set, but
bad for the “top n”. Optimizing joins for “top n” queries is much easier when the joins
map onto known referential relationships. But Ingres referential relationships are not
known during optimization (I’ve made numerous recommendations on how to deal with
this).
An “order by” on multiple columns from one or more tables and/or on multi-column
expressions further adds complexity. Multi-column ranking expressions are discussed in
the paper referenced in section 5.3, above.
“top n” queries on aggregate results are very difficult to handle. The paper in 5.2, above,
discusses how to handle the problem when the aggregate is count(). Even that case would
require significant changes to Ingres. However, sum() and avg() aggregates are even
more difficult. Personally, I can’t see any practical way of determining ahead of time the
groups in an aggregate query that will produce the largest sums. Even in a single table
query, the grouping and sum/avg columns will undoubtedly be different and any
optimization would require some idea of the degree of correlation between the columns.
Add a join into the mix, or compute the sum/avg on an expression and the problem is
unsolvable. The only optimization that could be brought to bear is the “top n” sort on the
results of the grouping and aggregation.
While one might hope that “top n” aggregate queries are rare, my intuition is that they are
the rule rather than the exception in warehousing applications. Each of TPC H, E and DS
contains “top n” queries on sum(). They all also contain “top n” queries ranked on
individual columns, though usually on joins. TPC H has a “top n” query on a count()
aggregate. TPC DS queries are typically very complex with nesting, derived tables,
unions and so forth. “top n” is then performed on the results of these extremely complex
queries. It seems hard to believe that such queries could be handled efficiently by any
means other than pre-computed materialized views.
7. Ingres Optimization of “top n” Queries
7.1 “top n” Sort
Adding a “top n” sort to Ingres should be a trivial task in QEF. The DMF sort will be
more difficult, both because of the potential for overflow to disk and in the face of
parallel sort execution where multiple threads may each be sorting a subset of the rows.
However, the likelihood of a value of “n” large enough to require a DMF sort with disk
overflow is very small.
A modified sort node will be required in the query plan to support “top n” sorting. This
would be generated by the OPF code generator and will also require some knowledge in
the query optimizer to use it effectively. A trivial alternative would be to detect a sort at
the top of a “top n” query plan and modify it with no optimization.
7.2 Knowledge of Joins across Referential Relationships
Optimizing “top n” joins is much easier if it is known that the joins map to referential
relationships. The Ingres catalog structure is not designed for easy determination of sets
of matching columns in a referential relationship. I have recommended before that new
catalogs be introduced that record these relationships in a manner that is useful to query
optimization. Even without considering “top n” queries, optimization of Ingres join
queries would benefit from this information.
A more general catalog that simply records proportions of rows from a cross product of 2
tables that participate in joins on specified columns is already informally in place in
Ingres. Populating it with statistics describing common joins is another technique that
would improve join estimates.
7.3 Restart Operator
The techniques for processing “top n” queries as described in the literature all incorporate
the notion of a restart operator in the event that not enough rows are processed to honour
the “top n” count. This could be done as simply as spooling the result rows until we know
we have the right number. If enough rows aren’t returned, the spool file is discarded and
the query is started over with modified cutoff values. This approach is obviously the least
efficient as it wastes all the effort required to materialize the first set of result rows.
Moreover, it delays the return of the first result row until all have been materialized and
spooled. “top n” queries typically also want the first rows to be returned as quickly as
possible.
The field of adaptive query processing has examined the problem of restarting query
plans while minimizing the amount of wasted work. The context of the restart is not the
same as for “top n”, though the techniques should be analogous. However, this is yet
another complication to solving the “top n” optimization problem.
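The naive spool-and-restart loop described in this section can be sketched as follows (hypothetical names; execute_plan stands for rerunning the query plan, e.g. with a looser cutoff on each attempt):

```python
def run_with_restart(execute_plan, n):
    """Spool the whole result of each attempt; if fewer than n rows
    come back, discard the spool and restart the plan."""
    attempt = 0
    while True:
        spool = list(execute_plan(attempt))  # materialize all result rows
        if len(spool) >= n:
            return spool[:n]                 # enough rows: return the top n
        attempt += 1                         # discard spool, restart looser

# a plan whose first attempt cuts off too aggressively
plan = lambda attempt: [1, 2] if attempt == 0 else [1, 2, 3, 4]
print(run_with_restart(plan, 3))  # [1, 2, 3]
```

Note that no row can be returned to the client until the spool count is known, which is exactly the latency problem described above.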
7.4 Probabilistic Optimization
The techniques proposed by Donjerkovic and Ramakrishnan are applicable to the Ingres
optimizer, but only with a large amount of implementation effort. The mechanisms for
computing histogram quality (maximum error) proposed in the paper and in
Donjerkovic’s doctoral thesis wouldn’t be difficult to add to Ingres histogram
construction. However, the Ingres optimizer is not easily changed. Replacing the current
selectivity estimation in Ingres with the probabilistic approach discussed in the paper
would be very difficult.
OPF is an “old school” query optimizer in which cost estimation is interleaved with plan
enumeration. Consideration of different execution algorithms (join techniques, sorting,
etc.) is all “hard coded” in OPF. Optimizer extensions described in this and other papers
are much more easily applied to rule-based optimizers. But this would involve a complete
rewrite of the Ingres optimizer.
7.5 Optimization of sum/avg “top n” Queries
As suggested before, I don’t really see any way to effectively optimize a “top n” query
ranked on a sum or avg aggregate. While important subsets of “top n” can be effectively
optimized with strategies that could ultimately be introduced to Ingres, there will always
be other important subsets of “top n” queries with no solution or with entirely different
solutions.
8. Conclusion
This document was intended to trigger discussion of various aspects of “top n” query
optimization and execution. This is a very wide field, much of it as yet unexplored. We
can certainly change Ingres to solve some of the problems that are faced in processing
such queries, but at least some of those changes will be very large in scope and will
require significant resources to implement.