Tractor Pulling on Data Warehouse

Martin Kersten, Volker Markl,
Meikel Poess, Kai-Uwe Sattler,
Alfons Kemper, Ani Nica

DBTest 2011

The good old days
• The early eighties, when
– Oracle appeared on the scene
– Ingres was a respected innovator in RDBMS
– System R fought the CODASYL battle
– IMS was still dominating the market

• There was a need for a metric to evaluate the solutions

The good old days
• Turned into an organised battle
– TPC-C, TPC-H, TPC-D, TPC-W…
– hundreds of benchmarks to prove one’s muscles

• We need tools to assess a solution space

• We don’t need weapons to win a ‘war’

Dagstuhl 2010 Robust Query Processing

• With each step in the pull, the tension on the tractor increases (exponentially)

• The tractor driver is throttling and changing gears to keep it going

Ingredients of the DBMS Tractor Pull
• A tractor pull is a series of workload steps for which we measure the performance
• Each step is defined by
– Catalog changes
– Database load, delete+load+create index
– Query processing, BI grouped statistics
– Concurrency
– Act of God operations

A database soil

Generate a small database < RAM
Use a single data type

A database soil

COPY the smaller relation into the larger one

Query template
SELECT R0.B0, ..., Ri.Bi, count(*), avg(R0.B0),
       avg(R1.B0), avg(R1.B1), ..., avg(Ri.B0), ...
FROM R0, ..., Ri
WHERE selectpattern(R0, ..., Ri) AND
      joinpattern(R0, ..., Ri)
GROUP BY R0.B0, ..., Ri.Bi
ORDER BY R0.B0, ..., Ri.Bi

Linear, Cyclic, Star-based, Clique query patterns
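
The four join patterns can be read as graph shapes over the relations R0, ..., Ri. The following is a minimal Python sketch of that reading, not the toolkit's own generator. It assumes equi-joins on a column B0 in every relation; the slides do not fix the join columns, and the helper names join_edges and join_pattern_sql are hypothetical.

```python
from itertools import combinations


def join_edges(pattern: str, n: int) -> list[tuple[int, int]]:
    """Return the pairs (a, b) of relation indexes joined among R0..R(n-1)."""
    if n < 2:
        return []
    if pattern == "linear":   # chain R0 - R1 - ... - R(n-1)
        return [(k, k + 1) for k in range(n - 1)]
    if pattern == "cyclic":   # the chain, closed by an edge back to R0
        edges = [(k, k + 1) for k in range(n - 1)]
        if n > 2:
            edges.append((n - 1, 0))
        return edges
    if pattern == "star":     # R0 is the hub joined to every other relation
        return [(0, k) for k in range(1, n)]
    if pattern == "clique":   # every pair of relations is joined
        return list(combinations(range(n), 2))
    raise ValueError(f"unknown pattern: {pattern}")


def join_pattern_sql(pattern: str, n: int) -> str:
    """Render the edges as an equi-join predicate on the assumed column B0."""
    return " AND ".join(f"R{a}.B0 = R{b}.B0" for a, b in join_edges(pattern, n))


print(join_pattern_sql("star", 4))
```

The number of join predicates grows linearly for the linear, cyclic and star shapes and quadratically for the clique, which is one way the templates can stress query complexity.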

The n-th query load includes the (n-1)-th query load

Scenarios
• Tractor pull workload

• W(N) = <S, L, Pre, Qry, Post, qry, db>
– S: schema adjustments
– L: loading the database
– Pre: pre-optimization
– Qry: query execution
– Post: post-optimization
– qry: query characteristics
– db: database growth function

Hill scenario
• The Hill scenario models a data warehouse that grows at a modest rate g ∈ (0, 1) (e.g., g = 0.2).

• It starts out with a main-memory focus until it overflows onto a few disks.

• It will highlight a system’s robustness in dealing with the memory-disk transition

Hill scenario
A modestly growing warehouse with a single user.
The database fits in memory and spills over to disk

d ∈ (0%, 100%), g ∈ (0, 1)
Number of connections at track i: 1
db(0) = (d × RAM) × (1 / (2 × dom))
db(i) = g × i × db(0)
qry(0) = 1, qry(i) = 4
|Q(i)| = 1 + 4 × i
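
The Hill formulas can be evaluated directly. The sketch below is illustrative only. It assumes that db(i) for i ≥ 1 is the amount added at track i on top of what is already loaded, and it passes dom through as a plain number because this slide does not define it. The numeric inputs are made up.

```python
def db0(d: float, ram: float, dom: float) -> float:
    """Initial size db(0) = (d * RAM) * (1 / (2 * dom))."""
    return (d * ram) * (1.0 / (2.0 * dom))


def growth(i: int, g: float, base: float) -> float:
    """Growth at track i >= 1: db(i) = g * i * db(0)."""
    return g * i * base


def total_size(n: int, g: float, base: float) -> float:
    """Accumulated size after track n, assuming db(i) is added to db(0)."""
    return base + sum(growth(i, g, base) for i in range(1, n + 1))


def hill_queries(n: int) -> int:
    """Cumulative query count: qry(0) = 1 and qry(i) = 4 give 1 + 4 * n."""
    return 1 + sum(4 for _ in range(n))


base = db0(d=0.5, ram=16.0, dom=4.0)  # illustrative numbers only
for i in range(4):
    print(i, total_size(i, g=0.2, base=base), hill_queries(i))
```

Under this reading the accumulated size grows quadratically in the track number, so even a modest g eventually pushes the database past main memory.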

Meadow scenario
A stable warehouse with multiple users.
Query templates stress complexity

d ∈ (0%, 100%), g = 0, C > 1
Number of connections at track i: C
db(0) = (d × RAM) × (1 / (2 × dom))
db(i) = 0 (no growth)
qry(0) = 0, qry(i) = C
|Q(i)| = 1 + C × i

Rockies scenario
A growing warehouse with multiple users.
Query templates stress complexity

d ∈ (0%, 100%), g ∈ (0, 10)
Number of connections at track i: i
db(0) = (d × RAM) × (1 / (2 × dom))
db(i) = g × i × db(0)
qry(0) = 0, qry(i) = i × 4
|Q(i)| = 1 + 4 × i × (i + 1) / 2
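
The closed form for the Rockies query count can be checked against a running sum of the per-track counts. This is a small sketch under one assumption: the leading 1 in the closed form is read as a single initial query, which is what the formula implies even though the slide lists qry(0) = 0. The function names are hypothetical.

```python
def rockies_connections(i: int) -> int:
    """Number of connections at track i is i."""
    return i


def rockies_qry(i: int) -> int:
    """Queries added at track i >= 1: qry(i) = i * 4."""
    return 4 * i


def rockies_total(n: int) -> int:
    """Closed form from the slide: |Q(n)| = 1 + 4 * n * (n + 1) / 2."""
    return 1 + 4 * n * (n + 1) // 2


def rockies_total_by_sum(n: int) -> int:
    """One assumed initial query plus the queries added at tracks 1..n."""
    return 1 + sum(rockies_qry(j) for j in range(1, n + 1))


for n in range(6):
    assert rockies_total(n) == rockies_total_by_sum(n)
    print(n, rockies_connections(n), rockies_total(n))
```

Both the number of connections and the query load rise with the track number, so this scenario compounds data growth with concurrency.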

Robustness metrics
• It is a multi-dimensional metric aimed at measuring the deviation from the expected norm

• Robust(N) = <L, S, QO, QOk, QE, QEk, H>
– Standard deviation of the loading time L
– Standard deviation of the storage requirements S
– Standard deviation of query optimization QO (QOk per track)
– Standard deviation of query execution QE (QEk per track)
– Standard deviation, holistic H
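
Most components of the metric are standard deviations over measurements. A minimal sketch of how they could be computed from raw timings follows. It assumes the population standard deviation and pools all tracks for the overall QO and QE values; the slides specify neither choice. The holistic component H is left out because its definition is not given here, and the function names are hypothetical.

```python
from statistics import pstdev


def deviation(samples: list[float]) -> float:
    """Population standard deviation; 0.0 for an empty sample list."""
    return pstdev(samples) if samples else 0.0


def robustness(load: list[float],
               storage: list[float],
               opt_per_track: list[list[float]],
               exec_per_track: list[list[float]]) -> dict:
    """Collect the deviation-based components of Robust(N), except H."""
    return {
        "L": deviation(load),
        "S": deviation(storage),
        # Overall values pool the measurements of all tracks (an assumption).
        "QO": deviation([x for track in opt_per_track for x in track]),
        "QOk": [deviation(track) for track in opt_per_track],
        "QE": deviation([x for track in exec_per_track for x in track]),
        "QEk": [deviation(track) for track in exec_per_track],
    }


result = robustness(
    load=[10.0, 12.0, 11.0],                # made-up loading times
    storage=[100.0, 100.0, 100.0],          # made-up storage sizes
    opt_per_track=[[0.1, 0.1], [0.1, 0.3]],
    exec_per_track=[[1.0, 1.0], [1.0, 5.0]],
)
print(result)
```

A value near zero in a component means the system behaved uniformly along that dimension; a large per-track value points at the track where behavior started to diverge.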

Takeaways
• Robustness is all about comparisons. We need methods to quickly determine differences in behavior.

• If the system reaches the end of the field, we are happy. If it blows up, or if the queries behave worse along the way, it is not robust.

Conclusions
• Tractor pulling is an effective new toolkit for robustness testing of a DBMS in various dimensions

• Refinements for ease of analysis are needed (GUIs)

• http://sourceforge.net/projects/tractorpulling
