Tractor Pulling on Data Warehouse



This topic was presented by Martin Kersten (CWI) at the 4th International Workshop on Testing Database Systems (DBTest 2011) on June 13th, 2011 in Athens, Greece.


Robustness of database systems under stress is hard to quantify because many factors are involved, most notably the user's expectation that a job completes within the bounds set by the user's requirements. Nevertheless, the robustness of a database system is very important to end users. In this paper we develop a database benchmark suite, inspired by tractor pulling, in which robustness is measured as a system's ability to process data despite a continuous increase in system load, defined in terms of data volume, query volume and complexity. A functional evaluation is performed against several systems to highlight the benchmark's capabilities.


  1. Tractor Pulling on Datawarehouses
     Martin Kersten, Volker Markl, Meikel Poess, Kai-Uwe Sattler, Alfons Kemper, Ani Nica
     DBTest 2011
  2. The good old days
     • The early eighties, when
       – Oracle appeared on the scene
       – Ingres was a respected innovator in RDBMS
       – System R fought the CODASYL battle
       – IMS was still dominating the market
     • There was a need for a metric to evaluate the solutions
  3. The good old days
     • Turned into an organised battle
       – TPC-C, TPC-H, TPC-D, TPC-W…
       – hundreds of benchmarks to prove one's muscles
  4. • We need tools to assess a solution space
     • We don't need weapons to win a 'war'
  5. Dagstuhl 2010 Robust Query Processing
  6. • With each step in the pull the tension on the tractor increases (exponentially)
     • The tractor driver is throttling and changing gears to keep it going
  7. Ingredients of the DBMS Tractor Pull
     • A tractor pull is a series of workload steps for which we measure the performance
     • Each step is defined by
       – Catalog changes
       – Database load, delete+load+create index
       – Query processing, BI grouped statistics
       – Concurrency
       – Act of God operations
  8. A database soil
     Generate a small database < RAM
     Use a single data type
  9. A database soil
     COPY the smaller relation into the larger one
  10. A database soil
  11. Query template
      SELECT R0.B0, ..., Ri.Bi, count(*), avg(R0.B0), avg(R1.B0), avg(R1.B1), ..., avg(Ri.B0), ...
      FROM R0, ..., Ri
      WHERE selectpattern(R0, ..., Ri) AND joinpattern(R0, ..., Ri)
      GROUP BY R0.B0, ..., Ri.Bi
      ORDER BY R0.B0, ..., Ri.Bi
      Linear, Cyclic, Star-based, Clique query patterns
      The n-th query load includes the (n-1)-th query load
  12. Scenarios
      • Tractor pull workload
      • W(N) = <S, L, Pre, Qry, Post, qry, db>
        – Schema adjustments
        – Loading the database
        – Pre-optimization
        – Query execution
        – Post-optimization
        – Query characteristics
        – db growth function
  13. Hill scenario
      • The Hill scenario models a data warehouse that grows at a modest growth rate g ∈ (0, 1) (e.g., g = 0.2).
      • It starts out from a main-memory focus until it overflows into a few disks.
      • It will highlight a system's robustness to deal with the memory-disk
  14. Hill scenario
      A modestly growing warehouse with a single user.
      The database fits in memory and spills over to disk.
      d ∈ (0%, 100%), g ∈ (0, 1)
      Number of connections at track i: 1
      db(0) = (d × RAM) × 1 / (2 × dom)
      db(i) = g × i × db(0)
      qry(0) = 1, qry(i) = 4
      |Q(i)| = 1 + 4 × i
  15. Meadow scenario
      A stable warehouse with multiple users.
      Query templates stress complexity.
      d ∈ (0%, 100%), g = 0, C > 1
      Number of connections at track i: C
      db(0) = (d × RAM) × 1 / (2 × dom)
      db(i) = 0 (no growth)
      qry(0) = 0, qry(i) = C
      |Q(i)| = 1 + C × i
  16. Rockies scenario
      A growing warehouse with multiple users.
      Query templates stress complexity.
      d ∈ (0%, 100%), g ∈ (0, 10)
      Number of connections at track i: i
      db(0) = (d × RAM) × 1 / (2 × dom)
      db(i) = g × i × db(0)
      qry(0) = 0, qry(i) = 4 × i
      |Q(i)| = 1 + 4 × i × (i + 1) / 2
  17. Robustness metrics
      • It is a multi-dimensional metric aimed at measuring the deviation from the expected norm
      • Robust(N) = <L, S, QO, QOk, QE, QEk, H>
        – Standard deviation of the loading time L
        – Standard deviation of the storage requirements
        – Standard deviation of query optimization (per track)
        – Standard deviation of query execution (per track)
        – Standard deviation, holistic
  18. A Hill scenario
  19. A Meadow scenario
  20. A Rockies scenario
  21. Takeaways
      • Robustness is all about comparisons. We need methods to quickly determine differences in behavior.
      • If the system reaches the end of the field we are happy. If it blows up, or if the queries behave worse along the way, it is not robust.
  22. Conclusions
      • Tractor pulling is an effective new toolkit for robustness testing of a DBMS in various dimensions
      • Refinements for ease of analysis are needed (GUIs)
      • rpulling
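
The growth rules on the Hill, Meadow and Rockies slides, and the standard-deviation idea behind the robustness metric, can be sketched in a few lines of Python. This is an illustration only, not the benchmark's actual implementation. It assumes that db(i) and qry(i) are per-track increments (the Meadow slide's "db(i) = 0 (no growth)" suggests this reading), that every scenario starts with one query at track 0 (as the |Q(i)| formulas imply), and that a population standard deviation is meant. All names (`Scenario`, `schedule`, `robustness_spread`) and the default parameter values are hypothetical.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Scenario:
    """Per-track growth rules as read off the scenario slides (an interpretation)."""
    name: str
    db_step: Callable[[int, float], float]  # data added at track i, given db(0)
    qry_step: Callable[[int], int]          # queries added at track i
    connections: Callable[[int], int]       # concurrent connections at track i


def hill(g: float = 0.2) -> Scenario:
    # Single user, modest growth: db(i) = g*i*db(0), qry(i) = 4.
    return Scenario("hill", lambda i, db0: g * i * db0, lambda i: 4, lambda i: 1)


def meadow(c: int = 4) -> Scenario:
    # C users, no growth: db(i) = 0, qry(i) = C.
    return Scenario("meadow", lambda i, db0: 0.0, lambda i: c, lambda i: c)


def rockies(g: float = 0.2) -> Scenario:
    # i users at track i, growing data: db(i) = g*i*db(0), qry(i) = 4*i.
    return Scenario("rockies", lambda i, db0: g * i * db0, lambda i: 4 * i, lambda i: i)


def schedule(s: Scenario, db0: float, tracks: int) -> List[Tuple[int, float, int, int]]:
    """Return (track, cumulative db size, cumulative query count, connections)."""
    rows = []
    size, queries = db0, 1  # assumption: track 0 holds db(0) and a single query
    for i in range(1, tracks + 1):
        size += s.db_step(i, db0)
        queries += s.qry_step(i)
        rows.append((i, size, queries, s.connections(i)))
    return rows


def robustness_spread(per_track_times: Sequence[float]) -> float:
    """Population standard deviation of per-track timings (lower = steadier)."""
    return pstdev(per_track_times)


if __name__ == "__main__":
    for scenario in (hill(), meadow(), rockies()):
        print(scenario.name)
        for row in schedule(scenario, db0=100.0, tracks=3):
            print("  track=%d size=%.1f queries=%d connections=%d" % row)
```

Under these assumptions the cumulative query counts reproduce the closed forms on the slides: 1 + 4 × i for Hill, 1 + C × i for Meadow, and 1 + 4 × i × (i + 1) / 2 for Rockies. The initial size db0 is passed in directly, so the sketch does not need to interpret the dom term in db(0).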