Finding Logic Bugs in Database Management Systems
Manuel Rigger
ETH Zurich, Switzerland
13/06/2020
@RiggerManuel @ast_eth https://people.inf.ethz.ch/suz/
4. 4
Logic Bugs
Logic bugs are bugs that cause the DBMS to return an incorrect result set.
Query: SELECT * FROM … WHERE φ
Database: row1 (φ), row2 (φ), row3 (¬φ)
Result set returned by the DBMS: row1 (φ); row2 also satisfies φ but is missing
6. 6
Is the Problem Solved?
SQLite (~150,000 LOC) has 662 times
as much test code as source code
SQLite is extensively fuzzed (e.g., by
Google’s OSS-Fuzz Project)
SQLite’s test cases achieve
100% branch test coverage
Anomaly testing (out-of-memory,
I/O error, power failures)
https://www.sqlite.org/testing.html
8. 8
Bug-Hunting Challenge
22 bugs were classified as P1, and
6 as P2 → over 100 T-shirts!
https://pingcap.com/community-cn/tidb-bug-hunting/
11. 11
Automatic Testing Core Challenges
Generate a Database → Generate a Query → Validate the Query’s Result
1. Effective test case (generate a database, generate a query)
2. Test oracle (validate the query’s result)
Query and database generation is not the focus of this talk
12. 12
Heuristic Database Generation
Generate Tables → Select Other Actions
Lower and upper limits for all statements
CREATE TABLE t0(c0 INT UNIQUE);
CREATE TABLE t1(c0 INT, c1 TEXT);
14. 14
Heuristic Database Generation
Generate Tables → Select Other Actions
CREATE TABLE t0(c0 INT UNIQUE);
CREATE TABLE t1(c0 INT, c1 TEXT);
INSERT INTO t1(c0, c1) VALUES (0, '');
UPDATE t1 SET c0 = 3;
CREATE INDEX i0 ON t1(c0, c1) WHERE c0 > 0;
INSERT INTO t0(c0) VALUES (0), (0);
Statements can fail or be redundant (e.g., the last INSERT violates the UNIQUE constraint on t0.c0)
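The generation loop can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module, the helper name generate_database and the per-statement limits are hypothetical, and the statement pool is just the four statements shown on the slide. Failing statements are recorded and skipped rather than reported as bugs.

```python
import random
import sqlite3

# Statements from the slide; the (lower, upper) limits are assumed values.
ACTIONS = {
    "INSERT INTO t1(c0, c1) VALUES (0, '')": (0, 3),
    "UPDATE t1 SET c0 = 3": (0, 3),
    "CREATE INDEX i0 ON t1(c0, c1) WHERE c0 > 0": (0, 2),
    "INSERT INTO t0(c0) VALUES (0), (0)": (0, 2),
}

def generate_database(seed):
    """Create the tables, then run randomly selected actions (hypothetical helper)."""
    rng = random.Random(seed)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t0(c0 INT UNIQUE)")
    con.execute("CREATE TABLE t1(c0 INT, c1 TEXT)")
    # Decide how often each statement runs, within its lower and upper limit.
    plan = [s for s, (lo, hi) in ACTIONS.items() for _ in range(rng.randint(lo, hi))]
    rng.shuffle(plan)
    failed = []
    for stmt in plan:
        try:
            con.execute(stmt)
        except sqlite3.Error as exc:
            # Expected failures (constraint violations, duplicate index names)
            # are recorded and skipped.
            failed.append((stmt, str(exc)))
    return con, failed

con, failed = generate_database(seed=0)
for stmt, reason in failed:
    print("failed:", stmt, "->", reason)
```

In this pool, the INSERT into t0 always fails because of the UNIQUE constraint, and a repeated CREATE INDEX fails because i0 already exists.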
16. 16
Automatic Testing Core Challenges
Generate a Database → Generate a Query → Validate the Query’s Result
1. Effective test case (generate a database, generate a query)
2. Test oracle (validate the query’s result)
The focus of this talk is test oracles for finding logic bugs!
20. 20
Differential Testing: RAGS (Slutz 1998)
“[Differential testing] proved to be extremely useful,
but only for the small set of common SQL”
21. 21
Problem 1: Different SQL Dialects
DBMS-specific SQL
Common SQL Core
“We are unable to use Postgres as an
oracle because CockroachDB has slightly
different semantics and SQL support, and
generating queries that execute identically
on both is tricky […].” – Cockroach Labs
22. 22
Problem 2: No Ground Truth
https://github.com/pingcap/tidb/issues/15743
DBMS are affected
by common bugs
23. 23
Other approaches
• Solver-based (Khalek et al. 2008, 2010)
• Performance (Jung et al. 2019)
• Estimated cost accuracy (Gu et al. 2012)
24. 24
Automatic Testing Core Challenges
Generate a Database → Generate a Query → Validate the Query’s Result
1. Effective test case (generate a database, generate a query)
2. Test oracle (validate the query’s result)
The problem of testing DBMS to find logic bugs has not yet been well addressed
38. 38
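Translation Step
Optimized query (executed by the DBMS):
SELECT *
FROM …
WHERE φ
Result set: row1 (φ), row2 (φ); φ evaluates to TRUE for two rows
Unoptimized query (executed by the DBMS):
SELECT φ
FROM …
Result set: TRUE (φ), TRUE (φ), FALSE (¬φ); φ evaluates to TRUE for two rows
The two counts are equal (=) ✓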
39. 39
Translation Step
Optimized query (executed by the DBMS):
SELECT *
FROM …
WHERE φ
Result set: row1 (φ); φ evaluates to TRUE for one row
Unoptimized query (executed by the DBMS):
SELECT φ
FROM …
Result set: TRUE (φ), TRUE (φ), FALSE (¬φ); φ evaluates to TRUE for two rows
The two counts differ (≠)
40. 40
Translation Step
Optimized query:
SELECT *
FROM …
WHERE φ
The result set contains the original rows
Unoptimized query:
SELECT φ
FROM …
The result set contains TRUE and FALSE values
41. 41
Translation Step
Optimized query:
SELECT *
FROM …
WHERE φ
Unoptimized query:
SELECT φ
FROM …
Most optimizations aim to reduce the amount of data that is processed (e.g., using indexes)
43. 43
Translation Step
Optimized query:
SELECT *
FROM …
WHERE φ
Unoptimized query:
SELECT φ
FROM …
The translated query cannot be efficiently optimized by the DBMS
The predicate must be evaluated on every row
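The comparison between the optimized and the translated query can be sketched as follows, assuming Python's sqlite3 module. The helper name norec_counts, the table t0, and the predicate are hypothetical, and the sketch assumes the predicate evaluates to 0, 1, or NULL.

```python
import sqlite3

def norec_counts(con, table, predicate):
    """Return (optimized_count, unoptimized_count) for a predicate (hypothetical helper)."""
    # Optimized query: the DBMS is free to use indexes and skip rows.
    optimized = len(con.execute(f"SELECT * FROM {table} WHERE {predicate}").fetchall())
    # Translated query: the predicate moves into the select list, so it is
    # evaluated on every row; the TRUE values are counted on the client side.
    values = con.execute(f"SELECT ({predicate}) FROM {table}").fetchall()
    unoptimized = sum(1 for (v,) in values if v is not None and v != 0)
    return optimized, unoptimized

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(c0 INT)")
con.execute("CREATE INDEX i0 ON t0(c0)")
con.executemany("INSERT INTO t0 VALUES (?)", [(0,), (1,), (2,), (None,)])

optimized, unoptimized = norec_counts(con, "t0", "c0 > 0")
print("counts differ (potential bug)" if optimized != unoptimized else "counts agree")
```

A mismatch between the two counts is what the oracle reports; on a correct DBMS both counts agree.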
64. 64
How to Realize This Idea?
Key challenge: find a valid partitioning
strategy that stresses the DBMS
65. 65
Ternary Logic
Consider a predicate φ and a given row r. Exactly one of the following must hold:
• φ
• NOT φ
• φ IS NULL
These are the ternary predicate variants.
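The "exactly one holds" property can be checked directly on a small table. The sketch below assumes Python's sqlite3 module; the table t0, its contents, and the predicate c0 > 0 are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(c0 INT)")
con.executemany("INSERT INTO t0 VALUES (?)", [(0,), (1,), (None,)])

# Evaluate the three ternary predicate variants of φ = (c0 > 0) for every row.
rows = con.execute(
    "SELECT c0, c0 > 0, NOT (c0 > 0), (c0 > 0) IS NULL FROM t0"
).fetchall()

def holds(value):
    # A WHERE clause keeps a row only if the predicate is TRUE (not FALSE, not NULL).
    return value is not None and value != 0

# Number of variants that hold per row.
counts = [sum(holds(v) for v in row[1:]) for row in rows]
print(counts)
```

For the row with c0 = NULL, both φ and NOT φ evaluate to NULL, so only φ IS NULL holds.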
68. 68
Example: MySQL
Partitioned query:
SELECT * FROM t0, t1 WHERE t0.c0=t1.c0
UNION ALL
SELECT * FROM t0, t1 WHERE NOT (t0.c0=t1.c0)
UNION ALL
SELECT * FROM t0, t1 WHERE (t0.c0=t1.c0) IS NULL;
Original query:
SELECT * FROM t0, t1;
The two result sets differ (≠): one contains the row (0, -0), the other is empty
76. 76
Testing WHERE Clauses
Q:
SELECT <columns>
FROM <tables>
[<joins>]
Q′_ptern:
SELECT <columns>
FROM <tables>
[<joins>]
WHERE ptern
♢(Q′_p, Q′_¬p, Q′_{p IS NULL}):
Q′_p ⊎ Q′_¬p ⊎ Q′_{p IS NULL}
The multiset addition can be implemented using UNION ALL
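A minimal sketch of this WHERE oracle, assuming Python's sqlite3 module. The helper name tlp_where_holds and the example tables are hypothetical; the comparison uses multisets (Counter) because row order is not guaranteed.

```python
import sqlite3
from collections import Counter

def tlp_where_holds(con, query, p):
    """Compare Q with the UNION ALL of its three partitions (hypothetical helper)."""
    original = Counter(con.execute(query).fetchall())
    partitioned = Counter(con.execute(
        f"{query} WHERE {p} "
        f"UNION ALL {query} WHERE NOT ({p}) "
        f"UNION ALL {query} WHERE ({p}) IS NULL"
    ).fetchall())
    # The multisets must be equal; a difference indicates a logic bug.
    return original == partitioned

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(c0 INT)")
con.execute("CREATE TABLE t1(c0 INT)")
con.executemany("INSERT INTO t0 VALUES (?)", [(0,), (1,), (None,)])
con.executemany("INSERT INTO t1 VALUES (?)", [(0,), (None,)])

ok = tlp_where_holds(con, "SELECT * FROM t0, t1", "t0.c0 = t1.c0")
print("partitions match" if ok else "mismatch (potential bug)")
```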
77. 77
Testing HAVING Clauses
Q:
SELECT <columns>
FROM <tables>
[<joins>]
[WHERE …]
[GROUP BY …]
Q′_ptern:
SELECT <columns>
FROM <tables>
[<joins>]
[WHERE …]
[GROUP BY …]
HAVING ptern
♢(Q′_p, Q′_¬p, Q′_{p IS NULL}):
Q′_p ⊎ Q′_¬p ⊎ Q′_{p IS NULL}
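The same idea applied to HAVING can be sketched as follows, assuming Python's sqlite3 module. The helper name tlp_having_holds, the table, and the predicate SUM(c1) > 1 are illustrative assumptions; the sketch keeps an explicit GROUP BY in the query.

```python
import sqlite3
from collections import Counter

def tlp_having_holds(con, grouped_query, p):
    """Partition a grouped query on a HAVING predicate (hypothetical helper)."""
    original = Counter(con.execute(grouped_query).fetchall())
    partitioned = Counter(con.execute(
        f"{grouped_query} HAVING {p} "
        f"UNION ALL {grouped_query} HAVING NOT ({p}) "
        f"UNION ALL {grouped_query} HAVING ({p}) IS NULL"
    ).fetchall())
    return original == partitioned

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(c0 INT, c1 INT)")
con.executemany("INSERT INTO t0 VALUES (?, ?)", [(1, 1), (1, 2), (2, None), (3, 0)])

# Group 1 has SUM 3 (TRUE), group 2 has SUM NULL (NULL), group 3 has SUM 0 (FALSE).
ok = tlp_having_holds(con, "SELECT c0, SUM(c1) FROM t0 GROUP BY c0", "SUM(c1) > 1")
print("partitions match" if ok else "mismatch (potential bug)")
```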
79. 79
Testing DISTINCT Clauses
Q:
SELECT DISTINCT
<columns>
FROM <tables>
[<joins>]
Q′_ptern:
SELECT [DISTINCT]
<columns>
FROM <tables>
[<joins>]
WHERE ptern;
♢(Q′_p, Q′_¬p, Q′_{p IS NULL}):
Q′_p ∪ Q′_¬p ∪ Q′_{p IS NULL}
The set union can be implemented using UNION
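A sketch of the DISTINCT variant, assuming Python's sqlite3 module. The helper name tlp_distinct_holds and the example data are hypothetical. The data is chosen so that the same value of c0 appears in all three partitions, which is why a set union (UNION) is needed instead of UNION ALL.

```python
import sqlite3
from collections import Counter

def tlp_distinct_holds(con, columns, tables, p):
    """Compare SELECT DISTINCT with the UNION of the partitions (hypothetical helper)."""
    original = Counter(con.execute(f"SELECT DISTINCT {columns} FROM {tables}").fetchall())
    q = f"SELECT {columns} FROM {tables}"
    partitioned = Counter(con.execute(
        f"{q} WHERE {p} UNION {q} WHERE NOT ({p}) UNION {q} WHERE ({p}) IS NULL"
    ).fetchall())
    # UNION removes duplicates across partitions, so both sides are sets.
    return original == partitioned

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(c0 INT, c1 INT)")
con.executemany("INSERT INTO t0 VALUES (?, ?)", [(1, 0), (1, 5), (1, None), (2, None)])

ok = tlp_distinct_holds(con, "c0", "t0", "c1 > 0")
print("partitions match" if ok else "mismatch (potential bug)")
```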
80. 80
Testing GROUP BY Clauses
Q:
SELECT <columns>
FROM <tables>
[<joins>]
GROUP BY <columns>
Q′_ptern:
SELECT <columns>
FROM <tables>
[<joins>]
WHERE ptern
GROUP BY <columns>
♢(Q′_p, Q′_¬p, Q′_{p IS NULL}):
Q′_p ∪ Q′_¬p ∪ Q′_{p IS NULL}
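For GROUP BY, a group can be split across partitions, so the partitions are again combined with a set union. The sketch below assumes Python's sqlite3 module; the helper name tlp_group_by_holds and the data are hypothetical.

```python
import sqlite3
from collections import Counter

def tlp_group_by_holds(con, columns, tables, p):
    """Compare a GROUP BY query with the UNION of its partitions (hypothetical helper)."""
    original = Counter(con.execute(
        f"SELECT {columns} FROM {tables} GROUP BY {columns}"
    ).fetchall())
    def part(predicate):
        return f"SELECT {columns} FROM {tables} WHERE {predicate} GROUP BY {columns}"
    partitioned = Counter(con.execute(
        f"{part(p)} UNION {part(f'NOT ({p})')} UNION {part(f'({p}) IS NULL')}"
    ).fetchall())
    return original == partitioned

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(c0 INT, c1 INT)")
con.executemany("INSERT INTO t0 VALUES (?, ?)", [(1, 0), (1, 5), (1, None), (2, None)])

ok = tlp_group_by_holds(con, "c0", "t0", "c1 > 0")
print("partitions match" if ok else "mismatch (potential bug)")
```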
81. 81
Testing Self-decomposable Aggregate Functions
Q:
SELECT MAX(<e>)
FROM <tables>
[<joins>]
Q′_ptern:
SELECT MAX(<e>)
FROM <tables>
[<joins>]
WHERE ptern;
♢(Q′_p, Q′_¬p, Q′_{p IS NULL}):
MAX(Q′_p ⊎ Q′_¬p ⊎ Q′_{p IS NULL})
A partition is an intermediate result, rather than a subset of the result set
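A sketch for a self-decomposable aggregate, assuming Python's sqlite3 module. The helper name tlp_max_holds and the data are hypothetical. Each partition yields an intermediate MAX value, and the outer MAX combines them.

```python
import sqlite3

def tlp_max_holds(con, expr, tables, p):
    """Compare MAX over all rows with the MAX of the per-partition MAX values."""
    original = con.execute(f"SELECT MAX({expr}) FROM {tables}").fetchone()[0]
    q = f"SELECT MAX({expr}) AS aggr FROM {tables}"
    combined = con.execute(
        f"SELECT MAX(aggr) FROM ("
        f"{q} WHERE {p} UNION ALL {q} WHERE NOT ({p}) UNION ALL {q} WHERE ({p}) IS NULL)"
    ).fetchone()[0]
    return original == combined

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(c0 INT)")
con.executemany("INSERT INTO t0 VALUES (?)", [(0,), (3,), (None,), (7,)])

# Partition results are 7 (c0 > 2), 0 (NOT (c0 > 2)), and NULL ((c0 > 2) IS NULL).
ok = tlp_max_holds(con, "c0", "t0", "c0 > 2")
print("aggregates match" if ok else "mismatch (potential bug)")
```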
82. 82
Bug Example: CockroachDB
SET vectorize=experimental_on;
CREATE TABLE t0(c0 INT);
CREATE TABLE t1(c0 BOOL) INTERLEAVE IN PARENT t0(rowid);
INSERT INTO t0(c0) VALUES (0);
INSERT INTO t1(rowid, c0) VALUES(0, TRUE);
Partitioned query:
SELECT MAX(aggr) FROM (
SELECT MAX(t1.rowid) as aggr FROM t1 WHERE '+' >= t1.c0 UNION ALL
SELECT MAX(t1.rowid) as aggr FROM t1 WHERE NOT('+' >= t1.c0) UNION ALL
SELECT MAX(t1.rowid) as aggr FROM t1 WHERE ('+' >= t1.c0) IS NULL
);
Original query:
SELECT MAX(t1.rowid)
FROM t1;
The two results differ (≠): one query returns NULL, the other 0
83. 83
Testing Decomposable Aggregate Functions
Q:
SELECT AVG(<e>)
FROM <tables>
[<joins>];
Q′_ptern:
SELECT SUM(<e>) as s,
COUNT(<e>) as c
FROM <tables>
[<joins>]
WHERE ptern;
♢(Q′_p, Q′_¬p, Q′_{p IS NULL}):
SUM(s(Q′_p ⊎ Q′_¬p ⊎ Q′_{p IS NULL})) / SUM(c(Q′_p ⊎ Q′_¬p ⊎ Q′_{p IS NULL}))
We did not consider non-decomposable aggregate functions (e.g., GROUP_CONCAT())
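The decomposition of AVG into SUM and COUNT can be sketched as follows, assuming Python's sqlite3 module. The helper name tlp_avg_holds and the data are hypothetical, and the floating-point comparison with math.isclose is a design choice of this sketch, not something the slides prescribe.

```python
import math
import sqlite3

def tlp_avg_holds(con, expr, tables, p):
    """Compare AVG with SUM(s) / SUM(c) over the three partitions (hypothetical helper)."""
    original = con.execute(f"SELECT AVG({expr}) FROM {tables}").fetchone()[0]
    q = f"SELECT SUM({expr}) AS s, COUNT({expr}) AS c FROM {tables}"
    combined = con.execute(
        f"SELECT CAST(SUM(s) AS REAL) / SUM(c) FROM ("
        f"{q} WHERE {p} UNION ALL {q} WHERE NOT ({p}) UNION ALL {q} WHERE ({p}) IS NULL)"
    ).fetchone()[0]
    if original is None or combined is None:
        # AVG over no non-NULL values is NULL on both sides.
        return original is combined
    return math.isclose(original, combined)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(c0 INT)")
con.executemany("INSERT INTO t0 VALUES (?)", [(1,), (2,), (6,), (None,)])

# Partitions: (s=8, c=2), (s=1, c=1), (s=NULL, c=0); combined 9 / 3.
ok = tlp_avg_holds(con, "c0", "t0", "c0 > 1")
print("aggregates match" if ok else "mismatch (potential bug)")
```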
84. 84
Evaluation: Found Bugs
DBMS         Fixed  Verified  Closed (Intended)  Closed (Duplicate)
SQLite           4         0                  0                   0
MySQL            1         6                  3                   0
CockroachDB     22         9                  0                   0
TiDB            26        35                  0                   1
DuckDB          71         1                  0                   2
We found 175 bugs, 123 of which have been fixed!
85. 85
Evaluation: Found Bugs
The total number of bugs
reflects our testing focus
86. 86
Evaluation: Found Bugs
Columns WHERE through DISTINCT are the Query Partitioning Oracle
DBMS         WHERE  Aggregate  GROUP BY  HAVING  DISTINCT  Error  Crash
SQLite           0          3         0       0         1      0      0
CockroachDB      3          3         0       1         0     22      2
TiDB            29          0         1       0         0     27      4
MySQL            7          0         0       0         0      0      0
DuckDB          21          4         1       2         1     13     19
87. 87
Evaluation: Found Bugs
The WHERE oracle is the simplest, but most effective oracle
88. 88
Evaluation: Found Bugs
The other oracles found interesting, but fewer bugs
89. 89
Evaluation: Found Bugs
We implemented these oracles only for DBMS
for which our bug-finding efforts saturated
90. 90
Comparison to NoREC
TLP
• Aggregate functions, DISTINCT, GROUP BY, HAVING, WHERE clauses
• UNION, UNION ALL
• Bugs in (unoptimized) JOINs and operators
NoREC
• Bugs in the unoptimized query
93. 93
Inherent Limitation of Metamorphic Testing
[Figure: SQL Query → Execute → Result Set; SQL Query → Derive → Derived SQL Query]
Metamorphic approaches cannot establish a ground truth
104. 104
Example: SQLite3 Bug
Table t0 (column c0): 0, 1, 2, 3, NULL
CREATE TABLE t0(c0);
CREATE INDEX i0 ON t0(1) WHERE c0 NOT NULL;
INSERT INTO t0 (c0) VALUES (0), (1), (2), (3), (NULL);
SELECT c0 FROM t0
WHERE t0.c0 IS NOT 1;
Expected to fetch the pivot row
105. 105
Example: SQLite3 Bug
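Table t0 (column c0): 0, 1, 2, 3, NULL
CREATE TABLE t0(c0);
CREATE INDEX i0 ON t0(1) WHERE c0 NOT NULL;
INSERT INTO t0 (c0) VALUES (0), (1), (2), (3), (NULL);
SELECT c0 FROM t0
WHERE t0.c0 IS NOT 1;
Result set: 0, 2, 3
The pivot row is not contained in the result set! (NULL IS NOT 1 evaluates to TRUE, so the row with c0 = NULL should have been fetched.)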
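The containment check from this example can be replayed with Python's sqlite3 module, as sketched below. The variable names are hypothetical, and whether the pivot row is missing depends on the SQLite version that Python is linked against; a version with the fix should return it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t0(c0);
CREATE INDEX i0 ON t0(1) WHERE c0 NOT NULL;
INSERT INTO t0 (c0) VALUES (0), (1), (2), (3), (NULL);
""")

# Pivot row: the row with c0 = NULL, for which "c0 IS NOT 1" is TRUE.
pivot_row = (None,)
result = con.execute("SELECT c0 FROM t0 WHERE t0.c0 IS NOT 1").fetchall()

# The oracle only checks containment of the pivot row in the result set.
pivot_contained = pivot_row in result
print("pivot row contained" if pivot_contained else "pivot row missing (bug)")
```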
115. 115
Implementation Effort
• Literal evaluator
• Simpler than PL AST Interpreters → No mutable state
• Simpler than query engines → only a single row needs to be considered
• Operators are implemented naively
• The performance of the DBMS is the bottleneck
• Higher implementation effort for functions (e.g., printf) and complex operators
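To illustrate why such an evaluator stays small, here is a minimal sketch in Python. The tuple-based expression format and the function name evaluate are assumptions made for this example. It has no mutable state, looks at a single row, and implements a few operators naively with SQL's three-valued logic (None stands for NULL).

```python
def evaluate(expr, row):
    """Evaluate a tiny expression tree on one row; returns 1, 0, or None (NULL)."""
    op = expr[0]
    if op == "const":
        return expr[1]
    if op == "col":
        return row[expr[1]]
    if op == "not":
        value = evaluate(expr[1], row)
        return None if value is None else int(not value)
    if op == "is_null":
        return int(evaluate(expr[1], row) is None)
    if op in ("=", ">"):
        left, right = evaluate(expr[1], row), evaluate(expr[2], row)
        if left is None or right is None:
            return None  # comparisons with NULL yield NULL
        return int(left == right) if op == "=" else int(left > right)
    raise ValueError(f"unsupported operator: {op}")

# φ = (c0 > 0), evaluated on two single rows.
phi = (">", ("col", "c0"), ("const", 0))
print(evaluate(phi, {"c0": 3}))                    # TRUE
print(evaluate(phi, {"c0": None}))                 # NULL
print(evaluate(("is_null", phi), {"c0": None}))    # TRUE
```

Functions such as printf and operators with implicit type conversions would need considerably more code than these few cases.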