# Solving Common SQL Problems with the SeqEngine

Published in: Technology, Education

### Solving Common SQL Problems with the SeqEngine

1. Solving Common SQL Problems with the SeqEngine. Beat Vontobel, CTO, MeteoNews AG, b.vontobel@meteonews.ch, http://seqengine.org. Copyright © 2009 Beat Vontobel. This work is made available under the Creative Commons Attribution-Noncommercial-Share Alike license, see http://creativecommons.org/licenses/by-nc-sa/3.0/
2. Solving SQL Problems with the SeqEngine
    - How to benefit from simple auxiliary tables holding sequences
    - Use of a pluggable storage engine to create such tables
    - On the side:
        - Some interesting benchmarks
        - MySQL-Optimizer caveats
        - Remember once more how to do things the “SQL-way”
3. Sequences: What are we talking about?

        CREATE TABLE integer_sequence (
            i INT NOT NULL PRIMARY KEY
        );

        INSERT INTO integer_sequence (i)
        VALUES (1), (2), (3), (4), (5), (6), (7), (8);

        SELECT * FROM integer_sequence;
        +---+
        | i |
        +---+
        | 1 |
        | 2 |
        | 3 |
        | 4 |
        | … |
4. Names used by others…
    - Pivot Table
        - Can be used to “pivot” other tables (“turn them around”)
    - Integers Table
        - They often hold integers as data type
    - Auxiliary/Utility Table
        - They help us solve problems, but contain no actual data
    - Sequence Table
        - Just what it is: The name I'll use
5. What we're not talking about (1)

        -- Oracle Style Sequences
        --
        -- (mostly used to generate primary keys, much
        -- like what MySQL's auto_increment feature is
        -- used for)

        CREATE SEQUENCE customers_seq
            START WITH 1000
            INCREMENT BY 1;

        INSERT INTO customers (customer_id, name)
        VALUES (customers_seq.NEXTVAL, 'John Doe');
6. What we're not talking about (2)
    - Sequence in Mathematics:
        - “an ordered list of objects”
        - n-tuple
    - Sequence Table in SQL:
        - a set, unordered by definition
        - F = {n | 1 ≤ n ≤ 20; n is integer}
        - relation (set of 1-tuples)
7. “Using such a utility table is a favorite old trick of experienced SQL developers” (Stéphane Faroult: The Art of SQL)
8. Finding Holes… typically Swiss!
9. Finding Holes… in a Table!

        +---------+---------------------+------+---
        | stat_id | datetime            | tt   | …
        +---------+---------------------+------+---
        | …       | …                   | …    | …
        | ABO     | 2004-11-03 22:40:00 |  8.3 | …
        | ABO     | 2004-11-03 22:50:00 |  8.7 | …
        | ABO     | 2004-11-03 23:00:00 |  9.9 | …
        | ABO     | 2004-11-03 23:10:00 |  7.8 | …
        | ABO     | 2004-11-04 00:10:00 |  9.2 | …
        | ABO     | 2004-11-04 00:20:00 |  9.1 | …
        | ABO     | 2004-11-04 00:30:00 | 10.2 | …
        | ABO     | 2004-11-04 00:40:00 |  9.3 | …
        | …       | …                   | …    | …
        +---------+---------------------+------+---
11. The table's create statement used for demo

        CREATE TABLE temperatures (
            stat_id  CHAR(3) NOT NULL,
            datetime TIMESTAMP NOT NULL,
            tt       DECIMAL(3,1) DEFAULT NULL,
            PRIMARY KEY (stat_id, datetime),
            UNIQUE KEY reverse_primary (datetime, stat_id)
        );
12. How to “SELECT” a row that doesn't exist?
    - SELECT only returns rows that are there
    - WHERE only filters rows
    - We need something to generate rows!
13. Finding Holes… the naïve way

        for("all timestamps to check") {
            /* Single SELECTs for every timestamp */
            db_query("SELECT COUNT(*) FROM temperatures
                      WHERE stat_id = ? AND datetime = ?");
            if("no row found") {
                warn_about_missing_row("timestamp");
            }
        }
14. Finding Holes… the “standard” way

        /* Working with an ordered set */
        db_query("SELECT datetime FROM temperatures
                  WHERE stat_id = ? ORDER BY datetime ASC");
        for("all timestamps to check") {
            db_fetch_row();
            while("timestamps don't match") {
                warn_about_missing_row();
                increment_timestamp();
            }
        }
15. These were just ordinary JOINs!
    - The for-loop just walks an “imaginary” timestamps table with a sequence of all the values to check for!
    - Then we LEFT JOIN these timestamps against our temperatures
        - or do a NOT EXISTS subquery
    - So, if we had a sequence table…
16. Sequence of timestamps

        CREATE TABLE timestamps (
            datetime TIMESTAMP NOT NULL PRIMARY KEY
        );

        INSERT INTO timestamps (datetime)
        VALUES ('2004-01-01 00:00:00'),
               ('2004-01-01 00:00:10'),
               …;

        SELECT * FROM timestamps;
        +---------------------+
        | datetime            |
        +---------------------+
        | 2004-01-01 00:00:00 |
        | 2004-01-01 00:00:10 |
        | …                   |
17. Queries using the sequence

        SELECT *
        FROM timestamps  -- our "for-Loop"
        LEFT JOIN temperatures
            ON timestamps.datetime = temperatures.datetime
        WHERE temperatures.datetime IS NULL;

        SELECT *
        FROM timestamps  -- our "for-Loop"
        WHERE NOT EXISTS (
            SELECT *
            FROM temperatures
            WHERE temperatures.datetime = timestamps.datetime
        );
18. Finding missing rows: WHERE temperatures.stat_id IS NULL

        timestamps          | temperatures
        datetime            | stat_id | datetime            | tt
        …                   | …       | …                   | …
        2004-11-03 23:00:00 | ABO     | 2004-11-03 23:00:00 | 9.9
        2004-11-03 23:10:00 | ABO     | 2004-11-03 23:10:00 | 7.8
        2004-11-03 23:20:00 | NULL    | NULL                | NULL
        2004-11-03 23:30:00 | NULL    | NULL                | NULL
        2004-11-03 23:40:00 | NULL    | NULL                | NULL
        2004-11-03 23:50:00 | NULL    | NULL                | NULL
        2004-11-04 00:00:00 | NULL    | NULL                | NULL
        2004-11-04 00:10:00 | ABO     | 2004-11-04 00:10:00 | 9.2
        …                   | …       | …                   | …
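
The anti-join from the last two slides can be tried without a MySQL server. The following is a minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in for MySQL and a hand-filled `timestamps` table instead of one provided by the SeqEngine; the sample rows are the ones shown on the slides.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and the sequence table is
# filled by hand rather than generated by the SeqEngine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temperatures (
        stat_id  TEXT NOT NULL,
        datetime TEXT NOT NULL,
        tt       REAL,
        PRIMARY KEY (stat_id, datetime)
    );
    CREATE TABLE timestamps (datetime TEXT NOT NULL PRIMARY KEY);
""")

# Measurements from the slides: 23:20 through 00:00 are missing.
conn.executemany(
    "INSERT INTO temperatures VALUES ('ABO', ?, ?)",
    [
        ("2004-11-03 23:00:00", 9.9),
        ("2004-11-03 23:10:00", 7.8),
        ("2004-11-04 00:10:00", 9.2),
    ],
)

# The sequence: one row per expected 10-minute slot.
expected_slots = ["2004-11-03 23:%02d:00" % m for m in range(0, 60, 10)]
expected_slots += ["2004-11-04 00:00:00", "2004-11-04 00:10:00"]
conn.executemany(
    "INSERT INTO timestamps VALUES (?)", [(slot,) for slot in expected_slots]
)

# LEFT JOIN the sequence against the data and keep the unmatched slots.
holes = [
    row[0]
    for row in conn.execute("""
        SELECT timestamps.datetime
        FROM timestamps
        LEFT JOIN temperatures
               ON temperatures.datetime = timestamps.datetime
              AND temperatures.stat_id = 'ABO'
        WHERE temperatures.stat_id IS NULL
        ORDER BY timestamps.datetime
    """)
]
print(holes)
```

The station filter sits in the ON clause here so that slots without any measurement still survive the join; the slides reach the same effect later by cross joining with a `stations` table.
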
19. Filling sequence tables: “Manually”
    - INSERT from an external loop (or a stored procedure)
    - “explode” a few rows using CROSS JOINs
        - INSERT INTO i VALUES (1), (2), …, (8), (9), (10);
        - INSERT INTO j SELECT u.i * 10 + v.i FROM i AS u CROSS JOIN i AS v;
    - “Pop quiz: generate 1 million records” (Giuseppe Maxia) http://datacharmer.blogspot.com/2007/12/pop-quiz-generate-1-million-records.html
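
As a rough illustration of the “explode with CROSS JOINs” idea, the sketch below builds 1,000 consecutive integers from a ten-row table. It is a variation on the slide, not the slide's exact statement: it assumes SQLite via Python's `sqlite3`, a hypothetical `digits` table holding 0 to 9, and a three-way self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical seed table with the ten digits 0..9.
conn.execute("CREATE TABLE digits (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO digits VALUES (?)", [(d,) for d in range(10)])

conn.execute(
    "CREATE TABLE integer_sequence (i INTEGER NOT NULL PRIMARY KEY)"
)

# Each copy of digits supplies one decimal place, so three copies
# yield 10 * 10 * 10 distinct values; the + 1 shifts 0..999 to 1..1000.
conn.execute("""
    INSERT INTO integer_sequence (i)
    SELECT a.i + 10 * b.i + 100 * c.i + 1
    FROM digits AS a
    CROSS JOIN digits AS b
    CROSS JOIN digits AS c
""")

count, lowest, highest = conn.execute(
    "SELECT COUNT(*), MIN(i), MAX(i) FROM integer_sequence"
).fetchone()
print(count, lowest, highest)
```

Adding three more copies of `digits` to the join would produce a million rows in the same way.
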
20. …or, just use the SeqEngine

        -- http://seqengine.org
        -- README for build instructions
        INSTALL PLUGIN SeqEngine SONAME 'ha_seqengine.so';
        SHOW PLUGINS;
        SHOW ENGINES;

        CREATE TABLE million (i TIMESTAMP NOT NULL)
            ENGINE=SeqEngine CONNECTION='1;1000000;1';

        -- If you want to… now it's materialized (fast!)
        ALTER TABLE million ENGINE=MyISAM;
21. Syntax

        -- Variable parts are highlighted
        CREATE TABLE table_name (
            column_name {INT|TIMESTAMP} NOT NULL [PRIMARY KEY]
        ) ENGINE=SeqEngine CONNECTION='start;end;increment';
22. “Manually” created: Disadvantages
    - Wastes storage
    - Wastes RAM (for caches or if ENGINE=MEMORY)
    - Wastes I/O
    - Wastes CPU (unnecessary overhead in code)
    - Cumbersome to fill (especially if large)
23. SeqEngine: Disadvantages
    - None
    - …other than:
        - It's (yet) just a really quick hack for this presentation
        - Contains ugly code and probably a lot of bugs
        - Coded in C++ by somebody who's never done C++ before
        - Is not part of the core server – go build it yourself!
24. Limitations of the SeqEngine (v0.1)
    - Not real limitations, but due to the concept:
        - Read-only
        - One column maximum
        - UNIQUE keys only
    - Current limitations:
        - INT and TIMESTAMP only
        - Only full key reads
        - Error checking, clean-up, optimization, bugs…
25. The tiny core of the SeqEngine: Init

        int ha_seqengine::rnd_init(bool scan)
        {
            DBUG_ENTER("ha_seqengine::rnd_init");

            rnd_cursor_pos = share->seq_def.seq_start;

            DBUG_RETURN(0);
        }
26. The tiny core of the SeqEngine: Next row

        int ha_seqengine::rnd_next(uchar *buf)
        {
            DBUG_ENTER("ha_seqengine::rnd_next");

            if(rnd_cursor_pos <= share->seq_def.seq_end) {
                build_row(buf, rnd_cursor_pos);
                rnd_cursor_pos += share->seq_def.seq_inc;
                table->status= 0;
                DBUG_RETURN(0);
            }

            table->status= STATUS_NOT_FOUND;
            DBUG_RETURN(HA_ERR_END_OF_FILE);
        }
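
The scan logic of the two handler methods above is small enough to mirror in a few lines. This is only a sketch of the idea in Python, not the engine's code: `parse_connection` and `scan_sequence` are hypothetical names, and rejecting non-positive increments is an assumption of this sketch rather than documented SeqEngine behaviour.

```python
from typing import Iterator, Tuple


def parse_connection(connection: str) -> Tuple[int, int, int]:
    """Split a 'start;end;increment' string into three integers."""
    start, end, increment = (int(part) for part in connection.split(";"))
    if increment <= 0:
        # Assumption of this sketch: only ascending sequences are handled.
        raise ValueError("increment must be positive")
    return start, end, increment


def scan_sequence(connection: str) -> Iterator[int]:
    """Yield the values a full table scan would return, one per row."""
    start, end, increment = parse_connection(connection)
    cursor_pos = start            # what rnd_init does: rewind the cursor
    while cursor_pos <= end:      # what rnd_next checks before each row
        yield cursor_pos
        cursor_pos += increment
    # Leaving the loop corresponds to returning HA_ERR_END_OF_FILE.


print(list(scan_sequence("1;10;3")))
```

No value is ever stored: the whole “table” is the three numbers from the CONNECTION string plus a cursor position, which is why such a table costs no storage or I/O.
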
27. SeqEngine: The BOF
    - Using the Storage Engine API for small projects
    - Additional questions/discussion
        - Wednesday, April 22
        - 8:30pm
        - Ballroom E
28. Back to the missing rows example… WHERE temperatures.stat_id IS NULL

        timestamps          | temperatures
        datetime            | stat_id | datetime            | tt
        …                   | …       | …                   | …
        2004-11-03 23:00:00 | ABO     | 2004-11-03 23:00:00 | 9.9
        2004-11-03 23:10:00 | ABO     | 2004-11-03 23:10:00 | 7.8
        2004-11-03 23:20:00 | NULL    | NULL                | NULL
        2004-11-03 23:30:00 | NULL    | NULL                | NULL
        2004-11-03 23:40:00 | NULL    | NULL                | NULL
        2004-11-03 23:50:00 | NULL    | NULL                | NULL
        2004-11-04 00:00:00 | NULL    | NULL                | NULL
        2004-11-04 00:10:00 | ABO     | 2004-11-04 00:10:00 | 9.2
        …                   | …       | …                   | …
29. SeqEngine (LEFT JOIN)

        SELECT timestamps.datetime, stations.stat_id
        FROM timestamps
        CROSS JOIN stations
        LEFT JOIN temperatures AS temps
            ON (temps.datetime, temps.stat_id)
             = (timestamps.datetime, stations.stat_id)
        WHERE stations.stat_id = 'ABO'
          AND temps.stat_id IS NULL;
30. SeqEngine (NOT EXISTS)

        SELECT timestamps.datetime, stations.stat_id
        FROM timestamps
        CROSS JOIN stations
        WHERE stations.stat_id = 'ABO'
          AND NOT EXISTS (
              SELECT *
              FROM temperatures AS temps
              WHERE (temps.datetime, temps.stat_id)
                  = (timestamps.datetime, stations.stat_id)
          );
31. Finding Holes… the naïve way

        for("all timestamps to check") {
            /* Single SELECTs for every timestamp */
            db_query("SELECT COUNT(*) FROM temperatures
                      WHERE stat_id = ? AND datetime = ?");
            if("no row found") {
                warn_about_missing_row("timestamp");
            }
        }
32. As a Procedure (Single SELECTs)

        CREATE PROCEDURE find_holes_naive(stat CHAR(3))
        BEGIN
            DECLARE dt DATETIME DEFAULT '2004-01-01 00:00:00';
            DECLARE c INT;

            WHILE dt < '2005-01-01 00:00:00' DO
                SELECT COUNT(*) INTO c
                FROM temperatures
                WHERE (stat_id, datetime) = (stat, dt);

                IF c = 0 THEN
                    -- missing row
                    SELECT stat, dt;
                END IF;

                SET dt = dt + INTERVAL 10 MINUTE;
            END WHILE;
        END //
33. Finding Holes… the “standard” way

        /* Working with an ordered set */
        db_query("SELECT datetime FROM temperatures
                  WHERE stat_id = ? ORDER BY datetime ASC");
        for("all timestamps to check") {
            db_fetch_row();
            while("timestamps don't match") {
                warn_about_missing_row();
                increment_timestamp();
            }
        }
34. As a Procedure (Ordered Set)

        CREATE PROCEDURE find_holes_ordered(stat CHAR(3))
        BEGIN
            DECLARE no_more_rows BOOLEAN DEFAULT FALSE;
            DECLARE dt1 DATETIME DEFAULT '2004-01-01 00:00:00';
            DECLARE dt2 DATETIME;

            DECLARE temperatures_cursor CURSOR FOR
                SELECT datetime FROM temperatures
                WHERE stat_id = stat ORDER BY datetime ASC;

            DECLARE CONTINUE HANDLER FOR NOT FOUND
                SET no_more_rows = TRUE;

            OPEN temperatures_cursor;

            temperatures_loop: LOOP
                FETCH temperatures_cursor INTO dt2;

                WHILE dt1 != dt2 DO
                    SELECT stat, dt1;
                    SET dt1 = dt1 + INTERVAL 10 MINUTE;
                    IF dt1 >= '2005-01-01 00:00:00' THEN
                        LEAVE temperatures_loop;
                    END IF;
                END WHILE;

                SET dt1 = dt1 + INTERVAL 10 MINUTE;
                IF dt1 >= '2005-01-01 00:00:00' THEN
                    LEAVE temperatures_loop;
                END IF;
            END LOOP temperatures_loop;

            CLOSE temperatures_cursor;
        END //
35. Self-Reference (LEFT self-JOIN)

        SELECT *
        FROM temperatures
        LEFT JOIN temperatures AS missing
            ON temperatures.stat_id = missing.stat_id
           AND temperatures.datetime + INTERVAL 10 MINUTE = missing.datetime
        WHERE temperatures.stat_id = 'ABO'
          AND missing.datetime IS NULL;
36. Self-Reference (NOT EXISTS)

        SELECT *
        FROM temperatures
        WHERE NOT EXISTS (
            SELECT *
            FROM temperatures AS missing
            WHERE missing.datetime = temperatures.datetime + INTERVAL 10 MINUTE
              AND missing.stat_id = temperatures.stat_id
        )
        AND stat_id = 'ABO';
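
The self-referencing variant needs no sequence table, but it only reports where a gap begins. A small sketch makes that visible, assuming SQLite via Python's `sqlite3` and SQLite's `datetime(…, '+10 minutes')` modifier in place of MySQL's `+ INTERVAL 10 MINUTE`; the four sample rows follow the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE temperatures (
        stat_id  TEXT NOT NULL,
        datetime TEXT NOT NULL,
        tt       REAL,
        PRIMARY KEY (stat_id, datetime)
    )
""")
conn.executemany(
    "INSERT INTO temperatures VALUES ('ABO', ?, ?)",
    [
        ("2004-11-03 23:00:00", 9.9),
        ("2004-11-03 23:10:00", 7.8),
        ("2004-11-04 00:10:00", 9.2),
        ("2004-11-04 00:20:00", 9.1),
    ],
)

# For every stored row, look for the row 10 minutes later; report the
# expected successor whenever it does not exist.
gap_starts = [
    row[0]
    for row in conn.execute("""
        SELECT datetime(t.datetime, '+10 minutes')
        FROM temperatures AS t
        WHERE t.stat_id = 'ABO'
          AND NOT EXISTS (
              SELECT 1
              FROM temperatures AS nxt
              WHERE nxt.stat_id = t.stat_id
                AND nxt.datetime = datetime(t.datetime, '+10 minutes')
          )
        ORDER BY t.datetime
    """)
]
print(gap_starts)
```

In this sketch the five-slot gap after 23:10 shows up as a single entry (its first missing slot), and the slot after the very last stored row is reported too, simply because nothing follows it.
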
37. What's the performance?
    - SeqEngine
        - LEFT JOIN
        - NOT EXISTS
    - Self-Reference
        - LEFT self-JOIN
        - NOT EXISTS
    - Stored Procedures
        - Naïve (Single SELECTs)
        - Standard (Ordered SET)
38. The benchmark. All the usual disclaimers for benchmarks apply: Go ahead and measure it with your hardware, your version of MySQL, your storage engines, your data sets and your server configuration settings.

        #  Query                       Remarks                     Time [s]
        1  SeqEngine (NOT EXISTS)                                  0.28
        2  SeqEngine (LEFT JOIN)                                   0.29
        3  Procedure (Ordered SET)     result set per missing row  0.59
        4  Self (NOT EXISTS)           only first missing row      0.93
        5  Self (LEFT JOIN)            only first missing row      1.10
        6  Procedure (Single SELECTs)  result set per missing row  2.80
39. The benchmark (bar chart of the same timings, x-axis from 0s to 3.0s): 1. SeqEngine (NOT EXISTS) 0.28s; 2. SeqEngine (LEFT JOIN) 0.29s; 3. Procedure (Ordered SET) 0.59s; 4. Self Reference (NOT EXISTS) 0.93s; 5. Self Reference (LEFT JOIN) 1.10s; 6. Procedure (Single SELECTs) 2.80s
40. Lessons to be learned…
    - The Sequence trick (and SeqEngine) worked
        - It may sometimes pay off to go the extra mile and write a custom storage engine!
    - Stored PROCEDUREs with CURSORs sometimes can be damned fast!
    - Subquery optimization really did progress in MySQL (at least in some parts, more to come with 6.0)
        - Consider NOT EXISTS over LEFT JOIN
41. 2nd use case: Generate Test Data

        mysql> CREATE TABLE large (i INT NOT NULL)
               ENGINE=SeqEngine CONNECTION='1;10000000;1';
        Query OK, 0 rows affected (0,12 sec)

        mysql> ALTER TABLE large ENGINE=MyISAM;
        Query OK, 10000000 rows affected (3,27 sec)
        Records: 10000000  Duplicates: 0  Warnings: 0
42. Generating other Sequences from Integers

        CREATE VIEW letters AS
            SELECT CHAR(i) FROM integer_sequence;

        CREATE VIEW timestamps AS
            SELECT FROM_UNIXTIME(i) FROM integer_sequence;

        CREATE VIEW squares AS
            SELECT i*i FROM integer_sequence;

        …
43. Generate very large and complex data sets

        INSERT INTO customers
        SELECT i AS customer_id,
               MD5(i) AS customer_name,
               ROUND(RAND()*80+1920) AS customer_year
        FROM large;

        SELECT * FROM customers;
        +-------------+---------------------+---------------+
        | customer_id | customer_name       | customer_year |
        +-------------+---------------------+---------------+
        |           1 | c4ca4238a0b9f75849… |          1935 |
        |           2 | c81e728d9d4c2f636f… |          1967 |
        |           … | …                   |             … |
        +-------------+---------------------+---------------+
        10000000 rows in set
44. “Salvage” a bad design. One-to-Many gone wrong: Table `users`

        +----------+--------+--------+------+
        | username | sel1   | sel2   | sel3 |
        +----------+--------+--------+------+
        | john     | apple  | orange | pear |
        | bill     | NULL   | NULL   | NULL |
        | emma     | banana | pear   | NULL |
        +----------+--------+--------+------+
45. “Salvage” a bad design

        CREATE TABLE salvage (
            col INT NOT NULL
        ) ENGINE=SeqEngine CONNECTION='1;3;1';

        +-----+
        | col |
        +-----+
        |   1 |
        |   2 |
        |   3 |
        +-----+
46. “Multiply” the rows with a cartesian JOIN

        mysql> SELECT * FROM users CROSS JOIN salvage;
        +----------+--------+--------+------+-----+
        | username | sel1   | sel2   | sel3 | col |
        +----------+--------+--------+------+-----+
        | bill     | NULL   | NULL   | NULL |   1 |
        | bill     | NULL   | NULL   | NULL |   2 |
        | bill     | NULL   | NULL   | NULL |   3 |
        | emma     | banana | pear   | NULL |   3 |
        | emma     | banana | pear   | NULL |   1 |
        | emma     | banana | pear   | NULL |   2 |
        | john     | apple  | orange | pear |   1 |
        | john     | apple  | orange | pear |   2 |
        | john     | apple  | orange | pear |   3 |
        +----------+--------+--------+------+-----+
        9 rows in set (0,00 sec)
48. Normalized on the fly

        SELECT username,
               CASE col
                   WHEN 1 THEN sel1
                   WHEN 2 THEN sel2
                   WHEN 3 THEN sel3
               END AS sel
        FROM users CROSS JOIN salvage
        HAVING sel IS NOT NULL;

        +----------+--------+
        | username | sel    |
        +----------+--------+
        | john     | apple  |
        | emma     | banana |
        | john     | orange |
        | emma     | pear   |
        | john     | pear   |
        +----------+--------+
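
The same unpivot can be reproduced end to end in a few lines. This is a sketch under stated assumptions: SQLite via Python's `sqlite3` replaces MySQL, `salvage` is an ordinary hand-filled table rather than a SeqEngine table, and the slide's MySQL-style `HAVING` on a column alias is swapped for a derived table with `WHERE`, which is the more portable form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        username TEXT PRIMARY KEY,
        sel1 TEXT, sel2 TEXT, sel3 TEXT
    );
    INSERT INTO users VALUES
        ('john', 'apple',  'orange', 'pear'),
        ('bill', NULL,     NULL,     NULL),
        ('emma', 'banana', 'pear',   NULL);

    -- Stand-in for the SeqEngine table with CONNECTION='1;3;1'.
    CREATE TABLE salvage (col INTEGER NOT NULL PRIMARY KEY);
    INSERT INTO salvage VALUES (1), (2), (3);
""")

# Every user row is repeated three times; col picks which of the three
# selection columns that copy contributes.
rows = conn.execute("""
    SELECT username, sel
    FROM (
        SELECT username,
               CASE col
                   WHEN 1 THEN sel1
                   WHEN 2 THEN sel2
                   WHEN 3 THEN sel3
               END AS sel
        FROM users CROSS JOIN salvage
    ) AS unpivoted
    WHERE sel IS NOT NULL
    ORDER BY sel, username
""").fetchall()

for username, sel in rows:
    print(username, sel)
```

The explicit `ORDER BY` is an addition of this sketch so the output is deterministic; it happens to match the row order printed on the slide.
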
49. Comma-Separated Attribute Lists

        mysql> DESCRIBE selections;
        +------------+--------------+------+-----+---------+
        | Field      | Type         | Null | Key | Default |
        +------------+--------------+------+-----+---------+
        | username   | varchar(5)   | NO   | PRI | NULL    |
        | selections | varchar(255) | NO   |     | NULL    |
        +------------+--------------+------+-----+---------+

        mysql> SELECT * FROM selections;
        +----------+-------------------+
        | username | selections        |
        +----------+-------------------+
        | john     | apple,orange,pear |
        | bill     |                   |
        | emma     | banana,pear       |
        +----------+-------------------+
50. Querying Comma-Separated Attribute Lists

        SELECT username,
               SUBSTRING_INDEX(
                   SUBSTRING_INDEX(
                       selections,
                       ',',
                       i
                   ),
                   ',',
                   -1
               ) AS selection
        FROM selections JOIN integers
        HAVING selection NOT LIKE '';
51. Querying Comma-Separated Attribute Lists

        SELECT username,
               -- Take last element
               SUBSTRING_INDEX(
                   -- Crop list after element i
                   SUBSTRING_INDEX(
                       -- Add empty sentinel element
                       CONCAT(selections, ','),
                       ',',
                       i
                   ),
                   ',',
                   -1
               ) AS selection
        FROM selections JOIN integers
        HAVING selection NOT LIKE '';
54. Counting members from attribute lists

        SELECT SUBSTRING_INDEX(
                   SUBSTRING_INDEX(
                       CONCAT(selections, ','),
                       ',',
                       i
                   ),
                   ',',
                   -1
               ) AS selection,
               COUNT(*)
        FROM selections JOIN integers
        GROUP BY selection
        HAVING selection NOT LIKE '';

        +-----------+----------+
        | selection | COUNT(*) |
        +-----------+----------+
        | apple     |        1 |
        | banana    |        1 |
        | orange    |        1 |
        | pear      |        2 |
        +-----------+----------+
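
To watch the sentinel trick work step by step, the query can be run against SQLite with a small helper. Everything SQLite-specific here is an assumption of the sketch: SQLite has no built-in `SUBSTRING_INDEX`, so a Python function imitating the MySQL behaviour for non-empty delimiters is registered under that name, `||` replaces `CONCAT`, and the empty-string filter moves into an outer `WHERE`.

```python
import sqlite3


def substring_index(text, delim, count):
    """Imitate MySQL's SUBSTRING_INDEX for a non-empty delimiter."""
    if text is None:
        return None
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delim
    if count < 0:
        return delim.join(parts[count:])   # everything after the count-th from the right
    return ""


conn = sqlite3.connect(":memory:")
conn.create_function("SUBSTRING_INDEX", 3, substring_index)
conn.executescript("""
    CREATE TABLE selections (
        username   TEXT PRIMARY KEY,
        selections TEXT NOT NULL
    );
    INSERT INTO selections VALUES
        ('john', 'apple,orange,pear'),
        ('bill', ''),
        ('emma', 'banana,pear');

    -- The sequence must be at least as long as the longest list.
    CREATE TABLE integers (i INTEGER NOT NULL PRIMARY KEY);
    INSERT INTO integers VALUES (1), (2), (3), (4), (5);
""")

counts = conn.execute("""
    SELECT selection, COUNT(*)
    FROM (
        SELECT SUBSTRING_INDEX(
                   SUBSTRING_INDEX(selections || ',', ',', i),
                   ',', -1
               ) AS selection
        FROM selections CROSS JOIN integers
    ) AS parts
    WHERE selection <> ''
    GROUP BY selection
    ORDER BY selection
""").fetchall()
print(counts)
```

Once `i` runs past the end of a list, the inner call returns the whole string including the trailing comma, so the “last element” is the empty sentinel and the row is filtered out instead of repeating the final item.
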
55. Problem: Variable-sized IN-Predicates
    - Statements can't be prepared for variable-sized lists in the IN clause:
        - SELECT * FROM x WHERE a IN (?)
    - One needs:
        - SELECT * FROM x WHERE a IN (?)
        - SELECT * FROM x WHERE a IN (?, ?)
        - SELECT * FROM x WHERE a IN (?, ?, ?, …)
    - Example from Stéphane Faroult: “The Art of SQL”, adapted for MySQL
56. Split arguments as before!

        SELECT …
        FROM rental
        INNER JOIN customer ON rental.customer_id = …
        INNER JOIN address ON …
        …
        INNER JOIN (
            SELECT SUBSTRING_INDEX(
                       SUBSTRING_INDEX(CONCAT(?, ","), ",", i),
                       ",", -1
                   ) AS customer_id
            FROM sequences.integers
            WHERE i <= ?
        ) AS s ON rental.customer_id = s.customer_id
        …
        WHERE …;
57. SQL-String-Parsing beats Query-Parsing! (Bar chart: execution times in seconds for a different number of runs, lower is better; y-axis from 0 to 20; series: Prepared/Sequence and Client-side IN-List; data labels: 0.5, 0.6, 5.4, 6.0, 10.9, 12.1, 16.3, 18.1)
58. Sequences and SeqEngine: Conclusion
    - Use Sequences (and SeqEngine) to e.g.:
        - Find missing rows
        - Generate test data
        - Pivot tables
        - Do clever things with “for-Loops” (String-Parsing etc.)
    - http://seqengine.org
        - Slides will be available shortly after the presentation (also on conference website)