2. About Me
Database Programmer For 20 Years
1st Benchmark: TPC-C for AS/400
PostgreSQL, Oracle, DB2, SQLServer,
Sybase, Informix, SQLite, DB/400... and
mysql.
Dabbled in Improv and Roller Derby
3. About Moat
Advertising Analytics
Viewability, Reach, Fraud Detection
Tens of Billions of ad events per day
Summary data imports ~500M rows/day
Hiring for offices in NYC, London, SF, LA, Singapore, Sydney, Austin,
Cincinnati, Miami
4. Database Workloads - OLTP
● On-Line Transaction Processing
● Also known as Short Request
● Very high concurrency
● Most queries fetch small result sets and update few rows
● New data is essentially random
● Locking contention and latency are paramount
5. Database Workloads - Data Warehouse
● Usually derivative data
○ Either from in-house OLTP databases
○ or from other data sources
● Latency tolerance much higher
● Data imported in batches with something in common
○ data from specific time-frame
○ data from specific OLTP shards
○ externally sourced data
● Query throughput more important than any one query's individual
performance
● Data is stored in ways optimal for reading, not updating.
6. Extract, Transform, Load
● Extract
○ Get the data from the external sources
■ Other databases under our control
■ Data pulls from websites
■ Publicly available data
○ Maybe you control the format, maybe you don't
● Transform
○ Scan the data for correct formatting, valid values, etc
■ Discard? Fix?
○ Reformat the data to the shape of the table(s) you wish to load it into.
● Load
○ Get it in the Data Warehouse as fast as possible
7. Toolsets - External Programs
● Homegrown data file reading, one INSERT at a time.
○ Read data from external data source
○ organize that data into rows
○ insert, one row at a time
○ Didn't I write this once in high school?
○ Client libraries offer conveniences like execute_many()
■ Client libraries lie
■ They lie, and they loop
8. Toolsets - Third Party Loader Programs
● Command line
○ pgloader, pg_bulkload (PostgreSQL)
○ SQL*Loader (Oracle)
○ bcp (Sybase/SQLServer)
● GUI tools
○ Kettle (Pentaho)
○ SSIS (Microsoft)
○ DataStage (IBM)
○ Informatica (Sauron)
9. Third Party Loader Programs - Pros
● Purpose-built for many common, already-solved problems
○ Never parse a CSV again
● Usually aware of bulk loading facilities in the target database
○ The fewer target databases, the more likely they're optimized (beware
vendor bias)
● Simple things are simple
10. Third Party Loader Programs - Cons
● If you want all your app logic in one place, you're probably out of luck
● Non-simple things can be impossible
○ Custom code often back to single row inserts
○ plugins usually involve writing in the language that the tool was written
in (Visual C++, Java) rather than your own core competencies
● Graphical interfaces conceal application logic
○ un-grep-able
● "Code" often stored in XML/binary, resistant to source control
● Business logic that requires existing database state needs a local copy of
that state
○ Re-inventing the LEFT OUTER JOIN
○ Fun with race conditions
11. Extract, Load, Transform
● Extract - Same as ETL
● Load
○ Make local work tables shaped to suit the data as-is
● Transform
○ Filter - remove rows/columns you know you don't want
○ Validate - test values for correctness
○ Classify - data might be destined for multiple tables, have key
linkages, etc
○ Encode - Date strings become dates, etc.
○ Dereference - Enumerations and "dimensional" data become
associated with existing primary key values or new ones are inserted
○ Insert transformed data into user-facing tables
○ Isn't this just ETL with the 'T' inside the DB? Yes.
12. ELT - Pros
● Transformation logic is usually written in SQL statements, a skill you
already have in-house.
● Easily worked into existing source control.
● Referencing existing data is trivial.
● Referencing existing data is transactional.
● You can reuse a lot of your existing data integrity logic.
13. ELT - Cons
● You're writing the data to the database twice.
○ Some of that data you don't even want
○ Additional disk space needed for transformation work...temporarily
○ Additional CPU burden on the database during ELT
■ Not a big deal if you use read replicas and take the master out of
UI rotation.
● Some data validation may still depend on external factors
● If your data is in very large data files, you may have to split them up
● If your data is in a large number of small files, you may have to use
multiple workers and manage the workload yourself.
○ That workload management is itself an OLTP-type task.
14. ELT Case 1: COPY a Big File
● The file:
○ Disk size: 40MB compressed, 1.1GB uncompressed
○ Number of rows: 5160108, including header
○ Number of columns:
■ taxonomy columns (text): 8
■ metrics (bigint): ¯\_(ツ)_/¯
● Format changes over time
● Producers were not necessarily consistent
● Updated data files would follow the latest format, which might
conflict with the original format
15. Case 1: Confession 1: Format Discovery
● Dates and taxonomy (text) column names are known and generally fixed,
but don't count on it.
● Any columns with unfamiliar names are assumed to be metrics (bigint)
● COPY just the header row into a one-column table and split the row
afterward.
CREATE TEMP TABLE header(header_string text);
COPY header FROM PROGRAM 'zcat filename.csv.gz | head -n 1';
● regexp_split_to_table() on the one-row table
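A minimal sketch of that split, using the header table from the COPY above (aliases are mine):
SELECT t.colname, t.col_order
FROM header
CROSS JOIN LATERAL regexp_split_to_table(header_string, ',')
     WITH ORDINALITY AS t(colname, col_order);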
16. Case 1: Confession 2: Format Discovery
Same as before, but use plpythonu
CREATE OR REPLACE FUNCTION get_header(file_name text)
RETURNS text[]
LANGUAGE plpythonu STRICT SET search_path FROM CURRENT
AS $PYTHON$
import gzip
with gzip.GzipFile(file_name, 'rb') as g:
    header = g.readline()
if not header:
    return None
return header.rstrip().split(',')
$PYTHON$;
17. Case 1: Confession 2: Format Discovery
Another function uses that one to create a temp table:
l_columns := get_header(l_temp_file);
SELECT string_agg(CASE
         WHEN colname = 'load_date' THEN 'load_date date'
         WHEN colname IN ('taxo1', ..., 'taxo8')
           THEN format('%I text', colname)
         ELSE format('%I bigint', colname)
       END,
       ', ' ORDER BY col_order)
FROM unnest(l_columns) WITH ORDINALITY AS c(colname, col_order)
INTO l_col_defs;
EXECUTE format('create temporary table work_table (%s)', l_col_defs);
Now the temp table is created in a single transaction with a format to match its
own header and existing business rules.
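From there the load itself can run in the same transaction. A sketch, reusing the l_temp_file variable from above (and assuming plpgsql context):
EXECUTE format(
  $$COPY work_table FROM PROGRAM 'zcat %s' WITH (FORMAT CSV, HEADER)$$,
  l_temp_file);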
18. ELT Case 2: COPY a Big File
● The test machine:
○ AWS EC2 m4.xlarge (4 cores, 16GB RAM)
○ PostgreSQL 10 pre-alpha!
● The production machine:
○ AWS EC2 i3.8xlarge (32 cores, 245GB RAM)
■ Approx 5000 jobs in an ETL cycle
■ Jobs need to play nice parallel-wise.
○ PostgreSQL 9.6
● AWS machines have a reputation for being I/O starved
19. Case 2: Load Times By Various Methods
● Simplify the test by knowing the format of the data file ahead of time.
○ Datafile: csv, 8 text columns, 123 bigints
○ 5.6M rows
20. Case 2: Load Method: pgloader
Create a control file to load the data into a regular table.
pgloader: an external program, controlled via command line switches or a
configuration file
load csv from '../d2.csv' ( col1, col2, ..., colN )
into postgresql://test_user:test@localhost/pgloader_test?destination
with truncate, skip header = 1, fields optionally enclosed by '"';
21. Case 2: Load Method: pgloader
Timing
Loading uncompressed .csv:
real 3m33.102s
pigz -d --stdout compressed.csv.gz | pgloader test_stdin.load
real 4m37.150s
22. Case 2: Load Times By pg_bulkload
Direct load of the uncompressed .csv
real 0m25.601s
pigz piped to pg_bulkload
real 0m36.924s
23. Case 2: COPY .CSV To Regular Table
COPY destination FROM '../d2.csv' WITH (FORMAT CSV, HEADER)
Time: 24124.291 ms (00:24.124)
pigz --stdout -d ../d2.csv.gz | time psql db_name -c "COPY destination FROM
STDIN WITH (FORMAT CSV, HEADER)"
Time: 24849.039 ms (00:24.849)
COPY destination FROM PROGRAM 'pigz --stdout -d ../d2.csv.gz' WITH (FORMAT
CSV, HEADER)
Time: 25357.763 ms (00:25.358)
24. Case 2: COPY .CSV To Unlogged Table
COPY dest_unlogged FROM '../d2.csv' WITH (FORMAT CSV, HEADER)
Time: 18714.313 ms (00:18.714)
pigz --stdout -d ../d2.csv.gz | time psql db_name -c "COPY dest_unlogged FROM
STDIN WITH (FORMAT CSV, HEADER)"
Time: 19509.254 ms (00:19.509)
COPY dest_unlogged FROM PROGRAM 'pigz --stdout -d ../d2.csv.gz' WITH
(FORMAT CSV, HEADER)
Time: 19264.937 ms (00:19.265)
26. Case 2: Conclusion
● COPY FROM PROGRAM performs the same as external unix pipes
● pigz is only slightly faster than gzip -d
● Decompressing .csv.gz adds only slight overhead versus reading uncompressed
● pg_bulkload offers no advantage in situations where a naive COPY will do
● pgloader consumes all available CPU and is still slower than all the other
methods
27. Case 3: Avoid the Bounce Table
CREATE TEMP TABLE bounce_table (...);
COPY bounce_table FROM '...';
ANALYZE bounce_table;
INSERT INTO table_that_actually_matters (...)
SELECT SUM(something), ... FROM bounce_table;
● Reading the file
● Write to temp table
● Read it right back out
● Discard the temp table
● OID churn with the temp table
● It would be nice if we could just read the file as a table
28. Case 3: Temporary Foreign Table
CREATE SERVER filesystem FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE pg_temp.straight_from_file (...)
SERVER filesystem;
INSERT INTO table_that_actually_matters (...)
SELECT SUM(something), ...
FROM pg_temp.straight_from_file;
● Should eliminate unnecessary writes
● Should halve the number of disk reads
● Allows for filtration using WHERE clause
● Still need to look out for invalid data.
● Still burns an OID
● No such thing as CREATE TEMPORARY FOREIGN TABLE, yet.
29. Case 3: Temporary Foreign Table
● SELECT SUM(c1), ..., SUM(c5) FROM pg_temp.straight_from_file;
Time: 9858.383 ms (00:09.858)
● SELECT SUM(c1), ..., SUM(c20) FROM pg_temp.straight_from_file;
Time: 11701.863 ms (00:11.702)
● COPY TO UNLOGGED TABLE
Time: 17477.338 ms (00:17.477)
● ANALYZE OF UNLOGGED TABLE
Time: 383.597 ms
● SUM 5 COLUMNS
Time: 1265.488 ms (00:01.265)
● SUM 20 COLUMNS
Time: 1992.116 ms (00:01.992)
31. Case 3: Conclusions
● file_fdw is fastest when most of the columns are ignored.
● Timings get closer to even as the number of columns referenced approaches
all of the columns in the table.
● Explicitly analyzing the foreign table has no discernible effect on SELECT
performance.
● Row estimates are derived from file size.
● COPY + ANALYZE wins if you need to read from that table at least twice.
● Reading only once saves ~50% of the time, and the saved resources are
available to other programs.
● file_fdw can use the PROGRAM option in v10 (sketch below)
● Be careful of bad row estimates when using PROGRAM
● COPY_SRF() (failed v10 patch)
○ COPY protocol in SRF form
○ Does not burn an OID
○ Materializes the entire file
○ Also subject to bad row estimates, but a wrapper function could
encapsulate and alter the row estimate
○ Cannot copy from STDIN without a change to the wire protocol
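A minimal sketch of file_fdw with the v10 PROGRAM option (server name and columns are hypothetical):
CREATE SERVER filesystem FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE pg_temp.straight_from_pipe (taxo1 text, m1 bigint)
SERVER filesystem
OPTIONS (program 'zcat /path/to/filename.csv.gz', format 'csv', header 'true');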
32. Case 4: Rollup Tables
● One large table with very specific 8-column grain
● Most queries will aggregate to their own specific grain
● Pre-aggregate tables so that specific queries can choose smaller tables
that more directly suit their needs.
● Oracle does this transparently with materialized views.
The Rollups
(a,b,c,d,e,f,g,h)    (a,b,c,d)
(a,b,c,  e,f,g,h)    (a,b,c)
(a,b,    e,f,g,h)    (a,b)
(a,      e,f,g,h)    (a)
(        e,f,g,h)    ()
33. Case 4: Dumb Inserts
INSERT INTO d9 SELECT a,b,c,d,e,f,g,h, SUM(m1), ..., SUM(mN)
FROM copied_table GROUP BY a,b,c,d,e,f,g,h;
INSERT INTO d8 SELECT a,b,c, e,f,g,h, SUM(m1), ..., SUM(mN)
FROM copied_table GROUP BY a,b,c, e,f,g,h;
INSERT INTO d7 SELECT a,b, e,f,g,h, SUM(m1), ..., SUM(mN)
FROM copied_table GROUP BY a,b, e,f,g,h;
...
INSERT INTO d1 SELECT a, SUM(m1), ..., SUM(mN)
FROM copied_table GROUP BY a;
INSERT INTO d0 SELECT SUM(m1), ..., SUM(mN)
FROM copied_table;
34. Case 4: Chained Inserts
INSERT INTO d9 SELECT a,b,c,d,e,f,g,h, SUM(m1), ..., SUM(mN)
FROM copied_table GROUP BY a,b,c,d,e,f,g,h;
INSERT INTO d8 SELECT a,b,c, e,f,g,h, SUM(m1), ..., SUM(mN)
FROM d9 GROUP BY a,b,c, e,f,g,h;
INSERT INTO d7 SELECT a,b, e,f,g,h, SUM(m1), ..., SUM(mN)
FROM d8 GROUP BY a,b, e,f,g,h;
...
INSERT INTO d1 SELECT a, SUM(m1), ..., SUM(mN)
FROM d2 GROUP BY a;
INSERT INTO d0 SELECT SUM(m1), ..., SUM(mN)
FROM d1;
This chaining works because SUM() re-aggregates cleanly: summing the d9 sums
yields the same totals as summing the base table, and each step reads a much
smaller input than copied_table.
35. Case 4: Chained Inserts in a CTE
WITH temp_d9 as (
INSERT INTO d9 SELECT a,b,c,d,e,f,g,h, SUM(m1), ..., SUM(mN)
FROM copied_table GROUP BY a,b,c,d,e,f,g,h RETURNING *),
temp_d8 as (
INSERT INTO d8 SELECT a,b,c, e,f,g,h, SUM(m1), ..., SUM(mN)
FROM temp_d9 GROUP BY a,b,c, e,f,g,h RETURNING *),
temp_d7 as (
INSERT INTO d7 SELECT a,b, e,f,g,h, SUM(m1), ..., SUM(mN)
FROM temp_d8 GROUP BY a,b, e,f,g,h RETURNING *),
...
temp_d0 as (
INSERT INTO d0 SELECT SUM(m1), ..., SUM(mN)
FROM temp_d1 RETURNING *)
SELECT (SELECT COUNT(*) FROM temp_d9) as rows_d9,
(SELECT COUNT(*) FROM temp_d8) as rows_d8,
...
(SELECT COUNT(*) FROM temp_d0) as rows_d0;
36. Case 4: Dumb Rollup Table (1/2)
CREATE TEMP TABLE dumb_rollup (a,b,c,d,e,f,g,h,m1,...,mN,grouping_level) AS
SELECT a,b,c,d,e,f,g,h, SUM(m1), ..., SUM(mN),
CASE
WHEN grouping(d) = 0 and grouping(h) = 0 THEN 9
WHEN grouping(c) = 0 and grouping(h) = 0 THEN 8
...
WHEN grouping(c) = 0 THEN 3
WHEN grouping(b) = 0 THEN 2
WHEN grouping(a) = 0 THEN 1
ELSE 0
END
FROM copied_table
GROUP BY GROUPING SETS( (a, b, c, d, e, f, g, h),
(a, b, c, e, f, g, h),
...
(a, b),
(a),
() );
37. Case 4: Dumb Rollup Table (2/2)
INSERT INTO d9
SELECT a,b,c,d,e,f,g,h,m1,...,mN FROM dumb_rollup WHERE grouping_level = 9;
INSERT INTO d8
SELECT a,b,c, e,f,g,h,m1,...,mN FROM dumb_rollup WHERE grouping_level = 8;
...
INSERT INTO d1
SELECT a, m1,...,mN FROM dumb_rollup WHERE grouping_level = 1;
INSERT INTO d0
SELECT m1,...,mN FROM dumb_rollup WHERE grouping_level = 0;
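A tiny self-contained illustration of the GROUPING()-to-level mapping used above (values invented):
SELECT a, b, c, SUM(m) AS m,
       CASE WHEN GROUPING(c) = 0 THEN 3
            WHEN GROUPING(b) = 0 THEN 2
            WHEN GROUPING(a) = 0 THEN 1
            ELSE 0
       END AS grouping_level
FROM (VALUES ('x','y','z',1), ('x','y','w',2)) AS t(a,b,c,m)
GROUP BY GROUPING SETS ((a,b,c), (a,b), (a), ());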
38. Case 4: Chained CTE Rollup
WITH rollups (a,b,c,d,e,f,g,h,m1,...,mN,grouping_level) AS (
SELECT a,b,c,d,e,f,g,h, SUM(m1), ..., SUM(mN),
CASE ... END
FROM copied_table
GROUP BY GROUPING SETS( ... ) ),
temp_d9 as (
INSERT INTO d9 SELECT ... FROM rollups WHERE grouping_level = 9 RETURNING NULL),
temp_d8 as (
INSERT INTO d8 SELECT ... FROM rollups WHERE grouping_level = 8 RETURNING NULL),
...
temp_d1 as (
INSERT INTO d1 SELECT ... FROM rollups WHERE grouping_level = 1 RETURNING NULL),
temp_d0 as (
INSERT INTO d0 SELECT ... FROM rollups WHERE grouping_level = 0 RETURNING NULL)
SELECT (SELECT COUNT(*) FROM temp_d9) as rows_d9,
(SELECT COUNT(*) FROM temp_d8) as rows_d8,
...
(SELECT COUNT(*) FROM temp_d0) as rows_d0;
39. Case 4: Rollup to v10 Native Partition ATR
Instead of progressively removing columns from the tables, keep them all the
same shape and enforce NULL-ness with constraints (sketch after this slide).
CREATE TABLE dest_all ( a ..., m1 ..., grouping_level integer )
PARTITION BY LIST (grouping_level);
CREATE TABLE dest_9 PARTITION OF dest_all FOR VALUES IN (9);
CREATE TABLE dest_8 PARTITION OF dest_all FOR VALUES IN (8);
...
CREATE TABLE dest_0 PARTITION OF dest_all FOR VALUES IN (0);
INSERT INTO dest_all
SELECT a,b,c,d,e,f,g,h, SUM(m1), ..., SUM(mN),
CASE ... END AS grouping_level
FROM copied_table
GROUP BY GROUPING SETS( ... );
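A sketch of the NULL-enforcement constraints implied above, following the grain pattern from slide 32 (constraint bodies are assumptions):
-- dest_9 holds the full grain, so no grain column may be NULL there
ALTER TABLE dest_9 ADD CHECK (d IS NOT NULL AND h IS NOT NULL);
-- dest_8 drops d, dest_7 drops c and d, and so on down to dest_0
ALTER TABLE dest_8 ADD CHECK (d IS NULL);
ALTER TABLE dest_7 ADD CHECK (c IS NULL AND d IS NULL);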