FOREIGN DATA WRAPPER ENHANCEMENTS
June 17, 2015
PostgreSQL Developers Unconference
Clustering Track
Shigeru HANADA, Etsuro Fujita
Who we are
•  Shigeru HANADA
•  From Tokyo, Japan
•  Working on FDW since 2010
•  Implemented initial FDW API and postgres_fdw
•  Etsuro Fujita
•  From Tokyo, Japan
•  Working on Postgres for 10 years
•  Interested in FDW enhancements
Agenda	
•  Past enhancements proposed for 9.5
•  Inheritance support (Committed)
•  Join push-down (Committed)
•  Join push-down for postgres_fdw (Returned with feedback)
•  Update push-down (Returned with feedback)
•  Possible remote query optimization in 9.5
•  Ideas for further enhancement
•  Sort push-down
•  Aggregate push-down
•  More aggressive join push-down
•  Discussions
PAST ENHANCEMENTS PROPOSED FOR 9.5
Inheritance support	
•  Outline
•  Allow foreign tables to participate in an inheritance tree
•  A way to implement sharding
•  Example
postgres=# explain verbose select * from parent ;
QUERY PLAN
---------------------------------------------------------------------------
 Append (cost=0.00..270.00 rows=2001 width=4)
   -> Seq Scan on public.parent (cost=0.00..0.00 rows=1 width=4)
        Output: parent.a
   -> Foreign Scan on public.ft1 (cost=100.00..135.00 rows=1000 width=4)
        Output: ft1.a
        Remote SQL: SELECT a FROM public.t1
   -> Foreign Scan on public.ft2 (cost=100.00..135.00 rows=1000 width=4)
        Output: ft2.a
        Remote SQL: SELECT a FROM public.t2
(9 rows)
Update push-down	
•  Outline
•  Send the whole UPDATE/DELETE statement to the remote side when it has the same semantics there
•  Example
Current
postgres=# explain verbose update foo set a = a + 1 where a > 10;
QUERY PLAN
--------------------------------------------------------------------------------
 Update on public.foo (cost=100.00..139.78 rows=990 width=10)
   Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
   -> Foreign Scan on public.foo (cost=100.00..139.78 rows=990 width=10)
        Output: (a + 1), ctid
        Remote SQL: SELECT a, ctid FROM public.foo WHERE ((a > 10)) FOR UPDATE
(5 rows)

Patched
postgres=# explain verbose update foo set a = a + 1 where a > 10;
QUERY PLAN
-----------------------------------------------------------------------------
 Update on public.foo (cost=100.00..139.78 rows=990 width=10)
   -> Foreign Update on public.foo (cost=100.00..139.78 rows=990 width=10)
        Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a > 10))
(3 rows)
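The difference between the two plans can be sketched with a toy "remote" server. This is only an illustration under loud assumptions: Python's sqlite3 module stands in for the remote PostgreSQL server, SQLite's rowid stands in for ctid, and the function names are hypothetical, not part of any FDW API.

```python
import sqlite3


def make_remote():
    """Create an in-memory table standing in for the remote table foo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (a INTEGER)")
    conn.executemany("INSERT INTO foo (a) VALUES (?)", [(i,) for i in range(1, 21)])
    return conn


def update_row_by_row(conn):
    """Current behavior: fetch matching rows, then send one UPDATE per row."""
    statements = 1  # the SELECT that fetches the candidate rows
    rows = conn.execute("SELECT a, rowid FROM foo WHERE a > 10").fetchall()
    for a, rowid in rows:
        # The new value (a + 1) is computed locally and shipped as a parameter.
        # rowid is an assumption standing in for PostgreSQL's ctid.
        conn.execute("UPDATE foo SET a = ? WHERE rowid = ?", (a + 1, rowid))
        statements += 1
    return statements


def update_pushed_down(conn):
    """Patched behavior: send the whole UPDATE as a single remote statement."""
    conn.execute("UPDATE foo SET a = a + 1 WHERE a > 10")
    return 1


def contents(conn):
    """Return the values of foo.a in ascending order."""
    return [a for (a,) in conn.execute("SELECT a FROM foo ORDER BY a")]


current, patched = make_remote(), make_remote()
n_current = update_row_by_row(current)
n_patched = update_pushed_down(patched)
```

Both variants leave the table in the same state; the pushed-down variant simply needs one remote statement instead of one per updated row plus the initial scan.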
Update push-down, cont.	
•  Issues
•  FDW-APIs for update push-down
•  Called from nodeModifyTable.c or nodeForeignscan.c?
•  Update push-down for an update on a join
•  "UPDATE foo ... FROM bar ..." (both foo and bar are remote)
•  Further enhancements
•  INSERT/UPSERT push-down
Join push-down	
•  Outline
•  Join foreign tables on the remote side when it is safe
•  Example
	
fdw=# EXPLAIN (VERBOSE) SELECT tbalance FROM pgbench_branches b JOIN pgbench_tellers t USING(bid);
QUERY PLAN
---------------------------------------------------------------------------
 Foreign Scan (cost=100.00..101.00 rows=50 width=4)
   Output: t.tbalance
   Relations: (public.pgbench_branches b) INNER JOIN (public.pgbench_tellers t)
   Remote SQL: SELECT r.a1 FROM (SELECT l.a9 FROM (SELECT bid a9 FROM public.pgbench_branches) l) l (a1) INNER JOIN (SELECT r.a11, r.a10 FROM (SELECT bid a10, tbalance a11 FROM public.pgbench_tellers) r) r (a1, a2) ON ((l.a1 = r.a2))
(4 rows)
Join push-down, cont.	
•  Issues
•  Implement postgres_fdw to handle join APIs
•  Centralize deparsing remote query
•  Should we use the parse tree rather than planner information to generate the join query?
•  A generic SQL deparser would help porting to FDWs for other RDBMSs
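As a rough illustration of what a centralized deparser could produce, the sketch below builds one flat remote SELECT for a two-table join instead of the nested subqueries shown in the example plan. The ForeignRel structure, the deparse_join function and its parameters are all hypothetical; nothing here reflects the actual postgres_fdw code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignRel:
    """A foreign table as seen by the deparser (hypothetical structure)."""
    schema: str
    table: str
    alias: str


def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def from_item(rel):
    """Render a schema-qualified table reference followed by its alias."""
    return f"{quote_ident(rel.schema)}.{quote_ident(rel.table)} {rel.alias}"


def deparse_join(outer, inner, join_type, join_cols, target):
    """Build one flat remote SELECT joining two foreign tables.

    join_cols: column names equated on both sides (as in USING).
    target: (alias, column) pairs for the SELECT list.
    """
    if join_type not in ("INNER", "LEFT", "RIGHT", "FULL"):
        raise ValueError(f"unsupported join type: {join_type}")
    select_list = ", ".join(f"{a}.{quote_ident(c)}" for a, c in target)
    cond = " AND ".join(
        f"{outer.alias}.{quote_ident(c)} = {inner.alias}.{quote_ident(c)}"
        for c in join_cols
    )
    return (f"SELECT {select_list} FROM {from_item(outer)} "
            f"{join_type} JOIN {from_item(inner)} ON ({cond})")


branches = ForeignRel("public", "pgbench_branches", "b")
tellers = ForeignRel("public", "pgbench_tellers", "t")
remote_sql = deparse_join(branches, tellers, "INNER", ["bid"], [("t", "tbalance")])
```

Keeping identifier quoting and FROM-item rendering in small helpers is what would make such a deparser reusable by FDWs for other RDBMSs, which mainly differ in quoting and dialect details.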
Possible remote query optimization in 9.5	
•  When we run the following query:
SELECT c.grade, max(s.score) max_score
FROM scores s LEFT JOIN classes c
ON c.class_id = s.class_id
WHERE c.subject = 'Math'
GROUP BY c.grade
HAVING max(s.score) > 50
ORDER BY c.grade DESC;
•  “scores” and “classes” are foreign tables
Possible remote query optimization in 9.5, cont.
•  For the query above, the portions that can be pushed down (shown in red on the slide) are those kept in the generated remote query:
SELECT c.grade, s.score
FROM scores s LEFT JOIN classes c
ON c.class_id = s.class_id
WHERE c.subject = 'Math'
ORDER BY c.grade DESC;
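With this split, the local side still has to evaluate GROUP BY, max() and HAVING over the rows returned by the remote query. The sketch below shows that remaining local work, assuming the rows arrive already ordered by grade so that no local sort is needed; the function name and the sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def group_aggregate(sorted_rows, threshold):
    """Compute max(score) per grade over rows already sorted by grade.

    sorted_rows: (grade, score) tuples in the order returned by the remote
    query; equal grades must be adjacent, which the remote ORDER BY ensures.
    """
    result = []
    for grade, group in groupby(sorted_rows, key=itemgetter(0)):
        # max() ignores NULLs, represented here as None.
        scores = [score for _, score in group if score is not None]
        max_score = max(scores) if scores else None
        # HAVING max(s.score) > threshold is applied locally after aggregation.
        if max_score is not None and max_score > threshold:
            result.append((grade, max_score))
    return result


# Hypothetical remote result, ordered by grade DESC.
remote_rows = [(3, 40), (3, 45), (2, 70), (2, 55), (1, 90)]
aggregated = group_aggregate(remote_rows, 50)
```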
Possible remote query optimization in 9.5, cont.
postgres=# EXPLAIN SELECT c.grade, max(s.score) max_score
postgres-# FROM scores s LEFT JOIN classes c
postgres-# ON c.class_id = s.class_id
postgres-# WHERE c.subject= 'Math'
postgres-# GROUP BY c.grade
postgres-# HAVING max(s.score) > 50
postgres-# ORDER BY c.grade DESC;
QUERY PLAN
----------------------------------------------------------------------------------
 GroupAggregate (cost=27.92..27.94 rows=1 width=8)
   Group Key: c.grade
   Filter: (max(s.score) > 50)
   -> Sort (cost=27.92..27.92 rows=1 width=8)
        Sort Key: c.grade DESC
        -> Hash Join (cost=20.18..27.91 rows=1 width=8)
             Hash Cond: (s.class_id = c.class_id)
             -> Seq Scan on scores s (cost=0.00..6.98 rows=198 width=8)
             -> Hash (cost=20.12..20.12 rows=4 width=8)
                  -> Seq Scan on classes c (cost=0.00..20.12 rows=4 width=8)
                       Filter: (subject = 'Math'::text)
(11 rows)
IDEAS FOR FURTHER ENHANCEMENT
Ideas for further enhancement	
•  Sort push-down
•  Aggregate push-down
•  More aggressive join push-down
•  2PC support (out of scope of this session)
•  Will be discussed in Ashutosh’s session on 19th Jun.
Sort push-down	
•  Outline
•  Mark a ForeignScan as sorted
•  Efficacy
•  Avoid unnecessary sort on local side
•  Use ForeignScan as a source of MergeJoin directly
•  How to implement
•  Add extra ForeignPath with pathkeys
•  Estimate costs of pre-sorted path
•  Sort result of a foreign scan
•  Add ORDER BY, in RDBMS FDWs
•  Choose a pre-sorted file, in file-based FDWs
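To see why a pre-sorted ForeignScan is useful as MergeJoin input, here is a minimal merge join over two inputs that are assumed to arrive already sorted by the join key. It is a plain-Python illustration of the algorithm, not PostgreSQL's executor code, and the sample rows are invented.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, payload) tuples sorted ascending by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row carrying this key with the whole right run.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out


# Both inputs are sorted by key, as a sorted foreign scan would deliver them.
left_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right_rows = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(left_rows, right_rows)
```

Each input is consumed in a single forward pass, which is only valid because both sides are sorted; without a sorted path the planner would have to add a local Sort below the join.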
Sort push-down, cont.
•  Issues
•  How can we limit candidates of sort keys?
•  No brute-force approach
•  Introduce FOREIGN INDEX to represent generic remote indexes?
•  Introduce FDW-specific catalogs?
•  Extract key information from ORDER BY, JOIN, GROUP BY?
•  How can we ensure that the semantics of ordering are identical?
•  Even between PostgreSQL servers, we have collation issues.
•  Is it OK to leave it to DBAs?
•  Limiting to non-character data types seems a way to go for the first cut.
•  Can we use pre-sorted join results as sorted path?
•  A MergeJoin at the root of the remote plan means the result is sorted by the join key, but that plan is not guaranteed even if we execute EXPLAIN before the query.
•  Any idea?
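The "first cut" idea of limiting sort push-down to non-character data types could look roughly like the sketch below. The list of character type names and the add_order_by function are assumptions made for illustration; a real FDW would consult the catalogs rather than a hard-coded set.

```python
# Type names treated as collation-sensitive (an illustrative, assumed list).
CHARACTER_TYPES = {"text", "varchar", "char", "bpchar", "name"}


def add_order_by(remote_sql, sort_keys):
    """Return remote_sql with ORDER BY appended, or None if that is unsafe.

    sort_keys: list of (column, type_name, descending) tuples.
    """
    if not sort_keys:
        return None
    if any(type_name in CHARACTER_TYPES for _, type_name, _ in sort_keys):
        # Ordering may differ on the remote side; the caller sorts locally.
        return None
    items = [f"{col} {'DESC' if desc else 'ASC'}" for col, _, desc in sort_keys]
    return f"{remote_sql} ORDER BY {', '.join(items)}"


base = "SELECT a, b FROM public.t1"
pushed = add_order_by(base, [("a", "int4", False), ("b", "timestamp", True)])
not_pushed = add_order_by(base, [("name", "text", False)])
```

Returning None rather than raising lets the caller fall back to the unsorted path, so the restriction only costs a missed optimization and never changes results.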
Aggregate push-down	
•  Outline
•  Replace an Aggregate/GroupAggregate/HashAggregate plan node with a ForeignScan that produces aggregated results
•  Efficacy
•  Reduce amount of data transferred
•  Off-load overheads of aggregation
•  How to implement
•  New FDW API for aggregation hooking
•  Implement API in each FDW
Aggregate push-down, cont.
•  Issues
•  GROUP BY requires identical semantics for grouping keys on both sides.
•  This is a similar issue to the one for sort push-down.
•  How can we map functions to remote ones?
•  ROUTINE MAPPING is defined in the SQL standard, but it doesn’t seem well-designed.
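One simple way to picture function mapping is a lookup table consulted while building the remote aggregate query, with a fallback to local aggregation when a function has no known remote counterpart. The mapping table and the deparse_aggregate function below are hypothetical and do not correspond to ROUTINE MAPPING or to any existing FDW API.

```python
# Hypothetical mapping from local aggregate names to remote equivalents.
REMOTE_AGGREGATES = {"max": "max", "min": "min", "sum": "sum", "count": "count"}


def deparse_aggregate(table, group_cols, aggregates):
    """Build a remote aggregate query, or return None if a function is unmapped.

    aggregates: list of (function_name, column) pairs.
    """
    select_items = list(group_cols)
    for func, col in aggregates:
        remote_func = REMOTE_AGGREGATES.get(func)
        if remote_func is None:
            return None  # no remote counterpart known: aggregate locally
        select_items.append(f"{remote_func}({col})")
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    if group_cols:
        sql += f" GROUP BY {', '.join(group_cols)}"
    return sql


agg_sql = deparse_aggregate("public.scores", ["class_id"], [("max", "score")])
unmapped = deparse_aggregate("public.scores", ["class_id"], [("my_agg", "score")])
```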
More aggressive join push-down	
•  Outline
•  Send local data to the remote side and join it there, in one of the following ways:
•  VALUES expression in FROM clause
•  Per-table replication, with logical replication, Slony-I, etc.
•  Efficacy
•  Reduce amount of data transferred from remote to local
•  Limited to cases where a small local table is joined with a huge remote table and the join produces a small result
More aggressive join push-down, cont.
•  How to implement
•  Replace reference to a small local table with VALUES()
•  Use a remote replicated table as an alternative
•  Issues
•  How can we construct VALUES() expression?
•  How can we know a table is replicated on the remote side?	
•  Example (the VALUES list is generated by scanning the small local table):
SELECT *
FROM huge_remote_table h
JOIN
(VALUES (1, 'foo'), (2, 'bar')) AS s (id, name)
ON h.id = s.id;  -- the join column on h is assumed
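On the question of how to construct the VALUES() expression, one possible approach is to render each local row as a tuple of SQL literals. The sketch below handles only integers, strings and NULL, and assumes standard_conforming_strings is on at the remote side, so that doubling single quotes is sufficient escaping; both function names are hypothetical.

```python
def quote_literal(value):
    """Render a Python value as a SQL literal (integers, strings, NULL only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Assumes standard_conforming_strings = on at the remote server.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def build_values(rows, alias, columns):
    """Build a VALUES list usable as a FROM item from rows of a local table."""
    if not rows:
        raise ValueError("VALUES needs at least one row")
    tuples = ", ".join(
        "(" + ", ".join(quote_literal(v) for v in row) + ")" for row in rows
    )
    return f"(VALUES {tuples}) AS {alias} ({', '.join(columns)})"


# Rows as they might come back from scanning the small local table.
local_rows = [(1, "foo"), (2, "ba'r")]
values_item = build_values(local_rows, "s", ["id", "name"])
```

Inlining literals keeps the remote query self-contained, but the statement grows with the local table, which matches the deck's restriction to small local tables.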
DISCUSSIONS
