Welcome Parallelism to PostgreSQL
Thursday, 19 May 2016
Agenda
• Current State of Parallelism in PostgreSQL
• What was needed to bring server side parallelism – Work done in v9.4 and v9.5
• Parallel Query in v9.6
• Review some parallel plans
• Parallelism may not always be used
• Parallelism may not always be useful
• Parameters
• Benefits
• Questions
Current State (v9.5) of Parallelism in PostgreSQL
• Client side parallelism – an application can open multiple sessions
• A batch can be run with multiple application threads
• Server side languages can potentially do parallel operations
• Some I/O activity is taken off the main query execution process by the WAL writer and bgwriter
• effective_io_concurrency allows page prefetch requests to the kernel, for bitmap heap scans
• But there is no server side parallelism for dividing the same task among multiple workers
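As a minimal sketch of the client side approach, an application can split one large scan by key range and run each part in its own session. The table pgbench_accounts and its aid column come from pgbench; the split point assumes a table of 100 million rows and is purely illustrative.

```sql
-- Session 1: first half of the key range (split point is an assumption)
SELECT count(*) FROM pgbench_accounts WHERE aid BETWEEN 1 AND 50000000;

-- Session 2, run at the same time from a second connection: second half
SELECT count(*) FROM pgbench_accounts WHERE aid > 50000000;

-- The application adds the two counts itself. The server sees two
-- unrelated queries and does no coordination between them.
```

The work of dividing the task and combining the results stays in the application, which is exactly the gap that server side parallelism is meant to close.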
A lot of work was needed and was done!
v9.4
• Dynamic background workers
• Dynamic shared memory
• Implementation of shared memory message queues
v9.5
• Message propagation, i.e. error messages from a background worker can be sent to and received by the master
• Synchronization of state (GUC values, XID, CID mapping, current user, current database etc.)
• Parallel contexts can be used by backend code to launch worker processes
v9.6: We have something that users can use!
• Parallel Sequential Scan
• Parallel Joins
• Parallel Aggregates
• These are not yet in their best form and have certain exceptions and limitations, but they work and are quite useful!
Basically how parallelism is supposed to work
Let’s look at some plans
Sequential Scan without Parallelism
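The plan shown on this slide is not preserved in the text. A sketch of how a comparable serial baseline could be produced, assuming a pgbench database and the 9.6 Beta 1 parameter name max_parallel_degree; the filter on abalance is an illustrative choice, not the query from the slide:

```sql
-- Disable parallel plans for this session (9.6 Beta 1 parameter name)
SET max_parallel_degree = 0;

-- abalance has no index in a stock pgbench schema, so the filter
-- has to be evaluated by a plain sequential scan of the table
EXPLAIN ANALYZE
SELECT * FROM pgbench_accounts WHERE abalance > 1000;
```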
Parallel Sequential Scans
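The plan from this slide is likewise not preserved. A hedged sketch of how a parallel sequential scan could be requested, again assuming a pgbench database and the Beta 1 parameter name; the query is illustrative:

```sql
-- Allow up to two workers per query (9.6 Beta 1 parameter name)
SET max_parallel_degree = 2;

-- Same illustrative query as a serial scan would use
EXPLAIN ANALYZE
SELECT * FROM pgbench_accounts WHERE abalance > 1000;
```

If the planner judges the parallel plan cheaper, the scan appears as a Parallel Seq Scan underneath a Gather node, which collects rows from the workers. On a small table the planner may still choose a serial plan.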
You may not get as many workers as you desire
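One way to observe this, sketched under the assumption of a pgbench database and Beta 1 parameter names, is to request more workers than the server is likely to hand out and then read the worker counts reported under the Gather node:

```sql
-- Server-wide cap on background worker processes
SHOW max_worker_processes;

-- Ask for more workers per query than may be available
SET max_parallel_degree = 4;

-- EXPLAIN ANALYZE reports the workers actually used under the Gather node
EXPLAIN ANALYZE
SELECT count(*) FROM pgbench_accounts WHERE abalance > 1000;
```

The planner also scales the number of workers with table size, and other sessions may already hold some of the worker slots, so the number launched can be lower than the number requested.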
Parallel Aggregate
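The aggregate plan from this slide is not preserved. As an illustrative sketch (pgbench schema and Beta 1 parameter name assumed), a grouped aggregate over the large table is a natural candidate:

```sql
SET max_parallel_degree = 2;

-- Each worker can aggregate its share of the rows; the partial
-- results are then combined above the Gather node
EXPLAIN ANALYZE
SELECT bid, sum(abalance)
FROM pgbench_accounts
GROUP BY bid;
```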
Parallel Joins
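The join plan from this slide is not preserved either. A sketch of a query that could yield a parallel join, assuming the stock pgbench tables; the choice of tables and filter is illustrative:

```sql
SET max_parallel_degree = 2;

-- The large table is the side that can be scanned in parallel;
-- each worker handles the small inner side on its own
EXPLAIN ANALYZE
SELECT b.bid, count(*)
FROM pgbench_accounts a
JOIN pgbench_branches b ON b.bid = a.bid
WHERE a.abalance > 1000
GROUP BY b.bid;
```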
Wow! So using ‘Parallel Workers’ should be preferred!
No, not really!
Parallel Query may not be used all the time
• The cost of coordinating multiple worker processes outweighs the advantage of parallelism
• The cost of setting up the parallelism infrastructure is too high
• No worker process is available
Example
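The original example on this slide is not preserved. The following is a substitute sketch, assuming a pgbench database, of two queries for which the planner would be expected to stay serial even with parallelism enabled:

```sql
SET max_parallel_degree = 2;

-- pgbench_branches is tiny, so starting workers costs more than it saves
EXPLAIN SELECT count(*) FROM pgbench_branches;

-- A primary key lookup touches a handful of pages through the index
EXPLAIN SELECT * FROM pgbench_accounts WHERE aid = 42;
```

If neither plan contains a Gather node, the planner has decided that the setup and coordination cost is not worth paying for that query.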
Parallel Query may not be good all the time
• It depends a lot on your hardware resources and on process scheduling by your OS
• I tried various degrees of parallelism on a test machine:
  • 3 CPUs, 3 GB RAM
  • VM running CentOS
  • Single disk for I/O
• A simple ‘count’ on a table with 100 million rows and a row width of 8 bytes
• explain analyze select count(*) from pgbench_accounts ;
• It performs faster with the parallel degree set to 0, as an index scan is performed
• Make sure you have tuned your parameters well to help the optimizer decide
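A sketch of how such a comparison could be repeated on your own hardware, using the query from the slide and the 9.6 Beta 1 parameter name. The set of degrees tried here is an assumption; the slide does not list them.

```sql
-- Serial baseline
SET max_parallel_degree = 0;
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts;

-- Repeat with increasing degrees of parallelism
SET max_parallel_degree = 1;
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts;

SET max_parallel_degree = 2;
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts;

SET max_parallel_degree = 3;
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts;
```

Running each setting more than once helps separate the effect of parallelism from the effect of a warm cache. Compare both the execution time and the plan shape, since the planner may pick a different scan type at each setting.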
Parameters Involved
Parameters which govern parallel query execution
• parallel_setup_cost
• parallel_tuple_cost
• max_worker_processes
• max_parallel_degree
• force_parallel_mode
• ALTER TABLE … SET (parallel_degree=n)
• ALTER FUNCTION … PARALLEL SAFE
• ALTER FUNCTION … COST
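A hedged sketch of how these settings could be applied. The parameter names are the 9.6 Beta 1 names listed above; the released 9.6 renamed max_parallel_degree to max_parallel_workers_per_gather and the parallel_degree storage parameter to parallel_workers. The function my_discount(numeric) is hypothetical, and the numeric values are illustrative rather than recommendations.

```sql
-- Planner's estimated cost of starting worker processes (default 1000)
SET parallel_setup_cost = 1000;

-- Planner's estimated cost per tuple passed from a worker (default 0.1)
SET parallel_tuple_cost = 0.1;

-- Maximum workers a single query may use (Beta 1 name)
SET max_parallel_degree = 2;

-- Testing aid: use a parallel plan even when it is not expected to be faster
SET force_parallel_mode = on;

-- Per-table override of the worker count (Beta 1 storage parameter name)
ALTER TABLE pgbench_accounts SET (parallel_degree = 4);

-- Declare a (hypothetical) function safe to run inside a worker
ALTER FUNCTION my_discount(numeric) PARALLEL SAFE;

-- Tell the planner the (hypothetical) function is expensive per call
ALTER FUNCTION my_discount(numeric) COST 500;

-- max_worker_processes cannot be changed per session; it is set in
-- postgresql.conf and needs a server restart, for example:
--   max_worker_processes = 8
```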
Benefits to the users
• Sequential scans on large tables would be faster
• Analytics workloads involving aggregates would be faster
• Faster JOINs between large tables
• PostgreSQL v9.6 can be a good candidate for the backend database of a data warehouse
• More parallel operations to come in future releases
What can you do?
• PostgreSQL 9.6 Beta 1 is out
• Try it out…
• Test it…
• Break it…
• Report it
• Help the PostgreSQL community make it better
Further Reading
• PGCon 2014: Implementing Parallelism in PostgreSQL, Robert Haas
• PGConf.US 2016: PostgreSQL 9.6, Magnus Hagander
• PGCon 2015, Ottawa: Parallel Sequential Scan, Robert Haas and Amit Kapila
• EnterpriseDB Blog: Parallelism Progress, Robert Haas
• Parallel Sequential Scan is Committed, Robert Haas
• EnterpriseDB Blog: Parallelism Becomes a Reality in Postgres, Amit Kapila
Send us your suggestions and questions
success@ashnik.com
Stay Tuned!
Website: www.ashnik.com
