It has been only a few months since PostgreSQL 9.5 was released. Some of our customers are excited about the great new features and performance enhancements in v9.5. But here we are, already taking a peek at the next version, and we find it awesome! One of the most awaited features, parallelism, makes it to Postgres. The infrastructure for parallelism has been added over the last few releases, but the first parallel operation in query execution will be seen only in v9.6.
2. Agenda
• Current State of Parallelism in PostgreSQL
• What was needed to bring server-side parallelism: work done in v9.4 and v9.5
• Parallel Query in v9.6
• Review some parallel plans
• Parallelism may not always be used
• Parallelism may not always be useful
• Parameters
• Benefits
• Questions
3. Current State (v9.5) of Parallelism in PostgreSQL
• Client-side parallelism: an application can open multiple sessions
• One can run a batch with multiple application threads
• Server-side languages can potentially do parallel operations
• I/O activity is taken off the main query execution process by the WAL writer and bgwriter
• effective_io_concurrency allows page prefetch requests to the kernel for bitmap heap scans
• But there is no server-side parallelism for dividing the same task among multiple workers
4. A lot of work was needed and was done!
v9.4
• Dynamic background workers
• Dynamic shared memory
• Implementation of shared memory message queues
v9.5
• Message propagation, i.e. error messages from a background worker can be sent to and received by the master
• Synchronization of state (GUC values, XID, CID mapping, current user, current database, etc.)
• Parallel contexts can be used by backend code to launch worker processes
5. v9.6: We have something that users can use!
• Parallel Sequential Scan
• Parallel Joins
• Parallel Aggregates
• These are not yet in their best form and have certain exceptions and limitations, but they work and are quite useful!
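The three operations above can be tried with queries like the following. This is a minimal sketch, assuming a database initialized with pgbench (the pgbench_accounts table used later in these slides) and the parameter name used in these slides for 9.6 beta 1. The degree of 2 and the query shapes are illustrative choices, and whether the planner actually picks a parallel plan depends on table size and cost settings.

```sql
-- Allow up to 2 workers per parallel scan in this session (illustrative value).
SET max_parallel_degree = 2;

-- Candidate for a parallel sequential scan: a filter on a non-indexed column.
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';

-- Candidate for a parallel aggregate.
EXPLAIN SELECT bid, sum(abalance) FROM pgbench_accounts GROUP BY bid;

-- Candidate for a parallel join between accounts and branches.
EXPLAIN SELECT count(*)
FROM pgbench_accounts a
JOIN pgbench_branches b ON a.bid = b.bid;
```

When a parallel plan is chosen, it shows a Gather node above a Parallel Seq Scan.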
13. Wow! So using ‘Parallel Workers’ should be preferred!
No, not really!
14. Parallel Query may not be used all the time
• The cost of running and coordinating multiple worker processes can outweigh the advantage of parallelism
• The cost of setting up the parallelism infrastructure is too high
• No worker process is available
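These cases can be observed from a session. The following is a sketch only, assuming a pgbench database on 9.6 beta 1 with the parameter names used in these slides. The inflated setup cost is an arbitrary value chosen to make workers look too expensive to the planner.

```sql
-- How many background workers can the server run at all?
-- (This setting can only be changed with a server restart.)
SHOW max_worker_processes;

SET max_parallel_degree = 2;

-- Inflate the setup cost so that starting workers looks too expensive;
-- the planner is then expected to fall back to a serial plan.
SET parallel_setup_cost = 100000000;
EXPLAIN SELECT count(*) FROM pgbench_accounts WHERE filler LIKE '%x%';

-- Restore the default and run the query to see what actually happened.
RESET parallel_setup_cost;
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts WHERE filler LIKE '%x%';
```

With EXPLAIN ANALYZE, a Gather node reports both "Workers Planned" and "Workers Launched". Fewer workers launched than planned means worker processes were not available when the query ran.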
17. Parallel Query may not be good all the time
• It depends a lot on your hardware resources and on process scheduling by your OS
• I tried various degrees of parallelism on a test machine
• 3 CPUs, 3 GB RAM
• VM running CentOS
• Single I/O disk
• A simple ‘count’ on a table with 100 million rows and 8-byte width
• explain analyze select count(*) from pgbench_accounts ;
• It performs faster with the parallel degree set to 0, as an index scan is performed
• Make sure you have tuned your parameters well to help the optimizer decide
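The experiment described above can be repeated along these lines. This is a sketch assuming the same pgbench_accounts table and the 9.6 beta 1 parameter name used in these slides. The degrees tried here are illustrative, and timings will differ from machine to machine.

```sql
-- Serial baseline: degree 0 disables parallel query for the session.
SET max_parallel_degree = 0;
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts;

-- Repeat with increasing degrees and compare the reported execution times.
SET max_parallel_degree = 1;
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts;

SET max_parallel_degree = 3;
EXPLAIN ANALYZE SELECT count(*) FROM pgbench_accounts;
```

Running each statement more than once helps separate the effect of parallelism from the effect of caching.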
19. Parameters which govern parallel query execution
• parallel_setup_cost
• parallel_tuple_cost
• max_worker_processes
• max_parallel_degree
• force_parallel_mode
• ALTER TABLE … SET (parallel_degree=n)
• ALTER FUNCTION … PARALLEL SAFE
• ALTER FUNCTION … COST
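The sketch below shows how these settings might be applied. It uses the names from these slides, which match 9.6 beta 1. The final 9.6 release renamed max_parallel_degree to max_parallel_workers_per_gather and the parallel_degree storage parameter to parallel_workers. The function my_func(integer) is hypothetical, and the values are examples rather than recommendations.

```sql
-- Planner cost settings (the values shown are the defaults).
SET parallel_setup_cost = 1000;
SET parallel_tuple_cost = 0.1;

-- Upper limit on workers per parallel scan for this session.
SET max_parallel_degree = 2;

-- Per-table override of the number of workers.
ALTER TABLE pgbench_accounts SET (parallel_degree = 4);

-- Ask the planner to use parallel plans wherever it is safe (useful for testing).
SET force_parallel_mode = on;

-- my_func(integer) is a hypothetical user-defined function.
-- Marking it PARALLEL SAFE lets queries that call it run in workers.
ALTER FUNCTION my_func(integer) PARALLEL SAFE;
-- A higher estimated cost tells the planner the function is expensive.
ALTER FUNCTION my_func(integer) COST 500;
```

max_worker_processes is not set here because it can only be changed in postgresql.conf, followed by a server restart.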
20. Benefits to the users
• Sequential scans on large tables would be faster
• Analytics workloads involving aggregates would be faster
• Faster JOINs between large tables
• PostgreSQL v9.6 can be a good candidate for the backend database of a data warehouse
• More parallel operations to come in future releases
21. What can you do?
• PostgreSQL 9.6 Beta 1 is out
• Try it out…
• Test it…
• Break it…
• Report it
• Help the PostgreSQL community make it better
22. Further Reading
• PGCon 2014: Implementing Parallelism in PostgreSQL, Robert Haas
• PGConf.US 2016: PostgreSQL 9.6, Magnus Hagander
• PGCon 2015, Ottawa: Parallel Sequential Scan, Robert Haas and Amit Kapila
• EnterpriseDB Blog: Parallelism Progress, Robert Haas
• Parallel Sequential Scan is Committed, Robert Haas
• EnterpriseDB Blog: Parallelism Becomes a Reality in Postgres, Amit Kapila
23. Send us your suggestions and questions
success@ashnik.com
Stay Tuned!
Website: www.ashnik.com