PG-Strom
Query Acceleration Engine of PostgreSQL
Powered by GPGPU
NEC OSS Promotion Center
The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Self Introduction
▌Name: KaiGai Kohei
▌Company: NEC
▌Mission: Software architect & Intrapreneur
▌Background:
 - Linux kernel development (2003~?)
 - PostgreSQL development (2006~)
 - SAP alliance (2011~2013)
 - PG-Strom development & productization (2012~)
▌PG-Strom Project:
 - In-company startup of NEC
 - Also, an open source software project
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU -P.2
What is PG-Strom
▌An Extension of PostgreSQL
▌Offloads CPU-intensive SQL workloads to GPU processors
▌Major Features
① Automatic and just-in-time GPU code generation from SQL
② Asynchronous and concurrent query executor
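[Figure: an SQL command passes through the query frontend to the PostgreSQL query planner and query executor; PG-Strom hooks in as a custom planner and custom executor, generates GPU code on the fly, and runs it asynchronously against the database]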
Concept
▌No Pain
 - From the applications' standpoint it looks like a traditional PostgreSQL database, so existing tools, drivers, and applications can be used as-is.
▌No Tuning
 - The massive computing capability of GPGPU largely removes the need for manual database tuning, letting engineers focus on the tasks only humans can do.
▌No Complexity
 - There is no need to export large data sets from the RDBMS to external tools, because its computing performance is sufficient to run the workloads close to the data.
RDBMS and bottleneck (1/2)
DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQLPage. 5
[Figure: where the data lives. Data size > RAM: the data sits on storage below RAM and the processor. Data size < RAM: the data fits in RAM. In the future?: a processor with wide-band RAM and non-volatile RAM holding the data]
RDBMS and bottleneck (2/2)
World of the current CPU/memory bottleneck
 Join, Aggregation, Sort, Projection, ...
 [strategy]
 • burstable memory access patterns
 • parallel algorithms
World of the traditional disk I/O bottleneck
 SeqScan, IndexScan, ...
 [strategy]
 • reduction of I/O (size, count)
 • distribution across disks (RAID)
[Figure: bandwidth between processor and RAM: multiple hundreds of GB/s; bandwidth between RAM and storage: multiple GB/s]
Background (1/4) – Semiconductor Trend
▌Movement toward integrated CPU/GPU architectures rather than multicore CPUs alone
▌The free lunch that software has enjoyed from hardware evolution will end soon
 - Unless software is designed to utilize GPU capability, it cannot pull out the full hardware capability.
SOURCE: THE HEART OF AMD INNOVATION, Lisa Su, at AMD Developer Summit 2013
Background (2/4) – Features of GPU
▌Characteristics
 - Larger percentage of ALUs on chip
 - Relatively smaller percentage of cache and control logic,
   so GPUs excel at simple calculations in parallel, but not at complicated logic
 - Much higher number of cores per dollar
   • GTX750Ti (640 cores) for $150
                    GPU                    CPU
Model               Nvidia Tesla K20X      Intel Xeon E5-2690 v3
Architecture        Kepler                 Haswell
Launch              Nov-2012               Sep-2014
# of transistors    7.1 billion            3.84 billion
# of cores          2688 (simple)          12 (functional)
Core clock          732MHz                 2.6GHz, up to 3.5GHz
Peak FLOPS          3.95TFLOPS             998.4GFLOPS (with AVX2)
 (single precision)
DRAM size           6GB, GDDR5             768GB/socket, DDR4
Memory bandwidth    250GB/s                68GB/s
Power consumption   235W                   135W
Price               $3,000                 $2,094
SOURCE: CUDA C Programming Guide (v6.5)
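The peak FLOPS row can be cross-checked as cores × clock × floating-point operations per core per cycle. A minimal sketch, assuming 2 operations per cycle per GPU core (one fused multiply-add) and 32 per Haswell core (two 256-bit AVX2 FMA units × 8 single-precision lanes × 2 operations); both helper names are hypothetical.

```c
/* Peak throughput in GFLOPS: cores x clock (GHz) x floating-point
 * operations each core can issue per cycle. */
double peak_gflops(int cores, double clock_ghz, int flops_per_cycle)
{
    return (double) cores * clock_ghz * (double) flops_per_cycle;
}

/* Energy efficiency: GFLOPS delivered per watt of power consumption. */
double gflops_per_watt(double gflops, double watts)
{
    return gflops / watts;
}
```

peak_gflops(2688, 0.732, 2) gives about 3935 GFLOPS, close to the 3.95 TFLOPS listed, and peak_gflops(12, 2.6, 32) gives the 998.4 GFLOPS listed. Dividing by power consumption gives roughly 16.7 GFLOPS/W for the GPU against 7.4 GFLOPS/W for the CPU.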
Background (3/4) – How GPU works
Computing the sum of an array, $\sum_{i=0}^{N-1} item[i]$, with N GPU cores
[Figure: item[0] to item[15] are added pairwise over step.1 to step.4, halving the number of partial sums at each step]
The total sum of item[] is obtained in log2 N steps
Inter-core synchronization is supported by hardware
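The log2 N behavior comes from pairwise (tree) reduction. Below is a sequential C sketch of it, assuming N is a power of two: the inner loop stands for the cores working in parallel, and the end of each outer iteration stands for the hardware-supported synchronization between steps. The function name is hypothetical.

```c
#include <stddef.h>

/* Tree reduction over n items (n must be a power of two).
 * Each outer iteration is one synchronized step; inside it, every
 * "core" i < stride adds its partner item[i + stride]. On a GPU these
 * additions run concurrently and a barrier separates the steps.
 * item[] is overwritten; item[0] ends up holding the total.
 * Returns the number of steps taken, i.e. log2(n). */
int tree_reduce(long item[], size_t n)
{
    int steps = 0;
    for (size_t stride = n / 2; stride > 0; stride /= 2) {
        for (size_t i = 0; i < stride; i++)   /* one iteration per core */
            item[i] += item[i + stride];
        steps++;                              /* barrier between steps */
    }
    return steps;
}
```

With the 16 items 0 to 15, tree_reduce returns 4 steps and leaves 120 in item[0].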
Background (4/4) – Custom-Plan Interface
SELECT cat, avg(x) FROM t1, t2
 WHERE t1.id = t2.id AND y > 100
 GROUP BY cat;
[Figure: the plan tree of this query: Aggregate (key: cat) over a Join of t1 and t2, over Scan on t1 and Scan on t2. Join candidates: Hash Join, Merge Join, Nested Loop, Custom Join. Scan candidates: Seq Scan, Index Scan, Index-Only Scan, Tid Scan, Custom Scan. Example choices: IndexScan on t1 (y > 100), or the custom paths “BulkLoad” on t1 and “GpuHashJoin” (t1.id = t2.id)]
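On the planner side, the custom-plan interface boils down to a cost comparison: PG-Strom adds its own candidate path (a custom scan or custom join) next to the built-in ones, and the cheapest one wins. The sketch below is a deliberately simplified model with hypothetical types and costs; PostgreSQL's real path selection (add_path) also weighs startup cost, sort order and parameterization.

```c
#include <stddef.h>

/* A candidate way to execute one plan node. Simplified: real PostgreSQL
 * paths also carry startup cost, row estimates and sort order. */
typedef struct {
    const char *name;        /* e.g. "Hash Join" or a custom "GpuHashJoin" */
    double      total_cost;  /* planner cost units */
} candidate_path;

/* Pick the cheapest candidate, or NULL when there are none. A custom-plan
 * provider only has to add its own candidate to the list; the comparison
 * itself does not need to know anything about GPUs. */
const candidate_path *cheapest_path(const candidate_path *paths, size_t n)
{
    const candidate_path *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (best == NULL || paths[i].total_cost < best->total_cost)
            best = &paths[i];
    return best;
}
```

If a GpuHashJoin candidate were costed at 400 against 1200 for Hash Join, cheapest_path would return it; when no custom candidate is added, the built-in plan is chosen exactly as before.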
PG-Strom Features
▌Logic
 - GpuScan ... Parallel evaluation of scan qualifiers
 - GpuHashJoin ... Parallel multi-relational join
 - GpuPreAgg ... Two-phase aggregation
 - GpuSort ... GPU + CPU hybrid sorting
 - GpuNestedLoop (under development)
▌Data Types
 - Integer, Float, Date/Time, Numeric, Text
▌Functions and Operators
 - Equality and comparison operators
 - Arithmetic operators and mathematical functions
 - Aggregates: count, min/max, sum, avg, std, var, corr, regr
Automatic GPU code generation
postgres=# SET pg_strom.show_device_kernel = on;
postgres=# EXPLAIN VERBOSE SELECT * FROM t0 WHERE sqrt(x+y) < 10;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Custom Scan (GpuScan) on public.t0  (cost=500.00..357569.35 rows=6666683 width=77)
   Output: id, cat, aid, bid, cid, did, eid, x, y, z
   Device Filter: (sqrt((t0.x + t0.y)) < 10::double precision)
   Features: likely-tuple-slot
   Kernel Source: #include "opencl_common.h"
     :
     static pg_bool_t
     gpuscan_qual_eval(__private cl_int *errcode,
                       __global kern_parambuf *kparams,
                       __global kern_data_store *kds,
                       __global kern_data_store *ktoast,
                       size_t kds_index)
     {
         pg_float8_t KPARAM_0 = pg_float8_param(kparams,errcode,0);
         pg_float8_t KVAR_8 = pg_float8_vref(kds,ktoast,errcode,7,kds_index);
         pg_float8_t KVAR_9 = pg_float8_vref(kds,ktoast,errcode,8,kds_index);
         return pgfn_float8lt(errcode,
                   pgfn_dsqrt(errcode, pgfn_float8pl(errcode, KVAR_8, KVAR_9)), KPARAM_0);
     }
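The generated kernel mirrors the expression tree of the WHERE clause: each operator becomes a device function call that takes errcode, each column becomes a KVAR_n variable, and each constant becomes a KPARAM_n parameter. The following is a minimal sketch of such a generator, not PG-Strom's actual code generator; the expr node layout and the functions emit and generate_qual are hypothetical, and only the return expression is produced, not the variable declarations.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified expression node: a column reference, a
 * query parameter, or a device function call with up to two arguments. */
typedef enum { EXPR_VAR, EXPR_PARAM, EXPR_FUNC } expr_kind;

typedef struct expr {
    expr_kind          kind;
    int                index;    /* attribute number (VAR) or parameter number (PARAM) */
    const char        *func;     /* device function name (FUNC only) */
    int                nargs;
    const struct expr *args[2];
} expr;

/* Append the device-code text for e at buf + *len.
 * Returns 0 on success, -1 if buf is too small. */
static int emit(const expr *e, char *buf, size_t size, size_t *len)
{
    int n;
    switch (e->kind) {
    case EXPR_VAR:
        n = snprintf(buf + *len, size - *len, "KVAR_%d", e->index);
        break;
    case EXPR_PARAM:
        n = snprintf(buf + *len, size - *len, "KPARAM_%d", e->index);
        break;
    default:   /* EXPR_FUNC: name(errcode, arg1, arg2) */
        n = snprintf(buf + *len, size - *len, "%s(errcode", e->func);
        if (n < 0 || (size_t) n >= size - *len)
            return -1;
        *len += (size_t) n;
        for (int i = 0; i < e->nargs; i++) {
            n = snprintf(buf + *len, size - *len, ", ");
            if (n < 0 || (size_t) n >= size - *len)
                return -1;
            *len += (size_t) n;
            if (emit(e->args[i], buf, size, len) != 0)   /* recurse into argument */
                return -1;
        }
        n = snprintf(buf + *len, size - *len, ")");
        break;
    }
    if (n < 0 || (size_t) n >= size - *len)
        return -1;
    *len += (size_t) n;
    return 0;
}

/* Generate the qualifier expression for the tree rooted at e. */
int generate_qual(const expr *e, char *buf, size_t size)
{
    size_t len = 0;
    if (size == 0)
        return -1;
    buf[0] = '\0';
    return emit(e, buf, size, &len);
}
```

Building the tree for sqrt(x+y) < 10 (x as attribute 8, y as attribute 9, the constant 10 as parameter 0) and passing it to generate_qual yields the same text as the return statement of the kernel source above: pgfn_float8lt(errcode, pgfn_dsqrt(errcode, pgfn_float8pl(errcode, KVAR_8, KVAR_9)), KPARAM_0).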
Implementation (1/3) – GpuScan
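[Figure: the table is read as an input stream of chunks (16~64MB); for each chunk PG-Strom issues a DMA send, executes the auto-generated GPU code, and issues a DMA recv, with several chunks in flight at once; the results flow back to PostgreSQL as an output stream]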
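Why chunking pays off can be shown with a small timing model. This is a sketch, not PG-Strom code: it assumes one DMA engine per direction, one kernel running at a time, and the same (hypothetical) stage durations for every chunk. With overlap, the total time approaches the slowest stage multiplied by the number of chunks, instead of the sum of all stages multiplied by the number of chunks.

```c
#include <stddef.h>

enum { STAGE_SEND, STAGE_EXEC, STAGE_RECV, NSTAGES };

/* Finish time of the last chunk when chunks flow through DMA send ->
 * GPU kernel -> DMA recv, each stage handles one chunk at a time, and
 * different stages may work on different chunks concurrently.
 * stage_time[s] is the (assumed constant) time a chunk spends in stage s. */
double pipelined_time(size_t nchunks, const double stage_time[NSTAGES])
{
    double stage_free[NSTAGES] = { 0.0, 0.0, 0.0 };  /* when each stage is idle again */
    double done = 0.0;
    for (size_t c = 0; c < nchunks; c++) {
        double ready = 0.0;                  /* every chunk is available at t = 0 */
        for (int s = 0; s < NSTAGES; s++) {
            double start = ready > stage_free[s] ? ready : stage_free[s];
            ready = start + stage_time[s];   /* chunk c leaves stage s */
            stage_free[s] = ready;
        }
        done = ready;
    }
    return done;
}

/* The same work without overlap: each chunk runs all stages back to back. */
double serial_time(size_t nchunks, const double stage_time[NSTAGES])
{
    double per_chunk = 0.0;
    for (int s = 0; s < NSTAGES; s++)
        per_chunk += stage_time[s];
    return (double) nchunks * per_chunk;
}
```

For stage times of 2, 3 and 1 (send, exec, recv) and 4 chunks, pipelined_time gives 15 against 24 for serial_time: the kernel stage, being the slowest, stays busy while transfers of neighboring chunks are hidden behind it.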
Software Architecture (1/2) – current version
[Figure: PostgreSQL: an SQL query goes through the Query Parser (breaks down the query to a parse tree), the Query Optimizer (makes the query execution plan) and the Query Executor (runs the query), on top of the Storage Manager, Shared Buffer and Storage. PG-Strom: GpuScan, GpuHashJoin, GpuPreAgg and GpuSort plug in through the Custom-Plan APIs, together with a GPU Code Generator and a GPU Program Manager that talks to the OpenCL Server through a Message Queue]
Software Architecture (2/2) – upcoming version
[Figure: the same PostgreSQL components (Query Parser, Query Optimizer, Query Executor, Storage Manager, Shared Buffer, Storage). PG-Strom: GpuScan, GpuHashJoin, GpuPreAgg and GpuSort plug in through the Custom-Plan APIs with a GPU Code Generator; a CUDA controller compiles GPU code just-in-time with NVRTC, copies data into a DMA buffer, and handles DMA and kernel launch]
Implementation (2/3) – GpuHashJoin
PG-Strom Preview Feb-2015Page. 16
[Figure: vanilla Hash-Join vs. GpuHashJoin. Vanilla Hash-Join: the CPU searches the hash table built on the inner relation for each outer row and materializes the results sequentially before the next stage. GpuHashJoin: hash-table search and materialization run in parallel on the GPU, and the CPU just references the materialized results in the next stage]
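Parallel materialization needs every GPU thread to know where to write its results without locking. A common way to achieve that, used here as an assumption rather than a description of GpuHashJoin's internals, is two passes over the outer rows with a prefix sum in between. In this C sketch each loop iteration over o stands for one GPU thread; the names, the fixed bucket count and the mask-based hash are hypothetical.

```c
#include <stdlib.h>

#define NBUCKETS 64   /* hypothetical bucket count; must be a power of two */

/* One join result: row numbers in the outer and inner relation. */
typedef struct { int outer_idx; int inner_idx; } join_pair;

/* Chained hash table over the inner relation's keys, kept in arrays:
 * head[b] is the first inner row in bucket b, next[i] the next row in
 * the same bucket, and -1 ends a chain. */
typedef struct {
    int        head[NBUCKETS];
    int       *next;
    const int *keys;
} hash_table;

static unsigned bucket_of(int key) { return (unsigned) key & (NBUCKETS - 1); }

static int build_hash(hash_table *ht, const int *keys, int n)
{
    ht->keys = keys;
    ht->next = malloc(sizeof(int) * (size_t) (n > 0 ? n : 1));
    if (ht->next == NULL)
        return -1;
    for (int b = 0; b < NBUCKETS; b++)
        ht->head[b] = -1;
    for (int i = 0; i < n; i++) {          /* push row i onto its bucket */
        unsigned b = bucket_of(keys[i]);
        ht->next[i] = ht->head[b];
        ht->head[b] = i;
    }
    return 0;
}

/* Equi-join of outer_keys against inner_keys. Returns the number of
 * result pairs stored in *out (caller frees), or -1 on failure. */
int hash_join_materialize(const int *outer_keys, int n_outer,
                          const int *inner_keys, int n_inner,
                          join_pair **out)
{
    hash_table ht;
    if (build_hash(&ht, inner_keys, n_inner) != 0)
        return -1;
    int *offset = malloc(sizeof(int) * ((size_t) n_outer + 1));
    if (offset == NULL) { free(ht.next); return -1; }

    /* Pass 1: every outer row counts its matches (one GPU thread per row). */
    offset[0] = 0;
    for (int o = 0; o < n_outer; o++) {
        int cnt = 0;
        for (int i = ht.head[bucket_of(outer_keys[o])]; i >= 0; i = ht.next[i])
            cnt += (ht.keys[i] == outer_keys[o]);
        offset[o + 1] = cnt;
    }
    /* Prefix sum turns counts into write offsets (a parallel scan on a GPU). */
    for (int o = 0; o < n_outer; o++)
        offset[o + 1] += offset[o];

    /* Pass 2: every outer row writes into its own slice: no locks needed. */
    int total = offset[n_outer];
    join_pair *res = malloc(sizeof(join_pair) * (size_t) (total > 0 ? total : 1));
    if (res == NULL) { free(offset); free(ht.next); return -1; }
    for (int o = 0; o < n_outer; o++) {
        int pos = offset[o];
        for (int i = ht.head[bucket_of(outer_keys[o])]; i >= 0; i = ht.next[i])
            if (ht.keys[i] == outer_keys[o])
                res[pos++] = (join_pair){ o, i };
    }
    free(offset);
    free(ht.next);
    *out = res;
    return total;
}
```

Joining outer keys {1, 2, 3, 2} against inner keys {2, 3, 2, 5} returns 5 pairs: outer row 0 (key 1) has no match, and each key-2 outer row matches inner rows 2 and 0.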
Benchmark result (1/2) – simple tables join
▌Benchmark Query:
 SELECT * FROM t0 NATURAL JOIN t1 [NATURAL JOIN ....];
▌Environment:
 - t0 has 100 million rows (13GB); t1 to t9 have 40,000 rows each; all data pre-loaded
 - CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1
Simple Tables Join Benchmark: query response time [sec] by number of tables joined
Tables joined    2       3       4       5       6       7       8       9       10
PG-Strom        18.19   19.45   21.04   23.66   26.69   37.64   43.22   49.57   56.38
PostgreSQL      64.27   87.73  109.73  132.21  155.10  179.62  207.85  233.31  263.51
Implementation (3/3) – GpuPreAgg
[Figure: each chunk (16~64MB) of the table is aggregated through a 1st stage reduction and then a 2nd stage reduction]
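Two-phase aggregation works because many aggregates split into a partial state and a final step: avg(x), for example, keeps a sum and a count. Below is a C sketch of that split for a query such as SELECT cat, avg(x) ... GROUP BY cat. Mapping preagg_chunk to the 1st stage reduction and merge_states to the 2nd stage is an assumption, and the group keys are assumed to be small integers (NCAT is hypothetical).

```c
#include <stddef.h>

#define NCAT 4   /* hypothetical: group keys are small integers 0..NCAT-1 */

/* Partial state of avg(x) for one group. */
typedef struct { double sum; long count; } avg_state;

/* 1st stage: reduce one chunk of rows to one partial state per group,
 * so only NCAT small states (not the rows) leave the chunk. */
void preagg_chunk(const int *cat, const double *x, size_t nrows,
                  avg_state part[NCAT])
{
    for (int g = 0; g < NCAT; g++)
        part[g] = (avg_state){ 0.0, 0 };
    for (size_t i = 0; i < nrows; i++) {   /* assumes 0 <= cat[i] < NCAT */
        part[cat[i]].sum   += x[i];
        part[cat[i]].count += 1;
    }
}

/* 2nd stage: fold one chunk's partial states into the running totals. */
void merge_states(avg_state total[NCAT], const avg_state part[NCAT])
{
    for (int g = 0; g < NCAT; g++) {
        total[g].sum   += part[g].sum;
        total[g].count += part[g].count;
    }
}

/* Final step: avg = sum / count. Returns 0 for an empty group (SQL NULL). */
int final_avg(const avg_state *s, double *result)
{
    if (s->count == 0)
        return 0;
    *result = s->sum / (double) s->count;
    return 1;
}
```

Feeding two chunks of three rows each, with cat {0, 1, 0, 2, 1, 0} and x {1, 10, 3, 7, 30, 5}, gives avg 3.0 for group 0 and 20.0 for group 1, while group 3 stays empty.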
Benchmark result (2/2) – Star Schema Model
▌40 typical reporting queries
▌100GB of retail / star-schema data, all pre-loaded
▌Environment
 - CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1
[Chart: Typical Reporting Queries on Retail / Star-Schema Data; query response time [sec] (axis 0 to 2000) for Q.01 to Q.40, PG-Strom vs. PostgreSQL]
Expected Scenario – Reduction of ETL
▌ETL: its design is a human-centric task
▌Replication: a largely automated task
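[Figure: ERP, CRM and SCM run on an OLTP database optimized for transaction workloads. Conventionally, ETL moves the master / fact tables into an OLAP database optimized for analytic workloads, where OLAP cubes serve BI. With PG-Strom, replication keeps a replica of the master / fact tables that is also sufficient for analytic workloads, and BI runs on it directly]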
Direction of PG-Strom
Development Plan
Current version: PG-Strom β + PostgreSQL v9.5devel
Short term target: PostgreSQL v9.5 timeline (2015)
 - Migration from OpenCL to CUDA
 - Add support for GpuNestedLoop
 - Add support for multi-functional kernels
 - Standardization of the custom-join interface
 - ...and more...?
Middle term target: PostgreSQL v9.6 timeline (2016)
 - Integration with the funnel executor
 - Investigation of SSD/NVRAM utilization
 - Custom-sort/aggregate interface
 - Add support for spatial data types (?)
Enhancement Idea (1/3) – GpuNestedLoop
[Figure: a 2-dimensional grid with the outer relation (Nx items, relatively large) along X and the inner relation (Ny items, relatively small) along Y, tiled into blocks of blockDim.x × blockDim.y threads; e.g. thread (X=2, Y=3) handles one outer/inner pair]
Each GPU thread evaluates a non-equi join condition in parallel
2-dimensional GPU kernel launch (Nx × Ny threads at once)
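The 2-dimensional launch maps each thread's coordinates to exactly one (outer, inner) pair. The following sequential C sketch simulates such a grid using the CUDA-style index formula blockIdx * blockDim + threadIdx in each dimension; the join condition outer.a < inner.b and the block sizes are hypothetical.

```c
/* Hypothetical non-equi join condition: outer.a < inner.b. */
static int join_qual(int outer_a, int inner_b)
{
    return outer_a < inner_b;
}

/* Sequential simulation of a 2-D kernel launch in which thread (x, y)
 * tests outer row x against inner row y. The grid is rounded up to whole
 * blocks, so threads outside Nx x Ny exit early. Returns the number of
 * matching pairs (a real kernel would also write them out). */
long nested_loop_2d(const int *outer, int nx, const int *inner, int ny,
                    int block_x, int block_y)
{
    int grid_x = (nx + block_x - 1) / block_x;   /* blocks along X */
    int grid_y = (ny + block_y - 1) / block_y;   /* blocks along Y */
    long matches = 0;

    for (int by = 0; by < grid_y; by++)
        for (int bx = 0; bx < grid_x; bx++)
            for (int ty = 0; ty < block_y; ty++)
                for (int tx = 0; tx < block_x; tx++) {
                    int x = bx * block_x + tx;   /* blockIdx.x * blockDim.x + threadIdx.x */
                    int y = by * block_y + ty;   /* blockIdx.y * blockDim.y + threadIdx.y */
                    if (x >= nx || y >= ny)
                        continue;                /* padding thread: no work */
                    if (join_qual(outer[x], inner[y]))
                        matches++;               /* on a GPU: an atomic add */
                }
    return matches;
}
```

With outer {1, 5, 9, 3, 7}, inner {4, 8} and 2 × 2 blocks, the grid is rounded up to 3 × 1 blocks, the padding threads do nothing, and 6 pairs match.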
Enhancement Idea (2/3) – Multi-functional GPU kernel
[Figure: current query execution: data chunks are loaded from magnetic storage or the on-memory cache and processed by separate GpuScan, GpuHashJoin and GpuPreAgg kernels on the GPU device, with interaction between CPU and GPU multiple times before the final result. Execution in a newer version: a single GpuMultiOps kernel performs Scan, HashJoin and PreAgg on each data chunk, with interaction between CPU and GPU only once]
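The difference between the two execution styles is where intermediate results live. In a fused kernel each row flows through the scan qualifier, the join and the partial aggregation in a single pass, so nothing returns to the host between the steps. A C sketch of that per-row pipeline follows; the row layout, the qualifier x > min_x and the direct-indexed dim_group array (standing in for a real hash table) are all hypothetical simplifications.

```c
#include <stddef.h>

#define NGROUPS 4   /* hypothetical number of aggregation groups */

/* Hypothetical fact row: a join key into a dimension table and a value. */
typedef struct { int dim_id; double x; } fact_row;

/* One fused pass over a data chunk: scan qualifier, join lookup and
 * partial aggregation are applied row by row, so no intermediate
 * result is written out between the three steps. dim_group[k] gives
 * the group (0..NGROUPS-1) of dimension key k, or -1 if key k has no
 * matching dimension row. Values are accumulated into sum[].
 * Returns the number of rows that reached the aggregation step. */
size_t fused_scan_join_preagg(const fact_row *rows, size_t nrows,
                              const int *dim_group, int ndim,
                              double min_x, double sum[NGROUPS])
{
    size_t used = 0;
    for (size_t i = 0; i < nrows; i++) {          /* one GPU thread per row */
        if (!(rows[i].x > min_x))                 /* scan qualifier */
            continue;
        int key = rows[i].dim_id;
        if (key < 0 || key >= ndim || dim_group[key] < 0)
            continue;                             /* join: no matching row */
        sum[dim_group[key]] += rows[i].x;         /* partial aggregate */
        used++;                                   /* (atomics on a real GPU) */
    }
    return used;
}
```

For the rows (0, 5.0), (1, 1.0), (2, 8.0), (0, 12.0), (3, 9.0) with dim_group {0, 1, -1, 2} and min_x 2.0, three rows survive all steps, and sum[0] becomes 17.0 and sum[2] 9.0.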
Enhancement Idea (3/3) – Funnel executor integration
[Figure: current query execution: the Scan, Join and Aggregate nodes of the query execution plan read from magnetic storage or the on-memory cache and produce the final result; PG-Strom may replace these nodes with their GPU versions, but the host system still runs with a single thread. Execution in PostgreSQL v9.6: the Funnel Executor runs several Partial Scan → Join → Partial Aggregate pipelines over magnetic storage, SSD devices and the on-memory cache, and a Combined Aggregate produces the final result]
The funnel executor assigns part of the query execution task to worker processes.
The combined aggregate merges multiple partial aggregates to generate the final result.
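Combining partial aggregates is trivial for sum or count, but statistics such as var or std from the feature list need a richer state from each worker. Below is a sketch, assuming each worker hands over its row count, mean and sum of squared deviations (M2); it uses Welford's update inside a worker and the pairwise combination formula of Chan et al. to merge. Workers are simulated sequentially and all names are hypothetical.

```c
#include <stddef.h>

/* Partial state for variance: row count, mean, and the sum of squared
 * deviations from the mean (M2). */
typedef struct { long n; double mean; double m2; } var_state;

/* Partial aggregate computed by one worker over its share of the rows
 * (Welford's single-pass update). */
var_state partial_var(const double *x, size_t nrows)
{
    var_state s = { 0, 0.0, 0.0 };
    for (size_t i = 0; i < nrows; i++) {
        s.n++;
        double d = x[i] - s.mean;
        s.mean += d / (double) s.n;
        s.m2   += d * (x[i] - s.mean);
    }
    return s;
}

/* Combined aggregate: merge two partial states (Chan et al. formula). */
var_state combine_var(var_state a, var_state b)
{
    if (a.n == 0) return b;
    if (b.n == 0) return a;
    var_state r;
    double delta = b.mean - a.mean;
    r.n    = a.n + b.n;
    r.mean = a.mean + delta * (double) b.n / (double) r.n;
    r.m2   = a.m2 + b.m2 + delta * delta * (double) a.n * (double) b.n / (double) r.n;
    return r;
}

/* Final step: population variance. SQL would return NULL for no rows;
 * this sketch returns 0.0 instead. */
double final_var_pop(var_state s)
{
    return s.n > 0 ? s.m2 / (double) s.n : 0.0;
}
```

Splitting {2, 4, 4, 4, 5, 5, 7, 9} over three workers as {2, 4, 4}, {4, 5} and {5, 7, 9} and combining the partial states gives n = 8, mean 5 and a population variance of 4, the same as a single pass over all rows.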
Let’s try – Deployment on AWS
Search for “strom”!
AWS GPU Instance (g2.2xlarge)
 CPU      Xeon E5-2670 (8 vCPU)
 RAM      15GB
 GPU      NVIDIA GRID K2 (1536 cores)
 Storage  60GB of SSD
 Price    $0.898/hour, $646.56/month
(*) Price for an on-demand instance in the Tokyo region as of Nov-2014
The PostgreSQL Conference 2014, Tokyo - GPGPU Accelerates PostgreSQL
Your involvement is welcome
▌How to be involved?
 - as a user
 - as a developer
 - as a business partner
▌Source code
 - https://github.com/pg-strom/devel
▌Contact us
 - e-mail: kaigai@ak.jp.nec.com
 - twitter: @kkaigai
check it out!
20150318-SFPUG-Meetup-PGStrom