PG-Strom v2.0 Release
Technical Brief
(17-Apr-2018)
PG-Strom Development Team
<pgstrom@heterodb.com>
What is PG-Strom?
(Diagram: GPU off-loading. For I/O intensive workloads: SSD-to-GPU Direct SQL Execution, In-memory Columnar Cache, SCAN+JOIN+GROUP BY combined GPU kernel. For advanced analytics workloads: CPU+GPU hybrid parallel, PL/CUDA user defined functions, GPU memory store (Gstore_fdw).)
✓ Designed as a PostgreSQL extension
✓ Transparent SQL acceleration
✓ Cooperation with a fully transactional database management system
✓ Works with the various comprehensive tools and applications for PostgreSQL
PG-Strom: an extension module to accelerate analytic SQL workloads using GPU.
[GPU’s characteristics]
✓ Several thousands of processor cores per device
✓ Nearly terabytes per second memory bandwidth
✓ Much higher cost performance ratio
PG-Strom is an open source extension module for PostgreSQL (v9.6 or later) that accelerates analytic queries; it has been developed and incrementally improved since 2012. It provides PostgreSQL with alternative query execution plans for SCAN, JOIN and GROUP BY workloads.
Once the optimizer chooses a custom plan that uses the GPU for SQL execution, PG-Strom generates the corresponding GPU code on the fly. PostgreSQL therefore uses the GPU-aware execution engine only when it makes sense, so there is no downside for transactional workloads.
PL/CUDA is a derived feature that allows manual optimization through user defined functions (UDF) written in CUDA C and executed on the GPU device. PG-Strom v2.0 adds many features on top of this basis for higher performance and wider use scenarios.
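A minimal illustration of the transparent acceleration, reusing the query from the later slides. Table names and the plan shape are illustrative; the GPU plan appears only when PG-Strom estimates it to be cheaper than the built-in plan.

    -- pg_strom must be listed in shared_preload_libraries, then:
    CREATE EXTENSION pg_strom;

    -- An ordinary analytic query; no PG-Strom specific syntax is required.
    EXPLAIN
    SELECT cat, count(*), avg(X)
      FROM t0 JOIN t1 ON t0.id = t1.id
     WHERE YMD >= 20120701
     GROUP BY cat;
    -- If the GPU path is chosen, the plan shows custom nodes such as
    -- "Custom Scan (GpuJoin)" and "(GpuPreAgg)" instead of HashJoin / Aggregate.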
PG-Strom v2.0 features highlight
▌Storage Enhancement
 SSD-to-GPU Direct SQL Execution
 In-memory Columnar Cache
 GPU memory store (gstore_fdw)
▌Advanced SQL Infrastructure
 PostgreSQL v9.6/v10 support – CPU+GPU Hybrid Parallel
 SCAN+JOIN+GROUP BY combined GPU kernel
 Utilization of demand paging of GPU device memory
▌Miscellaneous
 PL/CUDA related enhancement
 New data type support
 Documentation and Packaging
Storage Enhancement
Storage enhancement for each layer
Storage layer → related PG-Strom feature:
 GPU device memory (bandwidth ~900GB/s, capacity ~32GB) → GPU memory store (gstore_fdw)
 Host memory, hot storage (bandwidth ~128GB/s, capacity ~1.5TB) → In-memory columnar cache
 NVMe-SSD, cold storage (bandwidth ~10GB/s, capacity 10TB and more) → SSD-to-GPU Direct SQL Execution
In general, the key to performance is not only the number of cores and their clock frequency, but also the data throughput that can be supplied to the processors. Each storage layer has its own characteristics, so PG-Strom provides several options to optimize the supply of data.
SSD-to-GPU Direct SQL Execution is a unique and distinctive feature of PG-Strom. It loads PostgreSQL data blocks directly onto the GPU and runs SQL workloads there, reducing the amount of data the CPU has to process on arrival. The data transfer bypasses the operating system software stack, so it can pull out nearly the wire-rate performance of the hardware. The in-memory columnar cache keeps data blocks in the format that is optimal for the GPU to compute on and to transfer over the PCIe bus. The GPU memory store (gstore_fdw) allows data to be preloaded into GPU device memory using standard SQL statements. It is an ideal data location for PL/CUDA functions because the data set does not need to be carried on each invocation, and it is not bound by the 1GB size limit of the varlena format.
SSD-to-GPU Direct SQL Execution (1/3)
Pre-processing of SQL workloads drops unnecessary rows before the data is loaded onto the CPU.
GPU code generation: PG-Strom automatically generates CUDA code, so the acceleration is transparent from the user's standpoint.
SQL execution on GPU: PostgreSQL data blocks are loaded onto the GPU using peer-to-peer DMA, then unnecessary data is dropped by parallel SQL execution on the GPU.
SQL execution on CPU: the CPU processes the pre-processed data set, which is much smaller than the original; the net effect looks like GPU-accelerated I/O as well.
(Diagram: large PostgreSQL tables on NVMe-SSD; PostgreSQL data blocks are sent over the PCIe bus to the GPU by SSD-to-GPU P2P DMA (NVMe-Strom driver), where the WHERE-clause, JOIN and GROUP BY are executed.)
SSD-to-GPU Direct SQL: once the data blocks are loaded onto the GPU, SQL workloads can be processed by thousands of GPU cores. This reduces the amount of data to be loaded and processed by the CPU, so it looks like an I/O performance acceleration.
Traditional data flow: only the CPU can determine whether a record is necessary or not, so every record, including the junk, has to be moved to the CPU first.
Example query:
SELECT cat, count(*), avg(X)
FROM t0 JOIN t1 ON t0.id = t1.id
WHERE YMD >= 20120701
GROUP BY cat;
(Diagram: at the SQL optimization stage, the SQL-to-GPU program generator emits CUDA code; at the SQL execution stage it is just-in-time compiled into a GPU binary.)
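A minimal sketch of checking whether the direct SQL path is available. The GUC name pg_strom.nvme_strom_enabled is an assumption based on the NVMe-Strom naming above; verify it against the installed version.

    SHOW pg_strom.nvme_strom_enabled;   -- assumed GUC; 'on' when the nvme_strom module is usable

    -- SSD-to-GPU Direct SQL is used for large sequential scans; for small tables
    -- PG-Strom falls back to the normal filesystem I/O path (read(2)).
    EXPLAIN
    SELECT count(*) FROM t0 WHERE YMD >= 20120701;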
SSD-to-GPU Direct SQL Execution (2/3)
(Chart: Star Schema Benchmark results on NVMe-SSD x3 with md-raid0; query execution throughput [MB/s] for queries Q1-1 through Q4-3, PostgreSQL v10.3 vs PG-Strom v2.0.)
This chart shows query execution throughput for the 13 queries of the Star Schema Benchmark. The host system has only 192GB of memory while the database to be scanned is 353GB, so the workload is quite I/O intensive.
PG-Strom with SSD-to-GPU Direct SQL Execution sustains 7.2-7.7GB/s, more than 3.5x faster than vanilla PostgreSQL for this large batch workload. Throughput is calculated as (database size) / (query response time); for example, 353GB at 7.5GB/s corresponds to roughly 47 seconds, so the average query response time of PG-Strom falls in the latter half of the 40-second range, while PostgreSQL takes between 200 and 300 seconds.
Benchmark environment:
Server: Supermicro 1019GP-TT, CPU: Intel Xeon Gold 6126T (2.6GHz, 12C), RAM: 192GB, GPU: NVIDIA Tesla P40 (3840C, 24GB),
SSD: Intel DC P4600 (HHHL, 2.0TB) x3, OS: CentOS 7.4, SW: CUDA 9.1, PostgreSQL v10.3, PG-Strom v2.0
SSD-to-GPU Direct SQL Execution (3/3)
(Diagram: software stack. The pg_strom extension in PostgreSQL issues read(2) for the normal filesystem I/O path and ioctl(2) for the SSD-to-GPU direct path; the nvme_strom kernel module sits beside the filesystem (ext4, xfs), the inbox nvme driver and the NVIDIA kernel driver / CUDA Toolkit, on top of NVMe SSD drives, a Tesla GPU and commodity x86_64 hardware, using NVIDIA GPUDirect RDMA.)
SSD-to-GPU Direct SQL Execution is built on top of NVIDIA GPUDirect RDMA, which allows P2P DMA between a GPU and third-party PCIe devices. The nvme_strom kernel module mediates the P2P DMA from NVMe-SSD to a Tesla GPU [*1].
Once a GPU accelerated query execution plan is chosen by the query optimizer, PG-Strom calls ioctl(2) to deliver the SSD-to-GPU Direct SQL Execution commands to nvme_strom in kernel space.
The driver has a small interaction with the filesystem to convert a file descriptor plus file offset into block numbers on the device, so only a limited set of filesystems (Ext4, XFS) is currently supported.
You can also stripe NVMe-SSDs with md-raid0 for more throughput; however, this is a feature of the commercial subscription. Please contact HeteroDB if you need multi-SSD grade performance.
[*1] NVIDIA GPUDirect RDMA is available only on Tesla or Quadro GPUs, not GeForce.
In-memory Columnar Cache
(Diagram: transactional workloads run on the PostgreSQL heap buffer in row format; asynchronous columnar cache builders (background workers) build an in-memory columnar cache (column format) from sequential scans, and an UPDATE invalidates the cached chunk on write. Query execution on the GPU uses the column-format kernel where the cache is available and the row-format kernel elsewhere.)
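A minimal configuration sketch for the cache builders. The parameter names pg_strom.ccache_num_workers and pg_strom.ccache_databases are assumptions; check the v2.0 documentation for the exact knobs.

    -- Assumed GUCs, typically set in postgresql.conf:
    --   pg_strom.ccache_num_workers = 2          -- asynchronous cache builder workers
    --   pg_strom.ccache_databases = 'postgres'   -- databases whose tables are cached
    SHOW pg_strom.ccache_num_workers;
    SHOW pg_strom.ccache_databases;
    -- Queries need no change; the column-format GPU kernel is used wherever the
    -- cache has been built, and the row-format kernel elsewhere.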
(Background) Why columnar-format is preferable for GPU
▌Row-format – random memory access
(Diagram: only 32bits x 1 = 32bits are valid within a 256bit memory transaction; usage ratio 12.5%.)
▌Column-format – coalesced memory access
(Diagram: 32bits x 8 = 256bits are valid within a 256bit memory transaction; usage ratio 100.0%.)
The memory access pattern strongly affects GPU performance because of the GPU's memory subsystem architecture. A GPU has a relatively wide memory transaction width; if multiple cooperating cores simultaneously read consecutive items of an array, one memory transaction can load eight 32bit values at once (for a 256bit width). This depends entirely on the data format and is one of the most significant optimization factors.
Row format tends to place the values of the column being scanned at scattered, non-contiguous locations, so it is hard to pull the maximum performance out of the GPU. Column format represents a table as a set of simple arrays, so when column X is referenced in a scan all of its values are located close together and tend to fit the coalesced memory access pattern.
GPU memory store (Gstore_fdw)
(Diagram: a foreign table (gstore_fdw) in the SQL world accepts INSERT / UPDATE / DELETE / SELECT and handles data format conversion, data compression and transaction management; the data lives in GPU device memory and is referenced by zero-copy from PL/CUDA user defined functions, while the IPC handle of the memory region can be passed to user written scripts and machine learning frameworks.)
Gstore_fdw provides an interface to read and write GPU device memory using standard SQL: you can load bulk data into GPU device memory with an INSERT command, and so on. Because all operations are handled inside the PostgreSQL database system, the data keeps its binary form (there is no need to dump it to a CSV file and parse it again with a Python script). SQL is one of the most flexible tools for data management, so you can load an arbitrary data set from the master table and apply the pre-processing required by a machine-learning algorithm on the fly.
Another significant feature is data collaboration with external programs such as Python scripts. GPU device memory can be shared with an external program once the identifier of the acquired memory region (a 64-byte token) is exported. This makes PostgreSQL usable as a powerful data management infrastructure for machine learning. Gstore_fdw currently supports only the 'pgstrom' internal format; other internal formats will be supported in a future version.
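A minimal SQL-side sketch of Gstore_fdw usage. The column list and the pinning/format options are illustrative assumptions; only the 'pgstrom' internal format is mentioned in this brief.

    -- Illustrative foreign table whose storage is GPU device memory
    CREATE FOREIGN TABLE ft_features (
        id int,
        x  real[]        -- feature vector consumed by a PL/CUDA function
    )
    SERVER gstore_fdw
    OPTIONS (pinning '0', format 'pgstrom');  -- assumed options: GPU0, 'pgstrom' format

    -- Bulk load with standard SQL; the data keeps its binary form on the GPU.
    INSERT INTO ft_features
    SELECT id, array_agg(val ORDER BY col)
      FROM source_table
     GROUP BY id;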
Advanced SQL Infrastructure
PostgreSQL v9.6/v10 support – CPU+GPU Hybrid Parallel
(Diagram: CPU parallel execution runs Gather and Final-Agg on top of per-worker SeqScan → HashJoin → Partial-Agg; CPU + GPU hybrid parallel execution runs Gather and Final-Agg on top of per-worker GpuScan → GpuJoin → GpuPreAgg, i.e. each PostgreSQL worker process individually uses the GPU at a finer granularity.)
 Multi-process execution pulls up GPU usage more efficiently
An epoch-making feature of PostgreSQL v9.6 was parallel query execution based on concurrent multi-processes, and the custom-scan interface was extended so that extensions can participate in parallel query. PG-Strom v2.0 was re-designed around this new interface, so its GPU-aware custom plans can run on background worker processes.
Empirically, a single CPU thread cannot supply a data stream fast enough for the GPU; its capacity is usually much narrower than the GPU's computing capability. CPU parallel execution therefore helps to pull up GPU usage with a much higher data supply rate.
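A small sketch of driving the hybrid plan with standard PostgreSQL settings; the plan shape is illustrative and depends on cost estimates.

    SET max_parallel_workers_per_gather = 4;

    EXPLAIN
    SELECT cat, count(*), avg(X)
      FROM t0 JOIN t1 ON t0.id = t1.id
     GROUP BY cat;
    -- Expected shape (illustrative): Finalize Aggregate over Gather, where each
    -- background worker runs the GPU-aware custom nodes (GpuPreAgg / GpuJoin / GpuScan).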
SCAN + JOIN + GROUP BY combined GPU kernel
The CustomScan interface allows an extension to inject an alternative implementation of a query execution plan; if its estimated cost is lower than the built-in implementation, the PostgreSQL query optimizer chooses the alternative plan. PG-Strom provides GPU-accelerated custom logic for SCAN, JOIN and GROUP BY workloads.
Usually, JOIN consumes the result of SCAN and generates records as its output, and the output of JOIN often serves as the input of GROUP BY, which then produces the aggregation.
When GpuPreAgg, GpuJoin and GpuScan are executed back to back, there is an opportunity for further optimization by reducing data transfer over the PCIe bus: the result of GpuScan can serve as GpuJoin's input, and the result of GpuJoin as GpuPreAgg's input. PG-Strom tries to re-use the result buffer of the previous step as the input buffer of the next step, which eliminates the data ping-pong over the PCIe bus. Once the SCAN + JOIN + GROUP BY combined GPU kernel is ready, it runs as the most efficient GPU kernel because no data needs to be exchanged across the query execution plan.
(Diagram: data blocks flow from storage through the GpuScan, GpuJoin and GpuPreAgg kernels entirely on the GPU; only the aggregated result is returned through the host buffer to the Agg node on the CPU, eliminating the data ping-pong between host buffers and the GPU between steps.)
Utilization of demand paging of GPU device memory
Buffer size estimation is not an easy job for SQL workloads: in general, the exact number of result rows is unknown until the query is actually executed, even though table statistics give a rough estimate during query planning.
A GPU kernel needs a result buffer for each operation, and it has to be acquired before execution. This has been a problematic trade-off: a tight configuration with a small margin often runs out of result buffer, while allocation with a large margin leaves a non-negligible dead space in GPU device memory.
Recent GPUs (Pascal / Volta) support demand paging of device memory: physical page frames are assigned on demand, so an unused region consumes no physical device memory. A large-margin configuration therefore costs no dead physical memory.
This also allowed us to simplify the code that estimates the result buffer size and retries GPU kernel invocation with a larger buffer; that logic was very complicated and had many potential bugs in its error paths. PG-Strom v2.0 fully utilizes demand paging of the GPU device, so the problematic code could be eliminated, which also contributes to the stability of the software.
(Diagram: a GpuJoin over t0, t1 and t2 allocates hash tables for t1 and t2, a data chunk of t0 and a result buffer; on Kepler / Maxwell the unused margin of the result buffer becomes dead space, while on Pascal / Volta no physical page frames are assigned to it, so it is harmless.)
Miscellaneous Improvement
PL/CUDA related enhancement
(Diagram: the whole pipeline, scan, pre-process, analytics and post-process, runs in-database.)
CREATE FUNCTION
my_logic( reggstore, text )
RETURNS matrix
AS $$
  -- custom CUDA C code block (runs on the GPU device)
$$ LANGUAGE 'plcuda';
✓ manually optimized analytics & machine-learning algorithms
✓ utilization of a few thousand processor cores
✓ ultra high memory bandwidth
If a "reggstore" argument is supplied to a PL/CUDA function, the user-defined part of the function receives it as a pointer to the GPU device memory preserved for gstore_fdw. This allows the function to reference data that was preloaded onto the Gstore_fdw.
The #plcuda_include directive is newly supported; it includes the code returned by the specified function, so you can switch the code to be compiled according to the arguments instead of creating multiple similar variations.
CREATE FUNCTION my_distance(reggstore, text)
RETURNS text
AS $$
  if ($2 = "manhattan")
    return "#define dist(X,Y) abs((X)-(Y))";
  if ($2 = "euclid")
    return "#define dist(X,Y) (((X)-(Y))*((X)-(Y)))";
$$ ...

#plcuda_include my_distance

① The reggstore argument acts as a reference to the gstore_fdw structure (GPU memory store).
② The result of another function is included as a part of the CUDA C code.
New data type support
▌Numeric
 float2
▌Network address
 macaddr, inet, cidr
▌Range data types
 int4range, int8range, tsrange, tstzrange, daterange
▌Miscellaneous
 uuid, reggstore
The "float2" data type is implemented by PG-Strom; it is not a standard built-in type. It represents half-precision floating point values. People in the machine-learning field often use FP16 for a more compact representation of matrices: less device memory consumption and higher computing throughput.
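A minimal sketch of the float2 type; the table definition is illustrative, and it assumes the array type and numeric casts that usually accompany a user-defined type.

    -- float2: half-precision floating point, provided by the pg_strom extension
    CREATE TABLE features_fp16 (
        id int,
        w  float2[]     -- FP16 weight vector: half the size of real[] (2 bytes per value)
    );

    INSERT INTO features_fp16 VALUES (1, ARRAY[0.5, 1.25, -3.0]::float2[]);
    SELECT id, w FROM features_fp16;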
Documentation and Packaging
The documentation was totally rewritten in Markdown, which is much easier to keep up to date than the previous raw HTML. It is now published at http://heterodb.github.io/pg-strom/
RPM packages are available for RHEL 7.x / CentOS 7.x. PG-Strom and related software are distributed from the HeteroDB Software Distribution Center (SWDC) at https://heterodb.github.io/swdc/ which can be used as a yum repository source.
(Screenshots: PG-Strom official documentation; HeteroDB Software Distribution Center.)
Post-v2.0 Development Roadmap
PostgreSQL v11 support (1/2) – Parallel Append & Multi-GPUs
 PG10 restriction: GPUs that are not close to the SSD being scanned must stay idle during a scan across a partitioned table.
 PG11 allows partitioned children to be scanned in parallel, which also makes multiple GPUs active simultaneously.
(Diagram: a Parallel Append over partitions t1-t4; each background worker contains GPU-aware custom plans and, by auto-tuning based on the PCIe topology, chooses the GPU that shares the same PCIe root complex as the NVMe-SSD it is scanning.)
NVIDIA GPUDirect RDMA, the base technology of SSD-to-GPU Direct SQL Execution, requires that the GPU and the SSD share a PCIe root complex, so the P2P DMA route must not traverse the QPI link. This imposes a restriction on multi-GPU configurations with partitioned tables: when background workers scan partitioned child tables across multiple SSDs, the GPU-aware custom plan has to choose the neighboring GPU to avoid QPI traversal. In other words, the table being scanned determines the GPU attached to the background worker.
In PG10, partitioned child tables are scanned sequentially, so only one GPU can be active at a time; the secondary GPU would cause QPI traversal on a usual hardware configuration (1 CPU : 1 GPU + n SSDs).
PG11 supports parallel scans across partitioned child tables, so each background worker can activate the GPU nearest to the table it is scanning. This makes it possible to utilize multiple GPUs under SSD-to-GPU Direct SQL Execution for larger data processing.
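A sketch of the storage layout this targets; tablespace paths and partition bounds are illustrative. The point is that each partition is backed by the NVMe-SSD nearest to one GPU, so PG11's Parallel Append can keep both GPUs busy.

    -- Each tablespace sits on a filesystem backed by a different NVMe-SSD.
    CREATE TABLESPACE nvme0 LOCATION '/nvme0/pgdata';
    CREATE TABLESPACE nvme1 LOCATION '/nvme1/pgdata';

    CREATE TABLE t0 (id int, ymd int, cat text, x real) PARTITION BY RANGE (ymd);
    CREATE TABLE t0_2017 PARTITION OF t0
        FOR VALUES FROM (20170101) TO (20180101) TABLESPACE nvme0;
    CREATE TABLE t0_2018 PARTITION OF t0
        FOR VALUES FROM (20180101) TO (20190101) TABLESPACE nvme1;
    -- Under PG11, separate background workers scan t0_2017 and t0_2018 in parallel,
    -- each using the GPU that shares a PCIe root complex with its SSD.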
PostgreSQL v11 support (2/2) – In-box distributed query execution
(Diagram: the host system (x86_64 server) connects to PCIe I/O expansion boxes via NIC / HBA (PCIe over Ethernet). Each box holds NVMe-SSDs with PostgreSQL tables and data blocks plus a GPU behind an internal PCIe switch; SSD-to-GPU P2P DMA moves the large data inside the box, where the WHERE-clause, JOIN and GROUP BY run at several GB/s per box, and only the small pre-processed data returns to the host. Adding boxes adds performance and capacity, yet the whole setup performs as a simple single-node configuration from the standpoint of applications and administrators.)
Multi-GPU capability will expand the opportunity for big data processing to much larger data sets, probably close to 100TB.
Several vendors provide PCIe I/O expansion boxes, which allow PCIe devices to be installed in a physically separate enclosure connected to the host system over a fast network. Some of these products also contain an internal PCIe switch that can route P2P DMA packets inside the box. PG-Strom can then run SSD-to-GPU Direct SQL Execution within each box with little interaction with the host system, and run the partial workload of the partitioned table per box in parallel once multi-GPU capability is supported on PG11.
From the standpoint of applications and administrators, this is still a simple single-node configuration even though many GPUs and SSDs are installed, so there is no need to worry about distributed transactions. It keeps application design and daily maintenance simple.
Other significant features
▌cuPy data format support of Gstore_fdw
▌BRIN index support
▌Basic PostGIS support
▌NVMe over Fabric support
▌GPU device function that can return varlena datum
▌Semi- / Anti- Join support
▌MVCC visibility checks on the device side
▌Compression support of in-memory columnar cache
See 003: Development Roadmap for more details.
Run! Beyond the Limitations

What's hot

Expansion cards and slots
Expansion cards and slotsExpansion cards and slots
Expansion cards and slotsJibin Varghese
 
Presentation on graphics processing unit (GPU)
Presentation on graphics processing unit (GPU)Presentation on graphics processing unit (GPU)
Presentation on graphics processing unit (GPU)MuntasirMuhit
 
Graphics Processing Unit - GPU
Graphics Processing Unit - GPUGraphics Processing Unit - GPU
Graphics Processing Unit - GPUChetan Gole
 
Differnce of two processors
Differnce of two processorsDiffernce of two processors
Differnce of two processorshashim102
 
Expansion slot report
Expansion slot reportExpansion slot report
Expansion slot reportiyinyan
 
10. GPU - Video Card (Display, Graphics, VGA)
10. GPU - Video Card (Display, Graphics, VGA)10. GPU - Video Card (Display, Graphics, VGA)
10. GPU - Video Card (Display, Graphics, VGA)Akhila Dakshina
 
Trials of antiplatelet drugs.
Trials of antiplatelet drugs.Trials of antiplatelet drugs.
Trials of antiplatelet drugs.Sumedh Ramteke
 
[IGC2018] AMD Don Woligroski - WHY Ryzen
[IGC2018] AMD Don Woligroski - WHY Ryzen[IGC2018] AMD Don Woligroski - WHY Ryzen
[IGC2018] AMD Don Woligroski - WHY Ryzen강 민우
 
Storage (Hard disk drive)
Storage (Hard disk drive)Storage (Hard disk drive)
Storage (Hard disk drive)0949778108
 
Gpu presentation
Gpu presentationGpu presentation
Gpu presentationJosiah Lund
 
graphics processing unit ppt
graphics processing unit pptgraphics processing unit ppt
graphics processing unit pptNitesh Dubey
 
Embedded Development on ESP32 - FEKT VUT - UREL
Embedded Development on ESP32 - FEKT VUT - URELEmbedded Development on ESP32 - FEKT VUT - UREL
Embedded Development on ESP32 - FEKT VUT - URELJuraj Michálek
 

What's hot (20)

GPU - Basic Working
GPU - Basic WorkingGPU - Basic Working
GPU - Basic Working
 
Expansion cards and slots
Expansion cards and slotsExpansion cards and slots
Expansion cards and slots
 
Presentation on graphics processing unit (GPU)
Presentation on graphics processing unit (GPU)Presentation on graphics processing unit (GPU)
Presentation on graphics processing unit (GPU)
 
Gpu
GpuGpu
Gpu
 
Graphics Processing Unit - GPU
Graphics Processing Unit - GPUGraphics Processing Unit - GPU
Graphics Processing Unit - GPU
 
Differnce of two processors
Differnce of two processorsDiffernce of two processors
Differnce of two processors
 
Expansion slot report
Expansion slot reportExpansion slot report
Expansion slot report
 
CPU vs GPU Comparison
CPU  vs GPU ComparisonCPU  vs GPU Comparison
CPU vs GPU Comparison
 
Graphic card
Graphic cardGraphic card
Graphic card
 
GPU Computing
GPU ComputingGPU Computing
GPU Computing
 
Lecture14
Lecture14Lecture14
Lecture14
 
Hdmi and vga cables
Hdmi and vga cablesHdmi and vga cables
Hdmi and vga cables
 
10. GPU - Video Card (Display, Graphics, VGA)
10. GPU - Video Card (Display, Graphics, VGA)10. GPU - Video Card (Display, Graphics, VGA)
10. GPU - Video Card (Display, Graphics, VGA)
 
Trials of antiplatelet drugs.
Trials of antiplatelet drugs.Trials of antiplatelet drugs.
Trials of antiplatelet drugs.
 
[IGC2018] AMD Don Woligroski - WHY Ryzen
[IGC2018] AMD Don Woligroski - WHY Ryzen[IGC2018] AMD Don Woligroski - WHY Ryzen
[IGC2018] AMD Don Woligroski - WHY Ryzen
 
Storage (Hard disk drive)
Storage (Hard disk drive)Storage (Hard disk drive)
Storage (Hard disk drive)
 
Gpu presentation
Gpu presentationGpu presentation
Gpu presentation
 
Cpu
CpuCpu
Cpu
 
graphics processing unit ppt
graphics processing unit pptgraphics processing unit ppt
graphics processing unit ppt
 
Embedded Development on ESP32 - FEKT VUT - UREL
Embedded Development on ESP32 - FEKT VUT - URELEmbedded Development on ESP32 - FEKT VUT - UREL
Embedded Development on ESP32 - FEKT VUT - UREL
 

Similar to PG-Strom v2.0 Technical Brief (17-Apr-2018)

20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA UnconferenceKohei KaiGai
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storageKohei KaiGai
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...Kohei KaiGai
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_ProcessingKohei KaiGai
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGaiKohei KaiGai
 
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...Equnix Business Solutions
 
20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_ENKohei KaiGai
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)Kohei KaiGai
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAlluxio, Inc.
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...Equnix Business Solutions
 
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드문기 박
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020John Zedlewski
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Kohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
High Performance Computing for LiDAR Data Production
High Performance Computing for LiDAR Data ProductionHigh Performance Computing for LiDAR Data Production
High Performance Computing for LiDAR Data ProductionMattBethel1
 
Architecture exploration of recent GPUs to analyze the efficiency of hardware...
Architecture exploration of recent GPUs to analyze the efficiency of hardware...Architecture exploration of recent GPUs to analyze the efficiency of hardware...
Architecture exploration of recent GPUs to analyze the efficiency of hardware...journalBEEI
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~Kohei KaiGai
 

Similar to PG-Strom v2.0 Technical Brief (17-Apr-2018) (20)

20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
 
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
 
20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
 
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
High Performance Computing for LiDAR Data Production
High Performance Computing for LiDAR Data ProductionHigh Performance Computing for LiDAR Data Production
High Performance Computing for LiDAR Data Production
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
 
Architecture exploration of recent GPUs to analyze the efficiency of hardware...
Architecture exploration of recent GPUs to analyze the efficiency of hardware...Architecture exploration of recent GPUs to analyze the efficiency of hardware...
Architecture exploration of recent GPUs to analyze the efficiency of hardware...
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
 

More from Kohei KaiGai

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_HistoryKohei KaiGai
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_APIKohei KaiGai
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrowKohei KaiGai
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_countKohei KaiGai
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0Kohei KaiGai
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCacheKohei KaiGai
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_IndexKohei KaiGai
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGISKohei KaiGai
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGISKohei KaiGai
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_OnlineKohei KaiGai
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdwKohei KaiGai
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_FdwKohei KaiGai
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_TokyoKohei KaiGai
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.JapanKohei KaiGai
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_BetaKohei KaiGai
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStromKohei KaiGai
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStromKohei KaiGai
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdwKohei KaiGai
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_FdwKohei KaiGai
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - EnglishKohei KaiGai
 

More from Kohei KaiGai (20)

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_count
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_Online
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.Japan
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStrom
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStrom
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
 

Recently uploaded

Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 

Recently uploaded (20)

Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 

PG-Strom v2.0 Technical Brief (17-Apr-2018)

  • 1. PG-Strom v2.0 Release Technical Brief (17-Apr-2018) PG-Strom Development Team <pgstrom@heterodb.com>
  • 2. What is PG-Strom? PG-Strom v2.0 Release Technical Brief (17-Apr-2018)2 GPU [For I/O intensive workloads] SSD-to-GPU Direct SQL Execution In-memory Columnar Cache SCAN+JOIN+GROUP BY combined GPU kernel [For Advanced Analytics workloads] CPU+GPU hybrid parallel PL/CUDA user defined function GPU memory store (Gstore_fdw) off-loading ✓ Designed as PostgreSQL extension ✓ Transparent SQL acceleration ✓ Cooperation with fully transactional database management system ✓ Various comprehensive tools and applications for PostgreSQL PG-Strom: an extension module to accelerate analytic SQL workloads using GPU. [GPU’s characteristics] ✓ Several thousands of processor cores per device ✓ Nearly terabytes per second memory bandwidth ✓ Much higher cost performance ratio PG-Strom is an open source extension module for PostgreSQL (v9.6 or later) to accelerate analytic queries, has been developed and incrementally improved since 2012. It provides PostgreSQL alternative query execution plan for SCAN, JOIN and GROUP BY workloads. Once optimizer chose the custom plans which use GPU for SQL execution, PG-Strom constructs relative GPU code on the fly. It means PostgreSQL uses GPU aware execution engine only when it makes sense, so here is no downside for transactional workloads. PL/CUDA is a derivational feature that allows manual optimization with user defined function (UDF) by CUDA C; which shall be executed on GPU device. The PG-Strom v2.0 added many features to enhance these basis for more performance and wider use scenarios.
  • 3. PG-Strom v2.0 features highlight PG-Strom v2.0 Release Technical Brief (17-Apr-2018)3 ▌Storage Enhancement  SSD-to-GPU Direct SQL Execution  In-memory Columnar Cache  GPU memory store (gstore_fdw) ▌Advanced SQL Infrastructure  PostgreSQL v9.6/v10 support – CPU+GPU Hybrid Parallel  SCAN+JOIN+GROUP BY combined GPU kernel  Utilization of demand paging of GPU device memory ▌Miscellaneous  PL/CUDA related enhancement  New data type support  Documentation and Packaging
  • 5. Storage enhancement for each layer PG-Strom v2.0 Release Technical Brief (17-Apr-2018)5 Storage Layer GPU device memory Band: ~900GB/s Capacity: ~32GB Host memory Band: ~128GB/s Capacity: ~1.5TB NVMe-SSD Band: ~10GB/s Capacity: 10TB and more Hot Storage Cold Storage Related PG-Strom Feature GPU memory store (gstore_fdw) In-memory columnar cache SSD-to-GPU Direct SQL Execution In general, key of performance is not only number of cores and its clock, but data throughput to be loaded for the processors also. Individual storage layer has its own characteristics, thus PG-Strom provides several options to optimize the supply of data. SSD-to-GPU Direct SQL Execution is unique and characteristic feature of PG-Strom. It directly loads the data blocks of PostgreSQL to GPU, and runs SQL workloads to reduce amount of data to be processed by CPU prior to the arrival. Its data transfer bypasses operating system software stacks, thus allows to pull out nearly wired performance of the hardware. In-memory columnar cache allows to keep data blocks in the optimal format for GPU to compute and transfer over the PCIe bus. GPU memory store (gstore_fdw) allows preload on the GPU device memory using standard SQL statement. It is an ideal data location for PL/CUDA function because it does not need to carry the data set for each invocation, and no size limitation of 1GB which is maximum length of the varlena.
  • 6. SSD-to-GPU Direct SQL Execution (1/3) PG-Strom v2.0 Release Technical Brief (17-Apr-2018)6 Pre-processing of SQL workloads to drop unnecessary rows prior to the data loading to CPU GPU code generation PG-Strom automatically generate CUDA code. It looks like a transparent acceleration from the standpoint of users. SQL execution on GPU It loads data blocks of PostgreSQL to GPU using peer-to-peer DMA, then drops unnecessary data with parallel SQL execution by GPU. SQL execution on CPU CPU runs pre-processed data set; that is much smaller than the original. Eventually, it looks like GPU accelerated I/O also. PCIe Bus NVMe SSD GPU SSD-to-GPU P2P DMA (NVMe-Strom driver) WHERE-clause JOIN GROUP BY Large PostgreSQL Tables PostgreSQL Data Blocks SSD-to-GPU Direct SQL Once data blocks are loaded to GPU, we can process SQL workloads using thousands cores of GPU. It reduces the amount of data to be loaded and executed by CPU; looks like I/O performance acceleration. Traditional data flow Even if unnecessary records, only CPU can determine whether these are necessary or not. So, we have to move any records including junks. SELECT cat, count(*), avg(X) FROM t0 JOIN t1 ON t0.id = t1.id WHERE YMD >= 20120701 GROUP BY cat; SQL optimization stage SQL execution stage SQL-to-GPU Program Generator GPU binary Just-in-time Compile
  • 7. SSD-to-GPU Direct SQL Execution (2/3) PG-Strom v2.0 Release Technical Brief (17-Apr-2018)7 0 1000 2000 3000 4000 5000 6000 7000 8000 Q1-1 Q1-2 Q1-3 Q2-1 Q2-2 Q2-3 Q3-1 Q3-2 Q3-3 Q3-4 Q4-1 Q4-2 Q4-3 QueryExecutionThroughput[MB/s] Star Schema Benchmark results on NVMe-SSDx3 with md-raid0 PostgreSQL v10.3 PG-Strom v2.0 This result shows query execution throughput on 13 different queries of the Star Schema Benchmark. Host system mounts just 192GB memory but database size to be scanned is 353GB. So, it is quite i/o intensive workload. PG-Strom with SSD-to-GPU Direct SQL Execution shows 7.2-7.7GB/s throughput which is more than x3.5 faster than the vanilla PostgreSQL for large scale batch processing. The throughput is calculated by (database size) / (query response time). So, average query response time of PG-Strom is later half of the 40s, and PostgreSQL is between 200s to 300s. Benchmark environment: Server: Supermicro 1019GP-TT, CPU: Intel Xeon Gold 6126T (2.6GHz, 12C), RAM: 192GB, GPU: NVIDIA Tesla P40 (3840C, 24GB), SSD: Intel DC P4600 (HHHL, 2.0TB) x3, OS: CentOS 7.4, SW: CUDA 9.1, PostgreSQL v10.3, PG-Strom v2.0
  • 8. Tesla GPU NVIDIA CUDA Toolkit SSD-to-GPU Direct SQL Execution (3/3) PG-Strom v2.0 Release Technical Brief (17-Apr-2018)8 Filesystem (ext4, xfs) nvme driver (inbox) nvme_strom kernel module NVMe SSD drives commodity x86_64 hardware NVIDIA GPUDirect RDMA NVIDIA kernel driver PostgreSQL pg_strom extension read(2) ioctl(2) Hardware Layer Operating System Software Layer Database Software Layer Application Software SQL Interface I/O path based on normal filesystem I/O path based on SSD-to-GPU Direct SQL Execution SSD-to-GPU Direct SQL Execution is a technology built on top of NVIDIA GPUDirect RDMA which allows P2P DMA between GPU and 3rd party PCIe devices. The nvme_strom kernel module intermediates P2P DMA from NVMe- SSD to Tesla GPU [*1]. Once GPU accelerated query execution plan gets chosen by the query optimizer, then PG-Strom calls ioctl(2) to deliver the command of SSD-to-GPU Direct SQL Execution to nvme_strom in the kernel space. This driver has small interaction with filesystem to convert file descriptor + file offset to block numbers on the device. So, only limited filesystems (Ext4, XFS) are now supported. You can also use striping of NVMe-SSD using md-raid0 for more throughput, however, it is a feature of commercial subscription. Please contact HeteroDB, if you need multi-SSDs grade performance. [*1] NVIDIA GPUDirect RDMA is available on only Tesla or Quadro, not GeForce.
  • 9. In-memory Columnar Cache PG-Strom v2.0 Release Technical Brief (17-Apr-2018)9 PostgreSQL heap buffer (row-format) In-memory columnar cache (column-format) GPU kernel for column format GPU kernel for row format GPU kernel for column format GPU kernel for column format GPU kernel for row format Query Execution on GPU device sequential scan asynchronous columnar cache builders (background workers) transactional workloads UPDATE cache invalidation on write
  • 10. (Background) Why columnar-format is preferable for GPU PG-Strom v2.0 Release Technical Brief (17-Apr-2018)10 ▌Row-format – random memory access ▌Column-format – coalesced memory access 32bit Memory transaction width: 256bit 32bit 32bit32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit Memory transaction width: 256bit 32bits x 8 = 256bits are valid in the 256bits memory transaction (usage ratio: 100.0%) Only 32bits x 1 = 32bits are valid in the 256bits memory transaction (usage ratio: 12.5%) GPU cores GPU cores Memory access pattern affects to GPU’s performance so much because of its memory sub-system architecture. GPU has relatively larger memory transaction width. If multiple co-operating cores simultaneously read continuous items in array, one memory transaction can load eight 32bit values (if width is 256bit) at once. It fully depends on the data format, and one of the most significant optimization factor. Row-format tends to put values of a particular column to be scanned on random place, not continuous, thus it is hard to pull out maximum performance of GPU. Column-format represents a table like as a set of simple array. So, when column-X is referenced in scan, all the values are located very closely thus tend to fit the coalesced memory access pattern.
• 11. GPU memory store (Gstore_fdw) PG-Strom v2.0 Release Technical Brief (17-Apr-2018)11
[Diagram: a foreign table (gstore_fdw) bridges the SQL world and the GPU computing world — INSERT / UPDATE / DELETE / SELECT from storage handle data format conversion, data compression and transaction management into GPU device memory, while PL/CUDA user-defined functions and user-written scripts / machine-learning frameworks reference that memory by zero-copy via an exported IPC handle]
Gstore_fdw provides a set of interfaces to read and write GPU device memory using standard SQL. You can load bulk data into the GPU's device memory using the INSERT command, and so on. Because all the operations are handled inside the PostgreSQL database system, the data keeps its binary form (so there is no need to dump it to a CSV file once and parse it again with a Python script). SQL is one of the most flexible tools for data management, so you can load an arbitrary data set from the master table and apply the pre-processing required by a machine-learning algorithm on the fly. Another significant feature is data collaboration with external programs such as Python scripts: the GPU device memory can be shared with an external program once the identifier of the acquired memory region (just a 64-byte token) is exported. This enables PostgreSQL to serve as a powerful data management infrastructure for machine-learning usage. Although Gstore_fdw currently supports only the 'pgstrom' internal format, other internal formats will be supported in the next or a future version.
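A minimal usage sketch (the pinning and format options follow the v2.0 documentation, but treat the exact option names as assumptions; the table and column names, including the training_master source table, are illustrative):
  CREATE FOREIGN TABLE ft_features (
      id  int,
      x0  real, x1 real, x2 real, x3 real
  ) SERVER gstore_fdw OPTIONS (pinning '0', format 'pgstrom');
  -- Bulk-load a pre-processed data set into GPU device memory:
  INSERT INTO ft_features
    SELECT id, x0, x1, x2, x3 FROM training_master;
  -- The loaded region can then be referenced zero-copy by PL/CUDA functions,
  -- or shared with external scripts via the exported IPC handle.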
• 13. PostgreSQL v9.6/v10 support – CPU+GPU Hybrid Parallel PG-Strom v2.0 Release Technical Brief (17-Apr-2018)13
[Diagram: CPU parallel execution (Gather + Final-Agg over per-process SeqScan / HashJoin / Part-Agg) compared with CPU+GPU hybrid parallel execution (Gather + Final-Agg over per-process GpuScan / GpuJoin / GpuPreAgg); each PostgreSQL worker process individually uses the GPU at finer granularity, and the multi-process approach pulls up GPU usage more efficiently]
One epoch-making feature of PostgreSQL v9.6 was parallel query execution based on concurrent multiple processes. It also extended the custom-scan interface so that extensions can support parallel query. PG-Strom v2.0 was re-designed according to the new interface set, which enables the GPU-aware custom plans to run on background worker processes. Heuristically, the capacity of a single CPU thread is not sufficient to supply a large enough data stream to the GPU; it is usually much narrower than the GPU's computing capability. So we can expect CPU parallel execution to help pull up GPU usage with a much higher data supply rate.
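A minimal sketch of what a hybrid parallel plan looks like (table and column names are illustrative; the actual plan shape depends on the optimizer's cost estimates):
  SET max_parallel_workers_per_gather = 4;
  EXPLAIN
  SELECT cat, count(*), avg(x)
    FROM t0 NATURAL JOIN t1
   GROUP BY cat;
  -- Expected shape: a Gather node with the final aggregation on top, and
  -- GpuPreAgg / GpuJoin / GpuScan executed inside each background worker.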
• 14. SCAN + JOIN + GROUP BY combined GPU kernel PG-Strom v2.0 Release Technical Brief (17-Apr-2018)14
The CustomScan interface allows an extension to inject an alternative implementation of a query execution plan. If the estimated cost is more reasonable than the built-in implementation, PostgreSQL's query optimizer chooses the alternative plan. PG-Strom provides its custom logic for SCAN, JOIN and GROUP BY workloads, with GPU acceleration. Usually, JOIN consumes the result of SCAN and generates records as its output, and the output of JOIN often serves as the input of GROUP BY, which then generates the aggregation. When GpuScan, GpuJoin and GpuPreAgg are executed back to back, there is an opportunity for further optimization by reducing the data transfer over the PCIe bus: the result of GpuScan can serve as GpuJoin's input, and the result of GpuJoin can serve as GpuPreAgg's input. PG-Strom tries to re-use the result buffer of the previous step as the input buffer of the next step, which eliminates the data ping-pong over the PCIe bus. Once the SCAN + JOIN + GROUP BY combined GPU kernel is ready, it runs as the most efficient GPU kernel because it does not need data exchange across the query execution plan.
[Diagram: data blocks from storage are loaded to the GPU, where the GpuScan, GpuJoin and GpuPreAgg kernels run as one combined SCAN + JOIN + GROUP BY kernel; only the aggregated results are handed to Agg (PostgreSQL) on the CPU via the host buffer, eliminating the data ping-pong between host buffers]
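A minimal sketch of a query shape that lets the three steps share buffers on the GPU (illustrative table and column names):
  EXPLAIN
  SELECT d.region, sum(f.amount)
    FROM fact f JOIN dim d ON f.dim_id = d.id
   WHERE f.ts >= '2017-01-01'
   GROUP BY d.region;
  -- When GpuScan, GpuJoin and GpuPreAgg are all chosen, intermediate rows stay
  -- in GPU device memory and only the aggregated result returns to the CPU.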
• 15. Utilization of demand paging of GPU device memory PG-Strom v2.0 Release Technical Brief (17-Apr-2018)15
Buffer size estimation is not an easy job for SQL workloads. In general, we cannot know the exact number of result rows unless the query is actually executed, even though table statistics give us a rough estimation during query planning. A GPU kernel needs a result buffer for each operation, and it has to be acquired prior to execution. This has been a problematic trade-off: a tight configuration with a small margin often leads to a lack of result buffer space, while, on the other hand, buffer allocation with a large margin leaves non-negligible dead space in GPU device memory. Recent GPUs (Pascal / Volta) support demand paging of device memory: physical page frames are assigned on demand, so an unused region consumes no physical device memory. This means that a large-margin configuration wastes no physical device memory. It also allows us to simplify the code that estimates the result buffer size and retries GPU kernel invocation with a larger buffer; these logics were very complicated and had many potential bugs around the error paths. PG-Strom v2.0 fully utilizes the demand paging of the GPU device, so we could eliminate the problematic code. It has also contributed to the stability of the software.
[Diagram: GpuJoin over a data chunk of table t0 with hash tables for t1 and t2 writing into a generously sized result buffer — on Kepler / Maxwell the unused region becomes dead space; on Pascal / Volta no physical page frames are assigned to it, so the over-allocation is harmless]
• 17. PL/CUDA related enhancement PG-Strom v2.0 Release Technical Brief (17-Apr-2018)17
All in-database analytics: Scan → Pre-Process → Analytics → Post-Process.
CREATE FUNCTION my_logic(reggstore, text) RETURNS matrix AS $$ ... $$ LANGUAGE 'plcuda'; — the custom CUDA C code block runs on the GPU device: ✓ manually optimized analytics & machine-learning algorithms ✓ utilization of a few thousand processor cores ✓ ultra-high memory bandwidth
① reggstore argument as a reference to the gstore_fdw structure: if the reggstore type is supplied as an argument of a PL/CUDA function, the user-defined part of the function receives this argument as a pointer to the GPU device memory preserved for gstore_fdw. It allows the function to reference the data preliminarily loaded onto the Gstore_fdw (GPU memory store).
② Inclusion of another function's result as a part of the CUDA C code: the #plcuda_include directive is newly supported to include the code returned by the specified function. You can switch the code to be compiled according to the arguments, instead of creating multiple similar variations. Sketch from the slide:
CREATE FUNCTION my_distance(reggstore, text) RETURNS text AS $$ if ($2 = 'manhattan') return '#define dist(X,Y) abs(X-Y)'; if ($2 = 'euclid') return '#define dist(X,Y) ((X-Y)^2)' $$ ... #plcuda_include my_distance
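A hedged, runnable rendering of the my_distance helper sketched above, written in plpgsql (it requires the reggstore type, i.e. PG-Strom v2.0 installed; the squared distance is expanded to a multiplication because ^ is not the power operator in CUDA C):
  CREATE FUNCTION my_distance(reggstore, text) RETURNS text AS $$
  BEGIN
    IF $2 = 'manhattan' THEN
      RETURN '#define dist(X,Y) abs((X)-(Y))';
    ELSIF $2 = 'euclid' THEN
      RETURN '#define dist(X,Y) (((X)-(Y))*((X)-(Y)))';
    END IF;
    RAISE EXCEPTION 'unknown distance metric: %', $2;
  END
  $$ LANGUAGE plpgsql;
  -- A PL/CUDA function containing "#plcuda_include my_distance" then gets the
  -- matching #define injected into its CUDA C code at build time, selected by
  -- the text argument it was called with.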
• 18. New data type support PG-Strom v2.0 Release Technical Brief (17-Apr-2018)18 ▌Numeric  float2 ▌Network address  macaddr, inet, cidr ▌Range data types  int4range, int8range, tsrange, tstzrange, daterange ▌Miscellaneous  uuid, reggstore
The "float2" data type is implemented by PG-Strom itself; it is not a standard built-in data type. It represents half-precision floating-point values. People in the machine-learning area often use FP16 for a more compact representation of matrices: it consumes less device memory and yields higher computing throughput.
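A minimal usage sketch (assuming float2 becomes available once the PG-Strom extension is installed; the table and column names are illustrative):
  CREATE TABLE embeddings (
      id  serial PRIMARY KEY,
      vec float2[]    -- half-precision values: half the footprint of a float4[] array
  );
  INSERT INTO embeddings(vec) VALUES ('{0.25,-1.5,3.0}'::float2[]);
  SELECT * FROM embeddings;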
• 19. Documentation and Packaging PG-Strom v2.0 Release Technical Brief (17-Apr-2018)19
The documentation was totally rewritten in Markdown, which is much easier to keep up to date than the previous raw-HTML-based documentation. It is now published at http://heterodb.github.io/pg-strom/
RPM packages are also available for RHEL 7.x / CentOS 7.x. PG-Strom and related software are available on the HeteroDB Software Distribution Center (SWDC) at https://heterodb.github.io/swdc/ — you can use the SWDC as a yum repository source.
[Screenshots: PG-Strom official documentation; HeteroDB Software Distribution Center]
• 21. PostgreSQL v11 support (1/2) – Parallel Append & Multi-GPUs PG-Strom v2.0 Release Technical Brief (17-Apr-2018)21  PG10 restriction: GPUs that are not close to the SSD being scanned must stay idle during a scan across a partitioned table.  PG11 allows partitioned children to be scanned in parallel, which also makes multiple GPUs active simultaneously.
[Diagram: Parallel Append over partitions t1–t4; each background worker process contains the GPU-aware custom plans and chooses the GPU (e.g. GPU1) that shares the same PCIe root complex as the NVMe-SSD to be scanned — auto-tuning based on PCIe topology]
NVIDIA GPUDirect RDMA, the base technology of SSD-to-GPU Direct SQL Execution, requires that the GPU and SSD share the same PCIe root complex, so the P2P DMA route must not traverse the QPI link. This leads to a restriction on multi-GPU configurations with partitioned tables: when a background worker scans partitioned child tables across multiple SSDs, the GPU-aware custom plan needs to choose its neighbor GPU to avoid QPI traversal. In other words, the target table of the scan determines the GPU to be attached to the background worker. In PG10, partitioned child tables are picked up sequentially for scanning, so we cannot activate more than one GPU simultaneously, because the secondary GPU would cause a QPI traversal on the usual hardware configuration (1 CPU : 1 GPU + n SSDs). PG11 supports parallel scans across partitioned child tables, allowing each background worker to activate the GPU closest to the table it is scanning. This enables multiple GPUs to be utilized under SSD-to-GPU Direct SQL Execution for larger data processing.
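A minimal sketch of the PG11-style partitioned layout this targets (illustrative names; hash partitioning and parallel append are PostgreSQL v11 features, and placing each partition on storage near a particular GPU reflects the topology-aware setup described above):
  CREATE TABLE lineorder (
      lo_orderkey   bigint,
      lo_orderdate  int,
      lo_revenue    bigint
  ) PARTITION BY HASH (lo_orderkey);
  CREATE TABLE lineorder_p0 PARTITION OF lineorder
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);  -- e.g. tablespace on SSDs close to GPU1
  CREATE TABLE lineorder_p1 PARTITION OF lineorder
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);  -- e.g. tablespace on SSDs close to GPU2
  SET enable_parallel_append = on;  -- PG11: children may be scanned in parallel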
• 22. PostgreSQL v11 support (2/2) – In-box distributed query execution PG-Strom v2.0 Release Technical Brief (17-Apr-2018)22
[Diagram: a host system (x86_64 server) connected via NIC / HBA to several PCIe I/O expansion boxes; each box holds NVMe-SSDs with PostgreSQL tables and data blocks plus a GPU behind an internal PCIe switch, so SSD-to-GPU P2P DMA of large data runs inside the box (WHERE-clause, JOIN, GROUP BY) and only pre-processed small data returns over PCIe-over-Ethernet — several GB/s of SQL execution per box, adding performance and capacity while appearing as a simple single-node configuration to applications and administrators]
The multi-GPU capability will expand the opportunity for big data processing to even larger data sets, probably close to 100TB. Multiple vendors provide PCIe I/O expansion box solutions that allow PCIe devices to be installed in a physically separate box connected to the host system over a fast network. In addition, some of these solutions have an internal PCIe switch that can route P2P DMA packets inside the I/O box. This means PG-Strom can handle SSD-to-GPU Direct SQL Execution within the I/O box with little interaction with the host system, and can run the partial workload of a partitioned table per box in parallel once the multi-GPU capability is supported with PG11. From the standpoint of applications and administrators, it remains a simple single-node configuration even though many GPUs and SSDs are installed, so there is no need to pay attention to distributed transactions. This makes application design and daily maintenance much simpler.
  • 23. Other significant features PG-Strom v2.0 Release Technical Brief (17-Apr-2018)23 ▌cuPy data format support of Gstore_fdw ▌BRIN index support ▌Basic PostGIS support ▌NVMe over Fabric support ▌GPU device function that can return varlena datum ▌Semi- / Anti- Join support ▌MVCC visibility checks on the device side ▌Compression support of in-memory columnar cache See 003: Development Roadmap for more details.