20181025_pgconfeu_lt_gstorefdw

PostgreSQL as a Machine-Learning Platform
〜Gstore_fdw and data collaboration〜
HeteroDB,Inc
Chief Architect & CEO
KaiGai Kohei <kaigai@heterodb.com>

about HeteroDB
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-2
Corporate overview
• Name: HeteroDB, Inc
• Established: 4th-Jul-2017
• Headcount: 2 (KaiGai and Kashiwagi)
• Location: Shinagawa, Tokyo, Japan
• Businesses: Sales of accelerated database products;
  technical consulting in the GPU & DB area
With heterogeneous-computing technology in the database area,
we provide a useful, fast and cost-effective data analytics platform
for everyone who needs the power of analytics.
CEO Profile
• KaiGai Kohei: has contributed to PostgreSQL and Linux kernel
  development in the OSS community for more than ten years, especially
  to the security and database federation features of PostgreSQL.
• Awarded “Genius Programmer” by the IPA MITOH program (2007)
• Top-5 poster finalist at GPU Technology Conference 2017

Friday 11:50 - 12:40
NVME and GPU accelerates
PostgreSQL
Features of RDBMS
✓ High-availability / Clustering
✓ DB administration and backup
✓ Transaction control
✓ BI and visualization
➔ We can use the products that
support PostgreSQL as-is.
Core technology – PG-Strom
PG-Strom: An extension module for PostgreSQL that accelerates SQL
workloads using the thousands of cores and high-bandwidth memory of a GPU.
GPU
Big-data Analytics
PG-Strom
Machine-learning & Statistics

GPU’s characteristics - mostly as a computing accelerator
Over 10 years of history in HPC, then massive popularization in machine learning
NVIDIA Tesla V100
Super Computer
(TITEC; TSUBAME3.0) Computer Graphics Machine-Learning
How does PG-Strom utilize the power of the GPU for in-database analytics?
Simulation

2 Years Ago...

PGconf.SV 2016 in San Francisco
Acceleration of drug-discovery workloads
with in-database analytics approach
using PL/CUDA user defined function

PL/CUDA User Defined Function
Result
Scan
Pre-Process
Analytics
Post-Process
CREATE FUNCTION
my_logic(int, real[], real[])
RETURNS real[]
AS $$
  /* custom CUDA C code block (runs on GPU device) */
$$ LANGUAGE 'plcuda';
▌Don’t export the dataset for analytics using external software.
▌All you pull out from the database is “result” of analytics.

PL/CUDA works on Drug-Discovery workloads (1/2)
Database chemical
compounds set
(D; 10M records scale)
Query chemical
compounds set
(Q; ~1000 records scale)
Calculation of
their similarity
Target Protein
“Similar compounds” have a higher probability of being active
10 billion combinations
Similarity search on chemical compounds is a researcher’s daily job.
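
To make the workload concrete, here is a minimal CPU-only sketch of a k-NN similarity score between one database compound and a query set. The slides do not say which similarity measure or scoring rule is used, so the Tanimoto (Jaccard) index on fingerprint bitmaps and the "mean of the k best matches" rule are assumptions, and the fingerprints and names (`tanimoto`, `knn_score`, `cmpd_a`, `cmpd_b`) are hypothetical.

```python
from heapq import nlargest


def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as int bitmaps."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def knn_score(d_fp: int, queries: list, k: int = 3) -> float:
    """Mean of the k highest similarities between one database compound and the query set."""
    top = nlargest(k, (tanimoto(d_fp, q) for q in queries))
    return sum(top) / len(top)


# Toy fingerprints (hypothetical data, 4 bits instead of a real bitmap)
Q = [0b1111, 0b1100, 0b0011]
D = {"cmpd_a": 0b1110, "cmpd_b": 0b0001}

scores = {name: knn_score(fp, Q, k=2) for name, fp in D.items()}

# Number of (D, Q) pairs at the scale quoted on the slide: 10M x 1000
pairs_at_slide_scale = 10_000_000 * 1_000
print(scores, pairs_at_slide_scale)
```

Every database compound has to be compared with every query compound, which is where the 10 billion combinations come from.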

Similarity search of chemical compounds by k-NN method (k=3, D=10M)
Query response time [sec] by number of query compounds [Q]

Number of query compounds [Q] |    10 |     50 |    100 |     500 |    1000
CPU (E5-2670v3) [sec]         | 30.25 | 145.29 | 295.95 | 1503.31 | 3034.94
GPU (GTX1080) [sec]           | 13.00 |  13.23 |  13.59 |   16.01 |   19.13

Yes, the GPU accelerates the workload: more than x150 times shorter response!
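
The speedup claim can be checked directly from the chart values. The sketch below only divides the CPU time by the GPU time for each query-set size; the variable names are illustrative.

```python
# Response times in seconds, copied from the chart above.
query_counts = [10, 50, 100, 500, 1000]
cpu_sec = [30.25, 145.29, 295.95, 1503.31, 3034.94]
gpu_sec = [13.00, 13.23, 13.59, 16.01, 19.13]

# Speedup = CPU time / GPU time for the same number of query compounds.
speedup = {q: c / g for q, c, g in zip(query_counts, cpu_sec, gpu_sec)}

for q, s in speedup.items():
    print(f"Q={q:>4}: GPU is {s:6.1f}x faster than CPU")
```

The ratio exceeds 150 only at Q=1000; at Q=10 the GPU is only a little more than twice as fast.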

But...

(Same chart as above: query response time [sec] vs. number of query compounds [Q], CPU (E5-2670v3) vs. GTX1080)
The CPU version consumes time in proportion to the scale of calculation:
x100 times larger execution time
for x100 times larger calculation amount

(Same chart as above: query response time [sec] vs. number of query compounds [Q], CPU (E5-2670v3) vs. GTX1080)
Why does the GPU version take relatively long for small workloads?
Less than x2 times larger
execution time
for x100 times larger
calculation amount
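
One way to read these numbers is to split the GPU response time into a fixed part and a part that grows with the number of query compounds. The sketch below fits a straight line to the five GPU measurements by ordinary least squares. The linear cost model is an assumption made for illustration, not something the slides state.

```python
query_counts = [10, 50, 100, 500, 1000]
cpu_sec = [30.25, 145.29, 295.95, 1503.31, 3034.94]
gpu_sec = [13.00, 13.23, 13.59, 16.01, 19.13]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Growth of response time when Q grows from 10 to 1000 (x100 more work).
cpu_growth = cpu_sec[-1] / cpu_sec[0]
gpu_growth = gpu_sec[-1] / gpu_sec[0]

# Assumed model: gpu_time = fixed_sec + per_query_sec * Q
fixed_sec, per_query_sec = fit_line(query_counts, gpu_sec)

print(f"CPU grows x{cpu_growth:.1f}, GPU grows x{gpu_growth:.2f}")
print(f"GPU fixed cost ~{fixed_sec:.1f} s, ~{per_query_sec * 1000:.1f} ms per query compound")
```

Under this model almost all of the GPU response time is a constant cost of roughly 13 seconds per call, which is why small workloads see so little benefit.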

Invocation of PL/CUDA functions
PREPARE knn_sim_rand_10m_gpu_v2(int) -- arg1:@k-value
AS
SELECT row_number() OVER (),
       fp.name,
       similarity
  FROM (SELECT float4_as_int4(key_id) key_id, similarity
          FROM matrix_unnest(
                 (SELECT rbind(knn_gpu_similarity($1, Q.matrix,
                                                      D.matrix))
                    FROM (SELECT cbind(array_matrix(id),
                                       array_matrix(bitmap)) matrix
                            FROM finger_print_query) Q,
                         (SELECT matrix
                            FROM finger_print_10m_matrix) D
                 )
               ) AS sim(key_id real, similarity real)
         ORDER BY similarity DESC) sim,
       finger_print_10m fp
 WHERE fp.id = sim.key_id
 LIMIT 1000;
Time consumed by argument setup:
10 to 11 sec per invocation

idea:
Keep the dataset on
GPU device memory

Gstore_fdw
GPU memory store

Gstore_fdw - FDW on behalf of GPU device memory region
GPU world
Storage
SQL world
GPU device memory
Foreign Table
(gstore_fdw)
INSERT
UPDATE
DELETE
SELECT
Reference
by Zero-copy
✓ Data Format Conversion
✓ Data Compression
✓ Transaction Controls
PL/CUDA
User Defined
Function

Gstore_fdw manages persistent device memory (1/4)
CREATE FOREIGN TABLE ft (
id int,
x0 real,
x1 real,
x2 real,
x3 real,
x4 real,
x5 real,
x6 real,
x7 real,
x8 real,
x9 real
) SERVER gstore_fdw
OPTIONS (pinning '0', format 'pgstrom');

postgres=# INSERT INTO ft
(SELECT x, 100*random(), 100*random(), 100*random(),
100*random(), 100*random(), 100*random(),
100*random(), 100*random(), 100*random(),
100*random()
FROM generate_series(1,10000000) x);
LOG: alloc: preserved memory 440000320 bytes
INSERT 0 10000000
Acquires 440MB of GPU device memory, then
loads the written data chunk onto the GPU device
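
The logged allocation size can be cross-checked with simple arithmetic: 10 million rows of one `int` and ten `real` columns, 4 bytes each. In the sketch below, the reading of the remaining 320 bytes as header or metadata is an assumption. The MiB figures are taken from the `nvidia-smi` output shown on the following slides.

```python
ROWS = 10_000_000
COLUMNS = 11              # id (int, 4 bytes) + x0..x9 (real, 4 bytes each)
BYTES_PER_VALUE = 4

payload_bytes = ROWS * COLUMNS * BYTES_PER_VALUE
logged_bytes = 440_000_320                     # "alloc: preserved memory" in the LOG line
overhead_bytes = logged_bytes - payload_bytes  # assumed to be header/metadata

logged_mib = logged_bytes / 2**20
smi_delta_mib = 591 - 171                      # GPU memory usage after vs. before INSERT

print(payload_bytes, overhead_bytes, round(logged_mib, 1), smi_delta_mib)
```

The roughly 419.6 MiB allocation lines up with the 420 MiB increase that `nvidia-smi` reports.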

Before INSERT
$ nvidia-smi
Sun Nov 12 00:03:30 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:02:00.0 Off |                    0 |
| N/A   36C    P0    52W / 250W |    171MiB / 22912MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     12438      C   ...bgworker: PG-Strom GPU memory keeper      161MiB |
+-----------------------------------------------------------------------------+

After INSERT
$ nvidia-smi
Sun Nov 12 00:06:01 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:02:00.0 Off |                    0 |
| N/A   36C    P0    51W / 250W |    591MiB / 22912MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     12438      C   ...bgworker: PG-Strom GPU memory keeper      581MiB |
+-----------------------------------------------------------------------------+
GPU device memory stays preserved
even after the PostgreSQL session is closed

By the way...

GPU device memory
can act
like shared memory

CUDA Driver API - Interprocess device memory handling
CUresult cuIpcGetMemHandle(CUipcMemHandle *pHandle,
                           CUdeviceptr dptr);
Gets an interprocess memory handle for an existing device memory allocation.
➔ Gets a unique identifier of GPU device memory at the owner process

CUresult cuIpcOpenMemHandle(CUdeviceptr *pdptr,
                            CUipcMemHandle handle,
                            unsigned int flags);
Opens an interprocess memory handle exported from another process and returns a
device pointer usable in the local process.
➔ Opens the GPU device memory using the unique identifier at the other process
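
The export/open pattern can be tried without a GPU. The sketch below is a host-memory analogy only: it uses Python's `multiprocessing.shared_memory` instead of the CUDA Driver API, and the block name stands in for the IPC handle. The function names `export_block` and `open_block` are hypothetical. Both sides run in one process here for brevity, whereas the consumer would normally be a different process.

```python
import struct
from multiprocessing import shared_memory


def export_block(values):
    """Owner side: allocate a shared block, fill it with float32 values,
    and return the block together with its name (the 'handle')."""
    shm = shared_memory.SharedMemory(create=True, size=4 * len(values))
    struct.pack_into(f"{len(values)}f", shm.buf, 0, *values)
    return shm, shm.name


def open_block(handle, count):
    """Consumer side: attach to the existing block by its handle and read it in place."""
    shm = shared_memory.SharedMemory(name=handle)
    try:
        return list(struct.unpack_from(f"{count}f", shm.buf, 0))
    finally:
        shm.close()


owner, handle = export_block([1.0, 2.5, -3.0])
received = open_block(handle, 3)
print(handle, received)

# The owner decides when the memory goes away.
owner.close()
owner.unlink()
```

As with the CUDA calls, only the small handle travels between the two sides. The data itself stays where the owner allocated it.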

PostgreSQL as a Machine-Learning Platform
GPU world
Inter-Process
Data Collaboration
Storage
SQL world
GPU device
memory
Foreign Table
(gstore_fdw)
INSERT
UPDATE
DELETE
SELECT
User’s Custom
Python Scripts
IPC Handle
IPC Handle
Internal data structure that
is compatible with Python’s
analytics libraries
ndarray data
ndarray data
Gets a unique identifier of GPU device memory at the owner process

Step-1. Export IPC handle of GPU device memory
postgres=# select gstore_export_ipchandle('ft');
gstore_export_ipchandle
-------------------------------------------------------------
\x006b73020000000060110000000000000075020000000000000020000000
0000000000000000000000020000000000005b000000000000002000d0c1ff
00005c
(1 row)
The CUDA runtime returns a unique identifier that is 64 bytes long

Step-2. Open IPC handle on your Python script
#!/usr/bin/python
import psycopg2
import pystrom

# Connect to the PostgreSQL server
conn = psycopg2.connect("host=localhost dbname=postgres")

# Get the IPC handle of the foreign table 'ft'
curr = conn.cursor()
curr.execute("select gstore_export_ipchandle('ft')::bytea")
row = curr.fetchone()
conn.close()

# Get a cupy.ndarray object: a 2D matrix of float4 values
# that consists of the columns 'x', 'y' and 'z'
X = pystrom.ipc_import(row[0], ['x','y','z'])

Step-3. Data is now already loaded on GPU. Do analytics as you like.
At Python script:
>>> X
array([[0.05267062, 0.15842682, 0.95535886],
[0.8110889 , 0.75173104, 0.09625155],
[0.0950045 , 0.71161145, 0.6916123 ],
...,
[0.32576588, 0.8340051 , 0.82255083],
[0.12769088, 0.23999453, 0.28765103],
[0.07242639, 0.14565416, 0.7454422 ]], dtype=float32)
At PostgreSQL:
postgres=# SELECT * FROM ft LIMIT 5;
id | x | y | z
----+-----------+----------+-----------
1 | 0.0526706 | 0.158427 | 0.955359
2 | 0.811089 | 0.751731 | 0.0962516
3 | 0.0950045 | 0.711611 | 0.691612
4 | 0.051835 | 0.405314 | 0.0207166
5 | 0.598073 | 0.4739 | 0.492226
(5 rows)
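
The two views show the same float4 values, and only the text formatting differs. The sketch below rounds a few of the values printed by Python to single precision and formats them with 6 significant digits. That this matches the float4 display of the PostgreSQL version used in the talk is an assumption. The helper names are hypothetical.

```python
import struct


def as_float4(value: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def psql_float4_text(value: float) -> str:
    """Format with 6 significant digits (assumed float4 display precision)."""
    return format(as_float4(value), ".6g")


# Rows 1 and 3 of X as printed on the Python side.
cupy_rows = [
    [0.05267062, 0.15842682, 0.95535886],
    [0.0950045, 0.71161145, 0.6916123],
]

psql_rows = [[psql_float4_text(v) for v in row] for row in cupy_rows]
for row in psql_rows:
    print(" | ".join(row))
```

The formatted strings are the ones shown for id 1 and id 3 in the SELECT output.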

Step-4. Everything is in Python now: run your analytics workloads on GPUs
◆ Dot Product
>>> cupy.dot(X[:,0],X[:,1])
array(24974.453, dtype=float32)
◆ Transpose Matrix
>>> cupy.transpose(X)
array([[0.8655484, 0.9804696, 0.43135548, ..., 0.58545816,
0.9951294, 0.14361869],
[0.12646914, 0.92461866, 0.14051293, ..., 0.5793936,
0.7182556 , 0.15441231],
[0.10312917, 0.2307432 , 0.6121663 , ..., 0.78983736,
0.19550513, 0.38183048]], dtype=float32)

Our vision for in-database analytics & machine-learning
gstore_fdw
Data manipulation
on the local side
Data Collaboration
Goodbye CSV: data management is a suitable job for a DBMS
Python runs
statistical analysis &
machine-learning
Data scientists
are responsible for both data
management and data analytics,
including machine-learning.
Data Lake
Data Warehouse
postgres_fdw / xxx_fdw
connect to remote databases for data import.
Filtering, pre-processing and other steps
can run on the remote side.

Resources
▌PG-Strom
• GitHub:
https://github.com/heterodb/pg-strom
• Documentation:
http://heterodb.github.io/pg-strom/
▌System requirement
• We plan to distribute a VM image for Microsoft Azure GPU instances,
  likely by the end of November (coming soon!)
• Or your on-premise environment, of course.
https://github.com/heterodb/pg-strom/wiki/002:-HW-Validation-List
▌Contact
• ML: pgstrom@heterodb.com
• e-mail: kaigai@heterodb.com
• Twitter: @kkaigai
