In-database Analytics using GPU
~ Tried to implement Logistic Regression Analytics ~
HeteroDB, Inc.
Chief Architect & CEO
KaiGai Kohei <kaigai@heterodb.com>
Hello guys. Are you using PL/CUDA?
These captions were not generated automatically by machine learning; I wrote them by hand in advance.
PGconf.ASIA 2018 LT - In-database Analytics using GPU
Result
PL/CUDA User Defined Function
▌What is PL/CUDA?
 - PL/CUDA allows user-defined functions (UDFs) to be written in CUDA C and executed on the GPU.
▌Characteristics
 - Extreme optimization of the GPU code by hand; it is not auto-generated.
 - Full integration with SQL for pre-/post-processing, with flexible operations.
All in-database analytics: Scan -> Pre-Process -> Analytics -> Post-Process
CREATE FUNCTION
my_logic( reggstore, text )
RETURNS matrix
AS $$
/* Custom CUDA C code block (runs on GPU device) */
$$ LANGUAGE 'plcuda';
 - Manual optimization for statistics and machine learning
 - Utilization of thousands of cores and wide-band device memory
PL/CUDA allows UDFs to be written as CUDA C programs that are executable on the GPU. Its value comes from combining manual (extreme) optimization for the GPU with flexible data operations in SQL.
PL/CUDA Use Case – Similarity Search on Drug-Discovery
ID  NAME          Fingerprint (1024bit)
1   CHEMBL153534  00000000000100000010000000000010001000000...
2   CHEMBL405398  00000000000000010010000000000000000100000...
3   CHEMBL503634  00000100000000000000010000000000000000000...
:   :             :
Data structure of chemical compounds
Database compounds (10M items) x query compounds (~1,000 items) = 10 billion combinations to be checked
[Diagram: a query is sent to the similarity search logic on the DB server, which returns a list of similar chemical compounds]
For similarity search in drug discovery, the GPU calculated 10 billion distances between chemical compounds 150 times faster than a C binary on the CPU. This is a very compute-intensive workload.
[Chart: response time of the similarity search by the k-NN method (k=3, D=10M) versus the number of query compounds [Q]; annotated "x150 times faster!!"]
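The similarity logic of this use case is proprietary and not shown in the slides, so the following is only a minimal Python sketch of the shape of the workload. It assumes Tanimoto (Jaccard) similarity, a common choice for binary fingerprints, and holds each fingerprint as an integer bit set. The names `tanimoto`, `knn_search` and the `cmpd_*` entries are hypothetical.

```python
import heapq

def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints held as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union

def knn_search(query: int, database: dict, k: int = 3) -> list:
    """Return the k (similarity, name) pairs most similar to the query."""
    scored = ((tanimoto(query, fp), name) for name, fp in database.items())
    return heapq.nlargest(k, scored)

# Toy 8-bit fingerprints (assumption); the slides use 1024-bit fingerprints.
database = {
    "cmpd_a": 0b11110000,
    "cmpd_b": 0b11000000,
    "cmpd_c": 0b00001111,
    "cmpd_d": 0b11110001,
}
result = knn_search(0b11110000, database, k=3)

# Every query compound is compared with every database compound.
combinations = 10_000_000 * 1_000
```

Each (query, database) pair is independent of the others, which is what makes this kind of search a natural fit for thousands of GPU cores.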
Is there any sample program?
Oh... that case used a proprietary algorithm, so there is currently no public sample code.
So I tried to make one.
Theme: Logistic Regression Analytics
What is Logistic Regression Analytics (1/2)
Logistic Regression Analytics is a machine-learning method for binary classification.
[Figure: data points classified as True or False]
What is Logistic Regression Analytics (2/2)
The probability of the "right" classification follows the logistic function:
\[ \sigma(\alpha) = \frac{1}{1 + e^{-\alpha}} \]
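As a small illustration, the logistic function can be written in a few lines of Python. This is a sketch, not code from the sample program; the two-branch form is one common way to avoid overflow for large negative inputs.

```python
import math

def sigmoid(a: float) -> float:
    """Logistic function 1 / (1 + exp(-a)), written to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)          # a < 0, so exp(a) cannot overflow
    return e / (1.0 + e)

# The output moves from near 0 to near 1 and is exactly 0.5 at a = 0.
values = [sigmoid(a) for a in (-5.0, 0.0, 5.0)]
```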
Estimation of the parameters (1/3)
In general:
Parameter: \( w = (w_0, w_1, \cdots, w_m) \)
Explanatory variables: \( \varphi_i = (1, x_1, \cdots, x_m)_i \)
Teacher data: \( t_i = 0 \text{ or } 1 \)
Example of a division surface with two explanatory variables x and y:
\[ 0 = w_0 + w_1 x + w_2 y \]
Determining the division surface is equivalent to finding the weights of the explanatory variables and the intercept. The teacher data, however, only tell us a boolean state for each combination of explanatory variables.
Estimation of the parameters (2/3)
Target: maximize the probability of the training set.
When \( z_i = \sigma(w^T \varphi_i) \),
\[ P = \prod_{i=1}^{N} P_i = \prod_{i=1}^{N} z_i^{t_i} (1 - z_i)^{1 - t_i} \]
The distance from the division surface determines the certainty of the classification.
Explanatory variables far from the division surface have a higher probability of being true (or false). We assume the training set is the outcome with the highest likelihood, and choose the parameter w that maximizes it.
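The probability of the training set can be evaluated directly for a toy example. The sketch below is an illustration in Python, not code from the sample program; the four data points and the three candidate weight vectors are made up.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def likelihood(w, phi, t):
    """P = prod_i z_i^t_i * (1 - z_i)^(1 - t_i), with z_i = sigmoid(w . phi_i)."""
    p = 1.0
    for phi_i, t_i in zip(phi, t):
        z_i = sigmoid(sum(wj * xj for wj, xj in zip(w, phi_i)))
        p *= z_i if t_i == 1 else 1.0 - z_i
    return p

# phi_i = (1, x): a leading 1 for the intercept, then one explanatory variable.
phi = [(1.0, -2.0), (1.0, -1.0), (1.0, 1.0), (1.0, 2.0)]
t = [0, 0, 1, 1]

p_flat = likelihood([0.0, 0.0], phi, t)   # every z_i is 0.5
p_good = likelihood([0.0, 2.0], phi, t)   # surface at x = 0, correct orientation
p_bad = likelihood([0.0, -2.0], phi, t)   # surface at x = 0, wrong orientation
```

The weights that separate the labels correctly give the largest P, which is the quantity the parameter estimation tries to maximize.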
Estimation of the parameters (3/3)
Parameter estimation by iteration of:
\[ w_{new} = w_{old} - (\Phi^T R \Phi)^{-1} \Phi^T (z - t) \]
where
\[ \Phi = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nm} \end{pmatrix} \]
\[ t = (t_1, \ldots, t_n), \quad z = (z_1, \ldots, z_n) \]
\[ R = \mathrm{diag}\left( z_1 (1 - z_1), \ldots, z_n (1 - z_n) \right) \]
w is updated on each iteration, and w_new becomes a more reasonable parameter than w_old. Eventually, the difference between w_new and w_old becomes very small.
For more details, check the book "The first step of machine-learning theory".
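The update rule can be tried on a tiny data set with plain Python. This is a minimal sketch under stated assumptions, not the PL/CUDA implementation: the helper names `solve`, `gradient` and `update` are hypothetical, the linear system is solved by Gaussian elimination instead of forming the inverse, and the six-point data set is made up so that the classes overlap.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def predict_z(w, phi):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) for row in phi]

def gradient(w, phi, t):
    """Phi^T (z - t)."""
    z = predict_z(w, phi)
    return [sum(row[i] * (z_k - t_k) for row, z_k, t_k in zip(phi, z, t))
            for i in range(len(w))]

def update(w, phi, t):
    """One step: w_new = w_old - (Phi^T R Phi)^-1 Phi^T (z - t)."""
    width = len(w)
    z = predict_z(w, phi)
    phi_t_r_phi = [[sum(row[i] * z_k * (1.0 - z_k) * row[j]
                        for row, z_k in zip(phi, z))
                    for j in range(width)] for i in range(width)]
    delta = solve(phi_t_r_phi, gradient(w, phi, t))
    return [wi - di for wi, di in zip(w, delta)]

# Toy data with overlapping classes, so the iteration settles on finite weights.
phi = [(1.0, float(x)) for x in range(6)]
t = [0, 0, 1, 0, 1, 1]

w = [0.0, 0.0]
for _ in range(20):
    w = update(w, phi, t)
```

On linearly separable data the maximum-likelihood weights are unbounded, so a real implementation has to stop on an iteration limit or a tolerance rather than wait for exact convergence.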
Amount of the calculation
▌# of explanatory variables (small): several to several hundred ... m items
▌# of training data (large): several hundred to several million ... n items
\[ w_{new} = w_{old} - w_\Delta = w_{old} - (\Phi^T R \Phi)^{-1} \Phi^T (z - t) \]
Dimensions: \( \Phi^T \) is \( m \times n \), \( R \) is \( n \times n \), \( \Phi \) is \( n \times m \) and \( (z - t) \) is \( n \times 1 \); hence \( (\Phi^T R \Phi)^{-1} \) is \( m \times m \), \( \Phi^T (z - t) \) is \( m \times 1 \), and \( w_\Delta \) is \( m \times 1 \).
An estimate of the amount of calculation: the number of explanatory variables is at most a few hundred, but the training data set can exceed a million items. The work along the n dimension is therefore suitable for parallel calculation on a GPU.
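A rough count of multiply-add operations per iteration shows where the work is. This is a back-of-the-envelope sketch; the function name `multiply_adds` is hypothetical, `width` stands for the number of explanatory variables plus one for the intercept, and the cost of the small linear solve is only an order of magnitude.

```python
def multiply_adds(n_rows: int, width: int) -> dict:
    """Approximate multiply-add counts per iteration for each part of the update."""
    return {
        "PhiT_R_Phi": width * width * n_rows,   # width x width sums, each over n rows
        "PhiT_(z-t)": width * n_rows,           # width sums, each over n rows
        "solve": width ** 3,                    # order of magnitude for the small solve
    }

# 40M rows and 4 explanatory variables plus the intercept, as in the demo later on.
counts = multiply_adds(40_000_000, 5)
```

Almost all of the work sits in the sums over the n training rows, and those sums are the part that is spread over the GPU threads.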
Example of GPU code for the matrix product Φ^T R Φ
KERNEL_FUNCTION_MAXTHREADS(void)
logregr_update_P(cl_double **Preg,      /* out */
                 cl_float  **Xp,
                 cl_int      width,
                 VectorTypeFloat *Z)
{
    /* declarations of nitems, nitems_bs, nloops, loop, i, j, k and sum are omitted on the slide */
    cl_double *P = Preg[0];
    __shared__ cl_float v[MAXTHREADS_PER_BLOCK];  // shared variables

    nitems_bs = TYPEALIGN(get_local_size(), nitems);
    nloops = width * width * nitems_bs;
    for (loop = get_global_id();        // unique identifier of the GPU thread
         loop < nloops;
         loop += get_global_size())     // add the total number of GPU threads
    {
        k = loop % nitems_bs;               // index of R row/column (training row)
        i = (loop / nitems_bs) % width;     // row index of Φ^T
        j = loop / (nitems_bs * width);     // column index of Φ
        if (k < nitems)
        {
            cl_float z  = Z->values[k];
            cl_float x1 = (i == 0 ? 1.0 : Xp[i-1][k]);
            cl_float x2 = (j == 0 ? 1.0 : Xp[j-1][k]);

            v[get_local_id()] = x1 * z * (1.0 - z) * x2;
        }
        else
            v[get_local_id()] = 0.0;
        // total sum of the elements calculated by the sibling threads
        sum = pgstromTotalSum(v, MAXTHREADS_PER_BLOCK);
        if (get_local_id() == 0)
            atomicAdd(&P[i + j * width], sum);
        __syncthreads();
    }
}
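The index arithmetic of the kernel is easier to follow when replayed sequentially. The Python sketch below emulates the flat loop on the CPU; it is an illustration only, `block_size` is a hypothetical stand-in for `get_local_size()`, and the per-block reduction plus `atomicAdd` is replaced by a plain `+=`.

```python
def phi_t_r_phi(xp, z, width, block_size=4):
    """Emulate the kernel's flat loop; returns P as a width*width list."""
    nitems = len(z)
    # Round nitems up to a multiple of the block size (the TYPEALIGN step).
    nitems_bs = -(-nitems // block_size) * block_size
    nloops = width * width * nitems_bs
    p = [0.0] * (width * width)
    for loop in range(nloops):           # on the GPU these iterations run in parallel
        k = loop % nitems_bs             # row of the training data
        i = (loop // nitems_bs) % width  # row of Phi^T
        j = loop // (nitems_bs * width)  # column of Phi
        if k < nitems:                   # padding entries contribute 0
            x1 = 1.0 if i == 0 else xp[i - 1][k]
            x2 = 1.0 if j == 0 else xp[j - 1][k]
            p[i + j * width] += x1 * z[k] * (1.0 - z[k]) * x2
    return p

# Two explanatory variables stored column-wise, three rows, so width = 3.
xp = [[1.0, 2.0, 3.0],
      [0.0, 1.0, 0.0]]
z = [0.5, 0.5, 0.5]
p = phi_t_r_phi(xp, z, width=3)
```

Because `nitems_bs` is a multiple of the block size, all threads of one block share the same (i, j) pair, which is why a single `atomicAdd` per block is enough in the kernel.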
Calculation by GPU – A case for reduction algorithm
[Figure: tree reduction over item[0] ... item[15] in four steps (step.1 to step.4); at each step, pairs of partial sums are added, so the GPU obtains Σ_{i=0...N-1} item[i]]
Sum of the items[] in log2(N) steps
Inter-core synchronization by HW support
SELECT count(X),
sum(Y),
avg(Z)
FROM my_table;
Also used by aggregation
Values on shared memory can be accessed by multiple GPU cores simultaneously. The hardware supports inter-core synchronization, which makes it possible to calculate the total sum in log2(N) steps.
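The reduction pattern can be sketched sequentially in Python. This is an illustration of the general technique, not PG-Strom's implementation; the name `tree_sum` is hypothetical, and the inner loop stands for additions that separate GPU threads would perform at the same time.

```python
def tree_sum(items):
    """Sum a list by pairwise reduction; returns (total, number of steps)."""
    values = list(items)
    n = 1
    while n < len(values):
        n *= 2
    values += [0] * (n - len(values))   # pad to a power of two
    steps = 0
    stride = 1
    while stride < n:
        # On a GPU, every addition in this inner loop is done by a separate
        # thread, followed by a barrier such as __syncthreads().
        for i in range(0, n, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        steps += 1
    return values[0], steps

total, steps = tree_sum(range(16))   # 16 items, as in the figure
```

With 16 items the loop finishes in 4 steps, matching the four steps drawn on the slide.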
Sample program of the Logistic Regression Analytics
$ git clone https://github.com/heterodb/toybox.git
$ cd toybox/logistic_regression/
$ make && make install
$ psql postgres
postgres=# create extension logregr;
CREATE EXTENSION
To get the sample code, open "heterodb/toybox" on GitHub, then move to "logistic_regression".
You can install it using CREATE EXTENSION if PG-Strom is set up correctly.
https://github.com/heterodb/toybox/ (directory: logistic_regression)
Let’s play (1/4) - Creation of artificial test data
postgres=# CREATE TABLE logreg (
t bool,
x1 float,
x2 float,
x3 float,
x4 float );
CREATE TABLE
-- The training data marks every row with 1 + 2*x1 - 3*x2 + x3 + 0.5*x4 > 0 as true; 40M rows
postgres=# INSERT INTO logreg
(SELECT (1.0+2.0*x1-3.0*x2+x3+0.5*x4) > 0 t, x1, x2, x3, x4
FROM (SELECT random() x1,
random() x2,
random() x3,
random() x4
FROM generate_series(1,40000000)) x);
INSERT 0 40000000
OK, let's try the PL/CUDA function. First of all, make a normal table with 40M rows of random data.
All the rows that satisfy 1 + 2*x1 - 3*x2 + x3 + 0.5*x4 > 0 are marked as 'true'.
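For experiments outside the database, the same labelling rule can be reproduced in Python. This is a small sketch; the function name `make_row`, the seed and the row count of 10,000 are arbitrary choices, not values from the slides.

```python
import random

def make_row(rng):
    """One row (t, x1, x2, x3, x4) labelled by the same rule as the SQL above."""
    x1, x2, x3, x4 = (rng.random() for _ in range(4))
    t = (1.0 + 2.0 * x1 - 3.0 * x2 + x3 + 0.5 * x4) > 0
    return t, x1, x2, x3, x4

rng = random.Random(0)               # fixed seed, an arbitrary choice
rows = [make_row(rng) for _ in range(10_000)]
true_ratio = sum(r[0] for r in rows) / len(rows)
```

Because the label is an exact linear function of the inputs, the two classes are perfectly separable by the surface 1 + 2*x1 - 3*x2 + x3 + 0.5*x4 = 0.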
Let’s play (2/4) - Data loading to GPU device memory (part-1)
postgres=# CREATE FOREIGN TABLE ft (
t bool,
x1 real,
x2 real,
x3 real,
x4 real
) SERVER gstore_fdw
OPTIONS (pinning '0');
CREATE FOREIGN TABLE
postgres=# INSERT INTO ft
(SELECT * FROM logreg);
INSERT 0 40000000
Gstore_Fdw is an FDW extension that acts on behalf of GPU device memory; the device is specified by the 'pinning' option.
INSERT INTO the Gstore_Fdw foreign table loads the 40M rows of the 'logreg' table into GPU device memory.
[Diagram: the foreign table (gstore_fdw) sits in front of GPU device memory and handles:]
 - Data format conversion
 - Data compression (if any)
 - Transaction control
Let’s play (3/4) - Data loading to GPU device memory (part-2)
[kaigai@saba src]$ nvidia-smi
Thu Dec 6 12:10:56 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:02:00.0 Off |                  N/A |
| N/A   42C    P0    52W / 250W |    817MiB / 22919MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     27650      C   ...bgworker: PG-Strom GPU memory keeper      807MiB |
+-----------------------------------------------------------------------------+
807MiB of GPU device memory is reserved: the dataset consumes about 680MB, in addition to about 120MB for device management.
(sizeof(bool) + 4*sizeof(float)) * 40M = 680MB
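The size estimate can be checked with a few lines of Python. The sketch assumes 1 byte per bool and 4 bytes per `real` column, as in the formula on the slide, and ignores any per-column or per-row overhead of the storage format.

```python
ROWS = 40_000_000
SIZEOF_BOOL = 1    # bytes, column t
SIZEOF_FLOAT = 4   # bytes, "real" columns x1..x4

row_bytes = SIZEOF_BOOL + 4 * SIZEOF_FLOAT
data_bytes = row_bytes * ROWS
data_mb = data_bytes / 10**6       # decimal megabytes
data_mib = data_bytes / 2**20      # binary mebibytes, the unit nvidia-smi prints
```

Note that 680 MB in decimal units corresponds to roughly 648.5 MiB, so the split between data and device management is approximate when compared with the MiB figure printed by nvidia-smi.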
Let’s play (4/4)
postgres=# SELECT logregr_train('ft',
attnum_of('ft','t'),
attnums_of('ft','{x1,x2,x3,x4}'));
logregr_train
------------------------------------------
{3376.4,6752.71,-10129.1,3376.3,1688.27}
(1 row)
Time: 3647.059 ms (00:03.647)
The weights of the explanatory variables are estimated. Five elements are returned because there are four explanatory variables plus the intercept. It takes 3.6 sec.
Comparison to CPU implementation (1/3)
logregr_train() function of MADlib
postgres=# SELECT madlib.logregr_train('logreg', 'hoge',
't', 'ARRAY[1,x1,x2,x3,x4]',
NULL, 20);
logregr_train
---------------
(1 row)
Time: 1301307.361 ms (21:41.307)
postgres=# SELECT coef FROM hoge;
coef
------------------------------------------------------
{3041.82722783601,6083.57794939209,-9125.44857123801,3041.73992459095,1520.98287953044}
(1 row)
For the same job, MADlib's logregr_train() took 21 min 41 sec. The PL/CUDA implementation was 356 times faster than the CPU-based implementation.
1301307.36 / 3647.06 = x356.8 times faster
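The ratio follows directly from the two "Time:" lines printed by psql, as this short Python check shows.

```python
madlib_ms = 1301307.361   # "Time:" reported for madlib.logregr_train()
plcuda_ms = 3647.059      # "Time:" reported for the PL/CUDA logregr_train()

speedup = madlib_ms / plcuda_ms
```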
Comparison to CPU implementation (2/3) - recalculation
The parameters estimated by logregr_train() are the weights of the explanatory variables, i.e. the weights that define the division surface.
         w0       w1       w2        w3       w4
PL/CUDA  3376.4   6752.71  -10129.1  3376.3   1688.27
MADlib   3041.83  6083.58  -9125.45  3041.74  1520.98
The result of logregr_train() differs from the weights we used to make the dataset artificially, because it returns the gradient and intercept of the normal vector to the division surface, whose scale is not fixed.
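Dividing each result by its own intercept w0 makes the two estimates comparable with the coefficients used to generate the data. The Python sketch below uses only the numbers printed on the slides; `normalize` is a hypothetical helper.

```python
plcuda = [3376.4, 6752.71, -10129.1, 3376.3, 1688.27]
madlib = [3041.83, 6083.58, -9125.45, 3041.74, 1520.98]
generator = [1.0, 2.0, -3.0, 1.0, 0.5]   # coefficients used to label the data

def normalize(w):
    """Scale the weights so that the intercept w0 becomes 1."""
    return [wi / w[0] for wi in w]

plcuda_n = [round(v, 3) for v in normalize(plcuda)]
madlib_n = [round(v, 3) for v in normalize(madlib)]
```

After normalization both results agree with 1 + 2*x1 - 3*x2 + x3 + 0.5*x4 to three decimal places, so the two implementations found essentially the same division surface.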
Comparison to CPU implementation (3/3) - recalculation
Notice: we usually should not apply the estimated parameters to the training set!
postgres=# SELECT COUNT(*)
FROM (SELECT t, logregr_predict(ARRAY[ 3376.4, 6752.71,
-10129.1, 3376.3,
1688.27]::float[],
ARRAY[x1,x2,x3,x4]) p
FROM logreg) data
WHERE t != p;
count
-------
90
(1 row)
postgres=# SELECT COUNT(*)
FROM (SELECT t, logregr_predict(hoge.coef,
ARRAY[x1,x2,x3,x4]) p
FROM logreg, hoge) data
WHERE t != p;
count
-------
70
(1 row)
Prediction by our PL/CUDA function got 90 of the 40M rows wrong, and MADlib got 70 of the 40M rows wrong.
Note that we usually don't run prediction on the training set in "actual" data analytics.
(Both queries count the number of incorrect estimations.)
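The check can be imitated on a small random sample in Python. This is a sketch under assumptions: the Python `logregr_predict` below is a hypothetical stand-in that thresholds the logistic function at 0.5, which may differ from the SQL function of the same name, and the 100,000 rows are freshly generated rather than taken from the 40M-row table.

```python
import random

def logregr_predict(w, x):
    """Hypothetical stand-in: True when sigmoid(w0 + sum(w_i * x_i)) > 0.5."""
    a = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return a > 0.0   # sigmoid(a) > 0.5 exactly when a > 0

w = [3376.4, 6752.71, -10129.1, 3376.3, 1688.27]   # PL/CUDA result from the slides

rng = random.Random(0)   # fixed seed, an arbitrary choice
rows = []
for _ in range(100_000):
    x = [rng.random() for _ in range(4)]
    t = (1.0 + 2.0 * x[0] - 3.0 * x[1] + x[2] + 0.5 * x[3]) > 0
    rows.append((t, x))

errors = sum(1 for t, x in rows if t != logregr_predict(w, x))
error_rate_on_slides = 90 / 40_000_000   # 90 wrong rows out of 40M
```

Only rows that lie extremely close to the division surface can be misclassified, which is consistent with the very small error counts reported above.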
Conclusion
▌PL/CUDA sample programs
https://github.com/heterodb/toybox
▌PL/CUDA is fun(ction).
▌Suitable workloads for PL/CUDA
 Machine-Learning
 Similarity-Search
 Anomaly Detection
 Image Generation
 .... and others
Conclusion: we were able to make a PL/CUDA sample program and publish it. PL/CUDA is fun.
PL/CUDA will be valuable for machine learning, similarity search, anomaly detection, image generation, ...
20181212 - PGconfASIA - LT - English

More Related Content

What's hot

PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
Kohei KaiGai
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_count
Kohei KaiGai
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
John Zedlewski
 
Japan Lustre User Group 2014
Japan Lustre User Group 2014Japan Lustre User Group 2014
Japan Lustre User Group 2014
Hitoshi Sato
 
Parallel K means clustering using CUDA
Parallel K means clustering using CUDAParallel K means clustering using CUDA
Parallel K means clustering using CUDA
prithan
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
prithan
 
Apache Nemo
Apache NemoApache Nemo
Apache Nemo
NAVER Engineering
 
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合
Hitoshi Sato
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Ural-PDC
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
Kohei KaiGai
 
Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and Dataset
Kazuaki Ishizaki
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter
 
Supermicro High Performance Enterprise Hadoop Infrastructure
Supermicro High Performance Enterprise Hadoop InfrastructureSupermicro High Performance Enterprise Hadoop Infrastructure
Supermicro High Performance Enterprise Hadoop Infrastructuretempledf
 
Supermicro cloudera hadoop
Supermicro cloudera hadoopSupermicro cloudera hadoop
Supermicro cloudera hadoopSupermicro_SMCI
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
Hitoshi Sato
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvement
Kyong-Ha Lee
 
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
PeterAndreasEntschev
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
Altinity Ltd
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Altinity Ltd
 
Prashant de-ny-project-s1
Prashant de-ny-project-s1Prashant de-ny-project-s1
Prashant de-ny-project-s1
Prashant Ratnaparkhi
 

What's hot (20)

PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_count
 
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020GPU Accelerated Data Science with RAPIDS - ODSC West 2020
GPU Accelerated Data Science with RAPIDS - ODSC West 2020
 
Japan Lustre User Group 2014
Japan Lustre User Group 2014Japan Lustre User Group 2014
Japan Lustre User Group 2014
 
Parallel K means clustering using CUDA
Parallel K means clustering using CUDAParallel K means clustering using CUDA
Parallel K means clustering using CUDA
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
 
Apache Nemo
Apache NemoApache Nemo
Apache Nemo
 
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合
AI橋渡しクラウド(ABCI)における高性能計算とAI/ビッグデータ処理の融合
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
 
Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and Dataset
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
 
Supermicro High Performance Enterprise Hadoop Infrastructure
Supermicro High Performance Enterprise Hadoop InfrastructureSupermicro High Performance Enterprise Hadoop Infrastructure
Supermicro High Performance Enterprise Hadoop Infrastructure
 
Supermicro cloudera hadoop
Supermicro cloudera hadoopSupermicro cloudera hadoop
Supermicro cloudera hadoop
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvement
 
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
 
Prashant de-ny-project-s1
Prashant de-ny-project-s1Prashant de-ny-project-s1
Prashant de-ny-project-s1
 

Similar to 20181212 - PGconfASIA - LT - English

Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr - Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr -
PyData
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
Kohei KaiGai
 
Alpine Spark Implementation - Technical
Alpine Spark Implementation - TechnicalAlpine Spark Implementation - Technical
Alpine Spark Implementation - Technical
alpinedatalabs
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
DB Tsai
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
Kohei KaiGai
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
Mark Wong
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
Command Prompt., Inc
 
spaGO: A self-contained ML & NLP library in GO
spaGO: A self-contained ML & NLP library in GOspaGO: A self-contained ML & NLP library in GO
spaGO: A self-contained ML & NLP library in GO
Matteo Grella
 
Xgboost
XgboostXgboost
Xgboost
XgboostXgboost
Gpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaGpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cuda
Ferdinand Jamitzky
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
Kohei KaiGai
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Kohei KaiGai
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
NVIDIA Japan
 
High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1
NVIDIA
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source code
PVS-Studio
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source code
Andrey Karpov
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
Intel® Software
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
Mail.ru Group
 

Similar to 20181212 - PGconfASIA - LT - English (20)

Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr - Using CNTK's Python Interface for Deep LearningDave DeBarr -
Using CNTK's Python Interface for Deep LearningDave DeBarr -
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
Alpine Spark Implementation - Technical
Alpine Spark Implementation - TechnicalAlpine Spark Implementation - Technical
Alpine Spark Implementation - Technical
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
spaGO: A self-contained ML & NLP library in GO
spaGO: A self-contained ML & NLP library in GOspaGO: A self-contained ML & NLP library in GO
spaGO: A self-contained ML & NLP library in GO
 
Xgboost
XgboostXgboost
Xgboost
 
Xgboost
XgboostXgboost
Xgboost
 
Gpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaGpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cuda
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1High Performance Pedestrian Detection On TEGRA X1
High Performance Pedestrian Detection On TEGRA X1
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source code
 
Static analysis of C++ source code
Static analysis of C++ source codeStatic analysis of C++ source code
Static analysis of C++ source code
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
 
Python Coding Examples for Drive Time Analysis
Python Coding Examples for Drive Time AnalysisPython Coding Examples for Drive Time Analysis
Python Coding Examples for Drive Time Analysis
 

More from Kohei KaiGai

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History
Kohei KaiGai
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API
Kohei KaiGai
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow
Kohei KaiGai
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0
Kohei KaiGai
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache
Kohei KaiGai
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS
Kohei KaiGai
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS
Kohei KaiGai
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_Online
Kohei KaiGai
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw
Kohei KaiGai
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw
Kohei KaiGai
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo
Kohei KaiGai
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.Japan
Kohei KaiGai
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta
Kohei KaiGai
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStrom
Kohei KaiGai
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStrom
Kohei KaiGai
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw
Kohei KaiGai
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw
Kohei KaiGai
 
20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LT20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LT
Kohei KaiGai
 
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
Kohei KaiGai
 
20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference
Kohei KaiGai
 

20181212 - PGconfASIA - LT - English

  • 1. In-database Analytics using GPU ~ Tried to implement Logistic Regression Analytics~ HeteroDB,Inc Chief Architect & CEO KaiGai Kohei <kaigai@heterodb.com>
  • 2. Hello guys, are you using PL/CUDA? (This caption was not generated automatically by machine learning; I wrote it by hand in advance.)
  • 3. PL/CUDA User Defined Function
      ▌What is PL/CUDA?
        - PL/CUDA allows a UDF to be written in CUDA C and executed on the GPU.
      ▌Characteristics
        - Extreme, hand-written optimization of GPU code; not auto-generated.
        - Full integration with SQL for pre-/post-processing, with flexible operations.
      All in-database analytics: Scan, Pre-Process, Analytics, Post-Process, Result
        CREATE FUNCTION my_logic( reggstore, text )
        RETURNS matrix
        AS $$
          /* custom CUDA C code block (runs on GPU device) */
        $$ LANGUAGE 'plcuda';
      The custom CUDA C code block gives:
        - Manual optimization for statistics and machine learning
        - Use of thousands of cores and wide-band device memory
      PL/CUDA allows a UDF written as a CUDA C program that is executable on the GPU. It is valuable because it combines manual (extreme) optimization for the GPU with flexible data operations in SQL.
  • 4. PL/CUDA Use Case – Similarity Search on Drug-Discovery
      Data structure of chemical compounds:
        ID  NAME          Fingerprint (1024bit)
        1   CHEMBL153534  00000000000100000010000000000010001000000...
        2   CHEMBL405398  00000000000000010010000000000000000100000...
        3   CHEMBL503634  00000100000000000000010000000000000000000...
        :   :             :
      Query compounds (~1,000 items) against database compounds (10M items): 10 billion combinations to be checked.
      Flow: Query, DB server (similarity search logic), list of similar chemical compounds.
      Chart: response time of the similarity search by the k-NN method (k=3, D=10M) against the number of query compounds [Q]; x150 times faster.
      For similarity search in drug discovery, the GPU calculated 10 billion distances between chemical compounds 150 times faster than a C binary on the CPU. It is a very compute-intensive workload.
  • 5. Is there any sample program? Oh... that case used a proprietary algorithm, so for now we have no sample code in public.
  • 6. I tried to make it. Theme: Logistic Regression Analytics
  • 7. What is Logistic Regression Analytics (1/2): Logistic Regression Analytics is a machine-learning method for binary classification (true / false).
  • 8. What is Logistic Regression Analytics (2/2): The probability of a “right” classification follows the logistic function $\sigma(\alpha) = \frac{1}{1 + e^{-\alpha}}$.
  • 9. Estimation of the parameters (1/3)
      Determining the division surface is equivalent to finding the weights of the explanatory variables and the intercept, for example $0 = w_0 + w_1 x + w_2 y$. The teacher data, however, only tell us a boolean state for each combination of explanatory variables.
      In general:
        Parameter: $w = (w_0, w_1, \cdots, w_m)$
        Explanatory variables: $\varphi_i = (1, x_1, \cdots, x_m)_i$
        Teacher data: $t_i = 0 \text{ or } 1$
  • 10. Estimation of the parameters (2/3)
      Target: maximize the probability of the training set. When $z_i = \sigma(w^T \varphi_i)$,
        $P = \prod_{i=1}^{N} P_i = \prod_{i=1}^{N} z_i^{t_i} (1 - z_i)^{1 - t_i}$
      The distance from the division surface determines the certainty of the classification: explanatory variables far from the division surface have a higher probability of being true or false. We assume the training set is the outcome with the highest likelihood, and look for the parameter $w$ that maximizes it.
  • 11. Estimation of the parameters (3/3)
      Parameter estimation by iteration of:
        $w_{new} = w_{old} - (\Phi^T R \Phi)^{-1} \Phi^T (z - t)$
      where
        $\Phi = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1m} \\ \vdots & & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nm} \end{pmatrix}$
        $t = (t_1, \ldots, t_n)$
        $z = (z_1, \ldots, z_n)$
        $R = \mathrm{diag}(z_1 (1 - z_1), \ldots, z_n (1 - z_n))$
      $w$ is updated on each iteration, so that $w_{new}$ is a more reasonable parameter than $w_{old}$. Eventually the difference between $w_{new}$ and $w_{old}$ becomes very small. For more details, check the book “The first step of machine-learning theory”.
  • 12. Amount of the calculation
      ▌# of explanatory variables (small): several to several hundred ... m items
      ▌# of training data (large): several hundred to several million ... n items
        $w_{new} = w_{old} - w_\Delta = w_{old} - (\Phi^T R \Phi)^{-1} \Phi^T (z - t)$
      Figure (matrix shapes): $(\Phi^T R \Phi)^{-1}$ is $m \times m$, $\Phi^T (z - t)$ is $m \times 1$, and $w_\Delta$ is $m \times 1$; the products over the $n$ training rows are where the work is.
      The number of explanatory variables is at most a few hundred, but the training data set can exceed a million items, so the calculation is well suited to parallel execution on the GPU.
  • 13. Example of GPU code for the matrix product $\Phi^T R \Phi$
      KERNEL_FUNCTION_MAXTHREADS(void)
      logregr_update_P(cl_double **Preg,      /* out */
                       cl_float **Xp,
                       cl_int width,
                       VectorTypeFloat *Z)
      {
          cl_double *P = Preg[0];
          __shared__ cl_float v[MAXTHREADS_PER_BLOCK];  // shared variables
          nitems_bs = TYPEALIGN(get_local_size(), nitems);
          nloops = width * width * nitems_bs;
          for (loop = get_global_id();         // unique identifier of GPU threads
               loop < nloops;
               loop += get_global_size())      // add total number of GPU threads
          {
              k = loop % nitems_bs;                // index of R column/row
              i = (loop / nitems_bs) % width;      // index of Φ^T column
              j = loop / (nitems_bs * width);      // index of Φ column
              if (k < nitems)
              {
                  cl_float z  = Z->values[k];
                  cl_float x1 = (i == 0 ? 1.0 : Xp[i-1][k]);
                  cl_float x2 = (j == 0 ? 1.0 : Xp[j-1][k]);
                  v[get_local_id()] = x1 * z * (1.0 - z) * x2;
              }
              else
                  v[get_local_id()] = 0.0;
              sum = pgstromTotalSum(v, MAXTHREADS_PER_BLOCK);  // total sum of the elements
              if (get_local_id() == 0)                         // calculated by the sibling threads
                  atomicAdd(&P[i + j * width], sum);
              __syncthreads();
          }
      }
  • 14. Calculation by GPU – A case for the reduction algorithm
      Figure: item[0] to item[15] are summed pairwise over step.1 to step.4, giving $\sum_{i=0}^{N-1} item[i]$. The sum of items[] takes $\log_2 N$ steps, with inter-core synchronization supported by hardware.
      Also used by aggregation:
        SELECT count(X), sum(Y), avg(Z) FROM my_table;
      Values in shared memory can be accessed by multiple GPU cores simultaneously. The hardware supports inter-core synchronization, which makes it possible to calculate a total sum in $\log_2 N$ steps.
  • 15. Sample program of the Logistic Regression Analytics
        $ git clone https://github.com/heterodb/toybox.git
        $ cd toybox/logistic_regression/
        $ make && make install
        $ psql postgres
        postgres=# create extension logregr;
        CREATE EXTENSION
      To get the sample code, open “heterodb/toybox” on GitHub (https://github.com/heterodb/toybox/), then move to “logistic_regression”. You can install it using CREATE EXTENSION if PG-Strom is set up correctly.
  • 16. Let’s play (1/4) - Creation of artificial test data
        postgres=# CREATE TABLE logreg (
                       t  bool,
                       x1 float,
                       x2 float,
                       x3 float,
                       x4 float );
        CREATE TABLE
        -- The training data marks every row with 1 + 2*x1 - 3*x2 + x3 + 0.5*x4 > 0 as true; 40M rows
        postgres=# INSERT INTO logreg
                     (SELECT (1.0+2.0*x1-3.0*x2+x3+0.5*x4) > 0 t, x1, x2, x3, x4
                        FROM (SELECT random() x1, random() x2, random() x3, random() x4
                                FROM generate_series(1,40000000)) x);
        INSERT 0 40000000
      OK, let’s try the PL/CUDA function. First of all, make a normal table with 40M rows of random data. All the rows that satisfy $1 + 2x_1 - 3x_2 + x_3 + 0.5x_4 > 0$ are marked as ‘true’.
  • 17. Let’s play (2/4) - Data loading to GPU device memory (part-1)
        postgres=# CREATE FOREIGN TABLE ft (
                       t  bool,
                       x1 real,
                       x2 real,
                       x3 real,
                       x4 real
                   ) SERVER gstore_fdw OPTIONS (pinning '0');
        CREATE FOREIGN TABLE
        postgres=# INSERT INTO ft (SELECT * FROM logreg);
        INSERT 0 40000000
      Gstore_Fdw is an FDW extension that stands in for GPU device memory; the device is specified by the ‘pinning’ option. INSERT INTO the Gstore_Fdw table loads the 40M rows of the ‘logreg’ table into GPU device memory.
      The foreign table (gstore_fdw) takes care of:
        - Data format conversion
        - Data compression (if any)
        - Transaction control
  • 18. Let’s play (3/4) - Data loading to GPU device memory (part-2)
      [kaigai@saba src]$ nvidia-smi
      Thu Dec  6 12:10:56 2018
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Tesla P40           Off  | 00000000:02:00.0 Off |                  N/A |
      | N/A   42C    P0    52W / 250W |    817MiB / 22919MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |    0     27650      C   ...bgworker: PG-Strom GPU memory keeper      807MiB |
      +-----------------------------------------------------------------------------+
      807MB of GPU device memory is reserved: about 120MB for device management, plus 680MB for the dataset, since (sizeof(bool) + 4*sizeof(float)) * 40M = 680MB.
  • 19. Let’s play (4/4)
        postgres=# SELECT logregr_train('ft',
                                        attnum_of('ft','t'),
                                        attnums_of('ft','{x1,x2,x3,x4}'));
                      logregr_train
        ------------------------------------------
         {3376.4,6752.71,-10129.1,3376.3,1688.27}
        (1 row)
        Time: 3647.059 ms (00:03.647)
      The weights of the explanatory variables are estimated. Five elements are returned because there are four explanatory variables plus the intercept. It takes 3.6 sec.
  • 20. Comparison to CPU implementation (1/3): the logregr_train() function of MADlib
        postgres=# SELECT madlib.logregr_train('logreg', 'hoge', 't', 'ARRAY[1,x1,x2,x3,x4]', NULL, 20);
         logregr_train
        ---------------
        (1 row)
        Time: 1301307.361 ms (21:41.307)
        postgres=# SELECT coef FROM hoge;
                                                  coef
        ----------------------------------------------------------------------------------------
         {3041.82722783601,6083.57794939209,-9125.44857123801,3041.73992459095,1520.98287953044}
        (1 row)
      For the same job, MADlib’s logregr_train() took 21 min 41 sec. The PL/CUDA implementation was about 356 times faster than the CPU-based implementation: 1301307.36 / 3647.06 = x356.8.
  • 21. Comparison to CPU implementation (2/3) - recalculation
      The parameter estimated by logregr_train() is the weight of the explanatory variables, that is, of the division surface.
                  w0        w1        w2         w3        w4
        PL/CUDA   3376.4    6752.71   -10129.1   3376.3    1688.27
        MADlib    3041.83   6083.58   -9125.45   3041.74   1520.98
      The result of logregr_train() differs from the coefficients we used to generate the artificial dataset, because it returns the normal vector of the division surface and its intercept, which describe the same surface at any common scale.
  • 22. Comparison to CPU implementation (3/3) - recalculation
      Notice: we usually should not apply the estimated parameter to the training set!
      Count the number of incorrect estimations:
        postgres=# SELECT COUNT(*)
                     FROM (SELECT t, logregr_predict(ARRAY[3376.4, 6752.71, -10129.1, 3376.3, 1688.27]::float[],
                                                     ARRAY[x1,x2,x3,x4]) p
                             FROM logreg) data
                    WHERE t != p;
         count
        -------
            90
        (1 row)
        postgres=# SELECT COUNT(*)
                     FROM (SELECT t, logregr_predict(hoge.coef, ARRAY[x1,x2,x3,x4]) p
                             FROM logreg, hoge) data
                    WHERE t != p;
         count
        -------
            70
        (1 row)
      Prediction with our PL/CUDA function misclassified 90 of the 40M rows, and MADlib misclassified 70 of the 40M. Note that in “actual” data analytics we usually do not run prediction on the training set.
  • 23. Conclusion
      ▌PL/CUDA sample programs: https://github.com/heterodb/toybox
      ▌PL/CUDA is fun(ction).
      ▌Suitable workloads for PL/CUDA
        - Machine-Learning
        - Similarity-Search
        - Anomaly Detection
        - Image Generation
        - .... and others
      Conclusion: we were able to write a PL/CUDA sample program and publish it. PL/CUDA is fun. PL/CUDA will be valuable for machine learning, similarity search, anomaly detection, image generation, and more.
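
Slide 21 points out that the estimated weights do not match the coefficients used to build the test data on slide 16 (1, 2, -3, 1, 0.5). One way to compare them is to divide each weight vector by its own intercept. The sketch below is only an illustration in plain Python, not part of the talk; the helper name normalize_by_intercept is made up here, and the MADlib values are the rounded ones from the slide 21 table.

```python
# Weights reported on slides 19-21, intercept (w0) first.
plcuda = [3376.4, 6752.71, -10129.1, 3376.3, 1688.27]
madlib = [3041.83, 6083.58, -9125.45, 3041.74, 1520.98]

def normalize_by_intercept(w):
    """Divide every weight by the intercept w[0] (hypothetical helper).

    The division surface w0 + w1*x1 + ... = 0 does not change when all
    weights are multiplied by the same factor, so this removes the scale.
    """
    return [round(v / w[0], 3) for v in w]

plcuda_ratio = normalize_by_intercept(plcuda)
madlib_ratio = normalize_by_intercept(madlib)
print(plcuda_ratio)  # [1.0, 2.0, -3.0, 1.0, 0.5]
print(madlib_ratio)  # [1.0, 2.0, -3.0, 1.0, 0.5]
```

After rescaling, both results agree with the generating coefficients to three decimal places, so the two implementations found essentially the same division surface.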
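
Slide 14 describes summing N items in log2(N) steps. The following is a sequential Python sketch of that pairwise scheme, written for illustration only: it is not the pgstromTotalSum() implementation, and the function name tree_reduce_sum is an assumption. On a GPU, all additions within one step would run in parallel threads with a barrier between steps.

```python
def tree_reduce_sum(items):
    """Sum items pairwise; returns (total, number_of_steps)."""
    buf = list(items)
    # Pad with zeros up to the next power of two so every slot has a partner.
    size = 1
    while size < len(buf):
        size *= 2
    buf += [0] * (size - len(buf))

    steps = 0
    stride = 1
    while stride < size:
        # One reduction step: slot i absorbs its partner at i + stride.
        # These additions are independent of each other within a step.
        for i in range(0, size, 2 * stride):
            buf[i] += buf[i + stride]
        stride *= 2
        steps += 1
    return buf[0], steps

total, steps = tree_reduce_sum(range(16))
print(total, steps)  # 120 4
```

With 16 items the loop runs four steps, matching step.1 to step.4 in the slide's figure.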
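
The update rule on slide 11 can be tried on a tiny dataset without a GPU. This is a minimal plain-Python sketch under stated assumptions, not the code from heterodb/toybox: the names irls_step and solve are hypothetical, the six-row dataset is invented for the example, and the linear system is solved by Gaussian elimination rather than by forming the inverse explicitly. The data is deliberately not perfectly separable, so the weights converge to finite values.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def irls_step(w, X, t):
    """One update w_new = w_old - (Phi^T R Phi)^-1 Phi^T (z - t)."""
    m = len(w)
    H = [[0.0] * m for _ in range(m)]   # accumulates Phi^T R Phi
    g = [0.0] * m                       # accumulates Phi^T (z - t)
    for xs, ti in zip(X, t):
        phi = [1.0] + list(xs)          # leading 1 is the intercept term
        z = sigmoid(sum(wj * pj for wj, pj in zip(w, phi)))
        r = z * (1.0 - z)
        for i in range(m):
            g[i] += phi[i] * (z - ti)
            for j in range(m):
                H[i][j] += phi[i] * r * phi[j]
    delta = solve(H, g)
    return [wi - di for wi, di in zip(w, delta)]

# Invented toy data: one explanatory variable, labels overlap around x = 2.5.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
t = [0, 0, 1, 0, 1, 1]
w = [0.0, 0.0]
for _ in range(20):
    w = irls_step(w, X, t)
print(w)
```

The double loop that fills H is the same sum the kernel on slide 13 computes, with one term x1 * z * (1 - z) * x2 per training row.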