Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

PG-Strom - A FDW module utilizing GPU device


Published on

slides on LT session of PGcon2012

  • Be the first to comment

PG-Strom - A FDW module utilizing GPU device

  1. 1. PG-Strom~A FDW module utilizing GPU device~ NEC Europe SAP Global Competence Center KaiGai Kohei <>
  2. 2. FDW is fun Exec Regular Exec Table Executor Foreign MySQL Table FDWSELECT * FROM … Foreign Oracle Exec Table FDW Run on single PG-Strom Foreign thread FDW Table Utilizing External Regular Computing Table Resource! Page 2 PostgreSQL Conference 2012
  3. 3. Idea of Asynchronous Execution using CPU and GPU vanilla PostgreSQL PostgreSQL with PG-Strom CPU CPU GPU Asynchronous memory transfer and execution Iteration of scan tuples and evaluation of qualifiers Synchronization Larger “chunk” to scan Earlier than the database at once “Only CPU” scan : Red means, scan tuples from the database : Green means, execution of the qualifiers Page 3 PostgreSQL Conference 2012
  4. 4. Architecture of PG-Strom World of CPU World of GPU regular shadow Data Exchange tables tables via shared chunk Massive shared buffer shared chunks Parallel Execution Exec Preload PG-Strom PG-Strom GPU GPU Kernel Calculation Function Executor Server Exec Backend Backend Backend Process Process Process GPU Device DMA Memory Postmaster Transfer PCI-E x16 Gen2 (16GB/sec) Page 4 PostgreSQL Conference 2012
  5. 5. Data Density and Column-base structure Foreign Table FT1 (a, b, c, d) Table: my_schema.ft1.c.cs column store of D column store of C column store of A column store of B Shadow 10100 {‘2010-10-21’, …} rowid map Tables 10200 {‘2011-01-23’, …} 10300 {‘2011-08-17’, …} Table: my_schema.ft1.b.cs 10100 {2.4, 5.6, 4.95, … } row store of FT1 10300 {10.23, 7.54, 5.43, … } ② Calculation rowmap Chunk Buffer of FT1 ① Transfer value a[] <not used> value b[] ③ Write-Back value c[] value d[] <not used> Page 5 PostgreSQL Conference 2012
  6. 6. Benchmark Result postgres=# SELECT COUNT(*) FROM rtbl WHERE sqrt((x-256)^2 + (y-128)^2) < 40; count ------- 25069 (1 row) GPU Accelerated! Time: 3739.492 ms postgres=# SELECT COUNT(*) FROM ftbl WHERE sqrt((x-256)^2 + (y-128)^2) < 40; count ------- 25069 X10 times (1 row) Faster Time: 227.023 ms▌CPU: Intel Xeon E5504 (2.0GHz/4core), GPU: Nvidia GeForce GTS450 (128 cuda core)▌rtbl and ftbl contains 5 million tuples, with same values.▌All the tuples are already in the shared buffers, so seldom disk i/o happen. Page 6 PostgreSQL Conference 2012
  7. 7. Future Development▌Git URL▌v9.3 development  Writable Foreign Table  Sort / Aggregate acceleration using GPU  Inheritance between regular and foreign tables▌Need your help  Folks who can review the patches  Folks who can provide real-life big data  Folks who can know typical workload of analytic queries Page 7 PostgreSQL Conference 2012
  8. 8. Page 8 PostgreSQL Conference 2012