Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

20170602_OSSummit_an_intelligent_storage

3,102 views

Published on

Slides at Open Source Summit Japan 2017 - Brief introduction of SSD-to-GPU Direct SQL Execution and its future.

Published in: Technology
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/yxufevpm } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/yxufevpm } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/yxufevpm } ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/yxufevpm } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/yxufevpm } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/yxufevpm } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • DOWNLOAD FULL eBOOK INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF eBook here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB eBook here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... 1.DOWNLOAD FULL. doc eBook here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... 1.DOWNLOAD FULL. PDF eBook here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB eBook here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... 1.DOWNLOAD FULL. doc eBook here { https://tinyurl.com/y6a5rkg5 } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, CookeBOOK Crime, eeBOOK Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • DOWNLOAD FULL eBOOK INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. PDF eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc eBook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, CookeBOOK Crime, eeBOOK Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

20170602_OSSummit_an_intelligent_storage

  1. 1. An Intelligent Storage ~SQL Execution on GPU Closely Connected with SSD~ PG-Strom Development Team KaiGai Kohei <kaigai@kaigai.gr.jp>
  2. 2. The PG-Strom Project Who am I? ▌KaiGai Kohei  PG-Strom Development Team ▌Background  Linux kernel (SELinux, …)  PostgreSQL database  Enhancement of security features  Writable FDW, Remote Join  Custom Scan Interface  GPU and CUDA ▌PG-Strom  SQL Acceleration Technology using GPU  Solution for data analytics in-database 2 Open Source Summit Japan 2017 - An Intelligent Storage
  3. 3. The PG-Strom Project PG-Strom PG-Strom (1/3) – GPU as accelerator of SQL 3 GPU CPU Model NVIDIA Tesla P40 Intel Xeon E5-2699v4 Code name Pascal Broadwell Release Q3-2016 Q1-2016 # of transistors 12billion 7.2billion # of cores 3840 (simple) 22 (functional) Core clocks 1.306GHz ~1.53GHz 2.20GHz ~3.60GHz Performance (FP32) 12 TFLOPS 1.2 TFLOPS (with AVX2) RAM Size 24GB (GDDR5) max1.5TB (DDR4) Memory Band 347GB/s 76.8GB/s Power consumption 250W 145W Manufacturing process 16nm 14nm Application GPU Off-load WHERE, JOIN, GROUP BY Auto generation of GPU binary from SQL PG-Strom: Extension module of PostgreSQL, to accelerate SQL workloads using multi-thousands cores and high bandwidth memory. Same SQL Statement Same Data Structure Open Source Summit Japan 2017 - An Intelligent Storage
  4. 4. The PG-Strom Project PG-Strom (2/3) – Transparent GPU code generation on demand Open Source Summit Japan 2017 - An Intelligent Storage4 QUERY: SELECT cat, count(*), avg(x) FROM t0 WHERE x between y and y + 20.0 GROUP BY cat; : STATIC_FUNCTION(bool) gpupreagg_qual_eval(kern_context *kcxt, kern_data_store *kds, size_t kds_index) { pg_float8_t KPARAM_1 = pg_float8_param(kcxt,1); pg_float8_t KVAR_3 = pg_float8_vref(kds,kcxt,2,kds_index); pg_float8_t KVAR_4 = pg_float8_vref(kds,kcxt,3,kds_index); return EVAL((pgfn_float8ge(kcxt, KVAR_3, KVAR_4) && pgfn_float8le(kcxt, KVAR_3, pgfn_float8pl(kcxt, KVAR_4, KPARAM_1)))); } : E.g) Generate CUDA C code from calculation formula in WHERE-clause Reference to input data SQL expression in CUDA source code Run-time Compiler (nvrtc) Just-in-time Compile Parallel Execution
  5. 5. The PG-Strom Project PG-Strom (3/3) – An example of SQL acceleration 5 ▌Test Query: SELECT cat, count(*), avg(x) FROM t0 NATURAL JOIN t1 [NATURAL JOIN t2 ...] GROUP BY cat;  t0 contains 100M rows, t1...t8 contains 100K rows (like a start schema) Open Source Summit Japan 2017 - An Intelligent Storage 8.48 13.23 18.28 23.42 28.88 34.50 40.77 47.16 5.00 5.46 5.91 6.45 7.17 8.07 9.22 10.21 0.0 5.0 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.0 2 3 4 5 6 7 8 9 QueryResponseTime[sec] Number of tables joined PG-Strom microbenchmark with JOIN/GROUP BY PostgreSQL v9.6 PG-Strom 2.0devel CPU: Xeon E5-2640v4 GPU: Tesla P40 RAM: 128GB OS: CentOS 7.3 DB: PostgreSQL 9.6.2 + PG-Strom 2.0devel
  6. 6. The PG-Strom Project In-database Analytics ▌Computing close to data  Cost to move data  Cost to transform / validate data  Waste of time for DBAs ▌Utilization of SQL’s flexibility  Pre-/Post-process of analytic algorithms  Conjunction with other data (JOIN)  Summarization of results (GROUP BY/Window Functions) Open Source Summit Japan 2017 - An Intelligent Storage6
  7. 7. The PG-Strom Project What we want to develop (1/2) Open Source Summit Japan 2017 - An Intelligent Storage7 Data Collection Summarize Pre-process Data Generation Analytics & ML Visualization Data Size: Large  Small Amount of computing: Small  Large SSD-to-GPU Direct SQL Exec Transparent GPU code generation & Asynchronous Exec Statistical Analytics & Machine Learning Libraries Advanced algorithms as a part of SQL statement RowColum Hardware Transform BI Tools Execution of entire lifecycle for data analytics in-database DNN k-means SVM Random Forest PL/CUDA NVMe-SSD GPU
  8. 8. The PG-Strom Project What we want to develop (2/2) – elemental technology Open Source Summit Japan 2017 - An Intelligent Storage8 Data Collection Summarize Pre-process Data Generation Analytics & ML Visualization Data Size: Large  Small Amount of computing: Small  Large SSD-to-GPU Direct SQL Exec Transparent GPU code generation & Asynchronous Exec Statistical Analytics & Machine Learning Libraries Advanced algorithms as a part of SQL statement RowColum Hardware Transform BI Tools Execution of entire lifecycle for data analytics in-database DNN k-means SVM Random Forest PL/CUDA NVMe-SSD GPU SSD-to-GPU Direct SQL Execution (for I/O) PG-Strom (for SQL) PL/CUDA and Library (for computing)
  9. 9. The PG-Strom Project I/O Acceleration with GPU? Open Source Summit Japan 2017 - An Intelligent Storage9
  10. 10. The PG-Strom Project Re-definition of GPU’s role I/O workloads Computing workloads Open Source Summit Japan 2017 - An Intelligent Storage10
  11. 11. The PG-Strom Project Brief Architecture of GPU GPU Device Memory (GDDR5 or HBM2) Host RAM Data Transfer with DMA over the PCIe Bus CPU PCIe Bus Also data transfer via DMA from: particular blocks to: destination physical address Storage GPU Device Open Source Summit Japan 2017 - An Intelligent Storage11
  12. 12. The PG-Strom Project Technology Basis (1/2) – GPUDirect RDMA ▌Feature for peer-to-peer DMA between GPU and PCIe devices ▌Originally, designed for MPI over Infiniband ▌But available for any kind of PCIe devices, if custom kernel module Copyright (c) NVIDIA corporation, 2015 Open Source Summit Japan 2017 - An Intelligent Storage12
  13. 13. The PG-Strom Project Technology Basis (2/2) – GPUDirect RDMA Physical Address Space PCIe BAR1 AreaGPU RAM Host RAM NVMe-SSD Infiniband HBA PCIe devices GPUDirect RDMA A feature to map GPU’s device memory on physical address space of the host system Once GPU device memory gets physical address of the host system, driver can use the address as source/destination of DMA operations. 13 Open Source Summit Japan 2017 - An Intelligent Storage
  14. 14. The PG-Strom Project Idea of SSD-to-GPU Direct SQL Execution Open Source Summit Japan 2017 - An Intelligent Storage14 Large PostgreSQL Tables PCIe Bus NVMe SSD GPUSSD-to-GPU P2P DMA (NVMe-Strom driver) WHERE JOIN GROUP BYPostgreSQL Data Block Reduction of unnecessary rows according to the SQL  Smaller amount of I/O Existing Data Flow (Large I/O Size) SSD-to-GPU Direct SQL Execution – Load PostgreSQL’s data block to GPU directly, then reduce unnecessary rows with multi-thousands cores of GPUs.
  15. 15. The PG-Strom Project As destination of SSD to GPU DMA commands!! Linux kernel APIs of GPUDirect RDMA int nvidia_p2p_get_pages(uint64_t p2p_token, /* deprecated */ uint32_t va_space, /* deprecated */ uint64_t virtual_address, uint64_t length, struct nvidia_p2p_page_table **page_table, void (*free_callback)(void *private_data), void *private_data); struct nvidia_p2p_page_table { uint32_t version; uint32_t page_size; struct nvidia_p2p_page **pages; uint32_t entries; uint8_t *gpu_uuid; } nvidia_p2p_page_table_t; Array of I/O mapped physical memory address to the GPU device memory CUresult cuMemAlloc(CUdeviceptr *daddr, size_t bytesize); Open Source Summit Japan 2017 - An Intelligent Storage15
  16. 16. The PG-Strom Project NVMe-Strom Software Stack Open Source Summit Japan 2017 - An Intelligent Storage16 pg-strom NVMe-Strom VFS Page Cache NVMe SSD Driver nvidia driver GPU device memory GPU device memory PostgreSQL file offset DMA request block number SSD-to-GPU Peer-to-Peer DMA cuMemAlloc() /proc/nvme-strom ioctl(2) read(2) User Space Kernel Space
  17. 17. The PG-Strom Project Benchmark (1/2) – Star Schema Benchmark Open Source Summit Japan 2017 - An Intelligent Storage17 0 500 1000 1500 2000 2500 3000 3500 4000 4500 Q1-1 Q1-2 Q1-3 Q2-1 Q2-2 Q2-3 Q3-1 Q3-2 Q3-3 Q3-4 Q4-1 Q4-2 Q4-3 QueryExecutionThroughput[MB/s] PostgreSQL SSDx1 PostgreSQL SSDx2 PG-Strom SSDx1 PG-Strom SSDx2  SQL with WHERE-clause, JOIN, Aggregation/GROUP BY towards 353GB database.  Usually, entire workloads are dominated by I/O.  PostgreSQL achieves up to 1.6GB/s with regular filesystem based access.  PG-Strom with SSD-to-GPU direct SQL execution achieved up to 3.8GB/s at the peak performance. ※ (Query Execution Throughput) = (Database Size; 353GB) / (Entire Query Response Time [sec]) Theoretical limitation by 2x SSD [4.4GB/s] Under Investigation Theoretical limitation by 1x SSD [4.4GB/s]
  18. 18. The PG-Strom Project Benchmark (2/2) – Star Schema Benchmark Open Source Summit Japan 2017 - An Intelligent Storage18  Example of Query SELECT sum(lo_revenue), d_year, p_brand1 FROM lineorder, date1, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_category = 'MFGR#12‘ AND s_region = 'AMERICA‘ GROUP BY d_year, p_brand1 ORDER BY d_year, p_brand1; customer 12M Rows (1.6GB) date1 2500 Rows (400KB) part 1.8M Rows (206MB) supplier 4M Rows (528MB) lineorder 2.4B Rows (351GB) Typical queries for summary; which consists of WHERE-clause, JOIN and GROUP BY. Size of lineorder table is larger than physical RAM size, majority of the workload is dominated by I/O.
  19. 19. The PG-Strom Project Hardware configuration on the benchmark 19 ▌Hardware/Software configuration  Model: DELL PowerEdge R730  CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2  RAM: 128GB  HDD: SAS 300GB x8 (RAID5)  OS: CentOS7.2 (3.10.0-327.el7)  SW: CUDA 7.5.17 NVIDIA driver 352.99 PostgreSQL 9.6.1 PG-Strom v2.0devel ▌GPU Specifications  NVIDIA Tesla K80  2x 2496 CUDA cores (560MHz)  2x 12GB GDDR5 RAM (240GB/s) ▌SSD Specifications  Intel SSD 750 (400GB) x2  Interface: PCIe 3.0 x4 (NVMe 1.2)  Max SeqRead: 2.2GB/s Open Source Summit Japan 2017 - An Intelligent Storage CPU2CPU1 SSD x2 Tesla K80 GPU Tesla K20 GPU (unused)
  20. 20. The PG-Strom Project (OT) Consistency with disk cache 20  Problem ① Storage block may not be up-to-date, if target blocks are kept on page caches. ② Short size DMA is not efficient.  Solution – Write back cached PostgreSQL blocks (= 8KB/block) to userspace buffer once, then kicks bulk transfer from RAM to GPU by CUDA APIs.  Almost equivalent cost with read(2) + cuMemcpyHtoDAsync()  Although PostgreSQL blocks are reordered on GPU memory, it generates correct results. Open Source Summit Japan 2017 - An Intelligent Storage BLK-100: uncached BLK-101: cached BLK-102: uncached BLK-103: uncached BLK-104: cached BLK-105: cached BLK-106: uncached BLK-107: uncached BLK-108: cached BLK-109: uncached BLCKSZ (=8KB) Transfer Size Per Request BLCKSZ * NChunks BLK-108: cached BLK-105: cached BLK-104: cached BLK-101: cached BLK-100: uncached BLK-102: uncached BLK-103: uncached BLK-106: uncached BLK-107: uncached BLK-109: uncached BLK-108: cached BLK-105: cached BLK-104: cached BLK-101: cached unused SSD-to-GPU P2P DMA File Userspace DMA Buffer (RAM) Device Memory (GPU) CUDA API (userspace) cuMemcpyHtoDAsync
  21. 21. The PG-Strom Project Not only Scan, but GpuJoin also... Open Source Summit Japan 2017 - An Intelligent Storage21 Inner relation Outer relation Inner relation Outer relation Hash table Hash table Next Step Next Step Just reference to the results on CPU side Hash Table Search by CPU Generation of results by CPU Generation of results by GPU Parallel Hash Table Search by GPU HashJoin by CPU GpuHashJoin Here we have SSD-to-GPU P2P DMA
  22. 22. The PG-Strom Project GpuPreAgg Not only Scan, but GpuPreAgg also... Open Source Summit Japan 2017 - An Intelligent Storage22 Aggregation by CPU Aggregation with GPU Relation Relation Aggregation Aggregation 1st Step: Parallel Hash Reduction 2nd Step: Atomic Merge Read of 1M rows Process 1M rows by CPU Smaller # of rows GpuPreAgg: Reduction of rows to be processed by CPU Interim results Here we have SSD-to-GPU P2P DMA.
  23. 23. The PG-Strom Project The Future Open Source Summit Japan 2017 - An Intelligent Storage23
  24. 24. The PG-Strom Project Single Node Solution 2x Enterprise Grade SSD Based on PCIe x8 NVIDIA Tesla P40 CPU RAM DMI SSD-1 SSD-2 GPU max 10GB/s capability of query processing / node PCI-E Hardware Block Diagram Open Source Summit Japan 2017 - An Intelligent Storage24
  25. 25. The PG-Strom Project Multi Node Solution (?) NVIDIA Tesla P40 Joint solution with PostgreSQL’s scale-out technologies 2x Enterprise Grade SSD Based on PCIe x8 Open Source Summit Japan 2017 - An Intelligent Storage25
  26. 26. The PG-Strom Project Scale Up Solution (?) 5x NVIDIA Tesla P40 DMI GPU-1 PCI-SW PCI-SW QPIx2 GPU-2 GPU-4 GPU-5 SSD-1 SSD-2 SSD-3 SSD-4 SSD-5 GPU-3 RAM RAM CPU 1 CPU 2 PCIe x16 x2 slot PCIe 96lane PCIe 96lane Deployment on GPU monster 5x Enterprise Grade SSD Based on PCIe x16 Open Source Summit Japan 2017 - An Intelligent Storage26
  27. 27. The PG-Strom Project Towards columnar structure (1/3) – On-memory Columnar Cache 27 ▌On-memory Columnar Cache  Caches PostgreSQL’s data-blocks (row-data) on memory as column-data  Reduction of data transfer between RAM<->GPU  Optimization of memory bandwidth of GPU  Switch GPU kernel to be invoked according to existence of column-cache Open Source Summit Japan 2017 - An Intelligent Storage BlockNum = 101 all_visible=false BlockNum = 102 all_visible=true BlockNum = 103 all_visible=true BlockNum = 104 all_visible=true Columnar Cache (BlockNum=102) Columnar Cache (BlockNum=102) background worker Construction UPDATE invalidation PG-Strom Row-data GPU Kernel for Row Column-data GPU Kernel for Column Row-data GPU Kernel for Row Column-data GPU Kernel for Column
  28. 28. The PG-Strom Project Towards columnar structure (2/3) – Background 28 ▌Row data format – Random memory access  Less usage ratio of memory bus, larger number of memory transactions Open Source Summit Japan 2017 - An Intelligent Storage ▌Column data format – Coalesced memory access  Optimal memory bus usage, least number of memory transactions 32bit Memory transaction width: 256bit 32bit 32bit32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit 32bit Memory transaction width: 256bit 32bit x8 = 256bit is valid in 256bit memory transaction. (usage ratio: 100.0%) Only 32bit x1 = 32bit is valid in 256bit memory transaction . (usage ratio: 12.5%) GPU Cores GPU Cores Why columnar structure is preferable for GPU architecture
  29. 29. The PG-Strom Project World of OLTP World of OLAP Towards columnar structure (3/3) – FPGA engine on SSD Open Source Summit Japan 2017 - An Intelligent Storage29 FPGA Logic on SSD enables write by row, then read as column RowColumn Transformer column-format Read of analytic data (column-format) Write of transactional data (row-format) Data-format transformation by FPGA on SSD SQL Execution – based on columnar format; best for GPU Pre-processed data Only required columns are fetched
  30. 30. The PG-Strom Project Towards the “true” intelligent storage system Open Source Summit Japan 2017 - An Intelligent Storage30 Large PostgreSQL Tables PCIe Bus NVMe SSD GPUSSD-to-GPU P2P DMA (NVMe-Strom driver) WHERE JOIN GROUP BYPostgreSQL Data Block Reduction of unnecessary rows according to the SQL  Smaller amount of I/O Step-1: A pair of NVMe-SSD and GPU looks like an intelligent storage which understand SQL workloads for filtering-out unnecessary rows and pre-process of JOIN/GROUP BY prior to data loading onto the main memory. Step-2: A special NVMe-SSD transforms data format for more suitable one to process, based on knowledge of the requester. (PostgreSQL’s block format in this case)
  31. 31. Any Questions?

×