This document discusses using PostgreSQL and GPU acceleration to build a machine learning platform. It describes HeteroDB, which provides database and analytics acceleration using GPUs. It outlines how PostgreSQL's foreign data wrapper Gstore_fdw manages persistent GPU device memory, allowing data to remain on the GPU between queries for faster analytics. Gstore_fdw also enables inter-process data collaboration by allowing processes to share access to GPU memory using IPC handles. This facilitates integrating PostgreSQL with external analytics code in languages like Python.
The document describes benchmark results achieved by using NVMe SSDs and GPU acceleration to improve the performance of PostgreSQL beyond typical limitations. A benchmark was run using 13 queries on a 1055GB dataset with PostgreSQL v11beta3 + PG-Strom v2.1. This achieved a maximum query execution throughput of 13.5GB/s. PG-Strom is an extension module that uses thousands of GPU cores and wide-band memory to accelerate SQL workloads. It generates GPU code from SQL and executes queries directly on the GPU, bypassing data transfers between CPU and GPU to improve performance.
PL/CUDA allows writing user-defined functions in CUDA C that can run on a GPU. This provides benefits for analytics workloads that can utilize thousands of GPU cores and wide memory bandwidth. A sample logistic regression implementation in PL/CUDA showed a 350x speedup compared to a CPU-based implementation in MADLib. Logistic regression performs binary classification by estimating weights for explanatory variables and intercept through iterative updates. This is well-suited to parallelization on a GPU.
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
The document describes a technology called PG-Strom that uses GPU acceleration to optimize I/O performance for PostgreSQL. PG-Strom allows data to be transferred directly from NVMe SSDs to the GPU over the PCIe bus, bypassing the CPU and RAM. This reduces data movement and allows PostgreSQL queries to be partially executed directly on the GPU. Benchmark results show the approach can achieve throughput close to the theoretical hardware limits for a single server configuration processing large datasets.
The document discusses PG-Strom, an open source project that uses GPU acceleration for PostgreSQL. PG-Strom allows for automatic generation of GPU code from SQL queries, enabling transparent acceleration of operations like WHERE clauses, JOINs, and GROUP BY through thousands of GPU cores. It introduces PL/CUDA, which allows users to write custom CUDA kernels and integrate them with PostgreSQL for manual optimization of complex algorithms. A case study on k-nearest neighbor similarity search for drug discovery is presented to demonstrate PG-Strom's ability to accelerate computational workloads through GPU processing.
1) The PG-Strom project aims to accelerate PostgreSQL queries using GPUs. It generates CUDA code from SQL queries and runs them on Nvidia GPUs for parallel processing.
2) Initial results show PG-Strom can be up to 10 times faster than PostgreSQL for queries involving large table joins and aggregations.
3) Future work includes better supporting columnar formats and integrating with PostgreSQL's native column storage to improve performance further.
This document discusses using GPUs and SSDs to accelerate PostgreSQL queries. It introduces PG-Strom, a project that generates CUDA code from SQL to execute queries massively in parallel on GPUs. The document proposes enhancing PG-Strom to directly transfer data from SSDs to GPUs without going through CPU/RAM, in order to filter and join tuples during loading for further acceleration. Challenges include improving the NVIDIA driver for NVMe devices and tracking shared buffer usage to avoid unnecessary transfers. The goal is to maximize query performance by leveraging the high bandwidth and parallelism of GPUs and SSDs.
The document describes benchmark results achieved by using NVMe SSDs and GPU acceleration to improve the performance of PostgreSQL beyond typical limitations. A benchmark was run using 13 queries on a 1055GB dataset with PostgreSQL v11beta3 + PG-Strom v2.1. This achieved a maximum query execution throughput of 13.5GB/s. PG-Strom is an extension module that uses thousands of GPU cores and wide-band memory to accelerate SQL workloads. It generates GPU code from SQL and executes queries directly on the GPU, bypassing data transfers between CPU and GPU to improve performance.
PL/CUDA allows writing user-defined functions in CUDA C that can run on a GPU. This provides benefits for analytics workloads that can utilize thousands of GPU cores and wide memory bandwidth. A sample logistic regression implementation in PL/CUDA showed a 350x speedup compared to a CPU-based implementation in MADLib. Logistic regression performs binary classification by estimating weights for explanatory variables and intercept through iterative updates. This is well-suited to parallelization on a GPU.
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
The document describes a technology called PG-Strom that uses GPU acceleration to optimize I/O performance for PostgreSQL. PG-Strom allows data to be transferred directly from NVMe SSDs to the GPU over the PCIe bus, bypassing the CPU and RAM. This reduces data movement and allows PostgreSQL queries to be partially executed directly on the GPU. Benchmark results show the approach can achieve throughput close to the theoretical hardware limits for a single server configuration processing large datasets.
The document discusses PG-Strom, an open source project that uses GPU acceleration for PostgreSQL. PG-Strom allows for automatic generation of GPU code from SQL queries, enabling transparent acceleration of operations like WHERE clauses, JOINs, and GROUP BY through thousands of GPU cores. It introduces PL/CUDA, which allows users to write custom CUDA kernels and integrate them with PostgreSQL for manual optimization of complex algorithms. A case study on k-nearest neighbor similarity search for drug discovery is presented to demonstrate PG-Strom's ability to accelerate computational workloads through GPU processing.
1) The PG-Strom project aims to accelerate PostgreSQL queries using GPUs. It generates CUDA code from SQL queries and runs them on Nvidia GPUs for parallel processing.
2) Initial results show PG-Strom can be up to 10 times faster than PostgreSQL for queries involving large table joins and aggregations.
3) Future work includes better supporting columnar formats and integrating with PostgreSQL's native column storage to improve performance further.
This document discusses using GPUs and SSDs to accelerate PostgreSQL queries. It introduces PG-Strom, a project that generates CUDA code from SQL to execute queries massively in parallel on GPUs. The document proposes enhancing PG-Strom to directly transfer data from SSDs to GPUs without going through CPU/RAM, in order to filter and join tuples during loading for further acceleration. Challenges include improving the NVIDIA driver for NVMe devices and tracking shared buffer usage to avoid unnecessary transfers. The goal is to maximize query performance by leveraging the high bandwidth and parallelism of GPUs and SSDs.
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~Kohei KaiGai
GPU processing provides significant performance gains for PostgreSQL according to benchmarks. PG-Strom is an open source project that allows PostgreSQL to leverage GPUs for processing queries. It generates CUDA code from SQL queries to accelerate operations like scans, joins, and aggregations by massive parallel processing on GPU cores. Performance tests show orders of magnitude faster response times for queries involving multiple joins and aggregations when using PG-Strom compared to the regular PostgreSQL query executor. Further development aims to support more data types and functions for GPU processing.
This document describes using in-place computing on PostgreSQL to perform statistical analysis directly on data stored in a PostgreSQL database. Key points include:
- An F-test is used to compare the variances of accelerometer data from different phone models (Nexus 4 and S3 Mini) and activities (walking and biking).
- Performing the F-test directly in PostgreSQL via SQL queries is faster than exporting the data to an R script, as it avoids the overhead of data transfer.
- PG-Strom, an extension for PostgreSQL, is used to generate CUDA code on-the-fly to parallelize the variance calculations on a GPU, further speeding up the F-test.
PL/CUDA allows running CUDA C code directly in PostgreSQL user-defined functions. This allows advanced analytics and machine learning algorithms to be run directly in the database.
The gstore_fdw foreign data wrapper allows data to be stored directly in GPU memory, accessed via SQL, eliminating the overhead of copying data between CPU and GPU memory for each query.
Integrating PostgreSQL with GPU computing and machine learning frameworks allows for fast data exploration and model training by combining flexible SQL queries with high-performance analytics directly on the data.
The column-oriented data structure of PG-Strom stores data in separate column storage (CS) tables based on the column type, with indexes to enable efficient lookups. This reduces data transfer compared to row-oriented storage and improves GPU parallelism by processing columns together.
This document discusses GPU accelerated computing and programming with GPUs. It provides characteristics of GPUs from Nvidia, AMD, and Intel including number of cores, memory size and bandwidth, and power consumption. It also outlines the 7 steps for programming with GPUs which include building and loading a GPU kernel, allocating device memory, transferring data between host and device memory, setting kernel arguments, enqueueing kernel execution, transferring results back, and synchronizing the command queue. The goal is to achieve super parallel execution with GPUs.
This document provides an introduction to HeteroDB, Inc. and its chief architect, KaiGai Kohei. It discusses PG-Strom, an open source PostgreSQL extension developed by HeteroDB for high performance data processing using heterogeneous architectures like GPUs. PG-Strom uses techniques like SSD-to-GPU direct data transfer and a columnar data store to accelerate analytics and reporting workloads on terabyte-scale log data using GPUs and NVMe SSDs. Benchmark results show PG-Strom can process terabyte workloads at throughput nearing the hardware limit of the storage and network infrastructure.
This document discusses using HyperLogLog (HLL) to estimate cardinality for count(distinct) queries in PostgreSQL.
HLL is an algorithm that uses constant memory to estimate the number of unique elements in a large set. It works by mapping elements to registers in a bitmap and tracking the number of leading zeros in each hash value. The harmonic mean of these counts is used to estimate cardinality.
PG-Strom implements HLL in PostgreSQL to enable fast count(distinct) queries on GPUs. On a table with 60 million rows and 87GB in size, HLL estimated the distinct count within 0.3% accuracy in just 9 seconds, over 40x faster than the regular count(distinct).
K-Means clustering is a popular clustering algorithm in data mining. Clustering large data sets can be time consuming, and in an attempt to minimize this time, our project is a parallel implementation of K-Means clustering algorithm on CUDA using C. We present the performance analysis and implementation of our approach to parallelizing K-Means clustering.
PgOpenCL is a new PostgreSQL procedural language that allows developers to write OpenCL kernels to harness the parallel processing power of GPUs. It introduces a new execution model where tables can be copied to arrays, passed to an OpenCL kernel for parallel operations on the GPU, and results copied back to tables. This unlock the potential for dramatically improved performance on compute-intensive database operations like joins, aggregations, and sorting.
Parallel Implementation of K Means Clustering on CUDAprithan
K-Means clustering is a popular clustering algorithm in data mining. Clustering large data sets can be
time consuming, and in an attempt to minimize this time, our project is a parallel implementation of KMeans
clustering algorithm on CUDA using C. We present the performance analysis and implementation
of our approach to parallelizing K-Means clustering.
PG-Strom - A FDW module utilizing GPU deviceKohei KaiGai
PG-Strom is a module that utilizes GPUs to accelerate query processing in PostgreSQL. It uses a foreign data wrapper to push query execution to the GPU. Benchmark results show a query running 10 times faster on a table using the PG-Strom FDW compared to a regular PostgreSQL table. Future plans include supporting writable foreign tables, accelerating sort and aggregate operations using the GPU, and inheritance between regular and foreign tables. Help from the community is needed to review code, provide large real-world datasets, and understand common analytic queries.
아파치 네모로 빠르고 효율적으로 빅데이터 처리하기
- 송원욱, 양영석(서울대학교 컴퓨터공학부 소프트웨어 플랫폼 연구실)
개요 #
아파치 네모(Apache Nemo)는 빅데이터 애플리케이션의 분산 수행 방식을 다양한 자원 환경 및 데이터 특성에 맞춰 최적화하는 시스템입니다. Geo-distributed resources, transient resources, large data shuffle, skewed data 처리 상황에서 아파치 네모는 아파치 스파크(Apache Spark) 보다 월등하게 높은 성능을 보입니다.
목차 #
아파치 네모의 최적화 케이스 스터디
아파치 네모의 분산 실행 과정
앞으로의 연구 방향
The RAPIDS suite of software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
PG-Strom is an extension of PostgreSQL that utilizes GPUs and NVMe SSDs to enable terabyte-scale data processing and in-database analytics. It features SSD-to-GPU Direct SQL, which loads data directly from NVMe SSDs to GPUs using RDMA, bypassing CPU and RAM. This improves query performance by reducing I/O traffic over the PCIe bus. PG-Strom also uses Apache Arrow columnar storage format to further boost performance by transferring only referenced columns and enabling vector processing on GPUs. Benchmark results show PG-Strom can process over a billion rows per second on a simple 1U server configuration with an NVIDIA GPU and multiple NVMe SSDs.
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~Kohei KaiGai
GPU processing provides significant performance gains for PostgreSQL according to benchmarks. PG-Strom is an open source project that allows PostgreSQL to leverage GPUs for processing queries. It generates CUDA code from SQL queries to accelerate operations like scans, joins, and aggregations by massive parallel processing on GPU cores. Performance tests show orders of magnitude faster response times for queries involving multiple joins and aggregations when using PG-Strom compared to the regular PostgreSQL query executor. Further development aims to support more data types and functions for GPU processing.
This document describes using in-place computing on PostgreSQL to perform statistical analysis directly on data stored in a PostgreSQL database. Key points include:
- An F-test is used to compare the variances of accelerometer data from different phone models (Nexus 4 and S3 Mini) and activities (walking and biking).
- Performing the F-test directly in PostgreSQL via SQL queries is faster than exporting the data to an R script, as it avoids the overhead of data transfer.
- PG-Strom, an extension for PostgreSQL, is used to generate CUDA code on-the-fly to parallelize the variance calculations on a GPU, further speeding up the F-test.
PL/CUDA allows running CUDA C code directly in PostgreSQL user-defined functions. This allows advanced analytics and machine learning algorithms to be run directly in the database.
The gstore_fdw foreign data wrapper allows data to be stored directly in GPU memory, accessed via SQL, eliminating the overhead of copying data between CPU and GPU memory for each query.
Integrating PostgreSQL with GPU computing and machine learning frameworks allows for fast data exploration and model training by combining flexible SQL queries with high-performance analytics directly on the data.
The column-oriented data structure of PG-Strom stores data in separate column storage (CS) tables based on the column type, with indexes to enable efficient lookups. This reduces data transfer compared to row-oriented storage and improves GPU parallelism by processing columns together.
This document discusses GPU accelerated computing and programming with GPUs. It provides characteristics of GPUs from Nvidia, AMD, and Intel including number of cores, memory size and bandwidth, and power consumption. It also outlines the 7 steps for programming with GPUs which include building and loading a GPU kernel, allocating device memory, transferring data between host and device memory, setting kernel arguments, enqueueing kernel execution, transferring results back, and synchronizing the command queue. The goal is to achieve super parallel execution with GPUs.
This document provides an introduction to HeteroDB, Inc. and its chief architect, KaiGai Kohei. It discusses PG-Strom, an open source PostgreSQL extension developed by HeteroDB for high performance data processing using heterogeneous architectures like GPUs. PG-Strom uses techniques like SSD-to-GPU direct data transfer and a columnar data store to accelerate analytics and reporting workloads on terabyte-scale log data using GPUs and NVMe SSDs. Benchmark results show PG-Strom can process terabyte workloads at throughput nearing the hardware limit of the storage and network infrastructure.
This document discusses using HyperLogLog (HLL) to estimate cardinality for count(distinct) queries in PostgreSQL.
HLL is an algorithm that uses constant memory to estimate the number of unique elements in a large set. It works by mapping elements to registers in a bitmap and tracking the number of leading zeros in each hash value. The harmonic mean of these counts is used to estimate cardinality.
PG-Strom implements HLL in PostgreSQL to enable fast count(distinct) queries on GPUs. On a table with 60 million rows and 87GB in size, HLL estimated the distinct count within 0.3% accuracy in just 9 seconds, over 40x faster than the regular count(distinct).
K-Means clustering is a popular clustering algorithm in data mining. Clustering large data sets can be time consuming, and in an attempt to minimize this time, our project is a parallel implementation of K-Means clustering algorithm on CUDA using C. We present the performance analysis and implementation of our approach to parallelizing K-Means clustering.
PgOpenCL is a new PostgreSQL procedural language that allows developers to write OpenCL kernels to harness the parallel processing power of GPUs. It introduces a new execution model where tables can be copied to arrays, passed to an OpenCL kernel for parallel operations on the GPU, and results copied back to tables. This unlock the potential for dramatically improved performance on compute-intensive database operations like joins, aggregations, and sorting.
Parallel Implementation of K Means Clustering on CUDAprithan
K-Means clustering is a popular clustering algorithm in data mining. Clustering large data sets can be
time consuming, and in an attempt to minimize this time, our project is a parallel implementation of KMeans
clustering algorithm on CUDA using C. We present the performance analysis and implementation
of our approach to parallelizing K-Means clustering.
PG-Strom - A FDW module utilizing GPU deviceKohei KaiGai
PG-Strom is a module that utilizes GPUs to accelerate query processing in PostgreSQL. It uses a foreign data wrapper to push query execution to the GPU. Benchmark results show a query running 10 times faster on a table using the PG-Strom FDW compared to a regular PostgreSQL table. Future plans include supporting writable foreign tables, accelerating sort and aggregate operations using the GPU, and inheritance between regular and foreign tables. Help from the community is needed to review code, provide large real-world datasets, and understand common analytic queries.
아파치 네모로 빠르고 효율적으로 빅데이터 처리하기
- 송원욱, 양영석(서울대학교 컴퓨터공학부 소프트웨어 플랫폼 연구실)
개요 #
아파치 네모(Apache Nemo)는 빅데이터 애플리케이션의 분산 수행 방식을 다양한 자원 환경 및 데이터 특성에 맞춰 최적화하는 시스템입니다. Geo-distributed resources, transient resources, large data shuffle, skewed data 처리 상황에서 아파치 네모는 아파치 스파크(Apache Spark) 보다 월등하게 높은 성능을 보입니다.
목차 #
아파치 네모의 최적화 케이스 스터디
아파치 네모의 분산 실행 과정
앞으로의 연구 방향
The RAPIDS suite of software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
PG-Strom is an extension of PostgreSQL that utilizes GPUs and NVMe SSDs to enable terabyte-scale data processing and in-database analytics. It features SSD-to-GPU Direct SQL, which loads data directly from NVMe SSDs to GPUs using RDMA, bypassing CPU and RAM. This improves query performance by reducing I/O traffic over the PCIe bus. PG-Strom also uses Apache Arrow columnar storage format to further boost performance by transferring only referenced columns and enabling vector processing on GPUs. Benchmark results show PG-Strom can process over a billion rows per second on a simple 1U server configuration with an NVIDIA GPU and multiple NVMe SSDs.
1) NVIDIA-Iguazio Accelerated Solutions for Deep Learning and Machine Learning (30 mins):
About the speaker:
Dr. Gabriel Noaje, Senior Solutions Architect, NVIDIA
http://bit.ly/GabrielNoaje
2) GPUs in Data Science Pipelines ( 30 mins)
- GPU as a Service for enterprise AI
- A short demo on the usage of GPUs for model training and model inferencing within a data science workflow
About the speaker:
Anant Gandhi, Solutions Engineer, Iguazio Singapore. https://www.linkedin.com/in/anant-gandhi-b5447614/
S51281 - Accelerate Data Science in Python with RAPIDS_1679330128290001YmT7.pdfDLow6
RAPIDS accelerates data science and machine learning workflows in Python by leveraging GPUs. It includes cuDF for GPU-accelerated pandas functionality, cuML for scikit-learn compatible machine learning algorithms, cuGraph for graph analytics, and integrations with Dask and Spark. RAPIDS has a large community of contributors and is used by many Fortune 100 companies to speed up workflows, reduce costs, and scale to large datasets.
This is from a talk we gave at Graph the Planet 2019 where we demonstrated how BlazingSQL quickly queried Apache Parquet files and seamlessly sent the results to Graphistry to beautifully visualize Netflow Data in a Graph.
Distributed Computing on PostgreSQL | PGConf EU 2017 | Marco SlotCitus Data
One of the unique things about postgres is that extensions can add new functionality that falls outside the scope of a SQL database. Several postgres extensions add the ability to perform commands or query data on other servers. These extensions can be combined in interesting ways to form advanced distributed systems on top of postgres.
In this talk we will explore how extensions such as dblink, postgres_fdw, pglogical, pg_cron, and citus together with PL/pgSQL can be used as building blocks for distributed systems. We will give several demonstrations of using PostgreSQL as a distributed computing platform, including a MapReduce implementation that can transform very large tables, and a Kafka-like distributed queue.
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSDatabricks
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Abstract: We will introduce RAPIDS, a suite of open source libraries for GPU-accelerated data science, and illustrate how it operates seamlessly with MLflow to enable reproducible training, model storage, and deployment. We will walk through a baseline example that incorporates MLflow locally, with a simple SQLite backend, and briefly introduce how the same workflow can be deployed in the context of GPU enabled Kubernetes clusters.
PG-Strom is an open source PostgreSQL extension that accelerates analytic queries using GPUs. Key features of version 2.0 include direct loading of data from SSDs to GPU memory for processing, an in-memory columnar data cache for efficient GPU querying, and a foreign data wrapper that allows data to be stored directly in GPU memory and queried using SQL. These features improve performance by reducing data movement and leveraging the GPU's parallel architecture. Benchmark results show the new version providing over 3.5x faster query throughput for large datasets compared to PostgreSQL alone.
In this deck from FOSDEM'19, Thomas Schwinge presents: Speeding up Programs with OpenACC in GCC.
"Proven in production use for decades, GCC (the GNU Compiler Collection) offers C, C++, Fortran, and other compilers for a multitude of target systems. Over the last few years, we -- formerly known as "CodeSourcery", now a group in "Mentor, a Siemens Business" -- added support for the directive-based OpenACC programming model. Requiring only few changes to your existing source code, OpenACC allows for easy parallelization and code offloading to accelerators such as GPUs. We will present a short introduction of GCC and OpenACC, implementation status, examples, and performance results.
OpenACC is a user-driven directive-based performance-portable parallel programming model designed for scientists and engineers interested in porting their codes to a wide-variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort than required with a low-level model."
Watch the video: https://wp.me/p3RLHQ-jOR
Learn more: https://fosdem.org/2019/
and
https://www.openacc.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages toolslaparuma
The document discusses GPU computing software and development tools from NVIDIA. It provides an overview of language and API options for GPU computing like CUDA, OpenCL, and DirectCompute. It also describes development tools like the CUDA toolkit, Parallel Nsight debugger, and Visual Profiler for optimizing GPU applications. Foundation libraries and common design patterns are also covered to help developers leverage GPUs effectively.
This document discusses measuring and optimizing data processing performance on CPUs and GPUs. It provides examples using Cupy, NumPy, Apache Ignite, and ROCm to perform operations and measure throughput on different hardware. Specific examples include sorting random data on CPU and GPU to compare performance, running Apache Ignite benchmarks on different CPUs, and using ROCm and PyOpenCL to leverage an AMD GPU for general-purpose computing. The document emphasizes that tightly coupling fast memory and computation is important for workloads with large data processing requirements.
The document describes Griffon, a GPU programming API for scientific and general purpose applications. Griffon aims to provide a simple programming model like OpenMP while achieving high performance computation on GPUs like CUDA. It uses a source-to-source compiler with directives to generate CUDA code and handles memory management automatically. The document outlines Griffon's objectives, software architecture, directives for parallel regions, kernel flow control, synchronization, and overlapping GPU/CPU computation.
The document is a presentation about new features in PostgreSQL 9.6. It discusses several major new features including parallel queries, avoiding VACUUM on all-frozen pages using freeze maps, monitoring the progress of VACUUM, phrase full text search, multiple synchronous replication, remote_apply synchronous commit, and improved capabilities of the postgres_fdw extension including pushing down sorts, joins, updates and deletes to remote servers.
The document provides an overview of PostgreSQL best practices, including installation, configuration, performance optimization, and security. It discusses setting up a PostgreSQL cluster, optimizing the operating system, installing PostgreSQL, securing the database with configuration files, tuning main PostgreSQL parameters, and performing backups and recovery. It also outlines an OLTP performance benchmark comparing PostgreSQL to Oracle configurations and results.
This document provides an update on PGI compilers and tools for heterogeneous supercomputing. It discusses PGI's support for OpenACC directives to accelerate applications on multicore CPUs and NVIDIA GPUs from a single source. It highlights new compiler features including support for Intel Skylake, AMD EPYC and IBM POWER9 CPUs as well as NVIDIA Volta GPUs. Benchmark results show strong performance of OpenACC applications on these platforms. The document also discusses the growing adoption of OpenACC in HPC applications and resources available to support OpenACC development.
The document provides an overview of PostgreSQL best practices from initial setup to an OLTP performance benchmark against Oracle. It discusses PostgreSQL architecture, installation options, securing the PostgreSQL cluster, main configuration parameters, backup and recovery strategies. It then details the results of an OLTP performance benchmark test between PostgreSQL and Oracle using the same hardware, workload, and configuration. The test found Oracle had slightly better performance with a shorter completion time and higher maximum transactions per minute compared to PostgreSQL.
BlazingSQL gave a talk at the GPU Technology conference in San Jose 2019 to talk about our GPU-accelerated SQL engine on the open source RAPIDS AI stack. Directly SQL any data source and raw files into GPU memory with BlazingSQL and 5 lines of Python code!
1) The document describes a presentation on using SSD-to-GPU direct SQL to accelerate PostgreSQL by bypassing CPU/RAM and loading data directly from NVMe SSD to GPU.
2) Benchmark results show the technique achieved query execution performance up to 13.5GB/s, close to the raw I/O limitation of the hardware configuration.
3) One challenge discussed is performing partition-wise joins efficiently across multiple SSDs and GPUs to avoid gathering large results sets.
Transforming Product Development using OnePlan To Boost Efficiency and Innova...OnePlan Solutions
Ready to overcome challenges and drive innovation in your organization? Join us in our upcoming webinar where we discuss how to combat resource limitations, scope creep, and the difficulties of aligning your projects with strategic goals. Discover how OnePlan can revolutionize your product development processes, helping your team to innovate faster, manage resources more effectively, and deliver exceptional results.
How Can Hiring A Mobile App Development Company Help Your Business Grow?ToXSL Technologies
ToXSL Technologies is an award-winning Mobile App Development Company in Dubai that helps businesses reshape their digital possibilities with custom app services. As a top app development company in Dubai, we offer highly engaging iOS & Android app solutions. https://rb.gy/necdnt
What is Continuous Testing in DevOps - A Definitive Guide.pdfkalichargn70th171
Once an overlooked aspect, continuous testing has become indispensable for enterprises striving to accelerate application delivery and reduce business impacts. According to a Statista report, 31.3% of global enterprises have embraced continuous integration and deployment within their DevOps, signaling a pervasive trend toward hastening release cycles.
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
Consistent toolbox talks are critical for maintaining workplace safety, as they provide regular opportunities to address specific hazards and reinforce safe practices.
These brief, focused sessions ensure that safety is a continual conversation rather than a one-time event, which helps keep safety protocols fresh in employees' minds. Studies have shown that shorter, more frequent training sessions are more effective for retention and behavior change compared to longer, infrequent sessions.
Engaging workers regularly, toolbox talks promote a culture of safety, empower employees to voice concerns, and ultimately reduce the likelihood of accidents and injuries on site.
The traditional method of conducting safety talks with paper documents and lengthy meetings is not only time-consuming but also less effective. Manual tracking of attendance and compliance is prone to errors and inconsistencies, leading to gaps in safety communication and potential non-compliance with OSHA regulations. Switching to a digital solution like Safelyio offers significant advantages.
Safelyio automates the delivery and documentation of safety talks, ensuring consistency and accessibility. The microlearning approach breaks down complex safety protocols into manageable, bite-sized pieces, making it easier for employees to absorb and retain information.
This method minimizes disruptions to work schedules, eliminates the hassle of paperwork, and ensures that all safety communications are tracked and recorded accurately. Ultimately, using a digital platform like Safelyio enhances engagement, compliance, and overall safety performance on site. https://safelyio.com/
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Paul Brebner
Closing talk for the Performance Engineering track at Community Over Code EU (Bratislava, Slovakia, June 5 2024) https://eu.communityovercode.org/sessions/2024/why-apache-kafka-clusters-are-like-galaxies-and-other-cosmic-kafka-quandaries-explored/ Instaclustr (now part of NetApp) manages 100s of Apache Kafka clusters of many different sizes, for a variety of use cases and customers. For the last 7 years I’ve been focused outwardly on exploring Kafka application development challenges, but recently I decided to look inward and see what I could discover about the performance, scalability and resource characteristics of the Kafka clusters themselves. Using a suite of Performance Engineering techniques, I will reveal some surprising discoveries about cosmic Kafka mysteries in our data centres, related to: cluster sizes and distribution (using Zipf’s Law), horizontal vs. vertical scalability, and predicting Kafka performance using metrics, modelling and regression techniques. These insights are relevant to Kafka developers and operators.
UI5con 2024 - Bring Your Own Design SystemPeter Muessig
How do you combine the OpenUI5/SAPUI5 programming model with a design system that makes its controls available as Web Components? Since OpenUI5/SAPUI5 1.120, the framework supports the integration of any Web Components. This makes it possible, for example, to natively embed own Web Components of your design system which are created with Stencil. The integration embeds the Web Components in a way that they can be used naturally in XMLViews, like with standard UI5 controls, and can be bound with data binding. Learn how you can also make use of the Web Components base class in OpenUI5/SAPUI5 to also integrate your Web Components and get inspired by the solution to generate a custom UI5 library providing the Web Components control wrappers for the native ones.
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
The Rising Future of CPaaS in the Middle East 2024Yara Milbes
Explore "The Rising Future of CPaaS in the Middle East in 2024" with this comprehensive PPT presentation. Discover how Communication Platforms as a Service (CPaaS) is transforming communication across various sectors in the Middle East.
Boost Your Savings with These Money Management AppsJhone kinadey
A money management app can transform your financial life by tracking expenses, creating budgets, and setting financial goals. These apps offer features like real-time expense tracking, bill reminders, and personalized insights to help you save and manage money effectively. With a user-friendly interface, they simplify financial planning, making it easier to stay on top of your finances and achieve long-term financial stability.
Using Query Store in Azure PostgreSQL to Understand Query PerformanceGrant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
DevOps Consulting Company | Hire DevOps Servicesseospiralmantra
Spiral Mantra excels in providing comprehensive DevOps services, including Azure and AWS DevOps solutions. As a top DevOps consulting company, we offer controlled services, cloud DevOps, and expert consulting nationwide, including Houston and New York. Our skilled DevOps engineers ensure seamless integration and optimized operations for your business. Choose Spiral Mantra for superior DevOps services.
https://www.spiralmantra.com/devops/
Liberarsi dai framework con i Web Component.pptxMassimo Artizzu
In Italian
Presentazione sulle feature e l'utilizzo dei Web Component nell sviluppo di pagine e applicazioni web. Racconto delle ragioni storiche dell'avvento dei Web Component. Evidenziazione dei vantaggi e delle sfide poste, indicazione delle best practices, con particolare accento sulla possibilità di usare web component per facilitare la migrazione delle proprie applicazioni verso nuovi stack tecnologici.
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...kalichargn70th171
In today's fiercely competitive mobile app market, the role of the QA team is pivotal for continuous improvement and sustained success. Effective testing strategies are essential to navigate the challenges confidently and precisely. Ensuring the perfection of mobile apps before they reach end-users requires thoughtful decisions in the testing plan.
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
20181025_pgconfeu_lt_gstorefdw
1. PostgreSQL as a Machine-Learning Platform
〜Gstore_fdw and data collaboration〜
HeteroDB,Inc
Chief Architect & CEO
KaiGai Kohei <kaigai@heterodb.com>
2. about HeteroDB
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-2
Corporate overview
Name HeteroDB,Inc
Established 4th-Jul-2017
Headcount 2 (KaiGai and Kashiwagi)
Location Shinagawa, Tokyo, Japan
Businesses Sales of accelerated database product
Technical consulting on GPU&DB region
By the heterogeneous-computing technology on the database area,
we provides a useful, fast and cost-effective data analytics platform
for all the people who need the power of analytics.
CEO Profile
KaiGai Kohei – He has contributed for PostgreSQL and Linux kernel
development in the OSS community more than ten years, especially,
for security and database federation features of PostgreSQL.
Award of “Genius Programmer” by IPA MITOH program (2007)
The top-5 posters finalist at GPU Technology Conference 2017.
3. Friday 11:50 - 12:40
NVME and GPU accelerates
PostgreSQL
Features of RDBMS
✓ High-availability / Clustering
✓ DB administration and backup
✓ Transaction control
✓ BI and visualization
➔ We can use the products that
support PostgreSQL as-is.
Core technology – PG-Strom
PG-Strom: An extension module for PostgreSQL, to accelerate SQL
workloads by the thousands cores and wide-band memory of GPU.
GPU
Big-data Analytics
PG-Strom
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-3
Machine-learning & Statistics
4. GPU’s characteristics - mostly as a computing accelerator
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-4
Over 10years history in HPC, then massive popularization in Machine-Learning
NVIDIA Tesla V100
Super Computer
(TITEC; TSUBAME3.0) Computer Graphics Machine-Learning
How PG-Strom utilizes the power of GPU for in-database analytics?
Simulation
6. PGconf.SV 2016 at SunFrancisco
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-6
Acceleration of drug-discovery workloads
with in-database analytics approach
using PL/CUDA user defined function
7. PL/CUDA Used Defined Function
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-7
Result
Scan
Pre-Process
Analytics
Post-ProcessCREATE FUNCTION
my_logic(int, real[], real[])
RETURNS real[]
AS $$
$$ LANGUAGE ‘plcuda’;
Custom CUDA C code block
(runs on GPU device)
▌Don’t export the dataset for analytics using external software.
▌All you pull out from the database is “result” of analytics.
8. PL/CUDA works on Drug-Discovery workloads (1/2)
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-8
Database chemical
compounds set
(D; 10M records scale)
Query chemical
compounds set
(Q; ~1000 records scale)
Calculation of
their similarity
Target Protein “similar compounds” will
have higher probability of active
10 billions
combination
Similarity-search on chemical compounds is researcher’s daily job.
9. PL/CUDA works on Drug-Discovery workloads (2/2)
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-9
30.25
145.29
295.95
1503.31
3034.94
13.00 13.23 13.59 16.01 19.13
0
500
1000
1500
2000
2500
3000
3500
10 50 100 500 1000
QueryResponseTime[sec]
Number of Query Compounds [Q]
Similarity search of chemical compounds by k-NN method (k=3, D=10M)
CPU(E5-2670v3) GTX1080
Yes, GPU accelerates the workloads more than x150 time faster!
x150 times
shorter
response!
11. PL/CUDA works on Drug-Discovery workloads (2/2)
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-11
30.25
145.29
295.95
1503.31
3034.94
13.00 13.23 13.59 16.01 19.13
0
500
1000
1500
2000
2500
3000
3500
10 50 100 500 1000
QueryResponseTime[sec]
Number of Query Compounds [Q]
Similarity search of chemical compounds by k-NN method (k=3, D=10M)
CPU(E5-2670v3) GTX1080
CPU version consumes time according to the scale of calculation
x100 times larger execution time
for x100 times larger calculation amount
12. PL/CUDA works on Drug-Discovery workloads (2/2)
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-12
30.25
145.29
295.95
1503.31
3034.94
13.00 13.23 13.59 16.01 19.13
0
500
1000
1500
2000
2500
3000
3500
10 50 100 500 1000
QueryResponseTime[sec]
Number of Query Compounds [Q]
Similarity search of chemical compounds by k-NN method (k=3, D=10M)
CPU(E5-2670v3) GTX1080
Why GPU version takes relatively longer time for the small workloads
Less than x2 times larger
execution time
for x100 times larger
calculation amount
13. Invocation of PL/CUDA functions
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-13
PREPARE knn_sim_rand_10m_gpu_v2(int) -- arg1:@k-value
AS
SELECT row_number() OVER (),
fp.name,
similarity
FROM (SELECT float4_as_int4(key_id) key_id, similarity
FROM matrix_unnest(
(SELECT rbind( knn_gpu_similarity($1,Q.matrix,
D.matrix))
FROM (SELECT cbind(array_matrix(id),
array_matrix(bitmap)) matrix
FROM finger_print_query) Q,
(SELECT matrix
FROM finger_print_10m_matrix) D
)
) AS sim(key_id real, similarity real)
ORDER BY similarity DESC) sim,
finger_print_10m fp
WHERE fp.id = sim.key_id
LIMIT 1000;
Time consumption by argument setup
(10~11sec per invocation)
16. Gstore_fdw - FDW on behalf of GPU device memory region
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-16
GPU world
Storage
SQL world
GPU device memory
Foreign Table
(gstore_fdw)
INSERT
UPDATE
DELETE
SELECT
Reference
by Zero-copy
✓ Data Format Conversion
✓ Data Compression
✓ Transaction Controls
PL/CUDA
User Defined
Function
17. Gstore_fdw manages persistent device memory (1/4)
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-17
CREATE FOREIGN TABLE ft (
id int,
x0 real,
x1 real,
x2 real,
x3 real,
x4 real,
x5 real,
x6 real,
x7 real,
x8 real,
x9 real
) SERVER gstore_fdw
OPTIONS (pinning '0', format 'pgstrom');
18. Gstore_fdw manages persistent device memory (2/4)
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-18
postgres=# INSERT INTO ft
(SELECT x, 100*random(), 100*random(), 100*random(),
100*random(), 100*random(), 100*random(),
100*random(), 100*random(), 100*random(),
100*random()
FROM generate_series(1,10000000) x);
LOG: alloc: preserved memory 440000320 bytes
INSERT 0 10000000
Acquired 440MB of GPU device memory, then
load the written data chunk to GPU device
19. Gstore_fdw manages persistent device memory (3/4)
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-19
Before INSERT
$ nvidia-smi
Sun Nov 12 00:03:30 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P40 Off | 00000000:02:00.0 Off | 0 |
| N/A 36C P0 52W / 250W | 171MiB / 22912MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 12438 C ...bgworker: PG-Strom GPU memory keeper 161MiB |
+-----------------------------------------------------------------------------+
20. Gstore_fdw manages persistent device memory (3/4)
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-20
After INSERT
$ nvidia-smi
Sun Nov 12 00:06:01 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P40 Off | 00000000:02:00.0 Off | 0 |
| N/A 36C P0 51W / 250W | 591MiB / 22912MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 12438 C ...bgworker: PG-Strom GPU memory keeper 581MiB |
+-----------------------------------------------------------------------------+
Preserved GPU device memory
even if PostgreSQL session closed
23. CUDA Driver API - Interprocess device memory handling
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-23
CUresult cuIpcGetMemHandle(CUipcMemHandle *pHandle,
CUdeviceptr dptr);
Gets an interprocess memory handle for an existing device memory allocation.
CUresult cuIpcOpenMemHandle(CUdeviceptr* pdptr,
CUipcMemHandle handle,
unsigned int flags )
Opens an interprocess memory handle exported from another process and returns a
device pointer usable in the local process.
Gets a unique identifier of GPU device memory at the owner process
Opens the GPU device memory using the unique identifier at the other process
24. PostgreSQL as a Machine-Learning Platform
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-24
GPU world
Inter-Process
Data Collaboration
Storage
SQL world
GPU device
memory
Foreign Table
(gstore_fdw)
INSERT
UPDATE
DELETE
SELECT
User’s Custrom
Python Scripts
IPC Handle
IPC Handle
Internal data structure that
is compatible to Python’s
analytics libraries
ndarray data
ndarray data
Gets a unique identifier of GPU device memory at the owner process
25. Step-1. Export IPC handle of GPU device memory
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-25
postgres=# select gstore_export_ipchandle('ft’);
gstore_export_ipchandle
-------------------------------------------------------------
¥x006b73020000000060110000000000000075020000000000000020000000
0000000000000000000000020000000000005b000000000000002000d0c1ff
00005c
(1 row)
CUDA runtime returns a unique identifier with 64bytes length
26. Step-2. Open IPC handle on your Python script
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-26
#!/usr/bin/python
import psycopg2
import pystrom
# connect to PostgreSQL server
conn = psycopg2.connect("host=localhost dbname=postgres")
# Get IPC handle of the foreign-table ‘ft’
curr = conn.cursor()
curr.execute("select gstore_export_ipchandle('ft')::bytea")
row = curr.fetchone()
conn.close()
# Get cupy.ndarray object; 2D-matrix with float4
# which is consists of column ‘x’, ’y’ and ‘z’
X = pystrom.ipc_import(row[0], ['x','y','z'])
27. Step-3. Data is now already loaded on GPU. Do analytics as you like.
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-27
At Python script:
>>> X
array([[0.05267062, 0.15842682, 0.95535886],
[0.8110889 , 0.75173104, 0.09625155],
[0.0950045 , 0.71161145, 0.6916123 ],
...,
[0.32576588, 0.8340051 , 0.82255083],
[0.12769088, 0.23999453, 0.28765103],
[0.07242639, 0.14565416, 0.7454422 ]], dtype=float32)
At PostgreSQL:
postgres=# SELECT * FROM ft LIMIT 5;
id | x | y | z
----+-----------+----------+-----------
1 | 0.0526706 | 0.158427 | 0.955359
2 | 0.811089 | 0.751731 | 0.0962516
3 | 0.0950045 | 0.711611 | 0.691612
4 | 0.051835 | 0.405314 | 0.0207166
5 | 0.598073 | 0.4739 | 0.492226
(5 rows)
28. Step-4. All the stuff in Python, do your analytics workloads on GPUs
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-28
◆ Dot Product
>>> cupy.dot(X[:,0],X[:,1])
array(24974.453, dtype=float32)
◆ Transpose Matrix
>>> cupy.transpose(X)
array([[0.8655484, 0.9804696, 0.43135548, ..., 0.58545816,
0.9951294, 0.14361869],
[0.12646914, 0.92461866, 0.14051293, ..., 0.5793936,
0.7182556 , 0.15441231],
[0.10312917, 0.2307432 , 0.6121663 , ..., 0.78983736,
0.19550513, 0.38183048]], dtype=float32)
29. Our vision for in-database analytics & machine-learning
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-29
gstore_fdw
Data manipulation
on the local sideData
Collaboration
Good bye CSV, Data Management is a suitable job for DBMS
Python runs
statistical analysis &
machine-learning
Data Scientist
are responsible for both of data
management and data analytics;
including machine-learning.
Data Lake
Data Warehouse
postgres_fdw / xxx_fdw
connects remote database for data import.
Available to run filtering, pre-processing
and others on the remote side.
30. Resources
PostgreSQL as Machine-Learning Platform -PGconf.EU 2018-30
▌PG-Strom
GitHub:
https://github.com/heterodb/pg-strom
Documentation:
http://heterodb.github.io/pg-strom/
▌System requirement
Plan to distribute VM image for Microsoft Azure GPU instance
....likely, by end of the November (coming soon!)
Or, your on-premise environment, of course.
https://github.com/heterodb/pg-strom/wiki/002:-HW-Validation-List
▌Contact
ML: pgstrom@heterodb.com
e-mail: kaigai@heterodb.com
Twitter: @kkaigai