PL/CUDA allows writing user-defined functions in CUDA C that run on a GPU. This benefits analytics workloads that can exploit thousands of GPU cores and wide memory bandwidth. A sample logistic regression implementation in PL/CUDA showed a 350x speedup over a CPU-based implementation in MADlib. Logistic regression performs binary classification by estimating the weights of the explanatory variables and the intercept through iterative updates, which is well suited to parallelization on a GPU.
KaiGai's talk at PGconf.EU 2018, Lisbon.
It shows how PG-Strom's SSD2GPU Direct SQL accelerates I/O-intensive big-data queries using a GPU, contrary to common expectations about where GPUs help.
K-Means clustering is a popular clustering algorithm in data mining. Clustering large data sets can be time consuming; in an attempt to minimize this time, our project is a parallel implementation of the K-Means clustering algorithm in CUDA C. We present the performance analysis and implementation of our approach to parallelizing K-Means clustering.
Parallel Implementation of K-Means Clustering on CUDA - prithan
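The two steps a CUDA port parallelizes can be sketched in pure Python: the label-assignment loop is independent per point (one GPU thread per point), while the centroid update is a per-cluster reduction. The function name and the 1-D data are illustrative only, not from the project's code.

```python
# Minimal pure-Python sketch of one K-Means iteration over 1-D points.
def kmeans_step(points, centroids):
    """Run one assignment + update step; returns (labels, new_centroids)."""
    # Assignment: each point picks its nearest centroid (embarrassingly parallel).
    labels = [min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
              for p in points]
    # Update: mean of each cluster's members (a reduction per cluster).
    new_centroids = []
    for k in range(len(centroids)):
        members = [p for p, lab in zip(points, labels) if lab == k]
        new_centroids.append(sum(members) / len(members) if members else centroids[k])
    return labels, new_centroids
```

Iterating `kmeans_step` until the centroids stop moving reproduces Lloyd's algorithm; a CUDA version would keep the points resident in device memory across iterations.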
Processing Big Data Quickly and Efficiently with Apache Nemo
- Wonwook Song, Youngseok Yang (Software Platform Lab, Department of Computer Science and Engineering, Seoul National University)
Overview
Apache Nemo is a system that optimizes how big-data applications are executed in a distributed fashion, adapting to diverse resource environments and data characteristics. When handling geo-distributed resources, transient resources, large data shuffles, and skewed data, Apache Nemo delivers significantly higher performance than Apache Spark.
Contents
Optimization case studies of Apache Nemo
Apache Nemo's distributed execution process
Future research directions
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean) - Gruter
Big data analysis using Tajo on AWS (Hands-on session)
- presented by Young-kyong Ko, data analyst at Gruter
- at Gruter TECHDAY 2014 (Oct. 29 Seoul, Korea)
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data - Hitoshi Sato
Presentation Slides for ExaComm2018, Fourth International Workshop on Communication Architectures for HPC, Big Data, Deep Learning and Clouds at Extreme Scale, in conjunction with International Supercomputing Conference (ISC 2018)
http://nowlab.cse.ohio-state.edu/exacomm/
The search for faster computing remains of great importance to the software community. Relatively inexpensive modern hardware, such as GPUs, allows users to run highly parallel code on thousands, or even millions of cores on distributed systems.
Building efficient GPU software is not a trivial task, often requiring a significant number of engineering hours to attain the best performance. Similarly, distributed computing systems are inherently complex. In recent years, several libraries have been developed to solve such problems, but they often target a single aspect of computing, such as GPU computing with libraries like CuPy, or distributed computing with Dask.
Libraries like Dask and CuPy tend to provide great performance while abstracting away the complexity from non-experts, making them great candidates for developers writing software for a variety of applications. Unfortunately, they are often difficult to combine, at least efficiently.
With the recent introduction of NumPy community standards and protocols, it has become much easier to integrate libraries that share the already well-known NumPy API. Such changes allow libraries like Dask, known for its easy-to-use parallelization and distributed computing capabilities, to defer some of that work to other libraries such as CuPy, giving users the benefits of both distributed and GPU computing with little to no change in their existing software built on the NumPy API.
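A purely illustrative sketch of the dispatch principle behind those protocols: algorithms are written once against a shared array API, and each backend supplies its own implementation. The class names below are invented; real code relies on NumPy's `__array_function__`/`__array_ufunc__` protocols and actual Dask/CuPy arrays.

```python
# Toy illustration (invented classes, not the real NumPy protocol machinery).
class CpuArray:
    """A list-backed array exposing a tiny NumPy-like API."""
    def __init__(self, data):
        self.data = list(data)
    def sum(self):
        return sum(self.data)
    def __len__(self):
        return len(self.data)

class TracingArray(CpuArray):
    """Stands in for an accelerator-backed array: same API, different engine."""
    calls = []
    def sum(self):
        TracingArray.calls.append("sum")  # pretend this dispatched to a GPU kernel
        return super().sum()

def mean(arr):
    # Backend-agnostic: relies only on the shared API (sum and len), just as
    # code written to the NumPy API can run on Dask or CuPy arrays unchanged.
    return arr.sum() / len(arr)
```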
Alpine Data Labs presents a deep dive into our implementation of Multinomial Logistic Regression with Apache Spark. Machine Learning Engineer DB Tsai takes us through the technical implementation details step by step. First, he explains how the state of the art of machine learning on Hadoop is not fulfilling the promise of Big Data. Next, he explains how Spark is a perfect match for machine learning through its in-memory caching capability, demonstrating a 100x performance improvement. Third, he takes us through each aspect of a multinomial logistic regression and how it is developed with the Spark APIs. Fourth, he demonstrates an extension of MLOR and its training parameters. Fifth, he benchmarks MLOR with 11M rows, 123 features, and 11% non-zero elements on a 5-node Hadoop cluster. Finally, he shows Alpine's unique visual environment with Spark and verifies the performance with the job tracker. In conclusion, Alpine supports the state-of-the-art Cloudera and Pivotal Hadoop clusters and performs at a level that far exceeds its nearest competitor.
Multinomial Logistic Regression with Apache Spark - DB Tsai
Logistic regression can be used not only for modeling binary outcomes but also multinomial outcomes with some extension. In this talk, DB will walk through the basic idea of binary logistic regression step by step, and then extend it to the multinomial case. He will show how easy it is with Spark to parallelize this iterative algorithm by utilizing the in-memory RDD cache to scale horizontally (in the number of training samples). However, there is a mathematical limitation on scaling vertically (in the number of training features), while many recent applications in document classification and computational linguistics are of this type. He will talk about how to address this problem by using the L-BFGS optimizer instead of the Newton optimizer.
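The horizontal scaling described here rests on the fact that the logistic-loss gradient is a plain sum over rows, so each cached RDD partition can compute a partial sum that the driver then reduces. A hedged pure-Python sketch of that structure (function names invented, not the Spark API):

```python
import math

def partial_gradient(rows, w):
    """Partition-local sum of per-row gradient terms (z_i - t_i) * x_i."""
    g = [0.0] * len(w)
    for x, t in rows:  # x: feature list (leading 1 for the intercept), t: 0/1 label
        z = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))
        for j, xj in enumerate(x):
            g[j] += (z - t) * xj
    return g

def full_gradient(partitions, w):
    """The reduce step: element-wise sum of the partition-local gradients."""
    total = [0.0] * len(w)
    for part in partitions:
        for j, gj in enumerate(partial_gradient(part, w)):
            total[j] += gj
    return total
```

An optimizer such as L-BFGS needs only this gradient (plus the loss value), which is why it slots into the same map-reduce structure.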
Bio:
DB Tsai is a machine learning engineer working at Alpine Data Labs. He has recently been working with the Spark MLlib team to add support for the L-BFGS optimizer and multinomial logistic regression upstream. He also led the Apache Spark development at Alpine Data Labs. Before joining Alpine Data Labs, he worked on large-scale optimization of optical quantum circuits at Stanford as a PhD student.
pg_proctab: Accessing System Stats in PostgreSQL - Mark Wong
pg_proctab is a collection of PostgreSQL stored functions that provide access to the operating system process table using SQL. We'll show you which functions are available and where they collect the data, and give examples of their use to collect processor and I/O statistics on SQL queries.
spaGO: A self-contained ML & NLP library in Go - Matteo Grella
Introduction to spaGO, a beautiful and maintainable machine learning library written in Go designed to support relevant neural network architectures in natural language processing tasks.
Github: https://github.com/nlpodyssey/spago
Our fall 12-week Data Science bootcamp starts on Sept 21st, 2015. Apply now to get a spot!
If you are hiring Data Scientists, call us at (1)888-752-7585 or reach info@nycdatascience.com to share your openings and set up interviews with our excellent students.
---------------------------------------------------------------
Come join our meet-up and learn how easily you can use R for advanced machine learning. In this meet-up, we will demonstrate how to understand and use XGBoost for Kaggle competitions. Tong is in Canada and will do a remote session with us through Google Hangouts.
---------------------------------------------------------------
Speaker Bio:
Tong is a data scientist at Supstat Inc and a master's student in Data Mining. He has been an active R programmer and developer for 5 years. He is the author of the XGBoost R package, one of the most popular and contest-winning tools on kaggle.com nowadays.
Pre-requisite(if any): R /Calculus
Preparation: A laptop with R installed. Windows users might need to have RTools installed as well.
Agenda:
Introduction to XGBoost
Real World Application
Model Specification
Parameter Introduction
Advanced Features
Kaggle Winning Solution
Event arrangement:
6:45pm Doors open. Come early to network, grab a beer and settle in.
7:00-9:00pm XgBoost Demo
Reference:
https://github.com/dmlc/xgboost
We all make mistakes while programming and spend a lot of time fixing them.
One of the methods that allows for quick detection of defects is static analysis of source code.
Data Analytics and Simulation in Parallel with MATLAB* - Intel® Software
This talk covers the current parallel capabilities in MATLAB*. Learn about its parallel language and distributed and tall arrays. Interact with GPUs both on the desktop and in the cluster. Combine this information into an interesting algorithmic framework for data analysis and simulation.
RAPIDS: Accelerating Pandas and scikit-learn on the GPU - Pavel Klemenkov, NVIDIA - Mail.ru Group
We all know that our beloved Pandas is strictly single-threaded, and scikit-learn models often train slowly even across several processes. So in this talk I will present the RAPIDS project, a set of libraries for data analysis and building predictive models on NVIDIA GPUs. I will open a discussion on Moore's law no longer holding, review the principles of the CUDA architecture, and walk through the cuDF and cuML libraries. I will also try to answer, as honestly as possible, whether to expect a miracle from moving to GPUs, and in which cases the miracle is inevitable.
Globus Connect Server Deep Dive - GlobusWorld 2024 - Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Globus Compute with IRI Workflows - GlobusWorld 2024 - Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
How Recreation Management Software Can Streamline Your Operations - wottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
How to Position Your Globus Data Portal for Success: Ten Good Practices - Globus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Enhancing Research Orchestration Capabilities at ORNL - Globus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Your Digital Assistant.
Making a complex approach simple: a straightforward process saves time, and there is no more waiting to connect with the people who matter to you. Safety first is not a cliché: information is securely protected in cloud storage to prevent any third party from accessing your data.
Would you rather make your visitors feel burdened by making them wait, or choose VizMan for a stress-free experience? VizMan is an automated visitor management system that works for any industry, including factories, housing societies, government institutes, and warehouses. It is a new-age, contactless way of logging information about visitors, employees, packages, and vehicles. As a digital logbook, VizMan deters unnecessary use of paper and space, since there is no need for bundles of registers left to collect dust in a corner of a room. It records visitors' essential details, helps schedule meetings between visitors and employees, and assists in supervising employee attendance. With VizMan, visitors don't need to wait for hours in long queues; VizMan handles visitors with the value they deserve, because we know time is important to you.
Feasible Features
One Subscription, Four Modules – Admin, Employee, Receptionist, and Gatekeeper ensures confidentiality and prevents data from being manipulated
User Friendly – can be easily used on Android, iOS, and Web Interface
Multiple Accessibility – Log in through any device from any place at any time
One app for all industries – a Visitor Management System that works for any organisation.
Stress-free Sign-up
Visitor is registered and checked-in by the Receptionist
Host gets a notification, where they opt to Approve the meeting
Host notifies the Receptionist of the end of the meeting
Visitor is checked-out by the Receptionist
Host enters notes and remarks of the meeting
Customizable Components
Scheduling Meetings – Host can invite visitors for meetings and also approve, reject and reschedule meetings
Single/Bulk invites – Invitations can be sent individually to a visitor or collectively to many visitors
VIP Visitors – Additional security of data for VIP visitors to avoid misuse of information
Courier Management – Keeps a check on deliveries like commodities being delivered in and out of establishments
Alerts & Notifications – Get notified on SMS, email, and application
Parking Management – Manage availability of parking space
Individual log-in – Every user has their own log-in id
Visitor/Meeting Analytics – Evaluate notes and remarks of the meeting stored in the system
Visitor Management System is a secure and user friendly database manager that records, filters, tracks the visitors to your organization.
"Secure Your Premises with VizMan (VMS) – Get It Now"
Accelerate Enterprise Software Engineering with Platformless - WSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
First Steps with Globus Compute Multi-User Endpoints - Globus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Prosigns: Transforming Business with Tailored Technology Solutions - Prosigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
SOCRadar Research Team: Latest Activities of IntelBroker - SOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntelBroker. We have compiled for you what has happened over the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar's Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce? - XfilesPro
Worried about document security while sharing them in Salesforce? Fret no more! Here are the top-notch security standards XfilesPro upholds to ensure strong security for your Salesforce documents while sharing with internal or external people.
To learn more, read the blog: https://www.xfilespro.com/how-does-xfilespro-make-document-sharing-secure-and-seamless-in-salesforce/
A Comprehensive Look at Generative AI in Retail App Testing - kalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Providing Globus Services to Users of JASMIN for Environmental Data Analysis - Globus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Strategies for Successful Data Migration Tools - varshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data migration tools like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv... - Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
20181212 - PGconfASIA - LT - English
1. In-database Analytics using GPU
~ Tried to implement Logistic Regression Analytics ~
HeteroDB, Inc.
Chief Architect & CEO
KaiGai Kohei <kaigai@heterodb.com>
2. Hello guys,
Are you using PL/CUDA?
Hello guys. Are you using PL/CUDA?
These captions are not generated automatically by machine learning; I wrote them up manually in advance.
PGconf.ASIA 2018 LT - In-database Analytics using GPU
3. PL/CUDA User Defined Function
▌What is PL/CUDA?
PL/CUDA allows UDFs written in CUDA C which are executable on GPU.
▌Characteristics
Extreme optimization of the GPU code by hand; not auto-generated.
Full integration with SQL for pre-/post-processing, with flexible operations.
All in-database analytics: Scan → Pre-Process → Analytics → Post-Process → Result ready

CREATE FUNCTION
my_logic( reggstore, text )
RETURNS matrix
AS $$
  /* custom CUDA C code block (runs on GPU device) */
$$ LANGUAGE 'plcuda';

Manual optimization for statistics and machine-learning; utilization of thousands of cores and wide-band device memory.
PL/CUDA allows a UDF written in a CUDA C program that is executable on GPU. It is valuable due to the integration of manual (extreme) optimization for the GPU with flexible data operations in SQL.
4. PL/CUDA Use Case – Similarity Search on Drug-Discovery
Data structure of chemical compounds:
ID | NAME         | Fingerprint (1024bit)
1  | CHEMBL153534 | 00000000000100000010000000000010001000000...
2  | CHEMBL405398 | 00000000000000010010000000000000000100000...
3  | CHEMBL503634 | 00000100000000000000010000000000000000000...
:  | :            | :
Database compounds (10M items) are checked against query compounds (~1,000 items), i.e. 10 billion combinations to be checked. The DB server runs the similarity-search logic on the query and returns the list of similar chemical compounds.
For similarity search in drug discovery, the GPU calculated 10 billion distances between chemical compounds 150 times faster than a C binary on the CPU. It is a very compute-intensive workload.
[Chart: response time of the similarity search by the k-NN method (k=3, D=10M) vs. number of query compounds [Q], showing "x150 times faster!!"]
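Although the actual similarity logic was proprietary, the per-pair kernel in this kind of fingerprint search is typically a Tanimoto coefficient over bit vectors, which GPUs evaluate for billions of pairs using hardware popcount. A small sketch under that assumption, with Python ints standing in for the 1024-bit fingerprints:

```python
# Assumed per-pair kernel (Tanimoto); the talk's real logic is not public.
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: |A & B| / |A | B| over set bits."""
    both = bin(fp_a & fp_b).count("1")    # popcount of the intersection
    either = bin(fp_a | fp_b).count("1")  # popcount of the union
    return both / either if either else 1.0
```

On a GPU, each thread scores one (database, query) pair with popcount instructions, which is how 10 billion pairs become tractable.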
5. Is there any sample program?
Oh.... this case was proprietary algorithm. Now we have no sample code in public.
Is there any sample programs?
6. I tried to make it.
Theme: Logistic Regression Analytics
7. What is Logistic Regression Analytics (1/2)
A method for binary classification
Logistic regression is a machine-learning method for binary classification.
[Figure: scatter plot of samples labeled True or False, separated by a division line]
8. What is Logistic Regression Analytics (2/2)
Probability of "right" classification follows the logistic function:

σ(α) = 1 / (1 + e^(−α))
9. Estimation of the parameters (1/3)
In general ....
Parameter: w = (w0, w1, ⋯, wm)
Explanatory variables: φi = (1, x1, ⋯, xm)i
Teacher data: ti = 0 or 1
[Figure: the division surface 0 = w0 + w1·x + w2·y; determining it is equivalent to seeking the weights of the explanatory variables and the intercept]
Determining the division surface is equivalent to seeking the weights of the explanatory variables and the
intercept. However, the teacher data only tell us the boolean state for each combination of explanatory variables.
10. Estimation of the parameters (2/3)
Target: Maximize the probability of the training set.
When zi = σ(wᵀφi):

P = Π_{i=1..N} Pi = Π_{i=1..N} zi^ti · (1 − zi)^(1−ti)

Distance from the division surface indicates the certainty of the classification.
We assume the training set is the result of the most feasible probability.
Explanatory variables far from the division surface have a higher probability of being true/false. We assume
the training set is the result of the highest likelihood, maximized over the parameter w.
11. Estimation of the parameters (3/3)
Parameter estimation by iteration of:

w_new = w_old − (ΦᵀRΦ)⁻¹ Φᵀ(z − t)

where:

Φ = [ 1  x11 ⋯ x1m
      ⋮      ⋱   ⋮
      1  xn1 ⋯ xnm ]
t = (t1, …, tn)
z = (z1, …, zn)
R = diag( z1(1 − z1), …, zn(1 − zn) )
For more details, check out the book "The first step of machine-learning theory". Anyway, w is updated in each
iteration, and w_new moves toward a more reasonable parameter than w_old. Eventually, the difference between
w_new and w_old becomes very small.
12. Amount of the calculation
▌# of explanatory variables (small): several to several hundred ... m items
▌# of training data (large): several hundred to several million ... n items
w_new = w_old − wΔ = w_old − (ΦᵀRΦ)⁻¹ Φᵀ(z − t)
Estimation of the amount of calculation: the number of explanatory variables is up to hundreds, but the number
of training data items is more than a million. This is suitable for parallel calculation by GPU.
[Figure: matrix dimensions of the update formula: Φᵀ is m×n and RΦ is n×m, so (ΦᵀRΦ)⁻¹ is m×m; Φᵀ(z − t) is m×1; therefore wΔ is m×1, and the large n dimension (training data) dominates the computation]
13. Example of GPU code for the matrix product ΦᵀRΦ
KERNEL_FUNCTION_MAXTHREADS(void)
logregr_update_P(cl_double **Preg,      /* out */
                 cl_float **Xp,
                 cl_int width,
                 VectorTypeFloat *Z)
{
    cl_double  *P = Preg[0];
    __shared__ cl_float v[MAXTHREADS_PER_BLOCK];    /* shared variables */
    cl_uint     nitems = Z->length;                 /* number of training rows */
    cl_uint     nitems_bs = TYPEALIGN(get_local_size(), nitems);
    cl_uint     nloops = width * width * nitems_bs;
    cl_uint     loop, i, j, k;
    cl_float    sum;

    for (loop = get_global_id();        /* unique identifier of the GPU thread */
         loop < nloops;
         loop += get_global_size())     /* add total number of GPU threads */
    {
        k = loop % nitems_bs;           /* index of R column/row */
        i = (loop / nitems_bs) % width; /* index of Φᵀ column */
        j = loop / (nitems_bs * width); /* index of Φ column */
        if (k < nitems)
        {
            cl_float z  = Z->values[k];
            cl_float x1 = (i == 0 ? 1.0 : Xp[i-1][k]);
            cl_float x2 = (j == 0 ? 1.0 : Xp[j-1][k]);

            v[get_local_id()] = x1 * z * (1.0 - z) * x2;
        }
        else
            v[get_local_id()] = 0.0;
        /* total sum of the elements, calculated by the sibling threads */
        sum = pgstromTotalSum(v, MAXTHREADS_PER_BLOCK);
        if (get_local_id() == 0)
            atomicAdd(&P[i + j * width], sum);
        __syncthreads();
    }
}
14. Calculation by GPU – A case for reduction algorithm
[Figure: tree reduction: the sum Σ_{i=0..N−1} item[i] over item[0..15] is computed pairwise in log2(N) steps (step.1 ... step.4); inter-core synchronization by HW support]
Also used by aggregation:
SELECT count(X), sum(Y), avg(Z) FROM my_table;
Values on shared memory can be accessed by multiple GPU cores simultaneously. Hardware supports inter-core
synchronization, which enables calculating the total sum in log2(N) steps.
15. Sample program of the Logistic Regression Analytics
$ git clone https://github.com/heterodb/toybox.git
$ cd toybox/logistic_regression/
$ make && make install
$ psql postgres
postgres=# create extension logregr;
CREATE EXTENSION
To get the sample code, open "heterodb/toybox" on GitHub, then move to "logistic_regression".
You can install it using CREATE EXTENSION, if PG-Strom is correctly set up.
16. Let’s play (1/4) - Creation of artificial test data
postgres=# CREATE TABLE logreg (
t bool,
x1 float,
x2 float,
x3 float,
x4 float );
CREATE TABLE
-- The training data: all rows with 1 + 2*x1 - 3*x2 + x3 + 0.5*x4 > 0 are classified as true; 40M rows
postgres=# INSERT INTO logreg
(SELECT (1.0+2.0*x1-3.0*x2+x3+0.5*x4) > 0 t, x1, x2, x3, x4
FROM (SELECT random() x1,
random() x2,
random() x3,
random() x4
FROM generate_series(1,40000000)) x);
INSERT 0 40000000
OK, let's run the PL/CUDA function. First of all, make a normal table with 40M rows of random data.
All rows that satisfy 1 + 2·x1 − 3·x2 + x3 + 0.5·x4 > 0 are marked as 'true'.
17. Let’s play (2/4) - Data loading to GPU device memory (part-1)
postgres=# CREATE FOREIGN TABLE ft (
t bool,
x1 real,
x2 real,
x3 real,
x4 real
) SERVER gstore_fdw
OPTIONS (pinning '0');
CREATE FOREIGN TABLE
postgres=# INSERT INTO ft
(SELECT * FROM logreg);
INSERT 0 40000000
Gstore_Fdw is an FDW extension that works on behalf of GPU device memory, specified by the 'pinning' option.
INSERT INTO the Gstore_Fdw table loads the 40M rows of the 'logreg' table.
[Figure: Foreign Table (gstore_fdw) maps onto GPU device memory, handling data format conversion, data compression (if any), and transaction control]
18. Let’s play (3/4) - Data loading to GPU device memory (part-2)
[kaigai@saba src]$ nvidia-smi
Thu Dec 6 12:10:56 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72 Driver Version: 410.72 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P40 Off | 00000000:02:00.0 Off | N/A |
| N/A 42C P0 52W / 250W | 817MiB / 22919MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 27650 C ...bgworker: PG-Strom GPU memory keeper 807MiB |
+-----------------------------------------------------------------------------+
807MB of GPU device memory is reserved. The dataset consumes
(sizeof(bool) + 4 * sizeof(float)) * 40M = 680MB,
in addition to about 120MB for device management.
19. Let’s play (4/4)
postgres=# SELECT logregr_train('ft',
attnum_of('ft','t'),
attnums_of('ft','{x1,x2,x3,x4}'));
logregr_train
------------------------------------------
{3376.4,6752.71,-10129.1,3376.3,1688.27}
(1 row)
Time: 3647.059 ms (00:03.647)
The weights of the explanatory variables are estimated. Five elements are returned because there are four
explanatory variables plus the intercept. It takes 3.6 sec.
20. Comparison to CPU implementation (1/3)
logregr_train() function at MADLib
postgres=# SELECT madlib.logregr_train('logreg', 'hoge',
                                       't', 'ARRAY[1,x1,x2,x3,x4]',
                                       NULL, 20);
logregr_train
---------------
(1 row)
Time: 1301307.361 ms (21:41.307)
postgres=# SELECT coef FROM hoge;
coef
------------------------------------------------------
{3041.82722783601,6083.57794939209,-9125.44857123801,3041.73992459095,1520.98287953044}
(1 row)
For the same job, MADLib's logregr_train() took 21min 41sec. The PL/CUDA implementation was about 356 times
faster than the CPU-based implementation:
1301307.36 / 3647.06
= x356.8 times faster
21. Comparison to CPU implementation (2/3) - recalculation
The parameter estimated by logregr_train() is the weight of the division surface, i.e. the weights of the
explanatory variables.

        |   w0    |   w1    |   w2     |   w3    |   w4
--------+---------+---------+----------+---------+---------
PL/CUDA | 3376.4  | 6752.71 | -10129.1 | 3376.3  | 1688.27
MADLib  | 3041.83 | 6083.58 | -9125.45 | 3041.74 | 1520.98
The result of logregr_train() differs from the weights we used when generating the dataset artificially, because it
returns the gradient and intercept of the normal vector of the division surface.
22. Comparison to CPU implementation (3/3) - recalculation
Notice: we usually should not apply the estimated parameters to the training set!
postgres=# SELECT COUNT(*)
FROM (SELECT t, logregr_predict(ARRAY[ 3376.4, 6752.71,
-10129.1, 3376.3,
1688.27]::float[],
ARRAY[x1,x2,x3,x4]) p
FROM logreg) data
WHERE t != p;
count
-------
90
(1 row)
postgres=# SELECT COUNT(*)
FROM (SELECT t, logregr_predict(hoge.coef,
ARRAY[x1,x2,x3,x4]) p
FROM logreg, hoge) data
WHERE t != p;
count
-------
70
(1 row)
Counting the number of incorrect estimations: prediction by our PL/CUDA function got 90 of 40M rows wrong,
and MADLib got 70 of 40M wrong. Note that we usually don't apply prediction to the training set in "actual"
data analytics.
23. Conclusion
▌PL/CUDA sample programs
https://github.com/heterodb/toybox
▌PL/CUDA is fun(ction).
▌Suitable workloads for PL/CUDA
Machine-Learning
Similarity-Search
Anomaly Detection
Image Generation
.... and others
Conclusion: We could make a sample program for PL/CUDA, and it has been published. PL/CUDA is fun.
PL/CUDA will be valuable for machine-learning, similarity-search, anomaly-detection, image generation, and more.