RAPIDS, GPUs & Python - AWS Community Day Melbourne


This talk covers GPU-accelerated data analysis and machine learning with RAPIDS. As Moore's Law slows, CPU performance gains are tapering off, while GPUs excel at the parallel workloads common in machine learning. RAPIDS is an open source project whose libraries, including cuDF, cuIO and cuML, keep the entire data science process on the GPU, from data ingestion to modelling. It aims to provide a Python ecosystem for GPU data science that scales across multiple GPUs, CPUs and nodes.


GPU Accelerated Data Analysis and Machine Learning
Daniel Bradby – CTO @ Eliiza | August 30 2019

Eliiza – Data Science
Sports Prediction
Computer Vision
Energy Modelling
Data Visualisation

General Purpose computing on GPU
Moore’s Law is slowing down
GPUs are strong at ‘embarrassingly parallel’ workloads
Machine Learning & Data Science is more than just training a model

Evolution of Data Processing for ML/DS
Hadoop – HDFS read/write
Spark – In-memory clusters
GPUs for Training
* presentation graphics sourced with kind permission from RAPIDS

Python Data Ecosystem

Challenges of using GPUs for Machine Learning
Moving data to/from the CPU/GPU

Challenges of using GPUs for Machine Learning
Converting between formats

Apache Arrow
Standardize columnar memory format
Cross-language
Flat or hierarchical data
Efficient Analytic operations
Designed for modern hardware
Zero-copy reads
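
The columnar layout and zero-copy reads listed above can be illustrated without Arrow itself. The following is a minimal sketch using only Python's standard library (`array` and `memoryview`), not the Arrow API; the sensor data and the `column_mean` helper are made up for illustration.

```python
from array import array

# Row-oriented records, as a CSV reader or database cursor might return them.
rows = [(1, 20.5), (2, 21.0), (3, 19.5), (4, 22.0)]

# Columnar layout: one contiguous, fixed-width buffer per column.
sensor_ids = array("q", (sensor_id for sensor_id, _ in rows))  # 64-bit signed ints
temperatures = array("d", (temp for _, temp in rows))          # 64-bit floats


def column_mean(values):
    """Average a numeric column by scanning a single contiguous buffer."""
    return sum(values) / len(values)


# A memoryview exposes the column's buffer without copying it.
# Slicing the view is also copy-free: the slice points into the same memory.
temperature_view = memoryview(temperatures)
last_two = temperature_view[2:]

mean_all = column_mean(temperature_view)
mean_last_two = column_mean(last_two)

# A write through the original column is visible through the slice,
# which shows that no copy was made.
temperatures[3] = 30.0
shares_memory = last_two[1] == 30.0
```

An analytic operation such as a mean only touches the one column it needs, and consumers can share the same buffer rather than converting between formats.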

End-to-end Data Science on GPU

End-to-End GPU Accelerated Data Science

cuDF
• Python library for manipulating GPU DataFrames
• API Compatible with Pandas
• Uses Apache Arrow data structures
• Supports User Defined Functions (UDFs) using Numba
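
Because cuDF mirrors the pandas API, the same DataFrame code can often run on either library. The sketch below assumes that either cuDF or pandas is installed and falls back to pandas when cuDF cannot be imported; the team/score data and the `mean_score_by_team` helper are hypothetical, and not every pandas feature is guaranteed to exist in cuDF.

```python
# Prefer cuDF (GPU) when it is installed; otherwise fall back to pandas (CPU).
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf


def mean_score_by_team(frame):
    """Group by team and average the scores, returning a plain dict."""
    means = frame.groupby("team")["score"].mean()
    # cuDF results live in GPU memory; copy them to the host before to_dict().
    if hasattr(means, "to_pandas"):
        means = means.to_pandas()
    return means.to_dict()


# Hypothetical data, built with the same constructor in both libraries.
frame = xdf.DataFrame(
    {
        "team": ["a", "b", "a", "b", "b"],
        "score": [1.0, 2.0, 3.0, 4.0, 6.0],
    }
)

team_means = mean_score_by_team(frame)
```

The only library-specific step is the explicit copy back to host memory, which ties in with the CPU/GPU data movement challenge raised earlier in the talk.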

cuIO
• cuIO to support data ingestion
• GPU accelerated parsing
• GPU accelerated decompression

BlazingSQL
• SQL Engine on GPU Data
• Built on RAPIDS
• “Performs ETL 20x faster than Spark at price parity”

cuML
• GPU-accelerated scikit-learn
• Classification / Regression
• Statistical Inference
• Clustering
• Decomposition & Dimensionality Reduction
• Time Series Forecasting
• Recommendations

GPU Challenges
16/32GB of memory
Data moved between CPU/GPU
Multi-GPU, Multi-CPU, Multi-Node
Source: https://www.nvidia.com/en-au/data-center/nvlink/
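
To make the memory limit concrete, here is a back-of-the-envelope sketch of how many partitions (or GPUs) a dataset needs. The function name, the 8 bytes per value, and the 50% usable fraction (leaving headroom for intermediate results) are illustrative assumptions, not figures from the talk.

```python
def partitions_needed(n_rows, n_cols, gpu_mem_gb, bytes_per_value=8, usable_fraction=0.5):
    """Estimate how many GPU-sized partitions a dense numeric table requires.

    Assumes every value occupies `bytes_per_value` bytes and that only
    `usable_fraction` of GPU memory can hold input data (both are assumptions).
    """
    total_bytes = n_rows * n_cols * bytes_per_value
    usable_bytes = int(gpu_mem_gb * 10**9 * usable_fraction)
    # Ceiling division: a partly filled partition still needs a GPU.
    return -(-total_bytes // usable_bytes)


# Hypothetical table: 500 million rows of 12 numeric columns (48 GB in total).
on_16gb = partitions_needed(500_000_000, 12, gpu_mem_gb=16)
on_32gb = partitions_needed(500_000_000, 12, gpu_mem_gb=32)
```

Under these assumptions the table needs six partitions on 16GB cards and three on 32GB cards, which is why multi-GPU and multi-node scaling matters.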

Dask
Architecture        Time
Single CPU core     2hr 39min
Forty CPU cores     11min 30s
One GPU             1min 37s
Eight GPUs          19s
• Dask natively scales Python with advanced parallelism
• Dynamic Task Scheduling
• “Big Data” Collections
• Familiar API
• Scales up and out
• Across both CPUs and GPUs
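
The timings on this slide can be turned into speedup figures with a little arithmetic. This sketch uses only the four times quoted above; the workload behind them is not described on the slide.

```python
# Wall-clock times from the slide, converted to seconds.
timings = {
    "Single CPU core": 2 * 3600 + 39 * 60,
    "Forty CPU cores": 11 * 60 + 30,
    "One GPU": 1 * 60 + 37,
    "Eight GPUs": 19,
}

baseline = timings["Single CPU core"]

# Speedup of each configuration relative to a single CPU core.
speedups = {name: baseline / seconds for name, seconds in timings.items()}

# How well the job scales from one GPU to eight (1.0 would be perfect scaling).
gpu_scaling = timings["One GPU"] / timings["Eight GPUs"]
gpu_efficiency = gpu_scaling / 8

for name, factor in speedups.items():
    print(f"{name:<16} {factor:7.1f}x")
print(f"1 -> 8 GPUs: {gpu_scaling:.2f}x ({gpu_efficiency:.0%} efficiency)")
```

Relative to one CPU core, that is roughly 14x for forty cores, 98x for one GPU and 502x for eight GPUs. Going from one GPU to eight gives about a 5.1x gain, so scaling out helps but is not linear.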

GPUs on AWS
• G3 – up to 4 NVIDIA Tesla M60s
• P2 – up to 16 NVIDIA K80s
• P3 – 1 to 8 NVIDIA Tesla V100s
• Distributed ML Instance (P3dn.24xlarge) – 8 x 32GB Tesla V100s
• Multi-GPU machines use NVLink for GPU Peer-to-Peer comms (300 GB/s)
• G4 – NVIDIA T4 GPUs are “in the works” (Jeff Barr, March 2019)
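
To put the 300 GB/s NVLink figure in context, the sketch below estimates how long it takes to move one 32GB GPU's worth of data. The PCIe number is an assumption (an approximate PCIe 3.0 x16 peak of about 16 GB/s) and is not from the talk; both are peak rates, so real transfers will be slower.

```python
def transfer_seconds(payload_gb, bandwidth_gb_per_s):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return payload_gb / bandwidth_gb_per_s


NVLINK_GB_PER_S = 300.0     # peak figure quoted on the slide
PCIE3_X16_GB_PER_S = 16.0   # assumption: approximate PCIe 3.0 x16 peak

payload_gb = 32.0           # one full 32GB V100

nvlink_time = transfer_seconds(payload_gb, NVLINK_GB_PER_S)
pcie_time = transfer_seconds(payload_gb, PCIE3_X16_GB_PER_S)
ratio = pcie_time / nvlink_time

print(f"NVLink: {nvlink_time:.2f}s, PCIe: {pcie_time:.2f}s, ratio: {ratio:.2f}x")
```

Under these assumptions, peer-to-peer exchange over NVLink is more than an order of magnitude faster than going through PCIe, which is what makes multi-GPU instances attractive for data that does not fit on one card.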

Getting Started in 10 minutes
• Deep Learning AMI pre-installed with:
• CUDA for GPUs
• Docker (nvidia-docker)
• PyData and ML/Deep Learning
• Docker images from RAPIDS
• Jupyter notebooks

Road to 1.0
• Released in October 2018
• Apache licensed open source
• Currently at 0.8 (with 0.9 in the works)
• Releasing a new version every 6 weeks
• More Multi-GPU support for cuML and cuGraph
• Streaming analytics

The impact on the Data Science Workflow

Recently uploaded

A tale of scale & speed: How the US Navy is enabling software delivery from l...

sonjaschweigert1

Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved: - Reduction in onboarding time from 5 weeks to 1 day - Improved developer experience and productivity through actionable findings and reduction of false positives - Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO) Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production. We will cover: - How to remove silos in DevSecOps - How to build efficient development pipeline roles and component templates - How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence) - How to streamline operations with automated policy checks on container images

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...

DanBrown980551

Do you want to learn how to model and simulate an electrical network from scratch in under an hour? Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)! During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook. PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides: - A fully editable and extendable library for grid component modelling; - Visualization tools to display your network; - Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses; The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well. What you will learn during the webinar: - For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills; - For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.

Video Streaming: Then, Now, and in the Future

Alpen-Adria-Universität

In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.

Essentials of Automations: The Art of Triggers and Actions in FME

Safe Software

In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation. We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios. Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...

SOFTTECHHUB

The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing. One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.

20240605 QFM017 Machine Intelligence Reading List May 2024

Matthew Sinclair

Microsoft - Power Platform_G.Aspiotis.pdf

Uni Systems S.M.S.A.

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Aggregage

Pushing the limits of ePRTC: 100ns holdover for 100 days

Adtran

FIDO Alliance Osaka Seminar: Overview.pdf

FIDO Alliance

Uni Systems Copilot event_05062024_C.Vlachos.pdf

Uni Systems S.M.S.A.

National Security Agency - NSA mobile device best practices

Quotidiano Piemontese

GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...

Neo4j

Leonard Jayamohan, Partner & Generative AI Lead, Deloitte This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf

FIDO Alliance

Communications Mining Series - Zero to Hero - Session 1

DianaGray10

This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered: • Communication Mining Overview • Why is it important? • How can it help today’s business and the benefits • Phases in Communication Mining • Demo on Platform overview • Q/A

Free Complete Python - A step towards Data Science

RinaMondal9

The Art of the Pitch: WordPress Relationships and Sales

Laura Byrne

Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes? All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.

UiPath Test Automation using UiPath Test Suite series, part 5

DianaGray10

By Design, not by Accident - Agile Venture Bolzano 2024

Pierluigi Pugliese

UiPath Test Automation using UiPath Test Suite series, part 4

DianaGray10

Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap. The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies. Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques What will you get from this session? 1. Insights into SAP testing best practices 2. Heatmap utilization for testing 3. Optimization of testing processes 4. Demo Topics covered: Execution from the test manager Orchestrator execution result Defect reporting SAP heatmap example with demo Speaker: Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP

Recently uploaded (20)

A tale of scale & speed: How the US Navy is enabling software delivery from l...

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...

Video Streaming: Then, Now, and in the Future

Essentials of Automations: The Art of Triggers and Actions in FME

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...

20240605 QFM017 Machine Intelligence Reading List May 2024

Microsoft - Power Platform_G.Aspiotis.pdf

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Pushing the limits of ePRTC: 100ns holdover for 100 days

FIDO Alliance Osaka Seminar: Overview.pdf

Uni Systems Copilot event_05062024_C.Vlachos.pdf

National Security Agency - NSA mobile device best practices

GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf

Communications Mining Series - Zero to Hero - Session 1

Free Complete Python - A step towards Data Science

The Art of the Pitch: WordPress Relationships and Sales

UiPath Test Automation using UiPath Test Suite series, part 5

By Design, not by Accident - Agile Venture Bolzano 2024

UiPath Test Automation using UiPath Test Suite series, part 4

RAPIDS, GPUs & Python - AWS Community Day Melbourne

1. GPU Accelerated Data Analysis and Machine Learning
Daniel Bradby – CTO @ Eliiza | August 30 2019

2. Eliiza – Data Science
Sports Prediction
Computer Vision
Energy Modelling
Data Visualisation

3. General Purpose computing on GPU
Moore’s Law is slowing down
GPUs are strong at ‘embarrassingly parallel’ workloads
Machine Learning & Data Science is more than just training a model

4. Data Science Process

5. Data Science Process

6. Evolution of Data Processing for ML/DS
Hadoop – HDFS read/write
Spark – In-memory clusters
GPUs for Training
* presentation graphics sourced with kind permission from RAPIDS

7. Python Data Ecosystem

8. Challenges of using GPUs for Machine Learning
Moving data to/from the CPU/GPU

9. Challenges of using GPUs for Machine Learning
Converting between formats

10. Apache Arrow
Standardize columnar memory format
Cross-language
Flat or hierarchical data
Efficient Analytic operations
Designed for modern hardware
Zero-copy reads
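The columnar layout and zero-copy reads listed on the Arrow slide can be illustrated with a minimal sketch. It uses only the Python standard library (`array` and `memoryview`) as a stand-in and is not the Arrow API. The sample records are hypothetical.

```python
from array import array

# Row-oriented layout: one Python tuple per record (hypothetical sample data).
rows = [(1, 10.5), (2, 20.0), (3, 30.5), (4, 40.0)]

# Column-oriented layout: each field lives in its own contiguous typed buffer.
ids = array("q", (r[0] for r in rows))      # 64-bit signed integers
values = array("d", (r[1] for r in rows))   # 64-bit floats

# An analytic operation touches only the column it needs.
total = sum(values)

# Zero-copy read: a memoryview slice refers to the same buffer, nothing is copied.
view = memoryview(values)
tail = view[2:]        # last two values, still backed by `values`
values[3] = 99.0       # a write through the original buffer...
tail_last = tail[1]    # ...is visible through the view

print(total, tail.tolist())
```

Because consumers share one buffer rather than each converting it to its own format, the conversion cost described on the previous slide does not arise in this sketch.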

11. End-to-end Data Science on GPU

12. Introducing RAPIDS

13. End-to-End GPU Accelerated Data Science

14. cuDF
• Python library for manipulating GPU Data Frames
• API Compatible with Pandas
• Uses Apache Arrow data structures
• Supports User Defined Functions (UDFs) using Numba

15. cuIO
• cuIO to support data ingestion
• GPU accelerated parsing
• GPU accelerated decompression

16. BlazingSQL
• SQL Engine on GPU Data
• Built on RAPIDS
• “Performs ETL 20x faster than Spark at price parity”

17. cuML
• GPU accelerated scikit-learn
• Classification / Regression
• Statistical Inference
• Clustering
• Decomposition & Dimensionality Reduction
• Time Series Forecasting
• Recommendations

18. Deep Learning

19. Scaling GPUs

20. GPU Challenges
16/32GB of memory
Data moved between CPU/GPU
Multi-GPU, Multi-CPU, Multi-Node
Source: https://www.nvidia.com/en-au/data-center/nvlink/

21. Dask
Architecture      Time
Single CPU Core   2hr 39min
Forty CPU Cores   11min 30s
One GPU           1min 37s
Eight GPUs        19s
• Dask natively scales Python with advanced parallelism
• Dynamic Task Scheduling
• “Big Data” Collections
• Familiar API
• Scales up and out
• Across both CPUs and GPUs
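The timings on the Dask slide are easier to compare as speedups over the single-core run. The sketch below only does that arithmetic on the four figures quoted. The slide does not name the workload, so these ratios say nothing about other jobs.

```python
# Wall-clock times quoted on the slide, converted to seconds.
timings_s = {
    "Single CPU core": 2 * 3600 + 39 * 60,  # 2hr 39min
    "Forty CPU cores": 11 * 60 + 30,        # 11min 30s
    "One GPU": 1 * 60 + 37,                 # 1min 37s
    "Eight GPUs": 19,                       # 19s
}

# Speedup of each configuration relative to the single-core baseline.
baseline = timings_s["Single CPU core"]
speedups = {name: baseline / secs for name, secs in timings_s.items()}

for name, factor in speedups.items():
    print(f"{name:<16} {timings_s[name]:>5d} s  {factor:6.1f}x")
```

On these numbers, forty CPU cores are about 14x faster than one core, one GPU about 98x, and eight GPUs about 502x. Going from one GPU to eight gives roughly a 5x gain rather than 8x, so scaling across GPUs is not linear here.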

22. RAPIDS on AWS

23. GPUs on AWS
• G3 – up to 4 NVIDIA Tesla M60s
• P2 – up to 16 NVIDIA K80s
• P3 – 1 to 8 NVIDIA Tesla V100s
• Distributed ML Instance (P3dn.24xlarge) – 8 x 32GB Tesla V100s
• Multi-GPU machines use NVLink for GPU Peer-to-Peer comms (300 GB/s)
• G4 – NVIDIA T4 GPUs are “in the works” (Jeff Barr, March 2019)
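A rough back-of-envelope sketch ties the instance figures above to the memory limit raised under GPU Challenges. It uses only the numbers quoted on the slide (8 x 32GB V100s, 300 GB/s NVLink). The dataset sizes and helper names are hypothetical, and the transfer time is an idealised lower bound that ignores working memory and protocol overhead.

```python
V100_MEMORY_GB = 32      # per-GPU memory on P3dn.24xlarge (from the slide)
GPUS_PER_INSTANCE = 8    # GPUs on P3dn.24xlarge (from the slide)
NVLINK_GB_PER_S = 300    # peer-to-peer figure quoted on the slide

# Combined GPU memory across the whole instance.
total_gpu_memory_gb = V100_MEMORY_GB * GPUS_PER_INSTANCE


def fits_on_gpus(dataset_gb: float, gpus: int = GPUS_PER_INSTANCE) -> bool:
    """Return True if the dataset fits in the combined memory of `gpus` GPUs."""
    return dataset_gb <= gpus * V100_MEMORY_GB


def ideal_transfer_seconds(data_gb: float,
                           bandwidth_gb_per_s: float = NVLINK_GB_PER_S) -> float:
    """Idealised time to move `data_gb` at the given bandwidth."""
    return data_gb / bandwidth_gb_per_s


# Hypothetical dataset sizes, for illustration only.
for size_gb in (40, 100, 300):
    print(size_gb, "GB:",
          "one GPU" if fits_on_gpus(size_gb, gpus=1)
          else "eight GPUs" if fits_on_gpus(size_gb)
          else "does not fit")
print(f"{ideal_transfer_seconds(V100_MEMORY_GB):.3f} s to move one GPU's memory")
```

A dataset larger than a single GPU's memory has to be partitioned across GPUs or nodes, which is where the multi-GPU scaling discussed on the Dask slide comes in.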

24. Getting Started in 10 minutes
• Deep Learning AMI pre-installed with:
• CUDA for GPUs
• Docker (nvidia-docker)
• PyData and ML/Deep Learning
• Docker images from RAPIDS
• Jupyter notebooks

25. Rapids Journey

26. Road to 1.0
• Released in October 2018
• Apache licensed open source
• Currently at 0.8 (with 0.9 in the works)
• Releasing a new version every 6 weeks
• More Multi-GPU support for cuML and cuGraph
• Streaming analytics

27. The impact on the Data Science Workflow

28. Thanks!
