This slide deck introduces the technical specs and details of Backend.AI 19.09.
* On-premise clustering / container orchestration / scaling on cloud
* Container-level fractional GPU technology that lets one physical GPU serve many containers as multiple smaller GPUs at the same time.
* NVIDIA GPU Cloud integrations
* Enterprise features
JMI Techtalk: 한재근 - How to use GPU for developing AI – Lablup Inc.
This Techtalk introduces a variety of techniques that Nvidia provides for improving performance when using GPUs for AI development, along with supporting technical resources. In particular, it covers in detail the process of improving performance by introducing mixed precision on the Volta architecture.
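A minimal NumPy sketch of the numeric rationale behind mixed precision (this is an illustration of FP16 rounding behavior, not actual Volta Tensor Core code): FP16 storage is fast and compact, but once a running sum grows large, each small increment falls below half the spacing between representable FP16 values and is rounded away, which is why mixed-precision training keeps an FP32 accumulator.

```python
import numpy as np

# Accumulate 10,000 small increments (0.01) sequentially.
# FP16 accumulation stalls once increments round away;
# an FP32 accumulator stays close to the true sum.
step = np.float16(0.01)          # ~0.0099945 after FP16 rounding

acc16 = np.float16(0.0)
acc32 = np.float32(0.0)
for _ in range(10_000):
    acc16 = np.float16(acc16 + step)              # stalls far below ~99.9
    acc32 = np.float32(acc32 + np.float32(step))  # stays accurate

print(float(acc16))  # much less than the true sum
print(float(acc32))  # close to 99.9
```

This is the same failure mode that motivates FP32 "master weights" and loss scaling in mixed-precision training recipes.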
A brief introduction to the problems and prospects of OpenCL and distributed heterogeneous computation with Hadoop. Presented at Big Data Dive 2013 (Belarus Java User Group).
Open Source RAPIDS GPU Platform to Accelerate Predictive Data Analytics – inside-BigData.com
Today NVIDIA announced a GPU-acceleration platform for data science and machine learning, with broad adoption from industry leaders, that enables even the largest companies to analyze massive amounts of data and make accurate business predictions at unprecedented speed.
“Data analytics and machine learning are the largest segments of the high performance computing market that have not been accelerated — until now,” said Jensen Huang, founder and CEO of NVIDIA, who revealed RAPIDS in his keynote address at the GPU Technology Conference. “The world’s largest industries run algorithms written by machine learning on a sea of servers to sense complex patterns in their market and environment, and make fast, accurate predictions that directly impact their bottom line.”
"RAPIDS open-source software gives data scientists a giant performance boost as they address highly complex business challenges, such as predicting credit card fraud, forecasting retail inventory and understanding customer buying behavior. Reflecting the growing consensus about the GPU’s importance in data analytics, an array of companies is supporting RAPIDS — from pioneers in the open-source community, such as Databricks and Anaconda, to tech leaders like Hewlett Packard Enterprise, IBM and Oracle."
Learn more: https://insidehpc.com/2018/10/open-source-rapids-gpu-platform-accelerate-predictive-data-analytics/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This is a presentation about how to use Kubeflow for "AI pipeline optimization" - we show the "traditional" pipeline and why it should be optimized to make it available to a wider audience. Services are getting more and more important nowadays - that's why we call it "Data Science as a Service".
Introduction to Software Defined Visualization (SDVis) – Intel® Software
Software defined visualization (SDVis) is an open-source initiative from Intel and industry collaborators. It aims to improve the visual fidelity, performance, and efficiency of prominent visualization solutions, while supporting rapidly growing big-data use on workstations and high-performance computing (HPC) supercomputing clusters, without the memory limitations and cost of GPU-based solutions.
An overview of changes to OSPRay, focusing on:
Critical API features for practical OSPRay use
Internal changes and the motivation behind them
How to extend OSPRay for advanced use cases
RAPIDS: GPU-Accelerated ETL and Feature Engineering – Keith Kraus
The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
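To make the "user-friendly Python interfaces" point concrete: cuDF deliberately mirrors much of the pandas API, so a typical ETL and feature-engineering step reads the same on CPU and GPU. A sketch under that assumption (shown with pandas so it runs anywhere; on a RAPIDS install the usual pattern is to swap the import for `import cudf as pd` — the DataFrame contents here are purely illustrative):

```python
import pandas as pd  # on a RAPIDS system: import cudf as pd

# A small ETL + feature-engineering step in the pandas/cuDF dialect:
# aggregate per-user purchase amounts into simple features.
df = pd.DataFrame({
    "user": ["a", "b", "a", "c", "b", "a"],
    "amount": [10.0, 25.0, 5.0, 40.0, 15.0, 30.0],
})

features = (
    df.groupby("user")["amount"]
      .agg(["sum", "mean"])   # per-user total and average spend
      .reset_index()
)
print(features)
```

Because the dataframe dialect is shared, the same pipeline code can be developed on a laptop and moved to GPU memory without a rewrite.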
In this deck from FOSDEM'19, Christoph Angerer from NVIDIA presents: Rapids - Data Science on GPUs.
"The next big step in data science will combine the ease of use of common Python APIs, but with the power and scalability of GPU compute. The RAPIDS project is the first step in giving data scientists the ability to use familiar APIs and abstractions while taking advantage of the same technology that enables dramatic increases in speed in deep learning. This session highlights the progress that has been made on RAPIDS, discusses how you can get up and running doing data science on the GPU, and provides some use cases involving graph analytics as motivation.
GPUs and GPU platforms have been responsible for the dramatic advancement of deep learning and other neural net methods in the past several years. At the same time, traditional machine learning workloads, which comprise the majority of business use cases, continue to be written in Python with heavy reliance on a combination of single-threaded tools (e.g., Pandas and Scikit-Learn) or large, multi-CPU distributed solutions (e.g., Spark and PySpark). RAPIDS, developed by a consortium of companies and available as open source code, allows for moving the vast majority of machine learning workloads from a CPU environment to GPUs. This allows for a substantial speed up, particularly on large data sets, and affords rapid, interactive work that previously was cumbersome to code or very slow to execute. Many data science problems can be approached using a graph/network view, and much like traditional machine learning workloads, this has been either local (e.g., Gephi, Cytoscape, NetworkX) or distributed on CPU platforms (e.g., GraphX). We will present GPU-accelerated graph capabilities that, with minimal conceptual code changes, allow both graph representations and graph-based analytics to achieve similar speed-ups on a GPU platform. By keeping all of these tasks on the GPU and minimizing redundant I/O, data scientists can model their data quickly and frequently, affording a higher degree of experimentation and more effective model generation. Further, keeping all of this in compatible formats allows quick movement from feature extraction, graph representation, and graph analytics to enrichment back to the original data and visualization of results. RAPIDS has a mission to build a platform that allows data scientists to explore data, train machine learning algorithms, and build applications while primarily staying on the GPU and GPU platforms."
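As a concrete example of the kind of graph analytic discussed above, here is a tiny pure-Python PageRank by power iteration; in RAPIDS, cuGraph exposes a NetworkX-style `pagerank()` entry point over GPU data, so code at this conceptual level maps over with minimal changes (this toy implementation and graph are illustrative, not cuGraph code):

```python
# Minimal power-iteration PageRank over an adjacency-list graph.
def pagerank(graph, damping=0.85, iters=50):
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Base (teleport) probability for every node.
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, targets in graph.items():
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling node: spread its rank evenly
                for t in nodes:
                    new[t] += damping * rank[n] / len(nodes)
        rank = new
    return rank

# Hypothetical 4-node graph: "c" is pointed to by three nodes.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print(ranks)  # "c" accumulates the most rank
```

The GPU win comes from doing these sparse propagation steps over millions of edges in parallel while the data stays resident in GPU memory.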
Learn more: https://rapids.ai/
and
https://fosdem.org/2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/xilinx/embedded-vision-training/videos/pages/may-2019-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Nick Ni, Director of Product Marketing at Xilinx, presents the "Xilinx AI Engine: High Performance with Future-proof Architecture Adaptability" tutorial at the May 2019 Embedded Vision Summit.
AI inference demands orders-of-magnitude more compute capacity than what today’s SoCs offer. At the same time, neural network topologies are changing too quickly to be addressed by ASICs that take years to go from architecture to production. In this talk, Ni introduces the Xilinx AI Engine, which complements the dynamically-programmable FPGA fabric to enable ASIC-like performance via custom data flows and a flexible memory hierarchy. This combination provides an orders-of-magnitude boost in AI performance along with the hardware architecture flexibility needed to quickly adapt to rapidly evolving neural network topologies.
The RAPIDS suite of software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019 – NVIDIA
Broadening support for GPU-accelerated supercomputing to a fast-growing new platform, NVIDIA founder and CEO Jensen Huang introduced a reference design for building GPU-accelerated Arm servers, with wide industry backing.
Developing, experimenting, and deploying ML models at scale requires substantial tooling, scripting, tracking, versioning, and monitoring.
Watch full video here: https://cnvrg.io/webinars-and-workshops/scaling-mlops-on-nvidia-dgx-systems/
Data scientists want to do data science – and are slowed down by MLOps and DevOps tasks.
They lack the user-friendly tools needed to track experiments, attach resources, manage datasets, and launch multiple ML pipelines.
In this presentation, cnvrg.io CEO Yochay Ettun hosts a special guest from NVIDIA, Michael Balint, Sr. Product Manager for NVIDIA DGX systems, to discuss how to optimize the use of any NVIDIA DGX and NVIDIA GPU asset, both on-prem and in the cloud, with the cnvrg.io machine learning platform.
We will show best practices to reach high utilization of NVIDIA DGX systems, while conducting meta-scheduling across multiple heterogeneous Kubernetes/OpenShift/Linux server clusters.
In addition, we will introduce the concept of production flows, which automate hundreds of models from the data hub to deployment. We will wrap up with a real-life demo of flows, exercising many experiments across DGX platforms.
What you will learn:
- Creating a data science flow: from data to deployment, while attaching different NVIDIA DGX Kubernetes clusters to each step of the flow
- The concept of a meta-scheduler: scheduling experiments across disparate resources or other schedulers, achieving high utilization at scale
- How the NVIDIA DGX ecosystem with cnvrg.io makes GPU assets easy to consume with one click, bypassing the complexity of MLOps
- How to leverage NGC containers in ML pipelines
You can watch the full presentation along with audio and video in the link here: https://cnvrg.io/webinars-and-workshops/scaling-mlops-on-nvidia-dgx-systems/
RAPIDS – Open GPU-accelerated Data Science – Data Works MD
RAPIDS – Open GPU-accelerated Data Science
RAPIDS is an initiative driven by NVIDIA to accelerate the complete end-to-end data science ecosystem with GPUs. It consists of several open source projects that expose familiar interfaces, making it easy to accelerate the entire data science pipeline – from ETL and data wrangling to feature engineering, statistical modeling, machine learning, and graph analysis.
Corey J. Nolet
Corey has a passion for understanding the world through the analysis of data. He is a developer on the RAPIDS open source project focused on accelerating machine learning algorithms with GPUs.
Adam Thompson
Adam Thompson is a Senior Solutions Architect at NVIDIA. With a background in signal processing, he has spent his career participating in and leading programs focused on deep learning for RF classification, data compression, high-performance computing, and managing and designing applications targeting large collection frameworks. His research interests include deep learning, high-performance computing, systems engineering, cloud architecture/integration, and statistical signal processing. He holds a Masters degree in Electrical & Computer Engineering from Georgia Tech and a Bachelors from Clemson University.
A Primer on FPGAs - Field Programmable Gate Arrays – Taylor Riggan
A focus on the use of FPGAs by cloud service providers, including Microsoft Azure Catapult, Google Tensor Processors, and Amazon EC2 F1 instances. Also includes background info on how to get started with FPGAs.
IBM Bayesian Optimization Accelerator (BOA) is a do-it-yourself toolkit to apply state-of-the-art Bayesian inferencing techniques and obtain optimal solutions for complex, real-world design simulations without requiring deep machine learning skills. This talk will describe IBM BOA, its differentiation and ease of use, and how researchers can take advantage of it for optimizing any arbitrary HPC simulation.
Harnessing the virtual realm for successful real world artificial intelligence – Alison B. Lowndes
Artificial Intelligence is impacting all areas of society, from healthcare and transportation to smart cities and energy. This talk covers how NVIDIA invests in both internal pure research and accelerated computation to enable its diverse customer base across gaming & extended reality, graphics, AI, robotics, simulation, high performance scientific computing, healthcare & more. You will be introduced to the GPU computing platform and shown successfully deployed real-world applications, as well as a glimpse into the current state of the art across academia, enterprise, and startups.
Axel Koehler from Nvidia presented this deck at the 2016 HPC Advisory Council Switzerland Conference.
“Accelerated computing is transforming the data center that delivers unprecedented throughput, enabling new discoveries and services for end users. This talk will give an overview about the NVIDIA Tesla accelerated computing platform including the latest developments in hardware and software. In addition it will be shown how deep learning on GPUs is changing how we use computers to understand data.”
In related news, the GPU Technology Conference takes place April 4-7 in Silicon Valley.
Watch the video presentation: http://insidehpc.com/2016/03/tesla-accelerated-computing/
See more talks in the Swiss Conference Video Gallery:
http://insidehpc.com/2016-swiss-hpc-conference/
Sign up for our insideHPC Newsletter:
http://insidehpc.com/newsletter
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito além do hardware. – E-Commerce Brasil
NVIDIA technologies applied to e-commerce: far beyond the hardware.
Jomar Silva
Developer Relations Manager for Latin America – NVIDIA
https://eventos.ecommercebrasil.com.br/forum/
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS – Databricks
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Abstract: We will introduce RAPIDS, a suite of open source libraries for GPU-accelerated data science, and illustrate how it operates seamlessly with MLflow to enable reproducible training, model storage, and deployment. We will walk through a baseline example that incorporates MLflow locally, with a simple SQLite backend, and briefly introduce how the same workflow can be deployed in the context of GPU enabled Kubernetes clusters.
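The "MLflow locally, with a simple SQLite backend" setup described above typically amounts to pointing the tracking server at a SQLite URI. A sketch of the usual invocation (a config fragment; the file paths, host, and port here are illustrative):

```shell
# Start a local MLflow tracking server backed by SQLite,
# with run artifacts written to a local directory.
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlruns \
  --host 127.0.0.1 --port 5000
```

Training code then points at the same URI (e.g. via `mlflow.set_tracking_uri`), so runs, parameters, and models are recorded reproducibly before the workflow is moved onto GPU-enabled Kubernetes clusters.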
Building and operating an HPC-based AI computing environment at the Gwangju Institute of Science and Technology (GIST).
To use any part of these slides, please cite "Narantuya Jargalsaikhan, GIST AI-X Computing Cluster, 2021".
Thank you!
Enabling Artificial Intelligence - Alison B. Lowndes – WithTheBest
An overview and update of our hardware and software offering and support provided to the Machine & Deep Learning Community around the world.
Alison B. Lowndes, AI DevRel, EMEA
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... – Databricks
What we call the public cloud was developed primarily to manage and deploy web servers. The target audience for these products is DevOps. While this is a massive and exciting market, the world of Data Science and Deep Learning is very different — and possibly even bigger. Unfortunately, the tools available today are not designed for this new audience, and the cloud needs to evolve. This talk covers what the next 10 years of cloud computing will look like.
Semiconductors are the driving force behind the AI evolution and enable its adoption across various application areas ranging from connected and automated driving to smart healthcare and wearables. Given that, electronics research, design and manufacturing communities around the world are increasingly investing in specialized AI chips providing less latency, greater processing power, higher bandwidth and faster performance. AI also attracts new technology players to invest in making their own specialized AI chips, changing the electronics manufacturing landscape and moving the AI technology towards machine learning, deep learning and neural networks.
Fast data in times of crisis with GPU accelerated database QikkDB | Business ... – Matej Misik
Graphics cards (GPUs) open up new ways of processing and analytics over big data, delivering millisecond selections over billions of rows, as well as telling stories about data. #QikkDB
How do you present data so that everyone understands it? Data analysis is for scientists, but data storytelling is for everyone – managers, product owners, sales teams, and the general public. #TellStory
Learn about high-performance computing with GPUs and how to present data, with a rich Covid-19 data-story example, in the upcoming webinar.
IBM Consultants & System Integrators Interchange - 2015
http://www-07.ibm.com/events/in/csiinterchange/index.html
Demystify OpenPOWER
Speaker: Anand Haridass, Chief Engineer – Power System, IBM India
OpenPOWER is an open development community using the POWER Architecture to serve the evolving needs of customers. Hear about the success of the OpenPOWER strategy and Foundation, which is building momentum and fueling an explosion of new development, innovation, collaboration, and improved performance on the POWER Architecture. What does this mean for your clients? Find out how OpenPOWER is expanding the Power ecosystem and its capabilities with new solutions coming from IBM and our partners.
Presentation given by Jens Hagemeyer (Bielefeld University) at the ‘Low-Energy Heterogeneous Computing Workshop’ on 16 October 2020 within HiPEAC CSW Autumn 2020
Lablupconf session1-2 "Putting bricks into a giant backend" – Lablup Inc.
Lablup Conf 1st (Session1/Community)
"Putting bricks into a giant backend" (거대한 백엔드에 벽돌 끼워넣기) - 여종현
- Contents:
* Issue management techniques: collecting and managing issues for Backend.AI's agent, manager, storage-proxy, etc. in a single repo
* GitHub Actions: towncrier, Travis CI, branch management
* The source code structure of Backend.AI, as understood from its documentation
- Watch the video: https://youtu.be/ip_leryNV-I
Lablupconf session8 "Paving the road to AI-powered world" – Lablup Inc.
Lablup Conf 1st (Session4/Core)
"Paving the road to AI-powered world" - 김준기
- Contents:
* Recap of Backend.AI history
* Future roadmap of Backend.AI for the next 2 years
- Watch the video: https://youtu.be/kAGSl99U0Bo
Lablupconf session7 "People don't know what they want until LABLUP show it to ..." – Lablup Inc.
Lablup Conf 1st (Session7/Cases)
"People don't know what they want until LABLUP show it to them. : Practical guide to building GPU clusters for AI" - 김정묵
- Contents:
* From education to hyperscale AI model development: considerations to keep in mind when preparing to build and operate a GPU cluster, shared with real-world cases.
- Watch the video: https://youtu.be/GMYWKF993J8
Lablupconf session4 "Striking a balance between accelerating storage-solution I/O pipelines and development scope" – Lablup Inc.
Lablup Conf 1st (Session4/Core)
"How to strike a balance between accelerating pipeline I/O of each storage solution and development scope" - 강지현
- Contents:
* Backend.AI Storage Proxy: accelerating the data/model I/O pipeline
* Integrating storage solutions: PureStorage / NetApp
* Case: building the NetApp integration
- Watch the video: https://youtu.be/itCEkuO2DtE
Lablup Conf 1st (Keynote/Core)
"The good, the bad, the weird: Future of Backend.AI" - 신정규
발표내용
- Road to Backend.AI. Current and the future.
영상보러가기
- https://youtu.be/5askMmSumP4
초심자를 위한 무작정 시작하는 Backend.AI_04
○ Backend.AI 버전 확인하기
○ 사용자 정보 변경하기
○ 사용자 설정 건드려보기
- 일반
* 데스크탑 알림
* 간결한 사이드바 기본 사용 옵션
* 언어 설정
* SSH 키페어 관리 (하단 링크 참고)
* 자동 업데이트 체크 옵션
* 자동 로그아웃 활성화/비활성화
* 쉘 스크립트 환경 설정하기
- 로그
* 로그 살펴보기
* 로그 새로고침
* 로그 삭제하기
○ FAQ & Troubleshooting
- 세션에서 Jupyter notebook 실행시 kernel error로 종료됩니다.
- 비밀번호를 잊어버렸어요.
- 비밀번호를 올바르게 입력했는데도 로그인이 안됩니다.
- 기타 참고할 내용
1. Import & Run (가져오기 & 실행)
1) 노트북 가져오기
- (https://github.com/lablup/backend.ai-example-notebooks)
2) 노트북 런치 버튼 만들기
3) GitHub 저장소 내용 가져오기
- Data & Storage 메뉴에서 가져온 저장소 폴더 확인하기
2. Data & Storage (데이터 & 폴더)
1) 폴더 제어기능 살펴보기
- 폴더 생성하기
- 폴더 정보 보기
- 폴더 활용하기
∘ 폴더에 파일 업로드하기
∘ 폴더 마운트하여 세션 생성하기
∙ 웹 터미널에서 자동 마운트 폴더 확인하기
∘ 폴더에서 업로드한 파일 다운로드하기
∘ 폴더 내 파일 이름 변경하기
∘ 폴더 내 파일 삭제하기
- 폴더 공유하기
∘ 폴더 초대하기
∘ 폴더 초대 수락/거절하기
∘ 폴더 공유 권한 갱신하기
- 폴더 이름 변경하기
- 폴더 삭제하기
2) 자동 마운트 폴더 탭
- 자동 마운트 폴더 생성하기
- 폴더 마운트 없이 세션 생성하기
∘ 웹 터미널에서 자동 마운트 폴더 확인하기
3. Statistics (통계)
1) 일일 사용량 통계
2) 일주일 간 사용량 통계
초심자를 위한 무작정 시작하는 Backend.AI-02
○ Backend.AI 클라우드 둘러보기
- Summary(요약) 페이지
* 시작 패널
* 자원 사용량 살펴보기
* 시스템 자원
* 공지
* 초대 폴더
○ Session(세션) 페이지
- 세션 상태 안내
- 실행중인 세션
* 앱 런쳐
* 웹 터미널
- 종료된 세션
*세션 내 사용량
*사용시간
JMI Techtalk: 강재욱 - Toward tf.keras from tf.estimator - From TensorFlow 2.0 p...Lablup Inc.
이 Techtalk에서는 TensorFlow 2.0으로 이전시 tf.estimator 에서 tf.keras로 이전해야 하는 이유에 대하여 설명합니다.
This Techtalk explains why you need to migrate from tf.estimator to tf.keras when moving to TensorFlow 2.0.
Just Model It 이벤트에서 사용할 Backend.AI 에 관한 소개입니다. Backend.AI의 개괄, 주요 기능 및 사용예들을 다룹니다. 또한 Backend.AI 를 이용한 End-to-end ML model 개발 시나리오도 소개합니다.
An Introduction to Backend.AI to use in Just Model It event. It covers the overview of Backend.AI, its main features and examples. It also introduces the scenario of developing end-to-end ML model using Backend.AI.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
2. GPU Computing: Maximizing GPU utilization via Backend.AI
Backend.AI: The most efficient way to build and train your machine learning models
2 / 38
3. Synergy of Deep Learning and GPU
Deep Learning = repetition of numeric operations on millions/billions of parameter matrices
[Figure: growth of deep-learning compute (reference: NVIDIA 2017, "A New Computing Era"): 2015 Microsoft ResNet, 60 million parameters, 70 quadrillion calc.; 2016 Baidu Deep Speech 2, 300 million parameters, 200 quadrillion calc.; 2017 Google NMT, 8.7 billion parameters, 1.05 quintillion calc. (calc. = GOPS × bandwidth)]
[Figure: matrix multiplication example: [[1, 2, 3], [4, 5, 6]] × [[7, 8], [9, 10], [11, 12]]; the first output element is 1×7 + 2×9 + 3×11 = 58]
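The matrix-multiply example on this slide, written out as code. This is a plain-Python sketch for illustration; real deep-learning workloads run the same row-by-column dot products on the GPU billions of times:

```python
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]

def matmul(A, B):
    """Naive matrix multiply: each output cell is a row-by-column dot product."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

C = matmul(A, B)
print(C[0][0])  # 1*7 + 2*9 + 3*11 = 58
print(C)        # [[58, 64], [139, 154]]
```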
4. Synergy of Deep Learning and GPU
[Figure: CPU vs. GPU die layout. A CPU spends chip area on control logic, cache, and a few ALUs attached to DRAM; a GPU spends it on many ALUs.]
GPU = more computing units (ALUs) per chip area
§ C/C++ code that runs on the GPU in parallel, made easy with NVIDIA's CUDA (2007) and OpenCL (2009)
§ Used in machine learning, numerical analysis, and scientific computing
[Chart: single-thread CPU performance, once improving 1.5× per year, now grows about 1.1× per year, while GPU computing power keeps growing 1.5× per year, projecting a 1000× gap by 2025 (log scale 10^2 to 10^7, years 1980 to 2020)]
5. Why GPU Computing?
HPC & AI = high utilization of large-scale resources
GPU = high-density computing chips

Workload                      Baseline (CPU-only)   HPC (Amber, LAMMPS)   AI Training (TensorFlow)   AI Inference (Image, Speech)
Speed-up                      1x                    20x                   >100x                      60x
Servers                       5,000                 250                   <50                        84
Capex                         $45M                  $11M                  $7.5M                      $7M
3-Year Opex (Power+Cooling)   $19.5M                $2.5M                 $1M                        $1.5M
TCO Saving                    N/A                   79%                   86%                        86%

Note(s): CPU baselined to 5,000 servers for each workload | Capex: CPU node with 2x Skylake CPUs ~$9K; GPU node with 4x V100 GPUs ~$45K | Opex: power & cooling is $180/kW/month | Power: CPU server + n/w = 0.6 kW; GPU server + n/w = 1.6 kW; DGX-1V/HGX-1 server = 3.2 kW | HPC: GPU node with 4x V100 compared to a 2x CPU server | DL Training: DGX-1V compared to a 2x CPU server | DL Inference: HGX-1 based server (8x V100) compared to a 2x CPU server | numbers rounded to nearest $0.5M
GPU is necessary!
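The TCO savings in the table can be reproduced from the Capex and Opex rows. A quick check, assuming TCO = Capex + 3-year Opex (that formula is an inference from the table, and the inputs were rounded to the nearest $0.5M, so recomputed savings can differ by about a point):

```python
def tco_saving(capex_m, opex_m, base_capex_m=45.0, base_opex_m=19.5):
    """Percent TCO saving vs. the CPU-only baseline ($45M capex + $19.5M opex)."""
    baseline = base_capex_m + base_opex_m  # $64.5M for 5,000 CPU-only servers
    return 100 * (1 - (capex_m + opex_m) / baseline)

print(round(tco_saving(11.0, 2.5)))  # HPC          -> 79  (table: 79%)
print(round(tco_saving(7.5, 1.0)))   # AI training  -> 87  (table: 86%, input rounding)
print(round(tco_saving(7.0, 1.5)))   # AI inference -> 87  (table: 86%, input rounding)
```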
10. Backend.AI Platform
[Diagram: the Backend.AI stack. Managed GPU apps run on top of the Backend.AI Manager, Agent, and Client components, above the IaaS / OS layer and the hardware infrastructure, serving data scientists, data analysts, instructors & learners, and developers.]
Container-level GPU virtualization
Click-to-ready GPU environments
Web GUI for monitoring & control
IDE integration
11. Backend.AI Differentiation
• The only solution that provides machine learning container technology in a single framework
Existing orchestration layers are optimized for domain-specific functions other than machine learning (e.g. scheduling, microservice hosting)
There is a lack of products that solve the problems of real machine learning researchers and developers
• Backend.AI
GPU optimization technology
ü CUDA-optimized solutions implemented through the NVIDIA partnership
ü The only container-based multi / partial GPU sharing (fractional scaling) solution
Dynamic sandboxing: programmable and rewritable syscall filters
ü Supports richer programmable policies than AppArmor / seccomp, etc.
Docker-based legacy app resource control
ü Calibrates the number of CPU cores reported to mathematical libraries such as OpenBLAS
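The OpenBLAS core-count calibration matters because, inside a container, `os.cpu_count()` reports the host's cores rather than the container's CPU quota, so math libraries over-subscribe threads. A minimal sketch of the idea, assuming standard Linux cgroup v1 file paths; this is an illustration, not Backend.AI's actual implementation:

```python
import os

def effective_cpu_count(default=None):
    """CPU quota actually granted to this container (cgroup v1), else a fallback."""
    try:
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota = int(f.read())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            period = int(f.read())
        if quota > 0 and period > 0:
            return max(1, quota // period)
    except (OSError, ValueError):
        pass  # no cgroup v1 CPU controller: fall through
    return default or os.cpu_count()  # falls back to the host core count

# BLAS libraries read this environment variable when the process starts.
os.environ["OPENBLAS_NUM_THREADS"] = str(effective_cpu_count())
```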
13. Backend.AI: https://www.backend.ai
Backend.AI is an open-source cloud resource management platform. We provide fractional GPU resourcing so you can scale efficiently, whether you are a scientist, a DevOps engineer, an enterprise, or an AI hobbyist.
14. Backend.AI: GPU Features
• Container-level fractional GPU scaling
Assigns slices of SMP / RAM to containers
ü e.g. allocating 2.5 GPUs or 0.3 GPUs
Shared GPUs for inference & education workloads
Multiple GPUs for model training workloads
Built on a proprietary CUDA virtualization layer
• NVIDIA platform integration
Optimized for the DGX server families
Supports NGC (DL / HPC) image integration
[Figure: example of GPU sharing / allocation with 2.5-GPU and 0.5-GPU slots]
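Fractional scaling like the 2.5 / 0.5-GPU example above can be sketched as simple capacity bookkeeping. A hypothetical allocator, where the greedy strategy and device names are assumptions and not Backend.AI's real scheduler:

```python
def allocate(gpus, request):
    """Split `request` GPUs (e.g. 2.5) across devices with free capacity.

    `gpus` maps device name -> free capacity, where 1.0 = one whole GPU.
    Returns a list of (device, amount) grants, or raises if unsatisfiable.
    """
    grant, remaining = [], round(request, 3)
    for dev, free in sorted(gpus.items(), key=lambda kv: -kv[1]):
        if remaining <= 0:
            break
        take = min(free, remaining)
        if take > 0:
            gpus[dev] = round(free - take, 3)
            grant.append((dev, take))
            remaining = round(remaining - take, 3)
    if remaining > 0:  # roll back partial grants on failure
        for dev, take in grant:
            gpus[dev] = round(gpus[dev] + take, 3)
        raise RuntimeError("not enough free GPU capacity")
    return grant

pool = {"gpu0": 1.0, "gpu1": 1.0, "gpu2": 0.5}
print(allocate(pool, 2.5))  # [('gpu0', 1.0), ('gpu1', 1.0), ('gpu2', 0.5)]
```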
16. NVIDIA DGX Series
• NVIDIA DGX-1 / DGX-2
Complete multi-GPU environment system
ü Ubuntu-based host OS (RedHat also supported)
ü NVLink / NVSwitch based high-speed networking
ü Great testbed for various load tests!
• Backend.AI on the DGX family
Complements the NVIDIA Container Runtime
ü GPU sharing for multi-user support
ü Scheduling with CPU / GPU topology awareness
ü Features for machine learning pipelines
Technology collaboration via the NVIDIA Inception Program

NVIDIA DGX-2: "The world's most powerful deep learning system for the most complex AI challenges"
System specifications:
GPUs: 16x NVIDIA Tesla V100
GPU memory: 512 GB total
Performance: 2 petaFLOPS
NVIDIA CUDA cores: 81,920
NVIDIA Tensor Cores: 10,240
NVSwitches: 12
Maximum power usage: 10 kW
CPU: dual Intel Xeon Platinum 8168, 2.7 GHz, 24 cores
System memory: 1.5 TB
Network: 8x 100 Gb/sec InfiniBand / 100 GigE, dual 10/25 Gb/sec Ethernet

The challenge of scaling to meet the demands of modern AI and deep learning: deep neural networks are rapidly growing in size and complexity in response to the most pressing challenges in business and research. The computational capacity needed to support today's modern AI workloads has outpaced traditional data center architectures. Modern techniques that exploit increasing use of model parallelism are colliding with the limits of inter-GPU bandwidth as developers build increasingly large accelerated computing clusters, pushing the limits of data center scale. A new approach is needed, one that delivers almost limitless AI computing scale to break through the barriers to faster insights that can transform the world.

[Diagram: DGX software stack. Deep-learning frameworks (TensorFlow, Caffe, Torch, MXNet, Theano, etc.) and user programs (NVIDIA DIGITS) run in containers via the NVIDIA Container Runtime for Docker, on top of the NVIDIA Driver and an Ubuntu-based host OS.]
17. NVIDIA Platform Integration: NGC
• NGC (NVIDIA GPU Cloud)
A collection of container images optimized for nvidia-docker
ü Direct optimization options and library dependency management by NVIDIA
Expansion into a model store announced at GTC 2019
ü Sharing of deep learning models between users and organizations
ü Supports transfer learning by adding data on top of a trained model
ü Easier and faster model training environments through model scripts
• Backend.AI with NGC
Supports execution of all NGC-based images
Applies NVIDIA-recommended options (including the Docker shm limit)
Fractional GPU sharing
NGC model store and model script support (coming soon)
18. Backend.AI @ NVIDIA GTC Silicon Valley 2019
• DGX User Group Meetup: heard about DGX deployment cases and customer requirements
• NGC User Group Meetup: presented the Backend.AI NGC integration
• Main session talk: introduced Backend.AI technology
• Inception startup booth: demonstrated container-level GPU virtualization, with direct Q&A with NVIDIA CUDA developers
19. Backend.AI Competitor Analysis
[Table: feature comparison of nvidia-docker, Docker Swarm, OpenStack, Kubernetes, Apache Mesos, and Backend.AI. Rows cover GPU support (GPU assignment & isolation; heterogeneous accelerators; fractional GPU scaling*), security (sandboxing via hypervisor/container; programmable sandboxing), virtualization (VM / hypervisor**; Docker container), scheduling (availability-slot based***; advanced, e.g. DRF), and integration with modern AI frameworks.]
* Now in beta testing
** Cloud vendor / OpenStack handles VM management
*** Slot-based, but advanced customization is possible with the label feature
21. Flexible Resource Allocation: Resource Groups
• Resource groups: groups of managed hardware resources
Specify the available resource groups for each user, project, and domain
Allow resource requests to be allocated only within specific resource groups
In the cloud, autoscaling can be applied per resource group
• Examples
Resource groups by device performance: V100 / P100 / K80 / etc.
Resource groups by node type: servers / workstations / IDC / etc.
Resource groups by cloud: AWS / GCP / Azure / etc.
• Applications
Assign specific hardware or GPUs only to specific users, projects, teams, or domains
Divide node groups by CPU / GPU / storage
Group and manage nodes that are physically on the same network (for multi-network clusters)
22. Flexible Resource Allocation: Scenarios
[Diagram: a Backend.AI Manager coordinating Resource Group A (on-premise), Resource Group B (on-premise), Resource Group D (cloud / scalable), and Storage Group C]
23. Flexible Resource Allocation: Scenarios
• Per-user resource group permissions
User 1: granted RG A
User 2: granted RG A and B
Each user has separate privileges on Storage C
• Session / task batching
Manual batching to a specific RG
Automatic discovery of the optimal resource combination across all available RGs before starting
[Diagram: User 1 runs in Resource Group A (on-premise); User 2 runs in Resource Groups A and B; Storage Group C and Resource Group D (cloud) sit under the Backend.AI Manager]
24. Flexible Resource Allocation: Scenarios
• Project-wise resource group permissions
Project 1: granted RG A and B
Project 2: granted RG B and D
• Storage sharing
Different resource groups can share the same storage groups
Personal storage folders
ü Only owners can access
ü Invitation feature for sharing
Project storage folders
ü All project members can access
[Diagram: Project 1 spans Resource Groups A and B (on-premise); Project 2 spans Resource Groups B and D (cloud); both share Storage Group C under the Backend.AI Manager]
25. Flexible Resource Allocation: Scenarios
• Resource group example
RG A: NVIDIA V100 GPU group
RG B: NVIDIA P100 GPU group
User 1 can use only V100; User 2 can use both V100 and P100
Project 3 can use the P100 group and the AWS cloud
Project 4 can use only the Microsoft Azure cloud
[Diagram: Resource Groups A (V100), B (P100), D (AWS), and E (Azure) plus Storage Group C under the Backend.AI Manager, with User 1, User 2, Project 3, and Project 4 mapped to their granted groups]
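The grant scenarios above boil down to a mapping from users and projects to their permitted resource groups, checked at scheduling time. A minimal sketch, where the entity names come from the slides but the data model itself is hypothetical, not Backend.AI's real schema:

```python
# Grants from the example: User 1 -> V100 only; User 2 -> V100 and P100;
# Project 3 -> P100 group and AWS; Project 4 -> Azure only.
GRANTS = {
    "User 1":    {"RG A"},
    "User 2":    {"RG A", "RG B"},
    "Project 3": {"RG B", "RG D"},
    "Project 4": {"RG E"},
}

def allowed_groups(entity):
    return GRANTS.get(entity, set())

def can_allocate(entity, resource_group):
    """A session may only be scheduled into a resource group granted to its owner."""
    return resource_group in allowed_groups(entity)

print(can_allocate("User 2", "RG B"))   # True
print(can_allocate("User 1", "RG B"))   # False: User 1 has no P100 access
```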
32. Backend.AI Cases: AI Bigdata MBA Dept., Kookmin Univ.
• GPU server farm for students and researchers in finance
3 servers with 24 GPUs for the simultaneous use of more than 80 students in a class plus researchers in labs
• Spec
Different resource policies for students and researchers
18 TiB Ceph distributed file system built by binding the HDDs on each node over the LAN
Web GUI for operation and maintenance: no dedicated operator needed
[Diagram: one Manager + Agent node and two Agent nodes on a 1 Gbps LAN, with 24 NVIDIA GPUs, node-specific CPUs, and an 18 TiB distributed file system (CephFS), serving an ML class of 40+ students and 10+ graduate students and researchers]
33. Backend.AI Cases: Lablup GPU Cloud
• Backend.AI service for cloud users (B2C)
https://cloud.backend.ai/ (in private beta)
Use Backend.AI on the web after sign-up (invitation required)
• Spec
Unified AWS + Azure + GCP
Google TPU support (beta)
Azure FileShare + AWS EFS (Elastic File System) for the datastore
Custom-built GPU nodes (DGX-2)
[Diagram: users connect over the Internet to the Manager + Agents in AWS ap-northeast-2 (EFS, RDS), with Agents in Azure korea-south (FileShare), GCP asia-east1 (TPUs), and the LG U+ IDC]
37. Case of Cloud for Machine Learning Education
• Machine learning education and development cloud service
25 users / 2 months for each term
• Optimal utilization for each education / development workload through GPU virtualization
Infrastructure costs reduced by more than 75%
• Automatic resource allocation and environment preparation with the GUI
Optimal operation without a dedicated administrator
Eliminates the long-term maintenance burden
The infrastructure / management cost reduction from GPU virtualization also applies to on-premise deployments.
[Chart: cloud-based ML education service cost comparison, Backend.AI Cloud relative to Company A's ML cloud (baseline 100%): total cost 20%, operator payroll 0%, infrastructure cost 23%]
38. Make AI Accessible!
For more information:
Lablup Inc.: https://www.lablup.com
Backend.AI: https://www.backend.ai
Backend.AI GitHub: https://github.com/lablup/backend.ai
Backend.AI Cloud (beta): https://cloud.backend.ai