Graphics processing units (GPUs) are becoming integral components of modern machine learning engines and platforms. These slides provide an introduction to GPUs and their suitability for machine learning workloads. They also discuss enabling technologies, such as CUDA, and demonstrate GPU-accelerated machine learning with the H2O platform. The slides are aimed at machine learning practitioners new to GPUs.
Author: Wen Phan is a Senior Solutions Architect at H2O.ai. Wen works with customers and organizations to architect systems, smarter applications, and data products to make better decisions, achieve positive outcomes, and transform the way they do business. Internally, Wen uses his hard-earned field experiences, customer feedback, and market trends to drive product innovation and development. Wen holds a B.S. in Electrical Engineering and M.S. in Analytics and Decision Sciences.
Follow him on Twitter: @wenphan
2. Agenda
• Context and Why GPUs?
– Matrix Multiplication Example
• CUDA
• GPU and Machine Learning
– Deep Learning
– Parallel Computing: GBM, GLM
• Getting Started
• Others
3. Need for More Compute
• Lots of Data
• Complex Architectures
• Many Models
4. Historic Ways for More Compute
• Faster Clock Rates
• Multi-Core
• Distributed Computing
5. CPU Trends
Original data collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten; dotted-line extrapolations by C. Moore.
9. GPUs for Parallel Tasks
• Traditional CPUs are not economically feasible
– 2.3 PFlops, 7.0 megawatts, 7000 homes
• CPU: optimized for serial tasks
• GPU accelerator: optimized for many parallel tasks
– 10x performance/socket
– > 5x energy efficiency
• Era of GPU-accelerated computing is here
(Figure: NVIDIA)
10. GPU Devotes More Transistors to Data Processing
CUDA C Programming Guide
12. Latency Versus Throughput
• Latency: Time to do a task.
• Throughput: Number of tasks per unit time.
• Fictitious Example:
– CPU
• Latency: 1 ns per task
• Throughput: (1 task per ns) x (6 cores) = 6 tasks per ns
– GPU
• Latency: 10 ns per task
• Throughput: (0.1 task per ns) x (2000 cores) = 200 tasks per ns
• CPUs are latency optimized; GPUs are throughput optimized
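The fictitious numbers above can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide: the function names `throughput` and `time_for_tasks` are hypothetical, and the model assumes fully independent tasks that run in waves, one task per core.

```python
import math


def throughput(latency_ns: float, cores: int) -> float:
    """Tasks finished per nanosecond when all cores work in parallel."""
    return cores / latency_ns


def time_for_tasks(n_tasks: int, latency_ns: float, cores: int) -> float:
    """Time to finish n_tasks independent tasks, run in waves of `cores` tasks."""
    waves = math.ceil(n_tasks / cores)
    return waves * latency_ns


# Figures from the fictitious example on the slide.
cpu_throughput = throughput(latency_ns=1.0, cores=6)      # 6 tasks per ns
gpu_throughput = throughput(latency_ns=10.0, cores=2000)  # 200 tasks per ns

# One task: the CPU wins on latency (1 ns versus 10 ns).
cpu_one = time_for_tasks(1, 1.0, 6)
gpu_one = time_for_tasks(1, 10.0, 2000)

# One million tasks: the GPU wins on throughput (5000 ns versus 166667 ns).
cpu_many = time_for_tasks(1_000_000, 1.0, 6)
gpu_many = time_for_tasks(1_000_000, 10.0, 2000)
```

Under these assumptions the CPU is ahead only while the number of tasks is small; once there are enough independent tasks to keep thousands of cores busy, the higher per-task latency of the GPU stops mattering.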
20. CUDA
• Historically, GPUs were used for, well, graphics processing. But people realized that the fine-grained parallelism inherent in GPU architecture could be exploited for general-purpose computing.
• CUDA (Compute Unified Device Architecture)
– Parallel computing platform
– Programming model and API
– Allows CUDA-enabled GPUs to be used for general-purpose processing
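To give a feel for the programming model, here is a pure-Python emulation of how a CUDA kernel maps threads to data elements. It is not CUDA code and runs sequentially on the CPU; `launch` and `vector_add_kernel` are hypothetical names. The only part taken from CUDA itself is the global-index arithmetic (block index times block size plus thread index) and the bounds guard.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Body that each (emulated) thread executes for one element."""
    # Same global-index arithmetic a 1-D CUDA kernel uses:
    # blockIdx.x * blockDim.x + threadIdx.x
    i = block_idx * block_dim + thread_idx
    if i < len(out):  # guard: the last block may have spare threads
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Emulated launch: a real GPU runs these threads in parallel, not in a loop."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 10
a = list(range(n))
b = [10 * x for x in a]
out = [0] * n

block_dim = 4                                # threads per block (illustrative)
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements

launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
```

The point of the sketch is that the kernel is written for a single element; parallelism comes from launching it over many thread indices at once.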
21. Speed Up Parallelizable Code
• Application Code
– GPU: Use GPU to parallelize compute-intensive functions
– CPU: Rest of sequential CPU code
(Figure: NVIDIA)
46. Convolutional Neural Networks
• Leverages the fact that data has spatial structure
– Add idea of locality
• Tremendous success with computer vision tasks
• “Put deep learning on the map”
52. Convolutional Layer
• f = receptive field (filter size)
• p = padding
• s = stride
• m = number of filters
Convolution maps an input volume of size $w_I \times h_I \times d_I$ to an output volume of size $w_O \times h_O \times d_O$:
$w_O = \frac{w_I - f + 2p}{s} + 1$
$h_O = \frac{h_I - f + 2p}{s} + 1$
$d_O = m$
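The output-size relations on this slide (output width and height equal to (input size − f + 2p) / s + 1, output depth equal to m) can be sketched as a small helper. The function name `conv_output_shape` and the example layer sizes are illustrative assumptions, not taken from the slides; the helper also assumes the filter must tile the padded input exactly and rejects other settings.

```python
def conv_output_shape(w_in, h_in, f, p, s, m):
    """Return (w_out, h_out, d_out) for a convolutional layer.

    f: receptive field (filter size), p: padding, s: stride, m: number of filters.
    """
    def out_size(n):
        span = n - f + 2 * p
        if span < 0 or span % s != 0:
            raise ValueError("filter size, padding and stride do not tile the input")
        return span // s + 1

    # Output depth is the number of filters, independent of the input depth.
    return out_size(w_in), out_size(h_in), m


# Illustrative settings (assumed for this example only).
same_size = conv_output_shape(w_in=32, h_in=32, f=5, p=2, s=1, m=10)
downsampled = conv_output_shape(w_in=227, h_in=227, f=11, p=0, s=4, m=96)
```

With f = 5, p = 2 and s = 1 the spatial size is preserved (32 x 32), while a stride of 4 with no padding shrinks a 227 x 227 input to 55 x 55.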
59. ImageNet Entries Using GPUs
https://devblogs.nvidia.com/parallelforall/nvidia-ibm-cloud-support-imagenet-large-scale-visual-recognition-challenge/
60. Deep Water: Next-Gen Distributed Deep Learning
• One Interface - GPU Enabled - Significant Performance Gains
• Inherits all H2O properties in scalability, ease of use and deployment
• Recurrent Neural Networks: enabling natural language processing, sequences, time series, and more
• Convolutional Neural Networks: enabling image, video, speech recognition
• Hybrid Neural Network Architectures: enabling speech to text translation, image captioning, scene parsing and more
• H2O integrates with existing GPU backends for significant performance gains
(Diagram label: H2O Deep Learning Algo)
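77. GBM Data Parallelism
• Data is split across nodes 1, 2, ..., K: $X = \{X_1, \ldots, X_K\}$
• Each node k computes $\mathrm{math}(X_k)$ on its own partition
• Results are combined: $\{X_i; t_i\} = f(\mathrm{math}(X_1), \ldots, \mathrm{math}(X_K))$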
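78. GBM Data Parallelism
• Data is split across nodes 1, 2, ..., K: $X = \{X_1, \ldots, X_K\}$
• Each node k computes $\mathrm{math}(X_k)$ on its own partition
• Results are combined: $\{X_i; t_i\} = f(\mathrm{math}(X_1), \ldots, \mathrm{math}(X_K))$
• Full Data Parallelism for Each Level of Tree Growth!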
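A minimal sketch of this pattern follows. It rests on assumptions that go beyond the slide: math(X_k) is taken to be per-partition sufficient statistics (residual sums and row counts for each candidate split), {X_i; t_i} is read as the chosen split feature and threshold, and the gain formula is a simple squared-error criterion. The names `local_stats` and `combine` are hypothetical, and this is not H2O's implementation; a plain loop stands in for the K workers.

```python
from collections import defaultdict


def local_stats(rows, residuals, thresholds):
    """math(X_k): statistics for every candidate split, computed on one partition."""
    left = defaultdict(lambda: [0.0, 0])  # (feature, threshold) -> [residual sum, row count]
    total_sum, total_n = 0.0, 0
    for x, r in zip(rows, residuals):
        total_sum += r
        total_n += 1
        for j, candidates in thresholds.items():
            for t in candidates:
                if x[j] <= t:
                    left[(j, t)][0] += r
                    left[(j, t)][1] += 1
    return dict(left), (total_sum, total_n)


def combine(partials):
    """f(...): merge the partition statistics and pick the best (feature, threshold)."""
    merged = defaultdict(lambda: [0.0, 0])
    total_sum, total_n = 0.0, 0
    for left, (s, n) in partials:
        total_sum += s
        total_n += n
        for key, (sl, nl) in left.items():
            merged[key][0] += sl
            merged[key][1] += nl
    best_key, best_gain = None, float("-inf")
    for key, (sl, nl) in sorted(merged.items()):
        nr = total_n - nl
        if nl == 0 or nr == 0:
            continue  # a split must leave rows on both sides
        sr = total_sum - sl
        gain = sl * sl / nl + sr * sr / nr  # assumed squared-error gain
        if gain > best_gain:
            best_key, best_gain = key, gain
    return best_key


# Toy data: residuals flip sign where feature 0 crosses 2.5.
rows = [(0, 1), (1, 0), (2, 1), (3, 0), (4, 1), (5, 0)]
residuals = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
thresholds = {0: [0.5, 2.5, 4.5], 1: [0.5]}

K = 3
chunk = len(rows) // K
partials = [
    local_stats(rows[k * chunk:(k + 1) * chunk],
                residuals[k * chunk:(k + 1) * chunk],
                thresholds)
    for k in range(K)
]
split = combine(partials)
single_node_split = combine([local_stats(rows, residuals, thresholds)])
```

Because the statistics are plain sums, merging the K partial results gives the same split as processing all rows on one node, which is what makes each level of tree growth data-parallel.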
80. GPU Cluster
• Leverage GPUs to accelerate processing and fine-grained parallelism on each node
(Diagram: nodes with CPU and GPU computing math(X1), math(X2), math(X3), ..., math(XK))
81. GBM on GPU
• Trees: T1(x), T2(x), T3(x), ..., TM(x)
• Application Code
– GPU: Use GPU to parallelize compute-intensive functions
– CPU: Rest of sequential CPU code
82. Parallel Computing
• Model Parallelism: Split up a single model
• Data Parallelism: Split up data to train a single model
• Training Parallelism: Split up different parts of the training process
– Ensemble Base Learners
– Cross-Validation
– Hyperparameters
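As one illustration of training parallelism, the sketch below evaluates several hyperparameter settings concurrently. Everything in it is assumed for the example: the one-weight ridge model, the toy data, and the names `fit_ridge` and `validation_mse`. A thread pool is used only to show the pattern; in CPython, threads do not speed up CPU-bound pure-Python work, so a real setup would hand each candidate to a separate process, node or GPU.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data (assumed): y = 2x, first six points for training, last two for validation.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
train_x, train_y = xs[:6], ys[:6]
valid_x, valid_y = xs[6:], ys[6:]


def fit_ridge(x, y, lam):
    """One-weight ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def validation_mse(lam):
    """Train one model for a single hyperparameter value and score it."""
    w = fit_ridge(train_x, train_y, lam)
    errors = [(w * a - b) ** 2 for a, b in zip(valid_x, valid_y)]
    return sum(errors) / len(errors)


candidates = [0.0, 1.0, 10.0]

# Each candidate is an independent training run, so the runs can be scheduled concurrently.
with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
    scores = list(pool.map(validation_mse, candidates))

results = dict(zip(candidates, scores))
best_lam = min(results, key=results.get)
```

Cross-validation folds and ensemble base learners fit the same pattern: each fold or learner is an independent task that can be mapped over a pool of workers.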