This document provides a summary of a presentation on innovating with AI at scale. The presentation discusses:
1. Implementing AI use cases at scale across industries like retail, life sciences, and transportation.
2. Deploying AI models to the edge using tools like TensorFlow and TensorRT for high-performance inference on devices.
3. Best practices and frameworks for distributed deep learning training on large clusters to train models faster.
Build FAST Deep Learning Apps with Docker on OpenPOWER and GPUs - Indrajit Poddar
GPU- and NVLink-accelerated training and inference with TensorFlow and Caffe on OpenPOWER systems. Presented at a meetup prior to DataWorks Summit Munich 2017.
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker - Indrajit Poddar
Transparently accelerated deep learning workloads on OpenPOWER systems and GPUs, using easy-to-use open source frameworks such as Caffe, Torch, TensorFlow, and Theano.
NVIDIA Deep Learning Inference Platform Performance Study | Technical Overview
Introduction
Artificial intelligence (AI), the dream of computer scientists for over half a century, is no longer science fiction: it is already transforming every industry. AI is the use of computers to simulate human intelligence. AI amplifies our cognitive abilities, letting us solve problems where the complexity is too great, the information is incomplete, or the details are too subtle and require expert training.
While the machine learning field has been active for decades, deep learning (DL) has boomed over the last five years. In 2012, Alex Krizhevsky of the University of Toronto won the ImageNet image recognition competition using a deep neural network trained on NVIDIA GPUs, beating all the human expert algorithms that had been honed for decades. That same year, recognizing that larger networks can learn more, Stanford's Andrew Ng and NVIDIA Research teamed up to develop a method for training networks using large-scale GPU computing systems. These seminal papers sparked the "big bang" of modern AI, setting off a string of "superhuman" achievements. In 2015, Google and Microsoft both beat the best human score in the ImageNet challenge. In 2016, DeepMind's AlphaGo recorded its historic win over Go champion Lee Sedol, and Microsoft achieved human parity in speech recognition.
GPUs have proven to be incredibly effective at solving some of the most complex problems in deep learning, and while the NVIDIA deep learning platform is the standard industry solution for training, its inferencing capability is not as widely understood. Some of the world's leading enterprises, from the data center to the edge, have built their inferencing solutions on NVIDIA GPUs. Some examples include:
In this deck from the 2016 HPC Advisory Council Switzerland Conference, DK Panda from Ohio State University presents: High-Performance and Scalable Designs of Programming Models for Exascale Systems.
"This talk will focus on challenges in designing runtime environments for Exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPUs and Intel MIC) and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented."
Watch the video presentation: http://wp.me/p3RLHQ-f7c
See more talks in the Swiss Conference Video Gallery: http://insidehpc.com/2016-swiss-hpc-conference/
Healthcare has become one of the most important aspects of everyone's life. Its importance has surged due to the latest outbreaks, and with this latest pandemic it has become essential to collaborate on improving everyone's healthcare as soon as possible.
IBM has reacted quickly, sharing not only its knowledge but also its artificial intelligence supercomputers all around the world.
Those supercomputers are helping to overcome this outbreak, and future ones as well.
They have completely different features compared to the offerings of other players in the supercomputer market.
We will take a quick look at the differences between these AI-focused supercomputers and how they can help in the R&D of healthcare solutions for everyone, from those with access to a big IBM AI supercomputer to those with access to only a single small IBM AI-focused server.
A Primer on FPGAs - Field Programmable Gate Arrays - Taylor Riggan
A focus on the use of FPGAs by cloud service providers, including Microsoft Azure Catapult, Google Tensor Processing Units, and Amazon EC2 F1 instances. Also includes background info on how to get started with FPGAs.
JMI Techtalk: 한재근 - How to use GPU for developing AI - Lablup Inc.
This Techtalk introduces the variety of methods Nvidia provides for improving performance when using GPUs for AI development, along with technical resources. In particular, it covers in detail the process of improving performance by introducing mixed precision on the Volta architecture.
Harnessing the virtual realm for successful real-world artificial intelligence - Alison B. Lowndes
Artificial intelligence is impacting all areas of society, from healthcare and transportation to smart cities and energy. This talk covers how NVIDIA invests both in internal pure research and in accelerated computation to enable its diverse customer base across gaming and extended reality, graphics, AI, robotics, simulation, high-performance scientific computing, healthcare, and more. You will be introduced to the GPU computing platform and shown real-world, successfully deployed applications, as well as a glimpse into the current state of the art across academia, enterprise, and startups.
Backend.AI Technical Introduction (19.09 / 2019 Autumn) - Lablup Inc.
This slide deck introduces the technical specs and details of Backend.AI 19.09:
* On-premise clustering / container orchestration / scaling on cloud
* Container-level fractional GPU technology to use one GPU as many GPUs across many containers at the same time
* NVIDIA GPU Cloud integrations
* Enterprise features
If you're like most of the world, you're in an aggressive race to implement machine learning applications and on a path to deep learning. If you can give better service at a lower cost, you will be one of the winners in 2030. But infrastructure is a key challenge to getting there. What does the technology infrastructure look like over the next decade as you move from petabytes to exabytes? How are you budgeting for colossal data growth over the next decade? How do your data scientists share data today, and will it scale for 5-10 years? Do you have the appropriate security, governance, backup, and archiving processes in place? This session will address these issues and discuss strategies for customers as they ramp up their AI journey with a long-term view.
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manage - James Serra
Discover, manage, deploy, monitor, rinse and repeat. In this session we show how Azure Machine Learning can be used to create the right AI model for your challenge, and then easily customize it using your development tools while relying on Azure ML to optimize it to run in hardware-accelerated environments for the cloud and the edge using FPGAs and neural network accelerators. We then show you how to deploy the model to highly scalable web services and nimble edge applications that Azure can manage and monitor for you. Finally, we illustrate how you can leverage the model telemetry to retrain and improve your content.
Fórum E-Commerce Brasil | NVIDIA technologies applied to e-commerce: far beyond the hardware - E-Commerce Brasil
NVIDIA technologies applied to e-commerce, far beyond the hardware.
Jomar Silva
Developer Relations Manager for Latin America - NVIDIA
https://eventos.ecommercebrasil.com.br/forum/
Semiconductors are the driving force behind the AI evolution and enable its adoption across various application areas ranging from connected and automated driving to smart healthcare and wearables. Given that, electronics research, design and manufacturing communities around the world are increasingly investing in specialized AI chips providing less latency, greater processing power, higher bandwidth and faster performance. AI also attracts new technology players to invest in making their own specialized AI chips, changing the electronics manufacturing landscape and moving the AI technology towards machine learning, deep learning and neural networks.
RAPIDS – Open GPU-accelerated Data Science - Data Works MD
RAPIDS is an initiative driven by NVIDIA to accelerate the complete end-to-end data science ecosystem with GPUs. It consists of several open source projects that expose familiar interfaces, making it easy to accelerate the entire data science pipeline, from ETL and data wrangling to feature engineering, statistical modeling, machine learning, and graph analysis.
Corey J. Nolet
Corey has a passion for understanding the world through the analysis of data. He is a developer on the RAPIDS open source project focused on accelerating machine learning algorithms with GPUs.
Adam Thompson
Adam Thompson is a Senior Solutions Architect at NVIDIA. With a background in signal processing, he has spent his career participating in and leading programs focused on deep learning for RF classification, data compression, high-performance computing, and managing and designing applications targeting large collection frameworks. His research interests include deep learning, high-performance computing, systems engineering, cloud architecture/integration, and statistical signal processing. He holds a Masters degree in Electrical & Computer Engineering from Georgia Tech and a Bachelors from Clemson University.
How to optimize Hortonworks Apache Spark ML workloads on Power - The POWER8/POWER9 architecture is the latest offering from IBM and the OpenPOWER Foundation, and it is the perfect platform for optimizing Hortonworks Spark performance. During this presentation we will walk the audience through the steps required to optimize YARN, HDFS, and Spark on a Power cluster (a configuration sketch follows the list).
Steps required:
1) Classify the workload as CPU-, memory-, IO-, or mixed-intensive
2) Characterize the "out-of-box" Hortonworks Spark workload to understand its CPU, memory, IO, and network performance characteristics
3) Floor-plan cluster resources
4) Tune the "out-of-box" workload to navigate the "roofline" performance space in the dimensions named above
5) If the workload is memory-, IO-, or network-bound, tune Spark to increase operational intensity (operations/byte) as much as possible to make it CPU-bound
6) Divide the search space into regions and perform an exhaustive search
7) Identify performance bottlenecks through resource monitoring, and tune the system, JVM, or application layer by profiling the application and hardware counters if required
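As a concrete illustration of steps 4-7, below is a minimal PySpark configuration sketch; the property values are placeholders showing which knobs are typically turned, not recommendations from the talk or for any particular Power cluster.

    from pyspark.sql import SparkSession

    # Illustrative tuning knobs for steps 4-7; every value is a placeholder,
    # to be set from the workload characterization in step 2.
    spark = (
        SparkSession.builder
        .appName("power-spark-tuning-sketch")
        .config("spark.executor.cores", "4")            # align with cores/SMT threads per socket
        .config("spark.executor.memory", "24g")         # sized from the step-3 resource floor plan
        .config("spark.sql.shuffle.partitions", "192")  # ease shuffle pressure for IO-bound stages
        .config("spark.serializer",
                "org.apache.spark.serializer.KryoSerializer")  # cheaper serialization
        .getOrCreate()
    )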
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS - Databricks
Abstract: We will introduce RAPIDS, a suite of open source libraries for GPU-accelerated data science, and illustrate how it operates seamlessly with MLflow to enable reproducible training, model storage, and deployment. We will walk through a baseline example that incorporates MLflow locally, with a simple SQLite backend, and briefly introduce how the same workflow can be deployed in the context of GPU-enabled Kubernetes clusters.
HPE and NVIDIA are delivering a leading portfolio of optimized AI solutions that transform business and industry, enabling deeper insights and helping solve the world's greatest challenges. Join this session to learn how the NVIDIA V100, the world's most powerful GPU, powers the HPE 6500 systems, the HPE AI systems, to provide new business insights and outcomes.
End to End Machine Learning Open Source Solution Presented in Cisco Developer... - Manish Harsh
The RAPIDS suite of open source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. Licensed under Apache 2.0, RAPIDS is incubated by NVIDIA® based on extensive hardware and data science experience. RAPIDS utilizes NVIDIA CUDA® primitives for low-level compute optimization, and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - what possibilities does... - Infoshare
During this session we will look at how the Microsoft platform can be used to build so-called "intelligent" solutions. The examples include both Cognitive Services and the use of GPUs (more precisely, Batch AI) for training neural networks. We will also tackle complex design issues, so that algorithms extend human capabilities (rather than replace us). The session assumes that attendees know how to program.
A talk on reducing costs & increasing efficiencies by designing, testing & engineering in simulation first, plus examples of robotics & environmental capability.
Similar to Innovation with ai at scale on the edge vt sept 2019 v0
The Libre-SOC Project aims to create an entirely Libre-licensed, transparently developed, fully auditable hybrid 3D CPU-GPU-VPU, using the supercomputer-class OpenPOWER ISA as the foundation.
Our first test ASIC is a 180nm "Fixed-Point" Power ISA v3.0B processor, 5.1mm x 5.9mm, a proof-of-concept for the team, whose primary expertise is in software engineering. Software engineering training brings a radically different approach to hardware development: extensive unit tests, source code revision control, and automated development tools are normal. Libre project management brings even more: bug trackers, mailing lists, auditable IRC logs, and a wiki are standard fare for Libre projects but are simply not industry-standard practice.
This talk therefore goes through the workflow, from the original HDL through to the GDS-II layout, showing how we were able to keep track of the development that led to the IMEC 180nm tape-out in July 2021. In particular, we show how, by following a parallel development process involving "Real" and "Symbolic" cell libraries developed by Chips4Makers, our developers did not need to sign a foundry NDA but were still able to work side-by-side with a university that did. With this parallel development process, the university upheld its NDA obligations, and Libre-SOC was simultaneously able to honour its transparency objectives.
Workload Transformation and Innovations in POWER Architecture - Ganesan Narayanasamy
The IT industry is going through two major transformations. One is the adoption of AI and its tight integration into commercial applications and enterprise workflows. The other is the transformation of software architecture through concepts like microservices and cloud-native architecture. These transformations, alongside the aggressive adoption of IoT, mobile, and 5G in all our day-to-day activities, are making the world operate in a more real-time manner, which opens up a new challenge: improving hardware architecture to adapt to these requirements. Together they push the boundaries of the entire systems stack, making designers rethink hardware. This talk presents a picture of how the industry-leading enterprise POWER architecture is transforming to meet the performance demands of these newer-generation workloads, with a primary focus on on-chip AI acceleration.
Friday, July 16th 2021: our newest workshop with DoMS, IIT Roorkee, "Concept to Solutions using the OpenPOWER Stack". It's time to discover advances in #DeepLearning tools and techniques from the world's leading innovators across industries, research, and public speakers.
Register here:
https://lnkd.in/ggxMq2N
This presentation covers two use cases using OpenPOWER systems:
1. Diabetic retinopathy using AI on the NVIDIA Jetson Nano: the objective is to classify the diabetic retinopathy level from the retina image alone, in a remote area with minimal doctor intervention. The model uses the VGG16 network architecture, is trained from scratch on POWER9, and was deployed on the Jetson Nano board (a training sketch follows after this list).
2. Classifying COVID positivity using lung X-ray images: the idea is to build ML models to detect positive cases from X-ray images. The model was trained on POWER9, and the application was developed using Python.
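For the retinopathy use case, here is a hedged sketch of what "VGG16 trained from scratch" can look like in Keras; the directory name, image size, class count, and hyperparameters are illustrative assumptions, not details taken from the talk.

    import tensorflow as tf

    # VGG16 with randomly initialized weights (i.e., trained from scratch);
    # one output class per assumed diabetic-retinopathy severity level.
    model = tf.keras.applications.VGG16(weights=None, classes=5,
                                        input_shape=(224, 224, 3))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Hypothetical folder of retina images, one subfolder per severity level
    train_gen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1.0 / 255).flow_from_directory(
        "retina_images/", target_size=(224, 224),
        batch_size=32, class_mode="sparse")

    model.fit(train_gen, epochs=10)
    model.save("retinopathy_vgg16")  # export for deployment on the Jetson Nano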
IBM Bayesian Optimization Accelerator (BOA) is a do-it-yourself toolkit to apply state-of-the-art Bayesian inferencing techniques and obtain optimal solutions for complex, real-world design simulations without requiring deep machine learning skills. This talk will describe IBM BOA, its differentiation and ease of use, and how researchers can take advantage of it for optimizing any arbitrary HPC simulation.
This presentation covers the various partners and collaborators currently working with the OpenPOWER Foundation, use cases of OpenPOWER systems in multiple industries, OpenPOWER workgroups, and OpenCAPI features.
The IBM POWER10 processor represents the 10th generation of the POWER family of enterprise computing engines. Its performance is a result of both powerful processing cores and high-bandwidth intra- and inter-chip interconnect. POWER10 systems can be configured with up to 16 processor chips and 1920 simultaneous threads of execution. Cross-system memory sharing, through the new Memory Inception technology, and 2 Petabytes of addressing space support an expansive memory system. The POWER10 processing core has been significantly enhanced over its POWER9 predecessor, including a doubling of vector units and the addition of an all-new matrix math engine. Throughput gains from POWER9 to POWER10 average 30% at the core level and three-fold at the socket level. Those gains can reach ten- or twenty-fold at the socket level for matrix-intensive computations.
Everything is changing, from healthcare to the automotive markets, without forgetting the financial markets or any type of engineering. Everything has stopped being created by an individual or, at best, a team, and is now developed and perfected using AI and hundreds of computers. And even AI is something we can no longer run on a single computer, no matter how powerful it is. What drives everything today is HPC, or high-performance computing, heavily linked to AI. In this session we will discuss AI, HPC computing, the IBM Power architecture, and how it can help develop better healthcare, better automobiles, better financials, and better everything that we run on them.
Macromolecular crystallography is an experimental technique for exploring the 3D atomic structure of proteins, used by academics for research in biology and by pharmaceutical companies in rational drug design. While development of the technique has so far been limited by the performance of scientific instruments, computing performance has recently become a key limitation. In my presentation I will present the computing challenge of handling an 18 GB/s data stream coming from the new X-ray detector. I will show PSI's experience applying conventional hardware to the task and why this attempt failed. I will then present how the IC 922 server with OpenCAPI-enabled FPGA boards allowed us to build a sustainable and scalable solution for high-speed data acquisition. Finally, I will give a perspective on how advances in hardware development will enable better science for users of the Swiss Light Source.
AI in the healthcare and automobile industries using OpenPOWER/IBM POWER9 systems - Ganesan Narayanasamy
As the adoption of AI technologies increases and matures, the focus will shift from exploration to time to market, productivity, and integration with existing workflows. Governing enterprise data, scaling AI model development, and selecting a complete, collaborative hybrid platform and tools for rapid solution deployment are key focus areas for growing data scientist teams tasked to respond to business challenges. This talk will cover the challenges and innovations for AI at scale in industries such as healthcare and automotive, the AI ladder and AI life cycle, and infrastructure architecture considerations.
This talk gives an introduction to healthcare use cases, the AI ladder, and AI-at-scale life-cycle themes. It discusses the iterative nature of the workflow and some of the important components to be aware of when developing AI healthcare solutions, as well as the different types of algorithms and when machine learning might be more appropriate than deep learning, or the other way around. Example use cases are also shared as part of this presentation.
Moving object recognition (MOR) corresponds to the localization and classification of moving objects in videos. Discriminating moving objects from static objects and background in videos is an essential task for many computer vision applications. MOR has widespread applications in intelligent visual surveillance, intrusion detection, anomaly detection and monitoring, industrial site monitoring, detection-based tracking, autonomous vehicles, etc. In this session, Murari presented a poster on deep learning algorithms that identify both the locations and the corresponding categories of moving objects with a convolutional network, and discussed the challenges in developing such algorithms.
Clarisse Hedglin from IBM presented this as part of a 3-day international summit. She shared the scenarios AI can solve for today using IBM AI infrastructure.
Dr Murari Mandal from NUS presented, as part of the 3-day OpenPOWER Industry Summit, on robustness in deep learning, covering AI breakthroughs, performance improvements in AI models, adversarial attacks, attacks on semantic segmentation, attacks on object detectors, defending against adversarial attacks, and many other areas.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Smart TV Buyer Insights Survey 2024 - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Innovation with ai at scale on the edge vt sept 2019 v0
1. Innovating with
AI at Scale:
Tools and Tips for
Training and Inference
Presenter: Clarisse Taaffe-Hedglin
clarisse@us.ibm.com
Executive AI Architect
IBM Systems
2. 1. Drivers of the AI explosion
2. Implementing use cases at scale
3. Deploying models to the edge
5. Artificial Intelligence brings
new Cognitive Capabilities
• Computers can be trained to “See”
Example: Airport security inspecting luggage
• Computers can be trained to “Hear”
Example: Maintenance crew listening to railcars
• Computers can be trained to “do”: mimic an expert
Example: Mobile phone provider predicting customer churn
6. Data + Algorithms + Compute
The key triggers rapidly advancing AI. (Slide graphic: open source software running on CPU, GPU, and FPGA.)
9. ML Framework Landscape
Which ML frameworks have you used the most over the last 5 years? (Source: Kaggle Data Science Survey 2018)
scikit-learn is, by far, the most widely-used ML framework. Why?
• Wide variety of ML models
• Good documentation
• Standardized API (illustrated below)
Some downsides of scikit-learn are:
1. Lack of support for deep learning (DL)
2. Slow performance on large datasets
Problem (1) is addressed by the DL frameworks in PowerAI (TensorFlow, PyTorch), recently rebranded as Watson Machine Learning Accelerator. Problem (2) is addressed by Snap ML.
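To make the "standardized API" point concrete, here is a minimal scikit-learn example (the estimator and dataset are illustrative); nearly every estimator in the library follows this same fit/predict pattern, which Snap ML mirrors:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    # The same two calls work across scikit-learn estimators
    model = RandomForestClassifier(n_estimators=100).fit(X, y)
    print(model.predict(X[:5]))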
10. Watson Machine Learning Community Edition
A curated, tested, and pre-compiled binary software distribution that enables enterprises to quickly and easily deploy deep learning for their data science and analytics development, including all of the following frameworks:
TensorFlow, TensorFlow Probability, TensorBoard, TensorFlow-Keras, BVLC Caffe, IBM Enhanced Caffe, Caffe2, OpenBLAS, HDF5, and Nvidia RAPIDS.
11. Distributed Deep Learning
IBM adds value to curated, tested, and pre-compiled frameworks with Watson Machine Learning Community Edition:
Distributed Deep Learning: simplifies the process of training deep learning models across a cluster for faster time to results.
Software Libraries: WML CE software and the accelerated Power servers support a host of accelerator libraries like Snap ML and Nvidia RAPIDS.
Large Model Support: use system memory with GPUs to support more complex models and higher-resolution data.
12. Evolving from compute systems to Cognitive Systems
(Slide graphic: P8, P9, P10 processor roadmap; open frameworks; partnerships; industry alignment; dev ecosystem; accelerator roadmaps; open accelerator interfaces.)
It's not just about hardware design: it's about hardware + software, co-optimization, and open innovation, which just work for ML, DL, and AI.
18. Train larger, more complex models
Traditional Model Support: limited memory on the GPU forces tradeoffs in model size / data resolution. (Diagram: CPU and DDR4 system memory feed GPU graphics memory over PCIe; the system bottleneck is here.)
Large Model Support: use system memory and the GPU to support more complex models and higher-resolution data. (Diagram: POWER CPU and DDR4 system memory feed GPU graphics memory over the POWER NVLink data pipe.)
19. Large AI Models Train ~4 Times Faster
POWER9 servers with NVLink to GPUs vs x86 servers with PCIe to GPUs.
(Chart: Caffe with LMS (Large Model Support), runtime of 1000 iterations, GoogleNet model on an enlarged ImageNet dataset (2240x2240). Xeon x86 2640v4 with 4x V100 GPUs: 3.1 hours; Power AC922 with 4x V100 GPUs: 49 minutes. 3.8x faster.)
20. TensorFlow Large Model Support: NVLink 2.0 advantage
3D U-Net segmentation models with higher-resolution images allow for learning and labeling finer details and structures of brain tumors.
https://developer.ibm.com/linuxonpower/2018/07/27/tensorflow-large-model-support-case-study-3d-image-segmentation/
21. Accelerating Machine Learning
Snap ML is a framework for training machine learning (ML) models. It is characterized by high performance, scalability to very large datasets, and high resource efficiency.
Why fast? Speed is crucial in many cases: online re-training of models, model selection and hyper-parameter tuning, and fast adaptability to changes.
Why large-scale? Large datasets arise in numerous business-critical applications: recommendation, credit fraud, advertising, space exploration, weather, etc.
Why resource-savvy? Not everyone can afford on-prem computing. Renting computing in the cloud is billed by usage; less usage means savings and a higher profit margin.
(Slide graphic: nested view of artificial intelligence, machine learning, and deep learning (neural networks).)
22. Which models are supported?
Snap ML (PowerAI 1.6.0) currently supports:
• Generalized Linear Models:
- Logistic Regression
- Ridge Regression
- Lasso Regression
- Support Vector Machines (SVMs)
• Tree-based models:
- Decision Trees
- Random Forest
With more to come…
(Chart: Kaggle Data Science Survey 2017, "Which data science methods are used at work?", with the methods supported by Snap ML highlighted.)
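Below is a minimal sketch of the Snap ML API, assuming the snapml Python package and its scikit-learn-compatible estimators; the dataset and parameters are illustrative.

    from snapml import LogisticRegression  # scikit-learn-style estimator
    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=100_000, n_features=50, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # use_gpu=True offloads training to the GPU; otherwise Snap ML uses CPU threads
    clf = LogisticRegression(use_gpu=True)
    clf.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))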
23. Snap ML performance results
(Charts: Decision Tree, shown 5.2x and on average 6.5x faster than sklearn (CPU-only); Random Forest, shown 4.5x and on average 3.8x faster than sklearn (CPU-only).)
Project www: https://www.zurich.ibm.com/snapml/
Core publication: https://arxiv.org/abs/1803.06333
24. Nvidia RAPIDS
RAPIDS is a set of open source libraries for GPU-accelerated data preparation and machine learning.
OSS website: rapids.ai
25. Nvidia RAPIDS cuDF - GPU DataFrames
cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. It provides a pandas-like API that will be familiar to data engineers and data scientists.
The current version is 0.6; the cuDF tech preview included in PowerAI 1.6.0 is back-level (0.2). Work is in progress to get the latest version into Conda, or you can build it yourself (open source).
Examples of data manipulation in cuDF, like object creation, viewing, selection, merge, concat, etc., can be found here:
https://rapidsai.github.io/projects/cudf/en/latest/10min.html
26. Simple cuDF example
Download a CSV, then use the GPU to parse it into rows and columns and run calculations.
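The slide's snippet and its output did not survive extraction, so here is a minimal cuDF sketch in the same spirit; the file name and columns (a hypothetical tips.csv with total_bill, tip, and size) are illustrative.

    import cudf

    # Parse the CSV into rows and columns on the GPU
    tips = cudf.read_csv("tips.csv")  # hypothetical, downloaded beforehand

    # Run a calculation entirely on the GPU, with a pandas-like API
    tips["tip_pct"] = tips["tip"] / tips["total_bill"] * 100
    print(tips.groupby("size")["tip_pct"].mean())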
27. Nvidia RAPIDS cuML - GPU Machine Learning
cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives. It enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs.
The current version is 0.6; the cuML tech preview included in PowerAI 1.6.0 is back-level (0.2). Work is in progress to get the latest version into Conda, or you can build it yourself (open source).
Documentation on supported algorithms like KMeans, tSVD, PCA, and DBSCAN can be found here:
https://docs.rapids.ai/api/cuml/stable/
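A minimal cuML sketch using its scikit-learn-like estimator interface; the import path follows current cuML releases (older 0.x releases exposed KMeans directly from cuml), and the data is toy data for illustration.

    import cudf
    from cuml.cluster import KMeans

    # Four 2-D points held in GPU memory
    df = cudf.DataFrame({"x": [1.0, 2.0, 8.0, 9.0],
                         "y": [1.0, 2.0, 8.0, 9.0]})

    km = KMeans(n_clusters=2)
    km.fit(df)
    print(km.labels_)           # cluster assignment per row
    print(km.cluster_centers_)  # learned centroids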
30. The AI Ladder: a prescriptive approach to accelerating the journey to AI
COLLECT - Make data simple and accessible (data of every type, regardless of where it lives)
ORGANIZE - Create a trusted analytics foundation
ANALYZE - Scale AI everywhere with trust & transparency
INFUSE - Operationalize AI across business processes
MODERNIZE your data estate for an AI and multicloud world, on AI-optimized systems infrastructure
32. Introduction to Nvidia TensorRT
NVIDIA TensorRT™ is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications.
Nvidia website: https://developer.nvidia.com/tensorrt
33. TensorFlow and TensorRT inference
TensorFlow™ integration with TensorRT™ (TF-TRT) optimizes and executes compatible subgraphs, allowing TensorFlow to execute the remaining graph. While you can still use TensorFlow's wide and flexible feature set, TensorRT will parse the model and apply optimizations to the portions of the graph wherever possible.
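As a sketch of what a TF-TRT conversion looks like with the TF 1.x trt_convert API current in the WML CE 1.6.x timeframe; the model paths are placeholders, not paths from the deck.

    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Convert compatible subgraphs of a SavedModel into TensorRT engines;
    # TensorFlow keeps executing whatever TensorRT cannot handle.
    converter = trt.TrtGraphConverter(
        input_saved_model_dir="/models/resnet50/1",  # hypothetical SavedModel
        precision_mode="FP16")                       # see the precision notes below
    converter.convert()
    converter.save("/models/resnet50_trt/1")         # serve this optimized model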
36. Note: TensorRT engines are optimized for the currently available GPUs, so conversions should take place on the machine that will be running inference.
37. Calibrating for lower precision with a minimal loss of accuracy reduces bandwidth requirements and allows for faster computation. It also allows for the use of Tensor Cores, which perform matrix multiplication on 4×4 FP16 matrices and add a 4×4 FP16 or FP32 matrix.
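In equation form, the fused multiply-add each Tensor Core performs is

    D = A \times B + C, \qquad A, B \in \mathrm{FP16}^{4 \times 4}, \quad C, D \in \mathrm{FP16}^{4 \times 4} \text{ or } \mathrm{FP32}^{4 \times 4}

which is why calibrating models down to lower precision unlocks these units.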
40. Nvidia TensorRT current version
Version 6, announced on September 16th (current): https://news.developer.nvidia.com/tensorrt6-breaks-bert-record/
Version 5.1.3.6 added as a tech preview to WML CE 1.6.1: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
41. Resources
https://developer.ibm.com/linuxonpower/deep-learning-powerai#tab_education
Nvidia TensorRT: https://developer.nvidia.com/tensorrt
WML CE 1.6.1: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
TF-TRT documentation: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/
IBM TensorRT introduction blog: https://developer.ibm.com/linuxonpower/2019/07/29/introducing-tensorflow-with-tensorrt-tf-trt/
IBM TensorFlow Serving blog (includes TensorRT example): https://developer.ibm.com/linuxonpower/2019/08/05/using-tensorrt-models-with-tensorflow-serving-on-wml-ce/
Image classification and object detection: github.com/tensorflow/tensorrt
Nvidia forum: https://devtalk.nvidia.com/default/board/301/deep-learning-training-and-inference-/
Mixed precision and accuracy: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9143-mixed-precision-training-of-deep-neural-networks.pdf
Demo: https://github.com/cheeyauk/tf_to_tensorrt
43. IBM Systems WW Client Experience Centers (IBM Internal Use Only)
IBM Systems Worldwide Client Experience Centers maximize IBM Systems' competitive advantage in the Cloud and Cognitive era by providing access to world-class technical experts and infrastructure services to assist clients with the transformation of their IT implementations. Center offerings enable IBM sellers and Business Partners to progress and expedite systems sales opportunities.
Search Center Offerings in ISCEP: https://ibm.biz/client-experience-portal
Contact Center via
9 worldwide locations (* also Infrastructure Hubs): Austin TX, *Poughkeepsie NY, Rochester MN, Tucson AZ, *Beijing CHINA, Boeblingen GERMANY, Guadalajara MEXICO, *Montpellier FRANCE, Tokyo JAPAN
Client Experience (inbound & outbound): tailored, in-depth technology engagements; Innovation Exchange events; relationship building; demonstrations; meetups; solution workshops; remote options
Infrastructure Solutions (inbound to Centers): benchmarks, MVP & proof-of-technology "test drives"; demonstrations; infrastructure services; certifying ISV solutions; hosting; cloud environment
Architecture & Design (inbound & outbound): advise clients, enable sellers, the "art of the possible"; discovery & design workshops; consulting; showcases; reference architectures; co-creation of assets; includes CSSC
Content: content development; IBM Redbooks; training courses; video courses; "test drives"; demonstrations
NEW: Co-Creation Lab; CEC Cloud; IBM Systems Center of Competency for Red Hat
44. Please note
IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM's sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
46. Notices and disclaimers continued
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM's products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a purpose.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
Editor's Notes
So what is triggering the rapid advancements in AI? It comes from major innovation in three critical categories:
1) Digitization of society is creating an abundance of interesting datasets, inside and outside the enterprise, and that data continues to grow about 40% per year.
2) Algorithm innovation in supervised & unsupervised learning techniques. Especially Deep Learning. Most of which is advancing in open source.
3) Ability to run those algorithms on distributed compute and especially on GPUs.
So together, the developments here have allowed us to employ AI on any problem where a human can get a task done in less than 1 second of thought. It's in this scope of problems that AI is being applied, and it's being wielded to create a flywheel: Data -> Products -> Users. This is why competing on algorithms alone is not a defensible model.
REFERENCE NOTES:
Top trends:
99% of commercial value associated with A->B: 0s or 1s. This is called supervised learning.
Speech Recognition: Audio -> Text
Image Recognition
Types of Deep Learning:
Supervised Learning: Learn from labeled datasets. Most economic value is here and drops off quickly through below.
Transfer Learning: Learn about one topic. Apply to another domain.
Unsupervised Learning. Learning without labeled data
Reinforcement Learning.
The rise of the internet via analogy:
Shopping mall + internet doesn’t make an internet/ecommerce company
What defines whether you are truly an internet company? A) You architect the organizational design to take advantage of the internet: for instance, A/B tests, short cycle times, and decision making pushed down to PM/dev.
The rise of the AI era:
Traditional tech company + deep learning doesn’t make it an AI company.
Although only some patterns exist, Google & Baidu are good examples.
Other patterns: a) strategic data acquisition, b) unified data 'warehouse', c) pervasive automation, d) new job descriptions.
When building an AI company, centrally build an AI group and matrix its members into your business units.
When working with clients, these are the top AI scenarios to look for as you explore their potential AI use cases.
The genesis of IBM PowerAI (now known as Watson Machine Learning Community Edition - WML CE) was to make it simple for data scientists to be more productive, more quickly, by greatly simplifying the tasks necessary to get up and running. WML CE is an enterprise software distribution that combines popular open source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems servers to take your deep learning projects to the next level.
For a fee, IBM offers formal support for WML CE components as long as their versions are consistent with the release configuration (NOTE that WML CE is a no charge offering but we do offer support for a fee). If you choose to use a different version of any of the components, no formal support will be available. However, in keeping with industry norms, specific questions can be posted on the WML CE space on DeveloperWorks Answers: https://developer.ibm.com/answers/topics/powerai/. This forum is monitored by the IBM technical team and technical support is provided on a best effort basis.
There are several ways for you to get WML CE.
Order it. WML CE is available as a no charge orderable part number from IBM (called PowerAI until 2H2019).
Download it from here: http://ibm.biz/download-powerai
Get the Docker container from here: https://hub.docker.com/r/ibmcom/powerai/
As of WML CE (PowerAI) 1.5.4, the following frameworks are included in WML CE:
(Make sure to check the Knowledge Center for the latest versions as they change rapidly
https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_software_pkgs.html):
DDL 1.2.0 - Distributed Deep Learning (with support for up to 4 nodes in WML CE)
TensorFlow 1.12.0
Tensorflow Probability 0.5.0 - TensorFlow Probability is a library for probabilistic reasoning and statistical analysis.
TensorBoard 1.12.0 - a suite of visualization tools for TensorFlow
TensorFlow Keras – NOTE that Keras is supported as part of the TensorFlow core library and as such we can support Keras through TensorFlow
IBM enhanced Caffe 1.0.0
BVLC Caffe 1.0.0 - The Berkeley Vision and Learning Center (BVLC)
Caffe2 1.0rc1 – in technology preview
PyTorch 1.0rc1
Snap ML 1.0.0
Spectrum MPI 10.2
Bazel 0.15.0
OpenBLAS 0.3.3
HDF5 1.10.1
Protobuf 3.6.1
ONNX 1.3.0 – in technology preview
There are three additional capabilities on top of the open source frameworks (and in addition to the performance advantage that Power brings to the table): Large Model Support (LMS), Distributed Deep Learning (DDL), and support by IBM.
Large Model Support
WML CE addresses a fundamental limitation for deep learning: the size of memory available within GPUs. When training complex models or training with high-definition images, the memory available on a GPU can be prohibitively restrictive. Instead of being forced into less complex, shallower deep learning models, customers can develop more accurate models with Large Model Support.
With Large Model Support, enabled by IBM's unique NVLink connection between CPU (memory) and GPU, the entire model and dataset can be loaded into system memory and cached down to the GPU for action. Customers can now address bigger challenges and get much more work done within a cluster of WML CE servers, increasing organizational efficiency. We will cover more details on LMS later in this deck.
Distributed Deep Learning
To accelerate the time dedicated to training a model, the WML CE stack includes function for distributing a single training job across a cluster of servers. IBM’s Distributed Deep Learning brings intelligence about the structure and layout of the underlying hardware cluster (topology). The impact of this is significant! WML CE and WML-A with Distributed Deep Learning can scale jobs across large numbers of cluster resources with very little loss due to communications overhead. There will be more details later in the presentation. WML CE allows for the use of DDL with up to a 4 node cluster. If a client wants to scale beyond 4 nodes, they must purchase WML-A.
Supported by IBM
Although WML CE is available free to download and use, IBM also provides a “for fee” support offering for those clients that want enterprise level support for the features and capabilities within the base offering.
We normally focus on hardware optimization, starting with the processor, the I/O interfaces that processor enables, and the accelerators we align to those interfaces for optimal performance. We are doing that today, but it is not just about the hardware. As mentioned on the previous slide, we co-optimized the software: we took the open source deep learning frameworks and optimized them around this advanced design, added enhancements such as Spectrum Conductor for DDL and Large Model Support, and we support everything in the solution from the hardware to the software. Not only is the AC922 differentiated hardware with many industry-only innovations, but the full software offering on top of it is equally rich in differentiated innovations found only with Power Systems.
It’s estimated that 1.2 trillion photos will be taken in 2017. Even if each photo took someone only one second to organize, tag, and annotate, it would still take over 38,000 years to classify them all!
There is a competition every year, known as ImageNet.
Roughly 500,000 low-resolution images and 200 categories into which to classify them.
We talked about this earlier – it’s all about maximizing accuracy (or minimizing error/loss)
One way to get more accurate models is simply to add more layers.
The more layers, the more complex the model, and the more computationally difficult it becomes to train (see the sketch below).
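To make the cost of depth concrete, here is a minimal sketch using the tf.keras API bundled with the TensorFlow release listed earlier; the layer widths and the 200-class output are illustrative choices, not values from this deck. Each added layer adds weights that must be stored, moved, and updated on every training pass:

    import tensorflow as tf

    def build_mlp(num_hidden_layers):
        # A simple stack of fully connected layers; every extra layer
        # adds roughly 1024*1024 more weights to train.
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.Dense(1024, activation='relu', input_shape=(2048,)))
        for _ in range(num_hidden_layers - 1):
            model.add(tf.keras.layers.Dense(1024, activation='relu'))
        model.add(tf.keras.layers.Dense(200, activation='softmax'))  # e.g. 200 categories
        return model

    for depth in (2, 8, 32):
        print(depth, 'hidden layers ->', build_mlp(depth).count_params(), 'parameters')

Running this shows the parameter count, and hence the compute per training step, growing by millions with each block of added layers.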
Distributed deep learning (DDL) is IBM’s high performance approach to training a single model across an entire cluster of compute nodes. Unlike native model parallelism (such as Google’s gRPC-based distribution for TensorFlow) or Spark-based approaches, the DDL library distributes the model, the training data set, and parameter serving across the defined cluster, and it uses a novel algorithm to improve communication over very low latency fabrics.
The result is extremely efficient performance scaling: less than 5% of ideal efficiency is lost when moving from 4 GPUs to 64 GPUs.
This was available as a technology preview within PowerAI, but is now supported in PowerAI Enterprise.
The outcome of this capability is that data science teams can run larger, more complex models while still reducing training time, allowing more iterations and a faster path to accurate results.
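As a rough illustration of the programming model only (the module and launcher names below follow the WML CE documentation from memory and may differ by release), a TensorFlow script adopts DDL by importing the ddl integration module and sharding its input by learner rank; the job is then launched across hosts with the ddlrun utility:

    import tensorflow as tf
    import ddl  # IBM DDL integration module shipped with WML CE (name assumed)

    # Each process learns its place in the cluster from DDL rather than gRPC.
    rank = ddl.rank()   # this learner's index across all GPUs and nodes
    size = ddl.size()   # total number of learners in the job

    # Shard the training data so every learner sees a distinct slice.
    dataset = tf.data.TFRecordDataset('train.tfrecords').shard(size, rank)

    # ... build the model and optimizer as usual; DDL performs the
    # topology-aware gradient reduction across NVLink and the fabric ...

    # Launched from the shell along the lines of:
    #   ddlrun -H host1,host2,host3,host4 python train.py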
Further reading on ML/DL training at Summit scale, by Junqi Yin, Advanced Data and Workflows Group:
https://www.olcf.ornl.gov/wp-content/uploads/2018/12/summit_training_mldl.pdf
https://vimeo.com/307071617
Watson Machine Learning Accelerator addresses memory constraints within Deep Learning
Large Model Support
Watson Machine Learning Accelerator (WML-A) addresses a very big deep learning scaling challenge: the amount of memory available on GPUs. When data scientists develop a deep learning workload, the structure of matrices in the neural model, and the data elements that train the model (in a batch), must fit within the memory on the GPUs. As models grow in complexity and data sets increase in size, data scientists are forced to make tradeoffs to stay within the constrained 32GB (or even 16GB on older GPU cards) memory limits. Instead of training on web-scale images, WML-A users can train on high definition video. Instead of being forced into less complex, shallower deep learning models, customers can develop more accurate models for better inference capability.
With Large Model Support, enabled by WML-A’s unique NVLink connection between CPU (memory) and GPU, the entire model and dataset can be loaded into system memory and cached down to the GPU as needed. IBM’s capabilities, with the co-optimized WML-A software on Power Systems servers, have enabled increased model size (more layers, larger matrices), increased data element sizes (higher definition images), and larger batch sizes (for faster time to convergence). With Large Model Support, data scientists can load models that span nearly an entire terabyte of system memory across the GPUs. The final impact? Customers can now address bigger challenges and get much more work done within a cluster of WML-A servers, increasing organizational efficiency.
Not only does Large Model Support allow data scientists to work with more complex data; it turns out that certain models, because they pull a significantly larger number of data elements into each training cycle, actually complete training faster with large models. By using the entire system memory resource available, data scientists can operate much more efficiently within each single server. The ability to use larger data and train faster is a significant advantage of PowerAI Enterprise, and it is available at this scale only because of the architectural choices IBM and NVIDIA made in developing this accelerated architecture.
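For illustration, LMS is designed to be close to a one-line change for the data scientist. The sketch below uses the PyTorch LMS toggle as it appears in the WML CE documentation; note that this function is a WML CE addition quoted from memory, not part of stock PyTorch:

    import torch

    # WML CE's PyTorch build exposes an LMS switch (not in stock PyTorch).
    # Once enabled, tensors are paged between system memory and GPU memory
    # over NVLink instead of the job failing with out-of-memory errors.
    torch.cuda.set_enabled_lms(True)

    # From here on, train as usual with models/batches larger than GPU memory.
    model = torch.nn.Sequential(
        torch.nn.Linear(8192, 8192),
        torch.nn.ReLU(),
        torch.nn.Linear(8192, 200),
    ).cuda()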
When you need to retrain models frequently – multiple times per day:
Cybersecurity threats on your critical infrastructure (e.g., the energy grid), credit card fraud detection models
Online retraining: e.g., anomaly detection on your compute or storage infrastructure, where you want to constantly learn from new events to improve the model (see the sketch below)
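As a generic sketch of that online-retraining pattern (using scikit-learn's partial_fit API purely for brevity; this deck does not prescribe a library), the model is refined incrementally as new events arrive instead of being retrained from scratch:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    # An incremental learner: partial_fit lets each fresh batch of events
    # (transactions, infrastructure metrics, ...) update the model in place.
    model = SGDClassifier()
    classes = np.array([0, 1])  # e.g. normal vs. anomalous/fraudulent

    def on_new_events(features, labels):
        # Called whenever a new batch of labeled events arrives.
        model.partial_fit(features, labels, classes=classes)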
These are all POWER9 results, CPU-only (a Snap ML usage sketch follows the dataset list).
Datasets (examples × features):
Epsilon: 300K × 2,000
Higgs: 8M × 28
Creditcard: 200K × 28
Susy: 3.75M × 18
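To show the programming model behind these numbers, Snap ML exposes scikit-learn-style estimators. The sketch below is hypothetical: the snap_ml package and estimator names follow the Snap ML documentation from memory, and the epsilon file path is illustrative:

    from snap_ml import LogisticRegression  # sklearn-compatible estimator (name assumed)
    from sklearn.datasets import load_svmlight_file

    # e.g. the epsilon benchmark above: 300K examples x 2,000 features
    X, y = load_svmlight_file('epsilon_normalized')

    clf = LogisticRegression(use_gpu=False)  # CPU-only, matching the POWER9 results
    clf.fit(X, y)
    print(clf.predict(X[:10]))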
This is our prescriptive approach to helping clients accelerate their journey to AI, connecting their data and AI capabilities within a unified data and AI lifecycle (or platform). It is also a way to help clients identify where they are and where to focus based on their maturity on the journey to AI. Furthermore, it is an organizing construct for the Data and AI products and services offered by IBM and our business partners, and it is the technology foundation that unifies how those products and services work together.
What we have learned from AI pioneers is that every step of the ladder is critical. AI is not magic and requires a thoughtful and well-architected approach. For example, the vast majority of AI failures are due to data preparation and organization, not the AI models themselves. Success with AI models is dependent on achieving success first with how you COLLECT and ORGANIZE data.
Therefore, we believe clients must:
COLLECT -- Establish a strong foundation of data, making it simple and accessible regardless of where that data resides. Since data used in AI is often very dynamic and fluid, with ever-expanding sources, virtualizing how data is collected is critical for clients.
ORGANIZE – Create a trusted, business-ready analytics foundation that ensures your data is ready for AI. Just because you can access your data doesn’t mean that it’s prepared for AI use cases. Bad data is paralyzing to AI. So clients must integrate, cleanse, catalog, and govern the full lifecycle of their AI data.
ANALYZE – Once your data is accessible and AI-ready, then you are better prepared to apply advanced analytics and AI models. This rung provides the business and planning analytics capabilities that are key for success with AI. It also provides the capabilities needed to build, deploy, and manage AI models within an integrated portfolio of technology.
INFUSE – Many businesses create highly useful AI models but then encounter challenges in operationalizing them to attain broader business value. This rung of the ladder infuses AI to achieve trust and transparency in model-recommended decisions, decision explainability, bias detection, decision audits, etc. For clients with common use cases, the INFUSE rung operationalizes those AI use cases with pre-built application services, speeding time to value.
MODERNIZE – Given the dynamic nature of AI, your data estate needs a highly elastic and extensible multi-cloud infrastructure to unify the aforementioned capabilities within a fully governed team-platform. Clients are also looking to automate their AI lifecycles across an array of contributors through collaborative workflows. Essentially, MODERNIZE means building an information architecture for AI that provides choice and flexibility across your enterprise. As clients modernize their data estates for an AI and multicloud world, they will find that there is less "assembly required" in expanding the impact of AI across the organization.
This is the IBM Cloud Architecture Center high-level reference architecture. A data-centric AI reference architecture needs to support capabilities that address the Collect, Organize, Analyze, and Infuse activities.
This architecture diagram illustrates the need for strong data management capabilities inside a 'multi cloud data platform' (dark blue area), onto which AI capabilities are plugged to support the analysis done by data scientists (machine learning workbench and business analytics).
The data platform addresses data collection and transformation, moving data to a local, highly scalable store. Sometimes it is better to avoid moving data, when no transformations are needed or when adding readers would hurt the performance of the origin data sources; a virtualization capability is therefore necessary to open a view on remote data sources without moving data.
On the AI side, data scientists need to perform data analysis, which includes making sense of the data using data visualization. To build a model they define features, and the AI environment supports feature engineering. The development environment then helps them select and combine the different algorithms and tune the hyperparameters. Training can run on a local cluster or, at big-data scale, be submitted to a machine learning cluster.
Once the model reaches an acceptable accuracy level, it can be published as a service. The model management capability supports metadata definition and the lifecycle management of the model. Once the model is deployed, a monitoring capability ensures it remains accurate and unbiased.
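As an illustrative sketch only (the reference architecture does not mandate a particular framework; Flask and joblib stand in here for the platform's model deployment capability), publishing a trained model as a REST scoring service can look like this:

    from flask import Flask, request, jsonify
    import joblib

    app = Flask(__name__)
    model = joblib.load('model.pkl')  # a previously trained, published model artifact

    @app.route('/score', methods=['POST'])
    def score():
        # The intelligent application posts feature vectors and receives predictions.
        features = request.get_json()['features']
        return jsonify({'prediction': model.predict([features]).tolist()})

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=8080)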
The intelligent application, represented as a combination of capabilities at the top of the diagram (business process, core application, CRM, ...), can run on cloud, fog, or mist. It accesses the deployed model, accesses data using APIs, and can even consume pre-built models and cognitive services, such as speech to text and text to speech, image recognition, tone analysis, Natural Language Understanding (NLU), and chatbots.