To briefly introduce Auto DeepLab: it is a model for the semantic segmentation task. The authors aimed to generate the segmentation network itself through machine learning; architecture search is a representative AutoML method, which is why the paper is titled Auto DeepLab. On the AutoML side the authors drew on the DARTS paper, and on the segmentation side they drew heavily on DeepLab V3. Seonok Kim of the image processing team kindly provided this detailed paper review!
https://youtu.be/2886fuyKo9g
Image Classification Done Simply using Keras and TensorFlow - Rajiv Shah
This presentation walks through the process of building an image classifier using Keras with a TensorFlow backend. It will give a basic understanding of image classification and show the techniques used in industry to build image classifiers. The presentation will start with building a simple convolutional network, augmenting the data, using a pretrained network, and finally using transfer learning by modifying the last few layers of a pretrained network. The classification will be based on the classic example of classifying cats and dogs. The code for the presentation can be found at https://github.com/rajshah4/image_keras, and the presentation will discuss how to extend the code to your own pictures to make a custom image classifier.
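The transfer-learning step the abstract describes can be sketched in a few lines of Keras. Below is a minimal, hedged illustration (not the presenter's code; see the linked GitHub repo for that) that freezes a pretrained VGG16 base and retrains only the last layers for cats vs. dogs; the directory layout and hyperparameters are assumptions.

# Hedged sketch: transfer learning by freezing a pretrained backbone.
# The "data/train" layout (cats/ and dogs/ subfolders) is an assumption.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

base = VGG16(weights="imagenet", include_top=False, input_shape=(150, 150, 3))
base.trainable = False  # keep the pretrained convolutional features fixed

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary: cat vs. dog
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

train_gen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)  # light augmentation
train_data = train_gen.flow_from_directory("data/train", target_size=(150, 150),
                                           batch_size=32, class_mode="binary")
model.fit(train_data, epochs=5)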
First steps with Keras 2: A tutorial with Examples - Felipe
In this presentation, we give a brief introduction to Keras and Neural networks, and use examples to explain how to build and train neural network models using this framework.
Talk given as part of an event by Rio Machine Learning Meetup.
[Paper Reading] Orca: A Modular Query Optimizer Architecture for Big Data - PingCAP
The performance of analytical query processing in data management systems depends primarily on the capabilities of the system's query optimizer. Increased data volumes and heightened interest in processing complex analytical queries have prompted Pivotal to build a new query optimizer.
In this paper we present the architecture of Orca, the new query optimizer for all Pivotal data management products, including Pivotal Greenplum Database and Pivotal HAWQ. Orca is a comprehensive development uniting state-of-the-art query optimization technology with Pivotal's own original research, resulting in a modular and portable optimizer architecture.
In addition to describing the overall architecture, we highlight several unique features and present performance comparisons against other systems.
The next generation of the Montage image mosaic engine - G. Bruce Berriman
Presentation given by Bruce Berriman at the Astronomical Data Analysis Software & Systems XXV (ADASS XXV) Conference, Sydney, Australia, October 29, 2015.
Authors: G. B. Berriman, J.C. Good, B. Rusholme, T. Robitaille.
Presentation given at the Stockholm R useR Group (SRUG) meetup on Dec 6, 2016. Contains a general overview of deep learning, material on using Tensorflow in R etc.
LocationTech is an Eclipse Foundation industry working group for location aware technologies. This presentation introduces LocationTech, looks at what it means for our industry and the participating projects.
Libraries: JTS Topology Suite is the rocket science of GIS, providing an implementation of Geometry. Mobile Map Tools provides a C++ foundation that is translated into Java and JavaScript for maps on iOS, Android and WebGL. GeoMesa is a distributed key/value store based on Accumulo. Spatial4j integrates with JTS to provide Geometry on a curved surface.
Process: GeoTrellis provides real-time distributed processing using Scala, Akka and Spark. GeoJinni mixes spatial data/indexing with Hadoop.
Applications: GEOFF offers OpenLayers 3 as a SWT component. GeoGit provides distributed revision control for feature data. GeoScript brings spatial data to Groovy, JavaScript, Python and Scala. uDig offers an Eclipse-based desktop GIS solution.
Attend this presentation if you want to know what LocationTech is about, are interested in these projects, or are curious about which projects will be next.
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni... - MLconf
Fast, Cheap and Deep – Scaling Machine Learning: Distributed high throughput machine learning is both a challenge and a key enabling technology. Using a Parameter Server template we are able to distribute algorithms efficiently over multiple GPUs and in the cloud. This allows us to design very fast recommender systems, factorization machines, classifiers, and deep networks. This degree of scalability allows us to tackle computationally expensive problems efficiently, yielding excellent results e.g. in visual question answering.
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16 - MLconf
Multi-algorithm Ensemble Learning at Scale: Software, Hardware and Algorithmic Approaches: Multi-algorithm ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. The Super Learner algorithm, also known as stacking, combines multiple, typically diverse, base learning algorithms into a single, powerful prediction function through a secondary learning process called metalearning. Although ensemble methods offer superior performance over their singleton counterparts, there is an implicit computational cost to ensembles, as they require training and cross-validating multiple base learning algorithms.
We will demonstrate a variety of software- and hardware-based approaches that lead to more scalable ensemble learning software, including a highly scalable implementation of stacking called “H2O Ensemble”, built on top of the open source, distributed machine learning platform, H2O. H2O Ensemble scales across multi-node clusters and allows the user to create ensembles of deep neural networks, Gradient Boosting Machines, Random Forest, and others. As for algorithm-based approaches, we will present two algorithmic modifications to the original stacking algorithm that further reduce computation time — Subsemble algorithm and the Online Super Learner algorithm. This talk will also include benchmarks of the implementations of these new stacking variants.
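As a hedged illustration of the stacking workflow described above, here is a minimal sketch using H2O's Python API. The talk demos the R-based H2O Ensemble package; the H2OStackedEnsembleEstimator shown here is the later, equivalent Python interface, and the dataset path and columns are placeholders.

# Minimal stacking sketch with H2O's Python API (illustrative only).
import h2o
from h2o.estimators import (H2OGradientBoostingEstimator,
                            H2ORandomForestEstimator,
                            H2OStackedEnsembleEstimator)

h2o.init()
train = h2o.import_file("train.csv")          # placeholder dataset
x, y = train.columns[:-1], train.columns[-1]

# Base learners share identical CV folds so the metalearner can be
# trained on their out-of-fold predictions (the "metalearning" step).
gbm = H2OGradientBoostingEstimator(nfolds=5, fold_assignment="Modulo",
                                   keep_cross_validation_predictions=True)
gbm.train(x=x, y=y, training_frame=train)

rf = H2ORandomForestEstimator(nfolds=5, fold_assignment="Modulo",
                              keep_cross_validation_predictions=True)
rf.train(x=x, y=y, training_frame=train)

# The metalearner combines the base models into a single predictor.
ensemble = H2OStackedEnsembleEstimator(base_models=[gbm, rf])
ensemble.train(x=x, y=y, training_frame=train)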
[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust... - PingCAP
This paper proposes interleaving with coroutines for any type of index join. It showcases the proposal on SAP HANA by implementing binary search and CSB+-tree traversal for an instance of index join related to dictionary compression. Coroutine implementations not only perform similarly to prior interleaving techniques, but also resemble the original code closely, while supporting both interleaved and non-interleaved execution. Thus, this paper claims that coroutines make interleaving practical for use in real DBMS codebases.
Paper: http://www.vldb.org/pvldb/vol11/p230-psaropoulos.pdf
Follow PingCAP on Twitter: https://twitter.com/PingCAP
Follow PingCAP on LinkedIn: https://www.linkedin.com/company/13205484/
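To give a flavor of the technique, here is a hedged sketch in Python. The paper's implementation uses C++ coroutines inside SAP HANA, where each suspension point issues a software prefetch so another lookup can run during the cache miss; the generator-based version below only illustrates the interleaved control flow, not the latency hiding.

# Interleaving several binary searches with coroutines (Python generators).
def binary_search(arr, key):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        yield                         # suspension point: prefetch arr[mid] here
        if arr[mid] == key:
            return mid
        elif arr[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def interleave(searches):
    # Round-robin over the active coroutines until every search finishes.
    results, active = {}, dict(enumerate(searches))
    while active:
        for i, coro in list(active.items()):
            try:
                next(coro)                  # resume one step of search i
            except StopIteration as done:
                results[i] = done.value     # this search is complete
                del active[i]
    return results

arr = list(range(0, 1000, 3))
print(interleave([binary_search(arr, k) for k in (9, 300, 998)]))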
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016 - MLconf
DL4J and DataVec for Enterprise Deep Learning Workflows: Applications in NLP, sensor processing (IoT), image processing, and audio processing have all emerged as prime deep learning applications. In this session we will take a look at a practical review of building practical and secure Deep Learning workflows in the enterprise. We’ll see how DL4J’s DataVec tool enables scalable ETL and vectorization pipelines to be created for a single machine or scale out to Spark on Hadoop. We’ll also see how Deep Networks such as Recurrent Neural Networks are able to leverage DataVec to more quickly process data for modeling.
Spark and Deep Learning frameworks with distributed workloads - S N
The increasing complexity of learning algorithms and deep neural networks, combined with size of data and parameters, has made it challenging to exploit existing large-scale data processing pipelines for training and inference.
Approaches are outlined for preprocessing, training, inference, and deployment across datasets that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks.
In this deck, Huihuo Zheng from Argonne National Laboratory presents: Data Parallel Deep Learning.
"The Argonne Training Program on Extreme-Scale Computing (ATPESC) provides intensive, two weeks of training on the key skills, approaches, and tools to design, implement, and execute computational science and engineering applications on current high-end computing systems and the leadership-class computing systems of the future."
Watch the video: https://wp.me/p3RLHQ-lsl
Learn more: https://extremecomputingtraining.anl.gov/archive/atpesc-2019/agenda-2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Parallel Application Performance Prediction Using Analysis-Based Modeling - Jason Liu
Parallel Application Performance Prediction Using Analysis Based Models and HPC Simulations, Mohammad Abu Obaida, Jason Liu, Gopinath Chennupati, Nandakishore Santhi, and Stephan Eidenbenz. 2018 SIGSIM Principles of Advanced Discrete Simulation (SIGSIM-PADS’18), May 2018.
RAPIDS – Open GPU-accelerated Data Science - Data Works MD
RAPIDS – Open GPU-accelerated Data Science
RAPIDS is an initiative driven by NVIDIA to accelerate the complete end-to-end data science ecosystem with GPUs. It consists of several open source projects that expose familiar interfaces, making it easy to accelerate the entire data science pipeline - from ETL and data wrangling to feature engineering, statistical modeling, machine learning, and graph analysis.
Corey J. Nolet
Corey has a passion for understanding the world through the analysis of data. He is a developer on the RAPIDS open source project focused on accelerating machine learning algorithms with GPUs.
Adam Thompson
Adam Thompson is a Senior Solutions Architect at NVIDIA. With a background in signal processing, he has spent his career participating in and leading programs focused on deep learning for RF classification, data compression, high-performance computing, and managing and designing applications targeting large collection frameworks. His research interests include deep learning, high-performance computing, systems engineering, cloud architecture/integration, and statistical signal processing. He holds a Masters degree in Electrical & Computer Engineering from Georgia Tech and a Bachelors from Clemson University.
Introduction to GPUs for Machine Learning - Sri Ambati
Graphics processing units (GPUs) are becoming integral components of modern machine learning engines and platforms. These slides provide an introduction to GPUs and their suitability for machine learning workloads, discuss enabling technologies such as CUDA, and demonstrate GPU-accelerated machine learning with the H2O platform. They are targeted at machine learning practitioners new to GPUs.
Author: Wen Phan is a Senior Solutions Architect at H2O.ai. Wen works with customers and organizations to architect systems, smarter applications, and data products to make better decisions, achieve positive outcomes, and transform the way they do business. Internally, Wen uses his hard-earned field experiences, customer feedback, and market trends to drive product innovation and development. Wen holds a B.S. in Electrical Engineering and M.S. in Analytics and Decision Sciences.
Follow him on twitter: @wenphan
How to optimize Hortonworks Apache Spark ML workloads on Power - POWER 8/9 architecture is the latest offering from IBM and OpenPower foundation. It is the perfect platform for optimizing Hortonworks Spark's performance. During this presentation we will walk the audience through steps required to optimize YARN, HDFS, and Spark on a Power cluster.
Steps required:
1) Classify the workload as CPU-, memory-, IO-, or mixed-intensive
2) Characterize the "out-of-box" Spark workload to understand its CPU, memory, IO and network performance characteristics
3) Floor-plan cluster resources
4) Tune the "out-of-box" workload to navigate the "roofline" performance space in the above-named dimensions
5) If the workload is memory-, IO- or network-bound, tune Spark to increase operational intensity (operations/byte) as much as possible to make it CPU-bound
6) Divide the search space into regions and perform an exhaustive search
7) Identify performance bottlenecks through resource monitoring and tune the system, JVM or application layer, profiling the application and hardware counters if required
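As a hedged companion to step 4, the sketch below shows the kind of executor and shuffle settings typically adjusted during such tuning; the values are placeholders to be derived from the workload characterization above, not recommendations from the talk.

# Illustrative Spark tuning knobs (placeholder values, not from the talk).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("power-tuning-sketch")
         .config("spark.executor.cores", "4")            # CPU-bound: cores per executor
         .config("spark.executor.memory", "24g")         # memory-bound: fit the working set
         .config("spark.executor.memoryOverhead", "4g")
         .config("spark.sql.shuffle.partitions", "400")  # IO/network-bound: right-size shuffles
         .config("spark.serializer",
                 "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate())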
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud - Alluxio, Inc.
Data Infra Meetup
Jan. 25, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Siyuan Sheng (Senior Software Engineer, @Alluxio)
- Chunxu Tang (Research Scientist, @Alluxio)
In this session, cloud optimization specialists Chunxu and Siyuan break down the challenges and present a fresh architecture designed to optimize I/O across the data pipeline, ensuring GPUs function at peak performance. The integrated solution of PyTorch/Ray + Alluxio + S3 offers a promising way forward, and the speakers delve deep into its practical applications. Attendees will not only gain theoretical insights but will also be treated to hands-on instructions and demonstrations of deploying this cutting-edge architecture in Kubernetes, specifically tailored for Tensorflow/PyTorch/Ray workloads in the public cloud.
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS - Databricks
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Abstract: We will introduce RAPIDS, a suite of open source libraries for GPU-accelerated data science, and illustrate how it operates seamlessly with MLflow to enable reproducible training, model storage, and deployment. We will walk through a baseline example that incorporates MLflow locally, with a simple SQLite backend, and briefly introduce how the same workflow can be deployed in the context of GPU enabled Kubernetes clusters.
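The locally-run baseline mentioned in the abstract can be sketched as follows. This is a hedged illustration of MLflow tracking against a SQLite backend, with a scikit-learn model standing in for the GPU-accelerated cuML estimator used in the session; experiment names and parameters are placeholders.

# MLflow tracking with a local SQLite backend (sketch; sklearn stands in
# for cuML so the example runs without a GPU).
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("sqlite:///mlflow.db")   # the simple local backend
mlflow.set_experiment("rapids-baseline-sketch")

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

with mlflow.start_run():
    n_estimators = 100
    model = RandomForestClassifier(n_estimators=n_estimators).fit(X, y)
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")     # reproducible model storage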
Despite the growing number of deep learning practitioners and researchers, many of them do not use GPUs, which can lead to long training/evaluation cycles and impractical research.
In his talk, Lior shares how to get started with GPUs and some of the best practices that helped him during research and work. The talk is for everyone who works with machine learning (deep learning experience is NOT mandatory!), It covers the very basics of how GPU works, CUDA drivers, IDE configuration, training, inference, and multi-GPU training.
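As a generic companion to those basics (not code from the talk), the standard PyTorch pattern for device selection and simple multi-GPU training looks like this:

# Device selection and single-node multi-GPU training in PyTorch.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using {device}; {torch.cuda.device_count()} GPU(s) visible")

model = nn.Linear(128, 10)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # replicate across local GPUs
model = model.to(device)

x = torch.randn(32, 128, device=device)   # keep data on the same device
out = model(x)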
SigOpt at GTC - Reducing operational barriers to optimization - SigOpt
Advanced hardware like NVIDIA technology lowers technical barriers to model size and scope, but issues remain in areas like model performance and training infrastructure management. We'll discuss operational challenges to training models at scale with a particular focus on how training management and hyperparameter tuning can inform each other to accomplish specific goals. We'll also explore techniques like parallelism and scheduling, discuss their impact on model optimization, and compare various techniques. We'll also evaluate results of this approach. In particular, we'll focus on how new tools that automate training orchestration accelerate model development and increase the volume and quality of models in production.
Walk Through a Real World ML Production Project - Bill Liu
Success in productionizing ML models is difficult to achieve due to tools, processes and operational procedures. In this session, we demonstrate how data scientists and ML engineers collaborate and efficiently deploy models to production with the Wallaroo platform.
Using a real world scenario we will click down into the ML production journey that Data Scientists and ML engineers go through to take ML models into production. In this session you will learn:
The current pain points and blockers to production
The 2 persona roles in the ML production process: Data Scientist (DS) and ML Engineer
How the ML Engineer creates a workspace in Wallaroo and invites the DS to collaborate
How the DS uploads and deploys models to Wallaroo, performing simple validation checks on output
How the ML Engineer can check model health (inference speed, etc)
How the DS checks logs, looks for anomalies
How the DS switches model in the pipeline
Speakers: Nina Zumel, Martin Bald
Redefining MLOps with Model Deployment, Management and Observability in Produ... - Bill Liu
Tech talk: https://www.aicamp.ai/event/eventdetails/W2022052410
What happens after your machine learning models are deployed in production? How do you make sure that your model performance does not degrade as data and the world change?
The constantly changing data creates challenges for data scientists and engineering teams on how to detect which models have been affected and how to get their ML applications up and running seamlessly.
In this session we will take a deep dive into the new ML model monitoring and drift detection technology. We will discuss:
- How to track the ongoing accuracy of their models in production
- How to immediately detect drift before it causes significant damage to the business
- How to locate the cause of model drifting in live environments.
We will also discuss how data scientists and ML engineers can collaborate effectively using their respective tools to identify issues and take the necessary actions with a live demo and a real world use case.
Speaker: Younes Amar, Head of Product Wallaroo AI.
Resources: https://docs.wallaroo.ai/
These days, training machine learning models at the device edge is still a risky endeavor. It is frequently considered a purely academic subject with little value for real-life product development.
In her talk, Vera will challenge this misconception, talk about the advantages of learning at the Edge and guide you through the Edge learning decision-making framework and design principles.
https://www.aicamp.ai/event/eventdetails/W2021102210
Attention Is All You Need.
With these simple words, the Deep Learning industry was forever changed. Transformers were initially introduced in the field of Natural Language Processing to enhance language translation, but they demonstrated astonishing results even outside language processing. In particular, they recently spread in the Computer Vision community, advancing the state-of-the-art on many vision tasks. But what are Transformers? What is the mechanism of self-attention, and do we really need it? How did they revolutionize Computer Vision? Will they ever replace convolutional neural networks?
These and many other questions will be answered during the talk.
In this tech talk, we will discuss:
- A piece of history: Why did we need a new architecture?
- What is self-attention, and where does this concept come from?
- The Transformer architecture and its mechanisms
- Vision Transformers: An Image is worth 16x16 words
- Video Understanding using Transformers: the space + time approach
- The scale and data problem: Is Attention what we really need?
- The future of Computer Vision through Transformers
Speaker: Davide Coccomini, Nicola Messina
Website: https://www.aicamp.ai/event/eventdetails/W2021101110
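Since the talk centers on self-attention, here is a hedged NumPy sketch of the scaled dot-product attention from "Attention Is All You Need" - Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V - for a single head with no masking; shapes and sizes are illustrative.

# Scaled dot-product self-attention, single head, no mask (minimal sketch).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) learned projections.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # pairwise token similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 64))                          # 16 tokens, d_model=64
Wq, Wk, Wv = (rng.normal(size=(64, 32)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)             # -> (16, 32)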
Deep AutoViML For Tensorflow Models and MLOps Workflows - Bill Liu
deep_autoviml is a powerful new deep learning library with a very simple design goal: Make it as easy as possible for novices and experts alike to experiment with and build tensorflow.keras preprocessing pipelines and models in as few lines of code as possible.
deep_autoviml will enable data scientists, ML engineers and data engineers to fast prototype tensorflow models and data pipelines for MLOps workflows using the latest TF 2.4+ and keras preprocessing layers. You can now upload your saved model to any Cloud provider and make predictions out of the box since all the data preprocessing layers are attached to the model itself!
In this webinar, we will discuss the problems that deep_AutoViML can solve, its architecture design and demo how to build powerful TF.Keras models on structured data, NLP and Image data domains.
https://www.aicamp.ai/event/eventdetails/W2021080918
Metaflow: The ML Infrastructure at Netflix - Bill Liu
Metaflow was started at Netflix to answer a pressing business need: how to enable an organization of data scientists, who are not software engineers by training, to build and deploy end-to-end machine learning workflows and applications independently. We wanted to provide the best possible user experience for data scientists, allowing them to focus on the parts they like (modeling using their favorite off-the-shelf libraries) while providing robust built-in solutions for the foundational infrastructure: data, compute, orchestration, and versioning.
Today, the open-source Metaflow powers hundreds of business-critical ML projects at Netflix and other companies from bioinformatics to real estate.
In this talk, you will learn about:
- What to expect from a modern ML infrastructure stack.
- Using Metaflow to boost the productivity of your data science organization, based on lessons learned from Netflix.
- Deployment strategies for a full stack of ML infrastructure that plays nicely with your existing systems and policies.
https://www.aicamp.ai/event/eventdetails/W2021080510
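For readers new to Metaflow, a minimal flow looks like the hedged sketch below (step names and logic are illustrative, not from the talk); artifacts assigned to self are persisted and versioned between steps, and the flow runs with `python train_flow.py run`.

# Minimal Metaflow flow: steps are chained with self.next, and artifacts
# on `self` are stored automatically.
from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):

    @step
    def start(self):
        self.alpha = 0.01            # artifact: versioned by Metaflow
        self.next(self.train)

    @step
    def train(self):
        # placeholder for real training with your favorite library
        self.score = 1.0 - self.alpha
        self.next(self.end)

    @step
    def end(self):
        print(f"done, score={self.score}")

if __name__ == "__main__":
    TrainFlow()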
AI stands on three pillars: algorithms, hardware and training data. While the first two have already become commodities on the market, the latter - reliable labelled data - is still a bottleneck in the industry.
Need to add twice as much data to the training set to improve your model? Want to validate the accuracy of a new classifier in an hour? Or maybe you are building a human-in-the-loop process with 90% of cases processed automatically and the trickiest 10% fine-tuned by people in real time. You can do it all with crowdsourcing, but only with crowdsourcing done right.
In this talk, we will discuss how a new generation of methods and tools makes it possible to collect high-quality human-labelled data at large scale, and why every ML specialist should know how to use crowdsourcing.
From this talk you will learn how to:
* Understand the applicability, benefits and limits of the crowdsourcing approach.
* Integrate an on-demand workforce into your processes and build human-in-the-loop pipelines.
* Control the quality and accuracy of data labeling to develop high-performing ML models.
* Understand the full-cycle crowdsourcing project.
Speaker: Daria Baidakova (Toloka)
Building a large scale transactional data lake using Apache Hudi - Bill Liu
Data is a critical infrastructure for building machine learning systems. From ensuring accurate ETAs to predicting optimal traffic routes, providing safe, seamless transportation and delivery experiences on the Uber platform requires reliable, performant large-scale data storage and analysis. In 2016, Uber developed Apache Hudi, an incremental processing framework, to power business critical data pipelines at low latency and high efficiency, and helps distributed organizations build and manage petabyte-scale data lakes.
In this talk, I will describe what Apache Hudi is and its architectural design, and then deep dive into improving data operations with features such as data versioning and time travel.
We will also go over how Hudi brings kappa architecture to big data systems and enables efficient incremental processing for near real time use cases.
Speaker: Satish Kotha (Uber)
Apache Hudi committer and Engineer at Uber. Previously, he worked on building real time distributed storage systems like Twitter MetricsDB and BlobStore.
website: https://www.aicamp.ai/event/eventdetails/W2021043010
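As a hedged taste of the API, writing a Spark DataFrame as a Hudi table looks roughly like the sketch below. The option keys come from Hudi's documented Spark datasource options, but the table name, key fields and path are placeholders, and the job needs the Hudi Spark bundle on its classpath.

# Writing a DataFrame as an Apache Hudi table via the Spark datasource.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-sketch").getOrCreate()
df = spark.createDataFrame([("id-1", "2021-01-01", 42)],
                           ["uuid", "ts", "value"])

hudi_options = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.recordkey.field": "uuid",  # upsert key
    "hoodie.datasource.write.precombine.field": "ts",   # keep latest by ts
    "hoodie.datasource.write.operation": "upsert",
}

(df.write.format("hudi")
   .options(**hudi_options)
   .mode("append")
   .save("/tmp/hudi/events"))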
Deep Reinforcement Learning and Its Applications - Bill Liu
What is the most exciting AI news in recent years? AlphaGo!
What are key techniques for AlphaGo? Deep learning and reinforcement learning (RL)!
What are application areas for deep RL? A lot! In fact, besides games, deep RL has been making tremendous achievements in diverse areas like recommender systems and robotics.
In this talk, we will introduce deep reinforcement learning, present several applications, and discuss issues and potential solutions for successfully applying deep RL in real life scenarios.
https://www.aicamp.ai/event/eventdetails/W2021042818
Big Data and AI in Fighting Against COVID-19 - Bill Liu
Website: https://learn.xnextcon.com/event/eventdetails/W20070810
As the COVID-19 pandemic sweeps the globe, big data and AI have emerged as crucial tools for everything from diagnosis and epidemiology to therapeutic and vaccine development.
In this talk, we collect and review how big data is fighting back against COVID-19. We also provide a deep dive into two interesting use cases: 1) using NLP and BERT to answer scientific questions, and 2) the Covid-19 data lake from Databricks, Google and Amazon.
Agenda:
Introduction
Supercomputers for Scientific Research
Covid-19 Tracking and Prediction
Covid-19 Research and Diagnosis
Use Case 1 NLP and BERT to answer scientific questions
Use Case 2 Covid-19 Data Lake and Platform
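Use case 1, answering scientific questions with BERT, can be sketched with the Hugging Face pipeline API. This is a hedged, generic illustration rather than the talk's actual pipeline; the model choice and context text are assumptions.

# Extractive question answering with a BERT-family model (sketch).
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = ("Coronaviruses are enveloped RNA viruses. The spike protein "
           "mediates entry into host cells.")
result = qa(question="What mediates entry into host cells?", context=context)
print(result["answer"], result["score"])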
Highly-scalable Reinforcement Learning RLlib for Real-world Applications - Bill Liu
website: https://learn.xnextcon.com/event/eventdetails/W20051110
video: https://www.youtube.com/watch?v=8tG8PJC6oaU
In reinforcement learning (RL), an agent learns how to optimize performance solely by collecting experience in the real world or via a simulator. RL is being applied to problems such as decision making, process optimization (e.g., manufacturing and supply chains), ad serving, recommendations, self-driving cars, and algorithmic trading.
In this talk, I will discuss RLlib, a reinforcement learning library built on Ray with a strong focus on large-scale execution and scalability, ease-of-use for general users, as well as customizability for developers and researchers.
RLlib offers autonomous task-learning via many common RL algorithms and it scales from a laptop to a cluster with hundreds of machines. It is used by dozens of organizations, from startups to research labs to large organizations. You will see RLlib in action with a live demo.
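A hedged, minimal RLlib example from roughly that era (Ray 1.x; newer Ray versions use a different config-builder API) looks like this; the environment and stopping criterion are placeholders.

# Training PPO on CartPole with RLlib via Ray Tune (Ray 1.x-era API).
import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    stop={"episode_reward_mean": 150},   # stop once the policy is decent
    config={
        "env": "CartPole-v0",
        "num_workers": 2,                # parallel rollout workers
        "framework": "torch",
    },
)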
Build computer vision models to perform object detection and classification w... - Bill Liu
event: https://learn.xnextcon.com/event/eventdetails/W20042918
video:
description: Computer Vision has received significant attention over recent years, both within academia and industry. As the state of the art rapidly improves, the art of the possible follows, offering innovative forms of computer vision applications for different scenarios.
In this talk, Ramine will cover the background and development of computer vision, and demonstrate how to use AWS to build robust, computer vision models to perform object detection and classification.
Key Takeaways:
Understand the history of Computer Vision
Learn how to use Amazon SageMaker to build and Deploy Computer Vision Models
How to orchestrate multiple models for implementing a real-world use case
Causal Inference in Data Science and Machine Learning - Bill Liu
Event: https://learn.xnextcon.com/event/eventdetails/W20042010
Video: https://www.youtube.com/channel/UCj09XsAWj-RF9kY4UvBJh_A
Modern machine learning techniques are able to learn highly complex associations from data, which has led to amazing progress in computer vision, NLP, and other predictive tasks. However, there are limitations to inference from purely probabilistic or associational information. Without understanding causal relationships, ML models are unable to provide actionable recommendations, perform poorly in new, but related environments, and suffer from a lack of interpretability.
In this talk, I provide an introduction to the field of causal inference, discuss its importance in addressing some of the current limitations in machine learning, and provide some real-world examples from my experience as a data scientist at Brex.
https://learn.xnextcon.com/event/eventdetails/W20040610
This talk explains how to practically bring the power of convolutional neural networks and deep learning to memory- and power-constrained devices like smartphones. You will learn various strategies to circumvent obstacles and build mobile-friendly shallow CNN architectures that significantly reduce the memory footprint and therefore make models easier to store on a smartphone.
The talk also dives into how to use a family of model compression techniques to prune the network size for live image processing, enabling you to build a CNN version optimized for inference on mobile devices. Along the way, you will learn practical strategies to preprocess your data in a manner that makes the models more efficient in the real world.
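One of the compression ideas mentioned, magnitude pruning, can be shown in a few lines. The hedged sketch below zeroes out the smallest weights of a Keras dense layer; real mobile pipelines prune iteratively with a dedicated toolkit and fine-tune afterwards, so treat this only as the core idea.

# Magnitude pruning sketch: zero out the smallest 80% of a layer's weights.
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(32,)),
    layers.Dense(10),
])

dense = model.layers[0]
w, b = dense.get_weights()
threshold = np.percentile(np.abs(w), 80)   # keep only the largest 20%
w[np.abs(w) < threshold] = 0.0
dense.set_weights([w, b])
print(f"sparsity: {(w == 0).mean():.0%}")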
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning - Bill Liu
https://learn.xnextcon.com/event/eventdetails/W20040310
I will describe what is available in terms of open source and proprietary tools for automating data science tasks, and introduce 2 new tools: one to visualize a data set of any size with one click, and another to try multiple ML models and techniques with a single call. I will provide the GitHub repos for both for free in the talk.
Monthly AI Tech Talks in Toronto 2019-08-28
https://www.meetup.com/aittg-toronto
The talk will cover the end-to-end details, including contextual and linguistic feature extraction, vectorization, n-grams, topic modeling, and named entity resolution, which are based on concepts from mathematics, information retrieval and natural language processing. We will also be diving into more advanced feature engineering strategies such as word2vec, GloVe and fastText that leverage deep learning models.
In addition, attendees will learn how to combine NLP features with numeric and categorical features and analyze the feature importance from the resulting models.
The following libraries will be used to demonstrate the aforementioned feature engineering techniques: spaCy, Gensim, fastText and Keras in Python.
https://www.meetup.com/aittg-toronto/events/261940480/
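As a hedged pointer to the word2vec portion, training embeddings with Gensim takes only a few lines; the toy corpus and parameters below are placeholders, and the vector_size argument is named size in Gensim versions before 4.0.

# Training word2vec embeddings with Gensim on a toy corpus (sketch).
from gensim.models import Word2Vec

sentences = [
    ["natural", "language", "processing", "with", "embeddings"],
    ["word", "embeddings", "capture", "semantic", "similarity"],
]
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=20)

vec = model.wv["embeddings"]                     # 50-dim vector for a token
print(model.wv.most_similar("embeddings", topn=3))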
Search and Society: Reimagining Information Access for Radical Futures - Bhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as "predictable inference".
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
- UI automation introduction
- UI automation sample
- Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Toronto meetup 20190917
1. Elastic Distributed Deep Learning Training at Large Scale in On-Prem and Cloud Production
Junfeng Liu, STSM, jfliu@ca.ibm.com
Kelvin Lui, Technical Product Manager, kelvinl@ca.ibm.com
Yonggang Hu, Distinguished Engineer, yhu@ca.ibm.com
ibm.com/spectrum-computing
ibm.com/us-en/marketplace/deep-learning-platform
3. Red Bull Racing: Competing with Computing
Every week a new challenge, but part of a season-long strategy.
52 Wins, 58 Poles, 135 Podiums, 52 Fastest Laps, 4 Formula One Constructors' World Championships.
A decade of racing successes, a decade with Spectrum Computing.
New Car for 2017
• Tailored to the track
• Complex virtual design and simulation models
• >200 step simulations
• 30K engineering changes per season
Real-time Decision Making
• Car sensors and real-time telemetry drive decisions before, during and after the race
• >100 sensors per car
• Pit stops under 2 seconds
Race Strategy
• Scenario-driven decision making
• 1000s of scenarios run per race
• Model environments: rain, heat, delays
• Pit stops and tire choices win/lose races
4. Agenda
The needs and challenges of running distributed training
Elastic Distributed Training
• Architecture
• Benchmark
• Interface
Use Cases
Demo (time permitting)
Next Steps
7. Resource matters
(Slide figure: hardware price points compared to cars - a Tesla V100 32GB at $11,458 (Nissan Versa), an IBM AC922 with 4x V100 at $80,000 (Audi A8), an Nvidia DGX-2 at $399,000 (Lamborghini Aventador), and up to $500,000,000 at supercomputer scale.)
8. Workload matters
Inference (a simple language model): needs ~1 TFlops, in ms; 125 TFlops at scale.
Training (ResNet-50 on ImageNet 1K): 14 days on 1 GPU, 29 hours on 8 GPUs, minutes on Summit; requires proper implementation and tuning.
At scale: tens of models, hundreds of tuning runs, hundreds of users, thousands of GPU-days of NAS, millions of pictures, thousands of datasets, millions of jobs, billions of inferences, SLAs from ms to days.
9. Faster Training Time with Large-scale Deep Learning
(Slide figure: one 9-day image recognition training run versus many 4-hour runs - 54x more learning runs with Power8.)
What will you do? Iterate more and create more accurate models? Create more models? Both?
• From 2015 to 2018:
– GPU compute: 18 TFLOPS to 112 TFLOPS (FP16)
– GPU memory: 16GB to 32GB
– Communication: 100Gbps to 200Gbps
10. Large-scale Deep Learning [skymind.ai]
• Data parallelism: constant traffic per GPU (only network size matters)
• Model parallelism: partitioning dictates traffic (needs significant research)
[DeepSpeech2]
11. Distribution Challenge (IBM Spectrum Conductor)
VGG as example:
• 128.3M data / GPU
• allreduce - broadcast sync model
• Data transferred increases with the number of GPUs: 4 GPUs = 1026M data transferred every iteration
Source: https://www.semanticscholar.org/paper/Poseidon-An-Efficient-Communication-Architecture-f-Zhang-Zheng/c37145669be8e7f14f4cdd5ddc3935ea03a54673
12. Using Allreduce for SGD
• More performant
• MPI, NCCL
• All large-scale studies
• Scalable
(Figure: local gradients are combined into an aggregated gradient.) [skymind.ai, mpitutorial.com]
13. Prior Arts: Ring-based Allreduce (Thakur 2005, Baidu Feb/2017)
Two phases: reduce-scatter, then all-gather.
• Bandwidth-optimal for homogeneous network architectures
– Each step is throttled by the worst bandwidth (i.e., pipelined)
• Linear dependency on latency
– N GPUs incur N*latency overheads
– Recursive schemes exist, yet are optimal only for 2^m learners
NOT SCALABLE for many learners:
- Too many iterations (latency adds up fast)
- The weakest link slows down the others (cluster, cloud)
[Baidu.com]
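A hedged sketch of the ring allreduce described above, simulated in-process with plain arrays (real implementations such as NCCL or MPI run the per-step sends in parallel over the interconnect):

# Ring allreduce: reduce-scatter, then all-gather, simulated for N workers.
import numpy as np

def ring_allreduce(grads):
    n = len(grads)
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Phase 1: reduce-scatter. After n-1 steps, worker i holds the full
    # sum of one chunk. Sends are snapshotted to mimic simultaneity.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n].copy())
                 for i in range(n)]
        for i, c, val in sends:
            chunks[(i + 1) % n][c] += val

    # Phase 2: all-gather. Completed chunks circulate around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy())
                 for i in range(n)]
        for i, c, val in sends:
            chunks[(i + 1) % n][c] = val

    return [np.concatenate(c) for c in chunks]

grads = [np.arange(6) * (w + 1) for w in range(3)]  # 3 workers' gradients
print(ring_allreduce(grads))  # every worker ends with the elementwise sum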
14. Prior Arts: Two-step Approach (Tencent Jul/2018, Uber Oct/2018)
NOT SCALABLE for many learners:
- Still too many iterations (latency adds up fast)
- Sub-optimal traffic pattern due to additional Reduce and Broadcast
- Only master GPUs are active
15. DDL: Mix-Match for Best Performance
Communication libraries: MPI, NCCL, IB_Verb, SharedMem, OpenFabric, Custom-lib
Hardware: IBM, Nvidia, Mellanox, Intel, OpenCAPI
Topologies: Ring, Recursive, Tree
[NeurIPS18, SysML19]
IBM DDL: https://arxiv.org/pdf/1708.02188.pdf
16. More Challenges
• Flexibility
• Support for multiple DL frameworks
• Developer transparency
• Auto scaling & elastic training
• Fault tolerance
• Service quality
• Scalability, performance & accuracy
17. Training challenges and reactions to Elastic Training
"Distributed training is great, but we only run training on a single GPU."
"Why? You do not want speed-up?"
"300+ GPUs and 300+ students: each researcher is entitled to use 1 GPU."
"You are in meetings right now. Are you using the GPUs allocated to you?"
"No ..."
"If you ask for 16 GPUs, you will never get them in a busy cluster." (A classic large-job starvation problem!)
"If you run large jobs and use more than you deserve, your jobs will be killed."
"What if you start with 1 GPU? Your job can grow, and if there are other high-priority jobs, your job gracefully shrinks back to your own quota."
"F*&?% brilliant idea!"
18. Distributed TensorFlow (static cluster configuration; mixed data ingest, PS and worker logic - EDT only needs the model/graph definition)

# cluster specification
parameter_servers = ["pc-01:2222"]
workers = ["pc-02:2222", "pc-03:2222", "pc-04:2222"]
cluster = tf.train.ClusterSpec({"ps": parameter_servers, "worker": workers})
tf.app.flags.DEFINE_string("job_name", "", "Either 'ps' or 'worker'")
tf.app.flags.DEFINE_integer("task_index", 0, "Index of task within the job")
FLAGS = tf.app.flags.FLAGS
server = tf.train.Server(cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_index)
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
if FLAGS.job_name == "ps":
    server.join()
elif FLAGS.job_name == "worker":
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % FLAGS.task_index, cluster=cluster)):
        # Model/graph definition (the only part EDT needs)
        with tf.name_scope('input'):
            x = tf.placeholder(tf.float32, shape=[None, 784], name="x-input")
            y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y-input")
        with tf.name_scope("weights"):
            W1 = tf.Variable(tf.random_normal([784, 100]))
            W2 = tf.Variable(tf.random_normal([100, 10]))
        with tf.name_scope("softmax"):
            y = tf.nn.softmax(z3)
        # ...... lines of code elided
        with tf.name_scope('train'):
            # optimizer is an "operation" which we can execute in a session
            grad_op = tf.train.GradientDescentOptimizer(learning_rate)
            train_op = grad_op.minimize(cross_entropy, global_step=global_step)
    # Training runtime management
    sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == 0),
                             global_step=global_step,
                             init_op=init_op)
    with sv.prepare_or_wait_for_session(server.target) as sess:
        if FLAGS.task_index == 0:
            # the chief manages the model, logs and other bookkeeping
            writer = tf.train.SummaryWriter(logs_path, graph=tf.get_default_graph())
        for epoch in range(training_epochs):
            for i in range(batch_count):
                batch_x, batch_y = mnist.train.next_batch(batch_size)
                _, cost, summary, step = sess.run(
                    [train_op, cross_entropy, summary_op, global_step],
                    feed_dict={x: batch_x, y_: batch_y})
                writer.add_summary(summary, step)
20. MPI - A reference point for parallel applications
SPMD, peer-to-peer programming model: a common binary runs on each core or CPU and discovers its rank at run-time (ranks 0..14 in the illustration).
Advantages:
• Standard, portable
• Fast, low-latency communications
• Many features: point-to-point, message selectivity, collective operations, process groups, etc.
But challenges too (not cloud native):
• Not fault-tolerant
• Programmer needs to keep track of rank
• Exception handling left to the developer
• Resource allocations are static
• Computations not distributed optimally
• Challenging to debug
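For concreteness, here is a hedged minimal SPMD program in Python with mpi4py: the same script runs on every core, discovers its rank at run time, and joins a collective allreduce, exactly the pattern sketched on the slide.

# Run with: mpirun -np 4 python spmd_example.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # each process discovers its own rank
size = comm.Get_size()

local = (rank + 1) ** 2                      # some per-rank partial result
total = comm.allreduce(local, op=MPI.SUM)    # everyone receives the sum

print(f"rank {rank}/{size}: local={local}, global sum={total}")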
21. Traditional HPC vs. Elastic Fabric - converging HPC and cloud native
Traditional HPC: the application is linked directly with the communication library (MPI, TCP) and hard-codes hosts, message passing and control flow (Initialize(host1, host2, host3), Send(work), ...).
Elastic fabric (Spark, MapReduce, Symphony SOAM, etc.): a high-performance fabric manages workload and state and calls out to user code, enabling elasticity, resilience and mobility while hiding infrastructure complexity and deployment.
(Diagram: a client submits a session of tasks, e.g. PriceFXOpt() and PriceFXFW() calls, to a master that fans them out to a pool of service workers.)
22. Elastic Deep Learning
• Have the best of performance and flexibility
• Auto scale up and down based on resource plan: priority, real-time fairshare, FIFO
• Transparent for TensorFlow, PyTorch and Caffe
• Convergence and hyperparameter awareness
Combine the best scheduling and fastest communication with DL-specific high-performance optimization.
23. Elastic Distributed Training Engine
• Session Scheduler - resource policy (scaling, preemption, migration), elastic scaling
• DL Driver - training planning and scheduling (training tasks, micro-batch pipeline, sync-plan)
• Work Wrapper - worker wrapper (model transparency, data ingest)
• DL Framework - TensorFlow, Caffe, PyTorch, Keras
• Sync Engine (DDL) - high-performance synchronization (sync, async, peer-to-peer, centralized; new RDMA library)
Elastic distribution challenges:
• Graceful pre-emption
• Auto scale
• Dynamic priority
• Fault tolerance
• Speed-up, performance (DDL)
• Accuracy
• Synchronization algorithm
• Topology & GPU awareness
• Model transparency
24. Auto scaling and pre-emption
• Resource policy drives scaling up and down:
– Priority
– Sharing policy: fair share, FIFO
– GPU demand
– Cluster-wide utilization
• The fabric handles the scaling:
– Automatically, without interruption
– Supports both sync and async models
– Keeps/adjusts batch size and other hyperparameters during scaling
25. Keras AutoScaling – Go Beyond one GPU with EDT
mlp = Sequential()
mlp.add(Dense(1000, input_shape=(784,)))
mlp.add(Activation('relu'))
mlp.add(Dense(250))
mlp.add(Activation('relu'))
mlp.add(Dense(10))
mlp.add(Activation('softmax'))
trainer = ElasticDL(
model=mlp,
loss='categorical_crossentropy',
optimizer=optimizer_mlp,
batch_size=4, num_epoch=1)
trainer.fit(training_set, epochs=4)
mlp = Sequential()
mlp.add(Dense(1000, input_shape=(784,)))
mlp.add(Activation('relu'))
mlp.add(Dense(250))
mlp.add(Activation('relu'))
mlp.add(Dense(10))
mlp.add(Activation('softmax'))
mlp.compile(loss='categorical_crossentropy',
optimizer=optimizer_mlp,
callback_MAO)
mlp.fit( training_set, epochs=epochs)
What the comparison shows:
• Same model definition
• Same optimizer and loss function
• MAO is hidden as a callback
• Similar API for fit and compile
• Data sources support Spark DataFrames
• The same code scales from 1 GPU to N GPUs
26. Pytorch – Transparent Scaling through EDT
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

model = Net()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

# EDT PyTorch: wrap the unchanged model; MAO and data ingest are hidden in the workers.
model = ElasticDL(model, optimizer, F.nll_loss, dataLoader)
model.train(200, 64)
# Native PyTorch: the hand-written training loop that EDT replaces.
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
27. Transparent DL Insight
• No code modification required
• Customizable live metrics
• Plugin interface to third-party monitoring
• Elastic and interactive with notebooks and developer tools
• Scales up and down without interruption
A minimal metrics-callback sketch follows this list.
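A minimal sketch of publishing live metrics from a training run via a Keras callback; push_metric is a hypothetical stand-in for any third-party monitoring client, not a WML-A API:

from tensorflow import keras

def push_metric(name, value, step):
    # Placeholder sink: a real plugin would forward to a monitoring backend.
    print(f"[monitor] step={step} {name}={value:.4f}")

class LiveMetrics(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        for name, value in (logs or {}).items():
            push_metric(name, value, epoch)

# Usage: model.fit(x, y, callbacks=[LiveMetrics()])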
28. Autoscaling with Accuracy, Transparency
Maintains the same accuracy when scaling GPUs up and down: e.g. preemption from 4 GPUs to 2 GPUs, or down to 1 GPU.
One line of code change to train anywhere, on multiple nodes and multiple GPUs.
Interactive experience in notebooks.
30. For questions, contact:
Olga Yiparaki yiparaki@us.ibm.com
Chief engineer, IBM Storage Performance
Eric Fiala Eric.J.Fiala@ibm.com
Solution Architect, Spectrum Computing
Constantine Arnold Constantine.Arnold@ibm.com
Data Science and Storage Systems Research
Jay Vaddi jayvaddi@us.ibm.com
IBM Storage Performance
Brian Porter bporter1@us.ibm.com
Client Technical Specialist, IBM Systems
31. IBM Storage: Spectrum Scale NVMe all-flash appliance
[Diagram: up to 8 compute hosts on an IB EDR fabric; compute nodes can be increased elastically. A single IBM Spectrum Scale NVMe all-flash appliance served as the only storage AFA node throughout these tests; additional storage can be added independently of compute nodes.]
• NVMe-based storage provides more than ample performance for these AI benchmarks, which saturate the GPUs
• A single AFA storage node uses 2U of rack space and provides ~63 TB of user capacity
• Max read from storage: over 35 GB/s, assuming enough network adapters
• Storage can be increased linearly to meet capacity and/or performance requirements
Power9 with IBM Watson Machine Learning Accelerator (WMLA); 2x IB links per host, 8x IB links:
• Up to 8 Power9 AC922 hosts in this environment
• GTX model, water cooled
• 512 GB RAM per Power9 host
• 6 GPUs per Power9 host, up to 48 GPUs in this environment
• Single dual-ported IB adapter per Power9 host
32. Elastic Distributed Training Scaling efficiency
Benchmark configuration
Framework: TensorFlow (Elastic Distributed Training)
Spark instance group: dliauto
Model: InceptionV3
Batch size: 64
Dataset: flowers
Hyperparameters:
  Learning rate policy: exponential
  Base learning rate: 0.01
  Decay steps: 4000
  Learning rate decay: 0.9
  Staircase: TRUE
  Solver type: GradientDescent
  Maximum iterations: 10000
The sketch after this list shows the schedule these learning-rate settings produce.
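A quick reference calculation (assuming tf.train.exponential_decay semantics, which these settings match): with Staircase=TRUE, the rate steps down once every decay_steps iterations.

# lr(step) = base_lr * decay_rate ** floor(step / decay_steps)
base_lr, decay_rate, decay_steps = 0.01, 0.9, 4000
for step in (0, 4000, 8000, 10000):
    lr = base_lr * decay_rate ** (step // decay_steps)
    print(f"step {step:>5}: lr = {lr:.6f}")
# -> 0.010000, 0.009000, 0.008100, 0.008100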
• With the Elastic Distributed Training capability included in Watson Machine Learning Accelerator, the system dynamically scales out to accommodate the demands of growing AI applications
• Quick growth in demand is accommodated elastically, with ease of management
• The measured data below show high scaling efficiency as the number of GPUs and hosts increases
[Chart: speedup from scaling out hosts and GPUs, in iterations/min relative to the single-host baseline: 1 host = 1.0x, 2 hosts = 2.0x, 4 hosts = 3.8x, 8 hosts = 7.5x.]
[Chart: scaling efficiency vs. the 6-GPU baseline over 50K iterations: 6 GPUs = 100%, 12 GPUs = 100%, 24 GPUs = 96%, 48 GPUs = 94%.]
In all these cases, the NVMe-based storage remains unchanged and provides more than ample performance, since the AI workload saturates the GPUs, as evidenced by the high scaling efficiency; a quick cross-check of those numbers follows.
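Scaling efficiency is measured speedup divided by ideal speedup, so the two charts agree (a quick check of the arithmetic, not from the deck):

# efficiency = measured speedup / number of hosts (vs. the 1-host baseline)
speedups = {1: 1.0, 2: 2.0, 4: 3.8, 8: 7.5}
for hosts, s in speedups.items():
    print(f"{hosts} host(s): {s:.1f}x speedup -> {s / hosts:.0%} efficiency")
# -> 100%, 100%, 95%, 94%, matching the charted 100/100/96/94% within rounding.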
33. IBM WML-A enables Service Level Agreements by absorbing multi-tenant growth
• POWER9 absorbs the rapid growth of AI demands: as tenants multiply, every new job or user adds negligible overhead, enabling predictable behavior and SLAs (Service Level Agreements)
• Coupled with EDT (Elastic Distributed Training), multitenant workloads are accommodated elastically, with ease of management
• This showcases measured data, with the same negligible overheads as the number of GPUs varies
[Chart: multitenant overhead relative to the average tenant, plotted per tenant and bounded within roughly +/-10%, for five configurations: 3 GPUs x 16 tenants, 6 GPUs x 8 tenants, 12 GPUs x 4 tenants, 24 GPUs x 2 tenants, and 48 GPUs x 1 tenant.]
In all these cases, the NVMe-based storage remains unchanged and provides more than ample performance, enabling the GPUs to accommodate the multitenant AI workload without any slowdowns, as evidenced by the negligible overheads. Multitenancy overheads are on par with the corresponding overheads when each server uses local storage instead of the external IBM storage used in these tests.
34. Improve data scientist productivity by 31% and IT resource utilization by 33% in a multi-user shared cluster of GPUs running IBM WML-Accelerator on POWER9 AC922 servers with NVIDIA Tesla V100 GPUs connected via NVLink 2.0
• 1.31x reduction in training time for multiple concurrent experiments vs. the tested x86 systems
  • 4 jobs of InceptionV3, each trained for 15000 iterations on the Flowers dataset and requiring 3 GPUs, running on 2 AC922 nodes with 4 GPUs each
• 1.33x improvement in POWER9 DL infrastructure utilization vs. the tested x86 systems
• 0 wait time: submitted jobs are executed even if the cluster is busy, thanks to IBM WML-Accelerator's elastic scaling
  • Supports multi-tenancy and elasticity with a fair-share scheduling policy
A simple scenario:
• Results are based on IBM internal measurements running 15000-iteration training of the InceptionV3 model (mini-batch size = 32 per GPU) on the Flowers dataset.
• Power AC922: 40 cores (2 x 20c chips), POWER9 with NVLink 2.0, 3.8 GHz, 1 TB memory, 4x Tesla V100 GPUs; Red Hat Enterprise Linux 7.5 for Power Little Endian (POWER9) with CUDA 9.2 / cuDNN 7.2.1; WML-A v1.1.1.
• Competitive stack: 2x Xeon(R) Gold 6150, 36 cores (2 x 18c chips), 2.70 GHz, 512 GB memory, 4x Tesla V100 GPUs; Ubuntu 16.04.4 with CUDA 9.1 / cuDNN 7.1.2; NGC image nvcr.io/nvidia/tensorflow version 18.08-py2; Kubernetes v1.11.2.
[Chart: "Multiuser Jobs 3-3-3-3" – time taken (minutes) for 4 jobs of InceptionV3/Flowers training for 15000 iterations, with idle resource shown: x86 = 34.4 min, AC922 = 26.22 min.]
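The 1.31x headline follows directly from the charted times (a quick arithmetic check):

print(f"{34.4 / 26.22:.2f}x")  # x86 minutes / AC922 minutes -> 1.31x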
35. ML/DL Training & Execution – Watson ML Accelerator
Data sources: traditional business data, sensor data, data from collaboration partners, data from mobile apps & social media, legacy data
Data preparation: pre-processing and data ingestion (heavy I/O)
Model training: AI deep-learning frameworks (TensorFlow, Caffe, ...) with distributed & elastic training for deep learning and parallel hyper-parameter search & optimization; inputs are the network models, hyper-parameters, and the training dataset; instrumentation feeds a monitor-and-advise loop to iterate, and the testing dataset validates the trained model
Inference: trained-model life-cycle management; deploy in production using the trained model (REST API) against new data
Foundation: multi-tenant, shared-services architecture (Conductor) with resource groups, consumers, resource plans, instance groups, resiliency, workload management, notebooks, Anaconda, reporting, and security
36. Watson ML Accelerator Technical References
Tutorial: Classify images with IBM Watson Machine Learning Accelerator
URL: https://developer.ibm.com/tutorials/use-computer-vision-with-dli-watson-machine-learning-accelerator/
Tutorial: Train Keras and MLlib models with IBM Watson Machine Learning Accelerator
URL: https://developer.ibm.com/tutorials/training-keras-and-mllib-model-with-watson-machine-learning-accelerator/
Tutorial: Get dynamic, elastic, and fine-grained resource allocations and controls for accelerating multiple model trainings simultaneously
URL: https://developer.ibm.com/tutorials/dynamic-resilient-and-elastic-deep-learning-with-watson-machine-learning-accelerator/
Tutorial: Train XGBoost models with IBM Watson Machine Learning Accelerator
URL: https://developer.ibm.com/tutorials/train-xgboost-models-within-watson-ml-accelerator/
Tutorial: Accelerate Generalized Linear Model training with IBM Watson Machine Learning Accelerator and Snap ML
URL: https://developer.ibm.com/tutorials/accelerate-machine-model-training-with-watson-ml-accelerator-snap-ml/
Tutorial: Accelerate tree-based model training with Watson Machine Learning Accelerator and Snap ML
URL: https://developer.ibm.com/tutorials/accelerate-random-forest-model-training-with-watson-ml-accelerator/
• Machine Learning and Deep Learning with IBM Watson Machine Learning Accelerator series
• Offers walk-throughs and hands-on experience with the Watson ML Accelerator key differentiators
• English series: https://developer.ibm.com/series/learn-watson-machine-learning-accelerator/
• Chinese series: https://developer.ibm.com/cn/blog/2019/learn-ibm-powerai-enterprise/