IBM Spectrum Conductor can manage H2O Driverless AI instances at scale across multiple nodes in an enterprise data center. Key benefits include the ability to run multiple Driverless AI instances on the same host using GPUs, failover if an instance fails, and role-based access control for users. The integration improves productivity by providing a shared file system and workload management, and by making it easy to start and stop Driverless AI instances.
Get Behind the Wheel with H2O Driverless AI Hands-On Training Sri Ambati
This training took place in London on October 30th, 2018.
A hands-on training on our ground-breaking product, H2O Driverless AI, was delivered by the following makers:
1. Introduction to Driverless AI by Arno Candel
2. Feature Engineering in Driverless AI by Dmitry Larko
3. Time Series in Driverless AI by Marios Michailidis and Mathias Müller
4. NLP in Driverless AI by Sudalai Rajkumar Machine Learning
5. Interpretability in Driverless AI by Arno Candel
This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/nZzHFwaoMpU
In this presentation, we will demonstrate the integration of H2O Driverless.ai with NetApp Cloud Volumes Service. In addition, we’ll describe key considerations for the development of Deep Learning environments and the solutions that enable seamless data management across edge environments, on-premises data centers, and the cloud. This presentation is aimed at data scientists, data engineers, and line-of-business leaders.
Vinod comes with over 10 years of Marketing & Data Science experience in multiple startups. He was the founding employee of his previous startup, Activehours, where he helped build the product and bootstrap user acquisition with growth hacking. He has seen the user base of his companies grow from scratch to millions of customers. He’s built models to score leads, reduce churn, increase conversion, prevent fraud, and address many more use cases. He brings a strong analytical side and a metrics-driven approach to marketing.
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018 Sri Ambati
This talk was recorded in London on Oct 30, 2018 and can be viewed here: https://youtu.be/p4iAnxwC_Eg
The good news is that building fair, accountable, and transparent machine learning systems is possible. The bad news is that it’s harder than many blogs and software package docs would have you believe. The truth is that nearly all interpretable machine learning techniques generate approximate explanations, that the fields of eXplainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes!
This talk aims to make your interpretable machine learning project a success by describing fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining viable techniques for debugging, explaining, and testing machine learning models.
Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling.
He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at L’ECE Paris in France and worked on distributed flight booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, and he is still based there.
Introducción al Machine Learning Automático (Introduction to Automatic Machine Learning) Sri Ambati
How can you bring machine learning to the masses? Machine learning projects struggle with finding talent, the time needed to build and deploy models, and trusting the models that are built.
How can multiple teams in your organization build accurate ML models without being experts in data science or machine learning?
Are you wondering about the different flavors of AutoML?
H2O Driverless AI applies the techniques of expert data scientists in an easy-to-use application that helps scale your data science efforts. Driverless AI lets data scientists work on projects faster, using automation and state-of-the-art GPU computing power to complete in minutes tasks that used to take months.
With H2O Driverless AI, everyone, including expert and junior data scientists, domain scientists, and data engineers, can develop trusted machine learning models. This next-generation machine learning platform offers unique and advanced functionality for data visualization, feature engineering, model interpretability, and low-latency deployment.
H2O Driverless AI performs:
* Automatic data visualization
* Automatic feature engineering at Grandmaster level
* Automatic model selection
* Automatic model tuning and training
* Automatic parallelization using multiple CPUs or GPUs
* Automatic model ensembling
* Automatic machine learning interpretability (MLI)
* Automatic generation of scoring code
Want to try it yourself? You can get a free trial here: H2O Driverless AI trial.
Come to this session and find out how to get started with automatic machine learning using H2O Driverless AI, and build powerful models with just a few clicks.
See you soon!
About H2O.ai
H2O.ai is a visionary Silicon Valley open source software company that has created and reimagined what is possible. We are a company of makers who have brought new platforms and technologies to market to drive the AI movement. We are the creators of H2O, the leading open source data science and machine learning platform, used by nearly half of the Fortune 500 and trusted by more than 14,000 organizations and hundreds of thousands of data scientists around the world.
Portable Scalable Data Visualization Techniques for Apache Spark and Python N... Databricks
Python notebooks are great for communicating data analysis and research, but how do you port these data visualizations between the many available platforms (Jupyter, Databricks, Zeppelin, Colab, …)? Also learn how to scale up your visualizations using Spark.
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI... Sri Ambati
This talk was recorded in London on October 30, 2018.
KNIME Analytics Platform is an easy-to-use and comprehensive open source data integration, analysis, and exploration platform, enabling data scientists to visually compose end-to-end data analysis workflows. The over 2,000 available modules ("nodes") cover each step of the analysis workflow, including blending heterogeneous data types, data transformation, wrangling and cleansing, advanced data visualization, and model training and deployment.
Many of these nodes are provided through open source integrations (why reinvent the wheel?). This provides seamless access to large open source projects such as Keras and TensorFlow for deep learning, Apache Spark for big data processing, Python and R for scripting, and more. These integrations can be used in combination with other KNIME nodes, meaning that data scientists can freely select from a vast variety of options when tackling an analysis problem.
The integration of H2O in KNIME offers an extensive number of nodes encapsulating functionalities of the H2O open source machine learning libraries, making it easy to use H2O algorithms from a KNIME workflow without touching any code - each of the H2O nodes looks and feels just like a normal KNIME node - and the data scientist benefits from the high-performance libraries and proven quality of H2O during execution. For prototyping, these algorithms are executed locally; however, training and deployment can easily be scaled up using a Sparkling Water cluster.
In our talk we give a short introduction to KNIME Analytics Platform and then demonstrate, using a real-world analysis example, how data scientists benefit from using KNIME Analytics Platform and H2O Machine Learning in combination.
Bio: Christian received a Master’s degree in Computer Science from the University of Konstanz. Having gained experience as a research software engineer at the University of Konstanz, where he developed frameworks and libraries in the fields of bioimage analysis and machine learning, Christian moved on to become a software engineer at KNIME. He now focuses on developing new functionalities and extensions for KNIME Analytics Platform. Some of his recent projects include deep learning integrations built upon Keras and TensorFlow, extensions for image analysis and active learning, and the integration of H2O Machine Learning and H2O Sparkling Water in KNIME Analytics Platform.
Saving Energy in Homes with a Unified Approach to Data and AI Databricks
Energy wastage by residential buildings is a significant contributor to total worldwide energy consumption. Quby, an Amsterdam-based technology company, offers solutions to empower homeowners to stay in control of their electricity, gas, and water usage.
Arno Candel, Chief Architect at H2O.ai, talks about what's new in H2O, including the latest advancements in the algorithms.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and... Databricks
Workday Prism Analytics enables data discovery and interactive Business Intelligence analysis for Workday customers. Workday is a “pure SaaS” company, providing a suite of Financial and HCM (Human Capital Management) apps to about 2000 companies around the world, including more than 30% of the Fortune 500. There are significant business and technical challenges in supporting millions of concurrent users and hundreds of millions of daily transactions. Using a memory-centric, graph-based architecture allowed us to overcome most of these problems.
As Workday grew, data transactions from existing and new customers generated vast amounts of valuable and highly sensitive data. The next big challenge was to provide an in-app analytics platform for the multiple types of accumulated data, one that would also allow blending in external datasets. Workday users wanted it to be super-fast, but also intuitive and easy to use, both for financial and HR analysts and for regular, less technical users. Existing backend technologies were not a good fit, so we turned to Apache Spark.
In this presentation, we will share the lessons we learned when building a highly scalable multi-tenant analytics service for transactional data. We will start with the big picture and business requirements. Then we will describe the architecture, with batch and interactive modules for data preparation, publishing, and the query engine, noting the relevant Spark technologies. Then we will dive into the internals of Prism’s Query Engine, focusing on the Spark SQL, DataFrames, and Catalyst compiler features used. We will describe the issues we encountered while compiling and executing complex pipelines and queries, and how we use caching, sampling, and query compilation techniques to support an interactive user experience.
Finally, we will share the future challenges for 2018 and beyond.
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/rKoBJcnsFpM
Speaker's Bio:
Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling. He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at L’ECE Paris in France and worked on distributed flight booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, and he is still based there. In his spare time he tries to be part of the IT community by organizing, attending, and speaking at conferences and meetups.
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi... Databricks
The BigDL framework scales deep learning for large data sets using Apache Spark. However, there is significant scheduling overhead from Spark when running BigDL at large scale. In this talk we propose a new parameter manager implementation that, along with coarse-grained scheduling, can provide significant speedups for deep learning models such as Inception and VGG. Aggregation functions like reduce or treeReduce that are used for parameter aggregation in Apache Spark (and the original MapReduce) are slow because the centralized scheduling and driver network bandwidth become a bottleneck, especially in large clusters.
To reduce the overhead of parameter aggregation and allow for near-linear scaling, we introduce a new AllReduce operation, a part of the parameter manager in BigDL, which is built directly on top of the BlockManager in Apache Spark. AllReduce in BigDL uses a peer-to-peer mechanism to synchronize and aggregate parameters. During parameter synchronization and aggregation, all nodes in the cluster play the same role and the driver’s overhead is eliminated, thus enabling near-linear scaling. To address the scheduling overhead we use Drizzle, a recently proposed scheduling framework for Apache Spark. Currently, Spark uses a BSP computation model and notifies the scheduler at the end of each task. Invoking the scheduler at the end of each task adds overhead and results in decreased throughput and increased latency.
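As a rough illustration of the peer-to-peer aggregation described above, the following Python sketch simulates a slice-based AllReduce inside a single process. It is not BigDL's implementation and does not use Spark's BlockManager; the function name `all_reduce`, the slice layout, and the sample gradients are assumptions made for the example.

```python
from typing import List


def all_reduce(local_grads: List[List[float]]) -> List[List[float]]:
    """Simulate a slice-based AllReduce over n workers (hypothetical helper).

    local_grads[w] is worker w's full gradient vector. Each worker owns one
    contiguous slice and sums that slice across all workers (reduce-scatter).
    Every worker then collects all reduced slices (all-gather), so no central
    driver ever handles the full aggregation.
    """
    n = len(local_grads)
    size = len(local_grads[0])
    # Worker k owns indices [bounds[k], bounds[k + 1]).
    bounds = [k * size // n for k in range(n + 1)]

    # Reduce-scatter: owner k sums slice k received from every peer.
    reduced_slices = []
    for k in range(n):
        lo, hi = bounds[k], bounds[k + 1]
        reduced_slices.append(
            [sum(g[i] for g in local_grads) for i in range(lo, hi)]
        )

    # All-gather: every worker fetches each reduced slice from its owner.
    full = [x for s in reduced_slices for x in s]
    return [list(full) for _ in range(n)]


# Three simulated workers, each holding a four-element gradient.
grads = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [100.0, 200.0, 300.0, 400.0],
]
result = all_reduce(grads)
```

After the call, every simulated worker holds the same summed vector. In this scheme each worker aggregates only its own slice, so the work is spread evenly across peers instead of being concentrated on one node.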
Drizzle introduces group scheduling, where multiple iterations (a group) are scheduled at once. This helps decouple the granularity of task execution from scheduling and amortizes the costs of task serialization and launch. Finally, we will present results from using the new AllReduce operation and Drizzle on a number of common deep learning models, including VGG and Inception. Our benchmarks, run on Amazon EC2 and Google Dataproc, will show the speedups and scalability of our implementation.
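The amortization argument can be made concrete with a toy cost model. The sketch below is an assumption-laden simplification, not Drizzle's scheduler: the function `total_time_ms` and all timing values are hypothetical, and the model charges one fixed scheduling cost per group of iterations.

```python
import math


def total_time_ms(iterations: int, compute_ms: float,
                  sched_ms: float, group_size: int) -> float:
    """Toy model: compute cost per iteration plus one scheduling cost per group."""
    scheduler_calls = math.ceil(iterations / group_size)
    return iterations * compute_ms + scheduler_calls * sched_ms


# Hypothetical numbers: 100 iterations, 10 ms compute, 5 ms scheduling cost.
per_iteration = total_time_ms(100, 10.0, 5.0, group_size=1)   # scheduler invoked every iteration
grouped = total_time_ms(100, 10.0, 5.0, group_size=10)        # scheduler invoked once per 10 iterations
```

Under these made-up numbers, scheduling accounts for a third of the run time when the scheduler is invoked every iteration, and for under 5% when iterations are scheduled in groups of ten.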
Erin LeDell, H2O.ai - Scalable Automatic Machine Learning - H2O World San Fra... Sri Ambati
This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/ndUtKRzVUCo
In this presentation, Erin LeDell (Chief Machine Learning Scientist, H2O.ai) will provide an overview of the field of "Automatic Machine Learning" and introduce the new AutoML functionality in H2O. H2O's AutoML provides an easy-to-use interface that automates the process of training a large, comprehensive selection of candidate models and a stacked ensemble model which, in most cases, will be the top-performing model in the AutoML Leaderboard.
Bio: Erin is the Chief Machine Learning Scientist at H2O.ai. Erin has a Ph.D. in Biostatistics with a Designated Emphasis in Computational Science and Engineering from University of California, Berkeley. Her research focuses on automatic machine learning, ensemble machine learning and statistical computing. She also holds a B.S. and M.A. in Mathematics.
Before joining H2O.ai, she was the Principal Data Scientist at Wise.io (acquired by GE Digital in 2016) and Marvin Mobile Security (acquired by Veracode in 2012), and the founder of DataScientific, Inc.
Overview of the Google Data Platform ecosystem for storage, compute, and processing.
Data engineering use cases and building a sample data pipeline on GCP. Lessons learned and challenges encountered while using its different components.
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom... Databricks
Getting cars to drive autonomously is one of the most exciting problems these days. One of the key challenges is making them drive safely, which requires processing large amounts of data. In our talk we would like to focus on only one task of a self-driving car, namely road detection. Road detection is a software component that must be safe in order to keep the car in its current lane. To track the progress of such a software component, a well-designed KPI (key performance indicator) evaluation pipeline is required. In this presentation we would like to show you how we incorporate Spark in our pipeline to deal with huge amounts of data and operate under strict scalability constraints for gathering relevant KPIs. Additionally, we would like to mention several lessons learned from using Spark in this environment.
HPC and cloud distributed computing, as a journey Peter Clapham
Introducing an internal cloud brings new paradigms, tools, and infrastructure management. When placed alongside traditional HPC, the new opportunities are significant. But getting to the new world with micro-services, autoscaling, and autodialing is a journey that cannot be achieved in a single step.
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...Sri Ambati
This talk was recorded in London on October 30, 2018.
KNIME Analytics Platform is an easy to use and comprehensive open source data integration, analysis, and exploration platform, enabling data scientists to visually compose end to end data analysis workflows. The over 2,000 available modules ("nodes") cover each step of the analysis workflow, including blending heterogeneous data types, data transformation, wrangling and cleansing, advanced data visualization, or model training and deployment.
Many of these nodes are provided through open source integrations (why reinvent the wheel?). This provides seamless access to large open source projects such as Keras and Tensorflow for deep learning, Apache Spark for big data processing, Python and R for scripting, and more. These integrations can be used in combination with other KNIME nodes meaning that data scientists can freely select from a vast variety of options when tackling an analysis problem.
The integration of H2O in KNIME offers an extensive number of nodes and encapsulating functionalities of the H2O open source machine learning libraries, making it easy to use H2O algorithms from a KNIME workflow without touching any code - each of the H2O nodes looks and feels just like a normal KNIME node - and the data scientist benefits from the high performance libraries and proven quality of H2O during execution. For prototyping these algorithms are executed locally, however training and deployment can easily be scaled up using a Sparkling Water cluster.
In our talk we give a short introduction to KNIME Analytics Platform and then demonstrate how data scientists benefit from using KNIME Analytics Platform and H2O Machine Learning in combination by using a real world analysis example.
Bio: Christian received a Master’s degree in Computer Science from the University of Konstanz. Having gained experience as a research software engineer at the University of Konstanz, where he developed frameworks and libraries in the fields of bioimage analysis and machine learning, Christian moved on to become a software engineer at KNIME. He now focuses on developing new functionalities and extensions for KNIME Analytics Platform. Some of his recent projects include deep learning integrations built upon Keras and Tensorflow, extensions for image analysis and active learning, and the integration of H2O Machine Learning and H2O Sparkling Water in KNIME Analytics Platform.
Saving Energy in Homes with a Unified Approach to Data and AIDatabricks
Energy wastage by residential buildings is a significant contributor to total worldwide energy consumption. Quby, an Amsterdam based technology company, offers solutions to empower homeowners to stay in control of their electricity, gas and water usage.
Arno Candel, Chief Architect, H2O.ai talks about what's new in H2O including all the new advancements in the algorithms.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Databricks
Workday Prism Analytics enables data discovery and interactive Business Intelligence analysis for Workday customers. Workday is a “pure SaaS” company, providing a suite of Financial and HCM (Human Capital Management) apps to about 2000 companies around the world, including more than 30% from Fortune-500 list. There are significant business and technical challenges to support millions of concurrent users and hundreds of millions daily transactions. Using memory-centric graph-based architecture allowed to overcome most of these problems.
As Workday grew, data transactions from existing and new customers generated vast amounts of valuable and highly sensitive data. The next big challenge was to provide in-app analytics platform, which for the multiple types of accumulated data, and also would allow using blend in external datasets. Workday users wanted it to be super-fast, but also intuitive and easy-to-use both for the financial and HR analysts and for regular, less technical users. Existing backend technologies were not a good fit, so we turned to Apache Spark.
In this presentation, we will share the lessons we learned when building highly scalable multi-tenant analytics service for transactional data. We will start with the big picture and business requirements. Then describe the architecture with batch and interactive modules for data preparation, publishing, and query engine, noting the relevant Spark technologies. Then we will dive into the internals of Prism’s Query Engine, focusing on Spark SQL, DataFrames and Catalyst compiler features used. We will describe the issues we encountered while compiling and executing complex pipelines and queries, and how we use caching, sampling, and query compilation techniques to support interactive user experience.
Finally, we will share the future challenges for 2018 and beyond.
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/rKoBJcnsFpM
Speaker's Bio:
Mateusz is a software developer who loves all things distributed, machine learning and hates buzzwords. His favourite hobby data juggling. He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at L’ECE Paris in France and worked on distributed flight booking systems. After graduation he move to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, where he is still currently based. In his spare time he tries to be part of the IT community by organizing, attending and speaking at conferences and meet ups.
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Databricks
The BigDL framework scales deep learning for large data sets using Apache Spark. However there is significant scheduling overhead from Spark when running BigDL at large scale. In this talk we propose a new parameter manager implementation that along with coarse-grained scheduling can provide significant speedups for deep learning models like Inception, VGG etc. Aggregation functions like reduce or treeReduce that are used for parameter aggregation in Apache Spark (and the original MapReduce) are slow as the centralized scheduling and driver network bandwidth become a bottleneck especially in large clusters.
To reduce the overhead of parameter aggregation and allow for near-linear scaling, we introduce a new AllReduce operation, a part of the parameter manager in BigDL which is built directly on top of the BlockManager in Apache Spark. AllReduce in BigDL uses a peer-to-peer mechanism to synchronize and aggregate parameters. During parameter synchronization and aggregation, all nodes in the cluster play the same role and driver’s overhead is eliminated thus enabling near-linear scaling. To address the scheduling overhead we use Drizzle, a recently proposed scheduling framework for Apache Spark. Currently, Spark uses a BSP computation model, and notifies the scheduler at the end of each task. Invoking the scheduler at the end of each task adds overheads and results in decreased throughput and increased latency.
Drizzle introduces group scheduling, where multiple iterations (or a group) of iterations are scheduled at once. This helps decouple the granularity of task execution from scheduling and amortizes the costs of task serialization and launch. Finally we will present results from using the new AllReduce operation and Drizzle on a number of common deep learning models including VGG and Inception. Our benchmarks run on Amazon EC2 and Google DataProc will show the speedups and scalability of our implementation.
Erin LeDell, H2O.ai - Scalable Automatic Machine Learning - H2O World San Fra...Sri Ambati
This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/ndUtKRzVUCo
In this presentation, Erin LeDell (Chief Machine Learning Scientist, H2O.ai), will provide an overview of the field of "Automatic Machine Learning" and introduce the new AutoML functionality in H2O. H2O's AutoML provides an easy-to-use interface which automates the process of training a large, comprehensive selection of candidate models and a stacked ensemble model which, in most cases, will be the top performing model in the AutoML Leaderboard.
Bio: Erin is the Chief Machine Learning Scientist at H2O.ai. Erin has a Ph.D. in Biostatistics with a Designated Emphasis in Computational Science and Engineering from University of California, Berkeley. Her research focuses on automatic machine learning, ensemble machine learning and statistical computing. She also holds a B.S. and M.A. in Mathematics.
Before joining H2O.ai, she was the Principal Data Scientist at Wise.io (acquired by GE Digital in 2016) and Marvin Mobile Security (acquired by Veracode in 2012), and the founder of DataScientific, Inc.
Overview of the Google Data Platform ecosystem for storage, compute and processing.
Data engineering use cases and building a sample data pipeline on GCP. Learnings and challenges while using its different components.
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...Databricks
Getting cars to drive autonomously is one of the most exciting problems these days. One of the key challenges is making them drive safely, which requires processing large amounts of data. In our talk we would like to focus on only one task of a self-driving car, namely road detection. Road detection is a software component which needs to be safe for being able to keep the car in the current lane. In order to track the progress of such a software component, a well-designed KPI (key performance indicators) evaluation pipeline is required. In this presentation we would like to show you how we incorporate Spark in our pipeline to deal with huge amounts of data and operate under strict scalability constraints for gathering relevant KPIs. Additionally, we would like to mention several lessons learned from using Spark in this environment.
HPC and cloud distributed computing, as a journeyPeter Clapham
Introducing an internal cloud brings new paradigms, tools and infrastructure management. When placed alongside traditional HPC, the new opportunities are significant. But getting to the new world with micro-services, autoscaling and autodialing is a journey that cannot be achieved in a single step.
If you're like most of the world, you're on an aggressive race to implement machine learning applications and on a path to get to deep learning. If you can give better service at a lower cost, you will be the winners in 2030. But infrastructure is a key challenge to getting there. What does the technology infrastructure look like over the next decade as you move from Petabytes to Exabytes? How are you budgeting for more colossal data growth over the next decade? How do your data scientists share data today and will it scale for 5-10 years? Do you have the appropriate security, governance, back-up and archiving processes in place? This session will address these issues and discuss strategies for customers as they ramp up their AI journey with a long term view.
Docker & aPaaS: Enterprise Innovation and Trends for 2015WaveMaker, Inc.
WaveMaker Webinar: Cloud-based App Development and Docker: Trends to watch out for in 2015 - http://www.wavemaker.com/news/webinar-cloud-app-development-and-docker-trends/
CIOs, IT planners and developers at a growing number of organizations are taking advantage of the simplicity and productivity benefits of cloud application development. With Docker technology, cloud-based app development or aPaaS (Application Platform as a Service) is only becoming more disruptive − forcing organizations to rethink how they handle innovation, time-to-market pressures, and IT workloads.
Mike Spicer is the lead architect for the IBM Streams team. In his presentation, Mike provides an overview of the many key new features available in IBM Streams V4.1. Simpler development, simpler management, and Spark integration are a few of the capabilities included in IBM Streams V4.1.
Doug Cutting discusses:
- A brief history of Spark and its rise in popularity across developers and enterprises
- Spark's advantages over MapReduce
- The One Platform Initiative and the roadmap for Spark
- The future of data processing in Hadoop
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
Sandeep Singh, Head of Applied AI Computer Vision, Beans.ai
H2O Open Source GenAI World SF 2023
In the modern era of machine learning, leveraging both open-source and closed-source solutions has become paramount for achieving cutting-edge results. This talk delves into the intricacies of seamlessly integrating open-source Large Language Model (LLM) solutions like Vicuna, Falcon, and Llama with industry giants such as ChatGPT and Google's Palm. As the demand for fine-tuned and specialized datasets grows, it is imperative to understand the synergy between these tools. Attendees will gain insights into best practices for building and enriching datasets tailored for fine-tuning tasks, ensuring that their LLM projects are both robust and efficient. Through real-world examples and hands-on demonstrations, this talk will equip attendees with the knowledge to harness the power of both open and closed-source tools in a coherent and effective manner.
Patrick Hall, Professor, AI Risk Management, The George Washington University
H2O Open Source GenAI World SF 2023
Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you!
Dr. Alexy Khrabrov, Open Source Science Community Director, IBM
H2O Open Source GenAI World SF 2023
In this talk, Dr. Alexy Khrabrov, recently elected Chair of the new Generative AI Commons at Linux Foundation for AI & Data, outlines the OSS AI landscape, challenges, and opportunities. With new models and frameworks being unveiled weekly, one thing remains constant: community building and validation of all aspects of AI is key to reliable and responsible AI we can use for business and society needs. Industrial AI is one key area where such community validation can prove invaluable.
Michelle Tanco, Head of Product, H2O.ai
H2O Open Source GenAI World SF 2023
Learn how the makers at H2O.ai are building internal tools to solve real use cases using H2O Wave and h2oGPT. We will walk through an end-to-end use case and discuss how to incorporate business rules and generated content to rapidly develop custom AI apps using only Python APIs.
Applied Gen AI for the Finance Vertical Sri Ambati
Megan Kurka, Vice President, Customer Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
Discover the transformative power of Applied Gen AI. Learn how the H2O team builds customized applications and workflows that integrate capabilities of Gen AI and AutoML specifically designed to address and enhance financial use cases. Explore real world examples, learn best practices, and witness firsthand how our innovative solutions are reshaping the landscape of finance technology.
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
Pascal Pfeiffer, Principal Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
En esta reunión virtual, damos una introducción a la plataforma de aprendizaje automático de código abierto número 1, H2O-3 y te mostramos cómo puedes usarla para desarrollar modelos para resolver diferentes casos de uso.
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto.
In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
In this session, you will learn about what you should do after you’ve taken an AI transformation baseline. Over the span of this session, we will discuss the next steps in moving toward AI readiness through alignment of talent and tools to drive successful adoption and continuous use within an organization.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/K1Cl3x3rd8g
Speaker:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I already have working for real.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
When stars align: studies in data quality, knowledge graphs, and machine lear...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI World London 2018
1. Scaling out Driverless AI
in Enterprise Data
Centers with IBM
Spectrum Conductor
Kevin Doyle
Lead Architect IBM Spectrum Conductor
IBM
LinkedIn: https://www.linkedin.com/in/kevin-doyle-675a4031/
2. Benefits of managing H2O with IBM Spectrum
Conductor
• H2O Driverless AI can scale across compute nodes for multiple instances, with each instance
allocated to one host
• In a future IBM Spectrum Conductor release, integration improves at the GPU level: You will be
able to run multiple Driverless AI instances on the same host, where each instance is allocated to
an assigned GPU
• Shared file system for Data and logs
• Failover to another host if Driverless AI goes down: IBM Spectrum Conductor starts it up on
another host (if resources available)
• Easily start and stop H2O Driverless AI and maintain instances for each user or groups of users
through role-based access control (RBAC) and consumer association, along with all other
workloads in one shared compute cluster
• H2O Driverless AI and IBM POWER9 GPU Systems are bringing together the best of breed AI
innovation. To handle the increasingly complex workloads of AI you need an integrated system of
software and hardware:
• IBM POWER9 supports nearly 2.6x more RAM, 9.5x more I/O bandwidth than comparable systems
• Nearly 2X the data ingest speed and over 50% faster feature engineering
• With GPU accelerated machine learning delivering nearly 30X speedup on model building
• Support for up to 6 V100 GPUs on a single system
3. What is IBM® Spectrum Conductor?
• IBM Spectrum Conductor confidently deploys modern computing frameworks and
services for a multitenant enterprise environment, both on-premises and in the cloud
• Provides multitenancy through application instances and Spark instance groups. You can
deploy modern computing frameworks and services, such as Spark, Anaconda, Driverless
AI, and H2O Sparkling Water efficiently and effectively, supporting multiple versions and
instances of each framework and service
• Increases performance and scale through granular and dynamic resource allocation for
application instances and Spark instance groups that share a resource pool
• Maximizes usage of resources and eliminates silos of resources that would otherwise
each be tied to separate application implementations
• Provides flexible and efficient data management for shared storage and high availability
by connecting to existing storage infrastructure, such as NFS mounts to a file system or
IBM Spectrum Scale™
4. Traditional vs Conductor Management
[Diagram: virtualized view of compute, network and storage resources, comparing a traditional setup with IBM Spectrum Conductor (IBM Software Defined Infrastructure)]
Traditional
• Application examples: simulation, analysis, design, big data
• IT constrained: long wait times, low utilization, data access bottlenecks, IT sprawl
• Distinct resources for compute and storage
• Repeated for many apps and groups
IBM Spectrum Conductor
• Workloads shown: big data, simulation and modeling, analytics, long running services
• Make multiple computers look like one
• Prioritized matching of supply with demand
• Converged compute and storage
• Benefits: high utilization, throughput, performance, prioritization, reduced cost
• Faster results, fewer resources
5. IBM Systems
Shared Services Model for Spark, Machine Learning, and Deep Learning
• Physical view: IBM Spectrum Conductor installed on each Linux server
• Logical view: Users (groups) have their own Spark cluster (optional) that is isolated, protected, and
secured by Spark instance groups or application instances – Managed by SLA
[Diagram: an administrator manages a pool of Linux compute nodes (x86 systems) running Spectrum Conductor. Instances #1 through #4 are labeled with tenants such as LOB (Marketing…, Fraud Detection…), Data scientist, Driverless AI, and Researcher. Data connectors link the cluster to Spectrum Scale and Cloud Object Storage (COS).]
6. IBM Systems
IBM Spectrum Conductor
The most complete enterprise-grade solution for Data Science
• Anaconda Distributions
The solution supports multiple distributions of Anaconda running concurrently.
Users can add/remove Conda packages.
• Notebooks Integration
Out-of-the-box notebooks available: Jupyter, Zeppelin, RStudio, H2O
Sparkling Water. Other notebooks and distributed frameworks can be quickly
integrated.
• Spark Distributions
The solution supports multiple versions of Spark running concurrently.
• Workload Management / Scheduling
A proven workload scheduling engine that enhances the Spark master
scheduling logic to enable multi-tenancy.
• Services Management
Management of other long running application services on the same grid.
Spark applications commonly have dependencies on other services that can
now be managed as a single solution.
• Resource Management & Orchestration
Proven architecture at scale. Resources are dynamically allocated to Spark
workload with fine grain sharing across applications.
• IBM services and support
A single point of contact for your services and support needs.
[Diagram: solution stack showing Notebooks, Monitoring & Reporting, Workload Management / Scheduling, Resource Management & Orchestration, Services Management, and Services and Support, on Red Hat Linux and x86…]
9. PowerAI Enterprise ML/DL - Data Science Stack
[Diagram: layered stack for IBM PowerAI Enterprise]
• Layers: Data Science Apps; ML/DL UI and Flow; Value-add Tools; ML Libraries; DL Frameworks; Runtimes, Resource & WL Managers; Data Layer; Open Source Frameworks Distribution
• Applications and tools shown: Watson Studio, PowerAI Vision, IBM Spectrum Conductor Deep Learning Impact (Data Prep / Parallel Training / Model Tuning / Model Evaluation / Inference Services…), Distributed Deep Learning (DDL), Elastic Distributed Training (EDT)
• Frameworks and libraries shown: TensorFlow, Caffe, PyTorch, Chainer, MLLib, Graphx, Scikit-learn, R, xgboost
• Runtimes and managers shown: IBM Spectrum Conductor (GPU Support / Distributed / BYOF / Session Scheduler / MPI / Containers…), Anaconda, Python, Spark
• Data layer shown: IBM Spectrum Scale, IBM Cloud Object Store
10. Key concepts of IBM Spectrum Conductor
• Application instances
• Customizable feature to support running any long-running service within the cluster
• Application templates (yaml) are created to define the processes (services) that you
want to run in the cluster
• Driverless AI integration is done through application instances
• Spark instance groups
• Is an installation of Apache Spark that can run Spark core services (master, shuffle,
and history), Anaconda distribution instances, and notebooks as configured
• You can create and run multiple Spark instance groups, associating each instance
group with different Spark/Anaconda/notebook version packages as required
• H2O Sparkling Water integration is treated as a notebook within your Spark instance
groups
11. Key concepts of IBM Spectrum Conductor Cont
• Resource groups
• Provide a simple way of organizing and grouping resources (hosts)
• Defines how to divide up the hosts in the group into slots
• Slots are used to decide if a host is available to place new workload on it
• Consumers
• A way to map organizations/teams to resources they are allowed to use
• Resource planning uses consumers to determine advanced policies for when
to borrow/lend resources to other consumers
• Resource groups map to consumers to allow users adding application
instances or Spark instance groups to only use those resource groups
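The slot concept above can be sketched in a few lines of Python. This is an illustrative model only, not IBM Spectrum Conductor's scheduler or API: the `Host` class, the `place` function, and the first-fit policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    total_slots: int      # how the resource group divides this host into slots
    used_slots: int = 0

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.used_slots


def place(hosts: List[Host], slots_needed: int) -> Optional[Host]:
    """First-fit placement: pick the first host with enough free slots."""
    for host in hosts:
        if host.free_slots >= slots_needed:
            host.used_slots += slots_needed
            return host
    return None  # no host in the resource group can take the workload


# A hypothetical resource group with two hosts of four slots each.
group = [Host("node1", total_slots=4, used_slots=3), Host("node2", total_slots=4)]
chosen = place(group, slots_needed=2)
print(chosen.name if chosen else "no capacity")
```

Here `node1` has only one free slot, so the two-slot workload lands on `node2`.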
12. Role-based access control
• Permissions are assigned to roles
• Roles are assigned to users
• Most permissions are based on a consumer
• Users will have the permissions/role assigned but only for the consumers they
have access to
• Ability to allow users to only access/control what they should
• Example: Each user can see only their Driverless AI instances as desired
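The permission model on this slide can be sketched as a small lookup. The role names, permission strings, and consumer paths below are hypothetical and do not come from IBM Spectrum Conductor; the sketch only shows the idea that a role applies to a user within the consumers that user has access to.

```python
from typing import Dict, Set, Tuple

# Permissions are assigned to roles (hypothetical names).
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "data_scientist": {"app_instance.view", "app_instance.start", "app_instance.stop"},
    "viewer": {"app_instance.view"},
}

# Roles are assigned to users, scoped to a consumer.
USER_ROLES: Dict[str, Set[Tuple[str, str]]] = {
    "alice": {("data_scientist", "/marketing")},
    "bob": {("viewer", "/fraud")},
}


def is_allowed(user: str, permission: str, consumer: str) -> bool:
    """True if one of the user's roles grants the permission on that consumer."""
    return any(
        role_consumer == consumer and permission in ROLE_PERMISSIONS.get(role, set())
        for role, role_consumer in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "app_instance.stop", "/marketing"))
print(is_allowed("alice", "app_instance.view", "/fraud"))
```

Under these assumptions, alice can control instances under `/marketing` but cannot even see those under `/fraud`.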
13. How does the integration work?
• H2O Driverless AI is launched on a single host
• The host can have either GPUs or just run with CPUs
• If using GPUs the entire host is taken (with current integration)
• An application instance is created for each user of Driverless AI
• Maintains security for the data this user has access to
• Environment variables through parameters are used to configure Driverless AI
• H2O Sparkling Water runs as a notebook in a Spark instance group
• When the notebook is started up it forms a mini cluster of executors
• These executors stay alive for the entire duration of the notebook
• IBM Spectrum Conductor disables preemption to not reclaim these hosts
• Multiple users can share a Sparkling Water notebook instance or have
dedicated ones per user
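As a rough illustration of "environment variables through parameters", the sketch below turns per-user parameters into environment variables for a launched process. The `DRIVERLESS_AI_` prefix, the parameter names, and the path are assumptions for this example; check the Driverless AI and IBM Spectrum Conductor documentation for the actual settings and template syntax.

```python
import os
from typing import Dict, Mapping


def build_env(params: Mapping[str, object], prefix: str = "DRIVERLESS_AI_") -> Dict[str, str]:
    """Map parameters to prefixed, upper-case environment variables.

    The prefix and the parameter names are assumptions for this sketch.
    """
    env = dict(os.environ)  # start from the current environment
    for key, value in params.items():
        env[prefix + key.upper()] = str(value)
    return env


# Hypothetical per-user parameters taken from an application instance.
env = build_env({"port": 12345, "data_directory": "/shared/dai/alice"})
print(env["DRIVERLESS_AI_PORT"])
```

The resulting dictionary could be passed as `env=` to `subprocess.Popen` when starting the service, so each user's instance gets its own port and data location.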
14. Current Integration
[Diagram: Instance Group #1, Instance Group #2, Instance Group #5, application instances (marketing, fraud), Elastic Distributed Training (EDT), and other apps run on session schedulers and a batch scheduler. Shared components shown: Notebook, Spark, ELK, Python; security; data connector; report/log management; multi-tenancy; container; GPU and acceleration; resource, cluster, and service management (K8s/EGO).]
16. Future Plans (short term)
• Log retrieval from IBM Spectrum Conductor web UI
• Ability to deploy Driverless AI with IBM Spectrum Conductor instead
of installing on all systems (new application template)
• Ability to modify application instance outputs more effectively
• Enhance job monitor to check when Driverless AI is up
17. Future Plans (longer term)
• Improved port management
• Today you can specify the ports to use; however, you don’t know if they are
being used on existing hosts
• The ports might work at first but not later if something else is using the ports
• Improve handling of running Driverless AI with a subset of GPUs on
hosts in the cluster
• Integrate Driverless AI authentication with IBM Spectrum Conductor
authentication/authorization for easier setup
• Look at supporting Driverless AI to run across multiple machines
• Investigate the best approaches to connect to data sources
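The port problem described above can be checked on a single host with Python's standard `socket` module. This is only a sketch of a pre-launch check, not part of the integration; the function name `port_is_free` is an assumption.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


# Example: occupy an ephemeral port, then confirm the check reports it as taken.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
listener.listen(1)
busy_port = listener.getsockname()[1]
print(port_is_free(busy_port, "127.0.0.1"))
listener.close()
```

The result is a point-in-time answer: a port that is free when checked can still be taken by another process before the service starts, which is the "might work at first but not later" case the slide mentions.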
18. Long term architecture vision for Driverless AI
integrated with IBM Spectrum Conductor
[Diagram: H2O Driverless AI, a batch scheduler, and a session scheduler on top of a pool of Linux hosts]
(1) Start Driverless AI
(2) Find a host to run Driverless AI
(3) Run workload (training, experiment, etc)
(4) Find hosts to run the workload on to speed up execution
19. It’s available now
• Contact Richard Shedrick ( rshedrick@us.ibm.com ) to get access to
the integration and learn more
• Future announcements and contact points on the integration at:
• IBM Spectrum Conductor Blog:
http://ibm.biz/ConductorBlogs
• IBM Spectrum Conductor’s Slack channel:
http://ibm.biz/ConductorSlack
20. Top Reasons to Choose PowerAI Enterprise
• Simplicity: Integrated Platform that Just Works
• Curate, Test, and Support Fast Moving Open Source
• Provide Enterprise Distribution on RedHat
• Easy to deploy Enterprise AI Platform
• Ease of Use, Unique Capabilities
• Faster Model Training Time
• Large data & model support due to NVLink
• Acceleration of Analytics & ML
• AutoML: PowerAI Vision
• Elastic Training: Scale GPUs as Required
• Faster Training Times in Single Server
• Scalability to 100s of Servers (Cluster level Integration)
• Leads to Faster Insights and Better Economics
• Platform that Partners can build on
• Software Partners: H2O, IBM, Anaconda
• SIs, Solution Vendors & Accelerator Partners
• Open AI Platform w/ Ecosystem Partners
[Diagram: stack of Power9 CPU, GPU, PowerAI, IBM SW, ISV SW, Solution, SIs]