Deploying H2O in Large-Scale Distributed Environments using Containers
Nanda Vijaydev
Lead Data Scientist and Senior Director of Solutions
BlueData (now part of HPE)
www.bluedata.ai @NandaVijaydev @BlueData
#H2OWORLD
Hype versus Reality …
Source: https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
AI / ML Stack: Tools + Infrastructure
Multiple ML / DL frameworks, notebooks, and other tools
Users: Data Scientists, Developers, Data Engineers, Decision Makers
• Tools: Data Science Notebooks; Analytics & BI Tools; Pipeline Tools
• Role-based access control
• Tools for a distributed AI / ML / DL pipeline: Model Storage, Workflow Management
• Data frameworks: RDBMS, HDFS, Streams, Spark
• Access to data and storage
• Access to valuable data: small, big, or both
• Choices of modeling techniques: each problem is
different
• Ability to build on datasets, validate on other
datasets, iterate, and improve
• Access to GPUs (and CPUs)
• Scale easily on real datasets
• Ability to operationalize in production
Distributed AI / ML / DL – Key Requirements
Source: https://rohitnarurkar.wordpress.com/2013/11/02/cuda-matrix-multiplication
• Scalability, repeatability, complexity,
reproducibility across environments
• Sharing data, not duplicating data
• Deploying distributed platforms, libraries,
applications, and versions
• Efficiently sharing expensive resources like GPUs
• Agility to scale up and down compute resources
• Providing a future-proof solution
• Ensuring compatible NVIDIA device kernel
module installation
Distributed AI / ML / DL – Challenges
Laptop, On-Prem Cluster, Off-Prem Cluster
Distributed Machine Learning with H2O on Containers
Docker is a computer program that performs operating-system-level virtualization, also known as containerization.
Containerization allows multiple isolated user-space instances to run on the same host.
Docker Containers
Source: https://en.wikipedia.org/wiki/docker_(software)
Container-Based Platform for AI / ML
Users: Data Scientists, Developers, Data Engineers, Data Analysts
Tools: ML/DL & Big Data Tools, Data Science Tools, BI/Analytics Tools, Bring-Your-Own
Compute: CPUs, GPUs
Storage: NFS, HDFS
Deployment: On-Premises, Public Cloud
BlueData EPIC™ Software Platform
• IOBoost™ – Extreme performance and enterprise-grade scalability
• ElasticPlane™ – Self-service, multi-tenant containerized environments
• DataTap™ – In-place access to data on-prem or in the cloud
H2O for AI/ML
Accelerated AI / ML Deployment
With H2O + BlueData, customers now have:
• Pre-built Docker H2O images with CUDA, and automated cluster creation for the entire stack
• The appropriate NVIDIA kernel module surfaced automatically to the containers
• Easy access to the required resources (e.g., single-node, single-GPU, multi-node, or multi-GPU combinations)
• UI, CLI, and API access (notebooks, web, SSH)
• NFS mounts surfaced as local drives for sharing assets
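On BlueData EPIC, GPUs, the NVIDIA kernel module, and NFS mounts are surfaced to the containers automatically. For readers working outside EPIC, the following is a minimal sketch of a roughly equivalent plain `docker run` invocation. It assumes Docker 19.03 or later with the NVIDIA Container Toolkit installed (which provides the `--gpus` flag), uses a hypothetical image tag and hypothetical mount paths, and assumes Driverless AI's default web port of 12345. It only builds the argument list; it does not start a container and does not describe how EPIC itself works.

```python
import shlex
from typing import List, Optional, Sequence, Tuple


def build_docker_run(
    image: str,
    gpu_ids: Sequence[int] = (),
    mounts: Sequence[Tuple[str, str]] = (),
    host_port: Optional[int] = None,
    container_port: int = 12345,  # assumed Driverless AI web port
    name: Optional[str] = None,
) -> List[str]:
    """Assemble, but do not execute, a `docker run` argument list."""
    cmd = ["docker", "run", "--detach"]
    if name:
        cmd += ["--name", name]
    if gpu_ids:
        # Needs Docker 19.03+ and the NVIDIA Container Toolkit on the host.
        # The inner double quotes keep Docker from splitting "device=0,1" on the comma.
        devices = ",".join(str(i) for i in gpu_ids)
        cmd += ["--gpus", f'"device={devices}"']
    for host_path, container_path in mounts:
        # Bind-mount a host directory (for example, an NFS mount point) into the container.
        cmd += ["--volume", f"{host_path}:{container_path}"]
    if host_port is not None:
        cmd += ["--publish", f"{host_port}:{container_port}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Hypothetical image tag and paths, for illustration only.
    args = build_docker_run(
        "example/driverless-ai:cuda9",
        gpu_ids=[0],
        mounts=[("/mnt/nfs/shared", "/data")],
        host_port=12345,
        name="dai-gpu0",
    )
    # Print a shell-quoted command; pass `args` to subprocess.run() to launch it.
    print(shlex.join(args))
```

Passing the list directly to `subprocess.run(args, check=True)` avoids shell-quoting problems. Giving each container its own GPU, as in the two-container setup described later, amounts to calling the function once per GPU index.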
Example of an H2O Pipeline on Containers
Diagram: H2O Driverless AI (Import → Validate → Export), connected to the data sources through a shared data access layer
• Docker images for multiple applications and versions
• Ability to create and add new images, and to save or restore tested combinations on demand
Deploy H2O from Pre-Built Images in the BlueData EPIC App Store
Multi-Tenant, with Quotas for GPU Resources
• Support for multi-tenancy, and the ability to define a quota per tenant
• Define ‘flavor’ types used to launch Docker containers
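Per-tenant GPU quotas and container flavors can be modeled as a simple admission check. The sketch below is a hypothetical illustration, not BlueData EPIC's API: `Flavor` and `Tenant` are invented names. A container is admitted only if its flavor's GPUs fit within the tenant's remaining quota, and terminating it releases those GPUs for other containers in the tenant.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Flavor:
    """Hypothetical container size template (CPU, memory, GPU count)."""
    name: str
    cpus: int
    memory_gb: int
    gpus: int


@dataclass
class Tenant:
    """Hypothetical tenant with a GPU quota and its running containers."""
    name: str
    gpu_quota: int
    gpus_in_use: int = 0
    active: Dict[str, Flavor] = field(default_factory=dict)

    def launch(self, container: str, flavor: Flavor) -> None:
        """Admit a container only if its GPUs fit inside the tenant's quota."""
        if container in self.active:
            raise ValueError(f"{container} is already running")
        if self.gpus_in_use + flavor.gpus > self.gpu_quota:
            raise RuntimeError(
                f"{self.name}: GPU quota exceeded "
                f"({self.gpus_in_use} + {flavor.gpus} > {self.gpu_quota})"
            )
        self.active[container] = flavor
        self.gpus_in_use += flavor.gpus

    def terminate(self, container: str) -> None:
        """Release the container's GPUs back to the quota (KeyError if unknown)."""
        flavor = self.active.pop(container)
        self.gpus_in_use -= flavor.gpus


if __name__ == "__main__":
    gpu_small = Flavor("gpu-small", cpus=8, memory_gb=64, gpus=1)
    team = Tenant("data-science", gpu_quota=2)
    team.launch("dai-1", gpu_small)
    team.launch("dai-2", gpu_small)
    # A third 1-GPU container would exceed the quota of 2 and raise RuntimeError.
    team.terminate("dai-1")
    team.launch("dai-3", gpu_small)  # fits again after the release
    print(team.gpus_in_use, sorted(team.active))
```

Checking the quota on the tenant, rather than on individual hosts, mirrors the slide's point that quotas are defined per tenant; a real scheduler would also need to place each container on a host with free GPUs.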
Spin Up Multiple Environments
• Quick-launch templates for one-click cluster creation
• Run multiple clusters side by side, with different versions or combinations of tools
• Pick from a list of pre-built and tested images
• Assign specific resources (GPUs, CPUs) to the cluster, depending on the use case (e.g., Driverless AI)
• Define the number of nodes, here for H2O and Sparkling Water
On-Demand Cluster Creation
• 2 Docker containers running different versions of Driverless AI, with 1 GPU (Tesla P100) each
• NVIDIA device kernel module (driver version 390.46)
• NVIDIA CUDA (9.x) and cuDNN libraries, including:
  1. libcudnn7-7.4.2.24-1.cuda9.0.x86_64.rpm
  2. libcudnn7-devel-7.4.2.24-1.cuda9.0.x86_64.rpm
Source: https://medium.com/linagora-engineering/making-image-classification-simple-with-spark-deep-learning-f654a8b876b8
H2O Driverless AI with GPUs
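The driver, CUDA, and cuDNN versions listed above have to agree: each cuDNN RPM name encodes the CUDA release it was built for, and each CUDA toolkit needs a minimum host driver. A minimal sketch of that check follows. It assumes the RPM naming pattern shown on the slide, and its minimum Linux driver versions are taken from NVIDIA's CUDA 9.x release notes as best recalled; verify them against NVIDIA's current compatibility table before relying on them.

```python
import re
from typing import Dict, Tuple

# Minimum Linux driver per CUDA 9.x toolkit (from NVIDIA's CUDA release notes;
# verify against the current compatibility table before relying on these).
MIN_DRIVER: Dict[Tuple[int, int], Tuple[int, int]] = {
    (9, 0): (384, 81),
    (9, 1): (390, 46),
    (9, 2): (396, 26),
}

# Matches names like libcudnn7-7.4.2.24-1.cuda9.0.x86_64.rpm (optionally "-devel").
RPM_RE = re.compile(
    r"^libcudnn(?P<major>\d+)(?P<devel>-devel)?-"
    r"(?P<version>\d+(?:\.\d+)+)-\d+\.cuda(?P<cuda>\d+\.\d+)\.(?P<arch>\w+)\.rpm$"
)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "390.46" into (390, 46) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_cudnn_rpm(filename: str) -> Dict[str, object]:
    """Extract the cuDNN version and target CUDA release from an RPM file name."""
    m = RPM_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized cuDNN RPM name: {filename}")
    return {
        "cudnn_version": m["version"],
        "devel": m["devel"] is not None,
        "cuda": parse_version(m["cuda"]),
        "arch": m["arch"],
    }


def driver_supports(driver: str, cuda: Tuple[int, ...]) -> bool:
    """True if the host driver meets the minimum for the given CUDA toolkit."""
    if cuda not in MIN_DRIVER:
        raise KeyError(f"no minimum-driver entry for CUDA {cuda}")
    return parse_version(driver) >= MIN_DRIVER[cuda]


if __name__ == "__main__":
    for rpm in (
        "libcudnn7-7.4.2.24-1.cuda9.0.x86_64.rpm",
        "libcudnn7-devel-7.4.2.24-1.cuda9.0.x86_64.rpm",
    ):
        info = parse_cudnn_rpm(rpm)
        ok = driver_supports("390.46", info["cuda"])
        print(rpm, info["cuda"], "driver OK" if ok else "driver too old")
```

Comparing versions as integer tuples rather than strings avoids mistakes such as "396.26" sorting before "39.x"-style values under lexical comparison.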
• The user authenticates to Driverless AI
• Import datasets from BlueData DataTap using the DataTap connector, with access optimized by BlueData IOBoost
• Analyze the data
• Run experiments
• Build models and save them …
• Validate against other datasets from DataTap …
• Export the model for production
Run Driverless AI on Containers with GPUs
• Optionally, initialize Sparkling Water against a previously created H2O cluster (external backend)
• Pass Sparkling Water the appropriate jar for HDFS connectivity
• Work on your dataset through the HDFS connectivity
Work with Sparkling Water Cluster and HDFS
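When Sparkling Water connects to an H2O cluster that was started separately (the external backend), the connection details and any storage-connector jar are typically passed on the `spark-submit` command line. The sketch below assembles such a command. The `spark.ext.h2o.*` property names follow the Sparkling Water 2.x configuration reference and were renamed in later releases, so check the documentation for your version. The application file, master URL, jar path, cluster name, and `ip:port` are hypothetical placeholders.

```python
from typing import List, Optional, Sequence


def sparkling_water_submit_args(
    app: str,
    master: str,
    connector_jars: Sequence[str] = (),
    h2o_cloud_name: Optional[str] = None,
    h2o_representative: Optional[str] = None,
) -> List[str]:
    """Build a spark-submit argument list for a Sparkling Water job.

    If h2o_cloud_name is given, the job is configured to join an existing,
    separately started H2O cluster (external backend) instead of launching
    H2O inside the Spark executors (internal backend).
    """
    args = ["spark-submit", "--master", master]
    if connector_jars:
        # Put the storage-connector jar(s) on the driver and executor classpaths
        # so the job can resolve HDFS-compatible URIs.
        args += ["--jars", ",".join(connector_jars)]
    if h2o_cloud_name is not None:
        if not h2o_representative:
            raise ValueError("external backend needs the 'ip:port' of a running H2O node")
        # Property names as in the Sparkling Water 2.x configuration reference.
        conf = {
            "spark.ext.h2o.backend.cluster.mode": "external",
            "spark.ext.h2o.external.start.mode": "manual",
            "spark.ext.h2o.cloud.name": h2o_cloud_name,
            "spark.ext.h2o.cloud.representative": h2o_representative,
        }
        for key, value in conf.items():
            args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(" ".join(sparkling_water_submit_args(
        "train_model.py",
        "spark://spark-master:7077",
        connector_jars=["/opt/connectors/storage-connector.jar"],
        h2o_cloud_name="h2o-shared",
        h2o_representative="10.0.0.5:54321",
    )))
```

Leaving `h2o_cloud_name` unset produces a plain internal-backend submission, which keeps the "optionally" from the slide explicit in the function signature.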
• BlueData EPIC automatically
deploys the environments
• Using persistent containers
• Providing true multi-tenancy
• Access to shared resources (CPU,
RAM, GPUs, storage)
• Pre-built H2O images in the
BlueData EPIC App Store
• Enterprise-grade security
(integration with AD / LDAP / TDE)
Simplify H2O Deployments at Scale in Minutes
BlueData DataTap
BlueData IOBoost
Enable Compute / Storage Separation
• Connect the clusters to different datasets without copying the data, and with optimized performance
• From the BlueData EPIC App Store, deploy additional application clusters that connect to H2O
Integrate H2O with Production Environment
• Infrastructure for distributed ML / DL is complex (CPUs, GPUs, data …)
  - This complexity can be hidden from data science teams through self-service provisioning and automation with containers
  - GPUs can be used efficiently by a containerized application, then released to other applications and users
  - For a flexible and scalable solution, data resources should be decoupled from compute
• H2O, Driverless AI, and Sparkling Water can be deployed at scale on containers, whether on-premises, on any public cloud, or in a hybrid environment
  - BlueData + H2O is proven in production with Global 2000 enterprises
Lessons Learned – H2O on Containers
Thank you!
https://www.linkedin.com/in/nanda-vijaydev-3638693

From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaCzechDreamin
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Alison B. Lowndes
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 

Recently uploaded (20)

Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 

Nanda Vijaydev, BlueData - Deploying H2O in Large Scale Distributed Environments using Containers - H2O World San Francisco

  • 1. Deploying H2O in Large-Scale Distributed Environments using Containers Nanda Vijaydev Lead Data Scientist and Senior Director of Solutions BlueData (now part of HPE) www.bluedata.ai @NandaVijaydev @BlueData #H2OWORLD
  • 2. Hype versus Reality … Source: https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
  • 3. AI / ML Stack: Tools + Infrastructure. Users: Data Scientists, Developers, Data Engineers, Decision Makers. Tools: multiple ML / DL frameworks, notebooks, and other tools (Data Science Notebooks, Analytics & BI Tools, Pipeline Tools). Infrastructure: role-based access control; tools for the distributed AI / ML / DL pipeline (Model Storage, Workflow Mgmt); Data Frameworks and access to data and storage (RDBMS, HDFS, Streams, Spark).
  • 4. Distributed AI / ML / DL – Key Requirements • Access to valuable data: small, big, or both • Choice of modeling techniques, since each problem is different • Ability to build on datasets, validate on other datasets, iterate, and improve • Access to GPUs (and CPUs) • Ability to scale easily on real datasets • Ability to operationalize in production. Source: https://rohitnarurkar.wordpress.com/2013/11/02/cuda-matrix-multiplication
  • 5. Distributed AI / ML / DL – Challenges • Scalability, repeatability, complexity, and reproducibility across environments (laptop, on-prem cluster, off-prem cluster) • Sharing data, not duplicating it • Deploying distributed platforms, libraries, applications, and versions • Efficiently sharing expensive resources such as GPUs • Agility to scale compute resources up and down • Providing a future-proof solution • Ensuring a compatible NVIDIA device kernel module installation
  • 7. Docker Containers: Docker is a computer program that performs operating-system-level virtualization, also known as containerization. Containerization allows multiple isolated user-space instances to coexist. Source: https://en.wikipedia.org/wiki/docker_(software)
  • 8. Container-Based Platform for AI / ML: the BlueData EPIC™ Software Platform, with H2O for AI / ML. Users: Data Scientists, Developers, Data Engineers, Data Analysts. Tools: BI / Analytics Tools, ML / DL & Big Data Tools, Data Science Tools, and Bring-Your-Own. Platform services: ElasticPlane™ – self-service, multi-tenant containerized environments; IOBoost™ – extreme performance and enterprise-grade scalability; DataTap™ – in-place access to data on-prem or in the cloud. Infrastructure: compute (CPUs, GPUs) and storage (NFS, HDFS), on-premises or in the public cloud.
  • 9. Accelerated AI / ML Deployment • With H2O + BlueData, customers now have: • Pre-built Docker H2O images with CUDA and automated cluster creation for the entire stack • The appropriate NVIDIA kernel module surfaced automatically to the containers • Easy access to the required resources (e.g. single node, single GPU, multi-node, multi-GPU combinations) • UI, CLI, and API access (notebooks, web, SSH) • NFS mounts surfaced as local drives for sharing assets
  • 10. Example of an H2O Pipeline on Containers. H2O Driverless AI: Import, Validate, Export. Shared Data Access Layer … Data Sources …
  • 11. Deploy H2O from Pre-Built Images in the BlueData EPIC App Store. Docker images for multiple applications and versions. Ability to create and add new images, and to save or restore tested combinations on demand.
  • 12. Multi-Tenant, with Quotas for GPU Resources: support for multi-tenancy and the ability to define a quota per tenant. Define ‘flavor’ types used to launch Docker containers.
  • 13. Spin Up Multiple Environments. Quick-launch templates for one-click cluster creation. Run multiple clusters, with different versions or combinations of tools, side by side.
  • 14. On-Demand Cluster Creation. Pick from a list of pre-built and tested images. Assign specific resources (GPUs, CPUs) to the cluster, depending on the use case (e.g. for Driverless AI). Define the number of nodes (here, for H2O and Sparkling Water).
  • 15. H2O Driverless AI with GPUs • 2 Docker containers running different versions of Driverless AI, with 1 GPU (Tesla P100) each • NVIDIA device kernel module (driver version: 390.46) • NVIDIA CUDA (9.x) and cuDNN libraries, including (1) libcudnn7-7.4.2.24-1.cuda9.0.x86_64.rpm and (2) libcudnn7-devel-7.4.2.24-1.cuda9.0.x86_64.rpm. Source: https://medium.com/linagora-engineering/making-image-classification-simple-with-spark-deep-learning-f654a8b876b8
  • 16. Run Driverless AI on Containers with GPUs • The user authenticates on Driverless AI • Import datasets from BlueData DataTap through the DataTap connector, with access optimized by BlueData IOBoost • Analyze the data • Run experiments • Build models, save them … • Validate against other datasets from DataTap … • Export the model for production
  • 17. Work with Sparkling Water Cluster and HDFS • Optionally, initialize Sparkling Water against a previously created H2O cluster (external backend) • Pass Sparkling Water the appropriate jar for HDFS connectivity • Work on your dataset through that HDFS connection
  • 18. Simplify H2O Deployments at Scale in Minutes • BlueData EPIC automatically deploys the environments • Using persistent containers • Providing true multi-tenancy • Access to shared resources (CPU, RAM, GPUs, storage) • Pre-built H2O images in the BlueData EPIC App Store • Enterprise-grade security (integration with AD / LDAP / TDE)
  • 19. Enable Compute / Storage Separation with BlueData DataTap and BlueData IOBoost: connect the clusters to different datasets without copying the data, with optimized performance
  • 20. Integrate H2O with the Production Environment: from the BlueData EPIC App Store, deploy additional application clusters that connect to H2O
  • 21. Lessons Learned – H2O on Containers • Infrastructure for distributed ML / DL is complex (CPUs, GPUs, data …) ◦ This complexity can be abstracted away from data science teams with self-service provisioning and automation, using containers ◦ GPUs can be used effectively by a containerized application, then released for other applications and users ◦ For a flexible and scalable solution, data resources should be decoupled from compute • H2O, Driverless AI, and Sparkling Water can be deployed at scale on containers, whether on-premises, on any public cloud, or hybrid ◦ BlueData + H2O proven in production with Global 2000 enterprises
  • 22. Thank you! www.bluedata.ai @BlueData @NandaVijaydev Nanda Vijaydev Lead Data Scientist and Senior Director of Solutions BlueData (now part of HPE) https://www.linkedin.com/in/nanda-vijaydev-3638693 #H2OWORLD
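Slides 5 and 15 stress that the NVIDIA kernel module on the host must be compatible with the CUDA libraries baked into each container image. The sketch below is a minimal pre-flight check under stated assumptions: the minimum Linux driver versions are taken from NVIDIA's CUDA Toolkit release notes for the 9.x releases and should be re-checked against the current table, and the function names are hypothetical, not part of BlueData EPIC or H2O.

```python
"""Pre-flight check: does the host NVIDIA driver support a container's CUDA version?"""

# Minimum Linux driver per CUDA toolkit release (assumed values, taken from
# NVIDIA's CUDA Toolkit release notes; re-check before relying on them).
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),
    "9.1": (390, 46),
    "9.2": (396, 26),
}


def parse_driver_version(text: str) -> tuple:
    """Turn a driver string such as '390.46' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver: str, cuda: str) -> bool:
    """Return True if the host driver meets the minimum for the CUDA release."""
    if cuda not in MIN_LINUX_DRIVER:
        raise KeyError(f"no minimum driver recorded for CUDA {cuda}")
    # Tuples compare element by element, so (390, 46) >= (384, 81) is True.
    return parse_driver_version(driver) >= MIN_LINUX_DRIVER[cuda]


if __name__ == "__main__":
    host_driver = "390.46"  # driver version listed on slide 15
    for cuda in sorted(MIN_LINUX_DRIVER):
        status = "ok" if driver_supports(host_driver, cuda) else "driver too old"
        print(f"CUDA {cuda}: {status}")
```

With the driver from slide 15 (390.46), this check passes for CUDA 9.0, which the listed cuDNN packages target, and for 9.1, but it would reject an image built on CUDA 9.2.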
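Slides 12 and 21 describe per-tenant GPU quotas, ‘flavor’ types for sizing containers, and GPUs that return to the pool once a containerized application is done with them. The following is a toy model of that bookkeeping, not BlueData EPIC's actual scheduler or API: `Flavor`, `Tenant`, and `launch` are hypothetical names, and a Python context manager stands in for cluster creation and teardown.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """Hypothetical container size, in the spirit of an EPIC 'flavor'."""
    name: str
    cpus: int
    gpus: int


@dataclass
class Tenant:
    """Hypothetical tenant with a fixed GPU quota."""
    name: str
    gpu_quota: int
    gpus_in_use: int = 0

    @contextmanager
    def launch(self, flavor: Flavor, nodes: int = 1):
        """Reserve GPUs for a cluster and give them back when it is torn down."""
        needed = flavor.gpus * nodes
        available = self.gpu_quota - self.gpus_in_use
        if needed > available:
            raise RuntimeError(
                f"{self.name}: cluster needs {needed} GPU(s), only {available} left in quota"
            )
        self.gpus_in_use += needed
        try:
            yield needed
        finally:
            # Runs even if the workload fails, so the GPUs return to the pool.
            self.gpus_in_use -= needed


if __name__ == "__main__":
    one_gpu = Flavor("gpu-small", cpus=8, gpus=1)
    team = Tenant("data-science", gpu_quota=2)
    # Two Driverless AI containers with one GPU each, as on slide 15.
    with team.launch(one_gpu), team.launch(one_gpu):
        print("in use:", team.gpus_in_use)  # prints "in use: 2"
    print("in use:", team.gpus_in_use)      # prints "in use: 0"
```

A third one-GPU container launched inside that `with` block would raise `RuntimeError` instead of oversubscribing the tenant's quota.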
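Slide 17 attaches Sparkling Water to an H2O cluster that was created beforehand (the external backend) and passes in a jar for HDFS connectivity. The helper below only assembles a `spark-submit` command line for that setup. The `spark.ext.h2o.*` property names follow the Sparkling Water configuration reference, but they have changed between releases, so check them against the version you deploy; the cluster name, node address, jar path, and script name are placeholders.

```python
import shlex


def sparkling_water_submit_args(app: str, cloud_name: str, representative: str,
                                hdfs_jar: str) -> list:
    """Build spark-submit arguments that attach to an existing H2O cluster."""
    confs = {
        # Use an H2O cluster running outside Spark instead of starting H2O
        # inside the Spark executors (the default "internal" backend).
        "spark.ext.h2o.backend.cluster.mode": "external",
        # Name of the already running H2O cluster to join.
        "spark.ext.h2o.cloud.name": cloud_name,
        # ip:port of any node of that cluster.
        "spark.ext.h2o.cloud.representative": representative,
    }
    # --jars ships the HDFS connectivity jar to the driver and executors.
    args = ["spark-submit", "--jars", hdfs_jar]
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    cmd = sparkling_water_submit_args(
        app="train.py",                           # placeholder PySparkling script
        cloud_name="h2o-cluster",                 # placeholder cluster name
        representative="10.0.0.5:54321",          # placeholder node address
        hdfs_jar="/opt/jars/hdfs-connector.jar",  # placeholder jar path
    )
    print(shlex.join(cmd))
```

Keeping the properties in a dictionary makes it easy to switch back to the internal backend, or to add version-specific settings, without touching the argument-building loop.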

Editor's Notes

  1. Deep learning uses general learning algorithms. These algorithms build the layers of an artificial neural network from training data, and processing that training data requires a lot of computation, mostly matrix multiplications.
  2. The #1 challenge in bringing the DevOps mindset to Big Data is scalability, reproducibility, and repeatability. It’s easy enough for developers to work on their laptops. Data scientists sometimes prototype the entire pipeline on a powerful laptop with a whatever-it-takes, “make it work” mentality. You can take a single-node VM, install a bunch of libraries, and work on smallish data sets. But will that same program deploy and run successfully in a real environment that uses multi-node clusters, potentially different versions and libraries, and, most importantly, significantly larger volumes of data? This last aspect is unique to Big Data and is one of the biggest reasons that data teams are unable to iterate rapidly. Laptop (“It works on my laptop”): ML / DL run locally on a single-node VM, with local libraries and limited data (10s of GB). Real environment: multi-node environments, different versions, different environment variables, libraries and dependencies that must exist on all nodes, and Big Data (TBs of data).
  3. The BlueData EPIC software platform leverages Docker container technology, together with patented innovations, to deliver self-service, speed, security, and efficiency for Big Data Analytics, Data Science, and AI / ML / DL environments. The key components of the BlueData EPIC (which stands for Elastic Private Instant Clusters) platform are: • ElasticPlane™ enables users to spin up virtual clusters on demand in a secure, multi-tenant environment. • IOBoost™ ensures performance on par with bare metal, with the agility and simplicity of Docker containers. • DataTap™ accelerates time-to-value for Big Data by eliminating time-consuming data movement. Our software platform can be deployed on any infrastructure, whether on-premises, in the public cloud (e.g. AWS, and now Azure and GCP), or in a hybrid architecture.
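Editor's note 1 reduces deep-learning training to large numbers of matrix multiplications. As a minimal illustration in plain Python (not how H2O or any GPU library implements it), one fully connected layer computes relu(X·W + b); for a batch of B rows with k inputs and n outputs that is B·k·n multiply-adds, and because each output cell is an independent dot product, the work parallelizes well on GPUs.

```python
def matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix, both given as lists of rows."""
    if not a or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    k, n = len(b), len(b[0])
    # Each output cell is an independent dot product, which is why the work
    # maps well onto the many cores of a GPU.
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def dense_relu(x, w, bias):
    """One fully connected layer: relu(x @ w + bias)."""
    z = matmul(x, w)
    return [[max(0.0, v + bias[j]) for j, v in enumerate(row)] for row in z]


def multiply_adds(batch, inputs, outputs):
    """Number of multiply-add operations in one forward pass of the layer."""
    return batch * inputs * outputs


if __name__ == "__main__":
    x = [[1.0, 2.0]]                # batch of one example, two features
    w = [[1.0, -1.0], [0.5, 1.0]]   # 2 inputs -> 2 outputs
    print(dense_relu(x, w, [0.0, -3.0]))  # prints [[2.0, 0.0]]
    print(multiply_adds(1024, 4096, 4096))
```

Even this single hypothetical 4096-wide layer on a batch of 1024 needs roughly 17 billion multiply-adds per forward pass, which is the scale of computation the note refers to.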
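Editor's note 2 contrasts a laptop, where locally installed libraries simply work, with multi-node clusters where the same libraries and versions must exist on every node. Pre-built container images remove that drift; the sketch below only detects it. It assumes you have already collected a package-to-version map from each node (for example, from `pip freeze` output); the node names and version numbers are made up for illustration.

```python
from typing import Dict, Optional


def find_drift(nodes: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, Optional[str]]]:
    """Return packages whose version differs, or is missing, on some node."""
    all_packages = set()
    for packages in nodes.values():
        all_packages.update(packages)
    drift = {}
    for pkg in sorted(all_packages):
        # None marks a node where the package is not installed at all.
        versions = {node: packages.get(pkg) for node, packages in nodes.items()}
        if len(set(versions.values())) > 1:
            drift[pkg] = versions
    return drift


if __name__ == "__main__":
    inventory = {  # hypothetical per-node inventories
        "laptop": {"h2o": "3.22.1.1", "numpy": "1.15.4"},
        "worker-1": {"h2o": "3.22.1.1", "numpy": "1.16.0"},
        "worker-2": {"h2o": "3.22.1.1"},
    }
    for pkg, versions in find_drift(inventory).items():
        print(pkg, versions)
```

Here `numpy` is reported because it differs between the laptop and `worker-1` and is absent on `worker-2`, while `h2o` matches everywhere and is not reported.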