SlideShare a Scribd company logo
Caffe + H2O
Cyprien Noel
Context - me
● Distributed systems - trading, air control, neural nets
● Multi-GPU Caffe
● Caffe over InfiniBand in Spark
Now at UCB
● Caffe: python, help merge forks
● Project: how to generalize work above?
○ Help leverage devices, e.g. in H2O
○ New distributed Caffe, meta graph
Context - industry
Example
Problem
● DPDK
● Libfabric
● Accelio
● UCX
● PMEM
● More every week...
● GPUDirect
● NVM Express
● HMM
● CAPI
● CCIX
● HSA
● OFED
A single abstraction?
● Intra (device bus) vs inter-machine (networks)
○ E.g. CUDA copy and sockets
○ RDMA blurs local and remote devices
● Communication vs persistence
○ Sockets vs files is orthogonal to location
○ NVMe allows storage on remote disks
● Ephemeral vs durable
○ 3D XPoint & ReRAM are in-between RAM and SSD
○ Intel’s pmem exposes device directly as memory
Proposal
● An in-memory file system
○ Location transparent mmap
○ Transactional
Example - GPU kernel on data in storage
Today
BFS
● Client reads HDFS path
● HDFS client resolves worker
● Establishes connection
● Server accepts connection
● Authentication, authorization
● File system operation
● Network transfer
● CUDA transfer
data = mmap("/path")
gpu_kernel(data)
Example - Compute graph in hardware
/app/jpgs/*
/layers/*
/vars/* // Access DB
/redis db = redis.open("./redis")
● Everything is a file
○ Using mmap, named pipes, unix sockets
○ E.g. inputs jpgs, weights, activations, counters
● All state and coordination in fs
○ Minimal code, e.g. persistent GPU kernels
○ Location independent → dynamic placement
○ Arbitrary graph splitting, e.g. data & model parallel ML
Example - Caffe & H2O
● H2O can write to Caffe input layers
○ Data directly placed GPUs
○ RDMA atomic ops to count dependencies
● Can form pipelines
○ No need for pair wise integrations
○ Uniform monitoring, logging etc.
○ Leverage best device for each step
Benefits
● Performance
○ mmap lowest possible overhead
○ Leverages hardware, e.g. GPUDirect, RDMA, NVMe, atomic ops
● Complexity
○ Unified naming, permissioning, distributed state management
○ Hierarchical naming & location transparency → HA, placement
● Security
○ File permissions familiar & kernel level, other networking disabled
○ Mounting folder gives access to well defined resources / capabilities
Prototype
● Single master with meta data
● Distributed mmap (CPU)
● Embedded platform (X1)
● Ethernet, InfiniBand
Summary
● Caffe progress - multi-GPU in python, merge NV work
● Working on new programming model
○ “Unix philosophy for modern apps”
○ Helps leverage devices, e.g. in H2O
○ Simplifies apps integration & pipelines
○ Distributed version of Caffe first use case

More Related Content

What's hot

Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokes
Gagan Bajpai
 
Integrating GlusterFS with iSCSI Target
Integrating GlusterFS with iSCSI TargetIntegrating GlusterFS with iSCSI Target
Integrating GlusterFS with iSCSI Target
ijsrd.com
 
MongoDB SF Ruby
MongoDB SF RubyMongoDB SF Ruby
MongoDB SF Ruby
Mike Dirolf
 
Embedded Recipes 2018 - Shared memory / telemetry - Yves-Marie Morgan
Embedded Recipes 2018 - Shared memory / telemetry - Yves-Marie MorganEmbedded Recipes 2018 - Shared memory / telemetry - Yves-Marie Morgan
Embedded Recipes 2018 - Shared memory / telemetry - Yves-Marie Morgan
Anne Nicolas
 
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Ian Lumb
 
Gjergj Sheldija: Albania
Gjergj Sheldija: AlbaniaGjergj Sheldija: Albania
Gjergj Sheldija: Albania
AccessITplus
 
Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra
Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax AstraApache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra
Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra
Anant Corporation
 
SNIA SDC 2016 final
SNIA SDC 2016 finalSNIA SDC 2016 final
SNIA SDC 2016 final
Dan Lambright
 
RubiX
RubiXRubiX
Managing Containerized HPC and AI Workloads on TSUBAME3.0
Managing Containerized HPC and AI Workloads on TSUBAME3.0Managing Containerized HPC and AI Workloads on TSUBAME3.0
Managing Containerized HPC and AI Workloads on TSUBAME3.0
Ian Lumb
 
OSDC 2012 | Extremes Wolken Dateisystem!? by Dr. Udo Seidel
OSDC 2012 | Extremes Wolken Dateisystem!? by Dr. Udo SeidelOSDC 2012 | Extremes Wolken Dateisystem!? by Dr. Udo Seidel
OSDC 2012 | Extremes Wolken Dateisystem!? by Dr. Udo Seidel
NETWAYS
 
Unba.se – San Diego Rust – march 2017 (abridged)
Unba.se – San Diego Rust – march 2017 (abridged)Unba.se – San Diego Rust – march 2017 (abridged)
Unba.se – San Diego Rust – march 2017 (abridged)
Daniel Norman
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
Shubham Tagra
 
Open Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETOpen Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNET
Nikos Kormpakis
 
Web scale monitoring
Web scale monitoringWeb scale monitoring
Web scale monitoring
Dobrica Pavlinušić
 
Doing E-commerce Right – Magento on DigitalOcean
Doing E-commerce Right – Magento on DigitalOceanDoing E-commerce Right – Magento on DigitalOcean
Doing E-commerce Right – Magento on DigitalOcean
DigitalOcean
 
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GRGlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
Theophanis Kontogiannis
 
Ch16
Ch16Ch16
Introducing Apache Jackrabbit OAK
Introducing Apache Jackrabbit OAKIntroducing Apache Jackrabbit OAK
Introducing Apache Jackrabbit OAK
Yash Mody
 
Redis Overview
Redis OverviewRedis Overview
Redis Overview
Hoang Long
 

What's hot (20)

Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokes
 
Integrating GlusterFS with iSCSI Target
Integrating GlusterFS with iSCSI TargetIntegrating GlusterFS with iSCSI Target
Integrating GlusterFS with iSCSI Target
 
MongoDB SF Ruby
MongoDB SF RubyMongoDB SF Ruby
MongoDB SF Ruby
 
Embedded Recipes 2018 - Shared memory / telemetry - Yves-Marie Morgan
Embedded Recipes 2018 - Shared memory / telemetry - Yves-Marie MorganEmbedded Recipes 2018 - Shared memory / telemetry - Yves-Marie Morgan
Embedded Recipes 2018 - Shared memory / telemetry - Yves-Marie Morgan
 
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
 
Gjergj Sheldija: Albania
Gjergj Sheldija: AlbaniaGjergj Sheldija: Albania
Gjergj Sheldija: Albania
 
Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra
Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax AstraApache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra
Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra
 
SNIA SDC 2016 final
SNIA SDC 2016 finalSNIA SDC 2016 final
SNIA SDC 2016 final
 
RubiX
RubiXRubiX
RubiX
 
Managing Containerized HPC and AI Workloads on TSUBAME3.0
Managing Containerized HPC and AI Workloads on TSUBAME3.0Managing Containerized HPC and AI Workloads on TSUBAME3.0
Managing Containerized HPC and AI Workloads on TSUBAME3.0
 
OSDC 2012 | Extremes Wolken Dateisystem!? by Dr. Udo Seidel
OSDC 2012 | Extremes Wolken Dateisystem!? by Dr. Udo SeidelOSDC 2012 | Extremes Wolken Dateisystem!? by Dr. Udo Seidel
OSDC 2012 | Extremes Wolken Dateisystem!? by Dr. Udo Seidel
 
Unba.se – San Diego Rust – march 2017 (abridged)
Unba.se – San Diego Rust – march 2017 (abridged)Unba.se – San Diego Rust – march 2017 (abridged)
Unba.se – San Diego Rust – march 2017 (abridged)
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
 
Open Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETOpen Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNET
 
Web scale monitoring
Web scale monitoringWeb scale monitoring
Web scale monitoring
 
Doing E-commerce Right – Magento on DigitalOcean
Doing E-commerce Right – Magento on DigitalOceanDoing E-commerce Right – Magento on DigitalOcean
Doing E-commerce Right – Magento on DigitalOcean
 
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GRGlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
 
Ch16
Ch16Ch16
Ch16
 
Introducing Apache Jackrabbit OAK
Introducing Apache Jackrabbit OAKIntroducing Apache Jackrabbit OAK
Introducing Apache Jackrabbit OAK
 
Redis Overview
Redis OverviewRedis Overview
Redis Overview
 

Viewers also liked

Matrix transposition
Matrix transpositionMatrix transposition
Matrix transposition
동호 이
 
Cybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith BarthurCybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith Barthur
Sri Ambati
 
Cybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith BarthurCybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith Barthur
Sri Ambati
 
Sparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal MalohlavaSparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal Malohlava
Sri Ambati
 
Deep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno CandelDeep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno Candel
Sri Ambati
 
H2O & Tensorflow - Fabrizio
H2O & Tensorflow - Fabrizio H2O & Tensorflow - Fabrizio
H2O & Tensorflow - Fabrizio
Sri Ambati
 
Deep Water - Bringing Tensorflow, Caffe, Mxnet to H2O
Deep Water - Bringing Tensorflow, Caffe, Mxnet to H2ODeep Water - Bringing Tensorflow, Caffe, Mxnet to H2O
Deep Water - Bringing Tensorflow, Caffe, Mxnet to H2O
Sri Ambati
 
Building Random Forest at Scale
Building Random Forest at ScaleBuilding Random Forest at Scale
Building Random Forest at Scale
Sri Ambati
 

Viewers also liked (8)

Matrix transposition
Matrix transpositionMatrix transposition
Matrix transposition
 
Cybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith BarthurCybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith Barthur
 
Cybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith BarthurCybersecurity with AI - Ashrith Barthur
Cybersecurity with AI - Ashrith Barthur
 
Sparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal MalohlavaSparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal Malohlava
 
Deep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno CandelDeep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno Candel
 
H2O & Tensorflow - Fabrizio
H2O & Tensorflow - Fabrizio H2O & Tensorflow - Fabrizio
H2O & Tensorflow - Fabrizio
 
Deep Water - Bringing Tensorflow, Caffe, Mxnet to H2O
Deep Water - Bringing Tensorflow, Caffe, Mxnet to H2ODeep Water - Bringing Tensorflow, Caffe, Mxnet to H2O
Deep Water - Bringing Tensorflow, Caffe, Mxnet to H2O
 
Building Random Forest at Scale
Building Random Forest at ScaleBuilding Random Forest at Scale
Building Random Forest at Scale
 

Similar to Caffe + H2O - By Cyprien noel

Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
HostedbyConfluent
 
Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509
Linaro
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
Ceph Community
 
What's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon ValleyWhat's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon Valley
Ceph Community
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
Frédéric Parienté
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
Omid Vahdaty
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
Omid Vahdaty
 
A Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm BasebandsA Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm Basebands
Priyanka Aash
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at Uber
Databricks
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learning
Amer Ather
 
CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH
Ceph Community
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data
Omid Vahdaty
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
Junping Du
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
DataWorks Summit
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
Avinash Ramineni
 
Quest for a low powered home hub 120522
Quest for a low powered home hub 120522Quest for a low powered home hub 120522
Quest for a low powered home hub 120522
Paul Tanner
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Hisham Mardam-Bey
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
Open-NFP
 
ApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptxApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptx
XinliShang1
 

Similar to Caffe + H2O - By Cyprien noel (20)

Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
 
Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
 
What's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon ValleyWhat's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon Valley
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
A Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm BasebandsA Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm Basebands
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at Uber
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learning
 
CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
 
Quest for a low powered home hub 120522
Quest for a low powered home hub 120522Quest for a low powered home hub 120522
Quest for a low powered home hub 120522
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
 
ApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptxApacheCon 2022_ Large scale unification of file format.pptx
ApacheCon 2022_ Large scale unification of file format.pptx
 

More from Sri Ambati

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Sri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
Sri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
Sri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
Sri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
Sri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
Sri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
Sri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
Sri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
Sri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
Sri Ambati
 

More from Sri Ambati (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 

Recently uploaded

社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
nhutnguyen355078
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
MastanaihnaiduYasam
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
eudsoh
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
1tyxnjpia
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
zsafxbf
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
aguty
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
9gr6pty
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
Vineet
 
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
yuvarajkumar334
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
Bisnar Chase Personal Injury Attorneys
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
Vineet
 

Recently uploaded (20)

社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS NOTES FOR MCA
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
 

Caffe + H2O - By Cyprien noel

  • 2. Context - me ● Distributed systems - trading, air control, neural nets ● Multi-GPU Caffe ● Caffe over InfiniBand in Spark Now at UCB ● Caffe: python, help merge forks ● Project: how to generalize work above? ○ Help leverage devices, e.g. in H2O ○ New distributed Caffe, meta graph
  • 5. Problem ● DPDK ● Libfabric ● Accelio ● UCX ● PMEM ● More every week... ● GPUDirect ● NVM Express ● HMM ● CAPI ● CCIX ● HSA ● OFED
  • 6. A single abstraction? ● Intra (device bus) vs inter-machine (networks) ○ E.g. CUDA copy and sockets ○ RDMA blurs local and remote devices ● Communication vs persistence ○ Sockets vs files is orthogonal to location ○ NVMe allows storage on remote disks ● Ephemeral vs durable ○ 3D XPoint & ReRAM are in-between RAM and SSD ○ Intel’s pmem exposes device directly as memory
  • 7. Proposal ● An in-memory file system ○ Location transparent mmap ○ Transactional
  • 8. Example - GPU kernel on data in storage Today BFS ● Client reads HDFS path ● HDFS client resolves worker ● Establishes connection ● Server accepts connection ● Authentication, authorization ● File system operation ● Network transfer ● CUDA transfer data = mmap("/path") gpu_kernel(data)
  • 9. Example - Compute graph in hardware /app/jpgs/* /layers/* /vars/* // Access DB /redis db = redis.open("./redis") ● Everything is a file ○ Using mmap, named pipes, unix sockets ○ E.g. inputs jpgs, weights, activations, counters ● All state and coordination in fs ○ Minimal code, e.g. persistent GPU kernels ○ Location independent → dynamic placement ○ Arbitrary graph splitting, e.g. data & model parallel ML
  • 10. Example - Caffe & H2O ● H2O can write to Caffe input layers ○ Data directly placed GPUs ○ RDMA atomic ops to count dependencies ● Can form pipelines ○ No need for pair wise integrations ○ Uniform monitoring, logging etc. ○ Leverage best device for each step
  • 11. Benefits ● Performance ○ mmap lowest possible overhead ○ Leverages hardware, e.g. GPUDirect, RDMA, NVMe, atomic ops ● Complexity ○ Unified naming, permissioning, distributed state management ○ Hierarchical naming & location transparency → HA, placement ● Security ○ File permissions familiar & kernel level, other networking disabled ○ Mounting folder gives access to well defined resources / capabilities
  • 12. Prototype ● Single master with meta data ● Distributed mmap (CPU) ● Embedded platform (X1) ● Ethernet, InfiniBand
  • 13. Summary ● Caffe progress - multi-GPU in python, merge NV work ● Working on new programming model ○ “Unix philosophy for modern apps” ○ Helps leverage devices, e.g. in H2O ○ Simplifies apps integration & pipelines ○ Distributed version of Caffe first use case