Caffe + H2O - By Cyprien Noel

Hardware once reserved for HPC systems is entering the datacenter. Cyprien will describe an effort to help developers leverage its new capabilities. Integrating this hardware with H2O, along with tools like Caffe, is accelerating and makes the platform more powerful. #h2ony

Published in: Data & Analytics

  1. Caffe + H2O - Cyprien Noel
  2. Context - me
     ● Distributed systems - trading, air control, neural nets
     ● Multi-GPU Caffe
     ● Caffe over InfiniBand in Spark
     Now at UCB
     ● Caffe: Python, help merge forks
     ● Project: how to generalize work above?
       ○ Help leverage devices, e.g. in H2O
       ○ New distributed Caffe, meta graph
  3. Context - industry
  4. Example
  5. Problem
     ● DPDK
     ● Libfabric
     ● Accelio
     ● UCX
     ● PMEM
     ● GPUDirect
     ● NVM Express
     ● HMM
     ● CAPI
     ● CCIX
     ● HSA
     ● OFED
     ● More every week...
  6. A single abstraction?
     ● Intra (device bus) vs inter-machine (networks)
       ○ E.g. CUDA copy and sockets
       ○ RDMA blurs local and remote devices
     ● Communication vs persistence
       ○ Sockets vs files is orthogonal to location
       ○ NVMe allows storage on remote disks
     ● Ephemeral vs durable
       ○ 3D XPoint & ReRAM are in-between RAM and SSD
       ○ Intel’s pmem exposes device directly as memory
  7. Proposal
     ● An in-memory file system
       ○ Location transparent mmap
       ○ Transactional
  8. Example - GPU kernel on data in storage
     Today
     ● Client reads HDFS path
     ● HDFS client resolves worker
     ● Establishes connection
     ● Server accepts connection
     ● Authentication, authorization
     ● File system operation
     ● Network transfer
     ● CUDA transfer
     BFS
     data = mmap("/path")
     gpu_kernel(data)
  9. Example - Compute graph in hardware
     /app/jpgs/*
     /layers/*
     /vars/*
     // Access DB
     /redis
     db = redis.open("./redis")
     ● Everything is a file
       ○ Using mmap, named pipes, unix sockets
       ○ E.g. inputs jpgs, weights, activations, counters
     ● All state and coordination in fs
       ○ Minimal code, e.g. persistent GPU kernels
       ○ Location independent → dynamic placement
       ○ Arbitrary graph splitting, e.g. data & model parallel ML
  10. Example - Caffe & H2O
      ● H2O can write to Caffe input layers
        ○ Data placed directly on GPUs
        ○ RDMA atomic ops to count dependencies
      ● Can form pipelines
        ○ No need for pairwise integrations
        ○ Uniform monitoring, logging etc.
        ○ Leverage best device for each step
  11. Benefits
      ● Performance
        ○ mmap lowest possible overhead
        ○ Leverages hardware, e.g. GPUDirect, RDMA, NVMe, atomic ops
      ● Complexity
        ○ Unified naming, permissioning, distributed state management
        ○ Hierarchical naming & location transparency → HA, placement
      ● Security
        ○ File permissions familiar & kernel level, other networking disabled
        ○ Mounting folder gives access to well defined resources / capabilities
  12. Prototype
      ● Single master with metadata
      ● Distributed mmap (CPU)
      ● Embedded platform (X1)
      ● Ethernet, InfiniBand
  13. Summary
      ● Caffe progress - multi-GPU in Python, merge NV work
      ● Working on new programming model
        ○ “Unix philosophy for modern apps”
        ○ Helps leverage devices, e.g. in H2O
        ○ Simplifies app integration & pipelines
        ○ Distributed version of Caffe first use case
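
The mmap-then-compute pattern from slides 7 and 8 can be sketched on a single machine with Python's standard `mmap` module. This is only an illustration under stated assumptions, not the prototype's code: the function names (`write_floats`, `scale_in_place`, `read_floats`) are hypothetical, a local temporary file stands in for a location-transparent path, and a plain CPU loop stands in for `gpu_kernel(data)`. Python's `mmap` maps local files only; the remote and GPU-direct placement described in the slides is not shown.

```python
import mmap
import os
import struct
import tempfile


def write_floats(path, values):
    """Store values as native-order float32, standing in for data already in storage."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"={len(values)}f", *values))


def scale_in_place(path, factor):
    """Map the file into memory and scale every float32 in place.

    The loop is a CPU stand-in (assumption) for the slide's gpu_kernel(data).
    """
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as data:
            # Views must be released before the mapping is closed,
            # so both are managed by the same with statement.
            with memoryview(data) as raw, raw.cast("f") as view:
                for i in range(len(view)):
                    view[i] *= factor
            data.flush()  # push the modified pages back to the file


def read_floats(path):
    """Read the file back through ordinary file I/O to check the result."""
    with open(path, "rb") as f:
        raw = f.read()
    return list(struct.unpack(f"={len(raw) // 4}f", raw))


if __name__ == "__main__":
    # Hypothetical path; in the slides this would be a path in the proposed file system.
    path = os.path.join(tempfile.mkdtemp(), "data.bin")
    write_floats(path, [1.0, 2.0, 3.0])
    scale_in_place(path, 2.0)
    print(read_floats(path))
```

The caller names a path and operates on the mapped bytes directly, with no explicit connection setup or copy step in application code. That is the contrast slide 8 draws with the "Today" sequence.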
