
Introduction to Apache Horn (Incubating)


Apache Horn is a neuron-centric programming model and execution framework, inspired by Google's DistBelief, that supports both data and model parallelism for training large models on massive datasets.



  1. Apache Horn (Incubating): a Large-scale Deep Learning Platform. Edward J. Yoon @eddieyoon. Oct 15, 2015 @ R3 Diva-Hall, Samsung Electronics
  2. I am .. ● Member of the Apache Software Foundation ● PMC member, committer, or mentor of ○ Apache Incubator, ○ Apache Hama, Apache Horn, Apache MRQL, ○ and Apache Rya, Apache BigTop. ● Cloud Tech Lab, Software R&D Center. ○ HPC Cloud (Network Analysis, ML & DNN)
  3. What’s Apache Horn? Horn [hɔ:n]: 혼 (Korean for spirit/soul, 魂) = Mind ● Horn is a clone project of Google’s DistBelief that supports both data and model parallelism. ○ Apache Incubator project (since Sep 2015) ○ The 9 initial members are from Samsung Electronics, Microsoft, Cldi Inc, LINE plus, TUM, KAIST, …, etc.
  4. Google’s DistBelief ● GPUs are expensive, both to buy and to rent. ● Most GPUs can hold only a relatively small amount of data in memory, and CPU-to-GPU data transfer is very slow. ○ Therefore, the training speed-up is small when the model does not fit in GPU memory. ● DistBelief is a framework for training deep neural networks that avoids a GPU-only approach (for the reasons above) and handles problems with large numbers of examples and dimensions (e.g., high-resolution images).
  5. Google’s DistBelief ● It supports both Data and Model Parallelism ○ Data Parallelism: The training data is partitioned across several machines, each having its own replica of the model. Each replica trains on its partition of the data in parallel. ○ Model Parallelism: The layers of each model replica are distributed across machines.
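As a minimal illustration of the two partitioning schemes, the sketch below shards training examples across model replicas (data parallelism) and maps each replica's layers to machines (model parallelism). All names here are hypothetical and only illustrate the idea; they are not DistBelief or Horn APIs.

    import java.util.ArrayList;
    import java.util.List;

    public class PartitioningSketch {
      // Data parallelism: split the training set into one shard per model replica.
      static <T> List<List<T>> shardExamples(List<T> examples, int numReplicas) {
        List<List<T>> shards = new ArrayList<>();
        for (int r = 0; r < numReplicas; r++) shards.add(new ArrayList<T>());
        for (int i = 0; i < examples.size(); i++) {
          shards.get(i % numReplicas).add(examples.get(i));
        }
        return shards;
      }

      // Model parallelism: within one replica, assign each layer to a machine.
      static int machineForLayer(int layerIndex, int numMachines) {
        return layerIndex % numMachines;
      }
    }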
  6. DistBelief: Basic Architecture. Each worker group performs a mini-batch in the BSP paradigm and interacts with the Parameter Server asynchronously.
  7. What’s BSP? ● Bulk Synchronous Parallel: a parallel computing model developed by Leslie Valiant of Harvard University during the 1980s. ● Iteratively: a. Local Computation b. Communication (Message Passing) c. Global Barrier Synchronization
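To make the superstep cycle concrete, here is a minimal sketch of one BSP task using Apache Hama's BSP API: local computation, message passing to peers, then a global barrier. The class name, the value produced, and the message type are illustrative assumptions.

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hama.bsp.BSP;
    import org.apache.hama.bsp.BSPPeer;
    import org.apache.hama.bsp.sync.SyncException;

    public class SumBSP extends
        BSP<NullWritable, NullWritable, NullWritable, NullWritable, DoubleWritable> {

      @Override
      public void bsp(
          BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, DoubleWritable> peer)
          throws IOException, SyncException, InterruptedException {
        // a. Local computation: each task produces a partial value.
        double partial = Math.random();

        // b. Communication: send the partial value to every peer.
        for (String other : peer.getAllPeerNames()) {
          peer.send(other, new DoubleWritable(partial));
        }

        // c. Global barrier synchronization: wait until all tasks reach this point.
        peer.sync();

        // After the barrier, messages sent before it become visible.
        double sum = 0.0;
        DoubleWritable msg;
        while ((msg = peer.getCurrentMessage()) != null) {
          sum += msg.get();
        }
        System.out.println(peer.getPeerName() + " saw total " + sum);
      }
    }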
  8. DistBelief: Batch Optimization. The Coordinator 1) finds stragglers (slow tasks) for better load balancing and resource usage, similar to Google MapReduce’s “Backup Tasks”, and 2) reduces communication overhead between the central Parameter Server and the workers, acting something like Aggregators.
  9. As a result: ● A CPU cluster can train deep networks significantly faster than a GPU, without a limit on the maximum model size. ○ The CPU cluster is 10x faster than a GPU. ● Trained a model with over 1 billion parameters to achieve better than state-of-the-art performance on the ImageNet challenge. Nov 2012: IBM simulated 530 billion neurons and 100 trillion synapses using 1,572,864 processor cores, 1.5 PB of memory, and 6,291,456 threads.
  10. Wait, why do we need this? ● Deep learning is likely to spur other applications beyond speech and image recognition in the near term. ○ e.g., medicine, manufacturing, and transportation.
  11. and, it’s Closed Source Software ● We need to solve the size problem (both the size of the training set and the size of the neural network), but many OSS projects such as Caffe, DeepDist, Spark MLlib, Deeplearning4j, and NeuralGiraph are data-parallel or model-parallel only. ● So, we started to clone Google’s DistBelief, calling it Apache Horn (Incubating).
  12. The key idea of the implementation ● .. is to use existing OSS distributed systems ○ Apache Hadoop: Distributed File System, Resource Manager. ○ Apache Hama: a general-purpose BSP computing engine on top of Hadoop, which can be used for both data-parallel and graph-parallel workloads in a flexible way.
  13. Apache Hama: BSP framework. [Diagram: Task 1 .. Task N running on the BSP framework on Hama or YARN, over Hadoop HDFS.] Like MapReduce, the Apache Hama BSP framework schedules tasks according to the distance between a task’s input data and the requesting nodes. BSP tasks are globally synchronized after performing computation on local data and communication actions.
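For concreteness, configuring and submitting a job to the Hama BSP framework looks roughly like the sketch below. It assumes the SumBSP class from the earlier sketch; the job name and task count are placeholders.

    import org.apache.hama.HamaConfiguration;
    import org.apache.hama.bsp.BSPJob;

    public class SubmitSketch {
      public static void main(String[] args) throws Exception {
        HamaConfiguration conf = new HamaConfiguration();
        BSPJob job = new BSPJob(conf, SumBSP.class);  // SumBSP from the sketch above
        job.setJobName("bsp sketch");                 // placeholder job name
        job.setBspClass(SumBSP.class);                // the class whose bsp() each task runs
        job.setNumBspTask(4);                         // number of BSP tasks (placeholder)
        job.waitForCompletion(true);                  // block until the job finishes
      }
    }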
  14. Global Regional Synchronization. [Diagram: Task 1 .. Task 6 arranged into groups on the BSP framework on Hama or YARN, over Hadoop HDFS.] Like MapReduce, the Apache Hama BSP framework schedules tasks according to the distance between a task’s input data and the requesting nodes. All tasks within the same group are synchronized with each other. Each group works asynchronously as an independent BSP job.
  15. Async mini-batches using Regional Synchronization. [Diagram: Task 1 .. Task 6 in groups on the BSP framework on Hama or YARN, over Hadoop HDFS, swapping parameters with Parameter Servers.] Like MapReduce, the Apache Hama BSP framework schedules tasks according to the distance between a task’s input data and the requesting nodes. Each group performs a mini-batch in the BSP paradigm and interacts with the Parameter Server asynchronously.
  16. Async mini-batches using Regional Synchronization. [Diagram: as on the previous slide, Task 1 .. Task 6 in groups with Parameter Servers and parameter swapping.] One of the groups works as a Coordinator. Each group performs a mini-batch in the BSP paradigm and interacts with the Parameter Server asynchronously.
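As a rough sketch of how a worker group might loop over mini-batches and exchange parameters with the Parameter Server asynchronously (the slides do not specify this protocol, so every type and method below is a hypothetical illustration, not a Horn or Hama API):

    public class WorkerGroupSketch {

      interface ParameterServerClient {
        double[] pullLatestWeights();          // fetch the current global weights
        void pushGradientAsync(double[] grad); // send an update without a global barrier
      }

      interface DataShard {
        double[][] nextMiniBatch();            // next mini-batch from this group's data partition
      }

      // One worker group: BSP-style mini-batch computation inside the group,
      // asynchronous parameter swapping with the Parameter Server between steps.
      static void train(ParameterServerClient ps, DataShard shard, int steps) {
        double[] weights = ps.pullLatestWeights();
        for (int s = 0; s < steps; s++) {
          double[][] batch = shard.nextMiniBatch();
          double[] grad = new double[weights.length];
          // ... the group's local gradient computation over `batch` would go here ...
          ps.pushGradientAsync(grad);          // asynchronous push, no global barrier
          weights = ps.pullLatestWeights();    // parameter swapping
        }
      }
    }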
  17. Neuron-centric Programming APIs. User-defined neuron-centric programming APIs: the activation and cost functions compute the propagated information or error messages and send their updates to the Parameter Server (the APIs are not fully designed yet). Similar to Google’s Pregel.
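Since the slide says the APIs are not fully designed yet, the following is only a speculative sketch of what a Pregel-like, neuron-centric interface could look like; the class and method names are assumptions for illustration, not the Horn API.

    import java.util.List;

    // Speculative sketch of a neuron-centric API (all names are illustrative assumptions).
    public abstract class NeuronSketch {
      // Forward pass: receive messages (weighted inputs) from the previous layer.
      public abstract void forward(List<Double> inputs);

      // Backward pass: receive error messages (deltas) from the next layer.
      public abstract void backward(List<Double> deltas);

      // Example: a sigmoid neuron summing its inputs and emitting its activation.
      public static class SigmoidNeuron extends NeuronSketch {
        private double output;

        @Override
        public void forward(List<Double> inputs) {
          double sum = 0.0;
          for (double x : inputs) sum += x;
          output = 1.0 / (1 + Math.exp(-sum));
          // a full API would now send `output` as a message to the next layer
        }

        @Override
        public void backward(List<Double> deltas) {
          double delta = 0.0;
          for (double d : deltas) delta += d;
          double gradient = delta * output * (1 - output); // sigmoid derivative
          // a full API would push `gradient` as an update to the Parameter Server
        }
      }
    }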
  18. Job Configuration APIs
      /*
       * Sigmoid Activation Function
       */
      public static class Sigmoid extends ActivationFunction {
        public double apply(double input) {
          return 1.0 / (1 + Math.exp(-input));
        }
      }
      ...
      public static void main(String[] args) {
        ANNJob ann = new ANNJob();
        // Initialize the topology of the model
        ann.addLayer(int featureDimension, Sigmoid.class, int numOfTasks);
        ann.addLayer(int featureDimension, Step.class, int numOfTasks);
        ann.addLayer(int featureDimension, Tanh.class, int numOfTasks);
        …
        ann.setCostFunction(CrossEntropy.class);
        ..
      }
  19. Job Submission Flow. [Diagram: a user’s ANN job is submitted through the Apache Horn Client and Web UI to the BSP framework on Apache Hama or YARN clusters, over Hadoop HDFS; Task 1 .. Task 9 form worker groups (data parallelism) whose layers are spread across tasks (model parallelism), swapping parameters with Parameter Servers; one of the worker groups works as a Coordinator.]
  20. Horn Community ● https://horn.incubator.apache.org/ ● https://issues.apache.org/jira/browse/HORN ● Mailing lists ○ dev-subscribe@horn.incubator.apache.org
