Deep learning on a mixed cluster with Deeplearning4j and Spark



Slides for a talk given at the Spark Barcelona Meetup on Dec 9th


  1. Deep learning on a mixed cluster with Deeplearning4j and Spark. Barcelona Spark meetup, Dec 9, 2016 (right after NIPS). @huitseeker
  2. Agenda
    • Intro
    • Why Deep Learning on a cluster
    • Big Data architecture
    • Deeplearning4j
    • Spark challenges
  3. Introduction: Deep Learning in the trenches today
  4. The bad thing about doing a talk right after NIPS: you guys are scary.
  5. The good thing about doing a talk right after NIPS: you guys don't need to be told SkyNet is a fantasy (for now).
  6. Paying algorithms
    • Anomaly detection in many forms (bad guys / predictive maintenance / market rally)
    • Fraud detection
    • Network intrusion
    • Fintech: securities churn prediction
    • Video object detection (security)
  7. Models that are being neglected in benchmarks and implementation efforts
    • LSTMs
    • Autoencoders
  8. How to deal with this in the Spark world?
    • You can experiment with applying trained models (e.g. Tensorframes)
    • But what are the deep learning frameworks that let you train?
  9. Why Deep Learning on a cluster?
  10-13. Practically... let's look at benchmarks (four slides of benchmark charts, not captured in this transcript)
  14. Training, but how? New Amazon GPU instances
  15-16. Training, but how? (chart slides, not captured in this transcript)
  17. Cluster training in the enterprise
    • It's really about multi-tenancy and economies of scale: a big bunch of machines shared among everybody shares better, if only because you can reuse it for other workloads
    • Minor reason: enterprises may not have GPUs
  18. Distributing training
    • Basically distributing SGD (R)
    • The challenge is AllReduce communication
    • Sparse updates, async communications
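The pattern in slide 18 can be sketched in a few lines of plain Java: each worker computes a gradient on its own data shard, an AllReduce-style step averages those gradients, and every worker applies the same update. This is a single-process simulation of the communication pattern only (class and method names are hypothetical); real systems do the averaging over the network, e.g. with ring AllReduce or a parameter server.

```java
import java.util.Arrays;

// Toy synchronous distributed SGD step: W workers each hold a local
// gradient; an AllReduce-style mean combines them, and every worker
// applies the identical update. Single-process sketch of the pattern.
class DistributedSgdSketch {
    // "AllReduce" with mean: in a real cluster this is the network step
    static double[] allReduceMean(double[][] workerGrads) {
        int dim = workerGrads[0].length;
        double[] avg = new double[dim];
        for (double[] g : workerGrads)
            for (int d = 0; d < dim; d++) avg[d] += g[d] / workerGrads.length;
        return avg;
    }

    public static void main(String[] args) {
        double[] params = {1.0, -2.0};
        double lr = 0.5;
        // Gradients computed independently on each worker's data shard
        double[][] grads = {{0.25, 0.5}, {0.75, 0.0}};
        double[] avg = allReduceMean(grads);         // {0.5, 0.25}
        for (int d = 0; d < params.length; d++) params[d] -= lr * avg[d];
        System.out.println(Arrays.toString(params)); // prints [0.75, -2.125]
    }
}
```

The sparse/async variants mentioned on the slide relax exactly this step: workers exchange only large gradient entries, or apply updates without waiting for every peer.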
  19. Distributing training: good engineering matters
  20. Cluster training in your (experimenter) case?
    • It's a fun problem: AllReduce
    • Ultimately solved for people with a large amount of images
    • That solution is not open-source (but exists at Facebook, Google, Amazon, Microsoft¹, Baidu)
    • ¹: 1-bit SGD is under a non-commercial license in CNTK 2.0
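To make the 1-bit SGD footnote concrete: the idea is to transmit each gradient entry as a single sign bit plus one shared scale, and to keep the quantization error locally so it is fed back into the next gradient rather than lost. The sketch below is a toy illustration of that error-feedback mechanism in plain Java, not CNTK's actual implementation (class name and the choice of mean-absolute-value as the scale are assumptions for illustration).

```java
import java.util.Arrays;

// Toy 1-bit gradient quantization with error feedback: each entry is
// "sent" as +/- one shared scale; the per-entry quantization error is
// carried over and added to the next gradient. Illustrative only.
class OneBitSgdSketch {
    double[] residual; // per-worker carry-over of quantization error

    OneBitSgdSketch(int dim) { residual = new double[dim]; }

    // Returns the quantized gradient actually transmitted
    double[] quantize(double[] grad) {
        int dim = grad.length;
        double[] corrected = new double[dim];
        double scale = 0;
        for (int d = 0; d < dim; d++) {
            corrected[d] = grad[d] + residual[d]; // error feedback
            scale += Math.abs(corrected[d]) / dim; // one shared magnitude
        }
        double[] sent = new double[dim];
        for (int d = 0; d < dim; d++) {
            sent[d] = corrected[d] >= 0 ? scale : -scale; // the "1 bit"
            residual[d] = corrected[d] - sent[d];         // keep the error
        }
        return sent;
    }

    public static void main(String[] args) {
        OneBitSgdSketch w = new OneBitSgdSketch(3);
        System.out.println(Arrays.toString(w.quantize(new double[]{0.3, -0.1, 0.2})));
        // Residuals accumulate and steer later quantized gradients
        System.out.println(Arrays.toString(w.residual));
    }
}
```

The payoff is bandwidth: 1 bit per entry instead of 32, which is exactly the AllReduce communication cost the slide calls the fun problem.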
  21. Big Data architecture
  22. With a parameter server
  23. With Spark
    • Spark does the initial ETL
    • Spark ingests the final result
    • In the middle: parameter server
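The parameter-server role in that middle stage boils down to two operations: workers push gradients, the server applies them and serves back fresh parameters on pull. Here is a single-process toy stand-in (the class and method names are hypothetical, not any real framework's API); a production server would shard keys across machines, speak over the network, and handle fault tolerance.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy parameter server: push(gradient) applies an SGD update atomically
// per key, pull() returns a snapshot of the current parameters.
// In-process stand-in for what would be network RPCs in a real system.
class ToyParameterServer {
    private final ConcurrentMap<String, double[]> store = new ConcurrentHashMap<>();
    private final double lr;

    ToyParameterServer(double lr) { this.lr = lr; }

    void init(String key, double[] value) { store.put(key, value.clone()); }

    // Worker -> server: apply a gradient to the named parameter block
    void push(String key, double[] grad) {
        store.compute(key, (k, v) -> {
            for (int d = 0; d < v.length; d++) v[d] -= lr * grad[d];
            return v;
        });
    }

    // Server -> worker: fetch a copy of the current parameters
    double[] pull(String key) { return store.get(key).clone(); }

    public static void main(String[] args) {
        ToyParameterServer ps = new ToyParameterServer(0.5);
        ps.init("layer0", new double[]{1.0, 1.0});
        ps.push("layer0", new double[]{0.5, -0.5}); // e.g. from worker A
        System.out.println(Arrays.toString(ps.pull("layer0"))); // prints [0.75, 1.25]
    }
}
```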
  24. Spark cluster modes
    • Mesos: GPU support merged (device cgroups!)
    • YARN: GPU support through tags
    • Spark Standalone: ?
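For concreteness, the knobs behind those two routes look roughly like this in spark-defaults.conf. Property names are as of Spark 2.x and should be checked against your version's docs; note the YARN route relies on node labels (the slide's "tags") rather than a first-class GPU resource.

```properties
# Mesos: cap the number of GPUs Spark may acquire per application
# (requires Mesos GPU resources and device cgroups on the agents)
spark.mesos.gpus.max                       4

# YARN: steer executors and the AM onto GPU-labeled nodes
spark.yarn.executor.nodeLabelExpression    gpu
spark.yarn.am.nodeLabelExpression          gpu
```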
  25. Deeplearning4j
  26. Deeplearning4j
    • The first commercial-grade, open-source, distributed deep-learning library written for Java and Scala
    • Skymind is its commercial support arm
  27. Scientific computing on the JVM
    • libnd4j: vectorization, 32-bit addressing, linalg (BLAS!)
    • JavaCPP: generates JNI bindings to your C++ libs
    • ND4J: numpy for the JVM, native superfast arrays
    • DataVec: one-stop interface to an NDArray
    • DeepLearning4J: orchestration, backprop, layer definition
    • ScalNet: gateway drug, inspired from (and closely following) Keras
    • RL4J: reinforcement learning for the JVM
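What "native superfast arrays" means in practice is that data lives in one flat contiguous buffer with a stride-based layout, which BLAS can consume directly, instead of nested Java arrays scattered across the heap. A minimal plain-Java illustration of that row-major (C-order) indexing idea follows; the FlatMatrix class is hypothetical and is not ND4J's API.

```java
// Minimal row-major "NDArray" sketch: one flat buffer plus a stride,
// the layout idea behind native array libraries. Illustrative only.
class FlatMatrix {
    final double[] data; // single contiguous buffer, BLAS-friendly
    final int rows, cols;

    FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    // C-order (row-major) addressing: element (i, j) lives at i*cols + j
    double get(int i, int j) { return data[i * cols + j]; }
    void set(int i, int j, double v) { data[i * cols + j] = v; }

    public static void main(String[] args) {
        FlatMatrix m = new FlatMatrix(2, 3);
        m.set(1, 2, 42.0);
        // The flat offset of (1, 2) in a 2x3 matrix is 1*3 + 2 = 5
        System.out.println(m.data[5]); // prints 42.0
    }
}
```

Reshapes and transposes then become metadata changes (new strides over the same buffer) rather than data copies, which is a large part of why this layout is fast.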
  28. With Spark
    JavaSparkContext sc = ...;
    JavaRDD<DataSet> trainingData = ...;
    MultiLayerConfiguration networkConfig = ...;

    //Create the TrainingMaster instance
    int examplesPerDataSetObject = 1;
    TrainingMaster trainingMaster =
        new ParameterAveragingTrainingMaster.Builder(examplesPerDataSetObject)
            // (other configuration options)
            .build();

    //Create the SparkDl4jMultiLayer instance
    SparkDl4jMultiLayer sparkNetwork =
        new SparkDl4jMultiLayer(sc, networkConfig, trainingMaster);

    //Fit the network using the training data
    sparkNetwork.fit(trainingData);
  29. Spark challenges
  30. Even if you don't care about Deep Learning (from Kazuaki Ishizaki @ IBM Japan)
    • SPARK-6442: better linear algebra than Breeze
    • ND4J will have sparse representations soon
  31. Even if you don't care about Deep Learning, II: Meta-RDDs
  32. Killing the bottlenecks
    • Spark has already changed its networking backend once
    • Better support is needed for parameter servers and their fault tolerance
  33. A last word (from Andrew Y. Ng)
    • Get involved! Don't just read papers; reproduce research results
    • Also: we're happy to mentor contributions, and there's a book!
  34. Questions?