
Bringing Deep Learning into production - Paolo Platter, AgileLab


We will go deep into a SWOT analysis of deep learning frameworks to help you choose the best one for your needs. We will also cover the process of smoothly bringing deep learning into an enterprise environment, and the reasons why H2O could be a good choice for this purpose.

Paolo Platter is a Telematics Engineer living in Turin and he is CTO & co-founder of Agile Lab, an emerging Big Data consulting company. He is passionate about software architectures & life cycles. He is also a certified Cassandra Architect and Scala Developer.

Published in: Data & Analytics

  1. Brief introduction • CTO & co-founder of Agile Lab • Data & tech addict • Contributor to Spark Notebook • Spark early adopter • Certified Cassandra Architect • Deep learning enthusiast
  2. Who is Agile Lab? GO BIG (data) or GO HOME
  3. What we do: Applications, High scalability, Decision Support Systems (data engineering, data mining and data «meaning»), Big Data Strategies, Training (Reactive, NoSQL, Big Data, Machine Learning)
  4. Why Deep Learning
  5. Deep Learning is trending
  6. What is Deep Learning • Deep learning is just another name for artificial neural networks • An algorithm is deep if the input is passed through several non-linearities before being output • Deep learning is about discovering the features that best represent the problem, rather than just a way to combine them
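To make the "input passes through several non-linearities" point concrete, here is a minimal sketch in plain Python (the two-layer network and its hard-coded weights are hypothetical, purely for illustration):

```python
def relu(x):
    # The non-linearity applied between layers
    return [max(0.0, v) for v in x]

def dense(x, weights, bias):
    # One fully connected layer: a weighted sum per output unit
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def forward(x):
    # "Deep": the input goes through several non-linearities,
    # each hidden layer learning features rather than just combining inputs
    h1 = relu(dense(x, [[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]))
    h2 = relu(dense(h1, [[1.0, -0.5]], [0.0]))
    return h2

print(forward([1.0, 2.0]))  # → [0.0]
```

Real frameworks differ only in scale: they learn the weights from data and run thousands of such units per layer.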
  7. Deep Learning: Use cases
  8. Do you want to start with Deep Learning? Let's choose the right tools!
  9. Deep Learning Frameworks • Deeplearning4J • TensorFlow • Caffe • Theano • Torch • Spark ML MultilayerPerceptrons • H2O • CNTK • MatLab • maxDNN • And many others
  10. How to choose: Background, Target Environment, Vision
  11. Background: Productivity! • Big Data Engineer: Scala, Java • Math Engineer: Java, Python • Statistician: R, Python
  12. Target Environment: the trained model should be deployable! (Trained Model: Dev Env → Prod Env)
  13. Target Environment (Dev Env / Prod Env): Training, Data Cleaning, ETL, Scheduling, ML Pipeline - Track model performance over time - Care about SLA - Continuous tweaks
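The "track model performance over time" point can be sketched as a small monitor (an illustrative helper, not part of the talk's actual stack) that flags when rolling accuracy drops below an SLA threshold:

```python
from collections import deque

class PerformanceTracker:
    """Rolling monitor for a deployed model: alerts when accuracy
    over the last `window` predictions drops below `threshold`."""

    def __init__(self, window=100, threshold=0.8):
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        self.results.append(predicted == actual)

    def accuracy(self):
        return sum(self.results) / len(self.results) if self.results else 1.0

    def needs_retraining(self):
        # Trigger a "continuous tweak" once the window is full
        # and performance has degraded below the SLA
        return (len(self.results) == self.results.maxlen
                and self.accuracy() < self.threshold)

tracker = PerformanceTracker(window=4, threshold=0.75)
for pred, actual in [(1, 1), (0, 1), (1, 1), (0, 0)]:
    tracker.record(pred, actual)
print(tracker.accuracy())          # 3 of 4 correct → 0.75
print(tracker.needs_retraining())  # 0.75 is not below 0.75 → False
```

In production the same idea would feed a dashboard or a retraining scheduler rather than a print statement.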
  14. Enterprise Architecture (diagram): Hadoop, Online DataStore, Enterprise Service Bus, Data Integration Layers, External Sources, Analytics, Value Added Services, API Services, Internal Business Sources, Internal System Sources, Deep Learning
  15. Easy Wins: the training pipeline should run on Spark or Hadoop; the trained model should be representable as Java objects
  16. Vision: keep in mind scaling. High-level dynamic languages are incredibly productive for prototyping and data exploration, but scaling to larger data sets quickly runs into performance limitations. Keep scaling requirements in mind from the beginning
  17. Vision: simplify the pipeline. Copy & sample data from Dev Env to Data Scientist Env → Prototype in Python or R → Train model → Predict on validation data → Translate model to match Prod Env (Java, MapReduce, Spark) → Deploy training pipeline and model
  18. Easy Wins: data scientists should work directly on the distributed environment; data scientists and big data engineers should co-operate on the same platform
  19. SWOT Analysis
  20. TensorFlow. Strengths: Powered by Google; Nice UI. Weaknesses: Powered by Google; No support for "inline" matrix operations → slow. Opportunities: Awesome community. Threats: No Scala or Java integration; No commercial support
  21. Theano. Strengths: Granddaddy of deep learning; RNN and CNN; Computational graph abstraction; Python. Weaknesses: No support for Hadoop or Spark; No plug & play nets. Opportunities: Great community. Threats: No Scala or Java integration; No commercial support
  22. Torch. Strengths: GPU support; Lots of pretrained models and packages; Easy to use. Weaknesses: Lua language. Opportunities: Backed by DeepMind and Facebook. Threats: No Scala or Java integration; No commercial support
  23. Caffe. Strengths: C++ & Python; Good performance; GPU support. Weaknesses: Focused on image processing. Opportunities: Backed by Yahoo for Spark integration; GPU clustering. Threats: No commercial support
  24. DeepLearning4j. Strengths: GPU support; Java and Scala; Full DNN set; Supports Hadoop, Spark & Akka. Weaknesses: Not for dummies. Opportunities: Commercial support (SkyMind). Threats: Not so sexy for data scientists because of Java/Scala
  25. H2O • Easy-to-use web UI • Multi-language API • Runs directly on HDFS or S3 • Model is a Java POJO • Big Data ready • Really fast • Compressed data • Regularization • Grid search • GPU is still on the roadmap • CNN and RNN too
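Grid search, listed above, can be sketched without any framework (H2O ships its own grid search; the `dummy_scorer` function and the parameter names below are made up for illustration):

```python
from itertools import product

def grid_search(train_and_score, param_grid):
    """Exhaustive grid search: try every hyperparameter combination,
    keep the one with the best validation score."""
    names = list(param_grid)
    best = None
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)
        if best is None or score > best[1]:
            best = (params, score)
    return best

# Stand-in for "train a model and return validation accuracy";
# a real scorer would fit a network with these hyperparameters.
def dummy_scorer(params):
    return 1.0 - abs(params["learning_rate"] - 0.01) - 0.1 * params["l2"]

best_params, best_score = grid_search(
    dummy_scorer,
    {"learning_rate": [0.001, 0.01, 0.1], "l2": [0.0, 0.5]},
)
print(best_params)  # → {'learning_rate': 0.01, 'l2': 0.0}
```

H2O does the same thing at scale, training the candidate models in parallel on the cluster instead of in a local loop.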
  26. H2O - Flow
  27. H2O – Sparkling Water • Python, R and Scala API • Best Kagglers use H2O • Tons of tools for profiling and tuning • Spark leverage • Best-in-class algorithms, battle tested • Regularization • Grid search
  28. H2O – Sparkling Water
  29. Workflow: Training Set → training → Java POJO, embeddable in: • J2EE App • Spark Job • MR Job • DWH as UDF
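The POJO workflow above can be mimicked in plain Python. H2O actually generates standalone Java source from a trained model; this sketch (with made-up weights) shows the same idea of code-generating a dependency-free scoring function that can be embedded anywhere the target language runs:

```python
def export_model(weights, bias):
    """Mimic H2O's POJO export: emit a self-contained scoring
    function from a trained model's parameters, with no runtime
    dependency on the training framework."""
    return (
        "def score(features):\n"
        f"    w = {weights!r}\n"
        f"    b = {bias!r}\n"
        "    z = sum(wi * xi for wi, xi in zip(w, features)) + b\n"
        "    return 1 if z > 0 else 0\n"
    )

# "Training" output (the weights are made up for illustration)
source = export_model([0.4, -0.7], 0.05)

namespace = {}
exec(source, namespace)                # in Java land: compile the POJO
print(namespace["score"]([1.0, 0.2]))  # 0.4 - 0.14 + 0.05 > 0 → 1
```

This is why the POJO approach works in a J2EE app, a Spark or MapReduce job, or a DWH UDF: scoring needs only the generated code, not the training cluster.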
  30. Spark as middleware. Using Spark as middleware, you can leverage: • Deeplearning4J • H2O • TensorFlow (Arimo extension) • Caffe (Yahoo extension) • ML MultilayerPerceptrons and future implementations. No tech-provider lock-in
  31. Our Stack for Enterprise • Ready for the enterprise and Hadoop world • Deployable into a Java env • Notebook (Flow) • H2O for out-of-the-box algorithms • DeepLearning4J for advanced DNNs and n-dimensional array manipulation • Good usability for both data scientists and big data engineers • Enterprise support along the whole stack
  32. Thanks! We are hiring!