



Autodeploy a complete end-to-end machine learning pipeline on Kubernetes using tools such as Spark, TensorFlow, and HDFS. It requires a running Kubernetes (K8s) cluster in the cloud or on-premises.

Published in: Data & Analytics


  2. Single machine, small data, ML hero
  3. Single machine, small data, ML hero (each shown twice)
  4. More Data + Bigger Models + More Computation = Better Results in Machine Learning
  5.
  6. Single machine, big data, ML hero (each shown twice)
  7. Cluster, big data (shown twice), ML hero (shown twice)
  8. Single machine vs. data center
       • 1 user vs. many users
       • Megabyte of data vs. petabyte of data
       • Local filesystem vs. distributed filesystem
       • Exclusive use vs. resource sharing, scheduling, queueing, resource isolation
       • Scale up vs. scale out
       • pip install ... vs. automating deployment (operations, monitoring, ...)
  9. Development cycle for autonomous vehicles
       • Steps: (1) collect sensor data, (2) model engineering, (3) autonomous driving
       • Diagram labels: Data Logger, Control Unit, Big Data, Trained Model, Data Center
  10. Sensors (Udacity Lincoln MKZ)
       • Camera: 3x Blackfly GigE Camera, 20 Hz
       • Lidar: Velodyne HDL-32E, 9.5 Hz
       • IMU: Xsens, 400 Hz
       • GPS: 2x fixed, 1 Hz
       • CAN bus: 1.1 kHz
       • Robot Operating System
       • Data: 3 GB per minute
  11. Lidar Velodyne HDL-32E
  12. ROS bag data structure
  13. Robot Operating System
       + Popular open source robotics framework
       + Reliable distributed architecture
       + Wide use in the robotics research community
       + Huge selection of “off-the-shelf” software packages for hardware/algorithms/etc.
       + Used by Bosch, BMW, KUKA, Google, Siemens, etc.
  14. Development cycle for autonomous vehicles
       • Steps: (1) collect sensor data, (2) model engineering, (3) autonomous driving
       • Diagram labels: Data Logger, Control Unit, Big Data, Trained Model, Data Center
  15. Carla
  16. Diagram labels: Ingest data; Data Preprocessing; Feature Engineering; Model Training; Simulation; Reports; Results; Model Deployment; Training data; Model Validation; Train Test Loop; Test data; Model Feedback Loop
  17. Train and evaluate machine learning models at scale (single machine to data center)
       • How to run more experiments faster and in parallel?
       • How to share and reproduce research?
       • How to go from research to real products?
  18. Distributed Machine Learning
       • Axes: data size, model size (single machine to data center)
       • Approaches: model parallelism, data parallelism
       • Annotations: training very large models; exploring several model architectures, hyperparameter optimization, training several independent models; speeds up the training
  19. Compute Workload for Training and Evaluation (axes: I/O workload, compute workload; single machine to data center)
  20. I/O Workload for Simulation and Testing (axes: I/O workload, compute workload; single machine to data center)
  21. Flux – Open Machine Learning Stack
       • Mainly open source
       • No vendor lock-in
       • Scale-out architecture
       • Multi-user support
       • Resource management
       • Job scheduling
       • Speed-up training
       • Speed-up simulation
       Diagram labels: Training & Test data; Compute + Network + Storage; Deploy model; ML Development & Catalog & REST API; ML heroes; Feature Engineering; Training; Evaluation; Re-Simulation; Testing; CaffeOnSpark; Sample; Model; Prediction; Batch; Regression; Cluster; Dataset; Correlation; Centroid; Anomaly; Test; Scores
  22. Feature Engineering
       + Hadoop InputFormat and RecordReader for rosbag
       + Process rosbag files with Spark, YARN, MapReduce, Hadoop Streaming API, …
       + Spark RDDs are cached and optimized for analysis
       Diagram labels: ROS bag; Processing Engine; Computer, Network, Storage; Advanced Analytics; RDD; Record Reader; DataFrame, Dataset; SQL, Spark APIs; NumPy; ROS msg
  23. Hadoop InputFormat for ROS bags
  24. Training & Evaluation
       + TensorFlow ROSRecordDataset
       + Protocol Buffers to serialize records
       + Saves time because no data conversion is needed
       + Saves storage because no data duplication is needed
       Diagram labels: Training Engine; Machine Learning; ROS bag; Computer, Network, Storage; ROS Dataset; ROS msg
  25. Re-Simulation & Testing
       + Use Spark for preprocessing, transformation, cleansing, aggregation, and time window selection before publishing to ROS topics
       + Use the re-simulation framework of choice to subscribe to the ROS topics
       Diagram labels: Engine; Re-Simulation with framework of choice; Computer, Network, Storage; ROS bag; ROS topic; core; subscribe; publish
  26. Flux Open Machine Learning Stack, Apache License 2.0
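
Note on slide 10: the quoted rate of 3 GB per minute can be turned into storage and throughput figures with simple arithmetic. The sketch below assumes decimal units (1 GB = 1000 MB); the function names are illustrative and not part of the deck.

```python
# Recording rate quoted on slide 10.
GB_PER_MINUTE = 3


def recorded_gb(minutes: float) -> float:
    """Data volume in GB after recording for the given number of minutes."""
    return GB_PER_MINUTE * minutes


def sustained_mb_per_second() -> float:
    """Average write throughput in MB/s, assuming 1 GB = 1000 MB."""
    return GB_PER_MINUTE * 1000 / 60


if __name__ == "__main__":
    print("GB per hour:", recorded_gb(60))
    print("MB per second:", sustained_mb_per_second())
```

At this rate a single vehicle produces 180 GB per hour, which is one way to read the move from "megabyte of data" to "petabyte of data" on slide 8.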
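
Note on slides 12 and 23: the slide text does not spell out the ROS bag data structure. As a minimal sketch, assuming the publicly documented bag format 2.0 layout (each record is a little-endian 4-byte header length, a header of length-prefixed `name=value` fields, a 4-byte data length, then the data), a single record could be decoded as follows. The helper names and the synthetic record are illustrative only; this is not the deck's Hadoop InputFormat.

```python
import struct


def parse_header(header: bytes) -> dict:
    """Decode length-prefixed name=value fields from a record header."""
    fields = {}
    pos = 0
    while pos < len(header):
        (field_len,) = struct.unpack_from("<I", header, pos)
        pos += 4
        # Split on the first '=' only; the value is raw bytes.
        name, _, value = header[pos:pos + field_len].partition(b"=")
        fields[name.decode("ascii")] = value
        pos += field_len
    return fields


def parse_record(buf: bytes, offset: int = 0):
    """Return (header fields, data bytes, offset of the next record)."""
    (header_len,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    header = parse_header(buf[offset:offset + header_len])
    offset += header_len
    (data_len,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    data = buf[offset:offset + data_len]
    return header, data, offset + data_len


def build_record(fields: dict, data: bytes) -> bytes:
    """Build a synthetic record so the parser can be tried without a bag file."""
    header = b"".join(
        struct.pack("<I", len(name) + 1 + len(value)) + name.encode("ascii") + b"=" + value
        for name, value in fields.items()
    )
    return struct.pack("<I", len(header)) + header + struct.pack("<I", len(data)) + data


# Synthetic example: op 0x02 marks a message data record in the bag format.
record = build_record({"op": b"\x02", "conn": struct.pack("<I", 7)}, b"payload")
header, data, next_offset = parse_record(record)
```

Because every record carries its own lengths, a reader can skip from record to record without decoding payloads, which is the property a splittable input format would rely on.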
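
Note on slide 18: data parallelism can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the deck's implementation: it fits a line y = w*x + b by gradient descent, computes gradients per data shard (standing in for separate workers), and averages them. All names and the data are hypothetical.

```python
def shard_gradient(w, b, shard):
    """Summed squared-error gradients for y = w*x + b over one shard."""
    gw = sum(2 * (w * x + b - y) * x for x, y in shard)
    gb = sum(2 * (w * x + b - y) for x, y in shard)
    return gw, gb, len(shard)


def data_parallel_step(w, b, shards, lr):
    """One update: each shard computes its gradient, then the results are averaged."""
    partials = [shard_gradient(w, b, s) for s in shards]  # would run on separate workers
    n = sum(count for _, _, count in partials)
    gw = sum(g for g, _, _ in partials) / n
    gb = sum(g for _, g, _ in partials) / n
    return w - lr * gw, b - lr * gb


# Toy data generated from y = 2x + 1, split unevenly across two "workers".
data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
shards = [data[0:3], data[3:8]]

w, b = 0.0, 0.0
for _ in range(2000):
    w, b = data_parallel_step(w, b, shards, lr=0.01)
```

Summing per-shard gradients and dividing by the total sample count gives the same update as a single pass over all the data, so the shards only change where the work happens, not the result.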
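
Note on slide 25: time window selection and aggregation before re-publishing can be sketched in plain Python. The record layout `(timestamp, topic, value)` and the topic names are hypothetical; with Spark, the same predicate would typically be passed to a filter over the records.

```python
from collections import defaultdict


def select_window(messages, start, end):
    """Keep messages with start <= timestamp < end, ordered by timestamp."""
    return sorted((m for m in messages if start <= m[0] < end), key=lambda m: m[0])


def mean_per_topic(messages):
    """Average the numeric value of the messages for each topic."""
    sums = defaultdict(lambda: [0.0, 0])
    for _, topic, value in messages:
        sums[topic][0] += value
        sums[topic][1] += 1
    return {topic: total / count for topic, (total, count) in sums.items()}


# Hypothetical messages: (timestamp in seconds, topic, value).
messages = [
    (0.5, "/speed", 10.0),
    (1.0, "/speed", 12.0),
    (1.5, "/steering", 0.1),
    (2.0, "/speed", 14.0),
    (3.0, "/speed", 20.0),
]

window = select_window(messages, 1.0, 3.0)
averages = mean_per_topic(window)
```

The half-open interval means consecutive windows never select the same message twice.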