Flux - Open Machine Learning Stack / Pipeline

http://flux-project.org - End-to-end ML / AI pipeline for distributed training/eval on big data.

Published in: Data & Analytics
https://github.com/flux-project/flux

  1. Flux – Open Machine Learning Stack
     Diagram: Training & Test data feed a stack of Compute + Network + Storage, ML Development & Catalog & REST API, and ML-Specialists, with the stages Feature Engineering, Training, Evaluation, Re-Simulation and Testing (CaffeOnSpark is also shown). Also labeled: Sample, Model, Prediction, Batch, Regression, Cluster, Dataset, Correlation, Centroid, Anomaly, Test Scores. Outputs: Models f(x), Reports, Insights, Decisions.
     + Native format support
     + Scale-out architecture
     + Multi-user support
     + Resource management
     + Job scheduling
     + Speed-up workload
     + Apache License 2.0
     http://flux-project.org
  2. Train and evaluate machine learning models at scale
     Diagram: from a single machine to a data center.
     + How to run more experiments faster and in parallel?
     + How to share and reproduce research?
     + How to go from research to real products?
  3. Distributed Machine Learning
     Diagram axes: data size and model size, from a single machine to a data center.
     + Model parallelism: training very large models
     + Data parallelism: speeds up the training
     + Also shown: exploring several model architectures, hyperparameter optimization, training several independent models
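A minimal sketch of the data-parallelism idea from slide 3, assuming a one-parameter least-squares model y ≈ w·x and equally sized shards (Python 3, standard library only; all names are illustrative, not Flux APIs). Each "worker" computes the gradient on its own shard and the driver averages the results, which for equal shard sizes equals the full-batch gradient.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # one (x, y) training pair


def shard_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient of the mean squared error of y ≈ w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def split_into_shards(data: List[Sample], n_workers: int) -> List[List[Sample]]:
    """Round-robin split; shards are equal-sized when len(data) % n_workers == 0."""
    return [data[i::n_workers] for i in range(n_workers)]


def data_parallel_gradient(w: float, data: List[Sample], n_workers: int) -> float:
    """Each worker computes a local gradient on its shard; the driver averages them."""
    local = [shard_gradient(w, shard) for shard in split_into_shards(data, n_workers)]
    return sum(local) / len(local)


def train(data: List[Sample], n_workers: int, lr: float = 0.01, steps: int = 200) -> float:
    """Synchronous gradient descent using the averaged data-parallel gradient."""
    w = 0.0
    for _ in range(steps):
        w -= lr * data_parallel_gradient(w, data, n_workers)
    return w


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # y = 3x, 8 samples
    print(shard_gradient(0.5, data), data_parallel_gradient(0.5, data, n_workers=4))
    print(train(data, n_workers=4))  # converges towards w = 3
```

In a real cluster the per-shard gradients would be computed on different machines; for synchronous updates over equal-sized shards, averaging them reproduces the single-machine update while spreading the work.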
  4. Compute Workload for Training and Evaluation
     Diagram axes: I/O intensive vs. compute intensive, from a single machine to a data center.
  5. I/O Workload for Simulation and Testing
     Diagram axes: I/O intensive vs. compute intensive, from a single machine to a data center.
  6. Machine Learning Cycle
     Diagram: a cycle of data collection for training/test, feature engineering, model development and architecture, training and evaluation, model tuning, re-simulation and testing, model deployment and versioning, and scaling and monitoring. The phases are numbered 1, 2, 3; workload labels: I/O workload, compute workload, I/O workload.
  7. Flux – Open Machine Learning Stack (focus: Feature Engineering)
     Diagram: the same stack as slide 1 (Training & Test data in; Compute + Network + Storage, ML Development & Catalog & REST API, ML-Specialists; Models f(x), Reports, Insights, Decisions out), showing the Feature Engineering stage.
     + Mainly open source
     + No vendor lock-in
     + Scale-out architecture
     + Multi-user support
     + Resource management
     + Job scheduling
     + Speed-up training
     + Speed-up simulation
  8. Feature Engineering
     + Hadoop InputFormat and RecordReader for rosbag
     + Process rosbags with Spark, YARN, MapReduce, Hadoop Streaming API, …
     + Spark RDDs are cached and optimized for analysis
     Diagram labels: Ros bag, Record Reader, RDD, Processing Engine, Compute/Network/Storage, Advanced Analytics (DataFrame, DataSet, SQL, Spark APIs, NumPy, Ros Msg).
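To illustrate what a rosbag RecordReader has to walk, here is a minimal sketch based on the published ROS bag 2.0 format: every record is a 4-byte little-endian header length, a header made of length-prefixed `name=value` fields, a 4-byte data length, and the data. It parses an in-memory byte string, ignores chunk compression and the file's `#ROSBAG V2.0` first line, and is not the Flux `RosbagInputFormat`; the function names are hypothetical.

```python
import struct
from typing import Dict, Iterator, Tuple

# Record op codes from the ROS bag 2.0 format description.
OP_NAMES = {
    0x02: "Message",
    0x03: "BagHeader",
    0x04: "Index",
    0x05: "Chunk",
    0x06: "ChunkInfo",
    0x07: "Connection",
}


def parse_header(buf: bytes) -> Dict[str, bytes]:
    """Split a record header into its name=value fields (each length-prefixed)."""
    fields: Dict[str, bytes] = {}
    pos = 0
    while pos < len(buf):
        (field_len,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        name, _, value = buf[pos:pos + field_len].partition(b"=")
        fields[name.decode("ascii")] = value
        pos += field_len
    return fields


def iter_records(buf: bytes, pos: int = 0) -> Iterator[Tuple[Dict[str, bytes], bytes]]:
    """Yield (header_fields, data) for each record, starting at byte offset pos."""
    while pos < len(buf):
        (header_len,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        header = parse_header(buf[pos:pos + header_len])
        pos += header_len
        (data_len,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        yield header, buf[pos:pos + data_len]
        pos += data_len


def make_record(fields: Dict[str, bytes], data: bytes) -> bytes:
    """Encode one record in the same layout (used here only to build test input)."""
    header = b"".join(
        struct.pack("<I", len(k) + 1 + len(v)) + k.encode("ascii") + b"=" + v
        for k, v in fields.items()
    )
    return struct.pack("<I", len(header)) + header + struct.pack("<I", len(data)) + data
```

`iter_records` takes a start offset, which is roughly what lets a split-aware reader begin at a known record boundary (for example a chunk position from the bag index) instead of at byte 0. A record's type is then `OP_NAMES[header["op"][0]]`.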
  9. Native format support for ROS (Robot Operating System)
  10. Native format support for ROS (Robot Operating System)
  11. Flux – Open Machine Learning Stack (focus: Training, Evaluation, CaffeOnSpark)
      Same stack diagram and feature list as slide 7.
  12. Training & Evaluation
      + TensorFlow ROSRecordDataset
      + Protocol Buffers to serialize records
      + Saves time because no data conversion is needed
      + Saves storage because no data duplication is needed
      Diagram labels: Ros bag, ROS Dataset, Ros msg, Training Engine, Machine Learning, Compute/Network/Storage.
  13. Flux – Open Machine Learning Stack (focus: Re-Simulation, Testing)
      Same stack diagram and feature list as slide 7.
  14. Re-Simulation & Testing
      + Use Spark for preprocessing, transformation, cleansing, aggregation, and time-window selection before publishing to ROS topics
      + Use the re-simulation framework of choice to subscribe to the ROS topics
      Diagram labels: Ros bag, Engine, ROS core, Ros topic (publish/subscribe), Re-Simulation with framework of choice, Compute/Network/Storage.
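The time-window selection step on slide 14 can be sketched in plain Python 3 as a filter on timestamps (and optionally topics) followed by a sort, so messages can be replayed in order. The `Record` type and its fields are assumptions for illustration; on a Spark RDD the same idea would be an `rdd.filter(...)` followed by `rdd.sortBy(...)`.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Record(NamedTuple):
    t_ns: int       # message timestamp in nanoseconds (assumed field)
    topic: str      # ROS topic the message was recorded on
    payload: bytes  # serialized message


def select_window(records: Iterable[Record], start_ns: int, end_ns: int,
                  topics: Optional[Set[str]] = None) -> List[Record]:
    """Keep records with start_ns <= t_ns < end_ns (optionally only some topics),
    returned in timestamp order so they can be published sequentially."""
    selected = (
        r for r in records
        if start_ns <= r.t_ns < end_ns and (topics is None or r.topic in topics)
    )
    return sorted(selected, key=lambda r: r.t_ns)
```

Using a half-open interval `[start, end)` keeps consecutive windows from publishing the same message twice.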
  15. Time Travel
      Diagram labels: fold(left), t, fold(right), reduce/shuffle.
  16. DEMO
  17. DEMO: 2016 Lincoln MKZ
      + Camera: 3x Blackfly GigE camera, 20 Hz
      + Lidar: Velodyne HDL-32E, 9.5 Hz
      + IMU: Xsens, 400 Hz
      + GPS: 2x fixed, 1 Hz
      + CAN bus: 1.1 kHz
      + Data: 223 GB in ROS bags
      + Driving: 70 minutes in Mountain View
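A back-of-the-envelope check of the demo data volume on slide 17, assuming decimal gigabytes, reading "3x" and "2x" as the number of devices, and assuming every device publishes at its nominal rate for the full 70 minutes (actual message counts in the bags can differ). 223 GB over 4,200 s works out to roughly 53 MB/s.

```python
from typing import Dict

DURATION_S = 70 * 60   # 70 minutes of driving
DATA_BYTES = 223e9     # 223 GB, assuming decimal gigabytes

# (sensor, number of devices, nominal rate in Hz) as listed on the slide
SENSORS = [
    ("camera", 3, 20.0),
    ("lidar", 1, 9.5),
    ("imu", 1, 400.0),
    ("gps", 2, 1.0),
    ("can_bus", 1, 1100.0),
]


def nominal_counts(duration_s: float) -> Dict[str, int]:
    """Messages per sensor if every device published at its nominal rate."""
    return {name: round(n * hz * duration_s) for name, n, hz in SENSORS}


def average_rate_mb_s(data_bytes: float, duration_s: float) -> float:
    """Average recording rate in megabytes (1e6 bytes) per second."""
    return data_bytes / duration_s / 1e6


if __name__ == "__main__":
    print(nominal_counts(DURATION_S))
    print(round(average_rate_mb_s(DATA_BYTES, DURATION_S), 1))  # 53.1
```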
  18. All-in-one Docker image
      + Ansible script 2.3
      + Ubuntu 16.04.2 LTS
      + HDFS 2.7.3
      + Spark 2.1.0 on YARN
      + ROS core Kinetic Kame
      + NVIDIA GPU driver 375.39 for Titan X Pascal 12 GB
      + TensorFlow 1.0.1 / Keras 2.0.3
      + Python 2.7.12 (required by ROS), Scala 2.11, Java 1.8
  19. Machine Learning Workflow
      Diagram stages: Ingest data, Data Preprocessing, Search, Analysis, Model Training, Model Testing, Re-simulation, Reports, Results, Model Deployment. Inputs: training data, test data. Loops: Train Test Loop, Model Feedback Loop.
  20. Check that the rosbag file version is V2.0:
          $ java -jar lib/rosbaginputformat_2.11-0.1.0-SNAPSHOT.jar --version -f data/HMB_1.bag
          #ROSBAG V2.0
          BagRecord(Header(69,Map(chunk_count -> 857, index_pos -> 704124491, conn_count -> 39, op -> 3))…
      Extract the index from the rosbag file:
          $ java -jar lib/rosbaginputformat_2.11-0.1.0-SNAPSHOT.jar -f data/HMB_1.bag > data/HMB_1.json
          -rw-r--r-- 1 root root 672M May 3 09:53 data/HMB_1.bag
          -rw-r--r-- 1 root root 8.3K May 4 10:26 data/HMB_1.bag.json
      Copy the rosbag into HDFS:
          $ hdfs dfs -put data/HMB_1.bag data/
          $ hdfs dfs -ls data/
          Found 1 items
          -rw-r--r-- 1 root supergroup 704510416 2017-05-04 10:33 data/HMB_1.bag
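The slide checks the version with the `--version` flag of the Flux jar. A dependency-free alternative relies on the fact that ROS bag files start with a `#ROSBAG V<version>` line followed by a newline. A minimal Python 3 sketch (function names are hypothetical):

```python
ROSBAG_V2_MAGIC = b"#ROSBAG V2.0\n"


def rosbag_version(path: str) -> str:
    """Return the version string from a bag file's first line, e.g. 'V2.0'."""
    with open(path, "rb") as f:
        first_line = f.readline(64)  # the magic line is short; cap the read
    if not first_line.startswith(b"#ROSBAG "):
        raise ValueError(f"{path} does not look like a ROS bag file")
    return first_line[len(b"#ROSBAG "):].strip().decode("ascii")


def is_rosbag_v2(path: str) -> bool:
    """True if the file starts with the exact V2.0 magic line."""
    with open(path, "rb") as f:
        return f.read(len(ROSBAG_V2_MAGIC)) == ROSBAG_V2_MAGIC
```

Only the first few bytes are read, so the check stays cheap even for multi-gigabyte bags such as `HMB_1.bag`.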
  21. Process the rosbag file in Spark using the RosbagInputFormat:
          fin = sc.newAPIHadoopFile(
              "hdfs://0.0.0.0:9000/user/root/data/HMB_1.bag",
              "org.foss.RosbagInputFormat",
              "org.apache.hadoop.io.LongWritable",
              "org.apache.hadoop.io.BytesWritable",
              conf={"RosbagInputFormat.chunkIdx": "./HMB_1.bag.idx.json"})
      Count the raw rosbag chunks:
          fin.count()
          857
      Count records grouped by record type across all chunks (on all blocks of the bag, from all servers):
          from operator import add
          rdd = fin.map(chunk_map)  # chunk_map, chunk_types: helpers not shown on the slides
          rdd.flatMap(chunk_types).reduceByKey(add).collect()
          [('Connection', 39), ('Index', 25943), ('Message', 910943)]
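The helpers `chunk_map` and `chunk_types` are not shown on the slides. As a hypothetical stand-in, assuming each RDD element is a `(key, {op code: [records]})` pair (inferred from the `r[1]['\x07']` indexing on slide 22), the counting step could look like this, with a local replacement for `reduceByKey`. Op codes 0x02, 0x04 and 0x07 are the ROS bag 2.0 codes for message data, index data and connection records.

```python
from operator import add
from typing import Callable, Dict, Iterable, List, Tuple

# Record op codes from the ROS bag 2.0 format description.
RECORD_TYPES = {"\x02": "Message", "\x04": "Index", "\x07": "Connection"}

# Assumed element shape: (key, {op code: [parsed records]})
Chunk = Tuple[int, Dict[str, List[dict]]]


def chunk_types(chunk: Chunk) -> List[Tuple[str, int]]:
    """Hypothetical helper: (record type, count) pairs for one parsed chunk."""
    _, by_op = chunk
    return [(RECORD_TYPES[op], len(recs)) for op, recs in by_op.items() if op in RECORD_TYPES]


def reduce_by_key(pairs: Iterable[Tuple[str, int]],
                  func: Callable[[int, int], int]) -> List[Tuple[str, int]]:
    """Local stand-in for Spark's reduceByKey: merge values that share a key."""
    acc: Dict[str, int] = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return sorted(acc.items())


if __name__ == "__main__":
    chunks = [(0, {"\x02": [{}, {}, {}], "\x07": [{}]}),
              (1, {"\x02": [{}, {}], "\x04": [{}]})]
    pairs = [p for c in chunks for p in chunk_types(c)]  # the flatMap step
    print(reduce_by_key(pairs, add))  # [('Connection', 1), ('Index', 1), ('Message', 5)]
```

Because `add` is associative and commutative, Spark can pre-aggregate these counts inside each partition before shuffling, which keeps the shuffle small.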
  22. Collect the connections from all Spark partitions of the bag file into the Spark driver:
          connections = rdd.flatMap(lambda r: r[1]['\x07']).collect()
          [(k['conn'], k['topic']) for k in connections]
          [(0, '/can_bus_dbw/can_rx'), (1, '/vehicle/dbw_enabled'), (2, '/ecef/'), (3, '/fix'), (4, '/imu/data')…
      Count the messages on each channel (topic):
          conn_d = dict((k['conn'], k) for k in connections)
          histogram = rdd.flatMap(lambda r: r[1]['\x02']).map(lambda r: (conn_d[r['conn']]['topic …
          [('/vehicle/joint_states', 33157), ('/vehicle/suspension_report', 11060), ('/vehicle/twist_controller/parameter_updates', 1), ('/vehicle/steering_report', 11040), ('/velodyne_packets', 2110), ('/vehicle/tire_pressure_report', 442)...
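The `histogram` expression on slide 22 is cut off on the slide. A local, Spark-free sketch of the same aggregation, assuming message and connection records are dicts with `conn` and `topic` keys as in the slide output: resolve each message's topic through its connection id, then count per topic. On an RDD, one way to express the counting would be mapping to `(topic, 1)` pairs and calling `reduceByKey(add)`; `topic_histogram` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def topic_histogram(connections: Iterable[dict],
                    messages: Iterable[dict]) -> List[Tuple[str, int]]:
    """Count message records per topic, most frequent first."""
    conn_d: Dict[int, dict] = {c["conn"]: c for c in connections}  # like conn_d on the slide
    counts = Counter(conn_d[m["conn"]]["topic"] for m in messages)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```

The connection table is tiny (39 connections in `HMB_1.bag`), so holding it as a plain dict on the driver, or broadcasting it to executors, is cheap compared with the message records.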
  23. Plot the message counts per topic:
          import matplotlib.pyplot as plt
          import numpy as np
          from operator import itemgetter
          fig, ax = plt.subplots(figsize=(17, 9))
          ax.bar(np.arange(len(histogram)), list(map(itemgetter(1), histogram)))
          ax.set_xticks(np.arange(len(histogram)))
          ax.set_xticklabels(list(map(itemgetter(0), histogram)), rotation=90)
          plt.show()
  24. Deserialize a single record:
          r = {u'conn': 11, 'data': '\xe3\xbc6\x00\xd68.X\x89,\xc5.\x04\x00\x00\x00/imuV\xa9.X\x80\xcei)\x08\x0…', 'data_length': 40, 'ftell': 30303L, u'op': '\x02', u'time': 1479424214}
          msg_type = _get_message_type(conn_d[r['conn']]['data'])  # helper not shown on the slides
          msg = msg_type()
          msg.deserialize(r['data'])
          header:
            seq: 3587299
            stamp:
              secs: 1479424214
              nsecs: 784673929
            frame_id: /imu
          time_ref:
            secs: 1479453014
            nsecs: 694800000
          source: UTC time
      Sample training and test data:
          from functools import partial
          imu_all = rdd.flatMap(partial(msg_map, func=f, conn=conn_d[5]))  # msg_map, f: not shown on the slides
          # randomSplit returns disjoint subsets; two independent sample() calls
          # (0.7 and 0.3) would overlap and leave some records in neither set
          imu_train, imu_test = imu_all.randomSplit([0.7, 0.3], seed=42)
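Why `randomSplit` rather than two `sample` calls: each `sample(False, fraction)` draws independently, so a record can end up in both sets or in neither. A minimal Python 3 sketch of the split idea (not Spark's implementation; `random_split` is a hypothetical name) assigns every record to exactly one side:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(items: Sequence[T], train_fraction: float,
                 seed: int = 42) -> Tuple[List[T], List[T]]:
    """Assign each item to exactly one side, so train and test never overlap."""
    rng = random.Random(seed)  # seeded for reproducible splits
    train: List[T] = []
    test: List[T] = []
    for item in items:
        (train if rng.random() < train_fraction else test).append(item)
    return train, test
```

As with Spark's version, the split sizes are only approximately 70/30, since each item is assigned by an independent draw.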
  25. Deserialize image data from the RDD:
          from functools import partial
          from io import BytesIO
          from PIL import Image
          res = rdd.flatMap(partial(msg_map, func=lambda r: r.data, conn=conn_d[38])).take(2)
          Image.open(BytesIO(res[0]))
  26. Keras model on data from the rosbag RDD:
          import numpy as np
          from io import BytesIO
          from PIL import Image
          from keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
          from keras.models import Model
          # img_in (the Input layer) and df (images and steering angles) are not shown on the slide
          x = Conv2D(8, (3, 3))(img_in)
          x = Activation('relu')(x)
          x = MaxPooling2D(pool_size=(2, 2))(x)
          x = Conv2D(16, (3, 3))(x)
          x = Activation('relu')(x)
          x = MaxPooling2D(pool_size=(2, 2))(x)
          [...]
          merged = Flatten()(x)
          x = Dense(256)(merged)
          x = Activation('linear')(x)
          x = Dropout(.2)(x)
          angle_out = Dense(1, name='angle_out')(x)
          model = Model(inputs=[img_in], outputs=[angle_out])
          model.compile(optimizer='adam', loss='mean_squared_error')
          inp = np.array([np.array(Image.open(BytesIO(k))) for k in df['img']])
          out = df["steering_wheel_angle"]
          model.fit(inp, out, epochs=200, batch_size=2)
  27. Predict the steering angle from the right camera topic:
          from functools import partial
          m_yaml = model.to_yaml()
          m_weights = model.get_weights()

          def f(r):
              # imports live inside f so they are resolved on the Spark executors
              from io import BytesIO
              import numpy as np
              from PIL import Image
              from keras.models import model_from_yaml
              m = model_from_yaml(m_yaml)
              m.set_weights(m_weights)
              return m.predict(np.array(Image.open(BytesIO(r.data)))[np.newaxis, :])

          # assumes a lookup keyed by topic name (conn_d on slide 22 is keyed by connection id)
          fin.flatMap(partial(msg_map, func=f, conn=conn_d['/right_camera/image_color/compressed'])).take(10)
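In `f` on slide 27 the model is rebuilt from YAML and weights for every single record. A common Spark pattern is to build it once per partition with `RDD.mapPartitions`. A minimal sketch with a stand-in model factory (`predict_partition` and `build_model` are hypothetical names; in the demo, `build_model` would wrap the `model_from_yaml` and `set_weights` calls):

```python
from typing import Callable, Iterable, Iterator, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def predict_partition(records: Iterable[In],
                      build_model: Callable[[], Callable[[In], Out]]) -> Iterator[Out]:
    """Build the model once per partition, then apply it to every record.
    Intended for rdd.mapPartitions(lambda it: predict_partition(it, build_model))."""
    model = build_model()  # expensive setup happens once per partition, not per record
    for r in records:
        yield model(r)
```

With a few partitions per executor, model construction cost drops from once per image to once per partition, while the per-record work stays the same.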
  28. Publish GPS messages from a Spark RDD to a topic:
          from functools import partial

          def f(r):
              import rospy
              from sensor_msgs.msg import NavSatFix

              def talker():
                  pub = rospy.Publisher('chatter', NavSatFix, queue_size=10)
                  rospy.init_node('talker', anonymous=True)
                  rate = rospy.Rate(10)  # 10 Hz
                  while not rospy.is_shutdown():
                      pub.publish(r)
                      # rate.sleep()
                      break  # publish this record once, then return

              try:
                  talker()
              except rospy.ROSInterruptException:
                  pass
              return 'Done.'

          rdd.flatMap(partial(msg_map, func=f, conn=conn_d[27])).collect()
          ['Done.', 'Done.', 'Done.', 'Done.', 'Done.', 'Done.',….
  29. Machine Learning Workflow (diagram: Model Training, Re-Simulation, Model Deployment, Model Testing, Train Test Loop, Feedback Loop)
  30. Model Deployment: Testdrive...
  31. Flux – Open Machine Learning Stack
      + Native format support, e.g. rosbags (Robot Operating System)
      + End-to-end machine learning pipeline
      + Layered API (provisioning, operating, processing)
      + Optimized for scale-out based on cost, time, space
      + One-click on-premise/cloud deployment
      + Apache License 2.0, release Q4/2017
      + http://flux-project.org
  32. Flux: Apache License 2.0, release Q4/2017, http://flux-project.org