
Udacity-Didi Challenge Finalists


The presentations by the five finalist teams in the $100,000 Udacity-Didi Challenge.

Published in: Automotive


  1. ICTANS
  2. Position and Orientation Estimation of Cars and Pedestrians. Yiming Zeng, Yu Hu, Qiankun Tang, Shice Liu, Beibei Jin. Autonomous Navigation System Research Group, State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences. August 31st, 2017.
  3. ANS@ Sensors and data. Sensor setup and sensor ranges. For the Round 2 test data, the fraction of frames in which each obstacle is detectable per sensor is as follows:

     | | ford01 | ford02 | ford03 | ford04 | ford05 | ford06 | ford07 | mustang01 | pedestrian |
     |---|---|---|---|---|---|---|---|---|---|
     | Sensor 1 | 100% | 42% | 49% | 100% | 68.6% | 70.5% | 100% | 100% | 100% |
     | Sensor 2 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 71.7% | 78.2% |
     | Sensor 3 | 100% | 54.8% | 47.2% | 83.8% | 59.1% | 64.6% | 74.74% | 14.9% | - |

     Coordinate system transformation (used in Round 1): to detect cars at various ranges, data from different sensors were used in Round 1. Trading off detection accuracy against the 10 Hz runtime constraint, only Velodyne data was used to detect obstacles in Round 2.
  4. ANS@ Related work. 3D detection: 3D conv [1], multi-view [3], RGB-D fusion [2]. 2D detection: region-based (Faster R-CNN [4], R-FCN [6]) and region-free (YOLO [5], SSD [7]).
  5. ANS@ Round 1 framework. The camera image, the lidar point cloud (projected to front view and bird view), and radar messages feed three R-FCN detectors. Positions of cars detected in each coordinate frame are projected into the camera coordinate frame, fused, and selected by score.
  6. ANS@ Round 2 framework. (1) Encode the 3D point cloud into a compact representation; (2) estimate position by end-to-end regression; (3) calculate the height of the obstacle center from the 3D point cloud; (4) track and correct with a Kalman filter (prediction and correction).
  7. ANS@ Representation of the 3D point cloud. Bird-view channels: average height, maximum height, height variance, density, gradient, intensity. Alternative channel sets tried: height maps + density; average height + density + gradient; average height + density.
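A few of the bird-view channels above can be sketched with numpy. This is a minimal sketch, not ICTANS's code: the grid extent, the 0.1 m cell size, and the density saturation constant are illustrative assumptions.

```python
import numpy as np

def birdview_features(points, x_range=(0.0, 60.0), y_range=(-30.0, 30.0), cell=0.1):
    """Encode an N x 4 point cloud (x, y, z, intensity) into bird-view maps:
    max height, mean height, density, and max intensity per grid cell."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))

    x, y, z, inten = points.T
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z, inten = x[keep], y[keep], z[keep], inten[keep]

    # flat index of each point's grid cell
    flat = ((x - x_range[0]) / cell).astype(int) * ny + ((y - y_range[0]) / cell).astype(int)

    max_h = np.full(nx * ny, -np.inf)
    np.maximum.at(max_h, flat, z)
    max_h[max_h == -np.inf] = 0.0

    count = np.zeros(nx * ny)
    np.add.at(count, flat, 1.0)
    sum_h = np.zeros(nx * ny)
    np.add.at(sum_h, flat, z)
    mean_h = np.where(count > 0, sum_h / np.maximum(count, 1.0), 0.0)

    max_i = np.zeros(nx * ny)
    np.maximum.at(max_i, flat, inten)

    density = np.minimum(count / 8.0, 1.0)  # saturate at 8 points per cell (assumption)
    return np.stack([m.reshape(nx, ny) for m in (max_h, mean_h, density, max_i)])
```

The `np.maximum.at` / `np.add.at` scatter operations keep the per-cell reductions vectorized without an explicit loop over points.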
  8. ANS@ CNN model: training and validation. We removed lidar frames with wrong GPS positions, randomly sampled frames from the remaining good ones, and projected them to bird view as training and validation data. R-FCN [6] with a ResNet-50 convolutional backbone and OHEM [8] was used to detect obstacles in the bird view.

     | | car | pedestrian |
     |---|---|---|
     | Training frames | 12436 | 7847 |
     | Validation frames | 1455 | 872 |
     | AP | 0.8169 | 0.6278 |

     Caffe SGD solver: base_lr 0.001, lr_policy "step", gamma 0.1, stepsize 20000, momentum 0.9, weight_decay 0.001, display 20.
  9. ANS@ Tracking and correction. For pedestrian detection, a Kalman filter was used to validate and correct the CNN output; for car detection, the Kalman filter did not significantly improve the score, so we did not use it. Comparison between the CNN output and the KF prediction:

     | Case | Decision | KF action |
     |---|---|---|
     | CNN output is near the KF prediction | CNN output | Update |
     | CNN output with high confidence is far from the KF prediction | CNN output | Reinitialize |
     | CNN output with low confidence is far from the KF prediction | KF prediction | Update |

     Prediction synchronization*: we tried two strategies, nearest interpolation and linear interpolation; there was no obvious difference between the two.
     *We think a Kalman filter would achieve better results here, but we did not have enough time to try it.
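The decision rules above can be expressed as a small function. The distance and confidence thresholds here are illustrative assumptions, not the team's tuned values.

```python
def correct_detection(cnn_pos, cnn_conf, kf_pred,
                      dist_thresh=2.0, conf_thresh=0.5):
    """Apply the tracking/correction decision table.

    Returns (output_position, kf_action)."""
    dx = cnn_pos[0] - kf_pred[0]
    dy = cnn_pos[1] - kf_pred[1]
    near = (dx * dx + dy * dy) ** 0.5 <= dist_thresh

    if near:
        return cnn_pos, "update"        # trust the CNN, update the filter
    if cnn_conf >= conf_thresh:
        return cnn_pos, "reinitialize"  # confident jump: restart the filter
    return kf_pred, "update"            # low confidence: fall back to the KF
```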
  10. ANS@ Result. Score: 0.332; rank: 5.
  11. ANS@ References
      [1] B. Li, "3D Fully Convolutional Network for Vehicle Detection in Point Cloud," Robotics: Science and Systems, 2016.
      [2] J. Schlosser, C. K. Chow, and Z. Kira, "Fusing LIDAR and images for pedestrian detection using convolutional neural networks," ICRA 2016, pp. 2198-2205.
      [3] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, "Multi-View 3D Object Detection Network for Autonomous Driving," arXiv, 2016.
      [4] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE TPAMI, vol. 39, no. 6, pp. 1137-1149, 2017.
      [5] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," CVPR 2016, pp. 779-788.
      [6] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: Object Detection via Region-based Fully Convolutional Networks," arXiv, 2016.
      [7] W. Liu et al., "SSD: Single Shot MultiBox Detector," arXiv, 2016.
      [8] A. Shrivastava, A. Gupta, and R. Girshick, "Training Region-Based Object Detectors with Online Hard Example Mining," CVPR 2016, pp. 761-769.
  12. ANS@ Thanks! Q&A
  13. ANS@ R-FCN: region-based obstacle detection that incorporates location information into the feature maps (ResNet-50 convolutions, trained with OHEM).
  14. Tea
  15. Team Tea: Andres Torrubia, Ali Aliev
  16. Agenda: team; pipeline overview; obstacle segmentation; obstacle localization; filtering; implementation and closing thoughts.
  17. Team background
      - Ali Aliev: computer vision engineer; ROS, filtering, point cloud preprocessing, visualization
      - Andres Torrubia: Udacity SDCND student, no previous ROS/lidar experience before the challenge; devised and implemented the deep learning architecture (segmentation + localization)
      - We merged teams two weeks before the final deadline
  18. Pipeline design. Most state-of-the-art solutions build image-like features from lidar and apply convolutional networks (YOLO, SSD, FCNs, etc.). We wanted to do something different, original, and new.
  19. Pipeline. Lidar input: n x 5 (x, y, z, i, r), n ~ 30,000, encoded as 32 x N x 3 (d, i, h) with N = 2048 → obstacle segmenter → segmented obstacle (m points) → clustering and filtering → obstacle points M x 4 (x, y, z, i), M = 2048 → localizer → obstacle pose (x, y, z, yaw) and size (h, l, w) at 10 Hz → filtering, fused with radar 1 x 3 (x, y, vx, vy) at 20 Hz → obstacle pose (x, y, z, yaw) and size (h, l, w) at 24 Hz.
  20. Obstacle segmentation
  21. Obstacle segmentation. 32 signals x 3 channels (distance, intensity, height); nearest-neighbor interpolation; sampled at 2048 points from -π to π.
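The nearest-neighbor resampling of one ring's signal onto a fixed 2048-point angular grid might look like the following sketch; the handling of unsorted azimuths is an assumption, and `values` can be any of the distance, intensity, or height channels.

```python
import numpy as np

def resample_ring(azimuth, values, n_samples=2048):
    """Resample one lidar ring onto a fixed angular grid in [-pi, pi)
    using nearest-neighbor interpolation."""
    grid = np.linspace(-np.pi, np.pi, n_samples, endpoint=False)
    order = np.argsort(azimuth)
    azimuth, values = azimuth[order], values[order]
    # index of the closest measured azimuth for each grid angle
    idx = np.searchsorted(azimuth, grid)
    idx = np.clip(idx, 1, len(azimuth) - 1)
    left_closer = (grid - azimuth[idx - 1]) < (azimuth[idx] - grid)
    idx = np.where(left_closer, idx - 1, idx)
    return values[idx]
```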
  22. Obstacle segmentation: bidirectional GRU
      - 2048 samples split into 32 sectors (64 samples each)
      - 16 sequences: we use 16 of the HDL-32E's 32 rings
      - Each input x is a 16 x 3 = 48-dimensional vector: 16 rings x (d, i, h)
      - Stacked bidirectional GRU layers (hidden sizes 256 and 512) map inputs x₀ … x₆₃ to per-sample outputs y₀ … y₆₃; GRU = Gated Recurrent Unit (Cho et al., 2014)
      - The last GRU layer uses sigmoid and dropout 0.1; the rest use tanh and dropout 0.2
      - 2.6M parameters, trained with binary cross-entropy using release 3 data plus augmentation
  23. Obstacle localization (PointNet-style)
      - Input: the m x 4 segmented obstacle points (x, y, z, i), mean-centered so x, y, z have zero mean, resampled to n = 1024 points
      - Shared-weight per-point MLPs lift n x 4 to n x 64, n x 128, n x 256, and n x 2048, followed by max pooling into a 2048-d latent vector
      - Fully connected heads (512/256/128/64/32 units, dropout 0.1-0.2) regress the centroid (3 values), size (3 values), and yaw (1 value)
      - Yaw activation: tanh(.) * π/2
      - See PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, Qi et al., CVPR 2017
      - Trained on fixed release 2 data (vehicle) plus release 3 data (pedestrian)
      - Size and centroid: L2 loss; yaw: angle loss
  24. Filtering: obstacle pose. We used an Unscented Kalman Filter in a lidar-fixed coordinate frame.
      - Input: lidar (x, y, z, yaw) at 10 Hz, radar (x, y, vx, vy) at 20 Hz, camera ticks at 24 Hz
      - Output: pose (x, y, z, yaw) at 24 Hz
      - Internal state: S = (x, vx, ax, y, vy, ay, z, vz, az, yaw)
      - Noisy inputs are rejected based on the covariance of S
      - The filter is reset when the covariance of S grows too high
      - Lidar and radar measurements predict and update; camera ticks only predict
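The noise-rejection and reset logic around the filter can be sketched as follows. The UKF itself is abstracted into a simplified linear update here, and both thresholds are illustrative assumptions, not the team's tuned values.

```python
import numpy as np

class GatedFilter:
    """Sketch of covariance-gated filtering with automatic reset."""

    def __init__(self, reset_trace=50.0, gate=9.0):
        self.reset_trace = reset_trace  # reset when covariance grows past this
        self.gate = gate                # chi-square-style Mahalanobis gate
        self.x = None                   # filter state (position part shown)
        self.P = None                   # state covariance

    def step(self, z, R):
        if self.x is None or np.trace(self.P) > self.reset_trace:
            # (re)initialize directly from the measurement
            self.x, self.P = z.copy(), R.copy()
            return self.x
        innov = z - self.x
        S = self.P + R
        d2 = innov @ np.linalg.solve(S, innov)
        if d2 > self.gate:              # noisy input rejection
            self.P = self.P + R         # only predict: uncertainty grows
            return self.x
        K = self.P @ np.linalg.inv(S)   # simplified linear update
        self.x = self.x + K @ innov
        self.P = (np.eye(len(z)) - K) @ self.P
        return self.x
```

Once rejected measurements stop updating the state, the covariance keeps growing, which eventually triggers the reset branch and re-seeds the filter from the next measurement.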
  25. Filtering: obstacle pose. Fusion details: prefer lidar measurements over radar measurements at close distances; use nearest-neighbor matching to pick the radar measurement belonging to the obstacle. At far range only radar is available; at close range both radar and lidar are used.
  26. Filtering: obstacle bounding box. Car: exponential moving average over the bounding-box length, width, and height. Trick: shift the radar range by a constant offset to better fit the car bounding-box centroid. Pedestrian: constant cylinder radius and height (allowed by the rules).
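The exponential moving average over the car box dimensions is a one-liner; the smoothing factor alpha is an illustrative assumption.

```python
def ema_bbox(prev, new, alpha=0.2):
    """Exponential moving average over (l, w, h) box dimensions.
    `prev` is None for the first observation."""
    if prev is None:
        return new
    return tuple((1 - alpha) * p + alpha * n for p, n in zip(prev, new))
```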
  27. Closing thoughts
      Implementation, performance, and gotchas:
      - No resolution lost when using raw lidar points
      - Substantial polishing of the noisy release 3 "ground truth"
      - Trained on a single GTX 1080 Ti
      - Code primarily in Python, with optimized lidar cloud interfacing in C++
      - Trained the GRU (RNN) with Theano (2x faster than TensorFlow)
      - Used TensorFlow for inference (Theano segfaulted when using two models sequentially)
      Areas of improvement:
      - Train the two networks end to end (needs differentiable filtering and resampling)
      - Fix the release 3 "ground truth"
      - Train the localizer with release 3 data for cars
      - Track ego and obstacle positions separately in a fixed global frame
      - Account for the time delta between lidar frames
      - Fuse camera and odometry
      - Use phased LSTM to avoid lidar resampling
  28. zbzc
  29. DiDi-Udacity Self-Driving Car Challenge 2017. Pipeline. Input (bag file): lidar, radar, and camera messages. Python node 1: an RGB model produces classifications and localizations from camera images. Python node 2 / C node: a lidar model converts lidar data to 2D features and produces classifications, localizations, and orientations (yaw, location, h/w/l); obstacle state tracking merges the results. Output: obstacle info.
  30. DiDi-Udacity Self-Driving Car Challenge 2017. 1. Lidar information to 2D features.
      Features for the neural network:
      • height: maximum z value in each cell
      • intensity: maximum intensity value in each cell
      • ring number: maximum ring number value in each cell
      Feature for calculating obstacle height:
      • minimum z: minimum z value in each cell
      (figures: intensity maps for pedestrian and car, ring map)
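The per-cell features above can be sketched with numpy's scatter reductions. This is an illustrative sketch: the flattened `cell_idx` mapping from points to grid cells is assumed to be computed elsewhere, and the per-cell obstacle height is derived as max z minus min z.

```python
import numpy as np

def cell_features(z, intensity, ring, cell_idx, n_cells):
    """Per-cell 2D features: max z, max intensity, max ring number, and an
    obstacle-height estimate from max z minus min z."""
    max_z = np.full(n_cells, -np.inf)
    np.maximum.at(max_z, cell_idx, z)
    min_z = np.full(n_cells, np.inf)
    np.minimum.at(min_z, cell_idx, z)
    max_i = np.zeros(n_cells)
    np.maximum.at(max_i, cell_idx, intensity)
    max_r = np.zeros(n_cells)
    np.maximum.at(max_r, cell_idx, ring)
    empty = max_z == -np.inf
    max_z[empty], min_z[empty] = 0.0, 0.0
    height = max_z - min_z  # per-cell obstacle height estimate
    return max_z, max_i, max_r, height
```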
  31. DiDi-Udacity Self-Driving Car Challenge 2017. 2. Network architecture.

      | Name | Filters | Size/Stride | Output |
      |---|---|---|---|
      | Input | | | 600x600 |
      | conv1_1 | 32 | 3x3 | 600x600 |
      | conv1_2 | 64 | 3x3 | 600x600 |
      | pool1 | | 2x2/2 | 300x300 |
      | conv2_1 | 128 | 3x3 | 300x300 |
      | conv2_1_1x1 | 64 | 1x1 | 300x300 |
      | conv2_2 | 128 | 3x3 | 300x300 |
      | pool2 | | 2x2/2 | 150x150 |
      | conv3_1 | 256 | 3x3 | 150x150 |
      | conv3_1_1x1 | 128 | 1x1 | 150x150 |
      | conv3_2 | 256 | 3x3 | 150x150 |
      | conv3_3_1x1 | 128 | 1x1 | 150x150 |
      | conv3_3 | 256 | 3x3 | 150x150 |
      | pool3 | | 2x2/2 | 75x75 |
      | conv4_1 | 512 | 3x3 | 75x75 |
      | conv4_1_1x1 | 256 | 1x1 | 75x75 |
      | conv4_2 | 512 | 3x3 | 75x75 |
      | conv4_2_1x1 | 256 | 1x1 | 75x75 |
      | conv4_3 | 512 | 3x3 | 75x75 |
  32. DiDi-Udacity Self-Driving Car Challenge 2017. 3. Training details. Input data: bounding box, classification, orientation. Data augmentation: data normalization, random crops, and horizontal flips. Batch normalization is used.
  33. DiDi-Udacity Self-Driving Car Challenge 2017. 4. H, W, L calculation. Car: length and width from the oriented bounding box (angles α and β, giving L and W); height from the point cloud. Pedestrian: height from the point cloud. (figure: bounding box and object orientation)
  34. DiDi-Udacity Self-Driving Car Challenge 2017. 5. Obstacle status tracking. Car: Unscented Kalman Filter with a CTRV model and its corresponding state vector. Pedestrian: standard Kalman filter with a linear state vector.
  35. DiDi-Udacity Self-Driving Car Challenge 2017. References
      [1] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, "Multi-View 3D Object Detection Network for Autonomous Driving," CVPR 2017.
      [2] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector," ECCV 2016.
      [3].
      [4] Emerging Topics in Computer Vision, edited by G. Medioni and S. B. Kang.
      [5] M. Velas, M. Spanel, Z. Materna, and A. Herout, "Calibration of RGB Camera with Velodyne LiDAR."
      [6] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," arXiv:1502.03167, 2015.
      [7] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," University of Washington / Allen Institute for AI.
  36. DiDi-Udacity Self-Driving Car Challenge 2017. Thank you!
  37. DiDi-Udacity Self-Driving Car Challenge 2017
  38. Robodreams
  39. Didi-Udacity Challenge: Robodreams Team
  40. Architecture. Sensors: 3D lidar (/point_cloud), radar (/tracks), IMU (/imu), GPS (/pos), camera (/image). Lidar object detection produces /lidar_poses; camera object detection using deep learning produces /vision_poses; EKF estimation fuses them into /tracklets (XML).
  41. EKF estimation. Main features: the speed and rotation of the tracked vehicle are considered, and the delay of sensor data is taken into account. The state vector and system model describe the tracked vehicle relative to the ego vehicle (given as figures).
  42. Lidar object detection. Pipeline: remove the ego vehicle → find and remove the ground plane → clustering → select the cluster related to the vehicle → shape alignment around the cluster.
  43. Shape alignment
      • Each particle is a parallelepiped with its own parameters: x, y; width, length, height
      • Particles are generated around the center of a found cluster using a normal distribution
      • Each plane of the parallelepiped has a different weight; the nearest plane has the maximum weight
      • A particle's weight is computed from the distances d_min of the cluster points to the box planes
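A minimal sketch of the particle scoring and search, under several simplifying assumptions: a top-down (2D) view, fixed box dimensions, a Gaussian kernel over point-to-side distances, and no per-plane weighting.

```python
import numpy as np

def particle_weight(points, cx, cy, w, l, sigma=0.2):
    """Score one box particle: cluster points should lie close to one of
    the four box sides (top-down view). sigma is an assumption."""
    dx = np.abs(points[:, 0] - cx)
    dy = np.abs(points[:, 1] - cy)
    d_side = np.minimum(np.abs(dx - l / 2.0), np.abs(dy - w / 2.0))
    return float(np.exp(-d_side ** 2 / (2.0 * sigma ** 2)).mean())

def align_shape(points, w, l, n_particles=200, spread=0.5, seed=0):
    """Generate particle centers around the cluster centroid with a
    normal distribution and keep the best-scoring one."""
    rng = np.random.default_rng(seed)
    cx0, cy0 = points[:, :2].mean(axis=0)
    best, best_w = (cx0, cy0), -1.0
    for cx, cy in rng.normal([cx0, cy0], spread, size=(n_particles, 2)):
        score = particle_weight(points, cx, cy, w, l)
        if score > best_w:
            best, best_w = (cx, cy), score
    return best
```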
  44. Object detection using deep learning (camera). Input (1242x375x3) → VGG up to conv4_3 (156x47x512, with normalization) → VGG up to fc7 (78x24x1024) → extra conv layers (down to 16x2x256). Each scale feeds a detector/classifier and an orientation classifier; fast NMS produces the final detections and the orientation prediction.
  45. Examples: Car I. Detection and tracking with 3D point cloud + radar, IMU + GPS.
  46. Examples: Car II. Detection and tracking with 3D point cloud + radar, IMU + GPS.
  47. Examples: Pedestrian. Detection and tracking with 3D point cloud + radar, IMU + GPS.
  48. Team. Background: research in the field of robotics at Innopolis University. Nonlinear MPC for a race car. Getting ready for Roborace, a competition of autonomous racing cars.
  49. Reflections. Tried different approaches and neural networks. Increased performance by reducing the number of cloud points. Added orientation to the SSD network instead of using a separate CNN for orientation. Sped up the development process thanks to access to a high-performance GPU.
  50. Future work. Improve detection with lidar while staying real-time. Use a larger training dataset to improve the quality of visual detection. Detect the steering-wheel position of a car. Multiple-object tracking in real time.
  51. Thank you!
  52. abccba
  53. DiDi-Udacity Self-Driving Car Challenge. Presenter: Jian Li
  54. Team introduction. Team name: abccba. Team members: Zhenzhe Ying (graduated from Xi'an Jiaotong University; working as an algorithm engineer) and Jian Li (master's student at Nanjing University of Science and Technology; research on deep learning).
  55. Dataset challenges. (1) The lidar point cloud is sparse. (2) The target may be a long distance away. (3) With few points it is hard to distinguish a car from a pedestrian. (4) The camera may not see targets behind or beside the vehicle. (5) Radar captures few object features.
  56. Our solution: a multi-sensor, coarse-to-fine detection framework.
      • Coarse detection: clustering algorithm for the lidar point cloud; Tiny YOLO for camera images; simple central-point rules for radar data
      • Fine location: fine-tune a 3D box for each lidar point cluster
      • Verification: validate current results using history information; interpolate frames and refine the track
  57. YOLO coarse detection. Why YOLO: (1) developed in the C language; (2) one-stage detection; (3) fast and easily deployed. Tiny YOLO network: (1) conv + pooling + FC + multi-loss; (2) redundant code removed; (3) downsized network structure. We train YOLO on the KITTI dataset, detect cars and pedestrians on the DiDi-Udacity dataset, output (x, y, w, l) and categories, and transform the 2D box into a 3D box. [You Only Look Once: Unified, Real-Time Object Detection. J. Redmon, S. Divvala, R. Girshick, A. Farhadi, CVPR 2016]
  58. Coarse detection: point cloud clustering algorithm. Input: lidar point cloud; output: point clusters. (1) Remove the ground and objects that are too high. (2) Sweep-scan the remaining points. (3) Cluster the point cloud into several clusters by spatial distance. (4) Consider each point cluster a candidate.
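Steps (1) and (3) can be sketched as a height filter followed by a single-linkage clustering pass; the height and distance thresholds are illustrative assumptions.

```python
import numpy as np

def cluster_points(points, z_ground=0.3, z_max=3.0, radius=0.7):
    """Drop ground and too-high points, then group the rest into
    clusters by spatial distance (single-linkage BFS in the x-y plane).
    Returns the kept N x 3 points and a cluster label per point."""
    keep = points[(points[:, 2] > z_ground) & (points[:, 2] < z_max)]
    n = len(keep)
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack = [i]
        labels[i] = cluster
        while stack:
            j = stack.pop()
            d = np.linalg.norm(keep[:, :2] - keep[j, :2], axis=1)
            for k in np.where((d < radius) & (labels == -1))[0]:
                labels[k] = cluster
                stack.append(k)
        cluster += 1
    return keep, labels
```

The quadratic neighbor search is fine for a sketch; a production version would use a k-d tree or voxel hashing.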
  59. Fine location: 3D box fine-tuning scheme. (1) Given the few lidar points in a cluster, initialize a central point. (2) For each point cloud cluster, grid-search x, y, z, and yaw around that point. (3) With w, h, l fixed, generate 3D box proposals centered at (x, y, z) in different orientations. (4) Evaluate each proposal and output the one with the highest score, using the evaluation metrics on the next page.
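The grid search over center and yaw can be sketched as follows. The score here (fraction of cluster points inside the oriented box) is a simplifying assumption standing in for the team's evaluation metric, and the search is shown in 2D.

```python
import numpy as np

def refine_box(cluster, l, w, yaw_steps=8, xy_step=0.2, xy_range=1):
    """Grid-search the box center (x, y) and yaw around the cluster
    centroid with fixed dimensions l x w, scoring each proposal by the
    fraction of cluster points inside the oriented box."""
    cx0, cy0 = cluster[:, :2].mean(axis=0)
    offsets = np.arange(-xy_range, xy_range + 1) * xy_step
    best, best_score = None, -1.0
    for yaw in np.linspace(0.0, np.pi, yaw_steps, endpoint=False):
        c, s = np.cos(yaw), np.sin(yaw)
        for dx in offsets:
            for dy in offsets:
                # rotate points into the candidate box frame
                px = cluster[:, 0] - (cx0 + dx)
                py = cluster[:, 1] - (cy0 + dy)
                u = c * px + s * py
                v = -s * px + c * py
                inside = (np.abs(u) <= l / 2.0) & (np.abs(v) <= w / 2.0)
                score = inside.mean()
                if score > best_score:
                    best, best_score = (cx0 + dx, cy0 + dy, yaw), score
    return best, best_score
```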
  60. Fine location: evaluation metrics. N: the number of points in the box; dis: the distance from a point to the surface of the box; f(N): the more points in the box, the better the 3D box; Lmin(V): minimize the volume of the 3D box. Parameters for car (left) and pedestrian (right):

      | m | n | a | b | c |
      |---|---|---|---|---|
      | 2.0 | 1.5 | 2.0 | 0.6 | 1.2 |
  61. Verification. The point cloud may fail to capture far targets (beyond ~35 m), so central-point rules on radar points locate far targets, while lidar handles targets within 35 m. Validate current results using history information; interpolate frames and refine the track.
  62. Simple scene examples
  63. Complex scene examples (dust, car)
  64. Summary
      (1) System design: agile development, easy deployment; low coupling and more flexibility.
      (2) Multi-sensor information ensemble: lidar, radar, camera, and GPS.
      (3) Algorithms: coarse-to-fine detection; CNN adopted for camera images; point cloud reduction and clustering; evaluation criteria designed around the spatial distribution of points.
      (4) Result: 0.43 IoU at 20 Hz on a K80 GPU platform.
      TODO: (1) record the target's speed so tracking can predict the next position more precisely; (2) fuse a small neural-network module for coarse bird-view detection on the point cloud.

      Round 1 scores:
      | Team | Score |
      |---|---|
      | abccba | 0.4333510468 |
      | Robodreams | 0.4097831892 |
      | zbzc | 0.3978965429 |
      | Tea | 0.3914668045 |
      | ICTANS | 0.3463341661 |

      Round 2 scores:
      | Team | Score |
      |---|---|
      | abccba | 0.28531890 |
      | zbzc | 0.23590994 |
      | Roboauto | 0.21162456 |
      | Robodreams | 0.18696818 |
      | Something | 0.17618155 |
  65. Thanks! bird sword