Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution



1. Kaggle Lyft Motion Prediction 4th place solution
2. Competition result: PFN! ● We finished #4 out of 935 teams in a highly competitive field.
3. Competition introduction
4. Competition/Dataset page ● Kaggle: Lyft Motion Prediction for Autonomous Vehicles ● l5kit: https://github.com/lyft/l5kit ● Data HP: https://self-driving.lyft.com/level5/data/
5. Competition introduction: competition scope ● Focus on the “Motion Prediction” part ○ Given a bird’s-eye-view image (no natural images) ○ Predict 3 possible trajectories with confidences. (Image from https://self-driving.lyft.com/level5/data/)
6. Last year’s competition: Lyft 3D Object Detection ● It focused on the “Perception” part ○ https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles ○ Detect cars as 3D objects (Images from https://self-driving.lyft.com/level5/data/ and https://www.kaggle.com/tarunpaparaju/lyft-competition-understanding-the-data)
7. Competition introduction ● Information in the bird’s-eye-view ○ Labels of agents (e.g. car, bicycle, pedestrian...) ○ Status of traffic lights ○ Road information (e.g. pedestrian crossings and directions) ○ Location and timestamp... ● This information can be gathered into a single image using the l5kit library
8. Lyft Level 5 data description ● Total dataset size: 1118 hours, 26344 km ● Route length: 6.8 miles ● Train (89 GB), Validation (11 GB), Test (3 GB) datasets: ○ Big data: approx. 200M, 190K, and 71K agents to predict motion, respectively. (Image from “One Thousand and One Hours: Self-driving Motion Prediction Dataset”, https://arxiv.org/pdf/2006.14480.pdf)
9. EDA: Exploratory Data Analysis

10. EDA using Google Earth ● Route on Google Maps ● Not so long a distance, around the Lyft office (actually, the CNN can “memorize” the place from the image) (Figure: route with station, intersection, and signals annotated; paper figure)
11. EDA using Google Earth ● Many straight roads ● Some complicated intersections...
12. More EDA: no extrapolation found in this dataset ● With more & more EDA, the train/valid/test statistics turn out to be almost the same! No extrapolation found in this dataset... ○ Agent type distribution: CAR 91%, CYCLIST 2%, PEDESTRIAN 7% ○ Date: from October 2019 to March 2020 ○ Time: daytime, from 7am to 7pm ○ Place: all roads are included in train/valid/test ● Less effort was needed on “how to handle & train the data” → pure programming skill & ML techniques were what mattered. (See https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/189516)
13. Technology stack
14. Raw data format ● Structured numpy arrays + zarr are used to save the data on disk (a short sketch follows). ● Structured arrays: https://numpy.org/doc/stable/user/basics.rec.html ● zarr: https://zarr.readthedocs.io/en/stable/ ○ It can save structured arrays on disk
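To make this concrete, here is a minimal sketch of storing a structured numpy array with zarr. The dtype fields below are illustrative, not the exact l5kit schema, and the zarr v2 API is assumed:

```python
import numpy as np
import zarr

# A structured dtype similar in spirit to l5kit's agent records
# (field names here are illustrative, not the exact l5kit schema).
agent_dtype = np.dtype([
    ("centroid", np.float64, (2,)),   # world x, y
    ("extent", np.float32, (3,)),     # length, width, height
    ("yaw", np.float32),
    ("track_id", np.uint64),
])

agents = np.zeros(1000, dtype=agent_dtype)
agents["yaw"] = np.random.uniform(-np.pi, np.pi, size=1000)

# zarr stores the structured array chunked on disk; it can be
# read back lazily without loading everything into RAM.
root = zarr.open("agents.zarr", mode="w")
root.create_dataset("agents", data=agents, chunks=(10_000,))
print(root["agents"][:5]["yaw"])
```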
15. l5kit library ● l5kit is provided as a baseline: https://github.com/lyft/l5kit ○ The (complicated) data preprocessing part is already implemented ○ Rasterizer ■ Semantic → protocol buffers are used inside MapAPI to draw the semantic map ■ Satellite → draws the satellite image ● Most Kaggle competitions: 0 → 1. This competition: 1 → 10. ● Typical pipeline, already supported by l5kit: raw data (zarr; world coordinates in time, extent/size, yaw) → Rasterizer (base implementation provided by Lyft) → image → CNN → predict future coordinates (3 trajectories)
16. Approach
17. Short summary ● Distributed training: 8 V100 GPUs × 5 days for 1 epoch
18. Short summary ● 1. Use train_full.zarr ● 2. Use l5kit==1.1.0 ● 3. Set min_history=0, min_future=10 in AgentDataset ● 4. Cosine annealing decreasing the LR to 0 over 1 epoch of training → that is enough to win a prize! (Private LB: 10.274) ● 5. Ensemble with GMMs (Gaussian Mixture Models) → boosted the score by a further 0.8 (Private LB: 9.475)
19. Solutions
20. Approach/Solution ● How to predict probabilistic behavior? ● We suggested the baseline kernel “Lyft: Training with multi-mode confidence” ○ A single model outputs 3 trajectories with their confidences at the same time ○ Train using the competition evaluation metric directly as the loss ○ The 1st place solution also originated from our approach (link)
21. Approach/Metric ● In this competition, the model outputs 3 hypotheses (trajectories).
– ground truth: $x_1, \dots, x_T$
– hypotheses: $\bar{x}^k_1, \dots, \bar{x}^k_T$ with confidences $c^k$, $k = 1, 2, 3$
● Assume the ground-truth positions are modeled by a mixture of Normal distributions (unit variance):
$$p(x_{1:T}) = \sum_{k=1}^{3} c^k \prod_{t=1}^{T} \mathcal{N}(x_t \mid \bar{x}^k_t, I)$$
● The LB score is calculated by the following metric, and we directly used it as the loss function of the CNN (see the PyTorch sketch below):
$$\mathrm{NLL} = -\log \sum_{k=1}^{3} \exp\!\left( \log c^k - \frac{1}{2} \sum_{t=1}^{T} \lVert x_t - \bar{x}^k_t \rVert^2 \right)$$
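The metric above translates almost line-for-line into PyTorch. Below is a minimal sketch of this multi-mode negative log-likelihood, in the spirit of the public baseline kernel; the tensor shapes and the `avails` availability mask are assumptions about the surrounding pipeline:

```python
import torch

def multi_mode_nll(gt, pred, confidences, avails):
    """Negative log-likelihood of the ground truth under a mixture of
    unit-variance Gaussians (sketch of the competition metric).

    gt:          (bs, T, 2) ground-truth positions
    pred:        (bs, 3, T, 2) three predicted trajectories
    confidences: (bs, 3) mode confidences, already softmax-normalized
    avails:      (bs, T) 1.0 where the GT frame is available, else 0.0
    """
    gt = gt.unsqueeze(1)                # (bs, 1, T, 2)
    avails = avails[:, None, :, None]   # (bs, 1, T, 1)
    # Squared error per mode and timestep, masked by availability.
    error = torch.sum(((gt - pred) * avails) ** 2, dim=-1)              # (bs, 3, T)
    # log c^k - 0.5 * sum_t ||x_t - xbar_t^k||^2, per mode.
    log_prob = torch.log(confidences) - 0.5 * torch.sum(error, dim=-1)  # (bs, 3)
    # log-sum-exp over the 3 modes for numerical stability, then negate.
    return torch.mean(-torch.logsumexp(log_prob, dim=-1))
```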

22. Use the train_full.zarr dataset ● To utilize all possible data → let’s use train_full.zarr without downsampling ○ But the size is big! ○ 89 GB ○ 191,177,863 records with the default setting → distributed training is needed! ※ It was important to use all the data to get a good score in this competition.
23. Distributed training ● torch.distributed is used ○ 8 V100 GPUs × 5 days for 1 epoch ● Practically, AgentDataset needs to be modified to cache its index arrays on disk (a sketch follows) ○ AgentDataset is copied by the DataLoader when num_workers is set ■ 8 processes × 4 num_workers = 32 copies are created ■ The on-memory usage of AgentDataset is huge! It cannot fit in RAM. ● The cumulative_sizes attribute was the bottleneck ○ Cache track_id, scene_index, state_index into zarr to reduce on-memory usage
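The slide names the arrays that were cached; the helper below is a hypothetical illustration of that idea (`cache_indices`/`load_indices` are not l5kit functions). Because zarr reads lazily, forked DataLoader workers share the on-disk arrays through the OS page cache instead of holding 32 private in-RAM copies:

```python
import numpy as np
import zarr

def cache_indices(track_id, scene_index, state_index, path="agent_index.zarr"):
    # Precompute the per-sample lookup arrays once and store them on disk.
    root = zarr.open(path, mode="w")
    root.create_dataset("track_id", data=np.asarray(track_id, dtype=np.int64))
    root.create_dataset("scene_index", data=np.asarray(scene_index, dtype=np.int64))
    root.create_dataset("state_index", data=np.asarray(state_index, dtype=np.int64))

def load_indices(path="agent_index.zarr"):
    # Opened read-only; chunks are loaded lazily on access, so a modified
    # AgentDataset can index into these instead of keeping them in RAM.
    root = zarr.open(path, mode="r")
    return root["track_id"], root["scene_index"], root["state_index"]
```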
24. Use l5kit==1.1.0 ● Pointed out in the “We did it all wrong” discussion: ○ The target_positions values need to be rotated in the same way as the image, as specified by the agent’s “yaw” (Figure: target_positions under l5kit==1.0.6 vs. l5kit==1.1.0)
25. Validation strategy ● Use the chopped dataset: only use the 100th frame from each scene ○ This is how the test data is made ○ But it discards all ground-truth data; instead, set agents_mask in AgentDataset to make the validation data ● Check the validation/test dataset carefully ○ We noticed that it contains at least 10 future frames & 0 history frames → next page
26. Align the training dataset to the validation/test dataset ● Set min_history=0, min_future=10 in AgentDataset (a sketch follows) ○ MOST IMPORTANT! ○ The public LB score jumps to 13.059 here.
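In l5kit these thresholds are the `min_frame_history` and `min_frame_future` keyword arguments of `AgentDataset`. A minimal sketch, assuming the standard l5kit config file and a local copy of the data (the paths are placeholders):

```python
from l5kit.configs import load_config_data
from l5kit.data import ChunkedDataset, LocalDataManager
from l5kit.dataset import AgentDataset
from l5kit.rasterization import build_rasterizer

# Placeholder paths -- point these at your data root and config.
dm = LocalDataManager("/path/to/lyft-data")
cfg = load_config_data("./agent_motion_config.yaml")

zarr_dataset = ChunkedDataset(dm.require("scenes/train_full.zarr")).open()
rasterizer = build_rasterizer(cfg, dm)

# Match how validation/test agents are selected: no history required,
# at least 10 future frames required.
train_dataset = AgentDataset(
    cfg, zarr_dataset, rasterizer,
    min_frame_history=0,
    min_frame_future=10,
)
```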
27. CNN models ● Tried several models ● Worked well: ○ ResNet18 ○ ResNet50 ○ SEResNeXt50 ○ ecaresnet18 ● Did not work well: big, deeper models tended to have worse performance... ○ ResNet101 ○ ResNet152
28. Training with cosine annealing ● Training hyperparameters ○ Batch size 12 × 8 processes ○ Adam optimizer ○ Cosine annealing over 1 epoch (better than exponential decay; see the sketch below)
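A minimal sketch of the schedule with torch’s built-in CosineAnnealingLR; the dummy parameter and step count are placeholders (in practice T_max would be the number of iterations in one pass over train_full.zarr), and the scheduler is stepped per iteration so the LR reaches 0 exactly at the end of the epoch:

```python
import torch

# Dummy parameter just to make the snippet runnable.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=1e-3)

steps_per_epoch = 10_000  # placeholder; use len(train_loader) in practice
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=steps_per_epoch, eta_min=0.0)

for step in range(steps_per_epoch):
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()
    scheduler.step()  # per-iteration step: LR decays smoothly to 0 in 1 epoch
```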
29. Augmentation 1: image-based augmentation ● Used the albumentations library, tried several augmentations (see the sketch below) ○ Tried Cutout, Blur, Downscale ○ Other augmentations used on natural images, e.g. flips, were not appropriate this time ● Only Cutout was adopted for the final model. (Figure: original image vs. Cutout, Blur, Downscale)
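A sketch of such a pipeline with albumentations; the parameter values below are guesses, not the values used in the solution (CoarseDropout is albumentations’ Cutout-style transform):

```python
import albumentations as A
import numpy as np

# Dummy rasterized image just to make the snippet runnable.
image = np.zeros((224, 224, 3), dtype=np.uint8)

transform = A.Compose([
    A.CoarseDropout(max_holes=8, max_height=32, max_width=32, p=0.5),  # Cutout-style
    A.Blur(blur_limit=3, p=0.2),
    A.Downscale(scale_min=0.5, scale_max=0.9, p=0.2),
])

augmented = transform(image=image)["image"]
```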
30. Augmentation 2: rasterizer-level augmentation ● Modified BoxRasterizer to add augmentation ○ 1. Random agent drop ○ 2. Agent extent-size scaling ● We could not find a clear improvement in our experiments; the final model does not use this augmentation... (Figure: several agents are dropped; the host car size is different)
31. Ensemble by GMM and EM algorithm ● How to ensemble models?
○ In this competition, we train models to predict three trajectories (x1, x2, x3) and three confidences (c1, c2, c3).
○ Simple ensemble methods such as averaging do not work.
● Consider the outputs as Gaussian mixture models
○ The outputs can be considered as confidence-weighted GMMs with n_components=3
○ You can take the average of GMMs, and the average of N GMMs takes the form of a GMM with n_components=3N
32. Ensemble by GMM and EM algorithm ● You can get the ensembled outputs by following the steps below (a sketch follows).
○ Sample enough points (e.g. 1000N) from the averaged mixture distribution.
○ Run the EM algorithm with n_components=3 on the sampled points
(we used sklearn.mixture.GaussianMixture).
○ The fitted 3-component GMM is the ensembled output.
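A compact sketch of the procedure: treat each model’s output as a 3-component GMM over the flattened trajectory, average the mixtures, sample, and compress back to 3 components with EM. The helper name, the unit-variance assumption (matching the metric), and the spherical covariance are illustrative choices:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def ensemble_gmm(trajs, confs, n_samples=3000, sigma=1.0, seed=0):
    """trajs: (N_models, 3, T, 2) trajectories; confs: (N_models, 3)
    confidences. Returns 3 ensembled trajectories and their confidences."""
    rng = np.random.default_rng(seed)
    n_models, n_modes, T, _ = trajs.shape
    means = trajs.reshape(n_models * n_modes, T * 2)   # 3N component means
    weights = (confs / n_models).reshape(-1)           # average of the N GMMs
    # Sample from the averaged mixture (unit-variance components).
    comp = rng.choice(len(weights), size=n_samples, p=weights)
    samples = means[comp] + sigma * rng.standard_normal((n_samples, T * 2))
    # Compress the 3N-component mixture back to 3 components via EM.
    gm = GaussianMixture(n_components=3, covariance_type="spherical",
                         random_state=seed).fit(samples)
    return gm.means_.reshape(3, T, 2), gm.weights_
```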
33. Ensemble by GMM and EM algorithm ● Example 1: the loss is reduced dramatically by taking the “average trajectory”!
○ model1: loss = 67.85
○ model2: loss = 77.60
○ ensemble model: loss = 8.26
(Figure: sampling from the GMM, then fitting by the EM algorithm)

34. Ensemble by GMM and EM algorithm ● Example 2: model 1’s loss was very bad, but the ensembled result still gets the benefit of the better predictions from model 2.
○ model1: loss = 3340
○ model2: loss = 68.99
○ ensemble model: loss = 69.69
(Figure: sampling from the GMM, then fitting by the EM algorithm)

35. Final submission ● The final best submission was an ensemble of 9 different models ● That’s all for our solution presentation, thank you!
36. Other approaches & future discussion
37. Findings ● CNN models: a smaller model was enough ○ ResNet18 was enough to get 4th place ○ Tried bigger ResNet101, ResNet152, etc., but performance was worse ● Only 1 epoch of training was enough! ○ Because the data is very big & almost duplicated across consecutive frames ○ It was important to use cosine annealing for the learning-rate schedule ● The Rasterizer (drawing images) is the bottleneck ○ It is a CPU-intensive task; GPU utilization is not 100%. (Typical pipeline: raw data (world coordinates in time, extent/size, yaw) → Rasterizer (base implementation provided by Lyft) → image → CNN → predict future coordinates (3 trajectories))
38. [1st place solution]: l5kit speedup ● https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/201493 ● Optimized the Rasterizer implementation → 8 GPUs × 2 days for 1 epoch ● Hyperparameters with “heavy” training ○ Semantic + satellite images ○ Bigger image: (448, 224) ← (224, 224) ○ num_history: 30 ← 10 ○ min_future: 5 ← 10 ○ Modified agent-filter threshold ○ batch_size: 64, etc. ● Pre-train on small images for 4 epochs → fine-tune on big images for 1 epoch ○ It was very effective
39. Other interesting approaches: VectorNet ● The 10th place solution used a GNN-based method called VectorNet [Gao+, CVPR 2020] ○ Faster training & inference ■ They did not use rasterized images at all ■ 11 GPU-hours for 1 epoch (our CNN needs about 960 GPU-hours) ○ Comparable performance to CNN-based methods

40. Appendix 1: Data analysis / Error analysis

41. The diversity of the 3 trajectories ● How different are the 3 trajectories generated by the CNN models? ● Case 1: different directions ○ The CNN can predict different possible ways/directions that agents may move in the future.
42. The diversity of the 3 trajectories ● How different are the 3 trajectories generated by the CNN models? ● Case 2: different speed or start time ○ Even when the direction is straight, the CNN can predict different possible speeds/accelerations for the agents’ future motion.
43. Appendix 2: What we tried that did not work

44. Hyperparameter changes ● raster_size (image size) ○ Tried 224×224 & 128×128 ○ The default 224×224 was better ● pixel_size ○ Tried 0.5, 0.25, 0.15 ○ The default 0.5 was better ● num_history-specific models ○ Short-history model: tried training a 0-history model → the performance was not better than the original model ○ Long-history model: tried 10, 14, 20; the default 10 was better in our experiments (but the 1st place solution used num_history=30)
45. Custom Rasterizer 1: VelocityBoxRasterizer ● Added velocity arrows to the BoxRasterizer (a sketch follows)
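The drawing step might look like the helper below; it is a hypothetical illustration, not the solution’s actual VelocityBoxRasterizer. A real subclass of l5kit’s BoxRasterizer would first map agent centroids and velocities into raster pixel coordinates via the rasterizer’s world-to-image transform; here they are passed in directly:

```python
import cv2
import numpy as np

def draw_velocity_arrows(image, centroids_px, velocities_px, color=(0, 0, 255)):
    # Draw each agent's velocity as an arrow, in raster pixel coordinates.
    img = image.copy()
    for (x, y), (vx, vy) in zip(centroids_px, velocities_px):
        cv2.arrowedLine(img, (int(x), int(y)), (int(x + vx), int(y + vy)),
                        color, thickness=1, tipLength=0.3)
    return img

# Toy example: one agent at the image center moving right.
canvas = np.zeros((224, 224, 3), dtype=np.uint8)
out = draw_velocity_arrows(canvas, [(112, 112)], [(20, 0)])
```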
46. Custom Rasterizer 2: ChannelSemanticRasterizer ● Original SemanticRasterizer: the semantic image is drawn as an RGB image ● ChannelSemanticRasterizer: ○ Separate channels for the road, lanes, green/yellow/red signals & crosswalks ● Somehow, the training performance was worse than with the original SemanticRasterizer...
47. Custom Rasterizer 3: TLSemanticRasterizer ● We thought the length of a red signal is important for predicting when a stopped agent will start moving in the future. ● This semantic rasterizer changes its pixel values based on how long the signal has continued in the history.
48. Custom Rasterizer 4: AgentTypeBoxRasterizer ● Draw each agent type in a different color/channel ○ CAR = blue ○ CYCLIST = yellow ○ PEDESTRIAN = red ○ UNKNOWN = gray ● Unknown-type agents are also drawn
49. Multi-agent prediction model ● Predict all agents’ future coordinates at once, from 1 image ● Used semantic segmentation models (segmentation-models-pytorch) ● Stopped this investigation because agents sometimes exist very far from the host car. (Image from https://self-driving.lyft.com/level5/data/)
50. Yaw correction ● What kind of data causes the seriously big errors? ● When the “yaw” annotation is wrong, the predicted & actual directions become different! (Figure examples: loss = 43988, 30962, 10818) ● Does fixing the data’s yaw field improve the total score? ○ YES for the validation dataset (see the figure; a sketch of one possible fix follows). ○ NO for the test dataset; the yaw annotation seems wrong only for stopped cars. ● In a real application, I guess this is a very important problem to consider...
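One possible yaw fix, sketched under assumptions (this is illustrative, not the solution’s actual correction code): re-estimate the heading from consecutive centroids when the agent is clearly moving, and keep the original annotation when it is nearly stopped, since motion gives no heading signal there:

```python
import numpy as np

def yaw_from_motion(centroids, yaw_annotated, min_speed=0.5, fps=10.0):
    """centroids: (T, 2) world positions; yaw_annotated: (T,) annotated yaws.
    Returns corrected yaws for frames 1..T-1."""
    deltas = np.diff(centroids, axis=0)               # (T-1, 2) displacements
    speeds = np.linalg.norm(deltas, axis=1) * fps     # m/s, data is ~10 Hz
    yaw_est = np.arctan2(deltas[:, 1], deltas[:, 0])  # heading of motion
    yaw_fixed = yaw_annotated[1:].copy()
    moving = speeds > min_speed
    yaw_fixed[moving] = yaw_est[moving]               # trust motion when moving
    return yaw_fixed
```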
51. References ● Kaggle page: Lyft Motion Prediction for Autonomous Vehicles ● Data HP: https://self-driving.lyft.com/level5/data/ ● Solution discussion: Lyft Motion Prediction for Autonomous Vehicles ● Solution code: https://github.com/pfnet-research/kaggle-lyft-motion-prediction-4th-place-solution
