Kaggle Lyft Motion Prediction
4th place solution
● We finished #4 out of 935 teams in a highly competitive field.
Competition result:
PFN!
Competition introduction
● Kaggle: Lyft Motion Prediction for Autonomous Vehicles

● l5kit / Data homepage: Data - Lyft

Competition/Dataset page
● Focus on “Motion Prediction” part
○ Given a bird's-eye-view image (no natural images)
○ Predict 3 possible trajectories with confidences.
Competition introduction
Competition Scope Image from https://self-driving.lyft.com/level5/data/
● It focused on the "Perception" part
○ https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles
○ Detect cars as 3D objects
Last year competition: Lyft 3D Object Detection
Image from https://self-driving.lyft.com/level5/data/ Image from https://www.kaggle.com/tarunpaparaju/lyft-competition-understanding-the-data
● Information in the bird-eye-view
○ Labels of agents (e.g. car, bicycle and pedestrian...)
○ Status of traffic light
○ Road information (e.g. pedestrian crossings and direction)
○ Location and timestamp...
Competition introduction
This information can be gathered into a single image using the l5kit library
● Total dataset size: 1118 hours, 26344 km
● Route length: 6.8 miles
● Train (89 GB), Validation (11 GB), Test (3 GB) datasets:
○ Big data: approx. 200M / 190K / 71K agents, respectively, whose motion must be predicted.
Lyft level5 Data description
Image from https://arxiv.org/pdf/2006.14480.pdf
“One Thousand and One Hours: Self-driving Motion Prediction Dataset”
EDA
Exploratory Data Analysis

● Route on Google Maps
● Not a long distance, around the Lyft office (in fact, a CNN can "memorize" the place from the image)
EDA using google earth
[Map annotations: 1. Station, 2. Intersection (paper fig.), 3. Signals]
● Many straight roads
● Some complicated intersections...
EDA using google earth
● More & more EDA: the train/valid/test statistics are almost the same!
No extrapolation found in this dataset…
○ Agent type distribution: CAR 91%, CYCLIST 2%, PEDESTRIAN 7%
○ Date: from October 2019 to March 2020
○ Time: daytime, from 7am to 7pm
○ Place: all roads are included in train/valid/test
● Less effort was needed on "how to handle & train on the data"
→ Pure programming skill & ML techniques were what mattered.
More EDA, No extrapolation found in this dataset...
[Figures: dataset distributions over time of day and over date]
https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/189516
Technology stack
● Structured numpy array + zarr is used to save data on disk.
● structured array: https://numpy.org/doc/stable/user/basics.rec.html
Raw Data format
● zarr: https://zarr.readthedocs.io/en/stable/
○ It can save structured arrays on disk
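As a rough illustration of this storage layout, here is a minimal sketch (field names are illustrative, not the exact l5kit schema) of persisting a structured numpy array with zarr:

```python
import numpy as np
import zarr

# Illustrative record layout; the real l5kit schema has more fields.
agent_dtype = np.dtype([
    ("centroid", np.float64, (2,)),  # world coordinates
    ("extent", np.float32, (3,)),    # agent size
    ("yaw", np.float32),             # heading angle
])
agents = np.zeros(1000, dtype=agent_dtype)  # dummy records

# zarr persists the structured array on disk, chunked (and compressed).
root = zarr.open("agents.zarr", mode="w")
root.create_dataset("agents", data=agents, chunks=(256,))

# Random access later, without loading the whole array into RAM.
loaded = zarr.open("agents.zarr", mode="r")["agents"]
print(loaded[0]["yaw"])
```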
● l5kit is provided as baseline: https://github.com/lyft/l5kit
○ The (complicated) data preprocessing part is already implemented
○ Rasterizer
■ Semantic → protocol buffers are used inside MapAPI to draw the semantic map
■ Satellite → draws satellite imagery
● Most Kaggle competitions: build from 0 → 1
This competition: improve from 1 → 10
L5kit library
[Diagram: raw data (zarr: world coordinates over time, extent (size), yaw)
→ Rasterizer (base implementation provided by Lyft) → Image
→ CNN → predict future coordinates (3 trajectories)]
Typical approach already supported by l5kit
Approach
Short Summary
● Distributed training: 8 V100 GPUs * 5 days for 1 epoch
● 1. Use train_full.zarr
● 2. l5kit==1.1.0
● 3. Set min_history=0, min_future=10 in AgentDataset
● 4. Cosine annealing that decays the LR to 0 over the 1 epoch of training
→ That alone was enough to win a prize! (Private LB: 10.274)
● 5. Ensemble with GMMs (Gaussian Mixture Models)
→ Further boosted the score by 0.8 (Private LB: 9.475)
Short Summary
Solutions
● How to predict probabilistic behavior?
● We published the baseline kernel "Lyft: Training with multi-mode confidence"
○ A single model outputs 3 trajectories together with their confidences
○ Train directly on the competition's evaluation metric as the loss
○ The 1st place solution also originated from our approach (link)
Approach/Solution:
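A minimal sketch of this multi-mode output head (our naming and layer sizes, not the kernel's exact code; the input channel count depends on the rasterizer config):

```python
import torch
import torch.nn as nn
import torchvision

class MultiModeModel(nn.Module):
    """One backbone predicts 3 trajectories plus their confidences
    in a single forward pass."""

    def __init__(self, in_channels=25, future_len=50, num_modes=3):
        super().__init__()
        self.backbone = torchvision.models.resnet18(num_classes=512)
        # The rasterized input has many channels (history + map), so swap conv1.
        self.backbone.conv1 = nn.Conv2d(in_channels, 64, 7, 2, 3, bias=False)
        self.num_modes, self.future_len = num_modes, future_len
        self.head = nn.Linear(512, num_modes * (2 * future_len + 1))

    def forward(self, x):
        out = self.head(self.backbone(x))
        batch = x.shape[0]
        conf = torch.softmax(out[:, : self.num_modes], dim=1)          # (B, 3)
        traj = out[:, self.num_modes:].view(
            batch, self.num_modes, self.future_len, 2)                 # (B, 3, T, 2)
        return traj, conf
```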
Approach/Metric:
• In this competition, the model outputs 3 hypotheses (trajectories).

– ground truth: the future coordinates $x_1, \ldots, x_T$

– hypotheses: trajectories $\hat{x}^{(k)}_1, \ldots, \hat{x}^{(k)}_T$ with confidences $c^{(k)}$, $k = 1, 2, 3$

• Assume the ground truth positions to be modeled by a mixture of Normal distributions, one unit-variance component per hypothesis:

$$p(x_1, \ldots, x_T) = \sum_{k=1}^{3} c^{(k)} \prod_{t=1}^{T} \mathcal{N}\left(x_t \mid \hat{x}^{(k)}_t, I\right)$$

• The LB score is the resulting negative log-likelihood, and we directly used it as the loss function of the CNN:

$$\mathcal{L} = -\log \sum_{k=1}^{3} \exp\left(\log c^{(k)} - \frac{1}{2} \sum_{t=1}^{T} \left\lVert x_t - \hat{x}^{(k)}_t \right\rVert^2\right)$$
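A minimal PyTorch sketch of this loss, in the spirit of the baseline kernel (tensor shapes are our convention; the availability mask zeroes out frames without ground truth):

```python
import torch

def multimode_nll_loss(gt, pred, confidences, avails):
    """Negative log-likelihood of the ground truth under a mixture of
    unit-variance Gaussians (the competition metric).

    gt:          (B, T, 2) ground-truth coordinates
    pred:        (B, 3, T, 2) the 3 predicted trajectories
    confidences: (B, 3) softmax-normalized mode confidences
    avails:      (B, T) 1 where the ground truth is available
    """
    # Squared error per mode, masked by availability.
    err = ((gt.unsqueeze(1) - pred) ** 2).sum(-1)       # (B, 3, T)
    err = (err * avails.unsqueeze(1)).sum(-1)           # (B, 3)
    # log-sum-exp over modes for numerical stability.
    log_lik = torch.logsumexp(torch.log(confidences) - 0.5 * err, dim=1)
    return -log_lik.mean()
```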

● To utilize all possible data → use train_full.zarr without downsampling
○ But it is big!
○ 89 GB
○ 191,177,863 records with the default settings
→ Distributed training is needed!
※ Using all the data was important for getting a good score in this competition.
Use train_full.zarr dataset
● torch.distributed is used
○ 8 V100 GPUs * 5 days for 1 epoch
● In practice, AgentDataset had to be modified to cache its index arrays on disk
○ AgentDataset is copied per worker in DataLoader when num_workers is set
■ 8 processes * 4 num_workers = 32 copies are created
■ The in-memory footprint of AgentDataset is huge! It cannot fit in RAM.
● The cumulative_sizes attribute was the bottleneck
○ Cache track_id, scene_index, state_index into zarr to reduce
the in-memory usage (see the sketch below)
Distributed training
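A minimal sketch of the caching idea (class and field names are ours, not l5kit's exact API): precompute the index arrays once, store them in zarr, and let each DataLoader worker read them lazily from disk instead of holding full copies in RAM.

```python
import zarr
from torch.utils.data import Dataset

class CachedIndexAgentDataset(Dataset):
    """Sketch: keep (track_id, scene_index, state_index) in zarr on disk,
    so the per-worker copies of the dataset stay lightweight."""

    def __init__(self, cache_path):
        root = zarr.open(cache_path, mode="r")
        self.track_id = root["track_id"]        # read lazily, chunk by chunk
        self.scene_index = root["scene_index"]
        self.state_index = root["state_index"]

    def __len__(self):
        return len(self.track_id)

    def __getitem__(self, i):
        # Locate the agent in the raw data; rasterization etc. would follow here.
        return (int(self.track_id[i]),
                int(self.scene_index[i]),
                int(self.state_index[i]))
```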
● Pointed out in the "We did it all wrong" discussion:
○ The target_positions values need to be rotated consistently with the image,
using the agent's "yaw" (a sketch of the fix follows the figure)
Use l5kit==1.1.0
[Figure: target_positions in l5kit==1.0.6 vs. l5kit==1.1.0]
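A minimal sketch of the coordinate fix (our helper, not l5kit code): rotate the world-frame displacements by -yaw so the targets live in the same agent-centric frame as the rasterized image.

```python
import numpy as np

def rotate_to_agent_frame(target_positions: np.ndarray, yaw: float) -> np.ndarray:
    """target_positions: (T, 2) displacements in the world frame; yaw in radians."""
    c, s = np.cos(-yaw), np.sin(-yaw)
    rotation = np.array([[c, -s], [s, c]])
    # Each row p is mapped to R @ p, i.e. rotated into the agent frame.
    return target_positions @ rotation.T
```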
● Use the chopped dataset: only the 100th frame of each scene is used
○ This is how the test data was made
○ But chopping discards all the ground-truth data;
instead, set agent_mask in AgentDataset to build the validation data
● Check the validation/test datasets carefully
○ We noticed that every target agent has at least 10 future frames & possibly 0 history frames
→ Next page
Validation strategy
● Set min_history=0, min_future=10 in AgentDataset (sketch below)
○ MOST IMPORTANT!
○ The public LB score jumps to 13.059 here.
Align training dataset to validation/test dataset
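In l5kit these correspond to the min_frame_history / min_frame_future arguments of AgentDataset; a minimal sketch, assuming cfg, train_zarr, and rasterizer are prepared as in the l5kit docs:

```python
from l5kit.dataset import AgentDataset

# cfg, train_zarr, rasterizer: built as in the l5kit documentation.
train_dataset = AgentDataset(
    cfg, train_zarr, rasterizer,
    min_frame_history=0,   # test agents may have no history frames
    min_frame_future=10,   # test agents always have >= 10 future frames
)
```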
● We tried several models
● Worked well:
○ ResNet18
○ ResNet50
○ SEResNeXt50
○ ecaresnet18
● Not working well: bigger, deeper models tended to perform worse...
○ ResNet101
○ ResNet152
CNN Models
● Training hyperparameters
○ Batch size 12 * 8 processes
○ Adam optimizer
○ Cosine annealing over 1 epoch (better than exponential decay); see the sketch below
Training with cosine annealing
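A minimal sketch of the schedule (model and step count are illustrative): cosine annealing stepped per iteration, so the LR reaches 0 exactly at the end of the single epoch.

```python
import torch

model = torch.nn.Linear(8, 2)  # placeholder for the CNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

steps_per_epoch = 100_000      # illustrative; = dataset size / global batch size
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=steps_per_epoch, eta_min=0.0)

for step in range(steps_per_epoch):
    # ... forward, loss.backward() ...
    optimizer.step()
    scheduler.step()           # decay the LR a little every iteration
```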
● Used the albumentations library and tried several augmentations (sketch below)
○ Tried Cutout, Blur, Downscale
○ Other augmentations common for natural images, e.g. flips, were not appropriate this time
● Only Cutout was adopted for the final model.
Augmentation: 1. Image based augmentation
[Figure: original image vs. Cutout, Blur, Downscale]
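A minimal albumentations sketch of what was kept vs. dropped (hole sizes and probabilities are illustrative assumptions):

```python
import albumentations as A
import numpy as np

# Only the Cutout-style dropout survived our experiments; Blur and Downscale
# are commented out because they were tried but not adopted.
transform = A.Compose([
    A.CoarseDropout(max_holes=8, max_height=16, max_width=16, p=0.5),  # Cutout
    # A.Blur(blur_limit=7, p=0.5),
    # A.Downscale(scale_min=0.25, scale_max=0.5, p=0.5),
])

image = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)  # dummy raster
augmented = transform(image=image)["image"]
```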
● Modified BoxRasterizer to add augmentation
○ 1. Random agent drop
○ 2. Agent extent (size) scaling
● We could not find a clear improvement in our experiments,
so the final model does not use this augmentation...
Augmentation: 2. Rasterizer level augmentation
[Figures: several agents are dropped; the host car size is different]
● How to ensemble models?
○ In this competition, we train a model to predict three trajectories (x1, x2, x3) and
three confidences (c1, c2, c3).
○ Simple ensemble methods such as averaging the trajectories do not work.
● Consider the outputs as Gaussian mixture models
○ The outputs can be considered as a confidence-weighted GMM with
n_components=3:
$$p(x) = \sum_{k=1}^{3} c_k \, \mathcal{N}(x \mid x_k, I)$$
○ You can take the average of GMMs, and the average of N such GMMs is itself a GMM
with n_components=3N:
$$\bar{p}(x) = \frac{1}{N} \sum_{m=1}^{N} p_m(x)$$
Ensemble by GMM and EM algorithm
● You can get the ensembled outputs from the averaged GMM $\bar{p}$ by
following the steps below (a code sketch follows).
○ Sample enough points (e.g. 1000N) from the distribution $\bar{p}$.
○ Run the EM algorithm with n_components=3 on the sampled points
(we used sklearn.mixture.GaussianMixture).
○ Let the fitted means and weights be the output of the EM algorithm:
the ensembled trajectories and confidences.
Ensemble by GMM and EM algorithm
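A minimal sketch of the whole procedure (array shapes, sigma, and sample counts are our assumptions): flatten each (T, 2) trajectory to a 2T-dimensional mean, sample from the averaged mixture, and refit with 3 components.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_ensemble(trajs, confs, n_samples=1000, sigma=1.0, seed=0):
    """trajs: (N_models, 3, T, 2) trajectories; confs: (N_models, 3) confidences.
    Returns 3 ensembled trajectories and their confidences."""
    rng = np.random.default_rng(seed)
    n_models, _, T, _ = trajs.shape
    samples = []
    for m in range(n_models):
        # Pick modes according to this model's confidences, then add
        # unit-variance Gaussian noise around the chosen (flattened) trajectory.
        ks = rng.choice(3, size=n_samples, p=confs[m] / confs[m].sum())
        means = trajs[m, ks].reshape(n_samples, T * 2)
        samples.append(means + sigma * rng.standard_normal(means.shape))
    points = np.concatenate(samples)  # samples from the averaged GMM

    gm = GaussianMixture(n_components=3, covariance_type="spherical",
                         random_state=seed).fit(points)
    return gm.means_.reshape(3, T, 2), gm.weights_
```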




[Figure: sampling from the GMM → fitting by the EM algorithm;
model1: loss=67.85 / model2: loss=77.60 / ensemble model: loss=8.26]
● Example 1: the loss was reduced dramatically by taking the "average trajectory"!
Ensemble by GMM and EM algorithm





[Figure: sampling from the GMM → fitting by the EM algorithm;
model1: loss=3340 / model2: loss=68.99 / ensemble model: loss=69.69]
● Example 2: model 1's loss was very bad; the ensembled result still benefits
from model 2's better predictions.
Ensemble by GMM and EM algorithm

● The final best submission was an ensemble of 9 different models
● That's all for our solution presentation, thank you!
Final submission
Other approach &
Future discussion
● CNN models: a smaller model was enough
○ ResNet18 was enough to get 4th place
○ Tried bigger ResNet101, ResNet152, etc... but performance was worse
● Only 1 epoch of training was enough!
○ Because the data is very big & consecutive frames are nearly duplicates
○ Using cosine annealing for the learning-rate schedule was important
● The Rasterizer (drawing images) is the bottleneck
○ It is a CPU-intensive task, so GPU utilization does not reach 100%.
Findings
[Diagram: raw data (world coordinates over time, extent (size), yaw)
→ Rasterizer (base implementation provided by Lyft) → Image
→ CNN → predict future coordinates (3 trajectories)]
Typical approach
● https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/201493
● Optimized the Rasterizer implementation
→ 8 GPUs * 2 days for 1 epoch
● Hyperparameters with "heavy" training
○ Semantic + satellite images
○ Bigger images: (448, 224) ← (224, 224)
○ num_history: 30 ← 10
○ min_future: 5 ← 10
○ Modified agent filter threshold
○ batch_size: 64
etc...
● Pre-train on small images for 4 epochs → fine-tune on big images for 1 epoch
○ This was very effective
[1st place solution] : L5kit Speedup
● The 10th place solution used a GNN-based method called VectorNet
○ Faster training & inference
■ They did not use rasterized images at all
■ 11 GPU-hours for 1 epoch (our CNN needs about 960 GPU-hours)
○ Comparable performance to CNN-based methods
Other interesting approaches: VectorNet
VectorNet [Gao+, CVPR2020]

Appendix1
Data analysis/Error analysis

● How different are the 3 trajectories generated by the CNN models?
● Case 1: different directions
○ The CNN can predict the different possible ways/directions in which agents may move
in the future.
The diversity of the 3 trajectories
● How different are the 3 trajectories generated by the CNN models?
● Case 2: different speed or start time
○ Even when the direction is straight, the CNN can predict the different possible
speeds/accelerations with which agents may move in the future.
The diversity of the 3 trajectories
Appendix2
What we tried that did not work

● raster_size (image size)
○ Tried 224x224 & 128x128
○ The default 224x224 was better
● pixel_size
○ Tried 0.5, 0.25, 0.15
○ The default 0.5 was better
● num_history-specific models
○ Short-history model:
■ Tried training a 0-history model
→ its performance was not better than the original model's
○ Long-history model:
■ Tried 10, 14, 20
■ The default 10 was better in our experiments
(but the 1st place solution used num_history=30)
Hyperparameter changes
● Added a velocity arrow to the BoxRasterizer
Custom Rasterizer: 1. VelocityBoxRasterizer
● The original SemanticRasterizer draws the semantic image as an RGB image
Custom Rasterizer: 2. ChannelSemanticRasterizer
● ChannelSemanticRasterizer:
○ Separate channels for road, lane, green/yellow/red signals & crosswalks
Somehow, the training performance was worse than with the original SemanticRasterizer...
● We thought the red-signal duration would be important for predicting when a stopped
agent starts moving in the future.
● This SemanticRasterizer changes the signal's value according to how long the signal
has persisted in the history.
Custom Rasterizer: 3. TLSemanticRasterizer
● Draw each agent type in a different color/channel
○ CAR = blue
○ CYCLIST = yellow
○ PEDESTRIAN = red
○ UNKNOWN = gray
● Unknown-type agents are also drawn
Custom Rasterizer: 4. AgentTypeBoxRasterizer
● Predict all agents' future coordinates at once, from 1 image
● Using semantic segmentation models (segmentation-models-pytorch)
● We stopped investigating because agents sometimes appear very far from the host car.
Multi-agent prediction model
Image from https://self-driving.lyft.com/level5/data/
● What kind of data causes seriously large errors?
● When the "yaw" annotation is wrong, the predicted & actual directions diverge!
● Does fixing the data's yaw field improve the total score?
○ YES for the validation dataset (see below).
○ NO for the test dataset; there, the yaw annotation seems wrong only for stopped cars.
● In real applications, we suspect this is a very important problem to consider...
Yaw correction
[Figures: Loss=43988, Loss=30962, Loss=10818]
● Kaggle page: Lyft Motion Prediction for Autonomous Vehicles
● Data homepage: https://self-driving.lyft.com/level5/data/
● Solution discussion: Lyft Motion Prediction for Autonomous Vehicles
● Solution code: https://github.com/pfnet-research/kaggle-lyft-motion-prediction-4th-place-solution
References