DSD-INT 2018 Realtime classification of lidar pointclouds - Pronk
Presentation by Maarten Pronk (Deltares) at the Data Science Symposium 2018, during Delft Software Days - Edition 2018. Thursday 15 November 2018, Delft.

  1. Realtime classification of pointclouds
     DSD-INT 2018
     Maarten Pronk
  2. Index
     • Pointclouds
     • Applications
     • Issues
     • Classification
     • Feature engineering
     • “Realtime”
     • Streaming algorithms
  3. Pointclouds
     • X Y Z
     • Millions, billions
     • LiDAR
     • ALS, TLS
  4. Applications
     Digital Terrain Maps (DTM)
     Forestry management
     Forests in Indonesia
     • DTM, CHM, water detection, canal depth detection
     Resilient infrastructure
     • Anywhere
     • Roads
  5. Issues
     GBs of data (AHN2 is several TB)
     Larger than memory (tiling)
     Classification is required for derived products
     • Filtering for ground
     • Normalizing for tree height
     We want faster results, doable on your laptop
  6. Solution?
     Streaming approach
     • Process only a few points at a time
     • Skip tiling
     • Only local operations
     Machine learning
     • Training on existing datasets
     • Classification is done instantly
     • Does it generalize?
  7. Machine learning
     All about the data (features)
     AHN2 dataset
     • Ground, buildings and water
     • No roads, nor trees (up to 50%!)
     Vaihingen 3D labeled dataset
     • ASPRS reference dataset
     • Many classifications but small
     Semantic 3D dataset
     • TLS reference datasets for NN
     • Very large, density issues
  8. Features
     • XYZ
     • Intensity
     • # return | total returns
     Not enough to do multi-label classification
  9. Features
     Derived features
     • Height above ground
     • Principal Component Analysis
     • K nearest neighbours
     • Geometric distribution
     • Different scales (radius)
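The derived features on slide 9 depend on a neighbourhood around each point, evaluated at several radii. The following is a minimal sketch of that idea, not the implementation from the talk: the function names, the brute-force search, and the use of the lowest neighbour as a stand-in for ground level are all assumptions made for illustration.

```python
import math

def neighbours_within(points, centre, radius):
    """Return all points within `radius` of `centre` (brute-force 3D search)."""
    return [p for p in points if math.dist(p, centre) <= radius]

def multiscale_features(points, centre, radii=(1.0, 2.0, 4.0)):
    """Compute simple per-radius features for one point.

    Assumption: the lowest neighbour approximates the local ground level,
    so `height_above_lowest` is only a crude proxy for height above ground.
    """
    features = {}
    for r in radii:
        nbrs = neighbours_within(points, centre, r)
        lowest_z = min((p[2] for p in nbrs), default=centre[2])
        features[r] = {
            "count": len(nbrs),
            "height_above_lowest": centre[2] - lowest_z,
        }
    return features

# Tiny hypothetical cloud: four points near the ground, one 3 m above it.
cloud = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.1), (1.5, 0.0, 0.0),
         (0.0, 0.0, 3.0), (3.0, 0.0, 0.2)]
feats = multiscale_features(cloud, centre=(0.0, 0.0, 3.0), radii=(1.0, 4.0))
print(feats)
```

At the small radius the elevated point only sees itself, so the feature carries no information; at the larger radius the ground points come into view. This is one reason a single scale is rarely enough.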
  10. Features
      PCA
      • Three values?
      • Omnivariance
      • $\sqrt[3]{l_1 l_2 l_3}$
      • 0.98, 0.01, 0.01 → 0.05 (1D)
      • 0.48, 0.48, 0.04 → 0.21 (2D)
      • 0.33, 0.33, 0.33 → 0.33 (3D)
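As a quick check of the omnivariance values on slide 10, here is a minimal sketch. It assumes the measure is the cube root of the product of the three normalised PCA eigenvalues, which is the usual definition; the function names are illustrative. Under that assumption the planar (2D) case evaluates to about 0.21.

```python
def normalise_eigenvalues(e1, e2, e3):
    """Scale raw eigenvalues so that they sum to one."""
    total = e1 + e2 + e3
    return e1 / total, e2 / total, e3 / total

def omnivariance(l1, l2, l3):
    """Cube root of the product of the three normalised eigenvalues."""
    return (l1 * l2 * l3) ** (1.0 / 3.0)

# The three eigenvalue triples from the slide: linear, planar, volumetric.
cases = {
    "1D": (0.98, 0.01, 0.01),
    "2D": (0.48, 0.48, 0.04),
    "3D": (0.33, 0.33, 0.33),
}
results = {name: round(omnivariance(*eig), 2) for name, eig in cases.items()}
print(results)
```

A single value that grows from line-like to plane-like to volume-like neighbourhoods is what makes this a convenient feature compared with passing all three eigenvalues to the classifier.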
  11. Feature selection before training
      Which scales to use?
      • More radii
      • Compute time
      Importance analysis
      • Pairplots
  12. Feature selection after training
      PCA is not enough
      • Flat surfaces?
      Importance analysis
      • Times used in tree
      • Confusion matrices
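Slide 12 mentions confusion matrices as a way to see which classes the trained model mixes up. A minimal sketch of how such a matrix and a per-class recall could be computed is shown here; the labels and predictions are invented for illustration and are not results from the talk.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Return (classes, matrix) with rows = true class, columns = predicted class."""
    counts = Counter(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    matrix = [[counts[(t, p)] for p in classes] for t in classes]
    return classes, matrix

def per_class_recall(classes, matrix):
    """Fraction of each true class that was predicted correctly."""
    recall = {}
    for i, (cls, row) in enumerate(zip(classes, matrix)):
        total = sum(row)
        recall[cls] = row[i] / total if total else 0.0
    return recall

# Hypothetical labels: one car is mistaken for road.
truth = ["road", "road", "car", "car", "tree", "tree"]
pred = ["road", "road", "road", "car", "tree", "tree"]

classes, matrix = confusion_matrix(truth, pred)
recall = per_class_recall(classes, matrix)
print(classes, matrix, recall)
```

Off-diagonal entries point at the class pairs where an extra feature might help, for example something that separates flat surfaces that PCA alone cannot tell apart.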
  13. Current results (original)
  14. Current results (classified)
  15. Current results
      Good:
      • Roads
      • Shrubs
      • Roofs
      • Facades
      Ok:
      • Low vegetation
      • Trees
      Bad:
      • Cars
      • Fences
  16. Streaming approach
      First pass
      • Determine raster based on bounding box
      • Determine raster cells for all points (index)
        > [1,1,2,2,2,3,4,3,4,4,4]
      • Determine number of points for each cell
        > 2 3 2 0 0 .. 4 .. .. ..
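The first pass on slide 16 can be sketched as follows. This is an assumption-laden illustration rather than the talk's code: the function names, the row-major cell numbering starting at zero, and holding the points in a list (instead of streaming them from a file) are all choices made for brevity.

```python
from collections import Counter

def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) of the point set."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def cell_index(point, bbox, cell_size, n_cols):
    """Map a point to a row-major raster cell number (zero-based)."""
    min_x, min_y, _, _ = bbox
    col = int((point[0] - min_x) // cell_size)
    row = int((point[1] - min_y) // cell_size)
    return row * n_cols + col

def first_pass(points, cell_size):
    """Build the raster, the per-point cell index, and the per-cell counts."""
    bbox = bounding_box(points)
    n_cols = int((bbox[2] - bbox[0]) // cell_size) + 1
    index = [cell_index(p, bbox, cell_size, n_cols) for p in points]
    counts = Counter(index)
    return bbox, n_cols, index, counts

# Hypothetical points in a 2 x 2 raster with 1 m cells.
points = [(0.0, 0.0, 0.0), (0.4, 0.2, 0.0), (1.2, 0.3, 0.0),
          (1.5, 0.5, 0.0), (1.9, 0.9, 0.0), (0.2, 1.5, 0.0)]
bbox, n_cols, index, counts = first_pass(points, cell_size=1.0)
print(index, dict(counts))
```

The counts are what make the second pass possible: once a cell has received as many points as were counted for it, no further points for that cell will arrive.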
  17. Streaming approach
      Second pass
      • Start storing points in memory
      • Until one cell is completely full
      • Process all points in cell and classify
      • Write results to disk
      • Remove points from memory
      • Spatial coherence (ordering)
      1,2 3,4,5 5,7 .. .. .. 6,8,9, …
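The second pass on slide 17 can be sketched as a loop that buffers points per cell and flushes a cell as soon as it is full. The sketch below reuses the example cell index from slide 16; the function names, the placeholder classifier, and the in-memory `write` callback standing in for disk output are assumptions for illustration.

```python
from collections import Counter, defaultdict

def second_pass(points, index, counts, classify_cell, write):
    """Stream points, flushing each cell once all of its points have arrived.

    Returns the peak number of points held in memory at any one time.
    """
    buffers = defaultdict(list)
    in_memory = 0
    peak = 0
    for point, cell in zip(points, index):
        buffers[cell].append(point)
        in_memory += 1
        peak = max(peak, in_memory)
        if len(buffers[cell]) == counts[cell]:   # cell is completely full
            finished = buffers.pop(cell)         # remove points from memory
            write(cell, classify_cell(finished)) # classify, then write results
            in_memory -= len(finished)
    return peak

# Cell index taken from slide 16; points are represented by their order only.
index = [1, 1, 2, 2, 2, 3, 4, 3, 4, 4, 4]
counts = Counter(index)
points = list(range(len(index)))

written = []
peak = second_pass(
    points, index, counts,
    classify_cell=lambda pts: ["unclassified"] * len(pts),  # placeholder
    write=lambda cell, labels: written.append((cell, len(labels))),
)
print(written, peak)
```

With this ordering at most 4 of the 11 points are in memory at once. The better the spatial coherence of the input, the sooner cells fill up and the smaller that peak becomes.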
  18. Streaming approach
      Process all points in cell and classify
      • For each point, take nearest neighbors
      • Calculate new attribute(s) based on these points
      • Normalize attribute(s)
      • Classify based on new attribute(s)
      • Several small k-d trees for number of raster cells
      • Classification done using gradient boosted trees
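The per-cell steps on slide 18 can be sketched end to end. Two substitutions are made to keep the example self-contained, and both are assumptions: a brute-force search replaces the k-d tree, and a simple threshold replaces the gradient boosted trees. The attribute (height range of the neighbourhood) and the labels `flat` and `rough` are likewise hypothetical.

```python
import math

def k_nearest(points, centre, k):
    """Brute-force k nearest neighbours (stand-in for a k-d tree query)."""
    return sorted(points, key=lambda p: math.dist(p, centre))[:k]

def height_range(neighbours):
    """New attribute: vertical spread of the neighbourhood."""
    zs = [p[2] for p in neighbours]
    return max(zs) - min(zs)

def normalise(values):
    """Min-max scale the attribute to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def classify_cell(points, k=3, threshold=0.5):
    """Neighbours -> attribute -> normalise -> classify, for one raster cell."""
    raw = [height_range(k_nearest(points, p, k)) for p in points]
    scaled = normalise(raw)
    # Threshold rule as a stand-in for a trained gradient boosted model.
    return ["rough" if s > threshold else "flat" for s in scaled]

# Hypothetical cell: three points on flat ground, three stacked vertically.
cell = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
        (5.0, 5.0, 0.0), (5.0, 5.0, 2.0), (5.0, 5.0, 4.0)]
labels = classify_cell(cell)
print(labels)
```

Because every step only looks at points inside the cell, the work per cell is independent of the size of the full dataset, which is what makes the streaming approach feasible.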
  19. Streaming approach
      Implemented in Julia ✓
      Performance dependent on
      • Number of scales
      • Size of each scale
      Current result dataset
      • Worst case
      • 40000 points/s
      • Several million per minute
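The throughput on slide 19 translates into processing times with simple arithmetic. The sketch below uses the quoted worst-case rate of 40000 points/s; the one-billion-point dataset size is a hypothetical example, not a figure from the talk.

```python
POINTS_PER_SECOND = 40_000  # worst-case rate quoted on the slide

def points_per_minute(rate=POINTS_PER_SECOND):
    """Convert a per-second rate to a per-minute rate."""
    return rate * 60

def hours_to_process(n_points, rate=POINTS_PER_SECOND):
    """Time in hours to classify `n_points` at a constant rate."""
    return n_points / rate / 3600

per_minute = points_per_minute()
hours = hours_to_process(1_000_000_000)  # hypothetical one-billion-point cloud
print(per_minute, round(hours, 2))
```

At this rate 40000 points/s corresponds to 2.4 million points per minute, and a billion points would take roughly seven hours on a single machine.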
  20. Lessons
      • Workflow for high number of iterations
      • Data preparation (feature engineering) is important
      • Training data has biases
      • Generalizing is hard
      • Near realtime classification of pointclouds is possible
  21. Questions?
