DSD-INT 2018 Realtime classification of lidar pointclouds - Pronk

Presentation by Maarten Pronk (Deltares) at the Data Science Symposium 2018, during Delft Software Days - Edition 2018. Thursday 15 November 2018, Delft.


  1. Realtime classification of pointclouds
     DSD-INT 2018
     Maarten Pronk
  2. Index
     • Pointclouds
     • Applications
     • Issues
     • Classification
     • Feature engineering
     • “Realtime”
     • Streaming algorithms
  3. Pointclouds
     • X Y Z
     • Millions to billions of points
     • LiDAR
     • ALS (airborne laser scanning), TLS (terrestrial laser scanning)
  4. Applications
     Digital Terrain Maps (DTM)
     Forestry management
     Forests in Indonesia
       • DTM, CHM, water detection, canal depth detection
     Resilient infrastructure
       • Anywhere
       • Roads
  5. Issues
     GBs of data (AHN2 is several TB)
     Larger than memory (tiling)
     Classification is required for derived products
       • Filtering for ground
       • Normalizing for tree height
     We want faster results, doable on your laptop
  6. Solution?
     Streaming approach
       • Process only a few points at a time
       • Skip tiling
       • Only local operations
     Machine learning
       • Training on existing datasets
       • Classification is done instantly
       • Does it generalize?
  7. Machine learning
     All about the data (features)
     AHN2 dataset
       • Ground, buildings and water
       • No roads or trees (up to 50%!)
     Vaihingen 3D labeled dataset
       • ASPRS reference dataset
       • Many classes, but small
     Semantic 3D dataset
       • TLS reference datasets for NN
       • Very large, density issues
  8. Features
     • XYZ
     • Intensity
     • Return number | total number of returns
     Not enough to do multi-label classification
  9. Features
     Derived features
       • Height above ground
       • Principal Component Analysis
       • K nearest neighbours
       • Geometric distribution
       • Different scales (radius)
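
To make the idea of one derived feature computed at several scales concrete, here is a minimal Python sketch. It is not the method from the presentation: it assumes a crude proxy for height above ground (the point's height above the lowest point within a given horizontal radius), uses a brute-force neighbour search, and the function names and sample coordinates are invented for illustration.

```python
import math

def height_above_local_minimum(points, radius):
    """Crude height-above-ground proxy (an assumption, not the talk's method):
    z minus the lowest z found within `radius` in the XY plane."""
    features = []
    for x, y, z in points:
        neighbour_z = [pz for px, py, pz in points
                       if math.hypot(px - x, py - y) <= radius]
        features.append(z - min(neighbour_z))  # the point itself is always included
    return features

def multi_scale_features(points, radii):
    """One feature column per radius, returned as one row per point."""
    per_radius = [height_above_local_minimum(points, r) for r in radii]
    return [list(row) for row in zip(*per_radius)]

# Hypothetical profile: two ground points followed by two elevated points.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 5.0), (3.0, 0.0, 5.0)]
features = multi_scale_features(points, radii=(1.0, 2.0))
for point, row in zip(points, features):
    print(point, row)
```

In this toy profile the last point only "sees" the ground at the larger radius, which is one reason several scales may be needed, at the cost of extra compute time per added radius.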
  10. Features
      PCA
        • Three values?
        • Omnivariance
        • $\sqrt[3]{l_1 l_2 l_3}$
        • 0.98, 0.01, 0.01 → 0.05 (1D)
        • 0.48, 0.48, 0.04 → 0.21 (2D)
        • 0.33, 0.33, 0.33 → 0.33 (3D)
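
The omnivariance examples on this slide can be checked with a few lines of Python. This is a small sketch that assumes the formula on the slide is the usual cube root of the product of the three normalized PCA eigenvalues; the function name `omnivariance` is illustrative and does not come from the presentation.

```python
def omnivariance(l1: float, l2: float, l3: float) -> float:
    """Cube root of the product of three non-negative eigenvalues."""
    return (l1 * l2 * l3) ** (1.0 / 3.0)

# Eigenvalue triples taken from the slide (each sums to roughly 1).
examples = {
    "1D": (0.98, 0.01, 0.01),
    "2D": (0.48, 0.48, 0.04),
    "3D": (0.33, 0.33, 0.33),
}

for label, eigenvalues in examples.items():
    print(label, eigenvalues, round(omnivariance(*eigenvalues), 2))
```

The value grows as the neighbourhood spreads more evenly over all three axes, so a single number can summarize the three eigenvalues.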
  11. Feature selection before training
      Which scales to use?
        • More radii
        • Compute time
      Importance analysis
        • Pairplots
  12. Feature selection after training
      PCA is not enough
        • Flat surfaces?
      Importance analysis
        • Times used in tree
        • Confusion matrices
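
As a reminder of what a confusion matrix reports, the following Python sketch builds one from two label lists. The class names and labels are made up for illustration and are not results from the presentation; the helper name `confusion_matrix` is likewise an assumption.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    pairs = Counter(zip(true_labels, predicted_labels))
    return [[pairs[(t, p)] for p in classes] for t in classes]

# Hypothetical labels, only to show the layout of the matrix.
classes = ["road", "tree", "car"]
truth = ["road", "road", "tree", "tree", "car", "car"]
predicted = ["road", "road", "tree", "road", "tree", "car"]

matrix = confusion_matrix(truth, predicted, classes)
for name, row in zip(classes, matrix):
    recall = row[classes.index(name)] / sum(row)
    print(f"{name:>5}: {row}  recall={recall:.2f}")
```

Off-diagonal counts show which classes get mistaken for each other, which can hint at the features that are still missing.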
  13. Current results (original)
  14. Current results (classified)
  15. Current results
      Good:
        • Roads
        • Shrubs
        • Roofs
        • Facades
      OK:
        • Low vegetation
        • Trees
      Bad:
        • Cars
        • Fences
  16. Streaming approach
      First pass
        • Determine raster based on bounding box
        • Determine raster cells for all points (index)
          > [1,1,2,2,2,3,4,3,4,4,4]
        • Determine number of points for each cell
          > 2 3 2 0 0 .. 4 .. .. ..
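
The first pass can be sketched in Python as follows. This is an illustration under assumptions, not the Julia implementation from the talk: the points are held in a plain list for brevity (a real first pass would read them from file), the raster is anchored at the minimum corner of the bounding box, and `first_pass`, `cell_size` and the sample coordinates are hypothetical.

```python
from collections import Counter

def first_pass(points, cell_size):
    """Return (origin, n_cols, cell index per point, point count per cell)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)              # raster origin from the bounding box
    n_cols = int((max(xs) - min_x) // cell_size) + 1
    index = []
    for x, y, _z in points:
        col = int((x - min_x) // cell_size)
        row = int((y - min_y) // cell_size)
        index.append(row * n_cols + col)          # flatten (row, col) to one cell id
    counts = Counter(index)                       # number of points in each cell
    return (min_x, min_y), n_cols, index, counts

# Hypothetical points on a raster with 1 x 1 cells.
points = [(0.0, 0.0, 1.0), (0.4, 0.6, 1.1), (1.5, 0.5, 2.0),
          (1.2, 0.9, 2.1), (0.1, 1.5, 3.0)]
origin, n_cols, index, counts = first_pass(points, cell_size=1.0)
print(index)          # [0, 0, 1, 1, 2]
print(dict(counts))   # {0: 2, 1: 2, 2: 1}
```

Only the index and the per-cell counts need to be kept after this pass; the coordinates themselves can be read again in the second pass.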
  17. Streaming approach
      Second pass
        • Start storing points in memory
        • Until one cell is completely full
        • Process all points in cell and classify
        • Write results to disk
        • Remove points from memory
        • Spatial coherence (ordering)
      1,2 3,4,5 5,7 .. .. .. 6,8,9, …
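
The second pass can be sketched as a Python generator that buffers points per cell and releases a cell as soon as it has received the number of points counted in the first pass. This is a simplified illustration, not the presenter's code; the cell index is the example sequence from the first-pass slide, the points are stand-in integers, and `stream_cells` is a hypothetical name.

```python
from collections import Counter, defaultdict

def stream_cells(points, index, counts):
    """Yield (cell, points_in_cell) once a cell has received all of its points."""
    buffers = defaultdict(list)
    for point, cell in zip(points, index):
        buffers[cell].append(point)
        if len(buffers[cell]) == counts[cell]:
            # Cell is complete: hand it over and free its memory.
            yield cell, buffers.pop(cell)

# Cell index per point as in the first-pass example; points are just ids here.
index = [1, 1, 2, 2, 2, 3, 4, 3, 4, 4, 4]
counts = Counter(index)
points = list(range(len(index)))

finished = list(stream_cells(points, index, counts))
for cell, cell_points in finished:
    print(cell, cell_points)   # classify and write to disk would happen here
```

The more spatially coherent the point order is, the sooner cells complete and the fewer points have to be held in memory at once.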
  18. Streaming approach
      Process all points in cell and classify
        • For each point, take nearest neighbors
        • Calculate new attribute(s) based on these points
        • Normalize attribute(s)
        • Classify based on new attribute(s)
        • Several small k-d trees for number of raster cells
        • Classification done using gradient boosted trees
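
The per-cell steps (neighbours, attribute, normalization) can be sketched in Python as below. Several things here are assumptions made for illustration: a brute-force neighbour search stands in for the small k-d trees, the attribute (height range over a point and its k nearest neighbours) is invented, min-max scaling is one of many possible normalizations, and the gradient boosted tree classifier is left out entirely.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (brute force, excluding i)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def height_range_feature(points, k):
    """Hypothetical attribute: max z minus min z over a point and its k neighbours."""
    features = []
    for i in range(len(points)):
        zs = [points[j][2] for j in k_nearest(points, i, k)] + [points[i][2]]
        features.append(max(zs) - min(zs))
    return features

def normalize(values):
    """Min-max scale to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical cell: four near-ground points and one point high above them.
cell = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (0.0, 1.0, 0.0),
        (1.0, 1.0, 0.1), (0.5, 0.5, 3.0)]
attributes = normalize(height_range_feature(cell, k=2))
print(attributes)   # these values would be the classifier's input
```

Because every cell is processed on its own, the neighbour search structure only ever has to cover the points of one cell.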
  19. Streaming approach
      Implemented in Julia ✓
      Performance dependent on
        • Number of scales
        • Size of each scale
      Current result dataset
        • Worst case
        • 40000 points/s
        • Several million per minute
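
The quoted worst-case rate translates into points per minute and into a rough runtime estimate as follows. The dataset size of one billion points is a hypothetical figure chosen for the example, not a number from the presentation.

```python
POINTS_PER_SECOND = 40_000            # worst-case throughput quoted on the slide

points_per_minute = POINTS_PER_SECOND * 60
print(points_per_minute)              # 2400000

def seconds_needed(n_points, rate=POINTS_PER_SECOND):
    """Runtime estimate assuming the rate stays constant."""
    return n_points / rate

# Hypothetical dataset of one billion points.
seconds = seconds_needed(1_000_000_000)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} hours")
```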
  20. Lessons
      • Workflow for high number of iterations
      • Data preparation (feature engineering) is important
      • Training data has biases
      • Generalizing is hard
      • Near realtime classification of pointclouds is possible
  21. Questions?