2. Dataset
● 38 tiles of 6000x6000 px orthorectified RGB imagery
● DSM (Digital Surface Model): ground height is subtracted at each pixel, so values give height above the ground
● Ground Sampling Distance (GSD) of 5 cm for both the DSM and the TOP (true orthophoto)
● Two classes:
○ Background
○ Building
● Pre-processing:
○ Each tile is split into patches of 2048x2048 pixels
○ Each patch is then resized down to 256x256 pixels
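The pre-processing steps above can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how tile edges are handled (6000 is not a multiple of 2048) or which resampling method is used, so the sketch shifts the last patch back to the tile border and resizes by block averaging. The function names `patch_origins` and `split_and_resize` are hypothetical. Note that shrinking 2048 px to 256 px is a factor of 8, so the effective GSD of a resized patch is 8 x 5 cm = 40 cm.

```python
import numpy as np

def patch_origins(tile_size, patch_size):
    """Top-left offsets of patches along one axis (needs tile_size >= patch_size).

    Assumed edge policy: the last patch is shifted back so that it ends
    exactly at the tile border, overlapping its neighbour.
    """
    origins = list(range(0, tile_size - patch_size + 1, patch_size))
    if origins[-1] + patch_size < tile_size:
        origins.append(tile_size - patch_size)
    return origins

def split_and_resize(tile, patch_size=2048, out_size=256):
    """Cut an (H, W, C) tile into square patches and shrink each one.

    Assumed resampling: block averaging, which requires patch_size to be
    a multiple of out_size (2048 / 256 = 8).
    """
    factor = patch_size // out_size
    height, width, channels = tile.shape
    patches = []
    for y in patch_origins(height, patch_size):
        for x in patch_origins(width, patch_size):
            patch = tile[y:y + patch_size, x:x + patch_size].astype(np.float32)
            # Group pixels into factor x factor blocks and average each block.
            blocks = patch.reshape(out_size, factor, out_size, factor, channels)
            patches.append(blocks.mean(axis=(1, 3)))
    return np.stack(patches)  # (num_patches, out_size, out_size, C)
```

Under this edge policy a 6000x6000 tile yields a 3x3 grid of patches per tile; a different policy (padding, or dropping partial patches) would give a different count.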
3. Baseline Architecture: Unet-ResNet34
● Backbone: U-Net - https://arxiv.org/abs/1505.04597
● Encoder: 34-layer Residual Network (ResNet34) - https://arxiv.org/abs/1512.03385
● Layer fusion strategy: Conv2d Fusion
○ First stack feature maps
○ Then apply a 2D convolution with a 1x1 kernel
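A minimal sketch of the two fusion steps, written in NumPy so it stays self-contained. It assumes "stacking" means concatenation along the channel axis, and it treats the 1x1 convolution as a per-pixel linear mix of channels, which is what a 1x1 kernel computes. Which feature maps are fused (for example an RGB branch and a DSM branch) is not stated on the slide, so the inputs here are placeholders, and `conv1x1_fusion` is a hypothetical name.

```python
import numpy as np

def conv1x1_fusion(feature_maps, weight, bias):
    """Fuse feature maps by channel-wise stacking and a 1x1 convolution.

    feature_maps: list of arrays shaped (C_i, H, W) with equal H and W
    weight:       (C_out, sum(C_i)) array, the 1x1 kernel without spatial dims
    bias:         (C_out,) array
    """
    stacked = np.concatenate(feature_maps, axis=0)     # (sum(C_i), H, W)
    fused = np.einsum("oc,chw->ohw", weight, stacked)  # mix channels per pixel
    return fused + bias[:, None, None]

# Placeholder inputs: two 4-channel feature maps on an 8x8 grid.
branch_a = np.ones((4, 8, 8))
branch_b = 2.0 * np.ones((4, 8, 8))
# Hand-picked weights that average matching channels of the two branches;
# in a trained network these weights would be learned.
weight = 0.5 * np.concatenate([np.eye(4), np.eye(4)], axis=1)
fused = conv1x1_fusion([branch_a, branch_b], weight, np.zeros(4))
```

The 1x1 kernel lets the network learn how much each stacked channel contributes to the fused map while bringing the channel count back to what the next layer expects.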
5. Model: 1-Channel DSM Only
● Input: DSM
● Weight Initialization: average of RGB channels from ImageNet pre-trained weights
● IoU: 0.832
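The weight initialization for the 1-channel input can be sketched as below. It assumes the averaging is applied to the first convolution of the encoder, the only layer whose kernel depends on the number of input channels; in ResNet34 that kernel has shape (64, 3, 7, 7). Random numbers stand in for the ImageNet pre-trained weights, and `rgb_to_single_channel` is a hypothetical helper name.

```python
import numpy as np

def rgb_to_single_channel(conv_weight):
    """Average a (C_out, 3, kH, kW) kernel over its RGB input channels.

    Returns a (C_out, 1, kH, kW) kernel suitable for a 1-channel input.
    """
    assert conv_weight.shape[1] == 3, "expected an RGB first-layer kernel"
    return conv_weight.mean(axis=1, keepdims=True)

# Stand-in for the pre-trained first convolution of ResNet34; the real
# ImageNet weights would be loaded here instead of random values.
rng = np.random.default_rng(0)
pretrained_conv1 = rng.normal(size=(64, 3, 7, 7)).astype(np.float32)
dsm_conv1 = rgb_to_single_channel(pretrained_conv1)  # shape (64, 1, 7, 7)
```

All other layers would presumably keep their pre-trained weights unchanged, since their shapes do not depend on the input channel count.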
6. Model: Elevation Detection Pre-training
● Pre-training phase: train an elevation detection model
● Re-training phase: re-train the pre-trained model on the ground truth
● IoU: 0.829
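For reference, the IoU (intersection over union) figures quoted for the models can be computed along these lines. This sketch assumes the score is taken on binary masks for the Building class; the slides do not say whether it is computed per patch, per tile, or averaged over classes. `binary_iou` is a hypothetical name.

```python
import numpy as np

def binary_iou(pred_mask, true_mask):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, true).sum()
    return float(intersection / union)

# Toy example: one shared building pixel out of three marked overall.
score = binary_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```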