Looking from Above: Object Detection and Other Computer Vision Tasks on Satellite Imagery
1. Looking from Above: Object Detection and Other Computer Vision Tasks on Satellite Imagery
Xiaoyong Zhu, Siyu Yang
Microsoft
4. Satellite Imagery of Las Vegas from 1984-2018
Source: http://world.time.com/timelapse/
5. Environmentally related
• Planet: Understanding the Amazon from Space (Kaggle competition)
  • Classification of image chips into 12 classes related to rainforest health
• NOAA drone and aerial datasets on monk seals and right whales for population counting and identification
• Various land use segmentation datasets (DeepGlobe, UC Merced, Chesapeake Conservancy)

Others
• Cars Overhead with Context (2016)
• Microsoft project on drones for power line maintenance (2016)
• SpaceNet for building and road extraction (DeepGlobe workshop at CVPR 2018) (2016, ongoing)
• NWPU-RESISC45 dataset for satellite scene classification (2017)
• IARPA Functional Map of the World Challenge (2017-18)
• xView (2018)
• Airbus ship detection on Kaggle (2018)
• TGS salt identification from seismic images: a mining application
6. Counting caribou from aerial imagery (AI for Wildlife Conservation 2018)
NOAA aircraft surveys use images to count sea lions, seals and polar bears
9. The xView Dataset
Released by the Defense Innovation Unit Experimental (DIUx) to advance computer vision solutions for national defense and disaster response.
60 classes, 1 million labeled instances, object detection (bounding box) task
Figure: xView compared to other computer vision contests, measured by number of labeled instances and number of classes
Example classes:
• Fixed-wing aircraft, Small aircraft, Cargo/passenger planes, Helicopter
• Building, Shed, Damaged building, Facility, Shipping container, Shipping container lot, Vehicle lot, Construction site, Helipad, Pylon, Storage tank
• Motorboat, Sailboat, Tugboat, Barge, Fishing vessel, Ferry, Yacht, Container ship, Oil tanker
• Pickup truck, Utility truck, Cargo truck, Truck with box, Truck with Flatbed, Truck with Liquid, Truck tractor trailer, Cement mixer, Mobile Crane, Straddle carrier, Excavator, Small car…
10.
1. Class imbalance
2. Fine-grained classification required (Cargo Truck vs Pickup Truck)
3. Large scale variance
4. Input labeled images are very large (approx. 5000 x 5000 pixels)
5. Label quality
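One common way to cope with the very large input images listed above is to cut each image into smaller overlapping chips and carry the bounding boxes along. The slides do not say how the authors did this, so the following is only a minimal sketch: the chip size of 512, the overlap of 64, the 50% visibility threshold, and the function names `tile_origins` and `boxes_in_chip` are all assumptions made for illustration.

```python
def tile_origins(length, chip, overlap):
    """Start offsets of chips along one axis so that [0, length) is covered."""
    stride = chip - overlap
    assert stride > 0, "overlap must be smaller than the chip size"
    if chip >= length:
        return [0]
    origins = list(range(0, length - chip, stride))
    origins.append(length - chip)  # final chip sits flush with the image edge
    return origins


def boxes_in_chip(boxes, x0, y0, chip, min_visible=0.5):
    """Clip boxes to the chip at (x0, y0) and shift them to chip coordinates.

    boxes: list of (xmin, ymin, xmax, ymax, label) in full-image pixels.
    A box is kept only if at least `min_visible` of its area lies in the chip
    (hypothetical threshold, not taken from the slides).
    """
    kept = []
    for xmin, ymin, xmax, ymax, label in boxes:
        cx0, cy0 = max(xmin, x0), max(ymin, y0)
        cx1, cy1 = min(xmax, x0 + chip), min(ymax, y0 + chip)
        if cx1 <= cx0 or cy1 <= cy0:
            continue  # no overlap with this chip
        visible = (cx1 - cx0) * (cy1 - cy0) / ((xmax - xmin) * (ymax - ymin))
        if visible >= min_visible:
            kept.append((cx0 - x0, cy0 - y0, cx1 - x0, cy1 - y0, label))
    return kept


# Offsets along one 5000-pixel side with the assumed chip size and overlap.
origins = tile_origins(5000, chip=512, overlap=64)
# One toy box near a chip border, viewed from two neighbouring chips.
toy_boxes = [(500, 500, 520, 520, "Small car")]
in_first_chip = boxes_in_chip(toy_boxes, 0, 0, 512)
in_second_chip = boxes_in_chip(toy_boxes, 448, 448, 512)
```

The overlap means an object cut by one chip border is usually seen whole in a neighbouring chip, which is why the toy box is dropped from the first chip but kept in the second.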
11. 846 released images, each approx. 5000 x 5000 pixels
Shuffle and split, while making sure all 60 categories are represented in both training and validation sets:
• 596 training images
• 250 validation images
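The shuffle-and-split step can be sketched as a simple retry loop: shuffle the image list, cut it at the validation size, and accept the split only if every category appears on both sides. This is an illustration rather than the authors' script; the function name `split_with_full_coverage`, the retry strategy, and the synthetic `toy_images` mapping are assumptions.

```python
import random


def split_with_full_coverage(image_classes, n_val, seed=0, max_tries=1000):
    """Random train/validation split in which both sides contain every class.

    image_classes: dict mapping image id -> set of class ids found in that image.
    """
    all_classes = set().union(*image_classes.values())
    ids = sorted(image_classes)
    rng = random.Random(seed)
    for _ in range(max_tries):
        rng.shuffle(ids)
        val, train = ids[:n_val], ids[n_val:]
        val_classes = set().union(*(image_classes[i] for i in val))
        train_classes = set().union(*(image_classes[i] for i in train))
        if val_classes == all_classes and train_classes == all_classes:
            return sorted(train), sorted(val)
    raise RuntimeError("no split with full class coverage found")


# Synthetic stand-in for the 846 released images and 60 categories:
# each toy image contains two classes chosen by a fixed pattern.
toy_images = {i: {i % 60, (i * 7) % 60} for i in range(846)}
train_ids, val_ids = split_with_full_coverage(toy_images, n_val=250)
```

With real annotations, rare categories that occur in only one image cannot satisfy the constraint, so such classes would need separate handling before splitting.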
16. Type of detector | Intuition | Feature | Classic models
One stage | Unified proposal and classification network | Faster at inference | Single Shot MultiBox Detector (SSD), YOLO
Two stage | Separate proposal and classification network | Higher accuracy | Faster R-CNN, 2015
18. Model Name | overall mAP | small mAP | medium mAP | large mAP
SSD + Inception v2 | 0.16 | 0.12 | 0.19 | 0.18
Faster RCNN + FPN + Deformable Operators | 0.19 (+18.75%) | 0.17 | 0.22 | 0.21
SNIP/SNIPER (+ Data augmentation) | 0.22 (+15.79%) | 0.17 | 0.23 | 0.28
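The percentages next to the overall mAP values appear to be relative gains over the previous row, not over the SSD baseline. A short check of that arithmetic, using only the overall mAP numbers from the table (the variable and function names are illustrative):

```python
# Overall mAP per model, in the order the table lists them.
overall_map = [
    ("SSD + Inception v2", 0.16),
    ("Faster RCNN + FPN + Deformable Operators", 0.19),
    ("SNIP/SNIPER (+ Data augmentation)", 0.22),
]


def relative_gain_percent(previous, current):
    """Relative improvement of `current` over `previous`, in percent."""
    return (current - previous) / previous * 100


# Gain of each model over the row directly above it.
gains = [
    round(relative_gain_percent(prev_score, score), 2)
    for (_, prev_score), (_, score) in zip(overall_map, overall_map[1:])
]

for (name, _), gain in zip(overall_map[1:], gains):
    print(f"{name}: +{gain}%")
```

Measured against the SSD baseline instead, the last row would be a gain of (0.22 - 0.16) / 0.16 = 37.5%.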
22. DATA, BUILD, TRAIN, DEPLOY
DATA: 1M objects, 60 classes, 0.3 meter resolution, 1,415 square kilometers; stored in Azure Blob storage
BUILD / TRAIN: Data Science VM, Azure Machine Learning Service; SNIPER as the training model (a variant of Faster R-CNN)
DEPLOY: Edge devices (iOS); Cloud (Azure Machine Learning / Azure Kubernetes Service)
23. • A range of detector and backbone networks to choose from
• Hyperparameters selected via a config file
• Mature and relatively well maintained
• Hard to customize
• Current version does not support distributed training

• Integrates with the distributed training framework Horovod
• Poor GPU utilization (since improved)
• Has a small community around it

• State-of-the-art results on COCO
• Does not integrate with other network architecture implementations
26. • Model is deployed to iOS devices so field workers can see real-time analysis
• iOS app built by interns on the Microsoft Garage team
• https://www.microsoft.com/en-us/garage/profiles/earth-lens/
27. • Model is also deployed to Azure Machine Learning Service for scalable inference