[unofficial] Pyramid Scene Parsing Network (CVPR 2017) - Shunta Saito
Pyramid Scene Parsing Network introduces the Pyramid Pooling Module to improve semantic segmentation. The module captures context at different regions and scales by performing average pooling at different pyramid levels on the final convolutional feature map. Experiments on ADE20K and PASCAL VOC datasets show the Pyramid Pooling Module improves mean Intersection-over-Union by over 4% compared to global average pooling, achieving state-of-the-art performance.
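The pooling scheme this summary describes can be sketched in a few lines of NumPy: average-pool the feature map at several bin sizes, upsample each level back to the input resolution, and concatenate with the original map. The bin sizes (1, 2, 3, 6) follow the paper; the function name and nearest-neighbour upsampling are illustrative.

```python
import numpy as np

def pyramid_pooling(feat, bins=(1, 2, 3, 6)):
    """Average-pool `feat` (C, H, W) at several pyramid levels and
    concatenate the upsampled results with the original map."""
    c, h, w = feat.shape
    pooled = [feat]
    for b in bins:
        level = np.zeros((c, b, b))
        hs = np.array_split(np.arange(h), b)
        ws = np.array_split(np.arange(w), b)
        for i, hi in enumerate(hs):
            for j, wj in enumerate(ws):
                # mean over one pooling region
                level[:, i, j] = feat[:, hi][:, :, wj].mean(axis=(1, 2))
        # nearest-neighbour upsample back to (H, W)
        up = level.repeat(h // b + (h % b > 0), axis=1)[:, :h, :]
        up = up.repeat(w // b + (w % b > 0), axis=2)[:, :, :w]
        pooled.append(up)
    return np.concatenate(pooled, axis=0)
```

The output has C * (1 + len(bins)) channels; in the real network each pooled level is first reduced with a 1x1 convolution, which this sketch omits.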
Hardware progress has made historically intractable computations feasible, particularly in video analysis, opening a new frontier of problems. Within this expanse we chose the classic problem of depth inference from images: given a sequence of images captured over time, we output depth maps corresponding one-to-one with the input sequence. Because the problem is spatiotemporal, we model it with convolutions (spatial) and LSTMs (temporal), combined in a U-Net encoder-decoder architecture. The results indicate some potential in this approach; the process by which we reached this conclusion is detailed below.
This document summarizes recent advances in single image super-resolution (SISR) using deep learning methods. It discusses early SISR networks like SRCNN, VDSR and ESPCN. SRResNet is presented as a baseline method, incorporating residual blocks and pixel shuffle upsampling. SRGAN and EDSR are also introduced, with EDSR achieving state-of-the-art PSNR results. The relationship between reconstruction loss, perceptual quality and distortion is examined. While PSNR improves yearly, a perception-distortion tradeoff remains. Developments are ongoing to produce outputs that are both accurately restored and naturally perceived.
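The pixel-shuffle upsampling mentioned above is a pure tensor rearrangement (depth-to-space); a minimal NumPy sketch, with the function name ours, matching the convention used by ESPCN-style networks:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r^2, H, W) -> (C, H*r, W*r): each group of r^2
    channels becomes an r x r block of output pixels."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)     # split channel index into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)   # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)
```

Learning the upsampling as convolution channels followed by this rearrangement is what lets ESPCN run most of the network at low resolution.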
Introduction to cosmology and numerical cosmology (with the Cactus code) (2/2) - SEENET-MTP
This document discusses using the Cactus code to model cosmological simulations numerically. It introduces the Cosmo and RealSF thorns developed to solve Einstein's equations for cosmological models within the Cactus framework. The Cosmo thorn provides initial data and boundary conditions for the Friedmann-Robertson-Walker metric. The RealSF thorn evolves a scalar field by solving the Klein-Gordon equation. Examples are presented of simulations using these thorns to model pure FRW cosmologies and those with a cosmological constant or scalar field.
The document presents deep learning concepts without assuming an advanced degree. It introduces StoreKey, a Python package for scientific computing on GPUs and deep learning research. It covers basics such as variables, tensors, and autograd in Python. Predictive models discussed include linear regression, logistic regression, and convolutional neural networks. Linear regression fits a line to data to predict unobserved values; logistic regression predicts binary outcomes by fitting data to a logit function. A convolutional neural network example is shown with input, output, and hidden layers for classification problems.
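As an illustration of the regression basics this summary lists, here is a minimal logistic-regression fit by plain gradient descent in NumPy (the function name, learning rate, and step count are illustrative, not from the document):

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Fit weights w, b so that sigmoid(X @ w + b) approximates
    the binary labels y, by gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)        # gradient of mean log loss
        b -= lr * (p - y).mean()
    return w, b
```

The same loop with the sigmoid and log loss replaced by identity and squared error gives linear regression.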
Learning visual representation without human label - Kai-Wen Zhao
Self-supervised learning (SSL) is one of the fastest-growing research topics in recent years. SSL provides algorithms that learn visual representations directly from the data itself rather than from manual human labels. From a theoretical point of view, SSL explores information theory and the nature of large-scale datasets.
Pilot Contamination Mitigation for Wideband Massive MIMO: Number of Cells Vs ... - T. E. Bogale
The document presents a pilot contamination mitigation technique for wideband massive MIMO systems. It proposes a three-step approach: 1) Allowing pilot transmission in the time domain, 2) Expressing sub-carrier channel estimates as linear combinations of received signals, and 3) Optimizing the number of cells, pilots, and linear combination terms to ensure unbounded signal-to-interference-plus-noise ratio (SINR). The main results show that the number of cells can be increased to L, where L is the number of multipath taps, allowing cancellation of pilot contamination. Simulation results demonstrate that the proposed approach achieves rates close to perfect channel state information.
[DL Reading Group] Unpaired Image Super-Resolution Using Pseudo-Supervision - Deep Learning JP
The document summarizes an academic paper on unpaired image super-resolution using pseudo-supervision. It presents the following key points:
1. The paper proposes a method using GANs and two networks: a correction network that transforms real low-resolution images into clean low-resolution images, and a super-resolution network that generates high-resolution images from the clean low-resolution ones.
2. Experiments on multiple datasets demonstrate better results than previous methods, generating high-resolution images from diverse, unpaired low-resolution data.
3. The proposed method was incorporated into Sharp's newest smartphone just 1.5 years after the paper was published, showing the speed of applying academic research.
Convolutional neural networks for image classification — evidence from Kaggle... - Dmytro Mishkin
This document discusses convolutional neural networks for image classification and their application to the Kaggle National Data Science Bowl competition. It provides an overview of CNNs and their effectiveness for computer vision tasks. It then details various CNN architectures, preprocessing techniques, and ensembling methods that were tested on the competition dataset, achieving a top score of 0.609 log loss. The document concludes with highlights of the winning team's solution, including novel pooling methods and knowledge distillation.
#6 PyData Warsaw: Deep learning for image segmentation - Matthew Opala
Deep learning techniques have ignited great progress in many computer vision tasks such as image classification, object detection, and segmentation. Almost every month a new method is published that achieves state-of-the-art results on some common benchmark dataset. In addition, DL is being applied to new problems in CV.
In the talk we’re going to focus on DL application to image segmentation task. We want to show the practical importance of this task for the fashion industry by presenting our case study with results achieved with various attempts and methods.
Software Defined Visualization (SDVis): Get the Most Out of ParaView* with OS... - Intel® Software
This document summarizes a presentation about software-defined visualization using ParaView with OSPRay. The presentation covers:
- An overview of rasterization and ray tracing for visualization rendering.
- Available software-defined visualization libraries including OpenSWR, OSPRay, ParaView, GLuRay, and GraviT.
- A demonstration of ParaView with OSPRay, showing its capabilities for volume rendering, soft shadows, ambient occlusion, and more realistic lighting compared to traditional OpenGL.
- Hands-on tutorials using ParaView with OSPRay to visualize wavelet data, isosurfaces, and volumetric data with shadows, along with the benefits and limitations of OSPRay integration in ParaView.
Multiuser MIMO Vector Perturbation Precoding - adeelrazi
This paper proposes methods for sum rate optimization in multi-user MIMO systems using vector perturbation precoding. It derives an expression for sum rate in terms of the average transmitted vector energy. It then uses this to obtain a high-SNR upper bound on sum rate and proposes an extension of vector perturbation that allocates different rates to different users. It also proposes a low-complexity user scheduling algorithm as a method for rate allocation.
This document provides an overview of VAE-type deep generative models, especially RNNs combined with VAEs. It begins with notations and abbreviations used. The agenda then covers the mathematical formulation of generative models, the Variational Autoencoder (VAE), variants of VAE that combine it with RNNs (VRAE, VRNN, DRAW), a Chainer implementation of Convolutional DRAW, other related models (Inverse DRAW, VAE+GAN), and concludes with challenges of VAE-like generative models.
ECCV2010: feature learning for image classification, part 4 - zukun
This document discusses techniques for unsupervised feature learning from unlabeled data using neural networks. It describes using sparse autoencoders to learn feature hierarchies in an unsupervised manner by training networks to reconstruct their inputs while enforcing sparsity constraints. Convolutional deep belief networks are also discussed as a method for hierarchical probabilistic modeling of audio, images and video. The document concludes that unsupervised feature learning has achieved state-of-the-art results on various tasks such as object classification, activity recognition and speech processing.
PyTorch constructs dynamic computational graphs that allow for maximum flexibility and speed for deep learning research. Dynamic graphs are useful when the computation cannot be fully determined ahead of time, as they allow the graph to change on each iteration based on variable data. This makes PyTorch well-suited for problems with dynamic or variable sized inputs. While static graphs can optimize computation, dynamic graphs are easier to debug and create extensions for. PyTorch aims to be a simple and intuitive platform for neural network programming and research.
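The per-iteration graph construction described here can be illustrated without any framework: a toy reverse-mode autodiff class (a sketch, not PyTorch's actual implementation) whose graph is rebuilt on every forward pass, so data-dependent control flow works naturally.

```python
class Value:
    """Toy reverse-mode autodiff node. The graph is rebuilt on every
    forward pass, so control flow may differ per input (a 'dynamic graph')."""
    def __init__(self, data, parents=()):
        self.data, self.parents = data, parents
        self.grad, self.grad_fn = 0.0, None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        out.grad_fn = lambda g: [(self, g), (other, g)]
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        out.grad_fn = lambda g: [(self, g * other.data), (other, g * self.data)]
        return out

    def backward(self):
        # topological order so each node's grad is complete before use
        topo, seen = [], set()
        def build(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v.parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            if node.grad_fn:
                for parent, g in node.grad_fn(node.grad):
                    parent.grad += g

def dynamic_forward(x, n):
    # the amount of computation depends on runtime data `n`
    out = x
    for _ in range(n):
        out = out * x
    return out
```

A static-graph framework would need the loop count fixed when the graph is compiled; here the tape simply records whatever operations actually ran.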
Objects as points (CenterNet) review [CDM] - Dongmin Choi
The document proposes representing objects as single center points rather than bounding boxes. This allows detecting objects through keypoint estimation using a single neural network without post-processing. The method, called CenterNet, predicts center points along with object properties like size in one forward pass. Experiments show CenterNet runs in real-time and is simpler, faster and more accurate than two-stage detectors that require additional pre and post-processing steps. It provides a new direction for real-time object recognition.
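The keypoint-style decoding this summary describes boils down to keeping local maxima of the center heatmap; a NumPy sketch of the 3x3 max-filter trick CenterNet uses in place of box NMS (names and the threshold value are illustrative):

```python
import numpy as np

def extract_peaks(heatmap, k=3, thresh=0.3):
    """Keep only cells that are 3x3 local maxima of a center-point
    heatmap, threshold them, and return the top-k (y, x) positions."""
    h, w = heatmap.shape
    pad = np.pad(heatmap, 1, constant_values=-np.inf)
    # 3x3 max filter via 9 shifted views
    windows = np.stack([pad[i:i + h, j:j + w]
                        for i in range(3) for j in range(3)])
    local_max = windows.max(axis=0)
    peaks = (heatmap == local_max) & (heatmap > thresh)
    ys, xs = np.nonzero(peaks)
    order = np.argsort(-heatmap[ys, xs])[:k]
    return list(zip(ys[order].tolist(), xs[order].tolist()))
```

Because one peak corresponds to one object, this replaces the usual anchor matching and IoU-based NMS entirely.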
Fast R-CNN is a method that improves object detection speed and accuracy over previous methods like R-CNN and SPPnet. It uses a region of interest pooling layer and multi-task loss to jointly train a convolutional neural network for classification and bounding box regression in a single stage of training. This allows the entire network to be fine-tuned end-to-end for object detection, resulting in faster training and testing compared to previous methods while achieving state-of-the-art accuracy on standard datasets. Specifically, Fast R-CNN trains 9x faster than R-CNN and runs 200x faster at test time.
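The RoI pooling layer mentioned above can be sketched in NumPy: split a rectangular region of the feature map into a fixed grid and max-pool each cell, so every proposal yields a fixed-size feature regardless of its shape (a simplified illustration; the spatial-scale mapping from image to feature coordinates is omitted):

```python
import numpy as np

def roi_pool(feat, roi, out_size=2):
    """Max-pool the feature-map region roi = (x1, y1, x2, y2)
    into a fixed out_size x out_size grid per channel."""
    x1, y1, x2, y2 = roi
    region = feat[:, y1:y2, x1:x2]
    c, h, w = region.shape
    out = np.empty((c, out_size, out_size))
    hs = np.array_split(np.arange(h), out_size)
    ws = np.array_split(np.arange(w), out_size)
    for i, hi in enumerate(hs):
        for j, wj in enumerate(ws):
            out[:, i, j] = region[:, hi][:, :, wj].max(axis=(1, 2))
    return out
```

The fixed-size output is what lets one shared convolutional pass feed the classification and box-regression heads for every proposal.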
1. The document discusses and compares various motion estimation methods used in video compression standards, including translational and affine motion models.
2. It describes pixel-domain block matching and frequency-domain matching techniques.
3. It provides details on parameters for block-matching motion estimation such as search area size, sub-pixel precision, and hierarchical and early-termination techniques to improve efficiency.
This document discusses semantic image segmentation with deep learning. It begins by defining semantic segmentation as classifying each pixel in an image. Convolutional neural networks (CNNs) can be used for pixel-wise prediction but do not capture spatial context. Conditional random fields (CRFs) can model contextual information but are typically applied as a post-processing step. The document proposes a method called CRF-RNN that integrates CRFs into CNNs by treating mean-field inference as a recurrent neural network. This allows end-to-end training and improves results over applying CRFs as a post-processing step. Examples of semantic segmentation results on various images are shown along with challenges in segmenting certain images.
This document describes research on using region-oriented convolutional neural networks for object retrieval. It discusses using local CNNs like CaffeNet, Fast R-CNN, and SDS to extract visual features from object candidates in images. These features are used to match against query descriptors. Pooled regional features are ranked to retrieve relevant shots. Fine-tuning pre-trained networks on larger datasets like COCO can improve retrieval accuracy. Combining global and local approaches through re-ranking provides an additional boost in performance.
The document discusses using grid computing resources on demand from cloud infrastructure. It proposes offering a grid interface to allow computationally intensive science applications to leverage elastic cloud resources when demand spikes. Key challenges include enabling secure delegation of authority through proxy certificates when hosting grid services on dynamically allocated cloud virtual machines.
Overlapping community detection in Large-Scale Networks using BigCLAM model b... - Thang Nguyen
In this undergraduate thesis, I provide a general view of communities and their real-life applications. In recent years, with the rapid growth of network scale, detecting overlapping communities in large-scale networks has become a difficult task for state-of-the-art methods. The method is implemented in the Apache Spark framework for its power in distributed parallel computation.
The main contributions of this work include:
- Introducing the BigCLAM model proposed by Yang and Leskovec (2013).
- Proposing several convex optimization methods.
- Implementing BigCLAM in Apache Spark, a lightning-fast cluster-computing framework, to detect communities in large-scale networks.
https://thangdnsf.github.io/research.html
Landuse Classification from Satellite Imagery using Deep Learning - DataWorks Summit
With the abundance of remote sensing satellite imagery, the possibilities are endless as to the kind of insights that can be derived from them. One such use is to determine land use for agriculture and non-agricultural purposes.
In this talk, we’ll be looking at leveraging Sentinel-2 satellite imagery data along with OpenStreetMap labels to be able to classify land use as agricultural or non-agricultural.
Sentinel-2 data has a 10-meter resolution in the RGB bands and is well suited for land use classification. Using these two datasets, many different machine learning tasks can be performed, such as image segmentation into two classes (farmland and non-farmland) or the more challenging task of identifying the crop type being cultivated on fields.
For this talk, we’ll be looking at leveraging convolutional neural networks (CNNs) built with Apache MXNet to train deep learning models for land use classification. We’ll be covering the different deep learning architectures considered for this particular use case along with the appropriate metrics.
We’ll be leveraging streaming pipelines built on Apache Flink and Apache NiFi for model training and inference. Developers will come away with a better understanding of how to analyze satellite imagery and of the different deep learning architectures, along with their pros and cons, for land use classification. Suneel Marthi and Chris Olivier, Software Development Engineers, Amazon Web Services
Large scale landuse classification of satellite imagery - Suneel Marthi
This document summarizes a presentation on classifying land use from satellite imagery. It describes using a neural network to filter out cloudy images, segmenting images with a U-Net model to identify tulip fields, and implementing the workflow with Apache Beam for inference on new images. Examples are shown of detecting large and small tulip fields. Future work proposed includes classifying rock formations using infrared bands and measuring crop health.
This document discusses using fully convolutional neural networks for defect inspection. It begins with an agenda that outlines image segmentation using FCNs and defect inspection. It then provides details on data preparation including labeling guidelines, data augmentation, and model setup using techniques like deconvolution layers and the U-Net architecture. Metrics for evaluating the model like Dice score and IoU are also covered. The document concludes with best practices for successful deep learning projects focusing on aspects like having a large reusable dataset, feasibility of the problem, potential payoff, and fault tolerance.
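The Dice score and IoU mentioned above are simple set overlaps between the predicted and ground-truth masks; a NumPy sketch for binary masks (function name ours):

```python
import numpy as np

def dice_and_iou(pred, target):
    """Dice score and IoU for binary segmentation masks (0/1 arrays)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum())
    iou = inter / union
    return dice, iou
```

The two metrics are monotonically related (Dice = 2*IoU / (1 + IoU)), but Dice weights the intersection more heavily, which matters for small defects.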
Solr and Machine Vision - Scott Cote, Lucidworks & Trevor Grant, IBM - Lucidworks
This document discusses using machine vision techniques like Haar cascade filters and eigenfaces for real-time facial recognition and detection. It proposes using OpenCV to detect faces in video frames, clustering the detected faces to remove "ghost" faces, representing each face as a vector of eigenface coefficients, and searching Solr to identify faces or add new identities. It also discusses challenges like inconsistent face detection and proposes solutions like adaptive clustering parameters and windowing video frames to add context.
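The eigenface representation mentioned above is a PCA projection of flattened face images; a NumPy sketch (function name ours) that computes the coefficient vectors one would then index in Solr:

```python
import numpy as np

def eigenface_coeffs(faces, n_components=4):
    """Project flattened face images (rows of `faces`) onto their top
    principal components ('eigenfaces'); returns coefficients, basis, mean."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # rows of vt are the principal directions (eigenfaces)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    return centered @ basis.T, basis, mean
```

A new face is identified by projecting it with the same basis and comparing its coefficient vector to the indexed ones.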
Dense Retrieval with Apache Solr Neural Search.pdfSease
This document provides an overview of dense retrieval with Apache Solr neural search. It discusses semantic search problems that neural search aims to address through vector-based representations of queries and documents. It then describes Apache Solr's implementation of neural search using dense vector fields and HNSW graphs to perform k-nearest neighbor retrieval. Functions are shown for indexing and searching vector data. The document also discusses using vector queries for filtering, re-ranking, and hybrid searches combining dense and sparse criteria.
Deep Learning And Business Models (VNITC 2015-09-13)Ha Phuong
Deep Learning and Business Models
Tran Quoc Hoan discusses deep learning and its applications, as well as potential business models. Deep learning has led to significant improvements in areas like image and speech recognition compared to traditional machine learning. Some business models highlighted include developing deep learning frameworks, building hardware optimized for deep learning, using deep learning for IoT applications, and providing deep learning APIs and services. Deep learning shows promise across many sectors but also faces challenges in fully realizing its potential.
The document discusses sparse coding and its applications in visual recognition tasks. It introduces sparse coding as an unsupervised learning technique that learns bases to represent image patches. Sparse coding has been shown to outperform bag-of-words models with vector quantization on datasets like Caltech-101 and PASCAL VOC. The document also discusses extensions of sparse coding, including hierarchical sparse coding and supervised methods, that have achieved further improvements on image classification benchmarks.
Open CV is an open source computer vision library that provides programming functions for real-time computer vision. It is cross-platform and can be used to build applications across operating systems. The library contains hundreds of functions for applications like factory product inspection, medical imaging, security, and robotics. It has a large user community including researchers and major tech companies and has been used in applications like surveillance, mapping, manufacturing inspection, and more.
Measuring vegetation health to predict natural hazardsSuneel Marthi
This document discusses using satellite imagery and machine learning to measure vegetation health and predict natural hazards. Specifically, it presents a workflow for identifying vegetation indices from Landsat8 satellite images to monitor things like agriculture, drought, and fire risk. The workflow includes acquiring and preprocessing Landsat8 data, computing normalized difference vegetation indices (NDVI), training a deep learning model to classify pixels, and implementing the inference pipeline using Apache Beam for scalability. Case studies of Paradise, CA show how NDVI can track changes over time. Future work proposed includes classifying rock formations and unsupervised clustering of image regions.
Overview of challenges being faced by the AI community to achieve high-performance, scalable and distributed DNN training on Modern HPC systems with both scale-up and scale-out strategies. After that, the talk will focus on a range of solutions being carried out in my group to address these challenges. The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of-core DNN training, and 4) Hybrid (Data and Model) parallelism. Case studies to accelerate DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented.
Surveillance scene classification using machine learningUtkarsh Contractor
The problem of scene classification in surveillance footage is of great importance for ensuring security in public areas. With challenges such as low quality feeds, occlusion, viewpoint variations, background clutter etc. The task is both challenging and error-prone. Therefore it is important to keep the false positives low to maintain a high accuracy of detection. In this paper, we adapt high performing CNN architectures to identify abandoned luggage in a surveillance feed. We explore several CNN based approaches, from Transfer Learning on the Imagenet dataset to object classification using Faster R-CNNs on the COCO dataset. Using network visualization techniques, we gain insight into what the neural network sees and the basis of classification decision. The experiments have been conducted on real world datasets, and highlights the complexity in such classifications. Obtained results indicate that a combination of proposed techniques outperforms the individual approaches.
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation岳華 杜
This document discusses several semantic segmentation methods using deep learning, including fully convolutional networks (FCNs), U-Net, and SegNet. FCNs were among the first to use convolutional networks for dense, pixel-wise prediction by converting classification networks to fully convolutional form and combining coarse and fine feature maps. U-Net and SegNet are encoder-decoder architectures that extract high-level semantic features from the input image and then generate pixel-wise predictions, with U-Net copying and cropping features and SegNet using pooling indices for upsampling. These methods demonstrate that convolutional networks can effectively perform semantic segmentation through dense prediction.
Neural Search Comes to Apache Solr_ Approximate Nearest Neighbor, BERT and Mo...Sease
The first integrations of machine learning techniques with search allowed to improve the ranking of your search results (Learning To Rank) – but one limitation has always been that documents had to contain the keywords that the user typed in the search box in order to be retrieved. For example, the query “tiger” won’t retrieve documents containing only the terms “panthera tigris”. This is called the vocabulary mismatch problem and over the years it has been mitigated through query and document expansion approaches.
Neural search is an Artificial Intelligence technique that allows a search engine to reach those documents that are semantically similar to the user’s query without necessarily containing those terms; it avoids the need for long lists of synonyms by automatically learning the similarity of terms and sentences in your collection through the utilisation of deep neural networks and numerical vector representation.
The world is the computer and the programmer is youDavide Carboni
This document discusses the past, present, and future of connecting physical objects to the internet and computing networks. It outlines the evolution of related technologies over time from the 1950s to present. It also describes two approaches to programming these connected systems - a top-down approach using tools like PySense, and a bottom-up approach using a model called Hyperpipe that is based on pi-calculus.
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
This talk was presented in Startup Master Class 2017 - http://aaiitkblr.org/smc/ 2017 @ Christ College Bangalore. Hosted by IIT Kanpur Alumni Association and co-presented by IIT KGP Alumni Association, IITACB, PanIIT, IIMA and IIMB alumni.
My co-presenter was Biswa Gourav Singh. And contributor was Navin Manaswi.
http://dataconomy.com/2017/04/history-neural-networks/ - timeline for neural networks
This document discusses a lecture on computer vision given by Dr. Eng. Mahmoud Shams at Kafrelsheikh University. It defines computer vision as dealing with how computers understand digital images and videos, and seeks to automate tasks of the human visual system. The lecture covers classification of AI, evaluation of computer vision algorithms, common computer vision tasks like localization and segmentation, and why benchmarks are important. It also lists the top 10 computer vision tools for 2020 and discusses negative results in computer vision research.
This document discusses a lecture on computer vision given by Dr. Eng. Mahmoud Shams at Kafrelsheikh University. It defines computer vision as dealing with how computers understand digital images and videos, and seeks to automate tasks of the human visual system. The lecture covers classification of AI, evaluation of computer vision algorithms, common computer vision tasks like localization and segmentation, and why benchmarks are important. It also discusses sources of noise in images, performance metrics like mean square error and confusion matrices, and some top computer vision tools like OpenCV, TensorFlow, Keras and YOLO.
1. Large Scale Landuse Classification of Satellite Imagery
Suneel Marthi
February 27, 2019
Big Data Technology Summit, Warsaw, Poland
2. $WhoAmI
Suneel Marthi
@suneelmarthi
Member of Apache Software Foundation
Committer and PMC on Apache Mahout, Apache OpenNLP, Apache Streams
4. Introduction
Deep Learning has moved from Academia to Industry
Availability of Massive Cloud Computing Power
Combination of Compute Resources + Big Data with Deep Learning models often produces useful and interesting applications
5. Introduction
Computer Vision for Satellite Imagery
Availability of low cost satellite images for research
Train a Deep Learning model to identify Tulip beds from satellite data
6. Data: Sentinel-2
Earth observation mission from ESA
13 spectral bands, from RGB to SWIR (Short Wave Infrared)
Spatial resolution: 10m/px (RGB bands)
5 day revisit time
Free and open data policy
12. Filter Clouds
Need to remove cloudy images before segmenting
Approach: train a Neural Network to classify images as clear or cloudy
CNN Architectures: ResNet50 and ResNet101
14. Filter Clouds: training data
‘Planet: Understanding the Amazon from Space’ Kaggle competition
40K images labeled as clear, hazy, partly cloudy or cloudy
15. Filter Clouds: Training data (2)

    Origin                       No. of Images   Cloudy Images
    Kaggle Competition           40000           30%
    Sentinel-2 (hand labelled)    5000           50%
    Total                        45000           32%

Only two classes: clear and cloudy (cloudy = haze + partly cloudy + cloudy)
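The relabelling above collapses the four Kaggle classes into two. A minimal sketch of that mapping (the label strings follow the Kaggle competition's naming; the sample list is made up):

```python
# Collapse the four weather labels into the two classes used for training:
# anything not fully clear counts as "cloudy".
CLOUDY = {"haze", "partly_cloudy", "cloudy"}

def binarize(label):
    return "cloudy" if label in CLOUDY else "clear"

labels = ["clear", "haze", "partly_cloudy", "cloudy", "clear"]
binary = [binarize(l) for l in labels]
# → ["clear", "cloudy", "cloudy", "cloudy", "clear"]
```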
23. Approach: U-Net
State of the Art CNN for Image Segmentation
Commonly used with biomedical images
Best Architecture for tasks like this
O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. arXiv:1505.04597, 2015
25. U-Net Building Blocks

from mxnet.gluon import nn

def conv_block(channels, kernel_size):
    # conv -> batch norm -> ReLU, the basic unit of the network
    out = nn.HybridSequential()
    out.add(
        nn.Conv2D(channels, kernel_size, padding=1, use_bias=False),
        nn.BatchNorm(),
        nn.Activation('relu')
    )
    return out

def down_block(channels):
    # two stacked 3x3 conv blocks form one encoder stage
    out = nn.HybridSequential()
    out.add(
        conv_block(channels, 3),
        conv_block(channels, 3)
    )
    return out
26. U-Net Building Blocks (2)

class up_block(nn.HybridBlock):
    def __init__(self, channels, shrink=True, **kwargs):
        super(up_block, self).__init__(**kwargs)
        # transposed convolution doubles the spatial resolution
        self.upsampler = nn.Conv2DTranspose(channels=channels, kernel_size=4,
                                            strides=2, padding=1, use_bias=False)
        self.conv1 = conv_block(channels, 1)
        self.conv3_0 = conv_block(channels, 3)
        if shrink:
            self.conv3_1 = conv_block(int(channels / 2), 3)
        else:
            self.conv3_1 = conv_block(channels, 3)

    def hybrid_forward(self, F, x, s):
        x = self.upsampler(x)
        x = self.conv1(x)
        x = F.relu(x)
        # center-crop to the skip connection's size, concatenate the
        # encoder features, then refine with the 3x3 conv blocks
        x = F.Crop(*[x, s], center_crop=True)
        x = F.concat(s, x, dim=1)
        x = self.conv3_0(x)
        x = self.conv3_1(x)
        return x
27. U-Net: Training data
Ground truth: tulip fields in the Netherlands
Provided by Geopedia, from Sinergise
28. Loss function: Soft Dice Coefficient loss
Prediction = probability of each pixel belonging to a Tulip Field (Softmax output)
ε serves to prevent division by zero
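The soft Dice loss can be sketched as follows. This is a minimal NumPy version for illustration (the deck's training code operates on MXNet tensors, and the ε default here is an assumed placeholder):

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P ∩ T| / (|P| + |T|).

    pred   -- per-pixel probabilities of the tulip-field class (softmax output)
    target -- binary ground-truth mask
    eps    -- prevents division by zero on empty masks
    """
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

# A perfect prediction drives the loss to 0; a fully disjoint one toward 1.
mask = np.array([1.0, 1.0, 0.0])
loss = soft_dice_loss(mask, mask)
```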
29. Evaluation Metric: Intersection over Union (IoU)
Aka Jaccard Index
Similar to Dice coefficient, a standard metric for image segmentation
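For binary masks the metric reduces to a few lines. A minimal NumPy sketch (the empty-mask convention of returning 1.0 is an assumption, not from the deck):

```python
import numpy as np

def iou(pred_mask, true_mask):
    """Intersection over Union (Jaccard index) for binary masks."""
    pred_mask = pred_mask.astype(bool)
    true_mask = true_mask.astype(bool)
    union = np.logical_or(pred_mask, true_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred_mask, true_mask).sum() / union

pred = np.array([[1, 1, 0], [0, 1, 0]])
true = np.array([[1, 0, 0], [0, 1, 1]])
# intersection = 2 pixels, union = 4 pixels → IoU = 0.5
```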
31. Results
IoU = 0.73 after 23 training epochs
Related results: DSTL Kaggle competition
IoU = 0.84 on crop vs building/road/water/etc. segmentation
https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection/discussion/29790
46. How to Scale: Batch or Stream?
"Batch is an extension of Streaming, except when Streaming is an extension of Batch"
-- Shannon Quinn, Apache Mahout
47. Spark or Flink?
"Spark Streaming is for people who want to operate on their streams using Batch idioms.
Flink Batch is for people who want to operate on their batches using Streaming idioms."
-- Joey Frazee, Apache NiFi
48. What is Apache Beam?
Agnostic (unified Batch + Stream) programming model
Java, Python, Go SDKs
Runners:
  Apache Flink
  Apache Spark
  Google Cloud Dataflow
  Local DirectRunner
49. Why Apache Beam?
Portability: code abstraction that can be executed on different backend runners
Unified: a single batch and streaming API
Extensible model and SDKs: extensible API to define custom sinks and sources
50. The Apache Beam Vision
End Users: create pipelines in a familiar language
SDK Writers: make Beam concepts available in new languages
Runner Writers: support Beam pipelines in distributed processing environments
56. Classify Rock Formations
Using Shortwave Infrared (SWIR) images (2.107 - 2.294 µm)
Radiant energy reflected/transmitted per unit time (Radiant Flux)
E.g.: plants don't grow on rocks
https://en.wikipedia.org/wiki/Radiant_flux
57. Measure Crop Health
Using Near-Infrared (NIR) radiation
Reflected strongly by plant chlorophyll and mesophyll
Chlorophyll content differs between plants and plant stages
Good measure to identify different plants and their health
https://en.wikipedia.org/wiki/Near-infrared_spectroscopy#Agriculture
58. Use images from Red band
Identify borders and regions without much detail visible to the naked eye - wonder why?
Images are in the Red band
Unsupervised Learning: Clustering
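The clustering idea can be sketched with a minimal 1-D k-means over red-band pixel intensities. This is a toy illustration, not the deck's code, and the pixel values are invented; in practice one would use a library implementation such as scikit-learn's KMeans:

```python
import numpy as np

def kmeans_1d(values, k=2, iters=20, seed=0):
    """Minimal k-means on scalar pixel intensities."""
    rng = np.random.default_rng(seed)
    # initialise centroids from the data itself
    centroids = rng.choice(values, size=k, replace=False).astype(float)
    for _ in range(iters):
        # assign each pixel to its nearest centroid
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # move each centroid to the mean of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = values[labels == j].mean()
    return labels, centroids

# toy "red band" pixels: a dark region and a bright region
pixels = np.array([10, 12, 11, 200, 198, 205, 9, 201], dtype=float)
labels, centroids = kmeans_1d(pixels)
# dark and bright pixels end up in separate clusters
```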