How Satellogic uses AI to understand what happens on Earth and deliver knowledge to clients using satellites. The presentation was given at the Barcelona CityAI meetup in January 2019.
20181128 satellogic @ barcelona ai
1. Understanding what happens on earth
using satellites
Barcelona CityAI 2019
Albert Pujol Torras
apujol@satellogic.com https://www.linkedin.com/in/albert-pujol-torras-3a7367/
2. Agenda
● Satellogic
● Satellogic Data Science and Solutions
● What we can do with satellites, examples of problems we face
● What type of data do we work with?
● Processing infrastructure, hardware and software
● Our team
● Machine learning algorithms... and challenges
● Lessons learned
● Questions
3.
4. Data Science & Solutions
● BCN: Delivery platform
● TLV: Headquarters & Design
● BSAS: Manufacturing plant
● MVD: Comprehensive services
● PEK
7. Estimation of other image modalities
Inputs: HR RGB + LR TIR + LR SWIR1 + LR SWIR2 -> Output: HR thermal
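The idea of estimating a high-resolution band from HR RGB plus upsampled low-resolution bands can be sketched as a per-pixel regression. This is a hypothetical illustration on synthetic data (a production model would likely be a CNN); all names and shapes here are assumptions, not Satellogic's implementation.

```python
import numpy as np

# Hypothetical sketch: learn a per-pixel mapping from HR RGB plus
# upsampled low-res bands (TIR, SWIR1, SWIR2) to an HR thermal band.
rng = np.random.default_rng(0)
h, w = 64, 64

hr_rgb = rng.random((h, w, 3))              # high-res RGB
lr_bands = rng.random((h // 8, w // 8, 3))  # low-res TIR/SWIR1/SWIR2

# Naive nearest-neighbour upsampling of the LR bands to the HR grid.
up = lr_bands.repeat(8, axis=0).repeat(8, axis=1)

# Per-pixel feature vector: 3 RGB values + 3 upsampled LR values.
X = np.concatenate([hr_rgb, up], axis=-1).reshape(-1, 6)

# Synthetic "ground truth" thermal band, just for the sketch.
true_w = np.array([0.2, 0.1, 0.05, 0.4, 0.15, 0.1])
y = X @ true_w + 0.01 * rng.standard_normal(len(X))

# A least-squares fit stands in for model training.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
hr_thermal = (X @ w_hat).reshape(h, w)
print(hr_thermal.shape)  # (64, 64)
```

A linear fit only shows the data flow; the interesting part in practice is a model expressive enough to transfer spatial detail from the HR inputs to the estimated band.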
8. Regression: time series image prediction
- Estimation of the yield at the end of the season.
- Monitoring changes in the estimate to know when and where to act.
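End-of-season yield estimation from an image time series can be sketched as a regression over per-field vegetation-index sequences. This is a minimal sketch on synthetic data; the feature choice (NDVI per date) and the model are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical sketch: predict end-of-season yield per field from a
# time series of a vegetation index (e.g. NDVI) sampled over the season.
rng = np.random.default_rng(42)
n_fields, n_dates = 200, 12

ndvi = rng.random((n_fields, n_dates))  # one NDVI value per field per date
yield_t = ndvi.mean(axis=1) * 10 + rng.normal(0, 0.2, n_fields)  # synthetic yield

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(ndvi[:150], yield_t[:150])

pred = model.predict(ndvi[150:])
print(pred.shape)  # (50,)
```

Re-running the prediction as new acquisitions arrive through the season is what turns this into monitoring: changes in the estimate show when and where to act.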
12. Data - Data Sources
Primary data sources:
● Satellogic data
● 3rd-party satellite data
● World climate maps
● Geologic data
● Elevation models
● Georeferenced man-made structures
● Political boundaries
● Census data maps
Derived layers:
● Temporal evolution
● Land use maps
● Advanced indices
● Distance to water
● Terrain orientation
● Superresolution images
● ...
These sources can be available globally or locally, dynamic or static, high or low resolution...
nKappa: data science platform with a focus on geographic data and satellite imagery.
Main goal: to scale solution development by automating/accelerating data science work.
nKappa enables solution development using aligned sets of image tiles (Kappas).
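The "aligned sets of image tiles" idea can be sketched as stacking co-registered raster layers that share the same geographic grid. The layer names below are hypothetical examples, not nKappa's actual API.

```python
import numpy as np

# Hypothetical sketch of a "Kappa": a set of raster layers on the same
# geographic grid, stacked so every pixel aligns across sources.
tile_shape = (256, 256)

layers = {
    "rgb_red":       np.zeros(tile_shape, dtype=np.float32),
    "ndvi":          np.zeros(tile_shape, dtype=np.float32),
    "elevation":     np.zeros(tile_shape, dtype=np.float32),
    "dist_to_water": np.zeros(tile_shape, dtype=np.float32),
}

# Because the layers are pre-aligned, building a per-pixel feature cube
# is a single stack; index [i, j] refers to the same spot in every layer.
kappa = np.stack(list(layers.values()), axis=-1)
print(kappa.shape)  # (256, 256, 4)
```

The value of such a platform is that reprojection, resampling, and tiling happen once, upstream, so data scientists work directly with aligned arrays.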
13. Sizes:
- Typical project: 350,000 km², 3 times per week, 8 bands, at 10 meters per pixel resolution: about 20 GB/day.
- We expect to acquire 7 terabytes of data per day by 2021.
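The quoted project volume can be sanity-checked with back-of-envelope arithmetic (assuming 16-bit samples, i.e. 2 bytes per band-pixel, and no compression; both are assumptions).

```python
# Back-of-envelope check of the typical-project data volume.
area_km2 = 350_000
px_per_km2 = (1000 / 10) ** 2   # 10 m per pixel -> 10,000 px per km²
bands = 8
bytes_per_sample = 2            # assumed 16-bit samples
passes_per_week = 3

bytes_per_pass = area_km2 * px_per_km2 * bands * bytes_per_sample
gb_per_day = bytes_per_pass * passes_per_week / 7 / 1e9
print(round(gb_per_day))  # 24, the same order as the quoted 20 GB/day
```

The small gap between 24 and 20 GB/day is easily explained by compression or a lower bit depth.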
Sources of image variation:
- Clouds: 70% of the world is cloud-covered.
- Perspective changes (off-nadir satellite images, drone images).
- Shadow orientation, intensity, and length vary with the hour of day, clouds, and season.
- Chromatic changes due to aerosols and the hour of day.
- Variations between sensors (different satellites, drone images, ...).
- Variations/errors in image orthorectification and geolocation.
- Vegetation growth and color changes, ...
Data - Data Sources
Examples: clouds, perspective, shadows, chromatic and vegetation changes.
15. Ground truth: rare and expensive, but indispensable to train and to assess the quality of ML and computer vision approaches.
Sources of ground truth:
- Land ground truth provided by the client.
- GT generated using the highest-resolution imagery.
- Human annotation:
  - Our team always annotates... to understand the problem.
  - Internal and external annotation (Mechanical Turk, Supahands, ...).
  - Sample what to annotate to preserve variability and input-domain coverage.
  - Measure biases and variances of annotators (discard annotators or images, revise annotation instructions, ...).
- Other GT sources: surveyed data from developed countries, annotated from visual imagery or using land ground truth (the CORINE project, CREAF and SIOSE in Spain, the USGS land cover dataset in the USA, ...).
- Challenges: out-of-date data, differing resolution, and how to transfer it to places that differ in land management culture, climate, or relief (domain shift).
Data - Ground truth
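Measuring annotator bias can be made concrete with an agreement statistic such as Cohen's kappa, computed on tiles labeled by two annotators. This is a minimal sketch with made-up labels; the threshold for discarding an annotator or revising instructions is a project decision.

```python
import numpy as np

# Hypothetical sketch: Cohen's kappa between two annotators on the same
# tiles, to decide whether an annotator or the instructions need work.
def cohens_kappa(a, b):
    a, b = np.asarray(a), np.asarray(b)
    labels = np.union1d(a, b)
    po = np.mean(a == b)  # observed agreement
    # Chance agreement from each annotator's label frequencies.
    pe = sum(np.mean(a == lab) * np.mean(b == lab) for lab in labels)
    return (po - pe) / (1 - pe)

ann1 = [0, 0, 1, 1, 1, 0, 1, 0]
ann2 = [0, 0, 1, 1, 0, 0, 1, 1]
print(cohens_kappa(ann1, ann2))  # 0.5
```

Unlike raw percent agreement, kappa discounts the agreement expected by chance, which matters when classes are imbalanced (common in land-cover labels).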
16. Data: Covariate shift & domain adaptation
Existing "good-quality" ground truth: rice fields in Europe, urban areas in Europe.
Target areas without ground truth: rice fields in China, urban areas in Lagos.
17. ● Huge amounts of data --> cloud infrastructure.
● The nKappa platform for distributed processing (currently on Microsoft Azure),
plus in-house GPU servers (equipped with 1080 Tis).
● nKappa is used in both development and production stages.
● GPU servers are mostly used for EDA and for developing DS algorithms and models.
● Cloud infrastructure is mainly used to track, share, and audit datasets,
algorithms, and models, and to put pipelines and models into production.
Infrastructure - Hardware
18. Infrastructure - Software
● Data scientist scripts
● GIS processing & remote sensing: rasterio, telluric, ...
● Distributed processing: nKappa
nKappa goals:
● Trace, reuse, and audit experiments, datasets, pipelines, and models.
● Accelerate DS experimentation on remote sensing.
● Automate the insertion of new pipelines into the production environment.
19. Our team profile
Data scientists:
● Computer vision and machine learning specialists.
● Additional background in remote sensing.
Platform developers:
● Strong Python developers with knowledge of machine learning and computer vision / image processing.
● DevOps.
● Front-end developers.
● GIS Python specialists.
Solutions started a year and a half ago...
We are currently 13, and we are hiring!!
20. Algorithms - Computer vision algorithms and ML machinery, from logistic regression and random forests to the latest deep NNs.
- Training with tailored datasets, using a smart sampling policy to maintain the input and output variability of the original datasets.
- We prefer context knowledge + common-sense heuristics + ML methods rather than pursuing end-to-end neural networks (unless you are absolutely sure you have all relevant sources of image variation in your training set, and that your data augmentation policy is not introducing bias).
- Random forests, CNNs, and variations of U-Nets, alone or in ensembles, are the algorithms our team uses most.
- Relevant lines of research:
  - Generative models for data augmentation, ground truth generation, and super-resolution.
  - Transfer learning / domain adaptation.
  - Satellite-image-invariant and efficient image embeddings, and distance metric learning.
ML Algorithms: What we use
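A "smart sampling policy that maintains output variability" can be sketched, in its simplest form, as stratified subsampling over class labels. This is one simplified reading of the idea, using scikit-learn's `train_test_split`; the class names are invented for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical sketch: subsample a large tile dataset while preserving
# the label distribution, so a smaller training set keeps the output
# variability of the original.
rng = np.random.default_rng(1)
labels = rng.choice(["water", "crop", "urban"], size=1000, p=[0.1, 0.6, 0.3])
tiles = np.arange(1000)  # stand-in for image tile identifiers

sample, _, sample_labels, _ = train_test_split(
    tiles, labels, train_size=100, stratify=labels, random_state=0
)

# The 10% sample keeps (approximately) the original class proportions.
for cls in ["water", "crop", "urban"]:
    print(cls, (sample_labels == cls).mean())
```

Maintaining *input* variability (sensor, season, geography) needs the same treatment with those attributes as strata, not just the class label.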
21. Lessons learned
- Project success:
  - 5%: ML algorithm and parameter selection.
  - 95%: really understanding what the client needs, how to generate value, and anticipating how your output is going to be consumed; defining good features, good ground truth, a good data sampling policy, and pre- and post-processing.
- Dedicate time first to ensuring success... after that, improve:
  - Use fast ML algorithms.
  - Start with small datasets that preserve the input and output variability of the original one.
- It is worth investing in automatically measuring dataset quality before training on big datasets:
  - Missing values, constant variables, unaligned bands, duplicated variables, class imbalance, ...
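Most of the listed dataset checks are cheap to automate. A minimal sketch (the report structure and function name are hypothetical; band misalignment would additionally need georeferencing metadata):

```python
import numpy as np

# Hypothetical sketch of pre-training dataset checks: missing values,
# constant variables, duplicated variables, and class imbalance.
def dataset_report(X, y):
    return {
        "missing": int(np.isnan(X).sum()),
        "constant_cols": [j for j in range(X.shape[1]) if np.nanstd(X[:, j]) == 0],
        "duplicated_cols": [
            (i, j)
            for i in range(X.shape[1])
            for j in range(i + 1, X.shape[1])
            if np.array_equal(X[:, i], X[:, j], equal_nan=True)
        ],
        "class_balance": {int(c): int(n) for c, n in zip(*np.unique(y, return_counts=True))},
    }

X = np.array([[1.0, 5.0, 5.0, 7.0],
              [2.0, 5.0, 5.0, np.nan],
              [3.0, 5.0, 5.0, 9.0]])
y = np.array([0, 0, 1])
print(dataset_report(X, y))
```

On this toy input the report flags one missing value, two constant columns (1 and 2), one duplicated pair (1, 2), and a 2:1 class imbalance, which is exactly the kind of problem worth catching before a long training run.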