How Satellogic uses AI to understand what happens on Earth and deliver knowledge to clients using satellites. The presentation was given at the Barcelona CityAI meetup in January 2019.
20181128 satellogic @ barcelona ai
1. Understanding what happens on earth
using satellites
Barcelona CityAI 2019
Albert Pujol Torras
apujol@satellogic.com https://www.linkedin.com/in/albert-pujol-torras-3a7367/
2. Agenda
● Satellogic
● Satellogic Data Science and Solutions
● What we can do with satellites, examples of problems we face
● What type of data do we work with?
● Processing infrastructure, hardware and software
● Our team
● Machine learning algorithms... and challenges
● Lessons learned
● Questions
3.
4. Data Science & Solutions
● BCN: Delivery platform
● TLV: Headquarters & Design
● BSAS: Manufacturing plant
● MVD: Comprehensive services
● PEK
7. Estimation of other image modalities
Inputs: HR RGB + LR TIR + LR SWIR1 + LR SWIR2 -> Output: HR thermal
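The idea of estimating a high-resolution band from HR RGB plus upsampled low-resolution bands can be sketched as a per-pixel regression. This is a hypothetical illustration on synthetic data (a production model would likely be a CNN); all names and shapes here are assumptions, not Satellogic's implementation.

```python
import numpy as np

# Hypothetical sketch: learn a per-pixel mapping from HR RGB plus
# upsampled low-res bands (TIR, SWIR1, SWIR2) to an HR thermal band.
rng = np.random.default_rng(0)
h, w = 64, 64

hr_rgb = rng.random((h, w, 3))              # high-res RGB
lr_bands = rng.random((h // 8, w // 8, 3))  # low-res TIR/SWIR1/SWIR2

# Naive nearest-neighbour upsampling of the LR bands to the HR grid.
up = lr_bands.repeat(8, axis=0).repeat(8, axis=1)

# Per-pixel feature vector: 3 RGB values + 3 upsampled LR values.
X = np.concatenate([hr_rgb, up], axis=-1).reshape(-1, 6)

# Synthetic "ground truth" thermal band, just for the sketch.
true_w = np.array([0.2, 0.1, 0.05, 0.4, 0.15, 0.1])
y = X @ true_w + 0.01 * rng.standard_normal(len(X))

# A least-squares fit stands in for model training.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
hr_thermal = (X @ w_hat).reshape(h, w)
print(hr_thermal.shape)  # (64, 64)
```

A linear fit only shows the data flow; the interesting part in practice is a model expressive enough to transfer spatial detail from the HR inputs to the estimated band.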
8. Regression: time series image prediction
- Estimation of the yield at the end of the season.
- Monitoring changes in the estimate to know when and where to act.
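End-of-season yield estimation from an image time series can be sketched as a regression over per-field vegetation-index sequences. This is a minimal sketch on synthetic data; the feature choice (NDVI per date) and the model are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical sketch: predict end-of-season yield per field from a
# time series of a vegetation index (e.g. NDVI) sampled over the season.
rng = np.random.default_rng(42)
n_fields, n_dates = 200, 12

ndvi = rng.random((n_fields, n_dates))  # one NDVI value per field per date
yield_t = ndvi.mean(axis=1) * 10 + rng.normal(0, 0.2, n_fields)  # synthetic yield

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(ndvi[:150], yield_t[:150])

pred = model.predict(ndvi[150:])
print(pred.shape)  # (50,)
```

Re-running the prediction as new acquisitions arrive through the season is what turns this into monitoring: changes in the estimate show when and where to act.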
12. Data - Data Sources
Primary data sources:
● Satellogic data
● 3rd-party satellite data
● World climate maps
● Geologic data
● Elevation models
● Georeferenced man-made structures
● Political boundaries
● Census data maps
Derived layers:
● Temporal evolution
● Land use maps
● Advanced indices
● Distance to water
● Terrain orientation
● Superresolution images
● ...
These sources can be available globally or locally, dynamic or static, high or low resolution...
nKappa: data science platform with a focus on geographic data and satellite imagery.
Main goal: to scale solution development by automating/accelerating data science work.
nKappa enables solution development using aligned sets of image tiles (Kappas).
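The "aligned sets of image tiles" idea can be sketched as stacking co-registered raster layers that share the same geographic grid. The layer names below are hypothetical examples, not nKappa's actual API.

```python
import numpy as np

# Hypothetical sketch of a "Kappa": a set of raster layers on the same
# geographic grid, stacked so every pixel aligns across sources.
tile_shape = (256, 256)

layers = {
    "rgb_red":       np.zeros(tile_shape, dtype=np.float32),
    "ndvi":          np.zeros(tile_shape, dtype=np.float32),
    "elevation":     np.zeros(tile_shape, dtype=np.float32),
    "dist_to_water": np.zeros(tile_shape, dtype=np.float32),
}

# Because the layers are pre-aligned, building a per-pixel feature cube
# is a single stack; index [i, j] refers to the same spot in every layer.
kappa = np.stack(list(layers.values()), axis=-1)
print(kappa.shape)  # (256, 256, 4)
```

The value of such a platform is that reprojection, resampling, and tiling happen once, upstream, so data scientists work directly with aligned arrays.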
13. Sizes:
- Typical project: 350,000 km², 3 times per week, 8 bands, at 10 meters per pixel resolution: about 20 GB/day.
- We expect to acquire 7 terabytes of data per day by 2021.
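The quoted project volume can be sanity-checked with back-of-envelope arithmetic (assuming 16-bit samples, i.e. 2 bytes per band-pixel, and no compression; both are assumptions).

```python
# Back-of-envelope check of the typical-project data volume.
area_km2 = 350_000
px_per_km2 = (1000 / 10) ** 2   # 10 m per pixel -> 10,000 px per km²
bands = 8
bytes_per_sample = 2            # assumed 16-bit samples
passes_per_week = 3

bytes_per_pass = area_km2 * px_per_km2 * bands * bytes_per_sample
gb_per_day = bytes_per_pass * passes_per_week / 7 / 1e9
print(round(gb_per_day))  # 24, the same order as the quoted 20 GB/day
```

The small gap between 24 and 20 GB/day is easily explained by compression or a lower bit depth.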
Sources of image variation:
- Clouds: 70% of the world is cloud-covered.
- Perspective changes (off-nadir satellite images, drone images).
- Shadow orientation, intensity, and length vary with the hour of day, clouds, and season.
- Chromatic changes due to aerosols and the hour of day.
- Variations between sensors (different satellites, drone images, ...).
- Variations/errors in image orthorectification and geolocation.
- Vegetation growth and color changes, ...
Data - Data Sources
Examples: clouds, perspective, shadows, chromatic and vegetation changes.
15. Ground truth: rare and expensive, but indispensable to train and to assess the quality of ML and computer vision approaches.
Sources of ground truth:
- Land ground truth provided by the client.
- GT generated using the highest-resolution imagery.
- Human annotation:
  - Our team always annotates... to understand the problem.
  - Internal and external annotation (Mechanical Turk, Supahands, ...).
  - Sample what to annotate to preserve variability and input-domain coverage.
  - Measure biases and variances of annotators (discard annotators or images, revise annotation instructions, ...).
- Other GT sources: surveyed data from developed countries, annotated from visual imagery or using land ground truth (the CORINE project, CREAF and SIOSE in Spain, the USGS land cover dataset in the USA, ...).
- Challenges: out-of-date data, differing resolution, and how to transfer it to places that differ in land management culture, climate, or relief (domain shift).
Data - Ground truth
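Measuring annotator bias can be made concrete with an agreement statistic such as Cohen's kappa, computed on tiles labeled by two annotators. This is a minimal sketch with made-up labels; the threshold for discarding an annotator or revising instructions is a project decision.

```python
import numpy as np

# Hypothetical sketch: Cohen's kappa between two annotators on the same
# tiles, to decide whether an annotator or the instructions need work.
def cohens_kappa(a, b):
    a, b = np.asarray(a), np.asarray(b)
    labels = np.union1d(a, b)
    po = np.mean(a == b)  # observed agreement
    # Chance agreement from each annotator's label frequencies.
    pe = sum(np.mean(a == lab) * np.mean(b == lab) for lab in labels)
    return (po - pe) / (1 - pe)

ann1 = [0, 0, 1, 1, 1, 0, 1, 0]
ann2 = [0, 0, 1, 1, 0, 0, 1, 1]
print(cohens_kappa(ann1, ann2))  # 0.5
```

Unlike raw percent agreement, kappa discounts the agreement expected by chance, which matters when classes are imbalanced (common in land-cover labels).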
16. Data: Covariate shift & domain adaptation
Existing "good-quality" ground truth: rice fields in Europe, urban areas in Europe.
Target areas without ground truth: rice fields in China, urban areas in Lagos.
17. ● Huge amounts of data --> cloud infrastructure.
● The nKappa platform for distributed processing (currently on Microsoft Azure),
plus in-house GPU servers (equipped with 1080 Tis).
● nKappa is used in both development and production stages.
● GPU servers are mostly used for EDA and for developing DS algorithms and models.
● Cloud infrastructure is mainly used to track, share, and audit datasets,
algorithms, and models, and to put pipelines and models into production.
Infrastructure - Hardware
18. Infrastructure - Software
● Data scientist scripts
● GIS processing & remote sensing: rasterio, telluric, ...
● Distributed processing: nKappa
nKappa goals:
● Trace, reuse, and audit experiments, datasets, pipelines, and models.
● Accelerate DS experimentation on remote sensing.
● Automate the insertion of new pipelines into the production environment.
19. Our team profile
Data scientists:
● Computer vision and machine learning specialists.
● Additional background in remote sensing.
Platform developers:
● Strong Python developers with knowledge of machine learning and computer vision / image processing.
● DevOps.
● Front-end developers.
● GIS Python specialists.
Solutions started a year and a half ago...
We are currently 13, and we are hiring!!
20. Algorithms - Computer vision algorithms and ML machinery, from logistic regression and random forests to the latest deep NNs.
- Training with tailored datasets, using a smart sampling policy to maintain the input and output variability of the original datasets.
- We prefer context knowledge + common-sense heuristics + ML methods rather than pursuing end-to-end neural networks (unless you are absolutely sure you have all relevant sources of image variation in your training set, and that your data augmentation policy is not introducing bias).
- Random forests, CNNs, and variations of U-Nets, alone or in ensembles, are the algorithms our team uses most.
- Relevant lines of research:
  - Generative models for data augmentation, ground truth generation, and super-resolution.
  - Transfer learning / domain adaptation.
  - Satellite-image-invariant and efficient image embeddings, and distance metric learning.
ML Algorithms: What we use
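A "smart sampling policy that maintains output variability" can be sketched, in its simplest form, as stratified subsampling over class labels. This is one simplified reading of the idea, using scikit-learn's `train_test_split`; the class names are invented for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical sketch: subsample a large tile dataset while preserving
# the label distribution, so a smaller training set keeps the output
# variability of the original.
rng = np.random.default_rng(1)
labels = rng.choice(["water", "crop", "urban"], size=1000, p=[0.1, 0.6, 0.3])
tiles = np.arange(1000)  # stand-in for image tile identifiers

sample, _, sample_labels, _ = train_test_split(
    tiles, labels, train_size=100, stratify=labels, random_state=0
)

# The 10% sample keeps (approximately) the original class proportions.
for cls in ["water", "crop", "urban"]:
    print(cls, (sample_labels == cls).mean())
```

Maintaining *input* variability (sensor, season, geography) needs the same treatment with those attributes as strata, not just the class label.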
21. Lessons learned
- Project success:
  - 5%: ML algorithm and parameter selection.
  - 95%: really understanding what the client needs, how to generate value, and anticipating how your output is going to be consumed; defining good features, good ground truth, a good data sampling policy, and pre- and post-processing.
- Dedicate time first to ensuring success... after that, improve:
  - Use fast ML algorithms.
  - Start with small datasets that preserve the input and output variability of the original one.
- It is worth investing in automatically measuring dataset quality before training on big datasets:
  - Missing values, constant variables, unaligned bands, duplicated variables, class imbalance, ...
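Most of the listed dataset checks are cheap to automate. A minimal sketch (the report structure and function name are hypothetical; band misalignment would additionally need georeferencing metadata):

```python
import numpy as np

# Hypothetical sketch of pre-training dataset checks: missing values,
# constant variables, duplicated variables, and class imbalance.
def dataset_report(X, y):
    return {
        "missing": int(np.isnan(X).sum()),
        "constant_cols": [j for j in range(X.shape[1]) if np.nanstd(X[:, j]) == 0],
        "duplicated_cols": [
            (i, j)
            for i in range(X.shape[1])
            for j in range(i + 1, X.shape[1])
            if np.array_equal(X[:, i], X[:, j], equal_nan=True)
        ],
        "class_balance": {int(c): int(n) for c, n in zip(*np.unique(y, return_counts=True))},
    }

X = np.array([[1.0, 5.0, 5.0, 7.0],
              [2.0, 5.0, 5.0, np.nan],
              [3.0, 5.0, 5.0, 9.0]])
y = np.array([0, 0, 1])
print(dataset_report(X, y))
```

On this toy input the report flags one missing value, two constant columns (1 and 2), one duplicated pair (1, 2), and a 2:1 class imbalance, which is exactly the kind of problem worth catching before a long training run.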