Understanding what happens on Earth using satellites
Barcelona CityAI 2019
Albert Pujol Torras
apujol@satellogic.com https://www.linkedin.com/in/albert-pujol-torras-3a7367/
Agenda
● Satellogic
● Satellogic Data Science and Solutions
● What we can do with satellites, examples of problems we face
● What type of data do we work with?
● Processing infrastructure, hardware and software
● Our team
● Machine learning algorithms ... and challenges
● Lessons learned
● Questions
● Data Science & Solutions: BCN
● Delivery platform: TLV
● Headquarters & Design: BSAS
● Manufacturing Plant: MVD
● Comprehensive services: PEK
What kinds of problems do we face?
Object detection/Counting
Object amount/density estimation / regression with lower image resolution
Estimation of other image modalities
(Figure panels: HR RGB, LR TIR, LR SWIR1, LR SWIR2, HR THERMAL)
Regression: time series image prediction
-Estimation of the yield at the end of the season
-Monitoring of changes in the estimation to know when and where to act.
Image semantic segmentation: Land use detection
Image semantic segmentation
Crop stage segmentation: barren, sowing, growth, blooming, senescence, harvesting
“anomaly change” and “semantic change” detection
Data - Data Sources
● Primary Data Sources: Satellogic Data, 3rd Party Satellite Data
● Derived Layers: Temporal Evolution, Land Use Maps, Advanced Indices, Distance to Water, Terrain Orientation, Superresolution Images, ...
● World Climate Maps, Geologic Data, Elevation Models, Georef: Man-Made Structure, Political Boundaries, Census Data Maps
These sources can be available globally or locally, dynamic or static, high or low res...
nKappa: a data science platform focused on geographic data and satellite imagery.
Main goal: to scale solution development by automating and accelerating data science work.
nKappa enables solution development using aligned sets of image tiles (Kappas).
Data - Data Sources
Sizes:
- Typical project: 350,000 km², 3 times per week, 8 bands, at 10 meters per pixel resolution. 20 GB/day.
- We expect to acquire 7 terabytes of data per day by 2021.
Sources of image variation:
- Clouds: 70% of the world is cloud covered.
- Perspective changes (off-nadir satellite images, drone images).
- Variations in shadow orientation, intensity, and length depending on time of day, clouds, and season.
- Chromatic changes due to aerosols and time of day.
- Variations between sensors (different satellites, drone images, ...).
- Variations/errors in image orthorectification and geolocation.
- Vegetation growth and color changes, ...
(Figure panels: clouds, perspective, shadows, chromatic and vegetation changes)
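The typical-project figures above can be sanity-checked with a short calculation. The sketch below is only a back-of-the-envelope estimate: the area, resolution, band count and revisit rate come from the slide, while the 2 bytes per sample (16-bit, uncompressed) is an assumption that the slide does not state.

```python
# Back-of-the-envelope daily data volume for the "typical project" figures.
AREA_KM2 = 350_000          # from the slide
RESOLUTION_M = 10           # metres per pixel, from the slide
BANDS = 8                   # from the slide
REVISITS_PER_WEEK = 3       # from the slide
BYTES_PER_SAMPLE = 2        # ASSUMPTION: 16-bit samples, uncompressed


def daily_volume_gb(area_km2, resolution_m, bands, revisits_per_week, bytes_per_sample):
    """Return the average uncompressed data volume in gigabytes per day."""
    # 1 km² = 1,000,000 m²; each pixel covers resolution_m x resolution_m metres.
    pixels = area_km2 * 1_000_000 / (resolution_m ** 2)
    bytes_per_acquisition = pixels * bands * bytes_per_sample
    # Spread the weekly acquisitions evenly over 7 days.
    return bytes_per_acquisition * revisits_per_week / 7 / 1e9


volume = daily_volume_gb(AREA_KM2, RESOLUTION_M, BANDS, REVISITS_PER_WEEK, BYTES_PER_SAMPLE)
print(f"{volume:.1f} GB/day")  # prints "24.0 GB/day"
```

Under that assumption the result is of the same order as the 20 GB/day quoted on the slide; a different bit depth or any compression would shift the number accordingly.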
Data - Data Sources
Extremely imbalanced datasets
Data - Ground truth
Ground truth is rare and expensive, but indispensable to train and to assess the quality of ML and computer vision approaches.
Sources of ground truth:
- Land ground truth provided by the client.
- GT generated using the highest resolution imagery.
- Human annotation:
  - Our team always annotates ... to understand the problem.
  - Internal and external annotation (Mechanical Turk, Supahands, ...).
  - Sample what to annotate to preserve variability and input domain coverage.
  - Measure the biases and variances of annotators (discard annotators or images, rewrite annotation instructions, ...).
- Other GT sources: first-world surveyed data annotated from visual imagery or using land ground truth (CORINE project, CREAF, SIOSE in Spain, the USGS land cover dataset in the USA, ...).
  - Limitations: out of date, differing resolution, and hard to transfer to places that differ in land management culture, climate or relief (domain shift).
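One simple way to start measuring annotator bias, as mentioned in the ground-truth notes above, is to compare each annotator against the majority vote. This is a minimal sketch and not necessarily the procedure the team uses: the annotator names, tile ids and labels are hypothetical, and the majority vote includes each annotator's own label.

```python
from collections import Counter


def majority_labels(annotations):
    """annotations: {annotator: {item_id: label}} -> {item_id: majority label}."""
    votes = {}
    for labels in annotations.values():
        for item, label in labels.items():
            votes.setdefault(item, []).append(label)
    return {item: Counter(v).most_common(1)[0][0] for item, v in votes.items()}


def agreement_with_majority(annotations):
    """Fraction of each annotator's labels that match the majority vote."""
    consensus = majority_labels(annotations)
    scores = {}
    for annotator, labels in annotations.items():
        matches = sum(1 for item, label in labels.items() if consensus[item] == label)
        scores[annotator] = matches / len(labels)
    return scores


# Hypothetical toy data: three annotators labelling four tiles.
annotations = {
    "ann_a": {"t1": "crop", "t2": "urban", "t3": "water", "t4": "crop"},
    "ann_b": {"t1": "crop", "t2": "urban", "t3": "water", "t4": "urban"},
    "ann_c": {"t1": "urban", "t2": "crop", "t3": "water", "t4": "crop"},
}
scores = agreement_with_majority(annotations)
print(scores)  # {'ann_a': 1.0, 'ann_b': 0.75, 'ann_c': 0.5}
```

A low score can point to an unreliable annotator, but also to ambiguous tiles or unclear instructions, so it is a prompt for inspection rather than a verdict.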
Data: Covariate shift & Domain adaptation
Existing “good quality” ground truth: rice fields in Europe, urban areas in Europe
Target areas without ground truth: rice fields in China, urban areas in Lagos
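Before reusing a model trained on one region in another, a cheap first check is to compare per-feature statistics between the two regions. The sketch below uses the standardized mean difference, a generic diagnostic chosen here for illustration and not a method named in the talk. The feature names, the toy values and the 0.5 threshold are all assumptions.

```python
from math import sqrt
from statistics import mean, pstdev


def standardized_mean_difference(source, target):
    """Difference of means divided by the pooled standard deviation."""
    pooled = sqrt((pstdev(source) ** 2 + pstdev(target) ** 2) / 2)
    if pooled == 0:
        return 0.0 if mean(source) == mean(target) else float("inf")
    return (mean(target) - mean(source)) / pooled


def shifted_features(source_rows, target_rows, names, threshold=0.5):
    """Return the names of features whose |SMD| exceeds the threshold."""
    flagged = []
    for i, name in enumerate(names):
        src = [row[i] for row in source_rows]
        tgt = [row[i] for row in target_rows]
        if abs(standardized_mean_difference(src, tgt)) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical per-tile features: mean reflectance in two bands.
names = ["red_mean", "nir_mean"]
source_tiles = [[0.30, 0.60], [0.32, 0.58], [0.28, 0.62], [0.31, 0.61]]  # region with GT
target_tiles = [[0.31, 0.40], [0.29, 0.42], [0.30, 0.38], [0.32, 0.41]]  # region without GT
flagged = shifted_features(source_tiles, target_tiles, names)
print(flagged)  # ['nir_mean']
```

A check like this only looks at one feature at a time, so it can miss shifts that appear in combinations of features.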
Infrastructure - Hardware
● Huge amount of data --> cloud infrastructure.
● nKappa platform for distributed processing (currently using Microsoft Azure) and in-house GPU servers (equipped with 1080 Ti GPUs).
● nKappa is used in both the development and production stages.
● GPU servers are mostly used for exploratory data analysis (EDA) and for developing data science algorithms and models.
● Cloud infrastructure is mainly used to track, share within the team, and audit datasets, algorithms and models, and to put pipelines and models into production.
Infrastructure - Software
● Data scientist scripts
● GIS processing & remote sensing: Rasterio, telluric, ...
● Distributed processing: nKappa
● Trace, reuse and audit experiments, datasets, pipelines and models.
● Accelerate data science experimentation on remote sensing.
● Automate the insertion of new pipelines into the production environment.
Our team profile
Our development team: Data Scientists and Platform developers
Computer vision and machine learning specialists, with additional background in remote sensing.
● Strong Python developers with knowledge of machine learning, computer vision and image processing
● DevOps
● Front-end developers
● GIS Python specialists
Solutions started a year and a half ago...
We are currently 13 people, and we are hiring!
ML Algorithms: What we use
- Computer vision algorithms and ML machinery, from logistic regression and random forests to the latest deep NNs.
- Training with tailored datasets, using a smart sampling policy to maintain the input and output variability of the original datasets.
- We prefer context knowledge + common sense heuristics + ML methods rather than pursuing end-to-end neural networks (unless you are absolutely sure that your training set contains all relevant sources of image variation and that your data augmentation policy is not introducing bias).
- Random forests, CNNs and U-Net variants, alone or in ensembles, are the algorithms our team uses most.
- Relevant lines of research:
  - Generative models for data augmentation, ground truth generation and super-resolution.
  - Transfer learning / domain adaptation.
  - Invariant and efficient satellite image embeddings, and distance metric learning.
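The slide does not describe the sampling policy itself. One common way to keep a small training set representative is stratified sampling, sketched below as an illustration only. The tile records, the `label` field and the 10% fraction are hypothetical, and the stratum key could just as well combine label, region or season.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(items, key, fraction, seed=0, min_per_stratum=1):
    """Sample `fraction` of each stratum, keeping at least `min_per_stratum` items."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for _, members in sorted(strata.items(), key=lambda kv: kv[0]):
        k = max(min_per_stratum, round(len(members) * fraction))
        k = min(k, len(members))  # never ask for more items than the stratum holds
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical tiles with an imbalanced label distribution.
tiles = (
    [{"id": f"c{i}", "label": "crop"} for i in range(90)]
    + [{"id": f"u{i}", "label": "urban"} for i in range(8)]
    + [{"id": f"w{i}", "label": "water"} for i in range(2)]
)
sample = stratified_sample(tiles, key=lambda t: t["label"], fraction=0.1)
counts = Counter(t["label"] for t in sample)
print(counts)
```

The `min_per_stratum` floor is a design choice: it keeps rare classes such as the two water tiles in the sample, where plain random sampling at 10% would often drop them.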
Lessons learned
- Project success:
  - 5%: selecting the ML algorithm and its parameters.
  - 95%: really understanding what the client needs and how to generate value, anticipating how your output is going to be consumed, and defining good features, good ground truth, a good data sampling policy, and pre- and post-processing.
- Dedicate the time first to ensure success, … after that improve:
  - Using fast ML algorithms.
  - Starting with small datasets that keep the input and output variability of the original one.
- It is worth investing in automatically measuring dataset quality before starting to train on big datasets:
  - Missing values, constant variables, unaligned bands, duplicated variables, class imbalance…
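Several of the dataset checks listed above are easy to automate. The following is a minimal sketch for tabular features, assuming missing values are stored as `None`. The column names, the report layout and the imbalance ratio threshold are hypothetical, and band alignment is not covered because it needs the imagery itself.

```python
from collections import Counter


def dataset_quality_report(columns, labels, imbalance_ratio=10.0):
    """columns: {name: list of values, None = missing}; labels: list of class labels."""
    report = {"missing": {}, "constant": [], "duplicated": [], "imbalanced": False}
    seen = {}
    for name, values in columns.items():
        # Missing values per column.
        n_missing = sum(v is None for v in values)
        if n_missing:
            report["missing"][name] = n_missing
        # Constant columns carry no information for a model.
        present = [v for v in values if v is not None]
        if len(set(present)) <= 1:
            report["constant"].append(name)
        # Exact duplicates of an earlier column.
        signature = tuple(values)
        if signature in seen:
            report["duplicated"].append((seen[signature], name))
        else:
            seen[signature] = name
    # Class imbalance: ratio between the most and least frequent class.
    counts = Counter(labels)
    if counts and max(counts.values()) / min(counts.values()) > imbalance_ratio:
        report["imbalanced"] = True
    return report


# Hypothetical toy dataset: four samples, four feature columns.
columns = {
    "b1": [0.1, 0.2, None, 0.4],
    "b2": [1.0, 1.0, 1.0, 1.0],
    "b3": [0.1, 0.2, None, 0.4],
    "b4": [0.5, 0.7, 0.6, 0.9],
}
labels = ["crop", "crop", "crop", "water"]
report = dataset_quality_report(columns, labels, imbalance_ratio=2.0)
print(report)
```

Running checks like these on a small sample first is cheap, and it surfaces problems before any time is spent training on the full dataset.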
Questions?

20181128 satellogic @ barcelona ai
