May 18, 2021
Big Data from Space 2021
Manolis Koubarakis
National and Kapodistrian University of Athens
Artificial Intelligence and Big Data Technologies for
Copernicus Data: the ExtremeEarth Project
The ExtremeEarth Project
• I will present the project “ExtremeEarth: From Copernicus Big Data to Extreme Earth Analytics”
funded under the H2020 call ICT-12-2018-2020: “Big Data technologies and extreme-scale analytics”
• The project started in January 2019 and finishes in December 2021.
• See http://earthanalytics.eu/ .
ExtremeEarth Main Objective
• The main objective of ExtremeEarth is to develop Artificial Intelligence and Big Data techniques and
technologies that scale to the petabytes of Copernicus data, information and knowledge, and to apply these
technologies in two of the ESA Thematic Exploitation Platforms (TEPs): Food Security and Polar.
• The technologies to be developed will extend the European Hopsworks data-intensive AI platform of
partner Logical Clocks to offer unprecedented scalability to extreme data volumes and scale-out
distributed deep learning for Copernicus data.
• The extended Hopsworks platform will run on CREODIAS and will be available as open source to enable
its adoption by the strong European Earth Observation downstream services industry.
• The technologies to be developed will also extend the linked geospatial data systems GeoTriples,
JedAI, Strabon and Semagrow, pioneered by project partners UoA and NCSR-D, so that they
scale to the extreme volumes of Copernicus data.
ExtremeEarth Consortium
1. National and Kapodistrian University of Athens (UoA)
2. VISTA
3. The Arctic University of Norway (UiT)
4. University of Trento (UNITN)
5. The Royal Institute of Technology (KTH)
6. National Center for Scientific Research – Demokritos (NCSR-D)
7. German Aerospace Center (DLR)
8. Polar View
9. Norwegian Meteorological Institute (METNO)
10. Logical Clocks
11. British Antarctic Survey (UKRI-BAS)
The Food Security Use Case
• The objective is to develop high-resolution water availability maps for agricultural areas, giving
farmers a new level of detail for wide-scale irrigation support. The Danube and Duero catchments will
be targeted.
• See presentation “Water Stress Assessment In Austria Based
On Deep Learning And Crop Growth Modelling” by Markus
Muerth (VISTA) tomorrow.
The Polar Use Case
• The objective is to produce high-resolution ice maps for maritime users from massive volumes of
heterogeneous Copernicus data.
Scalable Deep Learning Techniques for Copernicus
Big Data
• Developed an LSTM deep neural network architecture for crop type mapping from Sentinel-2 data.
It has been implemented on Hopsworks and is being used in the Food Security use case.
• Developed various machine learning architectures (LDA, CNN, variational autoencoders and GANs)
for sea-ice classification from Sentinel-1 data. These have been implemented on Hopsworks and are
being used in the Polar use case.
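To make the idea of crop type mapping from a pixel's time series concrete, here is a minimal sketch of an LSTM forward pass in plain Python. It is not the ExtremeEarth network: the weights are random and untrained, the 4 spectral bands and 8 hidden units are assumptions, and the names `lstm_step`, `classify_pixel` and `random_model` are hypothetical. Only the 16 time steps and 13 output classes mirror the figures quoted in this talk.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, weights, biases):
    """Apply one LSTM step to observation x given hidden state h and cell state c."""
    z = list(x) + h  # concatenate the input with the previous hidden state

    def gate(name, activation):
        return [activation(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(weights[name], biases[name])]

    i = gate("input", sigmoid)
    f = gate("forget", sigmoid)
    o = gate("output", sigmoid)
    g = gate("candidate", math.tanh)
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def classify_pixel(series, weights, biases, w_out):
    """Run the LSTM over a pixel's time series and return class probabilities."""
    hidden = len(w_out[0])
    h, c = [0.0] * hidden, [0.0] * hidden
    for observation in series:
        h, c = lstm_step(observation, h, c, weights, biases)
    scores = [sum(w * v for w, v in zip(row, h)) for row in w_out]
    top = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_model(num_bands, hidden, num_classes, seed=0):
    """Build random (untrained) parameters, for illustration only."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    gates = ("input", "forget", "output", "candidate")
    weights = {name: matrix(hidden, num_bands + hidden) for name in gates}
    biases = {name: [0.0] * hidden for name in gates}
    return weights, biases, matrix(num_classes, hidden)

# Assumed shapes: 16 acquisition dates, 4 bands per date, 13 crop types.
weights, biases, w_out = random_model(num_bands=4, hidden=8, num_classes=13)
series = [[0.1 * t, 0.2, 0.3 + 0.05 * t, 0.4] for t in range(16)]
probs = classify_pixel(series, weights, biases, w_out)
print(len(probs), round(sum(probs), 6))
```

The sketch keeps the temporal order of the observations, which is the reason a recurrent model suits this task: the same reflectance values mean different things at different points in the growing season.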
Very Large Training Datasets for Deep Learning
Architectures
• Developed a training dataset consisting of ~1M pixels from 16 Sentinel-2 images of Austria,
where each pixel is labelled with one of 13 crop types. This dataset was developed using existing crop
type maps and Sentinel-2 data, and it was used to train the LSTM network for the Food Security use
case.
• The dataset will be publicly available very soon (http://earthanalytics.eu/datasets.html).
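A per-pixel training set of this kind can be sketched as follows: each labelled pixel of an existing crop type map is paired with its sequence of observations across the image stack. The function name `build_pixel_samples`, the grid layout and the convention that label 0 means "no label" are assumptions for illustration, not details of the ExtremeEarth pipeline.

```python
def build_pixel_samples(image_stack, crop_map, nodata=0):
    """Pair every labelled pixel with its time series of band values.

    image_stack: list over acquisition dates; each item is a 2D grid whose
                 cells hold a tuple of band values.
    crop_map:    2D grid of integer crop type labels (nodata = unlabelled).
    """
    samples = []
    for r, row in enumerate(crop_map):
        for col, label in enumerate(row):
            if label == nodata:
                continue  # skip pixels that the crop type map does not cover
            series = [image[r][col] for image in image_stack]
            samples.append((series, label))
    return samples

# Toy example: two acquisition dates, a 2x2 scene, one band per pixel.
stack = [
    [[(1,), (2,)], [(3,), (4,)]],
    [[(5,), (6,)], [(7,), (8,)]],
]
crop_map = [[3, 0], [0, 7]]
samples = build_pixel_samples(stack, crop_map)
print(samples)
```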
Very Large Training Datasets for Deep Learning
Architectures (cont’d)
• Developed a training dataset consisting of 63,048 patches from 30 Sentinel-1 images of the
European Arctic, where each patch is labelled with one of 6 ice types. This dataset was developed by
expert photo-interpretation, and it was used to train three of the CNNs for the Polar use case.
• Developed a training dataset consisting of ~62M patches from 24 Sentinel-1 images of the
Belgica Bank area of the Greenland Sea, where each patch is labelled with one of 11 ice types. This dataset was
developed using active learning, and it was used to train the LDA and one of the CNNs for the
Polar use case.
• See http://earthanalytics.eu/datasets.html.
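The slides do not say which active learning strategy was used for the second dataset. As one common possibility, the sketch below shows uncertainty sampling: the patches on which the current model is least certain (highest prediction entropy) are sent to the expert for labelling first. The names `entropy` and `select_for_labelling`, the patch identifiers and the probabilities are all hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a class probability vector (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_for_labelling(predictions, budget):
    """Return the ids of the `budget` patches with the most uncertain predictions.

    predictions: dict mapping a patch id to the model's class probabilities.
    """
    ranked = sorted(predictions, key=lambda pid: entropy(predictions[pid]), reverse=True)
    return ranked[:budget]

# Toy example with three unlabelled patches and three ice classes.
predictions = {
    "patch_a": [0.98, 0.01, 0.01],  # confident prediction
    "patch_b": [0.40, 0.30, 0.30],  # very uncertain
    "patch_c": [0.70, 0.20, 0.10],  # moderately uncertain
}
to_label = select_for_labelling(predictions, budget=2)
print(to_label)
```

After the expert labels the selected patches, the model is retrained and the selection is repeated, so labelling effort concentrates on the examples that are expected to help most.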
Big Linked Geospatial Data Systems
• Developed the system GeoTriples-Spark for transforming big geospatial data from legacy formats
into RDF. GeoTriples-Spark can transform 2 TB of geospatial data into RDF in 50 minutes.
• Developed the system JedAI-spatial for interlinking big linked geospatial data. JedAI-spatial has been
tested with >100 GB of geospatial data and has been shown to scale almost linearly.
• Developed the system Strabo2 for querying big linked geospatial data using the OGC standard
GeoSPARQL. Strabo2 can process queries of the Geographica benchmark over 171 GB of data in 30 to
400 seconds.
• Developed version 3 of the system Semagrow for federating big linked geospatial data sources.
Semagrow can process the queries used in the Food Security use case over <1 GB of data in <20
seconds.
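Geospatial interlinking means discovering which geometries in one dataset are spatially related to geometries in another. The slides do not describe the algorithms inside JedAI-spatial, so the following is only a generic filter-and-verify sketch: a uniform grid restricts comparisons to geometries that share a cell, and a bounding-box test then confirms each candidate. The function names, the cell size and the sample identifiers are assumptions, and a real system would follow this with an exact geometry test.

```python
from collections import defaultdict
from itertools import product

def bbox_intersects(a, b):
    """True if two boxes (min_x, min_y, max_x, max_y) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def cells(bbox, size):
    """Yield the grid cells (column, row) that a bounding box covers."""
    x0, y0 = int(bbox[0] // size), int(bbox[1] // size)
    x1, y1 = int(bbox[2] // size), int(bbox[3] // size)
    return product(range(x0, x1 + 1), range(y0, y1 + 1))

def candidate_pairs(source, target, size=1.0):
    """Return (source_id, target_id) pairs whose bounding boxes intersect."""
    grid = defaultdict(list)
    for sid, box in source.items():  # filter step: index source boxes by cell
        for cell in cells(box, size):
            grid[cell].append(sid)
    pairs = set()
    for tid, box in target.items():  # compare only within shared cells
        for cell in cells(box, size):
            for sid in grid.get(cell, ()):
                if bbox_intersects(source[sid], box):
                    pairs.add((sid, tid))
    return pairs

source = {"field1": (0, 0, 2, 2), "field2": (5, 5, 6, 6)}
target = {"parcelA": (1, 1, 3, 3), "parcelB": (10, 10, 11, 11)}
links = candidate_pairs(source, target)
print(links)
```

Because each geometry is compared only with its grid neighbours rather than with every geometry in the other dataset, the work grows roughly with the data size for evenly spread data, which is the kind of behaviour the near-linear scaling claim refers to.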
Integration in Hopsworks, TEPs and CREODIAS
• We integrated the AI and Big Data technologies presented above in the Hopsworks data platform and
deployed them in CREODIAS and the two TEPs for developing the two use cases.
• See e-poster “The ExtremeEarth Software Architecture for
Copernicus Earth Observation Data” by Desta
Haileselassie Hagos (KTH).
New Deep Learning Functionalities in Hopsworks
• We extended the filesystem HopsFS and the resource scheduler HopsYARN of Hopsworks for managing
EO data.
• We extended the Hopsworks metadata and security model with APIs for EO metadata.
• We carried out the following extensions to the Hopsworks platform, which enable large-scale distributed data
processing and the building of ML/DL pipelines: EO Data Management, the Feature Store, the Experiment API,
the Maggy framework for asynchronous parallel execution of trials in machine learning experiments,
distribution-oblivious training functions, and Maggy support for hyperparameter tuning and parallel
ablation studies.
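The idea of asynchronous parallel trials can be illustrated with the Python standard library alone. The sketch below does not use Maggy or any Hopsworks API: it runs a random hyperparameter search in a thread pool and handles each result as soon as its trial finishes, instead of waiting for a whole batch. The objective `train_and_score` is a hypothetical stand-in for a real training run, and the search space is invented.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

LEARNING_RATES = (0.001, 0.01, 0.1)
HIDDEN_UNITS = (32, 64, 128)

def train_and_score(params):
    """Hypothetical stand-in for a training run; higher scores are better."""
    lr, units = params["lr"], params["units"]
    return -(lr - 0.01) ** 2 - ((units - 64) / 1000.0) ** 2

def random_search(num_trials, workers=4, seed=0):
    """Run trials in parallel and return (best_score, best_params)."""
    rng = random.Random(seed)
    trials = [{"lr": rng.choice(LEARNING_RATES), "units": rng.choice(HIDDEN_UNITS)}
              for _ in range(num_trials)]
    best = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(train_and_score, p): p for p in trials}
        # as_completed yields each trial when it finishes, so a slow trial
        # never blocks the processing of faster ones.
        for future in as_completed(futures):
            score = future.result()
            if best is None or score > best[0]:
                best = (score, futures[future])
    return best

best_score, best_params = random_search(num_trials=20)
print(best_score, best_params)
```

In a real setting the point of handling results as they arrive is that the controller can act on them immediately, for example by stopping unpromising trials early or proposing new configurations while other trials are still running.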
Thank you!
Visit our Web site: http://earthanalytics.eu/
Follow us on Twitter:
@ExtremeEarth_EU @mkoubarakis
