May 26, 2021
Data Week 2021
Manolis Koubarakis
National and Kapodistrian University of Athens
The ExtremeEarth Data Science Pipeline for Linked
Earth Observation Data
Not Just for ExtremeEarth
• Previous projects:
o TELEIOS
o LEO
o MELODIES
• See paper:
o Manolis Koubarakis, Kostis Kyzirakos, Charalampos Nikolaou, George Garbis, Konstantina Bereta, Roi
Dogani, Stella Giannakopoulou, Panayiotis Smeros, Dimitrianos Savva, George Stamoulis, Giannis
Vlachopoulos, Stefan Manegold, Charalampos Kontoes, Themistocles Herekakis, Ioannis Papoutsis,
and Dimitrios Michail. Managing Big, Linked, and Open Earth-Observation Data Using the
TELEIOS/LEO software stack. IEEE Geoscience and Remote Sensing Magazine, Vol. 4, Issue 3,
pages 23-37, September 2016.
The ExtremeEarth Project (From Copernicus Big
Data to Extreme Earth Analytics)
• Funded under the H2020 call ICT-12-2018-2020: “Big Data technologies and extreme-scale analytics”
• The project started in January 2019 and finishes in December 2021
• See http://earthanalytics.eu/
ExtremeEarth Main Objective
• The main objective of ExtremeEarth is to develop Artificial Intelligence and Big Data techniques and
technologies that scale to the petabytes of Copernicus data, information and knowledge, and to apply these
technologies in two of the ESA Thematic Exploitation Platforms: Food Security and Polar.
• The technologies to be developed will extend the European Hopsworks data-intensive AI platform of
partner Logical Clocks to offer unprecedented scalability to extreme data volumes and scale-out
distributed deep learning for Copernicus data.
• The extended Hopsworks platform will run on CREODIAS and will be available as open source to enable
its adoption by the strong European Earth Observation downstream services industry.
• The technologies to be developed will also extend the linked geospatial data systems GeoTriples,
JedAI, Strabon and Semagrow, pioneered by project partners UoA and NCSR in the past, so that they
scale to the extreme volumes of Copernicus data.
The Data Science Pipeline for Linked Earth
Observation Data
[Figure: pipeline diagram with the following stages]
• Ingestion
• Processing
• Transformation into RDF
• Publishing
• Storage, Querying and Question Answering
• Search, Browse, Explore and Visualize
• Cataloguing
• Archiving
• Dataset Discovery
• Knowledge Discovery
• Interlinking
The Food Security Use Case
• The objective is to develop high-resolution water availability maps for agricultural areas, enabling a
new level of detail in wide-scale irrigation support for farmers. The target areas are the Danube and
Duero catchments.
Knowledge Discovery with Scalable Deep Learning
Architectures
• Developed an LSTM deep neural network architecture for crop type mapping from Sentinel-2 data.
It has been implemented on Hopsworks and is being used in the Food Security use case.
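As a rough illustration of how an LSTM can classify a pixel's crop type from its Sentinel-2 time series, the sketch below runs one untrained LSTM cell over a sequence of per-date band values and maps the final hidden state to class probabilities. It is not the ExtremeEarth architecture: the class name `TinyLSTMClassifier`, the layer sizes, the four-band input and the random weights are all assumptions, and treating 16 images as 16 time steps is likewise an assumption. A real model would be trained with a deep learning framework.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM with a softmax output (untrained)."""

    def __init__(self, n_bands, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {g: rand_matrix(n_hidden, n_bands + n_hidden, rng) for g in "ifog"}
        self.b = {g: [0.0] * n_hidden for g in "ifog"}
        self.W_out = rand_matrix(n_classes, n_hidden, rng)

    def _gate(self, name, z, activation):
        pre = matvec(self.W[name], z)
        return [activation(p + b) for p, b in zip(pre, self.b[name])]

    def step(self, x, h, c):
        z = list(x) + h
        i = self._gate("i", z, sigmoid)    # input gate
        f = self._gate("f", z, sigmoid)    # forget gate
        o = self._gate("o", z, sigmoid)    # output gate
        g = self._gate("g", z, math.tanh)  # candidate cell state
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, series):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in series:
            h, c = self.step(x, h, c)
        scores = matvec(self.W_out, h)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]


# Synthetic pixel: 16 acquisition dates x 4 bands (made-up reflectance values).
rng = random.Random(1)
series = [[rng.random() for _ in range(4)] for _ in range(16)]
model = TinyLSTMClassifier(n_bands=4, n_hidden=8, n_classes=13)
probs = model.predict_proba(series)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, round(probs[predicted], 3))
```

Because the weights are random, the predicted class is meaningless; the point is only the data flow from a per-pixel time series to one probability per crop type.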
Very Large Training Datasets for Deep Learning
Architectures
• Developed a training dataset of ~1M pixels drawn from 16 Sentinel-2 images over Austria, where
each pixel is labelled with one of 13 crop types. The dataset was built from existing crop type maps
and Sentinel-2 data, and it was used to train the LSTM network for the Food Security use case.
• To be made publicly available soon (http://earthanalytics.eu/datasets.html).
Transformation into RDF
• Developed the system GeoTriples-Spark for transforming big geospatial data from their legacy formats
into RDF. GeoTriples-Spark can transform 2 TB of geospatial data into RDF in 50 minutes.
• See:
o George Mandilaras and Manolis Koubarakis. Scalable Transformation of Big Geospatial Data into
Linked Data. Submitted. 2021.
o https://github.com/LinkedEOData/GeoTriples
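To make the transformation step concrete, here is a minimal sketch that turns one tabular geospatial record into RDF triples using the GeoSPARQL vocabulary, serialized as N-Triples. It does not use GeoTriples or its mapping language; the `http://example.org/` namespace, the `crop` property, the sample record and the helper names are hypothetical.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the illustration


def iri(value):
    return f"<{value}>"


def literal(value, datatype=None):
    # Minimal N-Triples escaping: backslash, double quote, newline.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    quoted = f'"{text}"'
    return f"{quoted}^^<{datatype}>" if datatype else quoted


def record_to_ntriples(record):
    """Map one record (id, crop, wkt) to a feature with a WKT geometry."""
    feature = f"{EX}field/{record['id']}"
    geometry = f"{feature}/geometry"
    triples = [
        (iri(feature), iri(RDF_TYPE), iri(GEO + "Feature")),
        (iri(feature), iri(EX + "crop"), literal(record["crop"])),
        (iri(feature), iri(GEO + "hasGeometry"), iri(geometry)),
        (iri(geometry), iri(RDF_TYPE), iri(GEO + "Geometry")),
        (iri(geometry), iri(GEO + "asWKT"), literal(record["wkt"], GEO + "wktLiteral")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Made-up record; coordinates are longitude/latitude pairs.
record = {
    "id": 42,
    "crop": "maize",
    "wkt": "POLYGON((16.3 48.2, 16.4 48.2, 16.4 48.3, 16.3 48.2))",
}
lines = record_to_ntriples(record)
print("\n".join(lines))
```

Each record becomes five triples, so a transformation of this kind is a row-by-row map, which is the sort of workload that parallelizes naturally on a Spark-style engine.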
Interlinking
• Developed the system JedAI-spatial for interlinking big linked geospatial data. JedAI-spatial has been
tested with >100 GB of geospatial data and has been shown to scale almost linearly. For an input
of 72M x 115M geometries, it computes all DE-9IM topological relations in less than 50 minutes.
• See:
o George Papadakis, Georgios Mandilaras, Nikos Mamoulis and Manolis Koubarakis. Progressive,
Holistic Geospatial Interlinking. The Web Conference 2021, April 19 - 23, Ljubljana, Slovenia, 2021.
o https://github.com/giantInterlinking/prGIAnt
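For readers unfamiliar with DE-9IM, the sketch below shows how a 9-character intersection matrix, once computed for a pair of geometries, is mapped to named topological relations by pattern matching. It covers only the last step of interlinking and says nothing about how JedAI-spatial filters candidate pairs or computes the matrix. The patterns are the standard OGC Simple Features ones; the function names and sample matrices are illustrative assumptions.

```python
# Standard DE-9IM patterns: T = any intersection (0, 1 or 2), F = none, * = don't care.
PATTERNS = {
    "equals": ["T*F**FFF*"],
    "disjoint": ["FF*FF****"],
    "touches": ["FT*******", "F**T*****", "F***T****"],
    "within": ["T*F**F***"],
    "contains": ["T*****FF*"],
}


def matches(matrix, pattern):
    """Check a DE-9IM matrix string against one 9-character pattern."""
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("DE-9IM matrices and patterns have exactly 9 cells")
    for cell, want in zip(matrix, pattern):
        if want == "*":
            continue
        if want == "T":
            if cell not in {"0", "1", "2"}:
                return False
        elif cell != want:
            return False
    return True


def relations(matrix):
    """Return the set of named relations that hold for the given matrix."""
    found = {
        name
        for name, patterns in PATTERNS.items()
        if any(matches(matrix, p) for p in patterns)
    }
    if "disjoint" not in found:
        found.add("intersects")  # intersects is defined as the negation of disjoint
    return found


# Sample matrices for pairs of polygons (written by hand for the illustration).
print(relations("2FF1FF212"))  # first polygon strictly inside the second
print(relations("FF2FF1212"))  # polygons far apart
print(relations("FF2F11212"))  # polygons sharing part of an edge
```

Computing the matrix once and deriving every relation from it is what makes "all DE-9IM topological relations" a single pass over each candidate pair.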
Storage, Querying and …
• Developed the system Strabo2 for querying big linked geospatial data using the OGC standard
GeoSPARQL. Strabo2 can process queries of the Geographica benchmark over 171 GB of data in 30 to
400 seconds.
• Developed version 3 of the Semagrow system for federating big linked geospatial data sources.
Semagrow can process the queries used in the Food Security use case over <1 GB of data in <20
seconds.
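As an illustration of the kind of query such a system answers, the sketch below builds a GeoSPARQL query that selects features whose geometry lies within a given polygon, and encodes it as a SPARQL protocol GET request. The endpoint URL, the area of interest and the function names are hypothetical; only the `geo:` and `geof:` terms come from the GeoSPARQL standard. The request is constructed but not sent.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint, not a real service


def features_within_query(area_wkt, limit=10):
    """Build a GeoSPARQL query for features inside the given WKT polygon."""
    return f"""PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?feature ?wkt WHERE {{
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt, "{area_wkt}"^^geo:wktLiteral))
}} LIMIT {int(limit)}"""


def request_url(endpoint, query):
    # The SPARQL protocol accepts the query text in the "query" parameter.
    return endpoint + "?" + urlencode({"query": query})


# Made-up bounding polygon (longitude/latitude).
area = "POLYGON((16 48, 17 48, 17 49, 16 49, 16 48))"
query = features_within_query(area)
url = request_url(ENDPOINT, query)
print(query)
```

The spatial filter is what distinguishes this from plain SPARQL: the engine has to evaluate `geof:sfWithin` over every candidate geometry, which is why spatial indexing and partitioning matter at the data sizes quoted above.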
… Question Answering
• Developed GeoQA, the first question answering engine for linked geospatial data.
• See:
o Dharmen Punjani, Markos Iliakis, Theodoros Stefou, Kuldeep Singh, Andreas Both, Manolis
Koubarakis, Iosif Angelidis, Konstantina Bereta, Themis Beris, Dimitris Bilidas, Theofilos Ioannidis,
Nikolaos Karalis, Christoph Lange, Despina-Athanasia Pantazi, Christos Papaloukas and Georgios
Stamoulis. Template-Based Question Answering over Linked Geospatial Data.
https://arxiv.org/pdf/2007.07060.pdf
• http://geoqa.di.uoa.gr/
Search, Browse, Explore and Visualize
• We always use the visualization tool Sextant (http://sextant.di.uoa.gr/)
Thank you!
Visit our Web site: http://earthanalytics.eu/
Follow us on Twitter:
@ExtremeEarth_EU @mkoubarakis

More Related Content

What's hot

big_data_casestudies_2.ppt
big_data_casestudies_2.pptbig_data_casestudies_2.ppt
big_data_casestudies_2.ppt
vishal choudhary
 
NASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & EngineeringNASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & Engineering
inside-BigData.com
 
Drones in the Earth Sciences - Opportunities and issues
Drones in the Earth Sciences - Opportunities and issuesDrones in the Earth Sciences - Opportunities and issues
Drones in the Earth Sciences - Opportunities and issues
ARDC
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental Science
Ian Foster
 
Estimating the Impact of Agriculture on the Environment of Catalunya by means...
Estimating the Impact of Agriculture on the Environment of Catalunya by means...Estimating the Impact of Agriculture on the Environment of Catalunya by means...
Estimating the Impact of Agriculture on the Environment of Catalunya by means...
Andreas Kamilaris
 
NERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie BardNERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie Bard
PacificResearchPlatform
 
Drones and A.I in Earth Science
Drones and A.I in Earth ScienceDrones and A.I in Earth Science
Drones and A.I in Earth Science
ARDC
 
Copernicus and AI workshop 2020
Copernicus and AI workshop 2020Copernicus and AI workshop 2020
Copernicus and AI workshop 2020
ExtremeEarth
 
Application of web ontology to harvest estimation of rice in thailand
Application of web ontology to harvest estimation of rice in thailandApplication of web ontology to harvest estimation of rice in thailand
Application of web ontology to harvest estimation of rice in thailand
AIMS (Agricultural Information Management Standards)
 
Application of web ontology to harvest estimation of rice in Thailand
Application of web ontology to harvest estimation of rice in ThailandApplication of web ontology to harvest estimation of rice in Thailand
Application of web ontology to harvest estimation of rice in Thailand
AIMS (Agricultural Information Management Standards)
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Globus
 
The IUGONET project and its international cooperation on development of metad...
The IUGONET project and its international cooperation on development of metad...The IUGONET project and its international cooperation on development of metad...
The IUGONET project and its international cooperation on development of metad...
Iugo Net
 
The Pacific Research Platform: a Science-Driven Big-Data Freeway System
The Pacific Research Platform: a Science-Driven Big-Data Freeway SystemThe Pacific Research Platform: a Science-Driven Big-Data Freeway System
The Pacific Research Platform: a Science-Driven Big-Data Freeway System
Larry Smarr
 
Metadata Standards in CKAN for Biodiversity Pilot in NextGEOSS
Metadata Standards in CKAN for Biodiversity Pilot in NextGEOSSMetadata Standards in CKAN for Biodiversity Pilot in NextGEOSS
Metadata Standards in CKAN for Biodiversity Pilot in NextGEOSS
plan4all
 
Berkeley cloud computing meetup may 2020
Berkeley cloud computing meetup may 2020Berkeley cloud computing meetup may 2020
Berkeley cloud computing meetup may 2020
Larry Smarr
 
Implementation of RS-EBVs in Habitat Modelling
Implementation of RS-EBVs in Habitat ModellingImplementation of RS-EBVs in Habitat Modelling
Implementation of RS-EBVs in Habitat Modelling
plan4all
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare Diagnostics
Larry Smarr
 
DGalfonsi
DGalfonsiDGalfonsi
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptx
Ian Foster
 
Predicting Plant Growth
Predicting Plant GrowthPredicting Plant Growth
Predicting Plant Growth
SistemadeEstudiosMed
 

What's hot (20)

big_data_casestudies_2.ppt
big_data_casestudies_2.pptbig_data_casestudies_2.ppt
big_data_casestudies_2.ppt
 
NASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & EngineeringNASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & Engineering
 
Drones in the Earth Sciences - Opportunities and issues
Drones in the Earth Sciences - Opportunities and issuesDrones in the Earth Sciences - Opportunities and issues
Drones in the Earth Sciences - Opportunities and issues
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental Science
 
Estimating the Impact of Agriculture on the Environment of Catalunya by means...
Estimating the Impact of Agriculture on the Environment of Catalunya by means...Estimating the Impact of Agriculture on the Environment of Catalunya by means...
Estimating the Impact of Agriculture on the Environment of Catalunya by means...
 
NERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie BardNERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie Bard
 
Drones and A.I in Earth Science
Drones and A.I in Earth ScienceDrones and A.I in Earth Science
Drones and A.I in Earth Science
 
Copernicus and AI workshop 2020
Copernicus and AI workshop 2020Copernicus and AI workshop 2020
Copernicus and AI workshop 2020
 
Application of web ontology to harvest estimation of rice in thailand
Application of web ontology to harvest estimation of rice in thailandApplication of web ontology to harvest estimation of rice in thailand
Application of web ontology to harvest estimation of rice in thailand
 
Application of web ontology to harvest estimation of rice in Thailand
Application of web ontology to harvest estimation of rice in ThailandApplication of web ontology to harvest estimation of rice in Thailand
Application of web ontology to harvest estimation of rice in Thailand
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
 
The IUGONET project and its international cooperation on development of metad...
The IUGONET project and its international cooperation on development of metad...The IUGONET project and its international cooperation on development of metad...
The IUGONET project and its international cooperation on development of metad...
 
The Pacific Research Platform: a Science-Driven Big-Data Freeway System
The Pacific Research Platform: a Science-Driven Big-Data Freeway SystemThe Pacific Research Platform: a Science-Driven Big-Data Freeway System
The Pacific Research Platform: a Science-Driven Big-Data Freeway System
 
Metadata Standards in CKAN for Biodiversity Pilot in NextGEOSS
Metadata Standards in CKAN for Biodiversity Pilot in NextGEOSSMetadata Standards in CKAN for Biodiversity Pilot in NextGEOSS
Metadata Standards in CKAN for Biodiversity Pilot in NextGEOSS
 
Berkeley cloud computing meetup may 2020
Berkeley cloud computing meetup may 2020Berkeley cloud computing meetup may 2020
Berkeley cloud computing meetup may 2020
 
Implementation of RS-EBVs in Habitat Modelling
Implementation of RS-EBVs in Habitat ModellingImplementation of RS-EBVs in Habitat Modelling
Implementation of RS-EBVs in Habitat Modelling
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare Diagnostics
 
DGalfonsi
DGalfonsiDGalfonsi
DGalfonsi
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptx
 
Predicting Plant Growth
Predicting Plant GrowthPredicting Plant Growth
Predicting Plant Growth
 

Similar to ExtremeEarth Data Science Pipeline for Linked Earth Observation Data

Making Small Data BIG (UT Austin, March 2016)
Making Small Data BIG (UT Austin, March 2016)Making Small Data BIG (UT Austin, March 2016)
Making Small Data BIG (UT Austin, March 2016)
Kerstin Lehnert
 
Lehnert: Making Small Data Big, IACS, April2015
Lehnert: Making Small Data Big, IACS, April2015Lehnert: Making Small Data Big, IACS, April2015
Lehnert: Making Small Data Big, IACS, April2015
Kerstin Lehnert
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014
iedadata
 
Research Data Infrastructure for Geochemistry (DFG Roundtable)
Research Data Infrastructure for Geochemistry (DFG Roundtable)Research Data Infrastructure for Geochemistry (DFG Roundtable)
Research Data Infrastructure for Geochemistry (DFG Roundtable)
Kerstin Lehnert
 
Big Data
Big Data Big Data
NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWG
Geoffrey Fox
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
Ola Spjuth
 
Presentation on INSPIRE and Higher Education (1 of 2)
Presentation on INSPIRE and Higher Education (1 of 2)Presentation on INSPIRE and Higher Education (1 of 2)
Presentation on INSPIRE and Higher Education (1 of 2)
JISC GECO
 
Azure Brain: 4th paradigm, scientific discovery & (really) big data
Azure Brain: 4th paradigm, scientific discovery & (really) big dataAzure Brain: 4th paradigm, scientific discovery & (really) big data
Azure Brain: 4th paradigm, scientific discovery & (really) big data
Microsoft Technet France
 
Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19
ExtremeEarth
 
IEEE_BigData2014-Lee.pdf
IEEE_BigData2014-Lee.pdfIEEE_BigData2014-Lee.pdf
IEEE_BigData2014-Lee.pdf
ssuserff37aa
 
The Department of Energy's Integrated Research Infrastructure (IRI)
The Department of Energy's Integrated Research Infrastructure (IRI)The Department of Energy's Integrated Research Infrastructure (IRI)
The Department of Energy's Integrated Research Infrastructure (IRI)
Globus
 
The GAME project database – an example of interdisciplinary, open access envi...
The GAME project database – an example of interdisciplinary, open access envi...The GAME project database – an example of interdisciplinary, open access envi...
The GAME project database – an example of interdisciplinary, open access envi...
Platforma Otwartej Nauki
 
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway SystemThe Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
Larry Smarr
 
Big Data HPC Convergence
Big Data HPC ConvergenceBig Data HPC Convergence
Big Data HPC Convergence
Geoffrey Fox
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research Data
Gudmundur Thorisson
 
Sharing Big Data - Bob Jones
Sharing Big Data - Bob JonesSharing Big Data - Bob Jones
Sharing Big Data - Bob Jones
Helix Nebula The Science Cloud
 
Why data science matters and what we can do with it
Why data science matters and what we can do with itWhy data science matters and what we can do with it
Why data science matters and what we can do with it
Xiaogang (Marshall) Ma
 
Dealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time InformationDealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time Information
Edward Curry
 
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway SystemThe Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
Larry Smarr
 

Similar to ExtremeEarth Data Science Pipeline for Linked Earth Observation Data (20)

Making Small Data BIG (UT Austin, March 2016)
Making Small Data BIG (UT Austin, March 2016)Making Small Data BIG (UT Austin, March 2016)
Making Small Data BIG (UT Austin, March 2016)
 
Lehnert: Making Small Data Big, IACS, April2015
Lehnert: Making Small Data Big, IACS, April2015Lehnert: Making Small Data Big, IACS, April2015
Lehnert: Making Small Data Big, IACS, April2015
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014
 
Research Data Infrastructure for Geochemistry (DFG Roundtable)
Research Data Infrastructure for Geochemistry (DFG Roundtable)Research Data Infrastructure for Geochemistry (DFG Roundtable)
Research Data Infrastructure for Geochemistry (DFG Roundtable)
 
Big Data
Big Data Big Data
Big Data
 
NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWG
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
 
Presentation on INSPIRE and Higher Education (1 of 2)
Presentation on INSPIRE and Higher Education (1 of 2)Presentation on INSPIRE and Higher Education (1 of 2)
Presentation on INSPIRE and Higher Education (1 of 2)
 
Azure Brain: 4th paradigm, scientific discovery & (really) big data
Azure Brain: 4th paradigm, scientific discovery & (really) big dataAzure Brain: 4th paradigm, scientific discovery & (really) big data
Azure Brain: 4th paradigm, scientific discovery & (really) big data
 
Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19
 
IEEE_BigData2014-Lee.pdf
IEEE_BigData2014-Lee.pdfIEEE_BigData2014-Lee.pdf
IEEE_BigData2014-Lee.pdf
 
The Department of Energy's Integrated Research Infrastructure (IRI)
The Department of Energy's Integrated Research Infrastructure (IRI)The Department of Energy's Integrated Research Infrastructure (IRI)
The Department of Energy's Integrated Research Infrastructure (IRI)
 
The GAME project database – an example of interdisciplinary, open access envi...
The GAME project database – an example of interdisciplinary, open access envi...The GAME project database – an example of interdisciplinary, open access envi...
The GAME project database – an example of interdisciplinary, open access envi...
 
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway SystemThe Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
 
Big Data HPC Convergence
Big Data HPC ConvergenceBig Data HPC Convergence
Big Data HPC Convergence
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research Data
 
Sharing Big Data - Bob Jones
Sharing Big Data - Bob JonesSharing Big Data - Bob Jones
Sharing Big Data - Bob Jones
 
Why data science matters and what we can do with it
Why data science matters and what we can do with itWhy data science matters and what we can do with it
Why data science matters and what we can do with it
 
Dealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time InformationDealing with Semantic Heterogeneity in Real-Time Information
Dealing with Semantic Heterogeneity in Real-Time Information
 
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway SystemThe Pacific Research Platform: A Science-Driven Big-Data Freeway System
The Pacific Research Platform: A Science-Driven Big-Data Freeway System
 

More from ExtremeEarth

Big Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open WorkshopBig Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open Workshop
ExtremeEarth
 
Polar Use Case - ExtremeEarth Open Workshop
Polar Use Case  - ExtremeEarth Open WorkshopPolar Use Case  - ExtremeEarth Open Workshop
Polar Use Case - ExtremeEarth Open Workshop
ExtremeEarth
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open Workshop
ExtremeEarth
 
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open Workshop
ExtremeEarth
 
ExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - IntroductionExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - Introduction
ExtremeEarth
 
AI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open WorkshopAI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open Workshop
ExtremeEarth
 
Big Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open WorkshopBig Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open Workshop
ExtremeEarth
 
Snow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and IrrigationSnow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and Irrigation
ExtremeEarth
 
Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19
ExtremeEarth
 
The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19
ExtremeEarth
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19
ExtremeEarth
 
Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19
ExtremeEarth
 
LPS19 ExtremeEarth Project
LPS19 ExtremeEarth ProjectLPS19 ExtremeEarth Project
LPS19 ExtremeEarth Project
ExtremeEarth
 

More from ExtremeEarth (13)

Big Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open WorkshopBig Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open Workshop
 
Polar Use Case - ExtremeEarth Open Workshop
Polar Use Case  - ExtremeEarth Open WorkshopPolar Use Case  - ExtremeEarth Open Workshop
Polar Use Case - ExtremeEarth Open Workshop
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open Workshop
 
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open Workshop
 
ExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - IntroductionExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - Introduction
 
AI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open WorkshopAI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open Workshop
 
Big Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open WorkshopBig Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open Workshop
 
Snow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and IrrigationSnow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and Irrigation
 
Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19
 
The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19
 
Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19
 
LPS19 ExtremeEarth Project
LPS19 ExtremeEarth ProjectLPS19 ExtremeEarth Project
LPS19 ExtremeEarth Project
 

Recently uploaded

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...

ExtremeEarth Data Science Pipeline for Linked Earth Observation Data

  • 1. The ExtremeEarth Data Science Pipeline for Linked Earth Observation Data. Manolis Koubarakis, National and Kapodistrian University of Athens. Data Week 2021, May 26, 2021.
  • 2. Not Just for ExtremeEarth
      • Previous projects:
          o TELEIOS
          o LEO
          o MELODIES
      • See paper:
          o Manolis Koubarakis, Kostis Kyzirakos, Charalampos Nikolaou, George Garbis, Konstantina Bereta, Roi Dogani, Stella Giannakopoulou, Panayiotis Smeros, Dimitrianos Savva, George Stamoulis, Giannis Vlachopoulos, Stefan Manegold, Charalampos Kontoes, Themistocles Herekakis, Ioannis Papoutsis, and Dimitrios Michail. Managing Big, Linked, and Open Earth-Observation Data Using the TELEIOS/LEO software stack. IEEE Geoscience and Remote Sensing Magazine, Vol. 4, Issue 3, pages 23-37, September 2016.
  • 3. The ExtremeEarth Project (From Copernicus Big Data to Extreme Earth Analytics)
      • Funded under the H2020 call ICT-12-2018-2020: “Big Data technologies and extreme-scale analytics”
      • The project started in January 2019 and finishes in December 2021
      • See http://earthanalytics.eu/
  • 4. ExtremeEarth Main Objective
      • The main objective of ExtremeEarth is to develop Artificial Intelligence and Big Data techniques and technologies that scale to the PBs of big Copernicus data, information and knowledge, and apply these technologies in two of the ESA Thematic Exploitation Platforms: Food Security and Polar.
      • The technologies to be developed will extend the European Hopsworks data intensive AI platform of partner Logical Clocks to offer unprecedented scalability to extreme data volumes and scale-out distributed deep learning for Copernicus data.
      • The extended Hopsworks platform will run on CREODIAS and will be available as open source to enable its adoption by the strong European Earth Observation downstream services industry.
      • The technologies to be developed will also extend the linked geospatial data systems GeoTriples, JedAI, Strabon and SemaGrow pioneered by project partners UoA and NCSR in the past, so that they scale to the extreme volumes of Copernicus data.
  • 5. The Data Science Pipeline for Linked Earth Observation Data
      • [Diagram; stage labels: Ingestion; Processing; Transformation into RDF; Publishing; Storage, Querying and Question Answering; Search, Browse, Explore and Visualize; Cataloguing; Archiving; Dataset Discovery; Knowledge Discovery; Interlinking]
  • 6. The Data Science Pipeline (cont’d)
      • [Diagram; stage labels: Ingestion; Processing; Transformation into RDF; Publishing; Storage, Querying and Question Answering; Search, Browse, Explore and Visualize; Cataloguing; Archiving; Dataset Discovery; Knowledge Discovery; Interlinking]
  • 7. The Food Security Use Case
      • The objective is to develop high-resolution water availability maps for agricultural areas, allowing a new level of detail for wide-scale irrigation support for farmers. The Danube and Duero catchments will be targeted.
  • 8. Knowledge Discovery with Scalable Deep Learning Architectures
      • Developed an LSTM deep neural network architecture for crop type mapping from Sentinel-2 data. It has been implemented on Hopsworks and is being used in the Food Security use case.
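
As a rough illustration of how an LSTM can map a per-pixel time series of band values to a crop class, here is a minimal pure-Python forward pass of a single standard LSTM layer followed by a softmax classifier. It is only a sketch and not the project's architecture: the number of bands, the hidden size, the sequence length and the random (untrained) weights are all placeholder assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rng, rows, cols):
    # Small random weights; a real model would learn these by training.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(rng, n_bands, n_hidden, n_classes):
    # Each gate sees the concatenation [x; h; 1] (the trailing 1 is the bias).
    width = n_bands + n_hidden + 1
    params = {name: random_matrix(rng, n_hidden, width)
              for name in ("input", "forget", "output", "candidate")}
    params["dense"] = random_matrix(rng, n_classes, n_hidden + 1)
    return params


def gate(weights, xh, activation):
    return [activation(sum(w * v for w, v in zip(row, xh))) for row in weights]


def lstm_classify(series, params, n_hidden):
    """Run one LSTM layer over a time series and return class probabilities."""
    h = [0.0] * n_hidden  # hidden state
    c = [0.0] * n_hidden  # cell state
    for x in series:
        xh = list(x) + h + [1.0]
        i = gate(params["input"], xh, sigmoid)
        f = gate(params["forget"], xh, sigmoid)
        o = gate(params["output"], xh, sigmoid)
        g = gate(params["candidate"], xh, math.tanh)
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    # Softmax over the final hidden state.
    logits = [sum(w * v for w, v in zip(row, h + [1.0])) for row in params["dense"]]
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sizes: 4 bands, 8 hidden units, 13 classes, 16 time steps.
    params = init_params(rng, n_bands=4, n_hidden=8, n_classes=13)
    pixel_series = [[rng.random() for _ in range(4)] for _ in range(16)]
    probs = lstm_classify(pixel_series, params, n_hidden=8)
    print(max(range(len(probs)), key=probs.__getitem__))
```

With untrained weights the predicted class is meaningless; the point is only the data flow from a sequence of observations to one probability per class.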
  • 9. Very Large Training Datasets for Deep Learning Architectures
      • Developed a training dataset consisting of ~1M pixels of 16 Sentinel-2 images located in Austria, where each pixel is labelled with one of 13 crop types. The dataset was built from existing crop type maps and Sentinel-2 data, and it was used to train the LSTM network for the Food Security use case.
      • Will be publicly available very soon (http://earthanalytics.eu/datasets.html).
  • 10. Transformation into RDF
      • Developed the system GeoTriples-Spark for transforming big geospatial data from their legacy formats into RDF. GeoTriples-Spark can transform 2 TB of geospatial data into RDF in 50 minutes.
      • See:
          o George Mandilaras and Manolis Koubarakis. Scalable Transformation of Big Geospatial Data into Linked Data. Submitted. 2021.
          o https://github.com/LinkedEOData/GeoTriples
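
To make "transformation into RDF" concrete, the following sketch turns one geospatial record into N-Triples using the OGC GeoSPARQL vocabulary (`geo:Feature`, `geo:hasGeometry`, `geo:asWKT`). It does not use GeoTriples or its mapping language; the base IRI, the record fields and the helper names are hypothetical and chosen only for illustration.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/field/"  # hypothetical base IRI


def escape(text):
    # Escape backslashes and double quotes for an N-Triples literal.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def record_to_ntriples(record):
    """Map a dict with 'id', 'name' and 'wkt' keys to GeoSPARQL triples."""
    feature = f"<{BASE}{record['id']}>"
    geometry = f"<{BASE}{record['id']}/geometry>"
    return [
        f"{feature} <{RDF_TYPE}> <{GEO}Feature> .",
        f'{feature} <{RDFS_LABEL}> "{escape(record["name"])}" .',
        f"{feature} <{GEO}hasGeometry> {geometry} .",
        f"{geometry} <{RDF_TYPE}> <{GEO}Geometry> .",
        f'{geometry} <{GEO}asWKT> "{escape(record["wkt"])}"^^<{GEO}wktLiteral> .',
    ]


if __name__ == "__main__":
    # A made-up record standing in for one row of a shapefile or CSV.
    row = {"id": "42", "name": "Field 42", "wkt": "POINT(16.37 48.21)"}
    print("\n".join(record_to_ntriples(row)))
```

A system working at the scale described on the slide would apply such a mapping in parallel over partitions of the input; the per-record logic is the part sketched here.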
  • 11. Interlinking
      • Developed the system JedAI-spatial for interlinking big linked geospatial data. JedAI-spatial has been tested with >100 GB of geospatial data and has been shown to scale almost linearly. For a dataset with 72M x 115M geometries, it computes all DE-9IM topological relations in less than 50 minutes.
      • See:
          o George Papadakis, Georgios Mandilaras, Nikos Mamoulis and Manolis Koubarakis. Progressive, Holistic Geospatial Interlinking. The Web Conference 2021, April 19 - 23, Ljubljana, Slovenia, 2021.
          o https://github.com/giantInterlinking/prGIAnt
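
For readers unfamiliar with DE-9IM: the relation between two geometries is encoded as a 9-character intersection matrix, and named topological relations are derived by matching that matrix against standard patterns. The sketch below shows only this pattern-matching step, under the assumption that the matrix string has already been computed by a geometry library; it is not JedAI-spatial code, and the function names are illustrative.

```python
# Standard DE-9IM patterns for relations that do not depend on the
# dimensions of the geometries (overlaps and crosses are omitted).
PATTERNS = {
    "equals": ["T*F**FFF*"],
    "disjoint": ["FF*FF****"],
    "touches": ["FT*******", "F**T*****", "F***T****"],
    "within": ["T*F**F***"],
    "contains": ["T*****FF*"],
}


def matches(matrix, pattern):
    """Check a 9-character DE-9IM matrix against one pattern."""
    for cell, want in zip(matrix, pattern):
        if want == "*":
            continue  # any value is accepted
        if want == "T":
            if cell == "F":
                return False  # T requires a non-empty intersection
        elif cell != want:
            return False  # F, 0, 1 and 2 must match exactly
    return True


def relations(matrix):
    """Return the set of named relations that hold for a DE-9IM matrix."""
    if len(matrix) != 9:
        raise ValueError("a DE-9IM matrix has exactly 9 cells")
    found = {name for name, patterns in PATTERNS.items()
             if any(matches(matrix, p) for p in patterns)}
    if "disjoint" not in found:
        found.add("intersects")  # intersects is the negation of disjoint
    return found


if __name__ == "__main__":
    print(sorted(relations("2FF1FF212")))  # polygon strictly inside another
    print(sorted(relations("FF2FF1212")))  # two separate polygons
```

Computing the matrix once and deriving every relation from it is what makes "all DE-9IM topological relations" a single pass per candidate pair rather than one geometric test per relation.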
  • 12. Storage, Querying and …
      • Developed the system Strabo2 for querying big linked geospatial data using the OGC standard GeoSPARQL. Strabo2 can process queries of the Geographica benchmark over 171 GB of data in 30 to 400 seconds.
      • Developed version 3 of the Semagrow system for federating big linked geospatial data sources. Semagrow can process queries used in the Food Security use case over <1 GB of data in <20 seconds.
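
As an indication of what a GeoSPARQL query looks like, the sketch below builds a query string that selects features whose geometry intersects a bounding box, using the standard `geof:sfIntersects` filter function. The helper names, the bounding box and the result limit are hypothetical, and nothing here is specific to Strabo2 or Semagrow; the resulting string could be sent to any GeoSPARQL-compliant endpoint.

```python
from string import Template

# $area and $limit are filled in by build_query below.
QUERY = Template("""\
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?feature ?wkt
WHERE {
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfIntersects(?wkt, "$area"^^geo:wktLiteral))
}
LIMIT $limit
""")


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    # WKT polygon in longitude/latitude order (the GeoSPARQL default, CRS84),
    # closed by repeating the first corner.
    return (f"POLYGON(({min_lon} {min_lat}, {max_lon} {min_lat}, "
            f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))")


def build_query(bbox, limit=100):
    """Return a GeoSPARQL query for features intersecting the bounding box."""
    return QUERY.substitute(area=bbox_to_wkt(*bbox), limit=int(limit))


if __name__ == "__main__":
    # A made-up bounding box, used only to show the generated query.
    print(build_query((16.0, 48.0, 16.5, 48.3), limit=10))
```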
  • 13. … Question Answering
      • Developed GeoQA, the first question answering engine for linked geospatial data.
      • See:
          o Dharmen Punjani, Markos Iliakis, Theodoros Stefou, Kuldeep Singh, Andreas Both, Manolis Koubarakis, Iosif Angelidis, Konstantina Bereta, Themis Beris, Dimitris Bilidas, Theofilos Ioannidis, Nikolaos Karalis, Christoph Lange, Despina-Athanasia Pantazi, Christos Papaloukas and Georgios Stamoulis. Template-Based Question Answering over Linked Geospatial Data. https://arxiv.org/pdf/2007.07060.pdf
      • http://geoqa.di.uoa.gr/
  • 14. Search, Browse, Explore and Visualize
      • We always use the visualization tool Sextant (http://sextant.di.uoa.gr/)
  • 15. Thank you!
      • Visit our Web site: http://earthanalytics.eu/
      • Follow us on Twitter: @ExtremeEarth_EU @mkoubarakis