Three years of the ExtremeEarth project
Online workshop - December 9th 2021
Theofilos Kakantousis
Desta Haileselassie Hagos
Logical Clocks, KTH
The ExtremeEarth platform: scalable deep learning
pipelines with Earth observation data and Hopsworks
ExtremeEarth
From Copernicus Big Data
to Extreme Earth Analytics
This project has received funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement No 825258.
Contents
1. ExtremeEarth platform architecture
2. End-to-end scalable deep learning
pipelines with Hopsworks
3. Exploitation of results
4. Research
ExtremeEarth Platform
Architecture
Background
• The Copernicus programme produces more than three petabytes (PB) of Earth Observation (EO)
data annually from Sentinel satellites.*
• Data and Information Access Services (DIAS) provide centralised access to Copernicus data and
processing tools.
• European Space Agency (ESA) Thematic Exploitation Platforms (TEPs) make sure complex data
streams are exploited to their full potential.
○ Food Security, Polar
• Hopsworks Data-Intensive AI platform brings scalable AI support for Earth Observation data.
* https://workshop.copernicus.eu/sites/default/files/content/attachments/ajax/copernicus_overview.pdf
How to build AI products with EO data
ExtremeEarth architecture goals
• ExtremeEarth brings together these components
○ Under the same architecture…
○ … and infrastructure.
○ Reduce cost and increase productivity by providing a seamless end-user experience without
having to manage different services
• Combine
○ EO data access from DIASes
○ End-user facing EO data products from TEPs
○ Scalable AI capabilities of Hopsworks
ExtremeEarth architecture overview
ExtremeEarth architecture deep dive 1/2
• Infrastructure provided by Creodias and
managed by the TEPs
○ OpenStack cluster with GPU support
• Data layer with multiple data sources
○ Raw Creodias data
○ Intermediate TEP data
○ Training datasets
• Processing layer provided by Hopsworks.
○ Core AI engine
○ Develop PB-scale machine learning
algorithms with deep learning
architectures.
○ Platform that provides support for
semantic data tools
ExtremeEarth architecture deep dive 2/2
• Product layer
○ Hopsworks serves AI products to
external clients
• User interface
○ Hopsworks is integrated with the
TEPs via APIs
○ TEP users make direct use of AI
models developed in Hopsworks.
Real World Use Cases - Food Security
Real World Use Cases - Polar
ExtremeEarth running in production
• Hopsworks installed alongside TEP
infrastructure on CREODIAS
○ https://hopsworks.polartep.io
• Provides easy EO data access and
machine learning development tooling to
developers and data scientists.
• Deep learning architectures developed on
this Hopsworks cluster for the Food
Security and Polar use cases.
End-to-end scalable
machine learning
pipelines
Hopsworks
Open source platform to develop end-to-end machine learning pipelines at scale
for Enterprise AI.
Use your tools of choice and serve at the lowest latency on any cloud, at any
scale.
The Data Platform for AI
Organizations are struggling to deploy AI
because of Data
● “87% identified data as the reason their organizations failed to successfully implement AI.”*
* VentureBeat: https://venturebeat.com/2021/03/24/employees-attribute-ai-project-failure-to-poor-data-quality/
Steps shown in the diagram:
○ Where the data is (storage)
○ Discover and access the data
○ Extract the data
○ Clean, join and aggregate the data
○ Transform the data into features
○ Validate the data
○ Make the process repeatable
○ Serve for real-time applications or train
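The steps on this slide (clean, aggregate, transform into features, validate, make the process repeatable) can be sketched in a few lines of plain Python. This is a minimal illustration and not Hopsworks code: the records, the NDVI-style readings, the 0.5 threshold, and every function name are assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw records: (field_id, NDVI reading); None marks a missing value.
RAW = [
    ("field_a", 0.61), ("field_a", None), ("field_a", 0.67),
    ("field_b", 0.32), ("field_b", 0.30),
]

def clean(records):
    """Drop records with missing readings."""
    return [(key, value) for key, value in records if value is not None]

def aggregate(records):
    """Group readings by field and average them."""
    groups = {}
    for key, value in records:
        groups.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in groups.items()}

def to_features(aggregated):
    """Turn each aggregate into a small feature dict."""
    return {
        key: {"ndvi_mean": round(value, 3), "is_vegetated": value > 0.5}
        for key, value in aggregated.items()
    }

def validate(features):
    """Fail fast if a feature falls outside its expected range."""
    for key, row in features.items():
        if not -1.0 <= row["ndvi_mean"] <= 1.0:
            raise ValueError(f"{key}: ndvi_mean out of range")
    return features

def run_pipeline(records):
    """Single entry point, so the same steps run in the same order every time."""
    return validate(to_features(aggregate(clean(records))))

features = run_pipeline(RAW)
```

Wrapping the steps in one function is what makes the process repeatable: the same call can feed both training jobs and serving applications.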
Growing Consensus on How to Manage
Complexity of AI
[Diagram, shown in three build steps. Centre labels: Data, Model, Prediction, φ(x). Surrounding components:]
○ Data Collection
○ Data Validation
○ Feature Engineering
○ Hardware Management
○ Distributed Training
○ Hyperparameter Tuning
○ Pipeline Management
○ Model Serving
○ A/B Testing
○ Monitoring
[Overlay labels added in the later build steps: FEATURE STORE / FEATURE ENGINEERING, then ML PLATFORM / TRAIN and SERVE.]
* Diagram from Google’s paper Hidden Technical Debt in Machine Learning Systems
Scalable end-to-end deep learning pipelines
● Horizontally scalable infrastructure that enables developers to manage the lifecycle of EO
machine learning applications
End-to-end machine learning components
[Diagram labels: Streaming, Data Warehouse, Data Lake; Feature Engineering; Feature Store (Offline Feature Store, Online Feature Store); Train/Test Data (S3, HDFS, etc.); Batch Access; Model Training; Model Repository; Deploy; Model Serving; Online Application; Feature Vectors; Batch Scoring; Result Sink (DB); Monitor; HopsFS; Scaleout Metadata]
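The split between an offline and an online feature store in the diagram can be illustrated with a toy in-memory class. This is only a sketch of the idea and not the Hopsworks Feature Store API: the class `TinyFeatureStore`, its methods, and the feature names are all hypothetical.

```python
class TinyFeatureStore:
    """Toy stand-in for a feature store; not the Hopsworks API."""

    def __init__(self):
        self.offline = []  # append-only history, used to build training data
        self.online = {}   # latest row per key, used for low-latency lookups

    def insert(self, key, timestamp, features):
        """Write one row to both stores."""
        row = {"key": key, "ts": timestamp, **features}
        self.offline.append(row)
        latest = self.online.get(key)
        if latest is None or timestamp >= latest["ts"]:
            self.online[key] = row

    def training_rows(self, feature_names):
        """Batch access: every historical row, restricted to the requested features."""
        return [[row[name] for name in feature_names] for row in self.offline]

    def feature_vector(self, key, feature_names):
        """Online access: the most recent feature vector for one key."""
        row = self.online[key]
        return [row[name] for name in feature_names]

store = TinyFeatureStore()
store.insert("tile_1", 1, {"ice_fraction": 0.20, "mean_backscatter": -14.0})
store.insert("tile_1", 2, {"ice_fraction": 0.35, "mean_backscatter": -12.5})
store.insert("tile_2", 1, {"ice_fraction": 0.90, "mean_backscatter": -8.0})

names = ["ice_fraction", "mean_backscatter"]
train = store.training_rows(names)              # feeds model training
vector = store.feature_vector("tile_1", names)  # feeds model serving
```

Training reads the full history in batch, while serving only needs the latest vector per key, which is why the two access paths are stored separately.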
Hopsworks - one open source platform with
all the tools
[Diagram labels]
○ Stages: Data Preparation & Ingestion; Experimentation & Model Training; Deploy & Productionalize
○ Datasource
○ Apache Kafka
○ Orchestration: in Airflow
○ Batch: Apache Spark
○ Streaming: Apache Spark, Apache Flink
○ Hopsworks Feature Store
○ Distributed ML & DL: Pip, Conda, TensorFlow, scikit-learn, PyTorch
○ Jupyter Notebooks, TensorBoard
○ Filesystem & metadata storage: in HopsFS
○ Model Serving: Kubernetes
○ Model Monitoring: Kafka + Spark Streaming
○ Applications, API, Dashboards
ML experiments management
Distributed deep learning with Hopsworks
from hops import experiment  # Hopsworks experiment API (hops library)

# RUNS ON THE WORKERS
def train():
    def input_fn():
        ...  # return dataset
    model = …
    optimizer = …
    model.compile(…)
    history = model.fit(…)
    # Report the final-epoch metrics back to the driver
    metrics = {
        'train_loss': history.history['loss'][-1],
        'train_accuracy': history.history['accuracy'][-1],
        'val_loss': history.history['val_loss'][-1],
        'val_accuracy': history.history['val_accuracy'][-1],
    }
    return metrics

# RUNS ON THE DRIVER
# metric_key selects which entry of the returned dict is tracked for the experiment
experiment.mirrored(train, name='distributed',
                    metric_key='val_accuracy')
[Diagram labels: Driver; TF_CONFIG; workers W1 to W8; HopsFS holding Metrics, TensorBoard, Checkpoints, Training Data, Models, Logs]
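What the workers do in mirrored, synchronous data-parallel training can be simulated without a cluster. The sketch below assumes the usual scheme (each worker computes a gradient on its own data shard, the gradients are averaged, and every replica applies the same update); it is plain Python for a one-variable linear model, with hypothetical data, and does not use TensorFlow or the Hopsworks API.

```python
def gradient(w, b, batch):
    """Mean-squared-error gradient of the model y = w*x + b over one batch."""
    n = len(batch)
    dw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    db = sum(2 * (w * x + b - y) for x, y in batch) / n
    return dw, db

def all_reduce_mean(grads):
    """Average per-worker gradients, as a synchronous all-reduce would."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k

def train_data_parallel(data, num_workers, steps=500, lr=0.3):
    # Shard the data evenly; each simulated worker keeps its own shard.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w, b = 0.0, 0.0  # every replica starts from the same weights
    for _ in range(steps):
        grads = [gradient(w, b, shard) for shard in shards]  # one per worker
        dw, db = all_reduce_mean(grads)
        w, b = w - lr * dw, b - lr * db  # identical update on every replica
    return w, b

# Hypothetical noise-free data drawn from y = 3x + 1.
data = [(x / 10, 3 * (x / 10) + 1) for x in range(16)]
w, b = train_data_parallel(data, num_workers=8)
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so adding workers changes the throughput but not the update itself.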
Hyperparameter tuning with Maggy
● Library for distribution-transparent machine
learning experiments on Apache Spark
● Not bound to stage-based algorithms, unlike
existing frameworks.
● Directed hyperparameter search (ASHA,
Bayesian) on TensorFlow, PyTorch,
scikit-learn, XGBoost
● Real-time, unified logging in Jupyter
notebooks.
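The idea behind directed search can be shown with synchronous successive halving, the scheme that ASHA extends by promoting configurations asynchronously. The sketch below is not the Maggy API and not ASHA itself: `validation_score` is a hypothetical stand-in for training a model for a given budget, and the search space is invented for the example.

```python
import random

def validation_score(config, budget):
    """Hypothetical stand-in for training `budget` epochs and scoring the result.

    The score peaks at learning_rate = 0.1 and improves with more budget.
    """
    penalty = abs(config["learning_rate"] - 0.1)
    return (1.0 - penalty) * (1.0 - 1.0 / (budget + 1))

def successive_halving(configs, min_budget=1, reduction=2):
    """Score all configs on a small budget, keep the best 1/reduction, repeat."""
    budget = min_budget
    survivors = list(configs)
    budgets_used = []
    while len(survivors) > 1:
        ranked = sorted(
            survivors, key=lambda c: validation_score(c, budget), reverse=True
        )
        survivors = ranked[: max(1, len(ranked) // reduction)]
        budgets_used.append(budget)
        budget *= reduction  # survivors earn a larger training budget
    return survivors[0], budgets_used

rng = random.Random(0)
candidates = [{"learning_rate": rng.uniform(0.001, 0.5)} for _ in range(16)]
best, budgets_used = successive_halving(candidates)
```

In this toy, rankings do not change with budget, so the best configuration always survives. Real training curves are noisier, which is why the reduction factor and minimum budget matter in practice.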
Ablation studies with Maggy
● Parallel ablation studies: without
changing your inner training loop in
TensorFlow/Keras, evaluate (in
parallel) the effect of different
layers, dataset features, etc.
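A leave-one-out ablation over input features can be sketched as follows. This is an illustration of the technique, not Maggy's ablation API: `train_and_evaluate` is a hypothetical stand-in for an unchanged training loop, and the feature names and their score contributions are invented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical contribution of each input feature to a validation score.
CONTRIBUTION = {"band_red": 0.10, "band_nir": 0.25, "sar_vv": 0.15, "elevation": 0.02}
BASE_SCORE = 0.40

def train_and_evaluate(features):
    """Stand-in for an unchanged training loop; returns a validation score."""
    return round(BASE_SCORE + sum(CONTRIBUTION[f] for f in features), 2)

def ablation_study(all_features):
    """Score the full model, then retrain once per left-out feature, in parallel."""
    trials = {"none": tuple(all_features)}
    for left_out in all_features:
        trials[left_out] = tuple(f for f in all_features if f != left_out)
    with ThreadPoolExecutor(max_workers=4) as pool:
        scores = dict(zip(trials, pool.map(train_and_evaluate, trials.values())))
    baseline = scores["none"]
    # Report how much the score drops when each feature is removed.
    return {
        name: round(baseline - score, 2)
        for name, score in scores.items()
        if name != "none"
    }

drops = ablation_study(list(CONTRIBUTION))
most_important = max(drops, key=drops.get)
```

Each trial is independent of the others, which is what makes ablation studies straightforward to run in parallel.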
ML model registry management
Demo
Exploitation of results
Exploitation
● Hopsworks is now extended with EO data support
● Creates opportunities to onboard new use cases for AI with EO data
o Hopsworks as the AI platform for other research projects, H2020 DeepCube
● Hopsworks as a product offering
o With the Polar and Food Security TEPs ExtremeAI platform
o Can be seamlessly integrated with further DIASes
o Offered as SaaS at hopsworks.ai on public clouds such as Amazon AWS and Microsoft Azure
Research
Publications
o The ExtremeEarth Software Architecture for Copernicus Earth Observation Data. (Conference
paper)
▪ Published: Conference on Big Data from Space (BiDS21).
o ExtremeEarth Meets Data From Space (Journal paper).
▪ Published: IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing (JSTARS) (2021).
o Maggy: Scalable Asynchronous Parallel Hyperparameter Search. (Conference paper)
▪ Published: The 1st Workshop on Distributed Machine Learning (DistributedML'20).
o AutoAblation: Automated Parallel Ablation Studies for Deep Learning. (Conference paper)
▪ Published: The 1st Workshop on Machine Learning and Systems (EuroMLSys'21).
o Scalable Artificial Intelligence for Earth Observation Data Using Hopsworks. (Journal paper)
▪ Under preparation: IEEE Journal of Selected Topics in Applied Earth Observations and
Remote Sensing (JSTARS) (2021). ⇒ Will be submitted soon.
• Published papers: http://earthanalytics.eu/publications.html
Blog posts
o AI Software Architecture for Copernicus Data with Hopsworks.
▪ July 2021 (link)
o End-to-end Deep Learning Pipelines with Earth observation Data in Hopsworks
▪ October 2021 (link)
Thank you!
github.com/logicalclocks/hopsworks
@hopsworks

More Related Content

What's hot

Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster Relief
Robert Grossman
 
Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19
ExtremeEarth
 
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsEnabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
Rob Emanuele
 
Processing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechProcessing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtech
Rob Emanuele
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
Robert Grossman
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit
 
Opening the Path to Technical Excellence
Opening the Path to Technical ExcellenceOpening the Path to Technical Excellence
Opening the Path to Technical Excellence
NETWAYS
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on Spark
DataWorks Summit
 
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechGeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
Rob Emanuele
 
Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)
inside-BigData.com
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis Presentation
Rob Emanuele
 
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigData_Europe
 
Sky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationSky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor Computation
EUDAT
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DC
CCRinc
 
Working with Scientific Data in MATLAB
Working with Scientific Data in MATLABWorking with Scientific Data in MATLAB
Working with Scientific Data in MATLAB
The HDF-EOS Tools and Information Center
 
Exascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing WorldExascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing World
inside-BigData.com
 
EOSC-hub & Geohazards TEP
EOSC-hub & Geohazards TEPEOSC-hub & Geohazards TEP
EOSC-hub & Geohazards TEP
EOSC-hub project
 
Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case StudiesWorking with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
The HDF-EOS Tools and Information Center
 
Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?
Rob Emanuele
 
Summary of HDF-EOS5 Files, Data Model and File Format
Summary of HDF-EOS5 Files, Data Model and File FormatSummary of HDF-EOS5 Files, Data Model and File Format
Summary of HDF-EOS5 Files, Data Model and File Format
The HDF-EOS Tools and Information Center
 

What's hot (20)

Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster Relief
 
Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19Big linked geospatial data tools in ExtremeEarth-phiweek19
Big linked geospatial data tools in ExtremeEarth-phiweek19
 
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsEnabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
 
Processing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechProcessing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtech
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
 
Opening the Path to Technical Excellence
Opening the Path to Technical ExcellenceOpening the Path to Technical Excellence
Opening the Path to Technical Excellence
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on Spark
 
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechGeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
 
Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis Presentation
 
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
 
Sky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationSky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor Computation
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DC
 
Working with Scientific Data in MATLAB
Working with Scientific Data in MATLABWorking with Scientific Data in MATLAB
Working with Scientific Data in MATLAB
 
Exascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing WorldExascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing World
 
EOSC-hub & Geohazards TEP
EOSC-hub & Geohazards TEPEOSC-hub & Geohazards TEP
EOSC-hub & Geohazards TEP
 
Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case StudiesWorking with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
 
Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?Deep Learning on Aerial Imagery: What does it look like on a map?
Deep Learning on Aerial Imagery: What does it look like on a map?
 
Summary of HDF-EOS5 Files, Data Model and File Format
Summary of HDF-EOS5 Files, Data Model and File FormatSummary of HDF-EOS5 Files, Data Model and File Format
Summary of HDF-EOS5 Files, Data Model and File Format
 

Similar to Hopsworks - ExtremeEarth Open Workshop

ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
Big Data Value Association
 
Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Gergely Sipos (EGI): Exploiting scientific data in the international context ...Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Gergely Sipos
 
Use r 2013 tutorial - r and cloud computing for higher education and research
Use r 2013   tutorial - r and cloud computing for higher education and researchUse r 2013   tutorial - r and cloud computing for higher education and research
Use r 2013 tutorial - r and cloud computing for higher education and research
kchine3
 
Deep Hybrid DataCloud
Deep Hybrid DataCloudDeep Hybrid DataCloud
Deep Hybrid DataCloud
EOSC-hub project
 
Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...
ExtremeEarth
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
BigData_Europe
 
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubCloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Björn Backeberg
 
Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkus
Red Hat Developers
 
09 The Extreme-scale Scientific Software Stack for Collaborative Open Source
09 The Extreme-scale Scientific Software Stack for Collaborative Open Source09 The Extreme-scale Scientific Software Stack for Collaborative Open Source
09 The Extreme-scale Scientific Software Stack for Collaborative Open Source
RCCSRENKEI
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Alluxio, Inc.
 
CHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in csCHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in cs
TSha7
 
Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and Ceremony
Archiver
 
Pathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaborationPathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaboration
EOSC-hub project
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair"
OpenAIRE
 
OCP Summit 2017
OCP Summit 2017OCP Summit 2017
OCP Summit 2017
Jaroslaw Sobel
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud Computing
David Wallom
 
Getting Access to ALCF Resources and Services
Getting Access to ALCF Resources and ServicesGetting Access to ALCF Resources and Services
Getting Access to ALCF Resources and Services
davidemartin
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
Sri Ambati
 
MDIS workshop 2015
MDIS workshop 2015MDIS workshop 2015
MDIS workshop 2015
terradue
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
Vipin Singhal
 

Similar to Hopsworks - ExtremeEarth Open Workshop (20)

ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
 
Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Gergely Sipos (EGI): Exploiting scientific data in the international context ...Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Gergely Sipos (EGI): Exploiting scientific data in the international context ...
 
Use r 2013 tutorial - r and cloud computing for higher education and research
Use r 2013   tutorial - r and cloud computing for higher education and researchUse r 2013   tutorial - r and cloud computing for higher education and research
Use r 2013 tutorial - r and cloud computing for higher education and research
 
Deep Hybrid DataCloud
Deep Hybrid DataCloudDeep Hybrid DataCloud
Deep Hybrid DataCloud
 
Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
 
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubCloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
 
Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkus
 
09 The Extreme-scale Scientific Software Stack for Collaborative Open Source
09 The Extreme-scale Scientific Software Stack for Collaborative Open Source09 The Extreme-scale Scientific Software Stack for Collaborative Open Source
09 The Extreme-scale Scientific Software Stack for Collaborative Open Source
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
CHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in csCHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in cs
 
Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and Ceremony
 
Pathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaborationPathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaboration
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair"
 
OCP Summit 2017
OCP Summit 2017OCP Summit 2017
OCP Summit 2017
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud Computing
 
Getting Access to ALCF Resources and Services
Getting Access to ALCF Resources and ServicesGetting Access to ALCF Resources and Services
Getting Access to ALCF Resources and Services
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
MDIS workshop 2015
MDIS workshop 2015MDIS workshop 2015
MDIS workshop 2015
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 

More from ExtremeEarth

Polar Use Case - ExtremeEarth Open Workshop
Polar Use Case  - ExtremeEarth Open WorkshopPolar Use Case  - ExtremeEarth Open Workshop
Polar Use Case - ExtremeEarth Open Workshop
ExtremeEarth
 
ExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - IntroductionExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - Introduction
ExtremeEarth
 
AI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open WorkshopAI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open Workshop
ExtremeEarth
 
Big Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open WorkshopBig Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open Workshop
ExtremeEarth
 
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
ExtremeEarth
 
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation DataExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth
 
Snow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and IrrigationSnow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and Irrigation
ExtremeEarth
 
Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19
ExtremeEarth
 
The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19
ExtremeEarth
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19
ExtremeEarth
 
Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19
ExtremeEarth
 
Copernicus and AI workshop 2020
Copernicus and AI workshop 2020Copernicus and AI workshop 2020
Copernicus and AI workshop 2020
ExtremeEarth
 
LPS19 ExtremeEarth Project
LPS19 ExtremeEarth ProjectLPS19 ExtremeEarth Project
LPS19 ExtremeEarth Project
ExtremeEarth
 

More from ExtremeEarth (13)

Polar Use Case - ExtremeEarth Open Workshop
Polar Use Case  - ExtremeEarth Open WorkshopPolar Use Case  - ExtremeEarth Open Workshop
Polar Use Case - ExtremeEarth Open Workshop
 
ExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - IntroductionExtremeEarth Open Workshop - Introduction
Hopsworks - ExtremeEarth Open Workshop

  • 1. Three years of the ExtremeEarth project. Online workshop, December 9th 2021. Theofilos Kakantousis, Desta Haileselassie Hagos. Logical Clocks, KTH. The ExtremeEarth platform: scalable deep learning pipelines with Earth observation data and Hopsworks
  • 2. ExtremeEarth: From Copernicus Big Data to Extreme Earth Analytics. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825258.
  • 3. Contents
      1. ExtremeEarth platform architecture
      2. End-to-end scalable deep learning pipelines with Hopsworks
      3. Exploitation of results
      4. Research
  • 5. Background
      • The Copernicus programme produces more than three petabytes (PB) of Earth Observation (EO) data annually from Sentinel satellites.*
      • Data and Information Access Services (DIAS) provide centralised access to Copernicus data and processing tools.
      • European Space Agency (ESA) Thematic Exploitation Platforms (TEPs) make sure complex data streams are exploited to their full potential.
          ○ Food Security, Polar
      • Hopsworks Data-Intensive AI platform brings scalable AI support for Earth Observation data.
      * https://workshop.copernicus.eu/sites/default/files/content/attachments/ajax/copernicus_overview.pdf
  • 6. How to build AI products with EO data
  • 7. ExtremeEarth architecture goals
      • ExtremeEarth brings together these components
          ○ Under the same architecture…
          ○ … and infrastructure.
          ○ Reduce cost and increase productivity by providing a seamless end-user experience without having to manage different services
      • Combine
          ○ EO data access from DIASes
          ○ End-user facing EO data products from TEPs
          ○ Scalable AI capabilities of Hopsworks
  • 9. ExtremeEarth architecture deep dive 1/2
      • Infrastructure provided by Creodias and managed by the TEPs
          ○ OpenStack cluster with GPU support
      • Data layer with multiple data sources
          ○ Raw Creodias data
          ○ Intermediate TEP data
          ○ Training datasets
      • Processing layer provided by Hopsworks.
          ○ Core AI engine
          ○ Develop PB-scale machine learning algorithms with deep learning architectures.
          ○ Platform that provides support for semantic data tools
  • 10. ExtremeEarth architecture deep dive 2/2
      • Product layer
          ○ Hopsworks serves AI products to external clients
      • User interface
          ○ Hopsworks is integrated with the TEPs via APIs
          ○ TEP users make direct use of AI models developed in Hopsworks.
  • 11. Real World Use Cases - Food Security
  • 12. Real World Use Cases - Polar
  • 13. ExtremeEarth running in production
      • Hopsworks installed alongside TEP infrastructure on CREODIAS
          ○ https://hopsworks.polartep.io
      • Provides easy EO data access and machine learning development tooling to developers and data scientists.
      • Deep learning architectures developed on this Hopsworks cluster for the Food Security and Polar use cases.
  • 15. Hopsworks: The Data Platform for AI
      Open source platform to develop end-to-end machine learning pipelines at scale for Enterprise AI. Use your tools of choice and serve at the lowest latency on any cloud, at any scale.
  • 16. Organizations are struggling to deploy AI because of Data
      ● “87% identified data as the reason their organizations failed to successfully implement AI.”* (Venture Beat)
      * https://venturebeat.com/2021/03/24/employees-attribute-ai-project-failure-to-poor-data-quality/
      Diagram labels: Where the data is (storage); Discover and Access the data; Clean, Join and Aggregate the Data; Extract the Data; Transform the data into features; Validate the data; Make the process repeatable 🔁; Serve for real-time applications or train 🏆
  • 17. Growing Consensus on How to Manage Complexity of AI
      Diagram labels*: Data validation; Distributed Training; Model Serving; A/B Testing; Monitoring; Pipeline Management; HyperParameter Tuning; Feature Engineering; Data Collection; Hardware Management; Data; Model; Prediction; φ(x)
      * Diagram from Google’s paper Hidden Technical Debt in Machine Learning Systems
  • 18. Growing Consensus on How to Manage Complexity of AI
      Same diagram as slide 17, with the added labels: FEATURE STORE; FEATURE ENGINEERING
  • 19. Growing Consensus on How to Manage Complexity of AI
      Same diagram as slide 18, with the added labels: ML PLATFORM; TRAIN and SERVE
  • 20. Scalable end-to-end deep learning pipelines
      ● Horizontally scalable infrastructure that enables developers to manage the lifecycle of EO machine learning applications
  • 21. End-to-end machine learning components
      Diagram labels: Streaming; Train/Test Data (S3, HDFS, etc); Online Application; Data Warehouse; Data Lake; Feature Engineering; Offline Feature Store; Model Training; Model Serving; Online Feature Store; Model Repository; Monitor; Deploy; Feature Vectors; Result Sink (DB); Batch Scoring; Batch Access; Feature Store; HopsFS; Scaleout Metadata
  • 22. Hopsworks - one open source platform with all the tools
      Diagram labels: Applications; API; Dashboards; Hopsworks Datasource; Orchestration (Airflow); Batch (Apache Spark); Streaming (Apache Spark, Apache Flink); Hopsworks Feature Store; Distributed ML & DL (Pip, Conda, TensorFlow, scikit-learn, PyTorch, Jupyter Notebooks, TensorBoard); Filesystem & Metadata Storage (HopsFS); Model Serving (Kubernetes); Model Monitoring (Kafka + Spark Streaming); Apache Kafka
      Pipeline stages: Data Preparation & Ingestion; Experimentation & Model Training; Deploy & Productionalize
  • 24. Distributed deep learning with Hopsworks

        # RUNS ON THE WORKERS
        def train():
            def input_fn():
                # return dataset
            model = …
            optimizer = …
            model.compile(…)
            history = model.fit(…)
            # Final-epoch metrics taken from the Keras training history
            metrics = {
                'train_loss': history.history['loss'][-1],
                'train_accuracy': history.history['accuracy'][-1],
                'val_loss': history.history['val_loss'][-1],
                'val_accuracy': history.history['val_accuracy'][-1],
            }
            tf.estimator.train_and_evaluate(keras_estimator, input_fn)

        # RUNS ON THE DRIVER
        experiment.mirrored(train, name='distributed', metric_key='val_accuracy')

      Diagram labels: Driver; TF_CONFIG; workers W1 to W8; HopsFS (Metrics, TensorBoard, Checkpoints, Training Data, Models, Logs)
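The slide above shows training replicated across several workers. As a rough illustration of the general idea behind synchronous, mirrored data-parallel training (not the Hopsworks or TensorFlow implementation), the following pure-Python sketch gives each worker a data shard, averages the per-worker gradients, and applies the same update to every replica. The toy linear model, the function names and the hyperparameters are all assumptions made for this example.

```python
def gradient(w, b, shard):
    """Mean-squared-error gradient of y ~ w*x + b over one data shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def all_reduce_mean(grads):
    """Average per-worker gradients (a stand-in for a real all-reduce)."""
    k = len(grads)
    return tuple(sum(g[i] for g in grads) / k for i in range(len(grads[0])))


def train(data, num_workers, steps=2000, lr=0.01):
    # Split the data into equally sized shards, one per worker.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w, b = 0.0, 0.0  # every replica starts from the same weights
    for _ in range(steps):
        # In a real cluster these gradient computations run in parallel.
        grads = [gradient(w, b, shard) for shard in shards]
        gw, gb = all_reduce_mean(grads)
        # All replicas apply the identical averaged update.
        w, b = w - lr * gw, b - lr * gb
    return w, b


# Toy dataset generated from y = 3x + 1.
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
w_single, b_single = train(data, num_workers=1)
w_multi, b_multi = train(data, num_workers=4)
```

Because the shards have equal size, the averaged gradient equals the full-batch gradient, so the four-worker run follows the same trajectory as the single-worker run up to floating-point rounding.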
  • 25. Hyperparameter tuning with Maggy
      ● Library for distribution-transparent machine learning experiments on Apache Spark
      ● Not bound to stage-based algorithms, contrary to existing frameworks.
      ● Directed hyperparameter search (ASHA, Bayesian) on TensorFlow, PyTorch, scikit-learn, XGBoost
      ● Real-time, unified logging in Jupyter notebooks.
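Slide 25 names ASHA as one of the directed search strategies. ASHA is an asynchronous variant of successive halving; the sketch below shows only the simpler synchronous form, to convey the idea of stopping weak trials early and spending more budget on promising ones. It does not use the Maggy API, and `toy_score`, the search range and the budget values are hypothetical stand-ins for a real training run.

```python
import random


def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the configurations each round and give the
    survivors eta times more budget, until one configuration remains."""
    survivors = list(configs)
    budget = min_budget
    history = []  # (budget, number of configurations evaluated) per round
    while len(survivors) > 1:
        history.append((budget, len(survivors)))
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], history


def toy_score(config, budget):
    """Hypothetical validation score: best at lr = 0.1, improves with budget."""
    return -abs(config["lr"] - 0.1) - 1.0 / budget


rng = random.Random(0)
# Sample 16 learning rates log-uniformly between 1e-4 and 1.
candidates = [{"lr": 10 ** rng.uniform(-4, 0)} for _ in range(16)]
best, history = successive_halving(candidates, toy_score)
```

With 16 candidates and `eta=2`, the rounds evaluate 16, 8, 4 and 2 configurations at budgets 1, 2, 4 and 8, so most of the total budget goes to the few configurations that survive.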
  • 26. Ablation studies with Maggy
      ● Parallel ablation studies: without changing your inner training loop in TensorFlow/Keras, evaluate (in parallel) the effect of different layers, dataset features, etc.
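To make the notion of an ablation study on slide 26 concrete, here is a minimal leave-one-component-out sketch in plain Python. It is not Maggy's API: the component names, the `CONTRIBUTION` table and `toy_train_and_score` are hypothetical and replace what would be a full training run per trial.

```python
def ablation_study(components, train_and_score):
    """Score the full model, then re-score with each component removed.

    Returns the baseline score and, per component, the drop in score
    observed when that component is left out.
    """
    full = frozenset(components)
    baseline = train_and_score(full)
    drops = {}
    for name in components:
        # Each trial is independent, so trials could run in parallel.
        drops[name] = baseline - train_and_score(full - {name})
    return baseline, drops


# Hypothetical contribution of each component to validation accuracy.
CONTRIBUTION = {"conv_block_2": 0.10, "dropout": 0.02, "ndvi_feature": 0.05}


def toy_train_and_score(active):
    """Stand-in for a training run: base accuracy plus active contributions."""
    return 0.70 + sum(CONTRIBUTION[name] for name in active)


baseline, drops = ablation_study(list(CONTRIBUTION), toy_train_and_score)
most_important = max(drops, key=drops.get)
```

The component whose removal causes the largest drop is the one the model depends on most; in this toy setup that is `conv_block_2`.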
  • 27. ML model registry management
  • 30. Exploitation
      ● Hopsworks is now extended with EO data support
      ● Creates opportunities to onboard new use cases for AI with EO data
          o Hopsworks as the AI platform for other research projects, H2020 DeepCube
      ● Hopsworks as a product offering
          o With the Polar and Food Security TEPs ExtremeAI platform
          o Can be seamlessly integrated with further DIASes
          o Offered as SaaS at hopsworks.ai on public clouds such as Amazon AWS and Microsoft Azure
  • 32. Publications
      o The ExtremeEarth Software Architecture for Copernicus Earth Observation Data. (Conference paper)
          ▪ Published: Conference on Big Data from Space (BiDS21).
      o ExtremeEarth Meets Data From Space. (Journal paper)
          ▪ Published: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS) (2021).
      o Maggy: Scalable Asynchronous Parallel Hyperparameter Search. (Conference paper)
          ▪ Published: The 1st Workshop on Distributed Machine Learning (DistributedML'20).
      o AutoAblation: Automated Parallel Ablation Studies for Deep Learning. (Conference paper)
          ▪ Published: The 1st Workshop on Machine Learning and Systems (EuroMLSys'21).
      o Scalable Artificial Intelligence for Earth Observation Data Using Hopsworks. (Journal paper)
          ▪ Under preparation: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS) (2021). Will be submitted soon.
      • Published papers: http://earthanalytics.eu/publications.html
  • 33. Blog posts
      o AI Software Architecture for Copernicus Data with Hopsworks.
          ▪ July 2021
      o End-to-end Deep Learning Pipelines with Earth observation Data in Hopsworks
          ▪ October 2021