The use of big data for dredging
Gerben de Boer, Van Oord, Engineering, OpenEarth data management
Delft Software Days 2017
Big data philosophies: Statistics requires 30+ realizations
Brute force
Hire the right cloud provider
• Hadoop / HDInsight
• Spark
• Cassandra
• U-SQL
• Cosmos DB
Relations: AI.
Burn money on cloud providers
Smart force
Hire the right people
• Thematic nerds (any engineering)
• Software developers (py, js, sql)
• DevOps
• Sales, social
• Graphic designer
Relations: business logic + physics
Burn money on wages
• Data scientist
• Data analytics manager
• Data architect
• Data engineer
• Statistician
• DBA
• Business analyst
• Data analyst
4 Vs
• Volume
• Velocity
• Variety
• Veracity
Volume: WxS → CDN
SQL has almost no limits
For most users, SQL is not big data.
Only your wallet is a limiting factor.
• Out of preview 15 nov
• 1TB
• 99.99% availability
• 35 days point-in-time restore
• We tried 0.5 TB, limited by SSD disk IO.
• 4TB
Azure postgres
Azure SQL server
Postgres in Azure VM
• Pure SQL
• A TB-sized SQL database is no problem
• Postgres is largely single-threaded per query
• Use indexing, views and caching tools: think about the Content that needs to be Delivered (CDN)
• Postgres has a native jsonb datatype
• MS U-SQL can query ASCII files and run R and Python code
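To make "use indexing, views" concrete, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for Postgres (the SQL is generic). The table `readings`, its columns, the index and the view name are all hypothetical. It shows an index on the filtered columns, a view that exposes only a derived product, and a check of the query plan to confirm the index is actually used.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table/column names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE readings (
    vessel TEXT NOT NULL,
    t      TEXT NOT NULL,   -- ISO 8601 timestamp
    depth  REAL
);
-- Index the columns that queries filter on.
CREATE INDEX idx_readings_vessel_t ON readings (vessel, t);
-- A view exposes a derived product and hides the raw table layout.
CREATE VIEW daily_max_depth AS
    SELECT vessel, substr(t, 1, 10) AS day, MAX(depth) AS max_depth
    FROM readings
    GROUP BY vessel, day;
""")
con.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("V1", "2017-11-01T10:00:00", 12.5),
     ("V1", "2017-11-01T11:00:00", 14.0),
     ("V2", "2017-11-01T10:00:00", 9.0)],
)

# Ask the query planner whether the index is used for a filtered query.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT depth FROM readings WHERE vessel = ? AND t >= ?",
    ("V1", "2017-11-01"),
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)  # last column holds the detail text
print(plan_text)

rows = con.execute("SELECT * FROM daily_max_depth ORDER BY vessel").fetchall()
print(rows)  # [('V1', '2017-11-01', 14.0), ('V2', '2017-11-01', 9.0)]
```

In Postgres the equivalent check is `EXPLAIN`, and a materialized view or a cache in front of the database plays the CDN role for repeated queries.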
Overcome SQL limits: hybrid and noSQL
SQL
hybrid
• Put (jsonb) data as files on disk and load the subset you need, or when replication is needed
• csv, json, xml, yml, netCDF
• + many legacy formats
• Database as API, not as archive
• Only an index to the files on disk
• E.g. TIFF as PostGIS raster
• Van Oord vessel log = netCDF + PG index ("NASA technology")
noSQL = files
• Pure noSQL: a structured folder with structured files: device/yy/mm/dd/signal
• Micro service to handle files on demand
• Regular expressions are your friend.
• netCDF/HDF was originally devised to overcome SQL limits
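The hybrid and noSQL ideas combine naturally: keep the data as files in a `device/yy/mm/dd/signal` tree, parse the paths with a regular expression, and store only that metadata plus the file path in SQL. The sketch below assumes two-digit years in the 2000s and uses hypothetical device and signal names; SQLite again stands in for the Postgres index.

```python
import re
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical layout following device/yy/mm/dd/signal, e.g. "dredger01/17/11/15/depth.nc".
PATH_RE = re.compile(
    r"^(?P<device>[^/]+)/(?P<yy>\d{2})/(?P<mm>\d{2})/(?P<dd>\d{2})/(?P<signal>[^/]+)$"
)

def build_index(root: Path) -> sqlite3.Connection:
    """Walk the file tree and store only metadata + path in SQL (the hybrid approach)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE files (device TEXT, day TEXT, signal TEXT, path TEXT)")
    for p in root.rglob("*"):
        if not p.is_file():
            continue
        m = PATH_RE.match(p.relative_to(root).as_posix())
        if m is None:
            continue  # skip files that do not follow the naming convention
        day = f"20{m['yy']}-{m['mm']}-{m['dd']}"  # assumption: years 2000-2099
        con.execute("INSERT INTO files VALUES (?, ?, ?, ?)",
                    (m["device"], day, m["signal"], str(p)))
    return con

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["dredger01/17/11/14/depth.nc",
                "dredger01/17/11/15/depth.nc",
                "dredger02/17/11/15/gps.nc",
                "README.txt"]:
        f = root / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_bytes(b"")  # empty placeholder files
    con = build_index(root)
    # The database answers "which files?"; the data itself stays on disk.
    hits = con.execute(
        "SELECT device, day, signal FROM files WHERE day = ? ORDER BY device",
        ("2017-11-15",),
    ).fetchall()
    print(hits)
```

A micro service would then open only the files in `hits`, rather than loading the whole archive into the database.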
OpenEarthRawData: partial checkout
How to get a local copy of a subset of the data?
vcs
• Git has a binary file extension. Git cannot make a partial checkout.
WxS webservices
• Data to WxS on server
• WxS to data by client
• 2 unnecessary processing steps
The first computer was designed to print goniometric tables flawlessly.
Now we replicate the algorithm, not the table.
Babbage: storage, bandwidth, compute
Babbage: table vs calculator: 2 retrieval methods
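The two retrieval methods can be shown side by side. The sketch below stores a sine table at a 1° step (paying in storage) and interpolates from it, versus recomputing with `math.sin` on demand (paying in compute). The step size is an arbitrary illustrative choice.

```python
import math

# "Table": precompute sin() once at a fixed step and store it.
STEP_DEG = 1.0
TABLE = [math.sin(math.radians(i * STEP_DEG)) for i in range(0, 361)]  # 0..360 inclusive

def sin_from_table(deg: float) -> float:
    """Retrieve by lookup with linear interpolation (costs storage)."""
    deg = deg % 360.0
    i = int(deg // STEP_DEG)
    frac = (deg - i * STEP_DEG) / STEP_DEG
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

def sin_from_algorithm(deg: float) -> float:
    """Retrieve by recomputation (costs compute, no stored table)."""
    return math.sin(math.radians(deg))

for d in (0.0, 30.0, 45.5, 359.9):
    print(d, sin_from_table(d), sin_from_algorithm(d))
```

With a 1° step, the interpolation error stays below about 4e-5. Shipping the three-line algorithm is cheaper than shipping the table, which is the point of replicating the algorithm.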
Trade-off made explicit by cloud pay-as-you-go
• Storage: disk occupancy + IO operations
• Compute: CPU + memory
• Bandwidth: too slow
Replicate database vs replicate raw data + ETL
Cloud
• Copy DB dump.
• Copy raw data and rerun ETL.
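The replication choice is simple arithmetic: transfer time is size divided by bandwidth, and the raw-data route adds ETL time. This sketch ignores protocol overhead and cost per GB. All sizes, bandwidths and ETL durations below are hypothetical inputs, not measured values.

```python
def transfer_hours(size_gb: float, bandwidth_mbit_s: float) -> float:
    """Hours to move size_gb gigabytes (1 GB = 1e9 bytes) at bandwidth_mbit_s (Mbit/s)."""
    bits = size_gb * 1e9 * 8
    return bits / (bandwidth_mbit_s * 1e6) / 3600.0

def cheaper_to_copy_raw(dump_gb: float, raw_gb: float, etl_hours: float,
                        bandwidth_mbit_s: float) -> bool:
    """True if copying raw data and rerunning ETL is faster than copying the DB dump."""
    dump_time = transfer_hours(dump_gb, bandwidth_mbit_s)
    raw_time = transfer_hours(raw_gb, bandwidth_mbit_s) + etl_hours
    return raw_time < dump_time

# Hypothetical example: 1 TB over a 100 Mbit/s link takes roughly 22 hours.
print(round(transfer_hours(1000, 100), 1))
# A compact raw archive plus a short ETL run can beat a bloated database dump.
print(cheaper_to_copy_raw(dump_gb=500, raw_gb=100, etl_hours=2, bandwidth_mbit_s=100))
```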
Example: AHN2, 0.5 m x 0.5 m DEM of the Netherlands
• 1000 TIFF files (WCS broken)
• Download: 50 hours
• Partial download
• Notebook in the cloud
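A partial download reduces to deciding which tiles intersect the area of interest. The sketch assumes a hypothetical regular grid of 5 km tiles with its origin at (0, 0) in projected metre coordinates, and a hypothetical file naming scheme. The real AHN2 sheet layout and file names differ and are not reproduced here.

```python
import math

TILE_SIZE_M = 5000.0  # hypothetical tile edge length in metres; grid origin at (0, 0)

def tiles_for_bbox(xmin: float, ymin: float, xmax: float, ymax: float):
    """Return (col, row) of every tile that intersects the bounding box."""
    c0, c1 = math.floor(xmin / TILE_SIZE_M), math.floor(xmax / TILE_SIZE_M)
    r0, r1 = math.floor(ymin / TILE_SIZE_M), math.floor(ymax / TILE_SIZE_M)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def tile_filename(col: int, row: int) -> str:
    """Hypothetical naming scheme, for illustration only."""
    return f"dem_{col}_{row}.tif"

# A 6 km x 2 km area of interest touches only two tiles out of the whole country.
for col, row in tiles_for_bbox(120000, 480000, 126000, 482000):
    print(tile_filename(col, row))
```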
Good idea to stream graphics to screens: WMS.
This limits grid data to what you can actually see.
In practice people use tiled quad-trees, not WMTS.
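Quad-tree tiling is simple to compute, which helps explain why it is so widely used. This sketch uses the common Web Mercator "slippy map" tile formula and the quadkey bit interleaving documented for the Bing Maps tile system. Each extra digit selects one of four child tiles, so a parent's key is a prefix of its children's keys.

```python
import math

def lonlat_to_tile(lon: float, lat: float, zoom: int):
    """Web Mercator tile indices (x, y) for a point; valid for |lat| < ~85.05 degrees."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Interleave the x/y bits into a quad-tree key: one digit (0-3) per zoom level."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)

x, y = lonlat_to_tile(4.9, 52.37, 10)
print(x, y, tile_to_quadkey(x, y, 10))
```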
Use (Geo)JSON for plotting vector data: plot.ly.
GeoJSON was only adopted by OGC in 2017, 9 years after its conception!
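GeoJSON is plain JSON, so any web plotting library can consume it without a dedicated service. This sketch builds a FeatureCollection for a hypothetical vessel track using only the standard `json` module. Note that GeoJSON (RFC 7946) orders positions as [longitude, latitude].

```python
import json

def track_to_geojson(points, vessel: str) -> dict:
    """Build a GeoJSON FeatureCollection with one LineString from (lon, lat) points."""
    feature = {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": [[lon, lat] for lon, lat in points],  # [lon, lat] order
        },
        "properties": {"vessel": vessel},
    }
    return {"type": "FeatureCollection", "features": [feature]}

# Hypothetical track points near the Dutch coast.
fc = track_to_geojson([(4.05, 51.98), (4.10, 52.00), (4.15, 52.02)], "hypothetical vessel")
text = json.dumps(fc)
print(text)
```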
Bad idea to stream big data: WCS, WFS.
Keep all processing in the datacenters; send only graphical results.
INSPIRE + OGC: not front-runners.
WxS → CDN
• CDN: content delivery network
• The backbone behind YouTube and Netflix
• Makes datacenters geospatially redundant
• Rapidly replicates raw data files (TIFF)
• Use your own ETL tools locally
Big data reinvented the wheel (1)
Variety: ETL → ELT
Variety: parsing is ETL
• Overload of historic data formats: parsing
• Datawell wave buoy: 30 kB of code to parse 93 bytes
• OGC SOS is not a solution: XML garbage.
• Satellite data is still very expensive
• Solutions are available: Google Protocol Buffers
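When a binary layout is documented as a schema, the parser shrinks to a few lines. That is the idea behind Protocol Buffers. Protocol Buffers themselves need the `protobuf` package and a `.proto` file, so this sketch uses the standard `struct` module instead. The record layout below is invented for illustration and is not the Datawell format.

```python
import struct
from typing import NamedTuple

# Hypothetical record layout (NOT the Datawell format), big-endian, no padding:
# uint32 epoch seconds, int16 heave [cm], uint16 direction [0.1 deg], uint8 flags.
RECORD = struct.Struct(">IhHB")

class WaveSample(NamedTuple):
    epoch_s: int
    heave_m: float
    direction_deg: float
    flags: int

def parse_records(blob: bytes):
    """Decode a byte string of back-to-back fixed-size records."""
    if len(blob) % RECORD.size:
        raise ValueError(f"length {len(blob)} is not a multiple of {RECORD.size}")
    return [
        WaveSample(t, heave_cm / 100.0, dir_decideg / 10.0, flags)
        for t, heave_cm, dir_decideg, flags in RECORD.iter_unpack(blob)
    ]

blob = RECORD.pack(1510704000, -125, 2705, 0) + RECORD.pack(1510704001, 87, 2710, 1)
for sample in parse_records(blob):
    print(sample)
```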
Share data and processing (Manhattan optimization)
ETL
• Sensor supplier, SCADA
• ETL processes are run once
• Database is considered the archive
• ETL removes some raw data features
• Collect once, maybe re-use many times
• Parsers do not evolve: waterfall
• Good for: known knowns
ELT: share code via GitHub!
• In ELT the generic parsers run on each request
• Parsers can run on-the-fly in a micro-service
• All raw data features can be kept as parsers evolve
• Collect once, allow any future use
• Parsers evolve agile: extra from_* methods
• Good for: unknown unknowns
• parser.from_garbage()
• parser.to_sql()
• SQL Server can now run R and Python code
• Windows and Linux can run the same containers
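The `parser.from_*()` / `parser.to_sql()` pattern can be sketched as a small class. The `Parser` class, its `from_csv` and `from_json` constructors and the `soundings` table are hypothetical stand-ins for the slide's `from_garbage()`. The raw payload is kept alongside the parsed records, and each new format only adds another `from_*` classmethod. SQLite stands in for the target database, and values are loaded untyped.

```python
import csv
import io
import json
import sqlite3

class Parser:
    """Hypothetical ELT parser: raw input is kept, formats are added as from_* methods."""

    def __init__(self, records, raw):
        self.records = records  # parsed rows: list of dicts
        self.raw = raw          # original payload, never thrown away

    @classmethod
    def from_csv(cls, text: str) -> "Parser":
        return cls(list(csv.DictReader(io.StringIO(text))), text)

    @classmethod
    def from_json(cls, text: str) -> "Parser":
        return cls(json.loads(text), text)

    def to_sql(self, con: sqlite3.Connection, table: str) -> int:
        """Create `table` if needed and insert all records; columns come from the data."""
        cols = sorted({key for rec in self.records for key in rec})
        col_sql = ", ".join(f'"{c}"' for c in cols)
        con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
        placeholders = ", ".join("?" for _ in cols)
        con.executemany(
            f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
            [tuple(rec.get(c) for c in cols) for rec in self.records],
        )
        return len(self.records)

csv_text = "time,depth\n2017-11-15T10:00,12.5\n2017-11-15T11:00,14.0\n"
json_text = '[{"time": "2017-11-15T12:00", "depth": 13.2}]'
con = sqlite3.connect(":memory:")
p_csv = Parser.from_csv(csv_text)
n1 = p_csv.to_sql(con, "soundings")
n2 = Parser.from_json(json_text).to_sql(con, "soundings")
rows = con.execute('SELECT "time", "depth" FROM "soundings" ORDER BY "time"').fetchall()
print(rows)
```

Because the raw text is retained, an improved parser can be rerun later on the same input. That is the "collect once, allow any future use" argument for ELT.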
Big unstructured Datalake
• SQL sources + noSQL sources
• Brute force to run ELT jobs: Hadoop
• Economic trade-off: brains vs clouds
Datalake / Codelake
• Codelake: a collection of parser.from_garbage() parsers next to the Datalake
• L0 raw data → L0_L1 code → L1 products → L1_L2 code → L2 products → …
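The L0 → L1 → L2 chain can be expressed as a list of code steps applied on demand to the raw data, so products are derived rather than stored. The step functions below are hypothetical; only their names mirror the slide's `L0_L1` and `L1_L2` labels.

```python
from typing import Callable, Dict, List

def l0_l1(raw_lines: List[str]) -> List[float]:
    """L0 raw text -> L1 values (lines that do not parse are skipped)."""
    values = []
    for line in raw_lines:
        try:
            values.append(float(line.strip()))
        except ValueError:
            continue
    return values

def l1_l2(values: List[float]) -> Dict[str, float]:
    """L1 values -> L2 summary product."""
    mean = sum(values) / len(values) if values else float("nan")
    return {"n": len(values), "mean": mean}

# The "codelake": ordered processing steps, kept under version control.
PIPELINE: List[Callable] = [l0_l1, l1_l2]

def product(raw, level: int):
    """Derive the level-N product on demand from L0, never from a stored copy."""
    data = raw
    for step in PIPELINE[:level]:
        data = step(data)
    return data

raw = ["12.5", "bad record", "14.0", " 13.5 "]
print(product(raw, 1))  # [12.5, 14.0, 13.5]
print(product(raw, 2))
```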
Big data reinvented the wheel (2)
Velocity: DTAP → CI/CD
Run micro services on top of the Datalake, one for each specific question.
This software needs to work wherever the data is replicated:
• Localhost
• Azure
• Amazon
• On-premise
• On-vessel
We need to make servers redistributable: CONTAINERS (micro services on top of the Datalake).
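A micro service that runs unchanged on localhost, in a cloud or on a vessel mainly needs to take its data location from configuration rather than hard-coding it. This is a minimal WSGI sketch using only the standard library. WSGI is the interface that Python web frameworks such as Pyramid are served through. The question it answers ("which files exist for a device?") and the `DATA_ROOT` variable name are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path

def make_app(data_root: Path):
    """Build a WSGI app answering one question: which files exist for /<device>?"""
    def app(environ, start_response):
        device = environ.get("PATH_INFO", "/").strip("/")
        if not device or "/" in device or device.startswith("."):
            start_response("400 Bad Request", [("Content-Type", "application/json")])
            return [b'{"error": "expected /<device>"}']
        folder = data_root / device
        files = sorted(
            p.relative_to(folder).as_posix() for p in folder.rglob("*") if p.is_file()
        ) if folder.is_dir() else []
        body = json.dumps({"device": device, "files": files}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app

# In a container, the data location would come from the environment (hypothetical name).
DATA_ROOT = Path(os.environ.get("DATA_ROOT", "/data"))

# Self-contained demo against a temporary data tree instead of DATA_ROOT.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "dredger01" / "17" / "11" / "15").mkdir(parents=True)
    (demo_root / "dredger01" / "17" / "11" / "15" / "depth.nc").write_bytes(b"")
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    body = b"".join(make_app(demo_root)({"PATH_INFO": "/dredger01"}, start_response))
    print(captured["status"], body.decode())
```

To serve it for real, `make_app(DATA_ROOT)` can be passed to any WSGI server, for example `wsgiref.simple_server.make_server("", 8080, app).serve_forever()`, inside the container image. The same image then runs at every site, with only `DATA_ROOT` changing.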
OpenEarth: monthly Docker sprint session @ Microsoft NL, Schiphol
Van Oord, Deltares, TU Delft, KNMI, NLeSC, Sogeti, Microsoft, Maris, …
OpenEarth Docker Azure DigiShape
Organization
• Docker sprint session every month
• https://github.com/openearth-stack
• Van Oord, TU Delft, Deltares, Microsoft, NLeSC, KNMI, Maris
• Gerben.deboer@vanoord.com
Components
• Pyramid Python web framework
• PostgreSQL
• KNMI ADAGUC
• GeoServer
• …
Veracity: xls → app
Excel is our only Big data nightmare
• Old, grey clerks and managers.
• They use Excel as paper.
Low-code Apps
• Manual data can be digitized with rapid apps.
• Low-code revolution: app-in-a-day.
http://www.janbanning.com/
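What a rapid data-entry app adds over a spreadsheet is validation at the moment of entry. This sketch shows the idea with a frozen dataclass that rejects bad input. The `DailyReport` record and its fields are hypothetical, and any low-code platform would express the same rules in its own form designer.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DailyReport:
    """Hypothetical record a crew would enter in a form instead of a spreadsheet."""
    day: date
    vessel: str
    volume_m3: float

    def __post_init__(self):
        if not self.vessel.strip():
            raise ValueError("vessel is required")
        if self.volume_m3 < 0:
            raise ValueError("volume_m3 must be >= 0")

def from_form(fields: dict) -> DailyReport:
    """Convert raw form strings into a typed, validated record (raises ValueError)."""
    return DailyReport(
        day=date.fromisoformat(fields["day"]),
        vessel=fields["vessel"],
        volume_m3=float(fields["volume_m3"]),
    )

print(from_form({"day": "2017-11-15", "vessel": "V1", "volume_m3": "1250.5"}))
```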
Excel course: who ever read the instructions?
https://danjharrington.wordpress.com/2012/08/01/excel-logos-over-the-years/
Gerben J de Boer, Van Oord, E&E, OpenEarth Data Management
4 Vs
• Volume: WxS → CDN
• Variety: ETL → ELT
• Velocity: DTAP → CI/CD
• Veracity: xls → app
Gerben.deboer@vanoord.com
Questions?

 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 

Recently uploaded (20)

Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 

DSD-INT 2017 The use of big data for dredging - De Boer

  • 1. The use of big data for dredging. Gerben de Boer, Van Oord, Engineering, OpenEarth data management. Delft Software Days 2017
  • 2. Big data philosophies: statistics requires 30+ realizations
    Brute force: hire the right cloud provider
    • Hadoop / HDInsight
    • Spark
    • Cassandra
    • U-SQL
    • CosmosDB
    Relations: AI. Burn money on cloud providers.
    Smart force: hire the right people
    • Thematic nerds (any engineering)
    • Software developer (py, js, sql)
    • DevOps
    • Sales, social
    • Graphic designer
    Relations: business logic + physics. Burn money on wages.
    Roles: data scientist, data analytics manager, data architect, data engineer, statistician, DBA, business analyst, data analyst
  • 5. SQL has almost no limits
    For most users, SQL is not big data. Only your wallet is a limiting factor.
    Azure Postgres, Azure SQL Server, Postgres in an Azure VM:
    • Out of preview 15 Nov
    • 1 TB
    • 99.99% availability
    • 35 days point-in-time restore
    • We tried 0.5 TB, limited by SSD disk IO.
    • 4 TB
  • 6. Overcome SQL limits: hybrid and noSQL
    SQL
    • Pure SQL
    • A TB-sized SQL database is no problem
    • Postgres is single-threaded
    • Use indexing, views, caching tools: think about the Content that needs to be Delivered (CDN)
    • Postgres native jsonb datatype
    • MS U-SQL can reach ASCII files, and use R and Python code
    hybrid
    • Put (jsonb) as files on disk and load the subset you need, or when replication is needed
    • csv, json, xml, yml, netcdf + many legacy formats
    • Database as API, not archive
    • Only an index to files on disk
    • E.g. TIFF + PostGIS raster
    • Van Oord vessel log = netCDF + PG index ("NASA technology")
    noSQL = files
    • Pure noSQL: structured folder with structured files: device/yy/mm/dd/signal
    • Micro service to handle files on demand
    • Regular expressions are your friend.
    • netCDF/HDF was originally devised to overcome SQL limits
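The device/yy/mm/dd/signal layout can be queried with nothing more than a regular expression over file paths, so only the needed subset is loaded. A minimal sketch, assuming hypothetical paths such as `vessel01/17/11/15/gps.nc` and two-digit years in the 2000s; the in-memory list stands in for the PG index of the hybrid approach and is not Van Oord's actual implementation.

```python
import re
from datetime import date

# Hypothetical layout from the slide: device/yy/mm/dd/signal, with an optional file extension.
PATH_RE = re.compile(
    r"^(?P<device>[^/]+)/(?P<yy>\d{2})/(?P<mm>\d{2})/(?P<dd>\d{2})/(?P<signal>[^/.]+)(?:\.\w+)?$"
)

def build_index(paths):
    """Parse each path into a record; paths that do not match the layout are skipped."""
    index = []
    for p in paths:
        m = PATH_RE.match(p)
        if m is None:
            continue
        # Assumption: two-digit years are 20yy.
        day = date(2000 + int(m["yy"]), int(m["mm"]), int(m["dd"]))
        index.append({"device": m["device"], "day": day, "signal": m["signal"], "path": p})
    return index

def select(index, device, signal, start, end):
    """Return the paths for one device and signal within [start, end]."""
    return [r["path"] for r in index
            if r["device"] == device and r["signal"] == signal and start <= r["day"] <= end]

paths = [
    "vessel01/17/11/14/gps.nc",
    "vessel01/17/11/15/gps.nc",
    "vessel01/17/11/15/draught.nc",
    "vessel02/17/11/15/gps.nc",
    "README.txt",
]
idx = build_index(paths)
print(select(idx, "vessel01", "gps", date(2017, 11, 15), date(2017, 11, 30)))
```

Because the index holds only paths and a few parsed fields, it stays small no matter how large the files themselves are.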
  • 7. OpenEarthRawData: partial checkout
    Git has a binary file extension, but Git cannot make a partial checkout. How do you get a local copy of a subset of the data?
    vcs vs WxS web services: data to WxS on the server, then WxS back to data by the client: 2 unnecessary processing steps.
  • 8. The first computer was designed to print trigonometric tables flawlessly. Now we replicate the algorithm, not the table.
    Babbage: storage, bandwidth, compute. Babbage: table vs calculator: 2 retrieval methods.
    Trade-off made explicit by cloud pay-as-you-go:
    • Storage: disk occupancy + IO operations
    • Compute: CPU + memory
    • Bandwidth: too slow
    Replicate the database vs replicate the raw data + ETL: in the cloud, copy the DB dump, or copy the raw data and rerun the ETL.
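Babbage's "table vs calculator" trade-off can be made concrete: a precomputed table spends storage (and bandwidth to ship it) to save compute, while replicating the algorithm spends compute on every request and stores nothing. A minimal sketch with a hypothetical one-degree sine table; the resolution and function names are illustrative only.

```python
import math

STEP_DEG = 1  # hypothetical table resolution: one entry per whole degree

# "Table": compute once, store, then retrieve by lookup.
SINE_TABLE = [math.sin(math.radians(d)) for d in range(0, 361, STEP_DEG)]

def sine_from_table(deg: int) -> float:
    """Retrieve a stored value; only works for the degrees that were tabulated."""
    return SINE_TABLE[deg // STEP_DEG]

def sine_from_algorithm(deg: float) -> float:
    """Replicate the algorithm instead: any input, no storage, but CPU on every call."""
    return math.sin(math.radians(deg))

print(len(SINE_TABLE))             # entries that must be stored and replicated
print(sine_from_algorithm(30.5))   # a value the table cannot provide
```

In pay-as-you-go terms, the table is billed as storage and IO, the function as CPU time; which is cheaper depends on how often each value is requested.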
  • 9. Example: AHN2: 0.5 m x 0.5 m DEM of the Netherlands: 1000 TIFFs (WCS broken)
  • 10. Example: AHN2: 0.5 m x 0.5 m DEM of the Netherlands: download takes 50 hours
  • 11. Example: AHN2: 0.5 m x 0.5 m DEM of the Netherlands: partial download
  • 12. Example: AHN2: 0.5 m x 0.5 m DEM of the Netherlands: notebook in the cloud
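A partial download boils down to selecting only the tiles that overlap the area of interest instead of fetching all 1000 files. A minimal sketch, assuming a hypothetical regular grid of square tiles in metric coordinates, a hypothetical 5 km tile size and a hypothetical file-naming scheme; the actual AHN2 tile layout is not given on the slides.

```python
import math
from typing import List, Tuple

TILE_SIZE_M = 5000.0  # hypothetical tile edge length in metres

def tiles_for_bbox(xmin: float, ymin: float, xmax: float, ymax: float,
                   size: float = TILE_SIZE_M) -> List[Tuple[int, int]]:
    """Return (col, row) indices of all grid tiles overlapping the bounding box.
    Tiles that only touch the box edge are included as well."""
    c0, c1 = math.floor(xmin / size), math.floor(xmax / size)
    r0, r1 = math.floor(ymin / size), math.floor(ymax / size)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def tile_name(col: int, row: int) -> str:
    """Hypothetical file name for a tile; not the actual AHN2 naming convention."""
    return f"dem_{col:03d}_{row:03d}.tif"

wanted = [tile_name(c, r) for c, r in tiles_for_bbox(84000, 446000, 91000, 449000)]
print(wanted)
```

The list of names can then be handed to any downloader, or to a cloud notebook that reads the tiles where they already live.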
  • 13. WXS > CDN
    WXS
    • Good idea to stream graphics to screens: WMS. It limits grid data to what you can actually see.
    • People actually use tiled quad-trees, not WMTS.
    • Use (geo)json for plotting vector data: plot.ly. GeoJSON: only OGC in 2017, 9 years after conception!
    • Bad idea to stream big data: WCS, WFS. Keep all processing in the datacenters; send only graphical results.
    • INSPIRE + OGC: not front-runners.
    CDN
    • CDN: content delivery network
    • The backbone behind YouTube, Netflix
    • Makes datacenters geospatially redundant
    • Rapidly replicates raw data files (TIFF)
    • Use your own ETL tools locally
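The quad-tree tiling that web maps use in practice addresses each tile by zoom level, column and row, and the quad-tree path can be written as one base-4 digit per level. A minimal sketch of the widely used Web Mercator tile scheme and the quadkey convention popularized by Bing Maps; this is a standard scheme, not one specified on the slide, and the Delft coordinates are only an example.

```python
import math

def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple:
    """Web Mercator (slippy map) tile column/row containing the point at this zoom level.
    Valid for |lat| below about 85.05 degrees; results are clamped to the grid."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Interleave the bits of x and y into one base-4 digit per zoom level."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)

print(tile_to_quadkey(*lonlat_to_tile(4.36, 52.01, 10), 10))
```

Each quadkey is a prefix of the keys of its four children, which is what makes the tiles easy to cache on a CDN and to fetch only for the visible area.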
  • 14. Big data reinvented the wheel (1)
  • 16. Variety: parsing is ETL
    • Overload of historic data formats: parsing. Datawell wave buoy: 30 kB of code to parse 93 bytes.
    • OGC SOS is not a solution: XML garbage.
    • Satellite data is still very expensive.
    • Solutions are available: Google Protocol Buffers.
    Sensor supplier, SCADA
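The "30 kB of code to parse 93 bytes" contrast shows how much parsing is format bookkeeping. When the record layout is declared up front, as schema-based formats such as Protocol Buffers do, the parser shrinks to a few lines. A minimal sketch using Python's `struct` module with a hypothetical 14-byte sensor record; this is neither the Datawell format nor protobuf itself, just a fixed, declared layout.

```python
import struct
from typing import NamedTuple

# Hypothetical big-endian record: uint32 epoch seconds, uint16 sensor id,
# float32 significant wave height [m], float32 peak period [s] -> 14 bytes.
RECORD = struct.Struct(">IHff")

class Reading(NamedTuple):
    time: int
    sensor: int
    hm0: float
    tp: float

def parse(buf: bytes) -> list:
    """Split a byte buffer into fixed-size records and decode each one."""
    if len(buf) % RECORD.size:
        raise ValueError("buffer length is not a multiple of the record size")
    return [Reading(*fields) for fields in RECORD.iter_unpack(buf)]

payload = RECORD.pack(1510704000, 7, 1.25, 8.0) + RECORD.pack(1510704060, 7, 1.5, 8.5)
for r in parse(payload):
    print(r)
```

The layout string is the whole "schema" here; a supplier that published such a declaration would spare every user from writing their own parser.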
  • 17.
    ETL
    • ETL processes are run once
    • The database is considered the archive
    • ETL removes some raw data features
    • Collect once, maybe re-use many times
    • Parsers do not evolve: waterfall
    • Good for: known knowns
    • Share data and processing (Manhattan optimization)
    ELT
    • In ELT the generic parsers run on each request
    • Parsers can run on-the-fly in a micro-service
    • All raw data features can be kept as parsers evolve
    • Collect once, allow any future use
    • Parsers evolve agile: extra from_* methods
    • Good for: unknown unknowns
    • ELT: share code via GitHub! parser.to_sql(), parser.from_garbage()
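The ELT column can be read as a parser object that keeps the raw bytes untouched, grows new `from_*` methods over time, and runs its `to_sql()` load step only when a request needs it. A minimal sketch, assuming a hypothetical `Parser` class, a hypothetical `time;value` raw format and an in-memory SQLite database standing in for the SQL store; only the names `from_garbage` and `to_sql` come from the slide.

```python
import sqlite3

class Parser:
    """Hypothetical ELT parser: raw data is kept as-is; parsing happens per request."""

    def __init__(self, raw: bytes):
        self.raw = raw   # raw data is never discarded, so future parsers can re-read it
        self.rows = []

    @classmethod
    def from_garbage(cls, raw: bytes) -> "Parser":
        """Parse a hypothetical 'time;value' text format, skipping malformed lines."""
        p = cls(raw)
        for line in raw.decode("latin-1").splitlines():
            parts = line.strip().split(";")
            if len(parts) != 2:
                continue
            try:
                p.rows.append((parts[0], float(parts[1])))
            except ValueError:
                continue
        return p

    @classmethod
    def from_tsv(cls, raw: bytes) -> "Parser":
        """An extra from_* method added later: parsers evolve, stored raw data does not."""
        return cls.from_garbage(raw.replace(b"\t", b";"))

    def to_sql(self, con: sqlite3.Connection, table: str = "signal") -> int:
        """Load parsed rows on demand; returns the row count. The table name is assumed trusted."""
        con.execute(f"CREATE TABLE IF NOT EXISTS {table} (time TEXT, value REAL)")
        con.executemany(f"INSERT INTO {table} VALUES (?, ?)", self.rows)
        return len(self.rows)

raw = b"2017-11-15T10:00;1.2\nNaN-garbage\n2017-11-15T10:01;1.3\n"
con = sqlite3.connect(":memory:")
print(Parser.from_garbage(raw).to_sql(con))
```

Because the parser lives in code rather than in a one-off ETL run, it can be shared on GitHub and improved without re-collecting any data.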
  • 18. Datalake
    • SQL Server can now run R and Python code
    • Windows and Linux can run the same containers
    Big unstructured datalake
    • SQL sources + noSQL sources
    • Brute force to run ELT jobs: Hadoop
    • Economic trade-off: brains vs clouds
    Codelake
    parser.from_garbage() × 5
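The codelake next to the datalake can be pictured as a registry of parsers, one per source type, applied to whatever raw files sit in the lake. A minimal sketch, assuming hypothetical file suffixes and parser functions; at scale this loop would be replaced by a job framework (the slide names Hadoop), not run as plain Python.

```python
import json
import tempfile
from pathlib import Path

PARSERS = {}

def register(suffix):
    """Decorator that adds a parser for one raw file type to the codelake."""
    def wrap(func):
        PARSERS[suffix] = func
        return func
    return wrap

@register(".json")
def parse_json(text: str):
    return json.loads(text)

@register(".csv")
def parse_csv(text: str):
    return [line.split(",") for line in text.splitlines() if line]

def run_codelake(lake: Path) -> dict:
    """Apply the matching parser to every file; files without a parser stay raw."""
    out = {}
    for f in sorted(lake.rglob("*")):
        parser = PARSERS.get(f.suffix)
        if f.is_file() and parser is not None:
            out[f.name] = parser(f.read_text())
    return out

with tempfile.TemporaryDirectory() as d:
    lake = Path(d)
    (lake / "a.json").write_text('{"hm0": 1.2}')
    (lake / "b.csv").write_text("t,v\n1,2\n")
    (lake / "c.bin").write_bytes(b"\x00\x01")  # no parser yet: kept raw, not lost
    result = run_codelake(lake)
print(result)
```

Adding support for a new source means registering one more parser; the raw files already in the lake need no migration.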
  • 19. Big data reinvented the wheel (2)
    L0 raw data → L0_L1 code → L1 products → L1_L2 code → L2 products → …
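The L0 → L1 → L2 chain is a sequence of products, each produced by a piece of code from the level below, so every level can be regenerated from L0 when the code improves. A minimal sketch, assuming hypothetical level functions (a unit conversion for L0_L1, a summary for L1_L2); the slide does not specify what each step does.

```python
from statistics import mean
from typing import Callable, Dict, List

def l0_l1(raw: List[str]) -> List[float]:
    """Hypothetical L0_L1 code: decode raw counts (strings) to metres, dropping unreadable ones."""
    return [int(s) / 100 for s in raw if s.isdigit()]

def l1_l2(values: List[float]) -> Dict[str, float]:
    """Hypothetical L1_L2 code: aggregate L1 values into a summary product."""
    return {"mean": mean(values), "max": max(values)}

PIPELINE: List[Callable] = [l0_l1, l1_l2]

def run(l0, steps=PIPELINE):
    """Rebuild every level from L0; returns [L0, L1, L2, ...]."""
    levels = [l0]
    for step in steps:
        levels.append(step(levels[-1]))
    return levels

levels = run(["120", "150", "err", "180"])
print(levels[2])
```

Only L0 and the code need to be archived; L1 and L2 are reproducible products.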
  • 21. Micro services on top of the datalake
    Run micro services on top of the datalake, one for each specific question. This software needs to work at any data replication:
    • Localhost
    • Azure
    • Amazon
    • On-premise
    • On-vessel
    We need to make servers redistributable: CONTAINERS.
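To work at any data replication, a micro service can take the location of its data from its environment instead of hard-coding it, so the same container image runs on localhost, in a cloud or on a vessel. A minimal sketch using only the Python standard library; the `DATA_ROOT` variable, the `/signals/<device>` route and the port are hypothetical, and the service answers one specific question: which signals exist for a device in a device/yy/mm/dd/signal folder layout.

```python
import json
import os
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

def data_root() -> Path:
    """Location of this replica of the datalake, set per deployment (hypothetical variable)."""
    return Path(os.environ.get("DATA_ROOT", "./data"))

def list_signals(root: Path, device: str) -> list:
    """Signal names found for a device under device/yy/mm/dd/signal."""
    if not re.fullmatch(r"[\w-]+", device):  # reject '..' and other path tricks
        return []
    base = root / device
    if not base.is_dir():
        return []
    return sorted({p.stem for p in base.glob("*/*/*/*") if p.is_file()})

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "signals":
            body = json.dumps(list_signals(data_root(), parts[1])).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

def serve(port: int = 8080) -> None:
    """Start the service; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

A container entrypoint would call `serve()`; moving the service to another replica then only means mounting the data and setting `DATA_ROOT`.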
  • 22. OpenEarth: monthly Docker sprint session @ Microsoft NL, Schiphol. Van Oord, Deltares, TU Delft, KNMI, NLeSC, Sogeti, Microsoft, Maris, …
  • 23. OpenEarth Docker Azure DigiShape
    Organization
    • Docker sprint session every month
    • https://github.com/openearth-stack
    • Van Oord, TU Delft, Deltares, Microsoft, NLeSC, KNMI, Maris
    • Gerben.deboer@vanoord.com
    Components
    • Pyramid Python web framework
    • PostgreSQL
    • KNMI Adaguc
    • GeoServer
    • …
  • 25. Variety: low-code apps
    Excel is our only big data nightmare. Old, grey clerks and managers: they use Excel as paper. Manual data can be digitized with rapid apps. Low-code revolution: app-in-a-day.
    http://www.janbanning.com/
  • 26. Excel course: who has ever read the instructions? https://danjharrington.wordpress.com/2012/08/01/excel-logos-over-the-years/
    Gerben J de Boer, Van Oord, E&E, OpenEarth Data Management
  • 27. 4Vs
    • Volume: WxS → CDN
    • Variety: ETL → ELT
    • Velocity: DTAP → CI/CD
    • Veracity: xls → app