SlideShare a Scribd company logo
© 2023
A Modern & Sustainable
Analytics Stack
W A L D
PyConDE / PyData 2023, April 17th
Florian Wilhelm
Mathematical Modelling
Modern Data Warehousing & Analytics
Personalisation & RecSys
Uncertainty Quantification & Causality
Python Data Stack
OSS Contributor & Creator of PyScaffold
Dr. Florian Wilhelm
• H E A D O F D ATA S C I E N C E
FlorianWilhelm.info
florian.wilhelm@inovex.de
Florian Wilhelm
@FlorianWilhelm
‣ Application Development (Web Platforms,
Mobile Apps, Smart Devices and Robotics, UI/
UX design,Backend Services)
‣ Data Management and Analytics (Business
Intelligence, Big Data, Searches, Data Science
and Deep Learning, Machine Perception and
Artificial Intelligence)
‣ Scalable IT-Infrastructures (IT Engineering,
Cloud Services, DevOps, Replatforming,
Security)
‣ Training and Coaching (inovex Academy)
is an innovation and quality-
driven IT project house with a
focus on digital transformation.
Using technology to
inspire our clients.
And ourselves.
Berlin · Karlsruhe · Pforzheim · Stuttgart · München · Köln · Hamburg · Erlangen
www.inovex.de
V S
Data Warehouse
Data Lake
V S
ETL
ELT
&
C O M PA R I S O N
Data Lake vs Data Warehouse
Data Lake Data Warehouse
Data Structure Raw Processed
Purpose of Data Not yet determined Currently in use
Users Data scientists /
engineers
Business professionals
Accessibility Highly accessible and
quick to update
More complicated and
costly to make changes
C O M PA R I S O N
ETL — Extract, Transform, Load
S TA G I N G A R E A
I N D ATA L A K E
T R A N S F O R M
S Q L R D B M S
N O S Q L D B M S
X M L F I L E S
S A A S P L AT F O R M
E X T R A C T
D ATA
WA R E H O U S E
L O A D
A N A LY T I C S
A N A L Y S E
C O M PA R I S O N
ELT — Extract, Load, Transform
D ATA
WA R E H O U S E
T R A N S F O R M
S Q L R D B M S
N O S Q L D B M S
X M L F I L E S
S A A S P L AT F O R M
E X T R A C T L O A D
A N A LY T I C S
A N A L Y S E
P R O G R E S S
The Evolution of Databricks & Snowflake
D ATA L A K E
& E T L
D ATA WA R E H O U S E
& E LT
P R O G R E S S
The Evolution of Databricks & Snowflake
D ATA B R I C K S
L A K E H O U S E
S N OW F L A K E
D ATA C L O U D
U N I F I C AT I O N
O F D ATA WA R E H O U S E & L A K E
W A L D
S T A C K
W A L D
S T A C K
WA L D S TA C K
A Modern & Sustainable Analytics Stack
Warehouse like
Snowflake, Google
BigQuery, Databricks
Airbyte for
Integration of
hundreds of data
sources
Lightdash for
Business
Intelligence
DBT for data
transformations
WA L D S TA C K
WALD based on Google Big Query
‣ fully managed enterprise data
warehouse
‣ only available on Google Cloud
Platform
‣ uses dbt-bigquery to run Python
UDFs with Dataproc using
PySpark
WA L D S TA C K
WALD based on Databricks
‣ uses Databrick’s SQL warehouse
‣ available on all major cloud
providers
‣ uses dbt-databricks to run Python
UDFs with PySpark
WA L D S TA C K
WALD based on Snowflake
‣ comes with all the benefits of
Snowflake, e.g. governance, easy
collaboration, security, near zero
maintenance
‣ runs on all major cloud providers
‣ tons of great documentation
‣ the easiest to use and full-
featured option to get started
with WALD
WA L D S TA C K
Modern Analytics Stack
DATA CLOUD
DATA SOURCES
E X T R A C T L O A D T R A N S F O R M A n a l y s e
SNOWPARK
WA L D S TA C K
Example
Use-Case
>find all code & details under
https://waldstack.org
F O R M U L A 1
Analysis and Position Prediction
1. Upload data using Airbyte to
Snowflake
2. Use dbt for some basic analysis
3. Use dbt with Snowpark for some
ML to predict future position
4. Visualise our results with
Lightdash
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
F O R M U L A 1
Rough Overview of the Data
‣ dimensions: drivers,
circuits, constructors,
races
‣ facts: lap_times,
pit_tops, results
S N OW F L A K E
Snowflake Setup
‣ activate the Anaconda Packages
(Preview Feature)
‣ create a warehouse to be used
for dbt
‣ that’s it already!
A I R B Y T E
Ingesting the
Data with Airbyte
Simply select
‣ file as source
‣ Snowflake as destination
and hit Launch
P Y T H O N + S N OW PA R K
Setup Python/Snowpark for dbt
‣ just install along dbt also
‣ dbt-snowflake
‣ snowflake-snowpark-
python
‣ snowflake-connector-
python
‣ and configure Snowflake in your
profiles.yml
P Y T H O N + S N OW PA R K
Training a Python Model in dbt/Snowpark
D B T / S N OW PA R K
Predicting with a trained Model in dbt/Snowpark
L I G H T D A S H
Visualisation with Lightdash
Lightdash pulls dimensions and metrics from dbt.
Lightdash
‣ runs ad-hoc queries
‣ creates charts from queries
‣ creates dashboards from charts
‣ organises everything in spaces
What are the avg lap times split by year and grand prix?
M E T R I C D I M E N S I O N S
L I G H T D A S H
Main View of Lightdash
L I G H T D A S H
Analysing the Average Lap Times
L I G H T D A S H
Example of a Simple Dashboard
F O R M U L A 1
Analysis and Position Prediction
One more thing…
O N M O R E T H I N G O N …
Streamlit
‣ provides faster way to build and
share data apps
‣ was recently acquired by
Snowflake
‣ might be a nice addition to
Lightdash for interactive data
apps
Dr. Florian Wilhelm, 2023
WA L D S TA C K
Conclusion
The WALD stack combines 4 powerful
technologies that work very well
together.
The WALD stack is flexible enough for
many use-cases from analytics to
building a full-fledged end-to-end
data product at scale.
C H E E R S T O T H E C O M M U N I T Y
Credits & Resources
‣ python-snowpark-formula
Github repo by Hope Watson
from dbt labs
‣ kind support by Michael Gorkow
& Marko Cavar from Snowflake
‣ these awesome slides by
Michael Hofmann from inovex
© 2023
Thank you!
Dr. Florian Wilhelm
Head of Data Science
P y C o n D E / P y D a t a 2 0 2 3
inovex.de
florian.wilhelm@inovex.de
@inovexlife
@inovexgmbh
waldstack.org

More Related Content

What's hot

How Expedia’s Entity Graph Powers Global Travel
How Expedia’s Entity Graph Powers Global TravelHow Expedia’s Entity Graph Powers Global Travel
How Expedia’s Entity Graph Powers Global Travel
Neo4j
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
Databricks
 
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?
Neo4j
 
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
DATAVERSITY
 
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...
Neo4j
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
Rajesh Kumar
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
DATAVERSITY
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
Snowflake Computing
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)
Adrien Blind
 
DevOps + DataOps = Digital Transformation
DevOps + DataOps = Digital Transformation DevOps + DataOps = Digital Transformation
DevOps + DataOps = Digital Transformation
Delphix
 
How to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpHow to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcp
Joseph Arriola
 
Tutustuminen data-analytiikan ja big datan maailmaan
Tutustuminen data-analytiikan ja big datan maailmaanTutustuminen data-analytiikan ja big datan maailmaan
Tutustuminen data-analytiikan ja big datan maailmaan
Jari Jussila
 
The Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdfThe Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdf
Neo4j
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
DATAVERSITY
 
data-management-strategy data-management-strategy
data-management-strategy data-management-strategydata-management-strategy data-management-strategy
data-management-strategy data-management-strategy
maheshs191007
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Certus Solutions
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
DATAVERSITY
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Márton Kodok
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
Kai Wähner
 
Applying Network Analytics in KYC
Applying Network Analytics in KYCApplying Network Analytics in KYC
Applying Network Analytics in KYC
Neo4j
 

What's hot (20)

How Expedia’s Entity Graph Powers Global Travel
How Expedia’s Entity Graph Powers Global TravelHow Expedia’s Entity Graph Powers Global Travel
How Expedia’s Entity Graph Powers Global Travel
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
 
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?
Pourquoi Leroy Merlin a besoin d'un Knowledge Graph ?
 
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
Data Architecture Strategies: Artificial Intelligence - Real-World Applicatio...
 
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...
Knowledge Graphs and Graph Data Science: More Context, Better Predictions (Ne...
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)
 
DevOps + DataOps = Digital Transformation
DevOps + DataOps = Digital Transformation DevOps + DataOps = Digital Transformation
DevOps + DataOps = Digital Transformation
 
How to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpHow to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcp
 
Tutustuminen data-analytiikan ja big datan maailmaan
Tutustuminen data-analytiikan ja big datan maailmaanTutustuminen data-analytiikan ja big datan maailmaan
Tutustuminen data-analytiikan ja big datan maailmaan
 
The Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdfThe Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdf
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
data-management-strategy data-management-strategy
data-management-strategy data-management-strategydata-management-strategy data-management-strategy
data-management-strategy data-management-strategy
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
Applying Network Analytics in KYC
Applying Network Analytics in KYCApplying Network Analytics in KYC
Applying Network Analytics in KYC
 

Similar to WALD: A Modern & Sustainable Analytics Stack

How to Transform Into a Data-Driven Organization
How to Transform Into a Data-Driven OrganizationHow to Transform Into a Data-Driven Organization
How to Transform Into a Data-Driven Organization
WarrenCruz3
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
Krishna Sankar
 
Foundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information ArchitectureFoundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information Architecture
Inside Analysis
 
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWS
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWSPuppet Camp Sydney 2014 - Evolving Design Patterns in AWS
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWS
johnpainter_id_au
 
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Denodo
 
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data StrategyDemystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Denodo
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataPentaho
 
Trivadis - Microsoft Transform your data estate with cloud, data and AI
Trivadis - Microsoft Transform your data estate with cloud, data and AITrivadis - Microsoft Transform your data estate with cloud, data and AI
Trivadis - Microsoft Transform your data estate with cloud, data and AI
Trivadis
 
Digital transformation with microsoft data and ai
Digital transformation with microsoft data and ai Digital transformation with microsoft data and ai
Digital transformation with microsoft data and ai
MichaelRoenker
 
Predicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopPredicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via Hadoop
Skillspeed
 
The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the Cloud
Inside Analysis
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
Denodo
 
Cloud Infrastructure: Changing Today's World
Cloud Infrastructure: Changing Today's WorldCloud Infrastructure: Changing Today's World
Cloud Infrastructure: Changing Today's World
Kirti Khanna
 
Data democratised
Data democratisedData democratised
Data democratised
Lars Albertsson
 
What is CRATE?
What is CRATE?What is CRATE?
What is CRATE?
Jodok Batlogg
 
How to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDBHow to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDB
VoltDB
 
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonHadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
DataWorks Summit/Hadoop Summit
 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run Graph
Vaticle
 
The Central Hub: Defining the Data Lake
The Central Hub: Defining the Data LakeThe Central Hub: Defining the Data Lake
The Central Hub: Defining the Data Lake
Eric Kavanagh
 
Process big data within an hour, with the OVH Public Cloud
Process big data within an hour, with the OVH Public CloudProcess big data within an hour, with the OVH Public Cloud
Process big data within an hour, with the OVH Public Cloud
OVHcloud
 

Similar to WALD: A Modern & Sustainable Analytics Stack (20)

How to Transform Into a Data-Driven Organization
How to Transform Into a Data-Driven OrganizationHow to Transform Into a Data-Driven Organization
How to Transform Into a Data-Driven Organization
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
 
Foundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information ArchitectureFoundation for Success: How Big Data Fits in an Information Architecture
Foundation for Success: How Big Data Fits in an Information Architecture
 
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWS
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWSPuppet Camp Sydney 2014 - Evolving Design Patterns in AWS
Puppet Camp Sydney 2014 - Evolving Design Patterns in AWS
 
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
 
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data StrategyDemystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
 
Trivadis - Microsoft Transform your data estate with cloud, data and AI
Trivadis - Microsoft Transform your data estate with cloud, data and AITrivadis - Microsoft Transform your data estate with cloud, data and AI
Trivadis - Microsoft Transform your data estate with cloud, data and AI
 
Digital transformation with microsoft data and ai
Digital transformation with microsoft data and ai Digital transformation with microsoft data and ai
Digital transformation with microsoft data and ai
 
Predicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopPredicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via Hadoop
 
The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the Cloud
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
Cloud Infrastructure: Changing Today's World
Cloud Infrastructure: Changing Today's WorldCloud Infrastructure: Changing Today's World
Cloud Infrastructure: Changing Today's World
 
Data democratised
Data democratisedData democratised
Data democratised
 
What is CRATE?
What is CRATE?What is CRATE?
What is CRATE?
 
How to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDBHow to Build Cloud-based Microservice Environments with Docker and VoltDB
How to Build Cloud-based Microservice Environments with Docker and VoltDB
 
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonHadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
 
Power of the Run Graph
Power of the Run GraphPower of the Run Graph
Power of the Run Graph
 
The Central Hub: Defining the Data Lake
The Central Hub: Defining the Data LakeThe Central Hub: Defining the Data Lake
The Central Hub: Defining the Data Lake
 
Process big data within an hour, with the OVH Public Cloud
Process big data within an hour, with the OVH Public CloudProcess big data within an hour, with the OVH Public Cloud
Process big data within an hour, with the OVH Public Cloud
 

More from Florian Wilhelm

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
Florian Wilhelm
 
Unlocking the Power of Integer Programming
Unlocking the Power of Integer ProgrammingUnlocking the Power of Integer Programming
Unlocking the Power of Integer Programming
Florian Wilhelm
 
Forget about AI and do Mathematical Modelling instead!
Forget about AI and do Mathematical Modelling instead!Forget about AI and do Mathematical Modelling instead!
Forget about AI and do Mathematical Modelling instead!
Florian Wilhelm
 
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
Florian Wilhelm
 
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
Florian Wilhelm
 
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
Florian Wilhelm
 
Uncertainty Quantification in AI
Uncertainty Quantification in AIUncertainty Quantification in AI
Uncertainty Quantification in AI
Florian Wilhelm
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use case
Florian Wilhelm
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
Florian Wilhelm
 
How mobile.de brings Data Science to Production for a Personalized Web Experi...
How mobile.de brings Data Science to Production for a Personalized Web Experi...How mobile.de brings Data Science to Production for a Personalized Web Experi...
How mobile.de brings Data Science to Production for a Personalized Web Experi...
Florian Wilhelm
 
Deep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace
Deep Learning-based Recommendations for Germany's Biggest Vehicle MarketplaceDeep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace
Deep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace
Florian Wilhelm
 
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
Florian Wilhelm
 
Declarative Thinking and Programming
Declarative Thinking and ProgrammingDeclarative Thinking and Programming
Declarative Thinking and Programming
Florian Wilhelm
 
Which car fits my life? - PyData Berlin 2017
Which car fits my life? - PyData Berlin 2017Which car fits my life? - PyData Berlin 2017
Which car fits my life? - PyData Berlin 2017
Florian Wilhelm
 
PyData Meetup Berlin 2017-04-19
PyData Meetup Berlin 2017-04-19PyData Meetup Berlin 2017-04-19
PyData Meetup Berlin 2017-04-19
Florian Wilhelm
 
Explaining the idea behind automatic relevance determination and bayesian int...
Explaining the idea behind automatic relevance determination and bayesian int...Explaining the idea behind automatic relevance determination and bayesian int...
Explaining the idea behind automatic relevance determination and bayesian int...
Florian Wilhelm
 

More from Florian Wilhelm (16)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unlocking the Power of Integer Programming
Unlocking the Power of Integer ProgrammingUnlocking the Power of Integer Programming
Unlocking the Power of Integer Programming
 
Forget about AI and do Mathematical Modelling instead!
Forget about AI and do Mathematical Modelling instead!Forget about AI and do Mathematical Modelling instead!
Forget about AI and do Mathematical Modelling instead!
 
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
An Interpretable Model for Collaborative Filtering Using an Extended Latent D...
 
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
Honey I Shrunk the Target Variable! Common pitfalls when transforming the tar...
 
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L...
 
Uncertainty Quantification in AI
Uncertainty Quantification in AIUncertainty Quantification in AI
Uncertainty Quantification in AI
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use case
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
 
How mobile.de brings Data Science to Production for a Personalized Web Experi...
How mobile.de brings Data Science to Production for a Personalized Web Experi...How mobile.de brings Data Science to Production for a Personalized Web Experi...
How mobile.de brings Data Science to Production for a Personalized Web Experi...
 
Deep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace
Deep Learning-based Recommendations for Germany's Biggest Vehicle MarketplaceDeep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace
Deep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace
 
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark...
 
Declarative Thinking and Programming
Declarative Thinking and ProgrammingDeclarative Thinking and Programming
Declarative Thinking and Programming
 
Which car fits my life? - PyData Berlin 2017
Which car fits my life? - PyData Berlin 2017Which car fits my life? - PyData Berlin 2017
Which car fits my life? - PyData Berlin 2017
 
PyData Meetup Berlin 2017-04-19
PyData Meetup Berlin 2017-04-19PyData Meetup Berlin 2017-04-19
PyData Meetup Berlin 2017-04-19
 
Explaining the idea behind automatic relevance determination and bayesian int...
Explaining the idea behind automatic relevance determination and bayesian int...Explaining the idea behind automatic relevance determination and bayesian int...
Explaining the idea behind automatic relevance determination and bayesian int...
 

Recently uploaded

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 

Recently uploaded (20)

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 

WALD: A Modern & Sustainable Analytics Stack

  • 1. © 2023 A Modern & Sustainable Analytics Stack W A L D PyConDE / PyData 2023, April 17th Florian Wilhelm
  • 2. Mathematical Modelling Modern Data Warehousing & Analytics Personalisation & RecSys Uncertainty Quantification & Causality Python Data Stack OSS Contributor & Creator of PyScaffold Dr. Florian Wilhelm • H E A D O F D ATA S C I E N C E FlorianWilhelm.info florian.wilhelm@inovex.de Florian Wilhelm @FlorianWilhelm
  • 3. ‣ Application Development (Web Platforms, Mobile Apps, Smart Devices and Robotics, UI/ UX design,Backend Services) ‣ Data Management and Analytics (Business Intelligence, Big Data, Searches, Data Science and Deep Learning, Machine Perception and Artificial Intelligence) ‣ Scalable IT-Infrastructures (IT Engineering, Cloud Services, DevOps, Replatforming, Security) ‣ Training and Coaching (inovex Academy) is an innovation and quality- driven IT project house with a focus on digital transformation. Using technology to inspire our clients. And ourselves. Berlin · Karlsruhe · Pforzheim · Stuttgart · München · Köln · Hamburg · Erlangen www.inovex.de
  • 4. V S Data Warehouse Data Lake V S ETL ELT &
  • 5. C O M PA R I S O N Data Lake vs Data Warehouse Data Lake Data Warehouse Data Structure Raw Processed Purpose of Data Not yet determined Currently in use Users Data scientists / engineers Business professionals Accessibility Highly accessible and quick to update More complicated and costly to make changes
  • 6. C O M PA R I S O N ETL — Extract, Transform, Load S TA G I N G A R E A I N D ATA L A K E T R A N S F O R M S Q L R D B M S N O S Q L D B M S X M L F I L E S S A A S P L AT F O R M E X T R A C T D ATA WA R E H O U S E L O A D A N A LY T I C S A N A L Y S E
  • 7. C O M PA R I S O N ELT — Extract, Load, Transform D ATA WA R E H O U S E T R A N S F O R M S Q L R D B M S N O S Q L D B M S X M L F I L E S S A A S P L AT F O R M E X T R A C T L O A D A N A LY T I C S A N A L Y S E
  • 8. P R O G R E S S The Evolution of Databricks & Snowflake D ATA L A K E & E T L D ATA WA R E H O U S E & E LT
  • 9. P R O G R E S S The Evolution of Databricks & Snowflake D ATA B R I C K S L A K E H O U S E S N OW F L A K E D ATA C L O U D U N I F I C AT I O N O F D ATA WA R E H O U S E & L A K E
  • 10. W A L D S T A C K
  • 11. W A L D S T A C K WA L D S TA C K A Modern & Sustainable Analytics Stack Warehouse like Snowflake, Google BigQuery, Databricks Airbyte for Integration of hundreds of data sources Lightdash for Business Intelligence DBT for data transformations
  • 12. WA L D S TA C K WALD based on Google Big Query ‣ fully managed enterprise data warehouse ‣ only available on Google Cloud Platform ‣ uses dbt-bigquery to run Python UDFs with Dataproc using PySpark
  • 13. WA L D S TA C K WALD based on Databricks ‣ uses Databrick’s SQL warehouse ‣ available on all major cloud providers ‣ uses dbt-databricks to run Python UDFs with PySpark
  • 14. WA L D S TA C K WALD based on Snowflake ‣ comes with all the benefits of Snowflake, e.g. governance, easy collaboration, security, near zero maintenance ‣ runs on all major cloud providers ‣ tons of great documentation ‣ the easiest to use and full- featured option to get started with WALD
  • 15. WA L D S TA C K Modern Analytics Stack DATA CLOUD DATA SOURCES E X T R A C T L O A D T R A N S F O R M A n a l y s e SNOWPARK
  • 16. WA L D S TA C K Example Use-Case >find all code & details under https://waldstack.org
  • 17. F O R M U L A 1 Analysis and Position Prediction 1. Upload data using Airbyte to Snowflake 2. Use dbt for some basic analysis 3. Use dbt with Snowpark for some ML to predict future position 4. Visualise our results with Lightdash
  • 18. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 19. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 20. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 21. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 22. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 23. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 24. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 25. F O R M U L A 1 Rough Overview of the Data ‣ dimensions: drivers, circuits, constructors, races ‣ facts: lap_times, pit_tops, results
  • 26. S N OW F L A K E Snowflake Setup ‣ activate the Anaconda Packages (Preview Feature) ‣ create a warehouse to be used for dbt ‣ that’s it already!
  • 27. A I R B Y T E Ingesting the Data with Airbyte Simply select ‣ file as source ‣ Snowflake as destination and hit Launch
  • 28. P Y T H O N + S N OW PA R K Setup Python/Snowpark for dbt ‣ just install along dbt also ‣ dbt-snowflake ‣ snowflake-snowpark- python ‣ snowflake-connector- python ‣ and configure Snowflake in your profiles.yml
  • 29. P Y T H O N + S N OW PA R K Training a Python Model in dbt/Snowpark
  • 30. D B T / S N OW PA R K Predicting with a trained Model in dbt/Snowpark
  • 31. L I G H T D A S H Visualisation with Lightdash Lightdash pulls dimensions and metrics from dbt. Lightdash ‣ runs ad-hoc queries ‣ creates charts from queries ‣ creates dashboards from charts ‣ organises everything in spaces What are the avg lap times split by year and grand prix? M E T R I C D I M E N S I O N S
  • 32. L I G H T D A S H Main View of Lightdash
  • 33. L I G H T D A S H Analysing the Average Lap Times
  • 34. L I G H T D A S H Example of a Simple Dashboard
  • 35. F O R M U L A 1 Analysis and Position Prediction
  • 37. O N M O R E T H I N G O N … Streamlit ‣ provides faster way to build and share data apps ‣ was recently acquired by Snowflake ‣ might be a nice addition to Lightdash for interactive data apps
  • 38. Dr. Florian Wilhelm, 2023 WA L D S TA C K Conclusion The WALD stack combines 4 powerful technologies that work very well together. The WALD stack is flexible enough for many use-cases from analytics to building a full-fledged end-to-end data product at scale.
  • 39. C H E E R S T O T H E C O M M U N I T Y Credits & Resources ‣ python-snowpark-formula Github repo by Hope Watson from dbt labs ‣ kind support by Michael Gorkow & Marko Cavar from Snowflake ‣ these awesome slides by Michael Hofmann from inovex
  • 40. © 2023 Thank you! Dr. Florian Wilhelm Head of Data Science P y C o n D E / P y D a t a 2 0 2 3 inovex.de florian.wilhelm@inovex.de @inovexlife @inovexgmbh waldstack.org