SlideShare a Scribd company logo
Anatomy of a Data Driven
Architecture
@tamir_dresher
System Architect @ Payoneer
2
System Architect @ @tamir_dresher
Tamir Dresher
My Books:
Software Engineering Lecturer
Ruppin Academic Center
tamirdr@payoneer.com
https://www.israelclouds.com/iasaisrael
3
The need for DATA
Your Data
Data-Driven Decision Making Data-powered product
• What markets are leading and where can I expand?
• What’s slowing my process?
• Is there a correlation between the time invested
in a sale and the income from the tenant?
• What products should I recommend to this user?
• Is this action fraudulent ?
• Should I suggest a discount to this user to
raise the chance for a purchase?
4
ETL vs. ELT vs. Streaming
Transform LoadExtract
Transactional DB (OLTP) Analytical DB (OLAP)
LoadExtract
Transactional DB (OLTP)
Analytical DB (OLAP) / Storage
Transform
Events
Event
Stream
Stream
Processor
Real-time
insights
5
Lambda (λ) & Kappa (ϰ) Architectures
Speed Layer
Batch Layer Serving Layer
Real-time
views
Batched
views
Query
Data
Lambda
Speed/Streaming
Layer
Serving
Layer
Speed
views
QueryData
Kappa
Processing
6
The Data Pipeline
Architecture
7Cross Cutting and Ops
The Data Pipeline Architecture
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
(Visualization,
Alerting,
Insights)
Discovery/Catalog/
Metadata Management
Security Quality&Testing Observability
8
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Databases Files Events APIs
• CDC - change data
capture
• Object Storage
• Structured/
Unstructured
• Customer tracking
• WebHooks
• Domain Events
• External Services
• Enrichment
9
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitchdata, rivery
Data Sources
10
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitch
11
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitchdata, rivery
Data Sources
Customized batching/micro-batching
• Spark jobs, Data libraries (Pandas, boto), Hive
• Workflows – Airflow, Dagster, Luigi
12
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitchdata, rivery
Data Sources
Customized batching/micro-batching
• Spark jobs, Data libraries (panda, boto), Hive
• Workflows – Airflow, Dagster, Luigi
https://towardsdatascience.com/building-a-production-level-etl-pipeline-platform-using-apache-airflow-a4cf34203fbd
13
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitchdata, rivery
Data Sources
Customized batching/micro-batching
• Spark jobs, Data libraries (panda, boto), Hive
• Workflows – Airflow, Dagster, Luigi
Event streaming and processing
• Messaging - Kafka, Pulsar, Kinesis, Event Hub
• Processing – Spark, Flink, Samza, Kafka
Streams, Azure Stream Analytics
14
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Integration Platforms
• Connectors to sources and destinations
• E.g. fivtran, stitchdata, rivery
Data Sources
Customized batching/micro-batching
• Spark jobs, Data libraries (panda, boto), Hive
• Workflows – Airflow, Dagster, Luigi
Event streaming and processing
• Messaging - Kafka, Pulsar, Kinesis, Event Hub
• Processing – Spark, Flink, Samza, Kafka
Streams, Azure Stream Analytics
-- Continuously aggregating a stream into a table with a ksqlDB push query.
CREATE STREAM locationUpdatesStream ...;
CREATE TABLE locationsPerUser AS
SELECT username, COUNT(*)
FROM locationUpdatesStream
GROUP BY username
EMIT CHANGES;
// Continuously aggregating to table
KStream<String, String> locationUpdatesStream = ...;
KTable<String, Long> locationsPerUser
= locationUpdatesStream
.groupBy((k, v) -> v.username)
.count();
https://www.confluent.io/blog/kafka-streams-tables-part-1-event-streaming/
15
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Data Warehouse
• Structured format
• Designed to quickly generate
insights based on SQL like
queries
• Modern cloud base offerings –
Redshift, BigQuery, Snowflake,
Azure Synapse
Data Lake
• Structured and non-structured
data – CSV, Parquet, Images,
Audio
• Raw and historical data
• Designed to be used by data
scientists and create models by
various languages
16
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Predictive
• Generate model
• Data Science and ML libraries –
Pandas, Numpy, R, scikit etc
• The model is periodically refreshed
Storage
Retrospective (Historical)
• Deriving Intelligence based on
statistics
• Built-in engine (Data Warehouse)
OR
• Query Engines – Presto, Impala
Real Time Analytics
• Run analytical queries
over big volumes of data
with interactive latencies.
• Apache Pinot,
Clickhouse, Druid
Data Science Platforms
• Helps managing the workflows,
productization and operations
- SageMaker, Iguazio, DataBricks etc.
17
Data Sources
Ingestion
&
Transformation
Storage
Query
&
Processing
Consumption
Custom Apps
• Execute the model – how is the model
reachable?
• Translate user/system actions to
queries
• Visualize – custom (Plotly Dash,
Streamlit etc) or embedded (PowerBI,
Looker etc)
External Apps
• Augmented Analytics - External
services to generate insights and
explain them
(e.g. Anomaly Detection with Anodot/
CrunchMetrics/outlier.ai)
• Customizable Reports and Dashboard
(e.g. Looker, Tableu, PowerBI, Sisense)
18
Summary
Data
at Rest
Data in
Motion Event/Dat
a Stream
Workflow Engine
Stream
Processor
Real-time
insights
Analytical
Storage
/Lake
Query
Engine
Model
Engine
Model
Accessible API
Application
Data Sources Ingestion&Transformation Storage Query&Processing Consume
19
Data at
Rest
Data in
Motion Event/Data
Stream
Workflow Engine
Stream
Processor
Real-time
insights
Analytical
Storage
/Lake
Query
Engine
Model
Engine
Model
Accessible API
Application
Data Sources Ingestion&Transformation Storage Query&Processing Consume
Thank You!
@tamir_dresher

More Related Content

What's hot

Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Data Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache ArrowData Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache Arrow
Databricks
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
Gleb Kanterov
 
Building Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta LakeBuilding Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta Lake
Databricks
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
Bill Liu
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
DataScienceConferenc1
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Databricks
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)
Eric Sun
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Kai Wähner
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
Databricks
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?
confluent
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
Databricks
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
Databricks
 

What's hot (20)

Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache ArrowData Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache Arrow
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Building Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta LakeBuilding Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta Lake
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 

Similar to Anatomy of a data driven architecture - Tamir Dresher

Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019
Zhenxiao Luo
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
Radu Tudoran
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
Timothy Spann
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta Lake
Databricks
 
[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL
WSO2
 
The Rise of Streaming SQL
The Rise of Streaming SQLThe Rise of Streaming SQL
The Rise of Streaming SQL
Sriskandarajah Suhothayan
 
MongoDB World 2019: Streaming ETL on the Shoulders of Giants
MongoDB World 2019: Streaming ETL on the Shoulders of GiantsMongoDB World 2019: Streaming ETL on the Shoulders of Giants
MongoDB World 2019: Streaming ETL on the Shoulders of Giants
MongoDB
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWS
Sungmin Kim
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark Summit
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Luan Moreno Medeiros Maciel
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
Pat Patterson
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
Paolo Castagna
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Guido Schmutz
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Databricks
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Amazon Web Services LATAM
 
xGem Data Stream Processing
xGem Data Stream ProcessingxGem Data Stream Processing
xGem Data Stream Processing
Jorge Hirtz
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
Cedric Vidal
 
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
DataWorks Summit
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 

Similar to Anatomy of a data driven architecture - Tamir Dresher (20)

Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta Lake
 
[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL
 
The Rise of Streaming SQL
The Rise of Streaming SQLThe Rise of Streaming SQL
The Rise of Streaming SQL
 
MongoDB World 2019: Streaming ETL on the Shoulders of Giants
MongoDB World 2019: Streaming ETL on the Shoulders of GiantsMongoDB World 2019: Streaming ETL on the Shoulders of Giants
MongoDB World 2019: Streaming ETL on the Shoulders of Giants
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWS
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
 
xGem Data Stream Processing
xGem Data Stream ProcessingxGem Data Stream Processing
xGem Data Stream Processing
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
 
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
PayPal merchant ecosystem using Apache Spark, Hive, Druid, and HBase
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 

More from Tamir Dresher

NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdfNET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
Tamir Dresher
 
Tamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptxTamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher
 
Tamir Dresher - What’s new in ASP.NET Core 6
Tamir Dresher - What’s new in ASP.NET Core 6Tamir Dresher - What’s new in ASP.NET Core 6
Tamir Dresher - What’s new in ASP.NET Core 6
Tamir Dresher
 
Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#
Tamir Dresher
 
Tamir Dresher Clarizen adventures with the wild GC during the holiday season
Tamir Dresher   Clarizen adventures with the wild GC during the holiday seasonTamir Dresher   Clarizen adventures with the wild GC during the holiday season
Tamir Dresher Clarizen adventures with the wild GC during the holiday season
Tamir Dresher
 
Debugging tricks you wish you knew Tamir Dresher - Odessa 2019
Debugging tricks you wish you knew   Tamir Dresher - Odessa 2019Debugging tricks you wish you knew   Tamir Dresher - Odessa 2019
Debugging tricks you wish you knew Tamir Dresher - Odessa 2019
Tamir Dresher
 
From zero to hero with the actor model - Tamir Dresher - Odessa 2019
From zero to hero with the actor model  - Tamir Dresher - Odessa 2019From zero to hero with the actor model  - Tamir Dresher - Odessa 2019
From zero to hero with the actor model - Tamir Dresher - Odessa 2019
Tamir Dresher
 
Tamir Dresher - Demystifying the Core of .NET Core
Tamir Dresher  - Demystifying the Core of .NET CoreTamir Dresher  - Demystifying the Core of .NET Core
Tamir Dresher - Demystifying the Core of .NET Core
Tamir Dresher
 
Breaking the monolith to microservice with Docker and Kubernetes (k8s)
Breaking the monolith to microservice with Docker and Kubernetes (k8s)Breaking the monolith to microservice with Docker and Kubernetes (k8s)
Breaking the monolith to microservice with Docker and Kubernetes (k8s)
Tamir Dresher
 
.Net december 2017 updates - Tamir Dresher
.Net december 2017 updates - Tamir Dresher.Net december 2017 updates - Tamir Dresher
.Net december 2017 updates - Tamir Dresher
Tamir Dresher
 
Testing time and concurrency Rx
Testing time and concurrency RxTesting time and concurrency Rx
Testing time and concurrency Rx
Tamir Dresher
 
Building responsive application with Rx - confoo - tamir dresher
Building responsive application with Rx - confoo - tamir dresherBuilding responsive application with Rx - confoo - tamir dresher
Building responsive application with Rx - confoo - tamir dresher
Tamir Dresher
 
.NET Debugging tricks you wish you knew tamir dresher
.NET Debugging tricks you wish you knew   tamir dresher.NET Debugging tricks you wish you knew   tamir dresher
.NET Debugging tricks you wish you knew tamir dresher
Tamir Dresher
 
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir Dresher
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir DresherFrom Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir Dresher
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir Dresher
Tamir Dresher
 
Building responsive applications with Rx - CodeMash2017 - Tamir Dresher
Building responsive applications with Rx  - CodeMash2017 - Tamir DresherBuilding responsive applications with Rx  - CodeMash2017 - Tamir Dresher
Building responsive applications with Rx - CodeMash2017 - Tamir Dresher
Tamir Dresher
 
Debugging tricks you wish you knew - Tamir Dresher
Debugging tricks you wish you knew  - Tamir DresherDebugging tricks you wish you knew  - Tamir Dresher
Debugging tricks you wish you knew - Tamir Dresher
Tamir Dresher
 
Rx 101 - Tamir Dresher - Copenhagen .NET User Group
Rx 101  - Tamir Dresher - Copenhagen .NET User GroupRx 101  - Tamir Dresher - Copenhagen .NET User Group
Rx 101 - Tamir Dresher - Copenhagen .NET User Group
Tamir Dresher
 
Cloud patterns - NDC Oslo 2016 - Tamir Dresher
Cloud patterns - NDC Oslo 2016 - Tamir DresherCloud patterns - NDC Oslo 2016 - Tamir Dresher
Cloud patterns - NDC Oslo 2016 - Tamir Dresher
Tamir Dresher
 
Reactiveness All The Way - SW Architecture 2015 Conference
Reactiveness All The Way - SW Architecture 2015 ConferenceReactiveness All The Way - SW Architecture 2015 Conference
Reactiveness All The Way - SW Architecture 2015 Conference
Tamir Dresher
 
Rx 101 Codemotion Milan 2015 - Tamir Dresher
Rx 101   Codemotion Milan 2015 - Tamir DresherRx 101   Codemotion Milan 2015 - Tamir Dresher
Rx 101 Codemotion Milan 2015 - Tamir Dresher
Tamir Dresher
 

More from Tamir Dresher (20)

NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdfNET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
NET Aspire - NET Conf IL 2024 - Tamir Dresher.pdf
 
Tamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptxTamir Dresher - DotNet 7 What's new.pptx
Tamir Dresher - DotNet 7 What's new.pptx
 
Tamir Dresher - What’s new in ASP.NET Core 6
Tamir Dresher - What’s new in ASP.NET Core 6Tamir Dresher - What’s new in ASP.NET Core 6
Tamir Dresher - What’s new in ASP.NET Core 6
 
Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#Tamir Dresher - Async Streams in C#
Tamir Dresher - Async Streams in C#
 
Tamir Dresher Clarizen adventures with the wild GC during the holiday season
Tamir Dresher   Clarizen adventures with the wild GC during the holiday seasonTamir Dresher   Clarizen adventures with the wild GC during the holiday season
Tamir Dresher Clarizen adventures with the wild GC during the holiday season
 
Debugging tricks you wish you knew Tamir Dresher - Odessa 2019
Debugging tricks you wish you knew   Tamir Dresher - Odessa 2019Debugging tricks you wish you knew   Tamir Dresher - Odessa 2019
Debugging tricks you wish you knew Tamir Dresher - Odessa 2019
 
From zero to hero with the actor model - Tamir Dresher - Odessa 2019
From zero to hero with the actor model  - Tamir Dresher - Odessa 2019From zero to hero with the actor model  - Tamir Dresher - Odessa 2019
From zero to hero with the actor model - Tamir Dresher - Odessa 2019
 
Tamir Dresher - Demystifying the Core of .NET Core
Tamir Dresher  - Demystifying the Core of .NET CoreTamir Dresher  - Demystifying the Core of .NET Core
Tamir Dresher - Demystifying the Core of .NET Core
 
Breaking the monolith to microservice with Docker and Kubernetes (k8s)
Breaking the monolith to microservice with Docker and Kubernetes (k8s)Breaking the monolith to microservice with Docker and Kubernetes (k8s)
Breaking the monolith to microservice with Docker and Kubernetes (k8s)
 
.Net december 2017 updates - Tamir Dresher
.Net december 2017 updates - Tamir Dresher.Net december 2017 updates - Tamir Dresher
.Net december 2017 updates - Tamir Dresher
 
Testing time and concurrency Rx
Testing time and concurrency RxTesting time and concurrency Rx
Testing time and concurrency Rx
 
Building responsive application with Rx - confoo - tamir dresher
Building responsive application with Rx - confoo - tamir dresherBuilding responsive application with Rx - confoo - tamir dresher
Building responsive application with Rx - confoo - tamir dresher
 
.NET Debugging tricks you wish you knew tamir dresher
.NET Debugging tricks you wish you knew   tamir dresher.NET Debugging tricks you wish you knew   tamir dresher
.NET Debugging tricks you wish you knew tamir dresher
 
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir Dresher
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir DresherFrom Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir Dresher
From Zero to the Actor Model (With Akka.Net) - CodeMash2017 - Tamir Dresher
 
Building responsive applications with Rx - CodeMash2017 - Tamir Dresher
Building responsive applications with Rx  - CodeMash2017 - Tamir DresherBuilding responsive applications with Rx  - CodeMash2017 - Tamir Dresher
Building responsive applications with Rx - CodeMash2017 - Tamir Dresher
 
Debugging tricks you wish you knew - Tamir Dresher
Debugging tricks you wish you knew  - Tamir DresherDebugging tricks you wish you knew  - Tamir Dresher
Debugging tricks you wish you knew - Tamir Dresher
 
Rx 101 - Tamir Dresher - Copenhagen .NET User Group
Rx 101  - Tamir Dresher - Copenhagen .NET User GroupRx 101  - Tamir Dresher - Copenhagen .NET User Group
Rx 101 - Tamir Dresher - Copenhagen .NET User Group
 
Cloud patterns - NDC Oslo 2016 - Tamir Dresher
Cloud patterns - NDC Oslo 2016 - Tamir DresherCloud patterns - NDC Oslo 2016 - Tamir Dresher
Cloud patterns - NDC Oslo 2016 - Tamir Dresher
 
Reactiveness All The Way - SW Architecture 2015 Conference
Reactiveness All The Way - SW Architecture 2015 ConferenceReactiveness All The Way - SW Architecture 2015 Conference
Reactiveness All The Way - SW Architecture 2015 Conference
 
Rx 101 Codemotion Milan 2015 - Tamir Dresher
Rx 101   Codemotion Milan 2015 - Tamir DresherRx 101   Codemotion Milan 2015 - Tamir Dresher
Rx 101 Codemotion Milan 2015 - Tamir Dresher
 

Recently uploaded

Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
Jelle | Nordend
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Anthony Dahanne
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
KrzysztofKkol1
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
varshanayak241
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 

Recently uploaded (20)

Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 

Anatomy of a data driven architecture - Tamir Dresher

  • 1. Anatomy of a Data Driven Architecture @tamir_dresher System Architect @ Payoneer
  • 2. 2 System Architect @ @tamir_dresher Tamir Dresher My Books: Software Engineering Lecturer Ruppin Academic Center tamirdr@payoneer.com https://www.israelclouds.com/iasaisrael
  • 3. 3 The need for DATA Your Data Data-Driven Decision Making Data-powered product • What markets are leading and where can I expand? • What’s slowing my process? • Is there a correlation between the time invested in a sale and the income from the tenant? • What products should I recommend to this user? • Is this action fraudulent ? • Should I suggest a discount to this user to raise the chance for a purchase?
  • 4. 4 ETL vs. ELT vs. Streaming Transform LoadExtract Transactional DB (OLTP) Analytical DB (OLAP) LoadExtract Transactional DB (OLTP) Analytical DB (OLAP) / Storage Transform Events Event Stream Stream Processor Real-time insights
  • 5. 5 Lambda (λ) & Kappa (ϰ) Architectures Speed Layer Batch Layer Serving Layer Real-time views Batched views Query Data Lambda Speed/Streaming Layer Serving Layer Speed views QueryData Kappa Processing
  • 7. 7Cross Cutting and Ops The Data Pipeline Architecture Data Sources Ingestion & Transformation Storage Query & Processing Consumption (Visualization, Alerting, Insights) Discovery/Catalog/ Metadata Management Security Quality&Testing Observability
  • 8. 8 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Databases Files Events APIs • CDC - change data capture • Object Storage • Structured/ Unstructured • Customer tracking • WebHooks • Domain Events • External Services • Enrichment
  • 9. 9 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitchdata, rivery Data Sources
  • 10. 10 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitch
  • 11. 11 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitchdata, rivery Data Sources Customized batching/micro-batching • Spark jobs, Data libraries (Pandas, boto), Hive • Workflows – Airflow, Dagster, Luigi
  • 12. 12 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitchdata, rivery Data Sources Customized batching/micro-batching • Spark jobs, Data libraries (panda, boto), Hive • Workflows – Airflow, Dagster, Luigi https://towardsdatascience.com/building-a-production-level-etl-pipeline-platform-using-apache-airflow-a4cf34203fbd
  • 13. 13 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitchdata, rivery Data Sources Customized batching/micro-batching • Spark jobs, Data libraries (panda, boto), Hive • Workflows – Airflow, Dagster, Luigi Event streaming and processing • Messaging - Kafka, Pulsar, Kinesis, Event Hub • Processing – Spark, Flink, Samza, Kafka Streams, Azure Stream Analytics
  • 14. 14 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Integration Platforms • Connectors to sources and destinations • E.g. fivtran, stitchdata, rivery Data Sources Customized batching/micro-batching • Spark jobs, Data libraries (panda, boto), Hive • Workflows – Airflow, Dagster, Luigi Event streaming and processing • Messaging - Kafka, Pulsar, Kinesis, Event Hub • Processing – Spark, Flink, Samza, Kafka Streams, Azure Stream Analytics -- Continuously aggregating a stream into a table with a ksqlDB push query. CREATE STREAM locationUpdatesStream ...; CREATE TABLE locationsPerUser AS SELECT username, COUNT(*) FROM locationUpdatesStream GROUP BY username EMIT CHANGES; // Continuously aggregating to table KStream<String, String> locationUpdatesStream = ...; KTable<String, Long> locationsPerUser = locationUpdatesStream .groupBy((k, v) -> v.username) .count(); https://www.confluent.io/blog/kafka-streams-tables-part-1-event-streaming/
  • 15. 15 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Data Warehouse • Structured format • Designed to quickly generate insights based on SQL like queries • Modern cloud base offerings – Redshift, BigQuery, Snowflake, Azure Synapse Data Lake • Structured and non-structured data – CSV, Parquet, Images, Audio • Raw and historical data • Designed to be used by data scientists and create models by various languages
  • 16. 16 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Predictive • Generate model • Data Science and ML libraries – Pandas, Numpy, R, scikit etc • The model is periodically refreshed Storage Retrospective (Historical) • Deriving Intelligence based on statistics • Built-in engine (Data Warehouse) OR • Query Engines – Presto, Impala Real Time Analytics • Run analytical queries over big volumes of data with interactive latencies. • Apache Pinot, Clickhouse, Druid Data Science Platforms • Helps managing the workflows, productization and operations - SageMaker, Iguazio, DataBricks etc.
  • 17. 17 Data Sources Ingestion & Transformation Storage Query & Processing Consumption Custom Apps • Execute the model – how is the model reachable? • Translate user/system actions to queries • Visualize – custom (Plotly Dash, Streamlit etc) or embedded (PowerBI, Looker etc) External Apps • Augmented Analytics - External services to generate insights and explain them (e.g. Anomaly Detection with Anodot/ CrunchMetrics/outlier.ai) • Customizable Reports and Dashboard (e.g. Looker, Tableu, PowerBI, Sisense)
  • 18. 18 Summary Data at Rest Data in Motion Event/Dat a Stream Workflow Engine Stream Processor Real-time insights Analytical Storage /Lake Query Engine Model Engine Model Accessible API Application Data Sources Ingestion&Transformation Storage Query&Processing Consume
  • 19. 19 Data at Rest Data in Motion Event/Data Stream Workflow Engine Stream Processor Real-time insights Analytical Storage /Lake Query Engine Model Engine Model Accessible API Application Data Sources Ingestion&Transformation Storage Query&Processing Consume Thank You! @tamir_dresher

Editor's Notes

  1. Data comes from many sources and feed our application We need it for the OLTP obviously but being the new gold we can use the data feed to and historical data to gain more insights BI ML/AI data-driven decision making (analytic systems) and drive data-powered products
  2. Traditionally , orgs moved the data from the OLTP to the OLAP (DWH) with ETL Modern architectures now relies on ELT for batched load And to get the best real time response streaming is the way to go
  3. Traditionally , orgs moved the data from the OLTP to the OLAP (DWH) with ETL Modern architectures now relies on ELT for batched load And to get the best real time response streaming is the way to go
  4. https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/how-to-build-a-data-architecture-to-drive-innovation-today-and-tomorrow#
  5. https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7