Functional Data Engineering.pdf

Functional Data Engineering
A blueprint for adopting functional principles in data pipelines
Ananth Packkildurai
- Data Engineer, Slack
- Principal Data Engineer, Zendesk
- Creator, Schemata (Data Contract Platform)
- Author, Data Engineering Weekly

Key Principles of Functional Data Engineering
1. Reproducibility
2. Re-Computability
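Both principles are easiest to see in a task written as a pure function whose output fully replaces one partition. The sketch below illustrates that idea under this assumption; `transform`, `run_task`, and the in-memory `Table` type are hypothetical names, not part of the talk.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]          # hypothetical (user_id, user_name) rows
Table = Dict[str, List[Row]]   # partition key (ds) -> rows in that partition


def transform(raw_rows: List[Row]) -> List[Row]:
    """Pure transformation: the output depends only on the input rows
    (no clock reads, no randomness, no hidden state)."""
    cleaned = {(uid, name.strip().lower()) for uid, name in raw_rows}
    return sorted(cleaned)


def run_task(table: Table, ds: str, raw_rows: List[Row]) -> None:
    """Idempotent write: the task replaces its whole partition instead of
    appending, so re-running it for the same ds yields the same table state."""
    table[ds] = transform(raw_rows)
```

Calling `run_task(table, "2022-12-20", raw)` a second time with the same input leaves `table` exactly as the first call did, which is what makes reruns and backfills safe.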

The Modern Data Cloud = LakeHouse & Warehouse
State of the Data 2023:
- Separation of storage and compute
- Unlimited-scale data repository
- ACID transactions and mutation support

DateTime Partition Table Design

Warehouse:

```sql
CREATE TABLE dw.user (
    user_id    BIGINT,
    user_name  STRING,
    created_at DATE
)
PARTITIONED BY (ds STRING); -- ds = date stamp of the snapshot
```

LakeHouse (storage layout):

```
s3://dw/user/2022-12-20/<all users data at the time of snapshot>
s3://dw/user/2022-12-21/<all users data at the time of snapshot>
```
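The point of a date-partitioned snapshot table is that each load owns exactly one ds partition and replaces it wholesale. A minimal sketch, assuming SQLite as a stand-in warehouse: SQLite has neither schemas nor partitions, so the table is renamed to the hypothetical `dw_user`, ds is an ordinary column, and "INSERT OVERWRITE" is emulated with a delete plus insert inside one transaction. `create_user_table`, `overwrite_partition`, and `partition_path` are hypothetical helpers.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical row shape: (user_id, user_name, created_at)
Row = Tuple[int, str, str]


def create_user_table(conn: sqlite3.Connection) -> None:
    # SQLite has no native partitions, so ds is an ordinary column here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_user ("
        "user_id INTEGER, user_name TEXT, created_at TEXT, ds TEXT)"
    )


def overwrite_partition(conn: sqlite3.Connection, ds: str, rows: Iterable[Row]) -> None:
    """Replace the whole ds partition in one transaction (INSERT OVERWRITE-style),
    so re-running the load for the same ds never duplicates rows."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM dw_user WHERE ds = ?", (ds,))
        conn.executemany(
            "INSERT INTO dw_user (user_id, user_name, created_at, ds) VALUES (?, ?, ?, ?)",
            [(uid, name, created, ds) for uid, name, created in rows],
        )


def partition_path(ds: str) -> str:
    # Mirrors the slide's object-store layout: one prefix per snapshot date.
    return f"s3://dw/user/{ds}/"
```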

Entity Modeling
1. Incremental Snapshot
2. Full Snapshot
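One way to relate the two snapshot styles, assuming each incremental partition holds only the rows that changed on that day: the full snapshot for any ds can be re-derived by folding the increments in date order, with newer rows replacing older ones by key. The two-column schema and the names `full_from_incremental` and `rebuild_full` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # hypothetical (user_id, user_name)


def full_from_incremental(previous_full: List[Row], increment: List[Row]) -> List[Row]:
    """Next full snapshot: rows in the increment replace rows with the same
    user_id; every other row carries over unchanged."""
    merged: Dict[int, str] = dict(previous_full)
    merged.update(dict(increment))
    return sorted(merged.items())


def rebuild_full(increments: Dict[str, List[Row]], up_to_ds: str) -> List[Row]:
    """Replay incremental partitions (ISO ds strings sort chronologically)
    up to and including up_to_ds."""
    full: List[Row] = []
    for ds in sorted(increments):
        if ds > up_to_ds:
            break
        full = full_from_incremental(full, increments[ds])
    return full
```

Full snapshots cost more storage but make every partition self-contained; incremental snapshots are smaller but need the whole chain of partitions to reconstruct the state at a given date.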

Entity Modeling

```sql
CREATE OR REPLACE VIEW dw.user_latest AS
SELECT
    user_id,
    user_name,
    created_at,
    ds
FROM dw.user
WHERE ds = '<current DateTime partition>';
```
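The slide leaves the current partition as a placeholder. The runnable SQLite sketch below resolves it as the most recent ds present in the table, which is one possible choice (a date supplied by the scheduler is another). Because SQLite has no schemas, the table and view are renamed to the hypothetical `dw_user` and `dw_user_latest`, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dw_user (user_id INTEGER, user_name TEXT, created_at TEXT, ds TEXT);
    INSERT INTO dw_user VALUES
        (1, 'alice', '2022-12-01', '2022-12-20'),
        (1, 'alice', '2022-12-01', '2022-12-21'),
        (2, 'bob',   '2022-12-21', '2022-12-21');
    -- Resolve the "current" partition as the latest ds present in the table.
    CREATE VIEW dw_user_latest AS
    SELECT user_id, user_name, created_at, ds
    FROM dw_user
    WHERE ds = (SELECT MAX(ds) FROM dw_user);
    """
)

# Only rows from the 2022-12-21 snapshot are visible through the view.
latest = conn.execute(
    "SELECT user_id, user_name FROM dw_user_latest ORDER BY user_id"
).fetchall()
```

Consumers query the view and never hard-code a date, while every historical snapshot stays queryable through the base table.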

Key Challenges
1. Late Arriving Data
2. Data Deletion

Apply Window Functions
[Figure: hourly data (Hour T1, T2, T3) processed by the Hour T1, T2, and T3 pipelines, contrasting a tumbling window with a sliding window]
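A minimal sketch of the two window types, assuming integer epoch-second timestamps. With a tumbling window each event falls into exactly one hourly window. With a sliding window (here, 2-hour windows starting every hour) each event falls into several overlapping windows; one reading of the figure is that a later pipeline run therefore also re-reads the previous hour and can pick up events that arrived late. `tumbling_window` and `sliding_windows` are hypothetical names.

```python
from typing import List


def tumbling_window(ts: int, size: int) -> int:
    """Start of the single fixed, non-overlapping window that contains ts."""
    return ts - ts % size


def sliding_windows(ts: int, size: int, slide: int) -> List[int]:
    """Starts of every window [start, start + size) that contains ts,
    where a new window begins every `slide` seconds."""
    starts = []
    start = ts - ts % slide  # latest window start at or before ts
    while start > ts - size and start >= 0:
        starts.append(start)
        start -= slide
    return sorted(starts)
```

For an event at 5400 s (1.5 h), `tumbling_window(5400, 3600)` places it only in the hour starting at 3600, while `sliding_windows(5400, 7200, 3600)` places it in the windows starting at 0 and 3600.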

Apply Watermark
[Figure: the Hour T1 data window time, followed by the point at which the Hour T1 pipeline starts]
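A watermark delays the start of an hourly pipeline until the data for that hour is likely complete. The sketch below uses one common definition, assumed here rather than taken from the talk: watermark = largest event time seen minus a fixed allowed lateness. `WatermarkTracker` is a hypothetical helper, and times are epoch seconds.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WatermarkTracker:
    """Event-time watermark: largest event time observed minus a fixed
    allowed lateness (both in seconds)."""
    allowed_lateness: int
    max_event_time: int = 0
    late_events: List[int] = field(default_factory=list)

    def watermark(self) -> int:
        return self.max_event_time - self.allowed_lateness

    def observe(self, event_time: int) -> None:
        # An event behind the current watermark arrived after its window
        # was allowed to close; record it for later reconciliation.
        if event_time < self.watermark():
            self.late_events.append(event_time)
        self.max_event_time = max(self.max_event_time, event_time)

    def window_ready(self, window_end: int) -> bool:
        """True once the pipeline for a window ending at window_end may start."""
        return self.watermark() >= window_end
```

A larger `allowed_lateness` catches more late events inside the normal run, at the cost of starting each hourly pipeline later.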
Adopt Reconciliation
[Figure: the Hour T1, T2, and T3 pipelines, followed by a reconciliation pipeline]
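A reconciliation pipeline reruns the aggregation over the now-complete data after the hourly runs and corrects any hour whose published result changed. A minimal sketch, assuming hourly event counts as the metric; `hourly_counts` and `reconcile` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

HOUR = 3600  # seconds


def hourly_counts(event_times: Iterable[int]) -> Dict[int, int]:
    """Aggregate events into tumbling one-hour buckets keyed by hour start."""
    return dict(Counter(t - t % HOUR for t in event_times))


def reconcile(published: Dict[int, int], all_events: Iterable[int]) -> Dict[int, int]:
    """Recompute every hour from the complete data and return only the hours
    whose published value differs (e.g. because of late-arriving events)."""
    recomputed = hourly_counts(all_events)
    return {hour: n for hour, n in recomputed.items() if published.get(hour) != n}
```

If the hourly runs published counts from events `[100, 200, 3700]` and an event at 300 arrives late, `reconcile` returns `{0: 3}`; overwriting just that hourly partition keeps the fix idempotent.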

Choose Your Confidence Window of Correctness
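The confidence window bounds how far back late data may still change results. A minimal sketch, assuming the window is a number of days: partitions inside the window are recomputed on every run, and older partitions are treated as final. A wider window buys correctness at the cost of more recomputation. `partitions_to_reprocess` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import List


def partitions_to_reprocess(run_date: date, confidence_days: int) -> List[str]:
    """ds partitions (YYYY-MM-DD) inside the confidence window: the last
    `confidence_days` days up to and including run_date, oldest first."""
    return [
        (run_date - timedelta(days=offset)).isoformat()
        for offset in range(confidence_days - 1, -1, -1)
    ]
```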

Data Deletion
1. Reprocessing
2. Deletion Audit Log
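A sketch of how the two approaches can work together, assuming deletion requests are recorded in an append-only audit log keyed by user_id: a reprocessing job rewrites every snapshot partition without the deleted users, and the same log is used to verify that no deleted user remains. `DeletionRequest`, `apply_deletions`, and `verify_deleted` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Row = Tuple[int, str]  # hypothetical (user_id, user_name)


@dataclass(frozen=True)
class DeletionRequest:
    user_id: int
    requested_at: str  # ISO date of the request


def apply_deletions(
    partitions: Dict[str, List[Row]], audit_log: List[DeletionRequest]
) -> Dict[str, List[Row]]:
    """Reprocess every snapshot partition, dropping rows of deleted users.
    Returns new partitions rather than mutating the inputs."""
    deleted: Set[int] = {req.user_id for req in audit_log}
    return {
        ds: [row for row in rows if row[0] not in deleted]
        for ds, rows in partitions.items()
    }


def verify_deleted(partitions: Dict[str, List[Row]], audit_log: List[DeletionRequest]) -> bool:
    """Audit check: no partition may still contain a user from the log."""
    deleted = {req.user_id for req in audit_log}
    return all(row[0] not in deleted for rows in partitions.values() for row in rows)
```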

https://schemata.app
https://www.linkedin.com/in/ananthdurai
ananth@dataengineeringweekly.com
