SlideShare a Scribd company logo
1 of 18
Download to read offline
Implementing a robust CI workflow with dbt
for optimizing data warehouse quality
Maeo Molteni
Data Engineer, Data Enablement
maeo.molteni@kognic.com
About Kognic
● Mid-size company (~100 employees)
● Mature operational data
● Average-sized data (Tb/bn)
● Goal: enable analytics at scale by creating a collaborative data
platform
Goal and constraints
● Creation of a collaborative data platform:
○ Fast to deploy, easy to maintain
○ Limit budget, favour open-source solutions
○ Team of ~2 data engineers
Existing setup
● GCP + Kubernetes
● Well-established plateng team for facilitating deployments
Some context
● Minimize the number of tools used
● Extract/Load via dockerized python app
● Pipelines only for ingestion (no transformations)
● All the Transformations are managed via dbt
○ dbt tweaked into a catalog and data governance tool
Our ELT architecture
● dbt (data build tool) is the “Transformation” in ELT
● No compute engine, it piggybacks on your warehouse
● Version control everything in the warehouse
● Each transformation/model is a *.sql file. dbt:
○ compiles templates into plain SQL
○ resolves dependencies
○ executes sequentially the code in your warehouse
dbt for orchestrating data transformation
● Seamless integration with Airflow for scheduling
○ DAGs are dynamically generated (parse json or use Astro SDK)
○ do not use bash operator!
● Latest dbt manifest kept in sync with Airflow
● Multiple targets (test/dev/production)
● Modular dag -> easy to route alerts to model owners (e.g. based on
some metadata)
dbt - how it integrates with Airflow
● Base code version controlled in GIT
● Same approach as in software development, SQLis treated as
code:
○ Version control
○ Linting
○ Testing, testing, testing
○ PR-review
dbt - local development - sql as code
● Minimize boilerplate with dbt codegen
● Automated generation of boilerplate for documentation for sources
● Automated generation of staging model
● Automated propagation of documentation to downstream models
dbt - local development - lower the threshold for developers
● sqlflu is a python based linting for SQL
● New major 2.0 release March 2023
● It comes with an optional dbt-templater
● Fully customizable:
○ Define a set of rules
○ “sqlflu fix” will auto-format most of the code
○ Enforced as first step in CI workflow
● Result: a common standard for all your sql files
dbt CI - Lint your SQL
● dbt provides packages for testing both the data models and the
metadata
● Tests as a one-liner in *.yaml
○ dbt-core: basic tests
○ dbt-expectations (~ assertions in unit testing)
○ dbt-meta-testing (metadata/project)
● Having “some test coverage” is a requirement enforced in the CI
pipeline (dbt run-operation required_tests) for critical data
dbt CI - Test your models
● Documentation kept together with the *.sql code
● Docs compiled into static HTMLcontent at merge.
○ Hosted on dbt.kognic.io
○ Catalog with metadata and basic profiling
○ Data lineage, columns description, and ownership
○ Compiled/uncompiled code generating the table(s)
○ Data discovery functionality
● Having complete documentation is a requirement enforced in the CI pipeline (dbt
run-operation required_docs)
dbt CI - Documentation, lineage, and data discovery
Typical issues faced by the data platform team when developing
and maintaining the platform:
- Seing up a test/dev environment
- Handling conflicts when developing
- Monolithic architecture
- Cost control
- Handling changes
- Unwanted breaking changes
- Rollout of major changes
CD and the maintainer experience
Tables in dbt production environment can only be materialized
by Airflow jobs, users do not have write access to prod
A staging environment is used for everything else:
- Local runs
- CI workflows
- Central config with dynamic datasets based on current
branch name (CI) or user name (local):
- Compartmentalised dev environments
- Separation of concerns
- Enforced lifecycle rules for keeping staging clean and
avoiding the creation of long term dependencies
dbt CD - production & staging project
The main risk of having one environment per branch
and user is the cost skyrocketing, since would
out-of-the-box materialise the model and all the
upstream dependencies in their entirety.
dbt CD - defer and filter for reducing data consumed
Two obvious ways to keep costs
under control:
- Macro for reducing the
volume
of data used in staging
(opt-in) based on partitions
- ----defer upstream to
production environment
(opt-out)
Typical issues faced by the data platform team when developing
and maintaining the platform:
- Seing up a test/dev environment
- Handling conflicts when developing
- Monolithic architecture
- Cost control
- Handling changes
- Unwanted breaking changes
- Rollout of major changes
CD and the maintainer experience
Tests can capture unwanted
behaviour after the model is
created, which sometimes is too late
Model exposed to a broader
audience, or to an exposure can have
an extra layer of controls by mean of
a contract.
Contracts are enforced before the
model is materialised, and defines
the “shape” of the model, not the
business logic therein
dbt CD - contracts
Breaking changes are sometimes needed and in a
collaborative environment it’s crucial having a smooth
transition.
dbt CD - versions for smooth rollout of changes
model versioning:
- Base model defined in *.yml
- One *.sql file per model - only di in
the *yml file
- A view on top of versioned models
pointing at “the latest”
- Older version can keep on existing
until when no downstream
dependencies exists
- Challenge: need to ensure we
eventually have one version of each
model
● “If it is not in git, it does not exist”
● User experience:
○ codegen for reducing boilerplate
○ lint your sql (e.g. with sqlflu)
○ test metadata before merging
■ complete catalog for data discovery “for free”
● Maintainer experience:
○ Separate concerns by having distinct environments between
dev and prod, and within dev
○ Reduce costs with ----defer and partitions pruning in dev
○ Prevent or control changes:
■ model versions for controlling major changes
■ contracts for preventing changes to “exposed” data
Some key learnings
Thanks for listening!
Maeo Molteni
Data Engineer, Data Enablement
maeo.molteni@kognic.com

More Related Content

Similar to [DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt for Optimizing Data Warehouse Quality

Developing, testing and distributing elasticsearch beats in a complex, heter...
Developing, testing and distributing elasticsearch beats in  a complex, heter...Developing, testing and distributing elasticsearch beats in  a complex, heter...
Developing, testing and distributing elasticsearch beats in a complex, heter...Jesper Agerled Wermuth
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackMichel Tricot
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupKaxil Naik
 
Preparing for Neo - Singapore OutSystems User Group October 2022 Meetup
Preparing for Neo - Singapore OutSystems User Group October 2022 MeetupPreparing for Neo - Singapore OutSystems User Group October 2022 Meetup
Preparing for Neo - Singapore OutSystems User Group October 2022 MeetupYashrajNayak4
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Daniel Zivkovic
 
Kubernetes in a grown environment and integration into continuous delivery
Kubernetes in a grown environment and integration into continuous deliveryKubernetes in a grown environment and integration into continuous delivery
Kubernetes in a grown environment and integration into continuous deliveryTEC Campus
 
(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol Performance(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol PerformanceBIOVIA
 
Mulesoft Meetup Milano #9 - Batch Processing and CI/CD
Mulesoft Meetup Milano #9 - Batch Processing and CI/CDMulesoft Meetup Milano #9 - Batch Processing and CI/CD
Mulesoft Meetup Milano #9 - Batch Processing and CI/CDGonzalo Marcos Ansoain
 
2018 08-13-ib ms-latest-buzz-share-final
2018 08-13-ib ms-latest-buzz-share-final2018 08-13-ib ms-latest-buzz-share-final
2018 08-13-ib ms-latest-buzz-share-finalDavid Morlitz
 
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...OW2
 
DocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winnerDocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winnerDocDoku
 
Continuous Integration for Oracle Database Development
Continuous Integration for Oracle Database DevelopmentContinuous Integration for Oracle Database Development
Continuous Integration for Oracle Database DevelopmentVladimir Bakhov
 
DWX 2023 - Datenbank-Schema Deployment im Kubernetes Release
DWX 2023 - Datenbank-Schema Deployment im Kubernetes ReleaseDWX 2023 - Datenbank-Schema Deployment im Kubernetes Release
DWX 2023 - Datenbank-Schema Deployment im Kubernetes ReleaseMarc Müller
 
Free GitOps Workshop (with Intro to Kubernetes & GitOps)
Free GitOps Workshop (with Intro to Kubernetes & GitOps)Free GitOps Workshop (with Intro to Kubernetes & GitOps)
Free GitOps Workshop (with Intro to Kubernetes & GitOps)Weaveworks
 
CMI 2.0 session at Drupal DevDays in Cluj-Napoca
CMI 2.0 session at Drupal DevDays in Cluj-NapocaCMI 2.0 session at Drupal DevDays in Cluj-Napoca
CMI 2.0 session at Drupal DevDays in Cluj-NapocaNuvole
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLDESMOND YUEN
 
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...Ambassador Labs
 

Similar to [DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt for Optimizing Data Warehouse Quality (20)

Developing, testing and distributing elasticsearch beats in a complex, heter...
Developing, testing and distributing elasticsearch beats in  a complex, heter...Developing, testing and distributing elasticsearch beats in  a complex, heter...
Developing, testing and distributing elasticsearch beats in a complex, heter...
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stack
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
 
Preparing for Neo - Singapore OutSystems User Group October 2022 Meetup
Preparing for Neo - Singapore OutSystems User Group October 2022 MeetupPreparing for Neo - Singapore OutSystems User Group October 2022 Meetup
Preparing for Neo - Singapore OutSystems User Group October 2022 Meetup
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
 
Kubernetes in a grown environment and integration into continuous delivery
Kubernetes in a grown environment and integration into continuous deliveryKubernetes in a grown environment and integration into continuous delivery
Kubernetes in a grown environment and integration into continuous delivery
 
(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol Performance(ATS3-PLAT08) Optimizing Protocol Performance
(ATS3-PLAT08) Optimizing Protocol Performance
 
Mulesoft Meetup Milano #9 - Batch Processing and CI/CD
Mulesoft Meetup Milano #9 - Batch Processing and CI/CDMulesoft Meetup Milano #9 - Batch Processing and CI/CD
Mulesoft Meetup Milano #9 - Batch Processing and CI/CD
 
2018 08-13-ib ms-latest-buzz-share-final
2018 08-13-ib ms-latest-buzz-share-final2018 08-13-ib ms-latest-buzz-share-final
2018 08-13-ib ms-latest-buzz-share-final
 
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
 
DocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winnerDocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winner
 
Continuous Integration for Oracle Database Development
Continuous Integration for Oracle Database DevelopmentContinuous Integration for Oracle Database Development
Continuous Integration for Oracle Database Development
 
DWX 2023 - Datenbank-Schema Deployment im Kubernetes Release
DWX 2023 - Datenbank-Schema Deployment im Kubernetes ReleaseDWX 2023 - Datenbank-Schema Deployment im Kubernetes Release
DWX 2023 - Datenbank-Schema Deployment im Kubernetes Release
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
Free GitOps Workshop (with Intro to Kubernetes & GitOps)
Free GitOps Workshop (with Intro to Kubernetes & GitOps)Free GitOps Workshop (with Intro to Kubernetes & GitOps)
Free GitOps Workshop (with Intro to Kubernetes & GitOps)
 
CMI 2.0 session at Drupal DevDays in Cluj-Napoca
CMI 2.0 session at Drupal DevDays in Cluj-NapocaCMI 2.0 session at Drupal DevDays in Cluj-Napoca
CMI 2.0 session at Drupal DevDays in Cluj-Napoca
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
 
Introduction to git & github
Introduction to git & githubIntroduction to git & github
Introduction to git & github
 
OpenCms Days 2015 Workflow using Docker and Jenkins
OpenCms Days 2015 Workflow using Docker and JenkinsOpenCms Days 2015 Workflow using Docker and Jenkins
OpenCms Days 2015 Workflow using Docker and Jenkins
 
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...
Montreal Kubernetes Meetup: Developer-first workflows (for microservices) on ...
 

More from DataScienceConferenc1

[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf
[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf
[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdfDataScienceConferenc1
 
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...DataScienceConferenc1
 
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdfDataScienceConferenc1
 
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdfDataScienceConferenc1
 
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdfDataScienceConferenc1
 
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptxDataScienceConferenc1
 
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdfDataScienceConferenc1
 
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...DataScienceConferenc1
 
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdfDataScienceConferenc1
 
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...DataScienceConferenc1
 
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...DataScienceConferenc1
 
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdfDataScienceConferenc1
 
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptxDataScienceConferenc1
 
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...DataScienceConferenc1
 
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptxDataScienceConferenc1
 
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...DataScienceConferenc1
 
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...DataScienceConferenc1
 
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptxDataScienceConferenc1
 
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptxDataScienceConferenc1
 
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdfDataScienceConferenc1
 

More from DataScienceConferenc1 (20)

[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf
[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf
[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf
 
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...
 
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf
 
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf
 
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf
 
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx
 
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf
 
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...
 
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf
 
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...
 
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...
 
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf
 
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx
 
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...
 
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx
 
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...
 
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...
 
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx
 
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx
 
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf
 

Recently uploaded

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computationsit20ad004
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/managementakshesh doshi
 

Recently uploaded (20)

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 

[DSC Europe 23] Matteo Molteni - Implementing a Robust CI Workflow with dbt for Optimizing Data Warehouse Quality

  • 1. Implementing a robust CI workflow with dbt for optimizing data warehouse quality Maeo Molteni Data Engineer, Data Enablement maeo.molteni@kognic.com
  • 2. About Kognic ● Mid-size company (~100 employees) ● Mature operational data ● Average-sized data (Tb/bn) ● Goal: enable analytics at scale by creating a collaborative data platform Goal and constraints ● Creation of a collaborative data platform: ○ Fast to deploy, easy to maintain ○ Limit budget, favour open-source solutions ○ Team of ~2 data engineers Existing setup ● GCP + Kubernetes ● Well-established plateng team for facilitating deployments Some context
  • 3. ● Minimize the number of tools used ● Extract/Load via dockerized python app ● Pipelines only for ingestion (no transformations) ● All the Transformations are managed via dbt ○ dbt tweaked into a catalog and data governance tool Our ELT architecture
  • 4. ● dbt (data build tool) is the “Transformation” in ELT ● No compute engine, it piggybacks on your warehouse ● Version control everything in the warehouse ● Each transformation/model is a *.sql file. dbt: ○ compiles templates into plain SQL ○ resolves dependencies ○ executes sequentially the code in your warehouse dbt for orchestrating data transformation
  • 5. ● Seamless integration with Airflow for scheduling ○ DAGs are dynamically generated (parse json or use Astro SDK) ○ do not use bash operator! ● Latest dbt manifest kept in sync with Airflow ● Multiple targets (test/dev/production) ● Modular dag -> easy to route alerts to model owners (e.g. based on some metadata) dbt - how it integrates with Airflow
  • 6. ● Base code version controlled in GIT ● Same approach as in software development, SQLis treated as code: ○ Version control ○ Linting ○ Testing, testing, testing ○ PR-review dbt - local development - sql as code
  • 7. ● Minimize boilerplate with dbt codegen ● Automated generation of boilerplate for documentation for sources ● Automated generation of staging model ● Automated propagation of documentation to downstream models dbt - local development - lower the threshold for developers
  • 8. ● sqlflu is a python based linting for SQL ● New major 2.0 release March 2023 ● It comes with an optional dbt-templater ● Fully customizable: ○ Define a set of rules ○ “sqlflu fix” will auto-format most of the code ○ Enforced as first step in CI workflow ● Result: a common standard for all your sql files dbt CI - Lint your SQL
  • 9. ● dbt provides packages for testing both the data models and the metadata ● Tests as a one-liner in *.yaml ○ dbt-core: basic tests ○ dbt-expectations (~ assertions in unit testing) ○ dbt-meta-testing (metadata/project) ● Having “some test coverage” is a requirement enforced in the CI pipeline (dbt run-operation required_tests) for critical data dbt CI - Test your models
  • 10. ● Documentation kept together with the *.sql code ● Docs compiled into static HTMLcontent at merge. ○ Hosted on dbt.kognic.io ○ Catalog with metadata and basic profiling ○ Data lineage, columns description, and ownership ○ Compiled/uncompiled code generating the table(s) ○ Data discovery functionality ● Having complete documentation is a requirement enforced in the CI pipeline (dbt run-operation required_docs) dbt CI - Documentation, lineage, and data discovery
  • 11. Typical issues faced by the data platform team when developing and maintaining the platform: - Seing up a test/dev environment - Handling conflicts when developing - Monolithic architecture - Cost control - Handling changes - Unwanted breaking changes - Rollout of major changes CD and the maintainer experience
  • 12. Tables in dbt production environment can only be materialized by Airflow jobs, users do not have write access to prod A staging environment is used for everything else: - Local runs - CI workflows - Central config with dynamic datasets based on current branch name (CI) or user name (local): - Compartmentalised dev environments - Separation of concerns - Enforced lifecycle rules for keeping staging clean and avoiding the creation of long term dependencies dbt CD - production & staging project
  • 13. The main risk of having one environment per branch and user is the cost skyrocketing, since would out-of-the-box materialise the model and all the upstream dependencies in their entirety. dbt CD - defer and filter for reducing data consumed Two obvious ways to keep costs under control: - Macro for reducing the volume of data used in staging (opt-in) based on partitions - ----defer upstream to production environment (opt-out)
  • 14. Typical issues faced by the data platform team when developing and maintaining the platform: - Seing up a test/dev environment - Handling conflicts when developing - Monolithic architecture - Cost control - Handling changes - Unwanted breaking changes - Rollout of major changes CD and the maintainer experience
  • 15. Tests can capture unwanted behaviour after the model is created, which sometimes is too late Model exposed to a broader audience, or to an exposure can have an extra layer of controls by mean of a contract. Contracts are enforced before the model is materialised, and defines the “shape” of the model, not the business logic therein dbt CD - contracts
  • 16. Breaking changes are sometimes needed and in a collaborative environment it’s crucial having a smooth transition. dbt CD - versions for smooth rollout of changes model versioning: - Base model defined in *.yml - One *.sql file per model - only di in the *yml file - A view on top of versioned models pointing at “the latest” - Older version can keep on existing until when no downstream dependencies exists - Challenge: need to ensure we eventually have one version of each model
  • 17. ● “If it is not in git, it does not exist” ● User experience: ○ codegen for reducing boilerplate ○ lint your sql (e.g. with sqlflu) ○ test metadata before merging ■ complete catalog for data discovery “for free” ● Maintainer experience: ○ Separate concerns by having distinct environments between dev and prod, and within dev ○ Reduce costs with ----defer and partitions pruning in dev ○ Prevent or control changes: ■ model versions for controlling major changes ■ contracts for preventing changes to “exposed” data Some key learnings
  • 18. Thanks for listening! Maeo Molteni Data Engineer, Data Enablement maeo.molteni@kognic.com