Data Contracts
Consensus as Code
Ryan Collingwood
2023-08-18
Who am I and my current context
• Ryan Collingwood, Head of Data & Analytics at Oroton
• Australia’s oldest luxury fashion company
• Centralised Data Team
• Monoliths (ERP & POS) surrounded by a number of SaaS products
• Data is mostly moved in batch
Why I think you might care about this
Responsibility in the modern data stack
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Shout out to Andrew Jones
https://data-contracts.com/
Similar, Related, and Complementary Concepts
• APIs
• Data Dictionaries
• Data Mesh
• Event Storming
• Data Catalogs
• Domain Driven Design
I’d be curious to know what else you might add to this list
Advice is a form of nostalgia. Dispensing it is a way
of fishing the past from the disposal, wiping it off,
painting over the ugly parts and recycling it for
more than it's worth
Mary Schmich
https://www.chicagotribune.com/columns/chi-schmich-sunscreen-column-column.htm
“If I could offer you only one tip for the future, sunscreen would be it.”
What are Data Contracts?
... outlines how data can get exchanged between two parties.
It defines the structure, format, and rules of exchange in a
distributed data architecture. These formal agreements make
sure that there aren’t any uncertainties or undocumented
assumptions about data.
https://atlan.com/data-contracts/
... is an agreed interface between the generators of data and
its consumers. It sets the expectations around that data,
defines how it should be governed, and facilitates the explicit
generation of quality data that meets the business
requirements.
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Data Producers and Data Consumers
Team A Team B
Team C
You can be a Data Producer without knowing about it
Non-consensual API
Team C
Broken pipelines, broken non-promises
Non-consensual API
Team A
Team B
Team C
One of the largest impediments to addressing data quality at any organization is the
lack of collaboration between data producers and data consumers.
...
A common workaround (is the) proliferation of non-consensual APIs.
Can’t get a software engineer to emit the data you need to solve some business
problem?
Connect your ELT tool to a production source and extract a batch dump on a
schedule.
Easy
(Until things start breaking…whoops).
Chad Sanderson - https://dataproducts.substack.com/p/the-production-grade-data-pipeline
What makes up a Data Contract
https://github.com/PacktPublishing/Driving-Data-Quality-with-Data-Contracts/blob/main/Chapter03/order_events.yaml
However, data contracts are more than just a
schema... we need our data contracts to capture
metadata that describes how the data can be used,
how it is governed, and the controls around the data
Driving Data Quality with Data Contracts - Andrew Jones (2023)
What makes up a Data Contract
• Schema
• Contract Governance
• Semantics
• Service Level Objectives
• Dataset Governance
• Mechanisms of Transmission
• People
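As a rough illustration of the components above, a contract can be held as one nested structure with a section per component. This is only a sketch: the section names mirror the slide, while the `order_events` values, the SLO figure, and the `missing_sections` helper are hypothetical and are not taken from the book's `order_events.yaml`.

```python
# Section names mirror the components listed above; everything inside them is hypothetical.
REQUIRED_SECTIONS = (
    "schema",
    "semantics",
    "service_level_objectives",
    "contract_governance",
    "dataset_governance",
    "mechanisms_of_transmission",
    "people",
)

contract = {
    "name": "order_events",
    "version": "0.1.0",
    "schema": {"order_id": "string", "order_total": "decimal", "placed_at": "timestamp"},
    "semantics": {
        "order_total": "Total in AUD including GST",
        "placed_at": "UTC time the customer confirmed the order",
    },
    "service_level_objectives": {"freshness_hours": 24},
    "contract_governance": {"change_process": "pull request reviewed by producer and consumers"},
    "dataset_governance": {"contains_personal_data": False},
    "mechanisms_of_transmission": {"type": "batch", "format": "csv"},
    "people": {"producer": "Team A", "consumers": ["Team B", "Team C"]},
}


def missing_sections(candidate: dict) -> list[str]:
    """Return the required sections that a contract does not yet define."""
    return [section for section in REQUIRED_SECTIONS if section not in candidate]


# A bare schema, which is where many "contracts" start.
schema_only = {"name": "order_events", "schema": contract["schema"]}

print(missing_sections(contract))
print(missing_sections(schema_only))
```

Running the check on a schema-only document lists everything that still needs a conversation between producer and consumers.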
Schema versus Semantics
Schema | Semantics
Systems interoperability | Human expectations
Support for implicit validation by database technologies | Tends to require explicit validation by complementary solutions
Ensuring we capture and retrieve the data consistently | Ensuring we interpret the data consistently
Dates / times, monetary values - are a trap if considered only as schema.
What are your “schema” but “secretly semantic” situations?
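To make the trap concrete, here is a minimal sketch of two records that satisfy the same schema (a timestamp and a decimal) yet mean different things. The field names and both semantic rules (explicit UTC, dollars with two decimal places) are assumptions made up for illustration, not rules from the talk.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def schema_ok(record: dict) -> bool:
    """Schema check: only the types are inspected."""
    return isinstance(record["placed_at"], datetime) and isinstance(record["total"], Decimal)


def semantic_problems(record: dict) -> list[str]:
    """Semantic checks: hypothetical rules a team might agree on in a contract."""
    problems = []
    placed_at = record["placed_at"]
    # Assumed rule: timestamps must carry an explicit UTC offset.
    if placed_at.tzinfo is None or placed_at.utcoffset() != timedelta(0):
        problems.append("placed_at is not an explicit UTC timestamp")
    # Assumed rule: totals are dollars with exactly two decimal places.
    if record["total"].as_tuple().exponent != -2:
        problems.append("total is not expressed in dollars and cents")
    return problems


agreed = {
    "placed_at": datetime(2023, 8, 18, 9, 0, tzinfo=timezone.utc),
    "total": Decimal("129.00"),
}
# Same types, but is this local time? Is the total in cents?
ambiguous = {
    "placed_at": datetime(2023, 8, 18, 19, 0),
    "total": Decimal("12900"),
}

for record in (agreed, ambiguous):
    print(schema_ok(record), semantic_problems(record))
```

Both records pass the schema check; only the semantic rules separate them, which is why those rules need to be written down somewhere both parties can see.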
Minimum Viable Data Contract Tooling
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Operate
Meta-Data Powered Tooling
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Data Quality Checks
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Data Contract Tooling - My Context
Producer
Boundaries
Semantics
Schema & SLOs
Checks and Tests
Ok so how are we going to make this all happen?
Awesome humans who understand models, abstractions, constraints
You could even do it in ✨code ✨
... and you should definitely version control it
Why Code? Why not Text?
● Entanglement of meaning and representation
● Finding References instead of text matches
● Enforcement of structure
● Refactoring
● Testable constraints
● More options for document generation
○ Including JSON and yaml
Although... I’ve been having a blast using Logseq (a graph-like outliner) and
I might be crazy enough to give that a go as an IDE for this
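A small sketch of a few of the points above, using only the standard library (Pydantic, listed later in the deck, would be a natural alternative). The `Field` class, the field definitions, and `to_json` are hypothetical names for illustration, not the tooling described in the talk.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Field:
    name: str
    dtype: str
    description: str

    def __post_init__(self) -> None:
        # Enforcement of structure: a field without a description cannot be created.
        if not self.description.strip():
            raise ValueError(f"field {self.name!r} needs a description")


# Defined once; every other use is a reference that an IDE or refactoring tool can find.
ORDER_ID = Field("order_id", "string", "Identifier assigned when the order is placed")
ORDER_TOTAL = Field("order_total", "decimal", "Order total in AUD including GST")

ORDER_EVENT_FIELDS = (ORDER_ID, ORDER_TOTAL)
PRIMARY_KEY = ORDER_ID  # a reference, not a retyped string


def to_json(fields: tuple[Field, ...]) -> str:
    """Generate a JSON document from the same definitions."""
    return json.dumps([asdict(f) for f in fields], indent=2)


# Testable constraint: the primary key must be one of the contract's fields.
assert PRIMARY_KEY in ORDER_EVENT_FIELDS

print(to_json(ORDER_EVENT_FIELDS))
```

Renaming `ORDER_ID` is then a refactoring operation rather than a find-and-replace across documents, and the JSON is an output of the definitions rather than the place where they live.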
“Refactoring” Text
Expectation Reality
https://xkcd.com/208/
What was considered
● Scope & Allies
● Constraints & Guiding Principles
● People and Process Centric
● Contract Meta Schema
● Maximise Contribution Opportunities
Guiding Principles
● Primary Objective: Consensus
● Evolution
● Quick Feedback
● First Outcome: Data Tests
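One way "First Outcome: Data Tests" with quick feedback might look, sketched in plain Python so it stays self-contained. The `CONTRACT` mapping, its three fields, and the `violations` function are assumptions for illustration; in practice the same checks could be wrapped in Pytest tests or run over Pandas data frames.

```python
from typing import Any, Callable

Check = Callable[[Any], bool]

# Hypothetical contract: each field maps to (expectation in words, executable check).
CONTRACT: dict[str, tuple[str, Check]] = {
    "order_id": ("a non-empty string", lambda v: isinstance(v, str) and v != ""),
    "quantity": (
        "a positive integer",
        lambda v: isinstance(v, int) and not isinstance(v, bool) and v > 0,
    ),
    "channel": ("one of online, store", lambda v: v in {"online", "store"}),
}


def violations(rows: list[dict[str, Any]]) -> list[str]:
    """Check every row against the contract and describe each failure."""
    found = []
    for index, row in enumerate(rows):
        for field, (expectation, check) in CONTRACT.items():
            if field not in row:
                found.append(f"row {index}: {field} is missing")
            elif not check(row[field]):
                found.append(f"row {index}: {field} should be {expectation}, got {row[field]!r}")
    return found


sample = [
    {"order_id": "A-1", "quantity": 2, "channel": "online"},
    {"order_id": "", "quantity": 0, "channel": "phone"},
    {"order_id": "A-3", "quantity": 1},
]

for message in violations(sample):
    print(message)
```

Because each expectation is stored next to its check, the failure messages use the wording the producer and consumers agreed on.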
Creating a Meta Model
● Focused around Events
● From UI to DB
● Schema and Semantics
● People
... still figuring it out
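A minimal sketch of what an event-centred meta model could look like, assuming one class per idea on the slide. The class names, attributes, and the `order_placed` example are all hypothetical; the talk does not show its actual model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    name: str
    role: str  # for example "producer" or "consumer"


@dataclass(frozen=True)
class Attribute:
    ui_label: str   # what a user sees on screen
    db_column: str  # where the value ends up
    dtype: str      # schema
    meaning: str    # semantics


@dataclass
class Event:
    name: str
    description: str
    attributes: list[Attribute] = field(default_factory=list)
    people: list[Person] = field(default_factory=list)

    def columns(self) -> list[str]:
        """Database columns touched by this event."""
        return [attribute.db_column for attribute in self.attributes]

    def contacts(self, role: str) -> list[str]:
        """Names of the people registered against this event in a given role."""
        return [person.name for person in self.people if person.role == role]


order_placed = Event(
    name="order_placed",
    description="A customer completes checkout",
    attributes=[
        Attribute("Order number", "orders.order_id", "string", "Identifier shown on the receipt"),
        Attribute("Total", "orders.total_amount", "decimal", "Amount charged in AUD including GST"),
    ],
    people=[Person("Team A", "producer"), Person("Team B", "consumer")],
)

print(order_placed.columns())
print(order_placed.contacts("consumer"))
```

Keeping the UI label and the database column on the same object is one way to trace a value from UI to DB without maintaining two separate documents.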
Don’t have to do it all at once!
The optimistic path to capturing and generating contracts
The Event Capture spreadsheet
Who’s Going to Do The Work?
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Probably these people
Hopefully these people
Why Python?
● Gradual Typing*
● Static Analysis
● Well understood within the team
Helpful Python Libraries
● Pandas
● Pydantic
● Rope
● Pytest
● Mypy
● Black
Refactoring, doing variable extraction with Rope
https://colab.research.google.com/drive/1fHLit3hF2G0dFV0Xl11jnovcdPR87s-E
Code Refactoring - Other Libraries
• https://pybowler.io/ - doesn't have variable extraction and not much
development activity in the last while
• https://github.com/hchasestevens/astpath - useful for finding parts of the AST
but then I'm not sure how to proceed with it, seems to be powering a number
of meta-programming libs though
• traad - https://av.tib.eu/en/media/19947
Further explorations for wrangling generated code
• Abstract Syntax Tree - Options for querying
• Linting - Define my own rules as they apply to the meta
schema
• Code duplication detection
• Network (Graph) Analysis
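As a sketch of the first two ideas above, the standard library `ast` module can already query generated code and back a simple custom rule. The `GENERATED` source, both function names, and the "every key needs a `test_<key>` function" rule are assumptions for illustration; the code assumes Python 3.9 or later, where a subscript's `slice` is the expression node itself.

```python
import ast

# Hypothetical generated code: one test function per contract field.
GENERATED = '''
def test_order_id(row):
    assert row["order_id"] != ""

def test_order_total(row):
    assert row["order_total"] >= 0

def helper(row):
    return row["order_id"]
'''


def functions_reading(source: str, key: str) -> list[str]:
    """Names of functions that subscript anything with the given string key."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for child in ast.walk(node):
            if (
                isinstance(child, ast.Subscript)
                and isinstance(child.slice, ast.Constant)
                and child.slice.value == key
            ):
                hits.append(node.name)
                break
    return hits


def untested_keys(source: str, keys: list[str]) -> list[str]:
    """A tiny custom lint rule: every key needs a function named test_<key>."""
    defined = {
        node.name for node in ast.walk(ast.parse(source)) if isinstance(node, ast.FunctionDef)
    }
    return [key for key in keys if f"test_{key}" not in defined]


print(functions_reading(GENERATED, "order_id"))
print(untested_keys(GENERATED, ["order_id", "order_total", "placed_at"]))
```

The first query is a structural "find references" that ignores comments and string look-alikes; the second reports contract fields that the generated code does not yet test.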
linkedin.com/in/ryancollingwood
mastodon.social/@ryancollingwood
twitter.com/ryancollingwood
www.meetup.com/en-AU/data-engineering-melbourne
Key Takeaways
• You can be a Data Producer without knowing about it, make it
worthwhile for Consumers to “register” with you
• You can do this through having a contract which provides clarity and
can be used to power tooling and generate artefacts
• Code is easier to refactor, find references, and generally maintain than
the alternatives
My References
• Andrew Jones - Driving Data Quality with Data Contracts (2023) - ISBN 13 978-1837635009
• Data Contracts: The Key to Scaling Distributed Data Architecture and Reducing Data Chaos -
https://atlan.com/data-contracts/
• Chad Sanderson - The Production-Grade Data Pipeline -
https://dataproducts.substack.com/p/the-production-grade-data-pipeline
• Chad Sanderson and Adrian Kreuziger - An Engineer's Guide to Data Contracts -
https://mlops.community/an-engineers-guide-to-data-contracts-pt-1/
• Green Tree Snakes - the missing Python AST docs - https://greentreesnakes.readthedocs.io/en/latest/
• Rope - Refactoring Variable Extraction -
https://rope.readthedocs.io/en/latest/library.html#performing-refactorings
Questions?

VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 

Data Contracts as Code

  • 1. Data Contracts: Consensus as Code. Ryan Collingwood, 2023-08-18
  • 2. Who am I and my current context
      • Ryan Collingwood, Head of Data & Analytics at Oroton
      • Australia’s oldest luxury fashion company
      • Centralised Data Team
      • Monoliths (ERP & POS) surrounded by a number of SaaS
      • Data is mostly moved in batch
  • 3. Why I think you might care about this. Responsibility in the modern data stack (Andrew Jones - Driving Data Quality with Data Contracts (2023))
  • 4. Shout out to Andrew Jones https://data-contracts.com/
  • 5. Similar, Related, and Complementary Concepts: APIs, Data Dictionaries, Data Catalogs, Data Mesh, Event Storming, Domain Driven Design. I’d be curious to know what else you might add to this list.
  • 6. “Advice is a form of nostalgia. Dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than it's worth.” Mary Schmich, https://www.chicagotribune.com/columns/chi-schmich-sunscreen-column-column.htm “If I could offer you only one tip for the future, sunscreen would be it.”
  • 8. “... outlines how data can get exchanged between two parties. It defines the structure, format, and rules of exchange in a distributed data architecture. These formal agreements make sure that there aren’t any uncertainties or undocumented assumptions about data.” (https://atlan.com/data-contracts/) “... is an agreed interface between the generators of data and its consumers. It sets the expectations around that data, defines how it should be governed, and facilitates the explicit generation of quality data that meets the business requirements.” (Andrew Jones - Driving Data Quality with Data Contracts (2023))
  • 9. Data Producers and Data Consumers [diagram: Team A, Team B, Team C]
  • 10. You can be a Data Producer without knowing about it [diagram: Team C, Non-consensual API]
  • 11. Broken pipelines, broken non-promises [diagram: Team A, Team B, Team C, three Non-consensual APIs, ❌]
  • 12. “One of the largest impediments to addressing data quality at any organization is the lack of collaboration between data producers and data consumers. ... A common workaround [is the] proliferation of non-consensual APIs. Can’t get a software engineer to emit the data you need to solve some business problem? Connect your ELT tool to a production source and extract a batch dump on a schedule. Easy (Until things start breaking…whoops).” Chad Sanderson - https://dataproducts.substack.com/p/the-production-grade-data-pipeline
  • 13. What makes up a Data Contract https://github.com/PacktPublishing/Driving-Data-Quality-with-Data-Contracts/blob/main/Chapter03/order_events.yaml
  • 14. “However, data contracts are more than just a schema... we need our data contracts to capture metadata that describes how the data can be used, how it is governed, and the controls around the data.” Driving Data Quality with Data Contracts - Andrew Jones (2023)
  • 15. What makes up a Data Contract: Schema, Contract Governance, Semantics, Service Level Objectives, Dataset Governance, Mechanisms of Transmission, People
  • 16. Schema versus Semantics
      • Schema: systems interoperability; support for implicit validation by database technologies; ensuring we capture and retrieve the data consistently
      • Semantics: human expectations; tends to require explicit validation by complementary solutions; ensuring we interpret the data consistently
    Dates / times and monetary values are a trap if considered only as schema. What are your “schema” but “secretly semantic” situations?
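The dates/times and monetary values trap on slide 16 can be made concrete with a minimal sketch. It uses only the Python standard library rather than Pydantic, and the `OrderEvent` record, its fields, and the three semantic rules are hypothetical, chosen for illustration. The schema check asks whether each value has the declared type; the semantic check asks whether two teams would interpret the value the same way.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical event record, used only for illustration."""
    order_id: str
    placed_at: datetime
    total: Decimal
    currency: str


def schema_errors(event: OrderEvent) -> list[str]:
    """Schema: do the values have the declared types?"""
    expected = {"order_id": str, "placed_at": datetime,
                "total": Decimal, "currency": str}
    return [
        f"{name}: expected {typ.__name__}"
        for name, typ in expected.items()
        if not isinstance(getattr(event, name), typ)
    ]


def semantic_errors(event: OrderEvent) -> list[str]:
    """Semantics: would two teams read these values the same way?"""
    errors = []
    # A naive datetime is a valid datetime, but which timezone is it in?
    if isinstance(event.placed_at, datetime) and event.placed_at.utcoffset() is None:
        errors.append("placed_at: timezone is ambiguous")
    # Assumed rule: amounts are carried to exactly two decimal places.
    if isinstance(event.total, Decimal) and event.total.as_tuple().exponent != -2:
        errors.append("total: expected exactly two decimal places")
    # Assumed rule: only these currency codes are agreed in the contract.
    if event.currency not in {"AUD", "NZD"}:
        errors.append("currency: not an agreed currency code")
    return errors


ambiguous = OrderEvent("A-1", datetime(2023, 8, 18, 9, 30), Decimal("129.5"), "AUD")
agreed = OrderEvent("A-2", datetime(2023, 8, 18, 9, 30, tzinfo=timezone.utc),
                    Decimal("129.50"), "AUD")

print(schema_errors(ambiguous))    # the record is schema-valid
print(semantic_errors(ambiguous))  # yet two semantic rules fail
print(semantic_errors(agreed))
```

The first record would load into a typed database column without complaint, which is why the semantic rules need their own explicit checks.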
  • 17. Minimum Viable Data Contract Tooling. Andrew Jones - Driving Data Quality with Data Contracts (2023)
  • 18. Meta-Data Powered Tooling Andrew Jones - Driving Data Quality with Data Contracts (2023)
  • 19. Data Quality Checks Andrew Jones - Driving Data Quality with Data Contracts (2023)
  • 20. Data Contract Tooling - My Context
  • 21. Data Contract Tooling - My Context Producer Boundaries
  • 24. Ok so how are we going to make this all happen? Awesome humans who understand models, abstractions, and constraints. You could even do it in ✨code✨ ... and you should definitely version control it.
  • 25. Why Code? Why not Text?
      • Entanglement of meaning and representation
      • Finding References instead of text matches
      • Enforcement of structure
      • Refactoring
      • Testable constraints
      • More options for document generation
          ○ Including JSON and YAML
    Although... I’ve been having a blast using Logseq (a graph-like outliner) and I might be crazy enough to give that a go as an IDE for this.
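A minimal sketch of what the points on slide 25 could look like in practice, using only the standard library. The `Field` and `Contract` classes and the `order_events` example are hypothetical, not the meta model from the talk. The contract is an ordinary Python object, so structure is enforced when it is built, constraints can be tested, and a JSON document is just one generated view of it.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    description: str
    required: bool = True


@dataclass(frozen=True)
class Contract:
    name: str
    version: str
    owner: str
    fields: tuple[Field, ...]

    def __post_init__(self) -> None:
        # Enforcement of structure: reject duplicate field names early.
        names = [f.name for f in self.fields]
        if len(names) != len(set(names)):
            raise ValueError(f"duplicate field names in contract {self.name!r}")

    def to_json(self) -> str:
        # Document generation: JSON is one rendering of the same object.
        return json.dumps(asdict(self), indent=2)


# Hypothetical contract content, for illustration only.
order_id = Field("order_id", "string", "Identifier assigned at checkout")
order_events = Contract(
    name="order_events",
    version="0.1.0",
    owner="data-team@example.com",
    fields=(order_id, Field("total", "decimal", "Order total including tax")),
)

print(order_events.to_json())
```

Because `order_id` is a named Python object rather than a string inside a document, an editor's "find references" lists every contract that uses it, which a plain text search cannot guarantee.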
  • 28. Guiding Principles
      • Primary Objective: Consensus
      • Evolution
      • Quick Feedback
      • First Outcome: Data Tests
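One way to read "First Outcome: Data Tests" is that the checks run against a batch are derived from the contract rather than written by hand. A minimal sketch in plain Python, with a hypothetical contract dictionary and hypothetical rows (the same function could be called from a Pytest test):

```python
# Hypothetical contract: column name -> (expected type, nullable)
CONTRACT = {
    "order_id": (str, False),
    "store_code": (str, False),
    "discount_code": (str, True),
}


def check_rows(rows, contract):
    """Return a list of human-readable violations for a batch of rows."""
    violations = []
    for index, row in enumerate(rows):
        for column, (expected_type, nullable) in contract.items():
            if column not in row:
                violations.append(f"row {index}: missing column {column}")
            elif row[column] is None:
                if not nullable:
                    violations.append(f"row {index}: {column} must not be null")
            elif not isinstance(row[column], expected_type):
                violations.append(
                    f"row {index}: {column} should be {expected_type.__name__}"
                )
    return violations


batch = [
    {"order_id": "A-1", "store_code": "MEL01", "discount_code": None},
    {"order_id": None, "store_code": 42},
]

for violation in check_rows(batch, CONTRACT):
    print(violation)
```

Changing the contract changes the tests, which keeps feedback quick as the agreement evolves.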
  • 29. Creating a Meta Model
      • Focused around Events
      • From UI to DB
      • Schema and Semantics
      • People
    ... still figuring it out. Don’t have to do it all at once!
  • 31. The optimistic path to capturing and generating contracts
  • 32. The Event Capture spreadsheet
  • 33. Who’s Going to Do The Work? [figure annotated “Probably these people” and “Hopefully these people”] Andrew Jones - Driving Data Quality with Data Contracts (2023)
  • 34. Why Python?
      • Gradual Typing*
      • Static Analysis
      • Well understood within the team
  • 35. Helpful Python Libraries: Pandas, Pydantic, Rope, Pytest, Mypy, Black
  • 38. Refactoring, doing variable extraction with Rope https://colab.research.google.com/drive/1fHLit3hF2G0dFV0Xl11jnovcdPR87s-E
  • 39. Refactoring, doing variable extraction with Rope https://colab.research.google.com/drive/1fHLit3hF2G0dFV0Xl11jnovcdPR87s-E
  • 40. Code Refactoring - Other Libraries
      • https://pybowler.io/ - doesn't have variable extraction, and there hasn't been much development activity recently
      • https://github.com/hchasestevens/astpath - useful for finding parts of the AST, but I'm not sure how to proceed from there; it seems to be powering a number of meta-programming libraries though
      • traad - https://av.tib.eu/en/media/19947
  • 41. Further explorations for wrangling generated code
      • Abstract Syntax Tree - Options for querying
      • Linting - Define my own rules as they apply to the meta schema
      • Code duplication detection
      • Network (Graph) Analysis
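As one hedged illustration of the first two points on slide 41, the standard library `ast` module can query generated code and apply a home-made lint rule. The rule here (every `Field(...)` call must pass a `description` keyword) and the `Field` name itself are assumptions made for the example, not part of the talk's meta schema.

```python
import ast

# Hypothetical generated contract code to be linted.
GENERATED_SOURCE = '''
order_id = Field("order_id", "string", description="Identifier assigned at checkout")
total = Field("total", "decimal")
'''


def fields_missing_description(source):
    """Return (line number, field name) for Field(...) calls without a description."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        is_field_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "Field"
        )
        if not is_field_call:
            continue
        keywords = {kw.arg for kw in node.keywords}
        if "description" not in keywords:
            # The first positional argument is assumed to be the field name.
            first = node.args[0] if node.args else None
            name = first.value if isinstance(first, ast.Constant) else "<unknown>"
            problems.append((node.lineno, name))
    return problems


for lineno, name in fields_missing_description(GENERATED_SOURCE):
    print(f"line {lineno}: field {name!r} has no description")
```

The source is only parsed, never executed, so the rule can run over generated code before `Field` is even importable.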
  • 42. Key Takeaways
      • You can be a Data Producer without knowing about it; make it worthwhile for Consumers to “register” with you
      • You can do this by having a contract that provides clarity and can be used to power tooling and generate artefacts
      • Code is easier to refactor, find references in, and generally maintain than the alternatives
    linkedin.com/in/ryancollingwood
    mastodon.social/@ryancollingwood
    twitter.com/ryancollingwood
    www.meetup.com/en-AU/data-engineering-melbourne
  • 43. My References
      • Andrew Jones - Driving Data Quality with Data Contracts (2023) - ISBN 13 978-1837635009
      • Data Contracts: The Key to Scaling Distributed Data Architecture and Reducing Data Chaos - https://atlan.com/data-contracts/
      • Chad Sanderson - The Production-Grade Data Pipeline - https://dataproducts.substack.com/p/the-production-grade-data-pipeline
      • Chad Sanderson and Adrian Kreuziger - An Engineer's Guide to Data Contracts - https://mlops.community/an-engineers-guide-to-data-contracts-pt-1/
      • Green Tree Snakes, the missing Python AST docs - https://greentreesnakes.readthedocs.io/en/latest/
      • Rope - Refactoring Variable Extraction - https://rope.readthedocs.io/en/latest/library.html#performing-refactorings