CONFIDENTIAL. Copyright © 1
Dagster - DataOps and MLOps for
Machine Learning Engineers
8+ years swimming in data @
A Researcher, Engineer and Blogger
Agenda
01 Motivation
02 Dagster's philosophy
03 Dagster 101
04 Dagster DataOps
05 Dagster MLOps
06 Q&A
Motivation
Typical Machine Learning pipeline
Data preparation → Model training → Model serving
Why do we need orchestration?
1. Directed Acyclic Graphs (DAGs)
2. Scheduling and Workflow Management
3. Error Handling and Retry Mechanisms
4. Monitoring and Logging
Source: link
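The needs listed above can be sketched in plain Python. This is a minimal illustration, not how Dagster or any other orchestrator is implemented: `run_pipeline`, `TASKS` and `DEPS` are hypothetical names, and scheduling is left out. It covers dependency ordering over a DAG, retries on failure, and logging.

```python
from graphlib import TopologicalSorter
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("mini_orchestrator")


def run_pipeline(tasks, deps, max_retries=2):
    """Run callables in dependency order, retrying failed tasks.

    tasks: dict of task name -> zero-argument callable
    deps:  dict of task name -> set of upstream task names
    """
    results = {}
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    for name in TopologicalSorter(deps).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                results[name] = tasks[name]()
                log.info("task %s succeeded on attempt %d", name, attempt)
                break
            except Exception as exc:
                log.warning("task %s failed on attempt %d: %s", name, attempt, exc)
                if attempt == max_retries + 1:
                    raise  # retries exhausted, surface the error
    return results


calls = {"train": 0}


def prepare():
    return "clean data"


def train():
    # Fails once to simulate a transient error, then succeeds.
    calls["train"] += 1
    if calls["train"] == 1:
        raise RuntimeError("transient failure")
    return "model"


def serve():
    return "endpoint"


TASKS = {"prepare": prepare, "train": train, "serve": serve}
DEPS = {"prepare": set(), "train": {"prepare"}, "serve": {"train"}}
results = run_pipeline(TASKS, DEPS)
```

An orchestration framework adds the pieces this sketch omits, such as schedules, persistent run history and a UI.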
Orchestration frameworks
Difficulties in answering important questions
• Is this data up-to-date?
• When upstream data is updated, which downstream data is affected?
• How can we manage data versions over time?
• How does the model's performance change over time?
[Figure: ModelOps, DataOps, DevOps, with labels 90%, 10%, 10%]
Dagster's philosophy
Dagster's philosophy: Assets
Reports
Tables
ML Models
Ideas: transition from Imperative to Declarative
• Front end: say goodbye to spaghetti code and complex DOM manipulations with ReactJS
• Dev Ops: infrastructure as code (IaC) with Terraform
• Cluster orchestration: managing containerized applications at scale has never been easier with K8s
• Data job/op data: more accurate and efficient analytics with a data-oriented approach
Dagster 101
• An open-source library used to build ETL and Machine Learning systems (first released in 2018).
• 100+ contributors, 10K commits, 5K stars.
• Used by many innovative organizations.
From Job/Op
def upstream_asset1():
    return 1

def upstream_asset2():
    return 2

def combine_asset(upstream_asset1, upstream_asset2):
    combine = upstream_asset1 + upstream_asset2
    print(f"{upstream_asset1} + {upstream_asset2} = {combine}")
    return combine

# The caller wires the steps together by hand.
result = combine_asset(upstream_asset1(), upstream_asset2())
To assets
from dagster import asset

@asset
def upstream_asset1():
    return 1

@asset
def upstream_asset2():
    return 2

@asset
def combine_asset(context, upstream_asset1, upstream_asset2):
    combine = upstream_asset1 + upstream_asset2
    context.log.info(f"{upstream_asset1} + {upstream_asset2} = {combine}")
    return combine
Asset key: the name of the decorated function (for example, combine_asset).
dagster dev -f <file_name.py>
from dagster import asset

@asset
def upstream_asset1():
    return 1

@asset
def upstream_asset2():
    return 2

@asset
def combine_asset(context, upstream_asset1, upstream_asset2):
    combine = upstream_asset1 + upstream_asset2
    context.log.info(f"{upstream_asset1} + {upstream_asset2} = {combine}")
    return combine
Upstream asset key: each parameter name of combine_asset (other than context) refers to an upstream asset.
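To make the idea of wiring dependencies by parameter name concrete, here is a conceptual sketch in plain Python. It is not Dagster's implementation: `simple_asset`, `REGISTRY` and `materialize_all` are hypothetical names invented for this illustration, and the sketch has no cycle detection.

```python
import inspect

REGISTRY = {}


def simple_asset(fn):
    """Register fn under its own name (a stand-in for an asset key)."""
    REGISTRY[fn.__name__] = fn
    return fn


def materialize_all():
    """Compute every registered function, resolving inputs by parameter name."""
    values = {}

    def compute(name):
        if name not in values:
            fn = REGISTRY[name]
            # Each parameter name is looked up as the key of an upstream function.
            upstream = inspect.signature(fn).parameters
            values[name] = fn(**{dep: compute(dep) for dep in upstream})
        return values[name]

    for name in REGISTRY:
        compute(name)
    return values


@simple_asset
def upstream_asset1():
    return 1


@simple_asset
def upstream_asset2():
    return 2


@simple_asset
def combine_asset(upstream_asset1, upstream_asset2):
    return upstream_asset1 + upstream_asset2


values = materialize_all()
print(values["combine_asset"])
```

The point of the declarative style is that no caller has to pass upstream results around by hand. The framework derives the graph from the definitions.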
Dagster DataOps
Modularity:
• Designed with a modular architecture → complex data pipelines are easy to organize.
• Provides a clear separation between data processing logic, data management, and infrastructure management.
Flexibility:
• Supports a wide range of data sources, including databases, application programming interfaces (APIs), and file systems.
• Provides integrations with popular data frameworks (Apache Airflow, Apache Spark) → easy integration into existing data pipelines.
Debugging and testing:
• Provides tools to debug and test data pipelines → errors are easy to identify and fix.
• A powerful UI allows data pipeline visualization and progress tracking.
Supportive Community:
• Dagster has a community of active users and contributors who continuously add new features and improve the framework.
Visualization and debugging
Dagster comes with Dagit, a graphical user interface that allows ML engineers to visualize pipelines, monitor execution progress, and debug issues using detailed logs and error messages.
Detailed logs and error messages
1st: Organize complex data pipelines
• Where does the data come from?
• How is the data computed?
• Is this data up-to-date?
• When upstream data is updated, which downstream data is affected?
2nd: Easy integration into existing tech stacks
Just install:
pip install dagster dagit
And materialize your assets:
from dagster import materialize
from my_assets import my_first_asset  # defined in my_assets.py

if __name__ == "__main__":
    result = materialize(assets=[my_first_asset])
Extensibility and integration: Dagster has a rich ecosystem of libraries and plugins that support various tools and platforms related to machine learning, data processing, and infrastructure. This extensibility allows ML engineers to integrate Dagster with existing tools and systems.
3rd: Asset change detection
If the latest version of combine_asset was created before the latest version of upstream_asset1 or upstream_asset2, then combine_asset may be obsolete. Dagster flags this with the "upstream changed" indicator.
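The rule stated above can be sketched as a timestamp comparison. This is only an illustration of the idea, not Dagster's internal logic. The function name `stale_assets` and the timestamps are made up for the example.

```python
from datetime import datetime


def stale_assets(last_materialized, upstream):
    """Return assets that are older than at least one direct upstream asset.

    last_materialized: dict of asset name -> datetime of its latest version
    upstream:          dict of asset name -> list of direct upstream asset names
    """
    stale = set()
    for asset, parents in upstream.items():
        if any(last_materialized[p] > last_materialized[asset] for p in parents):
            stale.add(asset)
    return stale


# Hypothetical timestamps: upstream_asset1 was refreshed after combine_asset.
last_materialized = {
    "upstream_asset1": datetime(2024, 1, 2, 9, 0),
    "upstream_asset2": datetime(2024, 1, 1, 9, 0),
    "combine_asset": datetime(2024, 1, 1, 12, 0),
}
upstream = {"combine_asset": ["upstream_asset1", "upstream_asset2"]}

stale = stale_assets(last_materialized, upstream)
print(stale)
```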
4th: IOManager: reduce data streamline complexity
Write Once, use everywhere!
CSVIOManager - handle_output() & load_input()
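The slide's code is not preserved in this text, so the following is a simplified stand-in rather than the original CSVIOManager. In Dagster, an IO manager subclasses `IOManager`, and `handle_output(context, obj)` and `load_input(context)` receive context objects. This sketch, with the hypothetical class name `CSVIOManagerSketch`, takes a plain asset name instead and uses only the standard library, to show the "write once, use everywhere" idea: storage logic lives in one place and asset code never touches file paths.

```python
import csv
import tempfile
from pathlib import Path


class CSVIOManagerSketch:
    """Stores each asset's rows as <base_dir>/<asset_name>.csv."""

    def __init__(self, base_dir):
        self.base_dir = Path(base_dir)

    def _path(self, asset_name):
        return self.base_dir / f"{asset_name}.csv"

    def handle_output(self, asset_name, rows):
        """Write a non-empty list of dicts (all with the same keys) to CSV."""
        with self._path(asset_name).open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)

    def load_input(self, asset_name):
        """Read the CSV back as a list of dicts (all values come back as strings)."""
        with self._path(asset_name).open(newline="") as f:
            return list(csv.DictReader(f))


io_manager = CSVIOManagerSketch(tempfile.mkdtemp())
io_manager.handle_output("users", [{"id": 1, "name": "an"}, {"id": 2, "name": "binh"}])
loaded = io_manager.load_input("users")
print(loaded)
```

Swapping CSV for another storage format would then mean replacing this one class, with no change to the code that produces or consumes the data.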
Dagster MLOps
Benefits of building machine learning pipelines in Dagster
• Dagster makes iterating on machine learning models and testing easy, and it is designed to be used during the development process.
• Dagster's lightweight execution model means you can access the benefits of an orchestrator, like re-executing from the middle of a pipeline and parallelizing steps, while you're experimenting.
• Dagster models data assets, not just tasks, so it understands the upstream and downstream data dependencies.
• Dagster is a one-stop shop for both the data transformations and the models that depend on the data transformations.
Typical Machine Learning pipeline
Data preparation → Model training → Model serving
Organize complex data pipeline (Modeling Pipeline)
Pipeline abstraction: Dagster enables ML engineers to define complex workflows as modular pipelines composed of individual units called assets. This modularity aids in code readability, maintainability, and reusability.
Organize complex data pipeline (Data preparation)
Organize complex data pipeline (Model training)
5th: Debug, test data pipeline
my_assets.py
from dagster import asset

@asset
def my_first_asset(context):
    context.log.info("This is my first asset")
    return 1
test_my_assets.py
from dagster import materialize, build_op_context
from my_assets import my_first_asset

def test_my_first_asset():
    # Run the asset through Dagster in-process and check that the run succeeded.
    result = materialize(assets=[my_first_asset])
    assert result.success
    # Call the asset function directly with a test context.
    context = build_op_context()
    assert my_first_asset(context) == 1
Testing and development: Dagster supports local development and testing by enabling execution of individual assets or entire pipelines independent of the production environment, fostering faster iteration and experimentation.
Tracking model history
Viewing previous versions of a machine learning model can be useful for understanding its evaluation history or for referencing a model that was used for inference. Using Dagster helps you understand:
• What data was used to train the model
• When the model was refreshed
• Which code version and ML model version were used to generate the predictions
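The three questions above map onto a small metadata record. The sketch below is plain Python and not a Dagster API: `ModelRecord`, `fingerprint` and `record_training` are hypothetical names, and the rows and version strings are made-up sample values.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModelRecord:
    model_version: str      # which model produced the predictions
    code_version: str       # which code trained it
    data_fingerprint: str   # what data it was trained on
    trained_at: str         # when it was refreshed (UTC, ISO 8601)


def fingerprint(rows):
    """Stable SHA-256 hash of the training rows (JSON with sorted keys)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record_training(history, rows, model_version, code_version):
    """Append a record describing one training run and return it."""
    record = ModelRecord(
        model_version=model_version,
        code_version=code_version,
        data_fingerprint=fingerprint(rows),
        trained_at=datetime.now(timezone.utc).isoformat(),
    )
    history.append(record)
    return record


history = []
rows_v1 = [{"x": 1.0, "y": 0}, {"x": 2.0, "y": 1}]
rows_v2 = rows_v1 + [{"x": 3.0, "y": 1}]
first = record_training(history, rows_v1, model_version="1", code_version="abc123")
second = record_training(history, rows_v2, model_version="2", code_version="abc123")
```

Comparing `data_fingerprint` across records shows whether two model versions were trained on the same data.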
Monitoring potential model drift and data drift over time
Monitoring and observability: Dagster makes it easier to monitor and track model performance metrics with built-in logging and error-handling, enabling ML engineers to detect issues and ensure the reliability of their machine learning workflows.
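As one possible way to turn drift monitoring into a concrete check, the sketch below measures how far the mean of a current sample has moved from a reference sample, in units of the reference standard deviation. This is a deliberately simple heuristic chosen for illustration, not a Dagster feature. The names `drift_score` and `has_drifted`, the threshold of 2.0 and the sample values are all assumptions.

```python
from statistics import mean, stdev


def drift_score(reference, current):
    """Shift of the current mean from the reference mean, in reference std units."""
    ref_std = stdev(reference)
    if ref_std == 0:
        raise ValueError("reference sample has zero variance")
    return abs(mean(current) - mean(reference)) / ref_std


def has_drifted(reference, current, threshold=2.0):
    """Flag drift when the mean moved more than `threshold` reference std devs."""
    return drift_score(reference, current) > threshold


# Made-up metric values for a reference window and two later windows.
reference = [10.0, 11.0, 9.0, 10.5, 9.5]
stable = [10.2, 9.8, 10.1, 9.9, 10.0]
shifted = [14.0, 15.0, 13.5, 14.5, 15.5]

print(has_drifted(reference, stable), has_drifted(reference, shifted))
```

A check like this could run as a step after each refresh and write its score to the logs, so that changes over time are visible in the run history.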
Dagster’s architecture
Scalability and portability: With Dagster, ML engineers can define pipelines that scale across
different execution environments, such as cloud-based infrastructure, containerization
platforms like Docker, and orchestration tools like Kubernetes.
6th: Transitioning Data Pipelines from Development to Production
Configuration management: With Dagster, ML engineers can manage configurations more efficiently and consistently across various environments, simplifying pipeline and model parameterization.
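A minimal sketch of environment-specific configuration, in plain Python. It shows the general pattern of keeping pipeline code identical while swapping parameters per environment, and it does not use Dagster's own configuration system. `PipelineConfig`, `CONFIGS`, `load_config`, the `PIPELINE_ENV` variable and all values are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    storage_path: str   # where assets are written
    n_estimators: int   # example model hyperparameter
    log_level: str


# One entry per environment; the pipeline code itself does not change.
CONFIGS = {
    "dev": PipelineConfig(storage_path="/tmp/dev_data", n_estimators=10, log_level="DEBUG"),
    "prod": PipelineConfig(storage_path="/data/prod", n_estimators=500, log_level="INFO"),
}


def load_config(env=None):
    """Pick a config by name, falling back to PIPELINE_ENV, then to 'dev'."""
    env = env or os.environ.get("PIPELINE_ENV", "dev")
    try:
        return CONFIGS[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None


config = load_config("prod")
print(config.storage_path, config.n_estimators)
```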
Dagster features to take away
1. Organize complex data pipeline
2. Easy integration into existing tech stacks
3. Asset change detection
4. IOManager: reduce data streamline complexity
5. Debug, test data pipeline
6. Transitioning Data Pipelines from Development to Production
Dagster Pros & Cons
Pros:
• Data Pipeline Orchestration
• Modularity and Reusability
• Data Quality and Validation checks
• Monitoring and Observability
• Community Support
Cons:
• Learning Curve
• Not appropriate for stream processing
Q&A
References
Introducing Software-Defined Assets
Dagster vs. Airflow
Building machine learning pipelines with Dagster
Managing machine learning models with Dagster
Open Source deployment architecture

More Related Content

What's hot

Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup Omid Vahdaty
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderAccelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderDatabricks
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in DeltaDatabricks
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming JobsDatabricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo SanchezGoDataDriven
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentDatabricks
 
Growing the Delta Ecosystem to Rust and Python with Delta-RS
Growing the Delta Ecosystem to Rust and Python with Delta-RSGrowing the Delta Ecosystem to Rust and Python with Delta-RS
Growing the Delta Ecosystem to Rust and Python with Delta-RSDatabricks
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motionconfluent
 

What's hot (20)

Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Introduction to Dremio
Introduction to DremioIntroduction to Dremio
Introduction to Dremio
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderAccelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks Autoloader
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchez
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model Deployment
 
Growing the Delta Ecosystem to Rust and Python with Delta-RS
Growing the Delta Ecosystem to Rust and Python with Delta-RSGrowing the Delta Ecosystem to Rust and Python with Delta-RS
Growing the Delta Ecosystem to Rust and Python with Delta-RS
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motion
 

Similar to Dagster - DataOps and MLOps for Machine Learning Engineers.pdf

Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapNeo4j
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndCloudera, Inc.
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationDataWorks Summit
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Databricks
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...HostedbyConfluent
 
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...Data Con LA
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera, Inc.
 
Re-Platforming Applications for the Cloud
Re-Platforming Applications for the CloudRe-Platforming Applications for the Cloud
Re-Platforming Applications for the CloudCarter Wickstrom
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
 
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Igor De Souza
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...DataStax Academy
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaNeo4j
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningCloudera, Inc.
 
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus WebinarBuild and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus WebinarImpetus Technologies
 
Presentation application change management and data masking strategies for ...
Presentation   application change management and data masking strategies for ...Presentation   application change management and data masking strategies for ...
Presentation application change management and data masking strategies for ...xKinAnx
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Cloudera, Inc.
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 

Similar to Dagster - DataOps and MLOps for Machine Learning Engineers.pdf (20)

Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
 
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made Easy
 
Re-Platforming Applications for the Cloud
Re-Platforming Applications for the CloudRe-Platforming Applications for the Cloud
Re-Platforming Applications for the Cloud
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
 
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, Cloudera
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine Learning
 
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus WebinarBuild and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
 
CS-Op Analytics
CS-Op AnalyticsCS-Op Analytics
CS-Op Analytics
 
Presentation application change management and data masking strategies for ...
Presentation   application change management and data masking strategies for ...Presentation   application change management and data masking strategies for ...
Presentation application change management and data masking strategies for ...
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 

More from Hong Ong

Feast Feature Store - An In-depth Overview Experimentation and Application in...
Feast Feature Store - An In-depth Overview Experimentation and Application in...Feast Feature Store - An In-depth Overview Experimentation and Application in...
Feast Feature Store - An In-depth Overview Experimentation and Application in...Hong Ong
 
DBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptxDBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptxHong Ong
 
Data Products for Mobile Commerce in Real-time and Real-life.pdf
Data Products for Mobile Commerce in Real-time and Real-life.pdfData Products for Mobile Commerce in Real-time and Real-life.pdf
Data Products for Mobile Commerce in Real-time and Real-life.pdfHong Ong
 
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?Hong Ong
 
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thịDistance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thịHong Ong
 
Nền tảng thuật toán của AI, Machine Learning, Big Data
Nền tảng thuật toán của AI, Machine Learning, Big DataNền tảng thuật toán của AI, Machine Learning, Big Data
Nền tảng thuật toán của AI, Machine Learning, Big DataHong Ong
 
Bắt đầu nghiên cứu Big Data
Bắt đầu nghiên cứu Big DataBắt đầu nghiên cứu Big Data
Bắt đầu nghiên cứu Big DataHong Ong
 
Bắt đầu học data science
Bắt đầu học data scienceBắt đầu học data science
Bắt đầu học data scienceHong Ong
 

More from Hong Ong (8)

Feast Feature Store - An In-depth Overview Experimentation and Application in...
Feast Feature Store - An In-depth Overview Experimentation and Application in...Feast Feature Store - An In-depth Overview Experimentation and Application in...
Feast Feature Store - An In-depth Overview Experimentation and Application in...
 
DBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptxDBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptx
 
Data Products for Mobile Commerce in Real-time and Real-life.pdf
Data Products for Mobile Commerce in Real-time and Real-life.pdfData Products for Mobile Commerce in Real-time and Real-life.pdf
Data Products for Mobile Commerce in Real-time and Real-life.pdf
 
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
 
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thịDistance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
 
Nền tảng thuật toán của AI, Machine Learning, Big Data
Nền tảng thuật toán của AI, Machine Learning, Big DataNền tảng thuật toán của AI, Machine Learning, Big Data
Nền tảng thuật toán của AI, Machine Learning, Big Data
 
Bắt đầu nghiên cứu Big Data
Bắt đầu nghiên cứu Big DataBắt đầu nghiên cứu Big Data
Bắt đầu nghiên cứu Big Data
 
Bắt đầu học data science
Bắt đầu học data scienceBắt đầu học data science
Bắt đầu học data science
 

Dagster - DataOps and MLOps for Machine Learning Engineers.pdf

  • 1. CONFIDENTIAL. Copyright © 1 Dagster - DataOps and MLOps for Machine Learning Engineers
  • 2. 8+ years swimming in data @ A Researcher, Engineer and Blogger
  • 3. Agenda: 01 Motivation · 02 Dagster's philosophy · 03 Dagster 101 · 04 Dagster DataOps · 05 Dagster MLOps · 06 Q&A
  • 5. Typical Machine Learning pipeline: Data Preparation → Model training → Serving model
  • 6. Why do we need orchestration? 1. Directed Acyclic Graphs (DAGs) 2. Scheduling and Workflow Management 3. Error Handling and Retry Mechanisms 4. Monitoring and Logging. Source: link
  • 7. Orchestration frameworks
  • 8. Difficulties in answering important questions • Is this data up to date? • When upstream data is updated, which downstream data is affected? • How can we manage data versions over time? • How does the model's performance change over time? Chart labels: ModelOps, DataOps, DevOps; 90%, 10%, 10%
  • 9. Dagster's philosophy
  • 10. Dagster's philosophy: Assets (Reports, Tables, ML Models)
  • 11. Ideas: transition from Imperative to Declarative • Front end: Say goodbye to spaghetti code and complex DOM manipulations with ReactJS. • DevOps: Infrastructure as code (IaC) with Terraform. • Cluster orchestration: Managing containerized applications at scale has never been easier with K8s. • Data: More accurate and efficient analytics with a data-oriented approach (job/op → data).
  • 12. Dagster 101
  • 13. • An open-source library used to build ETL and Machine Learning systems (first released in 2018). • 100+ contributors, 10K commits, 5K stars. • Used by many innovative organizations.
  • 14. From Job/Op
      def upstream_asset1():
          return 1

      def upstream_asset2():
          return 2

      def combine_asset(upstream_asset1, upstream_asset2):
          combine = upstream_asset1 + upstream_asset2
          print(f"{upstream_asset1} + {upstream_asset2} = {combine}")
          return combine

      result = combine_asset(upstream_asset1(), upstream_asset2())
  • 15. To assets
      from dagster import asset

      @asset
      def upstream_asset1():
          return 1

      @asset
      def upstream_asset2():
          return 2

      @asset
      def combine_asset(context, upstream_asset1, upstream_asset2):
          combine = upstream_asset1 + upstream_asset2
          context.log.info(f"{upstream_asset1} + {upstream_asset2} = {combine}")
          return combine

      Slide callout: Asset key (the name of the decorated function, e.g. upstream_asset1).
  • 16. dagster dev -f <file_name.py>
      from dagster import asset

      @asset
      def upstream_asset1():
          return 1

      @asset
      def upstream_asset2():
          return 2

      @asset
      def combine_asset(context, upstream_asset1, upstream_asset2):
          combine = upstream_asset1 + upstream_asset2
          context.log.info(f"{upstream_asset1} + {upstream_asset2} = {combine}")
          return combine

      Slide callout: Upstream asset key (the parameter names of combine_asset refer to the upstream assets).
  • 17. Dagster DataOps
  • 18. Modularity: • Designed with a modular architecture → complex data pipelines are easy to organize. • Provides a clear separation between data processing logic, data management, and infrastructure management. Flexibility: • Supports a wide range of data sources, including databases, application programming interfaces (APIs), and file systems. • Provides integration with popular data processing frameworks (Apache Airflow, Apache Spark) → easy integration into existing data pipelines. Debugging and testing: • Provides tools to debug and test data pipelines → errors are easy to identify and fix. • A powerful UI allows data pipeline visualization and progress tracking. Supportive community: • Dagster has a community of active users and contributors who continuously add new features and improve the framework.
  • 19. Visualization and debugging: Dagster comes with Dagit, a graphical user interface that allows ML engineers to visualize pipelines, monitor execution progress, and debug issues using detailed logs and error messages.
  • 20. Detailed logs and error messages
  • 21. 1st: Organize complex data pipelines • Where does the data come from? • How is the data computed? • Is this data up to date? • When upstream data is updated, which downstream data is affected?
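
The last question on slide 21 (which downstream data is affected when an upstream asset changes) comes down to walking the dependency graph. Below is a minimal plain-Python sketch of that idea; it does not use Dagster's API, and the `UPSTREAMS` map, the `report` asset, and the `downstream_of` helper are hypothetical names chosen for illustration.

```python
from collections import deque

# Hypothetical dependency map: asset name -> the upstream assets it reads.
UPSTREAMS = {
    "upstream_asset1": [],
    "upstream_asset2": [],
    "combine_asset": ["upstream_asset1", "upstream_asset2"],
    "report": ["combine_asset"],  # illustrative extra asset, not from the slides
}


def downstream_of(changed, upstreams):
    """Return every asset that directly or transitively depends on `changed`."""
    # Invert the map so we can walk from an asset to its consumers.
    consumers = {name: [] for name in upstreams}
    for name, deps in upstreams.items():
        for dep in deps:
            consumers[dep].append(name)

    # Breadth-first walk over the consumer edges.
    affected, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in consumers[current]:
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


print(sorted(downstream_of("upstream_asset1", UPSTREAMS)))
```

With assets declared as in slides 15 and 16, an orchestrator can derive a map like this from the function signatures instead of requiring it to be written by hand.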
  • 22. 2nd: Easy integration into existing tech stacks
      Just install:
          pip install dagster dagit
      And materialize your assets:
          from dagster import materialize
          from my_assets import my_first_asset  # the asset defined in my_assets.py (slide 32)

          if __name__ == "__main__":
              result = materialize(assets=[my_first_asset])
      Extensibility and integration: Dagster has a rich ecosystem of libraries and plugins that support various tools and platforms related to machine learning, data processing, and infrastructure. This extensibility allows ML engineers to integrate Dagster with existing tools and systems.
  • 23. 3rd: Asset change detection. If the latest version of combine_asset was created before the latest version of upstream_asset1 or upstream_asset2, then combine_asset may be obsolete. Dagster flags this with the "upstream changed" indicator.
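
The rule on slide 23 can be written as a single timestamp comparison. The sketch below is plain Python and only restates the slide's rule; it is not a description of how Dagster computes the indicator internally, and the function name and timestamps are hypothetical.

```python
from datetime import datetime


def is_possibly_stale(asset_materialized_at, upstream_materialized_at):
    """True if any upstream was materialized after the asset itself."""
    return any(ts > asset_materialized_at for ts in upstream_materialized_at)


# Hypothetical timestamps: combine_asset was built on Jan 1 at noon,
# while one of its upstream assets was rebuilt the next morning.
combine_at = datetime(2024, 1, 1, 12, 0)
upstream_at = [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 2, 8, 0)]

print(is_possibly_stale(combine_at, upstream_at))  # True
```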
  • 24. 4th: IOManager: reduce data streamline complexity. Write once, use everywhere!
  • 25. CSVIOManager - handle_output() & load_input()
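
Slide 25 names the two methods an IO manager provides. The sketch below mirrors that two-method shape in plain Python with the standard `csv` module. It is a simplification under stated assumptions: in Dagster both methods receive a context object, whereas here a plain asset name is passed, and the class name `CsvIOManagerSketch` and the file layout are hypothetical.

```python
import csv
import os
import tempfile


class CsvIOManagerSketch:
    """Stores each asset's rows as <base_dir>/<asset_name>.csv (illustrative only)."""

    def __init__(self, base_dir):
        self.base_dir = base_dir

    def _path(self, asset_name):
        return os.path.join(self.base_dir, f"{asset_name}.csv")

    def handle_output(self, asset_name, rows):
        # Called after an asset is computed: persist its rows.
        # Assumes `rows` is a non-empty list of dicts sharing the same keys.
        with open(self._path(asset_name), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)

    def load_input(self, asset_name):
        # Called before a downstream asset runs: read the stored rows back.
        with open(self._path(asset_name), newline="") as f:
            return list(csv.DictReader(f))


manager = CsvIOManagerSketch(tempfile.mkdtemp())
manager.handle_output("upstream_asset1", [{"id": 1, "value": 10}])
loaded = manager.load_input("upstream_asset1")
print(loaded)
```

Because storage lives in one place, the asset functions themselves only return and receive values, which is the point of "write once, use everywhere". Note that CSV round-trips every value as a string, so a real implementation would need to restore types.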
  • 26. Dagster MLOps
  • 27. Benefits of building machine learning pipelines in Dagster • Dagster makes iterating on machine learning models and testing easy, and it is designed for use during the development process. • Dagster's lightweight execution model means you can access the benefits of an orchestrator, like re-executing from the middle of a pipeline and parallelizing steps, while you're experimenting. • Dagster models data assets, not just tasks, so it understands the upstream and downstream data dependencies. • Dagster is a one-stop shop for both the data transformations and the models that depend on the data transformations.
  • 28. Typical Machine Learning pipeline: Data Preparation → Model training → Serving model
  • 29. Organize complex data pipeline (Modeling Pipeline). Pipeline abstraction: Dagster enables ML engineers to define complex workflows as modular pipelines composed of individual units called assets. This modularity aids code readability, maintainability, and reusability.
  • 30. Organize complex data pipeline (Data preparation)
  • 31. Organize complex data pipeline (Model training)
  • 32. 5th: Debug, test data pipeline
      my_assets.py
          from dagster import asset

          @asset
          def my_first_asset(context):
              context.log.info("This is my first asset")
              return 1
      test_my_assets.py
          from dagster import materialize, build_op_context
          from my_assets import my_first_asset

          def test_my_first_asset():
              # Materialize through Dagster and check that the run succeeded.
              result = materialize(assets=[my_first_asset])
              assert result.success
              # Invoke the asset directly with a test context.
              context = build_op_context()
              assert my_first_asset(context) == 1
      Testing and development: Dagster supports local development and testing by enabling execution of individual assets or entire pipelines independent of the production environment, fostering faster iteration and experimentation.
  • 33. Tracking model history: Viewing previous versions of a machine learning model can be useful for understanding the evaluation history or for referencing a model that was used for inference. Using Dagster will enable you to understand: • What data was used to train the model • When the model was refreshed • Which code version and ML model version were used to generate the predictions
  • 34. Monitoring potential model drift and data drift over time. Monitoring and observability: Dagster makes it easier to monitor and track model performance metrics with built-in logging and error handling, enabling ML engineers to detect issues and ensure the reliability of their machine learning workflows.
  • 35. Dagster's architecture. Scalability and portability: With Dagster, ML engineers can define pipelines that scale across different execution environments, such as cloud-based infrastructure, containerization platforms like Docker, and orchestration tools like Kubernetes.
  • 36. 6th: Transitioning Data Pipelines from Development to Production. Configuration management: With Dagster, ML engineers can manage configurations more efficiently and consistently across various environments, simplifying pipeline and model parameterization.
  • 37. Dagster features to take away 1. Organize complex data pipeline 2. Easy integration into existing tech stacks 3. Asset change detection 4. IOManager: reduce data streamline complexity 5. Debug, test data pipeline 6. Transitioning Data Pipelines from Development to Production
  • 38. Dagster Pros & Cons. Pros: • Data Pipeline Orchestration • Modularity and Reusability • Data Quality and Validation checks • Monitoring and Observability • Community Support. Cons: • Learning Curve • Not appropriate for stream processing
  • 40. References: Introducing Software-Defined Assets · Dagster vs. Airflow · Building machine learning pipelines with Dagster · Managing machine learning models with Dagster · Open Source deployment architecture