SlideShare a Scribd company logo
1 of 32
Download to read offline
Patterns and Anti-patterns for
Memorializing Data Science
Project Artifacts
Derrick Higgins
Senior Director, Enterprise Data Science
Sonjia Waxmonsky
Senior Data Scientist, Enterprise Data Science
We are the fourth-largest private insurer in the US, and the
largest customer-owned insurer.
We are a not-for-profit insurer. We view health care financing,
access and delivery with a long-term perspective that promotes
the entire health care system, not just the company's position.
We insure one in twelve American adults—16 million members.
We have been recognized by Forbes magazine as one of the top
workplaces for women, for diversity and for LGBTQ equality
Keep it all together
Code fragmentation
Hey—can you tell me if something changed
with the way last_visit_date is calculated?
Sean made some updates, but nothing that
should change your data stream
It seems … different from last week
Can you share what he did?
I can ask him to email you a snapshot of the
code
How are you showing the model scores for
each day in the dashboard?
The average
Average of all scores for a day?
Hmm; don’t remember
Or average by members who visit on a day?
Can I get back to you tomorrow?
Team fragmentation
Problems
• Inefficient and error-prone
manual interfaces between
teams
• Lack of reproducibility
• Challenges for quality
assurance and governance
Conway’s law: Any organization that designs a system (defined broadly) will
produce a design whose structure is a copy of the organization's
communication structure
Data Team
ETL
Infrastructure Team
Resource Specification
Data Science Team
Modeling
Front-End Team
UI
Defragmentation
Goals
• Transparency (across teams)
• Common/linked versioning
• Interoperability
Obstacles
• Differences in tooling
• Differences in technical sophistication
• Manual processes for which there is no code
• Politics…
Strategies
Single shared repository
• The ultimate in
transparency and common
versioning
• Depends on shared
processes and standards
for contributions
• Difficult to achieve in
large, matrixed
organizations
Linking of versions across
project repos
• Allows for transparency
and reproducibility
• Looser coupling between
related development
efforts
• Still has certain technical
prerequisites (versioning)
Documentation and cross-
linking
• If nothing else, at least
document dependencies
and consumers as best
possible
• Links to where code lives,
even if not versioned
• Document stakeholders
Your project is not a folder
File folders
My household
management project
Barnard & Fein, 1958. Organization and Retrieval of
Records Generated in a Large-Scale Engineering Project
Problem: Lack of versioning
• Most file systems do
not support versioning
• When they do (e.g.,
SharePoint, S3
buckets), they are not
set up to track
metadata related to
changes
Problem: Catastrophic failure
• If the code only
lives on your
laptop, it lives and
dies with your
laptop
• Even a disk in a
“secure” location
can become
corrupt, fall victim
to malware, or
catch fire
Problem: Obstacles to collaboration
What about a shared drive?
• Exacerbates other problems
– More users ⟶ greater risk of accidental
deletion or corruption
– Versioning by convention breaks down
with larger development teams
• Doesn’t solve versioning problem
• Assumes uninterrupted connectivity
Project is not discoverable
• Collaborators only get insight into code
state when the maintainer sends a copy
• (And they need to know to ask)
Project does not support a workflow
for parallel changes made by multiple
contributors
• Editing a copy of the code creates
irreconcilable forks
Where to store project files
File store(s) must be versioned
File store(s) must be resilient and allow for recovery
File store(s) must be transparent and allow for
independent and asynchronous contributions by
multiple collaborators
Your project is not a git repo
The zeal of the converted
• Git and GitHub provide
versioning, resilience,
transparency and a mechanism
for collaborative development
• But they are not meant for
everything!
• Git (and similar distributed
version control systems) can be
abused for purposes they were
not intended for – especially by
data scientists
Tensorflow: 500 MB;
10 minutes to clone
Other repo: 2.1 GB; 40
minutes to clone
What belongs in GitHub?
😀 😐 😬Diff-able, human-readable,
dynamic, small
Binary, static, large
ETL code
Program code
Configuration files
Experimental scripts
Documentation
Trained models
Input data
Compiled executables
Intermediate files
Bundled resources (images)
Notebooks
Problems
Obstacles to collaboration
▪ Interacting with remote repository slows to a crawl; syncs become a bottleneck in workflow
▪ Meaningless diffs of large / binary / non-editable files obscure consequential changes in
codebase
Obstacles to production deployment
▪ Heterogeneous file types in repository may not all be suitable for a production environment
▪ Could contain sensitive information or simply introduce bloat
Challenges with integration
▪ Intermediate data files stored in repository can make it difficult to ensure consistency with
upstream data sources
Better solutions
Nice tooling incorporated in
Databricks platform!
Other options for data
• Git LFS
• Versioned buckets in S3 and
other blob storage services
Other options for trained
models
• Data science platforms…
• Blob storage linkage
Explain and record decisions
Hidden configurations – Modeling
Model parameters
Threshold
Modeling
population
Aggregation
Hidden configurations – ETL
Patient age at time of admission
Discoveries and EDA as a product
Discoveries and EDA as a product
Support reproducibility:
Store configurations,
artifacts, and results
Modeling outputs – local scope
Modeling outputs – recreating local scope
Modeling tracking with MLflow
Modeling objects
• Hyperparameters
• Model binary
• Test metrics
• Charts, datasets, etc
Modeling tracking with MLflow
Parameterize your code
• Centralize parameters into configuration options and consolidate
• Anticipate and allow future configuration changes
• Be prepared for future changes
YAML and other config file formats
Command-line options
MLflow Tracking / Runs
Feedback
Your feedback is important to us.
Don’t forget to rate and
review the sessions.
Patterns and Anti-Patterns for Memorializing Data Science Project Artifacts

More Related Content

What's hot

Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilNLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilDatabricks
 
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Spark Summit
 
Serverless data pipelines gcp
Serverless data pipelines gcpServerless data pipelines gcp
Serverless data pipelines gcpCatherine Kimani
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerSri Ambati
 
KnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge baseKnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge baseLaurent Alquier
 
Redash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data LakesRedash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data LakesDatabricks
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the artStavros Kontopoulos
 
Migrating Your Data Platform At a High Growth Startup
Migrating Your Data Platform At a High Growth StartupMigrating Your Data Platform At a High Growth Startup
Migrating Your Data Platform At a High Growth StartupDatabricks
 
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...Databricks
 
Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph BradleyDeploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph BradleyDatabricks
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Databricks
 
Ml infra at an early stage
Ml infra at an early stageMl infra at an early stage
Ml infra at an early stageNick Handel
 
Big Data – A New Testing Challenge
Big Data – A New Testing ChallengeBig Data – A New Testing Challenge
Big Data – A New Testing ChallengeTEST Huddle
 
Introduction to Sparkling Water - Spark Summit East 2016
Introduction to Sparkling Water - Spark Summit East 2016Introduction to Sparkling Water - Spark Summit East 2016
Introduction to Sparkling Water - Spark Summit East 2016Sri Ambati
 
H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16Sri Ambati
 
Advanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLAdvanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLDatabricks
 
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...Databricks
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Databricks
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....Databricks
 

What's hot (20)

Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilNLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
 
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
Insights Without Tradeoffs Using Structured Streaming keynote by Michael Armb...
 
Serverless data pipelines gcp
Serverless data pipelines gcpServerless data pipelines gcp
Serverless data pipelines gcp
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
 
KnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge baseKnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge base
 
Redash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data LakesRedash: Open Source SQL Analytics on Data Lakes
Redash: Open Source SQL Analytics on Data Lakes
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the art
 
Migrating Your Data Platform At a High Growth Startup
Migrating Your Data Platform At a High Growth StartupMigrating Your Data Platform At a High Growth Startup
Migrating Your Data Platform At a High Growth Startup
 
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
 
Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph BradleyDeploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
 
Ml infra at an early stage
Ml infra at an early stageMl infra at an early stage
Ml infra at an early stage
 
Big Data – A New Testing Challenge
Big Data – A New Testing ChallengeBig Data – A New Testing Challenge
Big Data – A New Testing Challenge
 
Introduction to Sparkling Water - Spark Summit East 2016
Introduction to Sparkling Water - Spark Summit East 2016Introduction to Sparkling Water - Spark Summit East 2016
Introduction to Sparkling Water - Spark Summit East 2016
 
H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16
 
Advanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLAdvanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using ML
 
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 

Similar to Patterns and Anti-Patterns for Memorializing Data Science Project Artifacts

FAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech ProposalsFAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech ProposalsFAIRDOM
 
Writing a successful data management plan with the DMPTool
Writing a successful data management plan with the DMPToolWriting a successful data management plan with the DMPTool
Writing a successful data management plan with the DMPToolkfear
 
Planning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive ProjectsPlanning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive Projectsac2182
 
IWMW 2002: The Value of Metadata and How to Realise It
IWMW 2002: The Value of Metadata and How to Realise ItIWMW 2002: The Value of Metadata and How to Realise It
IWMW 2002: The Value of Metadata and How to Realise ItIWMW
 
10 Essentials for Effective Teams Governance
10 Essentials for Effective Teams Governance10 Essentials for Effective Teams Governance
10 Essentials for Effective Teams GovernanceChristian Buckley
 
Pikas Maryland Tech Day 2007
Pikas Maryland Tech Day 2007Pikas Maryland Tech Day 2007
Pikas Maryland Tech Day 2007Christina Pikas
 
Open Data - Oi Sir Tim Hands Off My Spreadsheet
Open Data - Oi Sir Tim Hands Off My SpreadsheetOpen Data - Oi Sir Tim Hands Off My Spreadsheet
Open Data - Oi Sir Tim Hands Off My SpreadsheetSnowflake Software
 
Thriving in an Environment of Change
Thriving in an Environment of ChangeThriving in an Environment of Change
Thriving in an Environment of ChangeNeeraj Bhatia
 
Scalar unstructured data april 28, 2010
Scalar unstructured data april 28, 2010Scalar unstructured data april 28, 2010
Scalar unstructured data april 28, 2010pwtoday
 
GTN-Québec_2010 05-25
GTN-Québec_2010 05-25GTN-Québec_2010 05-25
GTN-Québec_2010 05-25Simon Grant
 
RDA Work Groups Outputs and Adoption - Early WG Report back session
RDA Work Groups Outputs and Adoption - Early WG Report back sessionRDA Work Groups Outputs and Adoption - Early WG Report back session
RDA Work Groups Outputs and Adoption - Early WG Report back sessionResearch Data Alliance
 
Denodo’s Data Catalog: Bridging the Gap between Data and Business
Denodo’s Data Catalog: Bridging the Gap between Data and BusinessDenodo’s Data Catalog: Bridging the Gap between Data and Business
Denodo’s Data Catalog: Bridging the Gap between Data and BusinessDenodo
 
OSMC 2021 | Contributing to open source with the example of icinga (1)
OSMC 2021 | Contributing to open source with the example of icinga (1)OSMC 2021 | Contributing to open source with the example of icinga (1)
OSMC 2021 | Contributing to open source with the example of icinga (1)NETWAYS
 
Automatic and rapid generation of massive knowledge repositories from data
Automatic and rapid generation of massive knowledge repositories from dataAutomatic and rapid generation of massive knowledge repositories from data
Automatic and rapid generation of massive knowledge repositories from dataSIKM
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupScott Mitchell
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) SkillsOscar Corcho
 
What They Won't Tell You About DITA
What They Won't Tell You About DITAWhat They Won't Tell You About DITA
What They Won't Tell You About DITAAlan Houser
 

Similar to Patterns and Anti-Patterns for Memorializing Data Science Project Artifacts (20)

The Road to DITA
The Road to DITAThe Road to DITA
The Road to DITA
 
Digital Destiny
Digital DestinyDigital Destiny
Digital Destiny
 
FAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech ProposalsFAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech Proposals
 
Writing a successful data management plan with the DMPTool
Writing a successful data management plan with the DMPToolWriting a successful data management plan with the DMPTool
Writing a successful data management plan with the DMPTool
 
Planning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive ProjectsPlanning and Managing Digital Library & Archive Projects
Planning and Managing Digital Library & Archive Projects
 
IWMW 2002: The Value of Metadata and How to Realise It
IWMW 2002: The Value of Metadata and How to Realise ItIWMW 2002: The Value of Metadata and How to Realise It
IWMW 2002: The Value of Metadata and How to Realise It
 
10 Essentials for Effective Teams Governance
10 Essentials for Effective Teams Governance10 Essentials for Effective Teams Governance
10 Essentials for Effective Teams Governance
 
Pikas Maryland Tech Day 2007
Pikas Maryland Tech Day 2007Pikas Maryland Tech Day 2007
Pikas Maryland Tech Day 2007
 
Open Data - Oi Sir Tim Hands Off My Spreadsheet
Open Data - Oi Sir Tim Hands Off My SpreadsheetOpen Data - Oi Sir Tim Hands Off My Spreadsheet
Open Data - Oi Sir Tim Hands Off My Spreadsheet
 
Thriving in an Environment of Change
Thriving in an Environment of ChangeThriving in an Environment of Change
Thriving in an Environment of Change
 
Scalar unstructured data april 28, 2010
Scalar unstructured data april 28, 2010Scalar unstructured data april 28, 2010
Scalar unstructured data april 28, 2010
 
GTN-Québec_2010 05-25
GTN-Québec_2010 05-25GTN-Québec_2010 05-25
GTN-Québec_2010 05-25
 
RDA Work Groups Outputs and Adoption - Early WG Report back session
RDA Work Groups Outputs and Adoption - Early WG Report back sessionRDA Work Groups Outputs and Adoption - Early WG Report back session
RDA Work Groups Outputs and Adoption - Early WG Report back session
 
Denodo’s Data Catalog: Bridging the Gap between Data and Business
Denodo’s Data Catalog: Bridging the Gap between Data and BusinessDenodo’s Data Catalog: Bridging the Gap between Data and Business
Denodo’s Data Catalog: Bridging the Gap between Data and Business
 
OSMC 2021 | Contributing to open source with the example of icinga (1)
OSMC 2021 | Contributing to open source with the example of icinga (1)OSMC 2021 | Contributing to open source with the example of icinga (1)
OSMC 2021 | Contributing to open source with the example of icinga (1)
 
Automatic and rapid generation of massive knowledge repositories from data
Automatic and rapid generation of massive knowledge repositories from dataAutomatic and rapid generation of massive knowledge repositories from data
Automatic and rapid generation of massive knowledge repositories from data
 
Organising and Documenting Data
Organising and Documenting DataOrganising and Documenting Data
Organising and Documenting Data
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
 
What They Won't Tell You About DITA
What They Won't Tell You About DITAWhat They Won't Tell You About DITA
What They Won't Tell You About DITA
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 

Recently uploaded (20)

BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

Patterns and Anti-Patterns for Memorializing Data Science Project Artifacts

  • 1. Patterns and Anti-patterns for Memorializing Data Science Project Artifacts Derrick Higgins Senior Director, Enterprise Data Science Sonjia Waxmonsky Senior Data Scientist, Enterprise Data Science
  • 2. We are the fourth-largest private insurer in the US, and the largest customer-owned insurer. We are a not-for-profit insurer. We view health care financing, access and delivery with a long-term perspective that promotes the entire health care system, not just the company's position. We insure one in twelve American adults—16 million members. We have been recognized by Forbes magazine as one of the top workplaces for women, for diversity and for LGBTQ equality
  • 3. Keep it all together
  • 4. Code fragmentation Hey—can you tell me if something changed with the way last_visit_date is calculated? Sean made some updates, but nothing that should change your data stream It seems … different from last week Can you share what he did? I can ask him to email you a snapshot of the code How are you showing the model scores for each day in the dashboard? The average Average of all scores for a day? Hmm; don’t remember Or average by members who visit on a day? Can I get back to you tomorrow?
  • 5. Team fragmentation Problems • Inefficient and error-prone manual interfaces between teams • Lack of reproducibility • Challenges for quality assurance and governance Conway’s law: Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure Data Team ETL Infrastructure Team Resource Specification Data Science Team Modeling Front-End Team UI
  • 6. Defragmentation Goals • Transparency (across teams) • Common/linked versioning • Interoperability Obstacles • Differences in tooling • Differences in technical sophistication • Manual processes for which there is no code • Politics…
  • 7. Strategies Single shared repository • The ultimate in transparency and common versioning • Depends on shared processes and standards for contributions • Difficult to achieve in large, matrixed organizations Linking of versions across project repos • Allows for transparency and reproducibility • Looser coupling between related development efforts • Still has certain technical prerequisites (versioning) Documentation and cross- linking • If nothing else, at least document dependencies and consumers as best possible • Links to where code lives, even if not versioned • Document stakeholders
  • 8. Your project is not a folder
  • 9.
  • 10. File folders My household management project Barnard & Fein, 1958. Organization and Retrieval of Records Generated in a Large-Scale Engineering Project
  • 11. Problem: Lack of versioning • Most file systems do not support versioning • When they do (e.g., SharePoint, S3 buckets), they are not set up to track metadata related to changes
  • 12. Problem: Catastrophic failure • If the code only lives on your laptop, it lives and dies with your laptop • Even a disk in a “secure” location can become corrupt, fall victim to malware, or catch fire
  • 13. Problem: Obstacles to collaboration What about a shared drive? • Exacerbates other problems – More users ⟶ greater risk of accidental deletion or corruption – Versioning by convention breaks down with larger development teams • Doesn’t solve versioning problem • Assumes uninterrupted connectivity Project is not discoverable • Collaborators only get insight into code state when the maintainer sends a copy • (And they need to know to ask) Project does not support a workflow for parallel changes made by multiple contributors • Editing a copy of the code creates irreconcilable forks
  • 14. Where to store project files File store(s) must be versioned File store(s) must be resilient and allow for recovery File store(s) must be transparent and allow for independent and asynchronous contributions by multiple collaborators
  • 15. Your project is not a git repo
  • 16. The zeal of the converted • Git and GitHub provide versioning, resilience, transparency and a mechanism for collaborative development • But they are not meant for everything! • Git (and similar distributed version control systems) can be abused for purposes they were not intended for – especially by data scientists Tensorflow: 500 MB; 10 minutes to clone Other repo: 2.1 GB; 40 minutes to clone
  • 17. What belongs in GitHub? 😀 😐 😬Diff-able, human-readable, dynamic, small Binary, static, large ETL code Program code Configuration files Experimental scripts Documentation Trained models Input data Compiled executables Intermediate files Bundled resources (images) Notebooks
  • 18. Problems Obstacles to collaboration ▪ Interacting with remote repository slows to a crawl; syncs become a bottleneck in workflow ▪ Meaningless diffs of large / binary / non-editable files obscure consequential changes in codebase Obstacles to production deployment ▪ Heterogeneous file types in repository may not all be suitable for a production environment ▪ Could contain sensitive information or simply introduce bloat Challenges with integration ▪ Intermediate data files stored in repository can make it difficult to ensure consistency with upstream data sources
  • 19. Better solutions Nice tooling incorporated in Databricks platform! Other options for data • Git LFS • Versioned buckets in S3 and other blob storage services Other options for trained models • Data science platforms… • Blob storage linkage
  • 20. Explain and record decisions
  • 21. Hidden configurations – Modeling Model parameters Threshold Modeling population Aggregation
  • 22. Hidden configurations – ETL Patient age at time of admission
  • 23. Discoveries and EDA as a product
  • 24. Discoveries and EDA as a product
  • 26. Modeling outputs – local scope
  • 27. Modeling outputs – recreating local scope
  • 28. Modeling tracking with MLflow Modeling objects • Hyperparameters • Model binary • Test metrics • Charts, datasets, etc
  • 30. Parameterize your code • Centralize parameters into configuration options and consolidate • Anticipate and allow future configuration changes • Be prepared for future changes YAML and other config file formats Command-line options MLflow Tracking / Runs
  • 31. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.