1
Overcoming
DataOps hurdles for
ML in Production
August 2020
SANDEEP UTTAMCHANDANI
CHIEF DATA OFFICER and VP OF ENGINEERING
sandeep@unraveldata.com
2
Behind the Scenes of an ML Model in Production
3
(Diagram: DATA → Discover → Prep → Build → Operationalize → ML Model in Production; the four phases are labeled DataOps)
4
Top 10 DataOps Battlescars
Levels of Automation:
Gather technical metadata
Gather operational metadata
Aggregate tribal knowledge
1. “I thought the attribute means something else”
Battlescar:
Incorrect assumptions about what an attribute means, whether the dataset is the source of truth, who owns and commonly uses it, which version to use, and whether it is trustworthy.
Metric: Time to Interpret
Building a Self-Service Metadata Catalog
Intuit
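As a rough illustration of the catalog levels above, the sketch below keeps technical metadata (schema, location), operational metadata (lineage, freshness), and tribal knowledge (owner, attribute meanings, source-of-truth flag) in one record, so the question “what does this attribute mean?” has a single place to be answered. `CatalogEntry`, `MetadataCatalog`, and all field names are hypothetical and are not taken from any particular catalog product.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog record combining the three metadata levels."""

    # Technical metadata: what the dataset looks like and where it lives.
    name: str
    schema: dict[str, str]
    location: str
    # Operational metadata: lineage and freshness.
    upstream: list[str] = field(default_factory=list)
    last_refreshed: Optional[str] = None
    # Tribal knowledge: meaning, ownership, and trust signals.
    owner: str = "unknown"
    attribute_notes: dict[str, str] = field(default_factory=dict)
    source_of_truth: bool = False
    version: int = 1


class MetadataCatalog:
    """Minimal in-memory catalog keyed by dataset name."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Re-registering a dataset replaces the previous entry.
        self._entries[entry.name] = entry

    def describe(self, dataset: str, attribute: str) -> str:
        """Return the documented meaning of an attribute."""
        entry = self._entries[dataset]
        if attribute not in entry.schema:
            raise KeyError(f"{attribute!r} is not an attribute of {dataset!r}")
        return entry.attribute_notes.get(attribute, "undocumented")
```

An attribute nobody has documented comes back as `"undocumented"`, which makes the gap in tribal knowledge visible instead of leaving users to guess.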
7
2. “Where is the dataset I need for my model?”
Battlescar:
While building a customer support forecasting model, we found the data siloed across business units. It took 4+ months of working with data stewards to locate the data attributes required to build the model.
Building a Self-Service Search Service
Levels of Automation:
Indexing of datasets & artifacts
Search relevance ranking
Access control of search results
Metric: Time to Find
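A minimal sketch of the three search levels (indexing, relevance ranking, access control), assuming plain-text dataset descriptions and group-based permissions. Ranking by raw term counts stands in for a real relevance model; `DatasetSearch` and its methods are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class DatasetSearch:
    """Toy dataset search: term-count ranking plus group-based access control."""

    def __init__(self) -> None:
        self._index: dict[str, dict[str, int]] = defaultdict(dict)  # term -> {dataset: count}
        self._acl: dict[str, set[str]] = {}  # dataset -> groups allowed to see it

    def add(self, dataset: str, description: str, allowed_groups: set[str]) -> None:
        """Index a dataset's description (level 1: indexing)."""
        for term in description.lower().split():
            postings = self._index[term]
            postings[dataset] = postings.get(dataset, 0) + 1
        self._acl[dataset] = set(allowed_groups)

    def search(self, query: str, user_groups: set[str]) -> list[str]:
        """Return visible datasets, best match first."""
        scores: dict[str, int] = defaultdict(int)
        for term in query.lower().split():
            for dataset, count in self._index.get(term, {}).items():
                scores[dataset] += count  # level 2: relevance ranking
        # Level 3: hide datasets the caller's groups may not access.
        visible = [d for d in scores if self._acl.get(d, set()) & user_groups]
        return sorted(visible, key=lambda d: (-scores[d], d))
```

Filtering after scoring keeps the sketch short; a production service would more likely apply permissions inside the index so restricted dataset names never leave it.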
9
3. “1,000 rows in the source database: why only 50 rows in the data lake?”
Battlescar:
Correctness, completeness, and timeliness issues arise when moving data daily or hourly from transactional datastores to a centralized data lake.
Metric: Time to Move
Building a Self-Service Data Movement Service
Levels of Automation:
Data ingestion configuration
Data transformation
Change management
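One common completeness check for the scenario above is to reconcile row counts per batch between the source and the lake. The function below is a hypothetical sketch: it assumes both counts were already obtained (for example with a `COUNT(*)` over the same partition on each side) and treats extra rows in the lake as a failure too, since they usually indicate duplicates.

```python
from dataclasses import dataclass


@dataclass
class ReconciliationResult:
    source_rows: int
    lake_rows: int
    missing_rows: int  # negative means the lake has extra (likely duplicated) rows
    complete: bool


def reconcile_counts(source_rows: int, lake_rows: int, tolerance: float = 0.0) -> ReconciliationResult:
    """Compare row counts for one ingestion batch.

    `tolerance` is the fraction of source rows allowed to be missing
    (for example, rows still in flight); 0.0 demands an exact match.
    """
    if source_rows < 0 or lake_rows < 0:
        raise ValueError("row counts must be non-negative")
    missing = source_rows - lake_rows
    allowed = int(source_rows * tolerance)
    return ReconciliationResult(source_rows, lake_rows, missing, 0 <= missing <= allowed)
```

With the numbers from the battlescar, `reconcile_counts(1000, 50)` reports `missing_rows=950` and `complete=False`, which a movement service could use to block downstream jobs.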
10
4. “The job completed, so why is data missing from the dashboard graphs?”
Battlescar:
Jobs are orchestrated using schedulers such as Airflow and Oozie. In several instances, job dependencies were defined incorrectly, causing reporting or model training jobs to be triggered prematurely.
Metric: Time to Orchestrate
Building a Self-Service Orchestration Service
Levels of Automation:
Defining job dependencies
Robust job execution
Production monitoring
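The rule behind defining job dependencies is that a job may start only after every upstream job has succeeded. Rather than Airflow or Oozie, this sketch uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to show that rule in isolation; `run_pipeline` and the `run_job` callback (returning `True` on success) are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(dependencies: dict[str, set[str]], run_job: Callable[[str], bool]) -> list[str]:
    """Run each job only after all of its upstream jobs have succeeded.

    `dependencies` maps a job to the set of jobs it depends on.
    Returns the jobs in the order they were executed.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a cycle
    executed: list[str] = []
    while sorter.is_active():
        for job in sorter.get_ready():
            if not run_job(job):
                # Stop here so downstream jobs never see partial inputs.
                raise RuntimeError(f"job {job!r} failed; downstream jobs not triggered")
            executed.append(job)
            sorter.done(job)
    return executed
```

If a training job is declared as depending on a feature job, which in turn depends on ingestion, it cannot start until both have succeeded; a missing declaration is exactly what lets it fire prematurely.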
11
5. “Data processing was supposed to complete at 8 am. It's 4 pm, so why is my model retraining job still waiting?”
Battlescar:
Writing efficient big data processing applications is non-trivial. With a plethora of technologies, gaining broad expertise is difficult even for experienced data engineers.
Metric: Time to Optimize
Building a Self-Service Query Optimization Service
Levels of Automation:
Aggregating query, cluster, and resource stats
Analyzing and correlating stats
Tuning jobs
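As one example of analyzing stats, a widely used signal is task-level skew: a stage whose slowest task runs far longer than its median task often points at a skewed key, a frequent reason a job misses its completion time. The sketch assumes per-task durations have already been aggregated per stage; the function name and the default 3× threshold are illustrative assumptions.

```python
from __future__ import annotations

from statistics import median


def find_skewed_stages(task_durations: dict[str, list[float]], ratio: float = 3.0) -> dict[str, float]:
    """Return {stage: max/median task duration} for stages at or above `ratio`.

    `task_durations` maps a stage name to the run times of its tasks.
    """
    skewed: dict[str, float] = {}
    for stage, durations in task_durations.items():
        if not durations:
            continue
        typical = median(durations)
        if typical > 0 and max(durations) / typical >= ratio:
            skewed[stage] = max(durations) / typical
    return skewed
```

Flagged stages are candidates for the tuning level, for example repartitioning or salting the skewed join key.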
12
6. “The customer changed their preference to no marketing emails. Why are we still including them in email campaigns?”
Battlescar:
Without a consistent primary key to identify the customer across data silos, issues like this keep recurring. Emerging data rights regulations such as GDPR and CCPA require complying with customer preferences on what data is collected and how it is used, and deleting data on request.
Metric: Time to Comply
Building a Self-Service Data Rights Governance Service
Levels of Automation:
Tracking customer data lifecycle and preferences
Executing customers’ data rights requests
Use-case-based access control
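The sketch below illustrates why a consistent customer key matters: records from each silo are resolved to a canonical customer ID before the opt-out list is applied, and records that cannot be resolved are excluded rather than emailed. The data layout and the `campaign_recipients` name are hypothetical.

```python
from __future__ import annotations


def campaign_recipients(
    emails_by_silo: dict[str, dict[str, str]],
    id_map: dict[tuple[str, str], str],
    opted_out: set[str],
) -> set[str]:
    """Return email addresses that may receive a marketing campaign.

    emails_by_silo: silo name -> {silo-local customer ID: email}
    id_map: (silo name, silo-local ID) -> canonical customer ID
    opted_out: canonical IDs of customers who declined marketing email
    """
    recipients: set[str] = set()
    for silo, records in emails_by_silo.items():
        for local_id, email in records.items():
            customer = id_map.get((silo, local_id))
            # Unresolved records are skipped: without a canonical ID the
            # customer's current preference cannot be checked.
            if customer is None or customer in opted_out:
                continue
            recipients.add(email)
    return recipients
```

Because the opt-out is checked against the canonical ID, a customer who opted out is suppressed in every silo, including addresses that only exist in, say, a billing system.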
13
7. “The job pipeline ran for 15 hours, and we detected a data quality issue only upon completion. Could we be proactive?”
Battlescar:
Data issues in a long-running, business-critical job lead to missing insights. Often, we realize there is an issue only when the results don't look correct.
Metric: Time to Insight Quality
Building a Self-Service Data Observability Service
Levels of Automation:
Verify accuracy of data
Detect anomalies
Avoid data quality issues
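To catch problems before a 15-hour pipeline finishes, a check can run on each stage's output as soon as it lands. This minimal sketch flags a metric (for example, a daily row count) whose z-score against recent history exceeds a threshold; the function name and the default threshold of 3 are assumptions, and production detectors typically also account for seasonality and trend.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], today: float, z_threshold: float = 3.0) -> bool:
    """Flag `today` if it lies more than `z_threshold` standard deviations
    from the mean of `history` (at least two past values required)."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # Perfectly stable history: any change at all is unusual.
        return today != mu
    return abs(today - mu) / sigma > z_threshold
```

Running this after an early stage, rather than on the final output, is what turns the 15-hour wait into an early failure.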
14
8. “We use the best polyglot datastores. How do I now write queries effectively across this data?”
Battlescar:
Significant time is spent planning, designing, and writing queries that process data across datastores.
Metric: Time to Virtualize Datastores
Building a Self-Service Data Virtualization Service
Levels of Automation:
Automatic query routing
Managing datastore-specific queries
Joining across transactional sources
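Automatic query routing can be pictured as a lookup from tables to the datastore that holds them. In this hypothetical sketch, a query touching tables in one store is routed there, and anything spanning stores is marked for a federated engine (such as Trino) to join; `QueryRouter` and the store names are illustrative.

```python
from __future__ import annotations


class QueryRouter:
    """Route a query to the datastore holding all of its tables."""

    def __init__(self, table_locations: dict[str, str]) -> None:
        self._locations = dict(table_locations)  # table name -> datastore name

    def route(self, tables: list[str]) -> str:
        if not tables:
            raise ValueError("query references no tables")
        unknown = [t for t in tables if t not in self._locations]
        if unknown:
            raise KeyError(f"tables not in catalog: {unknown}")
        stores = {self._locations[t] for t in tables}
        if len(stores) == 1:
            return stores.pop()
        # Tables live in different datastores, so a federated engine must join them.
        return "federated"
```

Single-store queries keep running natively on their store; only cross-store joins pay the federation cost.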
15
9. “I ran an A/B experiment, and now I need to build time-consuming data pipelines to analyze the data.”
Battlescar:
Analyzing experimental results in a consistent fashion is a nightmare: there are no consistent definitions between the metrics used for experimental analysis and those used for business reporting.
Metric: Time to A/B Test
Building a Self-Service A/B Testing Service
Levels of Automation
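One way to remove the inconsistency described above is to define each metric once and reuse that definition in both experiment analysis and reporting. The sketch defines conversion rate in a single function and builds a standard pooled two-proportion z-test on top of it; the function names are hypothetical, and the normal approximation assumes reasonably large samples.

```python
from __future__ import annotations

from math import erf, sqrt


def conversion_rate(conversions: int, visitors: int) -> float:
    """Single shared metric definition, reused by experiments and reporting."""
    return conversions / visitors if visitors else 0.0


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test of variant B against control A.

    Returns (z, two-sided p-value) using the normal approximation.
    """
    p_a = conversion_rate(conv_a, n_a)
    p_b = conversion_rate(conv_b, n_b)
    pooled = conversion_rate(conv_a + conv_b, n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - phi)
```

With 200 of 2,000 control users converting versus 260 of 2,000 in the variant, z comes out at roughly 2.97, and the dashboard computes the same conversion rates because it calls the same `conversion_rate`.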
16
10. “Data processing jobs last week cost us 30% more. Why?”
Battlescar:
Especially in the cloud, dollar cost scales linearly with usage. Tracking budgets and spend in order to optimize effectively requires non-trivial effort.
Metric: Time to Cost Governance
Building a Self-Service Cost Governance Service
Levels of Automation:
Expenditure observability
Matching supply and demand
Continuous cost optimization
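Answering “why did we spend 30% more?” starts with expenditure observability: the total change plus each job's contribution to it. The sketch assumes weekly cost per job is already available (for example, from a cloud billing export); `explain_cost_change` is a hypothetical helper.

```python
from __future__ import annotations


def explain_cost_change(
    last_week: dict[str, float], this_week: dict[str, float]
) -> tuple[float, list[tuple[str, float]]]:
    """Return the total % change and each job's dollar delta, largest increase first.

    Jobs that appear in only one week count as 0 in the other.
    """
    prev_total = sum(last_week.values())
    curr_total = sum(this_week.values())
    if prev_total == 0:
        raise ValueError("no baseline spend to compare against")
    pct_change = 100 * (curr_total - prev_total) / prev_total
    jobs = set(last_week) | set(this_week)
    deltas = [(job, this_week.get(job, 0.0) - last_week.get(job, 0.0)) for job in jobs]
    # Biggest increases first; ties broken by job name for stable output.
    deltas.sort(key=lambda item: (-item[1], item[0]))
    return pct_change, deltas
```

Sorting by dollar delta points straight at the job (or new job) responsible for most of the increase.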
17
Wrap-Up: Advice on Managing Your DataOps
18
DataOps hurdles vary and depend on people, process, and technology.
19
Self-service has levels (it is not binary)
20
Measuring Current DataOps: Time-to-Insight Metric
(Diagram: TIME-TO-INSIGHT spans DATA → Discover → Prep → Build → Operationalize)
21
Time-to-Insight Scorecard
(Diagram: scorecard across the Discover, Prep, Build, and Operationalize phases)
22
Creating Your Time-to-Insight Scorecard
(Diagram: scorecard template for the Discover, Prep, Build, and Operationalize phases)
Legend: Weeks, Days, Hours
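A scorecard like the one above can be kept as data: for each phase, each metric's typical duration is bucketed into the Weeks/Days/Hours legend. The bucket boundaries (under one day, under one week) and the function names are assumptions for illustration; which metrics belong to which phase is for each team to decide.

```python
from __future__ import annotations

PHASES = ("Discover", "Prep", "Build", "Operationalize")


def legend_bucket(hours: float) -> str:
    """Map a measured duration to the scorecard legend (Hours, Days, Weeks)."""
    if hours < 24:
        return "Hours"
    if hours < 24 * 7:
        return "Days"
    return "Weeks"


def build_scorecard(metric_hours: dict[str, dict[str, float]]) -> dict[str, dict[str, str]]:
    """metric_hours: phase -> {metric name: typical duration in hours}."""
    unknown = set(metric_hours) - set(PHASES)
    if unknown:
        raise ValueError(f"unknown phases: {sorted(unknown)}")
    return {
        phase: {metric: legend_bucket(h) for metric, h in metric_hours.get(phase, {}).items()}
        for phase in PHASES
    }
```

Re-running the scorecard after each improvement shows whether a metric has moved from Weeks to Days or Hours.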
23
Call to Action: Making DataOps Self-Service
1. Measure: Create your Time-to-Insight Scorecard.
2. Learn: Shortlist 1-2 scorecard metrics whose level of automation you want to improve.
3. Build: Implement well-known design patterns in your data platform to make those metrics self-service.
24
Upcoming Book: The Self-Service Data Roadmap
Available Sept’20
Early Release Available on O’Reilly:
https://www.oreilly.com/library/view/the-self-service-data/9781492075240/
25
CONTACT US TO SCHEDULE A DATA OPERATIONS HEALTH CHECK TODAY
hello@unraveldata.com

More Related Content

What's hot

The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud World
DATAVERSITY
 
Talend 6.1 - What's New in Talend?
Talend 6.1 - What's New in Talend?Talend 6.1 - What's New in Talend?
Talend 6.1 - What's New in Talend?
Talend
 
Moving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in HealthcareMoving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in Healthcare
Perficient, Inc.
 
Webinar: The Death of Traditional Data Integration
Webinar: The Death of Traditional Data IntegrationWebinar: The Death of Traditional Data Integration
Webinar: The Death of Traditional Data Integration
SnapLogic
 
Mike Tuche, CEO of Talend: Enabling the Data Driven Enterprise
Mike Tuche, CEO of Talend: Enabling the Data Driven EnterpriseMike Tuche, CEO of Talend: Enabling the Data Driven Enterprise
Mike Tuche, CEO of Talend: Enabling the Data Driven Enterprise
Talend
 
Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance
Talend
 
Analytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopAnalytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual Workshop
CCG
 
Data Engineering Efficiency @ Netflix - Strata 2017
Data Engineering Efficiency @ Netflix - Strata 2017Data Engineering Efficiency @ Netflix - Strata 2017
Data Engineering Efficiency @ Netflix - Strata 2017
Michelle Ufford
 
Unleash the Power of Big Data and Machine Learning
Unleash the Power of Big Data and Machine LearningUnleash the Power of Big Data and Machine Learning
Unleash the Power of Big Data and Machine Learning
Talend
 
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
EMC
 
5 Simple Steps to Unleash Big Data Talend Connect
5 Simple Steps to Unleash Big Data Talend Connect5 Simple Steps to Unleash Big Data Talend Connect
5 Simple Steps to Unleash Big Data Talend Connect
Talend
 
The Model Enterprise: A Blueprint for Enterprise Data Governance
The Model Enterprise: A Blueprint for Enterprise Data GovernanceThe Model Enterprise: A Blueprint for Enterprise Data Governance
The Model Enterprise: A Blueprint for Enterprise Data Governance
Eric Kavanagh
 
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data IntegrationWebinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration
SnapLogic
 
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Sri Ambati
 
Achieving Agility and Scale for Your Data Lake - Talend
Achieving Agility and Scale for Your Data Lake - TalendAchieving Agility and Scale for Your Data Lake - Talend
Achieving Agility and Scale for Your Data Lake - Talend
Talend
 
Dsc 2021 presentation_radovan_bacovic
Dsc 2021 presentation_radovan_bacovicDsc 2021 presentation_radovan_bacovic
Dsc 2021 presentation_radovan_bacovic
Radovan Baćović
 
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Data Con LA
 
The Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data IntegrationThe Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data Integration
Eric Kavanagh
 
Cloud-Con: Integration & Web APIs
Cloud-Con: Integration & Web APIsCloud-Con: Integration & Web APIs
Cloud-Con: Integration & Web APIs
SnapLogic
 
The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products
Dataiku
 

What's hot (20)

The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud World
 
Talend 6.1 - What's New in Talend?
Talend 6.1 - What's New in Talend?Talend 6.1 - What's New in Talend?
Talend 6.1 - What's New in Talend?
 
Moving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in HealthcareMoving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in Healthcare
 
Webinar: The Death of Traditional Data Integration
Webinar: The Death of Traditional Data IntegrationWebinar: The Death of Traditional Data Integration
Webinar: The Death of Traditional Data Integration
 
Mike Tuche, CEO of Talend: Enabling the Data Driven Enterprise
Mike Tuche, CEO of Talend: Enabling the Data Driven EnterpriseMike Tuche, CEO of Talend: Enabling the Data Driven Enterprise
Mike Tuche, CEO of Talend: Enabling the Data Driven Enterprise
 
Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance
 
Analytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopAnalytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual Workshop
 
Data Engineering Efficiency @ Netflix - Strata 2017
Data Engineering Efficiency @ Netflix - Strata 2017Data Engineering Efficiency @ Netflix - Strata 2017
Data Engineering Efficiency @ Netflix - Strata 2017
 
Unleash the Power of Big Data and Machine Learning
Unleash the Power of Big Data and Machine LearningUnleash the Power of Big Data and Machine Learning
Unleash the Power of Big Data and Machine Learning
 
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
 
5 Simple Steps to Unleash Big Data Talend Connect
5 Simple Steps to Unleash Big Data Talend Connect5 Simple Steps to Unleash Big Data Talend Connect
5 Simple Steps to Unleash Big Data Talend Connect
 
The Model Enterprise: A Blueprint for Enterprise Data Governance
The Model Enterprise: A Blueprint for Enterprise Data GovernanceThe Model Enterprise: A Blueprint for Enterprise Data Governance
The Model Enterprise: A Blueprint for Enterprise Data Governance
 
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data IntegrationWebinar: The 5 Most Critical Things to Understand About Modern Data Integration
Webinar: The 5 Most Critical Things to Understand About Modern Data Integration
 
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
 
Achieving Agility and Scale for Your Data Lake - Talend
Achieving Agility and Scale for Your Data Lake - TalendAchieving Agility and Scale for Your Data Lake - Talend
Achieving Agility and Scale for Your Data Lake - Talend
 
Dsc 2021 presentation_radovan_bacovic
Dsc 2021 presentation_radovan_bacovicDsc 2021 presentation_radovan_bacovic
Dsc 2021 presentation_radovan_bacovic
 
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
 
The Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data IntegrationThe Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data Integration
 
Cloud-Con: Integration & Web APIs
Cloud-Con: Integration & Web APIsCloud-Con: Integration & Web APIs
Cloud-Con: Integration & Web APIs
 
The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products
 

Similar to Overcoming DataOps hurdles for ML in Production

Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
Jesus Rodriguez
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
DATAVERSITY
 
The Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web ServiceThe Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web Service
Poornima Vijayashanker
 
Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)
Vadlamudi Saketh
 
StreamCentral for the IT Professional
StreamCentral for the IT ProfessionalStreamCentral for the IT Professional
StreamCentral for the IT Professional
Raheel Retiwalla
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Build it…will they come by Shawn Trainer
 Build it…will they come by Shawn Trainer Build it…will they come by Shawn Trainer
Build it…will they come by Shawn Trainer
Data Con LA
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
MongoDB
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Cambridge Semantics
 
What is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PMWhat is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PM
Product School
 
Smarter Analytics: Supporting the Enterprise with Automation
Smarter Analytics: Supporting the Enterprise with AutomationSmarter Analytics: Supporting the Enterprise with Automation
Smarter Analytics: Supporting the Enterprise with Automation
Inside Analysis
 
Digital_IOT_(Microsoft_Solution).pdf
Digital_IOT_(Microsoft_Solution).pdfDigital_IOT_(Microsoft_Solution).pdf
Digital_IOT_(Microsoft_Solution).pdf
ssuserd23711
 
Emvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce DeckEmvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce Deck
Emvigo Technologies
 
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerceDon't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
DataStax
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data Architecture
DATAVERSITY
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
DATAVERSITY
 
Data and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the CloudData and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the Cloud
redmondpulver
 
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Memoori
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
DATAVERSITY
 

Similar to Overcoming DataOps hurdles for ML in Production (20)

Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
The Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web ServiceThe Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web Service
 
Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)
 
StreamCentral for the IT Professional
StreamCentral for the IT ProfessionalStreamCentral for the IT Professional
StreamCentral for the IT Professional
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Build it…will they come by Shawn Trainer
 Build it…will they come by Shawn Trainer Build it…will they come by Shawn Trainer
Build it…will they come by Shawn Trainer
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
What is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PMWhat is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PM
 
Smarter Analytics: Supporting the Enterprise with Automation
Smarter Analytics: Supporting the Enterprise with AutomationSmarter Analytics: Supporting the Enterprise with Automation
Smarter Analytics: Supporting the Enterprise with Automation
 
Digital_IOT_(Microsoft_Solution).pdf
Digital_IOT_(Microsoft_Solution).pdfDigital_IOT_(Microsoft_Solution).pdf
Digital_IOT_(Microsoft_Solution).pdf
 
Emvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce DeckEmvigo Data Visualization - E Commerce Deck
Emvigo Data Visualization - E Commerce Deck
 
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerceDon't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data Architecture
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Data and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the CloudData and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the Cloud
 
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
 

Recently uploaded

The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 

Recently uploaded (20)

The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
University of New South Wales degree offer diploma Transcript

Overcoming DataOps hurdles for ML in Production

  • 1. Overcoming DataOps hurdles for ML in Production. August 2020. Sandeep Uttamchandani, Chief Data Officer and VP of Engineering, sandeep@unraveldata.com
  • 2. Behind the scenes of an ML model in production
  • 3. From data to an ML model in production: the DataOps stages Discover, Prep, Build, Operationalize
  • 4. Top 10 DataOps Battlescars
  • 5. #1: “I thought the attribute meant something else.” Battlescar: Incorrect assumptions about an attribute's meaning, whether the dataset is the source of truth, who owns and commonly uses it, which version it is, and whether it is trustworthy. Metric: Time to Interpret. Building a Self-Service Metadata Catalog. Levels of Automation: gather technical metadata, gather operational metadata, aggregate tribal knowledge.
  • 6. #1 (continued): Building a Self-Service Metadata Catalog (Intuit)
  • 7. #2: “Where is the dataset I need for my model?” Battlescar: While building a customer support forecasting model, the data was siloed across business units, and it took 4+ months of working with data stewards to locate the attributes required to build the model. Metric: Time to Find. Building a Self-Service Search Service. Levels of Automation: indexing of datasets and artifacts, search relevance ranking, access control of search results.
  • 9. #3: “1,000 rows in the source database. Why only 50 rows in the data lake?” Battlescar: Correctness, completeness, and timeliness issues when moving data daily or hourly from transactional datastores into a centralized data lake. Metric: Time to Move. Building a Self-Service Data Movement Service. Levels of Automation: data ingestion configuration, data transformation, change management.
  • 10. #4: “The job completed, but the dashboard graphs are missing data?” Battlescar: Jobs are orchestrated using schedulers such as Airflow or Oozie. On several occasions the job dependencies were incorrect, causing reporting or model training jobs to be triggered prematurely. Metric: Time to Orchestrate. Building a Self-Service Orchestration Service. Levels of Automation: defining job dependencies, robust job execution, production monitoring.
  • 11. #5: “Data processing was supposed to complete at 8 am. It's 4 pm and my model retraining job is still waiting. Why?” Battlescar: Writing efficient big data processing applications is non-trivial. With a plethora of technologies, gaining broad expertise is difficult even for expert data engineers. Metric: Time to Optimize. Building a Self-Service Query Optimization Service. Levels of Automation: aggregating query, cluster, and resource stats; analyzing and correlating stats; tuning jobs.
  • 12. #6: “A customer changed their preference to no marketing emails. Why are we still including them in email campaigns?” Battlescar: Without a consistent primary key to identify the customer across data silos, issues recur. Emerging data rights regulations such as GDPR and CCPA require honoring customer preferences on what data is collected and how it is used, and deleting it on request. Metric: Time to Comply. Building a Self-Service Data Rights Governance Service. Levels of Automation: tracking the customer data lifecycle and preferences, executing customers' data rights requests, use-case-based access control.
  • 13. #7: “The job pipeline ran for 15 hours, and only upon completion did we detect a data quality issue. Could we be proactive?” Battlescar: Data issues in a long-running, business-critical job lead to missing insights. Only when the results don't look correct do we realize there is an issue. Metric: Time to Insights Quality. Building a Self-Service Data Observability Service. Levels of Automation: verifying data accuracy, detecting anomalies, avoiding data quality issues.
  • 14. #8: “We're using the best polyglot datastores. How do I now write queries effectively across this data?” Battlescar: Significant time is spent planning, designing, and writing queries that process data across datastores. Metric: Time to Virtualize Datastores. Building a Self-Service Data Virtualization Service. Levels of Automation: automatic query routing, managing datastore-specific queries, joining across transactional sources.
  • 15. #9: “I ran an A/B experiment. Now I need to build time-consuming data pipelines to analyze the data.” Battlescar: Analyzing experiment results consistently is a nightmare; there are no consistent definitions between the metrics used for experiment analysis and those used for business reporting. Metric: Time to A/B Test. Building a Self-Service A/B Testing Service. Levels of Automation
  • 16. #10: “Data processing jobs cost us 30% more last week. Why?” Battlescar: Especially in the cloud, cost scales linearly with usage. Tracking budgets and spend in order to optimize effectively requires non-trivial effort. Metric: Time to Cost Governance. Building a Self-Service Cost Governance Service. Levels of Automation: expenditure observability, matching supply and demand, continuous cost optimization.
  • 17. Wrap-up: Advice on managing your DataOps
  • 20. Measuring current DataOps: the Time-to-Insight metric, from data through Discover, Prep, Build, and Operationalize
  • 21. Time-to-Insight Scorecard across Discover, Prep, Build, and Operationalize
  • 22. Creating your Time-to-Insight Scorecard across Discover, Prep, Build, and Operationalize. Legend: Weeks, Days, Hours
  • 23. Call to Action: Making DataOps Self-Service. (1) Measure: create your Time-to-Insight Scorecard. (2) Learn: shortlist 1-2 scorecard metrics whose level of automation you want to improve. (3) Build: implement well-known design patterns in your data platform to make those metrics self-service.
  • 24. Upcoming book: The Self-Service Data Roadmap, available Sept '20. Early release available on O'Reilly: https://www.oreilly.com/library/view/the-self-service-data/9781492075240/
  • 25. Contact us to schedule a Data Operations Health Check today: hello@unraveldata.com
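Battlescar 3 (1,000 rows in the source database but only 50 in the data lake) is typically caught by reconciling row counts after every load. The deck does not prescribe an implementation; the following is a minimal Python sketch, assuming per-partition row counts (for example, one count per load date) have already been queried from both sides. `reconcile_counts` and `Mismatch` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Mismatch:
    partition: str
    source_rows: int
    lake_rows: int


def reconcile_counts(source_counts: dict[str, int], lake_counts: dict[str, int]) -> list[Mismatch]:
    """Return every partition whose row count in the lake differs from the source.

    A partition missing on either side is treated as having 0 rows there,
    so both incomplete loads and unexpected extra partitions are reported.
    """
    mismatches = []
    for partition in sorted(source_counts.keys() | lake_counts.keys()):
        src = source_counts.get(partition, 0)
        lake = lake_counts.get(partition, 0)
        if src != lake:
            mismatches.append(Mismatch(partition, src, lake))
    return mismatches


if __name__ == "__main__":
    # Hypothetical daily counts, e.g. from a COUNT(*) grouped by load date on each side.
    source = {"2020-08-01": 1000, "2020-08-02": 980}
    lake = {"2020-08-01": 50, "2020-08-02": 980}
    for m in reconcile_counts(source, lake):
        print(f"{m.partition}: source={m.source_rows} lake={m.lake_rows}")
```

Row counts only verify completeness; a correctness check (for example, comparing per-column checksums) would follow the same compare-and-report pattern.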
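Battlescar 4 (reporting or model training jobs triggered before their upstream data is ready) comes down to starting each job only after all of its declared dependencies have finished. As a minimal sketch, and not a description of how Airflow or Oozie work internally, Python's standard `graphlib.TopologicalSorter` (Python 3.9+) can enforce that ordering. The `PIPELINE` job graph and `run_in_dependency_order` are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical pipeline: each job maps to the set of upstream jobs it depends on.
PIPELINE = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "build_features": {"ingest_orders", "ingest_customers"},
    "train_model": {"build_features"},
    "refresh_dashboard": {"build_features"},
}


def run_in_dependency_order(pipeline: dict[str, set[str]], run_job: Callable[[str], None]) -> list[str]:
    """Trigger each job only after every upstream job has finished."""
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    executed = []
    while sorter.is_active():
        for job in sorter.get_ready():  # jobs whose upstream jobs are all done
            run_job(job)
            executed.append(job)
            sorter.done(job)  # unblocks downstream jobs
    return executed


if __name__ == "__main__":
    print(run_in_dependency_order(PIPELINE, lambda job: print("running", job)))
```

Here jobs run synchronously and are marked done as soon as `run_job` returns; if `run_job` raises, the loop stops and no downstream job is triggered. A real orchestrator would run ready jobs in parallel and mark a job done only after it succeeds.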
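Battlescar 6 (a customer who opted out of marketing emails still appearing in campaigns) arises when each silo identifies the customer differently. A minimal sketch, assuming a deliberately simplified identity scheme in which the canonical customer key is a trimmed, lower-cased email address; real systems usually resolve a master customer ID instead. `canonical_key` and `eligible_recipients` are hypothetical names.

```python
def canonical_key(email: str) -> str:
    """Hypothetical customer key: a trimmed, lower-cased email address (simplifying assumption)."""
    return email.strip().lower()


def eligible_recipients(campaign_list: list[str], marketing_opt_in: dict[str, bool]) -> list[str]:
    """Keep only addresses whose canonical customer has an explicit marketing opt-in.

    Customers with no recorded preference are excluded (fail closed).
    """
    return [email for email in campaign_list if marketing_opt_in.get(canonical_key(email), False)]


if __name__ == "__main__":
    # Preferences keyed by canonical key, e.g. kept by a central preference store (assumption).
    prefs = {"alice@example.com": False, "bob@example.com": True}
    # Campaign list exported from a different silo with inconsistent formatting.
    campaign = ["Alice@Example.com ", "bob@example.com", "carol@example.com"]
    print(eligible_recipients(campaign, prefs))
```

Treating customers with no recorded preference as ineligible errs on the side of honoring the customer's choice when the silos disagree.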
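Battlescar 7 asks whether a data quality issue could be caught before a 15-hour pipeline runs rather than after it completes. One common, simple approach (an assumption here, not the deck's stated method) is to profile the input batch and compare each metric against its recent history using a z-score. The metric names, the 3-standard-deviation threshold, and the function names below are hypothetical.

```python
import statistics


def is_anomalous(history: list[float], today: float, threshold: float = 3.0) -> bool:
    """Flag `today` if it lies more than `threshold` standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # too little history to estimate variability
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return today != mean  # perfectly stable history: any change is suspicious
    return abs(today - mean) / stdev > threshold


def check_before_long_job(metrics: dict[str, tuple[list[float], float]]) -> list[str]:
    """Return names of input metrics that look anomalous, so a long job can be halted before it starts."""
    return [name for name, (history, today) in metrics.items() if is_anomalous(history, today)]


if __name__ == "__main__":
    # Hypothetical profile of today's input batch vs. the previous five days.
    metrics = {
        "row_count": ([1000, 1020, 990, 1010, 980], 50),
        "null_rate_customer_id": ([0.01, 0.012, 0.011, 0.009, 0.01], 0.011),
    }
    print(check_before_long_job(metrics))
```

A z-score against a short history is a crude detector; seasonal metrics (weekday vs. weekend volumes, for example) would need a history window that accounts for that pattern.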
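For battlescar 9, one way to keep experiment analysis consistent with business reporting is to define each metric once and reuse that single definition in both places. The sketch below assumes a conversion-rate metric and applies a standard two-sided two-proportion z-test; `conversion_rate`, `two_proportion_z_test`, and the counts are hypothetical, and both sample sizes must be non-zero.

```python
import math


def conversion_rate(conversions: int, visitors: int) -> float:
    """One shared metric definition, meant to be reused by experiment analysis and business reporting alike."""
    return conversions / visitors if visitors else 0.0


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test of variant B against control A. Returns (z, p_value)."""
    p_a = conversion_rate(conv_a, n_a)
    p_b = conversion_rate(conv_b, n_b)
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, e.g. zero conversions in both arms
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment counts.
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"z={z:.2f}, p={p:.4f}")
```

Because `two_proportion_z_test` computes its rates through `conversion_rate`, a dashboard that also calls `conversion_rate` cannot drift to a different definition of the same metric.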
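Battlescar 10 (processing jobs costing 30% more than the previous week) starts with attributing spend to individual jobs. A minimal sketch, assuming billing data has already been exported as (job name, cost) records; the record format and the function names are hypothetical.

```python
from collections import defaultdict


def cost_by_job(records: list[tuple[str, float]]) -> dict[str, float]:
    """Sum cost per job from (job_name, cost) records, e.g. exported billing line items."""
    totals: dict[str, float] = defaultdict(float)
    for job, cost in records:
        totals[job] += cost
    return dict(totals)


def explain_cost_change(
    last_week: list[tuple[str, float]], this_week: list[tuple[str, float]]
) -> tuple[float, list[tuple[str, float]]]:
    """Return the total % change and per-job cost deltas, largest increase first."""
    before = cost_by_job(last_week)
    after = cost_by_job(this_week)
    total_before = sum(before.values())
    total_after = sum(after.values())
    pct_change = 100 * (total_after - total_before) / total_before if total_before else float("inf")
    jobs = before.keys() | after.keys()  # include jobs that appeared or disappeared
    deltas = {job: after.get(job, 0.0) - before.get(job, 0.0) for job in jobs}
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return pct_change, ranked


if __name__ == "__main__":
    last_week = [("etl_daily", 400.0), ("etl_daily", 200.0), ("model_training", 400.0)]
    this_week = [("etl_daily", 600.0), ("model_training", 700.0)]
    pct, ranked = explain_cost_change(last_week, this_week)
    print(f"total change: {pct:.1f}%")
    for job, delta in ranked:
        print(f"{job}: {delta:+.2f}")
```

With last week's total at 1,000 and this week's at 1,300, the sketch reports a 30% increase and lists `model_training` first, since its cost grew by 300 while `etl_daily` stayed flat.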