April 2023
DataOps
The Future of Data Management: Embracing Agility, Collaboration, and Automation
Agenda
Introductions
DevOps to DataOps
CI/CD for Data Products
Orchestration, Testing and Monitoring
Questions
About Us
Jeewan Singh, Senior Principal, Data Analytics
Tomy Rhymond, Principal, Cloud Lead, Technology Enablement
So... what is DevOps, really?
What
DevOps is a cultural movement to:
• Improve Collaboration
• Automate operations (aka the “plumbing”)
• Increase the rate of deployment
• Improve quality and security
How
• Source Control
• CI/CD
• Infrastructure Automation (Infrastructure as Code, IaC)
• Automated Test and Validation
• Design for Scalability
• Use the Cloud
Why
Spend more time on valuable work
… and have more fun!
Continuous Deployment of Databases: Part 1
Data and analytics professionals face unique challenges for automation.
State
• Application code is stateless.
• A database contains valuable business data.
• Structure and data must change without loss.
• Hand-crafting release scripts is error-prone.
Down Time
• Application servers are easy to swap in/out.
• Database servers are very difficult to swap in/out (even in a cluster).
• You can sometimes swap databases or tables in/out.
Rolling Back
• Applications are easy to roll back from source control.
• Databases must be explicitly backed up and restored.
• Restores are very time-consuming.
• The database is unavailable during a restore.
Testing
• Application code is easy to test with unit tests.
• Unit testing for databases is challenging.
• Unit testing requires test data generation and management, which gets complicated quickly.
Other
• Configuration changes are deployed via CI/CD.
• Most often, only DBAs touch the database (control).
• Production databases don’t match source control (drift).
• Database change management is difficult.
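Two of these challenges, error-prone hand-crafted release scripts and drift between production and source control, are commonly addressed with versioned migration scripts that the CI/CD pipeline applies automatically. The sketch below is a minimal illustration only, assuming SQLite via Python's standard `sqlite3` module as a stand-in database; the `MIGRATIONS` list, the `customers` table, and the `schema_version` table are hypothetical, and production teams would usually reach for a dedicated migration tool.

```python
import sqlite3

# Hypothetical, ordered migrations. In a real project each would be a
# versioned file kept in source control next to the pipeline code.
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def migrate(conn: sqlite3.Connection) -> list[int]:
    """Apply every migration newer than the version recorded in the database.

    The connection must be opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below control the transaction. Re-running against
    an up-to-date database applies nothing, so the same deployment script
    is safe to run in every environment.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)")
    current = conn.execute("SELECT COALESCE(MAX(version), 0) FROM schema_version").fetchone()[0]
    applied = []
    for version, ddl in MIGRATIONS:
        if version <= current:
            continue  # already deployed to this database
        conn.execute("BEGIN")
        try:
            conn.execute(ddl)  # SQLite DDL is transactional, so a failure rolls back cleanly
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
        applied.append(version)
    return applied


if __name__ == "__main__":
    db = sqlite3.connect(":memory:", isolation_level=None)
    print(migrate(db))  # [1, 2]
    print(migrate(db))  # [] (nothing left to apply)
```

Because the database records which versions it has already received, the same script can run against dev, beta, and production, which is one way to keep each environment in step with source control.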
These roadblocks add friction, prevent automation, and slow adoption of DataOps best practices:
• Fragile Column Mappings
• Embedded Credentials
• Hard-coded Connections
• Black-Box SaaS
• GUI-Only Tools
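Embedded credentials and hard-coded connections are usually removed by reading connection settings from the environment at run time, so the CI/CD system or a secret manager can supply different values per environment. This is a minimal sketch under that assumption; the variable names `DW_HOST`, `DW_DATABASE`, `DW_USER`, and `DW_PASSWORD` and the `WarehouseConfig` class are hypothetical.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping

# Hypothetical variable names; any consistent naming scheme works.
REQUIRED = ("DW_HOST", "DW_DATABASE", "DW_USER", "DW_PASSWORD")


@dataclass(frozen=True)
class WarehouseConfig:
    host: str
    database: str
    user: str
    password: str = field(repr=False)  # excluded from repr() so it does not leak into logs


def load_config(env: Mapping[str, str] = os.environ) -> WarehouseConfig:
    """Build connection settings from environment variables.

    The pipeline code contains no credentials and no host names; dev, beta,
    and prod differ only in the values injected at run time.
    """
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return WarehouseConfig(
        host=env["DW_HOST"],
        database=env["DW_DATABASE"],
        user=env["DW_USER"],
        password=env["DW_PASSWORD"],
    )
```

Failing fast on missing settings turns a misconfigured environment into an immediate, readable pipeline error rather than a connection failure deep inside a job.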
5 Critical Mindset Changes
Traditional Mindset
• Business Requirements are Static: “Our job is to meet the agreed business requirements.”
• Single-Developer, Individual Ownership: “Someone will email me if it breaks.”
• UAT Testing Approach: “We will run some tests before we launch.”
• Everything Manual: “No time to build the automation yet.”
• Demos at End of Project: “Creating demos takes time.”
DevOps Mindset
• Business Requirements are Fluid: “We aren’t doing it right if we assume requirements are static.”
• Multiple Developers, Team Ownership: “Someone else may have to fix this if it breaks.”
• Continuous Testing Approach: “We wrote the tests before we started developing.”
• Mostly Automated: “No time to waste on manual stuff.”
• Demos Daily or Weekly: “Continual feedback is critical to success.”
DataOps is a collaborative and automated approach to
managing the entire lifecycle of data, from its creation to
its deletion, in a way that ensures that data is trustworthy,
accurate, and readily available to the right people at the
right time.
People, Process, Technology
DataOps Collaboration
• Product Owner/Architect
• Operations/Administration
• Chief Data Officer
• Data Analysts
• Data Scientists
• Data Engineers
DataOps is an approach to data analytics and data-driven decision
making that follows the agile methodology of continuous
improvement.
Source Data → Data Ingestion → Data Engineering → Data Analytics → Business Users
DataOps: CI/CD, Orchestration, Testing, Monitoring
DataOps practices are an investment whose dividends increase with time and experience:
• Increased speed of delivery from improved processes
• Efficient end-to-end data flow from automated pipelines with feedback loops
• Improved productivity and collaboration from empowered developers
• Better business outcomes from happier customers
• Secure and compliant data from automated data quality checks, masking, tokenization, and more
• Reduced mean time to resolution (MTTR) from a shift-left quality approach
• Increased data reliability and resiliency
• Developer empowerment through a DevOps culture that promotes collaboration, ownership, and accountability
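Masking and tokenization can both be automated inside the pipeline. The sketch below shows one common approach, keyed hashing (HMAC-SHA256) for deterministic tokens plus simple display masking, using only Python's standard `hmac` and `hashlib` modules; vault-based tokenization is an equally valid alternative. The function names and the key value are hypothetical.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Replace a sensitive value with a deterministic, non-reversible token.

    HMAC-SHA256 keyed with a secret (held in a secret store, not in code)
    maps the same input to the same token, so tokenized columns can still
    be joined across tables, while the original value cannot be recovered
    and tokens cannot be recomputed without the key.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Partially mask an e-mail address for display, e.g. in reports."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


if __name__ == "__main__":
    key = b"example-key-loaded-from-a-secret-store"  # hypothetical placeholder
    print(tokenize("patient-12345", key) == tokenize("patient-12345", key))  # True
    print(mask_email("jane.doe@example.com"))  # j***@example.com
```

Determinism is the design choice that matters here: because equal inputs yield equal tokens, analysts can still count, join, and deduplicate on a tokenized identifier without ever seeing the raw value.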
DataOps Principles
Analytics is code.
All changes are committed to the code repository, so differences are easy to spot.
Orchestrate.
When everything is automated, we never have to choose between delivering new features and performing manual maintenance.
Make it reproducible.
The code runs the same way every time. There is no state to manage, and there are no “two ways” to run it that might produce different results.
Disposable environments.
There’s no such thing as data loss. At any time, the production environment can be recycled, and a new environment can be spun up automatically.
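Reproducibility and disposable environments both depend on loads that can be re-run safely. A minimal sketch of an idempotent load follows, assuming SQLite via Python's `sqlite3` module; the `daily_sales` table and its natural key `(sale_date, store_id)` are hypothetical. `INSERT OR REPLACE` is one SQLite-specific way to get upsert behavior; other warehouses typically use `MERGE`.

```python
import sqlite3


def load_daily_sales(conn: sqlite3.Connection, rows) -> None:
    """Idempotently load (sale_date, store_id, amount) rows.

    The natural primary key plus INSERT OR REPLACE means that re-running the
    same load, after a failed run or when rebuilding a disposable
    environment, converges to the same table contents instead of appending
    duplicate rows.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_sales (
               sale_date TEXT NOT NULL,
               store_id  TEXT NOT NULL,
               amount    REAL NOT NULL,
               PRIMARY KEY (sale_date, store_id))"""
    )
    with conn:  # commit the whole batch, or roll it back on error
        conn.executemany("INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?)", rows)


def snapshot(conn: sqlite3.Connection) -> list:
    """Return table contents in a stable order so runs can be compared."""
    return conn.execute(
        "SELECT sale_date, store_id, amount FROM daily_sales ORDER BY sale_date, store_id"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    batch = [("2023-04-01", "S1", 120.0), ("2023-04-01", "S2", 80.5)]
    load_daily_sales(conn, batch)
    first = snapshot(conn)
    load_daily_sales(conn, batch)  # re-run the same load
    print(first == snapshot(conn))  # True
```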
DataOps Maturity Model
CI/CD for Data Products
Taken from Stefana Muller, “Dev Leaders Compare Continuous Delivery vs. Continuous Deployment vs. Continuous Integration”
What do we mean when we say “CI/CD”?
CI/CD Definitions
Continuous Integration (CI) is a software engineering practice in which developers integrate code into a shared repository several times a day to obtain rapid feedback on the feasibility of that code. CI enables automated build and testing so that teams can work rapidly on a single project together.
Continuous Deployment (also CD) is the process by which qualified changes to software code or architecture are deployed to production as soon as they are ready, without human intervention.
Continuous Delivery (CD) is a software engineering practice in which teams develop, build, test, and release software in short cycles. It depends on automation at every stage so that cycles can be both quick and reliable.
Developing with CI/CD
[Diagram: commits on a dev branch and the pull request into the main branch are each checked (✔ pass, ❌ fail); merging rebuilds a “beta” copy of the data warehouse (DW), and changes are then auto-published to the production DW, which is refreshed daily/hourly.]
1. Continuous Integration (CI) testing: runs automatically with every commit!
2. Continuous Delivery (CD): new changes are automatically delivered in beta!
3. Continuous Deployment (also CD): new features and fixes are delivered to customers automatically!
CI/CD Getting Started Checklist
1) Store all your files in source control.
2) Create a full deployment script.
3) Create a text file pointing to your deployment script.
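The “rebuild a beta copy, test it, then auto-publish” flow can be expressed as a single deployment script that a CI job runs. The sketch below assumes SQLite via Python's `sqlite3` module as a stand-in for the warehouse; the `raw_orders`, `orders_beta`, and `orders` tables and the two data tests are hypothetical examples, not a prescribed test suite.

```python
import sqlite3


def build_beta(conn: sqlite3.Connection) -> None:
    """Rebuild the "beta" copy of a warehouse table from the raw source table."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS orders_beta;
        CREATE TABLE orders_beta AS
            SELECT order_id, customer_id, amount
            FROM raw_orders
            WHERE amount IS NOT NULL;
        """
    )


def run_tests(conn: sqlite3.Connection) -> list:
    """Data tests run against the beta copy; returns a list of failure messages."""
    failures = []
    dupes = conn.execute(
        "SELECT COUNT(*) - COUNT(DISTINCT order_id) FROM orders_beta"
    ).fetchone()[0]
    if dupes:
        failures.append(f"{dupes} duplicate order_id value(s)")
    negatives = conn.execute("SELECT COUNT(*) FROM orders_beta WHERE amount < 0").fetchone()[0]
    if negatives:
        failures.append(f"{negatives} negative amount(s)")
    return failures


def publish(conn: sqlite3.Connection) -> None:
    """Swap the tested beta table into production in a single transaction."""
    conn.executescript(
        """
        BEGIN;
        DROP TABLE IF EXISTS orders;
        ALTER TABLE orders_beta RENAME TO orders;
        COMMIT;
        """
    )


def deploy(conn: sqlite3.Connection) -> bool:
    """Build beta, test it, and auto-publish only if every test passes."""
    build_beta(conn)
    failures = run_tests(conn)
    if failures:
        for message in failures:
            print("FAILED:", message)
        return False  # production table is left untouched
    publish(conn)
    return True
```

A CI job would call `deploy()` and fail the build when it returns `False`; because publishing is a swap inside one transaction, consumers see either the old production table or the fully tested new one.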
Orchestration, Testing and Monitoring
DataOps Compared to DevOps
DevOps: Develop → Build → Test → Deploy → Run (CI, then CD)
DataOps: Sandbox → Develop → Orchestrate → Test → Deploy (CI, then CD), plus Orchestrate and Monitor
© 4/13/23 Slalom. All Rights Reserved. Proprietary and Confidential.
Modern Cloud Data Reference Architecture
[Diagram: data source layer → batch and streaming data ingestion → data lake (raw, processed, and curated zones) → central data storage (data warehouse, transformation and business rules, patient data hub) → consumption layer (operational reports, external data portal, sandbox environment, analytical dashboards, API apps), alongside machine learning, data governance and access (including data tokenization and masking), and cross-cutting data pipeline orchestration and monitoring, security (authorization and authentication), and CI/CD.]
Orchestrate, Test, and Monitor
Orchestrate
• Deploy both infrastructure as code and data pipeline code through a single pipeline
• Composer (GCP), Airflow, Azure Data Factory (Azure), dbt, DataOps.live, Informatica, Matillion, Stitch, AWS Data Pipeline
Monitor
• Cloud resources: GCP Monitoring, CloudWatch, Azure Monitor, Datadog
• Data pipelines: the respective tools and native cloud monitoring dashboards
• Data quality: ETL tools, or manual tools on top of data platforms
Test
• At the end of the pipeline run
• dbt, DataOps.live, Google Dataform, Boomi, Informatica, Matillion, Great Expectations, tSQLt
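At their core, orchestrators such as Airflow or Composer run tasks in dependency order, retry transient failures, and stop downstream work when a step (including an end-of-run test) fails. The sketch below illustrates only that core idea with Python's standard `graphlib` module (Python 3.9+); it is not the API of any of the tools listed, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, Set[str]],
    max_retries: int = 1,
) -> List[str]:
    """Run tasks in dependency order, retrying each failing task.

    `upstream` maps a task name to the set of tasks that must finish first.
    If a task still fails after `max_retries` retries, the exception
    propagates and no downstream task runs.
    """
    completed = []
    for name in TopologicalSorter(upstream).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up: stop the whole pipeline
        completed.append(name)
    return completed


if __name__ == "__main__":
    log = []
    upstream = {"transform": {"ingest"}, "test": {"transform"}, "publish": {"test"}}
    tasks = {n: (lambda n=n: log.append(n)) for n in ("ingest", "transform", "test", "publish")}
    print(run_pipeline(tasks, upstream))  # ['ingest', 'transform', 'test', 'publish']
```

Placing the `test` task directly upstream of `publish` mirrors the slide's “test at the end of the pipeline run”: a failing data test prevents the publish step from ever executing.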
From ETL to ELTP
ETL: Extract → Transform → Load
ELTP: Extract → Load → Transform → Publish
Benefits of ELT over ETL:
• Non-destructive updates
• Improved stability and recoverability
The “Publish” step signals that data is available and ready for downstream subscribers. It may involve shipping a copy of the data into the data lake, replicating to multiple Redshift clusters, populating BI models, or similar actions.
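The ELTP sequence can be sketched in a few functions. This is a minimal illustration assuming SQLite via Python's `sqlite3` module as the warehouse; the `raw_sales`, `sales_by_store`, and `published_datasets` tables are hypothetical, and “publish” is modeled simply as a row that downstream jobs could poll, which is only one of the publish actions the slide lists.

```python
import sqlite3
from datetime import datetime, timezone


def extract_load(conn: sqlite3.Connection, batch_id: str, rows) -> None:
    """Extract + Load: land source rows as-is (amounts still text), tagged by batch."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_sales (batch_id TEXT, store_id TEXT, amount TEXT)")
    with conn:
        conn.executemany(
            "INSERT INTO raw_sales VALUES (?, ?, ?)",
            [(batch_id, store_id, amount) for store_id, amount in rows],
        )


def transform(conn: sqlite3.Connection) -> None:
    """Transform inside the database. raw_sales is only read, never modified,
    so this step can be changed and replayed from the raw data at any time."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS sales_by_store;
        CREATE TABLE sales_by_store AS
            SELECT store_id, SUM(CAST(amount AS REAL)) AS total_amount
            FROM raw_sales
            GROUP BY store_id;
        """
    )


def publish(conn: sqlite3.Connection, dataset: str, batch_id: str) -> None:
    """Publish: record that the dataset is ready for downstream subscribers."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS published_datasets (dataset TEXT, batch_id TEXT, published_at TEXT)"
    )
    with conn:
        conn.execute(
            "INSERT INTO published_datasets VALUES (?, ?, ?)",
            (dataset, batch_id, datetime.now(timezone.utc).isoformat()),
        )
```

Keeping the raw rows untouched is what makes the updates non-destructive: a buggy transformation can be fixed and rebuilt from `raw_sales` without re-extracting from the source systems.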
At the core of DataOps is your organization’s information architecture:
• How well do you know your data?
• Do you trust your data?
• Are you able to quickly detect errors?
• Can you make changes incrementally without “breaking” your entire data pipeline?
The following critical areas can transform your data pipeline:
• Data Curation Services
• Metadata Management
• Data Governance
• Master Data Management
• Self-Service Interaction
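Detecting errors quickly usually starts with a few automated checks run after every load. The sketch below shows three common ones (row count, null rate, freshness) in plain Python; the thresholds, the column name, and the `CheckResult` structure are hypothetical, and tools such as Great Expectations or dbt tests provide richer, declarative versions of the same idea.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Mapping, Optional


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_row_count(rows: List[Mapping], min_rows: int) -> CheckResult:
    """An empty or unexpectedly small load is often the first sign of an upstream error."""
    return CheckResult("row_count", len(rows) >= min_rows, f"{len(rows)} rows (minimum {min_rows})")


def check_null_rate(rows: List[Mapping], column: str, max_rate: float) -> CheckResult:
    """Fail when the share of missing values in `column` exceeds `max_rate`."""
    nulls = sum(1 for row in rows if row.get(column) is None)
    rate = nulls / len(rows) if rows else 0.0
    return CheckResult(f"null_rate:{column}", rate <= max_rate, f"{rate:.0%} null (limit {max_rate:.0%})")


def check_freshness(
    last_loaded_at: datetime, max_age: timedelta, now: Optional[datetime] = None
) -> CheckResult:
    """Fail when the latest load is older than the agreed refresh interval."""
    now = now or datetime.now(timezone.utc)
    age = now - last_loaded_at
    return CheckResult("freshness", age <= max_age, f"last load {age} ago (limit {max_age})")
```

In a pipeline, any result with `passed == False` would typically fail the run or raise an alert, so problems surface minutes after a load rather than when a business user notices a wrong number.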
Thank You.
Questions?

DataOps, cbuswaw April '23

  • 1. APRIL, 2023 DataOps The Future of Data Management - Embracing Agility, Collaboration, and Automation
  • 2. Agenda 2 Introductions DevOps to DataOps CI/CD for Data Products Orchestration, Testing and Monitoring Questions
  • 3. Jeewan Singh Senior Principal, Data Analytics Tomy Rhymond Principal- Cloud Lead Technology Enablement 3 About Us.
  • 4. So…. what is DevOps, really??? DevOps is a cultural movement to: • Improve Collaboration • Automate operations (aka the “plumbing”) • Increase the rate of deployment • Improve quality and security What  Source Control  CI/CD  Infrastructure Automation (IAC)  Automated Test and Validation  Design for Scalability  Use the Cloud How Why Spend more time on valuable work … and have more fun!
  • 5. Continuous Deployment Of Databases : Part 1 Data and Analytics professional face unique challenges for automation State Rolling back Other Testing Down Time Application code is stateless Database contains valuable business data Change structure and data without loss Hand crafting release scripts is error-prone Application servers are easy to swap in/out Database servers are very difficult to swap in/out (even in cluster) Can sometimes swap databases or tables in/out Applications easy to roll back from source control Databases must be explicitly backed up and restored Very time-consuming Database unavailable during restore Application code is easy to test with unit tests Unit testing for databases is challenging Unit testing requires test data generation and management which gets complicated quickly Configuration changes deployed via CI/CD Most often only DBAs touch the database (control) Prod databases don’t match source control (drift) Database change management is difficult
  • 6. 6 These Roadblocks add friction, prevent automation, and slow adoption of DataOps best practices Fragile Column Mappings Embedded Credentials Hard-coded connections Black-Box SaaS GUI-Only Tools
  • 7. 5 Critical Mindset Changes  Business Requirements are Static “Our job is to meet the agreed business requirements.”  Single-Developer, Individual Ownership “Someone will email me if it breaks.”  UAT Testing Approach “We will run some tests before we launch.”  Everything Manual “No time to build the automation yet.”  Demos at End of Project “Creating demos take time.” Traditional Mindset DevOps Mindset  Business Requirements are Fluid “We aren’t doing right if we assume requirements are static.”  Multiple Developers, Team Ownership “Someone else may have to fix this if it breaks.”  Continuous Testing Approach “We wrote the tests before we started developing.”  Mostly Automated “No time to waste on manual stuff.”  Demos Daily or Weekly “Continual feedback is critical to success.”
  • 8. 8 DataOps is a collaborative and automated approach to managing the entire lifecycle of data, from its creation to its deletion, in a way that ensures that data is trustworthy, accurate, and readily available to the right people at the right time. PEOPLE PROCESS TECHNOL OGY
  • 10. 10 DataOps is an approach to data analytics and data-driven decision making that follows the agile methodology of continuous improvement. Source Data Data Ingestions Data Engineering Data Analytics Business Users DataOps CI/CD Orchestration Testing Monitoring
  • 11. 11 DataOps practices are an investment whose dividends increase with time and experience Increased speed of delivery from improved processes End-to-end efficient data form automated pipelines with feedback loops Improved productivity and collaboration from empowered developers Better business outcomes from happier customers Secure and compliant data from automated, data quality checks, masking, tokenization and more. Reduced mean time to resolution (MTTR) from shift- left quality approach Increased data reliability and resiliency Developer empowerment with the DevOps culture that promote collaboration and ownership & accountability
  • 12. 12 DataOps Principles Analytics is code. Differences can be spotted easily and are all committed to the code repo. Orchestrate. When everything is automated, we never have to choose between delivery new features and performing manual maintenance. Make it reproducible. The code runs the same way every time. There is no state to manage and there are no “two ways” to run it which might produce different results. Disposable environments. There’s no such things as data loss. At any time, the production environment can be recycled, and a new environment can be spun up automatically.
  • 15. What do we mean when we say “CI/CD”? CI/CD Definitions
    (Taken from Stefana Muller, “Dev Leaders Compare Continuous Delivery vs. Continuous Deployment vs. Continuous Integration.”)
    • Continuous Integration (CI) is a software engineering practice in which developers integrate code into a shared repository several times a day in order to obtain rapid feedback on the feasibility of that code. CI enables automated build and testing so that teams can rapidly work on a single project together.
    • Continuous Deployment (also CD) is the process by which qualified changes in software code or architecture are deployed to production as soon as they are ready and without human intervention.
    • Continuous Delivery (CD) is a software engineering practice in which teams develop, build, test, and release software in short cycles. It depends on automation at every stage so that cycles can be both quick and reliable.
  • 16. Developing with CI/CD
    [Diagram: commits on a dev branch, each marked ✔ or ❌ by tests, merge into the main branch through a pull request; the pipeline rebuilds a “Beta” copy of the DW and auto-publishes to the production DW, refreshed daily/hourly.]
    1. Continuous Integration (CI) Testing: Automatic or with every commit!
    2. Continuous Delivery (CD): New changes automatically delivered in beta!
    3. Continuous Deployment (also CD): New features and fixes delivered to customers automatically!
    CI/CD Getting Started Checklist
    1) Store all your files in source control.
    2) Create a full deployment script.
    3) Create a text file pointing to your deployment script.
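The checklist can be sketched with a plain-text manifest (the “text file pointing to your deployment script”) that lists SQL scripts kept in source control, plus a runner that applies each script once and records it. This assumes SQLite via Python’s standard `sqlite3` module, a hypothetical manifest format (one relative path per line, `#` for comments), and a hypothetical `deploy_log` table; it is not the interface of any particular migration tool.

```python
import sqlite3
from pathlib import Path
from typing import List


def read_manifest(manifest: Path) -> List[Path]:
    """Hypothetical manifest format: one script path per line, relative to the
    manifest file; blank lines and lines starting with '#' are ignored."""
    paths = []
    for line in manifest.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(manifest.parent / line)
    return paths


def deploy(conn: sqlite3.Connection, manifest: Path) -> List[str]:
    """Apply each listed script once, in manifest order; return the scripts applied in this run."""
    conn.execute("CREATE TABLE IF NOT EXISTS deploy_log (script TEXT PRIMARY KEY)")
    already_applied = {row[0] for row in conn.execute("SELECT script FROM deploy_log")}
    applied = []
    for script in read_manifest(manifest):
        if script.name in already_applied:
            continue  # this database already has the change
        conn.executescript(script.read_text())  # note: commits any open transaction first
        with conn:
            conn.execute("INSERT INTO deploy_log (script) VALUES (?)", (script.name,))
        applied.append(script.name)
    return applied
```

Running `deploy` a second time applies nothing, which is what lets a CI/CD job call it on every merge. Because `executescript` commits on its own, a script and its log entry are not recorded atomically in this sketch; a dedicated migration tool would handle that.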
  • 18. DataOps Compared to DevOps
    DevOps: Develop → Build → Test → Deploy → Run (spanning CI and CD)
    DataOps: Sandbox → Develop → Orchestrate → Test → Deploy → Orchestrate → Monitor (spanning CI and CD)
  • 19. ©4/13/23 Slalom. All Rights Reserved. Proprietary and Confidential.
    Modern Cloud Data Reference Architecture
    Cross-cutting: Data Pipeline Orchestration and Monitoring; Security (Authorization & Authentication); Continuous Integration, Continuous Deployment (CI/CD)
    • Data Source Layer: External, Unstructured Data, Loyalty, E-Commerce, POS Technology, RX Technology, Patient / Customer, Patient Support Program, Wholesale Distribution. Source systems: Vistex, JDA, MBA, Anzio, SoloChain, MSA, Maple, CMSV2, PharmaClick POS, Reflex POS, Tulip, MagicBox, Guardian Rewards, Uniprix Rewards, Proxim Rewards, Newsletter, LMS, NPS / Survey, IQVIA, Nielsen, Health Canada, Program Participation, First Data Bank, IQ DataSmart, UniBi, Website / Facebook, Email (Dialogue), Mobile Apps, UniSante, ProxiSante, PTS (db), Proxim POS, Cyberlog, ICN, SIR, DLD, Kroll, Reflex RX, Fillware, Compliance Cube, AssysteRx, PharmaClick RX, Applied Robotics, Ubik, Data Warehouses, GCP, E-commerce, RelayHealth Hub, SAP, BeWell, Diem. Connectivity: VPN.
    • Data Ingestion: Batch Ingestion (cloud-based ETL, event-driven functions, REST APIs); Streaming Ingestion (real-time ingestion, IoT devices)
    • Data Lake: Raw Zone, Processed Zone, Curated Zone
    • Central Data Storage: Data Warehouse (Transformation & Business Rules; Facts, Dimensions, Aggregates, Views); Patient Data Hub (Merge & Match, Deduplication, Enrichment)
    • Machine Learning (Predictions & Recommendations): Feature Generation, Model Development, Model Deployment, Model Monitoring
    • Data Governance and Access: Data Access Layer, Governance Layer, Management Layer; Centralized Policies, Data Quality Monitoring, Data Lineage & Metadata, Data Catalog, Consistent Controls, Security Policy Enforcement, Data Tokenization & Masking
    • Consumption Layer: Operational Reports (Warehouse & Specialty, Store Sales & Growth, Kiosk Reports); External Data Portal (Nielsen Data, External Kiosk, SharePoint); Sandbox Environment (ad-hoc data analysis, raw data analysis, merging / curating data sets); Analytical Dashboard (Manufacturer Insights, Patient Insights, Pharmacy Insights); API Apps (LifeLabs Apps, Loyalty Program Apps, etc.)
    • Users: End-User, Manufacturer, Management Team, Internal Analytics Teams, External Users, General Pharmacy Operations Team, Specialty Pharmacy Operations Team, Data Governance SMEs
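The architecture’s “Data Tokenization & Masking” component can be illustrated with keyed hashing: HMAC-SHA256 yields a deterministic token (the same input always maps to the same token, so tokenized columns can still be joined), while masking keeps a readable fragment for display. This is a swapped-in technique chosen for illustration, not the method the architecture specifies; vault-based tokenization is a common alternative, and any key passed in is assumed to come from a secrets manager rather than source code.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    # Deterministic: the same normalized input and key always give the same token,
    # so tokenized columns still join, but the raw value is never exposed.
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]  # 64-bit prefix; keep more hex digits for very large tables


def mask_email(email: str) -> str:
    # Keep the first character of the local part and the domain for readability.
    local, _, domain = email.partition("@")
    if not domain:
        return "***"
    return local[:1] + "***@" + domain
```

For example, `mask_email("jane.doe@example.com")` returns `"j***@example.com"`, and `tokenize` gives the same token for `"Jane@Example.com"` and `" jane@example.com"` under one key but a different token under another key.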
  • 20. Orchestrate, Test and Monitor
    Orchestrate
    • Both infrastructure as code and data pipeline code in a single pipeline
    • Composer (GCP), Airflow, Azure Data Factory (Azure), dbt, DataOps.live, Informatica, Matillion, Stitch, AWS Data Pipeline
    Monitor
    • Cloud resources: GCP Monitoring, CloudWatch, Azure Monitor, Datadog
    • Data pipelines: respective tools, native cloud monitoring dashboards
    • Data quality: ETL tools, manual tools on top of data platforms
    Test
    • At the end of the pipeline run
    • dbt, DataOps.live, Google Dataform, Boomi, Informatica, Matillion, Great Expectations, tSQLt
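A test “at the end of the pipeline run” often boils down to a few assertions over the output table. The sketch below hand-rolls not-null, uniqueness, and range checks in plain Python rather than using the API of dbt, Great Expectations, or tSQLt; the `order_id` and `quantity` columns and the range of 1 to 1000 are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


def check_not_null(rows: List[Row], column: str) -> List[str]:
    nulls = sum(1 for r in rows if r.get(column) is None)
    return [f"{column}: {nulls} null value(s)"] if nulls else []


def check_unique(rows: List[Row], column: str) -> List[str]:
    seen, dupes = set(), set()
    for r in rows:
        value = r.get(column)
        if value is None:
            continue  # nulls are reported by check_not_null
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return [f"{column}: duplicate value(s) {sorted(dupes)}"] if dupes else []


def check_range(rows: List[Row], column: str, low: int, high: int) -> List[str]:
    out_of_range = [
        r[column] for r in rows if r.get(column) is not None and not low <= r[column] <= high
    ]
    return [f"{column}: {len(out_of_range)} value(s) outside [{low}, {high}]"] if out_of_range else []


def run_checks(rows: List[Row]) -> List[str]:
    """Return every failure message; an empty list means the run passes."""
    return (
        check_not_null(rows, "order_id")
        + check_unique(rows, "order_id")
        + check_range(rows, "quantity", 1, 1000)
    )
```

For example, `run_checks([{"order_id": 1, "quantity": 5}, {"order_id": 1, "quantity": 0}])` reports the duplicate `order_id` and the out-of-range quantity; a pipeline would fail the run whenever the list is non-empty.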
  • 21. From ETL to ELTP
    ETL: Extract → Transform → Load
    ELTP: Extract → Load → Transform → Publish
    Benefits of ELT over ETL:
    • Non-destructive updates
    • Improved stability and recoverability
    The “Publish” step signals that data is available and ready for downstream subscribers. It may involve shipping a copy of the data into the data lake, replicating to multiple Redshift clusters, populating BI models, or similar actions.
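The ELTP flow can be sketched end to end with SQLite: raw rows land untouched in an append-only table (the non-destructive part), the transform builds a new versioned table from them, and publish repoints a stable view and logs that fresh data is ready. The table names, sample rows, and the `publish_log` “signal” are hypothetical stand-ins for a real warehouse and notification mechanism.

```python
import sqlite3
from typing import List, Tuple

RawOrder = Tuple[str, str, str]  # (id, amount, status) exactly as the source sent them


def extract_orders() -> List[RawOrder]:
    # Hypothetical source extract; values stay as untyped strings.
    return [("1", "10.50", "paid"), ("2", "3.00", "refunded"), ("3", "7.25", "paid")]


def load_raw(conn: sqlite3.Connection, rows: List[RawOrder], batch_id: int) -> None:
    # Load: land rows untouched in an append-only raw table; nothing is overwritten.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders (batch_id INTEGER, id TEXT, amount TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?, ?)", [(batch_id, *row) for row in rows]
    )


def transform_orders(conn: sqlite3.Connection, batch_id: int) -> str:
    # Transform: build a new versioned table from raw data, which stays available for replay.
    table = f"orders_v{int(batch_id)}"
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (id INTEGER, amount REAL)")
    conn.execute(
        f"INSERT INTO {table} SELECT CAST(id AS INTEGER), CAST(amount AS REAL) "
        "FROM raw_orders WHERE batch_id = ? AND status = 'paid'",
        (batch_id,),
    )
    return table


def publish_orders(conn: sqlite3.Connection, table: str) -> None:
    # Publish: repoint the stable view consumers query, then record that fresh data is ready.
    conn.execute("DROP VIEW IF EXISTS orders")
    conn.execute(f"CREATE VIEW orders AS SELECT id, amount FROM {table}")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publish_log "
        "(tbl TEXT, published_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.execute("INSERT INTO publish_log (tbl) VALUES (?)", (table,))
    conn.commit()
```

Because `raw_orders` is never modified, a bad transform can be fixed and replayed from the same batch, which is the recoverability benefit listed above; downstream consumers only ever query the `orders` view.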
  • 22. At the core of DataOps is your organization’s information architecture:
    • How well do you know your data?
    • Do you trust your data?
    • Are you able to quickly detect errors?
    • Can you make changes incrementally without “breaking” your entire data pipeline?
    The critical areas below can transform your data pipeline:
    • Data Curation Services
    • Metadata Management
    • Data Governance
    • Master Data Management
    • Self-Service Interaction