4. So… what is DevOps, really?
What — DevOps is a cultural movement to:
• Improve collaboration
• Automate operations (aka the “plumbing”)
• Increase the rate of deployment
• Improve quality and security
How:
• Source control
• CI/CD
• Infrastructure automation (IaC)
• Automated test and validation
• Design for scalability
• Use the cloud
Why:
• Spend more time on valuable work
• … and have more fun!
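The “automated test and validation” item above can start as small as a unit test that a CI server runs on every commit. A minimal sketch in Python (the function and data are illustrative, not from the slides):

```python
# Minimal illustration of automated validation: a pure function plus an
# assertion-based test that any CI job can run on every commit.

def normalize_region(code: str) -> str:
    """Map free-form region codes to one canonical form."""
    return code.strip().upper()

def test_normalize_region():
    assert normalize_region("  us-east ") == "US-EAST"
    assert normalize_region("eu") == "EU"

if __name__ == "__main__":
    test_normalize_region()
    print("all checks passed")
```

Wiring this into source control plus a build server is what turns a one-off check into continuous validation.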
5. Continuous Deployment of Databases: Part 1
Data and analytics professionals face unique challenges for automation:
State:
• Application code is stateless; a database contains valuable business data.
• Changes must alter both structure and data without loss.
• Hand-crafting release scripts is error-prone.
Down Time:
• Application servers are easy to swap in/out; database servers are very difficult to swap in/out (even in a cluster).
• You can sometimes swap databases or tables in/out.
Rolling Back:
• Applications are easy to roll back from source control; databases must be explicitly backed up and restored.
• Restores are very time-consuming, and the database is unavailable during the restore.
Testing:
• Application code is easy to test with unit tests; unit testing for databases is challenging.
• Unit testing requires test data generation and management, which gets complicated quickly.
Other:
• Application configuration changes are deployed via CI/CD, but most often only DBAs touch the database (control).
• Production databases don’t match source control (drift).
• Database change management is difficult.
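The usual answer to error-prone hand-crafted release scripts is versioned, idempotent migrations: each change is numbered, and a version table records what has already been applied. A toy sketch using SQLite (table and column names are illustrative; real projects use tools like Flyway, Liquibase, or Alembic):

```python
# Sketch of versioned schema migrations: a schema_version table tracks
# what is already applied, so re-running the script is a safe no-op.
import sqlite3

MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
]

def migrate(conn: sqlite3.Connection) -> int:
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    for version, ddl in MIGRATIONS:
        if version > current:          # apply only what this database is missing
            conn.execute(ddl)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
            current = version
    conn.commit()
    return current

conn = sqlite3.connect(":memory:")
print(migrate(conn))  # applies both migrations -> 2
print(migrate(conn))  # second run applies nothing -> still 2
```

Because the script only applies what a given environment is missing, the same artifact can promote dev, test, and prod without manual editing.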
6. These roadblocks add friction, prevent automation, and slow adoption of DataOps best practices:
• Fragile column mappings
• Embedded credentials
• Hard-coded connections
• Black-box SaaS
• GUI-only tools
7. 5 Critical Mindset Changes
Traditional Mindset → DevOps Mindset:
1. Business requirements are static (“Our job is to meet the agreed business requirements.”) → Business requirements are fluid (“We aren’t doing it right if we assume requirements are static.”)
2. Single developer, individual ownership (“Someone will email me if it breaks.”) → Multiple developers, team ownership (“Someone else may have to fix this if it breaks.”)
3. UAT testing approach (“We will run some tests before we launch.”) → Continuous testing approach (“We wrote the tests before we started developing.”)
4. Everything manual (“No time to build the automation yet.”) → Mostly automated (“No time to waste on manual stuff.”)
5. Demos at end of project (“Creating demos takes time.”) → Demos daily or weekly (“Continual feedback is critical to success.”)
8. DataOps is a collaborative and automated approach to managing the entire lifecycle of data, from its creation to its deletion, in a way that ensures that data is trustworthy, accurate, and readily available to the right people at the right time.
People • Process • Technology
10. DataOps is an approach to data analytics and data-driven decision making that follows the agile methodology of continuous improvement.
Source Data → Data Ingestion → Data Engineering → Data Analytics → Business Users
DataOps layer: CI/CD • Orchestration • Testing • Monitoring
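The stage names and the testing/monitoring layer above can be illustrated in miniature: each stage is a function, and a DataOps wrapper runs a quality check and emits a monitoring signal between stages. A toy sketch (the data, checks, and log lines are all illustrative assumptions):

```python
# Toy pipeline mirroring the slide: ingestion -> engineering -> analytics,
# with a check + monitoring hook wrapped around every stage.

def ingest():
    return [{"id": 1, "amount": "10"}, {"id": 2, "amount": "25"}]

def engineer(rows):
    return [{**r, "amount": int(r["amount"])} for r in rows]

def analyze(rows):
    return sum(r["amount"] for r in rows)

def run_with_checks(stage, data, check):
    out = stage(data) if data is not None else stage()
    assert check(out), f"data quality check failed after {stage.__name__}"
    print(f"{stage.__name__}: ok")   # monitoring hook (here: a log line)
    return out

raw = run_with_checks(ingest, None, lambda rows: len(rows) > 0)
clean = run_with_checks(engineer, raw,
                        lambda rows: all(isinstance(r["amount"], int) for r in rows))
total = run_with_checks(analyze, clean, lambda t: t >= 0)
print(total)  # 35
```

Orchestrators such as Airflow generalize exactly this pattern: tasks, dependencies between them, and hooks for testing and monitoring at each boundary.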
11. DataOps practices are an investment whose dividends increase with time and experience:
• Increased speed of delivery from improved processes
• End-to-end data efficiency from automated pipelines with feedback loops
• Improved productivity and collaboration from empowered developers
• Better business outcomes from happier customers
• Secure and compliant data from automated data quality checks, masking, tokenization, and more
• Reduced mean time to resolution (MTTR) from a shift-left quality approach
• Increased data reliability and resiliency
• Developer empowerment from a DevOps culture that promotes collaboration, ownership, and accountability
12. DataOps Principles
Analytics is code. Differences can be spotted easily, and all changes are committed to the code repo.
Orchestrate. When everything is automated, we never have to choose between delivering new features and performing manual maintenance.
Make it reproducible. The code runs the same way every time. There is no state to manage, and there are no “two ways” to run it that might produce different results.
Disposable environments. There is no such thing as data loss: at any time, the production environment can be recycled, and a new environment can be spun up automatically.
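“Make it reproducible” and “Disposable environments” can be shown in miniature: if the whole environment is built from code plus source data, any copy can be thrown away and recreated identically. A sketch using an in-memory SQLite database (table and data are illustrative):

```python
# Reproducible + disposable in miniature: the environment is rebuilt from
# code and source data on every run, so there is no state to manage.
import sqlite3

SOURCE_ROWS = [("a", 3), ("b", 4)]

def build_environment() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")   # a brand-new, disposable environment
    conn.execute("CREATE TABLE fact_sales (sku TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", SOURCE_ROWS)
    return conn

def total(conn):
    return conn.execute("SELECT SUM(qty) FROM fact_sales").fetchone()[0]

# Two independent builds produce identical results -- no drift possible.
print(total(build_environment()), total(build_environment()))  # 7 7
```

The same idea scales up: when the warehouse is a deterministic function of code and sources, “recycle and rebuild” replaces “repair in place.”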
15. What do we mean when we say “CI/CD”?
Taken from Stefana Muller in “Dev Leaders Compare Continuous Delivery vs. Continuous Deployment vs. Continuous Integration”
CI/CD Definitions:
Continuous Integration (CI) is a software engineering practice in which developers integrate code into a shared repository several times a day in order to obtain rapid feedback on the feasibility of that code. CI enables automated build and testing so that teams can rapidly work on a single project together.
Continuous Deployment (also CD) is the process by which qualified changes in software code or architecture are deployed to production as soon as they are ready, without human intervention.
Continuous Delivery (CD) is a software engineering practice in which teams develop, build, test, and release software in short cycles. It depends on automation at every stage so that cycles can be both quick and reliable.
16. Developing with CI/CD
[Diagram: commits flow into a dev branch; a pull request (with passing ✔ and failing ❌ checks) merges into the main branch, which rebuilds a “beta” copy of the DW and auto-publishes to the production DW, refreshed daily/hourly.]
1. Continuous Integration (CI) testing: automatic, with every commit!
2. Continuous Delivery (CD): new changes automatically delivered in beta!
3. Continuous Deployment (also CD): new features and fixes delivered to customers automatically!
CI/CD Getting Started Checklist:
1) Store all your files in source control.
2) Create a full deployment script.
3) Create a text file pointing to your deployment script.
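Step 3 of the checklist is what lets a generic CI job find the right entry point without hard-coding it. A minimal sketch of the idea (the file names `deploy.py` and `deploy.txt` are illustrative assumptions, not a convention from the slides):

```python
# Sketch of checklist step 3: a CI job reads a small pointer file to
# discover which deployment script in the repo it should execute.
import pathlib, subprocess, sys, tempfile

with tempfile.TemporaryDirectory() as tmp:
    repo = pathlib.Path(tmp)
    # Step 2: the full deployment script lives in source control.
    (repo / "deploy.py").write_text('print("deploying warehouse")\n')
    # Step 3: the pointer file names the entry point.
    (repo / "deploy.txt").write_text("deploy.py\n")

    entry = repo / (repo / "deploy.txt").read_text().strip()
    result = subprocess.run([sys.executable, str(entry)],
                            capture_output=True, text=True)
    print(result.stdout.strip())  # deploying warehouse
```

Because the pipeline only knows about the pointer file, teams can rename or restructure deployment scripts without touching the CI configuration.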
Orchestrate, Test, and Monitor
Orchestrate:
• Both infrastructure-as-code and data pipeline code with a single pipeline
• Composer (GCP), Airflow, Azure Data Factory (Azure), dbt, DataOps.live, Informatica, Matillion, Stitch, AWS Data Pipeline
Test:
• At the end of the pipeline run
• dbt, DataOps.live, Google Dataform, Boomi, Informatica, Matillion, Great Expectations, tSQLt
Monitor:
• Cloud resources: GCP Monitoring, CloudWatch, Azure Monitor, Datadog
• Data pipelines: respective tools, native cloud monitoring dashboards
• Data quality: ETL tools, manual tools on top of data platforms
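An end-of-pipeline data test, in the spirit of tools like Great Expectations or dbt tests, is just a set of named expectations evaluated against the output data. A self-contained sketch (the rows, column names, and check names are illustrative):

```python
# Sketch of end-of-pipeline data tests: named expectations evaluated
# against the pipeline's output, reported as a single pass/fail.

rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": 5.00},
]

def expect_not_null(rows, column):
    return all(r.get(column) is not None for r in rows)

def expect_unique(rows, column):
    values = [r[column] for r in rows]
    return len(values) == len(set(values))

checks = {
    "order_id not null": expect_not_null(rows, "order_id"),
    "order_id unique": expect_unique(rows, "order_id"),
    "amount not null": expect_not_null(rows, "amount"),
}
failed = [name for name, ok in checks.items() if not ok]
print("PASS" if not failed else f"FAIL: {failed}")  # PASS
```

Dedicated tools add richer expectation libraries, data docs, and scheduling, but the core contract is the same: the pipeline run fails loudly when the data does not meet expectations.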
22. At the core of DataOps is your organization’s information architecture:
• How well do you know your data?
• Do you trust your data?
• Are you able to quickly detect errors?
• Can you make changes incrementally without “breaking” your entire data pipeline?
Critical areas that can transform your data pipeline:
• Data curation services
• Metadata management
• Data governance
• Master data management
• Self-service interaction