Starting Your Modern DataOps Journey

Lucas Stone
Solutions Engineer
01.12.2020

The DataOps Story
Waterfall (pre-2000)
Linear approach
Good for projects where the end state is well defined (e.g. physical infrastructure)
Not as good where the product is continually changing or developing (e.g. software)
Agile (2000)
Began with the “Agile Manifesto”
Designed for the production of software products
A response to how rapidly business requirements could change
Emphasised iteration
DevOps (2007)
Even with the rise of Agile, development and operations teams remained siloed
DevOps aimed to bring them together
A set of practices to release high-quality code faster
DataOps (2014)
Brings the DevOps approach to the data function
Aligns data science / data management and operations teams
Ensures the business can fully leverage data and convert it into actionable insight

Where is DataOps in the “Hype Cycle”?

DataOps Search Trends: Past 5 Years

DataOps
Data + Operations
Ensure that data value is delivered to the business as soon as possible
DataOps sits at the intersection of the Value Pipeline and the Innovation Pipeline
[Figure: Value Pipeline (Data → Production → Value) and Innovation Pipeline (Idea → Development → Production), each with Quality checks]

DataOps Misconceptions
DataOps is not…
A technology: although there is a set of technologies commonly used to support its implementation within an organisation
Restricted: neither 1) to “big data” (DataOps brings benefits whatever the scale and complexity of the data) nor 2) only to advanced data science applications (e.g. ML)
DataOps is…
More than just DevOps for data! It brings together three distinct elements: Agile development, DevOps, and Statistical Process Control (Lean)
A methodology: it brings together a number of principles and practices around the way an organisation manages and processes data
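Statistical Process Control, one of the three elements listed above, is often applied in DataOps by watching a pipeline metric and flagging values that fall outside control limits. The sketch below is a minimal illustration under stated assumptions: the metric (daily row counts), the data, and the function names are hypothetical, and it uses simple mean ± 3 standard deviation limits computed from a stable baseline period, a simplification of a classic Shewhart chart (which would usually estimate sigma from moving ranges).

```python
from statistics import mean, stdev


def control_limits(baseline: list[float], k: float = 3.0) -> tuple[float, float]:
    """Return (lower, upper) control limits as mean +/- k sample standard deviations."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation; needs at least 2 points
    return centre - k * spread, centre + k * spread


def out_of_control(baseline: list[float], new_values: list[float], k: float = 3.0) -> list[int]:
    """Return the indices of new_values that fall outside the control limits."""
    lower, upper = control_limits(baseline, k)
    return [i for i, v in enumerate(new_values) if v < lower or v > upper]


# Hypothetical daily row counts loaded by a pipeline during a stable period
baseline = [10_000, 10_250, 9_900, 10_100, 10_050, 9_950, 10_200]
# Hypothetical new loads; the second one is unusually small
latest = [10_120, 4_300, 9_980]

print(out_of_control(baseline, latest))
```

With these figures the limits are roughly 9,680 to 10,449 rows, so only the 4,300-row load (index 1) is flagged; in practice such a check would raise an alert rather than print.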

DataOps – What are the benefits?
Faster Deployment & Feedback
Consumers of data analytics will get what they need faster and be able to give feedback more often, creating a virtuous circle with business requirements as the starting point
Happier Colleagues
Introducing DataOps means those involved in the process can more quickly see the positive impact of their work, leading to more engaged and productive teams
Higher Data Quality
Increased automation (particularly of testing) and standardised processes will lead to higher data quality, which will in turn lead to better insights generated from Machine Learning models
Collaboration
DataOps promotes collaboration, communication, and coordination between teams that might otherwise remain siloed

Which companies use DataOps?
Please see Qubole’s “Creating a Data-Driven Enterprise with DataOps” ebook for further information on how each of these organisations implements DataOps

Case Study: Facebook & Apache Hive
Stage 1
No structure around data requests; rather, the business would request data on an ad hoc basis and the data function would act as a service
Stage 2
Created a Hadoop data lake and developed Hive to make it more accessible; the data team evolved from a service team to building self-service platforms for data extraction
Stage 3
Combined metadata services with Hive, allowing users to look at data and metadata; however, data was still not accessible to non-technical users
Stage 4
Developed UIs that were easy for business users to understand and use to extract data independently; data became fully democratised

What is required to begin implementing DataOps?
1. People & Culture
The foundation for introducing DataOps lies first with buy-in from the key stakeholder groups, particularly the business, so that business requirements are understood
2. Processes
It is then important to establish clear processes, including who is Responsible, Accountable, Consulted and Informed (RACI). Those responsible should receive appropriate training. Measuring process effectiveness with appropriate KPIs is also crucial
3. Technologies
Once the correct culture and processes have been established, an organisation can introduce tooling to support related activities, notably automation, testing, and orchestration
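A RACI assignment can be kept as plain data and checked automatically whenever it changes. The sketch below is a minimal illustration, assuming the widely used convention that every activity has exactly one Accountable party and at least one Responsible party; the activities, role names, and function name are hypothetical examples, not part of the deck.

```python
# Hypothetical RACI matrix: activity -> {role: one of "R", "A", "C", "I"}
RaciMatrix = dict[str, dict[str, str]]

VALID_CODES = {"R", "A", "C", "I"}


def raci_problems(matrix: RaciMatrix) -> list[str]:
    """Return human-readable problems found in a RACI matrix."""
    problems = []
    for activity, assignments in matrix.items():
        codes = list(assignments.values())
        invalid = sorted({c for c in codes if c not in VALID_CODES})
        if invalid:
            problems.append(f"{activity}: unknown codes {invalid}")
        # Assumed rule: exactly one Accountable party per activity
        if codes.count("A") != 1:
            problems.append(f"{activity}: expected exactly one 'A', found {codes.count('A')}")
        # Assumed rule: somebody must be Responsible for doing the work
        if "R" not in codes:
            problems.append(f"{activity}: no 'R' assigned")
    return problems


matrix = {
    "Maintain source data": {"Source Owner": "A", "DBA": "R", "Data Engineer": "I"},
    "Build pipeline": {"Data Engineer": "R", "Data Architect": "C"},
}

for problem in raci_problems(matrix):
    print(problem)
```

Here the check reports that “Build pipeline” has no Accountable owner; the same idea extends to tracking process KPIs as data rather than in slides.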

People: Stakeholder Groups
Data Consumers
Those who use data to perform analysis and extract insights, which they then deliver to those in the business who can use them to drive value
Data Suppliers
Those managing the integrity of Authoritative Data Sources to ensure data quality and availability
Data Preparers
Those who build data pipelines linking one source to another, as well as managing the transformation of the data into a usable format for Data Consumers
Business
Other parts of the organisation do not use DataOps directly; rather, they rely on and benefit from its better outputs (insights, BI, analytics) and convey business requirements
DataOps builds two crucial bridges: firstly between the business and technology functions, and secondly within the data function itself

People: Ingraining a DataOps Culture
1. Push from the Top
Cultural changes must be endorsed by senior management both within and outside the data function before being pushed down to individual teams
2. Embrace the Process
Acknowledge that change won’t happen overnight and that improvements will be incremental; allow a realistic timeframe for implementing DataOps
3. Remove Silos
Breaking down organisational barriers between Data Suppliers, Preparers, Consumers, and the Business will be crucial to the smooth flow of data to those making decisions
4. Emphasise Data
Data should be front and centre of strategic decision making for DataOps to realise its full potential; this should be embedded as a company value
5. Invest in Tools
Carefully selecting a complementary set of technologies to underpin the implementation of DataOps is essential, as is providing the relevant training to upskill your teams

Processes: Building a “Data Supply Chain”
Data Suppliers: Source Owner, DBA, Infrastructure and Ops Personnel, Application Admins + Developers
Data Preparers: Data Engineers, Data Architects, Data Stewards, Integration Architects + Developers, Data Modelers
Data Consumers: Machine Learning Model Developers, Data Scientists
The Business: HR, Finance, Strategy, Operations etc.
Other roles: Data Product Managers, Business Analysts, Data Security Teams, Data Privacy Officers

Technology: Agile, Collaboration, Automation, Infrastructure as Code
Agile
Small but frequent deliveries of new features
Constant feedback loop between the business and tech
Collaboration
Version control to decrease risk and increase productivity
Job / issue tracking to ensure even minor feedback is captured
Automation
Continuous Integration, Continuous Deployment, and Continuous Delivery
Automate testing and speed up getting code into production
Infrastructure as Code
Manage IT infrastructure using code
Make changes to existing infrastructure much more easily
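To illustrate the automation point, here is the kind of unit test for a pipeline transformation that a CI server could run on every commit before anything is deployed. This is a minimal sketch using Python’s built-in `unittest` module; the transformation `normalise_customer` and its cleaning rules are hypothetical examples.

```python
import unittest


def normalise_customer(record: dict) -> dict:
    """Hypothetical transformation: tidy names and standardise country codes."""
    return {
        "name": record["name"].strip().title(),
        "country": record["country"].strip().upper(),
    }


class NormaliseCustomerTest(unittest.TestCase):
    def test_trims_and_title_cases_name(self):
        out = normalise_customer({"name": "  ada lovelace ", "country": "gb"})
        self.assertEqual(out["name"], "Ada Lovelace")

    def test_upper_cases_country(self):
        out = normalise_customer({"name": "Alan", "country": " gb"})
        self.assertEqual(out["country"], "GB")
```

Saved in a file named like `test_transform.py`, these tests can be discovered and run with `python -m unittest`; a CI job that stops on a failing run keeps broken transformations out of production.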

CloverDX & DataOps
Increased deployment frequency
Package, share, and reuse any
functionality you design
Automated testing
Incorporate data quality tests and build error handling into your data pipelines
Consistent metadata and version control
CloverDX integrates easily with most version control tools, and metadata can easily be tracked and visualised
Monitoring
The CloverDX Server has a monitoring suite that can be applied to individual jobs or whole business processes
Collaboration across all stakeholders
CloverDX’s visual design allows technical and non-technical users to “speak the same language”
Gartner identified five “key techniques” that support the delivery of DataOps; CloverDX can support Data Preparers with each of them

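Independently of any particular tool, the idea of building data quality tests and error handling into a pipeline can be sketched as validation rules that route failing records to a reject output instead of failing the whole job. This is plain Python, not CloverDX’s API; the rule factories, field names, and records are hypothetical.

```python
from typing import Callable, Optional

# A rule inspects one record and returns an error message, or None if it passes
Rule = Callable[[dict], Optional[str]]


def require_field(field: str) -> Rule:
    """Rule factory: the field must be present and non-empty."""
    def check(record: dict) -> Optional[str]:
        return None if record.get(field) not in (None, "") else f"missing {field}"
    return check


def non_negative(field: str) -> Rule:
    """Rule factory: the field must be a number greater than or equal to zero."""
    def check(record: dict) -> Optional[str]:
        value = record.get(field)
        if not isinstance(value, (int, float)) or value < 0:
            return f"invalid {field}: {value!r}"
        return None
    return check


def validate(records: list[dict], rules: list[Rule]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split records into (valid, rejected); rejected records keep their error messages."""
    valid, rejected = [], []
    for record in records:
        results = (rule(record) for rule in rules)
        errors = [msg for msg in results if msg is not None]
        if errors:
            rejected.append((record, errors))  # error-handling path: keep and report
        else:
            valid.append(record)  # continues down the pipeline
    return valid, rejected


rules = [require_field("customer_id"), non_negative("amount")]
orders = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "", "amount": 35.5},
    {"customer_id": "C3", "amount": -10},
]
valid, rejected = validate(orders, rules)
print(len(valid), "valid")
for record, errors in rejected:
    print("rejected:", record, errors)
```

Here one order passes and two are rejected with their reasons; keeping rejects with their error messages, rather than dropping them silently, is what makes the quality issues visible to Data Suppliers.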
Upcoming Webinar
Code Management with Version Control in CloverDX
December 8th
11am EST / 4pm GMT / 5pm CET
Register
Q&A
