Starting Your Modern
DataOps Journey
Lucas Stone
Solutions Engineer
01.12.2020
The DataOps Story
Pre-2000: Waterfall
Linear approach
Good for projects where the end state is well-defined
(e.g. physical infrastructure)
Not as good where the product is continually
changing / developing (e.g. software)
2000: Agile
Began with the “Agile Manifesto”
Designed for production of software products
Response to how rapidly business
requirements could change
Emphasised iteration
2007: DevOps
Even given the rise of Agile, dev and
ops teams remained siloed
DevOps aimed to bring them together
Set of practices to release high-quality
code faster
2014: DataOps
Bring the DevOps approach to the data function
Align data science / management and
operations teams
Ensures the business can fully leverage
data and convert it into actionable insight
Where is DataOps in the “Hype Cycle”?
DataOps Search Trends: Past 5 Years
DataOps
Data + Operations
Ensure that data value is delivered to the business as soon as possible
Intersection of the Value and Innovation Pipelines
[Diagram: Value Pipeline and Innovation Pipeline, with labels Data, Production, Value, Idea, Development, and Quality]
DataOps Misconceptions
DataOps is not…
A technology
Although a set of technologies is
commonly used to support its
implementation within an organisation
Restricted
Either to 1) “big data” (the benefits do not depend on
the scale or complexity of the data) or 2) only
advanced data science applications (e.g. ML)
DataOps is…
More than just
DevOps for Data!
It brings together 3 distinct elements:
Agile development, DevOps, and
Statistical Process Control (Lean)
A methodology
It brings together a number of principles
and practices around the way an
organisation manages and processes data
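
Statistical Process Control, one of the three elements named on this slide, can be illustrated with a small sketch. The example below is not from the talk: it assumes a pipeline step whose daily row counts are tracked, derives control limits (mean plus or minus three standard deviations) from a baseline, and flags a run that falls outside them. The function names and the sample row counts are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) control limits derived from historical measurements."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return centre - sigmas * spread, centre + sigmas * spread


def in_control(value, limits):
    """True if a new measurement lies within the control limits."""
    lower, upper = limits
    return lower <= value <= upper


# Hypothetical daily row counts from one pipeline step (illustrative data only).
baseline_row_counts = [1000, 1010, 990, 1005, 995, 1002, 998]
limits = control_limits(baseline_row_counts)

for todays_count in (1003, 640):
    status = "ok" if in_control(todays_count, limits) else "out of control"
    print(f"{todays_count}: {status}")
```

In practice the same check could be applied to any pipeline measurement (row counts, null rates, run times), with the number of sigmas tuned to how sensitive the alert should be.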
DataOps – What are the benefits?
Faster
Deployment & Feedback
Consumers of data analytics will get what
they need faster and be able to give feedback
more often, creating a virtuous circle with
business requirements as the starting point
Happier Colleagues
Introduction of DataOps will mean those involved in
the process are more able to quickly see the positive
impact of their work leading to more engaged and
productive teams
Higher Data Quality
Increased automation (particularly of testing) and
standardised processes will lead to higher Data
Quality, which will in turn lead to better insights
generated from Machine Learning models
Collaboration
DataOps promotes collaboration,
communication, and coordination between
teams that may otherwise remain siloed
Which companies use DataOps?
Please see Qubole’s Creating a Data-Driven Enterprise with DataOps ebook for further
information on how each of these organisations implements DataOps
Case Study: Facebook & Apache Hive
Stage 1
No structure around data
requests; rather, the
business would request on
an ad-hoc basis – the data
function would act as a
service
Stage 2
Created a Hadoop data
lake and developed Hive
to make it more
accessible – data team
evolved from a service
team to building
self-service platforms for data
extraction
Stage 3
Combined metadata
services with Hive
allowing users to look at
data and metadata –
however, data still not
accessible to non-tech
users
Stage 4
Developed UIs that were
easy for business users to
understand and use to
independently extract data
– data becomes fully
democratised
What is required to begin implementing DataOps?
1. People & Culture
The foundation for introducing DataOps lies firstly
with buy-in from the key stakeholder groups,
particularly the business, so that business
requirements are understood
2. Processes
It is then important to establish clear processes,
including who is Responsible, Accountable, Consulted,
and Informed (“RACI”). Those responsible should
receive appropriate training. Measuring process
effectiveness with appropriate KPIs is also crucial
3. Technologies
Once the correct culture and processes have been
established, an organisation can introduce tooling to
support related activities, notably automation, testing,
and orchestration
People: Stakeholder Groups
Data Consumers
Those who use data to perform
analysis and extract insights, then
deliver them to those in the business who
can use them to drive value
Data Suppliers
Those managing the integrity of
Authoritative Data Sources to ensure
data quality and availability
Data Preparers
Those who build data pipelines
linking one source to another as well
as managing its transformation into a
usable format for Data Consumers
Business
Other parts of the organisation do
not use DataOps directly; rather, they
convey Business Requirements and
benefit from better outputs in terms of
insights / BI / analytics
DataOps builds two crucial bridges: firstly between the business and technology
functions, secondly within the data function itself
People: Ingraining a DataOps Culture
1. Push from the Top
Cultural changes must be endorsed by
senior management both within and
outside of the data function before being
pushed down to individual teams
2. Embrace the Process
Acknowledge that change won’t happen
overnight and that improvements will be
incremental – allow a realistic timeframe for
the process of implementing DataOps
3. Remove Silos
Breaking down organisational barriers between
Data Suppliers, Preparers, Consumers, and the
Business will be crucial to the smooth flow of
data to those making decisions
4. Emphasise Data
Data should be front and centre of
strategic decision making for DataOps to
realise its full potential – this should be
embedded as a company value
5. Invest in Tools
Carefully selecting a complementary set of
technologies to underpin the implementation of
DataOps is essential, as is providing the
relevant training to upskill your teams
Processes: Building a “Data Supply Chain”
Data Suppliers: Source Owner, DBA, Infrastructure and Ops Personnel,
Application Admins + Developers
Data Preparers: Data Engineers, Data Architects, Data Stewards,
Integration Architects + Developers, Data Modelers
Data Consumers: Machine Learning Model Developers, Data Scientists
The Business: HR, Finance, Strategy, Operations etc.
Data Product Managers
Business Analysts
Data Security Teams
Data Privacy Officers
Technology: Agile, Collaboration,
Automation, Infrastructure as Code
Agile
Small but frequent deliveries
of new features
Constant feedback loop
between the business and tech
Collaboration
Version Control to decrease risk
and increase productivity
Job / issue tracking to ensure
even minor feedback is captured
Automation
Continuous Integration,
Deployment, Delivery
Automate testing and speed up
getting code into production
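
As an illustration of automated testing in a data context, here is a minimal sketch of data quality checks that could run on every change as part of a Continuous Integration job. It is not taken from the talk or from any particular tool: the check functions, the column names (`order_id`, `customer_id`), and the sample batch are all hypothetical.

```python
def check_not_null(rows, column):
    """Return the indexes of rows where `column` is missing or empty."""
    return [i for i, row in enumerate(rows) if row.get(column) in (None, "")]


def check_unique(rows, column):
    """Return the values of `column` that appear more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(column)
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def run_quality_checks(rows):
    """Collect every failure rather than stopping at the first one."""
    failures = []
    missing = check_not_null(rows, "customer_id")
    if missing:
        failures.append(f"customer_id missing in rows {missing}")
    duplicates = check_unique(rows, "order_id")
    if duplicates:
        failures.append(f"duplicate order_id values {duplicates}")
    return failures


# Hypothetical sample batch (illustrative data only).
sample_batch = [
    {"order_id": 1, "customer_id": "C-10"},
    {"order_id": 2, "customer_id": ""},
    {"order_id": 2, "customer_id": "C-11"},
]

failures = run_quality_checks(sample_batch)
for failure in failures:
    print(failure)
```

A CI job could fail the build whenever `run_quality_checks` returns a non-empty list, so that a broken transformation is caught before it reaches production.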
Infrastructure as Code
Manage IT infrastructure
using code
Make changes to
existing infrastructure
much more easily
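
The idea behind Infrastructure as Code can be sketched as follows: infrastructure is described as data kept under version control, and a program works out which changes are needed to reach the desired state. This is only a toy illustration, not how any specific tool works; the resource names and the `plan_changes` function are hypothetical.

```python
def plan_changes(current, desired):
    """Compare current and desired resource definitions and list the actions needed."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != spec:
            actions.append(("update", name))
    for name in current:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical resource definitions (illustrative only).
current_state = {
    "etl-worker": {"cpus": 2, "memory_gb": 8},
    "staging-db": {"cpus": 4, "memory_gb": 16},
}
desired_state = {
    "etl-worker": {"cpus": 4, "memory_gb": 8},
    "reporting-db": {"cpus": 2, "memory_gb": 8},
}

plan = plan_changes(current_state, desired_state)
for action, resource in plan:
    print(action, resource)
```

Because the desired state is just a file, changing existing infrastructure becomes an edit and a review rather than a manual procedure.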
CloverDX & DataOps
Increased deployment frequency
Package, share, and reuse any
functionality you design
Automated testing
Incorporate data quality tests and build in error
handling to your data pipelines
Consistent metadata and version control
CloverDX is easy to integrate with most version control tools,
and metadata can easily be tracked and visualised
Monitoring
The CloverDX server has a monitoring suite that
can be applied to individual jobs or whole
business processes
Collaboration across all stakeholders
CloverDX’s visual design allows technical and
non-technical users to “speak the same language”
Gartner identified 5 “key techniques” that support the delivery of DataOps –
CloverDX can support “Data Preparers” with each
Upcoming Webinar
Code Management with
Version Control in CloverDX
December 8th
11am EST / 4pm GMT / 5pm CET
Register
Q&A
