The Rise of DataOps
From the ashes of Data Governance
Ryan Gross
2 Pariveda Solutions, Inc. Confidential & Proprietary.
Fear driven
Governance
Does not work
Governance is often the weakest pillar in a Modern Data
Enterprise, holding back the realization of value
Maturity stages: Uncontrolled, Reactive, Proactive, Resilient

Value
• Uncontrolled: No organization-wide plan or vision guiding efforts; consequently, ROI is unknown or poorly justified
• Reactive: Current-state process defined, focused on supporting current-state efforts, with baseline financial estimates of projected costs and initial value forecasts
• Proactive: Vision of future state defined, focused on incremental improvements, inclusive of estimates that consider operations
• Resilient: Vision and guiding principles, together with cost and value considerations, drive all technology efforts to reach the future state

Platform
• Uncontrolled: Data Warehouse – Analytics are descriptive and immature
• Reactive: Data Swamp – Descriptive analytics, reporting, and visualization
• Proactive: Data Lake – Experimentation and beginning of predictive analytics
• Resilient: Data Platform – Optimization for increased scale of use cases

Governance
• Uncontrolled: Lack of well-defined data stewardship and management
• Reactive: Data stewardship defined; requests and issues are managed reactively
• Proactive: Proactive data stewardship and management as code
• Resilient: Data stewardship and management is operationalized
Most organizations’
governance efforts are
stuck in a loop from
reactive to uncontrolled
Attempt to apply legacy,
manual governance
practices to address issues
Governance efforts fail to
gain traction, leading to a
return to uncontrolled state
Data Governance Guideposts haven’t
changed
Data Quality, Data Accessibility, Data Security, Compliance, Availability
Traditional Data Governance is heavily dependent on human
intervention to manage, creating business decision bottlenecks
Figure: Traditional Approach in 2014. Org chart labels: Chief Data Officer, Data Governance Council, Key Business Unit Leads, Lead Data Stewards, Data Project Groups, Data Custodians, Data Stewards. Flows between levels: Issues, Guidance, Initiatives.
Recreated: http://datagovernanceaus.com.au/data-governance-what-is-it/
Figure: Traditional Approach in 2020. Org chart labels: Chief Data Officer, Data Governance Council, Key Business Unit Leads, Lead Data Stewards, Data Project Groups, Data Custodians, Data Stewards. Flows between levels: Issues, Guidance, Initiatives.
Data Governance Headcount, Meetings
& Quality Over Time
Data Quality
To fix Data
Governance,
we need to
change our
Mental Model
In other words, maybe
we need to reframe the
problem…
With
Machine
Learning,
Data
Writes the
Code
Once Upon a Time…
The ability to
compile a set of
input code to
executable
outputs
Version control
systems to
keep track of
the input code
Two core innovations created the discipline of software engineering:
1. Higher-level languages (Scala, Python)
2. Automated Unit & Integration Tests
3. Static Analysis
4. Continuous Integration
5. Refactoring
All modern software engineering builds on
these fundamental constructs:
The ability to compile input code
to executable outputs
Version control systems to keep
track of the input code
DevOps: Bring Compilers and Source Control to Infrastructure
Let’s Look at DevOps
The same process of increasing
improvements has repeated itself, with
cloud native approaches and
continuous deployment becoming the
norm
So getting back to
data…
The ability to compile input code
to executable outputs
Version control systems to keep
track of the input code
DataOps: Bring Compilers and Source Control to the world of Data
If this is the
source
code…
…and this is the
resulting
operation
… then the pipeline
is the compiler
We still don’t really
understand how
data writes code
• This is why we have data
scientists experiment to
figure out the logic
• Later, data engineers come
in to build the
optimizers
Use Software Development Best Practices to manage what we
don’t fully understand
Innovation Pipelines provide a means for companies to
deliver value while automating testing and monitoring
quality.
Source: The DataOps Cookbook
Define
Everything as
Code
to reduce risk,
increase
quality, and
build trust
Access & Privacy
Defines the requirements
to access the data
outputs produced by
each pipeline stage
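One way to picture access requirements living in source control is a small policy table checked by code. This is a minimal sketch, not the deck's method: the stage names, roles, and the `can_read` and `visible_columns` helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """Access requirements for the output of one pipeline stage."""
    stage: str
    allowed_roles: frozenset
    masked_columns: frozenset = frozenset()


# Hypothetical stages and roles, for illustration only.
POLICIES = {
    "raw": AccessPolicy("raw", frozenset({"data_engineer"})),
    "modeled": AccessPolicy(
        "modeled",
        frozenset({"data_engineer", "data_scientist"}),
        frozenset({"ssn"}),
    ),
}


def can_read(role, stage):
    """Deny by default: unknown stages and unlisted roles get no access."""
    policy = POLICIES.get(stage)
    return policy is not None and role in policy.allowed_roles


def visible_columns(stage, columns):
    """Drop columns the stage's policy marks as masked."""
    policy = POLICIES.get(stage)
    if policy is None:
        return []
    return [c for c in columns if c not in policy.masked_columns]
```

Because the policy is plain data under version control, a change to who may read a stage shows up as a reviewable diff.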
Dependencies
Defines all libraries that this
component depends on to
execute / test without actually
including the libraries in SCM.
Pipeline Code
The functional code for
this component. This
code should be separated
out so that pure business
logic lives in a library &
platform specific code
calls the lib.
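The separation described here can be sketched as follows. The function names and the fee calculation are hypothetical, and a CSV reader stands in for whatever platform-specific code a real pipeline would use.

```python
import csv
import io


def net_amount(gross, fee_rate):
    """Pure business logic: no I/O and no platform imports, so it is easy to test."""
    if not 0.0 <= fee_rate <= 1.0:
        raise ValueError("fee_rate must be between 0 and 1")
    return round(gross * (1.0 - fee_rate), 2)


def run_on_csv(text, fee_rate):
    """Platform-specific wrapper: parses the input, then calls the library function."""
    reader = csv.DictReader(io.StringIO(text))
    return [net_amount(float(row["gross"]), fee_rate) for row in reader]
```

Moving to a different platform would then mean rewriting only the wrapper, while `net_amount` and its tests stay unchanged.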
Cloud Environments
CloudFormation Templates define the
infrastructure that will be created to
deploy this component. Data cloning
or test data management provide the
datasets that enable testing
Logic Tests
Test code to
ensure proper
function of the
business logic.
These capture
edge cases that
may not be in
the real data
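A logic test in this sense might look like the sketch below. The `average_price` function is a hypothetical piece of business logic; the tests deliberately cover inputs (empty batches, missing values) that may never appear in the sampled data.

```python
import math


def average_price(prices):
    """Business logic under test: mean of the valid prices, or None if there are none."""
    valid = [p for p in prices if p is not None and not math.isnan(p)]
    if not valid:
        return None
    return sum(valid) / len(valid)


def test_typical_case():
    assert average_price([10.0, 20.0]) == 15.0


def test_empty_input():
    assert average_price([]) is None


def test_missing_values_are_skipped():
    assert average_price([None, float("nan"), 4.0]) == 4.0


def run_logic_tests():
    """Run every test and return how many passed; any failure raises AssertionError."""
    tests = [test_typical_case, test_empty_input, test_missing_values_are_skipped]
    for test in tests:
        test()
    return len(tests)
```

In practice a test runner such as pytest would discover the `test_` functions; `run_logic_tests` is only there to keep the sketch self-contained.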
Deploy Pipeline
Jenkinsfile definitions
include the pipeline of build
steps required to
successfully get this
component into production.
Data Tests
Test code that
ensures input
data is correct
and outputs are
properly
configured.
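A data test checks the batch itself, not the code. The sketch below is one hedged illustration: `check_input_rows`, its parameters, and the column names in the usage note are hypothetical.

```python
def check_input_rows(rows, required_columns, non_negative=()):
    """Return a list of readable failures; an empty list means the batch passes."""
    failures = []
    for i, row in enumerate(rows):
        # Required columns must be present and non-empty.
        for col in required_columns:
            if row.get(col) in (None, ""):
                failures.append(f"row {i}: missing {col}")
        # Numeric columns listed here must not be negative.
        for col in non_negative:
            value = row.get(col)
            if isinstance(value, (int, float)) and value < 0:
                failures.append(f"row {i}: negative {col}")
    return failures
```

A pipeline stage could call `check_input_rows(batch, ["id", "amount"], non_negative=["amount"])` before processing and stop the run when the returned list is not empty.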
By ensuring that every
aspect of developing
analytics solutions is
captured and tracked as
code, it becomes much
clearer which change
introduced a failure
Everything
as Code
Change Management
• Cloud Vendor Selection
• Resilient Solution Architecture
• Integrated Enterprise Solution
• Infrastructure Automation
• DevOps Process Definition & Change Plan
• Solution Evolution & Cost Optimization
Make it all discoverable automatically
Automate documentation and cataloging of data so data
producers don’t need to curate and document where and what
the data is
Empower self-service and self-discovery
Make data discoverable and transparent so data consumers
can access and take advantage of it without having to talk to
the data producers.
Rapidly realize deep business insights
Focus on the value of the data and how it can be used to drive timely,
actionable insights at the speed of business, not IT.
Remove Inefficiency and Thrash
Manual manipulation and management of data is
notoriously time consuming and inefficient
Onboard new sources in a day
Onboarding new data sources has historically taken weeks or
months. Use the principles of DevOps to take
new sources from POC to production in days or
hours
Write no incremental code
Bring new data sources in through configuration and automated
data introspection. Build flexible platforms that don’t need
expensive and error prone custom development to interpret new
data. Save custom development for algorithms that add
demonstrable business value.
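As a rough illustration of onboarding through configuration and introspection, the sketch below infers a schema from a CSV sample instead of relying on hand-written parsing code. The `SOURCE_CONFIG` entry, the function names, and the three-type scheme are assumptions made for this example.

```python
import csv
import io

# Hypothetical source definition: onboarding means adding an entry like this.
SOURCE_CONFIG = {"name": "trades", "format": "csv"}


def infer_type(values):
    """Pick the narrowest of int, float, string that fits every non-empty value."""
    non_empty = [v for v in values if v not in (None, "")]
    if not non_empty:
        return "string"
    for name, caster in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"


def introspect_csv(text):
    """Infer a column-to-type mapping from sample CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    return {c: infer_type([r[c] for r in rows]) for c in columns}


def onboard(config, sample_text):
    """Combine the source configuration with the inferred schema."""
    return {"source": config["name"], "schema": introspect_csv(sample_text)}
```

The inferred schema could then be written to the data catalog, so producers do not have to document it by hand.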
DATA OPS
AS
GOVERNANCE
What if you could…
All the while providing data compliance and security.
New technology is available to support DataOps Governance
Governance areas: Data Quality, Data Discovery, Data Security, Data Availability, Compliance
Tool categories and examples shown: Visual data wrangling, Pipeline Testing, Observability Tools, Data Versioning, ML Data Catalogs, Compliance Catalogs, Rapid Exploration, AWS Athena Federated Query, AWS Lake Formation, Privacy Tools, Access Management, Amazon Key Management Service, Encryption
Figure: Production Data Platform and Development Data Platform, each with a Data Lake and a Data Pipeline.
Pipeline stages: Ingest, Diff, Model, Enhance, Transform
Data lake zones: Raw, Modeled, Enhanced, Products, Data Mart
Production data sources: Banking Data; Bloomberg, Dow Jones
Test data sources: Test Banking Data; Test Bloomberg, Dow Jones Data; Test Data for NYSE
Outputs: BI, Real time, AI, Dashboards
Callouts: Define Access Policies, Test end-to-end; Design right-to-left; Ingest Prototype; a test at each pipeline stage
Add tests to control what you don’t know
That’s great…
…now how do I get started?
How to ramp up data operations roles & processes
Leverage the data platform to capture
governance metrics and enforce constraints
Over time, the data management team will work more closely with
data science to build in governance during the experimentation loop
Same team applies governance once
feasible, valuable prototypes exist
People will be more likely to participate in the governance
process if they have a hand in the value being generated
Build data management team with a focus
on enabling use cases
The more people contribute to the initial Data Operations
design, the more they will feel invested in seeing it succeed
Reporting on the value of data insights and risk
avoidance drives continued motivation
Data Operations is a continually ongoing activity, people need to be
thoroughly invested to maintain participation
The gap between experiment start and
time necessary to learn the reasoning,
processes, tooling, and rules for the
governance aspects of data operations
OPPORTUNITIES PROTOTYPES
METRICS INSIGHTS
IDENTIFY,
ASSESS AND
PRIORITIZE
EXPERIMENT
AND LEARN
BUILD, TEST
AND RUN
A DataOps-enabled platform will support the entire Data Science Lifecycle
Data Catalog Data Lake Storage Data Pipelines
Experiment & Learn Deploy, Test, and Run
As you mature, you will be able to take on more complexity
Platform
Processes
Modern Tooling
Foundation
Data Forge
Framework
Data Ops
Roles/People
Organization
Technology and
Infrastructure
Higher levels of Data Maturity
Differentiated Business Value driven from your Data
Modern Data
Management
Manage trust in addition to minimizing Data Issues
1. Data environment management
2. Access & Privacy as Code
3. Test management
4. Continuous Deployment & Compliance
A similar process will play out with innovations
built on the foundation of DataOps

More Related Content

What's hot

Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overviewjdijcks
 
Creating an Enterprise AI Strategy
Creating an Enterprise AI StrategyCreating an Enterprise AI Strategy
Creating an Enterprise AI StrategyAtScale
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
Constant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake JourneyConstant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake JourneySeeling Cheung
 
Architecting for analytics
Architecting for analyticsArchitecting for analytics
Architecting for analyticsRob Winters
 
Pervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricityPervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricityCloudera, Inc.
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Data Con LA
 
Testing the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big ProblemsTesting the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big ProblemsTechWell
 
Introduction to Business Intelligence
Introduction to Business IntelligenceIntroduction to Business Intelligence
Introduction to Business IntelligenceAlmog Ramrajkar
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your Product
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your ProductDell Technology World - IT as a Business - Multi-Cloud Strategy is your Product
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your ProductManuel "Manny" Rodriguez-Perez
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Group
 
Data Ops at TripActions
Data Ops at TripActionsData Ops at TripActions
Data Ops at TripActionsRob Winters
 
From Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data WarehouseFrom Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data WarehouseOsama Hussein
 
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsCloudera, Inc.
 
BarbaraZigmanResume 2016
BarbaraZigmanResume 2016BarbaraZigmanResume 2016
BarbaraZigmanResume 2016bzigman
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
Bi presentation to bkk
Bi presentation to bkkBi presentation to bkk
Bi presentation to bkkguest4e975e2
 

What's hot (20)

Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overview
 
Creating an Enterprise AI Strategy
Creating an Enterprise AI StrategyCreating an Enterprise AI Strategy
Creating an Enterprise AI Strategy
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Constant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake JourneyConstant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake Journey
 
Architecting for analytics
Architecting for analyticsArchitecting for analytics
Architecting for analytics
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Pervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricityPervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricity
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
Testing the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big ProblemsTesting the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big Problems
 
Introduction to Business Intelligence
Introduction to Business IntelligenceIntroduction to Business Intelligence
Introduction to Business Intelligence
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your Product
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your ProductDell Technology World - IT as a Business - Multi-Cloud Strategy is your Product
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your Product
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Data Ops at TripActions
Data Ops at TripActionsData Ops at TripActions
Data Ops at TripActions
 
From Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data WarehouseFrom Traditional Data Warehouse To Real Time Data Warehouse
From Traditional Data Warehouse To Real Time Data Warehouse
 
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
 
BarbaraZigmanResume 2016
BarbaraZigmanResume 2016BarbaraZigmanResume 2016
BarbaraZigmanResume 2016
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Bi presentation to bkk
Bi presentation to bkkBi presentation to bkk
Bi presentation to bkk
 

Similar to Data summit connect fall 2020 - rise of data ops

What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?Xpand IT
 
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Elemica
 
DataOps , cbuswaw April '23
DataOps , cbuswaw April '23DataOps , cbuswaw April '23
DataOps , cbuswaw April '23Jason Packer
 
Data Science Innovation Summit Philadelphia 2019 - pariveda
Data Science Innovation Summit  Philadelphia 2019 - parivedaData Science Innovation Summit  Philadelphia 2019 - pariveda
Data Science Innovation Summit Philadelphia 2019 - parivedaRyan Gross
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Databricks
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseCaserta
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceCaserta
 
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Chain Sys Corporation
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeDataWorks Summit
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023RTTS
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalHarvinder Atwal
 
Defining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business EnvironmentDefining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business EnvironmentCaserta
 
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...Agile Testing Alliance
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
It Consulting & Services - Black Basil Technologies
It Consulting & Services  - Black Basil TechnologiesIt Consulting & Services  - Black Basil Technologies
It Consulting & Services - Black Basil TechnologiesBlack Basil Technologies
 
Estuate EDM Checklist
Estuate EDM ChecklistEstuate EDM Checklist
Estuate EDM ChecklistEstuate, Inc.
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 

Similar to Data summit connect fall 2020 - rise of data ops (20)

What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?
 
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
Sergio Juarez, Elemica – “From Big Data to Value: The Power of Master Data Ma...
 
DataOps , cbuswaw April '23
DataOps , cbuswaw April '23DataOps , cbuswaw April '23
DataOps , cbuswaw April '23
 
Data Science Innovation Summit Philadelphia 2019 - pariveda
Data Science Innovation Summit  Philadelphia 2019 - parivedaData Science Innovation Summit  Philadelphia 2019 - pariveda
Data Science Innovation Summit Philadelphia 2019 - pariveda
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the Enterprise
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie Mae
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
 
Defining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business EnvironmentDefining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business Environment
 
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
 
How Businesses use Big Data to Impact the Bottom Line
How Businesses use Big Data to Impact the Bottom LineHow Businesses use Big Data to Impact the Bottom Line
How Businesses use Big Data to Impact the Bottom Line
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014
 
It Consulting & Services - Black Basil Technologies
It Consulting & Services  - Black Basil TechnologiesIt Consulting & Services  - Black Basil Technologies
It Consulting & Services - Black Basil Technologies
 
Dev ops
Dev opsDev ops
Dev ops
 
Estuate EDM Checklist
Estuate EDM ChecklistEstuate EDM Checklist
Estuate EDM Checklist
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 

Recently uploaded

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 

Data summit connect fall 2020 - rise of data ops

  • 1. The Rise of DataOps From the ashes of Data Governance Ryan Gross
  • 2. Fear-driven governance does not work
  • 3. Governance is often the weakest pillar in a Modern Data Enterprise, holding back the realization of value
      Maturity stages: Uncontrolled, Reactive, Proactive, Resilient
      Governance
        Uncontrolled: Lack of well-defined data stewardship and management
        Reactive: Data stewardship defined, manage requests and issues reactively
        Proactive: Proactive data stewardship and management as code
        Resilient: Data stewardship and management is operationalized
      Value
        Uncontrolled: No organization-wide plan or vision guiding efforts and, consequently, ROI is unknown or poorly justified
        Reactive: Current state process defined, focused on supporting current state efforts and baseline financial estimates of projected costs and initial value forecasts
        Proactive: Vision of future state defined, focused on incremental improvements inclusive of estimates that consider operations
        Resilient: Vision and guiding principles, cost and value considerations together drive all technology efforts to reach future state
      Platform
        Uncontrolled: Data Warehouse – Analytics are descriptive and immature
        Reactive: Data Swamp – Descriptive analytics, reporting, and visualization
        Proactive: Data Lake – Experimentation and beginning of predictive analytics
        Resilient: Data Platform – Optimization for increased scale of use cases
      Most organizations' governance efforts are stuck in a loop from reactive to uncontrolled. Attempts to apply legacy, manual governance practices to address issues. Governance efforts fail to gain traction, leading to a return to the uncontrolled state.
  • 4. Data Governance Guideposts haven’t changed: Data Quality, Data Accessibility, Data Security, Compliance, Availability
  • 5. Traditional Data Governance is heavily dependent on human intervention to manage, creating business decision bottlenecks. [Figure: organization chart of the Traditional Approach in 2014 (Chief Data Officer, Data Governance Council, Key Business Unit Leads, Lead Data Stewards, Data Project Groups, Data Custodians, Data Stewards, connected by Issues, Guidance, and Initiatives), recreated from http://datagovernanceaus.com.au/data-governance-what-is-it/, shown beside the same chart labeled Traditional Approach in 2020. Chart: Data Governance Headcount, Meetings & Quality Over Time.]
  • 6. To fix Data Governance, we need to change our Mental Model
  • 7. In other words, maybe we need to reframe the problem…
  • 8. With Machine Learning, Data Writes the Code
  • 9. Once Upon a Time…
  • 10. Two core innovations created the discipline of software engineering: the ability to compile a set of input code to executable outputs, plus version control systems to keep track of the input code
  • 11. All modern software engineering builds on these fundamental constructs: 1. Higher-level languages (Scala, Python) 2. Automated Unit & Integration Tests 3. Static Analysis 4. Continuous Integration 5. Refactoring
  • 12. DevOps: Bring Compilers and Source Control to Infrastructure. The ability to compile input code to executable outputs, plus version control systems to keep track of the input code
  • 13. Let’s Look at DevOps: The same process of increasing improvements has repeated itself, with cloud-native approaches and continuous deployment becoming the norm
  • 14. So getting back to data…
  • 15. DataOps: Bring Compilers and Source Control to the world of Data. The ability to compile input code to executable outputs, plus version control systems to keep track of the input code. If this is the source code… …and this is the resulting operation… then the pipeline is the compiler
  • 16. We still don’t really understand how data writes code • This is why we have data scientists experiment to figure out the logic • Data engineers come in later to build the optimizers
  • 17. Use Software Development Best Practices to manage what we don’t fully understand. Innovation Pipelines provide a means for companies to deliver value while automating testing and monitoring quality. Source: The DataOps Cookbook
  • 18. Define Everything as Code to reduce risk, increase quality, and build trust
      Access & Privacy: Defines the requirements to access the data outputs produced by each pipeline stage.
      Dependencies: Defines all libraries that this component depends on to execute and test, without actually including the libraries in SCM.
      Pipeline Code: The functional code for this component. This code should be separated out so that pure business logic lives in a library and platform-specific code calls the library.
      Cloud Environments: CloudFormation templates define the infrastructure that will be created to deploy this component. Data cloning or test data management provides the datasets that enable testing.
      Logic Tests: Test code to ensure proper function of the business logic. These capture edge cases that may not be in the real data.
      Deploy Pipeline: Jenkinsfile definitions include the pipeline of build steps required to successfully get this component into production.
      Data Tests: Test code that ensures input data is correct and outputs are properly configured.
      Everything as Code: By ensuring that every aspect of developing analytics solutions is captured and tracked as code, it becomes much clearer which change introduced a failure.
  • 19. DATA OPS AS GOVERNANCE: What if you could…
      Make it all discoverable automatically: Automate documentation and cataloging of data so data producers don’t need to curate and document where and what the data is.
      Empower self-service and self-discovery: Make data discoverable and transparent so data consumers can access and take advantage of it without having to talk to the data producers.
      Rapidly realize deep business insights: Focus is on the value of the data and how it can be used to drive timely, actionable insights at the speed of business, not IT.
      Remove inefficiency and thrash: Manual manipulation and management of data is notoriously time-consuming and inefficient.
      Onboard new sources in a day: New data sources have historically taken weeks or months. Use the principles of DevOps to deploy new sources from POC to production in days or hours.
      Write no incremental code: Bring new data sources in through configuration and automated data introspection. Build flexible platforms that don’t need expensive and error-prone custom development to interpret new data. Save custom development for algorithms that add demonstrable business value.
      All the while providing data compliance and security.
      Change Management • Cloud Vendor Selection • Resilient Solution Architecture • Integrated Enterprise Solution • Infrastructure Automation • DevOps Process Definition & Change Plan • Solution Evolution & Cost Optimization
  • 20. New technology is available to support DataOps Governance. [Figure: tools grouped under Data Quality, Data Discovery, Data Security, and Data Availability. Labels shown: Visual data wrangling, Pipeline Testing, AWS Lake Formation, Compliance, ML Data Catalogs, Compliance Catalogs, Rapid Exploration, AWS Athena Federated Query, Privacy Tools, Observability Tools, Access Management, Amazon Key Management Service, Data Versioning, Encryption.]
  • 21. [Diagram: a Production Data Platform and a Development Data Platform, each with a Data Lake and a Data Pipeline running Ingest, Diff, Model, Enhance, and Transform across Raw, Modeled, Enhanced, and Products layers into a Data Mart that serves BI, Real time AI, and Dashboards. Production Data Sources: Banking Data; Bloomberg, Dow Jones. Test Data: Test Banking Data; Test Bloomberg, Dow Jones Data; Test Data for NYSE. The development pipeline is marked with a test at each stage. Callouts: Define Access Policies, Test end-to-end; Design right-to-left; Ingest Prototype.]
  • 22. Add tests to control what you don’t know
  • 23. That’s great… …now how do I get started?
  • 24. How to ramp up data operations roles & processes. Leverage the data platform to capture governance metrics and enforce constraints. Over time, the data management team will work more closely with data science to build in governance during the experimentation loop. Same team applies governance once feasible, valuable prototypes exist. People will be more likely to participate in the governance process if they have a hand in the value being generated. Build data management team with a focus on enabling use cases. The more people contribute to the initial Data Operations design, the more they will feel invested in seeing it succeed. Reporting on the value of data insights and risk avoidance drives continued motivation. Data Operations is a continually ongoing activity; people need to be thoroughly invested to maintain participation. The gap between experiment start and time necessary to learn the reasoning, processes, tooling, and rules for the governance aspects of data operations. [Cycle diagram: OPPORTUNITIES, PROTOTYPES, METRICS, INSIGHTS; IDENTIFY, ASSESS AND PRIORITIZE; EXPERIMENT AND LEARN; BUILD, TEST AND RUN.]
  • 25. A DataOps-enabled platform will support the entire Data Science Lifecycle: Data Catalog; Data Lake Storage; Data Pipelines; Experiment & Learn; Deploy, Test, and Run
  • 26. As you mature, you will be able to take on more complexity. [Figure labels: Platform; Processes; Modern Tooling; Foundation; Data Forge Framework; Data Ops; Roles/People; Organization; Technology and Infrastructure; Higher levels of Data Maturity; Differentiated Business Value driven from your Data; Modern Data Management.]
  • 27. Manage trust in addition to minimizing Data Issues
  • 28. A similar process will play out with innovations built on the foundation of DataOps: 1. Data environment management 2. Access & Privacy as Code 3. Test management 4. Continuous Deployment & Compliance
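
The "Everything as Code" slide separates Access & Privacy definitions, Data Tests, and Logic Tests. As a rough illustration only, the Python sketch below shows one way those three pieces could sit side by side in version control. Every name in it (`AccessPolicy`, `POLICIES`, `read_stage`, `net_amount`, `check_input`, the `modeled` stage, and the column names) is hypothetical and not taken from the deck, which names CloudFormation and Jenkins but does not prescribe an implementation for these components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical access rule for the output of one pipeline stage."""
    stage: str
    allowed_roles: frozenset   # roles permitted to read the stage output
    masked_columns: frozenset  # columns hidden from every reader


# Policies live in source control next to the pipeline code (assumed layout).
POLICIES = {
    "modeled": AccessPolicy(
        stage="modeled",
        allowed_roles=frozenset({"analyst", "engineer"}),
        masked_columns=frozenset({"ssn"}),
    ),
}


def read_stage(stage, role, rows):
    """Enforce the policy: reject unknown roles, drop masked columns."""
    policy = POLICIES[stage]
    if role not in policy.allowed_roles:
        raise PermissionError(f"role {role!r} may not read stage {stage!r}")
    return [
        {key: value for key, value in row.items()
         if key not in policy.masked_columns}
        for row in rows
    ]


def net_amount(gross, fee_rate):
    """Pure business logic, kept in a library so a logic test can cover it."""
    return round(gross * (1 - fee_rate), 2)


def check_input(rows):
    """Data test: return a list of problems; an empty list means the data passes."""
    problems = []
    for index, row in enumerate(rows):
        if row.get("account_id") is None:
            problems.append(f"row {index}: missing account_id")
        gross = row.get("gross")
        if not isinstance(gross, (int, float)) or gross < 0:
            problems.append(f"row {index}: gross must be a non-negative number")
    return problems
```

A logic test would call `net_amount` with edge cases that may never appear in real data, while a data test would run `check_input` against each incoming batch before the pipeline proceeds. Because the policy table is ordinary code, a change to who can read a stage shows up in the same review and history as any other change.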