1
Learn More About Unravel
info@unraveldata.com
2
July 16, 2019
George Demarest
Senior Director of Product Marketing
Understanding DataOps
and its Impact on
Application Quality
3
Data Applications: The Perfect Storm of Human Error
Swiss Cheese Model of Human Error
Organizational Challenges (latent failures)
Immature Supervision (latent failures)
Preconditions for unsafe acts (latent failures)
Unsafe acts (active failures)
Incident!
4
Data Applications: The Perfect Storm of Human Error
Swiss Cheese Model of Human Error in Data Applications
Organizational Challenges (latent failures)
Immature Supervision (latent failures)
Preconditions for unsafe acts (latent failures)
Unsafe acts (active failures)
OOM Incident!
“We don’t need a CDO/CAO”
“Developers will allocate their own cluster resources”
“Big Data is experimental. Data team: govern thyself”
“I need max memory for all of my jobs”
5
DataOps is a Thing
6
DataOps is a Thing
https://www.dataopsmanifesto.org/
7
Gartner Definition of DataOps
DataOps is a collaborative data management practice focused on
improving the communication, integration and automation of data flows
between data managers and consumers across an organization.
The goal of DataOps is to create predictable delivery and change
management of data, data models and related artifacts.
DataOps uses technology to automate data delivery with the appropriate
levels of security, quality and metadata to improve the use and value of
data in a dynamic environment.
8
Are We Losing the Battle of Complexity?
Multiple layers of
complexity make data
applications difficult to
tune, troubleshoot,
operationalize, and scale.
Only intelligent automation
can win this fight.
9
Managing distributed apps is hard and expensive
Hard to identify root cause
Large “attack surface”
10
Without AI, DataOps is a manual, logistical challenge
DataOps Without AI: Multiple tools, no complete view, no intelligence.
AI-Powered DataOps: One complete correlated view with intelligent automation.
11
Unravel is AI Powered Automation for DataOps
Without Unravel
• Multiple tools
• Manual data/log collection
• Fragmented view, multiple consoles
• Minimal intelligence; manual tuning and troubleshooting
• Manual cost analysis
With Unravel
• One full-stack, cross-platform console
• One complete correlated view
• Automatic, lightweight data/log collection
• Built-in AI/ML-powered recommendations, insights, remediation
• Chargeback/showback reporting
12
Where is the value for DataOps created?
Business Value: Accelerate business decisions through timely, data-driven insights
Performance: Guarantee modern data application SLAs
Throughput: Optimize cluster performance and job completion times
Quality: Minimize failed jobs
Efficiency: Minimize big data cluster and resource contention
Productivity: Autonomous remediation scales Ops teams
13
Essential Elements of an AI-Powered DataOps
• Data Collection and Correlation
- Observe and collect all relevant data
- Correlate collected data and derived metadata
• Operational Data Model
- Monitoring, troubleshooting, tuning, and managing
requires an operational data model
- Richer, more powerful than a CMDB
• Analytics
- Basic and advanced statistical analysis – correlate,
classify, extrapolate from operational metadata
- Predictive analytics and forecasting for capacity and
growth
- Pattern and anomaly detection, root-cause analysis
- Prescriptive analytics and recommendations
- Context, topology and coded expertise
• Automation
- Auto-tuning of applications
- Autonomous resource allocation and optimization
- Cluster load balancing and job scheduling
- Automatic response to alerts and recovery from failures
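
The "pattern and anomaly detection" element can be illustrated with a small sketch. This is not Unravel's algorithm; it is a generic robust outlier test (the modified z-score based on the median absolute deviation) applied to hypothetical job run times. The function name, the sample durations, and the 3.5 threshold are assumptions chosen for illustration.

```python
from statistics import median

def find_anomalies(durations, threshold=3.5):
    """Return indices of runs whose duration is a robust outlier.

    Uses the modified z-score: 0.6745 * |x - median| / MAD, where MAD is
    the median absolute deviation. The threshold of 3.5 is a common
    rule of thumb, not a value taken from the slides.
    """
    if len(durations) < 3:
        return []  # too little history to judge
    med = median(durations)
    mad = median(abs(d - med) for d in durations)
    if mad == 0:
        return []  # all runs look (almost) identical; nothing to flag
    return [
        i for i, d in enumerate(durations)
        if 0.6745 * abs(d - med) / mad > threshold
    ]

# Hypothetical run times in seconds for one recurring job.
runs = [60, 62, 58, 61, 59, 60, 63, 57, 61, 300]
slow_runs = find_anomalies(runs)
```

A median-based test is used here instead of mean and standard deviation because a single extreme run inflates the standard deviation enough to hide itself in a short history. An automated response (alerting, re-queuing, or re-tuning) would hang off the returned indices.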
14
Our Solution – Extensible Data Operations Platform
15
Automated DataOps Use Cases for Unravel
Automated Cloud Cost Management
• Optimize cost by right-sizing cloud images
• Optimize cost by choosing the optimal
price plan
Automated Workload Management
• Eliminate CPU, Memory, Network I/O and
Disk I/O contention
• Correctly size VMs and Cloud Images
• Place VMs in the best Hosts and Clusters
Automated Root Cause Analysis
• Intelligent analysis of application
failures
• Use Unravel data model and learned
app behaviors to automate RCA
Automated Performance Optimization
• Automatically learn the performance
characteristics of apps and the supporting stack
• Automatically optimize for a chosen KPI
(performance, efficiency)
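
As a rough illustration of right-sizing, the sketch below derives a memory request from observed peak usage instead of asking for "max memory for all of my jobs". It is a minimal sketch under stated assumptions, not Unravel's method: the function name, the 20% headroom, and the 512 MB allocation granularity are all hypothetical.

```python
from math import ceil

def recommend_memory_mb(observed_peaks_mb, headroom=0.2, granularity_mb=512):
    """Suggest a memory request from observed peak usage.

    Takes the highest observed peak, adds a safety margin (headroom),
    and rounds up to the next allocation step. All defaults are
    illustrative assumptions.
    """
    if not observed_peaks_mb:
        raise ValueError("need at least one observed peak")
    target = max(observed_peaks_mb) * (1 + headroom)
    return ceil(target / granularity_mb) * granularity_mb

# Hypothetical job that requested 16 GB but never used more than 3.4 GB.
requested_mb = 16384
recommended_mb = recommend_memory_mb([3100, 3400, 2900])
reclaimable_mb = requested_mb - recommended_mb
```

The gap between the requested and recommended values is the capacity that could be returned to the cluster or removed from a cloud bill.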
16
Example: Root Cause Analysis of App Failures
Challenge
• Many levels of correlated stack traces
• Identifying the root cause is hard and time-consuming
17
Resolution
• Reduce troubleshooting time from days to seconds
• Improve productivity of data scientists and analysts
Automated Root Cause Analysis of Failures
18
[Diagram: Container Logs → Error Template Extraction → Feature vectors → Learning Algorithm for Predictive Model → Predictive Model → Root causes]
Automated Root Cause Analysis of Failures
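
The pipeline on this slide (container logs, error template extraction, feature vectors, a learned model, root causes) can be sketched in a few lines. The slides do not say which learning algorithm is used, so this sketch swaps in a simple nearest-neighbor lookup over bag-of-template vectors. Every name, the masking rules, and the sample logs and labels are hypothetical.

```python
import re
from collections import Counter
from math import sqrt

def extract_template(line):
    """Mask variable parts of a log line so similar errors share a template."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line.strip()

def featurize(log_lines):
    """Turn one container log into a bag-of-templates feature vector."""
    return Counter(
        extract_template(l) for l in log_lines
        if "ERROR" in l or "Exception" in l
    )

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class NearestNeighborRCA:
    """Stand-in for the slide's predictive model (assumption: 1-nearest neighbor)."""

    def __init__(self):
        self.examples = []

    def fit(self, labeled_logs):
        # labeled_logs: iterable of (log_lines, root_cause_label) pairs
        self.examples = [(featurize(lines), cause) for lines, cause in labeled_logs]
        return self

    def predict(self, log_lines):
        vec = featurize(log_lines)
        scored = [(cosine(vec, ex_vec), cause) for ex_vec, cause in self.examples]
        best_score, best_cause = max(scored, default=(0.0, "unknown"))
        return best_cause if best_score > 0 else "unknown"

# Hypothetical labeled failures used as training data.
model = NearestNeighborRCA().fit([
    (["ERROR java.lang.OutOfMemoryError: Java heap space at task 12"],
     "executor memory too small"),
    (["ERROR FileNotFoundException: /data/in/part-00017 missing"],
     "missing input data"),
])
cause = model.predict([
    "INFO starting stage 4",
    "ERROR java.lang.OutOfMemoryError: Java heap space at task 98",
])
```

Masking numbers and addresses is what lets task 12 and task 98 collapse into one template, so a failure seen once can be recognized again in a different run.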
19
Unravel insights and
recommendations for
Spark application tuning
20
Unravel is AI Powered Automation for DataOps
Without Unravel
• Multiple tools
• Manual data/log collection
• Fragmented view, multiple consoles
• Minimal intelligence; manual tuning and troubleshooting
• Manual cost analysis
With Unravel
• One full-stack, cross-platform console
• One complete correlated view
• Automatic, lightweight data/log collection
• Built-in AI/ML-powered recommendations, insights, remediation
• Chargeback/showback reporting
21
Unravel – What makes us different
FULL-STACK
COVERAGE
Only Unravel works across your
entire ecosystem to demystify and
simplify operations.
AI-DRIVEN
RECOMMENDATIONS
Unravel does more than monitor – it
shows you how to make things
better.
AUTOMATED TUNING AND
REMEDIATION
Unravel operationalizes big data by
automating it.
FULLY-EXTENSIBLE
FOR CLOUD ADOPTION
Only Unravel future-proofs
your cloud adoption choices.
22
Learn More About Unravel
info@unraveldata.com
