1
Using AI-powered
Automation for High
Performance Data
Pipelines in the Cloud
April 10, 2019
2
Speaker
Alejandro Fernandez
Principal Software Engineer @
Unravel Data
alejandro@unraveldata.com
PMC Contributor Contributor
3
IS THE CLOUD
A GOOD DEAL?
4
Migrating and Managing Data Apps in the Cloud is Hard
For many reasons…
“Which cloud provider?”
5
Hadoop-unaware, manual, slow, inaccurate trial-and-error…
1000s of log files to dig through
Siloed system monitoring tools do
not have any application context
OR
Current approaches are tedious and disconnected
6
Tools Must Become More Sophisticated
One complete correlated view
with built-in AI and ML.
Multiple tools, no complete
view, no intelligence.
Optimizing Data Apps
Without AI
With Unravel
Ganglia
7
The Unravel Architecture
AI-Powered & Automated
Troubleshooting and
Tuning for Modern Data
Pipelines in the Cloud
8
AI for
DataOps
Cloud
Migration
Resource and
Cost Optimization
Troubleshooting and
Root Cause Analysis
APM
for Big Data
Problems We Solve
9
Intelligence for Operations - Use Cases with Unravel
Optimizing Cloud Cost
• Comparing Cloud Provider cost
• Right-sizing VMs
• Identifying Apps suitable for the Cloud
Automated Workload Management
• Eliminate resource contention
• YARN queue analysis and auto-actions
Automated Event Management and RCA
• Automatic collection of all logs
• Correlation of error to Line of Code
• Alerts & integration with Slack
Automated Performance Optimization and
Remediation
• Recommend job and cluster configs
using ML model
• Automatically tune jobs via Sessions
• Automatically optimize for a chosen KPI
(performance, efficiency)
10
Root Cause Analysis with AI
Feature
vectors
Learning
Algorithm
for Predictive
Model
Container
Logs
Predictive
Model
Root Causes
Data Scientist
Error
Template
Extraction
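The pipeline in this diagram begins by reducing raw container log lines to error templates. A minimal sketch of that step, assuming simple regex masking of variable tokens; the masking rules, the extract_template helper, and the sample log lines are all illustrative and are not taken from Unravel:

```python
import re
from collections import Counter

# Hypothetical masking rules: each replaces a variable token with a placeholder.
# Order matters: timestamps, IDs, and paths are masked before bare numbers.
_MASKS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?\b"), "<TS>"),
    (re.compile(r"\b(?:application|container|attempt)_\w+\b"), "<ID>"),
    (re.compile(r"(?:/[\w.\-]+)+"), "<PATH>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def extract_template(line: str) -> str:
    """Mask variable parts of a log line so similar errors share one template."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    # Collapse runs of whitespace so spacing differences do not split templates.
    return " ".join(line.split())


# Made-up sample lines, for illustration only.
logs = [
    "2019-04-10 12:00:01,123 ERROR Container container_1554_0001_01_000002 "
    "failed reading /data/orders/part-00001 after 3 retries",
    "2019-04-10 12:05:44,001 ERROR Container container_1554_0001_01_000007 "
    "failed reading /data/orders/part-00042 after 5 retries",
]

# Both lines collapse to the same template, so the counter has a single key.
template_counts = Counter(extract_template(line) for line in logs)
```

Counting templates rather than raw lines is what turns thousands of log lines into a small vocabulary that a model can be trained on.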
11
Root Cause Analysis with AI
Learning Models:
• Logistic Regression
• Random Decision Forests
start
stop
database
table
partition
TF-IDF: measures relevance of a word in a corpus
Term Frequency / Document Frequency
Doc2Vec
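To make the TF-IDF step concrete, here is a minimal pure-Python sketch that turns error templates into weighted term vectors. It assumes the common tf * log(N / df) weighting and whitespace tokenization; the slide does not say which variant is used, and the sample templates are made up:

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Term frequency scaled by how rare the term is across documents.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up error templates, for illustration only.
templates = [
    "table orders partition not found",
    "table customers partition not found",
    "database connection refused",
]
vectors = tf_idf(templates)
```

Terms shared by several templates, such as "table", receive lower weights than terms unique to one template, such as "orders". Vectors like these are the kind of feature input a logistic regression or random forest classifier would be trained on.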
12
Root Cause Analysis with AI
[Figure: bar chart of accuracy score (%) for Logistic Regression and Random Forests, each with TF-IDF and Doc2Vec features; y-axis from 80 to 100]
13
spark.driver.cores 2
spark.executor.cores 10
spark.sql.shuffle.partitions 300
spark.sql.autoBroadcastJoinThreshold 20MB
…
SKEW('orders', 'o_custId') true
spark.catalog.cacheTable("orders") true
App Tuning – Often by Trial and Error
Configs
Auto-Tune
Goal
Reduce duration
Reduce resources
Performance
OOM Error
Fixed config
efficient
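The properties listed on this slide are ordinary Spark settings, so one tuning trial amounts to one set of --conf flags. A minimal sketch, assuming jobs are launched with spark-submit; the to_spark_submit_args helper and the script name etl_job.py are hypothetical, and the broadcast threshold is written in bytes so the example does not rely on size-suffix parsing:

```python
def to_spark_submit_args(conf, app):
    """Render a dict of Spark properties as a spark-submit argument list."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args + [app]


# One candidate configuration, using the values shown on the slide.
trial = {
    "spark.driver.cores": 2,
    "spark.executor.cores": 10,
    "spark.sql.shuffle.partitions": 300,
    "spark.sql.autoBroadcastJoinThreshold": 20 * 1024 * 1024,  # 20 MB in bytes
}

# etl_job.py is a placeholder application name.
args = to_spark_submit_args(trial, "etl_job.py")
```

Trial-and-error tuning means repeating this with a different dict after every run; an auto-tuner replaces the manual choice of the next dict with a search guided by the chosen goal.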
14
Cloud Migrations
Cloud Migration
15
Journey to the Cloud
Plan, Migrate, Validate and Manage
16
Planning Your Migration
Google
DataProc
HDI
Lift & Shift
Usage-Based
Cost Reduction Workload Fit
Destination:
Strategy:
Jobs:
Service Type &
Compatibility Tenant
Potential
Savings
Source:
CDP
17
Understand Your Cluster and Workloads
19
Identify which Applications to move to the Cloud
Bursty Apps
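One simple way to flag a bursty app is the peak-to-mean ratio of its resource usage over time. The sketch below assumes that metric and a threshold of 3.0; both are illustrative choices and not Unravel's definition of bursty:

```python
def burstiness(usage):
    """Peak-to-mean ratio of a usage series (for example, vcores per hour)."""
    if not usage:
        return 0.0
    mean = sum(usage) / len(usage)
    return max(usage) / mean if mean > 0 else 0.0


def is_bursty(usage, threshold=3.0):
    """Flag an app whose peak usage is far above its average usage."""
    return burstiness(usage) >= threshold


# Made-up hourly vcore usage for two apps with the same average.
steady = [10, 11, 9, 10]
spiky = [0, 0, 40, 0]
```

Both series average 10 vcores, but the second one needs 40 at its peak. An on-prem cluster must be sized for that peak, while elastic cloud capacity can be paid for only while the burst lasts.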
20
Identify which Applications to move to the Cloud
Apps belonging to specific tenants
21
Cloud Provider – VM Preferences
Multi-Cloud:
AWS
Azure
Google
Region-aware
External (S3, ADLS, Cloud Storage)
or EBS volumes
22
Map your On-Prem Cluster to a Cloud Provider
Strategies: Lift & Shift, Cost Reduction, Workload Fit
23
Map your On-Prem Cluster to a Cloud Provider
Strategies: Lift & Shift, Cost Reduction, Workload Fit
24
Map your On-Prem Cluster to a Cloud Provider
Strategies: Lift & Shift, Cost Reduction, Workload Fit
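The difference between the first two strategies can be sketched as a choice of sizing input: Lift & Shift matches each host's provisioned capacity, while Cost Reduction sizes from observed peak usage. The VM catalog, prices, and field names below are hypothetical placeholders, and Workload Fit is left out because the slides do not describe how it is computed:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmType:
    name: str
    vcores: int
    memory_gb: int
    hourly_cost: float  # placeholder value, not a real price


# Hypothetical catalog; real instance types and prices vary by provider and region.
CATALOG = [
    VmType("vm-small", 4, 16, 0.20),
    VmType("vm-medium", 8, 32, 0.40),
    VmType("vm-large", 16, 64, 0.80),
]


def cheapest_fit(vcores, memory_gb, catalog=CATALOG):
    """Return the cheapest VM type that satisfies both requirements, or None."""
    candidates = [
        vm for vm in catalog
        if vm.vcores >= vcores and vm.memory_gb >= memory_gb
    ]
    return min(candidates, key=lambda vm: vm.hourly_cost, default=None)


def map_host(host, strategy):
    """Pick a VM type for one on-prem host under the given strategy."""
    if strategy == "lift_and_shift":
        # Match what the host has, regardless of how much is actually used.
        return cheapest_fit(host["vcores"], host["memory_gb"])
    if strategy == "cost_reduction":
        # Size from observed peak usage instead of provisioned capacity.
        return cheapest_fit(host["peak_vcores_used"], host["peak_memory_gb_used"])
    raise ValueError(f"unsupported strategy: {strategy}")


# Made-up host profile, for illustration only.
host = {"vcores": 16, "memory_gb": 64, "peak_vcores_used": 6, "peak_memory_gb_used": 20}
lifted = map_host(host, "lift_and_shift")
reduced = map_host(host, "cost_reduction")
```

With this made-up host, Lift & Shift selects the largest VM type while Cost Reduction selects a smaller one, which is where the potential savings come from.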
25
Tracking a Cloud Migration
This app is 8 times slower on cloud.
Unravel provides automatic fixes to get app back to meeting SLA
Compare how app is doing in new environment
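The comparison on this slide reduces to two checks per app: how much slower it runs in the new environment, and whether it still meets its SLA. A minimal sketch; the function names, durations, and SLA value are hypothetical, with the durations chosen only to reproduce the 8x figure from the slide:

```python
def slowdown(on_prem_seconds, cloud_seconds):
    """How many times slower the app runs in the cloud than on-prem."""
    return cloud_seconds / on_prem_seconds


def meets_sla(duration_seconds, sla_seconds):
    """True if the run finished within its SLA."""
    return duration_seconds <= sla_seconds


# Made-up durations: 5 minutes on-prem, 40 minutes in the cloud, 10 minute SLA.
on_prem, cloud, sla = 300, 2400, 600
ratio = slowdown(on_prem, cloud)
ok = meets_sla(cloud, sla)
```

Tracking both numbers per app after each migration wave shows which apps need tuning before the on-prem cluster is retired.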
26
Service Compatibility
What is the risk with migrating your applications?
27
Unravel Dataflow for AWS
28
Unravel Detailed Architecture for AWS
29
Unravel – What sets us Apart
FULL-STACK
COVERAGE
• 360° visibility
• Correlate code, config,
container, resources &
dependencies
• Agentless design and
micro-sensors make it
unobtrusive
AI-DRIVEN
RECOMMENDATIONS
• AI-powered actionable
insights and
recommendations
• Map dependencies
between apps, services,
resources, and users.
• Optimize cloud VMs
AUTOMATED TUNING
AND REMEDIATION
• Auto-Actions improve
app performance,
resource usage, and
reliability
• Automatically detect
and correct bottlenecks
and failures
30
Unravel makes data work
Unravel removes the blind spots in your data ecosystem, providing AI-powered
recommendations to drive more reliable performance in your modern data applications
31
Uncover what’s really going on in your cluster
and get the most out of every application.
START YOUR FREE TRIAL
https://unraveldata.com/free-trial/
hello@unraveldata.com
