SlideShare a Scribd company logo
Mick Miller
Senior Product Manager, Cloud Native
KeyBank
mick_miller@keybank.com
@mickmill
Lessons from the trenches
How KeyBank used Elastic to build an enterprise monitoring
solution
15
states
1,100+
branches
1,400+
ATMs
17,206
employees
$143.6
B
assets
$6.4B
revenue
2
datacenters
Business and technology landscape
Design by entropy1
2
3
The floodgates open
Urgency
Business and
technology problems
21+ monitoring systems resulted in:
• Slow MTTR, great MTTB
• Difficulty in identifying the root cause of problems
• Poor mobile application satisfaction scores
• Poor branch workstation performance metrics
Business and technology landscape
Design by entropy
2
1
3
The floodgates open
Urgency
The floodgates open
2017–2018

Architecture
2018

Pilot/Prod
2019

Scale out
Enterprise Monitoring transformation
timeline
Late 2017
All new monitoring stopped
Log storage cost and stability
issues
Huge backlog of monitoring
requests
Early 2018
Critical situation w/ eastern WA
branches
No visibility into root cause
Elasticsearch dev environment
functional
Deployed metric and winlog beats
The floodgates
opened!
Two days later...
Root causes identified
Remediation project
launched
Business and technology landscape
Design by entropy
3
1
2
The floodgates open
Urgency
Pathway to production cleared
First production cluster was deployed mid-2018
Funding provided by retiring 5 of the 21 existing monitoring
systems
• More systems retired in 2018, 2019 with plans for more in 2020
Estimated savings over $5M
• No accurate way to estimate savings from eliminating all the different
support and skill sets required to support 21 systems
Cluster immediately started to experience performance issues;
tuning and scaling became highest priority
Urgency
Lessons from KeyBank's monitoring transformation
Cluster size estimations1
2
3
4
5
Architecture and design
approaches
Automation
Feedback loops and monitoring
Unexpected business value
Bad news: it's nearly impossible to precisely predict your workload
• Index growth, data ingestions, query usage, etc.
• Expect to iterate three or more times
• You’re always wrong the first time
• Elasticsearch is not an RDBMS and does not scale like a datastore
Cluster size estimation
There’s no way anyone can tell you how to design a perfect
cluster, and those who claim they do are liars. 
—Fred de Villamil, Elasticsearch for fun and profit
Good news: it is possible to design for growth
• Start small and isolated;  KeyBank used a dedicated eight-server hyper-
converged cluster
• Plan on scaling out and up; with a preference to scale out
• Automation is the key to scaling out and up without outages
Development cluster: first iteration
No separation
Development cluster: current
Data nodes
Master nodes
Coordinator nodes
Development: breakout of Elasticsearch services
Development
environment: current
iteration
Production cluster: first iteration
Production cluster: master and data nodes only
Production cluster:
first iteration
Production cluster: current iteration
Production cluster: breakout of cluster services
Production cluster:

current iteration
Type OS Heap size
(GB)
RAM
(GB)
Disk (GB) CPU
Data node RHEL
7.x
30 80 8000 16
Master node RHEL
7.x
30 64 40 2
Coordinator
node
RHEL
7.x
30 64 40 8
Current compute node specs
Initial lessons learned:
• Keep heap around 30 GB
• 16 cores are enough
• Monitor garbage collection
• VMs on hyper-converged
systems are not great for data
nodes
• Hyper-converged nodes are
expensive to scale
• Low-cost physical; more
compute for your $
Pain points with current design
• Hyper-converged systems / hypervisor disk I/O too high
• Hypervisor VM replication waste of disk space and drive writes
• Logstash ingestion pipeline fragile
Next iteration design goals
• Move data nodes to 49 physical servers; keep master and coordinator
nodes on hyper-converged systems
• Move Logstash ingestion pipelines to containers for isolation and scale
on demand
Unless you plan to use Elasticsearch to power the
search of your blog or a small e-commerce
website, your first design will fail. 
—Fred de Villamil, Elasticsearch for fun and profit
Current issues and next iteration goals
Next iteration cluster design: massive scale out
Ingestion ≅ 1.5TB/day
49 physical servers
14 VMs on HX
3TB RAM
638 cores
Hot nodes: 14TB
Warm nodes: 228TB
Cold nodes: 182.3TB
Total cluster: 453TB
Lessons from KeyBank's monitoring transformation
Cluster size estimations
2
1
3
4
5
Architecture and design
approaches
Automation
Feedback loops and monitoring
Unexpected business value
Independently scalable tiers
Loosely coupled tiers
High availability
Design to maintain the environment during business
hours with no outages
Architecture and design approaches
Logical architecture
Logical architecture
Independently scalable tiers allow for surgical scaling
HA tiers allows for production live updates
Physical vs. virtual
• Virtual → Kafka, Elasticsearch master and coordinator nodes
• Physical → Elasticsearch data nodes
• Containerize Logstash streams for pipeline isolation and on-
demand scaling
How to scale
Measure ingestion to indexed performance (SLAs)
• KeyBank SLA is under 0.5 sec. from inbound raw Kafka topic to
indexed in Elasticsearch
• Backlog item to create Kibana dashboard and alerting
• Ingestion SLAs are only part of the metric; need to make sure
Kafka output topics are near empty all the time. 
• When end-to-end indexing speed exceeds SLAs, it's time to tune
or scale out
Measure shards per data node
• When these get too high (approx. 300), it's time to tune or scale
out
Monitor query execution times
• Possibly scale out feedback
When to scale
Lessons from KeyBank's monitoring transformation
Cluster size estimations
3
1
2
4
5
Architecture and design
approaches
Automation
Feedback loops and monitoring
Unexpected business value
Core principle:

Infrastructure as Code (IaC)
Automation platform:

Ansible and AWX
Source control:

Bitbucket and simplified gitflow
Automate everything
Infrastructure as Code
Lessons from KeyBank's monitoring transformation
Cluster size estimations
4
1
2
3
5
Architecture and design
approaches
Automation
Feedback loops and monitoring
Unexpected business value
Important metrics for monitoring:
• Number of shards per node
• High shard per node count is a leading indicator of
degraded system performance
• Disk I/O per second
• Heap size
• Excessive garbage collection
• CPU utilization
• Ingestion SLA (time from Kafka raw topic to fully indexed)
Feedback loops and monitoring
Lessons from KeyBank's monitoring transformation
Cluster size estimations
5
1
2
3
4
Architecture and design
approaches
Automation
Feedback loops and monitoring
Unexpected business value
Initial focus on enterprise
monitoring data led to integrations
with:
• Database
• ITSM
• Collaboration
• Development pipeline
Unexpected business insight through
integrations
Elasticsearch documentation
https://www.elastic.co/guide/index.html
Running Elasticsearch for fun and profit
https://fdv.github.io/running-elasticsearch-fun-profit/
https://github.com/fdv/running-elasticsearch-fun-profit
References
Thanks!

More Related Content

What's hot

Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta Lake
Databricks
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Nathan Bijnens
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
confluent
 
Top use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache KafkaTop use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache Kafka
confluent
 
Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019
Steven Moy
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance Strategy
Analytics8
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
Alan McSweeney
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
Alluxio, Inc.
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
Provectus
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
Amazon Redshift Masterclass
Amazon Redshift MasterclassAmazon Redshift Masterclass
Amazon Redshift Masterclass
Amazon Web Services
 
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache KafkaTop 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka
Kai Wähner
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
Dalibor Wijas
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
Rodney Joyce
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
Data Con LA
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics
Araf Karsh Hamid
 
The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360
Capgemini
 
Data Governance in a big data era
Data Governance in a big data eraData Governance in a big data era
Data Governance in a big data era
Pieter De Leenheer
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
Databricks
 

What's hot (20)

Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta Lake
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Top use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache KafkaTop use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache Kafka
 
Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019Data Mesh @ Yelp - 2019
Data Mesh @ Yelp - 2019
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance Strategy
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Amazon Redshift Masterclass
Amazon Redshift MasterclassAmazon Redshift Masterclass
Amazon Redshift Masterclass
 
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache KafkaTop 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics
 
The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360The Connected Consumer – Real-time Customer 360
The Connected Consumer – Real-time Customer 360
 
Data Governance in a big data era
Data Governance in a big data eraData Governance in a big data era
Data Governance in a big data era
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 

Similar to How KeyBank Used Elastic to Build an Enterprise Monitoring Solution

Informix & IWA : Operational analytics performance
Informix & IWA : Operational analytics performanceInformix & IWA : Operational analytics performance
Informix & IWA : Operational analytics performanceKeshav Murthy
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
Databricks
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
FoundationDB
 
LeanXcale for Monitoring
LeanXcale for MonitoringLeanXcale for Monitoring
LeanXcale for Monitoring
LeanXcale
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
Stratebi
 
Sql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.pptSql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.ppt
Qingsong Yao
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
Maya Lumbroso
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
Dataconomy Media
 
Dynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the fieldDynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the field
Stéphane Dorrekens
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
Tung Nguyen
 
Lessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsLessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsClaudiu Barbura
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
SoftServe
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
DataWorks Summit/Hadoop Summit
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Dmitry Anoshin
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Spark Summit
 
Suning OpenStack Cloud and Heat
Suning OpenStack Cloud and HeatSuning OpenStack Cloud and Heat
Suning OpenStack Cloud and Heat
Qiming Teng
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
Santanu Dey
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Dataconomy Media
 

Similar to How KeyBank Used Elastic to Build an Enterprise Monitoring Solution (20)

Operational-Analytics
Operational-AnalyticsOperational-Analytics
Operational-Analytics
 
Informix & IWA : Operational analytics performance
Informix & IWA : Operational analytics performanceInformix & IWA : Operational analytics performance
Informix & IWA : Operational analytics performance
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 
LeanXcale for Monitoring
LeanXcale for MonitoringLeanXcale for Monitoring
LeanXcale for Monitoring
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
 
Sql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.pptSql azure cluster dashboard public.ppt
Sql azure cluster dashboard public.ppt
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
Dynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the fieldDynamics CRM high volume systems - lessons from the field
Dynamics CRM high volume systems - lessons from the field
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 
Lessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsLessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatterns
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
 
Suning OpenStack Cloud and Heat
Suning OpenStack Cloud and HeatSuning OpenStack Cloud and Heat
Suning OpenStack Cloud and Heat
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 

More from Elasticsearch

An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
Elasticsearch
 
From MSP to MSSP using Elastic
From MSP to MSSP using ElasticFrom MSP to MSSP using Elastic
From MSP to MSSP using Elastic
Elasticsearch
 
Cómo crear excelentes experiencias de búsqueda en sitios web
Cómo crear excelentes experiencias de búsqueda en sitios webCómo crear excelentes experiencias de búsqueda en sitios web
Cómo crear excelentes experiencias de búsqueda en sitios web
Elasticsearch
 
Te damos la bienvenida a una nueva forma de realizar búsquedas
Te damos la bienvenida a una nueva forma de realizar búsquedas Te damos la bienvenida a una nueva forma de realizar búsquedas
Te damos la bienvenida a una nueva forma de realizar búsquedas
Elasticsearch
 
Tirez pleinement parti d'Elastic grâce à Elastic Cloud
Tirez pleinement parti d'Elastic grâce à Elastic CloudTirez pleinement parti d'Elastic grâce à Elastic Cloud
Tirez pleinement parti d'Elastic grâce à Elastic Cloud
Elasticsearch
 
Comment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitablesComment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitables
Elasticsearch
 
Plongez au cœur de la recherche dans tous ses états.
Plongez au cœur de la recherche dans tous ses états.Plongez au cœur de la recherche dans tous ses états.
Plongez au cœur de la recherche dans tous ses états.
Elasticsearch
 
Modernising One Legal Se@rch with Elastic Enterprise Search [Customer Story]
Modernising One Legal Se@rch with Elastic Enterprise Search [Customer Story]Modernising One Legal Se@rch with Elastic Enterprise Search [Customer Story]
Modernising One Legal Se@rch with Elastic Enterprise Search [Customer Story]
Elasticsearch
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
Elasticsearch
 
Welcome to a new state of find
Welcome to a new state of findWelcome to a new state of find
Welcome to a new state of find
Elasticsearch
 
Building great website search experiences
Building great website search experiencesBuilding great website search experiences
Building great website search experiences
Elasticsearch
 
Keynote: Harnessing the power of Elasticsearch for simplified search
Keynote: Harnessing the power of Elasticsearch for simplified searchKeynote: Harnessing the power of Elasticsearch for simplified search
Keynote: Harnessing the power of Elasticsearch for simplified search
Elasticsearch
 
Cómo transformar los datos en análisis con los que tomar decisiones
Cómo transformar los datos en análisis con los que tomar decisionesCómo transformar los datos en análisis con los que tomar decisiones
Cómo transformar los datos en análisis con los que tomar decisiones
Elasticsearch
 
Explore relève les défis Big Data avec Elastic Cloud
Explore relève les défis Big Data avec Elastic Cloud Explore relève les défis Big Data avec Elastic Cloud
Explore relève les défis Big Data avec Elastic Cloud
Elasticsearch
 
Comment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitablesComment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitables
Elasticsearch
 
Transforming data into actionable insights
Transforming data into actionable insightsTransforming data into actionable insights
Transforming data into actionable insights
Elasticsearch
 
Opening Keynote: Why Elastic?
Opening Keynote: Why Elastic?Opening Keynote: Why Elastic?
Opening Keynote: Why Elastic?
Elasticsearch
 
Empowering agencies using Elastic as a Service inside Government
Empowering agencies using Elastic as a Service inside GovernmentEmpowering agencies using Elastic as a Service inside Government
Empowering agencies using Elastic as a Service inside Government
Elasticsearch
 
The opportunities and challenges of data for public good
The opportunities and challenges of data for public goodThe opportunities and challenges of data for public good
The opportunities and challenges of data for public good
Elasticsearch
 
Enterprise search and unstructured data with CGI and Elastic
Enterprise search and unstructured data with CGI and ElasticEnterprise search and unstructured data with CGI and Elastic
Enterprise search and unstructured data with CGI and Elastic
Elasticsearch
 

More from Elasticsearch (20)

An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
 
From MSP to MSSP using Elastic
From MSP to MSSP using ElasticFrom MSP to MSSP using Elastic
From MSP to MSSP using Elastic
 
Cómo crear excelentes experiencias de búsqueda en sitios web
Cómo crear excelentes experiencias de búsqueda en sitios webCómo crear excelentes experiencias de búsqueda en sitios web
Cómo crear excelentes experiencias de búsqueda en sitios web
 
Te damos la bienvenida a una nueva forma de realizar búsquedas
Te damos la bienvenida a una nueva forma de realizar búsquedas Te damos la bienvenida a una nueva forma de realizar búsquedas
Te damos la bienvenida a una nueva forma de realizar búsquedas
 
Tirez pleinement parti d'Elastic grâce à Elastic Cloud
Tirez pleinement parti d'Elastic grâce à Elastic CloudTirez pleinement parti d'Elastic grâce à Elastic Cloud
Tirez pleinement parti d'Elastic grâce à Elastic Cloud
 
Comment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitablesComment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitables
 
Plongez au cœur de la recherche dans tous ses états.
Plongez au cœur de la recherche dans tous ses états.Plongez au cœur de la recherche dans tous ses états.
Plongez au cœur de la recherche dans tous ses états.
 
Modernising One Legal Se@rch with Elastic Enterprise Search [Customer Story]
Modernising One Legal Se@rch with Elastic Enterprise Search [Customer Story]Modernising One Legal Se@rch with Elastic Enterprise Search [Customer Story]
Modernising One Legal Se@rch with Elastic Enterprise Search [Customer Story]
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
 
Welcome to a new state of find
Welcome to a new state of findWelcome to a new state of find
Welcome to a new state of find
 
Building great website search experiences
Building great website search experiencesBuilding great website search experiences
Building great website search experiences
 
Keynote: Harnessing the power of Elasticsearch for simplified search
Keynote: Harnessing the power of Elasticsearch for simplified searchKeynote: Harnessing the power of Elasticsearch for simplified search
Keynote: Harnessing the power of Elasticsearch for simplified search
 
Cómo transformar los datos en análisis con los que tomar decisiones
Cómo transformar los datos en análisis con los que tomar decisionesCómo transformar los datos en análisis con los que tomar decisiones
Cómo transformar los datos en análisis con los que tomar decisiones
 
Explore relève les défis Big Data avec Elastic Cloud
Explore relève les défis Big Data avec Elastic Cloud Explore relève les défis Big Data avec Elastic Cloud
Explore relève les défis Big Data avec Elastic Cloud
 
Comment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitablesComment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitables
 
Transforming data into actionable insights
Transforming data into actionable insightsTransforming data into actionable insights
Transforming data into actionable insights
 
Opening Keynote: Why Elastic?
Opening Keynote: Why Elastic?Opening Keynote: Why Elastic?
Opening Keynote: Why Elastic?
 
Empowering agencies using Elastic as a Service inside Government
Empowering agencies using Elastic as a Service inside GovernmentEmpowering agencies using Elastic as a Service inside Government
Empowering agencies using Elastic as a Service inside Government
 
The opportunities and challenges of data for public good
The opportunities and challenges of data for public goodThe opportunities and challenges of data for public good
The opportunities and challenges of data for public good
 
Enterprise search and unstructured data with CGI and Elastic
Enterprise search and unstructured data with CGI and ElasticEnterprise search and unstructured data with CGI and Elastic
Enterprise search and unstructured data with CGI and Elastic
 

Recently uploaded

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 

How KeyBank Used Elastic to Build an Enterprise Monitoring Solution

  • 1. Mick Miller Senior Product Manager, Cloud Native KeyBank mick_miller@keybank.com @mickmill Lessons from the trenches How KeyBank used Elastic to build an enterprise monitoring solution
  • 3. Business and technology landscape Design by entropy1 2 3 The floodgates open Urgency
  • 4. Business and technology problems 21+ monitoring systems resulted in: • Slow MTTR, great MTTB • Difficulty in identifying the root cause of problems • Poor mobile application satisfaction scores • Poor branch workstation performance metrics
  • 5. Business and technology landscape Design by entropy 2 1 3 The floodgates open Urgency
  • 6. The floodgates open 2017–2018
 Architecture 2018
 Pilot/Prod 2019
 Scale out Enterprise Monitoring transformation timeline Late 2017 All new monitoring stopped Log storage cost and stability issues Huge backlog of monitoring requests Early 2018 Critical situation w/ eastern WA branches No visibility into root cause Elasticsearch dev environment functional Deployed metric and winlog beats The floodgates opened! Two days later... Root causes identified Remediation project launched
  • 7. Business and technology landscape Design by entropy 3 1 2 The floodgates open Urgency
  • 8. Pathway to production cleared First production cluster was deployed mid-2018 Funding provided by retiring 5 of the 21 existing monitoring systems • More systems retired in 2018, 2019 with plans for more in 2020 Estimated savings over $5M • No accurate way to estimate savings from eliminating all the different support and skill sets required to support 21 systems Cluster immediately started to experience performance issues; tuning and scaling became highest priority Urgency
  • 9. Lessons from KeyBank's monitoring transformation Cluster size estimations1 2 3 4 5 Architecture and design approaches Automation Feedback loops and monitoring Unexpected business value
  • 10. Bad news: it's nearly impossible to precisely predict your workload • Index growth, data ingestions, query usage, etc. • Expect to iterate three or more times • You’re always wrong the first time • Elasticsearch is not an RDBMS and does not scale like a datastore Cluster size estimation There’s no way anyone can tell you how to design a perfect cluster, and those who claim they do are liars.  —Fred de Villamil, Elasticsearch for fun and profit Good news: it is possible to design for growth • Start small and isolated;  KeyBank used a dedicated eight-server hyper- converged cluster • Plan on scaling out and up; with a preference to scale out • Automation is the key to scaling out and up without outages
  • 11. Development cluster: first iteration No separation
  • 12. Development cluster: current Data nodes Master nodes Coordinator nodes
  • 13. Development: breakout of Elasticsearch services Development environment: current iteration
  • 15. Production cluster: master and data nodes only Production cluster: first iteration
  • 17. Production cluster: breakout of cluster services Production cluster:
 current iteration
  • 18. Type OS Heap size (GB) RAM (GB) Disk (GB) CPU Data node RHEL 7.x 30 80 8000 16 Master node RHEL 7.x 30 64 40 2 Coordinator node RHEL 7.x 30 64 40 8 Current compute node specs Initial lessons learned: • Keep heap around 30 GB • 16 cores are enough • Monitor garbage collection • VMs on hyper-converged systems are not great for data nodes • Hyper-converged nodes are expensive to scale • Low-cost physical; more compute for your $
  • 19. Pain points with current design • Hyper-converged systems / hypervisor disk I/O too high • Hypervisor VM replication waste of disk space and drive writes • Logstash ingestion pipeline fragile Next iteration design goals • Move data nodes to 49 physical servers; keep master and coordinator nodes on hyper-converged systems • Move Logstash ingestion pipelines to containers for isolation and scale on demand Unless you plan to use Elasticsearch to power the search of your blog or a small e-commerce website, your first design will fail.  —Fred de Villamil, Elasticsearch for fun and profit Current issues and next iteration goals
  • 20. Next iteration cluster design: massive scale out Ingestion ≅ 1.5TB/day 49 physical servers 14 VMs on HX 3TB RAM 638 cores Hot nodes: 14TB Warm nodes: 228TB Cold nodes: 182.3TB Total cluster: 453TB
  • 21. Lessons from KeyBank's monitoring transformation Cluster size estimations 2 1 3 4 5 Architecture and design approaches Automation Feedback loops and monitoring Unexpected business value
  • 22. Independently scalable tiers Loosely coupled tiers High availability Design to maintain the environment during business hours with no outages Architecture and design approaches
  • 24. Independently scalable tiers allow for surgical scaling HA tiers allows for production live updates Physical vs. virtual • Virtual → Kafka, Elasticsearch master and coordinator nodes • Physical → Elasticsearch data nodes • Containerize Logstash streams for pipeline isolation and on- demand scaling How to scale
  • 25. Measure ingestion to indexed performance (SLAs) • KeyBank SLA is under 0.5 sec. from inbound raw Kafka topic to indexed in Elasticsearch • Backlog item to create Kibana dashboard and alerting • Ingestion SLAs are only part of the metric; need to make sure Kafka output topics are near empty all the time.  • When end-to-end indexing speed exceeds SLAs, it's time to tune or scale out Measure shards per data node • When these get too high (approx. 300), it's time to tune or scale out Monitor query execution times • Possibly scale out feedback When to scale
  • 26. Lessons from KeyBank's monitoring transformation Cluster size estimations 3 1 2 4 5 Architecture and design approaches Automation Feedback loops and monitoring Unexpected business value
  • 27. Core principle:
 Infrastructure as Code (IaC) Automation platform:
 Ansible and AWX Source control:
 Bitbucket and simplified gitflow Automate everything
  • 29. Lessons from KeyBank's monitoring transformation Cluster size estimations 4 1 2 3 5 Architecture and design approaches Automation Feedback loops and monitoring Unexpected business value
  • 30. Important metrics for monitoring: • Number of shards per node • High shard per node count is a leading indicator of degraded system performance • Disk I/O per second • Heap size • Excessive garbage collection • CPU utilization • Ingestion SLA (time from Kafka raw topic to fully indexed) Feedback loops and monitoring
  • 31. Lessons from KeyBank's monitoring transformation Cluster size estimations 5 1 2 3 4 Architecture and design approaches Automation Feedback loops and monitoring Unexpected business value
  • 32. Initial focus on enterprise monitoring data led to integrations with: • Database • ITSM • Collaboration • Development pipeline Unexpected business insight through integrations
  • 33. Elasticsearch documentation https://www.elastic.co/guide/index.html Running Elasticsearch for fun and profit https://fdv.github.io/running-elasticsearch-fun-profit/ https://github.com/fdv/running-elasticsearch-fun-profit References