SlideShare a Scribd company logo

Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn

Michael Kehoe
Michael Kehoe
Michael KehoeArchitect of reliable, scalable infrastructure at LinkedIn

Good monitoring can be the difference between a great night's sleep or hearing your phone go off at 2:37 a.m. because of a production outage. Couchbase Server provides a large number of metrics which can be overwhelming if you do not know the critical things to focus on or how to expose that information to your monitoring system. In this talk we will look at example production incidents, going in depth around specific things to monitor, and how this information can be used to find issues, work out root cause, and discover trends.

Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn

1 of 42
Download to read offline
©2016 Couchbase Inc.
Monitoring Production Deployments
TheTools – LinkedIn
Alex Ma – Principal Architect – Couchbase
Michael Kehoe – Staff Site Reliability Engineer - LinkedIn
1
©2016 Couchbase Inc.©2016 Couchbase Inc.
Overview
• MonitoringTools
• Making sense of the data
• External Monitoring Integrations
• Summary
2
©2016 Couchbase Inc. 3
Alex Ma
PrincipalArchitect, StrategicAccounts
alex@couchbase.com
IMAGE GOES HERE
©2016 Couchbase Inc. 4
Michael Kehoe
Staff Site Reliability Engineer (SRE) - LinkedIn
mkehoe@linkedin.com
• Production-SRE team
• Member of CBVT
• Australian!
• Contact
• linkedin.com/in/michaelkkehoe
• @matrixtek
GOES HERE
©2016 Couchbase Inc. 5
MonitoringTools
©2016 Couchbase Inc. 6
MonitoringTools – CouchbaseWeb Console

Recommended

Using SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsUsing SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsMichael Kehoe
 
Couchbase Connect 2016
Couchbase Connect 2016Couchbase Connect 2016
Couchbase Connect 2016Michael Kehoe
 
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInReducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInMichael Kehoe
 
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...Michael Kehoe
 
SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016Michael Kehoe
 
Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Christian Horsdal
 
[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7Site24x7
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environmentconfluent
 

More Related Content

What's hot

The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...HostedbyConfluent
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...confluent
 
What Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesWhat Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesAlexander Dean
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanDataWorks Summit
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EnginePraveen Thirukonda
 
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Tyler Nguyen
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
David Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsTom Van den Bulck
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyKairo Tavares
 
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...HostedbyConfluent
 
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...HostedbyConfluent
 
JIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter StricklandJIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter StricklandAtlassian
 
10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian PerformanceAtlassian
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...HostedbyConfluent
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producerconfluent
 
Icinga Camp Bangalore - Enterprise exceptions
Icinga Camp Bangalore - Enterprise exceptions Icinga Camp Bangalore - Enterprise exceptions
Icinga Camp Bangalore - Enterprise exceptions Icinga
 
Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...
Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...
Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...Alexander Dean
 

What's hot (20)

The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
 
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent)  K...
Event Sourcing, Stream Processing and Serverless (Ben Stopford, Confluent) K...
 
What Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registriesWhat Crimean War gunboats teach us about the need for schema registries
What Crimean War gunboats teach us about the need for schema registries
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
 
Intro to.net core 20170111
Intro to.net core   20170111Intro to.net core   20170111
Intro to.net core 20170111
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation Engine
 
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
David Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to Espresso
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
 
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
 
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
 
JIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter StricklandJIRA Data Center Implementation at Pitney Bowes - Peter Strickland
JIRA Data Center Implementation at Pitney Bowes - Peter Strickland
 
10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance
 
The Rise of BaaS
The Rise of BaaSThe Rise of BaaS
The Rise of BaaS
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Icinga Camp Bangalore - Enterprise exceptions
Icinga Camp Bangalore - Enterprise exceptions Icinga Camp Bangalore - Enterprise exceptions
Icinga Camp Bangalore - Enterprise exceptions
 
Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...
Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...
Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...
 

Viewers also liked

Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Michael Kehoe
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentMichael Kehoe
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe
 
Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialPooja Tangi
 
Event driven-automation and workflows
Event driven-automation and workflowsEvent driven-automation and workflows
Event driven-automation and workflowsDmitri Zimine
 
Optimized Couchbase Data Management
Optimized Couchbase Data ManagementOptimized Couchbase Data Management
Optimized Couchbase Data ManagementImanis Data
 
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Issa Fattah
 
Software reliability tools and common software errors
Software reliability tools and common software errorsSoftware reliability tools and common software errors
Software reliability tools and common software errorsHimanshu
 
How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the dayPooja Tangi
 
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Diego Pacheco
 
Software Reliability Engineering
Software Reliability EngineeringSoftware Reliability Engineering
Software Reliability Engineeringguest90cec6
 
Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015Dmitri Zimine
 
Software reliability growth model
Software reliability growth modelSoftware reliability growth model
Software reliability growth modelHimanshu
 
Load balancing in the SRE way
Load balancing in the SRE wayLoad balancing in the SRE way
Load balancing in the SRE wayShawn Zhu
 
Monitoring Microservices
Monitoring MicroservicesMonitoring Microservices
Monitoring MicroservicesWeaveworks
 
The servicescore card - Gamifying Operational Excellence - SRECON
The servicescore card - Gamifying Operational Excellence - SRECONThe servicescore card - Gamifying Operational Excellence - SRECON
The servicescore card - Gamifying Operational Excellence - SRECONDaniel ( Danny ) ☃ Lawrence
 

Viewers also liked (18)

Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level Talent
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 
Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potential
 
Social Media Monitoring Tools and Services Report Excerpts 2016
Social Media Monitoring Tools and Services Report Excerpts 2016Social Media Monitoring Tools and Services Report Excerpts 2016
Social Media Monitoring Tools and Services Report Excerpts 2016
 
Event driven-automation and workflows
Event driven-automation and workflowsEvent driven-automation and workflows
Event driven-automation and workflows
 
Social Media Monitoring Tools and Services Report 2016 Presentation
Social Media Monitoring Tools and Services Report 2016 PresentationSocial Media Monitoring Tools and Services Report 2016 Presentation
Social Media Monitoring Tools and Services Report 2016 Presentation
 
Optimized Couchbase Data Management
Optimized Couchbase Data ManagementOptimized Couchbase Data Management
Optimized Couchbase Data Management
 
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
 
Software reliability tools and common software errors
Software reliability tools and common software errorsSoftware reliability tools and common software errors
Software reliability tools and common software errors
 
How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the day
 
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
 
Software Reliability Engineering
Software Reliability EngineeringSoftware Reliability Engineering
Software Reliability Engineering
 
Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015
 
Software reliability growth model
Software reliability growth modelSoftware reliability growth model
Software reliability growth model
 
Load balancing in the SRE way
Load balancing in the SRE wayLoad balancing in the SRE way
Load balancing in the SRE way
 
Monitoring Microservices
Monitoring MicroservicesMonitoring Microservices
Monitoring Microservices
 
The servicescore card - Gamifying Operational Excellence - SRECON
The servicescore card - Gamifying Operational Excellence - SRECONThe servicescore card - Gamifying Operational Excellence - SRECON
The servicescore card - Gamifying Operational Excellence - SRECON
 

Similar to Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn

Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Data Con LA
 
Kafka & Couchbase Integration Patterns
Kafka & Couchbase Integration PatternsKafka & Couchbase Integration Patterns
Kafka & Couchbase Integration PatternsManuel Hurtado
 
Monitor OpenStack Environments from the bottom up and front to back
Monitor OpenStack Environments from the bottom up and front to backMonitor OpenStack Environments from the bottom up and front to back
Monitor OpenStack Environments from the bottom up and front to backIcinga
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedInGuozhang Wang
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoNathaniel Braun
 
Network Emulation in SOASTA 57 Spring Release
Network Emulation in SOASTA 57 Spring ReleaseNetwork Emulation in SOASTA 57 Spring Release
Network Emulation in SOASTA 57 Spring ReleaseJennifer Finney
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven productsLars Albertsson
 
Cloud Services Powered by IBM SoftLayer and NetflixOSS
Cloud Services Powered by IBM SoftLayer and NetflixOSSCloud Services Powered by IBM SoftLayer and NetflixOSS
Cloud Services Powered by IBM SoftLayer and NetflixOSSaspyker
 
IBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudIBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudAndrew Schofield
 
Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo? Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo? Criteolabs
 
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...confluent
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Landon Robinson
 
Peter lubbers-html5-offline-web-apps
Peter lubbers-html5-offline-web-appsPeter lubbers-html5-offline-web-apps
Peter lubbers-html5-offline-web-appsSkills Matter
 
OpsStack--Integrated Operation Platform
OpsStack--Integrated Operation PlatformOpsStack--Integrated Operation Platform
OpsStack--Integrated Operation PlatformChinaNetCloud
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringDatabricks
 
ASTQB washington-sept-2015
ASTQB washington-sept-2015ASTQB washington-sept-2015
ASTQB washington-sept-2015Dan Boutin
 
[db tech showcase Tokyo 2016] E22: Getting real time Oracle data into Kafka a...
[db tech showcase Tokyo 2016] E22: Getting real time Oracle data into Kafka a...[db tech showcase Tokyo 2016] E22: Getting real time Oracle data into Kafka a...
[db tech showcase Tokyo 2016] E22: Getting real time Oracle data into Kafka a...Insight Technology, Inc.
 
Open source applied: Real-world uses
Open source applied: Real-world usesOpen source applied: Real-world uses
Open source applied: Real-world usesRogue Wave Software
 
Live webinar data center_160224_revision
Live webinar data center_160224_revisionLive webinar data center_160224_revision
Live webinar data center_160224_revisionRyan Hadden
 

Similar to Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn (20)

Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Kafka & Couchbase Integration Patterns
Kafka & Couchbase Integration PatternsKafka & Couchbase Integration Patterns
Kafka & Couchbase Integration Patterns
 
Monitor OpenStack Environments from the bottom up and front to back
Monitor OpenStack Environments from the bottom up and front to backMonitor OpenStack Environments from the bottom up and front to back
Monitor OpenStack Environments from the bottom up and front to back
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
 
Network Emulation in SOASTA 57 Spring Release
Network Emulation in SOASTA 57 Spring ReleaseNetwork Emulation in SOASTA 57 Spring Release
Network Emulation in SOASTA 57 Spring Release
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 
Cloud Services Powered by IBM SoftLayer and NetflixOSS
Cloud Services Powered by IBM SoftLayer and NetflixOSSCloud Services Powered by IBM SoftLayer and NetflixOSS
Cloud Services Powered by IBM SoftLayer and NetflixOSS
 
IPv4 IPv6 Media Player
IPv4 IPv6 Media PlayerIPv4 IPv6 Media Player
IPv4 IPv6 Media Player
 
IBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudIBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloud
 
Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo? Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo?
 
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
 
Peter lubbers-html5-offline-web-apps
Peter lubbers-html5-offline-web-appsPeter lubbers-html5-offline-web-apps
Peter lubbers-html5-offline-web-apps
 
OpsStack--Integrated Operation Platform
OpsStack--Integrated Operation PlatformOpsStack--Integrated Operation Platform
OpsStack--Integrated Operation Platform
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
 
ASTQB washington-sept-2015
ASTQB washington-sept-2015ASTQB washington-sept-2015
ASTQB washington-sept-2015
 
[db tech showcase Tokyo 2016] E22: Getting real time Oracle data into Kafka a...
[db tech showcase Tokyo 2016] E22: Getting real time Oracle data into Kafka a...[db tech showcase Tokyo 2016] E22: Getting real time Oracle data into Kafka a...
[db tech showcase Tokyo 2016] E22: Getting real time Oracle data into Kafka a...
 
Open source applied: Real-world uses
Open source applied: Real-world usesOpen source applied: Real-world uses
Open source applied: Real-world uses
 
Live webinar data center_160224_revision
Live webinar data center_160224_revisionLive webinar data center_160224_revision
Live webinar data center_160224_revision
 

More from Michael Kehoe

Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayMichael Kehoe
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container BasicsMichael Kehoe
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsMichael Kehoe
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringMichael Kehoe
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFMichael Kehoe
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...Michael Kehoe
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...Michael Kehoe
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsMichael Kehoe
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleMichael Kehoe
 

More from Michael Kehoe (17)

eBPF Workshop
eBPF WorkshopeBPF Workshop
eBPF Workshop
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart way
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortems
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container Basics
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortems
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSF
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREs
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
 

Recently uploaded

Bluetooth Low Energy(BLE) and beacons working
Bluetooth Low Energy(BLE) and beacons workingBluetooth Low Energy(BLE) and beacons working
Bluetooth Low Energy(BLE) and beacons workingshrey Ansh
 
AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS Chicago
 
Evolution of Chatbots: From Custom AI Chatbots and AI Chatbots for Websites.pptx
Evolution of Chatbots: From Custom AI Chatbots and AI Chatbots for Websites.pptxEvolution of Chatbots: From Custom AI Chatbots and AI Chatbots for Websites.pptx
Evolution of Chatbots: From Custom AI Chatbots and AI Chatbots for Websites.pptxKyle Willson
 
My self introduction to know others abut me
My self  introduction to know others abut meMy self  introduction to know others abut me
My self introduction to know others abut meManoj Prabakar B
 
OTel Orientation_ How to Train Teams (OTel in Practice).pdf
OTel Orientation_ How to Train Teams (OTel in Practice).pdfOTel Orientation_ How to Train Teams (OTel in Practice).pdf
OTel Orientation_ How to Train Teams (OTel in Practice).pdfPaige Cruz
 
From eSIMs to iSIMs: It’s Inside the Manufacturing
From eSIMs to iSIMs: It’s Inside the ManufacturingFrom eSIMs to iSIMs: It’s Inside the Manufacturing
From eSIMs to iSIMs: It’s Inside the ManufacturingSoracom Global, Inc.
 
Navigating the Never Normal Strategies for Portfolio Leaders
Navigating the Never Normal Strategies for Portfolio LeadersNavigating the Never Normal Strategies for Portfolio Leaders
Navigating the Never Normal Strategies for Portfolio LeadersOnePlan Solutions
 
Heltun_HE-RS01_User_Manual_B9AH.pdf
Heltun_HE-RS01_User_Manual_B9AH.pdfHeltun_HE-RS01_User_Manual_B9AH.pdf
Heltun_HE-RS01_User_Manual_B9AH.pdfMarielaL5
 
Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfkatalinjordans1
 
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...Adrian Sanabria
 
Manual sensor Zigbee 3.0 MOES ZSS-X-PIRL-C
Manual  sensor Zigbee 3.0 MOES ZSS-X-PIRL-CManual  sensor Zigbee 3.0 MOES ZSS-X-PIRL-C
Manual sensor Zigbee 3.0 MOES ZSS-X-PIRL-CDomotica daVinci
 
Breaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI TechnologyBreaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI TechnologySafe Software
 
Curtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdfCurtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdfDomotica daVinci
 
M.Aathiraju Self Intro.docx-AD21001_____
M.Aathiraju Self Intro.docx-AD21001_____M.Aathiraju Self Intro.docx-AD21001_____
M.Aathiraju Self Intro.docx-AD21001_____Aathiraju
 
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFE
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFEDNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFE
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFEandreiandasan
 
Put a flag on it. A busy developer's guide to feature toggles.
Put a flag on it. A busy developer's guide to feature toggles.Put a flag on it. A busy developer's guide to feature toggles.
Put a flag on it. A busy developer's guide to feature toggles.Mateusz Kwasniewski
 
Enhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for PartnersEnhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for PartnersThousandEyes
 
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdf
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdfQuinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdf
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdfDomotica daVinci
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdf
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdfLLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdf
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdfThomas Poetter
 

Recently uploaded (20)

Bluetooth Low Energy(BLE) and beacons working
Bluetooth Low Energy(BLE) and beacons workingBluetooth Low Energy(BLE) and beacons working
Bluetooth Low Energy(BLE) and beacons working
 
AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user group
 
Evolution of Chatbots: From Custom AI Chatbots and AI Chatbots for Websites.pptx
Evolution of Chatbots: From Custom AI Chatbots and AI Chatbots for Websites.pptxEvolution of Chatbots: From Custom AI Chatbots and AI Chatbots for Websites.pptx
Evolution of Chatbots: From Custom AI Chatbots and AI Chatbots for Websites.pptx
 
My self introduction to know others abut me
My self  introduction to know others abut meMy self  introduction to know others abut me
My self introduction to know others abut me
 
OTel Orientation_ How to Train Teams (OTel in Practice).pdf
OTel Orientation_ How to Train Teams (OTel in Practice).pdfOTel Orientation_ How to Train Teams (OTel in Practice).pdf
OTel Orientation_ How to Train Teams (OTel in Practice).pdf
 
From eSIMs to iSIMs: It’s Inside the Manufacturing
From eSIMs to iSIMs: It’s Inside the ManufacturingFrom eSIMs to iSIMs: It’s Inside the Manufacturing
From eSIMs to iSIMs: It’s Inside the Manufacturing
 
Navigating the Never Normal Strategies for Portfolio Leaders
Navigating the Never Normal Strategies for Portfolio LeadersNavigating the Never Normal Strategies for Portfolio Leaders
Navigating the Never Normal Strategies for Portfolio Leaders
 
Heltun_HE-RS01_User_Manual_B9AH.pdf
Heltun_HE-RS01_User_Manual_B9AH.pdfHeltun_HE-RS01_User_Manual_B9AH.pdf
Heltun_HE-RS01_User_Manual_B9AH.pdf
 
Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdf
 
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
 
Manual sensor Zigbee 3.0 MOES ZSS-X-PIRL-C
Manual  sensor Zigbee 3.0 MOES ZSS-X-PIRL-CManual  sensor Zigbee 3.0 MOES ZSS-X-PIRL-C
Manual sensor Zigbee 3.0 MOES ZSS-X-PIRL-C
 
Breaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI TechnologyBreaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI Technology
 
Curtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdfCurtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdf
 
M.Aathiraju Self Intro.docx-AD21001_____
M.Aathiraju Self Intro.docx-AD21001_____M.Aathiraju Self Intro.docx-AD21001_____
M.Aathiraju Self Intro.docx-AD21001_____
 
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFE
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFEDNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFE
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFE
 
Put a flag on it. A busy developer's guide to feature toggles.
Put a flag on it. A busy developer's guide to feature toggles.Put a flag on it. A busy developer's guide to feature toggles.
Put a flag on it. A busy developer's guide to feature toggles.
 
5 Tech Trend to Notice in ESG Landscape- 47Billion
5 Tech Trend to Notice in ESG Landscape- 47Billion5 Tech Trend to Notice in ESG Landscape- 47Billion
5 Tech Trend to Notice in ESG Landscape- 47Billion
 
Enhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for PartnersEnhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for Partners
 
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdf
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdfQuinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdf
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdf
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdf
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdfLLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdf
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdf
 

Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn

  • 1. ©2016 Couchbase Inc. Monitoring Production Deployments TheTools – LinkedIn Alex Ma – Principal Architect – Couchbase Michael Kehoe – Staff Site Reliability Engineer - LinkedIn 1
  • 2. ©2016 Couchbase Inc.©2016 Couchbase Inc. Overview • MonitoringTools • Making sense of the data • External Monitoring Integrations • Summary 2
  • 3. ©2016 Couchbase Inc. 3 Alex Ma PrincipalArchitect, StrategicAccounts alex@couchbase.com IMAGE GOES HERE
  • 4. ©2016 Couchbase Inc. 4 Michael Kehoe Staff Site Reliability Engineer (SRE) - LinkedIn mkehoe@linkedin.com • Production-SRE team • Member of CBVT • Australian! • Contact • linkedin.com/in/michaelkkehoe • @matrixtek GOES HERE
  • 5. ©2016 Couchbase Inc. 5 MonitoringTools
  • 6. ©2016 Couchbase Inc. 6 MonitoringTools – CouchbaseWeb Console
  • 7. ©2016 Couchbase Inc. 7 MonitoringTools – CouchbaseWeb Console
  • 8. ©2016 Couchbase Inc. 8 MonitoringTools – CouchbaseWeb Console
  • 9. ©2016 Couchbase Inc. 9 MonitoringTools – Couchbase REST API • http://docs.couchbase.com/admin/admin/REST/rest-bucket-stats.html • GET /pools/default/buckets/[bucket-name]/stats • JSON output format • 60 collections per metric
  • 10. ©2016 Couchbase Inc. 10 MonitoringTools - cbstats • http://docs.couchbase.com/admin/admin/CLI/cbstats-intro.html • Command Line tool for viewing stats • 333+ Available stats • Cumulative and Snapshot
  • 11. ©2016 Couchbase Inc. 11 MonitoringTools - cbstats • Average value size = ep_value_size/(curr_items_tot-ep_num_non_resident) • ep_value_size = Amount of RAM used to hold values in this bucket for this node • Curr_items_tot =Total count of active/replica items in this bucket for this node • Ep_num_non_resident =Total number of items not resident in RAM • 9567135872 / ( 28733039 – 26582747 ) = 4449.22 bytes
  • 12. ©2016 Couchbase Inc. 12 MonitoringTools - cbstats • Cbstats can be pointed to a specific host and a specific port
  • 13. ©2016 Couchbase Inc. 13 MonitoringTools - cbstats • CbstatsTimings • Histogram that shows the timing of a number of internal operations • Commit to disk, background IO operations, GET ops • http://docs.couchbase.com/admin/admin/CLI/CBstats/cbstats-timing.html
  • 14. ©2016 Couchbase Inc. 14 MonitoringTools - Queries • http://developer.couchbase.com/documentation/server/current/tools/query-monitoring.html • http://localhost:8093/admin/vitals
  • 15. ©2016 Couchbase Inc. 15 MonitoringTools - htop • Htop|Top|vmstat|proc • Core Utilization • Customization
  • 16. ©2016 Couchbase Inc. 16 MonitoringTools - iostat • IO Utilization • Average wait times • Read/Write requests • Determine Capacity
  • 17. ©2016 Couchbase Inc. 17 MonitoringTools - iostat • IO Utilization • Average wait times • Read/Write requests • Determine Capacity
  • 18. ©2016 Couchbase Inc. 18 MonitoringTools - iftop • See where traffic is coming from • Measure replication throughput • Verify Capacity
  • 19. ©2016 Couchbase Inc. 19 Making Sense of the data
  • 20. ©2016 Couchbase Inc. 20 Key Statistics Metrics to Consider: • Couchbase-Server • Client application • Disk • Network
  • 21. ©2016 Couchbase Inc. 21 Key Statistics – Couchbase Server
  • 22. ©2016 Couchbase Inc. 22 Key Statistics – Couchbase Server Metrics to Consider: • Operations • Cache miss (ep_cache_miss_rate) • Active/Replica vbuckets (vb_active_num/vb_replica_num) • Percentage of items in memory (vb_active_resident_items_ratio) • Disk Queue (ep_diskqueue_items) • Misdirected Requests (ep_num_not_my_vbuckets)
  • 23. ©2016 Couchbase Inc. 23 Key Statistics – Couchbase Client Metrics to Consider: • Call-time latency • Measure GET’s/ SET’s separately • Hit-rate • Is the hit-rate what you expected • Errors • Timeouts retrieving objects • Unable to reach Couchbase-Server • See http://developer.couchbase.com/documentation/server/4.0/sdks/java-2.2/event-bus- metrics.html
  • 24. ©2016 Couchbase Inc. 24 Key Statistics – Couchbase Client
  • 25. ©2016 Couchbase Inc. 25 Key Statistics – Disk Metrics to Consider: • Disk Space • Compaction • Rebalance • Disk IO • Can disk sustain required IOPS • Disk Queue
  • 26. ©2016 Couchbase Inc. 26 Key Statistics – Network Metrics to Consider: • Network connectivity • Connections • Capacity/ Utilization
  • 27. ©2016 Couchbase Inc. 27 Key Statistics – Network – Connectivity • Ping - simple network connectivity test • Firewalls – make sure you have the correct ports open • See http://developer.couchbase.com/documentation/server/current/install/install-ports.html
  • 28. ©2016 Couchbase Inc. 28 Key Statistics – Network – Connections • File-descriptor limits • Connections in CLOSE_WAIT state • Collect stats from /proc/net/tcp
  • 29. ©2016 Couchbase Inc. 29 Key Statistics – Network – Capacity/ Utilization • Practical network capacity is ~85-90% of theoretical • E.g. 1Gb/s network interface can do 850-900Mb/s • Congested networks are problematic • Higher latency on responses • Slower replication • Collect stats from /proc/net/dev
  • 30. ©2016 Couchbase Inc. 30 Key Statistics – Network – Capacity/ Utilization • Practical network capacity is ~85-90% of theoretical (1250 Mb/s) • E.g. 1Gb/s network interface can do 850-900Mb/s Average object size (bytes) 4,096 ID length (bytes) 32 Meta data size (bytes) 56 Reads 100,000 Writes 60,000 Replica count 1 Read network utilization 421,600,000 Write network utilizaation 502,080,000 Total network utilization 923,680,000 1.25 billion theoretical max remaining bandwidth 276,320,000
  • 31. ©2016 Couchbase Inc. 31 External Monitoring Integrations
  • 32. ©2016 Couchbase Inc. 32 External Monitoring Integrations
  • 33. ©2016 Couchbase Inc. 33 External Monitoring Integrations – Write your own Getting Started • Use Couchbase REST API • Pipe ‘cbstats’ output
  • 34. ©2016 Couchbase Inc.©2016 Couchbase Inc. Using Couchbase REST API • Examples • Datadog – http://lnkd.in/cb-datadog • This Example – http://lnkd.in/cb-stats-collector 34
  • 35. ©2016 Couchbase Inc.©2016 Couchbase Inc. Using Couchbase REST API 35
  • 36. ©2016 Couchbase Inc.©2016 Couchbase Inc. Using Couchbase REST API 36
  • 37. ©2016 Couchbase Inc.©2016 Couchbase Inc. Using Couchbase REST API 37
  • 38. ©2016 Couchbase Inc.©2016 Couchbase Inc. Using Couchbase CBstats 38
  • 39. ©2016 Couchbase Inc.©2016 Couchbase Inc. Using Couchbase CBstats 39
  • 40. ©2016 Couchbase Inc. 40 Summary
  • 41. ©2016 Couchbase Inc. 41 Summary Important to have monitoring in-place Understand the metrics you monitor • What causes them • How to remediate

Editor's Notes

  1. - Service is slow, think its couchbase
  2. - Service is slow, think its couchbase