App/Server Monitoring
Jaemok Jeong
2016. July.
©jmjeong 2016 1
“It's not in production unless
it’s monitored.”
— Theo Schlossnagle
“If you can not measure it,
you can not improve it.”
— Lord Kelvin
“What is measured improves.”
— Peter Drucker
Questions to answer
• How fast is my system?
• Is it faster than last month?
• Did our last deploy affect database performance?
• How much time do we spend calling external web services?
More questions
• How many errors do we have a day?
• How many failed logins?
• How many successful logins?
And more questions!
• How many orders did we have today?
• How many orders did we have today from Android version 2.0.56?
• How many rejected orders did we have?
To answer all of this, you need a way
to track different numbers
Graphite
• A Highly Scalable Real-time Graphing System
• http://graphite.wikidot.com/
• Components
  • carbon - a daemon that listens for time-series data.
  • whisper - a simple database library for storing time-series data.
  • webapp - a (Django) webapp that renders graphs on demand.
Data Retention
• Default settings
  • 6 hours of 10 second data
  • 1 week of 1 minute data
  • 5 years of 10 minute data
• That amounts to ~3.2MB per metric
• Configurable
[server_load]
priority = 100
pattern = ^servers\.
retentions = 60:43200,900:350400
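The per-metric size follows from the retention settings. Each entry in `retentions` reads as seconds-per-point:number-of-points, so `60:43200` is 30 days of 1 minute data. The sketch below estimates the file size under the assumption that Whisper uses its fixed layout of a 16-byte header, 12 bytes of archive info per archive, and 12 bytes per data point. The helper names `whisperSizeBytes` and `archive` are made up for this illustration.

```javascript
// Sketch: estimate the on-disk size of one Whisper file from its archives.
// Assumed layout: 16-byte metadata header, 12 bytes of archive info per
// archive, 12 bytes per data point (4-byte timestamp + 8-byte double).
const METADATA_BYTES = 16;
const ARCHIVE_INFO_BYTES = 12;
const POINT_BYTES = 12;

// archives: array of [secondsPerPoint, numberOfPoints]
function whisperSizeBytes(archives) {
  const points = archives.reduce((sum, [, count]) => sum + count, 0);
  return METADATA_BYTES + archives.length * ARCHIVE_INFO_BYTES + points * POINT_BYTES;
}

// Convert "how long, at what resolution" into [secondsPerPoint, numberOfPoints].
function archive(secondsPerPoint, retentionSeconds) {
  return [secondsPerPoint, retentionSeconds / secondsPerPoint];
}

const HOUR = 3600;
const DAY = 24 * HOUR;

const defaults = [
  archive(10, 6 * HOUR),       // 6 hours of 10 second data
  archive(60, 7 * DAY),        // 1 week of 1 minute data
  archive(600, 5 * 365 * DAY), // 5 years of 10 minute data
];
const serverLoad = [[60, 43200], [900, 350400]]; // retentions = 60:43200,900:350400

console.log(whisperSizeBytes(defaults));   // 3300532 bytes, about 3.15 MiB
console.log(whisperSizeBytes(serverLoad)); // 4723240 bytes, about 4.5 MiB
```

The default settings come out at roughly 3.3 million bytes, which is in line with the ~3.2MB figure quoted on the slide.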
Ports
• 80 : nginx
• 2003 : carbon
• 2004 : carbon pickle
• 2023 : carbon aggregator
• 2024 : carbon aggregator pickle
The Graphite Message Format
metric_path value timestamp(UNIX epoch time)\n
ex) foo.bar.baz 42 74857843
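A minimal sketch of building one such line in Node.js. `formatMetricLine` is a hypothetical helper written for this example, not part of any library. The path check is a deliberately simple assumption: dot-separated segments with no whitespace.

```javascript
// Sketch: build one line of Graphite's plaintext protocol.
// formatMetricLine is a hypothetical helper, not a library function.
function formatMetricLine(path, value, date) {
  // Assumed rule: dot-separated segments, no whitespace, no empty segments.
  if (!/^[^\s.]+(\.[^\s.]+)*$/.test(path)) {
    throw new Error('invalid metric path: ' + path);
  }
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    throw new Error('value must be a finite number');
  }
  // Graphite expects whole seconds since the UNIX epoch, not milliseconds.
  const timestamp = Math.floor(date.getTime() / 1000);
  return path + ' ' + value + ' ' + timestamp + '\n';
}

const line = formatMetricLine('foo.bar.baz', 42, new Date(74857843 * 1000));
console.log(JSON.stringify(line)); // "foo.bar.baz 42 74857843\n"
```

JavaScript dates are in milliseconds, so forgetting the division by 1000 is an easy way to write data points far into the future.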
Populate Data
PORT=2003
SERVER=graphite.your.org
echo "local.random.diceroll 4 `date +%s`" | nc -c ${SERVER} ${PORT}
node.js
var graphite = require('graphite');
var client = graphite.createClient('plaintext://server:2003/');
// Keys that contain dots must be quoted in an object literal.
var metrics = { 'foo.bar.baz': 72,
                'foo.bar.test': 100,
                'foo.bar.size': 1024 };
client.write(metrics, Date.now(), function(err) {
  if (err) console.error(err);
});
another notation:
var metrics = { 'foo.bar': { baz: 72, test: 100, size: 1024 } };
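Both notations describe the same three metrics, because nested keys end up joined with dots. The sketch below shows that idea. `flattenMetrics` is a hypothetical helper written for illustration and is not claimed to be the graphite package's own code.

```javascript
// Sketch: flatten a nested metrics object into dotted Graphite paths.
// flattenMetrics is a hypothetical helper for illustration only.
function flattenMetrics(obj, prefix) {
  const flat = {};
  for (const key of Object.keys(obj)) {
    const path = prefix ? prefix + '.' + key : key;
    const value = obj[key];
    if (value !== null && typeof value === 'object') {
      // Descend into nested objects, carrying the path built so far.
      Object.assign(flat, flattenMetrics(value, path));
    } else {
      flat[path] = value;
    }
  }
  return flat;
}

const nested = { foo: { bar: { baz: 72, test: 100, size: 1024 } } };
console.log(flattenMetrics(nested));
// { 'foo.bar.baz': 72, 'foo.bar.test': 100, 'foo.bar.size': 1024 }
```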
Grafana
• Beautiful metric & analytic dashboards
• Uses Graphite as backend storage
• http://grafana.org/
• Live Demo
StatsD
• A simple Node.js daemon that listens for messages on a UDP port
• It parses the messages, extracts metrics data, and periodically flushes the data to Graphite
Your app sends data to StatsD
Usage
<metricname>:<value>|<type>
echo "foo:1|c" | nc -u -w0 127.0.0.1 8125
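The same message can be sent from Node.js without a client library. This is a minimal sketch using the built-in `dgram` module. `formatStatsd` and `sendStatsd` are hypothetical helper names, and the host and port defaults simply mirror the `nc` example.

```javascript
const dgram = require('dgram');

// Build one StatsD datagram: <metricname>:<value>|<type>[|@<sampleRate>]
function formatStatsd(name, value, type, sampleRate) {
  let line = name + ':' + value + '|' + type;
  if (sampleRate !== undefined && sampleRate < 1) {
    line += '|@' + sampleRate;
  }
  return line;
}

// Fire-and-forget UDP send, the Node.js counterpart of the nc one-liner.
// The default host and port are taken from the example above.
function sendStatsd(line, host = '127.0.0.1', port = 8125) {
  const socket = dgram.createSocket('udp4');
  socket.send(Buffer.from(line), port, host, function (err) {
    if (err) console.error(err);
    socket.close();
  });
}

console.log(formatStatsd('foo', 1, 'c'));         // foo:1|c
console.log(formatStatsd('gorets', 1, 'c', 0.1)); // gorets:1|c|@0.1
// sendStatsd(formatStatsd('foo', 1, 'c'));       // uncomment to send to a local StatsD
```

Because the transport is UDP, the sender gets no acknowledgement. A missing StatsD daemon does not slow down or break the application, but the metric is silently lost.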
StatsD Metric Types
• Counting - number of orders per sec
  • gorets:1|c
  • At each flush the current count is sent and reset to 0
• Sampling
  • gorets:1|c|@0.1
  • The counter is sent only 1/10th of the time, and StatsD scales the count up accordingly
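A rough sketch of how a StatsD-style server could handle counters, including the sample-rate correction and the reset on flush. `CounterStore` is a hypothetical class written for illustration. It is not StatsD's actual implementation and handles only counter messages.

```javascript
// Sketch of counter aggregation in a StatsD-style server (illustrative only).
class CounterStore {
  constructor() {
    this.counts = new Map(); // metric name -> count since the last flush
  }

  // Handle one datagram such as "gorets:1|c" or "gorets:1|c|@0.1".
  // Returns false when the line is not a valid counter message.
  receive(line) {
    const match = /^([^:|]+):(-?\d+(?:\.\d+)?)\|c(?:\|@(\d*\.?\d+))?$/.exec(line);
    if (!match) return false;
    const name = match[1];
    const value = Number(match[2]);
    const sampleRate = match[3] === undefined ? 1 : Number(match[3]);
    if (sampleRate <= 0) return false;
    // A client sampling at 0.1 sends about one event in ten, so each
    // received message stands for 1 / 0.1 = 10 events.
    this.counts.set(name, (this.counts.get(name) || 0) + value / sampleRate);
    return true;
  }

  // Return the current counts and reset every counter to 0.
  flush() {
    const snapshot = Object.fromEntries(this.counts);
    for (const name of this.counts.keys()) this.counts.set(name, 0);
    return snapshot;
  }
}

const store = new CounterStore();
store.receive('gorets:1|c');
store.receive('gorets:1|c|@0.1');
console.log(store.flush()); // { gorets: 11 }
console.log(store.flush()); // { gorets: 0 }
```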
StatsD Metric Types (cont'd)
• Gauges - total orders today
  • gaugor:333|g
  • If the gauge is not updated at the next flush, it will send the previous value
• Sets - unique user count
  • uniques:765|s
  • Counts unique occurrences of events between flushes, using a Set to store all occurring events
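The difference between the two types shows up at flush time: a gauge keeps its last value, while a set starts over. The sketch below models only that behaviour. `GaugeAndSetStore` is a hypothetical class for illustration, not StatsD's own code.

```javascript
// Sketch of gauge and set handling in a StatsD-style server (illustrative only).
class GaugeAndSetStore {
  constructor() {
    this.gauges = new Map(); // name -> last reported value
    this.sets = new Map();   // name -> Set of members seen since the last flush
  }

  // gaugor:333|g  ->  gauge('gaugor', 333)
  gauge(name, value) {
    this.gauges.set(name, value);
  }

  // uniques:765|s  ->  add('uniques', 765)
  add(name, member) {
    if (!this.sets.has(name)) this.sets.set(name, new Set());
    this.sets.get(name).add(String(member));
  }

  flush() {
    const out = { gauges: Object.fromEntries(this.gauges), sets: {} };
    for (const [name, members] of this.sets) out.sets[name] = members.size;
    this.sets.clear(); // sets start empty again, gauges keep their value
    return out;
  }
}

const store = new GaugeAndSetStore();
store.gauge('gaugor', 333);
store.add('uniques', 765);
store.add('uniques', 765); // duplicate, counted once
store.add('uniques', 766);
console.log(store.flush()); // { gauges: { gaugor: 333 }, sets: { uniques: 2 } }
console.log(store.flush()); // { gauges: { gaugor: 333 }, sets: {} }
```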
StatsD Metric Types (cont'd)
• Timing - time to make an order
  • glork:320|ms|@0.1
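node-statsd-client
var SDC = require('statsd-client');
// host, port, prefix, and randomInteger() are assumed to be defined elsewhere.
var sdc = new SDC({host: host, port: port, prefix: prefix});
sdc.increment('sample.counter');        // count up by 1
sdc.increment('sample.mycounter', 10);  // count up by 10
sdc.gauge('sample.gauge', randomInteger(100));
var timer = new Date();
// Given a Date, timing() reports the time elapsed since that moment.
sdc.timing('sample.timer', timer);
sdc && sdc.close();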
For log & crash search
Slack Integration
var request = require('request');
var moment = require('moment');
// conf, logger, title, icon, and data are assumed to be defined elsewhere.
var alarmUrl = conf.alarm.info.url;
var payload = {
  "channel": "monitoring",
  "username": title.name,
  "text": ['[', moment().format('YYYY-MM-DD HH:mm:ss'), '] ', icon, ' ', data].join(''),
  "icon_emoji": title.icon
};
// POST the payload as JSON to the Slack incoming webhook URL.
request({
  url: alarmUrl,
  method: 'POST',
  json: payload
}, function(err, resp, body) {
  if (err) {
    logger.error('[sendNoti] error;', err);
  } else {
    logger.debug('[sendNoti] result; ' + body.toString());
  }
});
Graphite naming convention
• {env}.{metric}.{region}.{hostname}
• lnc.summary.* - for all projects
  • count, size, totalsize
  • denySize, denyCount, ...
• lnc.{group}.{appkey}.* - for each project
• lnc.internal.*
  • lnc.internal.sct.stats.*
  • lnc.internal.kafka.lag.*
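A small sketch of building a path that follows the {env}.{metric}.{region}.{hostname} pattern. `metricPath` is a hypothetical helper, and the example values are invented. Replacing dots in the hostname with underscores is an assumption made here so that a fully qualified hostname stays a single level in the hierarchy.

```javascript
// Sketch: build a metric path following {env}.{metric}.{region}.{hostname}.
// metricPath is a hypothetical helper; the sanitizing rules are assumptions.
function metricPath(parts) {
  const order = ['env', 'metric', 'region', 'hostname'];
  return order.map(function (key) {
    if (!parts[key]) throw new Error('missing ' + key);
    // Whitespace would break the plaintext protocol, so collapse it.
    let segment = String(parts[key]).trim().replace(/\s+/g, '_');
    // Dots in a hostname would create extra hierarchy levels.
    if (key === 'hostname') segment = segment.replace(/\./g, '_');
    return segment;
  }).join('.');
}

console.log(metricPath({
  env: 'prod',
  metric: 'orders.count',
  region: 'ap-northeast-2',
  hostname: 'web01.example.com',
}));
// prod.orders.count.ap-northeast-2.web01_example_com
```

Building every path through one function keeps the hierarchy consistent, which matters because wildcard queries such as lnc.summary.* rely on each level meaning the same thing for every metric.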
node.js
var graphite = require('graphite');
// conf, logger, and result are assumed to be defined elsewhere in the application.
function sendToGraphite(prefix, data) {
  var url = 'plaintext://' + conf.graphite.server + ':' + conf.graphite.port + '/';
  var client = graphite.createClient(url);
  var metric = {};
  metric[prefix] = data; // every key in data is written under the given prefix
  client.write(metric, function(err) {
    if (err) {
      logger.error('[sendToGraphite] send error', err);
    } else {
      logger.debug('[sendToGraphite] send to ', url);
    }
    // Close the connection on both paths so a failed write does not leave it open.
    client.end();
  });
}
sendToGraphite('lnc.internal.sct.stats', {
  totalOnEs: result.total_doc_count,
  sctQueue: result.doc_count,
  esProcessed: result.processed_count,
  esUpdateChecking: result.update_waiting
});
Demo