● Django web application consisting of 3 parts:
○ carbon (relays, caches, aggregates metrics)
○ whisper (graphite’s equivalent of RRD files)
○ Web UI (graph composer, simple dashboard)
Discover trends and patterns
What time of the day do we get the most users?
When x happened, what was the effect on y?
How many hits am I getting per hour?
How does this compare to last week? last month?
Predict future events
When will we need to add more servers? Databases?
Did the release into production fix problem x?
A few reasons:
formulas, no graph introspection, cannot push metrics, cannot feed out-of-sequence metrics,
ugly graphs, no API, expose system/os metrics on host via snmp, no graph composer,
no custom graphs, predefine metrics, predefine graphs, static polling interval,
unscalable, tons of work to create one graph, no 3rd party ecosystem, etc.
(Nagios integration, 3rd party custom dashboards)
Easy to feed data
Wide ecosystem of 3rd party tools and dashboards
No all in one solution
No easy backups
It probably will become
How to graph
There are tons of ways to feed graphite your data
timestamp=$(date +%s)
value=10
echo "dot.delimited.metric.name $value $timestamp" | nc -w 1 graphite.
import socket
def send_msg(message, HOST, PORT):
    with socket.create_connection((HOST, PORT)) as sock:
        sock.sendall(message.encode())  # message must end with a newline
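Every snippet here feeds the same line shape: a dot-delimited metric name, a value, and a Unix timestamp, terminated by a newline. A minimal sketch of building such a line in Python follows; `format_metric` is a hypothetical helper (not part of Graphite), and the sample timestamp is arbitrary.

```python
import time

def format_metric(name, value, timestamp=None):
    # Build one plaintext line: "<name> <value> <timestamp>" plus a newline.
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the epoch, like `date +%s`
    return f"{name} {value} {int(timestamp)}\n"

# Fixed timestamp so the result is reproducible.
line = format_metric("dot.delimited.metric.name", 10, 1700000000)
```

The resulting string can be handed to a sender such as `send_msg`, pointed at the plaintext port (2003 in the Ruby and Java snippets).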
Python using graphite-pymetrics
from metrics import timing
def heavy_task(x, y, z):
    # do heavy stuff here
    pass
require 'socket'
Host = 'somegraphitehost'
conn = TCPSocket.new Host, 2003
conn.puts 'Metrics value timestamp'
conn.close
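import java.io.DataOutputStream;
import java.net.Socket;
Socket conn = new Socket("somegraphitehost", 2003);
DataOutputStream dos = new DataOutputStream(conn.getOutputStream());
dos.writeBytes("metrics value timestamp\n"); // each metric line must end with a newline
conn.close();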
How we use graphite
700K+ metrics per minute
A Common Graphite Stack
Agent for system/hardware level metrics
Growing repository of plugins for a wide variety
disk i/o, disk space, cpu, memory, mysql,
JMX, java, Redis, file sizes, load, etc.
Write your custom plugin in python
You can write Nagios plugins that alert off of metric values
Nagios can also feed graphite performance data, events (e.g. update a counter each time an email is sent), etc.
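A rough sketch of what a Nagios-style check against a metric value could look like, assuming the Graphite web app is reachable over HTTP and queried through its render API with `format=json`. The function names, the five-minute window, and the "higher is worse" threshold logic are illustrative assumptions, not taken from any particular plugin.

```python
import json
import urllib.parse
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes

def fetch_datapoints(base_url, target, window="-5min"):
    # Ask the render API for JSON; each series carries [[value, timestamp], ...].
    query = urllib.parse.urlencode({"target": target, "from": window, "format": "json"})
    with urllib.request.urlopen(f"{base_url}/render?{query}", timeout=10) as resp:
        series = json.load(resp)
    return series[0]["datapoints"] if series else []

def evaluate(datapoints, warn, crit):
    # Use the most recent non-null value; Graphite reports empty intervals as null.
    values = [v for v, _ts in datapoints if v is not None]
    if not values:
        return UNKNOWN, "UNKNOWN - no datapoints"
    latest = values[-1]
    if latest >= crit:
        return CRITICAL, f"CRITICAL - latest value {latest}"
    if latest >= warn:
        return WARNING, f"WARNING - latest value {latest}"
    return OK, f"OK - latest value {latest}"
```

A plugin built on this would print the message and call `sys.exit()` with the returned status, which is how Nagios reads the result of a check.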
What to collect?
How often function x is called
Average value of function x
Average running time of
number of records with
value == ?
number of slow queries
send a 1, draw as infinite
http access logs
(2xx, 3xx, 4xx, 5xx)
Exception counts, results, important events, hits
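For the "how often is function x called" and "running time" items, one possible approach is a decorator that records a call count and an elapsed time per invocation. This is a sketch under assumptions: `track`, `collected`, and the metric names are hypothetical, and the lines are only buffered in memory rather than sent anywhere.

```python
import functools
import time

collected = []  # buffered plaintext lines; a real setup would flush these to carbon

def track(metric_prefix):
    # Decorator factory: emits "<prefix>.calls" and "<prefix>.time_ms" per call.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so failed calls are still counted.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                now = int(time.time())
                collected.append(f"{metric_prefix}.calls 1 {now}\n")
                collected.append(f"{metric_prefix}.time_ms {elapsed_ms:.3f} {now}\n")
        return wrapper
    return decorator

@track("app.function_x")
def function_x(n):
    return sum(range(n))

function_x(1000)
```

Per-call values like these usually need to be summed or averaged per interval before storage, which is the kind of job an aggregation step in front of the cache would take on.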
Treat graphite like ‘Big Data’
You don’t know what metrics you need until you need them