Dieter Plaetinck
Tel Aviv. 5/3/2020
Graphite & Metrictank
What is Graphite?
Graphite is a Time Series Database
- It accepts and stores time-series data
- It provides an API to query that data
- Started by Chris Davis @ Orbitz circa 2006
- Open sourced 2008
- Last release: 1.1.6 (October 2019)
- Written in Python
- Uses a very simple storage format called whisper
- Each series is represented by a file on disk
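To make the one-file-per-series layout concrete, here is a minimal sketch that maps a metric name to a Whisper file path by turning each dot-delimited segment into a directory level. The storage root is an assumed example location, not something the slides specify.

```python
from pathlib import PurePosixPath

# Assumed storage root for illustration; real installations configure their own.
WHISPER_ROOT = PurePosixPath("/opt/graphite/storage/whisper")

def whisper_path(metric_name, root=WHISPER_ROOT):
    """Map a dot-delimited metric name to the path of its .wsp file."""
    *dirs, leaf = metric_name.split(".")
    return root.joinpath(*dirs, leaf + ".wsp")

print(whisper_path("production.dc1.server3.memory.used"))
```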
Fun fact!
https://slides.com/torkelo/devsum-metrics-and-logging-2#/29
Data model
- Originally supported a hierarchical naming scheme:
production.dc1.server3.memory.used
- Each metric has a name made up of dot-delimited segments (called nodes)
- Pro:
- Implied structure can help with navigation, autocomplete
- Con:
- Inflexible: It's difficult to make changes without breaking dashboards
- Opaque: There isn't an easy way to discover what each node means
Data model
- Tagging arrived in 1.1 (December 2017)
- Added tags to existing hierarchical scheme
memory.used;dc=dc1;env=production;host=server3
- Flexible: Add new tags to the naming scheme easily
- Clear: Each element in the scheme has a built-in name
- Resilient: Adding tags doesn’t break existing dashboards
- Simple: Uses as much existing tooling as possible
- Pluggable: SQLite, MySQL, Postgres, Redis, HTTP
Data model
Hierarchical:
nginx.ip-1-2-3-4-80.home.200.http_requests_total
nginx.ip-1-2-3-5-80.settings.500.http_requests_total
nginx.ip-1-2-3-5-80.settings.400.http_requests_total
nginx.ip-1-2-3-5-80.home.200.http_requests_total
Tagged:
http_requests_total;job=nginx;instance=1.2.3.4:80;path=/home;status=200
http_requests_total;job=nginx;instance=1.2.3.5:80;path=/settings;status=500
http_requests_total;job=nginx;instance=1.2.3.5:80;path=/settings;status=400
http_requests_total;job=nginx;instance=1.2.3.5:80;path=/home;status=200
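The tagged form is easy to take apart in code. A minimal sketch, assuming the `name;tag=value;...` layout shown on the slide; the helper name `parse_tagged_name` is made up for illustration:

```python
def parse_tagged_name(series):
    """Split 'name;tag1=value1;tag2=value2' into (name, {tag: value})."""
    name, *pairs = series.split(";")
    # Split on the first '=' only, so values may themselves contain '='.
    tags = dict(pair.split("=", 1) for pair in pairs)
    return name, tags

name, tags = parse_tagged_name(
    "http_requests_total;job=nginx;instance=1.2.3.4:80;path=/home;status=200"
)
print(name, tags)
```

Each element carries its own label (`status`, `path`), which is the "Clear" property from the tagging slide: nothing depends on remembering which position in the dotted name means what.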
Sending data
PORT=2003
SERVER=graphite.your.org
echo "local.random.diceroll 4 `date +%s`" | nc ${SERVER} ${PORT}
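The same thing can be sketched in Python: build one `<path> <value> <timestamp>` line and write it to a TCP socket. The host and port are the placeholders from the slide; the function names are made up for illustration.

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, Unix timestamp, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"

def send_metric(line, server="graphite.your.org", port=2003):
    """Open a TCP connection, send one line, and close (what nc does on the slide)."""
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

# Formatting only; call send_metric(line) against a real server to deliver it.
line = format_metric("local.random.diceroll", 4, timestamp=1583366400)
print(repr(line))
```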
Selecting Series
Hierarchical:
nginx.*.*.500.http_requests_total
Tagged:
seriesByTag("name=http_requests_total", "job=nginx", "status=500")
https://graphite.readthedocs.io/en/latest/functions.html
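A rough sketch of how hierarchical selection behaves: the pattern is compared node by node, so `*` never spans a dot and the pattern must have as many nodes as the name. This is a simplification for illustration (Graphite's own matcher also supports forms such as `{a,b}`), applied to a five-node pattern over the names from the data model slide.

```python
from fnmatch import fnmatchcase

def matches(pattern, name):
    """Match a dotted name against a pattern one node at a time."""
    p_nodes, n_nodes = pattern.split("."), name.split(".")
    return len(p_nodes) == len(n_nodes) and all(
        fnmatchcase(n, p) for p, n in zip(p_nodes, n_nodes)
    )

names = [
    "nginx.ip-1-2-3-4-80.home.200.http_requests_total",
    "nginx.ip-1-2-3-5-80.settings.500.http_requests_total",
    "nginx.ip-1-2-3-5-80.settings.400.http_requests_total",
    "nginx.ip-1-2-3-5-80.home.200.http_requests_total",
]
print([n for n in names if matches("nginx.*.*.500.http_requests_total", n)])
```

With the hierarchical scheme the query has to know that the status code is the fourth node; the tagged query states `status=500` directly.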
Processing functions
sumSeries(), minSeries(), maxSeries(), etc
lowestAverage()
summarize()
linearRegression()
timeShift()
exponentialMovingAverage()
holtWintersConfidenceBands()
...
Holt-Winters confidence bands
Query syntax
Nested / traditional:
alias(movingAverage(scaleToSeconds(sumSeries(stats_global.production.counters.api.requests.*.count),60),30),'api.avg')
Pipe syntax:
stats_global.production.counters.api.requests.*.count | sumSeries() | scaleToSeconds(60) | movingAverage(30) | alias('api.avg')
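The two syntaxes are equivalent: each pipe stage receives the expression to its left as its first argument. The sketch below shows that relationship as a toy string rewrite. It is not Graphite's parser, and it assumes every stage is a simple call and that no argument contains a `|` character.

```python
def pipe_to_nested(expr):
    """Rewrite 'series | f() | g(1)' as 'g(f(series),1)'."""
    first, *calls = [part.strip() for part in expr.split("|")]
    nested = first
    for call in calls:
        func, _, args = call.partition("(")
        args = args[:-1]  # drop the closing parenthesis of the call
        nested = f"{func}({nested},{args})" if args else f"{func}({nested})"
    return nested

piped = (
    "stats_global.production.counters.api.requests.*.count | sumSeries() "
    "| scaleToSeconds(60) | movingAverage(30) | alias('api.avg')"
)
print(pipe_to_nested(piped))
```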
Protip: reducing series to points
consolidateBy(foo, 'min')&maxDataPoints=1
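Why this works: when a series has more points than `maxDataPoints`, the points are grouped into buckets and each bucket is collapsed with the consolidation function, so asking for a single point yields the min (or max, sum, and so on) of the whole range. The sketch below is a simplified model of that bucketing, not Graphite's actual code, and it ignores null handling.

```python
import math

def consolidate(values, max_data_points, func):
    """Collapse `values` into at most `max_data_points` buckets using `func`."""
    if len(values) <= max_data_points:
        return list(values)
    per_bucket = math.ceil(len(values) / max_data_points)
    return [
        func(values[i:i + per_bucket])
        for i in range(0, len(values), per_bucket)
    ]

points = [3, 1, 4, 1, 5, 9]
print(consolidate(points, 1, min))  # one point: the minimum over the whole range
print(consolidate(points, 2, max))  # two points: the maximum of each half
```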
Function plugins
Example:
from graphite.functions.params import Param, ParamTypes

def toUpperCase(requestContext, seriesList):
    """Custom function that changes series names to UPPERCASE"""
    for series in seriesList:
        series.name = series.name.upper()
    return seriesList

# Metadata used to list and document the function
toUpperCase.group = 'Custom'
toUpperCase.params = [
    Param('seriesList', ParamTypes.seriesList, required=True),
]

# Maps the name used in queries to the implementation
SeriesFunctions = {
    'upper': toUpperCase,
}
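To try the function's logic without a Graphite install, the body can be exercised against a stand-in series object. `FakeSeries` below is a hypothetical stub that models only the `name` attribute; real series objects carry much more.

```python
class FakeSeries:
    """Hypothetical stand-in for a Graphite series; only `name` is modelled."""
    def __init__(self, name):
        self.name = name

def toUpperCase(requestContext, seriesList):
    """Same body as the plugin function on the slide."""
    for series in seriesList:
        series.name = series.name.upper()
    return seriesList

series_list = [FakeSeries("local.random.diceroll")]
result = toUpperCase(None, series_list)  # this function never reads the request context
print([s.name for s in result])
```

Given the `SeriesFunctions` mapping above, the function would be called in a query as `upper(local.random.diceroll)`.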
Ecosystem
https://graphite.readthedocs.io/en/latest/tools.html
Ecosystem
Collection tools: 21
Forwarding: 16
Visualization: 25
Monitoring: 10
Alternative storage: 11
Other: 8
Ecosystem: backends and relays
Official:
https://graphiteapp.org/ & https://github.com/graphite-project
Alternative backends:
https://github.com/bookingcom/carbonapi
https://github.com/go-graphite
https://github.com/InfluxGraph/influxgraph
https://github.com/lomik/graphite-clickhouse , https://github.com/ClickHouse/graphouse
https://github.com/douban/Kenshin
https://github.com/grafana/metrictank
Alternative relays:
https://github.com/grobian/carbon-c-relay
https://github.com/bookingcom/nanotube
https://github.com/go-graphite/gorelka
https://github.com/grafana/carbon-relay-ng
Graphiteapp Pros and Cons
Con:
● Performance & resource utilisation
● Does not handle high-churn series very well
● Quiet project
● Hard to scale
Pro:
● Rich library of handy processing functions
● Long term storage, seamless rollups
● Push based. No need for service discovery. Easier for SaaS (~)
● Ecosystem of tools and apps that can send out graphite data
Metrictank
Worldping TSDB requirements
Large scale (millions of points/sec, hundreds of millions of series)
Long term storage, rollups
Resource efficient (cpu, memory, disk)
Multi-tenant
Open source
Operationally friendly
Proven technology
Compatible with Graphite (or pluggable into Graphite)
Building Metrictank
● Take some shortcuts - don’t reinvent the wheel
● Scalable platform (but not fully automated)
● Performance (in-memory, index cache, native fns, etc)
● Cost (storage and compute resources)
● Tunable for different workloads (e.g. retention, cache, redundancy, index pruning)
Compared to Graphite
● Native functions execute much faster (WIP)
● Seamless resolution changing
● Better support for churn (short-lived series), but not to the extent that Cortex/Prometheus solve for this case.
● Multiple rollup functions, choice at query time (*)
● Meta tags
● Append only, for now.
Basic topology
A few considerations
● Separate read and write peers
● Separate query peers
● Primary role reassignment & Kafka retention
Meta tags
● Optimizing low-cardinality tags in the index (!)
● Flexible associations
● Seamless: look and feel like real tags
Speculative query execution
● Cluster fan-out
● Limited by slowest cluster peer
● Go GC inflates response times
https://github.com/golang/go/issues/14812
● -> Spec-Exec !
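The idea can be sketched with asyncio: send one request per shard group, and once a threshold fraction has answered, re-issue the stragglers to the other replica and take whichever answer arrives first. This is an illustrative model only, not Metrictank's implementation (which is written in Go); `query_peer`, `fan_out`, the replica labels and the simulated latencies are all made up.

```python
import asyncio

async def query_peer(shard, replica, delay):
    """Stand-in for a request to one read peer; `delay` simulates its latency."""
    await asyncio.sleep(delay)
    return replica

async def fan_out(latencies, threshold):
    """latencies maps shard -> (delay of replica 'a', delay of replica 'b').

    Returns {shard: replica whose answer was used}.
    """
    # Start with one request per shard group, all sent to replica 'a'.
    pending = {
        asyncio.create_task(query_peer(shard, "a", delays[0])): shard
        for shard, delays in latencies.items()
    }
    results = {}
    speculated = False
    while len(results) < len(latencies):
        done, _ = await asyncio.wait(list(pending), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            shard = pending.pop(task)
            results.setdefault(shard, task.result())  # first answer per shard wins
        if not speculated and len(results) >= threshold * len(latencies):
            # Threshold reached: ask replica 'b' for every shard still missing.
            speculated = True
            for shard in latencies.keys() - results.keys():
                task = asyncio.create_task(query_peer(shard, "b", latencies[shard][1]))
                pending[task] = shard
    for task in pending:
        task.cancel()  # slower duplicates are no longer needed
    return results

# Shard 3's replica 'a' is slow (for example, paused by GC); its replica 'b' is not.
demo = {0: (0.01, 0.01), 1: (0.01, 0.01), 2: (0.01, 0.01), 3: (1.0, 0.01)}
print(asyncio.run(fan_out(demo, threshold=0.75)))
```

The response time is then bounded by the faster replica of the slowest shard group rather than by the single slowest peer.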
Speculative query execution: example
● Cluster with 120 shardgroups, 2x replication = 240 read peers
● Average of 67 req/s ~ 8k peer req/s
● Threshold 94% (120 * 0.94 = 112.8)
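The numbers on this slide can be re-derived in a few lines. The reading that each incoming request fans out to one peer per shard group is an assumption made here to explain the ~8k figure.

```python
shard_groups = 120
replication = 2
query_rate = 67      # incoming requests per second (average from the slide)
threshold_pct = 94   # speculation threshold from the slide

read_peers = shard_groups * replication                 # 240 read peers
peer_requests = query_rate * shard_groups               # 8040 per second, i.e. ~8k
responses_needed = shard_groups * threshold_pct / 100   # 120 * 0.94

print(read_peers, peer_requests, responses_needed)
```

So speculation would start once roughly 113 of the 120 shard groups have answered, leaving about 7 outstanding requests to be retried on the other replica.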
Response metadata & Series Lineage
$ http 'http://localhost:6060/render?target=...&meta=true'
HTTP/1.1 200 OK
(...)
Trace-Id: 24c07790dc66a088
{
"meta": {
"stats": {
"executeplan.resolve-series.ms": 8,
"executeplan.get-targets.ms": 11,
"executeplan.prepare-series.ms": 0,
"executeplan.plan-run.ms": 0,
"executeplan.series-fetch.count": 300,
"executeplan.points-fetch.count": 6600,
"executeplan.points-return.count": 1800,
"executeplan.cache-miss.count": 0,
"executeplan.cache-hit-partial.count": 0,
"executeplan.cache-hit.count": 0,
"executeplan.chunks-from-tank.count": 300,
"executeplan.chunks-from-cache.count": 0,
"executeplan.chunks-from-store.count": 0
}
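A small sketch of what can be read off these stats, using the values from the slide: where the chunks came from, and how many points were fetched per series compared with how many were returned. The helper names are made up for illustration.

```python
# Subset of the "stats" object shown on the slide.
stats = {
    "executeplan.series-fetch.count": 300,
    "executeplan.points-fetch.count": 6600,
    "executeplan.points-return.count": 1800,
    "executeplan.chunks-from-tank.count": 300,
    "executeplan.chunks-from-cache.count": 0,
    "executeplan.chunks-from-store.count": 0,
}

def chunk_sources(stats):
    """Return the number of chunks served from each source."""
    return {
        source: stats[f"executeplan.chunks-from-{source}.count"]
        for source in ("tank", "cache", "store")
    }

def points_per_series(stats):
    """Average number of points fetched per series."""
    return stats["executeplan.points-fetch.count"] / stats["executeplan.series-fetch.count"]

print(chunk_sources(stats))
print(points_per_series(stats))
```

In this response all 300 chunks came from "tank" and none from the cache or the store, and 6600 points were fetched to return 1800.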
Response metadata & Series Lineage
"series": [
{
"target": "sumSeries(some.id.of.a.metric.*)",
"datapoints": [[123456, 1234567890], [123, 1234567895], … ],
"meta": [
{
"schema-name": "default-1",
"schema-retentions": "1s:6h:2min:2,1min:35d:6h:1",
"archive-read": 0,
"archive-interval": 5,
"aggnum-norm": 1,
"consolidate-normfetch": "AverageConsolidator",
"aggnum-rc": 0,
"consolidate-rc": "NoneConsolidator",
"count": 20
},
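The `schema-retentions` string packs several archives into one value. The sketch below splits it into named fields, assuming each comma-separated archive is laid out as `interval:ttl:chunkspan:numchunks`. That field order is an assumption for illustration, so check it against the Metrictank storage-schemas documentation. Values are kept as strings.

```python
# Assumed field order; not confirmed by the slides.
FIELDS = ("interval", "ttl", "chunkspan", "numchunks")

def parse_retentions(spec):
    """Split 'a:b:c:d,e:f:g:h' into one dict per archive."""
    return [dict(zip(FIELDS, archive.split(":"))) for archive in spec.split(",")]

for archive in parse_retentions("1s:6h:2min:2,1min:35d:6h:1"):
    print(archive)
```

Under that assumption, `archive-read: 0` in the metadata would point at the first archive in the list, the raw one.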
Response metadata & Series Lineage : mockup
Future work
● Graphite: try to help with governance, project consolidation, installation process, etc.
● Metrictank: more native functions
● Metrictank: chunk streaming
● Metrictank: meta analytics (enterprise)
● Metrictank: series overwrites
● Carbon-relay-ng: performance
● #metrictank on http://slack.grafana.com/
● Hiring Python and Golang developers!

Graphite & Metrictank - Meetup Tel Aviv Yafo