3. Key points
- Many nodes, each node running carbon-relay, webapp, carbon-cache(s).
- Run at least two carbon-cache processes per node to make full use of its CPUs
(typically one process per CPU core)
- All carbon-cache instances use the same schema definitions for whisper files
- All monitoring agents (statsd/sensu/gdash/codahale/collectd/etc.) use a load balancer front end (HAProxy or ELB)
to send and query metrics.
- Each carbon-relay may route metrics to any carbon-cache instance on any Graphite server in the cluster.
- All carbon-relays use the 'consistent-hashing' method and have exactly the same DESTINATIONS list
(carbon.conf DESTINATIONS. Order is important?)
- All webapp processes share exactly the same memcache instance(s)
(local_settings.py MEMCACHE_HOSTS)
- Each webapp may query only local carbon-cache instances.
(local_settings.py CARBONLINK_HOSTS)
- Each webapp's CLUSTER_SERVERS may list not only the other webapps but also the webapp itself.
(local_settings.py, as of version 0.9.10)
- Each webapp's CARBONLINK_HOSTS must contain only the local instances from DESTINATIONS
(order is not important)
- On AWS EC2, all Graphite nodes should be deployed in the same Region.
- carbon-aggregator is not that useful. It is better to aggregate upstream (statsd/diamond) and send the results to Graphite.
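The constraints above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical two-node inventory: the addresses, the ports, and the helper names (destinations, carbonlink_hosts, cluster_servers, instance_ids) are illustrative and not part of Graphite. It derives all three settings from one inventory, so every node gets the same DESTINATIONS and CLUSTER_SERVERS but only its own instances in CARBONLINK_HOSTS.

```python
# Hypothetical inventory: host -> list of (instance, pickle_port, query_port).
# Addresses and ports are illustrative assumptions, not values from a real cluster.
INVENTORY = {
    "10.0.0.1": [("a", 2004, 7002), ("b", 2104, 7102)],
    "10.0.0.2": [("a", 2004, 7002), ("b", 2104, 7102)],
}


def destinations(inventory):
    """carbon.conf DESTINATIONS: identical (and identically ordered) on every node."""
    return [
        "%s:%d:%s" % (host, pickle_port, instance)
        for host in sorted(inventory)
        for instance, pickle_port, _query_port in inventory[host]
    ]


def carbonlink_hosts(inventory, local_host):
    """local_settings.py CARBONLINK_HOSTS: local carbon-cache instances only."""
    return [
        "%s:%d:%s" % (local_host, query_port, instance)
        for instance, _pickle_port, query_port in inventory[local_host]
    ]


def cluster_servers(inventory, webapp_port=80):
    """local_settings.py CLUSTER_SERVERS: identical on every node, self included."""
    return ["%s:%d" % (host, webapp_port) for host in sorted(inventory)]


def instance_ids(entries):
    """Reduce 'host:port:instance' strings to (host, instance) pairs."""
    return {(e.split(":")[0], e.split(":")[2]) for e in entries}


if __name__ == "__main__":
    print("DESTINATIONS = " + ", ".join(destinations(INVENTORY)))
    for host in sorted(INVENTORY):
        print(host, "CARBONLINK_HOSTS =", carbonlink_hosts(INVENTORY, host))
    print("CLUSTER_SERVERS =", cluster_servers(INVENTORY))
```

Generating the settings from one source (for example in a Chef/Puppet template) is one way to keep the lists from drifting apart between nodes.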
4. webapp/graphite/storage.py
    STORE = Store(settings.DATA_DIRS, remote_hosts=settings.CLUSTER_SERVERS)

    class Store:
        def __init__(self, directories=[], remote_hosts=[]):
            self.directories = directories
            self.remote_hosts = remote_hosts
            # Hosts that resolve to a local interface are skipped, so a webapp
            # never queries itself as a remote store.
            self.remote_stores = [ RemoteStore(host)
                                   for host in remote_hosts if not is_local_interface(host) ]
            ...

        def find_first(self, query):
            ...
            remote_requests = [ r.find(query) for r in self.remote_stores if r.available ]
            ...
Because Store skips any host that is a local interface, it is safe to use exactly the same CLUSTER_SERVERS value for all webapps in a cluster
(less template work with Chef/Puppet).
There are some edge cases, though: https://github.com/graphite-project/graphite-web/issues/222
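To illustrate why a shared CLUSTER_SERVERS list is harmless, here is a minimal sketch of the filtering step. The function is_local is a hypothetical stand-in for graphite-web's is_local_interface (it only checks membership in a set of known local addresses), and the addresses are made up.

```python
CLUSTER_SERVERS = ["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"]  # same on all nodes


def is_local(host, local_addresses):
    """Hypothetical stand-in for is_local_interface: strip the port, then
    check the address against the set of addresses owned by this node."""
    address = host.split(":", 1)[0]
    return address in local_addresses


def remote_hosts_for(cluster_servers, local_addresses):
    """Same shape as the list comprehension in Store.__init__: keep non-local hosts."""
    return [h for h in cluster_servers if not is_local(h, local_addresses)]


if __name__ == "__main__":
    # On the node that owns 10.0.0.2, only the other two webapps remain.
    print(remote_hosts_for(CLUSTER_SERVERS, {"10.0.0.2", "127.0.0.1"}))
```

Each node ends up with a different set of remote stores even though the configured list is the same everywhere.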
5. CARBONLINK_HOSTS should contain only local carbon-cache instances, not the full DESTINATIONS list.
The webapp takes care of selecting the proper carbon-cache instance for each metric, even though its hash ring
contains a different list of items than the relay's.
https://answers.launchpad.net/graphite/+question/228472
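For readers unfamiliar with hash rings, the following is a simplified sketch of consistent hashing over (host, instance) nodes. It is not carbon's actual implementation: the HashRing class, the replica count, and the node lists are assumptions made for illustration. It builds one ring with every destination (as a relay would) and one with only local instances (as a webapp would).

```python
import bisect
from hashlib import md5

# Hypothetical nodes: the relay sees every instance, the webapp only local ones.
RELAY_NODES = [("10.0.0.1", "a"), ("10.0.0.1", "b"), ("10.0.0.2", "a"), ("10.0.0.2", "b")]
WEBAPP_NODES = [("10.0.0.1", "a"), ("10.0.0.1", "b")]


class HashRing:
    """Simplified consistent-hash ring; illustrative only."""

    def __init__(self, nodes, replicas=100):
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            for i in range(replicas):
                position = self._position("%s:%d" % (node, i))
                bisect.insort(self.ring, (position, node))

    @staticmethod
    def _position(key):
        # Map a key to a point on the ring using the first 16 bits of its MD5.
        return int(md5(key.encode("utf-8")).hexdigest()[:4], 16)

    def get_node(self, key):
        # First entry at or after the key's position, wrapping around at the end.
        position = self._position(key)
        index = bisect.bisect_left(self.ring, (position,)) % len(self.ring)
        return self.ring[index][1]


if __name__ == "__main__":
    relay_ring = HashRing(RELAY_NODES)
    webapp_ring = HashRing(WEBAPP_NODES)
    metric = "servers.web1.cpu.user"
    print("relay picks: ", relay_ring.get_node(metric))
    print("webapp picks:", webapp_ring.get_node(metric))
```

The sketch only shows that a ring built from local instances never selects a remote one. Whether its choice matches the relay's for a given metric is the question discussed at the link above.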
webapp/graphite/render/datalib.py
    # Data retrieval API
    def fetchData(requestContext, pathExpr):
        ...
        if requestContext['localOnly']:
            store = LOCAL_STORE
        else:
            store = STORE

        for dbFile in store.find(pathExpr):
            log.metric_access(dbFile.metric_path)
            dbResults = dbFile.fetch( timestamp(startTime), timestamp(endTime) )
            try:
                cachedResults = CarbonLink.query(dbFile.real_metric)
                results = mergeResults(dbResults, cachedResults)
            except:
                log.exception()
                results = dbResults

            if not results:
                continue
            ...
        return seriesList
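The fetch path above reads datapoints from disk and then overlays whatever carbon-cache still holds in memory. Below is a minimal sketch of such a merge. The name merge_results and the data shapes (disk results as ((start, end, step), values), cached results as (timestamp, value) pairs) are assumptions for illustration, not a copy of graphite-web's mergeResults.

```python
def merge_results(db_results, cache_results):
    """Overlay cached datapoints onto the values read from disk.

    db_results:    ((start, end, step), values)
    cache_results: iterable of (timestamp, value) pairs not yet flushed to disk
    """
    cache_results = list(cache_results)
    if not db_results or not cache_results:
        return db_results  # nothing to merge

    (start, end, step), values = db_results
    values = list(values)  # copy so the caller's list is not mutated
    for timestamp, value in cache_results:
        # Align the timestamp to its interval, then convert to a list index.
        slot = (timestamp - timestamp % step - start) // step
        if 0 <= slot < len(values):
            values[slot] = value
    return (start, end, step), values


if __name__ == "__main__":
    on_disk = ((100, 160, 10), [1, 2, 3, None, None, None])
    in_cache = [(135, 4.0), (140, 5.0)]
    print(merge_results(on_disk, in_cache))
```

Cached points that fall outside the requested window are ignored in this sketch, and cached values take precedence over values already on disk for the same interval.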