Key points

- Many nodes, each running carbon-relay, the webapp, and one or more carbon-cache processes.
- Run at least two carbon-cache processes per node to make use of the available CPU (typically one process per CPU core).
- All carbon-cache instances use the same schema definitions for whisper files.
- All monitoring agents (statsd/sensu/gdash/codahale/collectd/etc.) send and query metrics through a load-balancer front end (HAProxy or ELB).
- Each carbon-relay may route metrics to any carbon-cache instance on any graphite server in the cluster.
- All carbon-relays use the consistent-hashing method and have exactly the same DESTINATIONS list (carbon.conf DESTINATIONS. Is order important?).
- All webapp processes share exactly the same memcache instance(s) (local_settings.py MEMCACHE_HOSTS).
- Each webapp may query only local carbon-cache instances (local_settings.py CARBONLINK_HOSTS).
- Each webapp's CLUSTER_SERVERS may list not only the other webapps but also itself (local_settings.py, as of version 0.9.10).
- Each webapp's CARBONLINK_HOSTS must contain only the local instances from DESTINATIONS (order is not important).
- In terms of AWS EC2, graphite nodes are supposed to be installed in the same Region.
- The aggregator is not that useful. It is better to aggregate somewhere else (statsd/diamond) and send the result to graphite.
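To make the configuration rules in the key points concrete, here is a minimal sketch that derives the per-node values from one cluster description. The hostnames, the webapp port 80, the memcache host, and the helper names `relay_destinations` and `webapp_settings` are all assumptions made for illustration; only the setting names (DESTINATIONS, CLUSTER_SERVERS, MEMCACHE_HOSTS, CARBONLINK_HOSTS) come from the notes. The ports follow the usual carbon-cache pattern of a pickle receiver port for relays and a cache query port for the webapp.

```python
# Hypothetical two-node cluster; every hostname and port below is an assumption.
NODES = ["graphite-1.example.com", "graphite-2.example.com"]

# (instance name, PICKLE_RECEIVER_PORT, CACHE_QUERY_PORT) per carbon-cache process
CACHE_INSTANCES = [("a", 2004, 7002), ("b", 2104, 7102)]


def relay_destinations(nodes=NODES, instances=CACHE_INSTANCES):
    """carbon.conf DESTINATIONS: the same list, in the same order, on every relay."""
    return ["%s:%d:%s" % (host, pickle_port, name)
            for host in nodes
            for name, pickle_port, _query_port in instances]


def webapp_settings(node, nodes=NODES, instances=CACHE_INSTANCES):
    """local_settings.py values for the webapp running on `node`."""
    return {
        # Identical on every node; may include the node itself.
        "CLUSTER_SERVERS": ["%s:80" % host for host in nodes],
        # Identical on every node; a shared memcache (hypothetical host).
        "MEMCACHE_HOSTS": ["memcache.example.com:11211"],
        # Only the carbon-cache instances running on this node.
        "CARBONLINK_HOSTS": ["%s:%d:%s" % (node, query_port, name)
                             for name, _pickle_port, query_port in instances],
    }
```

Generating everything from one node list keeps DESTINATIONS and CLUSTER_SERVERS identical across nodes, so only CARBONLINK_HOSTS needs a per-node value in a Chef or Puppet template.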
webapp/graphite/storage.py

    STORE = Store(settings.DATA_DIRS, remote_hosts=settings.CLUSTER_SERVERS)

    class Store:
        def __init__(self, directories=[], remote_hosts=[]):
            self.directories = directories
            self.remote_hosts = remote_hosts
            # Hosts that resolve to a local interface are skipped,
            # so a webapp never queries itself as a remote store.
            self.remote_stores = [ RemoteStore(host)
                                   for host in remote_hosts
                                   if not is_local_interface(host) ]
        ...
        def find_first():
            ...
            remote_requests = [ r.find(query) for r in self.remote_stores if r.available ]
            ...

It is safe to use exactly the same CLUSTER_SERVERS option for all webapps in a cluster (less template work with Chef/Puppet), because each Store drops the hosts for which is_local_interface(host) is true.
There are some edge cases, though: https://github.com/graphite-project/graphite-web/issues/222
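The filtering step in the Store constructor can be sketched in isolation. This is a simplified stand-in, not graphite-web's code: the hypothetical `remote_hosts_for` helper compares against an explicit set of local addresses instead of calling `is_local_interface`, and the IP addresses are made up.

```python
def remote_hosts_for(cluster_servers, local_addresses):
    """Return the CLUSTER_SERVERS entries that are not this node.

    Simplified stand-in for the is_local_interface() check: an entry counts
    as local when its host part appears in local_addresses.
    """
    local = set(local_addresses)
    return [entry for entry in cluster_servers
            if entry.split(":", 1)[0] not in local]


# The same CLUSTER_SERVERS value is deployed to every node (hypothetical IPs).
CLUSTER_SERVERS = ["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"]

# Each node ends up with only its peers as remote stores.
peers_of_node_1 = remote_hosts_for(CLUSTER_SERVERS, ["127.0.0.1", "10.0.0.1"])
peers_of_node_2 = remote_hosts_for(CLUSTER_SERVERS, ["127.0.0.1", "10.0.0.2"])
```

Note that the comparison is on the address as written, so an entry that names the local node by a hostname or address missing from the local set would still be treated as remote.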
CARBONLINK_HOSTS should contain only the local carbon-cache instances, not the whole DESTINATIONS list.
The webapp takes care of selecting the proper carbon-cache instance for a metric, although its hash ring contains a different list of items than the relay's.
https://answers.launchpad.net/graphite/+question/228472

webapp/graphite/render/datalib.py

    # Data retrieval API
    def fetchData(requestContext, pathExpr):
        ...
        if requestContext['localOnly']:
            store = LOCAL_STORE
        else:
            store = STORE

        for dbFile in store.find(pathExpr):
            log.metric_access(dbFile.metric_path)
            # Points already written to disk
            dbResults = dbFile.fetch( timestamp(startTime), timestamp(endTime) )
            try:
                # Points still held in carbon-cache memory
                cachedResults = CarbonLink.query(dbFile.real_metric)
                results = mergeResults(dbResults, cachedResults)
            except:
                # Fall back to on-disk data if the cache query fails
                log.exception()
                results = dbResults

            if not results:
                continue
            ...
        return seriesList
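The merge step in fetchData combines points read from disk with points still held in carbon-cache memory. The following is a minimal sketch of that idea, not the graphite-web implementation: it assumes the disk result has the shape `((start, end, step), values)` and the cache result is a list of `(timestamp, value)` pairs, and the name `merge_results` is hypothetical.

```python
def merge_results(db_results, cache_results):
    """Overlay cached (timestamp, value) points onto a disk fetch result.

    Assumed shapes (for illustration only):
      db_results    = ((start, end, step), [v0, v1, ...])
      cache_results = [(timestamp, value), ...]
    """
    cache_results = list(cache_results)
    if not cache_results:
        return db_results

    (start, end, step), values = db_results
    merged = list(values)  # do not mutate the caller's list
    for ts, value in cache_results:
        # Align the point to the start of its storage interval.
        interval = ts - (ts % step)
        index = (interval - start) // step
        # Ignore cached points outside the requested time range.
        if 0 <= index < len(merged):
            merged[index] = value
    return (start, end, step), merged


# Disk has one point; the cache holds two newer ones and one out of range.
disk = ((60, 240, 60), [1.0, None, None])
cache = [(125, 2.5), (185, 3.0), (600, 9.9)]
time_info, values = merge_results(disk, cache)
```

This shows why CARBONLINK_HOSTS must reach the cache that actually holds the metric: if the query goes to the wrong instance or fails, the most recent intervals stay empty until carbon-cache flushes them to disk.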