Graphite cluster setup blueprint


Published in: Technology
  • Hey Anatoliy, I have a similar setup, but I moved carbon-relay off the three graphite boxes: two relays with identical configuration sit behind a VIP, so metric ingestion stays available for my applications if one of the graphite 'shards' goes down. I would like to use consistent hashing with a replication factor of 2, but this does not seem to work; only relay rules work with replication factor 2. Do you have any idea how I can get this working properly, so I can have more than two shards with proper replication of data? Thanks
  • I think the problem may be in sending to carbon-cache instances on different machines with consistent hashing: a metric could be saved once on graphite-01 and once on graphite-02, and when you later ask for that metric you might not get the most recently updated copy, because the webapp checks locally before checking the other webapps in the cluster.
  • Cool thing, Anatoliy!
    But the LB needs to have some sticky routing enabled, at least by IP or something, AFAIK...

  1. Two-node setup. Both hosts run identical carbon daemons; only the webapp settings differ.

     Load balancer: Host graphite, IP 10.4.0.10; TCP ports 2003, 2004; HTTP port 80.

     carbon.conf on graphite-01 (10.4.0.1) and graphite-02 (10.4.0.2):

         [cache:a]
         LINE_RECEIVER_INTERFACE = 0.0.0.0
         LINE_RECEIVER_PORT = 2103
         PICKLE_RECEIVER_INTERFACE = 0.0.0.0
         PICKLE_RECEIVER_PORT = 2104
         CACHE_QUERY_INTERFACE = 0.0.0.0
         CACHE_QUERY_PORT = 7102

         [cache:b]
         LINE_RECEIVER_INTERFACE = 0.0.0.0
         LINE_RECEIVER_PORT = 2203
         PICKLE_RECEIVER_INTERFACE = 0.0.0.0
         PICKLE_RECEIVER_PORT = 2204
         CACHE_QUERY_INTERFACE = 0.0.0.0
         CACHE_QUERY_PORT = 7202

         [relay]
         LINE_RECEIVER_INTERFACE = 0.0.0.0
         LINE_RECEIVER_PORT = 2003
         PICKLE_RECEIVER_INTERFACE = 0.0.0.0
         PICKLE_RECEIVER_PORT = 2004
         RELAY_METHOD = consistent-hashing
         REPLICATION_FACTOR = 1
         DESTINATIONS = 10.4.0.1:2104:a, 10.4.0.1:2204:b, 10.4.0.2:2104:a, 10.4.0.2:2204:b

         [aggregator]
         LINE_RECEIVER_INTERFACE = 0.0.0.0
         LINE_RECEIVER_PORT = 2013
         PICKLE_RECEIVER_INTERFACE = 0.0.0.0
         PICKLE_RECEIVER_PORT = 2014

     local_settings.py for the webapp (port 80) on graphite-01:

         MEMCACHE_HOSTS = ['rf-1.cache.amazonaws.com']
         CLUSTER_SERVERS = ['10.4.0.2:80']
         REMOTE_RENDERING = False
         CARBONLINK_HOSTS = ['10.4.0.1:7102', '10.4.0.1:7202']

     local_settings.py for the webapp (port 80) on graphite-02:

         MEMCACHE_HOSTS = ['rf-1.cache.amazonaws.com']
         CLUSTER_SERVERS = ['10.4.0.1:80']
         REMOTE_RENDERING = False
         CARBONLINK_HOSTS = ['10.4.0.2:7102', '10.4.0.2:7202']

     adobrosynets@recordedfuture.com
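Clients reach the blueprint above through the load balancer on port 2003, speaking carbon's plaintext line protocol (one "metric value timestamp" line per datapoint). A minimal sketch of a sender; the helper names are mine, and the 10.4.0.10 VIP is taken from the diagram:

```python
import socket
import time

def format_metric(path, value, timestamp):
    # Carbon plaintext protocol: "<metric path> <value> <epoch timestamp>\n"
    return "%s %s %d\n" % (path, value, timestamp)

def send_metrics(lines, host="10.4.0.10", port=2003):
    # Connect to the load balancer VIP, which forwards to one carbon-relay.
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)

line = format_metric("servers.graphite-01.load.avg", 0.42, int(time.time()))
# send_metrics([line])  # uncomment on a network that can reach the VIP
```

The same payload can also be batched over the pickle receiver on port 2004 for higher throughput, but the plaintext form is the simplest to debug with netcat.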
  2. Three-node setup. carbon.conf is identical on graphite-01 (10.4.0.1), graphite-02 (10.4.0.2) and graphite-03 (10.4.0.3):

         [cache:a]
         LINE_RECEIVER_PORT = 2103
         PICKLE_RECEIVER_PORT = 2104
         CACHE_QUERY_PORT = 7102

         [cache:b]
         LINE_RECEIVER_PORT = 2203
         PICKLE_RECEIVER_PORT = 2204
         CACHE_QUERY_PORT = 7202

         [relay]
         LINE_RECEIVER_PORT = 2003
         PICKLE_RECEIVER_PORT = 2004
         RELAY_METHOD = consistent-hashing
         REPLICATION_FACTOR = 1
         DESTINATIONS = 10.4.0.1:2104:a, 10.4.0.1:2204:b, 10.4.0.2:2104:a, 10.4.0.2:2204:b, 10.4.0.3:2104:a, 10.4.0.3:2204:b

         [aggregator]
         LINE_RECEIVER_PORT = 2013
         PICKLE_RECEIVER_PORT = 2014

     Load balancer: Host graphite, IP 10.4.0.10; TCP ports 2003, 2004; HTTP port 80.

     The webapp local_settings.py differs per node.

     graphite-01:

         MEMCACHE_HOSTS = ['rf-1.cache']
         CLUSTER_SERVERS = ['10.4.0.2:80', '10.4.0.3:80']
         CARBONLINK_HOSTS = ['10.4.0.1:7102', '10.4.0.1:7202']

     graphite-02:

         MEMCACHE_HOSTS = ['rf-1.cache']
         CLUSTER_SERVERS = ['10.4.0.1:80', '10.4.0.3:80']
         CARBONLINK_HOSTS = ['10.4.0.2:7102', '10.4.0.2:7202']

     graphite-03:

         MEMCACHE_HOSTS = ['rf-1.cache']
         CLUSTER_SERVERS = ['10.4.0.1:80', '10.4.0.2:80']
         CARBONLINK_HOSTS = ['10.4.0.3:7102', '10.4.0.3:7202']
  3. Key points
     - Many nodes; each node runs carbon-relay, the webapp, and one or more carbon-cache processes.
     - Use at least two carbon-cache processes per node to use the hardware fully (typically one process per CPU core).
     - All carbon-cache instances use the same schema definitions for whisper files.
     - All monitoring agents (statsd/sensu/gdash/codahale/collectd/etc.) send and query metrics through the load-balancer front end (HAProxy or ELB).
     - Each carbon-relay may route metrics to any carbon-cache instance on any graphite server in the cluster.
     - All carbon-relays use the consistent-hashing method and have exactly the same DESTINATIONS list (carbon.conf DESTINATIONS; order is important?).
     - All webapp processes share exactly the same memcache instance(s) (local_settings.py MEMCACHE_HOSTS).
     - Each webapp may query only local carbon-cache instances (local_settings.py CARBONLINK_HOSTS).
     - A webapp's CLUSTER_SERVERS may list not only the other webapps but also itself (local_settings.py, as of version 0.9.10).
     - Each webapp's CARBONLINK_HOSTS must contain only the local instances from DESTINATIONS (order is not important).
     - On AWS EC2, the graphite nodes should be installed in the same region.
     - The aggregator is not that useful; it is better to aggregate elsewhere (statsd/diamond) and send the results to graphite.
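The consistent-hashing routing described in the key points can be sketched with a minimal hash ring. This is an illustration of the idea, not carbon's actual implementation (carbon's ring differs in hashing details; the class below and its replica count are my assumptions):

```python
import bisect
import hashlib

class ConsistentHashRing(object):
    """Minimal md5-based hash ring, in the spirit of carbon-relay's hashing."""

    def __init__(self, destinations, replicas=100):
        # Each destination gets `replicas` points on the ring, positioned by
        # hashing the destination string itself, so DESTINATIONS order is moot.
        self.ring = []  # sorted list of (position, destination)
        for dest in destinations:
            for i in range(replicas):
                pos = self._hash("%s:%d" % (dest, i))
                bisect.insort(self.ring, (pos, dest))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def get_destination(self, metric):
        # Walk clockwise from the metric's hash to the next ring point.
        pos = self._hash(metric)
        index = bisect.bisect_left(self.ring, (pos,)) % len(self.ring)
        return self.ring[index][1]

destinations = ["10.4.0.1:2104:a", "10.4.0.1:2204:b",
                "10.4.0.2:2104:a", "10.4.0.2:2204:b"]
ring = ConsistentHashRing(destinations)
# Every relay built from the same DESTINATIONS maps a metric to the same cache.
print(ring.get_destination("servers.graphite-01.load.avg"))
```

Because a destination's ring positions depend only on its own string, in this sketch reordering DESTINATIONS does not move any metric, which is consistent with the "order is not important" note for CARBONLINK_HOSTS.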
  4. webapp/graphite/storage.py

         STORE = Store(settings.DATA_DIRS, remote_hosts=settings.CLUSTER_SERVERS)

         class Store:
             def __init__(self, directories=[], remote_hosts=[]):
                 self.directories = directories
                 self.remote_hosts = remote_hosts
                 self.remote_stores = [ RemoteStore(host) for host in remote_hosts
                                        if not is_local_interface(host) ]
             ...
             def find_first():
                 ...
                 remote_requests = [ r.find(query) for r in self.remote_stores if r.available ]
                 ...

     It is safe to have exactly the same CLUSTER_SERVERS option for all webapps in a cluster (less template work with Chef/Puppet), since Store filters out local interfaces. There are some edge cases, though:
     https://github.com/graphite-project/graphite-web/issues/222
  5. CARBONLINK_HOSTS should contain only the local carbon-cache instances, not the full DESTINATIONS list. The webapp takes care of selecting the proper carbon-cache instance for a metric, even though it has a different list of items in its hash ring.
     https://answers.launchpad.net/graphite/+question/228472

     webapp/graphite/render/datalib.py

         # Data retrieval API
         def fetchData(requestContext, pathExpr):
             ...
             if requestContext['localOnly']:
                 store = LOCAL_STORE
             else:
                 store = STORE
             for dbFile in store.find(pathExpr):
                 log.metric_access(dbFile.metric_path)
                 dbResults = dbFile.fetch( timestamp(startTime), timestamp(endTime) )
                 try:
                     cachedResults = CarbonLink.query(dbFile.real_metric)
                     results = mergeResults(dbResults, cachedResults)
                 except:
                     log.exception()
                     results = dbResults
                 if not results:
                     continue
                 ...
             return seriesList
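The mergeResults step above is what combines on-disk whisper datapoints with newer points still sitting in the carbon-cache. A simplified sketch of that merge; the function below is my illustration of the idea, not graphite-web's actual mergeResults:

```python
def merge_results(db_results, cached_results):
    """Fill gaps in the on-disk datapoints with values from the carbon-cache.

    db_results:     list of (timestamp, value_or_None) read from the whisper file
    cached_results: dict {timestamp: value} returned by the cache query
    """
    merged = []
    for ts, value in db_results:
        if value is None and ts in cached_results:
            value = cached_results[ts]  # datapoint not yet flushed to disk
        merged.append((ts, value))
    return merged

db = [(60, 1.0), (120, None), (180, None)]
cache = {120: 2.0}
print(merge_results(db, cache))  # [(60, 1.0), (120, 2.0), (180, None)]
```

This is why querying the wrong node's caches matters: if CARBONLINK_HOSTS pointed at remote instances, the recent, unflushed points would simply be missing from the merge.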
  6. Useful links:
     http://graphite.readthedocs.org
     http://www.aosabook.org/en/graphite.html
     http://rcrowley.org/articles/federated-graphite.html
     http://bitprophet.org/blog/2013/03/07/graphite/
     http://boopathi.in/blog/the-graphite-story-directi
     https://answers.launchpad.net/graphite/+question/228472
