Successfully reported this slideshow.

Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service

5

Share

1 of 21
1 of 21

Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service

5

Share

Download to read offline

In this talk we present a prototype solution for multitenant, scale-out Prometheus. Don't worry, its open source!

Our solution turns a lot of the Prometheus architectural assumptions on its head, by marrying a scale-out PromQL query engine with a storage layer based on DynamoDB & S3. We have disaggregated the Prometheus binary into a microservices-style architecture, with separate services for distribution, ingest and storage. By designing all these services as fungible replicas, this solution can be scaled out with ease and failure of any individual replica can be dealt with gracefully.

This multitenant, scale-out Prometheus service forms a core component of the Weave Cloud, a hosted management, monitoring and visualisation platform for microservice & containerised applications. This platform is built from 100% open source components, and we're working with the Prometheus community to contribute all the changes we've made back to Prometheus.

In this talk we present a prototype solution for multitenant, scale-out Prometheus. Don't worry, its open source!

Our solution turns a lot of the Prometheus architectural assumptions on its head, by marrying a scale-out PromQL query engine with a storage layer based on DynamoDB & S3. We have disaggregated the Prometheus binary into a microservices-style architecture, with separate services for distribution, ingest and storage. By designing all these services as fungible replicas, this solution can be scaled out with ease and failure of any individual replica can be dealt with gracefully.

This multitenant, scale-out Prometheus service forms a core component of the Weave Cloud, a hosted management, monitoring and visualisation platform for microservice & containerised applications. This platform is built from 100% open source components, and we're working with the Prometheus community to contribute all the changes we've made back to Prometheus.

More Related Content

Related Books

Free with a 14 day trial from Scribd

See all

Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service

  1. 1. Project Frankenstein A multi-tenant, horizontally scalable Prometheus as a Service Tom Wilkie (& Julius Volz) Weaveworks, August 2016
  2. 2. “the best way to visualise, manage & monitor your cloud native application”
  3. 3. Design
  4. 4. why not just run my own Prometheus? • the as-a-service bit provides authentication and access control • virtually infinite retention; all the state is managed for you, by us • provide a different story around durability, HA and scalability • (eventually) better query performance, especially for long queries
  5. 5. requirements: 1. API compatible with Prometheus 2. easy to operate and manage 3. tens of thousands of users, tens of millions samples/s 4. cost effective to run 5. reuse as much of Prometheus as possible … so we can sell it
  6. 6. Aim: build proof of concept as quickly as possible 16/06 started design doc 22/06 circulated on list 22/06 initial commit 26/07 launch jobs 25/08 give talk! http://goo.gl/prdUYV
  7. 7. Retriever scraping your jobs Your DC Weave Cloud Frontend, Authenticator Distributor Ingester Distributor… IngesterIngester DynamoDB S3
  8. 8. Retriever /bin/prometheus -retrieval-only -storage.remote.generic-url=... Does scraping and relabelling. Is a vanilla Prometheus plus: • Brian Brazil’s generic write PR (#1487) • Some modification to prevent local storage + indexing
  9. 9. • Uses consistent hashing to assign timeseries to Ingesters • Input to hash is (user ID, metric name) • Tokens stored in Consul • Also currently handles queries Distributor http://goo.gl/U9u1U2
  10. 10. • Heavily modified MemorySeriesStorage • Use same chunk format as Prometheus • Keeps everything in memory (for up to an hour) • Also stores in memory inverted index for queries • Flushes chunks to S3 and indexes them in DynamoDB Ingester
  11. 11. External inverted index maintained in DynamoDB, chunks stored in S3 Item in DynamoDB looks like: { hash key: “{user ID}:{metric name}:{hour}”, range key: “{label name}:{label value}:{chunk ID}”, metric: ..., from, through: ..., ID: ..., } DynamoDB S3
  12. 12. Evaluation
  13. 13. The Good • It works! And in ~2 months. • Seems pretty scalable, handling two clusters right now • Query performance better than expected The Ugly: the code… The Bad • Hashing scheme means can’t do queries that don’t involve metric names. • Possible to hotspot an ingester
  14. 14. Demo
  15. 15. Lots left to do… Features: • Recording rules • Alerting & Alertmanager Reliability: • Replication between ingesters, commit log etc • Ingestor lifecycle • Separate query service? Performance: • Query parallelisation • Background chunk coalescing Code: • Code cleanup • Upstream appropriate changes
  16. 16. Questions? Try it out! Email help@weave.works for instructions and to get on white list https://github.com/tomwilkie/prometheus

×