Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Project Frankenstein
A multi-tenant, horizontally scalable Prometheus as a Service
Tom Wilkie (& Julius Volz)
Weaveworks, ...
“the best way to visualise, manage & monitor your cloud
native application”
Design
why not just run my own Prometheus?
• the as-a-service bit provides authentication and
access control
• virtually infinite ...
requirements:
1. API compatible with Prometheus
2. easy to operate and manage
3. tens of thousands of users, tens of
milli...
Aim: build proof of concept as
quickly as possible
16/06 started design doc
22/06 circulated on list
22/06 initial commit
...
Retriever
scraping
your jobs
Your DC
Weave Cloud
Frontend,
Authenticator
Distributor
Ingester
Distributor…
IngesterIngeste...
Retriever
/bin/prometheus -retrieval-only -storage.remote.generic-url=...
Does scraping and relabelling.
Is a vanilla Prom...
• Uses consistent hashing to assign
timeseries to Ingesters
• Input to hash is (user ID, metric
name)
• Tokens stored in C...
• Heavily modified MemorySeriesStorage
• Use same chunk format as Prometheus
• Keeps everything in memory (for up to an hou...
External inverted index maintained in DynamoDB, chunks stored in S3
Item in DynamoDB looks like:
{
hash key: “{user ID}:{m...
Evaluation
The Good
• It works! And in ~2
months.
• Seems pretty scalable,
handling two clusters right
now
• Query performance better...
Demo
Lots left to do…
Features:
• Recording rules
• Alerting & Alertmanager
Reliability:
• Replication between
ingesters, commi...
Questions?
Try it out!
Email help@weave.works for
instructions and to get on white list
https://github.com/tomwilkie/prome...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service
Upcoming SlideShare
Loading in …5
×

Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service

2,910 views

Published on

In this talk we present a prototype solution for multitenant, scale-out Prometheus. Don't worry, its open source!

Our solution turns a lot of the Prometheus architectural assumptions on its head, by marrying a scale-out PromQL query engine with a storage layer based on DynamoDB & S3. We have disaggregated the Prometheus binary into a microservices-style architecture, with separate services for distribution, ingest and storage. By designing all these services as fungible replicas, this solution can be scaled out with ease and failure of any individual replica can be dealt with gracefully.

This multitenant, scale-out Prometheus service forms a core component of the Weave Cloud, a hosted management, monitoring and visualisation platform for microservice & containerised applications. This platform is built from 100% open source components, and we're working with the Prometheus community to contribute all the changes we've made back to Prometheus.

Published in: Technology
  • Be the first to comment

Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service

  1. 1. Project Frankenstein A multi-tenant, horizontally scalable Prometheus as a Service Tom Wilkie (& Julius Volz) Weaveworks, August 2016
  2. 2. “the best way to visualise, manage & monitor your cloud native application”
  3. 3. Design
  4. 4. why not just run my own Prometheus? • the as-a-service bit provides authentication and access control • virtually infinite retention; all the state is managed for you, by us • provide a different story around durability, HA and scalability • (eventually) better query performance, especially for long queries
  5. 5. requirements: 1. API compatible with Prometheus 2. easy to operate and manage 3. tens of thousands of users, tens of millions samples/s 4. cost effective to run 5. reuse as much of Prometheus as possible … so we can sell it
  6. 6. Aim: build proof of concept as quickly as possible 16/06 started design doc 22/06 circulated on list 22/06 initial commit 26/07 launch jobs 25/08 give talk! http://goo.gl/prdUYV
  7. 7. Retriever scraping your jobs Your DC Weave Cloud Frontend, Authenticator Distributor Ingester Distributor… IngesterIngester DynamoDB S3
  8. 8. Retriever /bin/prometheus -retrieval-only -storage.remote.generic-url=... Does scraping and relabelling. Is a vanilla Prometheus plus: • Brian Brazil’s generic write PR (#1487) • Some modification to prevent local storage + indexing
  9. 9. • Uses consistent hashing to assign timeseries to Ingesters • Input to hash is (user ID, metric name) • Tokens stored in Consul • Also currently handles queries Distributor http://goo.gl/U9u1U2
  10. 10. • Heavily modified MemorySeriesStorage • Use same chunk format as Prometheus • Keeps everything in memory (for up to an hour) • Also stores in memory inverted index for queries • Flushes chunks to S3 and indexes them in DynamoDB Ingester
  11. 11. External inverted index maintained in DynamoDB, chunks stored in S3 Item in DynamoDB looks like: { hash key: “{user ID}:{metric name}:{hour}”, range key: “{label name}:{label value}:{chunk ID}”, metric: ..., from, through: ..., ID: ..., } DynamoDB S3
  12. 12. Evaluation
  13. 13. The Good • It works! And in ~2 months. • Seems pretty scalable, handling two clusters right now • Query performance better than expected The Ugly: the code… The Bad • Hashing scheme means can’t do queries that don’t involve metric names. • Possible to hotspot an ingester
  14. 14. Demo
  15. 15. Lots left to do… Features: • Recording rules • Alerting & Alertmanager Reliability: • Replication between ingesters, commit log etc • Ingestor lifecycle • Separate query service? Performance: • Query parallelisation • Background chunk coalescing Code: • Code cleanup • Upstream appropriate changes
  16. 16. Questions? Try it out! Email help@weave.works for instructions and to get on white list https://github.com/tomwilkie/prometheus

×