Project Frankenstein
A multi-tenant, horizontally scalable Prometheus as a Service
Tom Wilkie (& Julius Volz)
Weaveworks, September 2016
Prometheus and Kubernetes: A Perfect Match
Prometheus and Kubernetes: Deploying
https://www.weave.works/blog/
Prometheus and Kubernetes: Monitoring Your Applications
Prometheus and Kubernetes: Monitoring Your Infrastructure
Design
requirements:
1. API compatible with Prometheus
2. easy to operate and manage
3. tens of thousands of users, tens of
millions samples/s
4. cost effective to run
5. reuse as much of Prometheus as possible
… so we can sell it
Aim: build proof of concept as
quickly as possible
16/06 started design doc
26/07 launch jobs
25/08 give talk at Promcon!
20/09 give talk at Prometheus SF
http://goo.gl/prdUYV
Retriever
scraping
your jobs
Your DC
Weave Cloud
Frontend,
Authenticator
Distributor
Ingester
Distributor…
IngesterIngester
DynamoDB S3
Is a vanilla OSS Prometheus. Does service discovery, scraping and
relabelling.
Configured to send samples to Weave Cloud:
remote_write:
- url: https://cloud.weave.works/api/prom/push
basic_auth:
password: <redacted>
PRs just got merged up stream for generic write path: #1930 #1957 #1987
Retriever
Retriever
scraping
your jobs
Your DC
Weave Cloud
Frontend,
Authenticator
Distributor
Ingester
Distributor…
IngesterIngester
DynamoDB S3
• Uses consistent hashing to assign
timeseries to Ingesters
• Input to hash is (user ID, metric
name)
• Tokens stored in Consul
• Also currently handles queries
Distributor
http://goo.gl/U9u1U2
Retriever
scraping
your jobs
Your DC
Weave Cloud
Frontend,
Authenticator
Distributor
Ingester
Distributor…
IngesterIngester
DynamoDB S3
• Heavily modified MemorySeriesStorage
• Use same chunk format as Prometheus
• Keeps everything in memory (for up to an hour)
• Also stores in memory inverted index for queries
• Flushes chunks to S3 and indexes them in DynamoDB
Ingester
Retriever
scraping
your jobs
Your DC
Weave Cloud
Frontend,
Authenticator
Distributor
Ingester
Distributor…
IngesterIngester
DynamoDB S3
External inverted index maintained in DynamoDB, chunks stored in S3
Item in DynamoDB looks like:
{
hash key: “{user ID}:{metric name}:{hour}”,
range key: “{label name}:{label value}:{chunk ID}”,
metric: ...,
from, through: ...,
ID: ...,
}
DynamoDB S3
Evaluation
The Good
• It works! And in ~2
months.
• Seems pretty scalable,
handling two clusters right
now
• Query performance better
than expected
The Bad
• Hashing scheme means
can’t do queries that don’t
involve metric names.
• Possible to hotspot an
ingester
Why not just run my own Prometheus?
Demo
Lots left to do…
Features:
• Recording rules
• Alerting & Alertmanager
Reliability:
• Replication between
ingesters, commit log etc
• Ingestor lifecycle
• Separate query service?
Performance:
• Query parallelisation
• Background chunk
coalescing
Code:
• Code cleanup
• Upstream appropriate
changes
Questions?
Try it out!
Email help@weave.works for
instructions and to get on white list
https://github.com/weaveworks/frankenstein
We’re hiring!
London BerlinSan Francisco

Project Frankenstein: A multitenant, horizontally scalable Prometheus as a service

  • 1.
    Project Frankenstein A multi-tenant,horizontally scalable Prometheus as a Service Tom Wilkie (& Julius Volz) Weaveworks, September 2016
  • 6.
    Prometheus and Kubernetes:A Perfect Match Prometheus and Kubernetes: Deploying https://www.weave.works/blog/ Prometheus and Kubernetes: Monitoring Your Applications Prometheus and Kubernetes: Monitoring Your Infrastructure
  • 7.
  • 8.
    requirements: 1. API compatiblewith Prometheus 2. easy to operate and manage 3. tens of thousands of users, tens of millions samples/s 4. cost effective to run 5. reuse as much of Prometheus as possible … so we can sell it
  • 9.
    Aim: build proofof concept as quickly as possible 16/06 started design doc 26/07 launch jobs 25/08 give talk at Promcon! 20/09 give talk at Prometheus SF http://goo.gl/prdUYV
  • 10.
    Retriever scraping your jobs Your DC WeaveCloud Frontend, Authenticator Distributor Ingester Distributor… IngesterIngester DynamoDB S3
  • 11.
    Is a vanillaOSS Prometheus. Does service discovery, scraping and relabelling. Configured to send samples to Weave Cloud: remote_write: - url: https://cloud.weave.works/api/prom/push basic_auth: password: <redacted> PRs just got merged up stream for generic write path: #1930 #1957 #1987 Retriever
  • 12.
    Retriever scraping your jobs Your DC WeaveCloud Frontend, Authenticator Distributor Ingester Distributor… IngesterIngester DynamoDB S3
  • 13.
    • Uses consistenthashing to assign timeseries to Ingesters • Input to hash is (user ID, metric name) • Tokens stored in Consul • Also currently handles queries Distributor http://goo.gl/U9u1U2
  • 14.
    Retriever scraping your jobs Your DC WeaveCloud Frontend, Authenticator Distributor Ingester Distributor… IngesterIngester DynamoDB S3
  • 15.
    • Heavily modifiedMemorySeriesStorage • Use same chunk format as Prometheus • Keeps everything in memory (for up to an hour) • Also stores in memory inverted index for queries • Flushes chunks to S3 and indexes them in DynamoDB Ingester
  • 16.
    Retriever scraping your jobs Your DC WeaveCloud Frontend, Authenticator Distributor Ingester Distributor… IngesterIngester DynamoDB S3
  • 17.
    External inverted indexmaintained in DynamoDB, chunks stored in S3 Item in DynamoDB looks like: { hash key: “{user ID}:{metric name}:{hour}”, range key: “{label name}:{label value}:{chunk ID}”, metric: ..., from, through: ..., ID: ..., } DynamoDB S3
  • 18.
  • 19.
    The Good • Itworks! And in ~2 months. • Seems pretty scalable, handling two clusters right now • Query performance better than expected The Bad • Hashing scheme means can’t do queries that don’t involve metric names. • Possible to hotspot an ingester
  • 20.
    Why not justrun my own Prometheus?
  • 21.
  • 22.
    Lots left todo… Features: • Recording rules • Alerting & Alertmanager Reliability: • Replication between ingesters, commit log etc • Ingestor lifecycle • Separate query service? Performance: • Query parallelisation • Background chunk coalescing Code: • Code cleanup • Upstream appropriate changes
  • 23.
    Questions? Try it out! Emailhelp@weave.works for instructions and to get on white list https://github.com/weaveworks/frankenstein
  • 24.