20171027 Monitoring Study Group (モニタリング勉強会)

https://mackerel-ug.connpass.com/event/68478/
We use Prometheus to monitor the LINE family of apps.
This talk covers what we learned from running it in production.

Operating Prometheus itself
scaling
Prometheus v2.0

Published in: Software

  1. Operating Prometheus
     Monitoring Study Group (モニタリング勉強会)
     2017/10/27 @kfdm
  2. Self Introduction
     • Paul Traylor
     • LINE Fukuoka development office (開発室)
     • Currently responsible for updating the monitoring environment at LINE Fukuoka
     • https://github.com/line/promgen
     • https://promcon.io/2017-munich/talks/prometheus-as-a-internal-service/
  3. Operating Prometheus at LINE Fukuoka
     • 4 HA Pairs
     • ~2000 targets per machine
     • ~800k samples per machine
     • ~3.5 million samples
     • ~7000 exporters
     https://github.com/line/promgen
  4. Scaling Prometheus ‒ HA
     • Run multiple Prometheus instances with the same targets
     • Alerts are de-duplicated by Alertmanager
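The de-duplication on this slide can be pictured with a small sketch. It assumes that an alert's identity is its label set, so two replicas scraping the same targets produce alerts that collapse into one. The `fingerprint` and `deduplicate` helpers and the `source` field are hypothetical names for illustration; this is not Alertmanager's actual code.

```python
import hashlib


def fingerprint(labels: dict) -> str:
    """Stable identity for an alert, derived only from its sorted label set."""
    canonical = "\x00".join(f"{key}={value}" for key, value in sorted(labels.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(alerts: list) -> list:
    """Keep the first alert seen for each distinct label set."""
    seen = {}
    for alert in alerts:
        seen.setdefault(fingerprint(alert["labels"]), alert)
    return list(seen.values())


# Both replicas of an HA pair fire the same alert with identical labels.
# The "source" field is illustrative only and is not part of the identity.
from_replica_a = {"labels": {"alertname": "HighErrorRate", "job": "myjob"}, "source": "prometheus-a"}
from_replica_b = {"labels": {"job": "myjob", "alertname": "HighErrorRate"}, "source": "prometheus-b"}
unrelated = {"labels": {"alertname": "DailyTest"}, "source": "prometheus-a"}

unique = deduplicate([from_replica_a, from_replica_b, unrelated])
print(len(unique))
```

The sketch only works when both replicas attach identical labels. If each replica adds its own distinguishing label, the fingerprints differ and the alerts no longer collapse.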
  5. Scaling Prometheus ‒ Shard
     • Split targets across multiple servers
     • Alertmanager de-duplicates alerts
     • Proxy or remote read
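One way to split targets across servers is to hash each target address and take the result modulo the number of shards. This is similar in spirit to Prometheus's `hashmod` relabel action, but it is not guaranteed to assign the same shard numbers. `shard_for`, `split_targets`, and the target addresses are hypothetical.

```python
import hashlib


def shard_for(target: str, shard_count: int) -> int:
    """Map a target address to a shard index in a stable, deterministic way."""
    digest = hashlib.md5(target.encode("utf-8")).digest()
    return int.from_bytes(digest[-8:], "big") % shard_count


def split_targets(targets: list, shard_count: int) -> dict:
    """Assign every target to exactly one shard."""
    shards = {index: [] for index in range(shard_count)}
    for target in targets:
        shards[shard_for(target, shard_count)].append(target)
    return shards


# Hypothetical targets; each Prometheus server would scrape one shard.
targets = [f"app-{n:03d}.example.com:9100" for n in range(20)]
shards = split_targets(targets, shard_count=4)
for index, members in shards.items():
    print(index, len(members))
```

Because the assignment depends only on the target address and the shard count, every server can compute it independently. Changing the shard count moves most targets to a different shard.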
  6. Prometheus 1.8 ‒ Storage Format
     https://promcon.io/2016-berlin/talks/the-prometheus-time-series-database/
     http://labs.gree.jp/blog/2017/10/16614/
     • One series per file
     • Rewrites may have to touch millions of files
     • Queries may also touch millions of files
     • No easy way to back up
  7. Prometheus 2.0 ‒ New Storage Format
     https://promcon.io/2017-munich/slides/storing-16-bytes-at-scale.pdf
     https://fabxc.org/blog/2017-04-10-writing-a-tsdb/
     • Chunks stored in buckets by time
     • Chunks past the retention setting are just deleted
     • Easier to back up
     • Easier to compress
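Because data is grouped into time-bounded blocks, retention amounts to dropping whole blocks. The sketch below assumes each block is described by the `ulid`, `minTime`, and `maxTime` fields (milliseconds) found in a block's `meta.json`. The rule "expired when `maxTime` is older than the cutoff" is a simplifying assumption rather than the exact rule Prometheus applies, and the block names are made up.

```python
HOUR_MS = 60 * 60 * 1000


def blocks_past_retention(blocks: list, now_ms: int, retention_ms: int) -> list:
    """Return the ids of blocks whose newest sample is older than the cutoff."""
    cutoff = now_ms - retention_ms
    return [block["ulid"] for block in blocks if block["maxTime"] < cutoff]


# Hypothetical block metadata: three consecutive two-hour blocks.
blocks = [
    {"ulid": "block-a", "minTime": 0 * HOUR_MS, "maxTime": 2 * HOUR_MS},
    {"ulid": "block-b", "minTime": 2 * HOUR_MS, "maxTime": 4 * HOUR_MS},
    {"ulid": "block-c", "minTime": 4 * HOUR_MS, "maxTime": 6 * HOUR_MS},
]

# With three hours of retention at t = 6h, only the oldest block is fully expired.
expired = blocks_past_retention(blocks, now_ms=6 * HOUR_MS, retention_ms=3 * HOUR_MS)
print(expired)
```

Dropping a block means removing one directory, in contrast with the 1.8 layout, where expiring old data meant rewriting individual series files.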
  8. Prometheus 2.0 ‒ Backups
     ├── 01BX40G8TA6T1MNSS8JJE7ENPY/
     │   ├── chunks/
     │   ├── index
     │   ├── meta.json
     │   └── tombstones
     ├── 01BX5Y9SSE10VBZK4CMZ86WDR6/
     │   ├── chunks/
     │   ├── index
     │   ├── meta.json
     │   └── tombstones
     ├── lock
     └── wal/
         ├── 000760
         └── 000761
     • https://github.com/Gouthamve/agni
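With this layout, a simple backup can treat each block directory as a unit. The sketch below assumes a block is any sub-directory containing a `meta.json`, and it skips `lock` and `wal/`. `find_blocks`, `backup_blocks`, and `make_demo_layout` are hypothetical helpers. Copying from a live server without coordination may catch a block while it is being written, so this is an illustration and not a recommended procedure.

```python
import shutil
import tempfile
from pathlib import Path


def find_blocks(data_dir: Path) -> list:
    """Return block directories: sub-directories that contain a meta.json."""
    return sorted(
        path for path in data_dir.iterdir()
        if path.is_dir() and (path / "meta.json").is_file()
    )


def backup_blocks(data_dir: Path, backup_dir: Path) -> list:
    """Copy blocks that are not in backup_dir yet; return the names copied."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for block in find_blocks(data_dir):
        destination = backup_dir / block.name
        if not destination.exists():
            shutil.copytree(block, destination)
            copied.append(block.name)
    return copied


def make_demo_layout(data_dir: Path) -> None:
    """Create an empty stand-in for the directory tree shown on the slide."""
    for block in ("01BX40G8TA6T1MNSS8JJE7ENPY", "01BX5Y9SSE10VBZK4CMZ86WDR6"):
        (data_dir / block / "chunks").mkdir(parents=True)
        for name in ("index", "meta.json", "tombstones"):
            (data_dir / block / name).write_text("")
    (data_dir / "lock").write_text("")
    (data_dir / "wal").mkdir()
    (data_dir / "wal" / "000760").write_text("")


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    make_demo_layout(data)
    print(backup_blocks(data, Path(tmp) / "backup"))
```

Running `backup_blocks` a second time copies nothing, because existing block directories in the backup are skipped.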
  9. Prometheus 2.0 ‒ Flag Changes
     • Most flags move from single dash to double dash
     • Many storage settings move to tsdb settings
     • -config.file -> --config.file
     • -storage.local.path -> --storage.tsdb.path
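The two changes on this slide can be applied mechanically to an existing command line. The sketch below knows only the rename listed above (`storage.local.path` to `storage.tsdb.path`); any other 1.x flag just gets the double dash, which may be wrong for flags that were renamed or removed in 2.0. `RENAMED`, `convert_flag`, and `convert_args` are hypothetical names.

```python
# Only the rename shown on the slide; extend after checking the 2.0 documentation.
RENAMED = {"storage.local.path": "storage.tsdb.path"}


def convert_flag(arg: str) -> str:
    """Rewrite one 1.x style argument; leave values and 2.0 style flags alone."""
    if len(arg) < 2 or arg.startswith("--") or not arg.startswith("-"):
        return arg
    name, separator, value = arg[1:].partition("=")
    name = RENAMED.get(name, name)
    return f"--{name}{separator}{value}"


def convert_args(args: list) -> list:
    """Rewrite a whole argument list."""
    return [convert_flag(arg) for arg in args]


old_args = ["-config.file=/etc/prometheus/prometheus.yml", "-storage.local.path", "/data"]
print(convert_args(old_args))
```

The helper treats anything that starts with a single dash as a flag, so a value such as a negative number would be rewritten as well. Review the converted command line before using it.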
  10. Prometheus 2.0 ‒ Rule Format Changes
      https://www.robustperception.io/converting-rules-to-the-prometheus-2-0-format/

      groups:
      - name: alert.rules
        rules:
        - alert: HighErrorRate
          expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
          for: 10m
          annotations:
            summary: High request latency
        - alert: DailyTest
          expr: vector(1)
          for: 1m
          annotations:
            summary: Daily alert test

      • ./promtool update rules /path/to/rules
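If rules are generated by a script instead of converted with `promtool`, the 2.0 group structure can be produced directly. The sketch below renders one group of alerting rules using only the standard library. It quotes every string with `json.dumps`, which yields valid YAML double-quoted scalars, so the output is quoted more heavily than the slide but should parse to the same structure. `render_group` is a hypothetical helper and handles only the fields shown on the slide.

```python
import json


def render_group(name: str, rules: list) -> str:
    """Render one Prometheus 2.0 rule group containing alerting rules as YAML."""
    lines = ["groups:", f"- name: {json.dumps(name)}", "  rules:"]
    for rule in rules:
        lines.append(f"  - alert: {json.dumps(rule['alert'])}")
        lines.append(f"    expr: {json.dumps(rule['expr'])}")
        lines.append(f"    for: {json.dumps(rule['for'])}")
        annotations = rule.get("annotations", {})
        if annotations:
            lines.append("    annotations:")
            for key, value in sorted(annotations.items()):
                # Keys are assumed to be simple identifiers that need no quoting.
                lines.append(f"      {key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


rules = [
    {
        "alert": "HighErrorRate",
        "expr": 'job:request_latency_seconds:mean5m{job="myjob"} > 0.5',
        "for": "10m",
        "annotations": {"summary": "High request latency"},
    },
    {
        "alert": "DailyTest",
        "expr": "vector(1)",
        "for": "1m",
        "annotations": {"summary": "Daily alert test"},
    },
]

rendered = render_group("alert.rules", rules)
print(rendered)
```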
  11. Prometheus 2.0 ‒ Migration
  12. Prometheus 2.0 ‒ Remote Read
      • Prometheus 1.8 (Read)
      • InfluxDB (Read and Write)
      • Graphite (Write)
      • OpenTSDB (Write)
      • TimescaleDB (Read and Write)
      • https://prometheus.io/docs/operating/integrations/
      • https://github.com/prometheus/prometheus/tree/master/documentation/examples/remote_storage/remote_storage_adapter
  13. Open Metrics
      • https://github.com/RichiH/OpenMetrics
      • https://github.com/RichiH/OpenMetrics/blob/master/CONTRIBUTORS.md
  14. Questions?
