Metrics: where and how
Vsevolod Polyakov
Platform Engineer / DevOps
Graphite tuning story from Kyiv Devops Day 2016
Metrics: where and how
1.
Metrics: where and how
graphite-oriented story
2.
• Vsevolod Polyakov
• Platform Engineer at Grammarly
3.
Graphite
All whisper-based systems
4.
Default graphite architecture
5.
what?
• RRD-like (gram.ly/gfsx)
• so.it.is.my.metric → /so/it/is/my/metric.wsp
• Fixed retention (by name pattern)
• Fixed size (actually no)
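The name-to-file mapping on this slide can be sketched in a few lines of Go. This is only an illustration of the rule shown above (dots become directory separators, `.wsp` is appended); the function name `metricPath` and the `root` parameter are assumptions made for the example, not part of Graphite.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// metricPath maps a dotted metric name to the whisper file that stores it.
// root is the storage directory; "/" mirrors the example on the slide.
// (Hypothetical helper, written for illustration only.)
func metricPath(root, metric string) string {
	parts := strings.Split(metric, ".")
	return path.Join(root, path.Join(parts...)) + ".wsp"
}

func main() {
	// Prints: /so/it/is/my/metric.wsp
	fmt.Println(metricPath("/", "so.it.is.my.metric"))
}
```

Because every metric is its own file, the number of metrics translates directly into the number of files and directories on disk.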
6.
Retention and size
• 1s:1d → 1 036 828 bytes
• 10s:10d → 1 036 828 bytes
• 1s:365d → 378 432 028 bytes (1 TB ~ 3 000)
• 10s:365d → 37 843 228 bytes (1 TB ~ 30 000)
whisper calc
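The single-archive sizes above can be reproduced with a small calculation. The sketch below assumes the usual whisper on-disk layout: 16 bytes of file metadata, 12 bytes of header per archive, and 12 bytes per datapoint (a 4-byte timestamp plus an 8-byte value). The type and function names are made up for the example.

```go
package main

import "fmt"

// Assumed whisper layout constants (bytes).
const (
	metadataSize    = 16 // aggregation method, max retention, xFilesFactor, archive count
	archiveInfoSize = 12 // offset, seconds per point, number of points
	pointSize       = 12 // 4-byte timestamp + 8-byte float value
)

// archive describes one retention level, e.g. 10s:10d.
type archive struct {
	secondsPerPoint int64
	retention       int64 // in seconds
}

// whisperSize returns the file size implied by a list of archives.
func whisperSize(archives []archive) int64 {
	size := int64(metadataSize)
	for _, a := range archives {
		points := a.retention / a.secondsPerPoint
		size += archiveInfoSize + points*pointSize
	}
	return size
}

func main() {
	const day = 86400
	fmt.Println("1s:1d    ", whisperSize([]archive{{1, day}}))
	fmt.Println("10s:10d  ", whisperSize([]archive{{10, 10 * day}}))
	fmt.Println("1s:365d  ", whisperSize([]archive{{1, 365 * day}}))
	fmt.Println("10s:365d ", whisperSize([]archive{{10, 365 * day}}))
}
```

The file size depends only on the number of points kept, which is why 1s:1d and 10s:10d come out identical.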
7.
Retention and size
• 10s:30d,1m:120d,10m:365d → 4 564 864 bytes
• 240 864 metrics in 1 TB
• aggregation: average, sum, min, max, and last.
• can be assigned per metric
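When points move from a finer archive to a coarser one (for example six 10s points into one 1m point), the aggregation method decides what the coarse point becomes. A minimal sketch of the five methods listed above, with a hypothetical `aggregate` helper and made-up sample values:

```go
package main

import (
	"fmt"
	"math"
)

// aggregate collapses the fine-grained points that fall into one coarse slot.
// The method names follow the slide; the function itself is illustrative.
func aggregate(method string, points []float64) (float64, error) {
	if len(points) == 0 {
		return 0, fmt.Errorf("no points to aggregate")
	}
	switch method {
	case "average", "sum":
		total := 0.0
		for _, p := range points {
			total += p
		}
		if method == "sum" {
			return total, nil
		}
		return total / float64(len(points)), nil
	case "min":
		m := points[0]
		for _, p := range points[1:] {
			m = math.Min(m, p)
		}
		return m, nil
	case "max":
		m := points[0]
		for _, p := range points[1:] {
			m = math.Max(m, p)
		}
		return m, nil
	case "last":
		return points[len(points)-1], nil
	}
	return 0, fmt.Errorf("unknown aggregation method %q", method)
}

func main() {
	// Six 10s samples that roll up into a single 1m point.
	samples := []float64{3, 5, 4, 8, 6, 4}
	for _, m := range []string{"average", "sum", "min", "max", "last"} {
		v, _ := aggregate(m, samples)
		fmt.Printf("%-8s %v\n", m, v)
	}
}
```

Counters usually want sum, gauges average or last, and latency peaks max, which is why the method is worth setting per metric.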
8.
How
• terraform (https://www.terraform.io/)
• docker (https://www.docker.com/)
• ansible (https://www.ansible.com/)
• rocker (https://github.com/grammarly/rocker)
• rocker-compose (https://github.com/grammarly/rocker-compose)
9.
Default graphite architecture
10.
carbon-cache.py
• single-core
• many options in config file
• default link
11.
architecture carbon-cache.py
12.
Start load testing
• m4.xlarge instance (4 CPU, 16 GB RAM, 256 GB disk EBS gp2)
• retentions = 1s:1d
• MAX_CACHE_SIZE, MAX_UPDATES_PER_SECOND, MAX_CR
• defaults
• almost 1.5h to get limit :(
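The slides do not say which load generator was used. As a rough idea of what such a test sends, the sketch below formats datapoints in Graphite's plaintext protocol (`<metric path> <value> <unix timestamp>` followed by a newline). The helper names, the metric prefix and the constant value are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// plaintextLine formats one datapoint in Graphite's plaintext protocol.
func plaintextLine(metric string, value float64, unixTS int64) string {
	return metric + " " +
		strconv.FormatFloat(value, 'f', -1, 64) + " " +
		strconv.FormatInt(unixTS, 10) + "\n"
}

// batch builds n lines for n distinct metrics sharing one timestamp,
// which is roughly what one second of synthetic load looks like.
// The naming scheme prefix.<index> is hypothetical.
func batch(prefix string, n int, unixTS int64) string {
	var b strings.Builder
	for i := 0; i < n; i++ {
		b.WriteString(plaintextLine(prefix+"."+strconv.Itoa(i), 1, unixTS))
	}
	return b.String()
}

func main() {
	// In a real test the batch would be written to a TCP connection
	// to the carbon daemon; here it is only printed.
	fmt.Print(batch("load.test", 3, 1475000000))
}
```

With a 1s:1d retention every distinct metric name produces one whisper update per second, so the number of distinct names per batch is the load being measured.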
13.
carbon-cache.py cache size → 75k reqs
16.
results
• 75 000 reqs max
• 60 000 reqs flagman speed
• IO :(
17.
Try to tune!
• WHISPER_SPARSE_CREATE = true (don’t allocate space on creation): non-linear IO load.
• CACHE_WRITE_STRATEGY = sorted (default)
18.
cache size 1k → 195k reqs
19.
results
• 120 000 reqs flagman speed
• cache flush problem :(
20.
Try to tune!
• CACHE_WRITE_STRATEGY = max: gives a strong flush preference to frequently updated metrics and also reduces random file IO.
21.
from 1k to 150k
22.
results
• 90 000 reqs flagman speed
• cache flush problem :(
23.
Try to tune!
• CACHE_WRITE_STRATEGY = naive: just flush. Better with random IO.
24.
from 45k to 135k
25.
results
• 120 000 reqs flagman speed
• still CPU
26.
sorted max naive
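The three CACHE_WRITE_STRATEGY values differ only in how the next metric to flush is chosen. The Go sketch below models that choice on a toy cache; it is a simplified illustration of the behaviour as described in carbon's configuration comments, not carbon-cache's actual code, and all names in it are invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// cache maps a metric name to the number of datapoints waiting to be written.
type cache map[string]int

// sortedOrder models "sorted": take one snapshot of the cache, order it by
// queue length (largest first) and flush the whole list in that order.
func sortedOrder(c cache) []string {
	names := make([]string, 0, len(c))
	for name := range c {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		if c[names[i]] != c[names[j]] {
			return c[names[i]] > c[names[j]]
		}
		return names[i] < names[j] // stable tie-break for the example
	})
	return names
}

// nextMax models "max": before every single write, pick whichever metric
// currently has the most queued datapoints.
func nextMax(c cache) (string, bool) {
	best, found := "", false
	for name, n := range c {
		if !found || n > c[best] || (n == c[best] && name < best) {
			best, found = name, true
		}
	}
	return best, found
}

// naiveOrder models "naive": flush in whatever order the cache yields.
func naiveOrder(c cache) []string {
	names := make([]string, 0, len(c))
	for name := range c {
		names = append(names, name)
	}
	return names
}

func main() {
	c := cache{"a": 1, "b": 5, "c": 3}
	fmt.Println("sorted snapshot:", sortedOrder(c))

	first, _ := nextMax(c)
	delete(c, first) // "b" is flushed
	c["a"] = 10      // meanwhile "a" receives a burst of updates
	second, _ := nextMax(c)
	fmt.Println("max picks:", first, "then", second)

	fmt.Println("naive flushes", len(naiveOrder(c)), "metrics in arbitrary order")
}
```

In this model "max" reacts to the burst on "a" immediately, while the "sorted" snapshot would still write "c" next; "naive" does no bookkeeping at all, which saves CPU but gives no batching guarantee.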
27.
• Maybe it’s an EBS IO limitation? → 512 GB disk.
• No.
28.
go-carbon
• multi-core single daemon
• written in golang
• not many options to tune :(
link
29.
Start load testing
• m4.xlarge instance (4 CPU, 16 GB RAM, 256 GB disk EBS gp2)
• retentions = 1s:1d
• max-size = 0
• max-updates-per-second = 0
• almost 1h to get limit :(
30.
1k → 130k reqs, ~3k/min
32.
results
• 120 000 reqs flagman speed
• but it’s without sparse.
• try to implement
33.
try to tune!
remaining := whisper.Size() - whisper.MetadataSize()
whisper.file.Seek(int64(remaining-1), 0)
whisper.file.Write([]byte{0})
chunkSize := 16384
zeros := make([]byte, chunkSize)
for remaining > chunkSize {
    // if _, err = whisper.file.Write(zeros); err != nil {
    //     return nil, err
    // }
    remaining -= chunkSize
}
if _, err = whisper.file.Write(zeros[:remaining]); err != nil {
    return nil, err
}
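The patch above skips the zero-filling loop so that the file is created without writing every block. A self-contained way to get a similar effect in Go is `os.File.Truncate`, which sets the file length without writing data; on filesystems that support sparse files the unwritten range takes no disk blocks. This is an alternative sketch, not the change the author made to go-carbon, and the helper names and the temporary path are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// createSparse creates a file of the given length without writing its body.
func createSparse(path string, size int64) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if err := f.Truncate(size); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// sparseDemo creates a sparse file in a temporary directory and returns
// its apparent size as reported by the filesystem.
func sparseDemo(size int64) (int64, error) {
	dir, err := os.MkdirTemp("", "whisper-sparse")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	p := filepath.Join(dir, "metric.wsp")
	if err := createSparse(p, size); err != nil {
		return 0, err
	}
	info, err := os.Stat(p)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	// 1 036 828 bytes is the 1s:1d file size from the retention slide.
	size, err := sparseDemo(1036828)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("apparent size:", size)
}
```

The trade-off is the one noted earlier for WHISPER_SPARSE_CREATE: creation becomes cheap, but blocks are allocated later on first write, so the IO load is less linear.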
34.
180 000 reqs!
36.
try to tune!
• max update operation = 1500
37.
results
• TLDR 210 000 - 240 000 reqs flagman speed
• 31 000 000 cache size!
39.
try to tune!
• max update operation = 0
• input-buffer = 400 000
40.
results
• 270 000 reqs flagman speed
• 10-20 million req cache size!
42.
try to tune!
• vm.dirty_background_ratio=40
• vm.dirty_ratio=60
43.
300 000 reqs
44.
results
• 300 000 reqs flagman speed
• 180k+ reqs ±without cache
45.
Re:Lays
46.
Default graphite architecture
47.
arch forward
48.
arch namedregexp
49.
arch hash
50.
arch hash, replication factor: 2
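The hash architecture with a replication factor can be illustrated with a toy router: hash the metric name, pick a starting backend, and send a copy to the next backends until the replication factor is met. Real relays use a consistent-hash ring rather than the plain modulo shown here, so treat this as a simplified sketch; the function name, the FNV hash and the backend names are all assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickDestinations returns up to `replicas` distinct backends for a metric.
// The same metric name always maps to the same backends, so each whisper
// file is written by a fixed set of carbon daemons.
func pickDestinations(metric string, backends []string, replicas int) []string {
	if len(backends) == 0 || replicas < 1 {
		return nil
	}
	if replicas > len(backends) {
		replicas = len(backends)
	}
	h := fnv.New32a()
	h.Write([]byte(metric))
	start := int(h.Sum32() % uint32(len(backends)))

	out := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		out = append(out, backends[(start+i)%len(backends)])
	}
	return out
}

func main() {
	backends := []string{"cache-a", "cache-b", "cache-c"} // hypothetical names
	for _, m := range []string{"so.it.is.my.metric", "another.metric"} {
		fmt.Println(m, "->", pickDestinations(m, backends, 2))
	}
}
```

With a replication factor of 2 every datapoint is written twice, so the backend tier needs roughly double the write capacity measured in the single-daemon tests.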
51.
carbon-relay.py
• twisted based
• native
52.
Start load testing
• c4.xlarge instance (4 CPU, 7.5 GB RAM)
• ~1 Gb LAN
• default parameters
• hashing
• 10 connections
53.
WTF!
54.
carbon-relay-ng
• golang-based
• web-panel
• live-updates
• aggregators
• spooling
link
55.
<150 000 reqs
56.
carbon-c-relay
• written in C
• advanced cluster management
57.
from 100 000 to 1 600 000 reqs
58.
1 400 000 flagman speed. Or not?
59.
So… go-carbon + carbon-c-relay = ♡
60.
BTW. influx, 130k reqs on cluster
61.
influx
62.
openTSDB single instance + hbase cluster = up to 150k reqs
63.
ALSO
• zipper:
• https://github.com/grobian/carbonserver
• https://github.com/grobian/carbonwriter
• https://github.com/dgryski/carbonzipper
• https://github.com/dgryski/carbonapi
• https://github.com/dgryski/carbonmem
• https://github.com/jssjr/carbonate
64.
plans
• Cyanite, retest
• newTS
• openTSDB tuning
• zipper tuning
65.
feel free to ask
• Vsevolod Polyakov
• ctrlok@gmail.com
• skype: ctrlok1987
• github.com/ctrlok
• twitter.com/ctrlok
• slack: HangOps
• Gitter: dev_ua/devops
• skype: DevOps from Ukraine
Editor's Notes
I have worked at Grammarly for the last 2.5 years; we build cool things and need to write a lot of metrics, blah blah blah
Very simple, everyone knows it. You could call it the industry standard for metrics
Simplicity: fast to read, fast to write
Turned off the logs; Amazon's EBS behaves strangely
If the cache is not being flushed, that is bad
Write speed: files are created in large chunks
Linear read speed
Turned off the logs; Amazon's EBS behaves strangely