Gnocchi Numbers
(more) Benchmarking 2.1.x
Test Configuration
- 4 physical hosts
- CentOS 7.2.1511
- 24 physical cores (hyperthreaded), 256 GB memory
- 25 × 1 TB disks, 10K RPM
- 1Gb network
- PostgreSQL 9.2.15 (single node)
- Shared with Ceph and compute service
- Default configuration everywhere, except 300 connections instead of the default 100
- Ceph 10.2.2 (4 nodes: 1 monitor, 3 OSD)
- 30 OSDs (1 TB disk each), journals on a shared SSD, 2 replicas, 2048 placement groups
- OSD nodes shared with (idle) compute service
- Gnocchi Master (~ June 3rd, 2016)
Host Configuration
- Host1
- OpenStack Controller Node (Ceilometer, Heat, Nova-stuff, Neutron, Cinder, Glance, Horizon)
- Ceph monitor service
- Gnocchi API
- Host2
- OpenStack Compute Node
- Ceph OSD node (10 OSDs)
- Host3
- Ceph OSD node (10 OSDs)
- Host4
- OpenStack Compute Node
- Ceph OSD node (10 OSDs)
- PostgreSQL
Testing Methodology
- Start 3 metricd services - 24 workers each
- POST to 1000 generic resources spread across 20 client workers, 20 metrics each (sketched below)
- POST every 10 minutes
- 1 minute granularity, 10 points/metric/request
- 20 000 metrics, medium archive policy
- 1 min for a day, 1 hr for a week, 1 day for a year, 8 aggregates each
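As a rough sketch of this write workload, the loop below POSTs a 10-point, 1-minute-granularity batch to a single metric through Gnocchi's v1 REST measures endpoint; the API URL, token and metric IDs are placeholders, not the actual test harness.

# Sketch of one client worker's POST loop (placeholder endpoint/token/IDs,
# not the benchmark harness itself).
import datetime
import requests

GNOCCHI_API = "http://host1:8041"              # hypothetical Gnocchi endpoint
HEADERS = {"X-Auth-Token": "<token>"}

def post_measures(metric_id, start, points=10, granularity=60):
    """POST `points` measures at 1-minute granularity to one metric."""
    measures = [
        {"timestamp": (start + datetime.timedelta(seconds=i * granularity)).isoformat(),
         "value": float(i)}
        for i in range(points)
    ]
    resp = requests.post("%s/v1/metric/%s/measures" % (GNOCCHI_API, metric_id),
                         json=measures, headers=HEADERS)
    resp.raise_for_status()

# Each of the 20 client workers would call this for its share of the
# 20 000 metrics, once every 10 minutes.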
Batch1 metricd details
- POST time (50 posts) - avg=10.8s (-65.5%), stdev=0.79
- Injection time - ~ 144 seconds
- Stats
- Per metric injection - avg=0.462s, min=0.235s, max=1.693s, stdev=0.174
- Average IO time - ~66% of _add_measures()
- Overhead - ~10.8% (~9.89% minus all IO once metric locked)
- Comparison to 20OSD w/ shared journal
- POST - 65.5% quicker
- Injection time - 27% quicker
Batch2 metricd details
- POST time (50 posts) - avg=30.6s, stdev=2.72
- Injection time - ~ 400 seconds
- Stats
- Per metric injection - avg=1.316s, min=0.286s, max=5.758s, stdev=0.844
- Average IO time - ~76.0% of _add_measures()
- Overhead - ~9.23% (~6.78% minus all IO once metric locked)
- Comparison to 20OSD w/ shared journal
- POST - 70% quicker
- Injection time - 28.4% quicker
Batch3 metricd details
- POST time (50 posts) - avg=30.2s, stdev=2.87
- Injection time - ~ 408 seconds
- Stats
- Per metric injection - avg=1.33s, min=0.285s, max=5.647s, stdev=0.824
- Average IO time - ~74.9% of _add_measures()
- Overhead - ~9.58% (~6.95% minus all IO once metric locked)
- Comparison to 20OSD w/ shared journal
- POST - 65.4% quicker
- Injection time - 26% quicker
Metric Processing Rate
Job Distribution
Gnocchi Contention
- Estimated 37% wasted on no-op*
- Estimated 13% wasted on no-op*
- * Based on the assumption that each contention wastes 1.6 ms (see the arithmetic sketch below)
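As a rough illustration of where such a figure can come from under the stated 1.6 ms assumption, the sketch below multiplies a hypothetical no-op contention count by 1.6 ms and divides by an assumed processing window; both inputs are illustrative placeholders, not measured values.

# Back-of-envelope behind the "wasted on no-op" estimate: no-op contentions
# times the assumed 1.6 ms each, divided by the processing window.
WASTE_PER_CONTENTION = 0.0016            # seconds (the slide's stated assumption)

def wasted_fraction(noop_contentions, window_seconds):
    return noop_contentions * WASTE_PER_CONTENTION / window_seconds

# e.g. ~92 500 no-op contentions over a ~400 s batch would give ~37%
print("{:.0%}".format(wasted_fraction(92500, 400)))    # -> 37%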
Ceph Profile
- Read speed
- avg = 6727 kB/s (+32%)
- max = 28293 kB/s (+47%)
- stdev = 4185 (+69%)
- Write speed
- avg = 1565 kB/s (+36%)
- max = 8655 kB/s (+94%)
- stdev = 1262 (+65%)
- Operations
- avg = 8349 op/s (+36%)
- max = 31791 op/s (+62%)
- stdev = 5289 (+77%)
Differences relative to the 20 OSD, non-SSD journal deployment
Tuning Ceph
Hardware Configurations
- Ceph 10.2.2
- 30 OSDs (1 TB disk each), journals on a shared SSD, 2 replicas, 2048 placement groups
- OSD nodes shared with (idle) compute service
- Network File System
- 8 × 1 TB 10K RPM HDDs, RAID0
- Separate host from metricd services
Ceph Hardware - Processing Rate
Ceph Test Configurations
‘Default’ (30OSD+JOURNAL SSD)
[osd]
osd journal size = 10000
osd pool default size = 3
osd pool default min size = 2
osd crush chooseleaf type = 1
8 Threads
[osd]
osd journal size = 10000
osd pool default size = 3
osd pool default min size = 2
osd crush chooseleaf type = 1
osd op threads = 8
filestore op threads = 8
journal max write entries = 50000
journal queue max ops = 50000
24 Threads
[osd]
osd journal size = 10000
osd pool default size = 3
osd pool default min size = 2
osd crush chooseleaf type = 1
osd op threads = 24
filestore op threads = 24
journal max write entries = 50000
journal queue max ops = 50000
36 Threads
[osd]
osd journal size = 10000
osd pool default size = 3
osd pool default min size = 2
osd crush chooseleaf type = 1
osd op threads = 36
filestore op threads = 36
journal max write entries = 50000
journal queue max ops = 50000
36 + fs queue
[osd]
osd journal size = 10000
osd pool default size = 3
osd pool default min size = 2
osd crush chooseleaf type = 1
osd op threads = 36
filestore op threads = 36
filestore queue max ops = 50000
filestore queue committing max ops = 50000
journal max write entries = 50000
journal queue max ops = 50000
Ceph Configurations - Metrics processed per 5s
Ceph Configurations - Processing Rate
Tuned vs Untuned
- Comparing Batch3 (36 + fs queue) vs Batch3 (default)
- POST time (50 posts) - avg=21.1s (-30.1%), stdev=0.904 (-68.5%)
- Injection time - ~ 199 seconds (-51.2%)
- Stats
- Per metric injection
- avg=0.596s(-55.2%)
- stdev=0.477(-42.1%)
- min=0.286s(+0%)
- max=9.12s (+38%)
- Overhead - ~15.2% (~14.1% minus all IO once metric locked)
- Consistent write performance between batches!
Ceph Profile
- Read speed
- avg = 10978 kB/s (+63%)
- max = 27104 kB/s (-4%)
- stdev = 5230 (+25%)
- Write speed
- avg = 2521 kB/s (+61%)
- max = 5304 kB/s (-39%)
- stdev = 994(-21%)
- Operations
- avg = 13534 op/s (+62%)
- max = 30398 op/s (-4%)
- stdev = 5739(+9%)
Differences relative to the default 30 OSD + SSD journal deployment using the standard Ceph configuration
Gnocchi Design Tuning
Optimisation Opportunities
- Gnocchi has a lot of IO
- By default, over 25 reads and 25 writes for every single metric
- Serialising and deserialising each time
- Degradation as the number of points grows (up to the object split size)
- Needs to read the full object holding the related points, update it, and write the full object back for each aggregate, even when only one point out of thousands changes (see the sketch below)
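Roughly, the read-modify-write cycle behind those numbers looks like the sketch below; `storage` and the helper name are stand-ins for illustration, not Gnocchi's actual driver code.

# Illustration of the per-aggregate read-modify-write cycle described above.
# `storage` stands in for the object-store client (a rados-like get/put).
import msgpack

def update_aggregate(storage, object_name, new_points):
    # 1. read the whole serialised object, even to touch a single point
    raw = storage.get(object_name)
    series = msgpack.unpackb(raw) if raw else {"values": {}}

    # 2. deserialise and merge a handful of new points into thousands of old ones
    series["values"].update(new_points)            # {timestamp: float}

    # 3. re-serialise and write the whole object back
    storage.put(object_name, msgpack.packb(series))

# With the medium policy (3 granularities x 8 aggregates) this cycle runs for
# roughly 24 aggregate objects per metric, which is where the 25+ reads and
# writes per metric come from.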
Current Serialisation
Simpler serialisation merged into master and backported to 2.1
Effects of IO
Serialisation Format
Existing
{'values': {<timestamp>: float,
            <timestamp>: float,
            ...
            <timestamp>: float}}
- ~18 B/point, or ~10 B/point compressed
- Not appendable
- Msgpack serialisation, super fast
Proposed
delimiter+float+delimiter+float+...+delimiter+float
- ~9 B/point (much less when compressed)
- Appendable
- Delimiter can be used to describe the subsequent bytes
- Timestamp computed from the byte offset
- e.g. bytes 9 to 17 hold the point x seconds from the start
- Zero padding required if the first point is not at the start of the split
- Handles compression much better (see the sketch below)
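A small sketch contrasting the two layouts; the exact proposed encoding here (one flag byte plus an 8-byte float per record, timestamps recovered from the byte offset) is one interpretation of the slide, not the merged implementation. Appending a point is then just concatenating another 9-byte record.

# Existing: msgpack dict of {timestamp: value}; ~18 B/point, not appendable.
import struct
import msgpack

def pack_existing(points):                      # points: {epoch_seconds: float}
    return msgpack.packb({"values": points})

# Proposed: fixed-width records, so record i holds the value at
# start + i * granularity and no timestamp is stored at all.
POINT = struct.Struct("<Bd")                    # 1 delimiter byte + 8-byte float = 9 B

def pack_proposed(start, granularity, points):
    out = bytearray()
    last = -1
    for ts in sorted(points):
        idx = (ts - start) // granularity
        out += POINT.pack(0, 0.0) * (idx - last - 1)   # zero-pad any gaps
        out += POINT.pack(1, points[ts])
        last = idx
    return bytes(out)

def unpack_proposed(blob, start, granularity):
    return {start + i * granularity: value
            for i, (flag, value) in enumerate(POINT.iter_unpack(blob))
            if flag}                                   # flag 0 marks padding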
Comparing Serialisation Formats
The existing format's deserialised points still need to be sorted; the comparison is fairer when that sorting cost is factored in (see the read helper below).
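For reference, the extra step being factored in is just a sort of the unpacked points (illustrative helper, not Gnocchi code):

# Read path for the existing format: as noted above, the unpacked points must
# still be sorted by timestamp before the formats can be compared fairly.
import msgpack

def read_existing(blob):
    return sorted(msgpack.unpackb(blob)["values"].items())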
Looking to 3.x
- Testing larger datasets (a few thousand points/metric)
- Benchmarking new proposed format
- Study effects of alternative storage solutions
- Try adding support for intermediary in-memory storage
