Boyan Ivanov - latency, the #1 metric of your cloud

•

1 like•671 views

No two clouds are the same. Yet the leading clouds all have one thing in common: they deliver on metrics, which matter to the customer. In this session we'll dissect leading clouds, to show why low latency is the thing that makes a cloud stand out.

Latency: the #1 metric of your
cloud
CloudStack User Group, 14 Mar 2019

ops/transactionspersecond
Dependency of any transaction processing system
Latency
Databases
Web Servers
Load balancers
Storage systems
Etc.

core core core core
task
Balancing work to be done with system resources

core core core core
task task
Balancing work to be done with system resources – 50%

core core core core
task task task task
Balancing work to be done with system resources – 100%

Latency
opspersecond
increasingload0->100%
Elasticity of an “ideal system in vacuum”

Latency
opspersecond
Elasticity of real systems

core core core core
task task task task task task
System overload

Latency
opspersecond
Regimes of any transaction processing system

Latency
opspersecond
best service
lowest cost per
delivered resource

Latency
opspersecond
best service
lowest cost per
delivered resource
only pain

Latency
opspersecond
best service
lowest cost per
delivered resource
only pain
systemthroughput

Latency
opspersecond
best service
lowest cost per
delivered resource
only pain
benchmarks

https://www.digitalocean.com/docs/volumes/

https://www.digitalocean.com/docs/volumes/overview/
Metrics of a Cloud Storage Layer

https://www.digitalocean.com/docs/volumes/overview/
Latency
The most
important
metric is
Missing!
Where is my Latency?!

random read/write 50/50, 4k, QD 1 avg. latency
DreamHost/DreamCompute (Ceph) 4.51 ms
DigitalOcean (Ceph) 1.75 ms
OVH (Ceph) 1.53 ms
Comparing some public Clouds

DreamHost/DreamCompute (Ceph) 4.51 ms
DigitalOcean (Ceph) 1.75 ms
OVH (Ceph) 1.53 ms
AWS EBS gp2 10k ($333/mo) 0.29 ms
Comparing some public Clouds
random read/write 50/50, 4k, QD 1 avg. latency
5 to 15 times faster!
For a lot of $$$ thought!

DreamHost/DreamCompute (Ceph) 4.51 ms
DigitalOcean (Ceph) 1.75 ms
OVH (Ceph) 1.53 ms
AWS EBS gp2 10k ($333/mo) 0.29 ms
eApps ($50/mo) 0.41 ms
Togglebox ($20/mo) 0.21 ms
StorPool BCP (Best Current Practice) 0.17 ms
random read/write 50/50, 4k, QD 1 avg. latency
Best-of-breed Clouds

Latency
opspersecond
10kIOPS
2ms
A volume with 2ms latency…

Latency
opspersecond
10kIOPS
0.3ms
…Vs. a volume with 0.3ms latency

4 vCPUs, 4 GB RAM
vdisk with 10k IOPS
"fast" volume with approx 0.3 ms latency (QD 1)
"slow" volume with 2 ms latency (QD 1)
with dm-delay in host
Both volumes are on the same SSD pool
Both volumes measure 10k IOPS flat
Same test VMs in all Cloud providers 2 disks – slow & fast

4 vCPUs, 4 GB RAM
vdisk with 10k IOPS
pgbench --client=8 --jobs=4
--progress=1 --time=5 pgbench4x
database size: 16 GB (4x RAM)
https://wiki.postgresql.org/wiki/Pgbenchtesting
Same test

DEMO
(embedded video on next slide)
/or click here if video does not play/

4 vCPUs, 4 GB RAM
vdisk with 10k IOPS
2ms storage latency = 900 TPS @ 8 ms pgbench
Results – slow volume

4 vCPUs, 4 GB RAM
vdisk with 10k IOPS
2ms storage latency = 900 TPS @ 8 ms pgbench
-> if we ask for 1200 TPS -> pile up
Results – slow volume under pressure – congestion!

4 vCPUs, 4 GB RAM
vdisk with 10k IOPS
0.3ms storage latency = 1600 TPS @ 5 ms pgbench
Results – fast volume

4 vCPUs, 4 GB RAM
vdisk with 10k IOPS
0.3ms storage latency = 1600 TPS @ 5 ms pgbench
-> if we ask for 1200 TPS - no problem
Results – fast volume under pressure – no problem

8 vCPUs, 16 GB RAM, dedicated if offered, database size = 4x RAM
https://wiki.postgresql.org/wiki/Pgbenchtesting
Comparing Clouds – AWS seems to rule them all?

8 vCPUs, 16 GB RAM, dedicated if offered, database size = 4x RAM
https://wiki.postgresql.org/wiki/Pgbenchtesting
Comparing Clouds – who can run a database?

Boyan Ivanov
StorPool Storage
info@storpool.com
www.storpool.com
@storpool
Thank you + Q&A!

Booking.com uses Sereal in many applications. One of the biggest use case though is the events pipeline. It was built to delivery messages (events) from generation point to various processors in near real-time fashion. These days it servers billions of messages per day. One of our processors recently faced scalability issues due to growth of the volume of delivered events. In this talk I would like to share what problem we had, how we addressed it and which new features of Sereal helped us.

Writing Serverless Application in Java with comparison of 3 approaches: AWS S...

Andrew Zakordonets

Automatic Operation Bot for Ceph - You Ji

Ceph Community

Как сделать высоконагруженный сервис, не зная количество нагрузки / Олег Обле...

Ontico

Существует множество архитектур и способов масштабирования систем. Сегодня многие компании мигрируют в облачные сервисы или используют контейнеры. Но действительно ли это так необходимо и нужно ли следовать трендам? В данном докладе мне бы хотелось рассказать об архитектуре, которую я спланировал и внедрил в компании InnoGames. Архитектура, не требующая вмешательства администратора в случае лавинообразного увеличения нагрузки и, что ещё более важно, умеющая редуцироваться в случае отсутствия её для экономии затрат. Вы узнаете об опыте создания сервиса с очень непростыми критериями и поймёте, что не обязательно платить в 3 раза дороже за AWS или любую подобную систему. - Что такое CRM. Зачем нам этот сервис. - Инфраструктура. -- Graphite. Почему он должен быть надежным и быстрым. -- Puppet + gitlab. -- Балансировка нагрузки. -- Наше облако. Зачем нам openstack, когда есть serveradmin!? Как роль сервера определяется несколькими атрибутами в веб-интерфейсе. -- Nagios + аггрегаторы. Другой взгляд на то, как мониторить сервисы через Graphite. -- Мониторинг кластеров. Clusterhc и Grafsy. -- Brassmonkey. Как мы написали своего сисадмина на python. -- Бэкапы. - Архитектура CRM3. - Autoscaling или как проанализировать кучу данных и принять решения.

Nitty Gritty of Adaptive Video Transmuxing in JS

Donato Borrello

Xclone presentation finalBurning Lin

Testing applications with traffic control in containers / Alban Crequy (Kinvolk)

Ontico

Testing applications is important, as shows the rise of continuous integration and automated testing. In this talk, I will focus on one area of testing that is difficult to automate: poor network connectivity. Developers usually work within reliable networking conditions so they might not notice issues that arise in other networking conditions. I will give examples of software that would benefit from test scenarios with varying connectivity. I will explain how traffic control on Linux can help to simulate various network connectivity. Finally, I will run a demo showing how an application running in Kubernetes behaves when changing network parameters.

This talk was given during DockerCon EU 2018. It ain't just a whim - to be able to continue innovating, we’ve moved our good old static production to containers. We needed to be elastic, fast, reliable and production ready at any time - that's why we chose Docker. But like in most enterprises, lots of our apps run on the JVM and most JVMs’ ergonomics assume they “own” the server they are running on. So how do you containerize JVM apps? Should you really increase JVM heap if you have spare memory? What about OS caches? What are the differences between JDK 8, 9 and 10 when it comes to container awareness? Outages because of out of memory errors? Slowness because of long garbage collection and poor environment visibility? Long story short, in this session, we’ll look at the gotchas of running JVM apps in containers and teach you how to avoid costly mistakes. Top 3 things attendees will learn: 1. Key differences between various JVM versions relevant for containerized Java apps. 2. Best practices for running JVM in containers. 3. Avoiding common pitfalls when running containerized JVM applications.

Alternative Infrastucture

Marc Seeger

Altitude SF 2017: Optimizing your hit rate

Fastly

Global deduplication for Ceph - Myoungwon Oh

Ceph Community

Varnish Cache 4.0 / Redpill Linpro breakfast in Oslo

Per Buer

Tuning the Kernel for Varnish Cache

Per Buer

Odoo Performance Limits

Odoo

AWS re:Invent 2014 talk: Scheduling using Apache Mesos in the Cloud

Sharma Podila

How can you reliably schedule tasks in an unreliable, autoscaling Cloud environment? In this presentation, we'll talk about the design of our scheduler built on top of Apache Mesos that serves as the core of our stream-processing platform, Mantis, designed for real-time insights. We'll focus on the following aspects of the scheduler: - Coarse-grained vs. fine-grained resource scheduling - Fault tolerance via a combination of task reconciliation and life cycle event processing - Scheduling optimizations for bin packing, for stream locality to reduce network bandwidth usage, for task placement to achieve auto scaling of the cluster size, etc. This talk will also include detailed information about approaches to scheduling in a distributed, auto-scaling, environment.

Include os @ flossuk 2018

Per Buer

Load Balancing with Nginx

Marian Marinov

State of the CLI- Kat Marchan

NodejsFoundation

Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance

ScyllaDB

In this talk I will walk you through the performance tuning steps that I took to serve 1.2M JSON requests per second from a 4 vCPU c5 instance, using a simple API server written in C. At the start of the journey the server is capable of a very respectable 224k req/s with the default configuration. Along the way I made extensive use of tools like FlameGraph and bpftrace to measure, analyze, and optimize the entire stack, from the application framework, to the network driver, all the way down to the kernel. I began this wild adventure without any prior low-level performance optimization experience; but once I started going down the performance tuning rabbit-hole, there was no turning back. Fueled by my curiosity, willingness to learn, and relentless persistence, I was able to boost performance by over 400% and reduce p99 latency by almost 80%.

(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014

Amazon Web Services

Tuning your EC2 web server will help you to improve application server throughput and cost-efficiency as well as reduce request latency. In this session we will walk through tactics to identify bottlenecks using tools such as CloudWatch in order to drive the appropriate allocation of EC2 and EBS resources. In addition, we will also be reviewing some performance optimizations and best practices for popular web servers such as Nginx and Apache in order to take advantage of the latest EC2 capabilities.

Redis acl

DaeMyung Kang

Scaling Apache Pulsar to 10 Petabytes/Day

ScyllaDB

Pulsar is used by a portfolio of products at Splunk for stream processing of different types of data, including metrics and logs. In this talk, Karthik Ramasamy will share how Splunk helped a flagship customer scale a Pulsar deployment to handle 10 PB/day in a single cluster. He will talk about the journey, the challenges faced, and the trade-offs made to scale Pulsar and operate it reliably and stably in Google Cloud Platform (GCP).

クックパッドのLVSについてSugawara Genki

Mitigating Security Threats with Fastly - Joe Williams at Fastly Altitude 2015

Fastly

Fastly Altitude - June 25, 2015. Joe Williams, Computer Operator at GitHub discusses using a CDN to mitigate security threats. Video of the talk: http://fastly.us/Altitude2015_Mitigating-Security-Threats-2 Joe's bio: Joe Williams is a Computer Operator at GitHub, and joined their infrastructure team in August 2013. Joe's passion for distributed systems, queuing theory and automation help keep the lights on. When not behind a computer you can generally find him riding a bicycle around Marin, CA.

Velocity 2010 - ATS

Leif Hedstrom

Rails Conf Europe 2007 - Utilizing Amazon S3 and EC2 in Rails

Jonathan Weiss

Scaling a web application is a very hard problem, especially for small project and teams who do not have sufficient manpower, money, and time to solve this problem. Luckily Amazon already had to solve this problem in their datacenters and offers their services to other developers. This talk will introduce the two most important Amazon Web Services, the Elastic Compute Cloud (EC2) and the Simple Storage Service (S3), and will present different ways to leverage them in your own Rails application. Presented at RailsConfEurope Berlin 2007

Nginx - Tips and Tricks.Harish S

How Many Slaves (Ukoug)

Doug Burns

TDS-16489U-R2 0215 EN

QNAP Systems, Inc.

What's hot

OOPs, OOMs, oh my! Containerizing JVM apps

Sematext Group, Inc.

Alternative Infrastucture

Marc Seeger

Altitude SF 2017: Optimizing your hit rate

Fastly

Global deduplication for Ceph - Myoungwon Oh

Ceph Community

Varnish Cache 4.0 / Redpill Linpro breakfast in Oslo

Per Buer

Tuning the Kernel for Varnish Cache

Per Buer

Odoo Performance Limits

Odoo

AWS re:Invent 2014 talk: Scheduling using Apache Mesos in the Cloud

Sharma Podila

Include os @ flossuk 2018

Per Buer

Load Balancing with Nginx

Marian Marinov

State of the CLI- Kat Marchan

NodejsFoundation

Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance

ScyllaDB

(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014

Amazon Web Services

Redis acl

DaeMyung Kang

Scaling Apache Pulsar to 10 Petabytes/Day

ScyllaDB

クックパッドのLVSについてSugawara Genki

Mitigating Security Threats with Fastly - Joe Williams at Fastly Altitude 2015

Fastly

Velocity 2010 - ATS

Leif Hedstrom

Rails Conf Europe 2007 - Utilizing Amazon S3 and EC2 in Rails

Jonathan Weiss

Nginx - Tips and Tricks.Harish S

What's hot (20)

OOPs, OOMs, oh my! Containerizing JVM apps

Alternative Infrastucture

Altitude SF 2017: Optimizing your hit rate

Global deduplication for Ceph - Myoungwon Oh

Varnish Cache 4.0 / Redpill Linpro breakfast in Oslo

Tuning the Kernel for Varnish Cache

Odoo Performance Limits

AWS re:Invent 2014 talk: Scheduling using Apache Mesos in the Cloud

Include os @ flossuk 2018

Load Balancing with Nginx

State of the CLI- Kat Marchan

Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance

(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014

Redis acl

Scaling Apache Pulsar to 10 Petabytes/Day

クックパッドのLVSについて

Mitigating Security Threats with Fastly - Joe Williams at Fastly Altitude 2015

Velocity 2010 - ATS

Rails Conf Europe 2007 - Utilizing Amazon S3 and EC2 in Rails

Nginx - Tips and Tricks.

Similar to Boyan Ivanov - latency, the #1 metric of your cloud

How Many Slaves (Ukoug)

Doug Burns

TDS-16489U-R2 0215 EN

QNAP Systems, Inc.

How swift is your Swift - SD.pptx

OpenStack Foundation

My Sql Performance In A Cloud

Sky Jian

KVM and docker LXC Benchmarking with OpenStack

Boden Russell

Passive benchmarking with docker LXC and KVM using OpenStack hosted in SoftLayer. These results provide initial incite as to why LXC as a technology choice offers benefits over traditional VMs and seek to provide answers as to the typical initial LXC question -- "why would I consider Linux Containers over VMs" from a performance perspective. Results here provide insight as to: - Cloudy ops times (start, stop, reboot) using OpenStack. - Guest micro benchmark performance (I/O, network, memory, CPU). - Guest micro benchmark performance of MySQL; OLTP read, read / write complex and indexed insertion. - Compute node resource consumption; VM / Container density factors. - Lessons learned during benchmarking. The tests here were performed using OpenStack Rally to drive the OpenStack cloudy tests and various other linux tools to test the guest performance on a "micro level". The nova docker virt driver was used in the Cloud scenario to realize VMs as docker LXC containers and compared to the nova virt driver for libvirt KVM. Please read the disclaimers in the presentation as this is only intended to be the "chip of the ice burg".

Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalVigyan Jain

Lamp Stack Optimization

Dave Ross

Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...

Amazon Web Services

Amazon Aurora is now PostgreSQL compatible. With Amazon Aurora’s new PostgreSQL support, customers can get several times better performance than the typical PostgreSQL database and take advantage of the scalability, durability, and security capabilities of Amazon Aurora – all for one-tenth the cost of commercial grade databases. Amazon Aurora is a fully managed relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Amazon Aurora is built on a cloud native architecture that is designed to offer greater than 99.99 percent availability and automatic failover with no loss of data. Learning Objectives: • Learn about the capabilities and features of Amazon Aurora with PostgreSQL Compatibility • Learn about the benefits and different use cases • Learn how to get started using Amazon Aurora with PostgreSQL Compatibility

IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...

In-Memory Computing Summit

Speedment SQL Reflector is a software solution that allows applications to get automatically updated data in real time. The SQL Reflector loads data from your existing SQL database and feeds it into an in-memory data grid e.g. GridGain. When started, the SQL reflector will load your selected existing relational data into your map cluster. Also, any subsequent changes that are made to the relational database (regardless how, via your application, script, SQL commands or even stored procedures) are then continuously fed to your GridGain nodes. Even SQL-transactions are preserved so that your maps will always reflect a valid state of the underlying SQL database.

Nexenta at VMworld Hands-on Lab

Nexenta Systems

My Sql Performance On Ec2MySQLConference

Mysql talk

LogicMonitor

SRV407 Deep Dive on Amazon Aurora

Amazon Web Services

Amazon Aurora is a cloud-optimized relational database that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The recently announced PostgreSQL-compatibility, together with the original MySQL compatibility, are perfect for new application development and for migrations from overpriced, restrictive commercial databases. In this session, we’ll do a deep dive into the new architectural model and distributed systems techniques behind Amazon Aurora, discuss best practices and configurations, look at migration options and share customer experience from the field.

Architectural Overview of MapR's Apache Hadoop Distribution

mcsrivas

Performance Whack-a-Mole Tutorial (pgCon 2009) PostgreSQL Experts, Inc.

Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...

Виталий Стародубцев

##Что такое Storage Replica ##Архитектура и сценарии ##Синхронная и асинхронная репликация ##Междисковая, межсерверная, внутрикластерная и межкластерная репликация ##Дизайн и проектирование Storage Replica ##Нововведения в Windows Server 2016 TP5 ##Графический интерфейс управления, и другие возможности - демонстрация и планы развития ##Интеграция Storage Replica с Storage Spaces Direct

OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...

OpenNebula Project

What’s New in Amazon Aurora

Amazon Web Services

It’s been an exciting year for Amazon Aurora, the database with MySQL-compatible and PostgreSQL-compatible database engines. Amazon Aurora combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. In this deep dive session, we’ll discuss best practices and explore new features, including high availability options, new integrations with AWS services, and the performance management with Amazon RDS Performance Insights.

Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang

Ceph Community

What’s New in Amazon Aurora

Amazon Web Services

by Joyjeet Banerjee, Enterprise Solutions Architect, AWS Amazon Aurora is a MySQL- and PostgreSQL-compatible database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. In this deep dive session, we’ll discuss best practices and explore new features in areas like high availability, security, performance management and database cloning. Level 300

Similar to Boyan Ivanov - latency, the #1 metric of your cloud (20)

How Many Slaves (Ukoug)

TDS-16489U-R2 0215 EN

How swift is your Swift - SD.pptx

My Sql Performance In A Cloud

KVM and docker LXC Benchmarking with OpenStack

Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final

Lamp Stack Optimization

Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...

IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...

Nexenta at VMworld Hands-on Lab

My Sql Performance On Ec2

Mysql talk

SRV407 Deep Dive on Amazon Aurora

Architectural Overview of MapR's Apache Hadoop Distribution

Performance Whack-a-Mole Tutorial (pgCon 2009)

Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...

OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...

What’s New in Amazon Aurora

Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang

What’s New in Amazon Aurora

More from ShapeBlue

CloudStack Authentication Methods – Harikrishna Patnala, ShapeBlue

Boyan Ivanov - latency, the #1 metric of your cloud

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Boyan Ivanov - latency, the #1 metric of your cloud

Similar to Boyan Ivanov - latency, the #1 metric of your cloud (20)

More from ShapeBlue

More from ShapeBlue (20)

Recently uploaded

Recently uploaded (20)

Boyan Ivanov - latency, the #1 metric of your cloud