Ceph Performance and Optimization - Ceph Day Frankfurt

Sebastien Han, eNovance

Ceph performance
CephDays Frankfurt 2014

Whoami
💥 Sébastien Han
💥 French Cloud Engineer working for eNovance
💥 Daily job focused on Ceph and OpenStack
💥 Blogger
Personal blog: http://www.sebastien-han.fr/blog/
Company blog: http://techs.enovance.com/
Last Cephdays presentation

How does Ceph perform?
42*
*The Hitchhiker's Guide to the Galaxy

CRUSH: deterministic object placement
As soon as a client writes into Ceph, the placement is computed on the client side: the client itself decides which OSD the object belongs to.
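CRUSH itself walks a weighted hierarchy of buckets (hosts, racks, rows). The Python sketch below is deliberately simpler and only illustrates the key property: any client can compute an object's location from its name alone, without asking a lookup service. It assumes a two-step mapping (object name hashed to a placement group, placement group mapped to OSDs) and swaps CRUSH for plain rendezvous (highest-random-weight) hashing. The pool name, `pg_num`, object name and OSD names are illustrative.

```python
import hashlib


def stable_hash(*parts: str) -> int:
    """64-bit hash that every client computes identically (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256("/".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def object_to_pg(obj_name: str, pg_num: int) -> int:
    """Step 1: hash the object name into one of pg_num placement groups."""
    return stable_hash(obj_name) % pg_num


def pg_to_osds(pool: str, pg: int, osds: list[str], replicas: int) -> list[str]:
    """Step 2: rank every OSD by a per-PG pseudo-random score and keep the top `replicas`.

    Rendezvous hashing stands in for CRUSH here; the first OSD returned plays the primary.
    """
    ranked = sorted(osds, key=lambda osd: stable_hash(pool, str(pg), osd), reverse=True)
    return ranked[:replicas]


def locate(pool: str, obj_name: str, pg_num: int, osds: list[str], replicas: int = 3):
    """Full client-side computation: object name -> PG -> ordered list of OSDs."""
    pg = object_to_pg(obj_name, pg_num)
    return pg, pg_to_osds(pool, pg, osds, replicas)


if __name__ == "__main__":
    cluster = [f"osd.{i}" for i in range(6)]
    pg, acting = locate("rbd", "rbd_data.1234.0000000000000000", pg_num=128, osds=cluster)
    print(f"pg {pg} -> {acting} (primary: {acting[0]})")
```

Only the object name, the pool and the OSD list feed the computation, so two clients on different machines get the same answer. Adding or removing an OSD only moves the PGs whose top-ranked OSDs change, a property CRUSH is also designed around.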

Aggregation: cluster level
As soon as you write into Ceph, the objects get spread evenly across the entire cluster, taking machines and disks into account.
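To see the spread in action, the Python sketch below places 30,000 hypothetical objects on three OSDs, one of which is assumed to sit on a disk twice as large as the others. It uses weighted rendezvous hashing rather than real CRUSH and ignores the machine level of the hierarchy; the point is only that a hash-based placement spreads objects in proportion to the weights, with no central allocator.

```python
import hashlib
import math
from collections import Counter


def unit_hash(*parts: str) -> float:
    """Deterministic pseudo-random float strictly between 0 and 1."""
    digest = hashlib.sha256("/".join(parts).encode()).digest()
    h53 = int.from_bytes(digest[:8], "big") >> 11  # keep 53 bits: a float's precision
    return (h53 + 1) / (2**53 + 1)


def pick_osd(obj_name: str, weights: dict[str, float]) -> str:
    """Weighted rendezvous hashing: OSD i wins with probability weights[i] / sum(weights)."""
    return max(weights, key=lambda osd: weights[osd] / -math.log(unit_hash(obj_name, osd)))


def spread(num_objects: int, weights: dict[str, float]) -> Counter:
    """Count how many of num_objects hypothetical objects land on each OSD."""
    return Counter(pick_osd(f"obj-{i}", weights) for i in range(num_objects))


if __name__ == "__main__":
    # Hypothetical cluster: osd.2 sits on a disk twice the size of the others.
    weights = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0}
    print(spread(30_000, weights))
```

With these weights the counts should land close to 7,500 / 7,500 / 15,000.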

Aggregation: OSD level
As soon as an IO goes into an OSD, no matter what the original pattern was, it becomes sequential.

Journaling
As soon as an IO goes into an OSD, it gets written twice: once to the journal, then to the OSD data store.

Journal and OSD data on the same disk
Journal penalty on the disk
Since every write happens twice, storing the journal on the same disk as the OSD data results in the following:
Device             wMB/s
sdb1 (journal)     50.11
sdb2 (osd_data)    40.25
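A back-of-the-envelope model of the numbers above, as a Python sketch: assuming every client byte is written once to the journal and once to the data partition, the client can get at most half of what the spindle writes in total. The inputs are the two wMB/s values from the slide; the model ignores metadata writes and the delay between the journal write and the data write.

```python
def colocated_journal(journal_mb_s: float, data_mb_s: float, writes_per_io: int = 2) -> dict[str, float]:
    """Split a disk's total write bandwidth between journal and data, and estimate client throughput."""
    device_total = journal_mb_s + data_mb_s
    return {
        "device_total_mb_s": device_total,
        # every client byte costs `writes_per_io` disk bytes when the journal shares the disk
        "client_ceiling_mb_s": device_total / writes_per_io,
        "journal_share": journal_mb_s / device_total,
    }


if __name__ == "__main__":
    stats = colocated_journal(journal_mb_s=50.11, data_mb_s=40.25)
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

The disk is writing about 90 MB/s, yet under this model the client sees at most about 45 MB/s of it.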

Filesystem fragmentation
• Objects are stored as files on the OSD filesystem
• Mixing IO patterns with different block sizes increases filesystem fragmentation
• Possible root cause: image sparseness
• A one-year-old cluster ends up with the following (see the allocsize mount option for XFS):
$ sudo xfs_db -c frag -r /dev/sdd
actual 196334, ideal 122582, fragmentation factor 37.56%
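The fragmentation factor can be recomputed from the two extent counts. The Python sketch below assumes xfs_db defines it as (actual - ideal) / actual, which reproduces the 37.56% above; it also prints how many extents exist for every ideal one.

```python
def xfs_fragmentation_factor(actual: int, ideal: int) -> float:
    """Share of extents beyond the ideal count, in percent, as xfs_db's `frag` command reports it."""
    return (actual - ideal) / actual * 100


if __name__ == "__main__":
    actual, ideal = 196334, 122582  # extent counts from the xfs_db output above
    print(f"fragmentation factor {xfs_fragmentation_factor(actual, ideal):.2f}%")
    print(f"extents per ideal extent: {actual / ideal:.2f}")
```

A factor of 37.56% means roughly 1.6 extents on disk for every extent that would ideally suffice, so reads that could be one contiguous IO get split into several.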

No parallelized reads
• Ceph always serves read requests from the primary OSD
• Room for an N× speed-up, where N is the replica count
Blueprint from Sage for the Giant release
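Where the N× comes from, as an idealized Python model: it assumes reads are disk-bound, all replicas are equally fast, and reads are spread round-robin. Round-robin is an illustrative policy, not the design of the blueprint mentioned above.

```python
def per_osd_reads(num_reads: int, replicas: int, balanced: bool) -> list[int]:
    """Count how many reads each replica of one PG serves; index 0 is the primary OSD."""
    load = [0] * replicas
    for i in range(num_reads):
        # primary-only: every read hits index 0; balanced: round-robin over all replicas
        target = i % replicas if balanced else 0
        load[target] += 1
    return load


if __name__ == "__main__":
    primary_only = per_osd_reads(300, replicas=3, balanced=False)
    balanced = per_osd_reads(300, replicas=3, balanced=True)
    print(primary_only, balanced, max(primary_only) / max(balanced))
```

The busiest disk handles 300 reads in the first case and 100 in the second, so the ceiling improves by the replica count (3 here).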

Scrubbing impact
• Object consistency check at the PG level
• Compares replica versions against each other (fsck for objects)
• Light scrubbing (daily) checks object sizes and attributes
• Deep scrubbing (weekly) reads the data and uses checksums to ensure data integrity
• Corruption does exist (ECC memory is there for a reason): enterprise disks are rated at one unrecoverable read error per 10^15 bits read, i.e. one every ~113 TiB read
• No pain, no gain
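The ~113 figure is 10^15 bits converted to binary terabytes (TiB). Worked out in Python, with a helper that estimates how many such errors a given amount of reading should surface; it treats the spec as an exact rate, which is an assumption, since vendors quote it as a bound.

```python
def tib_per_unrecoverable_error(bits_per_error: float = 1e15) -> float:
    """Data read (in TiB) per unrecoverable read error at a given error-rate spec."""
    return bits_per_error / 8 / 2**40


def expected_errors(tib_read: float, bits_per_error: float = 1e15) -> float:
    """Expected unrecoverable read errors when reading `tib_read` TiB at exactly the spec rate."""
    return tib_read / tib_per_unrecoverable_error(bits_per_error)


if __name__ == "__main__":
    print(f"{tib_per_unrecoverable_error():.1f} TiB per error")  # 113.7 TiB per error
```

Deep-scrubbing a hypothetical cluster holding 500 TiB of raw data reads all of it, so at the spec rate each pass should surface a handful of such errors (`expected_errors(500)` is about 4.4).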

IOs to the OSD disk
One IO into Ceph leads to 2 writes, well… the second write is the worst!

The problem
• Several objects map to the same physical disks
• Sequential streams get mixed all together
• Result: The disk seeks like hell
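A toy Python model of why mixed streams hurt: each IO is one block, and any IO that does not start right after the previous one counts as a seek. It ignores the IO scheduler's merging and reordering as well as disk caches, and the block offsets are made up.

```python
def count_seeks(offsets: list[int]) -> int:
    """Count IOs that do not start at the block right after the previous IO (toy seek model)."""
    return sum(1 for prev, cur in zip(offsets, offsets[1:]) if cur != prev + 1)


if __name__ == "__main__":
    stream_a = list(range(0, 4))        # object A laid out on blocks 0-3
    stream_b = list(range(1000, 1004))  # object B laid out on blocks 1000-1003
    one_after_another = stream_a + stream_b
    # two writers active at the same time: the disk sees their IOs alternate
    interleaved = [blk for pair in zip(stream_a, stream_b) for blk in pair]
    print(count_seeks(one_after_another), count_seeks(interleaved))
```

Writing the two streams one after the other costs a single seek; interleaving them turns every IO into a seek (7 out of 8).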

Even worse with erasure coding?
This is just an assumption!
• Since erasure coding splits objects into chunks (chunks of chunks), this phenomenon could well be amplified
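The assumption can be made concrete by counting writes. The Python sketch below assumes one IO per chunk, each on a different OSD, and pads the object to k equal chunks; it ignores the journal's double write and how Ceph actually sizes stripes. The object size and k/m values are illustrative.

```python
def replicated_write(object_kb: int, replicas: int) -> list[int]:
    """Size (KB) of each disk write when an object is stored as full replicas."""
    return [object_kb] * replicas


def erasure_coded_write(object_kb: int, k: int, m: int) -> list[int]:
    """Size (KB) of each disk write for a k+m erasure-coded object (k data chunks + m coding chunks)."""
    chunk_kb = -(-object_kb // k)  # ceiling division: the object is padded to k equal chunks
    return [chunk_kb] * (k + m)


if __name__ == "__main__":
    for label, writes in [
        ("3 replicas", replicated_write(4096, 3)),
        ("EC k=4, m=2", erasure_coded_write(4096, 4, 2)),
        ("EC k=8, m=3", erasure_coded_write(4096, 8, 3)),
    ]:
        print(f"{label}: {len(writes)} writes of {writes[0]} KB")
```

For the same 4 MB object, erasure coding produces more and smaller writes (6 × 1 MB or 11 × 512 KB instead of 3 × 4 MB), which means more independent streams landing on each disk.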

Things that you must not do
• Don't put a RAID underneath your OSD
  • Ceph already manages the replication
  • Degraded RAID hurts performance
  • It reduces usable space on the cluster
• Don't build high-density nodes in a tiny cluster
  • Failure considerations: lots of data to rebalance
  • Risk of a full cluster
• Don't run Ceph on your hypervisors (unless you're broke)
  • Well, maybe…
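Why density matters in a small cluster is plain arithmetic: every node holds 1/N of the data, and after a failure the N - 1 survivors must absorb it. The Python sketch below assumes equally sized nodes, a 75% fill level and a 0.95 full threshold (both illustrative), and enough surviving hosts to place every replica (with three replicas and a host failure domain, that means at least four nodes).

```python
def utilization_after_node_loss(nodes: int, used_fraction: float) -> float:
    """Cluster fill ratio once a failed node's data has been re-created on the survivors.

    Assumes equally sized nodes and enough surviving hosts to hold every replica.
    """
    if nodes < 2:
        raise ValueError("need at least two nodes to survive a node failure")
    return used_fraction * nodes / (nodes - 1)


if __name__ == "__main__":
    full_ratio = 0.95  # assumed "cluster full" threshold; check your own cluster's setting
    for nodes in (4, 6, 10, 20):
        after = utilization_after_node_loss(nodes, used_fraction=0.75)
        status = "FULL" if after >= full_ratio else "ok"
        print(f"{nodes:>2} nodes at 75%: {after:.0%} after losing one, "
              f"{1 / nodes:.0%} of the data to re-create [{status}]")
```

Four big nodes at 75% end up completely full after a single failure, while twenty smaller ones stay below 80% and only have to re-create 5% of the data.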

Object store multi-backend
• ObjectStore is born
• Aims to support several backends:
  • LevelDB (default)
  • RocksDB
  • Fusion-io NVMKV
  • Seagate Kinetic
  • Yours!

Why is it so good?
• No more journal! Yay!
• Object backends have built-in atomic functions
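Why atomic batches make the journal redundant, as a toy Python model: FileStore needs its journal to make multi-step updates (object data plus attributes) all-or-nothing, whereas a key/value backend such as LevelDB offers atomic write batches. The class below is a hypothetical in-memory stand-in, not LevelDB's or Ceph's API, and the failure is simulated with an invalid operation.

```python
from typing import Optional


class ToyKVStore:
    """In-memory stand-in for a key/value backend that applies write batches atomically."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def write_batch(self, ops: list[tuple]) -> None:
        """Apply ("put", key, value) / ("delete", key) ops all together or not at all."""
        staged = dict(self._data)  # work on a copy so a failure leaves the store untouched
        for op in ops:
            if op[0] == "put":
                _, key, value = op
                staged[key] = value
            elif op[0] == "delete":
                staged.pop(op[1], None)
            else:
                raise ValueError(f"unknown op {op[0]!r}")
        self._data = staged  # one reference swap publishes every change at once


if __name__ == "__main__":
    store = ToyKVStore()
    # An object's data and its attributes land in a single batch.
    store.write_batch([("put", "obj1/data", b"hello"), ("put", "obj1/xattr/size", b"5")])
    try:
        store.write_batch([("put", "obj2/data", b"world"), ("frobnicate", "obj2/xattr/size")])
    except ValueError:
        pass  # the failed batch left no partial state behind
    print(store.get("obj1/data"), store.get("obj2/data"))  # b'hello' None
```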

Firefly LevelDB backend
• Relatively new
• Needs to be tested with your workload first
• Tends to be more efficient with small objects

Many thanks!
Questions?
Contact: sebastien@enovance.com
Twitter: @sebastien_han
IRC: leseb
