Best Practices & Performance Tuning - OpenStack Cloud Storage with Ceph - In this presentation we discuss best practices and performance tuning for OpenStack cloud storage with Ceph to achieve high availability, durability, reliability and scalability at all times. We also cover best practices for failure domains, recovery, rebalancing, backfilling, scrubbing, deep scrubbing and day-to-day operations.
7. Ceph Overview
Design Goals
• Every component must scale
• No single point of failure
• Open source
• Runs on commodity hardware
• Everything must self-manage
Key Benefits
• Multi-node striping and redundancy
• COW cloning of images to volumes
• Live migration of Ceph-backed VMs
12. Glance Recommendations
• What is Glance?
• Configuration settings: /etc/glance/glance-api.conf
• Use Ceph RBD as the Glance storage back-end
• When booting from volumes:
• Disable the local image cache
• Exposing the image URL saves time because the image download and copy are NOT required
default_store=rbd
flavor = keystone          # instead of keystone+cachemanagement, to disable the local image cache
show_image_direct_url = True
show_multiple_locations = True
# glance --os-image-api-version 2 image-show 64b71b88-f243-4470-8918-d3531f461a26
+------------------+-----------------------------------------------------------------+
| Property | Value |
+------------------+-----------------------------------------------------------------+
| checksum | 24bc1b62a77389c083ac7812a08333f2 |
| container_format | bare |
| created_at | 2016-04-19T05:56:46Z |
| description | Image Updated on 18th April 2016 |
| direct_url | rbd://8a0021e6-3788-4cb3-8ada- |
| | 1f6a7b0d8d15/images/64b71b88-f243-4470-8918-d3531f461a26/snap |
| disk_format | raw |
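A minimal sketch of the corresponding [glance_store] section in glance-api.conf for an RBD back-end (pool and cephx user names are assumptions; combine with the show_image_direct_url settings shown above):

[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_chunk_size = 8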
13. Glance Recommendations
Image Format: Use ONLY RAW Images
With QCOW2 images:
• Convert the QCOW2 image to RAW (see the example after the table below)
• Get the image UUID
With RAW images (no conversion; saves time):
• Get the image UUID
Image Size (in GB) | Format | VM Boot Time (Approx.)
50 (Windows)       | QCOW2  | ~45 minutes
50 (Windows)       | RAW    | ~1 minute
6 (Linux)          | QCOW2  | ~2 minutes
6 (Linux)          | RAW    | ~1 minute
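As a hedged example (image and file names are placeholders), converting a QCOW2 image to RAW and uploading it to Glance might look like:

# qemu-img convert -f qcow2 -O raw myimage.qcow2 myimage.raw
# openstack image create "myimage" --disk-format raw --container-format bare --file myimage.raw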
17. Performance Decision Factors
• How much storage is required (usable vs. raw)?
• How many IOPS?
• Aggregated
• Per VM (min/max)
• Optimization for?
• Performance
• Cost
18. Ceph Cluster Optimization Criteria
Cluster Optimization Criteria | Properties | Sample Use Cases

IOPS - Optimized
• Properties: Lowest cost per IOPS; Highest IOPS; Meets minimum fault domain recommendation
• Sample Use Cases: Typically block storage; 3x replication

Throughput - Optimized
• Properties: Lowest cost per given unit of throughput; Highest throughput; Highest throughput per BTU; Highest throughput per watt; Meets minimum fault domain recommendation
• Sample Use Cases: Block or object storage; 3x replication for higher read throughput

Capacity - Optimized
• Properties: Lowest cost per TB; Lowest BTU per TB; Lowest watt per TB; Meets minimum fault domain recommendation
• Sample Use Cases: Typically object storage; erasure coding common for maximizing usable capacity
19. OSD Considerations
• RAM
o 1 GB of RAM per 1TB OSD space
• CPU
o 0.5 CPU cores / 1 GHz of a core per OSD (2 cores per OSD for SSD drives)
• Ceph-mons
o 1 ceph-mon node per 15-20 OSD nodes
• Network
o Make sure the combined throughput of a node's OSD disks does not exceed its available network
bandwidth
• Thread count
o Hosts with a high number of OSDs (e.g., > 20) can spawn a large number of threads during recovery and
rebalancing (raise kernel.pid_max accordingly; see the OS considerations slide)
(Diagram: a single host running six OSD daemons, OSD.1 - OSD.6)
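A quick worked example of these rules of thumb (the node size is an assumption, not from the slides): a host with 12 x 4 TB HDD OSDs would need roughly 12 x 4 GB = 48 GB of RAM and 12 x 0.5 = 6 CPU cores for the OSD daemons alone (more if the OSDs are SSD-backed), plus headroom for the OS and for recovery/rebalancing spikes.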
20. Ceph OSD Journal
• Run operating systems, OSD data and OSD journals on separate drives to maximize overall throughput.
• On-disk journals can halve write throughput.
• Use SSD journals for high write throughput workloads.
• Performance comparison with/without SSD journal using rados bench (see the example commands after the results table)
o 100% Write Operation with 4MB object size (default):
On-disk journal: 45 MB/s
SSD journal: 80 MB/s
• Note: The above results were measured with a 1:11 SSD:OSD ratio
• For better results, it is recommended to use 1 SSD per 4 - 6 OSDs
Op Type          | No SSD | SSD
Write (MB/s)     | 45     | 80
Seq Read (MB/s)  | 73     | 140
Rand Read (MB/s) | 55     | 655
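A hedged example of the rados bench invocations used for this kind of comparison (the pool name is a placeholder; --no-cleanup keeps the written objects so the read tests have data):

# rados -p testpool bench 60 write --no-cleanup
# rados -p testpool bench 60 seq
# rados -p testpool bench 60 rand
# rados -p testpool cleanup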
21. OS Considerations
• Kernel: Latest stable release
• BIOS: Enable HT (Hyper-Threading) and VT (Virtualization Technology).
• Kernel PID max:
• Read ahead: Set on all block devices
• Swappiness:
• Disable NUMA balancing: pass the numa_balancing=disable parameter on the kernel command line.
• The same setting can also be controlled at runtime via the kernel.numa_balancing sysctl:
• CPU tuning: Set the "performance" governor so the CPU always runs at 100% frequency.
• I/O Scheduler:
# echo "4194303" > /proc/sys/kernel/pid_max
# echo "8192" > /sys/block/sda/queue/read_ahead_kb
# echo "vm.swappiness = 0" | tee -a /etc/sysctl.conf
# echo 0 > /proc/sys/kernel/numa_balancing
SATA/SAS Drives: # echo "deadline" > /sys/block/sd[x]/queue/scheduler
SSD Drives : # echo "noop" > /sys/block/sd[x]/queue/scheduler
# echo "performance" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
23. Ceph Deployment Network
• Each host should have at least two 1 Gbps network interface controllers (NICs).
• Use 10G Ethernet
• Always use jumbo frames
• Use high-bandwidth links between ToR switches and spine routers, e.g. 40 Gbps to 100 Gbps
• Hardware should have a Baseboard Management Controller (BMC)
• Note: Running three networks in HA mode may seem like overkill
(Diagram: public network on NIC-1, cluster network on NIC-2)
# ifconfig ethx mtu 9000
#echo "MTU=9000" | tee -a /etc/sysconfig/network-script/ifcfg-ethx
24. Ceph Deployment Network
• NIC bonding - in balance-alb mode both NICs are used to send and receive traffic:
• Test results with 2x10G NIC:
• Active-Passive bond mode:
Traffic between 2 nodes:
Case#1 : node-1 to node-2 => BW 4.80 Gb/s
Case#2: node-1 to node-2 => BW 4.62 Gb/s
• Speed of one 10Gig NIC.
• Balance-alb bond mode:
• Case#1 : node-1 to node-2 => BW 8.18 Gb/s
• Case#2: node-1 to node-2 => BW 8.37 Gb/s
• Speed of two 10Gig NICs
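A minimal sketch of a balance-alb bond on a RHEL/CentOS-style node (interface names, IP address and MTU are placeholders/assumptions):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=balance-alb miimon=100"
MTU=9000
IPADDR=192.168.10.11
PREFIX=24
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none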
25. Ceph Failure Domains
• A failure domain is any failure that prevents access to one or more OSDs.
Weigh the added cost of isolating every potential failure domain against the benefit.
Failure domain levels (CRUSH bucket types; see the rule sketch after this list):
• osd
• host
• chassis
• rack
• row
• pdu
• pod
• room
• datacenter
• region
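A minimal sketch of a CRUSH rule that spreads replicas across racks rather than hosts (the rule name and ruleset number are assumptions; this goes into a decompiled CRUSH map):

rule replicated_rack {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type rack
        step emit
}

On pre-Luminous releases the rule can then be assigned to a pool with, for example, # ceph osd pool set <pool> crush_ruleset 1.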
26. Ceph Ops Recommendations
Scrub and deep-scrub operations are very I/O intensive and can affect cluster performance.
o Disable scrub and deep scrub
o After setting the noscrub and nodeep-scrub flags, ceph health reports a WARN state
o Re-enable scrub and deep scrub
o Configure scrub and deep scrub (see the settings below and the runtime example after them)
#ceph osd set noscrub
set noscrub
#ceph osd set nodeep-scrub
set nodeep-scrub
#ceph health
HEALTH_WARN noscrub, nodeep-scrub flag(s) set
# ceph osd unset noscrub
unset noscrub
# ceph osd unset nodeep-scrub
unset nodeep-scrub
osd_scrub_begin_hour = 0 # begin at this hour
osd_scrub_end_hour = 24 # start the last scrub before this hour
osd_scrub_load_threshold = 0.05 # scrub only below this load
osd_scrub_min_interval = 86400 # not more often than 1 day
osd_scrub_max_interval = 604800 # not less often than 1 week
osd_deep_scrub_interval = 604800 # scrub deeply once a week
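A hedged example of applying these scrub settings to running OSDs without a restart (option names match the ceph.conf settings above):

# ceph tell osd.* injectargs '--osd_scrub_load_threshold 0.05 --osd_scrub_min_interval 86400 --osd_scrub_max_interval 604800 --osd_deep_scrub_interval 604800'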
27. Ceph Ops Recommendations
• Decreasing the performance impact of recovery and backfilling
• Settings for recovery and backfilling (see the runtime example below):
Note: Decreasing these values slows down recovery/backfill and prolongs the recovery process.
Increasing them speeds up recovery/backfill but reduces client performance, and vice versa.
‘osd max backfills’ - maximum backfills allowed to/from an OSD [default 10]
‘osd recovery max active’ - Recovery requests per OSD at one time. [default 15]
‘osd recovery threads’ - The number of threads for recovering data. [default 1]
‘osd recovery op priority’ - Priority for recovery Ops. [ default 10]
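A minimal sketch of throttling recovery at runtime while clients are busy (the values shown are illustrative, not from the slides):

# ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'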
28. Ceph Performance Measurement Guidelines
For best measurement results, follow these rules while testing:
• Change one option at a time.
• Check what is actually changing.
• Choose the right performance test for the changed option.
• Re-test each change - at least ten times.
• Run tests for hours, not seconds.
• Watch for any errors.
• Examine the results critically.
• Always estimate the expected result and check the standard deviation to eliminate spikes and false tests.
Tuning:
• Ceph clusters can be re-parameterized after deployment to better fit the requirements of the workload.
• Some configuration options can affect data redundancy and have significant implications for the stability and safety of data.
• Tuning should be validated in a test environment before issuing any commands or configuration changes in production.
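A hedged example of a repeatable client-side benchmark using fio's librbd engine (pool, image and client names are placeholders; fio must be built with rbd support):

# fio --name=rbd-randwrite --ioengine=rbd --clientname=admin \
      --pool=testpool --rbdname=testimage \
      --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
      --runtime=3600 --time_based --direct=1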
33. Ceph H/W Best Practices
Daemon Host | CPU (minimum)                                               | RAM (minimum)
OSD host    | 1 x 64-bit core / 1 x 32-bit dual core / 1 x i386 dual core | ~1 GB per 1 TB of OSD storage
MDS host    | 1 x 64-bit core / 1 x 32-bit dual core / 1 x i386 dual core | 1 GB per daemon
MON host    | 1 x 64-bit core / 1 x 32-bit dual core / 1 x i386 dual core | 1 GB per daemon
34. HDD, SDD, Controllers
• Ceph best practice is to run operating systems, OSD data and OSD journals on separate drives.
Hard Disk Drives (HDD)
• Use a minimum hard disk drive size of 1 terabyte.
• Plan for ~1 GB of RAM per 1 TB of storage space.
NOTE:
It is NOT a good idea to run:
1. multiple OSDs on a single disk.
2. an OSD and a monitor or metadata server on the same disk.
Solid State Drives (SSD)
• Use SSDs to improve performance.
Controllers
• Disk controllers also have a significant impact on write throughput.
Hello, good evening all... Today we will be talking about "Best Practices and Performance Tuning for OpenStack + Ceph".
So, how many of you are using Ceph?
How many of you are planning to use Ceph in the near future?
I am Swami, working with RJIL. I have been working on OpenStack and Ceph projects for the last 3 years.
My key responsibilities include managing multiple Ceph storage clusters for OpenStack clouds.
I have 15+ years of experience with open source projects such as Linux, the GNU GCC tools, etc.
Now I would like to introduce my colleague Mr Pandiayn, who is an ATC in OpenStack and one of the active
members of the India OpenStack community.
Moving to the agenda - the rest of the talk covers best practices and performance recommendations for OpenStack with Ceph.
First a quick overview of Ceph, and then OpenStack integration with Ceph.
Then I am going to talk about recommended values for OpenStack and Ceph.
Then we can move on to questions.
This is a typical general-purpose cloud environment with 200 nodes spread across DCs, covering compute, block and object storage use cases.
We have around 2.5K VMs, a compute capacity of 40 TB RAM and 5120 CPU cores, and a raw storage capacity of 4 PB.
On average we use 20 GB and 100 GB boot volumes for Linux and Windows respectively. Additionally, we use 200 GB data volumes on average.
The table below shows the compute node and storage node details.
In this section I will do a quick recap of Ceph.
Ceph is a distributed storage system designed to provide excellent performance, reliability and scalability and it delivers object, block, and file storage in one unified system.
Ceph block devices (RBD) are thin-provisioned, resizable and store data striped over multiple OSDs in a Ceph cluster and it leverage RADOS capabilities such as snapshotting, replication and consistency. Ceph’s RADOS Block Devices (RBD) interact with OSDs using kernel modules or the librbd library.
Ceph Object Storage uses the Ceph Object Gateway daemon (radosgw), a FastCGI module for interacting with a Ceph Storage Cluster; it provides interfaces compatible with the OpenStack Swift and Amazon S3 APIs.
In this section I will be talking about OpenStack-Ceph integration, i.e. how OpenStack components interact with Ceph components.
Cinder is the OpenStack Block Storage service; it provides persistent block storage resources and is backed by Ceph RBD. Basically, Cinder is used to create volumes in RBD.
Glance is the OpenStack Image service - it stores images and maintains a catalog of available images, and it is also backed by Ceph RBD.
Nova is the OpenStack Compute service. You can use Nova to host and manage cloud computing systems. Nova attaches and detaches volumes.
Object Storage is a robust, highly scalable and fault-tolerant storage platform for unstructured data such as objects, and it is backed by Ceph RGW.
The Ceph Object Gateway can be integrated with Keystone, the OpenStack identity service. This sets up the gateway to accept Keystone as the user authority. A user that Keystone authorizes to access the gateway will also be created automatically on the Ceph Object Gateway (if it didn't exist beforehand). A token that Keystone validates will be considered valid by the gateway.
A Ceph Object Gateway user is mapped to a Keystone tenant. A Keystone user has different roles assigned to it, possibly on more than a single tenant. When the Ceph Object Gateway gets the ticket, it looks at the tenant and the user roles that are assigned to that ticket, and accepts or rejects the request according to the rgw keystone accepted roles configurable.
In this section I will cover a few OpenStack component recommendations, though more recommendations come on the Ceph side in the next section.
During boot from volume, images are downloaded to the controller and, by default, cached in the image cache location.
If we spawn multiple VMs from large images, all of those images get cached and eventually consume the controller's disk space, which can cause the controller to stop its operations.
Ceph internally uses the RAW format to store images, so it is optimal for Glance to use RAW images (instead of QCOW2).
As we discussed, Cinder is the OpenStack Block Storage service. There are not many recommendations in this section, except: use the cinder-backup service with a Ceph back-end to get the incremental backup functionality supported by Ceph.
Here are the default cinder-backup configurations.
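A minimal sketch of the cinder.conf backup settings for a Ceph back-end (pool and cephx user names are assumptions; the slide with the actual defaults is not reproduced here):

backup_driver = cinder.backup.drivers.ceph
backup_ceph_conf = /etc/ceph/ceph.conf
backup_ceph_user = cinder-backup
backup_ceph_pool = backups
backup_ceph_chunk_size = 134217728
backup_ceph_stripe_unit = 0
backup_ceph_stripe_count = 0
restore_discard_excess_bytes = true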
As you already know, Nova is the OpenStack Compute service. For Nova there are not many Ceph-specific recommendations I can give at the moment. It is good to use krbd instead of librbd to get the page cache functionality supported by the kernel.
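For reference, a hedged sketch of the usual nova.conf [libvirt] settings for Ceph-backed disks (pool, user and secret UUID values are placeholders, not from the talk):

[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = <libvirt secret UUID>
disk_cachemodes = "network=writeback"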
In this section we will discuss Ceph-specific recommendations in detail.
Performance always depends on the use case, so we need to answer these questions:
- What are the storage needs (raw storage vs. usable storage)? For example, if more usable storage is needed, a lower replication factor may be chosen.
- What are the IOPS needs?
- What do we optimize for? Is it cost, or performance? It is always a big challenge to achieve the best performance at low cost, so we need to compromise on one to get the other.
In this slide we talk about a few optimization profiles and their criteria. The table is self-explanatory, so for the sake of time I won't discuss it further here; please go through the table.
Now we will talk about the Ceph OSD. The OSD is the object storage daemon for Ceph storage; it is responsible for storing objects on a local file system and providing access to them over the network. This slide shows the minimum CPU and RAM requirements for OSDs.
Ceph-mon is the cluster monitor daemon. A Ceph monitor always refers to the local copy of the monmap when discovering other monitors in the cluster, to achieve consistency.
Check the combined throughput of all OSD disks against the network throughput. The total OSD throughput should not exceed the network throughput.
Using more OSDs per server may run into a lack of threads, because OSDs need more threads during rebalancing, recovery and other activities.
Ceph OSDs use a journal for two reasons: 1 - Speed and 2- Consistency.
Speed: The journal enables the Ceph OSD Daemon to commit small writes quickly.
Consistency: Ceph OSD Daemons require a filesystem interface that guarantees atomic compound operations. Every few seconds–between filestore max sync interval and filestore min sync interval–the Ceph OSD Daemon stops writes and synchronizes the journal with the filesystem, allowing Ceph OSD Daemons to trim operations from the journal and reuse the space.
There are two types of OSD journal placement: on-disk and on a separate SSD. In general, an on-disk journal shows lower performance compared with an SSD journal.
Here are the quick results from tests we performed in our environment. For 100% write operations we saw about 45 MB/s using an on-disk journal and about 80 MB/s using an SSD journal. Note that here we used a 1:11 SSD:OSD ratio.
For better performance, it is recommended to use a 1:4 to 1:6 SSD:OSD ratio.
Here are a few operating system recommendations. Please refer to the slide (it is self-explanatory).
Now we will discuss the Ceph network considerations. Here is the standard Ceph network diagram taken from the official Ceph docs.
It is always recommended to use separate public and cluster networks. Ceph internally does a lot of activity such as scrub, deep-scrub, recovery, etc., which should not impact the public/user network.
To support separate networks, each Ceph node should have at least 2 NICs - one for the public network and the other for the cluster network.
It is recommended to use jumbo frames across the network.
In our cloud environment we have done NIC bonding in balance-alb mode, which showed better performance. Here are the results:
Now we will talk about Ceph failure domain selection. By definition, a failure domain is any failure that prevents access to one or more OSDs.
Ceph maps objects (i.e., PGs) to OSDs across failure domains.
Here is the list of failure domains. It is recommended to use the chassis or rack level for a durable cluster.
That could be a stopped daemon on a host, a hard disk failure, an OS crash, a malfunctioning NIC, a failed power supply, a network outage, a power outage, etc.
To verify the integrity of data, Ceph uses a mechanism called scrubbing.
Ceph ensures data integrity by scrubbing placement groups.
Light scrubbing (daily) checks the object size and attributes.
Deep scrubbing (weekly) reads the data and uses checksums to ensure data integrity.
Scrubbing is important for maintaining data integrity, but it can reduce performance.
We can adjust the following settings to increase or decrease scrubbing operations as shown in this slide.
Here we will talk about recovery and backfill considerations, for when an OSD goes into a recovery state for any of multiple reasons.
To maintain operational performance, Ceph performs recovery with limitations on the number of recovery requests, threads and object chunk sizes, which allows Ceph to perform well in a degraded state.