2. Agenda
AWcloud introduction
Ceph benefits and problems
Performance tuning case study
High concurrency
Cinder backup
3. AWcloud Introduction
An OpenStack startup founded in 2012
Founder and core team come from Red Hat and IBM
Provides enterprise-grade OpenStack products and services
Core contributor to the OpenStack ZeroMQ distributed message broker integration
285 reviews and 34 commits in the Kilo release
5. Benefits
Decouples VM storage from the compute node
Easy live migration, evacuation, and resize
No need to track disk capacity on each hypervisor
Genuine SDS (software-defined storage)
Controlled via CRUSH rules
• Infrastructure topology awareness
• Adjustable replication
• Weighting
Exposed through Cinder volume types
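A minimal sketch of that last point, assuming an RBD backend named ceph-ssd (fronting a pool whose CRUSH rule pins it to SSD OSDs) is already configured in cinder.conf; the names and credentials here are placeholders, not values from this deck:

    # Bind a Cinder volume type to the SSD-backed Ceph pool via its backend name,
    # using the python-cinderclient bindings.
    from cinderclient import client

    cinder = client.Client('2', 'admin', 'secret', 'admin',
                           'http://controller:5000/v2.0')

    vt = cinder.volume_types.create('ceph-ssd')
    vt.set_keys({'volume_backend_name': 'ceph-ssd'})

    # Users can then request this tier explicitly:
    #   cinder create --volume-type ceph-ssd 10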
6. Benefits (cont'd)
Efficient copy-on-write clones (see the sketch after this list)
Fast VM provisioning
No limit on concurrent clones, which makes it a good fit for golden images
Lightweight snapshots
Better support for continuous backups
Incremental volume backups
Greater storage efficiency
Thin provisioning
Discard support for disk space reclamation
Beyond VM block storage – unified storage
Containers, bare metal, object storage, file system
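A sketch of the copy-on-write clone workflow behind fast provisioning, using the python-rbd bindings; the pool, image, and snapshot names are illustrative:

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('images')

    # Snapshot and protect the golden image; protection is required before cloning.
    golden = rbd.Image(ioctx, 'golden-image')
    golden.create_snap('base')
    golden.protect_snap('base')
    golden.close()

    # Each VM disk is a thin copy-on-write clone of the protected snapshot,
    # so provisioning does not copy the image data.
    rbd.RBD().clone(ioctx, 'golden-image', 'base', ioctx, 'vm-0001-disk',
                    features=rbd.RBD_FEATURE_LAYERING)

    ioctx.close()
    cluster.shutdown()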
7. Problems and Suggestions
Cannot fully utilize high-performance disks
Noticeably improved in Hammer, but still not enough
Not a turnkey project
Steep learning curve
Operating large-scale deployments is difficult
Feature requests
QoS across the whole cluster
Geo-replication for DR (ongoing in the community)
Per-image configuration
Improve out-of-the-box performance
Optimize the default configuration (sketch after this list)
Collect and publish benchmarks
Tooling for per-deployment self-tuning
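One hedged illustration of what such self-tuning tooling could do is to push adjusted settings to every OSD at runtime via injectargs. The option names below are real Hammer-era settings, but the values are placeholders, not recommendations from this deck:

    import subprocess

    # Illustrative overrides only; appropriate values depend on the deployment.
    overrides = {
        'osd_op_threads': '8',
        'filestore_op_threads': '8',
        'filestore_queue_max_ops': '500',
    }

    for opt, val in overrides.items():
        subprocess.check_call(
            ['ceph', 'tell', 'osd.*', 'injectargs', '--%s=%s' % (opt, val)])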
8. Ceph tuning case study
Production environment
~100 OSDs
Three different types of disk
• HDD
• SSD
• FC-SAN
Initial cluster performance
~2,000 IOPS (4 KB random write)
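For reference, a baseline of this kind could be measured with fio's rbd ioengine; this is only a sketch, and the pool and image names are placeholders:

    import subprocess

    # 4 KB random writes against an existing RBD image, 60-second run.
    subprocess.check_call([
        'fio', '--name=rbd-randwrite-4k',
        '--ioengine=rbd', '--pool=volumes', '--rbdname=bench-image',
        '--rw=randwrite', '--bs=4k', '--iodepth=32',
        '--runtime=60', '--time_based', '--direct=1',
    ])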
13. RBD feature support
Snapshots
Snapshot rollback
Download from snapshot
Pool capacity report (see the sketch after this list)
Report the pool's capacity instead of the whole cluster's
https://review.openstack.org/#/c/166164/
Bug fix
Resize after clone
https://review.openstack.org/#/c/148185/
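A sketch of the idea behind the pool capacity report, using the python-rados bindings to contrast per-pool usage with whole-cluster totals (the pool name "volumes" is a placeholder):

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    cluster_stats = cluster.get_cluster_stats()   # whole-cluster kb / kb_used / kb_avail
    ioctx = cluster.open_ioctx('volumes')
    pool_stats = ioctx.get_stats()                # per-pool num_kb / num_objects

    print('cluster free: %d KB' % cluster_stats['kb_avail'])
    print('pool used:    %d KB' % pool_stats['num_kb'])

    ioctx.close()
    cluster.shutdown()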
14. High-concurrency workload
Concurrent RBD client operations are limited
Volume create/delete takes a long time
High CPU utilization in cinder-volume and ceph-osd
cinder-volume cannot keep up with consuming messages
Short-term Band-Aid solution
Use more cinder-volume workers
https://review.openstack.org/#/c/135795/
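A rough way to reproduce the effect measured on the next slide: fan the clone operations out over a pool of worker processes, mirroring what additional cinder-volume workers do; the names and counts below are placeholders:

    import multiprocessing

    import rados
    import rbd

    def clone_one(index):
        # Each worker opens its own cluster connection, like a separate c-vol process.
        cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
        cluster.connect()
        ioctx = cluster.open_ioctx('volumes')
        rbd.RBD().clone(ioctx, 'golden-image', 'base', ioctx,
                        'vol-%04d' % index, features=rbd.RBD_FEATURE_LAYERING)
        ioctx.close()
        cluster.shutdown()

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=4)   # compare 1 / 2 / 4 workers
        pool.map(clone_one, range(80))             # 80 volumes, as in the table
        pool.close()
        pool.join()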
15. High-concurrency workload

Workers | Clone time (s) | Delete time (s) | c-vol CPU | OSD CPU
1       | 126            | 508             | 200%      | 100%
2       | 54             | 470             | 70%       | 120%
4       | 46             | 474             | 40%       | 140%

Table: Cloning and deleting 80 volumes with different numbers of cinder-volume workers.