Bringing Ceph storage to the enterprise
Copyright 2015 FUJITSU
Paul von Stamwitz
Sr. Storage Architect
Storage Planning, R&D Center
pvonstamwitz@us.fujitsu.com
1
 The safe and convenient way to make Ceph storage enterprise ready
 ETERNUS CD10k integrated in OpenStack
 mSHEC Erasure Code from Fujitsu
 Contribution to performance enhancements
2
Building Storage with Ceph looks simple
Copyright 2015 FUJITSU
Ceph
+ some servers
+ network
= storage
3
Building Storage with Ceph looks simple – but……
Many new Complexities
 Rightsizing server, disk types, network
bandwidth
 Silos of management tools (HW, SW..)
 Keeping Ceph versions with versions of
server HW, OS, connectivity, drivers in sync
 Management of maintenance and support
contracts of components
 Troubleshooting
Copyright 2015 FUJITSU
Build open source Ceph storage yourself
4
The challenges of software defined storage
 What users want
 Open standards
 High scalability
 High reliability
 Lower costs
 No-lock in from a vendor
 What users may get
 A self-developed storage system based on open / industry-standard HW & SW components
 High scalability and reliability? Only if the stack works!
 Lower investments but higher operational efforts
 Lock-in to their own stack
Copyright 2015 FUJITSU
5
ETERNUS CD10000 – Making Ceph enterprise ready
Build open source Ceph storage yourself vs. out-of-the-box ETERNUS CD10000
incl. support
incl. maintenance
ETERNUS CD10000 combines open source storage with enterprise–class quality of service
E2E Solution Contract by Fujitsu based on Red Hat Ceph Enterprise
+ Easy Deployment / Management by Fujitsu
+ Lifecycle Management for Hardware & Software by Fujitsu
6
Fujitsu Maintenance, Support and Professional Services
ETERNUS CD10000: A complete offer
Copyright 2015 FUJITSU
7
Massive Scalability
 Cluster of storage nodes
 Capacity and performance scales by
adding storage nodes
 Three different node types enable
differentiated service levels
 Density, capacity optimized
 Performance optimized
 Optimized for small scale dev & test
 1st version of CD10000 (Q3.2014) is
released for a range of 4 to 224 nodes
 Scales up to >50 Petabyte
Copyright 2015 FUJITSU
Basic node: 12 TB / Performance node: 35 TB / Capacity node: 252 TB
8
Immortal System
Copyright 2015 FUJITSU
[Diagram: Node1 + Node2 + … + Node(n); adding nodes, including nodes with a new generation of hardware]
 Non-disruptive add / remove / exchange of hardware (disks and nodes)
 Mix of nodes/disks of different generations, online technology refresh
 Very long lifecycle reduces migration efforts and costs
9
TCO optimized
 Based on x86 industry standard architectures
 Based on open source software (Ceph)
 High-availability and self-optimizing functions are part
of the design at no extra costs
 Highly automated and fully integrated management
reduces operational efforts
 Online maintenance and technology refresh reduce
costs of downtime dramatically
 Extreme long lifecycle delivers investment protection
 End-to-end design and maintenance from Fujitsu reduces evaluation, integration, and maintenance costs
Copyright 2015 FUJITSU
Better service levels at reduced costs – business centric storage
10
One storage – seamless management
 ETERNUS CD10000 delivers one seamless
management for the complete stack
 Central Ceph software deployment
 Central storage node management
 Central network management
 Central log file management
 Central cluster management
 Central configuration, administration and
maintenance
 SNMP integration of all nodes and
network components
Copyright 2015 FUJITSU
11
Seamless management (2)
Dashboard – Overview of cluster status
Server Management – Management of cluster hardware – add/remove server
(storage node), replace storage devices
Cluster Management – Management of cluster resources – cluster and pool creation
Monitoring the cluster – Monitoring overall capacity, pool utilization, status of OSD,
Monitor, and MDS processes, Placement Group status, and RBD status
Managing OpenStack Interoperation: Connection to OpenStack Server, and
placement of pools in Cinder multi-backend
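Behind the GUI, the OpenStack connection boils down to standard Ceph/OpenStack plumbing. As a rough sketch only (pool and client names are illustrative, not CD10000 defaults), the pools and the Cinder client key could be prepared like this:

  # Illustrative pool and client names; the CD10000 GUI performs the
  # equivalent steps when connecting to an OpenStack server
  ceph osd pool create volumes 128
  ceph osd pool create images 128
  ceph auth get-or-create client.cinder mon 'allow r' \
    osd 'allow rwx pool=volumes, allow rx pool=images'
  # The resulting key is then referenced by the RBD backend section(s)
  # in cinder.conf (rbd_pool, rbd_user, rbd_secret_uuid)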
12
Optional use of Calamari Management GUI
13
Example: Replacing an HDD
 Plain Ceph (manual steps; see the command sketch after this comparison)
 taking the failed disk offline in Ceph
 taking the failed disk offline on OS /
Controller Level
 identify (right) hard drive in server
 exchanging hard drive
 partitioning hard drive on OS level
 Make and mount file system
 bring the disk up in Ceph again
 On ETERNUS CD10000
 vsm_cli <cluster> replace-disk-out
<node> <dev>
 exchange hard drive
 vsm_cli <cluster> replace-disk-in
<node> <dev>
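For comparison, the plain-Ceph column above translates roughly into the following commands for the Firefly/Hammer era (a sketch only; the OSD id and device names are placeholders):

  # Take the failed OSD out of the cluster and remove it
  ceph osd out 12
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12
  # After physically swapping the drive: partition the disk, create the
  # file system, and bring a new OSD up (ceph-disk was the typical tool)
  ceph-disk prepare /dev/sdX
  ceph-disk activate /dev/sdX1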
14
Example: Adding a Node
 Plain Ceph (manual steps; see the sketch after this comparison)
 Install hardware
 Install OS
 Configure OS
 Partition disks (OSDs, Journals)
 Make filesystems
 Configure network
 Configure ssh
 Configure Ceph
 Add node to cluster
 On ETERNUS CD10000
 Install hardware
• hardware will automatically PXE boot
and install the current cluster
environment including current
configuration
 Node automatically available to GUI
 Add node to cluster with mouse click
on GUI
• Automatic PG adjustment if needed
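As a sketch of what the plain-Ceph column involves once OS, network, and ssh are in place (using ceph-deploy, the common deployment tool at the time; host and device names are placeholders):

  # Run from an admin node; node4, sdb, and the journal device are placeholders
  ceph-deploy install node4
  ceph-deploy osd create node4:sdb:/dev/ssd1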
16
Adding and Integrating Apps
 The ETERNUS CD10000 architecture
enables the integration of apps
 Fujitsu is working with customers and
software vendors to integrate selected
storage apps
 E.g. archiving, sync & share, data
discovery, cloud apps…
Copyright 2015 FUJITSU
[Diagram: cloud services, sync & share, archive, and iRODS data discovery apps on top of ETERNUS CD10000 – object, block, and file level access; central management; Ceph storage system S/W and Fujitsu extensions; 10GbE frontend network; fast interconnect network; performance and capacity nodes]
17
ETERNUS CD10000 at University Mainz
 Large university in Germany
 Uses iRODS Application for library services
 iRODS is an open-source data management software in use at research
organizations and government agencies worldwide
 Organizes and manages large depots of distributed digital data
 Customer has built an interface from iRODS to Ceph
 Stores raw data of measurement instruments (e.g. research in chemistry and physics) for 10+ years, meeting EU compliance rules
 Need to provide extensive and rapidly growing data volumes online at
reasonable costs
 Will implement a sync & share service on top of ETERNUS CD10000
19
Summary ETERNUS CD10k – Key Values
Copyright 2015 FUJITSU
[Diagram: ETERNUS CD10000 – the new unified system: unlimited scalability, TCO optimized, immortal system, zero downtime]
ETERNUS CD10000 combines open source storage with enterprise–class quality of service
20
 The safe way to make Ceph storage enterprise ready
 ETERNUS CD10k integrated in OpenStack
 mSHEC Erasure Code from Fujitsu
 Contribution to performance enhancements
21
What is OpenStack
Free open source (Apache license) software governed by a non-profit foundation
(corporation) with a mission to produce the ubiquitous Open Source Cloud
Computing platform that will meet the needs of public and private clouds
regardless of size, by being simple to implement and massively scalable.
Foundation members: Platinum, Gold, Corporate, …
 Massively scalable cloud operating system that
controls large pools of compute, storage, and
networking resources
 Community OSS with contributions from 1000+
developers and 180+ participating organizations
 Open web-based APIs for programmatic IaaS
 Plug-in architecture: supports different hypervisors, block storage systems, and network implementations; hardware agnostic, etc.
http://www.openstack.org/foundation/companies/
23
Fast-growing customer interest
 VMware clouds dominate
 OpenStack clouds already #2
 Worldwide adoption
Source: OpenStack User Survey and Feedback Nov 3rd 2014
Source: OpenStack User Survey and Feedback May 13th 2014
25
OpenStack.org User Survey Paris: Nov. 2014
26
OpenStack Cloud Layers
OpenStack and ETERNUS CD10000
[Diagram: physical servers (CPU, memory, SSD, HDD) and network; base operating system (CentOS); OAM (DHCP, deploy, lifecycle management); hypervisors (KVM, ESXi, Hyper-V); OpenStack services: Compute (Nova), Network (Neutron) + plugins, Dashboard (Horizon), billing portal, Authentication (Keystone), Images (Glance), Volume (Cinder), Object (Swift), Manila (File), Metering (Ceilometer), EC2 API; Ceph/RADOS provides Block (RBD), S3 (RADOS Gateway), and File (CephFS) underneath as Fujitsu Open Cloud Storage]
28
 The safe way to make Ceph storage enterprise ready
 ETERNUS CD10k integrated in OpenStack
 mSHEC Erasure Code from Fujitsu
 Contribution to performance enhancements
29
Backgrounds (1)
 Erasure codes for content data
 Content data for ICT services is ever-growing
 Demand for higher space efficiency and durability
 Reed Solomon code (de facto erasure code) improves both
[Diagram: triple replication keeps the content data plus two copies = 3x space; Reed Solomon code (old style) keeps the content data plus parities = 1.5x space]
However, Reed Solomon code is not so recovery-efficient.
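The space figures follow from the layout: with k data chunks and m parity chunks, Reed Solomon stores (k+m)/k times the raw data. The 1.5x on the slide corresponds, for example, to a layout with m = k/2, such as 4 data + 2 parity chunks:

\[ \text{space overhead} = \frac{k+m}{k}, \qquad \frac{4+2}{4} = 1.5\times \quad \text{vs. } 3\times \text{ for triple replication} \]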
30
Backgrounds (2)
 Local parity improves recovery efficiency
 Data recovery should be as efficient as possible
• in order to avoid multiple disk failures and data loss
 Reed Solomon code was improved by local parity methods
• data read from disks is reduced during recovery
[Diagram: Reed Solomon code (no local parities) vs. a local parity method – data chunks, parity chunks, and the data read from disks during recovery]
However, multiple disk failures are not taken into consideration.
31
 Local parity method for multiple disk failures
 Existing methods are optimized for a single disk failure
• e.g. Microsoft MS-LRC, Facebook Xorbas
 However, their recovery overhead is large in the case of multiple disk failures
• because they may have to fall back on global parities for recovery
Our Goal
Our goal is a method that efficiently handles multiple disk failures
[Diagram: a local parity method under multiple disk failures]
32
 SHEC (= Shingled Erasure Code)
 An erasure code only with local parity groups
• to improve recovery efficiency in case of multiple disk failures
 The calculation ranges of local parities are shifted and partly overlap with each
other (like the shingles on a roof)
• to keep enough durability
Our Proposed Method (SHEC)
k: data chunks (=10), m: parity chunks (=6), l: calculation range (=5)
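For the example layout shown (k=10, m=6, l=5), capacity efficiency follows directly, and recovery reads scale with the local calculation range l rather than with k (a rough characterization; exact read counts depend on which chunks fail):

\[ \text{capacity efficiency} = \frac{k}{k+m} = \frac{10}{10+6} = 62.5\%, \qquad \text{single-failure recovery reads} \approx l = 5 \ \text{vs. } k = 10 \text{ for plain Reed Solomon} \]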
34
Tools for sizing erasure code parameters
 Calculate MTTR and PDL (probability of data loss) for Reed Solomon and SHEC
 Inputs
• No. of OSDs
• OSD size
• Data rate
• Annual disk failure rate
 Output
• Plot for each configuration
• Plus recovery efficiency
 Identify
 Mark the plot for a specific layout
 Example, RS(4,3)
Copyright 2015 FUJITSU
35
Tuning tool to select the best layout
 Based on user specifications
 Minimum Reliability
 Capacity efficiency
 Recovery efficiency
 E.g. for Reed Solomon and …
 Reliability > 1e-9
 Capacity efficiency > 60%
 Recovery overhead < 6 reads
 Results are
 RS(6,4)=2.16e-16, 60%, 6 reads
 RS(6,3)=1.90e-12, 67%, 6 reads
 RS(5,3)=8.80e-13, 63%, 5 reads
• (or all possibilities can be listed)
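The capacity-efficiency column is simply k/(k+m) for each candidate layout:

\[ \mathrm{RS}(6,4): \tfrac{6}{10} = 60\%, \qquad \mathrm{RS}(6,3): \tfrac{6}{9} \approx 67\%, \qquad \mathrm{RS}(5,3): \tfrac{5}{8} \approx 63\% \]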
Copyright 2015 FUJITSU
36
1. mSHEC is more adjustable than Reed Solomon code,
because SHEC provides many recovery-efficient layouts
including Reed Solomon codes
2. mSHEC’s recovery time was ~20% faster than Reed
Solomon code in case of double disk failures
3. The mSHEC erasure code is included in the Hammer release (a usage sketch follows below)
4. For more information see
https://wiki.ceph.com/Planning/Blueprints/Hammer/Shingled_Erasure_Code_(SHEC)
or ask Fujitsu
Summary mSHEC
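For reference, the upstream SHEC plugin can be exercised directly on a Hammer cluster roughly as follows. This is a sketch only: the profile values are illustrative, the plugin's c parameter (durability estimator) does not map one-to-one to the l shown earlier, and on Hammer the plugin may additionally need to be enabled as an experimental feature:

  # Illustrative SHEC profile; k = data chunks, m = parity chunks,
  # c = durability estimator (values chosen for illustration only)
  ceph osd erasure-code-profile set shec-profile plugin=shec k=10 m=6 c=3
  ceph osd pool create ecpool 128 128 erasure shec-profile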
37
 The safe way to make Ceph storage enterprise ready
 ETERNUS CD10k integrated in OpenStack
 mSHEC Erasure Code from Fujitsu
 Contribution to performance enhancements
38
Areas to improve Ceph performance
Ceph has adequate performance today, but there are performance issues that prevent us from taking full advantage of our hardware resources.
Three main goals for improvement:
(1) Decrease latency in the Ceph code path
(2) Enhance large cluster scalability with many nodes / OSDs
(3) Improve balance of client performance with cluster recovery
40
Improve Latency: Measure first
[Chart: turnaround time of a single write IO with 2x replication]
49
Identify Hot Spots
The number of threads per ceph-osd depends on the complexity of the Ceph cluster: 3 nodes with 4 OSDs each ~700 threads per node; 9 nodes with 40 OSDs each > 100k threads per node
 ThreadPool::WorkThread is a hot spot = work in the ObjectStore / FileStore
total CPU usage during test 43.17 CPU seconds
Pipe::Writer 4.59 10.63%
Pipe::Reader 5.81 13.45%
ShardedThreadPool::WorkThreadSharded 8.08 18.70%
ThreadPool::WorkThread 15.56 36.04%
FileJournal::Writer 2.41 5.57%
FileJournal::WriteFinisher 1.01 2.33%
Finisher::finisher_thread_entry 2.86 6.63%
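The per-class CPU seconds above presumably come from profiling the daemon; for a quick first look at per-thread CPU usage on a node, standard Linux tooling is enough (one possible approach, not necessarily the measurement setup used here):

  # Snapshot of per-thread CPU usage for one ceph-osd process
  PID=$(pgrep -o ceph-osd)
  top -H -b -n 1 -p "$PID" | head -30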
50
Propose changes
 most of the work is done in FileStore::do_transactions
 each write transaction consists of
 3 calls to omap_setkeys,
 the actual call to write to the file system
 2 calls to setattr
 Proposal: coalesce calls to omap_setkeys
 1 function call instead of 3, setting 5 key-value pairs instead of 6 (one key was duplicated)
 Official change was to coalesce at the higher PG layer
52
See the difference (hopefully)
Reduced latency in ThreadPool::WorkThread by 54 microseconds = 23%
Significant reduction of CPU usage at the ceph-osd: 9% for the complete ceph-osd
 Approx 5% better performance at the Ceph client
total CPU usage during test 43.17 CPU seconds 39.33 CPU seconds
Pipe::Writer 4.59 10.63% 4.73 12.02%
Pipe::Reader 5.81 13.45% 5.91 15.04%
ShardedThreadPool::WorkThreadSharded 8.08 18.70% 7.94 20.18%
ThreadPool::WorkThread 15.56 36.04% 12.45 31.66%
FileJournal::Writer 2.41 5.57% 2.44 6.22%
FileJournal::WriteFinisher 1.01 2.33% 1.03 2.61%
Finisher::finisher_thread_entry 2.86 6.63% 2.76 7.01%
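The 9% figure follows directly from the two totals in the table:

\[ \frac{43.17 - 39.33}{43.17} \approx 8.9\% \ \text{lower total ceph-osd CPU usage} \]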
53
Other areas of investigation and improvement
 Lock analysis
 RWLock instead of mutex
 Start with CRC locks
 Bufferlist tuning
 Optimize for jumbo packets
 malloc issues
 Working closely with Ceph developer community
 Weekly meetings
 Hack-a-thons
Copyright 2015 FUJITSU
54
Rebalancing and client performance
 It's a balancing act!
 Faster rebalance means higher reliability
 Less need for higher levels of redundancy
 But, usually means high impact to client performance
 Different rebalancing scenarios
Copyright 2015 FUJITSU
55
Case 1: Add Node
 Rebalance behavior = many to few
 The newly added OSDs act as a throttle
 Added node is very busy but not (initially) a bottleneck
 Client traffic is directed to original nodes
 Client traffic will migrate to new node as more pgs become active+clean
 No data at risk
 No replication or reconstruction
 Best case scenario
Copyright 2012 FUJITSU LIMITED
56
 Backfill behavior = many to few
 Remaining OSDs in affected node act as a throttle
 Similar to add node, but with less data movement
 But now, active client data is being recovered
 Client traffic can be blocked while a PG is being recovered
 Client reads and writes can be severely impacted
Case 2: Disk Failure
Copyright 2012 FUJITSU LIMITED
57
Case 3: Node failure
Copyright 2012 FUJITSU LIMITED
 Backfill behavior = many to many
 Every node can be a hotspot
• No single node acting as a throttle
 Worst case scenario
 Recovery parameters that work well with adding nodes or handling disk failures can make a cluster practically unusable when a node fails
58
Ceph is highly tunable for recovery operations
 One setting doesn’t fit all scenarios
 Currently profiling Ceph behavior and its impact on client traffic
 Possible solution
 Offline recovery for disk/node failures
• Or “trickle” recovery
• Rely on erasure coding to recover objects between recovery times or during long
rebuilds
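As an illustration of that tunability (these are standard upstream OSD options, not a CD10000-specific recipe; the values are deliberately conservative examples), recovery and backfill can be throttled at runtime:

  # Throttle recovery/backfill to protect client IO during rebuilds
  ceph tell osd.* injectargs '--osd-max-backfills 1'
  ceph tell osd.* injectargs '--osd-recovery-max-active 1'
  ceph tell osd.* injectargs '--osd-recovery-op-priority 1'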
60
Summary and Conclusion
ETERNUS CD10k is the safe way to make Ceph enterprise ready
 Unlimited Scalability: 4 to 224 nodes, scales up to >50 Petabyte
 Immortal System with Zero downtime: Non-disruptive add / remove /
exchange of hardware (disks and nodes) or Software update
 TCO optimized: Highly automated and fully integrated management reduces
operational efforts
 Tight integration with OpenStack and its own GUI
Fujitsu will continue to enhance ease-of-use and performance
 This is important!
 As Ceph’s popularity increases, competitors will attack Ceph in these areas.
61
62
4K Random IO - HDDs
Copyright 2012 FUJITSU LIMITED
[Charts: IOPs vs. outstanding requests (1-128) for Firefly, Firefly on CentOS7, Hammer on CentOS7, and Hammer with a 4.1 client; panels: 4K Random Read and 4K Random Write]
63
Copyright 2015 FUJITSU
Fujitsu Technology Solutions
Dieter.Kasper@ts.fujitsu.com
66
Ceph is the most comprehensive implementation of Unified Storage
Overcome traditional challenges of rapidly growing and
dynamically changing storage environments:
The Ceph difference
Ceph's CRUSH algorithm liberates storage clusters from the scalability and performance limitations imposed by centralized data table mapping. It replicates and rebalances data within the cluster dynamically, eliminating this tedious task for administrators while delivering high performance and infinite scalability.
http://ceph.com/ceph-storage
Librados – a library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
Ceph Object Gateway (RGW) – a bucket-based REST gateway, compatible with S3 and Swift
Ceph Block Device (RBD) – a reliable and fully distributed block device, with a Linux kernel client and a QEMU/KVM driver
Ceph File System (CephFS) – a POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
[Diagram: apps access objects via Librados/RGW; hosts/VMs access virtual disks via RBD; clients access files & dirs via CephFS]
Ceph Storage Cluster (RADOS) – a reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
67
Interfaces
[Diagram: RESTful (S3, Swift), block, and file system interfaces on top of Ceph object storage]
68
Software Architecture
[Diagram: storage nodes run RADOS (the OSD swarm); on top sit librados, librbd, libcephfs, and librgw/RGW; clients attach via rbd.ko, ceph.ko, and libceph.ko (/dev, /mnt), ceph-fuse, KVM, an LIO target (FC, IB, iSCSI, FCoE) for generic block consumers such as Solaris, XEN, Hyper-V, and ESX, NFS-ganesha for NFS clients, Samba for SMB clients, and S3 / Swift / OpenStack through RGW; some paths are marked "not yet implemented" or "experimental"]
69
Application Interface Support - Roadmap
[Diagram: the same interface stack as on the previous slide, with the interfaces grouped into 1st, 2nd, and 3rd roadmap phases]
70
Building Storage with Ceph looks simple – but……
Many new Complexities
 Rightsizing server, disk types, network
bandwidth
 Silos of management tools (HW, SW..)
 Keeping Ceph versions with versions of
server HW, OS, connectivity, drivers in sync
 Management of maintenance and support
contracts of components
 Troubleshooting
Copyright 2015 FUJITSU
Build open source Ceph storage yourself
Software Defined Something ;-)
https://www.youtube.com/watch?v=5O6DczyhCkE
71
Is managing Ceph-based storage easy?
 Deployment of servers
 Connecting servers
 Operating servers
 Provisioning Ceph on Servers
 Operating Ceph
 Testing compatibility of Ceph with servers and network
 Testing performance
 Updating Ceph on each server
 Start new compatibility and performance tests after
each update
 Get trained for management tools for each component
Copyright 2015 FUJITSU
72
Differentiation with Open Source
 Although based on open source software, ETERNUS CD10000 is a fully integrated and quality-assured storage system
 End-to-end maintenance, consisting of upgrades and troubleshooting for the complete system from one source
 Adding functionality where Ceph has gaps (e.g. VMware, SNMP)
 Integrated management of Ceph and hardware functions
increases operational efficiency and makes Ceph simpler to use
 Performance-optimized sizing and architecture avoid bottlenecks
during operation, maintenance and failure recovery
 Adding integrated apps on top of the system
Copyright 2015 FUJITSU
ETERNUS CD10000 makes Ceph enterprise-ready
73
3rd platform implications and challenges for storage
Third platform – Distributed software-defined scale-out storage
Big Data Analytics/Social Business/Mobile Broadband/Cloud Services
Manageability
 Central management
of huge storage
amounts
 Unified multi-protocol
access (block, file,
object)
 Seamless introduction
of new storage
Reliability
 Full redundancy
 Self healing
 Geographical dispersion
 Fast rebuild
Scalability
 Practically unlimited
scalability in terms of
performance & capacity
 No bottlenecks
 No hot spots
74
What is ETERNUS CD10000?
[Diagram: ETERNUS CD10000 – block, object, and file level access; central management & SW enhancements; Ceph storage system S/W; 10GbE frontend network; InfiniBand backend network; performance and capacity nodes; built from Fujitsu standard hardware, Fujitsu software, and open source; application interfaces: Cinder, Swift, Manila, EC, OpenStack APIs; integrated applications: backup, sync & share, archive]
75
Fujitsu Software Enhancements
Central software deployment
Central network, logfile, and cluster management
SNMP integration of all nodes and network components
GUI for easier deployment, configuration, administration and maintenance
Own Fujitsu Erasure Coding on Roadmap
Still 100% compatible with underlying open source
Integrated Middleware / Applications: Backup, Archive, Sync & Share, ...
76
ETERNUS CD10000 Principles
[Diagram: cluster of storage nodes]
Pseudo-random distribution
Transparent creation of data copies
Automatic recreation of lost redundancy
Automated tech refresh