SlideShare a Scribd company logo
1
Make the Future with China!
Ceph: Open Source Storage Software Optimizations
on Intel® Architecture for Cloud Workloads
Jian Zhang – Software Engineer, Intel Corporation
DATS005
2
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
3
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
4
The Problem: Data Big Bang
From 2013 to 2020, the digital universe will grow by a factor of 10, from 4.4 ZB to 44 ZB
It more than doubles every two years.
Data needs are growing at a rate unsustainable with today’s infrastructure and labor costs
Source: IDC – The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things - April 2014
1 2 3 4 7
12
19
30
48
77
125
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
1.2
COST CHALLENGES COTINUE TO
GROW
Storage Cost structure needs
a fundamental shift
Storage
Capacity in
TB 62%
CAGR
IT
Budget
2%
CAGR
IT PROS WILL SHOULDER A
GREATER STORAGE BURDEN
230 GB
Per IT Pro
1,231 GB
Per IT Pro
5
Diverse Workloads & Cost Drive Need for Distributed Storage
Traditional Workloads Today’s Workloads
Diverse Workloads
Management
Scale on Demand
Increasing Complexity
Cost
Traditional Workloads Today’s Trends
Storage Fabric Network Fabric
Mobile HPC Big Data &
Analytics
CloudOLTP Email ERP
Apps Apps
Challenges
CRM
6
Distributed Storage
Traditional Scale-up Model
SAN
(Storage Area
Network)
Server
High Availability (Failover)
High perf workloads (e.g., database)
Enterprise Mission Critical Hybrid Cloud
Limited Scale
Costly (Cap-ex and Op-ex)
Pay as you Grow, massive on-demand scale
Cost, Performance optimized
Open and commercial solutions on x86 servers
Applicable to cloud workloads
Not a good fit for traditional high perf workloads
• Ceph is the most popular† open source virtual block storage option. Also provides object, file
(experimental).
• Strong customer interest - several production implementations already.
Distributed Storage Model
Application Servers
Storage Nodes
Storage Client
Metadata Servers
Converged
Network
† OpenStack User Survey Insights: http://superuser.openstack.org/articles/openstack-user-survey-insights-november-2014
7
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
8
Ceph Introduction
• Ceph is an open-source, massively scalable, software-defined storage system which
provides object, block and file system storage in a single platform. It runs on commodity
hardware—saving you costs, giving you flexibility—and because it’s in the Linux* kernel, it’s
easy to consume.
• Object Store (RADOSGW)
- A bucket based REST gateway
- Compatible with S3 and swift
• File System (CEPH FS)
- A POSIX-compliant distributed file system
- Kernel client and FUSE
• Block device service (RBD)
- OpenStack* native support
- Kernel client and QEMU/KVM driver
RADOS
A software-based, reliable, autonomous, distributed
object store comprised of self-healing, self-managing,
intelligent storage nodes and lightweight monitors
LIBRADOS
A library allowing apps to directly access RADOS
RGW
A web services
gateway for
object storage
Application
RBD
A reliable, fully
distributed block
device
CephFS
A distributed file
system with
POSIX semantics
Host/VM Client
9
Ceph Cluster Overview
Client Servers
Storage Servers
• Ceph Clients
- Block/Object/File system storage
- User space or kernel driver
• Peer to Peer via Ethernet
- Direct access to storage
- No centralized metadata = no
bottlenecks
• Ceph Storage Nodes
- Data distributed and replicated
across nodes
- No single point of failure
- Scale capacity and performance
with additional nodes
Ceph scales to 1000s of nodes
Ethernet
KVM
Application
Guest OS
Application
Guest OS
KVM
Application
Guest OS
Application
Guest OS
KVM
Application
Guest OS
Application
Guest OS
RADOS
RBD Object File
RADOS
RBD Object File
RADOS
RBD Object File
SSD SSD
OSDOSD OSD
SSD SSD
OSDOSD OSD
SSD SSD
OSDOSD OSD
SSD SSD
OSDOSD OSD
MON MON
10
Object Store Daemon (OSD) Read and Write Flow
OSD
Disk
OSD OSD OSD
Disk Disk Disk
Compute
Node
RBD
RADOS
App
OSD
Disk
OSD OSD OSD
Disk Disk Disk
OSD
Disk
OSD OSD OSD
Disk Disk Disk
1
2 2
3
3
4
Server Server Server
Compute
Node
RBD
RADOS
App
1 2
WriteRead 1
2
3
4
Client app writes data,
RADOS sends data to
primary OSD
Primary OSD identifies
replica OSDs and sends
data, writes data to local
disk
Replica OSDs write data
to local disk, signal
completion to primary
Primary OSD signals
completion to client app
Client app
issues read
request, RADOS
sends request
to primary OSD
Primary OSD
reads data from
local disk and
completes read
request
1
2
11
Ceph WorkloadsStoragePerformance
(IOPS,Throughput)
Storage Capacity
(PB)
Lower Higher
Virtual
Machines
LowerHigher
Virtual
Block
Media
Streaming
Enterprise
Dropbox
Backup,
Archive
Remote
Disks
VDI
App
Storage
HPC
Batch
analytics
Mobile
Content
Depot
Databases
Block
Object
Current
Future
Intel Focus
Test &
Dev
FS
12
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
13
Ceph Block Performance – Configuration
1x10Gb NIC
Test Environment Compute Node
• 2 nodes with Intel®
Xeon™ processor
x5570 @ 2.93GHz,
128GB mem
• 1 node with IntelXeon
processor E5 2680
@2.8GHz, 56GB mem
Storage Node
• IntelXeon processor
E3-1275 v2 @ 3.5
GHz
• 32GB Memory
• 1xSSD for OS
• 10x 3 TB 7200rpm
• 2x 400GB Intel® SSD
DC S3700
CEPH1
MON
OSD1 OSD10…
CEPH2
OSD1 OSD10…
CEPH3
OSD1 OSD10…
CEPH4
OSD1 OSD10…
KVM
VM1
FIO FIO
VM40
CLIENT 1
KVM
VM1
FIO FIO
VM40
CLIENT 2
KVM
VM1
FIO FIO
VM40
CLIENT 3
2x10Gb NIC
Note: See page #37, #38, #39 for system configuration and benchmark data
14
Ceph Block Performance – Measure Raw
Performance
3 TB
7200 rpm
1 Run FIO on one HDD, collect disk IO performance
Note: Sequential 64K (Client) = Random 512K (Ceph OSD)
2 Estimate cluster performance (include replication overhead for writes – 2x in this test)
3 TB
7200 rpm3 TB
7200 rpm3 TB
7200 rpm20 HDDs
(Writes)
40 HDDs
(Reads)
70
MB/s
80
MB/s
270
IOPS
160
IOPS
Random 512K
Write Read
Random 4K
Write Read
1400
MB/s
3200
MB/s
5400
IOPS
6400
IOPS
Write Read
Random 4K
Write Read
Random 512K
Note: See page #37, #38, #39 for system configuration and benchmark data
15
0 1000 2000 3000 4000 5000 6000 7000
4K Rand Read - IOPS
4K Rand Write - IOPS
64K Seq Read - MB/s
64K Seq Write - MB/s
Maximum Possible Peak Observed
Ceph Block Performance – Test Results
Drop OSD Cache
Prepare Data (dd)
Run FIO
1. 60GB Span
2. 4 IOs: Sequential
(W,R), Random (W, R)
3. 100s warm-up, 600s
test
4. RBD images – 1 to 120 Note: Random tests use Queue Depth=8, Sequential tests use Queue Depth=64
See page #39, #40, #41 for system configuration and benchmark data
CEPH Cluster Performance
88%
64%
102%
94%
Ceph performance is close to max cluster IO limit for all but random
writes – room for further optimizations
16
Ceph Block Performance – Tuning effects
0 1000 2000 3000 4000 5000 6000 7000
64K Seq Write - MB/s
64K Seq Read - MB/s
4K Rand Write - IOPS
4K Rand Read - IOPS
Ceph Block Performance Tuning Impact
--compared with default ceph.conf
Throughput W/ Tunings Throughput
2.36x
1.88x
3.59x
2.00x
Best Tuning Knobs
Read ahead = 2048
I/O merge & write cache
Omap data on a separate Disk
Large pg number: 81920
Note: See page #37, #38, #39 for system configuration and benchmark data
17
Ceph Object Performance – Configuration
1x10Gb NIC
Test Environment
Client Node
• 2 nodes with Intel®
Xeon™ processor
x5570 @ 2.93GHz,
24GB mem
Storage Node
• IntelXeon processor
E3-1280 v2 @ 3.6
GHz
• 16 GB Memory
• 1xSSD for OS
• 10x1 TB 7200rpm
• 3x 480GB Intel® SSD
S530
CEPH1
MON
OSD1 OSD10…
CEPH2
OSD1 OSD10…
CEPH3
OSD1 OSD10…
CEPH4
OSD1 OSD10…
KVM
VM1
FIO FIO
VM20
CLIENT 1
KVM
VM1
FIO FIO
VM20
CLIENT 2
2x10Gb NIC
Note: See page #40, #41 for system configuration and benchmark data
Rados Gateway
(RGW)
Rados Gateway
• 1 nodes with Intel
Xeon processor
E5-2670 @
2.6GHz, 64GB mem
18
Ceph Object Performance – Test Results
Prepare the Data
Run COSBench
1. 100 containers x 100
objects each
2. 4 IOs: 128K Read/Write;
10M Read/write
3. 100s warm-up, 600s
test
4. COSBench workers – 1
to 2048
Note: COSBench is an Intel development open source cloud object storage benchmark
https://github.com/intel-cloud/cosbench
Note: See page #42, #43 for system configuration and benchmark data
CEPH Cluster Performance
Ceph Object performance is close to max cluster IO limit
#con x #
obj
Object-
Size
RW-Mode
Worker-
Count
Avg-
ResTime
95%-
ResTime
Throughput Bandwidth Bottleneck
-- -- -- ms ms op/s MB/s --
100x100
128KB
Read 80 10 20 7,951 971 RGW CPU
Write 320 143 340 2,243 274 OSD CPU
10MB
Read 160 1,365 3,870 117 1,118 RGW NIC
Write 160 3,819 6,530 42 397 OSD NIC
19
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
20
Ceph Cache Tiering
• An important feature towards Ceph
enterprise readiness
- Cost-effective
- High performance tier W/ more SSDs
- Use a pool of fast storage devices (Typically
SSDs) and use it as a cache for an existing
larger pool
 E.g., Reads would first check the cache pool for
a copy of the object, and then fall through to
the existing pool if there is a miss
• Cache tiering mode
- Read only
- Write back
Application
Ceph Storage Cluster
Cache Pool
(Replicated)
Backing Pool
(Erasure Coded)
21
Ceph Cache Tiering Optimization – Proxy
Read/write
Proxy the read/write operations while the object is missing in cache tier, promotion
in background
• 3.8x and 3.6x performance improvement respectively with proxy-write and proxy-read
optimization
Promotion logic
Cache tier
Base tier
Client
①
② ③
④
Promotion
logic
Cache tier
Base tier
Client
①
② ③
④
Proxy
logic
Current design Proxy design
0
2000
4000
6000
8000
0
500
1000
1500
2000
RR RW
Latency(ms)
IOPS
W/ CT IOPS W/ CT optimized IOPS
W/ CT Latency W/ CT optimized Latency
Cache Tiering optimization
4K random write
Proxy-read and write significantly improved cache tiering performance
Note: See page #37, #38, #39 for system configuration and benchmark data
22
Ceph Erasure Coding
Ceph Storage Cluster
Replicated Pool
COPY
COPY COPY
Object
Ceph Storage Cluster
Erasure Coded Pool
1
Object
2 3 4 X Y
Full Copies of stored objects
• Very high durability
• 3x (200% overhead)
• Quicker recovery
One Copy plus parity
• Cost-effective durability
• 1.5X (50% store overhead)
• Expensive recovery
23
Ceph EC optimization – I/O hint
• ISA-L EC library merged to Firefly
• EC performance
- Acceptable performance impact
 <10% degradation for 10M objects large scale read/write tests
- But: Compared with 3x replica, we now tolerate 40%
object loss with 1.6x space
• Rados I/O hint
- Provide a hint to the storage system to classify the
operate type brings differentiated storage services
- Balance throughput and cost, and boost Storage
performance
- With rados I/O hint optimization, we can get even higher
throughput compared W/O EC!
Client Node
RADOS Node
Hypervisor
Guest
VM
Qemu/Virtio
Application
RBD
RADOS
RADOS Protocol
OSD
FileStore NewStore…
SSD
Network
Hint
MemStore KVStore
Hint engine
35% performance improvement for EC write with Rados I/O Hint
SSD SSD SSD SSD
Hint engine Hint engine Hint engine
I/O hint policy engine
Hint
Note: See page #40, #41 for system configuration and benchmark data
24
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
25
How is Intel helping?
• Deliver Storage workload optimized products and technologies
- Optimize for Silicon, Flash* and Networking technologies
• Working with the community to optimize Ceph on IA
- Make Ceph run best on Intel® Architecture (IA) based platforms (performance,
reliability and management)
• Publish IA Reference Architectures
- Share best practices with the ecosystem and community
26
CPU
Platforms
NVM
Networking
Server
Boards &
Systems
Software
Libraries
Software
Products
Opensource
Ecosystem
Intel’s Product Portfolio for Ceph
Intel® Cache Acceleration
Software (Intel® CAS)
• OEMs / ODMs
• ISVs
Intel® Storage Acceleration
Library (Intel® ISA-L)
• Virtual Storage Manager
(VSM)
• Ceph contributions: No.2
• Cephperf†
• Reference Architectures
Solution focus with Intel® platform and software ingredients. Deep collaboration
with Red Hat* and Inktank* (by Red Hat) to deliver enterprise ready Ceph solutions.
†Cephperf to be open sourced in Q2’15
27
VSM– Ceph Simplified
Home page:
https://01.org/virtual-storage-manager
Code Repository:
https://github.com/01org/virtual-storage-
manager
Issue Tracking:
https://01.org/jira/browse/VSM
Mailing list:
http://vsm-discuss.33411.n7.nabble.com/
VSM (Virtual Storage Manager) - An open source Ceph management tool developed by
Intel, and announced on 2014 Nov’s OpenStack* Paris summit, designed to help make
day to day management of Ceph easier for storage administrators.
28
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
29
Mon
Caching
Compute Node
RADOS
Node
RADOS
Protocol
RADOS Protocol
OSD
Journal Filestore
SSD
File System
Network
SSD
MON
Ceph – Best Deployment Practices
Note: Refer to backup for detailed tuning recommendations.
• XFS, deadline scheduler
• Tune read_ahead_kb for
sequential I/O read
• Queue Depth – 64 for
sequential, 8 for random
Jumbo Frames for 10Gbps
Use 10x of default queue
params
• In small cluster, can co-
locate with OSDs with
4GB-6GB
• Beyond 100 OSDs (100
HDDs or 10 nodes),
deploy monitors on
separate nodes
• 3 monitors for < 200
nodes
• One OSD process per disk
• Approximately 1GHz Intel®
Xeon™-class CPU per OSD
• 1GB memory per OSD
30
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
31
Summary
• Cloud workloads and cost is driving the need for distributed storage
solutions
• Strong customer interest and lots of production implementations in
Ceph
• Intel is optimizing CEPH for Intel® Architecture
32
Next Steps
• Take advantage of Intel software optimizations and reference
architectures for your production deployments
- Pilot “cephperf” in Q2’15 and give us feedback http://01.org/cephperf
• Engage with open source communities for delivering enterprise features
• Innovate storage offerings using value added features with Ceph
33
Additional Sources of Information
• A PDF of this presentation is available from our Technical Session
Catalog: www.intel.com/idfsessionsSZ. This URL is also printed on the
top of Session Agenda Pages in the Pocket Guide.
• More web based info: http://ceph.com
• Intel® Solutions Reference Architectures www.intel.com/storage
• Intel® Storage Acceleration Library (Open Source Version) -
https://01.org/intel%C2%AE-storage-acceleration-library-open-source-
version
34
Legal Notices and Disclaimers
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com,
or from the OEM or retailer.
No computer system can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual
performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and
benchmark results, visit http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future
costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice.
Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a
number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual
report on Form 10-K.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current
characterized errata are available on request.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether
referenced data are accurate.
Intel, Xeon and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
*Other names and brands may be claimed as the property of others.
© 2015 Intel Corporation.
35
Risk Factors
The above statements and any others in this document that refer to plans and expectations for the first quarter, the year and the future are forward-
looking statements that involve a number of risks and uncertainties. Words such as "anticipates," "expects," "intends," "plans," "believes," "seeks,"
"estimates," "may," "will," "should" and their variations identify forward-looking statements. Statements that refer to or are based on projections,
uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel's actual results, and variances from Intel's
current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements.
Intel presently considers the following to be important factors that could cause actual results to differ materially from the company's expectations.
Demand for Intel’s products is highly variable and could differ from expectations due to factors including changes in the business and economic
conditions; consumer confidence or income levels; customer acceptance of Intel’s and competitors’ products; competitive and pricing pressures,
including actions taken by competitors; supply constraints and other disruptions affecting customers; changes in customer order patterns including
order cancellations; and changes in the level of inventory at customers. Intel’s gross margin percentage could vary significantly from expectations based
on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; changes in revenue
levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; excess or obsolete inventory; changes in unit
costs; defects or disruptions in the supply of materials or resources; and product manufacturing quality/yields. Variations in gross margin may also be
caused by the timing of Intel product introductions and related expenses, including marketing expenses, and Intel’s ability to respond quickly to
technological developments and to introduce new features into existing products, which may result in restructuring and asset impairment charges.
Intel's results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its
suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in
currency exchange rates. Results may also be affected by the formal or informal imposition by countries of new or revised export and/or import and
doing-business regulations, which could be changed without prior notice. Intel operates in highly competitive industries and its operations have high
costs that are either fixed or difficult to reduce in the short term. The amount, timing and execution of Intel’s stock repurchase program and dividend
program could be affected by changes in Intel’s priorities for the use of cash, such as operational spending, capital spending, acquisitions, and as a result
of changes to Intel’s cash flows and changes in tax laws. Product defects or errata (deviations from published specifications) may adversely impact our
expenses, revenues and reputation. Intel’s results could be affected by litigation or regulatory matters involving intellectual property, stockholder,
consumer, antitrust, disclosure and other issues. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from
manufacturing or selling one or more products, precluding particular business practices, impacting Intel’s ability to design its products, or requiring
other remedies such as compulsory licensing of intellectual property. Intel’s results may be affected by the timing of closing of acquisitions, divestitures
and other significant transactions. A detailed discussion of these and other factors that could affect Intel’s results is included in Intel’s SEC filings,
including the company’s most recent reports on Form 10-Q, Form 10-K and earnings release.
Rev. 1/15/15
36
Backup
37
Block Test Environment - Configuration Details
Client Nodes
CPU 2 x Intel ®Xeon™ x5570 @ 2.93GHz (4-core, 8 threads) (Qty: 2)
2x Intel Xeon E5 2680 @2.8GHz (16-core, 32 threads) (Qty: 1)
Memory 128 GB (8GB * 16 DDR3 1333 MHZ) or 56GB (8GB * 7) for E5 server
NIC 10Gb 82599EB
Disks 1 HDD for OS
Client VM
CPU 1 X VCPU VCPUPIN
Memory 512 MB
Ceph Nodes
CPU 1 x Intel Xeon E3-1275 V2 @ 3.5 GHz (4-core, 8
threads)
Memory 32 GB (4 x 8GB DDR3 @ 1600 MHz)
NIC 2 X 82599 10GbE
HBA/C204 {SAS2008 PCI Express* Fusion-MPT SAS-2} / {6
Series/C200 Series Chipset Family SATA AHCI
Controller}
Disks 1 x SSDSA2SH064G1GC 2.5’’ 64GB for OS
2 x Intel SSDSC2BA40 400 GB SSD (Journal)
10 x Seagate* ST3000NM0033-9ZM 3.5’’ 3TB 7200rpm
SATA HDD (Data)
Ceph cluster
OS CentOS 6.5
Kernel 2.6.32-431
Ceph 0.61.2 built from source
Client host
OS Ubuntu* 12.10
Kernel 3.6.3
Client VM
OS Ubuntu 12.10
Kernel 3.5.0-17
• XFS as file system for Data Disk
• Each Data Disk (SATA HDD) was parted into 1
partition for OSD daemon
• Default replication setting (2 replicas), 7872 pgs.
• Tunings
- Set read_ahead_kb=2048
- MTU= 8000
• Change i/o scheduler to [deadline]:
# Echo deadline >/sys/block/[dev]/queue/scheduler
38
Ceph RBD Tuned Configuration
[global]
debug default = 0
log file = /var/log/ceph/$name.log
max open files = 131072
auth cluster required = none
auth service required = none
auth client required = none
[osd]
osd mkfs type = xfs
osd mount options xfs =
rw,noatime,inode64,logbsize=256k,delaylog
osd mkfs options xfs = -f -i size=2048
filestore max inline xattr size = 254
filestore max inline xattrs = 6
osd_op_threads=20
filestore_queue_max_ops=500
filestore_queue_committing_max_ops=5000
journal_max_write_entries=1000
journal_queue_max_ops=3000
objecter_inflight_ops=10240
filestore_queue_max_bytes=1048576000
filestore_queue_committing_max_bytes
=1048576000
journal_max_write_bytes=1048576000
journal_queue_max_bytes=1048576000
ms_dispatch_throttle_bytes=1048576000
objecter_infilght_op_bytes=1048576000
filestore_max_sync_interval=10
filestore_flusher=false
filestore_flush_min=0
filestore_sync_flush=true
39
Testing Methodology
Storage interface
Use QemuRBD as storage interface
Tool
• Use “dd” to prepare data for R/W tests
• Use fio (ioengine=libaio, direct=1) to generate 4 IO
patterns: sequential write/read, random write/read
• Access Span: 60GB
• For capping tests, Seq Read/Write (60MB/s), and
Rand Read/Write (100 ops/s)
• QoS Compliance:
- For random 4k read/write cases: latency <= 20ms
- For sequential 64K read/write cases: BW >= 54 MB/s
Run rules
• Drop osds page caches ( “1” > /proc/sys/vm/drop_caches)
• 100 secs for warm up, 600 secs for data collection
• Run 4KB/64KB tests under different # of rbds (1 to 120)
Space allocation (per node)
• Data Drive:
- Sits on 10x 3TB HDD drives
- So 4800GB/40 * 2 = 240GB data space will be used on
each Data disk at 80 VMs.
• Journal:
- Sits on 2x 400GB SSD drives
- One journal partition per data drive, 10GB
40
Object Test Environment - Configuration Details
Client & RGW
CPU Client: 2 x Intel ®Xeon™ x5570 @ 2.93GHz (4-core,
8 threads) (Qty: 2)
GRW: 2x Intel Xeon E5 2670@2.6GHz (16-core,
32 threads) (Qty: 1)
Memory 128 GB (8GB * 16 DDR3 1333 MHZ) or 56GB (8GB *
7) for E5 server
NIC 10Gb 82599EB
Disks 1 HDD for OS
Ceph OSD Nodes
CPU 1 x Intel Xeon E3-1280 V2 @ 3.6 GHz (4-core, 8
threads)
Memory 32 GB (4 x 8GB DDR3 @ 1600 MHz)
NIC 2 X 82599 10GbE
HBA/C204 {SAS2308 PCI Express* Fusion-MPT SAS-2} / {6
Series/C200 Series Chipset Family SATA AHCI
Controller}
Disks 1 x SSDSA2SH064G1GC 2.5’’ 64GB for OS
3 x Intel SSDSC2CW480A3 480 GB SSD (Journal)
10 x Seagate* ST1000NM0011 3.5’’ 1TB 7200rpm
SATA HDD (Data)
Ceph cluster
OS Ubuntu 14.04
Kernel 3.13.0
Ceph 0.61.8 built from source
Client host
OS Ubuntu* 12.10
Kernel 3.6.3
• XFS as file system for Data Disk
• Each Data Disk (SATA HDD) was parted into 1
partition for OSD daemon
• Default replication setting (3 replicas), 12416 pgs.
• Tunings
- Set read_ahead_kb=2048
- MTU= 8000
41
Ceph Rados Gateway (RGW) Tuning
ceph-gateway
–rgw enable ops log = false
–rgw enable usage log = false
–rgw thread pool size = 256
–Log disabled
apache2-http-server
–<IfModule mpm_worker_module>
– ServerLimit 50
– StartServers 5
– MinSpareThreads 25
– MaxSpareThreads 75
– ThreadLimit 100
– ThreadsPerChild 100
– MaxClients 5000
– MaxRequestsPerChild 0
–</IfModule>
GW instances
–5 instances in one GW server
–Access log turned off

More Related Content

What's hot

Nick Fisk - low latency Ceph
Nick Fisk - low latency CephNick Fisk - low latency Ceph
Nick Fisk - low latency Ceph
ShapeBlue
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year In
Sage Weil
 
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
OpenStack Korea Community
 
Ceph Tech Talk -- Ceph Benchmarking Tool
Ceph Tech Talk -- Ceph Benchmarking ToolCeph Tech Talk -- Ceph Benchmarking Tool
Ceph Tech Talk -- Ceph Benchmarking Tool
Ceph Community
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
Sage Weil
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
Jose De La Rosa
 
Ceph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOceanCeph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOcean
Ceph Community
 
Ceph RBD Update - June 2021
Ceph RBD Update - June 2021Ceph RBD Update - June 2021
Ceph RBD Update - June 2021
Ceph Community
 
Ceph c01
Ceph c01Ceph c01
Ceph c01
Lâm Đào
 
Ceph - A distributed storage system
Ceph - A distributed storage systemCeph - A distributed storage system
Ceph - A distributed storage system
Italo Santos
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-short
NAVER D2
 
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li XiaoyanPerformance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
Ceph Community
 
Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion Objects
Karan Singh
 
OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep dive
Trinath Somanchi
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about ceph
Emma Haruka Iwao
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
Red_Hat_Storage
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
Ceph Community
 
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSEUnderstanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
OpenStack
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
OpenStack Korea Community
 
RBD: What will the future bring? - Jason Dillaman
RBD: What will the future bring? - Jason DillamanRBD: What will the future bring? - Jason Dillaman
RBD: What will the future bring? - Jason Dillaman
Ceph Community
 

What's hot (20)

Nick Fisk - low latency Ceph
Nick Fisk - low latency CephNick Fisk - low latency Ceph
Nick Fisk - low latency Ceph
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year In
 
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
 
Ceph Tech Talk -- Ceph Benchmarking Tool
Ceph Tech Talk -- Ceph Benchmarking ToolCeph Tech Talk -- Ceph Benchmarking Tool
Ceph Tech Talk -- Ceph Benchmarking Tool
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
Ceph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOceanCeph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOcean
 
Ceph RBD Update - June 2021
Ceph RBD Update - June 2021Ceph RBD Update - June 2021
Ceph RBD Update - June 2021
 
Ceph c01
Ceph c01Ceph c01
Ceph c01
 
Ceph - A distributed storage system
Ceph - A distributed storage systemCeph - A distributed storage system
Ceph - A distributed storage system
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-short
 
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li XiaoyanPerformance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
 
Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion Objects
 
OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep dive
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about ceph
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
 
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSEUnderstanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
RBD: What will the future bring? - Jason Dillaman
RBD: What will the future bring? - Jason DillamanRBD: What will the future bring? - Jason Dillaman
RBD: What will the future bring? - Jason Dillaman
 

Similar to Ceph: Open Source Storage Software Optimizations on Intel® Architecture for Cloud Workloads

Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
Ceph Community
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
Patrick McGarry
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Community
 
Ceph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in Ceph
Ceph Community
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for Ceph
Danielle Womboldt
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Community
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Community
 
optimizing_ceph_flash
optimizing_ceph_flashoptimizing_ceph_flash
optimizing_ceph_flash
Vijayendra Shamanna
 
Red Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference Architectures
Red_Hat_Storage
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red_Hat_Storage
 
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
In-Memory Computing Summit
 
Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale
NetApp
 
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Community
 
Ceph: Low Fail Go Scale
Ceph: Low Fail Go Scale Ceph: Low Fail Go Scale
Ceph: Low Fail Go Scale
Ceph Community
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
In-Memory Computing Summit
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Danielle Womboldt
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Community
 
PhegData X - High Performance EBS
PhegData X - High Performance EBSPhegData X - High Performance EBS
PhegData X - High Performance EBS
Hanson Dong
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf
hellobank1
 

Similar to Ceph: Open Source Storage Software Optimizations on Intel® Architecture for Cloud Workloads (20)

Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
 
Ceph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in CephCeph Day Beijing - SPDK in Ceph
Ceph Day Beijing - SPDK in Ceph
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for Ceph
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
 
optimizing_ceph_flash
optimizing_ceph_flashoptimizing_ceph_flash
optimizing_ceph_flash
 
Red Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference Architectures
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
 
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
 
Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale
 
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
 
Ceph: Low Fail Go Scale
Ceph: Low Fail Go Scale Ceph: Low Fail Go Scale
Ceph: Low Fail Go Scale
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
 
PhegData X - High Performance EBS
PhegData X - High Performance EBSPhegData X - High Performance EBS
PhegData X - High Performance EBS
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf
 

More from Odinot Stanislas

Silicon Photonics and datacenter
Silicon Photonics and datacenterSilicon Photonics and datacenter
Silicon Photonics and datacenter
Odinot Stanislas
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
Odinot Stanislas
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Odinot Stanislas
 
SDN/NFV: Service Chaining
SDN/NFV: Service Chaining SDN/NFV: Service Chaining
SDN/NFV: Service Chaining
Odinot Stanislas
 
SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)
Odinot Stanislas
 
PCI Express* based Storage: Data Center NVM Express* Platform Topologies
PCI Express* based Storage: Data Center NVM Express* Platform TopologiesPCI Express* based Storage: Data Center NVM Express* Platform Topologies
PCI Express* based Storage: Data Center NVM Express* Platform Topologies
Odinot Stanislas
 
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
Odinot Stanislas
 
Software Defined Storage - Open Framework and Intel® Architecture Technologies
Software Defined Storage - Open Framework and Intel® Architecture TechnologiesSoftware Defined Storage - Open Framework and Intel® Architecture Technologies
Software Defined Storage - Open Framework and Intel® Architecture Technologies
Odinot Stanislas
 
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)Virtualizing the Network to enable a Software Defined Infrastructure (SDI)
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)
Odinot Stanislas
 
Accelerate the SDN with Intel ONP
Accelerate the SDN with Intel ONPAccelerate the SDN with Intel ONP
Accelerate the SDN with Intel ONP
Odinot Stanislas
 
Moving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM ExpressMoving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM Express
Odinot Stanislas
 
Intel Cloud Builder : Siveo
Intel Cloud Builder : SiveoIntel Cloud Builder : Siveo
Intel Cloud Builder : Siveo
Odinot Stanislas
 
Configuration and deployment guide for SWIFT on Intel Architecture
Configuration and deployment guide for SWIFT on Intel ArchitectureConfiguration and deployment guide for SWIFT on Intel Architecture
Configuration and deployment guide for SWIFT on Intel Architecture
Odinot Stanislas
 
Intel IT Open Cloud - What's under the Hood and How do we Drive it?
Intel IT Open Cloud - What's under the Hood and How do we Drive it?Intel IT Open Cloud - What's under the Hood and How do we Drive it?
Intel IT Open Cloud - What's under the Hood and How do we Drive it?
Odinot Stanislas
 
Configuration and Deployment Guide For Memcached on Intel® Architecture
Configuration and Deployment Guide For Memcached on Intel® ArchitectureConfiguration and Deployment Guide For Memcached on Intel® Architecture
Configuration and Deployment Guide For Memcached on Intel® Architecture
Odinot Stanislas
 
Améliorer OpenStack avec les technologies Intel
Améliorer OpenStack avec les technologies IntelAméliorer OpenStack avec les technologies Intel
Améliorer OpenStack avec les technologies Intel
Odinot Stanislas
 
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Odinot Stanislas
 
Big Data and Intel® Intelligent Systems Solution for Intelligent transportation
Big Data and Intel® Intelligent Systems Solution for Intelligent transportationBig Data and Intel® Intelligent Systems Solution for Intelligent transportation
Big Data and Intel® Intelligent Systems Solution for Intelligent transportation
Odinot Stanislas
 
Big Data Solutions for Healthcare
Big Data Solutions for HealthcareBig Data Solutions for Healthcare
Big Data Solutions for Healthcare
Odinot Stanislas
 
Protect Your Big Data with Intel<sup>®</sup> Xeon<sup>®</sup> Processors a..
Protect Your Big Data with Intel<sup>®</sup> Xeon<sup>®</sup> Processors a..Protect Your Big Data with Intel<sup>®</sup> Xeon<sup>®</sup> Processors a..
Protect Your Big Data with Intel<sup>®</sup> Xeon<sup>®</sup> Processors a..
Odinot Stanislas
 

More from Odinot Stanislas (20)

Silicon Photonics and datacenter
Silicon Photonics and datacenterSilicon Photonics and datacenter
Silicon Photonics and datacenter
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
 
SDN/NFV: Service Chaining
SDN/NFV: Service Chaining SDN/NFV: Service Chaining
SDN/NFV: Service Chaining
 
SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)
 
PCI Express* based Storage: Data Center NVM Express* Platform Topologies
PCI Express* based Storage: Data Center NVM Express* Platform TopologiesPCI Express* based Storage: Data Center NVM Express* Platform Topologies
PCI Express* based Storage: Data Center NVM Express* Platform Topologies
 
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
Bare-metal, Docker Containers, and Virtualization: The Growing Choices for Cl...
 
Software Defined Storage - Open Framework and Intel® Architecture Technologies
Software Defined Storage - Open Framework and Intel® Architecture TechnologiesSoftware Defined Storage - Open Framework and Intel® Architecture Technologies
Software Defined Storage - Open Framework and Intel® Architecture Technologies
 
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)Virtualizing the Network to enable a Software Defined Infrastructure (SDI)
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)
 
Accelerate the SDN with Intel ONP
Accelerate the SDN with Intel ONPAccelerate the SDN with Intel ONP
Accelerate the SDN with Intel ONP
 
Moving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM ExpressMoving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM Express
 
Intel Cloud Builder : Siveo
Intel Cloud Builder : SiveoIntel Cloud Builder : Siveo
Intel Cloud Builder : Siveo
 
Configuration and deployment guide for SWIFT on Intel Architecture
Configuration and deployment guide for SWIFT on Intel ArchitectureConfiguration and deployment guide for SWIFT on Intel Architecture
Configuration and deployment guide for SWIFT on Intel Architecture
 
Intel IT Open Cloud - What's under the Hood and How do we Drive it?
Intel IT Open Cloud - What's under the Hood and How do we Drive it?Intel IT Open Cloud - What's under the Hood and How do we Drive it?
Intel IT Open Cloud - What's under the Hood and How do we Drive it?
 
Configuration and Deployment Guide For Memcached on Intel® Architecture
Configuration and Deployment Guide For Memcached on Intel® ArchitectureConfiguration and Deployment Guide For Memcached on Intel® Architecture
Configuration and Deployment Guide For Memcached on Intel® Architecture
 
Améliorer OpenStack avec les technologies Intel
Améliorer OpenStack avec les technologies IntelAméliorer OpenStack avec les technologies Intel
Améliorer OpenStack avec les technologies Intel
 
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
 
Big Data and Intel® Intelligent Systems Solution for Intelligent transportation
Big Data and Intel® Intelligent Systems Solution for Intelligent transportationBig Data and Intel® Intelligent Systems Solution for Intelligent transportation
Big Data and Intel® Intelligent Systems Solution for Intelligent transportation
 
Big Data Solutions for Healthcare
Big Data Solutions for HealthcareBig Data Solutions for Healthcare
Big Data Solutions for Healthcare
 
Protect Your Big Data with Intel<sup>®</sup> Xeon<sup>®</sup> Processors a..
Protect Your Big Data with Intel<sup>®</sup> Xeon<sup>®</sup> Processors a..Protect Your Big Data with Intel<sup>®</sup> Xeon<sup>®</sup> Processors a..
Protect Your Big Data with Intel<sup>®</sup> Xeon<sup>®</sup> Processors a..
 

Recently uploaded

GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
Fwdays
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
LizaNolte
 

Recently uploaded (20)

GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 

Ceph: Open Source Storage Software Optimizations on Intel® Architecture for Cloud Workloads

  • 1. 1 Make the Future with China! Ceph: Open Source Storage Software Optimizations on Intel® Architecture for Cloud Workloads Jian Zhang – Software Engineer, Intel Corporation DATS005
  • 2. 2 Agenda • The Problem • Ceph Introduction • Ceph Performance • Ceph Cache Tiering and Erasure Code • Intel Product Portfolio for Ceph • Ceph Best Practices • Summary
  • 3. 3 Agenda • The Problem • Ceph Introduction • Ceph Performance • Ceph Cache Tiering and Erasure Code • Intel Product Portfolio for Ceph • Ceph Best Practices • Summary
  • 4. 4 The Problem: Data Big Bang From 2013 to 2020, the digital universe will grow by a factor of 10, from 4.4 ZB to 44 ZB It more than doubles every two years. Data needs are growing at a rate unsustainable with today’s infrastructure and labor costs Source: IDC – The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things - April 2014 1 2 3 4 7 12 19 30 48 77 125 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 1.2 COST CHALLENGES COTINUE TO GROW Storage Cost structure needs a fundamental shift Storage Capacity in TB 62% CAGR IT Budget 2% CAGR IT PROS WILL SHOULDER A GREATER STORAGE BURDEN 230 GB Per IT Pro 1,231 GB Per IT Pro
  • 5. 5 Diverse Workloads & Cost Drive Need for Distributed Storage Traditional Workloads Today’s Workloads Diverse Workloads Management Scale on Demand Increasing Complexity Cost Traditional Workloads Today’s Trends Storage Fabric Network Fabric Mobile HPC Big Data & Analytics CloudOLTP Email ERP Apps Apps Challenges CRM
  • 6. 6 Distributed Storage Traditional Scale-up Model SAN (Storage Area Network) Server High Availability (Failover) High perf workloads (e.g., database) Enterprise Mission Critical Hybrid Cloud Limited Scale Costly (Cap-ex and Op-ex) Pay as you Grow, massive on-demand scale Cost, Performance optimized Open and commercial solutions on x86 servers Applicable to cloud workloads Not a good fit for traditional high perf workloads • Ceph is the most popular† open source virtual block storage option. Also provides object, file (experimental). • Strong customer interest - several production implementations already. Distributed Storage Model Application Servers Storage Nodes Storage Client Metadata Servers Converged Network † OpenStack User Survey Insights: http://superuser.openstack.org/articles/openstack-user-survey-insights-november-2014
  • 7. 7 Agenda • The Problem • Ceph Introduction • Ceph Performance • Ceph Cache Tiering and Erasure Code • Intel Product Portfolio for Ceph • Ceph Best Practices • Summary
  • 8. 8 Ceph Introduction • Ceph is an open-source, massively scalable, software-defined storage system which provides object, block and file system storage in a single platform. It runs on commodity hardware—saving you costs, giving you flexibility—and because it’s in the Linux* kernel, it’s easy to consume. • Object Store (RADOSGW) - A bucket based REST gateway - Compatible with S3 and swift • File System (CEPH FS) - A POSIX-compliant distributed file system - Kernel client and FUSE • Block device service (RBD) - OpenStack* native support - Kernel client and QEMU/KVM driver RADOS A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors LIBRADOS A library allowing apps to directly access RADOS RGW A web services gateway for object storage Application RBD A reliable, fully distributed block device CephFS A distributed file system with POSIX semantics Host/VM Client
  • 9. 9 Ceph Cluster Overview Client Servers Storage Servers • Ceph Clients - Block/Object/File system storage - User space or kernel driver • Peer to Peer via Ethernet - Direct access to storage - No centralized metadata = no bottlenecks • Ceph Storage Nodes - Data distributed and replicated across nodes - No single point of failure - Scale capacity and performance with additional nodes Ceph scales to 1000s of nodes Ethernet KVM Application Guest OS Application Guest OS KVM Application Guest OS Application Guest OS KVM Application Guest OS Application Guest OS RADOS RBD Object File RADOS RBD Object File RADOS RBD Object File SSD SSD OSDOSD OSD SSD SSD OSDOSD OSD SSD SSD OSDOSD OSD SSD SSD OSDOSD OSD MON MON
  • 10. 10 Object Store Daemon (OSD) Read and Write Flow OSD Disk OSD OSD OSD Disk Disk Disk Compute Node RBD RADOS App OSD Disk OSD OSD OSD Disk Disk Disk OSD Disk OSD OSD OSD Disk Disk Disk 1 2 2 3 3 4 Server Server Server Compute Node RBD RADOS App 1 2 WriteRead 1 2 3 4 Client app writes data, RADOS sends data to primary OSD Primary OSD identifies replica OSDs and sends data, writes data to local disk Replica OSDs write data to local disk, signal completion to primary Primary OSD signals completion to client app Client app issues read request, RADOS sends request to primary OSD Primary OSD reads data from local disk and completes read request 1 2
  • 11. 11 Ceph WorkloadsStoragePerformance (IOPS,Throughput) Storage Capacity (PB) Lower Higher Virtual Machines LowerHigher Virtual Block Media Streaming Enterprise Dropbox Backup, Archive Remote Disks VDI App Storage HPC Batch analytics Mobile Content Depot Databases Block Object Current Future Intel Focus Test & Dev FS
  • 12. 12 Agenda • The Problem • Ceph Introduction • Ceph Performance • Ceph Cache Tiering and Erasure Code • Intel Product Portfolio for Ceph • Ceph Best Practices • Summary
  • 13. 13 Ceph Block Performance – Configuration 1x10Gb NIC Test Environment Compute Node • 2 nodes with Intel® Xeon™ processor x5570 @ 2.93GHz, 128GB mem • 1 node with IntelXeon processor E5 2680 @2.8GHz, 56GB mem Storage Node • IntelXeon processor E3-1275 v2 @ 3.5 GHz • 32GB Memory • 1xSSD for OS • 10x 3 TB 7200rpm • 2x 400GB Intel® SSD DC S3700 CEPH1 MON OSD1 OSD10… CEPH2 OSD1 OSD10… CEPH3 OSD1 OSD10… CEPH4 OSD1 OSD10… KVM VM1 FIO FIO VM40 CLIENT 1 KVM VM1 FIO FIO VM40 CLIENT 2 KVM VM1 FIO FIO VM40 CLIENT 3 2x10Gb NIC Note: See page #37, #38, #39 for system configuration and benchmark data
  • 14. 14 Ceph Block Performance – Measure Raw Performance 3 TB 7200 rpm 1 Run FIO on one HDD, collect disk IO performance Note: Sequential 64K (Client) = Random 512K (Ceph OSD) 2 Estimate cluster performance (include replication overhead for writes – 2x in this test) 3 TB 7200 rpm3 TB 7200 rpm3 TB 7200 rpm20 HDDs (Writes) 40 HDDs (Reads) 70 MB/s 80 MB/s 270 IOPS 160 IOPS Random 512K Write Read Random 4K Write Read 1400 MB/s 3200 MB/s 5400 IOPS 6400 IOPS Write Read Random 4K Write Read Random 512K Note: See page #37, #38, #39 for system configuration and benchmark data
  • 15. 15 0 1000 2000 3000 4000 5000 6000 7000 4K Rand Read - IOPS 4K Rand Write - IOPS 64K Seq Read - MB/s 64K Seq Write - MB/s Maximum Possible Peak Observed Ceph Block Performance – Test Results Drop OSD Cache Prepare Data (dd) Run FIO 1. 60GB Span 2. 4 IOs: Sequential (W,R), Random (W, R) 3. 100s warm-up, 600s test 4. RBD images – 1 to 120 Note: Random tests use Queue Depth=8, Sequential tests use Queue Depth=64 See page #39, #40, #41 for system configuration and benchmark data CEPH Cluster Performance 88% 64% 102% 94% Ceph performance is close to max cluster IO limit for all but random writes – room for further optimizations
  • 16. 16 Ceph Block Performance – Tuning effects 0 1000 2000 3000 4000 5000 6000 7000 64K Seq Write - MB/s 64K Seq Read - MB/s 4K Rand Write - IOPS 4K Rand Read - IOPS Ceph Block Performance Tuning Impact --compared with default ceph.conf Throughput W/ Tunings Throughput 2.36x 1.88x 3.59x 2.00x Best Tuning Knobs Read ahead = 2048 I/O merge & write cache Omap data on a separate Disk Large pg number: 81920 Note: See page #37, #38, #39 for system configuration and benchmark data
  • 17. 17 Ceph Object Performance – Configuration 1x10Gb NIC Test Environment Client Node • 2 nodes with Intel® Xeon™ processor x5570 @ 2.93GHz, 24GB mem Storage Node • IntelXeon processor E3-1280 v2 @ 3.6 GHz • 16 GB Memory • 1xSSD for OS • 10x1 TB 7200rpm • 3x 480GB Intel® SSD S530 CEPH1 MON OSD1 OSD10… CEPH2 OSD1 OSD10… CEPH3 OSD1 OSD10… CEPH4 OSD1 OSD10… KVM VM1 FIO FIO VM20 CLIENT 1 KVM VM1 FIO FIO VM20 CLIENT 2 2x10Gb NIC Note: See page #40, #41 for system configuration and benchmark data Rados Gateway (RGW) Rados Gateway • 1 nodes with Intel Xeon processor E5-2670 @ 2.6GHz, 64GB mem
  • 18. 18 Ceph Object Performance – Test Results Prepare the Data Run COSBench 1. 100 containers x 100 objects each 2. 4 IOs: 128K Read/Write; 10M Read/write 3. 100s warm-up, 600s test 4. COSBench workers – 1 to 2048 Note: COSBench is an Intel development open source cloud object storage benchmark https://github.com/intel-cloud/cosbench Note: See page #42, #43 for system configuration and benchmark data CEPH Cluster Performance Ceph Object performance is close to max cluster IO limit #con x # obj Object- Size RW-Mode Worker- Count Avg- ResTime 95%- ResTime Throughput Bandwidth Bottleneck -- -- -- ms ms op/s MB/s -- 100x100 128KB Read 80 10 20 7,951 971 RGW CPU Write 320 143 340 2,243 274 OSD CPU 10MB Read 160 1,365 3,870 117 1,118 RGW NIC Write 160 3,819 6,530 42 397 OSD NIC
  • 19. 19 Agenda • The Problem • Ceph Introduction • Ceph Performance • Ceph Cache Tiering and Erasure Code • Intel Product Portfolio for Ceph • Ceph Best Practices • Summary
  • 20. 20 Ceph Cache Tiering • An important feature towards Ceph enterprise readiness - Cost-effective - High performance tier W/ more SSDs - Use a pool of fast storage devices (Typically SSDs) and use it as a cache for an existing larger pool  E.g., Reads would first check the cache pool for a copy of the object, and then fall through to the existing pool if there is a miss • Cache tiering mode - Read only - Write back Application Ceph Storage Cluster Cache Pool (Replicated) Backing Pool (Erasure Coded)
  • 21. 21 Ceph Cache Tiering Optimization – Proxy Read/write Proxy the read/write operations while the object is missing in cache tier, promotion in background • 3.8x and 3.6x performance improvement respectively with proxy-write and proxy-read optimization Promotion logic Cache tier Base tier Client ① ② ③ ④ Promotion logic Cache tier Base tier Client ① ② ③ ④ Proxy logic Current design Proxy design 0 2000 4000 6000 8000 0 500 1000 1500 2000 RR RW Latency(ms) IOPS W/ CT IOPS W/ CT optimized IOPS W/ CT Latency W/ CT optimized Latency Cache Tiering optimization 4K random write Proxy-read and write significantly improved cache tiering performance Note: See page #37, #38, #39 for system configuration and benchmark data
  • 22. 22 Ceph Erasure Coding Ceph Storage Cluster Replicated Pool COPY COPY COPY Object Ceph Storage Cluster Erasure Coded Pool 1 Object 2 3 4 X Y Full Copies of stored objects • Very high durability • 3x (200% overhead) • Quicker recovery One Copy plus parity • Cost-effective durability • 1.5X (50% store overhead) • Expensive recovery
  • 23. 23 Ceph EC optimization – I/O hint • ISA-L EC library merged to Firefly • EC performance - Acceptable performance impact  <10% degradation for 10M objects large scale read/write tests - But: Compared with 3x replica, we now tolerate 40% object loss with 1.6x space • Rados I/O hint - Provide a hint to the storage system to classify the operate type brings differentiated storage services - Balance throughput and cost, and boost Storage performance - With rados I/O hint optimization, we can get even higher throughput compared W/O EC! Client Node RADOS Node Hypervisor Guest VM Qemu/Virtio Application RBD RADOS RADOS Protocol OSD FileStore NewStore… SSD Network Hint MemStore KVStore Hint engine 35% performance improvement for EC write with Rados I/O Hint SSD SSD SSD SSD Hint engine Hint engine Hint engine I/O hint policy engine Hint Note: See page #40, #41 for system configuration and benchmark data
  • 24. 24 Agenda • The Problem • Ceph Introduction • Ceph Performance • Ceph Cache Tiering and Erasure Code • Intel Product Portfolio for Ceph • Ceph Best Practices • Summary
  • 25. 25 How is Intel helping? • Deliver Storage workload optimized products and technologies - Optimize for Silicon, Flash* and Networking technologies • Working with the community to optimize Ceph on IA - Make Ceph run best on Intel® Architecture (IA) based platforms (performance, reliability and management) • Publish IA Reference Architectures - Share best practices with the ecosystem and community
  • 26. 26 CPU Platforms NVM Networking Server Boards & Systems Software Libraries Software Products Opensource Ecosystem Intel’s Product Portfolio for Ceph Intel® Cache Acceleration Software (Intel® CAS) • OEMs / ODMs • ISVs Intel® Storage Acceleration Library (Intel® ISA-L) • Virtual Storage Manager (VSM) • Ceph contributions: No.2 • Cephperf† • Reference Architectures Solution focus with Intel® platform and software ingredients. Deep collaboration with Red Hat* and Inktank* (by Red Hat) to deliver enterprise ready Ceph solutions. †Cephperf to be open sourced in Q2’15
  • 27. 27 VSM– Ceph Simplified Home page: https://01.org/virtual-storage-manager Code Repository: https://github.com/01org/virtual-storage- manager Issue Tracking: https://01.org/jira/browse/VSM Mailing list: http://vsm-discuss.33411.n7.nabble.com/ VSM (Virtual Storage Manager) - An open source Ceph management tool developed by Intel, and announced on 2014 Nov’s OpenStack* Paris summit, designed to help make day to day management of Ceph easier for storage administrators.
  • 28. 28 Agenda • The Problem • Ceph Introduction • Ceph Performance • Ceph Cache Tiering and Erasure Code • Intel Product Portfolio for Ceph • Ceph Best Practices • Summary
  • 29. 29 Mon Caching Compute Node RADOS Node RADOS Protocol RADOS Protocol OSD Journal Filestore SSD File System Network SSD MON Ceph – Best Deployment Practices Note: Refer to backup for detailed tuning recommendations. • XFS, deadline scheduler • Tune read_ahead_kb for sequential I/O read • Queue Depth – 64 for sequential, 8 for random Jumbo Frames for 10Gbps Use 10x of default queue params • In small cluster, can co- locate with OSDs with 4GB-6GB • Beyond 100 OSDs (100 HDDs or 10 nodes), deploy monitors on separate nodes • 3 monitors for < 200 nodes • One OSD process per disk • Approximately 1GHz Intel® Xeon™-class CPU per OSD • 1GB memory per OSD
  • 30. 30 Agenda • The Problem • Ceph Introduction • Ceph Performance • Ceph Cache Tiering and Erasure Code • Intel Product Portfolio for Ceph • Ceph Best Practices • Summary
  • 31. 31 Summary • Cloud workloads and cost is driving the need for distributed storage solutions • Strong customer interest and lots of production implementations in Ceph • Intel is optimizing CEPH for Intel® Architecture
  • 32. 32 Next Steps • Take advantage of Intel software optimizations and reference architectures for your production deployments - Pilot “cephperf” in Q2’15 and give us feedback http://01.org/cephperf • Engage with open source communities for delivering enterprise features • Innovate storage offerings using value added features with Ceph
  • 33. 33 Additional Sources of Information • A PDF of this presentation is available from our Technical Session Catalog: www.intel.com/idfsessionsSZ. This URL is also printed on the top of Session Agenda Pages in the Pocket Guide. • More web based info: http://ceph.com • Intel® Solutions Reference Architectures www.intel.com/storage • Intel® Storage Acceleration Library (Open Source Version) - https://01.org/intel%C2%AE-storage-acceleration-library-open-source- version
  • 34. 34 Legal Notices and Disclaimers Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Intel, Xeon and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. © 2015 Intel Corporation.
  • 35. 35 Risk Factors The above statements and any others in this document that refer to plans and expectations for the first quarter, the year and the future are forward- looking statements that involve a number of risks and uncertainties. Words such as "anticipates," "expects," "intends," "plans," "believes," "seeks," "estimates," "may," "will," "should" and their variations identify forward-looking statements. Statements that refer to or are based on projections, uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel's actual results, and variances from Intel's current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following to be important factors that could cause actual results to differ materially from the company's expectations. Demand for Intel’s products is highly variable and could differ from expectations due to factors including changes in the business and economic conditions; consumer confidence or income levels; customer acceptance of Intel’s and competitors’ products; competitive and pricing pressures, including actions taken by competitors; supply constraints and other disruptions affecting customers; changes in customer order patterns including order cancellations; and changes in the level of inventory at customers. Intel’s gross margin percentage could vary significantly from expectations based on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; changes in revenue levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; excess or obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or resources; and product manufacturing quality/yields. Variations in gross margin may also be caused by the timing of Intel product introductions and related expenses, including marketing expenses, and Intel’s ability to respond quickly to technological developments and to introduce new features into existing products, which may result in restructuring and asset impairment charges. Intel's results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Results may also be affected by the formal or informal imposition by countries of new or revised export and/or import and doing-business regulations, which could be changed without prior notice. Intel operates in highly competitive industries and its operations have high costs that are either fixed or difficult to reduce in the short term. The amount, timing and execution of Intel’s stock repurchase program and dividend program could be affected by changes in Intel’s priorities for the use of cash, such as operational spending, capital spending, acquisitions, and as a result of changes to Intel’s cash flows and changes in tax laws. Product defects or errata (deviations from published specifications) may adversely impact our expenses, revenues and reputation. Intel’s results could be affected by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust, disclosure and other issues. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from manufacturing or selling one or more products, precluding particular business practices, impacting Intel’s ability to design its products, or requiring other remedies such as compulsory licensing of intellectual property. Intel’s results may be affected by the timing of closing of acquisitions, divestitures and other significant transactions. A detailed discussion of these and other factors that could affect Intel’s results is included in Intel’s SEC filings, including the company’s most recent reports on Form 10-Q, Form 10-K and earnings release. Rev. 1/15/15
  • 37. 37 Block Test Environment - Configuration Details Client Nodes CPU 2 x Intel ®Xeon™ x5570 @ 2.93GHz (4-core, 8 threads) (Qty: 2) 2x Intel Xeon E5 2680 @2.8GHz (16-core, 32 threads) (Qty: 1) Memory 128 GB (8GB * 16 DDR3 1333 MHZ) or 56GB (8GB * 7) for E5 server NIC 10Gb 82599EB Disks 1 HDD for OS Client VM CPU 1 X VCPU VCPUPIN Memory 512 MB Ceph Nodes CPU 1 x Intel Xeon E3-1275 V2 @ 3.5 GHz (4-core, 8 threads) Memory 32 GB (4 x 8GB DDR3 @ 1600 MHz) NIC 2 X 82599 10GbE HBA/C204 {SAS2008 PCI Express* Fusion-MPT SAS-2} / {6 Series/C200 Series Chipset Family SATA AHCI Controller} Disks 1 x SSDSA2SH064G1GC 2.5’’ 64GB for OS 2 x Intel SSDSC2BA40 400 GB SSD (Journal) 10 x Seagate* ST3000NM0033-9ZM 3.5’’ 3TB 7200rpm SATA HDD (Data) Ceph cluster OS CentOS 6.5 Kernel 2.6.32-431 Ceph 0.61.2 built from source Client host OS Ubuntu* 12.10 Kernel 3.6.3 Client VM OS Ubuntu 12.10 Kernel 3.5.0-17 • XFS as file system for Data Disk • Each Data Disk (SATA HDD) was parted into 1 partition for OSD daemon • Default replication setting (2 replicas), 7872 pgs. • Tunings - Set read_ahead_kb=2048 - MTU= 8000 • Change i/o scheduler to [deadline]: # Echo deadline >/sys/block/[dev]/queue/scheduler
  • 38. 38 Ceph RBD Tuned Configuration [global] debug default = 0 log file = /var/log/ceph/$name.log max open files = 131072 auth cluster required = none auth service required = none auth client required = none [osd] osd mkfs type = xfs osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog osd mkfs options xfs = -f -i size=2048 filestore max inline xattr size = 254 filestore max inline xattrs = 6 osd_op_threads=20 filestore_queue_max_ops=500 filestore_queue_committing_max_ops=5000 journal_max_write_entries=1000 journal_queue_max_ops=3000 objecter_inflight_ops=10240 filestore_queue_max_bytes=1048576000 filestore_queue_committing_max_bytes =1048576000 journal_max_write_bytes=1048576000 journal_queue_max_bytes=1048576000 ms_dispatch_throttle_bytes=1048576000 objecter_infilght_op_bytes=1048576000 filestore_max_sync_interval=10 filestore_flusher=false filestore_flush_min=0 filestore_sync_flush=true
  • 39. 39 Testing Methodology Storage interface Use QemuRBD as storage interface Tool • Use “dd” to prepare data for R/W tests • Use fio (ioengine=libaio, direct=1) to generate 4 IO patterns: sequential write/read, random write/read • Access Span: 60GB • For capping tests, Seq Read/Write (60MB/s), and Rand Read/Write (100 ops/s) • QoS Compliance: - For random 4k read/write cases: latency <= 20ms - For sequential 64K read/write cases: BW >= 54 MB/s Run rules • Drop osds page caches ( “1” > /proc/sys/vm/drop_caches) • 100 secs for warm up, 600 secs for data collection • Run 4KB/64KB tests under different # of rbds (1 to 120) Space allocation (per node) • Data Drive: - Sits on 10x 3TB HDD drives - So 4800GB/40 * 2 = 240GB data space will be used on each Data disk at 80 VMs. • Journal: - Sits on 2x 400GB SSD drives - One journal partition per data drive, 10GB
  • 40. 40 Object Test Environment - Configuration Details Client & RGW CPU Client: 2 x Intel ®Xeon™ x5570 @ 2.93GHz (4-core, 8 threads) (Qty: 2) GRW: 2x Intel Xeon E5 2670@2.6GHz (16-core, 32 threads) (Qty: 1) Memory 128 GB (8GB * 16 DDR3 1333 MHZ) or 56GB (8GB * 7) for E5 server NIC 10Gb 82599EB Disks 1 HDD for OS Ceph OSD Nodes CPU 1 x Intel Xeon E3-1280 V2 @ 3.6 GHz (4-core, 8 threads) Memory 32 GB (4 x 8GB DDR3 @ 1600 MHz) NIC 2 X 82599 10GbE HBA/C204 {SAS2308 PCI Express* Fusion-MPT SAS-2} / {6 Series/C200 Series Chipset Family SATA AHCI Controller} Disks 1 x SSDSA2SH064G1GC 2.5’’ 64GB for OS 3 x Intel SSDSC2CW480A3 480 GB SSD (Journal) 10 x Seagate* ST1000NM0011 3.5’’ 1TB 7200rpm SATA HDD (Data) Ceph cluster OS Ubuntu 14.04 Kernel 3.13.0 Ceph 0.61.8 built from source Client host OS Ubuntu* 12.10 Kernel 3.6.3 • XFS as file system for Data Disk • Each Data Disk (SATA HDD) was parted into 1 partition for OSD daemon • Default replication setting (3 replicas), 12416 pgs. • Tunings - Set read_ahead_kb=2048 - MTU= 8000
  • 41. 41 Ceph Rados Gateway (RGW) Tuning ceph-gateway –rgw enable ops log = false –rgw enable usage log = false –rgw thread pool size = 256 –Log disabled apache2-http-server –<IfModule mpm_worker_module> – ServerLimit 50 – StartServers 5 – MinSpareThreads 25 – MaxSpareThreads 75 – ThreadLimit 100 – ThreadsPerChild 100 – MaxClients 5000 – MaxRequestsPerChild 0 –</IfModule> GW instances –5 instances in one GW server –Access log turned off