Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of Storage as-a-Service

Best Practices for Ceph-
Powered Implementations of
Storage as-a-Service
Kamesh Pemmaraju, Sr. Product Mgr, Dell
Ceph Developer Day, New York City, Oct 2014

Outline
• Planning your Ceph implementation
• Ceph Use Cases
• Choosing targets for Ceph deployments
• Reference Architecture Considerations
• Dell Reference Configurations
• Customer Case Study

Planning your Ceph Implementation
• Business Requirements
– Budget considerations, organizational commitment
– Avoiding lock-in – use open source and industry standards
– Enterprise IT use cases
– Cloud applications/XaaS use cases for massive-scale, cost-effective storage
• Sizing requirements
– What is the initial storage capacity?
– Is it steady-state data usage vs. Spike data usage
– What is the expected growth rate?
• Workload requirements
– Does the workload need high performance or it is more capacity focused?
– What are IOPS/Throughput requirements?
– What type of data will be stored?
– Ephemeral vs. persistent data, Object, Block, File?
• Ceph is like a Swiss Army knife – it can tuned a wide variety of use cases. Let
us look at some of them

Ceph is like a Swiss Army Knife – it can fit in a
wide variety of target use cases
Virtualization and Private
Ceph Cloud
Target
(traditional SAN/NAS)
High Performance
(traditional SAN)
NAS & Object
Content Store
(traditional NAS)
Capacity Performance
Traditional IT
Cloud
Applications
XaaS Compute Cloud
Open Source Block
XaaS Content Store
Open Source NAS/Object
Ceph Target

USE CASE: OPENSTACK
Copyright © 2013 by Inktank | Private and Confidential
5

USE CASE: OPENSTACK
6
Volumes Ephemeral
Copy-on-Write Snapshots

USE CASE: OPENSTACK
7

USE CASE: CLOUD STORAGE
8
S3/Swift S3/Swift S3/Swift S3/Swift

USE CASE: WEBSCALE APPLICATIONS
9
Native
Protocol
Native
Protocol
Native
Protocol
Native
Protocol

USE CASE: PERFORMANCE BLOCK
10
CEPH STORAGE CLUSTER

Read/Write Read/Write
11

Write Write Read Read
12

USE CASE: ARCHIVE / COLD STORAGE
13

USE CASE: DATABASES
14
Native
Protocol
Native
Protocol
Native
Protocol
Native
Protocol

USE CASE: HADOOP
15
Native
Protocol
Native
Protocol
Native
Protocol
Native
Protocol

Architectural considerations – Redundancy and
replication considerations
• Tradeoff between Cost vs. Reliability (use-case dependent)
• Use the Crush configs to map out your failures domains and performance pools
• Failure domains
– Disk (OSD and OS)
– SSD journals
– Node
– Rack
– Site (replication at the RADOS level, Block replication, consider latencies)
• Storage pools
– SSD pool for higher performance
– Capacity pool
• Plan for failure domains of the monitor nodes
• Consider failure replacement scenarios, lowered redundancies, and performance
impacts

Server Considerations
• Storage Node:
– one OSD per HDD, 1 – 2 GB ram, and 1 Gz/core/OSD,
– SSD’s for journaling and for using SSD pooling (tiering) in Firefly
– Erasure coding will increase useable capacity at the expense of additional compute
load
– SAS JBOD expanders for extra capacity (beware of extra latency, oversubscribed
SAS lanes, large footprint for a failure zone)
• Monitor nodes (MON): odd number for quorum, services
can be hosted on the storage node for smaller
deployments, but will need dedicated nodes larger
installations
• Dedicated RADOS Gateway nodes for large object store
deployments and for federated gateways for multi-site

Networking Considerations
• Dedicated or Shared network
– Be sure to involve the networking and security teams early when designing your
networking options
– Network redundancy considerations
– Dedicated client and OSD networks
– VLAN’s vs. Dedicated switches
– 1 Gbs vs 10 Gbs vs 40 Gbs!
• Networking design
– Spine and Leaf
– Multi-rack
– Core fabric connectivity
– WAN connectivity and latency issues for multi-site deployments

Ceph additions coming to the Dell Red Hat
OpenStack solution
Pilot configuration Components
• Dell PowerEdge R620/R720/R720XD Servers
• Dell Networking S4810/S55 Switches, 10GB
• Red Hat Enterprise Linux OpenStack Platform
• Dell ProSupport
• Dell Professional Services
• Avail. w/wo High Availability
Specs at a glance
• Node 1: Red Hat Openstack Manager
• Node 2: OpenStack Controller (2 additional controllers
for HA)
• Nodes 3-8: OpenStack Nova Compute
• Nodes: 9-11: Ceph 12x3 TB raw storage
• Network Switches: Dell Networking S4810/S55
• Supports ~ 170-228 virtual machines
Benefits
• Rapid on-ramp to OpenStack cloud
• Scale up, modular compute and storage blocks
• Single point of contact for solution support
• Enterprise-grade OpenStack software package
Storage
bundles

Example Ceph Dell Server Configurations
Type Size Components
Performance 20 TB • R720XD
• 24 GB DRAM
• 10 X 4 TB HDD (data drives)
• 2 X 300 GB SSD (journal)
Capacity 44TB /
105 TB*
• R720XD
• 64 GB DRAM
• 2 X 300 GB SSH (journal)
• MD1200
• 12 X 4 TB HHD (data drives)
Extra Capacity 144 TB /
240 TB*
• R720XD
• 128 GB DRAM
• MD3060e (JBOD)
• 60 X 4 TB HHD (data drives)

What Are We Doing To Enable?
• Dell & Red Hat & Inktank have partnered to bring a complete
Enterprise-grade storage solution for RHEL-OSP + Ceph
• The joint solution provides:
– Co-engineered and validated Reference Architecture
– Pre-configured storage bundles optimized for performance or
storage
– Storage enhancements to existing OpenStack Bundles
– Certification against RHEL-OSP
– Professional Services, Support, and Training
› Collaborative Support for Dell hardware customers
› Deployment services & tools

Overcoming a data deluge
US university that specializes in Cancer and Genomic research
• 900 researchers
• Data sets challenging resources
• Research data scattered everywhere
• Transferring datasets took forever and clogged
shared networks
• Distributed data management reduced
productivity and put data at risk
• Needed centralized repository for compliance
Dell - Confidential

Research Computing System (Originally)
A collection of grids, proto-clouds, tons of virtualization and DevOps
HPC
Cluster
HPC
Cluster
HPC
Storage
DDR Infiniband QDR Infiniband
1Gb Ethernet
University Research Network
Interactive Services
Thumb
drives
Local
servers
Laptops
Laptops
Thumb
drives
Local
servers
Dell - Confidential

Solution: a scale-out storage cloud
Based on OpenStack and Ceph
• Housed and managed centrally, accessible
across campus network
− File system + cluster, can grow as big as you want
− Provisions from a massive common pool
− 400+ TBs at less than 41¢/GB; scalable to 5PB
• Researchers gain
− Work with larger, more diverse data sets
− Save workflows for new devices & analysis
− Qualify for grants due to new levels of protection
• Demonstrating utility with applications
− Research storage
− Crashplan (cloud back up) on POC
− Gitlab hosting on POC
“We’ve made it possible for users to
satisfy their own storage needs with
the Dell private cloud, so that their
research is not hampered by IT.”
David L. Shealy, PhD
Faculty Director, Research Computing
Chairman, Dept. of Physics
Dell - Confidential

Research Computing System (Today)
Centralized storage cloud based on OpenStack and Ceph
Ceph
node
Cep
node
Ceph
node
Ceph
node
Ceph
node
POC
Open
Stack
node
HPC
Cluster
HPC
Cluster
HPC
Storage
10Gb Ethernet
Cloud services layer
Virtualized server and storage computing cloud
based on OpenStack, Crowbar and Ceph
Dell - Confidential

Building a research cloud
Project goals extend well beyond data management
• Designed to support emerging
data-intensive scientific computing paradigm
− 12 x 16-core compute nodes
− 1 TB RAM, 420 TBs storage
− 36 TBs storage attached to each compute node
• Individually customized test/development/
production environments
− Direct user control over all aspects of the
application environment
− Rapid setup and teardown
• Growing set of cloud-based tools & services
− Easily integrate shareware, open source, and
commercial software
“We envision the OpenStack-based
cloud to act as the gateway to our
HPC resources, not only as the
purveyor of services we provide, but
also enabling users to build their own
cloud-based services.”
John-Paul Robinson, System Architect
Dell - Confidential

Research Computing System (Next Gen)
A cloud-based computing environment with high speed access to
dedicated and dynamic compute resources
Open
Stack
node
Open
Stack
node
Open
Stack
node
Ceph
node
Ceph
node
Ceph
node
Ceph
node
Ceph
node
Open
Stack
node
Open
Stack
node
Open
Stack
node
Open
Stack
node
HPC
Cluster
HPC
Cluster
HPC
Storage
10Gb Ethernet
Cloud services layer
Virtualized server and storage computing cloud
based on OpenStack, Crowbar and Ceph
Dell - Confidential

Contact Information
Reach Kamesh additional information:
Kamesh_Pemmaraju@Dell.com
@kpemmaraju
http://www.cloudel.com

Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of Storage as-a-Service

Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of Storage as-a-Service

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of Storage as-a-Service

Similar to Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of Storage as-a-Service (20)

Recently uploaded

Recently uploaded (20)

Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of Storage as-a-Service

Editor's Notes