SUSE Enterprise Storage 3 (SES3) provides iSCSI access to Ceph storage over TCP/IP, allowing clients to reach the cluster remotely using the iSCSI protocol. The iSCSI target driver in SES3 exposes RADOS block devices (RBD), so any iSCSI initiator can connect to SES3 over the network. SES3 also includes iSCSI gateway optimizations, such as offloading operations to the object storage devices (OSDs) to reduce locking on the gateway nodes.
Ceph Day: Bring Ceph to Enterprise
1. Bring Ceph to Enterprise
Set up a 50TB mobile cluster in 30 minutes
Alex Lau (劉俊賢)
Software Consultant
alau@suse.com
2. How to access Ceph storage?
Introduction of iSCSI
[Diagram: block storage, file system and object storage access to a Ceph cluster, showing monitor nodes, a management node, a remote cluster with data encrypted at rest, heterogeneous OS access, the RADOS gateway RESTful API, and iSCSI]
3. SUSE Enterprise Storage 3
The first commercially available iSCSI access to Ceph, delivered with SES3.
It allows clients to access Ceph storage remotely over TCP/IP using the iSCSI protocol.
SES3 provides an iSCSI target driver on top of RBD (RADOS block device).
This allows any iSCSI initiator to access SES3 over the network.
4. iSCSI Architecture
Technical Background
Protocol:
‒ Block storage access over TCP/IP
‒ Initiators: the clients that access the iSCSI target over TCP/IP
‒ Targets: the servers that provide access to a local block device
SCSI and iSCSI:
‒ iSCSI encapsulates SCSI commands and responses
‒ Each iSCSI TCP packet carries a SCSI command
Remote access:
‒ iSCSI initiators can access a remote block device like a local disk
‒ Attach and format it with XFS, Btrfs, etc. (a worked initiator example follows this slide)
‒ Booting directly from an iSCSI target is supported
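As a concrete illustration of the initiator workflow above, here is a minimal open-iscsi sketch. The portal address, target IQN, device name and mount point are assumptions for the example; substitute the values reported by your gateway.

# Discover targets exported by an iSCSI gateway (portal IP is an example)
iscsiadm -m discovery -t sendtargets -p 192.168.100.201

# Log in to the discovered target (IQN is an example value)
iscsiadm -m node -T iqn.2016-06.org.example:rbd-demo -p 192.168.100.201 --login

# The remote RBD-backed LUN now appears as a local block device, e.g. /dev/sdb;
# format and mount it like any local disk
mkfs.xfs /dev/sdb
mkdir -p /mnt/rbd-demo
mount /dev/sdb /mnt/rbd-demo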
6. Before iSCSI support, what was wrong?
Missing features
LIO over RBD:
‒ It didn't support "atomic compare and write"
‒ It didn't support "persistent group reservations"
iSCSI:
‒ iSCSI Active/Active multipath (MPIO) was not supported
‒ Supporting all of these in the block layer requires a different approach
7. Benefits of the iSCSI LIO gateway for RBD
Multiple platforms can access Ceph:
‒ Clients are not required to be part of the cluster, similar to radosgw
Standard iSCSI interface:
‒ Most operating systems support iSCSI
‒ open-iscsi is available in most Linux distributions
LIO (Linux IO Target):
‒ In-kernel SCSI target implementation
Flexible configuration:
‒ The targetcli utility is available alongside lrbd
8. Configuring the RBD iSCSI gateway
Introduction of lrbd
Easy setup:
‒ Bundled with the iSCSI packages since SES 2.0
‒ Multi-node configuration support on top of targetcli
Technical background:
‒ JSON configuration format (see the sketch after this slide)
‒ Targets, portals, pools, auth
‒ Configuration state is stored in the Ceph cluster
Related links:
‒ https://github.com/swiftgist/lrbd
‒ https://github.com/swiftgist/lrbd/wiki
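To make the JSON format concrete, below is a minimal sketch of what an lrbd configuration might look like. The IQN, gateway host name, portal address, pool and image names are assumptions for illustration; the exact schema and the command used to load it are documented in the lrbd wiki linked above.

# Illustrative lrbd configuration (all values are examples)
cat > lrbd-demo.json <<'EOF'
{
  "auth": [ { "target": "iqn.2016-06.org.example:rbd-demo", "authentication": "none" } ],
  "targets": [ { "target": "iqn.2016-06.org.example:rbd-demo",
                 "hosts": [ { "host": "igw1", "portal": "portal1" } ] } ],
  "portals": [ { "name": "portal1", "addresses": [ "192.168.100.201" ] } ],
  "pools": [ { "pool": "rbd",
               "gateways": [ { "target": "iqn.2016-06.org.example:rbd-demo",
                               "tpg": [ { "image": "demo" } ] } ] } ]
}
EOF
# Load the file with lrbd and activate the target; see the lrbd wiki for the
# exact invocation, since lrbd stores and distributes this state via the cluster.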
9. iSCSI Gateway Optimizations
Efficient handling of certain SCSI operations:
‒ Offload RBD image IO to the OSDs
‒ Avoid locking on the iSCSI gateway nodes
Compare and Write:
‒ New cmpext OSD operation to handle RBD data comparison
‒ Dispatched as a compound cmpext+write OSD request
Write Same:
‒ New writesame OSD operation to expand duplicate data at the OSD
Reservations:
‒ State stored as an RBD image extended attribute
‒ Updated using a compound cmpxattr+setxattr OSD request
10. Multiple Path Support with iSCSI on RBD
[Diagram: an iSCSI initiator reaches the same RBD image through two iSCSI gateways, each with its own RBD module, over the public network; the gateways talk to OSD1-OSD4 over the cluster network]
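A minimal sketch of how an initiator might use both gateway paths with dm-multipath. The portal addresses and the IQN are example values; the multipath policy and timeouts would be tuned per deployment.

# Discover and log in to the same target through both gateway portals
iscsiadm -m discovery -t sendtargets -p 192.168.100.201
iscsiadm -m discovery -t sendtargets -p 192.168.100.202
iscsiadm -m node -T iqn.2016-06.org.example:rbd-demo --login

# With multipathd running, both sessions are aggregated into one device
multipath -ll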
11. How to manage storage growth and costs of Ceph?
‒ Easily scale and manage data storage
‒ Control storage growth and manage costs ($)
‒ Support today's investment and adapt to the future
13. SUSE Enterprise Storage Management
Vision
Open source:
‒ An alternative to proprietary storage management systems
Enterprise:
‒ Works as expected with traditional unified storage interfaces, e.g. NAS, SAN
SDS support:
‒ Provides the initial Ceph setup plus management and monitoring to ease complicated scale-out scenarios
It will be available in the next SES release, or download it now at
https://build.opensuse.org/package/show/filesystems:openATTIC/openattic
14. openATTIC Features
Existing capability
Modern Web UI
RESTful API
‒ Software Defined Storage
Unified storage
‒ NAS (NFS, CIFS, HTTP)
‒ SAN (iSCSI, Fiber Channel)
Volume mirroring
‒ DRBD
File systems
‒ LVM, XFS, ZFS, Btrfs, ext3/4
Monitoring
‒ Nagios / Icinga built-in
‒ Ceph management (WIP)
15. openATTIC Architecture
Technical Detail
Backend:
‒ Python (Django)
‒ Django REST Framework
‒ Nagios / Icinga & PNP4Nagios
‒ Linux tools: LVM, LIO, DRBD
‒ Ceph API: librados, librbd
Web frontend:
‒ AngularJS
‒ Bootstrap
‒ REST API
Automated test suites:
‒ Python unit tests
‒ Gatling (RESTful API)
‒ Protractor / Jasmine (Web UI tests)
16. openATTIC Architecture
High Level Overview
[Diagram: the openATTIC Django application exposes a RESTful API consumed by the Web UI and REST clients over HTTP; it uses PostgreSQL and a NoDB layer, and reaches the system through systemd, D-Bus, the shell, Linux OS tools, and librados/librbd]
17. openATTIC Development
Current status
Create and map RBDs as block devices (volumes); the equivalent rbd CLI calls are sketched below
Pool management Web UI (table view)
OSD management Web UI (table view)
RBD management Web UI (add/delete, table view)
Monitoring of cluster health and performance
Support for managing Ceph with Salt integration (WIP)
Role management of node, monitor, storage, CephFS, iSCSI, radosgw
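openATTIC drives these operations through librados/librbd; for context, here is roughly what the equivalent rbd CLI calls look like. The pool and image names are example values.

# Create a 10 GiB image in the rbd pool (names are examples)
rbd create --size 10240 rbd/oa-demo

# Map it on a client that has the rbd kernel module, then list mappings
rbd map rbd/oa-demo
rbd showmapped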
22. oA Ceph Roadmap
The future is in your hands
Ceph cluster status dashboard incl. performance graphs
Extended pool management
OSD monitoring/management
RBD management/monitoring
CephFS management
RGW management (users, buckets, keys)
Deployment and remote configuration of Ceph nodes (via Salt)
Public roadmap on the openATTIC wiki to solicit community feedback: http://bit.ly/28PCTWf
23. How does Ceph control storage costs?
Control storage growth and manage costs ($)
24. Minimal recommendation
OSD storage node:
‒ 2GB RAM per OSD
‒ 1.5GHz CPU core per OSD
‒ 10GbE public and backend networks
‒ 4GB RAM for a cache tier
MON monitor node:
‒ 3 MONs minimum
‒ 2GB RAM per node
‒ SSD for the system OS
‒ MON and OSD should not be virtualized
‒ Bonded 10GbE
25. SUSE Storage Pricing
[Chart: SUSE Enterprise Storage positioned on a price scale alongside JBOD storage, entry-level disk arrays, mid-range arrays, mid-range NAS, fully featured NAS devices, and high-end disk arrays]
26. Use storage with multiple tiers
Write-heavy applications, e.g.:
• Video recording
• Lots of IoT data
Read-heavy applications, e.g.:
• Video streaming
• Big data analysis
[Diagram: a SUSE Enterprise Storage cluster with a write tier (hot pool) and a read tier (hot pool) in front of normal tiers (cold pools)]
27. How to create multiple price points?
PCIe flash: $1000 = 1000GB at 2000MB/s r/w
‒ 4 PCIe drives = $4000 = 8000MB/s r/w
‒ 4TB of storage, 400,000 IOPS, $4 per GB
SSD: $250 = 1000GB at 500MB/s r/w
‒ 16 drives = $4000 = 8000MB/s r/w
‒ 16TB of storage, 100,000 IOPS, $1 per GB
HDD: $250 = 8000GB at 150MB/s r/w
‒ 16 drives = $4000 = 2400MB/s r/w
‒ 128TB of storage, 2,000 IOPS, $0.1 per GB
28. Control Costs
How does erasure coding (EC) reduce storage cost? ($)
Replication pool (SES Ceph cluster): multiple copies of the stored data
• 300% cost of the data size
• Low latency, faster recovery
Erasure coded pool (SES Ceph cluster): a single copy with parity (see the pool-creation example below)
• 150% cost of the data size
• The data/parity ratio trades capacity against CPU
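For example, the 150% figure corresponds to a profile such as k=4 data chunks plus m=2 parity chunks (6/4 = 1.5x raw usage). A minimal sketch of creating such a pool with the standard Ceph CLI follows; the profile name, pool name and placement-group count are example values.

# Define an erasure-code profile with 4 data and 2 parity chunks (example name)
ceph osd erasure-code-profile set ec42-profile k=4 m=2

# Create an erasure coded pool using that profile (128 PGs as an example)
ceph osd pool create ec-demo 128 128 erasure ec42-profile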
31. Pros and cons of this mobile cluster
Price:
‒ Around $3200, versus expensive laptops
Size:
‒ At 50TB and 20kg it is mobile enough to demo a usable cluster
‒ Real HDDs work better for presenting a storage solution
Benchmark:
‒ Aside from networking capability, all features and requirements of a Ceph cluster are met
Features:
‒ A great fit for developers and testers performing software-based tests that a VM can't do
32. How does the DevOps story fit?
Introducing Salt
Support today's investment and adapt to the future
33. Salt-enabled Ceph
Existing capability
sesceph
‒ A Python API library that helps deploy and manage Ceph
‒ Already upstream in Salt; available in the next release
‒ https://github.com/oms4suse/sesceph
python-ceph-cfg
‒ A Python Salt module that uses sesceph to deploy
‒ https://github.com/oms4suse/python-ceph-cfg
Both libraries already come with SES 3.0
34. Why Salt?
Existing capability
Product setup
‒ SUSE OpenStack Cloud, SUSE Manager and SUSE Enterprise Storage all come with Salt enabled
Parallel execution
‒ E.g. preparing OSDs on many nodes at once, compared to ceph-deploy (see the example below)
Customized Python modules
‒ Continuous development on the Python API is easy to manage
Flexible configuration
‒ Jinja2 + YAML (stateconf) by default
‒ pydsl if you prefer Python directly; also json, pyobjects, etc.
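To illustrate the parallel-execution point, here is a minimal sketch using the standard salt CLI; the minion target patterns and state name are example values, not part of the SES packaging.

# Run a command on every OSD node in parallel (target pattern is an example)
salt 'osd*' test.ping
salt 'osd*' cmd.run 'lsblk'

# Apply a custom Ceph state to all minions at once (state name is an example)
salt '*' state.sls ceph.create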
35. Create a cluster with a single stage file
https://github.com/AvengerMoJo/Ceph-Saltstack/blob/master/stages/ses/ceph/ceph_create.sls
This is a showcase of a simple way to create a cluster with a single stage file
You can easily customize it to create your own
36. Quick deployment example
Git repo for fast deployment and benchmarking
https://github.com/AvengerMoJo/Ceph-Saltstack
Demo recording
https://asciinema.org/a/4hmdsrksn0fd8fgpssdgqsjdb
1) Salt setup
2) Git clone and copy the modules to the Salt _modules directory
3) saltutil.sync_all to push them to all minion nodes
4) ntp_update on all nodes
5) Create new MONs and create keys
6) Clean disk partitions and prepare OSDs
7) Update the crushmap (steps 2-3 are sketched below)
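A minimal sketch of steps 2 and 3 above, using standard git and salt commands on the Salt master; the path inside the repo and the /srv/salt file-root layout are assumptions here, and the remaining steps use the custom modules from the linked repo.

# Step 2: fetch the repo and copy its execution modules into the Salt file root
git clone https://github.com/AvengerMoJo/Ceph-Saltstack
mkdir -p /srv/salt/_modules
cp Ceph-Saltstack/modules/*.py /srv/salt/_modules/   # repo layout is an assumption; see its README

# Step 3: push the synced modules to every minion
salt '*' saltutil.sync_all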
37. Reduce storage costs and management with SUSE Enterprise Storage
Manage less. Adapt quickly. Control costs.
38.
39. Scale storage from terabytes to hundreds of petabytes without downtime
[Diagram: social media, business operations, mobile data and customer data workloads, with % uptime]