openSUSE Cloud Storage Workshop
AvengerMoJo ( Alex Lau alau@suse.com )
Nov, 2016
STORAGE INTRO
Traditional Storage
Google: Traditional Storage
Storage Medium
Secondary Storage
Storage Size
Bits and Bytes
> Byte (B) = 8 Bits
> Kilobyte (KB) = 8,192 Bits
> Megabyte (MB) = 8,388,608 Bits
> Gigabyte (GB) = 8,589,934,592 Bits
> Terabyte (TB) = 8,796,093,022,208 Bits
> Petabyte (PB) = 9,007,199,254,740,992 Bits
> Exabyte (EB) = 9,223,372,036,854,775,808 Bits
Hard Drive Terms
> Capacity ( Size )
> Cylinders, Sectors and Tracks
> Revolutions per Minute ( Speed )
> Transfer Rate ( e.g. SATA III )
> Access Time ( Seek Time + Latency )
RAID
> Redundant Array of Independent Disks
– Two or more disks combined to act as one
NAS and SAN
> Network Attached Storage ( NAS )
– TCP/IP
– NFS/SMB
– Serves files
> Storage Area Network ( SAN )
– Fibre Channel
– iSCSI
– Serves blocks ( LUN )
Storage Trend
> Data Size and Capacity
– Multimedia contents
– Large demo binaries, detailed graphics/photos, audio and video, etc.
> Data Functional Needs
– Different business requirements
– More data-driven processes
– More applications with data
– More e-commerce
> Data Backup for a Longer Period
– Legislation and compliance
– Business analysis
Storage Usage
> Tier 0: Ultra High Performance ( 1-3% of data )
> Tier 1: High-value, OLTP, Revenue Generating ( 15-20% )
> Tier 2: Backup/Recovery, Reference Data, Bulk Data ( 20-25% )
> Tier 3: Object, Archive, Compliance Archive, Long-term Retention ( 50-60% )
Storage Pricing
Price/feature spectrum (diagram): from JBOD storage and entry-level disk arrays ( Promise, Synology, QNAP, Infortrend, ProWare, Sans Digital ), through mid-range arrays and mid-range NAS ( NetApp, Pure Storage, Nexsan ), up to fully featured NAS devices and high-end disk arrays ( Dell EMC, Hitachi, HP, IBM ), with SUSE Enterprise Storage positioned across the range.
CLOUD STORAGE INTRO
Software Defined Storage
Who is doing cloud storage?
Who is doing Software Defined Storage?
Gartner’s Report
Magic Quadrant (diagram): Leaders, Visionaries, Challengers and Niche Players, plotted by Completeness of Vision against Ability to Execute.
http://www.theregister.co.uk/2016/10/21/gartners_not_scoffing_at_scofs_and_objects/
> SUSE has aggressive pricing for deployment on commodity hardware
> SES makes both Ceph and OpenStack enterprise ready
Software Defined Storage Definition
From http://www.snia.org/sds
> Virtualized storage with a service management interface, includes pools of
storage with data service characteristics
> Automation
– Simplified management that reduces the cost of maintaining the storage infrastructure
> Standard Interfaces
– APIs for the management, provisioning and maintenance of storage devices and services
> Virtualized Data Path
– Block, File and/or Object interfaces that support applications written to these interfaces
> Scalability
– Seamless ability to scale the storage infrastructure without disruption to the specified
availability or performance
> Transparency
– The ability for storage consumers to monitor and manage their own storage consumption
against available resources and costs
SDS Characteristics
SUSE’s view of Ceph’s benefits
> High Extensibility:
– Distributed over multiple nodes in a cluster
> High Availability:
– No single point of failure
> High Flexibility:
– API, Block Device and Cloud Supported Architecture
> Pure Software Defined Architecture
> Self Monitoring and Self Repairing
DevOps with SDS
> Collaboration between
– Development
– Operations
– QA ( Testing )
> SDS should enable DevOps to use a variety of data management tools to communicate with their storage
http://www.snia.org/sds
Why use Ceph?
> Thin Provisioning
> Cache Tiering
> Erasure Coding
> Self-managing and self-repairing with continuous monitoring
> High ROI compared to traditional storage solution vendors
Thin Provisioning
Traditional storage provisioning vs. SDS thin provisioning (diagram): with traditional provisioning, Volume A and Volume B each reserve their full allocated capacity up front even though only part of it holds data; with thin provisioning, Volume A and Volume B only consume what their data actually uses and draw the rest from a shared pool of available storage as needed.
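As a rough illustration of thin provisioning in Ceph, an RBD image can be created far larger than the space it initially consumes; the pool and image names below are hypothetical:
rbd create --pool rbd --size 102400 thin_demo   # 100GB image; no space is consumed until data is written
rbd info rbd/thin_demo                          # shows the provisioned size
rbd du rbd/thin_demo                            # compares provisioned size against actual usage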
Cache Tiers
> Write tier ( hot pool ) in front of a normal tier ( cold pool ) for write-heavy applications:
• e.g. video recording
• e.g. lots of IoT data
> Read tier ( hot pool ) in front of a normal tier ( cold pool ) for read-heavy applications:
• e.g. video streaming
• e.g. big data analysis
> Both tiers sit on top of the SUSE Ceph storage cluster
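A minimal sketch of wiring a hot pool in front of a cold pool as a writeback cache tier; the pool names and the 100GB target are assumptions for illustration:
ceph osd tier add cold-pool hot-pool             # attach hot-pool as a tier of cold-pool
ceph osd tier cache-mode hot-pool writeback      # absorb writes in the hot pool
ceph osd tier set-overlay cold-pool hot-pool     # redirect client IO through the hot pool
ceph osd pool set hot-pool hit_set_type bloom    # track object hits for the tiering agent
ceph osd pool set hot-pool target_max_bytes 107374182400   # ~100GB before the agent flushes/evicts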
Control Costs: Erasure Coding
> Replication pool ( SES Ceph cluster ): multiple copies of the stored data
– 300% cost of the data size ( three copies )
– Low latency, faster recovery
> Erasure coded pool ( SES Ceph cluster ): a single copy split into data chunks plus parity chunks
– 150% cost of the data size
– The data/parity ratio trades capacity savings against CPU
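A minimal sketch of creating an erasure coded pool with 4 data and 2 parity chunks, matching the 150% overhead above; the profile and pool names are hypothetical:
ceph osd erasure-code-profile set ec-4-2 k=4 m=2    # 4 data chunks + 2 parity chunks
ceph osd erasure-code-profile get ec-4-2            # verify the profile
ceph osd pool create ecpool 128 128 erasure ec-4-2  # pool using the profile ( 128 PGs as an example )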
Self Manage and Self Repair
> CRUSH map
– Controlled Replication Under Scalable Hashing
– Controlled, Scalable, Decentralized Placement of Replicated
Data
Placement pipeline (diagram): an object name is hashed against the number of PGs to select a placement group; CRUSH uses the cluster state and placement rules to map that PG onto a set of OSDs; each OSD then peers with the others and writes to its local disk.
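You can ask a running cluster to walk this pipeline for a single object; the pool and object names are just examples:
ceph osd map rbd my-test-object   # output shows the pool, the PG the object hashes to, and the up/acting OSD set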
WHAT IS CEPH?
Different components
Basic Ceph Cluster
> Interface
– Object Store
– Block
– File
> MON
– Cluster map
> OSD
– Data storage
> MDS
– cephfs
Diagram: a RADOS cluster of OSDs, MONs and MDSs, accessed through LIBRADOS, with three interfaces on top: RADOSGW ( object store ), RBD ( block store ) and CephFS ( file store ).
Ceph Monitor
> Paxos Role
– Proposers
– Acceptors
– Learners
– Leader
Diagram: each monitor maintains the OSD map, MON map, PG map and CRUSH map through a Paxos service, persisting them as key/value records and a log in LevelDB.
ObjectStore Daemon
> Low-level IO operations
> The FileJournal write normally completes before the FileStore writes to disk
> DBObjectMap provides key/value omap data for the copy-on-write function
Diagram: each OSD hosts many PGs on top of an ObjectStore; the FileStore implementation consists of a FileJournal and a DBObjectMap.
FileStore Backend
> Each OSD manages the consistency of its own data
> All write operations are transactional on top of an existing filesystem
– XFS, Btrfs, ext4
> ACID ( Atomicity, Consistency, Isolation, Durability ) operations protect data writes
Diagram: each FileStore-backed OSD sits on its own disk formatted with XFS, Btrfs or ext4; the OSDs and MONs together form the RADOS cluster.
CephFS Metadata Server
> The MDS stores its data in RADOS
– Directories, file ownership, access modes, etc.
> POSIX compatible
> Does not serve file data
> Only required for the shared filesystem ( CephFS )
> Highly available and scalable
Diagram: a CephFS client talks to the MDS for metadata and reads/writes file data directly from the OSDs in the RADOS cluster.
CRUSH map
> Devices:
– Devices consist of any object storage device–i.e., the storage drive
corresponding to a ceph-osd daemon. You should have a device for each
OSD daemon in your Ceph configuration file.
> Bucket Types:
– Bucket types define the types of buckets used in your CRUSH hierarchy.
Buckets consist of a hierarchical aggregation of storage locations (e.g.,
rows, racks, chassis, hosts, etc.) and their assigned weights.
> Bucket Instances:
– Once you define bucket types, you must declare bucket instances for your
hosts, and any other failure domain partitioning you choose.
> Rules:
– Rules consist of the manner of selecting buckets.
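A minimal sketch of declaring a bucket instance and a rule from the command line rather than by editing the map; the rack, host and pool names are hypothetical:
ceph osd crush add-bucket rack1 rack            # declare a bucket instance of type rack
ceph osd crush move node1 rack=rack1            # place the host bucket node1 under rack1
ceph osd crush rule create-simple by-rack default rack   # replicate across racks under the default root
ceph osd crush rule dump                        # list rules and their ids
ceph osd pool set mypool crush_ruleset 1        # assign the rule to a pool ( pre-Luminous option name )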
Kraken / SUSE Key Features
> Clients from multiple OSes and hardware platforms, including ARM
> Multipath iSCSI support
> Cloud ready and S3 supported
> Data encryption on the physical disks
> CephFS support
> BlueStore support
> Ceph Manager ( ceph-mgr )
> openATTIC
ARM64 Server
> Ceph has already been tested with the following Gigabyte Cavium system
> Gigabyte H270-H70 Cavium
- 48 Core * 8 : 384 Cores
- 32G * 32: 1T Memory
- 256G * 16: 4T SSD
- 40GbE * 8 Network
iSCSI Architecture
Technical Background
Protocol:
‒ Block storage access over TCP/IP
‒ Initiators: the clients that access the iSCSI target over TCP/IP
‒ Targets: the servers that provide access to a local block device
SCSI and iSCSI:
‒ iSCSI encapsulates SCSI commands and responses
‒ Each iSCSI TCP packet carries a SCSI command
Remote access:
‒ An iSCSI initiator can access a remote block device like a local disk
‒ Attach and format it with XFS, Btrfs, etc.
‒ Booting directly from an iSCSI target is supported
Diagram: iSCSI initiators connect over the public network to iSCSI gateways, each of which uses the RBD module to map an RBD image from the OSDs on the cluster network.
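On the initiator side, a standard open-iscsi client can discover and log in to the gateway; the gateway IP and target IQN below are placeholders:
iscsiadm -m discovery -t sendtargets -p 192.168.100.10                        # list targets exported by the gateway
iscsiadm -m node -T iqn.2016-11.org.example:rbd -p 192.168.100.10 --login    # log in; the LUN appears as /dev/sdX
lsblk                                                                         # confirm the new block device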
BlueStore Backend
> RocksDB
– Object metadata
– Ceph key/value data
> Block Device
– Data objects are written directly to the block device
> Reduces journal write operations by half
Diagram: BlueStore keeps metadata in RocksDB on BlueFS and writes data through an allocator directly to raw block devices.
Ceph object gateway
> RESTful gateway to
ceph storage cluster
– S3 Compatible
– Swift Compatible
Diagram: RADOSGW instances sit on top of LIBRADOS and the RADOS cluster ( OSDs and MONs ), exposing the S3 API and the Swift API.
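A quick sketch of creating an S3 user on the gateway and talking to it with s3cmd; the host names and keys are placeholders, and s3cmd options vary between versions:
radosgw-admin user create --uid=demo --display-name="Demo User"   # prints the access_key and secret_key
s3cmd --access_key=<key> --secret_key=<secret> --host=rgw.example.com \
      --host-bucket="%(bucket)s.rgw.example.com" mb s3://demo-bucket
s3cmd --access_key=<key> --secret_key=<secret> --host=rgw.example.com \
      --host-bucket="%(bucket)s.rgw.example.com" put ./file.txt s3://demo-bucket/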
CephFS
> POSIX compatible
> The MDS provides the metadata information
> Kernel cephfs module and FUSE cephfs module available
> Advanced features that still require a lot of testing:
– Directory fragmentation
– Inline data
– Snapshots
– Multiple filesystems in a cluster
Diagram: both the kernel cephfs.ko client and the FUSE cephfs client go through libcephfs/librados to reach the MDSs, MONs and OSDs in the RADOS cluster.
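Both clients can be tried directly; the monitor address and secret file path below are placeholders:
mount -t ceph 192.168.100.11:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret   # kernel client
ceph-fuse -m 192.168.100.11:6789 /mnt/cephfs                                                      # FUSE client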
openATTIC Architecture
High Level Overview
Diagram: the openATTIC backend is a Django application exposing a RESTful API over HTTP to the Web UI and other REST clients; it stores state in PostgreSQL and reaches the system through systemd, D-Bus, shell tools and librados/librbd, with a NoDB layer for Ceph data.
HARDWARE
What is the minimal setup?
Ceph Cluster in a VM Requirement
> At least 3 VMs
> 3 MONs
> 3 OSDs
– At least 15GB per OSD
– The host device should preferably be an SSD
Diagram: three VMs, each running one MON and one OSD with more than 15GB of storage.
Minimal Production recommendation
> OSD Storage Node
‒ 2GB RAM per OSD
‒ 1.5GHz CPU core per OSD
‒ 10GbE public and backend networks
‒ 4GB RAM for the cache tier
> MON Monitor Node
‒ 3 MONs minimum
‒ 2GB RAM per node
‒ SSD for the system OS
‒ MON and OSD should not be virtualized
‒ Bonded 10GbE
For developer
Dual 1G network. Three nodes (diagram), each at about $300 for the base system plus 3 x 6T drives ( $220 each, $660 ) and a 512G SSD ( $150 ):
> Node 1: OSD1-OSD4, MON1
> Node 2: OSD5-OSD8, MON2
> Node 3: OSD9-OSD12, MON3
HTPC AMD (A8-5545M)
Form factor:
– 29.9 mm x 107.6 mm x 114.4mm
CPU:
– AMD A8-5545M ( clocks up to 2.7GHz / 4M cache, 4 cores )
RAM:
– 8G DDR3-1600 Kingston ( up to 16G SO-DIMM )
Storage:
– mS200 120G mSATA ( read: 550MB/s, write: 520MB/s )
LAN:
– Gigabit LAN ( Realtek RTL8111G )
Connectivity:
– USB3.0 * 4
Price:
– $6980 (NTD)
Enclosure
Form factor:
– 215(D) x 126(w) x 166(H) mm
Storage:
– Supports all brands of 3.5" SATA I / II / III hard disk drives; 4 x 8TB = 32TB
Connectivity:
– USB 3.0 or eSATA Interface
Price:
– $3000 (NTD)
How to create multiple price point?
> PCIe flash: $1000 = 1000G at 2000MB/s r/w; 4 PCIe cards = $4000 = 8000MB/s; 4T storage, 400,000 IOPS; $4 per G
> SATA SSD: $250 = 1000G at 500MB/s r/w; 16 drives = $4000 = 8000MB/s; 16T storage, 100,000 IOPS; $1 per G
> HDD: $250 = 8000G at 150MB/s r/w; 16 drives = $4000 = 2400MB/s; 128T storage, 2,000 IOPS; $0.1 per G
ARM64 hardware compared to public cloud pricing
R120-T30: $5,700 * 7
- 48 Core * 7 : 336 Cores
- 8 * 16G * 7 : 896G Memory
- 1T * 2 * 7 : 14T SSD
- 8T * 6 * 7 : 336T HDD
- 40GbE * 7
- 10GbE * 14
> EC 5+2 gives about 250T usable
> 2,500 customers at 100GB each
> $2 per 100GB of storage = $5,000/month
> About 8 months = $40,000, roughly the hardware cost
CEPH DEVELOPMENT
Source and Salt in action
SUSE software lifecycle
Flow (diagram): Upstream repo -> openSUSE Build Service -> Internal Build Service -> QA and test process -> Product ( Tumbleweed, SLE -> Leap )
> Upstream
– Factory and Tumbleweed
> SLE
– Patch upstream
– Leap
Ceph Repo
> Upstream
– https://github.com/ceph/ceph
> SUSE Upstream
– https://github.com/SUSE/ceph
> Open Build Service
– https://build.opensuse.org/package/show/filesystems:ceph:Unstable
> Kraken Release
– https://build.opensuse.org/project/show/filesystems:ceph:kraken
Tumbleweed Zypper Repo
> Kraken
– http://download.opensuse.org/repositories/filesystems:/ceph:/kraken/openSUSE_Tumbleweed/
> Salt and DeepSea
– http://download.opensuse.org/repositories/home:/swiftgist/openSUSE_Tumbleweed/
– http://download.opensuse.org/repositories/filesystems:/ceph/openSUSE_Tumbleweed/
> Tumbleweed OS
– http://download.opensuse.org/tumbleweed/repo/oss/suse/
> Carbon + Diamond
– http://download.opensuse.org/repositories/systemsmanagement:/calamari/openSUSE_Tumbleweed
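As an example, the Kraken packages can be installed on Tumbleweed with zypper ( the repo alias is arbitrary ):
zypper addrepo http://download.opensuse.org/repositories/filesystems:/ceph:/kraken/openSUSE_Tumbleweed/ ceph-kraken
zypper refresh
zypper install ceph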
Salt files collection for ceph
DeepSea
> https://github.com/SUSE/DeepSea
> A collection of Salt files to manage multiple Ceph clusters
with a single salt master
> The intended flow for the orchestration runners and related Salt states:
– ceph.stage.0 or salt-run state.orch ceph.stage.prep
– ceph.stage.1 or salt-run state.orch ceph.stage.discovery
– Create /srv/pillar/ceph/proposals/policy.cfg
– ceph.stage.2 or salt-run state.orch ceph.stage.configure
– ceph.stage.3 or salt-run state.orch ceph.stage.deploy
– ceph.stage.4 or salt-run state.orch ceph.stage.services
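The policy.cfg created between stage 1 and stage 2 assigns roles and hardware profiles to minions. A rough sketch of its shape, with hypothetical globs ( the exact profile paths depend on what stage 1 discovers ):
cat > /srv/pillar/ceph/proposals/policy.cfg <<'EOF'
cluster-ceph/cluster/*.sls
role-master/cluster/admin*.sls
role-mon/cluster/mon*.sls
role-mon/stack/default/ceph/minions/mon*.yml
profile-default/cluster/*.sls
profile-default/stack/default/ceph/minions/*.yml
EOF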
Salt-enabled Ceph
Existing capability
Sesceph
‒ A Python API library that helps deploy and manage Ceph
‒ Already upstreamed into Salt; available in the next release
‒ https://github.com/oms4suse/sesceph
Python-ceph-cfg
‒ A Python Salt module that uses sesceph to deploy
‒ https://github.com/oms4suse/python-ceph-cfg
Why Salt?
Existing capability
Product setup
‒ SUSE OpenStack Cloud, SUSE Manager and SUSE Enterprise Storage all come with Salt enabled
Parallel execution
‒ e.g. compared to ceph-deploy when preparing OSDs
> Customized Python modules
‒ Continuous development on the Python API is easy to manage
> Flexible configuration
‒ Default Jinja2 + YAML ( stateconf )
‒ pydsl if you prefer Python directly; JSON, pyobjects, etc.
Quick salt deployment example
> Git repo for fast deploy and benchmark
– https://github.com/AvengerMoJo/Ceph-Saltstack
> Demo recording
– https://asciinema.org/a/81531
1) Set up Salt
2) Git clone and copy the modules into the Salt _modules directory
3) saltutil.sync_all to push them to all minion nodes
4) ntp_update on all nodes
5) Create the new MONs and create keys
6) Clean the disk partitions and prepare the OSDs
7) Update the CRUSH map
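The generic Salt side of those steps looks roughly like this ( the copy path inside the repo is an assumption; the custom module functions used in steps 4-7 come from the repo's README and are not reproduced here ):
git clone https://github.com/AvengerMoJo/Ceph-Saltstack.git
cp Ceph-Saltstack/srv/salt/_modules/*.py /srv/salt/_modules/   # assumed layout of the repo
salt '*' saltutil.sync_all      # push the custom modules to every minion
salt '*' test.ping              # confirm all minions respond before running the module functions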
CEPH OPERATION
Ceph commands
ceph-deploy
> A passwordless SSH key needs to be distributed to all cluster nodes
> Each node's ceph user needs sudo rights for root permission
> ceph-deploy new <node1> <node2> <node3>
– Creates all the new MONs
> A ceph.conf file will be created in the current directory for you to build your cluster configuration
> Each cluster node should have an identical ceph.conf file
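Putting that together, a typical ceph-deploy bootstrap looks like this ( node names are hypothetical ):
ssh-copy-id ceph@node1 && ssh-copy-id ceph@node2 && ssh-copy-id ceph@node3   # passwordless SSH to each node
ceph-deploy new node1 node2 node3        # write ceph.conf and the initial MON list
ceph-deploy install node1 node2 node3    # install the Ceph packages
ceph-deploy mon create-initial           # create the MONs and gather the keys
ceph-deploy admin node1 node2 node3      # push ceph.conf and the admin keyring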
OSD Prepare and Activate
> ceph-deploy osd prepare <node1>:</dev/sda5>:</var/lib/ceph/osd/journal/osd-0>
> ceph-deploy osd activate <node1>:</dev/sda5>
Cluster Status
> ceph status
> ceph osd stat
> ceph osd dump
> ceph osd tree
> ceph mon stat
> ceph mon dump
> ceph quorum_status
> ceph osd lspools
Pool Management
> ceph osd lspools
> ceph osd pool create <pool-name> <pg-num> <pgp-num> <pool-type> <crush-ruleset-name>
> ceph osd pool delete <pool-name> <pool-name> --yes-i-really-really-mean-it
> ceph osd pool set <pool-name> <key> <value>
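A concrete instance of the commands above, using a hypothetical pool name:
ceph osd pool create mypool 128 128 replicated   # 128 PGs / 128 PGPs, replicated pool
ceph osd pool set mypool size 3                  # keep three replicas
ceph osd pool get mypool pg_num                  # read a setting back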
CRUSH Map Management
> ceph osd getcrushmap -o crushmap.out
> crushtool -d crushmap.out -o decom_crushmap.txt
> cp decom_crushmap.txt update_decom_crushmap.txt
> crushtool -c update_decom_crushmap.txt -o update_crushmap.out
> ceph osd setcrushmap -i update_crushmap.out
> crushtool --test -i update_crushmap.out --show-choose-tries --rule 2 --num-rep=2
> crushtool --test -i update_crushmap.out --show-utilization --num-rep=2
> ceph osd crush show-tunables
RBD Management
> rbd --pool ssd create --size 10000 ssd_block
– Creates an RBD image named ssd_block in the ssd pool ( --size is in MB, so 10000 is roughly 10G )
> rbd map ssd/ssd_block ( on the client )
– It should show up as /dev/rbd/<pool-name>/<block-name>
> Then you can use it like any block device
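Once mapped, the image behaves like any other disk; a minimal usage sketch following the pool and image names above:
mkfs.xfs /dev/rbd/ssd/ssd_block              # put a filesystem on the mapped device
mkdir -p /mnt/ssd_block
mount /dev/rbd/ssd/ssd_block /mnt/ssd_block  # use it like a local disk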
Demo usage
> It could be a QEMU/KVM RBD client for VMs
> It could also be an NFS/CIFS server ( but you need to consider how to provide HA on top of that )
WHAT NEXT?
Email me alau@suse.com
Let me know what you want to hear next
Editor's Notes
As previously mentioned, SUSE Enterprise Storage is a highly scalable and highly available storage solution. A SUSE Enterprise Storage cluster is built from commodity server and disk drive components, giving you the freedom to choose your hardware and significantly reducing capital cost by eliminating the need to purchase more expensive proprietary storage systems. Your current investment is still protected, because different types and speeds of drives can be deployed depending on your requirements. This could include flash drives for very high performance or high-capacity hard disk drives for bulk storage.