Build our own affordable cloud storage
openSUSE Asia Summit, December 2015
CLOUD STORAGE INTRO
Software-Defined Storage
Storage Trends
> Data size and capacity
– Multimedia content
– Large demo binaries, detailed graphics / photos, audio, video, etc.
> Data functional needs
– Different business requirements
– More data-driven processes
– More applications built on data
– More e-commerce
> Data backup over longer periods
– Legislation and compliance
– Business analysis
Storage Usage
> Tier 0: Ultra High Performance – 1-3%
> Tier 1: High-value, OLTP, Revenue Generating – 15-20%
> Tier 2: Backup/Recovery, Reference Data, Bulk Data – 20-25%
> Tier 3: Object, Archive, Compliance Archive, Long-term Retention – 50-60%
Software-Defined Storage
> High extensibility:
– Distributed over multiple nodes in a cluster
> High availability:
– No single point of failure
> High flexibility:
– API, block device, and cloud-supported architecture
> Pure software-defined architecture
> Self-monitoring and self-repairing
Sample Cluster
Why use Cloud Storage?
> Very high ROI compared to traditional hardware storage vendors
> Cloud-ready and S3-supported
> Thin provisioning
> Remote replication
> Cache tiering
> Erasure coding
> Self-managing and self-repairing with continuous monitoring
Other Key Features
> Supports clients on multiple operating systems
> Data encryption on the physical disks (more CPU needed)
> On-the-fly data compression
> Basically unlimited extensibility
> Copy-on-write (clones and snapshots)
> iSCSI support (VMs, thin clients, etc.)
WHO IS USING IT?
Showcases of Cloud Storage
EMC, Hitachi, HP, IBM
NetApp, Dell, Pure Storage, Nexsan
Promise, Synology, QNAP, Infortrend, ProWare, Sans Digital
Who is doing Software-Defined Storage?
Who is using Software-Defined Storage?
HOW MUCH?
What if we use Software-Defined Storage?
HTPC AMD (A8-5545M)
Form factor:
– 29.9 mm x 107.6 mm x 114.4 mm
CPU:
– AMD A8-5545M (up to 2.7 GHz, 4 MB cache, 4 cores)
RAM:
– 8 GB DDR3-1600 Kingston (up to 16 GB SO-DIMM)
Storage:
– mS200 120 GB mSATA SSD (read: 550 MB/s, write: 520 MB/s)
LAN:
– Gigabit LAN (Realtek RTL8111G)
Connectivity:
– 4 x USB 3.0
Price:
– NTD $6,980
Enclosure
Form factor:
– 215 (D) x 126 (W) x 166 (H) mm
Storage:
– Supports any brand of 3.5" SATA I/II/III hard disk drive; 4 x 8 TB = 32 TB
Connectivity:
– USB 3.0 or eSATA interface
Price:
– NTD $3,000
AMD (A8-5545M)
> Node = 6,980 (NTD)
> 512 GB SSD + 4 TB + 6 TB + enclosure = 5,000 + 4,000 + 7,000 = 16,000
> 30 TB total = 16,000 * 3 = 58,000
> That is roughly half the cost of 30 TB on Amazon cloud storage over one year
QUICK 3-NODE SETUP
Demo: basic setup of a small cluster
CEPH Cluster Requirements
> At least 3 MONs
> At least 3 OSDs
– At least 15 GB per OSD
– The journal is better placed on an SSD
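For reference, a minimal ceph.conf as generated by ceph-deploy looks roughly like the sketch below; the fsid, hostnames, and addresses are placeholders rather than values from this demo:
[global]
fsid = <cluster-uuid>
mon_initial_members = node1, node2, node3
mon_host = 192.168.1.11,192.168.1.12,192.168.1.13
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_journal_size = 1024
osd_pool_default_size = 3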
ceph-deploy
> A passwordless SSH key needs to be distributed to all cluster nodes (see the setup sketch after this slide)
> Make sure the ceph user on every node has passwordless sudo (root permission)
> ceph-deploy new <node1> <node2> <node3>
– Defines all the new MONs for the cluster
> A ceph.conf file is created in the current directory for you to build your cluster configuration
> Every cluster node should have an identical ceph.conf file
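A minimal sketch of those SSH and sudo prerequisites, assuming a user named ceph exists on each node (adjust user and host names to your environment):
> # generate a key on the admin node and copy it to every cluster node
> ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
> ssh-copy-id ceph@node1 ; ssh-copy-id ceph@node2 ; ssh-copy-id ceph@node3
> # give the ceph user passwordless sudo on each node
> echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
> sudo chmod 0440 /etc/sudoers.d/ceph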
OSD Prepare and Activate
> ceph-deploy osd prepare <node1>:/dev/sda5:/var/lib/ceph/osd/journal/osd-0
> ceph-deploy osd activate <node1>:/dev/sda5
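For example, with the placeholders filled in for a node called node1 (the data partition and journal device below are assumptions for illustration):
> ceph-deploy osd prepare node1:/dev/sdb1:/dev/sda5
> ceph-deploy osd activate node1:/dev/sdb1
> # repeat for the other nodes, then verify the OSDs came up
> ceph osd tree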
Cluster Status
> ceph status
> ceph osd stat
> ceph osd dump
> ceph osd tree
> ceph mon stat
> ceph mon dump
> ceph quorum_status
> ceph osd lspools
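A few standard shortcuts that help while watching the cluster (plain ceph CLI, not specific to this demo):
> # same as "ceph status"
> ceph -s
> # explains why the cluster is not HEALTH_OK
> ceph health detail
> # follow cluster events as they happen
> ceph -w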
Pool Management
> ceph osd lspools
> ceph osd pool create <pool-name> <pg-num> <pgp-num> <pool-type> <crush-ruleset-name>
> ceph osd pool delete <pool-name> <pool-name> --yes-i-really-really-mean-it
> ceph osd pool set <pool-name> <key> <value>
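For example, a replicated pool for RBD images might be created like this (the pool name and PG count are illustrative; size the PG numbers to your OSD count):
> ceph osd pool create rbd_pool 128 128 replicated
> # keep three copies of every object, allow I/O with two
> ceph osd pool set rbd_pool size 3
> ceph osd pool set rbd_pool min_size 2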
CRUSH Map Management
> ceph osd getcrushmap -o crushmap.out
> crushtool -d crushmap.out -o decom_crushmap.txt
> cp decom_crushmap.txt update_decom_crushmap.txt
> crushtool -c update_decom_crushmap.txt -o update_crushmap.out
> ceph osd setcrushmap -i update_crushmap.out
> crushtool --test -i update_crushmap.out --show-choose-tries --rule 2 --num-rep=2
> crushtool --test -i update_crushmap.out --show-utilization --num-rep=2
> ceph osd crush show-tunables
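As a reference when editing the decompiled map, a replicated rule usually looks like the snippet below (the rule name, ruleset number, and root bucket are placeholders):
rule ssd_replicated {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step chooseleaf firstn 0 type host
    step emit
}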
RBD Management
> rbd --pool ssd create --size 10000 ssd_block
– Creates a roughly 10 GB RBD image in the ssd pool (--size is given in MB)
> rbd map ssd/ssd_block (on the client)
– It should show up as /dev/rbd/<pool-name>/<block-name>
> Then you can use it like any other block device
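For instance, to put a filesystem on the mapped image (the mount point is an assumption):
> sudo mkfs.xfs /dev/rbd/ssd/ssd_block
> sudo mkdir -p /mnt/ssd_block
> sudo mount /dev/rbd/ssd/ssd_block /mnt/ssd_block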
SALT STACK + SES
Install, configure, and benchmark
Files to prepare for this demo
Kiwi image (SLE12 + SES2)
> https://files.secureserver.net/0fCLysbi0hb8cr
Salt Stack Git repo
> https://github.com/AvengerMoJo/Ceph-Saltstack
USB Install, then Prepare the Salt Minions
> # accept all node* keys from the minions
> salt-key -a 'node*'
> # copy all the modules and _systemd files into /srv/salt/, then sync
> sudo salt 'node*' saltutil.sync_all
> # benchmark (get baseline disk I/O numbers)
> sudo salt "node*" ceph_sles.bench_disk /dev/sda /dev/sdb /dev/sdc /dev/sdd
> # get all the disk information
> sudo salt "node*" ceph_sles.disk_info
> # get all the networking information
> sudo salt -L "salt-master node1 node2 node3 node4 node5" ceph_sles.bench_network salt-master node1 node2 node3 node4 node5
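Before running the ceph_sles modules, it is worth confirming that every minion key is accepted and responding (standard Salt commands, not part of the demo repo):
> sudo salt-key -L
> sudo salt 'node*' test.ping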
Prepare and Create the Cluster MONs
> # create the salt-master SSH key
> sudo salt "salt-master" ceph_sles.keygen
> # send the key over to the nodes
> sudo salt "salt-master" ceph_sles.send_key node1 node2 node3
> # create a new cluster with the new MONs
> sudo salt "salt-master" ceph_sles.new_mon node1 node2 node3
> # send the cluster conf and keys over to the nodes
> sudo salt "salt-master" ceph_sles.push_conf salt-master node1 node2 node3
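To confirm the monitors reached quorum, the usual ceph commands can be run on a node through Salt's cmd.run (plain Salt usage, shown here as an assumption about the setup):
> sudo salt 'node1' cmd.run 'ceph mon stat'
> sudo salt 'node1' cmd.run 'ceph quorum_status'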
Create the Journal and OSDs
> # create the OSD journal partition
> # this could be combined with the disk information gathered earlier for automatic SSD assignment
> sudo salt -L "node1 node2 node3" ceph_sles.prep_osd_journal /dev/sda 40G
> # clean all the OSD disk partitions first
> sudo salt 'salt-master' ceph_sles.clean_disk_partition "node1,node2,node3" "/dev/sdb,/dev/sdc,/dev/sdd"
> # prep the list of OSDs for the cluster
> sudo salt "salt-master" ceph_sles.prep_osd "node1,node2,node3" "/dev/sdb,/dev/sdc,/dev/sdd"
Update the CRUSH Map and Run a rados Benchmark
> # CRUSH map update for the benchmark
> sudo salt "salt-master" ceph_sles.crushmap_update_disktype_ssd_hdd node1 node2 node3
> # rados bench
> sudo salt "salt-master" ceph_sles.bench_rados
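The same kind of benchmark can also be run by hand with the stock rados tool, for example (pool name and duration are placeholders):
> rados bench -p <pool-name> 60 write --no-cleanup
> rados bench -p <pool-name> 60 seq
> rados -p <pool-name> cleanup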
Cache Tier setup
> sudo salt "salt-master" ceph_sles.create_pool samba_ssd_pool 100 2 ssd_replicated
> sudo salt "salt-master" ceph_sles.create_pool samba_hdd_pool 100 3 hdd_replicated
> ceph osd tier add samba_hdd_pool samba_ssd_pool
> ceph osd tier cache-mode samba_ssd_pool writeback
> ceph osd tier set-overlay samba_hdd_pool samba_ssd_pool
> ceph osd pool set samba_ssd_pool hit_set_type bloom
> ceph osd pool set samba_ssd_pool hit_set_count 2
> ceph osd pool set samba_ssd_pool hit_set_period 300
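A writeback cache tier also needs limits so it knows when to flush and evict; typical settings look like the lines below (the size and ratios are illustrative assumptions, not tuned values):
> ceph osd pool set samba_ssd_pool target_max_bytes 100000000000
> ceph osd pool set samba_ssd_pool cache_target_dirty_ratio 0.4
> ceph osd pool set samba_ssd_pool cache_target_full_ratio 0.8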
Block device demo
> rbd --pool samba_hdd_pool create --size 10000 samba_test
> sudo rbd --pool samba_ssd_pool ls
> sudo rbd --pool samba_ssd_pool map samba_test
> sudo mkfs.xfs /dev/rbd0
> sudo mount /dev/rbd0 /mnt/samba
> sudo systemctl restart smb.service
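For the final step to actually export the mounted RBD over Samba, a share roughly like the following is assumed to exist in /etc/samba/smb.conf (share name and options are placeholders):
[ceph_share]
    path = /mnt/samba
    browseable = yes
    read only = no
    valid users = @users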
WHAT NEXT?
Email me alau@suse.com
Let me know what you want to hear next
