Ceph Day Bring Ceph To Enterprise

iSCSI, openATTIC and Salt to bring Ceph into the enterprise

  1. 1. Bring Ceph to Enterprise: Set up a 50T mobile cluster in 30 min. Alex Lau (劉俊賢), Software Consultant
  2. 2. How to access Ceph storage? Introduction to iSCSI. Diagram labels: Block Storage; File System; Object Storage; Remote Cluster; Data Encrypted at Rest; Monitor Nodes; Management Node; Heterogeneous OS Access; RADOS gateway RESTful API; iSCSI
  3. 3. SUSE Enterprise Storage 3: the first commercially available iSCSI access to Ceph. It allows clients to access Ceph storage remotely over TCP/IP using the iSCSI protocol. SES3 provides an iSCSI target driver on top of RBD (RADOS Block Device), so any iSCSI initiator can access SES3 over the network.
  4. 4. iSCSI Architecture: Technical Background. Protocol: ‒ Block storage access over TCP/IP ‒ Initiators: the clients that access an iSCSI target over TCP/IP ‒ Targets: the servers that provide access to a local block device. SCSI and iSCSI: ‒ iSCSI encapsulates SCSI commands and responses ‒ An iSCSI TCP packet carries a SCSI command. Remote access: ‒ iSCSI initiators can access a remote block device like a local disk ‒ Attach and format with XFS, Btrfs, etc. ‒ Booting directly from an iSCSI target is supported
  5. 5. Before iSCSI RBD support … Diagram labels: Public Network; OSD1, OSD2, OSD3, OSD4; Target System (RBD block, LIO to iSCSI); Initiator System
  6. 6. Before iSCSI support: what’s wrong? Missing features. LIO over RBD: ‒ It doesn’t support “atomic compare and write” ‒ It doesn’t support “persistent group reservations”. iSCSI: ‒ Active/active multipath I/O (MPIO) is not supported ‒ Supporting all of these at the block layer requires a different approach
  7. 7. Benefits of the iSCSI LIO gateway for RBD. Multi-platform access to Ceph: ‒ Clients don’t need to be part of the cluster, as with radosgw. Standard iSCSI interface: ‒ Most operating systems support iSCSI ‒ open-iscsi is available in most Linux distributions. LIO (Linux IO Target): ‒ In-kernel SCSI target implementation. Flexible configuration: ‒ The targetcli utility is available alongside lrbd
  8. 8. Configuring the RBD iSCSI gateway: introduction to lrbd. Easy setup: ‒ Packaged and bundled with iSCSI since SES 2.0 ‒ Multi-node configuration support with targetcli. Technical background: ‒ JSON configuration format ‒ Targets, Portals, Pools, Auth ‒ Configuration state stored in the Ceph cluster. Related Link: ‒ ‒
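As a rough illustration of the JSON shape this slide describes, the sketch below builds a configuration with the four named sections (targets, portals, pools, auth) and checks that none is missing. Every field name, host name, address and the IQN is a hypothetical placeholder; this does not claim to match lrbd's actual schema.

```python
import json

# Hypothetical gateway configuration with the four sections named on the slide.
# Field names and values are illustrative assumptions, not lrbd's real schema.
config = {
    "auth": [{"host": "igw1", "authentication": "none"}],
    "targets": [{"host": "igw1", "target": "iqn.2016-01.example.igw:demo"}],
    "portals": [{"name": "portal1", "addresses": ["192.0.2.10"]}],
    "pools": [{"pool": "rbd", "gateways": [{"host": "igw1", "images": ["demo"]}]}],
}

REQUIRED_SECTIONS = ("auth", "targets", "portals", "pools")


def missing_sections(cfg):
    """Return the required top-level sections that are absent from cfg."""
    return [name for name in REQUIRED_SECTIONS if name not in cfg]


# Serialize to the JSON text that would be stored or edited.
text = json.dumps(config, indent=2, sort_keys=True)
print(text)
print("missing:", missing_sections(config))
```

Keeping the whole gateway definition in one JSON document is what makes it practical to store the configuration state in the cluster and apply it on several gateway nodes.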
  9. 9. iSCSI Gateway Optimizations. Efficient handling of certain SCSI operations: ‒ Offload RBD image I/O to OSDs ‒ Avoid locking on iSCSI gateway nodes. Compare and Write: ‒ New cmpext OSD operation to handle RBD data comparison ‒ Dispatched as a compound cmpext+write OSD request. Write Same: ‒ New writesame OSD operation to expand duplicate data at the OSD. Reservations: ‒ State stored as an RBD image extended attribute ‒ Updated using a compound cmpxattr+setxattr OSD request
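The compare-and-write semantics can be sketched with a toy in-memory image. This is not the librbd or OSD API: `BlockImage` and its methods are hypothetical names, and a local lock stands in for the atomic evaluation of a compound cmpext+write request at the OSD.

```python
import threading


class BlockImage:
    """Toy in-memory stand-in for an RBD image (not the librbd API)."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._lock = threading.Lock()

    def read(self, offset, length):
        with self._lock:
            return bytes(self._data[offset:offset + length])

    def compare_and_write(self, offset, expected, new):
        """Write `new` at `offset` only if the current bytes equal `expected`.

        The comparison and the write happen under one lock, so no other
        writer can slip in between them. Returns True on success and
        False on a miscompare.
        """
        if len(expected) != len(new):
            raise ValueError("expected and new must have the same length")
        end = offset + len(expected)
        if offset < 0 or end > len(self._data):
            raise ValueError("range outside image")
        with self._lock:
            if bytes(self._data[offset:end]) != expected:
                return False
            self._data[offset:end] = new
            return True


image = BlockImage(16)
first = image.compare_and_write(0, b"\x00" * 4, b"lock")   # block is still zeroed
second = image.compare_and_write(0, b"\x00" * 4, b"oops")  # stale expectation
print(first, second, image.read(0, 4))
```

Doing the comparison where the data lives is what lets two gateway nodes serve the same image without coordinating locks between themselves.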
  10. 10. Multipath support with iSCSI on RBD. Diagram labels: Public Network; Cluster Network; OSD1, OSD2, OSD3, OSD4; two iSCSI gateways, each with the RBD module; iSCSI initiator; RBD image
  11. 11. How to manage the storage growth and costs of Ceph? ‒ Easily scale and manage data storage ‒ Control storage growth and manage costs ‒ Support today’s investment and adapt to the future
  12. 12. Introduction to openATTIC Easily scale and manage data storage
  13. 13. SUSE Enterprise Storage Management Vision. Open Source: ‒ Alternative to proprietary storage management systems. Enterprise: ‒ Works as expected with traditional storage through a unified storage interface, e.g. NAS, SAN. SDS Support: ‒ Provides initial Ceph setup, management and monitoring to ease complicated scale-out scenarios. It will be available in the next SES release, or download it now at c
  14. 14. openATTIC Features: Existing capability. Modern Web UI. RESTful API ‒ Software Defined Storage. Unified Storage ‒ NAS (NFS, CIFS, HTTP) ‒ SAN (iSCSI, Fibre Channel). Volume Mirroring ‒ DRBD. File System ‒ LVM, XFS, ZFS, Btrfs, ext3/4. Monitoring ‒ Nagios / Icinga built-in ‒ Ceph Management (WIP)
  15. 15. openATTIC Architecture: Technical Detail. Backend: ‒ Python (Django) ‒ Django REST Framework ‒ Nagios / Icinga & PNP4Nagios ‒ Linux tools: LVM, LIO, DRBD ‒ Ceph API: librados, librbd. Web Frontend: ‒ AngularJS ‒ Bootstrap ‒ REST API. Automated Test Suites: ‒ Python unit tests ‒ Gatling: RESTful API ‒ Protractor / Jasmine: Web UI tests
  16. 16. openATTIC Architecture: High-Level Overview. Diagram labels: Django; Linux OS Tools; openATTIC systemd; RESTful API; PostgreSQL; D-Bus; Shell; librados/librbd; Web UI; REST Client; HTTP; NoDB
  17. 17. openATTIC Development: Current status ‒ Create and map RBDs as block devices (volumes) ‒ Pool management Web UI (table view) ‒ OSD management Web UI (table view) ‒ RBD management Web UI (add/delete, table view) ‒ Monitor cluster health and performance ‒ Support for managing Ceph with Salt integration (WIP) ‒ Role management for nodes: monitor, storage, cephfs, iscsi, radosgw
  18. 18. Volume Management
  19. 19. Pool Listing
  20. 20. OSD Listing
  21. 21. RBD Listing
  22. 22. oA Ceph Roadmap: the future is in your hands ‒ Ceph cluster status dashboard incl. performance graphs ‒ Extend pool management ‒ OSD monitoring/management ‒ RBD management/monitoring ‒ CephFS management ‒ RGW management (users, buckets, keys) ‒ Deployment and remote configuration of Ceph nodes (via Salt) ‒ Public roadmap on the openATTIC wiki to solicit community feedback:
  23. 23. How does Ceph control storage cost? Control storage growth and manage costs
  24. 24. Minimal recommendation. OSD storage node: ‒ 2GB RAM per OSD ‒ 1.5GHz CPU core per OSD ‒ 10GbE public and backend networks ‒ 4GB RAM for cache tier. MON monitor node: ‒ 3 MONs minimum ‒ 2GB RAM per node ‒ SSD for the system OS ‒ MON and OSD should not be virtualized ‒ Bonded 10GbE
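The per-OSD figures on this slide turn into a simple sizing calculation. The sketch below is only an illustration: the function names are made up, and it assumes the 4GB cache-tier allowance applies once per OSD node, which the slide does not spell out.

```python
RAM_GB_PER_OSD = 2        # from the slide
CPU_GHZ_PER_OSD = 1.5     # from the slide
CACHE_TIER_RAM_GB = 4     # assumed to apply once per OSD node
MON_RAM_GB = 2            # from the slide
MIN_MONS = 3              # from the slide


def osd_node_requirements(osd_count, cache_tier=False):
    """Minimum RAM (GB) and aggregate CPU clock (GHz) for one OSD node."""
    ram_gb = RAM_GB_PER_OSD * osd_count
    if cache_tier:
        ram_gb += CACHE_TIER_RAM_GB
    return {"ram_gb": ram_gb, "cpu_ghz": CPU_GHZ_PER_OSD * osd_count}


def mon_ram_total(mon_count=MIN_MONS):
    """Total RAM (GB) across monitor nodes; rejects fewer than three MONs."""
    if mon_count < MIN_MONS:
        raise ValueError("at least 3 monitors are recommended")
    return MON_RAM_GB * mon_count


print(osd_node_requirements(4))
print(osd_node_requirements(4, cache_tier=True))
print(mon_ram_total())
```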
  25. 25. SUSE Storage Pricing. Chart labels: JBOD Storage; Mid-range Array; Mid-range NAS; High-end Disk Array; SUSE Enterprise Storage; Fully Featured NAS Device; Entry-level Disk Array
  26. 26. Use storage with multiple tiers. Applications that write quickly: • e.g. Video recording • e.g. Lots of IoT data. Applications that read quickly: • e.g. Video streaming • e.g. Big Data analysis. Diagram: SUSE Enterprise Storage Cluster with a Write Tier (Hot Pool) over a Normal Tier (Cold Pool), and a Read Tier (Hot Pool) over a Normal Tier (Cold Pool)
  27. 27. How to create multiple price points? ‒ 1000$ = 1000G, 2000MB/s rw; 4 PCIe = 4000$ = 8000MB/s rw, 4T storage, 400,000 IOPS, 4$ per G ‒ 250$ = 1000G, 500MB/s rw; 16 drives = 4000$ = 8000MB/s rw, 16T storage, 100,000 IOPS, 1$ per G ‒ 250$ = 8000G, 150MB/s rw; 16 drives = 4000$ = 2400MB/s rw, 128T storage, 2000 IOPS, 0.1$ per G
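The aggregate cost, capacity and throughput of each price point follow from multiplying the per-device figures by the device count. A minimal sketch, with the tier names invented for illustration and throughput read as MB/s:

```python
# Per-device figures from the slide; the tier names are illustrative labels.
TIERS = {
    "fast":   {"unit_cost": 1000, "unit_gb": 1000, "unit_mb_s": 2000, "units": 4},
    "medium": {"unit_cost": 250,  "unit_gb": 1000, "unit_mb_s": 500,  "units": 16},
    "bulk":   {"unit_cost": 250,  "unit_gb": 8000, "unit_mb_s": 150,  "units": 16},
}


def aggregate(tier):
    """Return (total cost in $, capacity in T, throughput in MB/s) for a tier."""
    units = tier["units"]
    cost = tier["unit_cost"] * units
    capacity_t = tier["unit_gb"] * units / 1000
    throughput = tier["unit_mb_s"] * units
    return cost, capacity_t, throughput


for name, tier in TIERS.items():
    print(name, aggregate(tier))
```

All three tiers land on the same 4000$ budget, which is the point of the slide: the same spend buys either speed or capacity.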
  28. 28. Control Costs: How does EC reduce storage cost? Replication pool (SES Ceph cluster; diagram: Copy, Copy, Copy): multiple copies of stored data • 300% cost of data size • Low latency, faster recovery. Erasure coded pool (SES Ceph cluster; diagram: Data, Data, Data, Data, Parity, Parity): single copy with parity • 150% cost of data size • Data/parity ratio trades off against CPU
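The 300% and 150% figures come straight from the ratio of raw to usable data. A small sketch of that arithmetic, using three replicas and the four data plus two parity chunks shown in the diagram (function names are illustrative):

```python
def replication_cost_percent(copies):
    """Raw storage consumed, as a percentage of the data size, for N full copies."""
    return 100 * copies


def erasure_cost_percent(data_chunks, parity_chunks):
    """Raw storage consumed, as a percentage of the data size, for a k+m profile."""
    return 100 * (data_chunks + parity_chunks) / data_chunks


print(replication_cost_percent(3))   # three full copies
print(erasure_cost_percent(4, 2))    # four data chunks, two parity chunks
```

More parity chunks per data chunk raise the cost toward replication levels, while wider profiles lower it at the price of more CPU work per write and slower recovery.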
  29. 29. Public Cloud Setup. H270-H70 - 40000$: - 48 Core * 8 : 384 Cores - 32G * 32 : 1T Memory - 1T * 16 : 16T SSD - 40GbE * 8. R120-T30 - 5700$ * 7: - 48 Core * 7 : 336 Cores - 8 * 16G * 7 : 896G Memory - 1T * 2 * 7 : 14T SSD - 8T * 6 * 7 : 336T HDD - 40GbE * 7 - 10GbE * 14. 1000 customers running 5$ web hosting = 5000$ per month; 8 months = 40000$. EC 5+2 is about 250T: 2500 customers with 100GB of 2$ storage = 5000$ per month; 8 months = 40000$
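The payback reasoning on this slide can be checked with a few lines of arithmetic. The sketch assumes the 5$ and 2$ prices are billed monthly, which the slide implies but does not state, and the function names are illustrative. The exact EC 5+2 result is 240T, in line with the "about 250T" on the slide.

```python
def ec_usable_tb(raw_tb, data_chunks, parity_chunks):
    """Usable capacity of an erasure coded pool with a k+m profile."""
    return raw_tb * data_chunks / (data_chunks + parity_chunks)


def months_to_recover(hardware_cost, customers, monthly_price):
    """Months of revenue needed to pay back the hardware cost."""
    return hardware_cost / (customers * monthly_price)


usable = ec_usable_tb(336, 5, 2)               # 336T of HDD behind EC 5+2
hosting = months_to_recover(40000, 1000, 5)    # compute chassis, 5$ hosting
storage = months_to_recover(40000, 2500, 2)    # storage nodes, 2$ per 100GB
sold_tb = 2500 * 100 / 1000                    # capacity sold to 2500 customers

print(usable, hosting, storage, sold_tb)
```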
  30. 30. For developers? Diagram: three nodes on a dual 1G network: OSD1 to OSD4 with MON1; OSD5 to OSD8 with MON2; OSD9 to OSD12 with MON3. Per node: 300$; 6T = 220$, 220 * 3 = 660$; 512G = 150$
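Adding up the parts gives a feel for the total. This sketch assumes each of the three nodes holds three 6T disks and one 512G drive, as the per-node prices suggest; the variable names are illustrative.

```python
NODES = 3
NODE_COST = 300            # base unit, per node
HDD_COST, HDD_TB = 220, 6  # one 6T disk
HDDS_PER_NODE = 3          # assumed from "220 * 3 = 660$"
SSD_COST = 150             # one 512G drive per node

per_node = NODE_COST + HDD_COST * HDDS_PER_NODE + SSD_COST
total_cost = per_node * NODES
hdd_capacity_tb = HDD_TB * HDDS_PER_NODE * NODES

print(per_node, total_cost, hdd_capacity_tb)
```

Under these assumptions the parts come to a little over 3000$ for roughly 50T of raw disk.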
  31. 31. Pros and cons of this mobile cluster. Price: ‒ Around 3200$, versus expensive laptops. Size: ‒ 50T and 20kg is mobile enough to demo a usable cluster ‒ Real HDDs are better for presenting a storage solution. Benchmark: ‒ Apart from networking capability, it meets all the features and requirements of a Ceph cluster. Features: ‒ A great fit for developers and testers running software-based tests that can’t be done in a VM
  32. 32. How does the DevOps story fit? Introducing Salt. Support today’s investment and adapt to the future
  33. 33. Salt enables Ceph: Existing capability. sesceph: ‒ Python API library that helps deploy and manage Ceph ‒ Already upstream in Salt, available in the next release. python-ceph-cfg: ‒ Python Salt module that uses sesceph to deploy. ‒ Both libraries already come with SES 3.0
  34. 34. Why Salt? Existing capability. Product setup: ‒ SUSE OpenStack Cloud, SUSE Manager and SUSE Enterprise Storage all come Salt-enabled. Parallel execution: ‒ E.g. compared to ceph-deploy when preparing OSDs. Customizable Python modules: ‒ Continuous development on a Python API that is easy to manage. Flexible configuration: ‒ Default Jinja2 + YAML (stateconf) ‒ pydsl if you prefer Python directly; also json, pyobjects, etc.
  35. 35. Create a cluster with a single stage file: Saltstack/blob/master/stages/ses/ceph/ceph_create.sls This showcases a simple way to create a cluster with a single stage file. It is easy to customize it to create your own.
  36. 36. Quick deployment example. Git repo for fast deploy and benchmark. Demo recording. Steps: 1) Salt setup 2) Git clone and copy the module to salt _modules 3) saltutil.sync_all to push to all minion nodes 4) ntp_update on all nodes 5) Create new MONs and create keys 6) Clean disk partitions and prepare OSDs 7) Update the crushmap
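The seven steps above can be written down as an ordered dry-run plan. In this sketch only `salt '*' saltutil.sync_all` is a real Salt command; the other steps depend on the modules in the presenter's repository, which are not shown here, so they are listed as descriptions without commands. The `STEPS` and `plan` names are illustrative.

```python
import shlex

# (description, command) pairs; None marks steps whose exact commands depend
# on the repository's custom modules and are deliberately left unspecified.
STEPS = [
    ("Salt setup", None),
    ("Git clone and copy module to salt _modules", None),
    ("Sync custom modules to all minions", ["salt", "*", "saltutil.sync_all"]),
    ("ntp_update all nodes", None),
    ("Create new mons and create keys", None),
    ("Clean disk partitions and prepare OSDs", None),
    ("Update crushmap", None),
]


def plan(steps):
    """Render the steps as numbered lines without executing anything."""
    lines = []
    for number, (title, command) in enumerate(steps, start=1):
        if command is None:
            action = "(repo-specific step)"
        else:
            action = " ".join(shlex.quote(part) for part in command)
        lines.append(f"{number}) {title}: {action}")
    return lines


for line in plan(STEPS):
    print(line)
```

Keeping the plan as data makes the ordering explicit: modules must be synced to the minions before any step that calls them.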
  37. 37. Reduce storage costs and management with SUSE Enterprise Storage: Manage Less, Adapt Quickly, Control Costs
  38. 38. Scale storage from terabytes to hundreds of petabytes without downtime. Diagram labels: SOCIAL MEDIA; BUSINESS OPERATIONS; MOBILE DATA; CUSTOMER DATA; %UPTIME