Backup management with Ceph Storage - Camilo Echevarne, Félix Barbeira

Ceph Day Galicia
April 4th 2018, Santiago de Compostela ES
Camilo Echevarne and Félix Barbeira, Dinahosting


  1. 1. Backup Management with Ceph Object Storage
  2. 2. Who are we? Camilo Echevarne (cechevarne@dinahosting.com) and Félix Barbeira (fbarbeira@dinahosting.com), Linux Sysadmin Head and Linux Sysadmin at Dinahosting.
  3. 3. • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  4. 4. • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  5. 5. What is Dinahosting? Our main business is web hosting and domain registration. We offer users all the tools needed to develop their projects on the Internet with guarantees: - A domain name for your site. - E-mail services. - Hosting plans: from the simplest ones to complex and powerful solutions such as Cloud Hosting, as well as VPS and dedicated servers.
  6. 6. Where are we? Presence in more than 130 international markets: Mexico, Argentina, Colombia, Chile, Portugal, Peru, Venezuela, USA, Brazil, Ecuador, France, United Kingdom, Italy, Denmark, Netherlands, Uruguay, Bolivia, Japan, China, Senegal, etc. Based in Santiago.
  7. 7. Some numbers… +130,000 customers, +3,000 servers, +240,000 domains.
  8. 8. Revenue (chart): annual revenue 2002-2017, on a scale from 0 to 14,000,000 €.
  9. 9. - Toll-free phone number. - Chat. - E-mail. - Social network presence. - 24/7 service. - No call-center auto-attendant. Customer service
  10. 10. Backups • Only for clients of managed services. • Restorations at file level. • 30 days max retention. • Weekly complete backup, incremental the rest of the week. • ~3000 machines. • ~1PB available space. • ~30 bare metal storage servers. • Complete backup size ~125TB. Data size increases year by year, and so does its management complexity.
  11. 11. Agenda • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  12. 12. Current system: NFS servers with RAID storage.
  13. 13. RAID: the end of an era • Slow recovery. • Hazardous recovery. • Painful recovery. • Disk incompatibility. • Wasted disk for hot-spare. • Expensive storage cards. • Hard to scale. • False sense of security. [2]
  14. 14. RAID: the end of an era. Would we be protected against…? Hardware error, network outage, datacenter disaster, power supply failure, operating system error, filesystem failure.
  15. 15. Problems managing files - Backwards compatibility. - Wasted space. - Storage node partition table? - Corrupt files when the disk is full. - Many hours spent by SysOps. - Forced to deploy and maintain an API.
  16. 16. Problems managing files: IT support vs. the bosses.
  17. 17. Agenda • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  18. 18. Upload backup to the cloud: when do we start to lose money?
  19. 19. Upload backup to the cloud. Price per month for 1PB of cloud storage:
      AWS: S3 Infrequent Access (IA) ~15,000 € (blocking element: price); S3 Glacier ~5,000 €* (blocking elements: slow data retrieval**, limited availability***, 500TB upload limit****).
      * Files deleted before 90 days incur a pro-rated charge. ** Expedited retrievals: 1-5 min; 3 expedited retrievals can be performed every 5 minutes; each unit of provisioned capacity costs 100 $ per month; standard retrieval: 3-5 hours. *** Glacier inventory refresh every 24h. **** The upload limit can be increased by contacting AWS support.
      Azure: Storage Standard Cool ~9,000 € (blocking element: price); Storage Standard Archive ~5,000 € (blocking element: restorations <15h). Azure charges an extra cost if files are deleted before 30 and 180 days respectively.
      GCP: Nearline Storage ~8,000 € (blocking element: price); Coldline Storage ~5,000 € (blocking element: price). In both storage classes, data access is measured in milliseconds.
  20. 20. [3] Upload backup to the cloud
  21. 21. • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  22. 22. Ceph: a unified, distributed storage system. Intelligence in software. - Open source. - Massively scalable. - Independent components. - No SPOF. - High performance. - S3 API. - Active community. - Use of commodity hardware.
  23. 23. Distributed storage (diagram): clients use Object Storage, Block Storage and File Storage on top of the Ceph Storage Cluster; each node (Server1, Server2, Server3 … ServerN) runs the Ceph SDS layer on a Linux OS over commodity hardware (CPU, memory, HDDs, network).
  24. 24. OSD (object storage daemon): - From one to thousands. - Generally speaking, 1 OSD = 1 hard disk. - OSDs communicate among themselves to replicate data and perform recoveries. Monitors: - Maintain cluster maps. - Provide consensus on data-distribution decisions. - Small and odd number. - Do not store data. Gateways: - Entry points to the cluster.
  25. 25. Data flow on OSDs (diagram: Server1-Server3, each node with several OSDs, one per disk). READ: 1) RADOS sends the read request to the primary OSD; 2) the primary OSD reads the data from its local disk and notifies the Ceph client. WRITE: 1) the client writes data, RADOS creates the object and sends the data to the primary OSD; 2) the primary OSD determines the number of replicas and sends the data to the replica OSDs; 3) the replica OSDs write the data and send completion to the primary OSD; 4) the primary OSD signals write completion to the Ceph client.
  26. 26. • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  27. 27. Hardware planification. Ceph OSDs: • DELL R720XD / R730XD • CPU: 2x E5-2660, 8 cores, 2.20GHz • RAM: 64GB-96GB • Disks: 12x 8TB SATA, plus 1 SATA disk for the OS • NIC: 10G • Controller: H730 / LSI JBOD. Ceph monitors: • VPS • 4 vcores • RAM: 4GB • NIC: 10G. Ceph gateways: • DELL R230 • CPU: E3-1230v5 3.4GHz (8 threads) • RAM: 8GB • NIC: 10G. Optimize for cost or performance? In our case, the main objective is to optimize the total cost per GB.
  28. 28. Hardware planification. What happens to the OSDs if the OS disk dies? “We recommend using a dedicated drive for the operating system and software, and one drive for each OSD Daemon you run on the host.” So… where do we put the operating system of the OSD node?
      - OS in RAID1. Pros: cluster protected against OS failures; hot-swap disks. Cons: we do not have a RAID card*; we would need 1 extra disk.
      - OS on a single disk. Pros: only 1 disk slot used; high reliability; monitor the disk with SMART. Cons: if the disk dies, all OSDs of that machine die too.
      - OS on SATADOM. Pros: all disk slots available for OSDs. Cons: not reliable after months of use.
      - OS from SAN. Pros: all disk slots available for OSDs; RAID protected. Cons: we depend on the network and remote storage.
      - OS on SD card. Pros: all disk slots available for OSDs. Cons: poor performance, not reliable.
      *The PERC H730 does support RAID.
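For the single-OS-disk option, the SMART health check mentioned above can be scripted and fed into monitoring; a minimal sketch (hostname and device name are illustrative):
      root@osd1:~# smartctl -H /dev/sda
      === START OF READ SMART DATA SECTION ===
      SMART overall-health self-assessment test result: PASSED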
  29. 29. Hardware planification. Rules of thumb for a Ceph installation: - 10G networking as a minimum. - Deep knowledge of the hardware you wish to use. - Always use at least 3 replicas. - Try to use enterprise SSDs. - Don’t use configuration options you don’t understand. - Power-loss testing. - Have a recovery plan. - Use a CM (configuration management) system.
  30. 30. ceph-ansible: https://github.com/ceph/ceph-ansible.git (docs: http://docs.ceph.com/ceph-ansible/) • Gradual learning curve. • Plain deploy, no lifecycle management. • No orchestration. • No server needed. • Evolution of the ceph-deploy tool. TIPS: - Use an Ansible version compatible with ceph-ansible (bleeding-edge versions are not supported). - Do not use the master branch unless you like strong emotions.
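A minimal deployment sketch with ceph-ansible, assuming a stable branch and an inventory file named hosts (both illustrative, not the authors' exact workflow):
      user@admin:~$ git clone https://github.com/ceph/ceph-ansible.git
      user@admin:~$ cd ceph-ansible
      user@admin:~/ceph-ansible$ git checkout stable-3.0     # pick a stable branch, never master
      user@admin:~/ceph-ansible$ cp site.yml.sample site.yml
      user@admin:~/ceph-ansible$ ansible-playbook -i hosts site.yml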
  31. 31. • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  32. 32. Ceph Architecture (diagram): the client reaches the gateways over HTTP (S3) on the public network (IPv4 & IPv6); behind them, an odd number of monitors (Monitor01-Monitor03), the RadosGW01…RadosGWn gateways and the OSD nodes (OSD1…OSDn) communicate over 10G links on an IPv6 network.
  33. 33. Ceph Architecture. HA Gateway, option 1: LB in active/passive mode (LB-ON / LB-OFF in front of RadosGW01…RadosGWn). Drawbacks: - Bandwidth bottleneck at the LB. - At least 2 LBs must be deployed. - Increases TCO. - Increases complexity.
  34. 34. Ceph Architecture. HA Gateway, option 2: DNS Round Robin (client resolves RadosGW01…RadosGWn). Drawbacks: - DNS responses are stateless. - TTL dependency. - No instant fail-over.
  35. 35. Ceph Architecture. HA Gateway, option 3 (selected): local anycast of the gateway IP. Each gateway node (RadosGW01…RadosGWn) runs the RadosGW daemon, FRRouting and a health check, and announces the RADOS-GATEWAY IP to the datacenter network as an OSPF route. Advantages: - The bandwidth of all nodes is aggregated. - Instant fail-over. - The route is withdrawn if the node, the RadosGW daemon or the FRRouting daemon fails. (A health-check sketch follows.)
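A minimal health-check sketch along those lines, assuming the anycast address lives on the loopback and FRRouting redistributes connected routes into OSPF (address, port and logic are illustrative, not the authors' script):
      #!/bin/bash
      # If the local RadosGW answers, make sure the anycast IP is on lo so FRR keeps advertising it;
      # otherwise remove it, which withdraws the OSPF route for this node.
      VIP="192.0.2.10/32"
      if curl -sf -o /dev/null "http://127.0.0.1:7480/"; then
          ip addr show dev lo | grep -q "${VIP%/*}" || ip addr add "$VIP" dev lo
      else
          ip addr del "$VIP" dev lo 2>/dev/null || true
      fi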
  36. 36. • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  37. 37. Tuning. No silver bullets.
      root@ceph:~# ceph --show-config | wc -l
      1397
      root@ceph:~#
      Default options are designed for general use cases. Most of the time you need to make some adjustments in order to achieve real performance. The Ceph documentation is highly valuable and extensive: http://docs.ceph.com/docs/master/
  38. 38. Tuning
      • Enable Jumbo Frames (9000 MTU vs. standard 1522 MTU frames: less overhead per unit of data). Test with: ping6 -M do -s 8972 <ip>
      • Monitor options:
        [mon]
        mon osd nearfull ratio = .90
        mon osd down out subtree limit = host
      • OSD options:
        [osd]
        osd scrub sleep = .1
        osd scrub load threshold = 1.0
        osd scrub begin hour = 12
        osd scrub end hour = 0
      • Daily reweight: ceph osd reweight-by-utilization [threshold]
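The scrub options above can also be pushed to running OSDs without a restart; a hedged sketch using injectargs (values mirror the slide):
      root@ceph:~# ceph tell osd.* injectargs '--osd_scrub_sleep 0.1 --osd_scrub_load_threshold 1.0'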
  39. 39. Erasure Code: replicated pool vs. erasure-coded pool. Replicated pool (full copies of stored objects in the Ceph storage cluster): • High durability. • 3x (200% overhead). • Quicker recovery. • Admits all kinds of operations. • Uses fewer resources (CPU). Erasure-coded pool (one copy plus parity, e.g. data chunks 1, 2, 3 plus coding chunks X, Y): • Cost-effective durability. • 1.5x (50% overhead). • Expensive recovery. • Partial writes not supported*. • Higher CPU usage.
  40-45. Erasure Code: how does erasure coding work? (animation over six slides; erasure-coded pool with data shards 1-4 and coding shards X, Y, one per OSD). Reads: the Ceph client sends a READ to the primary OSD, which issues sub-reads to the shard OSDs and then replies to the client. Writes: the Ceph client sends a WRITE to the primary OSD, which sends sub-writes to the shard OSDs and acknowledges the client once they complete.
  46. 46. How does erasure coding work? Two variables, K + M: K = data shards, M = erasure-code shards. n = k + m (total shards); r = k / n (encoding rate); e.g. n = 4 + 2 = 6, r = 4/6 ≈ 0.66 (CERN).
      K+M  | Usable space | Number of OSDs allowed to fail
      3+1  | 75 %         | 1
      4+2  | 66 %         | 2
      18+2 | 90 %         | 2
      11+3 | 78.5 %       | 3
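As an illustration of the K+M trade-off, a hedged sketch of creating an erasure-code profile and a pool that uses it (profile and pool names, the 11+3 values and the PG count are illustrative; crush-failure-domain assumes a Luminous-era cluster):
      root@ceph:~# ceph osd erasure-code-profile set backup-ec k=11 m=3 crush-failure-domain=host
      root@ceph:~# ceph osd pool create backups-data 1024 1024 erasure backup-ec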
  47. 47. • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  48. 48. { "user_id": "fbarbeira", "display_name": "Felix Barbeira", "email": "fbarbeira@dinahosting.com", "suspended": 0, "max_buckets": 100, "auid": 0, "subusers": [ { "id": "fbarbeira:read-only", "permissions": "read" } ], "keys": [ { "user": “fbarbeira:read-only", "access_key": "XXXXXXXXXXXXXXXXXXXX", "secret_key": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" }, { "user": "fbarbeira", "access_key": "XXXXXXXXXXXXXXXXXXXX", "secret_key": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" } ], [...] }, "user_quota": { "enabled": true, "check_on_raw": false, "max_size": -1, "max_size_kb": 1073741824, "max_objects": -1 }, "temp_url_keys": [], "type": "rgw" } Limit max size object 1TB (default no-limit) Ceph user profile Limit bucket max number (default 1000) Subuser with read-only permission
  49. 49. How do we sync backups? (diagram) Each server publishes a BACKUP_DONE message to a message broker when its backup finishes; an agent consumes n elements at a time and writes the corresponding backups to the cluster using the read-write user (User-RW).
  50. 50. How do we restore backups? Method 1: restorations ordered from the control panel; an agent generates temporary links. Method 2: restorations from the same machine; the server GETs its objects using the read-only user (User-RO). (A signed-URL sketch follows.)
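For method 1, temporary links can be ordinary S3 signed URLs; a minimal sketch with s3cmd (bucket, object and endpoint are illustrative):
      user@agent:~$ s3cmd signurl s3://backups/server42/2018-04-01-full.tar.gz +3600
      http://backups.rgw.example.com/server42/2018-04-01-full.tar.gz?AWSAccessKeyId=...&Expires=...&Signature=...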
  51. 51. • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  52. 52. Ceph client requirements (candidates: s3cmd, minio client): • No dependencies. • Multi-OS compatible. • Low resource requirements. • Active development. • Bandwidth limiting: we need to limit the bandwidth used (see the upload sketch below).
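A minimal upload sketch with a bandwidth cap, assuming an s3cmd version that supports --limit-rate (endpoint, bucket and rate are illustrative):
      user@server42:~$ s3cmd put /backup/2018-04-01-full.tar.gz s3://backups/server42/ \
                           --host=rgw.example.com --host-bucket='%(bucket)s.rgw.example.com' \
                           --limit-rate=50m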
  53. 53. CPU and IO problems (diagrams of CPU vs. 1G network usage): a powerful machine with a bandwidth limit, a not-so-powerful machine with no limits, a powerful machine with no limits, and an elastic limit.
  54. 54. CPU and IO adjustments: Linux Traffic Control (TC). Default behaviour: all flows (Flow1-Flow4, including the Ceph traffic) share a single FIFO queue on the port. Applying a tc policy: a classifier steers the Ceph traffic into its own Hierarchical Token Bucket (HTB) class, while the remaining flows keep the FIFO queue (sketched below).
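A hedged sketch of such a tc policy, steering S3 traffic to the gateways into its own HTB class (interface, rates and port are illustrative; 7480 is the default civetweb RadosGW port):
      # Root HTB qdisc: unclassified traffic falls into class 1:20.
      tc qdisc add dev eth0 root handle 1: htb default 20
      # Ceph/S3 traffic class with a hard ceiling; the rest keeps the remaining bandwidth.
      tc class add dev eth0 parent 1: classid 1:10 htb rate 200mbit ceil 300mbit
      tc class add dev eth0 parent 1: classid 1:20 htb rate 700mbit
      # Classifier: anything destined to the gateway port lands in the Ceph class.
      tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dport 7480 0xffff flowid 1:10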
  55. 55. CPU and IO adjustments: Linux Traffic Control (TC). Regulate outgoing traffic using system load: while the CPU load stays outside the allowed range, reduce or increase the transfer rate up to the network limit (see the load-driven sketch below).
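A minimal sketch of the load-driven adjustment, reusing the class from the previous sketch (thresholds and rates are purely illustrative):
      #!/bin/bash
      # Shrink the Ceph-class rate while the 1-minute load average is above the allowed range,
      # and restore it once the machine calms down.
      while true; do
          load=$(cut -d' ' -f1 /proc/loadavg)
          if [ "$(echo "$load > 4.0" | bc -l)" -eq 1 ]; then
              tc class change dev eth0 parent 1: classid 1:10 htb rate 50mbit ceil 50mbit
          else
              tc class change dev eth0 parent 1: classid 1:10 htb rate 200mbit ceil 300mbit
          fi
          sleep 30
      done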
  56. 56. • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  57. 57. Monitoring (diagram): Prometheus scrapes metrics from a node_exporter on every node (monitors, OSDs and gateways) and from the Ceph mgr exporter; Grafana reads from the Prometheus storage, and Alertmanager generates alerts via e-mail and XMPP.
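The metrics endpoint scraped on the next slide is served by the Ceph manager; a minimal sketch of enabling it (the hostname in the output is illustrative):
      root@ceph:~# ceph mgr module enable prometheus
      root@ceph:~# ceph mgr services
      {
          "prometheus": "http://ceph-monitor:9283/"
      }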
  58. 58. Monitoring. Prometheus exporter example:
      user@prometheus:~$ curl --silent http://ceph-monitor:9283/metrics | head -20
      # HELP ceph_osd_op_out_bytes Client operations total read size
      # TYPE ceph_osd_op_out_bytes counter
      ceph_osd_op_out_bytes{ceph_daemon="osd.6"} 192202.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.26"} 355345.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.30"} 99943.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.8"} 9687.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.20"} 6480.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.36"} 73682.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.22"} 497679.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.47"} 123536.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.34"} 95692.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.45"} 114504.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.10"} 8695.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.39"} 0.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.43"} 107303.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.12"} 199043.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.28"} 1165455.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.41"} 216581.0
      ceph_osd_op_out_bytes{ceph_daemon="osd.14"} 124186.0
      user@prometheus:~$
  59. 59. Monitoring S.M.A.R.T. Status
  60. 60. Monitoring Ceph Status
  61. 61. Monitoring Gateway Status
  62. 62. • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  63. 63. Upgrades overview. Unattended upgrades: • On-premises mirror. • Upgrade policy: security updates ASAP; regular updates every Tuesday, within a maintenance window. • Package blacklist: ceph and ceph-* (see the snippet below). • Results indexed in Elasticsearch. Ceph upgrade sequence: monitors, then OSDs, then gateways.
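A hedged example of how such a package blacklist might look on a Debian/Ubuntu host running unattended-upgrades (file path and matching semantics depend on the distribution and version):
      user@osd1:~$ grep -A3 Package-Blacklist /etc/apt/apt.conf.d/50unattended-upgrades
      Unattended-Upgrade::Package-Blacklist {
          "ceph";
          "ceph-";
      };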
  64. 64. Upgrades policy Upgrades dashboard
  65-76. Orchestrated reboots (animation over twelve slides). Each OSD node walks through the states REBOOT_REQUIRED → CEPH_HEALTHY → ETCD_LOCK → REBOOTING → ETCD_UNLOCK: when a reboot is pending, the node asks Prometheus whether the cluster is healthy (CEPH_HEALTH?), and only on a HEALTHY answer does it take the cluster-wide lock in etcd (LOCK OSD1), reboot, wait until Ceph is healthy again and release the lock (UNLOCK), at which point the next node (OSD2, OSD3, …) repeats the sequence.
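A heavily simplified shell sketch of that loop (lock key, paths and the non-atomic get/put are illustrative; a real implementation would use an atomic etcd transaction or lease):
      #!/bin/bash
      # REBOOT_REQUIRED: nothing to do if no reboot is pending.
      [ -f /var/run/reboot-required ] || exit 0
      # CEPH_HEALTHY: never start while the cluster is degraded.
      ceph health | grep -q HEALTH_OK || exit 1
      # ETCD_LOCK: wait until no other node holds the reboot lock, then take it.
      while [ -n "$(etcdctl get --print-value-only /ceph/reboot-lock)" ]; do sleep 60; done
      etcdctl put /ceph/reboot-lock "$(hostname)"
      systemctl reboot
      # ETCD_UNLOCK (run from a unit at boot, after the reboot):
      #   until ceph health | grep -q HEALTH_OK; do sleep 60; done
      #   etcdctl del /ceph/reboot-lock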
  77. 77. • Presentation. • The problem. • Alternatives. • Ceph to the rescue. • Hardware planification. • Architecture. • Tuning. • Backup management. • Clients. • Monitoring. • Upgrades. • Future plans. Agenda
  78. 78. Future plans • Metadata search (Elasticsearch): search objects using tags, top 10 backup sizes, average size. • CRUSH maps: reflect the current datacenter layout in Ceph (rack, row, room…) to increase availability. • EC adjustment. • Indexless buckets (incompatible with lifecycles).
  79. 79. Questions?
  80. 80. Thank you! References: [1] http://www.vacalouraestudio.es/ [2] https://www.krollontrack.co.uk/blog/survival-stories/24tb-of-confidential-data-recovered-from-raid-6-array/ [3] https://www.elempresario.com/noticias/economia/2017/09/27/el_numero_billetes_500_euros_continua_minimos_desde_2003_54342_1098.html
