Building reliable Ceph clusters with SUSE Enterprise Storage
Survival skills for the real world
Lars Marowsky-Brée
Distinguished Engineer
lmb@suse.com
What this talk is not
●
A comprehensive introduction to Ceph
●
A SUSE Enterprise Storage roadmap session
●
A discussion of Ceph performance tuning
SUSE Enterprise Storage - Reprise
The Ceph project
●
An Open Source Software-Defined-Storage project
●
Multiple front-ends
– S3/Swift object interface
– Native Linux block IO
– Heterogeneous Block IO (iSCSI)
– Native Linux network file system (CephFS)
– Heterogeneous Network File System (nfs-ganesha)
– Low-level C++/Python/… libraries
– Linux, UNIX, Windows, Applications, Cloud, Containers
●
Common, smart data store (RADOS)
– Pseudo-random, algorithmic data distribution
Software-Defined-Storage
Ceph Cluster: Logical View
[Diagram: three MONs, two MDSs, six OSDs, three iSCSI gateways, two S3/Swift gateways, and one NFS gateway, all on top of RADOS]
Introducing Dependability
Introducing dependability
●
Availability
●
Reliability
– Durability
●
Safety
●
Maintainability
The elephant in the room
●
Before we discuss technology ...
●
… guess what causes most outages?
Improve your human factor
●
Great, you are already here!
●
Training
●
Documentation
●
Team up with a world-class support and consulting organization
High-level considerations
Advantages of Homogeneity
●
Eases system administration
●
Components are interchangeable
●
Lower purchasing costs
●
Standardized ordering process
Murphy’s Law, 2016 version
●
“At scale, everything fails.”
●
Distributed systems protect against
individual failures causing service failures by
eliminating Single Points of Failure
●
Distributed systems are still vulnerable to
correlated failures
2n+1
Advantages of Heterogeneity
Everything is broken …
… but everything is broken differently
Homogeneity is non-sustainable
●
Hardware gets replaced
– Replacement with same model not available, or
– not desirable given current prices
●
Software updates are not (yet) globally immediate
●
Requirements change
●
Your cluster ends up being heterogeneous anyway
●
… you might as well benefit from it.
Failure is inevitable; suffering is optional
●
If you want uptime, prepare for downtime
●
Architect your system to survive a single or
multiple failures
●
Test whether the system meets your SLA
– while degraded and during recovery!
How much availability do you need?
●
Availability and durability are not free
●
Cost and complexity increase exponentially
●
Scale out makes some things easier
A bag of suggestions
Embrace diversity
●
Automatic recovery requires a >50% majority
– Splitting into multiple different categories/models
– Feasible for some components
– Multiple architectures?
– Mix them across different racks/pods
●
A 50:50 split still allows manual recovery in case of
catastrophic failures
– Different UPS and power circuits
Hardware choices
●
SUSE offers Reference Architectures:
– e.g., Lenovo, HPE, Cisco, Dell
●
Partners offer turn-key solutions
– e.g., HPE, Thomas-Krenn
●
SUSE Yes certification reduces risk
– https://www.suse.com/newsroom/post/2016/suse-extends-partner-software-certification-for-cloud-and-storage-customers/
●
Small variations can have a huge impact!
Not all the eggs in one basket^Wrack
●
Distribute servers physically to limit the impact of power outages,
spills, …
●
Ceph’s CRUSH map allows you to describe the physical topology of
your fault domains (engineering speak for “availability zones”)
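To make the idea of fault domains concrete, here is a toy sketch in Python. It is not CRUSH and does not use any Ceph API; the function name, the OSD names, and the rack layout are all hypothetical. It only illustrates the property a topology-aware placement rule gives you: each copy of an object lands in a different rack, and the choice is computed from the object name rather than looked up.

```python
import zlib
from collections import defaultdict


def place_replicas(osds, copies, key):
    """Pick `copies` OSDs for object `key`, each from a different rack.

    `osds` maps an OSD name to the rack it lives in (hypothetical layout).
    This is a simplified stand-in for a placement rule, not the CRUSH algorithm.
    """
    by_rack = defaultdict(list)
    for osd, rack in sorted(osds.items()):
        by_rack[rack].append(osd)

    racks = sorted(by_rack)
    if copies > len(racks):
        raise ValueError("not enough racks for the requested number of copies")

    # Deterministic hash of the object name (crc32 is stable across runs).
    h = zlib.crc32(key.encode("utf-8"))

    chosen = []
    for i in range(copies):
        rack = racks[(h + i) % len(racks)]   # a distinct rack for every copy
        members = by_rack[rack]
        chosen.append(members[h % len(members)])
    return chosen


# Hypothetical cluster: six OSDs spread over three racks.
example_osds = {
    "osd.0": "rack1", "osd.1": "rack1",
    "osd.2": "rack2", "osd.3": "rack2",
    "osd.4": "rack3", "osd.5": "rack3",
}
print(place_replicas(example_osds, 3, "my-object"))
```

With a layout like this, losing one whole rack removes at most one copy of any object, which is the point of describing the physical topology to the cluster.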
How many MONitors do I need?
2n+1: an odd number, so that a majority (n+1) remains after n monitors fail
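The 2n+1 rule follows from majority quorum arithmetic, which can be sketched in a few lines of Python. The function names are illustrative only; nothing here talks to a cluster.

```python
def quorum_size(monitors):
    """Smallest strict majority of `monitors` members."""
    return monitors // 2 + 1


def tolerated_failures(monitors):
    """How many monitors may fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


for count in (1, 2, 3, 4, 5, 7):
    print(f"{count} MONs: quorum {quorum_size(count)}, "
          f"tolerates {tolerated_failures(count)} failure(s)")
```

An even count buys nothing: four monitors tolerate one failure, exactly like three. That is why the count is usually kept odd, and why a 50:50 split cannot recover automatically.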
To converge roles or not
●
“Hyper converged” equals correlated
failures
●
It does drive down cost of implementation
●
Sizing becomes less deterministic
●
Services might recover at the same time
●
At scale, don’t correlate the MONs and
OSDs
Storage diversity
●
Avoid desktop HDDs
●
Avoid sequential serial numbers
●
Mount at different angles if paranoid
●
Multiple vendors
●
Avoid desktop SSDs
●
Monitor wear-leveling
●
Remember the journals see all writes
Storage Node Sizing
●
Node failures are the most common failure granularity
– Admin mistake, network, kernel crash
●
Consider impact of outage on:
– Performance (degraded and recovery)
– and capacity!
●
A single node should not be more than 10% of your
total capacity
●
Free capacity should be larger than largest node
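The two sizing rules of thumb above can be turned into a quick sanity check. This is a minimal sketch that assumes raw capacities per node in TB and a single cluster-wide usage figure; the function name and the sample numbers are hypothetical.

```python
def check_node_sizing(node_capacities_tb, used_tb, max_node_share=0.10):
    """Return a list of warnings for the two rules of thumb.

    node_capacities_tb: raw capacity of each storage node (assumed unit: TB)
    used_tb:            raw capacity currently in use across the cluster
    max_node_share:     largest fraction of the total one node should hold
    """
    total = sum(node_capacities_tb)
    largest = max(node_capacities_tb)
    free = total - used_tb

    warnings = []
    if largest > max_node_share * total:
        warnings.append(
            f"largest node holds {largest / total:.0%} of total capacity "
            f"(limit {max_node_share:.0%})"
        )
    if free <= largest:
        warnings.append(
            f"free capacity ({free} TB) does not exceed the largest node "
            f"({largest} TB), so losing that node cannot be absorbed"
        )
    return warnings


# Hypothetical clusters: twelve equal nodes versus five equal nodes.
print(check_node_sizing([100] * 12, used_tb=1000))
print(check_node_sizing([100] * 5, used_tb=450))
```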
Data availability and durability
●
Replication:
– Number of copies
– Linear overhead
●
Erasure Coding (k+m):
– Flexible number of data (k) and coding (m) blocks
– Can be configured to survive any number of outages (up to m)
– Fractional overhead
– https://www.youtube.com/watch?v=-KyGv6AZN9M
2n+1
Durability: Three-way Replication
Usable capacity: 33%
Durability: 2 faults
Durability: 4+3 Erasure Coding
Usable capacity: 57%
Durability: 3 faults
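The figures on the last two slides follow from simple arithmetic, sketched here in Python. The helper names are illustrative; the formulas are the usual ones for n-way replication (1/n usable, n-1 faults) and k+m erasure coding (k/(k+m) usable, m faults).

```python
def replication(copies):
    """Usable fraction and tolerated faults for n-way replication."""
    return 1 / copies, copies - 1


def erasure_coding(k, m):
    """Usable fraction and tolerated faults for k data + m coding blocks."""
    return k / (k + m), m


for label, (usable, faults) in {
    "3-way replication": replication(3),
    "4+3 erasure coding": erasure_coding(4, 3),
}.items():
    print(f"{label}: usable capacity {usable:.0%}, survives {faults} faults")
```

So 4+3 erasure coding tolerates one more fault than three-way replication while leaving noticeably more of the raw capacity usable.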
Consider Cache Tiering
●
Data in cache tier is replicated
●
Backing tier may be slower, but more
durable
Durability 201
●
Different strokes for different pools
●
Erasure coding schemes galore
Finding and correcting bad data
●
Ceph “scrubbing” periodically detects inconsistent or missing objects in placement groups
http://ceph.com/planet/ceph-manually-repair-object/
http://docs.ceph.com/docs/jewel/rados/configuration/osd-config-ref/#scrubbing
●
SUSE Enterprise Storage 5 will
validate checksums on every read
Automatic fault detection and recovery
●
Do you want this in your cluster?
●
Consider setting “noout”:
– during maintenance windows
– in small clusters
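One way to make sure “noout” is cleared again after a maintenance window is to wrap the work in a context manager. The sketch below assumes the `ceph` command-line tool is on the PATH with admin credentials and uses the `ceph osd set noout` / `ceph osd unset noout` commands; the wrapper itself and its `run` parameter are hypothetical conveniences, with `run` injectable so the flow can be tried without a cluster.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def noout(run=subprocess.run):
    """Set the noout flag for the duration of a maintenance block.

    `run` defaults to subprocess.run; pass a stand-in to do a dry run.
    The flag is cleared again even if the maintenance code raises.
    """
    run(["ceph", "osd", "set", "noout"], check=True)
    try:
        yield
    finally:
        run(["ceph", "osd", "unset", "noout"], check=True)


# Dry run: record the commands instead of executing them.
recorded = []
with noout(run=lambda cmd, check: recorded.append(" ".join(cmd))):
    recorded.append("(maintenance work happens here)")
print("\n".join(recorded))
```

For real use, call `with noout():` around the maintenance steps. Remember that while the flag is set the cluster will not rebalance away from OSDs that stay down, so keep the window short.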
Network considerations
●
Bond both the public and the cluster network
●
Consider different NICs
– Use last year’s NICs and switches
●
One channel from each network to each switch
Gateway considerations
●
RadosGW (S3/Swift):
– Use HTTP/TCP load balancers
– Possible to build using SLE HA with LVS or haproxy
●
iSCSI targets:
– Multiple gateways, natively supported by iSCSI
●
Improves availability and throughput
– Make sure you meet your performance SLAs during degraded
modes
Avoid configuration drift
●
Ensure that systems are configured consistently
– Installed packages
– Package versions
– Configuration (NTP, logging, passwords, …)
●
Avoid manual configuration
●
Use Salt instead
http://ourobengr.com/2016/11/hello-salty-goodness/
https://www.suse.com/communities/blog/managing-configuration-drift-salt-snapper/
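What “drift” means in practice can be shown with a small comparison. The sketch below is not Salt and does not use its API; it only illustrates the kind of check that configuration management automates. The host names, package names, and version strings are made up for the example.

```python
def find_drift(inventory):
    """Return packages whose installed version differs between hosts.

    inventory: {host: {package: version}} (hypothetical, e.g. collected
    by your configuration management tool). A package missing on a host
    is reported with version None.
    """
    packages = set()
    for installed in inventory.values():
        packages.update(installed)

    drift = {}
    for package in sorted(packages):
        versions = {host: pkgs.get(package) for host, pkgs in inventory.items()}
        if len(set(versions.values())) > 1:
            drift[package] = versions
    return drift


# Hypothetical inventory of three storage nodes.
example_inventory = {
    "node1": {"ceph": "10.2.5", "ntp": "4.2.8"},
    "node2": {"ceph": "10.2.5", "ntp": "4.2.8"},
    "node3": {"ceph": "10.2.3"},
}
for package, versions in find_drift(example_inventory).items():
    print(package, versions)
```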
Trust but verify a.k.a. monitoring
●
Performance as the system ages
●
SSD degradation / wear leveling
●
Capacity utilization
●
“Free” capacity is usable for recovery
●
React to issues in a timely fashion!
Update, always (but with care)
●
Updates are good for your system
– Security
– Performance
– Stability
●
Ceph remains available even while updates are being rolled out
●
SUSE’s tested maintenance updates are the main product value
Trust nobody (not even SUSE)
●
If you at all possibly can, use a staging system
– Ideally: a (reduced) version of your production
environment
– At least: a virtualized environment
●
Test updates before rolling them out in production
– Not just code, but also processes!
●
Long-term maintainability:
– Avoid vendor lock-in, use Open Source
Disaster will strike
●
Does it matter?
●
If it does:
– Backups
– Replicate to other sites
●
rbd-mirror, radosgw multi-site
●
Have fire drills!
Avoid complexity (KISS)
●
Be aggressive in what you test
– Test all the features
●
Be conservative in what you deploy
– Deploy only what you need
In conclusion
Don’t panic.
SUSE’s here to help.
Utilocate
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 

Recently uploaded (20)

A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 

Building reliable Ceph clusters with SUSE Enterprise Storage

  • 1. Building reliable Ceph clusters with SUSE Enterprise Storage Survival skills for the real world Lars Marowsky-Brée Distinguished Engineer lmb@suse.com
  • 2. What this talk is not ● A comprehensive introduction to Ceph ● SUSE Enterprise Storage roadmap session ● A discussion of Ceph performance tuning 2
  • 4. The Ceph project ● An Open Source Software-Defined-Storage project ● Multiple front-ends – S3/Swift object interface – Native Linux block IO – Heterogeneous Block IO (iSCSI) – Native Linux network file system (CephFS) – Heterogeneous Network File System (nfs-ganesha) – Low-level, C++/Python/… libraries – Linux, UNIX, Windows, Applications, Cloud, Containers ● Common, smart data store (RADOS) – Pseudo-random, algorithmic data distribution 4
  • 6. Ceph Cluster: Logical View [Diagram: 3 MON, 2 MDS, 6 OSD, 3 iSCSI Gateway, 2 S3/Swift Gateway, 1 NFS Gateway, RADOS]
  • 9. The elephant in the room ● Before we discuss technology ... ● … guess what causes most outages? 9
  • 10. Improve your human factor ● Great, you are already here! ● Training ● Documentation ● Team your team with a world-class support and consulting organization 10
  • 12. Advantages of Homogeneity ● Eases system administration ● Components are interchangeable ● Lower purchasing costs ● Standardized ordering process 12
  • 13. Murphy’s Law, 2016 version ● “At scale, everything fails.” ● Distributed systems protect against individual failures causing service failures by eliminating Single Points of Failure ● Distributed systems are still vulnerable to correlated failures 13 2n+1
  • 14. Advantages of Heterogeneity Everything is broken … … but everything is broken differently 14
  • 15. Homogeneity is non-sustainable ● Hardware gets replaced – Replacement with same model not available, or – not desirable given current prices ● Software updates are not (yet) globally immediate ● Requirements change ● Your cluster ends up being heterogeneous anyway ● … you might as well benefit from it. 15
  • 16. Failure is inevitable; suffering is optional ● If you want uptime, prepare for downtime ● Architect your system to survive a single or multiple failures ● Test whether the system meets your SLA – while degraded and during recovery! 16
  • 17. How much availability do you need? ● Availability and durability are not free ● Cost, Complexity increase exponentially ● Scale out makes some things easier 17
  • 18. A bag of suggestions 18
  • 19. Embrace diversity ● Automatic recovery requires a >50% majority – Splitting into multiple different categories/models – Feasible for some components – Multiple architectures? – Mix them across different racks/pods ● A 50:50 split still allows manual recovery in case of catastrophic failures – Different UPS and power circuits 19
  • 20. Hardware choices ● SUSE offers Reference Architectures: – e.g., Lenovo, HPE, Cisco, Dell ● Partners offer turn-key solutions – e.g., HPE, Thomas-Krenn ● SUSE Yes certification reduces risk – https://www.suse.com/newsroom/post/2016/suse-extends-partner-software-certification-for-cloud-and-storage-customers/ ● Small variations can have a huge impact! 20
  • 21. Not all the eggs in one basket^Wrack ● Distribute servers physically to limit the impact of power outages, spills, … ● Ceph’s CRUSH map allows you to describe the physical topology of your fault domains (engineering speak for “availability zones”) 21
  • 22. How many MONitors do I need? 22 2n+1
  • 23. To converge roles or not ● “Hyper converged” equals correlated failures ● It does drive down cost of implementation ● Sizing becomes less deterministic ● Services might recover at the same time ● At scale, don’t correlate the MONs and OSDs 23
  • 24. Storage diversity ● Avoid desktop HDDs ● Avoid sequential serial numbers ● Mount at different angles if paranoid ● Multiple vendors ● Avoid desktop SSDs ● Monitor wear-leveling ● Remember the journals see all writes 24
  • 25. Storage Node Sizing ● Node failures most common granularity – Admin mistake, network, kernel crash ● Consider impact of outage on: – Performance (degraded and recovery) – and capacity! ● A single node should not be more than 10% of your total capacity ● Free capacity should be larger than largest node 25
  • 26. Data availability and durability ● Replication: – Number of copies – Linear overhead ● Erasure Coding: – Flexible number of data and coding blocks – Can survive a configurable number of outages (up to the number of coding blocks) – Fractional overhead – https://www.youtube.com/watch?v=-KyGv6AZN9M 26 (k+m)/k 2n+1
  • 27. Durability: Three-way Replication 27 Usable capacity: 33% Durability: 2 faults
  • 28. Durability: 4+3 Erasure Coding 28 Usable capacity: 57% Durability: 3 faults
  • 29. Consider Cache Tiering ● Data in cache tier is replicated ● Backing tier may be slower, but more durable 29
  • 30. Durability 201 ● Different strokes for different pools ● Erasure coding schemes galore 30
  • 31. Finding and correcting bad data ● Ceph “scrubbing” detects inconsistent or missing placement groups periodically – http://ceph.com/planet/ceph-manually-repair-object/ – http://docs.ceph.com/docs/jewel/rados/configuration/osd-config-ref/#scrubbing ● SUSE Enterprise Storage 5 will validate checksums on every read 31
  • 32. Automatic fault detection and recovery ● Do you want this in your cluster? ● Consider setting “noout”: – during maintenance windows – in small clusters 32
  • 33. Network considerations ● Have both the public and cluster network bonded ● Consider different NICs – Use last year’s NICs and switches ● One channel from each network to each switch 33
  • 34. Gateway considerations ● RadosGW (S3/Swift): – Use HTTP/TCP load balancers – Possible to build using SLE HA with LVS or haproxy ● iSCSI targets: – Multiple gateways, natively supported by iSCSI ● Improves availability and throughput – Make sure you meet your performance SLAs during degraded modes 34
  • 35. Avoid configuration drift ● Ensure that systems are configured consistently – Installed packages – Package versions – Configuration (NTP, logging, passwords, …) ● Avoid manual configuration ● Use Salt instead – http://ourobengr.com/2016/11/hello-salty-goodness/ – https://www.suse.com/communities/blog/managing-configuration-drift-salt-snapper/ 35
  • 36. Trust but verify a.k.a. monitoring ● Performance as the system ages ● SSD degradation / wear leveling ● Capacity utilization ● “Free” capacity is usable for recovery ● React to issues in a timely fashion! 36
  • 37. Update, always (but with care) ● Updates are good for your system – Security – Performance – Stability ● Ceph remains available even while updates are being rolled out ● SUSE’s tested maintenance updates are the main product value 37
  • 38. Trust nobody (not even SUSE) ● If you at all possibly can, use a staging system – Ideally: a (reduced) version of your production environment – At least: a virtualized environment ● Test updates before rolling them out in production – Not just code, but also processes! ● Long-term maintainability: – Avoid vendor lock-in, use Open Source 38
  • 39. Disaster will strike ● Does it matter? ● If it does: – Backups – Replicate to other sites ● rbd-mirror, radosgw multi-site ● Have fire drills! 39
  • 40. Avoid complexity (KISS) ● Be aggressive in what you test – Test all the features ● Be conservative in what you deploy – Deploy only what you need 40
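
The numbers on slides 22, 25, 27 and 28 follow from simple arithmetic. The sketch below reproduces them in Python. The function names are made up for illustration and are not part of Ceph or SUSE Enterprise Storage, and the sizing check only encodes the two rules of thumb from slide 25, not a complete capacity plan.

```python
from fractions import Fraction


def monitors_needed(failures_to_tolerate):
    """Slide 22: a majority quorum that survives n failures needs 2n+1 members."""
    return 2 * failures_to_tolerate + 1


def replication_usable_fraction(copies):
    """Slide 27: with N full copies, 1/N of the raw capacity is usable."""
    return Fraction(1, copies)


def erasure_coding_usable_fraction(k, m):
    """Slide 28: with k data and m coding blocks, k/(k+m) of raw capacity is usable."""
    return Fraction(k, k + m)


def node_sizing_ok(node_capacities, used):
    """Slide 25 rules of thumb (hypothetical helper).

    - no single node holds more than 10% of the total capacity
    - free capacity is larger than the largest node
    Capacities and usage must be given in the same unit.
    """
    total = sum(node_capacities)
    largest = max(node_capacities)
    free = total - used
    return largest * 10 <= total and free > largest


def percent(fraction):
    """Round a fraction to a whole percentage, as shown on the slides."""
    return round(float(fraction) * 100)


if __name__ == "__main__":
    print("MONs to survive 1 failure:", monitors_needed(1))
    print("3-way replication usable %:", percent(replication_usable_fraction(3)))
    print("4+3 erasure coding usable %:", percent(erasure_coding_usable_fraction(4, 3)))
    print("12 equal nodes, 100 of 120 used:", node_sizing_ok([10] * 12, used=100))
```

Three-way replication tolerates the loss of two copies and 4+3 erasure coding the loss of three blocks, which matches the "2 faults" and "3 faults" figures on slides 27 and 28.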

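Slide 32 suggests setting "noout" during maintenance windows. One way to make sure the flag is always cleared again is to wrap the maintenance step, as in the minimal sketch below. It assumes the `ceph` command-line tool is installed and that the caller may run `ceph osd set noout` and `ceph osd unset noout`. The helper names `run_ceph` and `noout` are hypothetical.

```python
import subprocess
from contextlib import contextmanager


def run_ceph(*args):
    """Run the ceph CLI with the given arguments and raise if it fails."""
    subprocess.run(["ceph", *args], check=True)


@contextmanager
def noout(run=run_ceph):
    """Keep the cluster-wide noout flag set while the wrapped block runs.

    The `run` parameter is injectable so the helper can be exercised
    without a live cluster.
    """
    run("osd", "set", "noout")
    try:
        yield
    finally:
        # Always clear the flag, even if the maintenance step raised,
        # so that automatic recovery is not left disabled by accident.
        run("osd", "unset", "noout")
```

Wrapping a node reboot or package update in `with noout():` keeps Ceph from marking the node's OSDs out and starting a rebalance while the node is briefly down.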