Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm


This talk is about partition tolerance and chaos testing of a Galera cluster with Docker containers and NetEm.

Published in: Software

  1. 1. Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm. Raghavendra Prabhu, raghavendra.d.prabhu@gmail.com; Percona, raghavendra.prabhu@percona.com; randomsurfer; wnohang.net; rdprabhu; ronin13
  2. 2. The Title
  3. 3. Split Brain?
  4. 4. Split brain
  5. 5. Introduction Seed quotes.. “‘Network is reliable’ - a fallacy of distributed systems.” “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” - Leslie Lamport “Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor “Never attribute to Byzantine failure that which can be explained by an ill node(s).” - Me Raghavendra Prabhu (Percona) Corpus collapsum 20 February, 2015
  9. 9. 20000 feet view
  10. 10. Introduction Actors ▶ Database - WSREP/PXC ▶ Plugin - Galera ▶ Traffic control ♦ Traffic Control - tc ♦ NetEm
  13. 13. Introduction Actors ▶ Containers - Docker ▶ Load ♦ Generators - Sysbench, RQG ▶ Network ♦ Dnsmasq ♦ nsenter
  15. 15. Introduction Actors ▶ Jenkins ♦ Build flow and CI ▶ Storage ♦ Why
  16. 16. Distributed Systems Testing A Kobayashi Maru Cheat on CAP!
  17. 17. Details Rationale ▶ The ‘P’ in CAP ▶ WAN scalability ▶ Real Reason - fun! ▶ Tolerance to latency variance
  21. 21. Details Rationale ▶ Failures in warehouses. ▶ Not quorum, but consensus. ▶ Real world networks and synchronous replication - Delay - Partition - Non-graceful exits
  22. 22. Galera
  23. 23. Details Galera ▶ Data-centric approach ▶ Extended Virtual Synchrony ▶ Causality and Synchronous ▶ Flow control and temporal Synchrony
  24. 24. Details Galera ▶ Latency - Global ordering - Certification and not apply - Communication overhead ▶ Layers - Replication - Certification - Group communication ▶ Isolation - REPEATABLE-READ - SNAPSHOT-ISOLATION
  25. 25. Where did it start
  26. 26. Details Where did it start ▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192 ▶ Loss of PC ▶ Crash ▶ HAT
  27. 27. One can bring the whole down
  28. 28. Details Tests ▶ Chaos testing ▶ Flow control with sysbench ▶ Network Loss ▶ Future
  29. 29. There is no higher menace than distributed systems testing
  30. 30. Details NetEm ▶ Initial setup - Bridge - Egress only - IFB - Present state ▶ NetEm - tc qdisc buckets - packet loss, delay, corruption, duplication, reordering - nsenter ▶ Future - Docker exec - Rocket ACI
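The NetEm slide combines `tc qdisc` with `nsenter` to shape traffic from inside a container's network namespace. A minimal sketch of how such commands could be assembled is below. The function names, the `eth0` device and the example PID are assumptions made for illustration, not taken from the talk's scripts; only the `docker inspect`, `nsenter` and `tc ... netem` invocations themselves are standard.

```python
def container_pid_cmd(container):
    """argv that prints the host PID of a container's main process."""
    return ["docker", "inspect", "--format", "{{.State.Pid}}", container]


def netem_add_cmd(pid, dev="eth0", delay_ms=None, jitter_ms=None, loss_pct=None):
    """argv that attaches a netem qdisc inside the network namespace of `pid`.

    `dev` is assumed to be the container-side interface name.
    """
    cmd = ["nsenter", "--target", str(pid), "--net",
           "tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            # netem takes the jitter as a second value after the delay
            cmd.append(f"{jitter_ms}ms")
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd


def netem_del_cmd(pid, dev="eth0"):
    """argv that detaches the root qdisc again (the 'detach' step)."""
    return ["nsenter", "--target", str(pid), "--net",
            "tc", "qdisc", "del", "dev", dev, "root"]
```

Passing the returned lists to `subprocess.check_call` on the Docker host (as root) would apply them; keeping them as plain argv lists makes the commands easy to log and review first.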
  31. 31. Details Tests: Chaos testing ▶ Nodes killed at random around sysbench ▶ Less than half of nodes are chosen ▶ docker inspect && SIGKILL ▶ Configurable sleep && retry ♦ Snapshot/Incremental State Transfer - Composability of transactional databases ▶ docker restart && repeat
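The selection rule on this slide (kill fewer than half of the nodes, chosen at random, then restart and repeat) can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, and `docker kill --signal=KILL` is used here as a stand-in for the slide's "docker inspect && SIGKILL" step.

```python
import random


def pick_victims(nodes, rng=random):
    """Pick a random, non-empty set of strictly fewer than half of the nodes.

    Returns an empty list when the cluster is too small to lose a node
    without losing its majority.
    """
    max_victims = (len(nodes) - 1) // 2  # largest count strictly below half
    if max_victims < 1:
        return []
    count = rng.randint(1, max_victims)
    return rng.sample(list(nodes), count)


def kill_cmd(container):
    """argv that delivers SIGKILL to the container's main process."""
    return ["docker", "kill", "--signal=KILL", container]


def restart_cmd(container):
    """argv that brings the killed container back for the next round."""
    return ["docker", "restart", container]
```

Keeping the victim count below half means the survivors still hold a majority, so the run exercises state transfer to the restarted nodes rather than a full loss of the primary component.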
  32. 32. Details Tests: Network Loss ▶ Loss nodes ▶ Detach/Keep qdisc ▶ Reconciliation ▶ Sanity checks ▶ Formation of PC || time to recover
  33. 33. The Flow
  34. 34. Details Basic Flow: Jenkins, Build images, Start Dnsmasq, Bootstrap, Load/Sysbench, SST/Others, Pre-sanity, nsenter/netem
  42. 42. Details Basic Flow: RR sysbench, Detach/Keep, Sanity check, Reconciliation, Post sanity, Core trace, Cleanup, Collect logs
  50. 50. Details Parameters ▶ Sysbench ▶ Segment ▶ Reconciliation period ▶ Loss nodes
  54. 54. Plumbing the pressure
  55. 55. Details Parameters ▶ NetEm ▶ Qdisc detach ▶ fsync ▶ Shutdown
  59. 59. Containers!
  60. 60. Details Docker ▶ Why not virtualize Occam Namespaces ▶ Simplicity ♦ Network Logical scalability ♦ One application per node
  61. 61. Details Docker ▶ Portability - Qualitative behavior. ▶ Reproducibility - Makes it deterministic ▶ Configurable and CI - Byproducts
  62. 62. Details Docker ▶ QEMU vis-à-vis Docker ▶ Scalability ♦ Performance ♦ Feature ▶ Abstraction of channels
  63. 63. Details Container Networking ▶ Linking didn’t help ▶ Dnsmasq to rescue! ♦ Hosts file and volumes ♦ SIGHUP and refresh ▶ Potential issues
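The "hosts file and SIGHUP" approach relies on dnsmasq re-reading its hosts files when it receives SIGHUP. A minimal sketch of that refresh step is below; the function names, the file path argument and the example addresses are hypothetical, and the way the talk's scripts actually share the file through volumes is not shown here.

```python
import os
import signal


def render_hosts(addresses):
    """Render a hosts file body from a mapping of node name -> IP address."""
    lines = [f"{ip}\t{name}" for name, ip in sorted(addresses.items())]
    return "\n".join(lines) + "\n"


def refresh_dnsmasq(hosts_path, addresses, dnsmasq_pid):
    """Rewrite the hosts file, then ask dnsmasq to reload it.

    `hosts_path` is assumed to be a file dnsmasq was started with
    (for example via --addn-hosts).
    """
    with open(hosts_path, "w") as handle:
        handle.write(render_hosts(addresses))
    os.kill(dnsmasq_pid, signal.SIGHUP)  # dnsmasq clears its cache and reloads hosts files
```

Calling `refresh_dnsmasq` each time a container starts or restarts keeps node names resolvable even when Docker hands out a new IP address.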
  64. 64. Testing methods
  65. 65. Details Overview ▶ Transient noise ▶ Lasting ‘sickness’ ▶ Sick nodes ▶ Dead members
  66. 66. Details Method I ▶ Qdisc is detached after load ▶ Objective - Time to recover of full cluster ▶ Done with a larger subset
  67. 67. Details Method II ▶ Qdisc is kept till the end ▶ Objective - Formation of primary component ▶ Comparatively smaller set
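The two objectives (full-cluster recovery for Method I, formation of a primary component for Method II) suggest two different sanity checks. The sketch below is one possible shape for them, assuming the caller has already collected `SHOW STATUS LIKE 'wsrep_%'` output from each node into dictionaries; the function names and that data layout are assumptions, while the status variable names are the standard wsrep ones.

```python
def node_in_primary(status):
    """True if a node's status dict reports membership in the primary component."""
    return status.get("wsrep_cluster_status") == "Primary"


def primary_formed(statuses, loss_nodes):
    """Method II style check: every node outside `loss_nodes` is in the PC."""
    healthy = [name for name in statuses if name not in loss_nodes]
    return bool(healthy) and all(node_in_primary(statuses[n]) for n in healthy)


def fully_recovered(statuses):
    """Method I style check: all nodes are Primary, Synced and see the full cluster."""
    expected_size = str(len(statuses))
    return bool(statuses) and all(
        node_in_primary(s)
        and s.get("wsrep_cluster_size") == expected_size
        and s.get("wsrep_local_state_comment") == "Synced"
        for s in statuses.values()
    )
```

Polling `fully_recovered` after the qdisc is detached and recording the elapsed time would give the "time to recover" figure; polling `primary_formed` while the qdisc is still attached covers the second objective.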
  68. 68. Details Observations ▶ Post sanity types - Why ▶ Which method is more pertinent ▶ State transfer issues - Beginning - During re-emergence
  69. 69. Details Observations ▶ Direct load to affected nodes ▶ Partition external to system ▶ Logs - journalctl - Streaming?
  70. 70. Details Other noises ▶ Aim ▶ Fsync - libeatmydata - Variance ▶ Correlation with network ▶ How with Docker - LD_PRELOAD
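With Docker, the LD_PRELOAD route mentioned above amounts to setting one environment variable when the container starts. The sketch below builds such a `docker run` command; the image name, container name and library path are placeholders, and the library is assumed to already be installed inside the image.

```python
def docker_run_cmd(image, name, eatmydata_lib=None, extra_env=None):
    """argv for `docker run`, optionally preloading libeatmydata.

    `eatmydata_lib` is the in-container path of the shared library
    (hypothetical here); leaving it as None keeps normal fsync behaviour.
    """
    env = dict(extra_env or {})
    if eatmydata_lib:
        env["LD_PRELOAD"] = eatmydata_lib
    cmd = ["docker", "run", "-d", "--name", name]
    for key in sorted(env):
        cmd += ["-e", f"{key}={env[key]}"]
    cmd.append(image)
    return cmd
```

Starting only some nodes with the preload is one way to get the fsync "variance" between nodes that the slide lists.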
  71. 71. System Load
  72. 72. Details Load generation ▶ Sysbench - Generation - Reconnect on partition ▶ Sockets chosen - Load on affected nodes ▶ Distribution of Load - RR with socat - Native sysbench support - HAProxy?
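Round-robin (RR) distribution of load simply hands each new client connection to the next node socket in turn. The sketch below shows only that assignment logic; it is not how socat or sysbench implement it, and the socket paths in the usage note are made up.

```python
from itertools import cycle, islice


def round_robin(sockets, connections):
    """Assign `connections` client connections to node sockets in rotation."""
    return list(islice(cycle(sockets), connections))
```

For example, `round_robin(["/tmp/n1.sock", "/tmp/n2.sock", "/tmp/n3.sock"], 5)` sends the fourth and fifth connections back to the first and second sockets, so the nodes under NetEm receive their share of the load as well.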
  73. 73. Details Load generation ▶ Nature of data/load - DDL ▶ RQG in future - Fuzz testing
  74. 74. The Fix
  75. 75. Strike Out!
  76. 76. Details Eviction ▶ STONITH ▶ Permanent eviction ▶ ‘N’ strikes & out! - Timers - evs parameters - wsrep_evs_delayed and wsrep_evs_evict_list
  77. 77. Details Eviction ▶ Aim ▶ Quorum required - Why? - Not shoot each other - Non-PC nodes also.
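The quorum requirement can be illustrated with a toy vote count: a node is evicted only when more than half of the group reports it as delayed, so two nodes on either side of a bad link cannot evict each other. This is a simplified model written for illustration, not Galera's actual eviction algorithm, and all names in it are hypothetical.

```python
def should_evict(reports, members, candidate):
    """Decide eviction by majority vote.

    reports: mapping of reporter -> set of nodes that reporter sees as delayed.
    members: all group members, including the candidate.
    The candidate's own report is ignored.
    """
    voters = [m for m in members if m != candidate]
    votes = sum(1 for m in voters if candidate in reports.get(m, set()))
    return votes > len(members) / 2
```

In a five-node group, three independent reports against a node are enough to evict it, while two nodes blaming each other each collect a single vote and both stay in the group.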
  79. 79. Details Coredumps with Docker ▶ Breakdown of abstraction ▶ Lack of isolation ▶ What was done - Volumes - core_pattern & sysctl - suid and ulimit
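The pieces listed under "What was done" can be sketched as three commands: two host-level sysctls and a `docker run` with a core ulimit and a volume for the dumps. The directory names, image name and the `core.%e.%p` file pattern below are assumptions for illustration. Note that `kernel.core_pattern` is a host-wide setting shared by every container, which is the "lack of isolation" the slide refers to.

```python
def core_pattern_cmd(core_dir):
    """argv that points the (host-wide) core pattern at an absolute directory."""
    return ["sysctl", "-w", f"kernel.core_pattern={core_dir}/core.%e.%p"]


def suid_dumpable_cmd():
    """argv that allows core dumps from processes that changed credentials."""
    return ["sysctl", "-w", "fs.suid_dumpable=2"]


def docker_run_with_cores(image, name, host_dir, core_dir="/cores"):
    """argv for a container with unlimited core size and a volume for dumps.

    `core_dir` must match the directory used in the core pattern so that
    dumps written inside the container land in `host_dir` on the host.
    """
    return ["docker", "run", "-d", "--name", name,
            "--ulimit", "core=-1",
            "-v", f"{host_dir}:{core_dir}",
            image]
```

The two `sysctl` commands run once on the Docker host; the `docker run` variant then applies to every node container whose crashes should leave a core behind for the "core trace" step.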
  80. 80. Details WAN Segments ▶ How they work ▶ Simulates data center ▶ Random allocation - latency multiplier ▶ Joiner starvation ▶ Donor selection
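One way to read "random allocation - latency multiplier" is that each node is placed in a random segment and nodes outside the local segment get a proportionally larger NetEm delay. The sketch below encodes that reading; the delay model and all function names are assumptions, and only the `gmcast.segment` provider option is a real Galera setting.

```python
import random


def assign_segments(nodes, segment_count, rng=random):
    """Randomly place each node in one of `segment_count` segments."""
    return {node: rng.randrange(segment_count) for node in nodes}


def provider_option(segment):
    """Galera provider option string that tags a node with its segment."""
    return f"gmcast.segment={segment}"


def delay_for(segment, base_delay_ms, multiplier):
    """Assumed model: segment 0 is 'local'; other segments get base * multiplier."""
    return base_delay_ms if segment == 0 else base_delay_ms * multiplier
```

The resulting delay per node could then be fed into a NetEm `delay` setting, so that nodes tagged as belonging to a remote data center also behave as if they were further away.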
  81. 81. Epilogue The code ▶ Github: - https://github.com/percona/pxc-docker - https://github.com/percona/percona-xtradb-cluster/ - https://github.com/percona/galera ▶ Jenkins: - http://jenkins.percona.com/job/PXC-5.6-netem/ - http://jenkins.percona.com/job/PXC-5.6-bench/ - http://jenkins.percona.com/job/PXC-5.6-chaos/ ▶ Contributions/testing/bugs welcome!
  82. 82. Epilogue Code: todo ▶ Docker automated builds ▶ Orchestration ▶ Docker ♦ Injection ♦ Signal proxying
  83. 83. Epilogue Code: todo ▶ => Proof of concept to a framework => ▶ Run it bare - CoreOS, Atomic ▶ Overlay with etcd/fleet/libswarm
  84. 84. Future work
  85. 85. Epilogue Future work ▶ Fault injection ♦ Memory - Poisoned memory ♦ Disk - libeatmydata - Opposite - ENOSPC
  86. 86. Epilogue Fault injection ▶ CPU - NUMA? - Hotplug ▶ More network - corruption, duplication, reordering, rate-limit - Better distribution - Other shaping
  87. 87. Worst case improves Average case
  88. 88. Epilogue Future work ▶ Disturb cluster more! - Membership changes * Manual eviction * Pull the cord! - Corrupt nodes ▶ Introduce inconsistencies - Consistency voting - Silent corruptions
  89. 89. Epilogue Eventual consistency ▶ CAP ▶ Latency factor ▶ Is Galera EC? No! - ACIDs only, No BASE ▶ Bounded Staleness - PBS ▶ ACID and CAP ▶ Instrumentation ▶ Lambda architecture
  90. 90. Epilogue Further Reading ▶ Resources ▶ Byzantine fault tolerance - Reaching agreement in presence of faults ▶ The Network is Reliable ▶ NetEm ▶ Latency: The New Web Performance Bottleneck ▶ Galera Cluster Documentation ▶ Auto eviction code ▶ Don’t Settle for Eventual Consistency ▶ Extended Virtual Synchrony ▶ Galera Flow Control
  91. 91. Epilogue Further Reading ▶ Worst-Case Distributed Systems Design ▶ HAT, not CAP: Introducing Highly Available Transactions ▶ Bridging the Gap: Opportunities in Coordination-Avoiding Databases ▶ Linearizability versus Serializability
  92. 92. Epilogue We are Hiring Too! ▶ Looking for a build engineer - Packaging and Jenkins/CI are your strengths and you are a Linux geek. Bonus points if you are a Linux distro user/contributor/maintainer. ▶ Senior C/C++ developer - if Linux userspace development and databases (and distributed systems) are your thing. ▶ Apply here: http://percona.theresumator.com/.
  93. 93. Conference for Database geeks! My Talk: Securing databases with systemd for containers and services
  94. 94. Epilogue About/Contact - HA compliant ▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona. ▶ Slides will be at slideshare.net/slidunder. ▶ About.me: raghavendra.prabhu ▶ Keybase.io: rdprabhu ▶ Presentation under CC BY-SA 4.0
  95. 95. Epilogue Image Credits ▶ http://galeracluster.com/documentation-webpages/ ▶ https://en.wikipedia.org/wiki/Network_theory ▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png ▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354 ▶ https://flic.kr/p/9J6GNu ▶ http://schauerte.me/data.html ▶ https://secure.flickr.com/photos/brewbooks/7780990192 ▶ https://www.flickr.com/photos/kwerfeldein/2649294869 ▶ https://secure.flickr.com/photos/mindmob/51951632 ▶ https://secure.flickr.com/photos/arenamontanus/2227769907 ▶ https://www.flickr.com/photos/markop/477199204 ▶ https://www.flickr.com/photos/gcwest/281385801 ▶ https://www.flickr.com/photos/29233640@N07/13466208953 ▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/ ▶ http://ok-panic.net/art/jeff/dennis.jpg ▶ https://www.facebook.com/sciencedump/photos/a.296290153732762.90161.111815475513565/985102638184840/?type=1 ▶ http://upload.wikimedia.org/wikipedia/commons/0/05/Sna_large.png ▶ http://background-kid.com/background-images-light-blue-color.html
