Corpus collapsum 
Partition tolerance of Galera in a noisy high load 
environment 
Highload++ 2014 
Raghavendra Prabhu 
 ...
The Title?
Our Cluster
Split brain
Introduction 
Seed quotes.. 
“ ’Network is reliable’ - a fallacy of the distributed system. ” 
“ A distributed system is o...
Introduction 
Seed quotes.. 
“ ’Network is reliable’ - a fallacy of the distributed system. ” 
“ A distributed system is o...
Introduction 
Seed quotes.. 
“ ’Network is reliable’ - a fallacy of the distributed system. ” 
“ A distributed system is o...
Introduction 
Seed quotes.. 
“ ’Network is reliable’ - a fallacy of the distributed system. ” 
“ A distributed system is o...
20000 feet view
Introduction 
Actors 
▶ Database - WSREP/PXC 
▶ Plugin - Galera 
▶ Traffic control 
♦ Traffic Control - tc 
♦ NetEm 
Ragha...
Introduction 
Actors 
▶ Database - WSREP/PXC 
▶ Plugin - Galera 
▶ Traffic control 
♦ Traffic Control - tc 
♦ NetEm 
Ragha...
Introduction 
Actors 
▶ Database - WSREP/PXC 
▶ Plugin - Galera 
▶ Traffic control 
♦ Traffic Control - tc 
♦ NetEm 
Ragha...
Introduction 
Actors 
▶ Containers - Docker 
▶ Load 
♦ Generators - Sysbench, RQG 
▶ Network 
♦ Dnsmasq 
♦ nsenter 
Raghav...
Introduction 
Actors 
▶ Containers - Docker 
▶ Load 
♦ Generators - Sysbench, RQG 
▶ Network 
♦ Dnsmasq 
♦ nsenter 
Raghav...
Introduction 
Actors 
▶ Jenkins 
♦ Build flow and CI 
▶ Storage 
♦ Why 
▶ “Others” 
Raghavendra Prabhu (Percona) Corpus co...
Details 
But why 
▶ The ’P’ in CAP 
▶ WAN scalability 
▶ Real Reason - fun! 
▶ Tolerance to latency variance 
Raghavendra ...
Details 
But why 
▶ The ’P’ in CAP 
▶ WAN scalability 
▶ Real Reason - fun! 
▶ Tolerance to latency variance 
Raghavendra ...
Details 
But why 
▶ The ’P’ in CAP 
▶ WAN scalability 
▶ Real Reason - fun! 
▶ Tolerance to latency variance 
Raghavendra ...
Details 
But why 
▶ The ’P’ in CAP 
▶ WAN scalability 
▶ Real Reason - fun! 
▶ Tolerance to latency variance 
Raghavendra ...
Details 
But why 
▶ Failures in warehouses. 
▶ Not quorum, but consensus. 
▶ Real world networks and synchronous replicati...
Galera
Details 
Galera 
▶ Data-centric approach 
▶ EVS 
▶ Causality and Synchronous 
▶ Latency 
Raghavendra Prabhu (Percona) Corp...
Where did it start
Details 
Where did it start 
▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192 
▶ Loss of PC 
▶ Crash 
▶ HA goal 
Ragh...
One can bring the whole down
The Flow
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Ragh...
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Ragh...
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Ragh...
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Ragh...
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Ragh...
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Ragh...
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Ragh...
Details 
Basic Flow 
Jenkins Build images Start Dnsmasq Bootstrap 
nsenter/netem Pre-sanity SST/Others Load/Sysbench 
Ragh...
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
...
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
...
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
...
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
...
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
...
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
...
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
...
Details 
Basic Flow 
RR sysbench 
Detach/Keep 
Post sanity Core trace 
Sanity check Reconciliation 
Cleanup Collect logs 
...
Details 
Cluster Resilience 
Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 23 / 58
Details 
Parameters 
▶ Sysbench 
▶ Segment 
▶ Reconciliation period 
▶ Loss nodes 
Raghavendra Prabhu (Percona) Corpus col...
Details 
Parameters 
▶ Sysbench 
▶ Segment 
▶ Reconciliation period 
▶ Loss nodes 
Raghavendra Prabhu (Percona) Corpus col...
Details 
Parameters 
▶ Sysbench 
▶ Segment 
▶ Reconciliation period 
▶ Loss nodes 
Raghavendra Prabhu (Percona) Corpus col...
Details 
Parameters 
▶ Sysbench 
▶ Segment 
▶ Reconciliation period 
▶ Loss nodes 
Raghavendra Prabhu (Percona) Corpus col...
Details 
Parameters 
▶ NetEm 
▶ Detach loss 
▶ Fsync 
▶ Shutdown 
Raghavendra Prabhu (Percona) Corpus collapsum 31 October...
Details 
Parameters 
▶ NetEm 
▶ Detach loss 
▶ Fsync 
▶ Shutdown 
Raghavendra Prabhu (Percona) Corpus collapsum 31 October...
Details 
Parameters 
▶ NetEm 
▶ Detach loss 
▶ Fsync 
▶ Shutdown 
Raghavendra Prabhu (Percona) Corpus collapsum 31 October...
Details 
Parameters 
▶ NetEm 
▶ Detach loss 
▶ Fsync 
▶ Shutdown 
Raghavendra Prabhu (Percona) Corpus collapsum 31 October...
Containers!
Details 
Docker 
▶ Why not virtualize 
♦ Occam 
♦ Namespaces 
▶ Simplicity 
♦ Network 
♦ One application per node 
Raghave...
Details 
Docker 
▶ Portability 
- See same qualitative behavior that I do. 
▶ Reproducibility 
- Makes it determinstic 
▶ ...
Details 
Docker 
▶ QEMU and Docker 
▶ Scalability 
♦ Performance 
♦ Feature 
▶ Abstraction of channels 
Raghavendra Prabhu...
Details 
Container Networking 
▶ Linking didn’t help 
▶ Dnsmasq to rescue! 
♦ Hosts file and volumes 
♦ SIGHUP and refresh...
Details 
Noise 
▶ Initial setup 
- Bridge 
- Egress only 
- IFB 
▶ Present state 
▶ NetEm 
- tc qdisc buckets 
- packet lo...
Testing methods
Details 
Method I 
▶ Qdisc is detached after load 
▶ Objective 
- Time to recover of full cluster 
▶ Done with a larger su...
Details 
Method II 
▶ Qdisc is kept till the end 
▶ Objective 
- Formation of primary component 
▶ Comparatively smaller s...
Details 
Observations 
▶ Post sanity types 
- Why 
▶ Which method is more pertinent 
▶ State transfer issues 
- Beginning ...
Details 
Observations 
▶ Direct load to affected nodes 
▶ Logs 
- journalctl 
- Streaming? 
Raghavendra Prabhu (Percona) C...
Details 
Other noises 
▶ Aim 
▶ Fsync 
- libeatmydata 
- Variance 
▶ Correlation with network 
▶ How with Docker 
Raghaven...
System Load
Details 
Load generation 
▶ Sysbench 
- Generation 
- Reconnect on partition 
▶ Sockets chosen 
- Load on affected nodes 
...
Details 
Load generation 
▶ Nature of data/load 
- DDL 
▶ RQG in future 
- Fuzz testing 
Raghavendra Prabhu (Percona) Corp...
The Fix
Strike Out!
Details 
Eviction 
▶ STONITH 
▶ Permanent eviction 
▶ ’N’ strikes & out! 
- Timers - evs parameters 
- wsrep_evs_delayed a...
Details 
Eviction 
▶ Aim 
▶ Quorum required 
- Why? - Not shoot each other - Non-PC nodes also. 
Raghavendra Prabhu (Perco...
Details 
Eviction 
▶ Aim 
▶ Quorum required 
- Why? - Not shoot each other - Non-PC nodes also. 
Raghavendra Prabhu (Perco...
Details 
Eviction 
▶ EVS version and upgrade 
▶ TODO! 
- Ingress only - Follow here. 
▶ Credits to Teemu Ollakka, Yan Zhan...
Details 
Coredumps with Docker 
▶ Breakdown of abstraction 
▶ Lack of isolation 
▶ What was done 
- Volumes 
- core_patter...
Details 
WAN Segments 
▶ How they work 
▶ Random allocation 
▶ Joiner starvation 
▶ Simulates data center 
▶ Donor selecti...
Epilogue 
The code 
▶ Github: https://github.com/percona/pxc-docker 
▶ Jenkins: http://jenkins.percona.com/job/PXC-5.6-net...
Epilogue 
Code: todo 
▶ Docker automated builds 
▶ Orchestration 
▶ Docker 
♦ Injection 
♦ Signal proxying 
Raghavendra Pr...
Epilogue 
Code: todo 
▶ Use Hoare’s channels - Go! 
▶ Run it bare - CoreOS 
▶ Overlay with etcd/fleet/libswarm 
Raghavendr...
Future work
Epilogue 
Future work 
▶ Fault injection 
♦ Memory 
- Poisoned memory 
♦ Disk 
- libeatmydata 
- Opposite: laggard! 
- ENO...
Epilogue 
Fault injection 
▶ CPU 
- NUMA? 
- Hotplug 
▶ More network 
- corruption, duplication, reordering, rate-limit 
-...
More Chaos
Epilogue 
Future work 
▶ Disturb cluster more! 
- Membership changes 
* Manual eviction 
* Pull the cord! 
- Corrupt nodes...
Epilogue 
Further Reading 
▶ Byzantine fault tolerance 
- Reaching agreement in presence of faults 
▶ The Network is Relia...
Epilogue 
About 
▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona. 
▶ Slides will be at slideshare...
Epilogue 
Image Credits 
▶ http://galeracluster.com/documentation-webpages/ 
▶ http://www.thelastdragontribute.com/40th-an...
Corpus collapsum - Partition tolerance of Galera, Raghavendra Prabhu (Percona)

Talk by Raghavendra Prabhu at HighLoad++ 2014.

  1. 1. Corpus collapsum Partition tolerance of Galera in a noisy high load environment Highload++ 2014 Raghavendra Prabhu raghavendra.d.prabhu@gmail.com Percona raghavendra.prabhu@percona.com randomsurfer wnohang.net rdprabhu ronin13
  2. 2. The Title?
  3. 3. Our Cluster
  4. 4. Split brain
  5. 5. Introduction Seed quotes.. “ ’Network is reliable’ - a fallacy of the distributed system. ” “ A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. ” - Leslie Lamport “ Never attribute to malice that which is adequately explained by stupidity. ” - Hanlon’s Razor “ Never attribute to Byzantine failure which can be explained by an ill node(s) ” - Me Raghavendra Prabhu (Percona) Corpus collapsum 31 October, 2014 5 / 58
  9. 9. 20000 feet view
  10. 10. Introduction Actors ▶ Database - WSREP/PXC ▶ Plugin - Galera ▶ Traffic control ♦ Traffic Control - tc ♦ NetEm
  13. 13. Introduction Actors ▶ Containers - Docker ▶ Load ♦ Generators - Sysbench, RQG ▶ Network ♦ Dnsmasq ♦ nsenter
  15. 15. Introduction Actors ▶ Jenkins ♦ Build flow and CI ▶ Storage ♦ Why ▶ “Others”
  16. 16. Details But why ▶ The ’P’ in CAP ▶ WAN scalability ▶ Real Reason - fun! ▶ Tolerance to latency variance
  20. 20. Details But why ▶ Failures in warehouses. ▶ Not quorum, but consensus. ▶ Real world networks and synchronous replication - Delay - Partition
  21. 21. Galera
  22. 22. Details Galera ▶ Data-centric approach ▶ EVS ▶ Causality and Synchronous ▶ Latency
  23. 23. Where did it start
  24. 24. Details Where did it start ▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192 ▶ Loss of PC ▶ Crash ▶ HA goal
  25. 25. One can bring the whole down
  26. 26. The Flow
  27. 27. Details Basic Flow: Jenkins, Build images, Start Dnsmasq, Bootstrap, nsenter/netem, Pre-sanity, SST/Others, Load/Sysbench
  35. 35. Details Basic Flow: RR sysbench, Detach/Keep, Post sanity, Core trace, Sanity check, Reconciliation, Cleanup, Collect logs
  43. 43. Details Cluster Resilience
  44. 44. Details Parameters ▶ Sysbench ▶ Segment ▶ Reconciliation period ▶ Loss nodes
  48. 48. Details Parameters ▶ NetEm ▶ Detach loss ▶ Fsync ▶ Shutdown
  52. 52. Containers!
  53. 53. Details Docker ▶ Why not virtualize ♦ Occam ♦ Namespaces ▶ Simplicity ♦ Network ♦ One application per node
  54. 54. Details Docker ▶ Portability - See same qualitative behavior that I do. ▶ Reproducibility - Makes it deterministic ▶ Configurable and CI - Byproducts
  55. 55. Details Docker ▶ QEMU and Docker ▶ Scalability ♦ Performance ♦ Feature ▶ Abstraction of channels
  56. 56. Details Container Networking ▶ Linking didn’t help ▶ Dnsmasq to rescue! ♦ Hosts file and volumes ♦ SIGHUP and refresh ▶ More elegant methods Swarm
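
The "Hosts file and volumes" and "SIGHUP and refresh" bullets can be pictured with a minimal sketch: a hosts file (shared through a volume) is regenerated from a name-to-IP mapping, and dnsmasq is then sent SIGHUP, which makes it re-read its hosts files. The node names, addresses and function names below are hypothetical; the actual scripts in the pxc-docker repository may work differently.

```python
import os
import signal


def render_hosts(nodes):
    """Render hosts-file lines ("IP<TAB>name") from a {name: ip} mapping."""
    return "".join(f"{ip}\t{name}\n" for name, ip in sorted(nodes.items()))


def refresh_dnsmasq(hosts_path, nodes, dnsmasq_pid):
    """Rewrite the hosts file, then ask dnsmasq to re-read it via SIGHUP."""
    with open(hosts_path, "w") as fh:
        fh.write(render_hosts(nodes))
    os.kill(dnsmasq_pid, signal.SIGHUP)


# Hypothetical node names and addresses, for illustration only.
example = render_hosts({"node2": "172.17.0.3", "node1": "172.17.0.2"})
```

Keeping the rendering separate from the signalling makes the file contents easy to check without a running dnsmasq.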
  57. 57. Details Noise ▶ Initial setup - Bridge - Egress only - IFB ▶ Present state ▶ NetEm - tc qdisc buckets - packet loss, delay, corruption, duplication, reordering - nsenter ▶ Future - Docker exec
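
As a rough illustration of the NetEm and nsenter bullets, the sketch below only builds the command lines; it does not run them. It assumes the container's init PID is already known (for example from `docker inspect`), that the interface inside the container is `eth0`, and that a root netem qdisc is used, which shapes egress traffic only. The helper names and the numbers are hypothetical, not taken from the talk.

```python
def netem_command(container_pid, dev="eth0", loss_pct=None,
                  delay_ms=None, jitter_ms=None):
    """Build an nsenter + tc command that attaches a netem qdisc
    inside the network namespace of the given process."""
    cmd = ["nsenter", "-t", str(container_pid), "-n",
           "tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            cmd.append(f"{jitter_ms}ms")  # variation around the base delay
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd


def detach_command(container_pid, dev="eth0"):
    """Build the command that removes the root qdisc again."""
    return ["nsenter", "-t", str(container_pid), "-n",
            "tc", "qdisc", "del", "dev", dev, "root"]


# Hypothetical PID and noise levels, for illustration only.
attach = " ".join(netem_command(1234, loss_pct=10, delay_ms=100, jitter_ms=20))
detach = " ".join(detach_command(1234))
```

The resulting lists can be passed to `subprocess.run` by a test harness that has the required privileges.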
  58. 58. Testing methods
  59. 59. Details Method I ▶ Qdisc is detached after load ▶ Objective - Time to recover of full cluster ▶ Done with a larger subset
  60. 60. Details Method II ▶ Qdisc is kept till the end ▶ Objective - Formation of primary component ▶ Comparatively smaller set
  61. 61. Details Observations ▶ Post sanity types - Why ▶ Which method is more pertinent ▶ State transfer issues - Beginning - During re-emergence
  62. 62. Details Observations ▶ Direct load to affected nodes ▶ Logs - journalctl - Streaming?
  63. 63. Details Other noises ▶ Aim ▶ Fsync - libeatmydata - Variance ▶ Correlation with network ▶ How with Docker
  64. 64. System Load
  65. 65. Details Load generation ▶ Sysbench - Generation - Reconnect on partition ▶ Sockets chosen - Load on affected nodes ▶ Distribution of Load - RR with socat - Native sysbench support
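
The round-robin ("RR") distribution of load can be sketched as cycling client connections over the node sockets. This only illustrates the assignment pattern; it is not how socat or sysbench implement it, and the socket paths are hypothetical.

```python
from itertools import cycle, islice


def round_robin(targets, n_connections):
    """Assign n_connections to targets in strict rotation."""
    return list(islice(cycle(targets), n_connections))


# Hypothetical socket paths, for illustration only.
sockets = ["/tmp/node1.sock", "/tmp/node2.sock", "/tmp/node3.sock"]
assignment = round_robin(sockets, 5)
```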
  66. 66. Details Load generation ▶ Nature of data/load - DDL ▶ RQG in future - Fuzz testing
  67. 67. The Fix
  68. 68. Strike Out!
  69. 69. Details Eviction ▶ STONITH ▶ Permanent eviction ▶ ’N’ strikes & out! - Timers - evs parameters - wsrep_evs_delayed and wsrep_evs_evict_list
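
The "'N' strikes & out" idea can be illustrated with a toy counter: each time a node is observed as delayed it collects a strike, and once it reaches the limit it is placed on an evict list for good. This is a hypothetical sketch of the concept only, not Galera's implementation; in Galera the behaviour is governed by evs parameters and timers, and the `StrikeTracker` class and its names are invented here.

```python
from collections import Counter


class StrikeTracker:
    """Toy model: a node is evicted after max_strikes delayed reports."""

    def __init__(self, max_strikes):
        self.max_strikes = max_strikes
        self.strikes = Counter()
        self.evicted = set()

    def report_delayed(self, node):
        """Record one delayed observation; return True if node is evicted."""
        if node in self.evicted:
            return True  # eviction is permanent in this model
        self.strikes[node] += 1
        if self.strikes[node] >= self.max_strikes:
            self.evicted.add(node)
        return node in self.evicted


tracker = StrikeTracker(max_strikes=3)
results = [tracker.report_delayed("node3") for _ in range(3)]
```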
  70. 70. Details Eviction ▶ Aim ▶ Quorum required - Why? - Not shoot each other - Non-PC nodes also.
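
Why a quorum keeps nodes from shooting each other can be shown with a simple majority check. The sketch assumes an unweighted strict majority (more than half of the cluster must agree); the function name is hypothetical and the real eviction protocol may count votes differently.

```python
def has_quorum(votes_for, cluster_size):
    """True only if strictly more than half of the cluster agrees."""
    return 2 * votes_for > cluster_size


# In a 4-node cluster split 2/2, neither half can evict the other.
split_brain_safe = not has_quorum(2, 4)
# In a 5-node cluster, 3 agreeing nodes are enough.
majority_of_five = has_quorum(3, 5)
```

Because two disjoint groups cannot both hold a strict majority, at most one side of a partition can ever reach an eviction decision.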
  72. 72. Details Eviction ▶ EVS version and upgrade ▶ TODO! - Ingress only - Follow here. ▶ Credits to Teemu Ollakka, Yan Zhang and Alex Yurchenko from Codership.
  73. 73. Details Coredumps with Docker ▶ Breakdown of abstraction ▶ Lack of isolation ▶ What was done - Volumes - core_pattern & sysctl - suid and ulimit
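
A hedged sketch of the host-side settings hinted at by "core_pattern & sysctl" and "suid": the helper below only builds `sysctl` command lines that point core dumps at a directory (which could then be shared with containers as a volume) and allow dumps from setuid processes. The directory, the pattern and the chosen `fs.suid_dumpable` value are assumptions, not the values used in the talk. Since `kernel.core_pattern` is a host-wide setting, this also shows where the container abstraction leaks.

```python
def coredump_sysctls(core_dir="/cores", suid_dumpable=2):
    """Build sysctl commands for core dump collection (not executed here)."""
    # %e expands to the executable name and %p to the PID of the dumping process.
    pattern = f"{core_dir}/core.%e.%p"
    return [
        ["sysctl", "-w", f"kernel.core_pattern={pattern}"],
        ["sysctl", "-w", f"fs.suid_dumpable={suid_dumpable}"],
    ]


# Hypothetical directory, for illustration only.
commands = coredump_sysctls("/cores")
```

The core size limit ("ulimit") still has to be raised for the database process itself, for example with `ulimit -c unlimited` in the shell that starts it.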
  74. 74. Details WAN Segments ▶ How they work ▶ Random allocation ▶ Joiner starvation ▶ Simulates data center ▶ Donor selection
  75. 75. Epilogue The code ▶ Github: https://github.com/percona/pxc-docker ▶ Jenkins: http://jenkins.percona.com/job/PXC-5.6-netem/ - Demo? ▶ Contributions/testing welcome! ▶ Dependencies - Sysbench
  76. 76. Epilogue Code: todo ▶ Docker automated builds ▶ Orchestration ▶ Docker ♦ Injection ♦ Signal proxying
  77. 77. Epilogue Code: todo ▶ Use Hoare’s channels - Go! ▶ Run it bare - CoreOS ▶ Overlay with etcd/fleet/libswarm
  78. 78. Future work
  79. 79. Epilogue Future work ▶ Fault injection ♦ Memory - Poisoned memory ♦ Disk - libeatmydata - Opposite: laggard! - ENOSPC
  80. 80. Epilogue Fault injection ▶ CPU - NUMA? - Hotplug ▶ More network - corruption, duplication, reordering, rate-limit - Better distribution - Other shaping
  81. 81. More Chaos
  82. 82. Epilogue Future work ▶ Disturb cluster more! - Membership changes * Manual eviction * Pull the cord! - Corrupt nodes ▶ Consistency voting
  83. 83. Epilogue Further Reading ▶ Byzantine fault tolerance - Reaching agreement in presence of faults ▶ The Network is Reliable ▶ NetEm ▶ Latency: The New Web Performance Bottleneck ▶ Galera ▶ Auto eviction code ▶ Don’t Settle for Eventual Consistency
  84. 84. Epilogue About ▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona. ▶ Slides will be at slideshare and owncloud ▶ Keybase.io: rdprabhu ▶ About.me: raghavendra.prabhu ▶ Presentation under CC BY-SA 4.0
  85. 85. Epilogue Image Credits ▶ http://galeracluster.com/documentation-webpages/ ▶ http://www.thelastdragontribute.com/40th-anniversary-death-of-bruce-lee/ ▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png ▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354 ▶ https://flic.kr/p/9J6GNu ▶ https://secure.flickr.com/photos/brewbooks/7780990192 ▶ https://www.flickr.com/photos/kwerfeldein/2649294869 ▶ https://secure.flickr.com/photos/mindmob/51951632 ▶ https://secure.flickr.com/photos/arenamontanus/2227769907 ▶ https://www.flickr.com/photos/markop/477199204 ▶ http://galeracluster.com/wp-content/uploads/2013/10/galera_replication1.png ▶ https://www.flickr.com/photos/gcwest/281385801 ▶ https://www.flickr.com/photos/opethdamna/360934079 ▶ http://digital-amphetamine.deviantart.com/art/Sky-82555664 ▶ http://highload.co/i/logo.png ▶ https://flic.kr/p/xTT8n ▶ https://www.flickr.com/photos/29233640@N07/13466208953 ▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/