...Lag

Diagnosing PostgreSQL streaming replication lag.

Published in: Technology

  1. ...Lag
  2. ...Lag: What’s wrong with my slave?
  3. Act I: How do I know?
  4. Monitoring 101: Cacti, Nagios, Zabbix + PagerDuty
  5. SELECT * FROM pg_stat_replication;
  6. What is Normal?
  7. CREATE TABLE rep_ts (ts timestamp NOT NULL DEFAULT now());
  8. Time vs. Bytes
  9. Time Measurement
  10. SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())::int END AS log_delay;
  11. Normal: 7 Days, Max < 3.5 sec, 10 spikes (graph axes: date, time)
  12. Also Normal: Max < 20k sec, 7 spikes (graph axes: date, time)
  13. 7 Days, 2 regular spikes
  14. Still Normal: Max < 10k sec, 14 spikes (graph axes: date, time)
  15. Is replication paused?
  16. Byte Measurement
  17. SELECT client_hostname, pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location), EXTRACT(EPOCH FROM now()) FROM pg_stat_replication;
  18. Normal: 7 Days
  19. Act II: What’s Going Wrong?
  20. Most issues are in the initial setup phase.
  21. (1) Configuration
  22. (1) Configuration, (2) Hardware
  23. (1) Configuration, (2) Hardware, (3) Human Error
  24. Async or Sync?
  25. Traffic Isolation: separate host with alternate configuration
  26. Act III: Configuration
  27. max_standby_archive_delay & max_standby_streaming_delay
  28. max_standby_archive_delay: applies when WAL data is read from the archive
  29. max_standby_streaming_delay: applies when WAL data is received via streaming replication
  30. replication_timeout & wal_receiver_status_interval
  31. hot_standby_feedback
  32. hot_standby_feedback + max_standby_streaming_delay = LAG
  33. Act IV: Hardware
  34. 25-30 Seconds: 5 Hours, Irregular Spikes
  35. Spindle disk, Non-Partitioned xlogs
  36. 15 Hours, Mostly Regular Spikes
  37. VERY Slight Increase
  38. Zoomed Out, 3 Weeks: Clear Pattern from 0 to ~5 s
  39. Zoomed Out, 3 Weeks: Clear Pattern from 0 to ~5 s
  40. NTP
  41. What’s that 500-second Abnormal Spike?
  42. Not Worth It
  43. What have you experienced?
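
The queries on slides 10 and 17 use the PostgreSQL 9.x names. PostgreSQL 10 renamed the "xlog" functions to "wal" and the "location" columns to "lsn", so those queries fail on current servers. The following is a rough sketch of the same measurements with the newer names, not the author's own version. The output aliases (replay_paused, replay_lag_bytes, sampled_at) are made up for illustration, and using pg_is_wal_replay_paused() for slide 15 is an assumption, since that slide shows no query.

```sql
-- Run on the standby: time lag in seconds (slide 10, PostgreSQL 10+ names).
-- Returns 0 when everything received has been replayed, so an idle
-- primary does not show up as ever-growing lag.
SELECT CASE
         WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
         ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())::int
       END AS log_delay;

-- Run on the standby: one possible check for slide 15 (assumed, the
-- slide itself shows no query).
SELECT pg_is_wal_replay_paused() AS replay_paused;

-- Run on the primary: byte lag per connected standby (slide 17).
SELECT client_hostname,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
       EXTRACT(EPOCH FROM now()) AS sampled_at
FROM pg_stat_replication;
```

Because the time-based query compares now() on the standby with a commit timestamp generated on the primary, the result is only as good as the clock synchronization between the two hosts, which is likely why NTP gets its own slide.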

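
For the configuration slides (27 to 32), it can help to confirm what a server is actually running with before changing anything. A minimal sketch, assuming a psql session on the host being inspected; the list of names is taken from the slides, plus wal_sender_timeout, which replaced replication_timeout in PostgreSQL 9.3.

```sql
-- Show the current values of the replication-related settings named in
-- the slides. A name that does not exist on this server version simply
-- returns no row (for example replication_timeout on 9.3 or later).
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_standby_archive_delay',
               'max_standby_streaming_delay',
               'hot_standby_feedback',
               'wal_receiver_status_interval',
               'replication_timeout',
               'wal_sender_timeout')
ORDER BY name;
```

The max_standby_* settings and hot_standby_feedback take effect on the standby, so this is most informative when run there as well as on the primary.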