Architecture & Pitfalls of Logical Replication


@PostgresConf US 2018, Jersey City / United States April 16 - 20, 2018

https://postgresconf.org/conferences/2018

Published in: Software

Architecture & Pitfalls of Logical Replication

  1. Architecture & Pitfalls of Logical Replication. NTT OSS Center, Atsushi Torikoshi. PGConf.US 2018. Copyright©2018 NTT Corp. All Rights Reserved.
  2. Who am I ➢Atsushi Torikoshi ➢@atorik_shi ➢torikoshi_atsushi_z2@lab.ntt.co.jp ➢NTT Open Source Software Center ➢PostgreSQL technical support ➢PostgreSQL performance verification
  3. About NTT • Who we are – NTT (Nippon Telegraph and Telephone Corporation) – Japanese telecommunications company • What NTT OSS Center does – Promotes the adoption of OSS by the group companies • Total support – support desk, introduction support, product maintenance • R&D – developing OSS and related tools with the communities • Deals with about 60 OSS products
  4. INDEX • Background of Logical Replication • Architecture and Behavior • Pitfalls • Summary
  5. BACKGROUND OF LOGICAL REPLICATION
  6. Physical Replication. PostgreSQL has had built-in Physical Replication since 2010. It replicates a whole DB by sending WAL. Suitable for load balancing and high availability. [Diagram: Upstream sends WAL to Downstream, which replays it into its tables]
  7. Logical Replication. Physical Replication cannot do things like: • partial replication • replication between different major versions of PostgreSQL. Logical Replication adds flexibility to built-in replication and makes these things possible! [Diagram: Upstream decodes WAL and sends the changes; Downstream applies them to its tables]
  8. Comparison between Logical and Physical Replication
       • Way of replication. Physical: sending and replaying all WAL. Logical: decoding WAL and extracting changes.
       • Downstream DB. Physical: copy of the upstream DB. Logical: not necessarily the same as the upstream DB; upstream and downstream DBs can be different PostgreSQL versions.
       • Manipulations on the downstream DB. Physical: SELECT only. Logical: no restriction, but some manipulations may lead to conflicts.
       • What is replicated. Physical: ALL. Logical: views, partition root tables, large objects and some manipulations, including DDL, are NOT replicated.
  9. Expected use cases of Logical Replication. Logical Replication enables flexible data replication. 1. Replicating partial data for analytical purposes 2. Consolidating multiple DBs into a single one 3. Online version upgrade [Diagram: panels (1), (2) and (3), one per use case]
  10. ARCHITECTURE AND BEHAVIOR
  11. Basics of the architecture • ‘walsender’ and ‘apply worker’ do most of the work for Logical Replication. • ‘sync worker’ and the corresponding ‘walsender’ run only at initial table sync. [Diagram: Publisher (upstream) with backend process, WAL and two walsenders; Subscriber (downstream) with launcher, apply worker, sync worker and backend process]
  12. Basics of the architecture: replication • ‘walsender’ reads WAL, decodes it, and sends the changes to the subscriber. • ‘apply worker’ applies those changes. [Diagram: on the Publisher a backend process writes WAL, and walsender reads and decodes it; walsender sends each change to the apply worker, which writes it to the Subscriber's tables]
  13. Basics of the architecture: replication • ‘walsender’ reassembles the decoded changes by transaction. • When a WAL record is an INSERT, UPDATE or DELETE, ‘walsender’ keeps the change in memory. [Diagram: the Publisher's WAL holds INSERT, UPDATE, UPDATE, DELETE and UPDATE records from several transactions; walsender sits between the WAL and the Subscriber's apply worker]
  14. Basics of the architecture: replication (continued). [Diagram step: 1. read WAL]
  15. Basics of the architecture: replication (continued). [Diagram steps: 1. read WAL, 2. decode]
  16. Basics of the architecture: replication (continued). [Diagram steps: 1. read WAL, 2. decode, 3. reassemble by transaction]
  17. Basics of the architecture: replication • When the WAL record is a COMMIT, ‘walsender’ sends all the changes for that transaction to the subscriber. [Diagram steps: 1. read WAL, 2. decode, 3. reassemble by transaction, 4. send]
  18. Basics of the architecture: replication • When the WAL record is a ROLLBACK, ‘walsender’ just throws away the changes for that transaction. [Diagram steps: 1. read WAL, 2. decode, 3. reassemble by transaction, 4. cleanup]
  19. Initial table sync • At initial table sync, COPY runs. • COPY is done by a dedicated ‘walsender’ and a sync worker. These processes exit after COPY is done. [Diagram: in addition to the regular walsender and apply worker, a second walsender and a sync worker write the tables on the Subscriber via COPY]
  20. (Not) Conflict • PostgreSQL doesn’t have merge agents for conflict resolution. If there are multiple changes to the same data at one time, the last change is reflected. [Diagram: Publisher and Subscriber both hold the rows (id, name) = (1, 'A'), (2, 'B')]
  21. (Not) Conflict (continued). [Diagram: 1. the Publisher runs UPDATE table SET name = 'X' WHERE id = 2, so its row 2 becomes 'X'; 2. the Subscriber runs UPDATE table SET name = 'Y' WHERE id = 2, so its row 2 becomes 'Y']
  22. (Not) Conflict (continued). [Diagram: 3. replicate; the Subscriber's row 2 becomes 'X', matching the Publisher]
  23. Conflict • If replicating data causes an error on the subscriber side, the replication stops. [Diagram: INSERT INTO table VALUES (2); is run on both the Publisher and the Subscriber (steps 1 and 2), so each side holds ids 1 and 2]
  24. Conflict (continued). [Diagram step: 3. replicate]
  25. Conflict (continued). [Diagram step: 4. conflict]
  26. Conflict • Users must resolve the conflict manually. • After the conflict is resolved, replication resumes.
  27. PITFALLS
  28. Q1. How does ‘walsender’ deal with WAL records that are NOT a target of replication?
  29. A1. ‘walsender’ decodes most of the WAL.
  30. 1. ‘walsender’ decodes most of the WAL • behavior: ‘walsender’ decodes *all* of the changes to the target database, NOT just the changes to subscribed tables.
  31. 1. ‘walsender’ decodes most of the WAL • pitfall: Even changes to non-subscribed tables consume resources such as CPU and memory. [Figure: perf visualization of walsender while only non-subscribed tables are updated, showing DecodeDelete, DecodeInsert and DecodeCommit]
  32. 1. ‘walsender’ decodes most of the WAL • lesson: ‘walsender’ consumes resources depending on the total amount of changes on the publisher database, NOT only on the amount of changes to subscribed tables.
  33. Q2. Does keeping changes in ‘walsender’ cause issues?
  34. A2. Yes, it may consume a lot of memory.
  35. 2. ‘walsender’ may consume a lot of memory • behavior: ‘walsender’ keeps each change of a transaction in memory until COMMIT or ROLLBACK.
  36. 2. ‘walsender’ may consume a lot of memory • pitfall: This may cause ‘walsender’ to consume a lot of memory.
       Type of manipulation, and the measure available to limit memory use:
       • Many changes in one transaction: ‘walsender’ can spill changes to disk when the number of changes in one transaction exceeds 4096.
       • Changes which modify much data, many transactions, many savepoints: there is no feature to avoid using memory.
       ※ Patches changing this behavior are under discussion.
  37. 2. ‘walsender’ may consume a lot of memory • lesson: If possible, avoid the manipulations for which there is no measure to limit memory use. Monitoring memory usage on the publisher may be a good idea.
  38. Q3. Can we use synchronous replication in Logical Replication?
  39. A3. Yes, but the response time may be quite long.
  40. 3. The response time may be quite long • behavior: Under synchronous replication, before replying to the client, publishers wait for the COMMIT responses from all the subscribers. [Diagram: (1) the Client runs BEGIN; INSERT INTO Table1 VALUES ('a'); COMMIT; on the Publisher (table 1, table 2); (2) the Publisher sends the transaction to the Subscriber (table 1); (3) the Subscriber responds; (4) the Publisher replies to the Client]
  41. 3. The response time may be quite long • pitfall: Under synchronous replication, publishers wait for COMMIT responses from all the subscribers, even when there are no changes for those subscribers. [Diagram: same flow with two subscribers; Subscriber1 (table 1) receives BEGIN; INSERT INTO Table1 VALUES ('a'); COMMIT; while Subscriber2 (table 2) is sent only BEGIN and COMMIT, and the Publisher still waits for both]
  42. 3. The response time may be quite long • lesson: The response time to clients depends on the slowest subscriber. • Also, as seen in Q2, ‘walsender’ sends changes to ‘apply worker’ only after COMMIT, which also tends to lengthen the response time. • It may also be worth confirming that you really need synchronous replication.
  43. Q4. Is the way to monitor replication status the same as for Physical Replication?
  44. A4. Monitoring only pg_stat_replication might not be enough.
  45. 4. pg_stat_replication might not be enough • behavior: Initial table sync is done by dedicated processes, a sync worker and a walsender. [Diagram: same as slide 19, with a second walsender and a sync worker writing the Subscriber's tables via COPY]
  46. 4. pg_stat_replication might not be enough • pitfall: Even if ‘sync worker’ failed to start and nothing has been replicated yet, pg_stat_replication.state is ‘streaming’.
  47. 4. pg_stat_replication might not be enough • lesson: We should also monitor pg_subscription_rel and check that ‘srsubstate’ is ‘r’, meaning ready.
  48. Q5. How should we resolve the conflict?
  49. A5. We can use pg_replication_origin_advance(), but it may skip some data.
  50. 5. pg_replication_origin_advance() may skip data • behavior: pg_replication_origin_advance() enables us to set the LSN up to which data has been replicated. [Diagram: remote LSN axis with marks at 10 and 20; a "Here" marker shows the current position]
  51. 5. pg_replication_origin_advance() may skip data (continued). [Diagram: same axis; pg_replication_origin_advance('node_name', 20) is called and the "Here" marker is shown]
  52. 5. pg_replication_origin_advance() may skip data (continued). [Diagram: same axis and call, with the "Conflict point" marked]
  53. 5. pg_replication_origin_advance() may skip data • pitfall: If there are changes on the publisher after the conflict, pg_replication_origin_advance(‘current wal lsn on publisher’) skips applying those changes. [Diagram: same axis and call; INSERT and UPDATE changes are marked along with the conflict point]
  54. 5. pg_replication_origin_advance() may skip data • lesson: Changing the conflicting data on the subscriber is usually a better choice.
  55. Q6. Can backups be performed as usual?
  56. A6. Backing up a DB under Logical Replication may need an additional procedure.
  57. 6. Logical Replication may need additional procedure • behavior: pg_dump doesn't back up pg_subscription_rel, which keeps the state of the initial table sync.
  58. 6. Logical Replication may need additional procedure • pitfall: Restoring data backed up by pg_dump on a subscriber causes the initial table sync to run again. This usually makes the replication stop due to a key duplication error. [Diagram: Publisher and Subscriber; a pg_dump of the tables]
  59. 6. Logical Replication may need additional procedure (continued). [Diagram step: (1) restore the pg_dump on the Subscriber]
  60. 6. Logical Replication may need additional procedure (continued). [Diagram steps: (1) restore, (2) replication]
  61. 6. Logical Replication may need additional procedure (continued). [Diagram steps: (1) restore, (2) replication, (3) conflict]
  62. 6. Logical Replication may need additional procedure • lesson: We can avoid this resyncing by refreshing the subscription with 'copy_data = false'. But if a subscription has tables which have not completed the initial sync, more work is needed. It's better to consider carefully what data is really necessary and how to prevent data loss. In some cases it may be better to start replication from scratch.
  63. SUMMARY
  64. How should we manage Logical Replication? Design: Take into account some counterintuitive behaviors that affect performance. • ‘walsender’ keeps changes in memory. • In synchronous replication, publishers wait for COMMIT from all the subscribers, even those with no changes. • Changes to non-subscribed tables are also decoded.
  65. How should we manage Logical Replication? Monitoring: • Monitor memory usage on the publisher. • Monitor not only pg_stat_replication but also pg_subscription_rel.
  66. How should we manage Logical Replication? Operation: • pg_replication_origin_advance() may skip some data. • Backup and restore need some extra procedures. It's better to consider carefully what data is really necessary and how to prevent data loss.
  67. Thank you! torikoshi_atsushi_z2@lab.ntt.co.jp @atorik_shi
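
Slides 13 to 18 describe how ‘walsender’ keeps decoded changes in memory per transaction, sends them at COMMIT and discards them at ROLLBACK. The following is a minimal Python sketch of that buffering rule only, not PostgreSQL's actual implementation. The function name `decode_stream`, the `(xid, action)` record shape and the sample WAL are assumptions made for illustration.

```python
from collections import defaultdict


def decode_stream(wal_records):
    """Toy model of the buffering described on slides 13-18.

    wal_records: iterable of (xid, action) pairs (a hypothetical, simplified
    stand-in for decoded WAL records).
    Returns (sent, pending): transactions sent to the subscriber in commit
    order, and changes still held in memory for open transactions.
    """
    pending = defaultdict(list)  # xid -> changes kept in memory
    sent = []                    # (xid, changes) handed to the apply worker
    for xid, action in wal_records:
        if action in ("INSERT", "UPDATE", "DELETE"):
            pending[xid].append(action)               # keep the change in memory
        elif action == "COMMIT":
            sent.append((xid, pending.pop(xid, [])))  # send the whole transaction
        elif action == "ROLLBACK":
            pending.pop(xid, None)                    # throw the changes away
    return sent, dict(pending)


# Hypothetical interleaved WAL from three transactions.
wal = [
    (1, "INSERT"), (2, "UPDATE"), (1, "UPDATE"),
    (3, "DELETE"), (2, "ROLLBACK"), (1, "COMMIT"),
]
sent, still_buffered = decode_stream(wal)
print(sent)            # [(1, ['INSERT', 'UPDATE'])]
print(still_buffered)  # {3: ['DELETE']}
```

Transaction 3 never commits in this sample, so its change stays buffered. This is the situation behind pitfall 2: memory is held until COMMIT or ROLLBACK arrives.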
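
The lesson on slide 42, that the response time depends on the slowest subscriber, can be illustrated with a small Python toy model. It assumes that every subscriber is configured as synchronous and that the publisher waits for all of them, as slides 40 and 41 describe. The function name `commit_response_time` and all timings are hypothetical, not measurements.

```python
def commit_response_time(local_commit_ms, subscriber_ack_ms):
    """Toy model: the client gets its reply only after every synchronous
    subscriber has acknowledged the COMMIT.

    local_commit_ms: time the publisher needs for its own commit (assumed).
    subscriber_ack_ms: dict of subscriber name -> time until its COMMIT
    response arrives (assumed), including subscribers that receive only
    BEGIN and COMMIT.
    """
    if not subscriber_ack_ms:
        return local_commit_ms
    # Waits overlap, so the slowest subscriber determines the total.
    return local_commit_ms + max(subscriber_ack_ms.values())


# Subscriber2 has no changes to apply but is slow to answer (hypothetical numbers).
acks = {"subscriber1": 5.0, "subscriber2": 120.0}
print(commit_response_time(2.0, acks))  # 122.0
```

In this model, removing the slow subscriber from the synchronous set is the only way to shorten the reply, which is why slide 42 suggests confirming that synchronous replication is really needed.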
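
Slides 50 to 53 warn that advancing the replication origin to the publisher's current LSN skips every change made after the conflict. The Python sketch below models that effect with integer LSNs, following the simplified 10/20 axis on the slides (real LSNs are not plain integers). The function name `remaining_changes` and the sample changes are assumptions for illustration.

```python
def remaining_changes(changes, origin_lsn):
    """Toy model: after the origin is set to origin_lsn, only changes located
    after that LSN are still applied on the subscriber."""
    return [(lsn, what) for lsn, what in changes if lsn > origin_lsn]


# Hypothetical changes on the publisher; the one at LSN 12 is the conflicting one.
changes = [
    (12, "INSERT (conflicting)"),
    (15, "INSERT"),
    (18, "UPDATE"),
    (25, "DELETE"),
]

# Advancing to the publisher's current LSN (20) also skips LSN 15 and 18.
print(remaining_changes(changes, 20))  # [(25, 'DELETE')]

# Advancing only just past the conflict keeps the later changes.
print(remaining_changes(changes, 12))  # [(15, 'INSERT'), (18, 'UPDATE'), (25, 'DELETE')]
```

The first call shows the data loss described on slide 53. It is one reason slide 54 prefers fixing the conflicting data on the subscriber.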
