Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Migration Best Practices: From RDBMS to Cassandra without a Hitch

9,968 views

Published on

Presenter: Duy Hai Doan, Technical Advocate at Datastax

Libon is a messaging service designed to improve mobile communications through free calls, chat and a voicemail services regardless of operator or Internet access provider. As a mobile communications application, Libon processes billions of messages and calls while backing up billions of contact data. Join this webinar to learn best practices and pitfalls to avoid when tackling a migration project from Relational Database (RDBMS) to Cassandra and how Libon is now able to ingest massive volumes of high velocity data with read and write latency below 10 milliseconds.

Published in: Technology

Migration Best Practices: From RDBMS to Cassandra without a Hitch

  1. 1. Migration Best Practices: From RDBMS to Cassandra without a Hitch #Cassandra @doanduyhai
  2. 2. #Cassandra @doanduyhai Who am I ? 2 DuyHai Doan Achilles Cassandra Technical Advocate @ Datastax Former Java Developer @ Libon
  3. 3. #Cassandra @doanduyhai Agenda •  Libon context •  Migration strategy •  Business code migration •  Data Modeling •  Take Away 3
  4. 4. #Cassandra @doanduyhai Libon Context
  5. 5. #Cassandra @doanduyhai What is Libon ? •  Messaging app •  VOIP (out) •  Custom voicemail & greetings •  SMS/chat/file transfer •  Contacts matching 5
  6. 6. #Cassandra @doanduyhai Contact Matching 6 Libon User
  7. 7. #Cassandra @doanduyhai Contact Matching 7 Libon User Friend
  8. 8. #Cassandra @doanduyhai Contact Matching 8 Libon User Friend Contact matching
  9. 9. #Cassandra @doanduyhai Contact Matching 9 Libon User Friend Accept link
  10. 10. #Cassandra @doanduyhai Project Context •  Application grew over the years 10
  11. 11. #Cassandra @doanduyhai Project Context •  Application grew over the years •  Already using Cassandra to handle events •  messaging / file sharing / SMS / notifications •  Cassandra R/W latencies ≈ 0,4 ms •  server response time under 10 ms 11
  12. 12. #Cassandra @doanduyhai Project Context •  About contacts … 12
  13. 13. #Cassandra @doanduyhai Project Context •  About contacts … •  stored as relational model in RDBMS (Oracle) 13
  14. 14. #Cassandra @doanduyhai Project Context •  About contacts … •  stored as relational model in RDBMS (Oracle) •  1 user ≈ 300 contacts 14
  15. 15. #Cassandra @doanduyhai Project Context •  About contacts … •  stored as relational model in RDBMS (Oracle) •  1 user ≈ 300 contacts •  with millions users ☞ billions of contacts to handle 15
  16. 16. #Cassandra @doanduyhai Project Context •  About contacts … •  stored as relational model in RDBMS (Oracle) •  1 user ≈ 300 contacts •  with millions users ☞ billions of contacts to handle •  query latency unpredictable 16
  17. 17. #Cassandra @doanduyhai17
  18. 18. #Cassandra @doanduyhai Fixing the problem •  Tune the RDBMS 18
  19. 19. #Cassandra @doanduyhai Fixing the problem •  Tune the RDBMS •  indices 19
  20. 20. #Cassandra @doanduyhai Fixing the problem •  Tune the RDBMS •  indices •  partitioning 20
  21. 21. #Cassandra @doanduyhai Fixing the problem •  Tune the RDBMS •  indices •  partitioning •  less joins, simplified relational model 21
  22. 22. #Cassandra @doanduyhai Fixing the problem •  Tune the RDBMS •  indices •  partitioning •  less joins, simplified relational model •  hardware capacity increased 22
  23. 23. #Cassandra @doanduyhai Fixing the problem •  Tune the RDBMS •  indices •  partitioning •  less joins, simplified relational model •  hardware capacity increased 23 That worked
  24. 24. #Cassandra @doanduyhai Fixing the problem •  Tune the RDBMS •  indices •  partitioning •  less joins, simplified relational model •  hardware capacity increased 24 That worked but …
  25. 25. #Cassandra @doanduyhai Back-end application RDBMS Cassandra 25
  26. 26. #Cassandra @doanduyhai Back-end application RDBMS Cassandra 26 We need to choose
  27. 27. #Cassandra @doanduyhai Next Challenges •  High Availability (DB failure, site failure …) 27
  28. 28. #Cassandra @doanduyhai Next Challenges •  High Availability (DB failure, site failure …) •  Predictable performance at scale 28
  29. 29. #Cassandra @doanduyhai Next Challenges •  High Availability (DB failure, site failure …) •  Predictable performance at scale •  Going to multi data-centers 29
  30. 30. #Cassandra @doanduyhai Next Challenges •  High Availability (DB failure, site failure …) •  Predictable performance at scale •  Going to multi data-centers ☞ Cassandra, what else ? 30
  31. 31. #Cassandra @doanduyhai Data Migration Strategy
  32. 32. #Cassandra @doanduyhai Objectives •  No downtime 32
  33. 33. #Cassandra @doanduyhai Objectives •  No downtime •  No concurrency corner-cases 33
  34. 34. #Cassandra @doanduyhai Objectives •  No downtime •  No concurrency corner-cases •  Safe rollback possible 34
  35. 35. #Cassandra @doanduyhai Objectives •  No downtime •  No concurrency corner-cases •  Safe rollback possible •  Replay-ability & resume-ability 35
  36. 36. #Cassandra @doanduyhai Strategy •  4 phases 36
  37. 37. #Cassandra @doanduyhai Strategy •  4 phases •  Write contacts to both data stores 37
  38. 38. #Cassandra @doanduyhai Strategy •  4 phases •  Write contacts to both data stores •  Old contacts migration 38
  39. 39. #Cassandra @doanduyhai Strategy •  4 phases •  Write contacts to both data stores •  Old contacts migration •  Switch to Cassandra (but keep RDBMS in case of…) 39
  40. 40. #Cassandra @doanduyhai Strategy •  4 phases •  Write contacts to both data stores •  Old contacts migration •  Switch to Cassandra (but keep RDBMS in case of…) •  Remove the RDBMS code 40
  41. 41. #Cassandra @doanduyhai Migration Phase 1 41 Back end server · · · SQLSQLSQL C* C* C* C* C* Write contactUUID contactId … contactUUID 129363 123e4567- e89b-12d3… 834849 contacId(long) + contactUUID
  42. 42. #Cassandra @doanduyhai Migration Phase 1 42 Back end server · · · SQLSQLSQL C* C* C* C* C* Read
  43. 43. #Cassandra @doanduyhai Migration Phase 2 •  On live production, migrate old contacts 43 SQLSQLSQL C* C* C* C* C* For each batch of users SELECT * FROM contacts WHERE user_id = … AND contact_uuid IS NULL Old contacts created before phase 1
  44. 44. #Cassandra @doanduyhai Migration Phase 2 •  On live production, migrate old contacts 44 SQLSQLSQL C* C* C* C* C* For each batch of users SELECT * FROM contacts WHERE user_id = … AND contact_uuid IS NULL Logged batches of INSERT INTO contacts(..) VALUES(…) USING TIMESTAMP now() - 1 week Old contacts created before phase 1
  45. 45. #Cassandra @doanduyhai Migration Phase 2 45 USING TIMESTAMP now() - 1 week 😳
  46. 46. #Cassandra @doanduyhai Migration Phase 2 •  During data migration … 46
  47. 47. #Cassandra @doanduyhai Migration Phase 2 •  During data migration … •  … concurrent writes from the migration batch … 47
  48. 48. #Cassandra @doanduyhai Migration Phase 2 •  During data migration … •  … concurrent writes from the migration batch … •  … and updates from production for the same contact 48
  49. 49. #Cassandra @doanduyhai Migration Phase 2 49 contact_uuid name (now -1 week) … name (now) … Johny … Johnny … Insert from batch (to the past) Update from production
  50. 50. #Cassandra @doanduyhai Migration Phase 2 50 contact_uuid name (now -1 week) … name (now) … Johny … Johnny … Future reads pick the most up-to-date value
  51. 51. #Cassandra @doanduyhai Last Write Win in action 51 Case 1 Case 2 Batchpast(Johny) t1 Prodnow(Johnny) t2 t3 Read(Johnny) Batchpast(Johny) t1 Prodnow(Johnny) t2 t3 Read(Johnny)
  52. 52. #Cassandra @doanduyhai Migration Phase 2 52 "Write to the Past… to save the Future" Libon – 2014/10/08
  53. 53. #Cassandra @doanduyhai Migration Phase 3 53 Back end server · · · SQLSQLSQL C* C* C* C* C* Write
  54. 54. #Cassandra @doanduyhai Migration Phase 4 54 Back end server · · · SQLSQLSQL C* C* C* C* C* Write ❌
  55. 55. #Cassandra @doanduyhai Business Code Refactoring
  56. 56. #Cassandra @doanduyhai Code Inventory •  Written for RDBMS 56
  57. 57. #Cassandra @doanduyhai Code Inventory •  Written for RDBMS •  Lots of joins (no surprise) 57
  58. 58. #Cassandra @doanduyhai Code Inventory •  Written for RDBMS •  Lots of joins (no surprise) •  Designed around transactions 58
  59. 59. #Cassandra @doanduyhai Code Inventory •  Written for RDBMS •  Lots of joins (no surprise) •  Designed around transactions •  Spring @Transactional everywhere 59
  60. 60. #Cassandra @doanduyhai Code Inventory cont. •  Entities go through Services & Repositories 60 Repositories Services ContactEntity
  61. 61. #Cassandra @doanduyhai Code Inventory cont. •  Hibernate is auto-magic 61
  62. 62. #Cassandra @doanduyhai Code Inventory cont. •  Hibernate is auto-magic •  lazy loading •  1st level cache •  N+1 select 62 Repositories Services ContactEntity
  63. 63. #Cassandra @doanduyhai Which options ? •  Throw existing code … •  … and re-design from scratch for Cassandra 63
  64. 64. #Cassandra @doanduyhai Which options ? •  Throw existing code … •  … and re-design from scratch for Cassandra 64 No way !
  65. 65. #Cassandra @doanduyhai Code Quality •  Existing business code has… •  … ≈ 3500 unit tests 65
  66. 66. #Cassandra @doanduyhai Code Quality •  Existing business code has… •  … ≈ 3500 unit tests •  and ≈600+ integration tests 66
  67. 67. #Cassandra @doanduyhai Code Quality 67 "The code coverage is one of your most valuable technical asset" Libon – since beginning
  68. 68. #Cassandra @doanduyhai Repositories Services Refactoring Strategy 68 ContactMatchingService ContactService ContactSync ContactEntity n 1 n n
  69. 69. #Cassandra @doanduyhai Repositories Services Refactoring Strategy 69 ContactMatchingService ContactService ContactNoSQLEntity ContactSync ContactEntity n 1 n n Proxy
  70. 70. #Cassandra @doanduyhai Repositories Services Refactoring Strategy 70 ContactMatchingService ContactService ContactNoSQLEntity ContactSync ContactEntity n 1 n n Denorm2 … DenormN Denorm1 Proxy
  71. 71. #Cassandra @doanduyhai Refactoring Strategy •  Use CQRS •  ContactReadRepository •  ContactWriteRepository •  ContactUpdateRepository •  ContactDeleteRepository 71
  72. 72. #Cassandra @doanduyhai Refactoring Strategy •  ContactReadRepository •  direct sequential read •  no joins •  1 read ≈ 1 SELECT 72
  73. 73. #Cassandra @doanduyhai Refactoring Strategy •  ContactWriteRepository •  write to all denormalized tables •  using CQL logged batches •  use TTLs 73
  74. 74. #Cassandra @doanduyhai Refactoring Strategy •  ContactUpdateRepository •  read-before-write most of the time 😟 •  rare updates ☞ acceptable perf penalty 74
  75. 75. #Cassandra @doanduyhai Refactoring Strategy •  ContactDeleteRepository •  delete by partition key 75
  76. 76. #Cassandra @doanduyhai Outcome •  5 months of 2 men work 76
  77. 77. #Cassandra @doanduyhai Outcome •  5 months of 2 men work •  Many iterations to fix bugs (thanks to IT) 77
  78. 78. #Cassandra @doanduyhai Outcome •  5 months of 2 men work •  Many iterations to fix bugs (thanks to IT) •  Lots of performance benchmarks using Gatling 78
  79. 79. #Cassandra @doanduyhai Gatling Output 79
  80. 80. #Cassandra @doanduyhai Outcome •  5 months of 2 men work •  Many iterations to fix bugs (thanks to IT) •  Lots of performance benchmarks using Gatling ☞ data model & code validation 80
  81. 81. #Cassandra @doanduyhai Outcome •  5 months of 2 men work •  Many iterations to fix bugs (thanks to IT) •  Lots of performance benchmarks using Gatling ☞ data model & code validation •  … we are almost there for production 81
  82. 82. #Cassandra @doanduyhai Data Model
  83. 83. #Cassandra @doanduyhai Denormalization, the good •  Support fast reads •  1 read ≈ 1 SELECT •  Worthy because mostly read, few updates 83
  84. 84. #Cassandra @doanduyhai Denormalization, the bad •  Updating mutable data can be nightmare •  Data model bound by existing client-facing API •  Update paths very error-prone without tests 84
  85. 85. #Cassandra @doanduyhai Data model in detail 85 Contacts_by_id Contacts_by_identifiers Contacts_in_profiles Contacts_by_modification_date Contacts_by_firstname_lastname Contacts_linked_user
  86. 86. #Cassandra @doanduyhai Data model in detail 86 Contacts_by_id Contacts_by_identifiers Contacts_in_profiles Contacts_by_modification_date Contacts_by_firstname_lastname Contacts_linked_user user_id always component of partition key
  87. 87. #Cassandra @doanduyhai Scalable design 87 n1 n2 n3 n4 n5 n6 n7 n8 A B C D E F G H user_id1 user_id2 user_id3 user_id4 user_id5
  88. 88. #Cassandra @doanduyhai Scalable design 88 n1 n2 n3 n4 n5 n6 n7 n8 A B C D E F G H user_id1user_id2 user_id3 user_id4 user_id5
  89. 89. #Cassandra @doanduyhai Bloom filters in action •  For some tables, partition key = (user_id, contact_id) ☞ fast look-up, leverages Bloom filters ☞ touches 1 SSTable most of the time 89
  90. 90. #Cassandra @doanduyhai Data model in detail 90 Contacts_by_id Contacts_by_identifiers Contacts_in_profiles Contacts_by_modification_date Contacts_by_firstname_lastname Contacts_linked_user Wide partition
  91. 91. #Cassandra @doanduyhai A "queue" story •  contacts_by_modification_date •  queue-like pattern 😭 91
  92. 92. #Cassandra @doanduyhai A "queue" story •  contacts_by_modification_date •  queue-like pattern 😭 ☞ buckets to the rescue 92 user_id:2014-12 date35 date12 … … date47 … … … … user_id:2014-11 date11 date12 … … date34 … … … …
  93. 93. #Cassandra @doanduyhai Data model summary •  7 tables for denormalization 93
  94. 94. #Cassandra @doanduyhai Data model summary •  7 tables for denormalization •  Normalize some tables because rare access 94
  95. 95. #Cassandra @doanduyhai Data model summary •  7 tables for denormalization •  Normalize some tables because rare access •  Read-before write in most update scenarios 😟 95
  96. 96. #Cassandra @doanduyhai Notes on contact_id •  In SQL, auto-generated long using sequence •  In Cassandra, auto-generated timeuuid 96
  97. 97. #Cassandra @doanduyhai Notes on contact_id •  How to store both types ? 97
  98. 98. #Cassandra @doanduyhai Notes on contact_id •  How to store both types ? •  As text ? ☞ easy solution … 98
  99. 99. #Cassandra @doanduyhai Notes on contact_id •  How to store both types ? •  As text ? ☞ easy solution … •  … but waste of space ! •  because encoded as UTF-8 or ASCII in Cassandra 99
  100. 100. #Cassandra @doanduyhai Notes on contact_id •  Long ☞ 8 bytes •  Long as text(UTF-8: 1 byte) ☞ "digits count" bytes 100
  101. 101. #Cassandra @doanduyhai Notes on contact_id •  UUID ☞ 16 bytes E81D4C70-A638-11E4-83CB-DEB70BF9330F •  32 hex chars + 4 hyphens = 36 chars •  UUID as text(UTF-8: 1 byte) ☞ 36 bytes •  Bytes overhead = 36 – 16 = 20 bytes 101
  102. 102. #Cassandra @doanduyhai Notes on contact_id •  20 bytes wasted per contact uuid 102
  103. 103. #Cassandra @doanduyhai Notes on contact_id •  20 bytes wasted per contact uuid •  × 7 denormalizations = 140 bytes per contact uuid 103
  104. 104. #Cassandra @doanduyhai Notes on contact_id •  20 bytes wasted per contact uuid •  × 7 denormalizations = 140 bytes per contact uuid •  × 109 contacts = 140 GB wasted 104 😠 not even counting replication factor …
  105. 105. #Cassandra @doanduyhai Notes on contact_id •  ☞ just save contact id as byte[ ] 105
  106. 106. #Cassandra @doanduyhai Notes on contact_id •  ☞ just save contact id as byte[ ] •  Achilles @TypeTransformer for automatic conversion (see later) 106
  107. 107. #Cassandra @doanduyhai Notes on contact_id •  ☞ just save contact id as byte[ ] •  Achilles @TypeTransformer for automatic conversion (see later) •  Use blobAsBigInt( ) or blobAsUUID( ) to view data 107
  108. 108. #Cassandra @doanduyhai Achilles •  Advanced "object mapper" •  Fluent API •  Tons of features •  TDD friendly 108
  109. 109. #Cassandra @doanduyhai Achilles •  Dirty checking, what is it ? 109
  110. 110. #Cassandra @doanduyhai Achilles •  Dirty checking, what is it ? •  1 contact ≈ 8 mutable fields 110
  111. 111. #Cassandra @doanduyhai Achilles •  Dirty checking, what is it ? •  1 contact ≈ 8 mutable fields •  × 7 denormalizations = 56 update combinations … 111
  112. 112. #Cassandra @doanduyhai Achilles •  Dirty checking, what is it ? •  1 contact ≈ 8 mutable fields •  × 7 denormalizations = 56 update combinations … •  and not even counting multiple fields updates … 112
  113. 113. #Cassandra @doanduyhai Achilles •  Are you going to manually generate 56+ prepared statements for all possible updates ? 113
  114. 114. #Cassandra @doanduyhai Achilles •  Are you going to manually generate 56+ prepared statements for all possible updates ? •  Or just use dynamic plain string statements and get some perf penalty ? 114
  115. 115. #Cassandra @doanduyhai Achilles •  Dirty check in action 115 //No read-before-write ContactEntity proxy = manager.forUpdate(ContactEntity.class, contactId); proxy.setFirstName(…); proxy.setLastName(…); //type-safe updates proxy.setAddress(…); manager.update(proxy);
  116. 116. #Cassandra @doanduyhai Achilles 116 Empty Entity DirtyMap Proxy Setters interception PrimaryKey
  117. 117. #Cassandra @doanduyhai Achilles •  Dynamic statements generation 117 UPDATE contacts SET firstname=?, lastname=?,address=? WHERE contact_id=? prepared statements are cached, of course
  118. 118. #Cassandra @doanduyhai Achilles •  Insert strategy, why is it so important ? 118
  119. 119. #Cassandra @doanduyhai Achilles •  Simple INSERT prepared statement 119 INSERT INTO contacts(contact_id,name,age,address,gender,avatar,…) VALUES(?, ?, ?, ? … ?);
  120. 120. #Cassandra @doanduyhai Achilles •  Runtime values binding •  some columns are optional 120 preparedStatement.bind(49374,’John DOE’,33, null, null, …, null);
  121. 121. #Cassandra @doanduyhai Achilles 121 Wait … are you saying inserting null in CQL??? 😳
  122. 122. #Cassandra @doanduyhai Achilles 122 Inserting null creating tombstones
  123. 123. #Cassandra @doanduyhai Achilles 123 Inserting null creating tombstones × 7 denormalizations
  124. 124. #Cassandra @doanduyhai Achilles 124 Inserting null creating tombstones × 7 denormalizations × billions of contacts created 😱 not even counting replication factor …
  125. 125. #Cassandra @doanduyhai Achilles •  Simple annotation 125 @Entity(table = "contacts_by_id ») @Strategy(insert = InsertStrategy.NOT_NULL_FIELDS) public class ContactById { }
  126. 126. #Cassandra @doanduyhai Achilles •  Runtime dynamic INSERT statement 126 INSERT INTO contacts(contact_id, name, age, address,) VALUES(:contact_id, :name, :age, :address); prepared statements are cached, of course
  127. 127. #Cassandra @doanduyhai Achilles •  Remember the contactId ⇄ byte[ ] conversion ? 127 @PartitionKey @Column(name = "contact_id") @TypeTransformer(valueCodecClass = ContactIdToBytes.class) private ContactId contactId; BYOC ☞ Bring Your Own Codec
  128. 128. #Cassandra @doanduyhai Achilles 128 public interface Codec<FROM, TO> { Class<FROM> sourceType(); Class<TO> targetType(); TO encode(FROM fromJava) FROM decode(TO fromCassandra); }
  129. 129. #Cassandra @doanduyhai Achilles •  Dynamic logging in action 129 2014-12-01 14:25:20,554 Bound statement : [INSERT INTO contacts.contacts_by_modification_date(user_id,month_bucket,modification_date,...) VALUES (:user_id,:month_bucket,:modification_date,...) USING TTL :ttl;] with CONSISTENCY LEVEL [LOCAL_QUORUM] 2014-12-01 14:25:20,554 bound values : [222130151, 2014-12, e13d0d50-7965-11e4-af38-90b11c2549e0, ...] 2014-12-01 14:25:20,701 Bound statement : [SELECT birthday,middlename,avatar_size,... FROM contacts.contacts_by_modification_date WHERE user_id=:user_id AND month_bucket=:month_bucket AND (modification_date)>=(:modification_date) ORDER BY modification_date ASC;] with CONSISTENCY LEVEL [LOCAL_QUORUM] 2014-12-01 14:25:20,701 bound values : [222130151, 2014-10, be6bc010-6109-11e4-b385-000038377ead]
  130. 130. #Cassandra @doanduyhai Achilles •  Dynamic logging •  runtime activation •  no need to recompile/re-deploy •  save us hours of debugging •  TRACE log level ☞ query tracing 130
  131. 131. #Cassandra @doanduyhai Take Away
  132. 132. #Cassandra @doanduyhai Conditions for success •  Data modeling is crucial 132
  133. 133. #Cassandra @doanduyhai Conditions for success •  Data modeling is crucial •  Double-run strategy & timestamp trick FTW 133
  134. 134. #Cassandra @doanduyhai Conditions for success •  Data modeling is crucial •  Double-run strategy & timestamp trick FTW •  Data type conversion can be tricky 134
  135. 135. #Cassandra @doanduyhai Conditions for success •  Data modeling is crucial •  Double-run strategy & timestamp trick FTW •  Data type conversion can be tricky •  Benchmark ! 135
  136. 136. #Cassandra @doanduyhai Conditions for success •  Data modeling is crucial •  Double-run strategy & timestamp trick FTW •  Data type conversion can be tricky •  Benchmark ! •  Mindset shifts for the team 136
  137. 137. #Cassandra @doanduyhai Thank You ! ""

×