11. #Cassandra @doanduyhai
Project Context
• Application grew over the years
• Already using Cassandra to handle events
• messaging / file sharing / SMS / notifications
• Cassandra R/W latencies ≈ 0.4 ms
• server response time under 10 ms
16. #Cassandra @doanduyhai
Project Context
• About contacts …
• stored as a relational model in an RDBMS (Oracle)
• 1 user ≈ 300 contacts
• with millions of users ☞ billions of contacts to handle
• query latency unpredictable
24. #Cassandra @doanduyhai
Fixing the problem
• Tune the RDBMS
• indices
• partitioning
• fewer joins, simplified relational model
• hardware capacity increased
That worked
but …
30. #Cassandra @doanduyhai
Next Challenges
• High Availability (DB failure, site failure …)
• Predictable performance at scale
• Going to multi data-centers
☞ Cassandra, what else ?
40. #Cassandra @doanduyhai
Strategy
• 4 phases
• Write contacts to both data stores
• Old contacts migration
• Switch to Cassandra (but keep RDBMS in case of…)
• Remove the RDBMS code
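The four phases can be read as a routing rule for reads and writes. Below is a minimal sketch of that idea, not the project's actual code: the `Phase` names and the `ContactRepository` class are hypothetical, both stores are plain dictionaries, and it assumes the RDBMS keeps receiving writes until phase 4 so that a rollback stays possible.

```python
from enum import Enum


class Phase(Enum):
    DOUBLE_WRITE = 1      # write contacts to both data stores
    MIGRATE_OLD = 2       # back-fill old contacts while double-writing
    SWITCH = 3            # read from Cassandra, RDBMS kept as fallback
    CASSANDRA_ONLY = 4    # RDBMS code removed


class ContactRepository:
    """Routes reads and writes according to the current migration phase.

    Both stores are plain dicts here; real code would call the RDBMS
    and Cassandra drivers instead (assumption for illustration).
    """

    def __init__(self, phase):
        self.phase = phase
        self.rdbms = {}
        self.cassandra = {}

    def save(self, contact_id, contact):
        # Assumption: the RDBMS keeps receiving writes until phase 4,
        # so that rolling back from phase 3 stays possible.
        if self.phase is not Phase.CASSANDRA_ONLY:
            self.rdbms[contact_id] = contact
        self.cassandra[contact_id] = contact

    def find(self, contact_id):
        # Reads stay on the RDBMS until the switch in phase 3.
        if self.phase in (Phase.DOUBLE_WRITE, Phase.MIGRATE_OLD):
            return self.rdbms.get(contact_id)
        return self.cassandra.get(contact_id)
```

Keeping the phase as a runtime value rather than a code branch would let the switch in phase 3 be reverted without a redeployment, which is one plausible reason to keep the RDBMS around.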
44. #Cassandra @doanduyhai
Migration Phase 2
• On live production, migrate old contacts
[Diagram: SQL instances and C* nodes]
• For each batch of users, read the old contacts (created before phase 1):
SELECT * FROM contacts
WHERE user_id = …
AND contact_uuid IS NULL
• Write them to Cassandra in logged batches of:
INSERT INTO contacts(..)
VALUES(…)
USING TIMESTAMP now() - 1 week
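The `now() - 1 week` on the slide is shorthand: in CQL, `USING TIMESTAMP` takes an integer, by convention microseconds since the Unix epoch. The sketch below shows one way to compute such a backdated value and attach it to a statement. The helper names and the column names passed in are hypothetical, and `now` is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def backdated_write_timestamp(now, delta=timedelta(weeks=1)):
    """Return (now - delta) as integer microseconds since the Unix epoch."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (now - delta - epoch) // timedelta(microseconds=1)


def migration_insert(columns, write_ts):
    """Render a parameterised INSERT carrying an explicit write timestamp."""
    names = ", ".join(columns)
    markers = ", ".join(":" + c for c in columns)
    return (f"INSERT INTO contacts({names}) VALUES({markers}) "
            f"USING TIMESTAMP {write_ts}")


# Hypothetical usage: every migrated row is written one week "in the past".
ts = backdated_write_timestamp(datetime.now(timezone.utc))
statement = migration_insert(["user_id", "contact_uuid", "name"], ts)
```

Because the migrated rows carry an older timestamp than anything production writes during the migration, a concurrent production update is not overwritten by the batch.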
48. #Cassandra @doanduyhai
Migration Phase 2
• During data migration …
• … concurrent writes from the migration batch …
• … and updates from production for the same contact
49. #Cassandra @doanduyhai
Migration Phase 2
• For one contact_uuid, two writes to name with different timestamps:
• name (now - 1 week) = Johny … ☞ insert from batch (to the past)
• name (now) = Johnny … ☞ update from production
51. #Cassandra @doanduyhai
Last Write Wins in action
[Diagram: two timelines, Case 1 and Case 2, each showing Batch_past(Johny) and Prod_now(Johnny) at t1 and t2, followed by Read(Johnny) at t3]
☞ in both cases the read returns Johnny
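The effect can be reproduced with a toy model of a single column resolved by write timestamp. This is only a sketch of last-write-wins, not Cassandra's storage engine: the `Cell` class is hypothetical, ties between equal timestamps are ignored, and the two arrival orders are an assumption about what the two cases depict.

```python
class Cell:
    """A single column value resolved by last-write-wins on write timestamp."""

    def __init__(self):
        self.value = None
        self.timestamp = None

    def write(self, value, timestamp):
        # Keep the incoming value only if its timestamp is newer.
        if self.timestamp is None or timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp

    def read(self):
        return self.value


def replay(writes):
    """Apply (value, timestamp) writes in arrival order and read the result."""
    cell = Cell()
    for value, timestamp in writes:
        cell.write(value, timestamp)
    return cell.read()


NOW = 1_000_000
ONE_WEEK = 7 * 24 * 3600
batch = ("Johny", NOW - ONE_WEEK)   # migration insert, written "to the past"
prod = ("Johnny", NOW)              # production update, current timestamp

# Whichever write arrives first, the read sees the production value.
result_batch_first = replay([batch, prod])
result_prod_first = replay([prod, batch])
```

Arrival order does not matter here because the comparison is on the timestamp attached to each write, which is why backdating the batch is enough to protect production updates.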
59. #Cassandra @doanduyhai
Code Inventory
• Written for RDBMS
• Lots of joins (no surprise)
• Designed around transactions
• Spring @Transactional everywhere
81. #Cassandra @doanduyhai
Outcome
• 5 months of work for 2 people
• Many iterations to fix bugs (thanks to IT)
• Lots of performance benchmarks using Gatling
☞ data model & code validation
• … we are almost there for production
84. #Cassandra @doanduyhai
Denormalization, the bad
• Updating mutable data can be a nightmare
• Data model bound by existing client-facing API
• Update paths very error-prone without tests
86. #Cassandra @doanduyhai
Data model in detail
Contacts_by_id
Contacts_by_identifiers
Contacts_in_profiles
Contacts_by_modification_date
Contacts_by_firstname_lastname
Contacts_linked_user
☞ user_id is always a component of the partition key
89. #Cassandra @doanduyhai
Bloom filters in action
• For some tables, partition key = (user_id, contact_id)
☞ fast look-up, leverages Bloom filters
☞ touches 1 SSTable most of the time
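To illustrate why a point look-up on (user_id, contact_id) usually touches a single SSTable, here is a toy Bloom filter. It is a sketch only: Cassandra's own filter uses a different hash function and sizing, and the `BloomFilter` class, the SHA-256 hashing and the `sstables_to_read` helper are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def sstables_to_read(sstables, partition_key):
    """Return names of SSTables whose filter cannot rule out the key."""
    return [name for name, bloom in sstables
            if bloom.might_contain(partition_key)]
```

With a partition key as selective as (user_id, contact_id), most filters answer "definitely not here", so the read skips those SSTables without touching disk.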
90. #Cassandra @doanduyhai
Data model in detail
Contacts_by_id
Contacts_by_identifiers
Contacts_in_profiles
Contacts_by_modification_date
Contacts_by_firstname_lastname
Contacts_linked_user
Wide partition
95. #Cassandra @doanduyhai
Data model summary
• 7 tables for denormalization
• Some tables kept normalized because they are rarely accessed
• Read-before-write in most update scenarios 😟
96. #Cassandra @doanduyhai
Notes on contact_id
• In SQL, auto-generated long using a sequence
• In Cassandra, auto-generated timeuuid
99. #Cassandra @doanduyhai
Notes on contact_id
• How to store both types ?
• As text ? ☞ easy solution …
• … but waste of space !
• because encoded as UTF-8 or ASCII in Cassandra
107. #Cassandra @doanduyhai
Notes on contact_id
• ☞ just save the contact id as byte[]
• Achilles @TypeTransformer for automatic conversion (see later)
• Use blobAsBigint() or blobAsUuid() to view the data
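The byte-array approach can be sketched as a pair of conversion functions. This is an illustration, not the Achilles transformer itself: the function names are hypothetical, and it assumes the stored length alone (8 bytes for a legacy long, 16 for a UUID) is enough to tell the two kinds of id apart.

```python
import uuid


def contact_id_to_bytes(contact_id):
    """Encode a legacy integer id (8 bytes) or a UUID (16 bytes)."""
    if isinstance(contact_id, uuid.UUID):
        return contact_id.bytes
    # Big-endian two's complement, the layout blobAsBigint() expects.
    return contact_id.to_bytes(8, "big", signed=True)


def contact_id_from_bytes(raw):
    """Decode by length: 8 bytes is a legacy long, 16 bytes is a UUID."""
    if len(raw) == 8:
        return int.from_bytes(raw, "big", signed=True)
    if len(raw) == 16:
        return uuid.UUID(bytes=raw)
    raise ValueError(f"unexpected contact id length: {len(raw)}")


# Space comparison for a UUID: 36 bytes as text versus 16 bytes as a blob.
example = uuid.UUID("e13d0d50-7965-11e4-af38-90b11c2549e0")
text_size = len(str(example).encode("utf-8"))
blob_size = len(contact_id_to_bytes(example))
```

The saving matters because the contact id is repeated in every denormalized table.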
114. #Cassandra @doanduyhai
Achilles
• Are you going to manually generate 56+ prepared statements for all possible updates?
• Or just use dynamic plain string statements and accept a performance penalty?
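A middle ground between the two options is to generate the statement text per combination of updated columns and cache it, so each combination is built (and, in real code, prepared) only once. The sketch below illustrates that idea only; it is not Achilles' API, the function names are hypothetical, and the column names in the usage comment are assumed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def update_statement(table, columns, key_columns):
    """Build the CQL text for one combination of updated columns.

    Arguments are tuples so that they are hashable and the result is cached;
    a real mapper would prepare the statement once per combination.
    """
    assignments = ", ".join(f"{c}=:{c}" for c in columns)
    conditions = " AND ".join(f"{k}=:{k}" for k in key_columns)
    return f"UPDATE {table} SET {assignments} WHERE {conditions}"


def statement_for(table, changed, key_columns):
    # Sort so that {"a", "b"} and {"b", "a"} share one cached statement.
    return update_statement(table, tuple(sorted(changed)), tuple(key_columns))


# Hypothetical usage:
# statement_for("contacts_by_id", {"firstname", "lastname"},
#               ["user_id", "contact_id"])
```

Sorting the changed columns keeps the number of distinct statements equal to the number of column subsets actually used, rather than the number of orderings.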
129. #Cassandra @doanduyhai
Achilles
• Dynamic logging in action
2014-12-01 14:25:20,554 Bound statement : [INSERT INTO contacts.contacts_by_modification_date(user_id,month_bucket,modification_date,...) VALUES (:user_id,:month_bucket,:modification_date,...) USING TTL :ttl;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,554 bound values : [222130151, 2014-12, e13d0d50-7965-11e4-af38-90b11c2549e0, ...]
2014-12-01 14:25:20,701 Bound statement : [SELECT birthday,middlename,avatar_size,... FROM contacts.contacts_by_modification_date WHERE user_id=:user_id AND month_bucket=:month_bucket AND (modification_date)>=(:modification_date) ORDER BY modification_date ASC;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,701 bound values : [222130151, 2014-10, be6bc010-6109-11e4-b385-000038377ead]
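The bound values in this log suggest that month_bucket is a "YYYY-MM" string and that each SELECT targets one (user_id, month_bucket) partition. Under that assumption, reading everything modified since a given date means issuing one query per bucket. The helper below, whose name is hypothetical, enumerates the buckets to visit.

```python
from datetime import date


def month_buckets(start, end):
    """Yield 'YYYY-MM' bucket strings from start's month to end's month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield f"{year:04d}-{month:02d}"
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


# Buckets to query for changes between 20 October and 1 December 2014.
buckets = list(month_buckets(date(2014, 10, 20), date(2014, 12, 1)))
```

Bucketing by month bounds the size of each partition per user, at the cost of several queries when the requested range spans more than one month.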
130. #Cassandra @doanduyhai
Achilles
• Dynamic logging
• runtime activation
• no need to recompile/re-deploy
• saved us hours of debugging
• TRACE log level ☞ query tracing
136. #Cassandra @doanduyhai
Conditions for success
• Data modeling is crucial
• Double-run strategy & timestamp trick FTW
• Data type conversion can be tricky
• Benchmark !
• Mindset shifts for the team