Billion Records from SQL to Cassandra, lessons learned 
DuyHai Doan Brice Dutheil
Who are we ? 
Brice Dutheil 
Mockito 
Java Track Lead @ Devoxx France 
Independent contractor @ Libon 
(Orange-Vallée) 
DuyHai Doan 
Achilles 
Cassandra Technical Advocate 
Former Java Developer @ Libon 
#CassandraSummit @doanduyhai @BriceDutheil
Agenda 
• Libon context 
• Migration strategy 
• Business code migration 
• Data Modeling 
• Take Away 
Libon Context 
What is Libon ? 
• Messaging app 
• VOIP (out) 
• Custom voicemail & greetings 
• SMS/chat/file transfer 
• Contacts matching 
Contact Matching 
[Diagram, built up over four slides: Libon User, Friend, Contact matching, Accept link]
Project Context 
• Application grew over the years 
• Already using Cassandra to handle events 
  • messaging / file sharing / SMS / notifications 
  • Cassandra R/W latencies ≈ 0.4 ms 
  • server response time under 10 ms 
• About contacts … 
  • stored as relational model in RDBMS (Oracle) 
  • 1 user ≈ 300 contacts 
  • with millions of users ☞ billions of contacts to handle 
  • query latency unpredictable 
Fixing the problem 
• Tune the RDBMS 
  • indices 
  • partitioning 
  • fewer joins, simplified relational model 
  • hardware capacity increased 
That worked 
but … 
Back-end application 
RDBMS Cassandra 
Next Challenges 
• High Availability (DB failure, site failure …) 
• Predictable performance at scale 
• Going to multi data-centers 
Going for Cassandra 
• Denormalize (if possible …) 
• Know your business ☞ know your queries 
• Linear scaling out 
• Consistent performance 
Data Migration Strategy 
Objectives 
• No downtime 
• No concurrency corner-cases 
• Safe rollback possible 
• Replay-ability & resume-ability 
Strategy 
• 3 phases 
  • Write contacts to both data stores 
  • Old contacts migration 
  • Switch to Cassandra … 
  • … and deprecate SQL 
Migration Phase 1 
[Diagram: the back-end servers write every contact to both SQL and the Cassandra (C*) cluster. Labels on the write arrows: contactUUID, and contactId (long) + contactUUID.]
contactId   …   contactUUID 
129363      …   123e4567-e89b-12d3… 
834849      …   
Migration Phase 1 
[Diagram: read path between the back-end servers, SQL and the Cassandra (C*) cluster]
Migration Phase 2 
• On live production, migrate old contacts 
For each batch of users, read from SQL the old contacts created before phase 1: 
SELECT * FROM contacts 
WHERE user_id = … 
AND contact_uuid IS NULL 
Then write them to Cassandra (C*) with logged batches of: 
INSERT INTO contacts(..) 
VALUES(…) 
USING TIMESTAMP 
now() - 1 week 
Migration Phase 2 
• During data migration … 
• … concurrent writes from the migration batch … 
• … and updates from production for the same contact 
Migration Phase 2 
[Diagram: the partition for one contact_uuid holds two versions of the name column. The insert from the batch (to the past) writes "Johny" at (now - 1 week); the update from production writes "Johnny" at (now).]
Future reads pick the most up-to-date value 
Migration Phase 2 
"Write to the Past… 
to save the Future" 
Libon – 2014/10/08 
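The timestamp trick above can be sketched with a toy model of a single cell. This is only an illustration of last-write-wins reconciliation, not Cassandra code: the class names `LastWriteWinsCell` and `WriteToThePastDemo` are hypothetical, and real Cassandra also has tie-breaking rules for equal timestamps that are ignored here. The point it shows is that the arrival order of the two writes does not matter once the migration batch writes "to the past".

```java
/** Toy model of one cell: the value carrying the highest write timestamp wins. */
final class LastWriteWinsCell {
    private String value;
    private long writeTimestampMicros = Long.MIN_VALUE;

    /** Applies a write; it becomes visible only if its timestamp is newer than the stored one. */
    void write(String newValue, long timestampMicros) {
        if (timestampMicros > writeTimestampMicros) {
            value = newValue;
            writeTimestampMicros = timestampMicros;
        }
    }

    String read() {
        return value;
    }
}

final class WriteToThePastDemo {
    static final long ONE_WEEK_MICROS = 7L * 24 * 60 * 60 * 1_000_000L;

    /** Replays the scenario from the slides in either arrival order and returns what a later read sees. */
    static String finalName(boolean productionWriteArrivesFirst, long nowMicros) {
        LastWriteWinsCell name = new LastWriteWinsCell();
        if (productionWriteArrivesFirst) {
            name.write("Johnny", nowMicros);                   // live update from production
            name.write("Johny", nowMicros - ONE_WEEK_MICROS);  // old value from the migration batch
        } else {
            name.write("Johny", nowMicros - ONE_WEEK_MICROS);
            name.write("Johnny", nowMicros);
        }
        return name.read();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() * 1_000L;
        System.out.println(finalName(true, now));
        System.out.println(finalName(false, now));
    }
}
```

In both orders the production value survives, which is why the batch can run on live production without locking or coordination.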
Migration Phase 3 
[Diagram: the back-end servers write to the Cassandra (C*) cluster; the write path to SQL is crossed out (❌)]
Business Code Refactoring 
Code Inventory 
• Written for RDBMS 
• Lots of joins (no surprise) 
• Designed around transactions 
  • Spring @Transactional everywhere 
Code Inventory cont. 
• Entities go through Services & Repositories 
[Diagram: ContactEntity passed between Services and Repositories]
• Hibernate is auto-magic 
  • lazy loading 
  • 1st level cache 
  • N+1 select 
Which options ? 
• Throw away existing code … 
• … and re-design from scratch for Cassandra 
No way ! 
Code Quality 
• Existing business code has… 
  • … ≈ 3500 unit tests 
  • and ≈ 600+ integration tests 
• We are TDD aficionados … 
• … and we love our code coverage 
"The code coverage is one of your most valuable technical assets" 
Libon – since the beginning 
Refactoring Strategy 
[Diagram, before: Services (ContactMatchingService, ContactSync, ContactService) and Repositories around ContactEntity, with 1-n and n-n relations]
[Diagram, after: the same picture plus a Proxy, ContactNoSQLEntity and denormalized tables Denorm1, Denorm2, …, DenormN]
Refactoring Strategy 
• Use CQRS 
• ContactReadRepository 
  • direct sequential read 
  • no joins 
  • 1 read ≈ 1 SELECT 
• ContactWriteRepository 
  • write to all denormalized tables 
  • using CQL logged batches 
  • use TTLs 
• ContactUpdateRepository 
  • read-before-write most of the time 
  • rare updates ☞ acceptable perf penalty 
• ContactDeleteRepository 
  • delete 
  • update contact modification date 
Outcome 
• 5 months of work for 2 people 
• Many iterations to fix bugs (thanks to IT) 
• Lots of performance benchmarks using Gatling 
  ☞ data model & code validation 
[Figure: Gatling output]
• … we are almost there for production 
Data Model 
Denormalization, the good 
• Support fast reads 
• 1 read ≈ 1 SELECT 
• Worth it because mostly reads, few updates 
Denormalization, the bad 
• Updating mutable data can be a nightmare 
• Data model bound by existing client-facing API 
• Update paths very error-prone without tests 
Data model in detail 
Contacts_by_identifiers 
Contacts_by_id 
Contacts_in_profiles 
Contacts_by_modification_date 
Contacts_linked_user 
Contacts_by_firstname_lastname 
☞ user_id is always a component of the partition key 
Scalable design 
[Diagram: ring of 8 nodes n1 … n8, labelled A … H; the partitions user_id1 … user_id5 are distributed over the nodes]
Bloom filters in action 
• For some tables, partition key = (user_id, contact_id) 
☞ fast look-up, leverages Bloom filters 
☞ touches 1 SSTable most of the time 
Data model in detail 
[Same table list, with "Wide partition" and "Bucketed" annotations on some of the tables]
A "queue" story 
• contacts_by_modification_date 
• queue-like pattern 
☞ buckets to the rescue 
[Diagram: one partition per user and month (user_id:2014-11, user_id:2014-12, …), each holding the entries of that month keyed by modification date (date11, date12, …)]
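The bucketing idea can be sketched as follows. This is a minimal illustration, assuming monthly buckets formatted as `yyyy-MM` (the format visible in the `user_id:2014-11` partition labels); the class and method names are hypothetical. A writer derives the bucket from the modification date, and a reader that wants every change since a given date enumerates the buckets to query, one partition per month.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

final class MonthBuckets {
    /** Bucket label for a modification date, e.g. 2014-12-01 gives "2014-12". */
    static String bucketOf(LocalDate modificationDate) {
        return YearMonth.from(modificationDate).toString();
    }

    /** Buckets to query, oldest first, to read every change between two dates (inclusive). */
    static List<String> bucketsBetween(LocalDate since, LocalDate until) {
        List<String> buckets = new ArrayList<>();
        YearMonth last = YearMonth.from(until);
        for (YearMonth month = YearMonth.from(since); !month.isAfter(last); month = month.plusMonths(1)) {
            buckets.add(month.toString());
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(bucketOf(LocalDate.of(2014, 12, 1)));
        System.out.println(bucketsBetween(LocalDate.of(2014, 10, 20), LocalDate.of(2014, 12, 1)));
    }
}
```

Each bucket bounds the size of one partition, so an old bucket stops growing once its month is over.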
Data model summary 
• 7 tables for denormalization 
• Some tables kept normalized because they are rarely accessed 
• Read-before-write in most update scenarios 
Notes on contact_id 
• In SQL, auto-generated long using sequence 
• In Cassandra, auto-generated timeuuid 
• How to store both types ? 
  • As text ? ☞ easy solution … 
  • … but waste of space ! 
  • because encoded as UTF-8 or ASCII in Cassandra 
• Long ☞ 8 bytes 
  • Long as text (UTF-8: 1 byte per digit) ☞ "digits count" bytes 
• UUID ☞ 16 bytes 
  • 32 hex chars + 4 hyphens = 36 chars 
  • UUID as text (UTF-8: 1 byte per char) ☞ 36 bytes 
  • Bytes overhead = 36 – 16 = 20 bytes 
• 20 bytes wasted per contact uuid 
• × 7 denormalizations = 140 bytes per contact uuid 
• × 10⁹ contacts = 140 GB wasted 
not even counting replication factor … 
• ☞ just save contact id as byte[ ] 
• Achilles @TypeTransformer for automatic conversion (see later) 
• Use blobAsBigInt( ) or blobAsUUID( ) to view data 
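The size argument can be checked with a small sketch of the byte[ ] encoding. The class `ContactIdBytes` is hypothetical and is not the converter used in the project; it assumes big-endian encoding (the `ByteBuffer` default), with a UUID written as its most significant 8 bytes followed by its least significant 8 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class ContactIdBytes {
    /** Legacy SQL id: 8 bytes, big-endian. */
    static byte[] fromLong(long sqlId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(sqlId).array();
    }

    /** New id: 16 bytes, most significant bits first. */
    static byte[] fromUuid(UUID id) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    static UUID toUuid(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("e13d0d50-7965-11e4-af38-90b11c2549e0");
        int asText = id.toString().getBytes(StandardCharsets.UTF_8).length;
        int asBlob = fromUuid(id).length;
        // Text form versus binary form of the same UUID
        System.out.println(asText + " bytes as text, " + asBlob + " bytes as blob");
        System.out.println(fromLong(129363L).length + " bytes for a legacy long id");
    }
}
```

Because the two kinds of id have different lengths (8 and 16 bytes), a reader can tell them apart from the blob length alone.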
Achilles 
• Advanced "object mapper" 
• Fluent API 
• Tons of features 
• TDD friendly 
• Dirty checking, why is it important ? 
  • 1 contact ≈ 8 mutable fields 
  • × 7 denormalizations = 56 update combinations … 
  • and not even counting multiple fields updates … 
• Are you going to manually generate 56+ prepared statements for all possible updates ? 
• Or just use dynamic plain string statements and get some perf penalty ? 
Achilles 
• Dirty check in action 
//No read-before-write 
ContactEntity proxy = manager.forUpdate(ContactEntity.class, contactId); 
proxy.setFirstName(…); 
proxy.setLastName(…); //type-safe updates 
proxy.setAddress(…); 
manager.update(proxy); 
[Diagram: a proxy around an empty entity that holds only the primary key; setter calls are intercepted and recorded in a DirtyMap]
• Dynamic statements generation 
UPDATE contacts SET firstname=?, lastname=?, address=? 
WHERE contact_id=? 
prepared statements are cached, of course 
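The idea of generating one UPDATE per set of modified columns, and caching it, can be sketched like this. It is not Achilles' implementation: `DirtyUpdateBuilder` is a hypothetical class, the cache stores plain statement text where a real mapper would store the driver's prepared statement, and columns are sorted so that the same set of dirty fields always maps to the same cache entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

final class DirtyUpdateBuilder {
    // Stand-in for a prepared-statement cache, keyed by table and modified columns
    private final Map<String, String> statementCache = new ConcurrentHashMap<>();

    /** Builds, or reuses, the UPDATE text for exactly the columns recorded as dirty. */
    String updateFor(String table, String keyColumn, Map<String, Object> dirtyColumns) {
        if (dirtyColumns.isEmpty()) {
            throw new IllegalArgumentException("nothing to update");
        }
        // Values must later be bound in this same sorted column order
        String assignments = dirtyColumns.keySet().stream()
                .sorted()
                .map(column -> column + "=?")
                .collect(Collectors.joining(", "));
        String cacheKey = table + "|" + assignments;
        return statementCache.computeIfAbsent(cacheKey,
                key -> "UPDATE " + table + " SET " + assignments + " WHERE " + keyColumn + "=?");
    }

    int cachedStatements() {
        return statementCache.size();
    }

    public static void main(String[] args) {
        DirtyUpdateBuilder builder = new DirtyUpdateBuilder();
        Map<String, Object> dirty = new LinkedHashMap<>();
        dirty.put("firstname", "John");
        dirty.put("lastname", "DOE");
        System.out.println(builder.updateFor("contacts", "contact_id", dirty));
    }
}
```

Only the combinations of fields that are actually updated at runtime end up in the cache, so nobody has to enumerate the 56+ statements by hand.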
Achilles 
• Insert strategy, what is it ? 
• Simple INSERT prepared statement 
INSERT INTO 
contacts(contact_id,name,age,address,gender,avatar,…) 
VALUES(?, ?, ?, ? … ?); 
• Runtime values binding 
  • some columns are optional 
preparedStatement.bind(49374, "John DOE", 33, null, null, …, null); 
Wait … are you saying inserting null in CQL??? 
Inserting null ≡ creating tombstones 
× 7 denormalizations 
× billions of contacts created 
not even counting replication factor … 
Achilles 
• Simple annotation 
@Entity(table = "contacts_by_id") 
@Strategy(insert = InsertStrategy.NOT_NULL_FIELDS) 
public class ContactById { 
} 
• Runtime dynamic INSERT statement 
INSERT INTO 
contacts(contact_id, name, age, address) 
VALUES(:contact_id, :name, :age, :address); 
prepared statements are cached, of course 
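What a "not null fields" insert strategy amounts to can be sketched as follows. This is an illustration only, not the code Achilles runs: `NotNullInsertBuilder` is a hypothetical helper that drops every column whose value is null before building the statement, so no null is ever bound and no tombstone is written for a missing optional field.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class NotNullInsertBuilder {
    /** Builds an INSERT with named bind markers for the non-null columns only. */
    static String insertFor(String table, Map<String, Object> columnValues) {
        List<String> presentColumns = columnValues.entrySet().stream()
                .filter(entry -> entry.getValue() != null)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        String markers = presentColumns.stream()
                .map(column -> ":" + column)
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + "(" + String.join(", ", presentColumns) + ") "
                + "VALUES(" + markers + ");";
    }

    public static void main(String[] args) {
        Map<String, Object> contact = new LinkedHashMap<>();
        contact.put("contact_id", 49374L);
        contact.put("name", "John DOE");
        contact.put("age", 33);
        contact.put("address", null); // optional field left empty
        contact.put("gender", null);  // optional field left empty
        System.out.println(insertFor("contacts", contact));
    }
}
```

The trade-off is one statement shape per combination of present columns, which is why caching the prepared statements matters.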
Achilles 
• Remember the contactId ⇄ byte[ ] conversion ? 
@PartitionKey 
@Column(name = "contact_id") 
@TypeTransformer(valueCodecClass = ContactIdToBytes.class) 
private ContactId contactId; 
BYOC ☞ Bring Your Own Codec 
public interface Codec<FROM, TO> { 
    Class<FROM> sourceType(); 
    Class<TO> targetType(); 
    TO encode(FROM fromJava); 
    FROM decode(TO fromCassandra); 
} 
Achilles 
• Dynamic logging in action 
2014-12-01 14:25:20,554 Bound statement : [INSERT INTO 
contacts.contacts_by_modification_date(user_id,month_bucket,modification_date,...) VALUES 
(:user_id,:month_bucket,:modification_date,...) USING TTL :ttl;] with CONSISTENCY LEVEL [LOCAL_QUORUM] 
2014-12-01 14:25:20,554 bound values : [222130151, 2014-12, e13d0d50-7965-11e4-af38-90b11c2549e0, ...] 
2014-12-01 14:25:20,701 Bound statement : [SELECT birthday,middlename,avatar_size,... FROM 
contacts.contacts_by_modification_date WHERE user_id=:user_id AND month_bucket=:month_bucket AND 
(modification_date)>=(:modification_date) ORDER BY modification_date ASC;] with CONSISTENCY LEVEL 
[LOCAL_QUORUM] 
2014-12-01 14:25:20,701 bound values : [222130151, 2014-10, be6bc010-6109-11e4-b385-000038377ead] 
• Dynamic logging 
  • runtime activation 
  • no need to recompile/re-deploy 
  • saved us hours of debugging 
  • TRACE log level ☞ query tracing 
Take Away 
Conditions for success 
• Data modeling is crucial 
• Double-run strategy & timestamp trick FTW 
• Data type conversion can be tricky 
• Benchmark ! 
• Mindset shifts for the team 
Thank You

More Related Content

Viewers also liked

C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinDataStax Academy
 
Cassandra introduction @ NantesJUG
Cassandra introduction @ NantesJUGCassandra introduction @ NantesJUG
Cassandra introduction @ NantesJUGDuyhai Doan
 
KillrChat Data Modeling
KillrChat Data ModelingKillrChat Data Modeling
KillrChat Data ModelingDuyhai Doan
 
Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016Duyhai Doan
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016Duyhai Doan
 
Introduction to KillrChat
Introduction to KillrChatIntroduction to KillrChat
Introduction to KillrChatDuyhai Doan
 
Spark Cassandra 2016
Spark Cassandra 2016Spark Cassandra 2016
Spark Cassandra 2016Duyhai Doan
 
Cassandra drivers and libraries
Cassandra drivers and librariesCassandra drivers and libraries
Cassandra drivers and librariesDuyhai Doan
 
Fast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGFast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGDuyhai Doan
 
Cassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGCassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGDuyhai Doan
 
KillrChat presentation
KillrChat presentationKillrChat presentation
KillrChat presentationDuyhai Doan
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jugDuyhai Doan
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
Datastax day 2016 introduction to apache cassandra
Datastax day 2016   introduction to apache cassandraDatastax day 2016   introduction to apache cassandra
Datastax day 2016 introduction to apache cassandraDuyhai Doan
 
Spark cassandra integration 2016
Spark cassandra integration 2016Spark cassandra integration 2016
Spark cassandra integration 2016Duyhai Doan
 
Cassandra introduction at FinishJUG
Cassandra introduction at FinishJUGCassandra introduction at FinishJUG
Cassandra introduction at FinishJUGDuyhai Doan
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceDuyhai Doan
 
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisReal time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisDuyhai Doan
 
Apache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemApache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemDuyhai Doan
 
Datastax enterprise presentation
Datastax enterprise presentationDatastax enterprise presentation
Datastax enterprise presentationDuyhai Doan
 

Viewers also liked (20)

C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
 
Cassandra introduction @ NantesJUG
Cassandra introduction @ NantesJUGCassandra introduction @ NantesJUG
Cassandra introduction @ NantesJUG
 
KillrChat Data Modeling
KillrChat Data ModelingKillrChat Data Modeling
KillrChat Data Modeling
 
Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
 
Introduction to KillrChat
Introduction to KillrChatIntroduction to KillrChat
Introduction to KillrChat
 
Spark Cassandra 2016
Spark Cassandra 2016Spark Cassandra 2016
Spark Cassandra 2016
 
Cassandra drivers and libraries
Cassandra drivers and librariesCassandra drivers and libraries
Cassandra drivers and libraries
 
Fast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGFast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ ING
 
Cassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGCassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUG
 
KillrChat presentation
KillrChat presentationKillrChat presentation
KillrChat presentation
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jug
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
 
Datastax day 2016 introduction to apache cassandra
Datastax day 2016   introduction to apache cassandraDatastax day 2016   introduction to apache cassandra
Datastax day 2016 introduction to apache cassandra
 
Spark cassandra integration 2016
Spark cassandra integration 2016Spark cassandra integration 2016
Spark cassandra integration 2016
 
Cassandra introduction at FinishJUG
Cassandra introduction at FinishJUGCassandra introduction at FinishJUG
Cassandra introduction at FinishJUG
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practice
 
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisReal time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
 
Apache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemApache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystem
 
Datastax enterprise presentation
Datastax enterprise presentationDatastax enterprise presentation
Datastax enterprise presentation
 

Similar to Libon cassandra summiteu2014

Migration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a HitchMigration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a HitchDataStax Academy
 
From rdbms to cassandra without a hitch
From rdbms to cassandra without a hitchFrom rdbms to cassandra without a hitch
From rdbms to cassandra without a hitchDuyhai Doan
 
Cassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsCassandra nice use cases and worst anti patterns

Libon cassandra summiteu2014

  • 1. Billion Records from SQL to Cassandra, lessons learned (DuyHai Doan, Brice Dutheil)
  • 2. Who are we? Brice Dutheil: Mockito, Java Track Lead @ Devoxx France, independent contractor @ Libon (Orange-Vallée). DuyHai Doan: Achilles, Cassandra Technical Advocate, former Java developer @ Libon. #CassandraSummit @doanduyhai @BriceDutheil
  • 3. Agenda • Libon context • Migration strategy • Business code migration • Data Modeling • Take Away
  • 4. Libon Context
  • 5. What is Libon? • Messaging app • VOIP (out) • Custom voicemail & greetings • SMS/chat/file transfer • Contacts matching
  • 6-9. Contact Matching [diagram built up over four slides, labels: Libon User; Friend; Contact matching; Accept link]
  • 10-11. Project Context • Application grew over the years • Already using Cassandra to handle events • messaging / file sharing / SMS / notifications • Cassandra R/W latencies ≈ 0.4 ms • server response time under 10 ms
  • 12-16. Project Context • About contacts … • stored as relational model in RDBMS (Oracle) • 1 user ≈ 300 contacts • with millions of users ☞ billions of contacts to handle • query latency unpredictable
  • 18-24. Fixing the problem • Tune the RDBMS • indices • partitioning • fewer joins, simplified relational model • hardware capacity increased. That worked, but …
  • 25. [diagram, labels: Back-end application; RDBMS; Cassandra]
  • 26-28. Next Challenges • High Availability (DB failure, site failure …) • Predictable performance at scale • Going to multiple data centers
  • 29-32. Going for Cassandra • Denormalize (if possible …) • Know your business ☞ know your queries • Linear scaling out • Consistent performance
  • 33. Data Migration Strategy
  • 34-37. Objectives • No downtime • No concurrency corner-cases • Safe rollback possible • Replay-ability & resume-ability
  • 38-41. Strategy • 3 phases • Write contacts to both data stores • Old contacts migration • Switch to Cassandra … • … and deprecate SQL
  • 42. Migration Phase 1 [diagram: back-end server, SQL nodes, Cassandra (C*) nodes] Write contactUUID. SQL rows hold contactId (long) + contactUUID. Example rows (contactId → contactUUID): 129363 → 123e4567-e89b-12d3…; 834849 → (no value shown)
  • 43. Migration Phase 1 [diagram: back-end server, SQL nodes, Cassandra (C*) nodes] Read
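
The phase 1 dual write can be sketched as follows. This is a minimal in-memory illustration, not Libon's code: `DualWriteContactStore`, its two dictionaries and the field names are hypothetical stand-ins for the SQL and Cassandra stores, and serving reads from SQL during this phase is an assumption the slides do not confirm.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DualWriteContactStore:
    """Hypothetical in-memory stand-ins for the two data stores."""
    sql_rows: Dict[int, dict] = field(default_factory=dict)              # keyed by contactId (long)
    cassandra_rows: Dict[uuid.UUID, dict] = field(default_factory=dict)  # keyed by contactUUID
    next_contact_id: int = 1

    def write_contact(self, user_id: int, name: str) -> uuid.UUID:
        contact_id = self.next_contact_id
        self.next_contact_id += 1
        contact_uuid = uuid.uuid4()
        # The SQL row keeps its numeric id and also records the new UUID.
        self.sql_rows[contact_id] = {
            "user_id": user_id, "name": name, "contact_uuid": contact_uuid,
        }
        # The same contact is written to the Cassandra stand-in under its UUID.
        self.cassandra_rows[contact_uuid] = {"user_id": user_id, "name": name}
        return contact_uuid

    def read_contact(self, contact_id: int) -> Optional[dict]:
        # Assumption: reads are still served from SQL during phase 1.
        return self.sql_rows.get(contact_id)


store = DualWriteContactStore()
new_uuid = store.write_contact(user_id=42, name="Johnny")
print(store.read_contact(1)["contact_uuid"] == new_uuid)
```

Contacts created before phase 1 have no `contact_uuid` in SQL, which is what lets phase 2 find them.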
  • 44-45. Migration Phase 2 • On live production, migrate old contacts (those created before phase 1) [diagram: SQL nodes, Cassandra (C*) nodes] For each batch of users: SELECT * FROM contacts WHERE user_id = … AND contact_uuid IS NULL, then logged batches of INSERT INTO contacts(..) VALUES(…) USING TIMESTAMP now() - 1 week
  • 46. Migration Phase 2 USING TIMESTAMP now() - 1 week
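
A rough sketch of the phase 2 loop, under stated assumptions: the stores are plain Python collections, `migrate_legacy_contacts` is a hypothetical name, and marking the SQL row with its new UUID after copying it is an assumption about how replay-ability and resume-ability could be obtained. The slides only show the `contact_uuid IS NULL` filter.

```python
import uuid
from typing import Dict, Iterable, List


def migrate_legacy_contacts(
    sql_rows: List[dict],
    cassandra_rows: Dict[uuid.UUID, dict],
    user_ids: Iterable[int],
    past_timestamp: int,
) -> int:
    """Copy contacts that have no UUID yet; return how many were migrated."""
    migrated = 0
    for user_id in user_ids:
        # Equivalent of: SELECT * FROM contacts
        #                WHERE user_id = ... AND contact_uuid IS NULL
        legacy = [
            row for row in sql_rows
            if row["user_id"] == user_id and row["contact_uuid"] is None
        ]
        for row in legacy:
            contact_uuid = uuid.uuid4()
            # Stand-in for INSERT ... USING TIMESTAMP <one week ago>.
            cassandra_rows[contact_uuid] = {
                "user_id": user_id,
                "name": row["name"],
                "write_timestamp": past_timestamp,
            }
            # Assumption: recording the UUID in SQL makes a re-run skip this row.
            row["contact_uuid"] = contact_uuid
            migrated += 1
    return migrated


sql = [
    {"user_id": 1, "name": "Johny", "contact_uuid": None},
    {"user_id": 1, "name": "Alice", "contact_uuid": uuid.uuid4()},  # created in phase 1
]
cassandra: Dict[uuid.UUID, dict] = {}
print(migrate_legacy_contacts(sql, cassandra, [1], past_timestamp=0))
print(migrate_legacy_contacts(sql, cassandra, [1], past_timestamp=0))
```

The second call finds nothing left to migrate, so an interrupted run can simply be restarted.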
  • 47-49. Migration Phase 2 • During data migration … • … concurrent writes from the migration batch … • … and updates from production for the same contact
  • 50. Migration Phase 2 [diagram: for one contact_uuid, the insert from the batch (to the past) writes name = Johny at (now - 1 week); the update from production writes name = Johnny at (now)]
  • 51. Migration Phase 2 Future reads pick the most up-to-date value [same diagram: name = Johny at (now - 1 week), name = Johnny at (now)]
  • 52. Migration Phase 2 "Write to the Past… to save the Future" Libon – 2014/10/08
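
Why writing to the past is safe can be illustrated with a small simulation of timestamp-based conflict resolution. This is a sketch, not Cassandra itself: `LastWriteWinsCell` is a hypothetical class that keeps only the value with the highest write timestamp, which mirrors how Cassandra reconciles two writes to the same column (tie-breaking on equal timestamps is ignored here). Note that `now() - 1 week` on the slide is shorthand: in CQL, `USING TIMESTAMP` takes an integer number of microseconds since the epoch, which `to_micros` computes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(moment: datetime) -> int:
    """Microseconds since the epoch, the unit used for write timestamps."""
    return (moment - EPOCH) // timedelta(microseconds=1)


class LastWriteWinsCell:
    """One column value, resolved by highest write timestamp."""

    def __init__(self) -> None:
        self.value: Optional[str] = None
        self.timestamp: Optional[int] = None

    def write(self, value: str, timestamp: int) -> None:
        # Arrival order does not matter, only the timestamp does.
        if self.timestamp is None or timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp


now = datetime(2014, 10, 8, 12, 0, tzinfo=timezone.utc)
production_ts = to_micros(now)                      # live update
migration_ts = to_micros(now - timedelta(weeks=1))  # batch insert, written to the past

name = LastWriteWinsCell()
name.write("Johnny", production_ts)  # production update arrives first
name.write("Johny", migration_ts)    # stale migration insert arrives later
print(name.value)  # Johnny
```

Because the migration insert always carries an older timestamp, it can never overwrite a production update, whichever one reaches the cluster first.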
  • 53. Migration Phase 3 [diagram: back-end server, SQL nodes crossed out (❌), Cassandra (C*) nodes] Write
  • 54. Business Code Refactoring
  • 55-58. Code Inventory • Written for RDBMS • Lots of joins (no surprise) • Designed around transactions • Spring @Transactional everywhere
  • 59. Code Inventory cont. • Entities go through Services & Repositories [diagram: Services, ContactEntity, Repositories]
  • 60-61. Code Inventory cont. • Hibernate is auto-magic • lazy loading • 1st level cache • N+1 select [diagram: Services, ContactEntity, Repositories]
  • 62-63. Which options? • Throw away existing code … • … and re-design from scratch for Cassandra. No way!
  • 64-65. Code Quality • Existing business code has … • … ≈ 3500 unit tests • and ≈ 600+ integration tests
  • 66-67. Code Quality • We are TDD aficionados … • … and we love our code coverage
  • 68. Code Quality "The code coverage is one of your most valuable technical assets" Libon – since beginning
  • 69. Refactoring Strategy [diagram: Services (ContactMatchingService, ContactSync, ContactService), Repositories, ContactEntity; cardinalities n, 1, n, n]
  • 70. Refactoring Strategy [diagram: as on slide 69, plus a Proxy and a ContactNoSQLEntity]
  • 71. Refactoring Strategy [diagram: as on slide 70, plus Denorm1, Denorm2, … DenormN]
  • 72. Refactoring Strategy • Use CQRS • ContactReadRepository • ContactWriteRepository • ContactUpdateRepository • ContactDeleteRepository
  • 73. Refactoring Strategy • ContactReadRepository • direct sequential read • no joins • 1 read ≈ 1 SELECT
  • 74. Refactoring Strategy • ContactWriteRepository • write to all denormalized tables • using CQL logged batches • use TTLs
  • 75. Refactoring Strategy • ContactUpdateRepository • read-before-write most of the time • rare updates ☞ acceptable perf penalty
  • 76. Refactoring Strategy • ContactDeleteRepository • delete • update contact modification date
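The ContactWriteRepository slide describes writing to all denormalized tables in one CQL logged batch with TTLs. Below is a minimal sketch of what such a batch could look like, assuming every table takes the same columns (the real tables almost certainly differ). `DenormalizedBatchSketch`, its parameters and the column names are hypothetical, and the helper only builds CQL text; it does not call Achilles or any driver.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper for illustration only: it builds CQL text and talks to no driver.
final class DenormalizedBatchSketch {

    // Builds one logged batch that inserts the same columns into every given table.
    static String buildLoggedBatch(List<String> tables, List<String> columns, int ttlSeconds) {
        String columnList = String.join(", ", columns);
        StringJoiner markers = new StringJoiner(", ");
        for (String column : columns) {
            markers.add(":" + column); // one named bind marker per column
        }
        // In CQL, a plain BEGIN BATCH (without UNLOGGED) is a logged batch.
        StringBuilder cql = new StringBuilder("BEGIN BATCH\n");
        for (String table : tables) {
            cql.append("  INSERT INTO ").append(table)
               .append(" (").append(columnList).append(")")
               .append(" VALUES (").append(markers).append(")")
               .append(" USING TTL ").append(ttlSeconds).append(";\n");
        }
        return cql.append("APPLY BATCH;").toString();
    }
}
```

Calling `buildLoggedBatch` with the table names `contacts_by_id` and `contacts_by_identifiers`, the assumed columns `user_id` and `contact_id`, and a TTL of 3600 yields one `INSERT … USING TTL 3600;` line per table between `BEGIN BATCH` and `APPLY BATCH;`.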
  • 77. Outcome • 5 months of work for 2 people
  • 78. Outcome • 5 months of work for 2 people • Many iterations to fix bugs (thanks to integration tests)
  • 79. Outcome • 5 months of work for 2 people • Many iterations to fix bugs (thanks to integration tests) • Lots of performance benchmarks using Gatling
  • 80. Gatling Output
  • 81. Outcome • 5 months of work for 2 people • Many iterations to fix bugs (thanks to integration tests) • Lots of performance benchmarks using Gatling ☞ data model & code validation
  • 82. Outcome • 5 months of work for 2 people • Many iterations to fix bugs (thanks to integration tests) • Lots of performance benchmarks using Gatling ☞ data model & code validation • … we are almost there for production
  • 83. Data Model
  • 84. Denormalization, the good • Supports fast reads • 1 read ≈ 1 SELECT • Worth it because mostly reads, few updates
  • 85. Denormalization, the bad • Updating mutable data can be a nightmare • Data model bound by existing client-facing API • Update paths very error-prone without tests
  • 86. Data model in detail • Contacts_by_identifiers • Contacts_by_id • Contacts_in_profiles • Contacts_by_modification_date • Contacts_linked_user • Contacts_by_firstname_lastname
  • 87. Data model in detail • user_id is always a component of the partition key • Contacts_by_identifiers • Contacts_by_id • Contacts_in_profiles • Contacts_by_modification_date • Contacts_linked_user • Contacts_by_firstname_lastname
  • 88. Scalable design (ring diagram: A n1, B n2, C n3, D n4, E n5, F n6, G n7, H n8; partitions user_id1, user_id2, user_id3, user_id4, user_id5)
  • 89. Scalable design (ring diagram: A n1, B n2, C n3, D n4, E n5, F n6, G n7, H n8; partitions user_id1 to user_id5 spread around the ring)
  • 90. Bloom filters in action • For some tables, partition key = (user_id, contact_id) ☞ fast look-up, leverages Bloom filters ☞ touches 1 SSTable most of the time
  • 91. Data model in detail • Contacts_by_identifiers • Contacts_by_id • Contacts_in_profiles • Contacts_by_modification_date • Contacts_linked_user • Contacts_by_firstname_lastname (annotations on the diagram: "Wide partition", "Bucketed")
  • 92. A "queue" story • contacts_by_modification_date • queue-like pattern
  • 93. A "queue" story • contacts_by_modification_date • queue-like pattern ☞ buckets to the rescue (diagram: one partition per monthly bucket, user_id:2014-11, user_id:2014-12, …, each holding modification dates such as date11, date12, …, date35)
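The bucketing idea can be sketched in a few lines. The partition labels `user_id:2014-11` and `user_id:2014-12` on the slide suggest one bucket per calendar month; the sketch below assumes exactly that. `MonthBucketSketch` and its method names are hypothetical, not taken from the Libon code base.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derives monthly bucket labels for a queue-like table.
final class MonthBucketSketch {

    // Bucket label for a modification date, in year-month form.
    static String monthBucket(LocalDate modificationDate) {
        return YearMonth.from(modificationDate).toString();
    }

    // Buckets a reader has to visit, oldest first, to fetch changes between two dates.
    static List<String> bucketsBetween(LocalDate since, LocalDate until) {
        List<String> buckets = new ArrayList<>();
        YearMonth last = YearMonth.from(until);
        for (YearMonth month = YearMonth.from(since); !month.isAfter(last); month = month.plusMonths(1)) {
            buckets.add(month.toString());
        }
        return buckets;
    }
}
```

With this layout a writer appends to the bucket of the current month only, and a reader walks the buckets from its last known modification date up to today, so no single partition grows without bound.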
  • 94. Data model summary • 7 tables for denormalization
  • 95. Data model summary • 7 tables for denormalization • Normalize some tables because rare access
  • 96. Data model summary • 7 tables for denormalization • Normalize some tables because rare access • Read-before-write in most update scenarios
  • 97. Notes on contact_id • In SQL, auto-generated long using sequence • In Cassandra, auto-generated timeuuid
  • 98. Notes on contact_id • How to store both types ?
  • 99. Notes on contact_id • How to store both types ? • As text ? ☞ easy solution …
  • 100. Notes on contact_id • How to store both types ? • As text ? ☞ easy solution … • … but waste of space ! • because encoded as UTF-8 or ASCII in Cassandra
  • 101. Notes on contact_id • Long ☞ 8 bytes • Long as text (UTF-8: 1 byte per digit) ☞ "digits count" bytes
  • 102. Notes on contact_id • UUID ☞ 16 bytes • 32 hex chars + 4 hyphens = 36 chars • UUID as text (UTF-8: 1 byte per char) ☞ 36 bytes • Bytes overhead = 36 – 16 = 20 bytes
  • 103. Notes on contact_id • 20 bytes wasted per contact uuid
  • 104. Notes on contact_id • 20 bytes wasted per contact uuid • × 7 denormalizations = 140 bytes per contact uuid
  • 105. Notes on contact_id • 20 bytes wasted per contact uuid • × 7 denormalizations = 140 bytes per contact uuid • × 10^9 contacts = 140 GB wasted, not even counting replication factor …
  • 106. Notes on contact_id • ☞ just save contact id as byte[]
  • 107. Notes on contact_id • ☞ just save contact id as byte[] • Achilles @TypeTransformer for automatic conversion (see later)
  • 108. Notes on contact_id • ☞ just save contact id as byte[] • Achilles @TypeTransformer for automatic conversion (see later) • Use blobAsBigInt() or blobAsUUID() to view data
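The size estimate on the preceding slides can be reproduced with a few lines of Java. This is only a back-of-the-envelope check under the slides' own assumptions (7 denormalized tables, one billion contacts, no replication factor); `UuidTextOverheadSketch` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper: compares a UUID stored as text with the same UUID stored in binary.
final class UuidTextOverheadSketch {

    // A UUID is 128 bits, so 16 bytes in binary form.
    static final int UUID_BINARY_BYTES = 16;

    // Bytes used when a UUID is stored as its canonical text form encoded in UTF-8.
    static int uuidTextBytes(UUID id) {
        return id.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    // Total wasted bytes for a number of contacts copied into a number of tables.
    static long wastedBytes(long contacts, int tables) {
        int perValue = uuidTextBytes(new UUID(0L, 0L)) - UUID_BINARY_BYTES;
        return perValue * (long) tables * contacts;
    }
}
```

For one billion contacts and 7 tables, `wastedBytes` returns 140,000,000,000 bytes, which matches the 140 GB figure on the slide.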
  • 109. Achilles • Advanced "object mapper" • Fluent API • Tons of features • TDD friendly
  • 110. Achilles • Dirty checking, why is it important ?
  • 111. Achilles • Dirty checking, why is it important ? • 1 contact ≈ 8 mutable fields
  • 112. Achilles • Dirty checking, why is it important ? • 1 contact ≈ 8 mutable fields • × 7 denormalizations = 56 update combinations …
  • 113. Achilles • Dirty checking, why is it important ? • 1 contact ≈ 8 mutable fields • × 7 denormalizations = 56 update combinations … • and not even counting multiple fields updates …
  • 114. Achilles • Are you going to manually generate 56+ prepared statements for all possible updates ?
  • 115. Achilles • Are you going to manually generate 56+ prepared statements for all possible updates ? • Or just use dynamic plain string statements and get some perf penalty ?
  • 116. Achilles • Dirty check in action
      // No read-before-write
      ContactEntity proxy = manager.forUpdate(ContactEntity.class, contactId);
      proxy.setFirstName(…);
      proxy.setLastName(…); // type-safe updates
      proxy.setAddress(…);
      manager.update(proxy);
  • 117. Achilles (diagram: Proxy with setters interception, DirtyMap, empty Entity, PrimaryKey)
  • 118. Achilles • Dynamic statements generation
      UPDATE contacts SET firstname=?, lastname=?, address=? WHERE contact_id=?
    prepared statements are cached, of course
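The dirty-checking mechanism can be illustrated without Achilles. The sketch below is not how Achilles is implemented (Achilles intercepts the real setters through a proxy); it is a hand-rolled stand-in that records changed columns in a map and derives the UPDATE statement from them. `DirtyContactSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for a dirty-checking proxy: only changed columns are tracked.
final class DirtyContactSketch {
    private final Object contactId;
    // Column name -> new value, kept in the order the setters were called.
    private final Map<String, Object> dirty = new LinkedHashMap<>();

    DirtyContactSketch(Object contactId) {
        this.contactId = contactId;
    }

    // Plays the role of an intercepted setter: no read, just remember the change.
    DirtyContactSketch set(String column, Object value) {
        dirty.put(column, value);
        return this;
    }

    // Generates an UPDATE that touches only the dirty columns.
    String updateStatement() {
        if (dirty.isEmpty()) {
            throw new IllegalStateException("nothing to update");
        }
        StringJoiner assignments = new StringJoiner(", ");
        for (String column : dirty.keySet()) {
            assignments.add(column + "=?");
        }
        return "UPDATE contacts SET " + assignments + " WHERE contact_id=?";
    }

    // Values to bind, in the same order as the markers in updateStatement().
    List<Object> boundValues() {
        List<Object> values = new ArrayList<>(dirty.values());
        values.add(contactId);
        return values;
    }
}
```

Because the statement text depends only on which columns are dirty, a real mapper can use it as the key of a prepared-statement cache, which is consistent with the slide's remark that prepared statements are cached.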
  • 119. Achilles • Insert strategy, what is it ?
  • 120. Achilles • Simple INSERT prepared statement
      INSERT INTO contacts(contact_id,name,age,address,gender,avatar,…) VALUES(?, ?, ?, ? … ?);
  • 121. Achilles • Runtime values binding • some columns are optional
      preparedStatement.bind(49374, "John DOE", 33, null, null, …, null);
  • 122. Achilles Wait … are you saying inserting null in CQL ???
  • 123. Achilles Inserting null ≡ creating tombstones
  • 124. Achilles Inserting null ≡ creating tombstones × 7 denormalizations
  • 125. Achilles Inserting null ≡ creating tombstones × 7 denormalizations × billions of contacts created, not even counting replication factor …
  • 126. Achilles • Simple annotation
      @Entity(table = "contacts_by_id")
      @Strategy(insert = InsertStrategy.NOT_NULL_FIELDS)
      public class ContactById { }
  • 127. Achilles • Runtime dynamic INSERT statement
      INSERT INTO contacts(contact_id, name, age, address) VALUES(:contact_id, :name, :age, :address);
    prepared statements are cached, of course
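The effect of a "not-null fields" insert strategy can be sketched in plain Java: build the INSERT from the non-null columns only, so that no null is ever bound and no tombstone is created for an absent value. This is an illustration of the idea, not Achilles code; `NotNullInsertSketch` and `insertStatement` are hypothetical names.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: generates an INSERT that skips columns whose value is null.
final class NotNullInsertSketch {

    // columnValues should iterate in a stable order (for example a LinkedHashMap).
    static String insertStatement(String table, Map<String, Object> columnValues) {
        StringJoiner columns = new StringJoiner(", ");
        StringJoiner markers = new StringJoiner(", ");
        for (Map.Entry<String, Object> entry : columnValues.entrySet()) {
            if (entry.getValue() != null) { // null columns are left out entirely
                columns.add(entry.getKey());
                markers.add(":" + entry.getKey());
            }
        }
        return "INSERT INTO " + table + "(" + columns + ") VALUES(" + markers + ");";
    }
}
```

For a contact with `contact_id`, `name` and `age` set but `address` and `gender` null, the generated statement names only the first three columns.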
  • 128. Achilles • Remember the contactId ⇄ byte[] conversion ?
      @PartitionKey
      @Column(name = "contact_id")
      @TypeTransformer(valueCodecClass = ContactIdToBytes.class)
      private ContactId contactId;
    BYOC ☞ Bring Your Own Codec
  • 129. Achilles
      public interface Codec<FROM, TO> {
          Class<FROM> sourceType();
          Class<TO> targetType();
          TO encode(FROM fromJava);
          FROM decode(TO fromCassandra);
      }
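One possible shape for the `ContactIdToBytes` codec named on the slide is sketched below. The slides do not show its body, so everything here is an assumption: the `ContactId` class is hypothetical, and telling legacy SQL ids from Cassandra UUIDs by byte length (8 versus 16) is just one plausible way to store both types in the same blob column. The `Codec` interface is redeclared with the same shape as on the slide so that the sketch compiles on its own.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Same shape as the Codec interface on the slide, redeclared so the sketch is self-contained.
interface Codec<FROM, TO> {
    Class<FROM> sourceType();
    Class<TO> targetType();
    TO encode(FROM fromJava);
    FROM decode(TO fromCassandra);
}

// Hypothetical ContactId: holds either a legacy SQL long or a Cassandra-generated UUID.
final class ContactId {
    final Long legacyId; // non-null for ids migrated from SQL
    final UUID uuid;     // non-null for ids created in Cassandra

    private ContactId(Long legacyId, UUID uuid) {
        this.legacyId = legacyId;
        this.uuid = uuid;
    }

    static ContactId ofLong(long id) { return new ContactId(id, null); }
    static ContactId ofUuid(UUID id) { return new ContactId(null, id); }
}

// Assumed implementation: 8 bytes for a long, 16 bytes for a UUID, big-endian in both cases.
final class ContactIdToBytes implements Codec<ContactId, byte[]> {
    public Class<ContactId> sourceType() { return ContactId.class; }
    public Class<byte[]> targetType() { return byte[].class; }

    public byte[] encode(ContactId id) {
        if (id.legacyId != null) {
            return ByteBuffer.allocate(8).putLong(id.legacyId).array();
        }
        return ByteBuffer.allocate(16)
                .putLong(id.uuid.getMostSignificantBits())
                .putLong(id.uuid.getLeastSignificantBits())
                .array();
    }

    public ContactId decode(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        if (bytes.length == 8) {
            return ContactId.ofLong(buffer.getLong());
        }
        if (bytes.length == 16) {
            long mostSignificant = buffer.getLong();
            long leastSignificant = buffer.getLong();
            return ContactId.ofUuid(new UUID(mostSignificant, leastSignificant));
        }
        throw new IllegalArgumentException("unexpected contact id length: " + bytes.length);
    }
}
```

Encoding by length keeps the stored value at 8 or 16 bytes instead of a text form, which is the saving the contact_id slides argue for.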
  • 130. Achilles • Dynamic logging in action
      2014-12-01 14:25:20,554 Bound statement : [INSERT INTO contacts.contacts_by_modification_date(user_id,month_bucket,modification_date,...) VALUES (:user_id,:month_bucket,:modification_date,...) USING TTL :ttl;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
      2014-12-01 14:25:20,554 bound values : [222130151, 2014-12, e13d0d50-7965-11e4-af38-90b11c2549e0, ...]
      2014-12-01 14:25:20,701 Bound statement : [SELECT birthday,middlename,avatar_size,... FROM contacts.contacts_by_modification_date WHERE user_id=:user_id AND month_bucket=:month_bucket AND (modification_date)>=(:modification_date) ORDER BY modification_date ASC;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
      2014-12-01 14:25:20,701 bound values : [222130151, 2014-10, be6bc010-6109-11e4-b385-000038377ead]
  • 131. Achilles • Dynamic logging • runtime activation • no need to recompile/re-deploy • saves us hours of debugging • TRACE log level ☞ query tracing
  • 132. Take Away
  • 133. Conditions for success • Data modeling is crucial
  • 134. Conditions for success • Data modeling is crucial • Double-run strategy & timestamp trick FTW
  • 135. Conditions for success • Data modeling is crucial • Double-run strategy & timestamp trick FTW • Data type conversion can be tricky
  • 136. Conditions for success • Data modeling is crucial • Double-run strategy & timestamp trick FTW • Data type conversion can be tricky • Benchmark !
  • 137. Conditions for success • Data modeling is crucial • Double-run strategy & timestamp trick FTW • Data type conversion can be tricky • Benchmark ! • Mindset shifts for the team