
TDC2017 | São Paulo - NoSQL Track How we figured out we had a SRE team at - Cassandra: why will relational thinking destroy your system's performance?


Published in: Education

  1. Globalcode – Open4education. Cassandra: Why will relational thinking destroy your system's performance? Paulo Ricardo R. Almeida, OCJP, 2 years working with Cassandra
  2. Agenda
     • What is Cassandra?
     • Why Cassandra?
     • Quick Review
     • The Problem to tackle
     • Relational solution and its drawbacks
     • Addressing the problem with C* thinking
     • Goals and Non-Goals
     • Query First
     • The Cassandra solution
     • Benchmarking
     • Additional resources
  3. What is Cassandra?
     ● Distributed
     ● Fault Tolerant
     ● Linear Scalability
  4. Pick two of: Availability, Consistency, Partition Tolerance
  5. Why Cassandra?
     ● Distributed Cache (Netflix EVCache)
     ● Real time Processing
     ● Data doesn't fit in one place
     ● High write workload
       ○ Time series data
       ○ Log storage/analysis
     ● Geographical distribution
     ● Performance
  6. Quick Review (diagram labels): Client, Coordinator, Keyspace, RF = 3, token(partitionKey) using Partitioner
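The diagram labels on this slide (coordinator, RF = 3, token(partitionKey), partitioner) can be illustrated with a small in-memory sketch. This is a simplification under stated assumptions: the node names are made up, each node owns a single token, and MD5 stands in for the hash function because it ships with the standard library (Cassandra's default partitioner is Murmur3Partitioner).

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    """Hypothetical partitioner: hash the partition key into a 64-bit token."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy token ring: one token per node, replicas placed clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Assumption: a node's position on the ring is the token of its name.
        self.entries = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key):
        """Return the nodes that would hold a copy of this partition."""
        # First node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, token(partition_key))
        start %= len(self.entries)
        count = min(self.rf, len(self.entries))
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(count)
        ]


# Made-up five-node cluster with RF = 3, as on the slide.
ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"], replication_factor=3)
owners = ring.replicas("PR")
print(owners)
```

Any node can act as coordinator: it computes the token for the partition key and forwards the request to the replicas returned above.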
  8. https://pandaforme.gitbooks.io/introduction-to-cassandra
  9. The problem: Store TDC information (speakers and talks)
  10. Relational Way
  11. Relational Way
      SELECT * FROM speaker
      WHERE state = 'PR'
  12. Relational Way
      SELECT * FROM talk
      INNER JOIN speaker
        ON speaker.id = talk.speaker_id_a
        OR speaker.id = talk.speaker_id_b
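To make the relational baseline concrete, the join from this slide can be run against an in-memory SQLite database. The sketch below is only illustrative: the column lists, the sample talks, and the second speaker's state are assumptions, since the slides do not show the relational schema in text form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE speaker (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE talk (
        id INTEGER PRIMARY KEY,
        name TEXT,
        speaker_id_a INTEGER REFERENCES speaker(id),
        speaker_id_b INTEGER REFERENCES speaker(id)  -- NULL for a single speaker
    );
    -- Sample rows are hypothetical.
    INSERT INTO speaker VALUES (1, 'Paulo Almeida', 'PR'), (2, 'Gessica Dutra', 'SP');
    INSERT INTO talk VALUES
        (10, 'Cassandra data modeling', 1, 2),
        (11, 'Intro to NoSQL', 2, NULL);
    """
)

# Same shape as the slide's query: a talk matches either of its two speakers.
rows = conn.execute(
    """
    SELECT talk.name, speaker.name
    FROM talk
    INNER JOIN speaker
        ON speaker.id = talk.speaker_id_a
        OR speaker.id = talk.speaker_id_b
    ORDER BY talk.id, speaker.id
    """
).fetchall()

for talk_name, speaker_name in rows:
    print(talk_name, "/", speaker_name)
```

This works well on a single relational node; the rest of the deck is about why the same query shape does not carry over to Cassandra.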
  13. Putting into Cassandra
  15. Why?
      SELECT * FROM speaker WHERE state = 'PR' ALLOW FILTERING
      Retrieves all rows and filters them one by one
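The cost difference behind this slide can be sketched in plain Python: a filtering scan reads every row, while a table partitioned by state reads only the matching partition. The speaker names come from the slides; the ids, the states other than 'PR', and the fourth row are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data.
speakers = [
    {"id": 1, "name": "Paulo Almeida", "state": "PR"},
    {"id": 2, "name": "Gessica Dutra", "state": "SP"},
    {"id": 3, "name": "Jefferson", "state": "PR"},
    {"id": 4, "name": "Hypothetical Speaker", "state": "RJ"},
]


def filtering_scan(rows, state):
    """Models ALLOW FILTERING: read every row, keep the ones that match."""
    rows_read = 0
    matches = []
    for row in rows:
        rows_read += 1
        if row["state"] == state:
            matches.append(row)
    return matches, rows_read


# Models a table whose partition key is the state.
speaker_by_state = defaultdict(list)
for row in speakers:
    speaker_by_state[row["state"]].append(row)


def partition_read(table, state):
    """Models a query by partition key: read only that partition's rows."""
    matches = table.get(state, [])
    return matches, len(matches)


scan_matches, scan_cost = filtering_scan(speakers, "PR")
part_matches, part_cost = partition_read(speaker_by_state, "PR")
print("scan read", scan_cost, "rows; partition read", part_cost, "rows")
```

Both calls return the same speakers, but the scan's cost grows with the size of the whole table rather than with the size of the result.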
  16. Secondary index to improve read performance
  17. Secondary Index
      CREATE INDEX speaker_name ON speaker (name);
  18. Secondary Index
      SELECT * FROM tdc.speaker WHERE name = 'Paulo Almeida'
      Diagram rows: 0312 Paulo Almeida; 2315 Gessica Dutra; ...; 0003 Jefferson; ...
      5 lookups, 1 response = poor performance
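The "5 lookups, 1 response" point can be modeled with a toy cluster in which every node indexes only its own rows, so a query by indexed column has to ask each node. This is a sketch, not how Cassandra is implemented: the five node names and the placement of rows on nodes are assumptions, and only the ids and names come from the slide.

```python
class Node:
    """Toy node holding a local index from speaker name to its own rows."""

    def __init__(self, name, rows):
        self.name = name
        self.local_index = {}
        for row in rows:
            self.local_index.setdefault(row["name"], []).append(row)

    def lookup(self, speaker_name):
        return self.local_index.get(speaker_name, [])


def query_by_index(nodes, speaker_name):
    """Ask every node, because any of them might hold matching rows."""
    lookups = 0
    responses = []
    for node in nodes:
        lookups += 1
        hits = node.lookup(speaker_name)
        if hits:
            responses.append((node.name, hits))
    return lookups, responses


# Hypothetical placement of the slide's rows across five nodes.
nodes = [
    Node("node-1", [{"id": "0312", "name": "Paulo Almeida"}]),
    Node("node-2", [{"id": "2315", "name": "Gessica Dutra"}]),
    Node("node-3", [{"id": "0003", "name": "Jefferson"}]),
    Node("node-4", []),
    Node("node-5", []),
]

lookups, responses = query_by_index(nodes, "Paulo Almeida")
print(lookups, "lookups,", len(responses), "response with data")
```

The number of lookups grows with the cluster size, while the number of useful responses does not.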
  19. Limitations
      ● No JOIN, LIKE… support
      ● No constraints
      ● No ACID transactions
      ● No strong consistency
      ● Secondary indexes don't scale well
  20. Goals and Non-Goals
      ● Non-Goals
        ○ Minimize number of writes
        ○ Minimize data duplication
      ● Goals
        ○ Spread data evenly around the cluster
        ○ Minimize the number of partitions read
  21. Query first!
      ● Know your queries first and model around them
        ○ Don't model around relations
        ○ Don't model around objects
        ○ Try to create a CF (column family, i.e. a table) that can satisfy the query by reading one partition
  22. Queries
      ● Speaker by state
      ● Speaker by name
      ● Talks by speaker name
      ● Talks by keywords
      ● Talks by track
  23. Cassandra Way
  24. Cassandra Way
  25. Data Modeling
      CREATE KEYSPACE tdc WITH REPLICATION =
      {
        'class': 'SimpleStrategy',
        'replication_factor': 3
      };
  26. Data Modeling
      CREATE TABLE tdc.speaker (
        id uuid,
        name text,
        email text,
        bio text,
        city text,
        state text,
        PRIMARY KEY (id)
      );
      Callouts: tdc = keyspace; id = partition key
  27. Data Modeling
      CREATE TABLE tdc.speaker_by_name (
        speaker_id uuid,
        name text,
        PRIMARY KEY (name, speaker_id)
      );
      SELECT speaker_id FROM tdc.speaker_by_name WHERE name = $name;
      SELECT * FROM tdc.speaker WHERE id = $speaker_id;
      Better approach, but it requires 2 lookups in any case
      Callout: name = partition key
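The two-step read on this slide (name to speaker_id, then speaker_id to the full row) can be sketched with two dictionaries standing in for the two tables. The dictionary layout and the sample values are assumptions made for illustration.

```python
import uuid

# Stand-ins for tdc.speaker (keyed by id) and tdc.speaker_by_name (keyed by name).
paulo_id = uuid.uuid4()
speaker = {
    paulo_id: {"id": paulo_id, "name": "Paulo Almeida", "state": "PR"},
}
speaker_by_name = {
    "Paulo Almeida": [paulo_id],
}


def find_by_name(name):
    """First lookup resolves the ids, then one lookup per id fetches the row."""
    lookups = 1
    ids = speaker_by_name.get(name, [])
    rows = []
    for speaker_id in ids:
        lookups += 1
        rows.append(speaker[speaker_id])
    return rows, lookups


rows, lookups = find_by_name("Paulo Almeida")
print(rows[0]["state"], "found after", lookups, "lookups")
```

Each lookup hits exactly one partition, which is why the slide calls this better than the secondary index even though it needs a second round trip.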
  28. Data Modeling
      SELECT * FROM tdc.speaker_by_state WHERE state = 'PR';
      CREATE TABLE tdc.speaker_by_state (
        speaker_id uuid,
        name text,
        state text,
        bio text,
        PRIMARY KEY (state, name, speaker_id)
      ) WITH CLUSTERING ORDER BY (name ASC, speaker_id ASC);
      Callouts: state = partition key; name, speaker_id = clustering key
  29. Data Modeling
      CREATE TABLE tdc.speaker_by_state (
        speaker_id uuid,
        name text,
        city text,
        state text,
        bio text,
        PRIMARY KEY (state, city, name, speaker_id)
      ) WITH CLUSTERING ORDER BY (city ASC, name ASC, speaker_id ASC);
      Callouts: state = partition key; city, name, speaker_id = clustering key
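A rough way to picture partition key versus clustering key: the partition key (state) selects one bucket, and inside that bucket rows are kept sorted by the clustering columns (city, name, speaker_id). The sketch below models that with sorted lists; the integer ids, the cities, and the states other than 'PR' are made up.

```python
import bisect
from collections import defaultdict

# state -> list of (city, name, speaker_id), kept sorted like clustering columns.
partitions = defaultdict(list)


def insert(state, city, name, speaker_id):
    bisect.insort(partitions[state], (city, name, speaker_id))


def by_state(state):
    """One-partition read; rows come back already ordered by city, then name."""
    return list(partitions.get(state, []))


def by_state_and_city(state, city):
    """Slice of one partition: restrict on the first clustering column."""
    rows = partitions.get(state, [])
    lo = bisect.bisect_left(rows, (city,))
    hi = bisect.bisect_left(rows, (city + "\x00",))
    return rows[lo:hi]


# Hypothetical sample data.
insert("PR", "Londrina", "Gessica Dutra", 2)
insert("PR", "Curitiba", "Paulo Almeida", 1)
insert("PR", "Curitiba", "Jefferson", 3)
insert("SP", "Campinas", "Hypothetical Speaker", 4)

print(by_state("PR"))
print(by_state_and_city("PR", "Curitiba"))
```

Because the order is fixed at write time, reading a state (or a city within a state) needs no sorting and touches a single partition.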
  30. Data Modeling
      BEGIN BATCH
        INSERT INTO speaker (id, …) VALUES (...);
        INSERT INTO speaker_by_name (name, ...) VALUES (...);
        INSERT INTO speaker_by_state (state, ...) VALUES (...);
      APPLY BATCH;
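The batch on this slide writes the same speaker into every query table. A minimal in-memory sketch of that write path follows, assuming dictionaries as tables and key shapes that mirror the PRIMARY KEY definitions from the earlier slides; the helper names and sample values are hypothetical.

```python
import uuid

# One dictionary per table; keys mirror each table's primary key.
tables = {"speaker": {}, "speaker_by_name": {}, "speaker_by_state": {}}


def build_batch(name, city, state, bio):
    """Prepare one mutation per query table for the same speaker."""
    speaker_id = uuid.uuid4()
    mutations = [
        ("speaker", speaker_id,
         {"name": name, "city": city, "state": state, "bio": bio}),
        ("speaker_by_name", (name, speaker_id), {}),
        ("speaker_by_state", (state, city, name, speaker_id), {"bio": bio}),
    ]
    return speaker_id, mutations


def apply_batch(mutations):
    """Apply every mutation in the batch; the tables are updated together."""
    for table, key, values in mutations:
        tables[table][key] = values


speaker_id, batch = build_batch("Paulo Almeida", "Curitiba", "PR", "Speaker bio")
apply_batch(batch)
print({table: len(rows) for table, rows in tables.items()})
```

One logical insert becomes three physical writes, which matches the deck's stated non-goals: the number of writes and the amount of duplicated data are not what is being minimized.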
  31. Data Modeling
      CREATE TABLE tdc.talk_by_speaker_name (
        talk_id uuid,
        talk_name text,
        speaker_name text,
        date timestamp,
        PRIMARY KEY (speaker_name, date, talk_id)
      ) WITH CLUSTERING ORDER BY (date DESC, talk_id ASC);
  32. Data Modeling
      CREATE INDEX talk_by_track_name ON tdc.talk (track_name);
      SELECT * FROM tdc.talk WHERE track_name = 'Test';
  33. Netflix benchmark
      https://academy.datastax.com/planet-cassandra/nosql-performance-benchmarks
  34. Netflix benchmark
      https://academy.datastax.com/planet-cassandra/nosql-performance-benchmarks
      Throughput in operations/sec:
      Nodes   Cassandra     Couchbase    HBase        MongoDB
      1       18,925.59     1,554.14     973.85       1,278.81
      2       35,539.69     2,985.28     3,430.59     1,441.32
      4       64,911.39     3,755.28     6,451.95     1,801.06
      8       117,237.91    10,138.80    6,262.95     2,195.92
      16      210,237.90    11,761.31    15,268.93    1,230.96
      32      348,682.44    21,375.02    58,463.15    2,335.14
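One way to read the benchmark table is to compute how much throughput grows from 1 to 32 nodes for each database. The sketch below does that arithmetic using only the numbers on the slide; the "speedup" and "efficiency" helpers are names introduced here, not metrics defined by the benchmark.

```python
# Operations/sec per node count, copied from the slide's table.
throughput = {
    "Cassandra": {1: 18925.59, 2: 35539.69, 4: 64911.39,
                  8: 117237.91, 16: 210237.90, 32: 348682.44},
    "Couchbase": {1: 1554.14, 2: 2985.28, 4: 3755.28,
                  8: 10138.80, 16: 11761.31, 32: 21375.02},
    "HBase": {1: 973.85, 2: 3430.59, 4: 6451.95,
              8: 6262.95, 16: 15268.93, 32: 58463.15},
    "MongoDB": {1: 1278.81, 2: 1441.32, 4: 1801.06,
                8: 2195.92, 16: 1230.96, 32: 2335.14},
}


def speedup(series, nodes=32, baseline=1):
    """Throughput at `nodes` relative to throughput at `baseline` nodes."""
    return series[nodes] / series[baseline]


def efficiency(series, nodes=32, baseline=1):
    """Speedup divided by the growth in node count (1.0 = perfectly linear)."""
    return speedup(series, nodes, baseline) / (nodes / baseline)


for name, series in throughput.items():
    print(f"{name}: speedup x{speedup(series):.2f}, "
          f"efficiency {efficiency(series):.2f}")
```

The ratios describe scaling shape only; absolute throughput at each cluster size is still the figure to compare across databases.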
  36. Resources
      ● Cassandra - The definitive guide
      ● Datastax self-paced Training
        ○ https://academy.datastax.com/resources/ds220-data-modeling
      ● Datastax CQL Reference
        ○ http://docs.datastax.com/en/cql/3.1/cql/cql_reference/cqlReferenceTOC.html
      ● Cassandra-demo-middle:
        ○ https://github.com/rochapaulo/cassandra-demo-middle
      ● Presentation source code:
        ○ https://github.com/rochapaulo/TDC-SP-2017-Cassandra
      ● Youtube videos
  37. Thank you!
      /rochapaulo
      /pauloricardoalmeida
      almeida.paulorocha@gmail.com