Cassandra @
BrightTag
Matthew Kemp
Our Problem(s)
Initial Iteration
"We want to make a match table"
Second Iteration
"We want to support flexible lookups"
Th...
Cassandra Timeline: 2011
Evaluated
Cassandra 0.8,
Couch, Mongo
and Riak
Summer 2011 Fall 2011 Winter 2011
Decided on
Cassa...
Our Very First Schema
create column family brighttag_data
with comparator = 'UTF8Type'
and default_validation_class = 'UTF...
Example Data Rows
Row Key Columns ...
a111 wwww xxxx yyyy
w111 x111 y111
b222 xxxx zzzz
x222 z222
c333 wwww yyyy zzzz
w333...
Cassandra Timeline: 2012
Attempted to use
SuperColumns
and started using
Priam
Summer 2012 Fall 2012 Winter 2012
Attempted...
Priam
Because nobody wants to deal with this
Super Column Failed
Row Key Columns ...
a111 wwww xxxx
id attr_1 attr_2
w111 value1 value2
id attr_3 attr_4
x111 value3 va...
Composite Column Almost
Row Key Columns ...
a111 (wwww,id) (wwww,attr_1) (wwww,attr_2) (xxxx,id)
w111 value1 value2 x111
b...
Solution? Roll Your Own
Row Key Columns ...
a111 wwww:id wwww:attr_1 wwww:attr_2 xxxx:id
w111 value1 value2 x111
b222 xxxx...
Schema Additions
create column family brighttag_index
with comparator = 'UTF8Type'
and default_validation_class = 'UTF8Typ...
Example Index Rows
Row Key Columns ... ... continued ...
wwww:w111 bt_id xxxx:x222 bt_id
a111 b222
xxxx:x111 bt_id zzzz:z2...
Cassandra Timeline: 2013
Performance
testing, evaluating
1.2.x, switched to
Agathon
Summer 2013 Fall 2013 Winter 2013
Upgr...
Keyspace
CREATE KEYSPACE bt WITH replication = {
'class': 'NetworkTopologyStrategy',
'eu-west': '2',
'ap-northeast': '2',
...
Data Table
CREATE TABLE brighttag_data (
btid uuid,
identifier varchar,
ids map<varchar, varchar>,
data map<varchar, varch...
Example Data Rows
btid identifier ids data
a111 wwww {'id':'w111'} {'attr_1':'value1'}
a111 xxxx {'xid':'x111'} {'attr_2':...
Index Table
CREATE TABLE brighttag_index (
identifier varchar,
id_name varchar,
id_value varchar,
btid uuid
PRIMARY KEY (i...
Example Index Rows
identifier id_name id_value btid
wwww id w111 a111
xxxx xid x111 a111
yyyy id y111 a111
xxxx xid x222 b...
Switching to Virtual Nodes
Old Ring
(read only)
New Ring
(read & write)
Data Access
Service
"Migration"
Script
Agathon
Similar to Priam, except not tied to AWS and
does not have to run as a co-process
Seed Provider (drop in jar) & Ja...
Agathon In Practice
Agathon
"Manifest"
provider
SimpleDB
???
Cassandra Timeline: 2014
???
Summer 2014 Fall 2014 Winter 2014Spring 2014
Upgraded to
i2.xlarges and
Cassandra
1.2.16
Sche...
AWS Instance Types
Instance Type ECUs Virtual CPUs Memory Disks
m2.xlarge 6.5 2 17.1 1x420GB
m2.2xlarge 13 4 34.2 4x420GB
...
SSD Performance: Writes
Upgrade to SSDs Upgrade to 1.2.16
SSD Performance: Reads
Upgrade to SSDs Upgrade to 1.2.16
Current JVM Settings
-Xms16G
-Xmx16G
-Xmn1000M
-XX:+HeapDumpOnOutOfMemoryError
-Xss180k
-XX:+UseParNewGC
-XX:+UseConcMarkS...
JVM Settings Being Tested
# lets the CMS run in increments
-XX:+CMSIncrementalMode
# we probably want this, forces a minor...
Some Ring Stats
44 Nodes (i2.xlarge)
1B+ rows in data table
Estimated 3B+ rows in index table
~3.5 TB of data
New Use of Cassandra
We're starting to use this pipeline for metrics
Storm + Trident + KairosDB + Cassandra
Different conf...
Dealing With Failure
We've survived …
Amazon swallowing nodes
Disk Failures
Intra and inter region network issues
Advice
Use SSDs, don't skimp on hardware
Stay current, but not too current
Test various configurations for your use case
S...
The Best Advice
"The best way to approach data modeling for
Cassandra is to start with your queries and
work backwards fro...
Questions?
Upcoming SlideShare
Loading in …5
×

Cassandra at BrightTag

2,833 views

Published on

Published in: Education, Technology, Business
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,833
On SlideShare
0
From Embeds
0
Number of Embeds
824
Actions
Shares
0
Downloads
13
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Cassandra at BrightTag

  1. 1. Cassandra @ BrightTag Matthew Kemp
  2. 2. Our Problem(s) Initial Iteration "We want to make a match table" Second Iteration "We want to support flexible lookups" Third Iteration "We want to support more attributes"
  3. 3. Cassandra Timeline: 2011 Evaluated Cassandra 0.8, Couch, Mongo and Riak Summer 2011 Fall 2011 Winter 2011 Decided on Cassandra (1.0 just released) Initial AWS deployment in two regions (8 total nodes) Spring 2011
  4. 4. Our Very First Schema create column family brighttag_data with comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.0 and gc_grace = 864000;
  5. 5. Example Data Rows Row Key Columns ... a111 wwww xxxx yyyy w111 x111 y111 b222 xxxx zzzz x222 z222 c333 wwww yyyy zzzz w333 y333 z333
  6. 6. Cassandra Timeline: 2012 Attempted to use SuperColumns and started using Priam Summer 2012 Fall 2012 Winter 2012 Attempted to use Composite Columns and switched to m2.xlarges Spring 2012 Upgraded to Cassandra 1.1 and start of production usage Added index column family
  7. 7. Priam Because nobody wants to deal with this
  8. 8. Super Column Failed Row Key Columns ... a111 wwww xxxx id attr_1 attr_2 w111 value1 value2 id attr_3 attr_4 x111 value3 value4 To read a single value, you have to unpack the whole super column.
  9. 9. Composite Column Almost Row Key Columns ... a111 (wwww,id) (wwww,attr_1) (wwww,attr_2) (xxxx,id) w111 value1 value2 x111 b222 (xxxx,id) x222 c333 (wwww,id) (wwww,attr_2) w333 something
  10. 10. Solution? Roll Your Own Row Key Columns ... a111 wwww:id wwww:attr_1 wwww:attr_2 xxxx:id w111 value1 value2 x111 b222 xxxx:id x222 c333 wwww:id wwww:attr_2 w333 something
  11. 11. Schema Additions create column family brighttag_index with comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.0 and gc_grace = 864000;
  12. 12. Example Index Rows Row Key Columns ... ... continued ... wwww:w111 bt_id xxxx:x222 bt_id a111 b222 xxxx:x111 bt_id zzzz:z222 bt_id a111 b222 yyyy:y111 bt_id wwww:w333 bt_id a111 c333
  13. 13. Cassandra Timeline: 2013 Performance testing, evaluating 1.2.x, switched to Agathon Summer 2013 Fall 2013 Winter 2013 Upgraded to Cassandra 1.2.9 and schema redesign Spring 2013 Upgraded to Virtual Nodes and started using OpsCenter
  14. 14. Keyspace CREATE KEYSPACE bt WITH replication = { 'class': 'NetworkTopologyStrategy', 'eu-west': '2', 'ap-northeast': '2', 'us-west': '2', 'us-east': '2' };
  15. 15. Data Table CREATE TABLE brighttag_data ( btid uuid, identifier varchar, ids map<varchar, varchar>, data map<varchar, varchar> PRIMARY KEY (btid, identifier) );
  16. 16. Example Data Rows btid identifier ids data a111 wwww {'id':'w111'} {'attr_1':'value1'} a111 xxxx {'xid':'x111'} {'attr_2':'value-a'} a111 yyyy {'id':'y111'} {} b222 xxxx {'xid':'x222'} {'attr_2':'value-b'} b222 zzzz {'zid':'z222'} {}
  17. 17. Index Table CREATE TABLE brighttag_index ( identifier varchar, id_name varchar, id_value varchar, btid uuid PRIMARY KEY (identifier, id_name, id_value) );
  18. 18. Example Index Rows identifier id_name id_value btid wwww id w111 a111 xxxx xid x111 a111 yyyy id y111 a111 xxxx xid x222 b222 zzzz zid z222 b222
  19. 19. Switching to Virtual Nodes Old Ring (read only) New Ring (read & write) Data Access Service "Migration" Script
  20. 20. Agathon Similar to Priam, except not tied to AWS and does not have to run as a co-process Seed Provider (drop in jar) & Java Webapp Multi-ring support Open Sourced (Github)
  21. 21. Agathon In Practice Agathon "Manifest" provider SimpleDB ???
  22. 22. Cassandra Timeline: 2014 ??? Summer 2014 Fall 2014 Winter 2014Spring 2014 Upgraded to i2.xlarges and Cassandra 1.2.16 Schema Improvements (planned)
  23. 23. AWS Instance Types Instance Type ECUs Virtual CPUs Memory Disks m2.xlarge 6.5 2 17.1 1x420GB m2.2xlarge 13 4 34.2 4x420GB i2.xlarge 14 4 30.5 1x800GB SSD
  24. 24. SSD Performance: Writes Upgrade to SSDs Upgrade to 1.2.16
  25. 25. SSD Performance: Reads Upgrade to SSDs Upgrade to 1.2.16
  26. 26. Current JVM Settings -Xms16G -Xmx16G -Xmn1000M -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:SurvivorRatio=8 -XX:+UseTLAB
  27. 27. JVM Settings Being Tested # lets the CMS run in increments -XX:+CMSIncrementalMode # we probably want this, forces a minor GC as well -XX:+CMSScavengeBeforeRemark # CMS tunings -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 # adjust tenuring thresholds -XX:InitialTenuringThreshold=10 -XX:MaxTenuringThreshold=15
  28. 28. Some Ring Stats 44 Nodes (i2.xlarge) 1B+ rows in data table Estimated 3B+ rows in index table ~3.5 TB of data
  29. 29. New Use of Cassandra We're starting to use this pipeline for metrics Storm + Trident + KairosDB + Cassandra Different configuration One ring per region Each ring is still pretty small (~4 nodes)
  30. 30. Dealing With Failure We've survived … Amazon swallowing nodes Disk Failures Intra and inter region network issues
  31. 31. Advice Use SSDs, don't skimp on hardware Stay current, but not too current Test various configurations for your use case Simulating production load is important
  32. 32. The Best Advice "The best way to approach data modeling for Cassandra is to start with your queries and work backwards from there. Think about the actions your application needs to perform, how you want to access the data, and then design column families to support those access patterns."
  33. 33. Questions?

×