Cassandra at BrightTag

Published in: Education, Technology, Business

Transcript

  • 1. Cassandra @ BrightTag Matthew Kemp
  • 2. Our Problem(s). Initial iteration: "We want to make a match table." Second iteration: "We want to support flexible lookups." Third iteration: "We want to support more attributes."
  • 3. Cassandra Timeline: 2011 (Spring, Summer, Fall, Winter). Evaluated Cassandra 0.8, Couch, Mongo and Riak; decided on Cassandra (1.0 just released); initial AWS deployment in two regions (8 total nodes).
  • 4. Our Very First Schema
      create column family brighttag_data
        with comparator = 'UTF8Type'
        and default_validation_class = 'UTF8Type'
        and key_validation_class = 'UTF8Type'
        and read_repair_chance = 0.0
        and gc_grace = 864000;
  • 5. Example Data Rows
      Row Key  Columns (name = value)
      a111     wwww = w111, xxxx = x111, yyyy = y111
      b222     xxxx = x222, zzzz = z222
      c333     wwww = w333, yyyy = y333, zzzz = z333
  • 6. Cassandra Timeline: 2012 (Spring, Summer, Fall, Winter). Attempted to use SuperColumns and started using Priam; attempted to use Composite Columns and switched to m2.xlarges; upgraded to Cassandra 1.1 and start of production usage; added index column family.
  • 7. Priam: because nobody wants to deal with this
  • 8. Super Column: Failed
      Row Key  Super Column  Sub-columns (name = value)
      a111     wwww          id = w111, attr_1 = value1, attr_2 = value2
      a111     xxxx          id = x111, attr_3 = value3, attr_4 = value4
      To read a single value, you have to unpack the whole super column.
  • 9. Composite Column: Almost
      Row Key  Columns (name = value)
      a111     (wwww,id) = w111, (wwww,attr_1) = value1, (wwww,attr_2) = value2, (xxxx,id) = x111
      b222     (xxxx,id) = x222
      c333     (wwww,id) = w333, (wwww,attr_2) = something
  • 10. Solution? Roll Your Own
      Row Key  Columns (name = value)
      a111     wwww:id = w111, wwww:attr_1 = value1, wwww:attr_2 = value2, xxxx:id = x111
      b222     xxxx:id = x222
      c333     wwww:id = w333, wwww:attr_2 = something
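The "roll your own" layout on slide 10 packs two parts into one flat UTF-8 column name. The following is a minimal Python sketch of that idea, not BrightTag's actual code: the helper names (`column_name`, `split_column_name`, `slice_by_identifier`) are hypothetical, and a plain dict stands in for a Cassandra row. It relies only on the fact that sorted names sharing a prefix are contiguous, which is what makes a column slice by identifier possible.

```python
from bisect import bisect_left

SEP = ":"  # separator taken from the slide's "wwww:id" column names


def column_name(identifier: str, attribute: str) -> str:
    """Join the two parts into one flat column name, e.g. 'wwww:id'."""
    if SEP in identifier:
        raise ValueError("identifier must not contain the separator")
    return f"{identifier}{SEP}{attribute}"


def split_column_name(name: str) -> tuple[str, str]:
    """Split on the first separator only, so attributes may contain ':'."""
    identifier, attribute = name.split(SEP, 1)
    return identifier, attribute


def slice_by_identifier(row: dict[str, str], identifier: str) -> dict[str, str]:
    """Return one identifier's attributes via a prefix scan over sorted names.

    Hypothetical stand-in for a column slice: names are kept sorted, so all
    columns that share the 'identifier:' prefix sit next to each other.
    """
    prefix = identifier + SEP
    names = sorted(row)
    start = bisect_left(names, prefix)
    result = {}
    for name in names[start:]:
        if not name.startswith(prefix):
            break  # left the contiguous prefix range
        result[split_column_name(name)[1]] = row[name]
    return result


# Row a111 from the slide, rebuilt with the helper above.
row_a111 = {
    column_name("wwww", "id"): "w111",
    column_name("wwww", "attr_1"): "value1",
    column_name("wwww", "attr_2"): "value2",
    column_name("xxxx", "id"): "x111",
}
```

Calling `slice_by_identifier(row_a111, "wwww")` reads only the `wwww:` columns, which is the property the super column layout on slide 8 lacked.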
  • 11. Schema Additions
      create column family brighttag_index
        with comparator = 'UTF8Type'
        and default_validation_class = 'UTF8Type'
        and key_validation_class = 'UTF8Type'
        and read_repair_chance = 0.0
        and gc_grace = 864000;
  • 12. Example Index Rows
      Row Key    Column (name = value)
      wwww:w111  bt_id = a111
      xxxx:x111  bt_id = a111
      yyyy:y111  bt_id = a111
      xxxx:x222  bt_id = b222
      zzzz:z222  bt_id = b222
      wwww:w333  bt_id = c333
  • 13. Cassandra Timeline: 2013 (Spring, Summer, Fall, Winter). Performance testing, evaluating 1.2.x, switched to Agathon; upgraded to Cassandra 1.2.9 and schema redesign; upgraded to Virtual Nodes and started using OpsCenter.
  • 14. Keyspace
      CREATE KEYSPACE bt WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'eu-west': '2',
        'ap-northeast': '2',
        'us-west': '2',
        'us-east': '2'
      };
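The replication map on slide 14 places two replicas in each of four data centers. The sketch below works through the arithmetic that follows from that, using Cassandra's standard quorum rule (half the replicas, rounded down, plus one). The deck does not say which consistency levels BrightTag used, so the numbers are only what each level would require, not a description of their setup.

```python
# Replication factors copied from the CREATE KEYSPACE statement on slide 14.
replication = {"eu-west": 2, "ap-northeast": 2, "us-west": 2, "us-east": 2}


def quorum(replicas: int) -> int:
    """Cassandra's quorum size: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


# Every row is stored this many times across the whole cluster.
total_replicas = sum(replication.values())

# QUORUM counts replicas in all data centers.
cluster_quorum = quorum(total_replicas)

# LOCAL_QUORUM counts only replicas in the coordinator's data center.
local_quorums = {dc: quorum(rf) for dc, rf in replication.items()}

print(total_replicas, cluster_quorum, local_quorums)
```

With a replication factor of 2 per data center, a local quorum is 2, so a LOCAL_QUORUM request would need both local replicas to respond.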
  • 15. Data Table
      CREATE TABLE brighttag_data (
        btid uuid,
        identifier varchar,
        ids map<varchar, varchar>,
        data map<varchar, varchar>,
        PRIMARY KEY (btid, identifier)
      );
  • 16. Example Data Rows
      btid  identifier  ids              data
      a111  wwww        {'id':'w111'}    {'attr_1':'value1'}
      a111  xxxx        {'xid':'x111'}   {'attr_2':'value-a'}
      a111  yyyy        {'id':'y111'}    {}
      b222  xxxx        {'xid':'x222'}   {'attr_2':'value-b'}
      b222  zzzz        {'zid':'z222'}   {}
  • 17. Index Table
      CREATE TABLE brighttag_index (
        identifier varchar,
        id_name varchar,
        id_value varchar,
        btid uuid,
        PRIMARY KEY (identifier, id_name, id_value)
      );
  • 18. Example Index Rows
      identifier  id_name  id_value  btid
      wwww        id       w111      a111
      xxxx        xid      x111      a111
      yyyy        id       y111      a111
      xxxx        xid      x222      b222
      zzzz        zid      z222      b222
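The data and index tables on slides 15 to 18 suggest a two-step lookup: resolve an external id to a btid through the index, then read that btid's rows from the data table. The sketch below models this with in-memory dicts keyed the same way as each PRIMARY KEY. The function names are hypothetical and nothing here talks to Cassandra; it only illustrates the access pattern using the example rows from the slides.

```python
from typing import Optional

# Stand-in for brighttag_index: key = (identifier, id_name, id_value) -> btid.
brighttag_index = {
    ("wwww", "id", "w111"): "a111",
    ("xxxx", "xid", "x111"): "a111",
    ("yyyy", "id", "y111"): "a111",
    ("xxxx", "xid", "x222"): "b222",
    ("zzzz", "zid", "z222"): "b222",
}

# Stand-in for brighttag_data: key = (btid, identifier) -> the two map columns.
brighttag_data = {
    ("a111", "wwww"): {"ids": {"id": "w111"}, "data": {"attr_1": "value1"}},
    ("a111", "xxxx"): {"ids": {"xid": "x111"}, "data": {"attr_2": "value-a"}},
    ("a111", "yyyy"): {"ids": {"id": "y111"}, "data": {}},
    ("b222", "xxxx"): {"ids": {"xid": "x222"}, "data": {"attr_2": "value-b"}},
    ("b222", "zzzz"): {"ids": {"zid": "z222"}, "data": {}},
}


def lookup_btid(identifier: str, id_name: str, id_value: str) -> Optional[str]:
    """Step 1: exact match on the index table's full primary key."""
    return brighttag_index.get((identifier, id_name, id_value))


def rows_for_btid(btid: str) -> dict:
    """Step 2: all data rows in one partition, keyed by identifier."""
    return {ident: row for (b, ident), row in brighttag_data.items() if b == btid}


def find_profile(identifier: str, id_name: str, id_value: str) -> Optional[dict]:
    """Resolve an external id to every row stored under the matching btid."""
    btid = lookup_btid(identifier, id_name, id_value)
    if btid is None:
        return None
    return rows_for_btid(btid)
```

For example, `find_profile("xxxx", "xid", "x222")` resolves to btid `b222` and returns its `xxxx` and `zzzz` rows.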
  • 19. Switching to Virtual Nodes. Diagram components: Old Ring (read only); New Ring (read & write); Data Access Service; "Migration" Script.
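Slide 19 names the pieces of the virtual-node migration but not how they interact. One plausible reading, sketched below purely as an assumption, is that the data access service writes only to the new ring, reads from the new ring first and falls back to the read-only old ring, while a migration script copies over whatever the new ring does not have yet. The class and function names are hypothetical, and plain dicts stand in for the two rings.

```python
class DataAccessService:
    """Hypothetical facade over two rings during a migration."""

    def __init__(self, old_ring: dict, new_ring: dict):
        self.old_ring = old_ring  # treated as read only
        self.new_ring = new_ring  # receives every write

    def read(self, key):
        # Prefer the new ring; fall back to the old ring for unmigrated keys.
        if key in self.new_ring:
            return self.new_ring[key]
        return self.old_ring.get(key)

    def write(self, key, value):
        # Writes never touch the old ring, so it stays a stable copy source.
        self.new_ring[key] = value


def migrate(old_ring: dict, new_ring: dict) -> int:
    """Copy rows missing from the new ring and return how many were copied.

    Keys already present in the new ring are skipped so that a fresh write
    is never overwritten by stale data from the old ring.
    """
    copied = 0
    for key, value in old_ring.items():
        if key not in new_ring:
            new_ring[key] = value
            copied += 1
    return copied
```

Skipping keys that already exist in the new ring is a design choice of this sketch; it keeps the script safe to rerun while live traffic continues to write.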
  • 20. Agathon. Similar to Priam, except not tied to AWS and does not have to run as a co-process; Seed Provider (drop-in jar) & Java webapp; multi-ring support; open sourced (GitHub).
  • 21. Agathon In Practice. Diagram components: Agathon; "Manifest" provider; SimpleDB; ???
  • 22. Cassandra Timeline: 2014 (Spring, Summer, Fall, Winter). ???; upgraded to i2.xlarges and Cassandra 1.2.16; schema improvements (planned).
  • 23. AWS Instance Types
      Instance Type  ECUs  Virtual CPUs  Memory (GiB)  Disks
      m2.xlarge      6.5   2             17.1          1x420GB
      m2.2xlarge     13    4             34.2          4x420GB
      i2.xlarge      14    4             30.5          1x800GB SSD
  • 24. SSD Performance: Writes (chart, annotated at "Upgrade to SSDs" and "Upgrade to 1.2.16")
  • 25. SSD Performance: Reads (chart, annotated at "Upgrade to SSDs" and "Upgrade to 1.2.16")
  • 26. Current JVM Settings
      -Xms16G -Xmx16G -Xmn1000M
      -XX:+HeapDumpOnOutOfMemoryError
      -Xss180k
      -XX:+UseParNewGC
      -XX:+UseConcMarkSweepGC
      -XX:+CMSParallelRemarkEnabled
      -XX:CMSInitiatingOccupancyFraction=75
      -XX:+UseCMSInitiatingOccupancyOnly
      -XX:SurvivorRatio=8
      -XX:+UseTLAB
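The flags on slide 26 imply a specific heap layout. The sketch below is a rough calculation from the flag values alone: `SurvivorRatio=8` means eden is eight times the size of one survivor space, so the young generation divides into ten equal parts, and `CMSInitiatingOccupancyFraction=75` starts a CMS cycle once the old generation is 75% full. The JVM aligns region sizes internally, so real values will differ slightly from these figures.

```python
# Values taken from the flags on slide 26 (JVM "G"/"M" suffixes are binary units).
HEAP_MB = 16 * 1024       # -Xms16G -Xmx16G
YOUNG_MB = 1000           # -Xmn1000M
SURVIVOR_RATIO = 8        # -XX:SurvivorRatio=8 (eden : one survivor space)
CMS_OCCUPANCY_PCT = 75    # -XX:CMSInitiatingOccupancyFraction=75

# Young generation = eden + two survivor spaces = (ratio + 2) equal parts.
survivor_mb = YOUNG_MB / (SURVIVOR_RATIO + 2)
eden_mb = survivor_mb * SURVIVOR_RATIO

# Everything outside the young generation is the old (tenured) generation.
old_mb = HEAP_MB - YOUNG_MB

# Old-generation occupancy at which a CMS cycle is initiated.
cms_trigger_mb = old_mb * CMS_OCCUPANCY_PCT / 100

print(f"eden={eden_mb:.0f}M survivor={survivor_mb:.0f}M each "
      f"old={old_mb}M cms_trigger={cms_trigger_mb:.0f}M")
```

The young generation here is small relative to the 16G heap, which helps explain why the tenuring thresholds on the next slide are among the settings being tested.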
  • 27. JVM Settings Being Tested
      # lets the CMS run in increments
      -XX:+CMSIncrementalMode
      # we probably want this, forces a minor GC as well
      -XX:+CMSScavengeBeforeRemark
      # CMS tunings
      -XX:+CMSIncrementalPacing
      -XX:CMSIncrementalDutyCycleMin=0
      -XX:CMSIncrementalDutyCycle=10
      # adjust tenuring thresholds
      -XX:InitialTenuringThreshold=10
      -XX:MaxTenuringThreshold=15
  • 28. Some Ring Stats. 44 nodes (i2.xlarge); 1B+ rows in data table; estimated 3B+ rows in index table; ~3.5 TB of data.
  • 29. New Use of Cassandra. We're starting to use this pipeline for metrics: Storm + Trident + KairosDB + Cassandra. Different configuration: one ring per region; each ring is still pretty small (~4 nodes).
  • 30. Dealing With Failure. We've survived: Amazon swallowing nodes; disk failures; intra- and inter-region network issues.
  • 31. Advice. Use SSDs, don't skimp on hardware; stay current, but not too current; test various configurations for your use case; simulating production load is important.
  • 32. The Best Advice "The best way to approach data modeling for Cassandra is to start with your queries and work backwards from there. Think about the actions your application needs to perform, how you want to access the data, and then design column families to support those access patterns."
  • 33. Questions?
