An Introduction to Priam

In-depth exploration of Priam, a sidekick application that helps Cassandra run inside Amazon's cloud.

Published in: Technology

Transcript of "An Introduction to Priam"

  1. Running Cassandra in the Cloud: An Introduction to Priam
     Jason Brown
     @jasobrown
     jasedbrown@gmail.com
     www.linkedin.com/in/jasedbrown
  2. About me
     ● Senior Software Engineer, Netflix
     ● Apache Cassandra committer
     ● E-Commerce Architect, Major League Baseball Advanced Media
     ● Wireless developer (J2ME and BREW)
  3. Netflix Databases
     ● Oracle in the datacenter
     ● Migrate to EC2
       ○ SimpleDB at first
       ○ Cassandra
  4. Cassandra, meet EC2
     ● shell script(s)
     ● Python scripts
       ● backup / restore
       ● centralized model
       ● installing Python 2.7 broke CentOS yum
       ● the first time we ran it in prod, my cluster was destroyed
  5. Hello, Priam!
     Priam, the father of Cassandra (http://en.wikipedia.org/wiki/Priam)
     Java web app
     ● Token Assignment
     ● Backup / Restore
     ● Multi-region support
     ● Configuration management
  6. Branches
     Each Priam branch corresponds to a C* version:
     ● priam 1.1 -> c* 1.1
     ● priam master -> c* 1.2
     ● ??? -> c* trunk
  7. Token Assignment
     ● Cassandra needs an assigned token
     ● Priam tries to
       ○ replace a dead instance
       ○ join as a new node
     ● External storage for known cluster members
       ○ host name / IP addr / instance id
       ○ token
       ○ region / availability zone
  8. Replacing a dead node
     ● Get known nodes in region/AZ from storage
       ○ {A, B, C}
     ● Get live nodes in region/AZ from the ASG API
       ○ {A, B}
     ● Take over the dead node's token
       ○ C
     ● Uses C*'s replace_token
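The dead-node check above is a set difference: known members minus live members. Below is a minimal sketch of that step. The class and method names are hypothetical, and the two input sets stand in for the storage lookup and the ASG API call. It only picks the node whose token would be reused; it does not model how Priam actually passes replace_token to C*.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: find registered nodes that are no longer live.
class DeadNodeFinder {
    // knownNodes: members registered in external storage (e.g. SimpleDB) for this region/AZ
    // liveNodes:  members the Auto Scaling Group API reports as running in the same region/AZ
    static Set<String> deadNodes(Set<String> knownNodes, Set<String> liveNodes) {
        Set<String> dead = new TreeSet<>(knownNodes); // sorted, so the pick below is deterministic
        dead.removeAll(liveNodes);
        return dead;
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("A", "B", "C"); // from storage
        Set<String> live = Set.of("A", "B");       // from the ASG API
        Set<String> dead = deadNodes(known, live);
        if (!dead.isEmpty()) {
            String victim = dead.iterator().next();
            // At this point the new instance would start C* with replace_token set to the victim's token.
            System.out.println("take over token of " + victim); // take over token of C
        }
    }
}
```

Only when the difference is empty does the new instance fall through to the "join as a new node" path.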
  9. Joining as a new node
     ● Calculate token
       ○ per-region offset
       ○ determine slot in region/AZ
       ○ derive token
  10. Region hash offset
      ● Each region needs a different base offset
        ○ avoids token collisions
      int hash = "us-east-1".hashCode();
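The slide only shows the raw `String.hashCode()` of the region name. The sketch below turns that hash into a non-negative `BigInteger` offset that can be added to a token. Taking the absolute value is an assumption of this example, not something the slide states. The value is widened to `long` first so that `Integer.MIN_VALUE` cannot stay negative.

```java
import java.math.BigInteger;

// Hypothetical helper: a per-region base offset derived from the region name.
class RegionOffset {
    static BigInteger offset(String region) {
        int hash = region.hashCode(); // same call as on the slide
        // Widen before Math.abs: Math.abs(Integer.MIN_VALUE) would still be negative as an int.
        return BigInteger.valueOf(Math.abs((long) hash));
    }

    public static void main(String[] args) {
        for (String region : new String[] {"us-east-1", "eu-west-1"}) {
            System.out.println(region + " -> " + offset(region));
        }
    }
}
```

Because every region derives its offset from a different name, nodes in the same slot in two regions end up with different tokens.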
  11. Determining slot
      A new node takes the next numbered slot in its AZ, looking for other nodes already registered in SimpleDB (sdb)
  12. Node Slotting Layout
      +--------+--------+--------+
      | zone A | zone B | zone C |
      +--------+--------+--------+
      |   0    |   1    |   2    |
      +--------+--------+--------+
      |   3    |   4    |   5    |
      +--------+--------+--------+
      |   6    |   7    |   8    |
      +--------+--------+--------+
      |   9    |   10   |   11   |
      +--------+--------+--------+
      (ascii art rocks)
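In the layout above, a slot's zone is its column: slot mod the number of zones. Each AZ therefore owns every third slot when there are three zones. Below is a minimal sketch of "take the next numbered slot in the AZ" under that reading. The method names are hypothetical, and `usedSlots` stands in for the slots already registered in SimpleDB. This version always takes the slot after the highest one in use and does not try to fill gaps.

```java
import java.util.List;

// Hypothetical sketch of the slot layout: zone A = column 0, zone B = column 1, ...
class SlotLayout {
    // zoneIndex: this node's AZ column; usedSlots: slots already registered for that AZ
    static int nextSlot(int zoneIndex, int zoneCount, List<Integer> usedSlots) {
        int next = zoneIndex; // the first slot in an AZ is its column index
        for (int slot : usedSlots) {
            if (slot % zoneCount != zoneIndex) {
                throw new IllegalArgumentException("slot " + slot + " is not in zone " + zoneIndex);
            }
            next = Math.max(next, slot + zoneCount); // one row further down the same column
        }
        return next;
    }

    static char zoneOf(int slot, int zoneCount) {
        return (char) ('A' + slot % zoneCount);
    }

    public static void main(String[] args) {
        // Zone B already holds slots 1 and 4, so the next node in zone B gets slot 7.
        System.out.println(nextSlot(1, 3, List.of(1, 4))); // 7
        System.out.println(zoneOf(11, 3));                  // C
    }
}
```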
  13. Here's your token
      MAXIMUM_TOKEN
        .divide(regionNodeCount)
        .multiply(mySlot)
        .add(regionHashOffset);
      example:
      100 / 10 (ten nodes in region) * 3 (in slot three) + 12 (region offset) = 42
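Here is the slide's formula as runnable Java with `BigInteger`, checked against the toy example: (100 / 10) * 3 + 12 = 42. `MAXIMUM_TOKEN` is set to 2^127, the top of the RandomPartitioner token range. The assumption is that this is the partitioner in use. The class name and the offset used in `main` are illustrative.

```java
import java.math.BigInteger;

// Sketch of the slide's token formula; names other than the slide's are hypothetical.
class TokenCalc {
    // RandomPartitioner tokens fall in the range [0, 2^127].
    static final BigInteger MAXIMUM_TOKEN = BigInteger.valueOf(2).pow(127);

    static BigInteger token(BigInteger maxToken, int regionNodeCount, int mySlot, BigInteger regionHashOffset) {
        return maxToken
                .divide(BigInteger.valueOf(regionNodeCount)) // width of one slot
                .multiply(BigInteger.valueOf(mySlot))        // start of this node's slot
                .add(regionHashOffset);                      // shift so regions never collide
    }

    public static void main(String[] args) {
        // Toy numbers from the slide: ten nodes, slot three, region offset 12.
        System.out.println(token(BigInteger.valueOf(100), 10, 3, BigInteger.valueOf(12))); // 42
        // Realistic scale: 12 nodes in the region, slot 7, offset from the region name.
        BigInteger offset = BigInteger.valueOf(Math.abs((long) "us-east-1".hashCode()));
        System.out.println(token(MAXIMUM_TOKEN, 12, 7, offset));
    }
}
```

Dividing before multiplying keeps every slot the same width. The region offset (at most 2^31) is tiny next to a slot width of about 2^127 / 12, so the shifted tokens stay in slot order.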
  14. Seeds
      ● First node in each AZ, in every region
      ● Except the current node, if it is in the first slot
        ○ seeds cannot auto-bootstrap
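One reading of the seed rule: take the first registered node of every AZ across all regions, and leave the current node out of its own seed list. The exclusion matters because a node that sees itself as a seed will not auto-bootstrap. The sketch below follows that reading. The map keyed by region/AZ and the node names are hypothetical inputs standing in for the SimpleDB lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: build a seed list from per-AZ membership ordered by slot.
class SeedSelector {
    // nodesByZone: key is region + AZ (e.g. "us-east-1a"); value is that AZ's nodes in slot order
    static List<String> seeds(Map<String, List<String>> nodesByZone, String currentNode) {
        List<String> seeds = new ArrayList<>();
        for (List<String> nodes : new TreeMap<>(nodesByZone).values()) { // sorted keys: stable order
            if (nodes.isEmpty()) {
                continue;
            }
            String first = nodes.get(0);
            // A seed does not auto-bootstrap, so the current node never lists itself.
            if (!first.equals(currentNode)) {
                seeds.add(first);
            }
        }
        return seeds;
    }

    public static void main(String[] args) {
        Map<String, List<String>> zones = Map.of(
                "us-east-1a", List.of("n0", "n3"),
                "us-east-1b", List.of("n1"),
                "eu-west-1a", List.of("e0"));
        System.out.println(seeds(zones, "n0")); // [e0, n1]
        System.out.println(seeds(zones, "n3")); // [e0, n0, n1]
    }
}
```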
  15. Multi-region communication
      AWS security groups block ingress requests
      Intra-region: whitelist by the other in-region SG
      Inter-region: whitelist by IP address
        ○ must use the public IP address!
  16. Whitelisting IP addresses
      ● Seed nodes compare
        ○ the current region's SG IP addresses
        ○ entries in the SimpleDB database
      ● Add new nodes to the SG
      ● Remove dead nodes from the SG
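The comparison a seed node makes is a two-way set difference. Registered public IPs missing from the SG get added, and SG entries with no registered node get removed. A minimal sketch follows. It assumes each whitelisted node appears in the SG as a single-host `/32` CIDR, and that the SG holds only this cluster's inter-region entries. The class and record names are hypothetical, and no AWS API call is shown.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: reconcile SG ingress CIDRs with the public IPs registered in SimpleDB.
class SecurityGroupSync {
    record Changes(Set<String> toAdd, Set<String> toRemove) {}

    static Changes diff(Set<String> sgCidrs, Set<String> registeredPublicIps) {
        Set<String> wanted = new TreeSet<>();
        for (String ip : registeredPublicIps) {
            wanted.add(ip + "/32"); // one host per rule (assumption)
        }
        Set<String> toAdd = new TreeSet<>(wanted);
        toAdd.removeAll(sgCidrs);      // new nodes not yet whitelisted
        Set<String> toRemove = new TreeSet<>(sgCidrs);
        toRemove.removeAll(wanted);    // dead nodes still whitelisted
        return new Changes(toAdd, toRemove);
    }

    public static void main(String[] args) {
        Changes c = diff(Set.of("203.0.113.10/32", "203.0.113.11/32"),
                         Set.of("203.0.113.11", "203.0.113.12"));
        System.out.println("add " + c.toAdd() + ", remove " + c.toRemove());
    }
}
```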
  17.
                            ++
           us-east-1        ||          eu-west-1
         +-------------+    ||
         |  simpleDB   |    ||
         +-------------+    ||
                            ||
                     +--+   ||   +--+
                     |S |   ||   |S |
                     |e |   ||   |e |
                     |c |   ||   |c |
       +----------+  |G |   ||   |G |  +----------+
       |   c* 1   |  |r |   ||   |r |  |   c* 2   |
       +----------+  |p |   ||   |p |  +----------+
                     |  |   ||   |  |
                     |1 |   ||   |2 |
                     +--+   ||   +--+
                            ||
                            ++
  18. Backup
      Two types:
      ● Snapshot
        ○ invokes nodetool snapshot
        ○ once a day, cron-like
      ● Incremental
        ○ copy all newly flushed SSTables
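An incremental pass only needs to find SSTable files that have appeared since the last upload. The sketch below assumes Cassandra's `incremental_backups` option is on, which makes C* hard-link each flushed SSTable into a `backups` directory under the column family's data directory. It also assumes the caller remembers which files it already uploaded. The class name, the `alreadyUploaded` set and the default path are illustrative, and the S3 upload itself is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: list files in any "backups" directory that were not uploaded yet.
class IncrementalScan {
    static List<Path> pendingUploads(Path dataDir, Set<Path> alreadyUploaded) {
        try (Stream<Path> files = Files.walk(dataDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(IncrementalScan::inBackupsDir)
                    .filter(p -> !alreadyUploaded.contains(p))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean inBackupsDir(Path file) {
        Path parent = file.getParent();
        return parent != null && parent.getFileName() != null
                && parent.getFileName().toString().equals("backups");
    }

    public static void main(String[] args) {
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/var/lib/cassandra/data");
        if (!Files.isDirectory(dataDir)) {
            System.err.println("no data directory at " + dataDir);
            return;
        }
        for (Path p : pendingUploads(dataDir, Set.of())) {
            System.out.println("would upload " + p); // an S3 upload would go here
        }
    }
}
```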
  19. Backup location
      Upload to an S3 bucket in the same region
      Bucket lifecycle rules
      ● configure TTL for data
  20. Backup path
      Bucket: netflix-cassandra-data
      Path: base dir / region / cluster name / token / snapshot time / [SNP | SST | META] / keyspace / column family / data file
      example:
      test_backup/us-east-1/cass_jasobrown/42/1234567/SNP/jasobrown/dog/jasobrown-dog-ja-1-Data.db
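The key layout above is a fixed sequence of segments, so it can be built with a simple join. The sketch below reproduces the slide's example path exactly. The class, enum and parameter names are hypothetical, and the segments are treated as opaque strings, with no claim about how the snapshot time is formatted.

```java
// Hypothetical sketch: build an S3 object key following the slide's backup path layout.
class BackupPath {
    enum FileType { SNP, SST, META }

    static String build(String baseDir, String region, String cluster, String token,
                        String snapshotTime, FileType type, String keyspace,
                        String columnFamily, String dataFile) {
        return String.join("/", baseDir, region, cluster, token, snapshotTime,
                type.name(), keyspace, columnFamily, dataFile);
    }

    public static void main(String[] args) {
        // Key inside the bucket netflix-cassandra-data
        System.out.println(build("test_backup", "us-east-1", "cass_jasobrown", "42", "1234567",
                FileType.SNP, "jasobrown", "dog", "jasobrown-dog-ja-1-Data.db"));
        // test_backup/us-east-1/cass_jasobrown/42/1234567/SNP/jasobrown/dog/jasobrown-dog-ja-1-Data.db
    }
}
```

Putting region, cluster and token ahead of the time segment means a restore can list everything for one token under a single prefix.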
  21. Restore
      ● best with a cluster the same size as the source
      ● best if tokens match the source
      Uses (besides the obvious)
      ● prod to test refresh
      ● reproduce prod data problems
      ● incremental restore - WIP
  22. Configuration Management
      Control aspects of Priam and C*
      ● yaml
      ● startup script(s) env values
      Netflix needs this as we have ~55 production clusters, with slightly different configs
  23. So, does Netflix actually use Priam?
      55 production clusters, > 750 nodes
      Internal extensions
      ● Hook into internal DNS, properties systems
      ● Alternative storage to SimpleDB
      ● BI messaging integration - WIP
      ● C* JMX monitoring
  24. Monitoring
      ● Poll C* every 60 seconds
      ● selected JMX metrics
      ● publish to internal metrics aggregator
        ○ currently uses Netflix's OSS Servo library (github.com/Netflix/servo)
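A poll loop like the one described can be written with the standard JMX remote API and a scheduled executor. The sketch connects to C*'s default JMX port (7199) and reads two selected attributes every 60 seconds. It prints the values instead of publishing them. The Servo integration is not shown, and the choice of metrics here is illustrative, not Priam's actual list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical sketch: poll selected JMX attributes on a fixed 60-second schedule.
class JmxPoller {
    // One selected metric: an MBean name plus one of its attributes.
    record Metric(String objectName, String attribute) {}

    static final List<Metric> SELECTED = List.of(
            new Metric("org.apache.cassandra.db:type=StorageService", "LoadString"),
            new Metric("org.apache.cassandra.db:type=CompactionManager", "PendingTasks"));

    static Map<String, Object> poll(MBeanServerConnection conn, List<Metric> metrics) throws Exception {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Metric m : metrics) {
            values.put(m.objectName() + "/" + m.attribute(),
                    conn.getAttribute(new ObjectName(m.objectName()), m.attribute()));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection conn = connector.getMBeanServerConnection();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // Stand-in for publishing to a metrics aggregator.
                System.out.println(poll(conn, SELECTED));
            } catch (Exception e) {
                System.err.println("poll failed: " + e);
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}
```

Catching exceptions inside the task matters: an exception escaping a `scheduleAtFixedRate` task cancels all later runs.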
  25. Next directions
      Commit log backups
      DataStax Enterprise support
      ● security
      ● solr
      ● configuration
      C* 1.2 virtual nodes (a/k/a vnodes)
      auto scaling
  26. Thank you!
      Q & A time
      @jasobrown