An Introduction to Priam

An in-depth exploration of Priam, a sidekick application that helps Cassandra run inside Amazon's cloud.

Transcript

  • 1. Running Cassandra in the Cloud: An Introduction to Priam
    Jason Brown
    @jasobrown
    jasedbrown@gmail.com
    www.linkedin.com/in/jasedbrown
  • 2. About me
    ● Senior Software Engineer, Netflix
    ● Apache Cassandra committer
    ● E-Commerce Architect, Major League Baseball Advanced Media
    ● Wireless developer (J2ME and BREW)
  • 3. Netflix Databases
    ● Oracle in the datacenter
    ● Migrate to EC2
        ○ SimpleDB at first
        ○ Cassandra
  • 4. Cassandra, meet EC2
    ● shell script(s)
    ● python scripts
        ○ backup / restore
        ○ centralized model
        ○ installing Python 2.7 broke CentOS yum
        ○ first time we ran it in prod, my cluster was destroyed
  • 5. Hello, Priam!
    Priam, the father of Cassandra (http://en.wikipedia.org/wiki/Priam)
    Java web app
    ● Token Assignment
    ● Backup / Restore
    ● Multi-region support
    ● Configuration management
  • 6. Branches
    Each priam branch corresponds to a c* version
    ● priam 1.1 -> c* 1.1
    ● priam master -> c* 1.2
    ● ??? -> c* trunk
  • 7. Token Assignment
    ● Cassandra needs an assigned token
    ● Priam tries to
        ○ replace a dead instance
        ○ join as a new node
    ● External storage for known cluster members
        ○ host name / IP addr / instance id
        ○ token
        ○ region / availability zone
  • 8. Replacing a dead node
    ● Get known nodes in region/AZ from storage
        ○ {A, B, C}
    ● Get live nodes in region/AZ from ASG API
        ○ {A, B}
    ● Take over a dead node's token
        ○ C
    ● Uses c*'s replace_token
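The comparison on this slide is a set difference: nodes that are registered in storage but not reported live by the ASG. A minimal Java sketch of that step follows. `DeadNodeFinder` and `findDeadNode` are hypothetical names, not Priam's actual classes, and picking the first dead node in sorted order is an assumption made only to keep the result deterministic.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not Priam's code): finds a dead node whose token could be taken over. */
final class DeadNodeFinder {
    private DeadNodeFinder() {}

    /** Returns the first node, in sorted order, that is registered but no longer live. */
    static Optional<String> findDeadNode(Set<String> knownNodes, Set<String> liveNodes) {
        // Sorted copy: leaves the inputs untouched and makes the choice deterministic.
        Set<String> dead = new TreeSet<>(knownNodes);
        dead.removeAll(liveNodes);
        return dead.stream().findFirst();
    }

    public static void main(String[] args) {
        Set<String> known = new TreeSet<>(Arrays.asList("A", "B", "C")); // from external storage
        Set<String> live = new TreeSet<>(Arrays.asList("A", "B"));       // from the ASG API
        System.out.println(findDeadNode(known, live).orElse("none"));    // prints C
    }
}
```

An empty result means no dead node was found, which corresponds to the other path in the deck: joining as a new node.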
  • 9. Joining as a new node
    ● Calculate token
        ○ per-region offset
        ○ determine slot in region/AZ
        ○ derive token
  • 10. Region hash offset
    ● Each region needs a different base offset
        ○ avoids token collisions
    int hash = "us-east-1".hashCode();
  • 11. Determining slot
    A new node takes the next numbered slot in its AZ
    - looks for other registered nodes in SimpleDB (sdb)
  • 12. Node Slotting Layout
    +--------+--------+--------+
    | zone A | zone B | zone C |
    +--------+--------+--------+
    |   0    |   1    |   2    |
    +--------+--------+--------+
    |   3    |   4    |   5    |
    +--------+--------+--------+
    |   6    |   7    |   8    |
    +--------+--------+--------+
    |   9    |   10   |   11   |
    +--------------------------+
    (ascii art rocks)
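The layout above implies a simple rule: slots are dealt round-robin across zones, so a node's slot is its zone's index plus the zone count times the number of nodes already registered in that zone. A minimal Java sketch under that reading follows. `SlotSketch`, `nextSlot`, and the parameter names are hypothetical, and the sketch assumes the slots in a zone fill in order without gaps.

```java
/** Hypothetical helper (not Priam's code): derives the next slot number from the slide's layout. */
final class SlotSketch {
    private SlotSketch() {}

    /**
     * @param zoneIndex          0-based column of the node's zone (zone A = 0, zone B = 1, ...)
     * @param zoneCount          number of zones in the region
     * @param nodesAlreadyInZone nodes already registered in that zone
     */
    static int nextSlot(int zoneIndex, int zoneCount, int nodesAlreadyInZone) {
        if (zoneCount <= 0 || zoneIndex < 0 || zoneIndex >= zoneCount || nodesAlreadyInZone < 0) {
            throw new IllegalArgumentException("invalid zone or node count");
        }
        // Each full row of the table holds one slot per zone.
        return zoneIndex + zoneCount * nodesAlreadyInZone;
    }

    public static void main(String[] args) {
        // Zone B (index 1) of 3 zones, with slots 1 and 4 already taken.
        System.out.println(nextSlot(1, 3, 2)); // prints 7
    }
}
```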
  • 13. Here's your token
    MAXIMUM_TOKEN
        .divide(regionNodeCount)
        .multiply(mySlot)
        .add(regionHashOffset);
    example:
      100 / 10   (ten nodes in region)
      * 3        (in slot three)
      + 12       (region hash offset)
      = 42
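The chained calls on this slide read like `java.math.BigInteger` arithmetic. A runnable sketch under that assumption follows. `TokenSketch` and `tokenFor` are hypothetical names, and the maximum token of 100 is the slide's toy value; the real maximum depends on the partitioner and is not given in the deck.

```java
import java.math.BigInteger;

/** Hypothetical helper (not Priam's code): the slide's token formula with BigInteger. */
final class TokenSketch {
    private TokenSketch() {}

    static BigInteger tokenFor(BigInteger maximumToken, int regionNodeCount,
                               int mySlot, int regionHashOffset) {
        return maximumToken
                .divide(BigInteger.valueOf(regionNodeCount)) // width of one slot (integer division)
                .multiply(BigInteger.valueOf(mySlot))        // start of this node's slot
                .add(BigInteger.valueOf(regionHashOffset));  // per-region shift
    }

    public static void main(String[] args) {
        // The slide's example: ten nodes in the region, slot three, offset 12.
        System.out.println(tokenFor(BigInteger.valueOf(100), 10, 3, 12)); // prints 42
    }
}
```

Because every region divides the same ring into slots, two regions with the same node count would produce identical tokens without the offset term.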
  • 14. Seeds
    ● first node in each AZ, in every region
    ● except if current node is in the first slot
        ○ seeds cannot auto bootstrap
  • 15. Multi-region communication
    AWS security groups block ingress requests
    Intra-region: whitelist by other in-region SG
    Inter-region: whitelist by IP address
        ○ must use public IP address!
  • 16. Whitelisting IP addresses
    ● Seed nodes compare
        ○ IP addresses in the current region's SG
        ○ entries in SimpleDB
    ● Add new nodes to SG
    ● Remove dead nodes from SG
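The comparison on this slide amounts to two set differences: addresses registered in SimpleDB but missing from the security group get added, and addresses in the security group with no matching registration get removed. A minimal Java sketch of that reconciliation follows. `WhitelistDiff` and `reconcile` are hypothetical names, the addresses are documentation-range examples, and the actual AWS calls that would apply the result are deliberately left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not Priam's code): works out which IPs to add to or remove from an SG. */
final class WhitelistDiff {
    final Set<String> toAdd;
    final Set<String> toRemove;

    private WhitelistDiff(Set<String> toAdd, Set<String> toRemove) {
        this.toAdd = Collections.unmodifiableSet(toAdd);
        this.toRemove = Collections.unmodifiableSet(toRemove);
    }

    static WhitelistDiff reconcile(Set<String> inSecurityGroup, Set<String> registered) {
        Set<String> toAdd = new TreeSet<>(registered);
        toAdd.removeAll(inSecurityGroup);      // registered nodes not yet whitelisted
        Set<String> toRemove = new TreeSet<>(inSecurityGroup);
        toRemove.removeAll(registered);        // whitelisted addresses with no live registration
        return new WhitelistDiff(toAdd, toRemove);
    }

    public static void main(String[] args) {
        Set<String> sg = new TreeSet<>(Arrays.asList("203.0.113.10", "203.0.113.11"));
        Set<String> registered = new TreeSet<>(Arrays.asList("203.0.113.11", "203.0.113.12"));
        WhitelistDiff diff = reconcile(sg, registered);
        System.out.println("add " + diff.toAdd + ", remove " + diff.toRemove);
    }
}
```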
  • 17. [Diagram: regions us-east-1 and eu-west-1 side by side. SimpleDB sits in us-east-1. Node c* 1 sits behind SecGrp 1 in us-east-1; node c* 2 sits behind SecGrp 2 in eu-west-1.]
  • 18. Backup
    Two types:
    ● Snapshot
        ○ invokes nodetool snapshot
        ○ once a day, cron-like
    ● Incremental
        ○ copy all newly flushed sstables
  • 19. Backup location
    Upload to S3 bucket in same region
    Bucket lifecycle rules
    ● configure TTL for data
  • 20. Backup path
    Bucket: netflix-cassandra-data
    Path: base dir / region / cluster name / token / snapshot time / [SNP | SST | META] / keyspace / column family / data file
    example:
      test_backup/us-east-1/cass_jasobrown/42/1234567/SNP/jasobrown/dog/jasobrown-dog-ja-1-Data.db
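The path layout on this slide can be expressed as a plain join of its nine components. A minimal Java sketch follows. `BackupPathSketch`, `remotePath`, and the `BackupType` enum are hypothetical names; the three enum constants are taken from the slide's `[SNP | SST | META]` segment, and their meanings are not spelled out in the deck.

```java
/** Hypothetical helper (not Priam's code): builds an object key in the slide's layout. */
final class BackupPathSketch {
    /** The three path markers listed on the slide. */
    enum BackupType { SNP, SST, META }

    private BackupPathSketch() {}

    static String remotePath(String baseDir, String region, String clusterName, String token,
                             String snapshotTime, BackupType type, String keyspace,
                             String columnFamily, String dataFile) {
        // Order follows the slide: base dir / region / cluster / token / time / type / ks / cf / file
        return String.join("/", baseDir, region, clusterName, token, snapshotTime,
                type.name(), keyspace, columnFamily, dataFile);
    }

    public static void main(String[] args) {
        // Reproduces the slide's example path.
        System.out.println(remotePath("test_backup", "us-east-1", "cass_jasobrown", "42",
                "1234567", BackupType.SNP, "jasobrown", "dog", "jasobrown-dog-ja-1-Data.db"));
    }
}
```

Putting the token ahead of the snapshot time groups all backups for one position on the ring under a single prefix.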
  • 21. Restore
    ● best with same size cluster as source
    ● best if tokens match with source
    Uses (besides the obvious)
    ● prod to test refresh
    ● reproduce prod data problems
    ● incremental restore - WIP
  • 22. Configuration Management
    Control aspects of priam and c*
    ● yaml
    ● startup script(s) env values
    Netflix needs this as we have ~55 production clusters, with slightly different configs
  • 23. So, does Netflix actually use Priam?
    55 production clusters, > 750 nodes
    Internal extensions
    ● Hook into internal DNS, properties systems
    ● Alternative storage to SimpleDB
    ● BI messaging integration - WIP
    ● C* JMX monitoring
  • 24. Monitoring
    ● Poll C* every 60 seconds
    ● selected JMX metrics
    ● publish to internal metrics aggregator
        ○ currently uses Netflix's OSS Servo library (github.com/Netflix/servo)
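The polling loop on this slide can be approximated with the JDK's own JMX and scheduling classes. The sketch below is an illustration only, not Netflix's monitor and not Servo: `JmxPollerSketch`, `readAttribute`, and `startPolling` are hypothetical names, the publisher is a plain callback standing in for the metrics aggregator, and the demo reads the JVM's standard `java.lang:type=Threading` MBean because the deck does not name the Cassandra MBeans it selects.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical sketch: reads one JMX attribute on a fixed schedule and hands it to a publisher. */
final class JmxPollerSketch {
    private JmxPollerSketch() {}

    static Object readAttribute(MBeanServerConnection conn, String objectName, String attribute)
            throws Exception {
        return conn.getAttribute(new ObjectName(objectName), attribute);
    }

    static ScheduledExecutorService startPolling(MBeanServerConnection conn, String objectName,
                                                 String attribute, long periodSeconds,
                                                 BiConsumer<String, Object> publisher) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                publisher.accept(attribute, readAttribute(conn, objectName, attribute));
            } catch (Exception e) {
                // An uncaught exception would cancel all later runs, so log and carry on.
                System.err.println("poll failed: " + e);
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in target: this JVM's own MBean server rather than a remote Cassandra node.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        ScheduledExecutorService scheduler = startPolling(local, "java.lang:type=Threading",
                "ThreadCount", 60, (name, value) -> System.out.println(name + "=" + value));
        Thread.sleep(1000);      // let the first poll run
        scheduler.shutdownNow(); // stop the demo
    }
}
```

Polling a remote node would need an `MBeanServerConnection` obtained through a JMX connector instead of the platform server; the rest of the loop stays the same.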
  • 25. Next directions
    Commit log backups
    DataStax Enterprise support
    ● security
    ● solr
    ● configuration
    c* 1.2 virtual nodes (a/k/a vnodes)
    auto scaling
  • 26. Thank you! Q & A time @jasobrown