The document discusses options for analyzing semi-structured event data at Coursera. It considers Hive, Pig, and Scalding. Scalding uses Scala and allows joining different data sources and expressing multiple map-reduce jobs in a succinct way. However, it requires learning Scala. An example shows loading event, course, and topic data and joining them to analyze relationships between the data.
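The three-way join described above can be sketched in plain Python rather than Scalding; the field names below are illustrative, not Coursera's actual schema.

```python
# Hypothetical sketch of joining event, course, and topic data.
# In Scalding this would be expressed as pipe joins compiled to
# map-reduce jobs; here plain dict lookups stand in for the joins.
events = [{"course_id": 1, "user": "a"}, {"course_id": 2, "user": "b"}]
courses = {1: {"topic_id": 10, "name": "ML"}, 2: {"topic_id": 20, "name": "DB"}}
topics = {10: "Data Science", 20: "Systems"}

def join_events(events, courses, topics):
    """Join each event to its course, then to the course's topic."""
    joined = []
    for e in events:
        course = courses[e["course_id"]]
        joined.append({
            "user": e["user"],
            "course": course["name"],
            "topic": topics[course["topic_id"]],
        })
    return joined

print(join_events(events, courses, topics))
```

In Scalding the same logic is a couple of `joinWithSmaller` calls on pipes, which is the succinctness the summary refers to.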
Cassandra is used for real-time bidding in online advertising. It processes billions of bid requests per day with low latency requirements. Segment data, which assigns product or service affinity to user groups, is stored in Cassandra to reduce calculations and allow users to be bid on sooner. Tuning the cache size and understanding the active dataset helps optimize performance.
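The precomputed-segment pattern can be sketched as follows; a plain dict stands in for the Cassandra table, and all names are illustrative. The point is that segment membership is assigned offline, so a bid decision reduces to a single key lookup instead of a per-request calculation.

```python
# Minimal sketch of offline segment assignment + bid-time lookup.
segment_table = {}  # user_id -> set of segment labels (Cassandra stand-in)

def assign_segments(user_id, segments):
    """Offline job: write a user's computed segments to the store."""
    segment_table.setdefault(user_id, set()).update(segments)

def bid_eligible(user_id, campaign_segments):
    """Bid time: one lookup decides eligibility under latency budget."""
    return bool(segment_table.get(user_id, set()) & set(campaign_segments))

assign_segments("u1", {"autos", "travel"})
print(bid_eligible("u1", {"travel"}))  # True
print(bid_eligible("u2", {"travel"}))  # False
```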
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...Instaclustr
This document describes Instaclustr's implementation of using Apache Spark on Apache Cassandra to monitor over 600 servers running Cassandra and collect metrics over time for tuning, alerting, and automated response systems. Key aspects of the implementation include writing data in 5 minute buckets to Cassandra, using Spark to efficiently roll up the raw data into aggregated metrics on those time intervals, and presenting the data. Optimizations that improved performance included upgrading Cassandra version and leveraging its built-in aggregates in Spark, reducing roll-up job times by 50%.
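The 5-minute bucketing and roll-up pattern described above can be sketched in plain Python; the metric values and the choice of averaging as the aggregate are illustrative, not Instaclustr's actual pipeline.

```python
# Sketch of bucketing raw metric samples into 5-minute windows and
# rolling them up, as the Spark job does over data stored in Cassandra.
from collections import defaultdict

BUCKET_SECONDS = 300  # 5-minute buckets

def bucket_of(ts):
    """Map a Unix timestamp to the start of its 5-minute bucket."""
    return ts - (ts % BUCKET_SECONDS)

def roll_up(samples):
    """Aggregate (timestamp, value) samples into per-bucket averages."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[bucket_of(ts)].append(value)
    return {b: sum(v) / len(v) for b, v in buckets.items()}

samples = [(0, 10.0), (120, 20.0), (310, 30.0)]
print(roll_up(samples))  # {0: 15.0, 300: 30.0}
```

Writing raw data keyed by bucket means each roll-up job reads only the partitions for its time interval rather than scanning the whole table.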
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016DataStax
A deep learning startup has a requirement for a robust and scalable data architecture. Training a Deep Neural Network requires 10s-100s of millions of examples consisting of data and metadata. In addition to training it is necessary to support test/validation, data exploration and more traditional data science analytics workloads. As a startup we have minimal resources and an engineering team of 1.
Cassandra, Spark and Kafka running on Mesos in AWS is a scalable architecture that is fast and easy to set up and maintain to deliver a data architecture for Deep Learning.
About the Speaker
Andrew Jefferson VP Engineering, Tractable
A software engineer specialising in realtime data systems. I've worked at companies from Startups to Apple on applications ranging from Ticketing to Genetics. Currently building data systems for training and exploiting Deep Neural Networks.
Co-Founder and CTO of Instaclustr, Ben Bromhead's presentation at the Cassandra Summit 2016, in San Jose.
This presentation will show how to create truly elastic Cassandra deployments on AWS, allowing you to scale and shrink your large Cassandra deployments multiple times a day. Leveraging a combination of EBS-backed disks, JBOD, token pinning, and our previous work on bootstrapping from backups, you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
These are the slides from the intensive Cassandra Workshop I held in Madrid as a Meetup: http://www.meetup.com/Madrid-Cassandra-Users/events/225944063/ They cover all the Cassandra core concepts and the basic data modelling ones to get up and running with Cassandra.
Hello Cronies,
Here are the slides from our recent meetup.
Title: It's about Time: Deep dive into event store using Apache Cassandra
Big data At-A-Glance
· What is Big data?
· What we have seen so far in AJM Bigdata series?
· Refresher/Overview of basic terminology
· Where it is? Am I using it?
Introduction to Apache Cassandra
· What, When and Why of Apache Cassandra
· Protocol, Queries, Architecture and everything else
· Who is using Apache Cassandra
· Interesting use cases of Apache Cassandra (Twitter, Disqus, etc.)
· Demo application walk-through
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS InsightScyllaDB
Originally using SAP Adaptive Server Enterprise (ASE), the GPS Insight team soon found that relational databases simply aren’t a match for high-volume machine data. To top it off, SAP ASE’s clustering technology proved cumbersome to manage and operate. In this presentation, you’ll learn about GPS Insight’s hybrid Scylla deployment that runs both on-premises and in an AWS datacenter. GPS Insight relies on Scylla to capture and analyze GPS data, offloading data from the RDBMS to Scylla for a hybrid analytics approach.
The document discusses Apache Cassandra, a distributed database management system designed to handle large amounts of data across many commodity servers. It was developed at Facebook and modeled after Google's Bigtable. The summary discusses key concepts like its use of consistent hashing to distribute data, support for tunable consistency levels, and focus on scalability and availability over traditional SQL features. It also provides an overview of how Cassandra differs from relational databases by not supporting joins, having an optional schema, and using a prematerialized and transaction-less model.
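The consistent-hashing idea mentioned above can be illustrated with a minimal sketch. Real Cassandra uses Murmur3 tokens, virtual nodes, and replication; this simplified version uses MD5 and one token per node just to show how keys map to owners.

```python
# Minimal consistent-hashing ring: hash nodes and keys onto the same
# token space; a key is owned by the next node clockwise on the ring.
import hashlib
from bisect import bisect

def token(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # One token per node; sorted so we can binary-search the ring.
        self.ring = sorted((token(n), n) for n in nodes)

    def owner(self, key):
        """Walk clockwise from the key's token to the next node token."""
        tokens = [t for t, _ in self.ring]
        i = bisect(tokens, token(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("some-row-key"))
```

Because only keys between a departed node's token and its predecessor's token move, adding or removing a node redistributes a small fraction of the data, which is what enables Cassandra's incremental scalability.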
Webinar: Getting Started with Apache CassandraDataStax
Would you like to learn how to use Cassandra but don’t know where to begin? Want to get your feet wet but you’re lost in the desert? Longing for a cluster when you don’t even know how to set up a node? Then look no further! Rebecca Mills, Junior Evangelist at Datastax, will guide you in the webinar “Getting Started with Apache Cassandra...”
You'll get an overview of Planet Cassandra’s resources to get you started quickly and easily. Rebecca will take you down the path that's right for you, whether you are a developer or administrator. Join if you are interested in getting Cassandra up and working in the way that suits you best.
These are the slides from my talk at Hulu in March 2015 discussing Apache Spark & Cassandra. I cover the evolution of data from a single machine to RDBMS (MySQL is the primary example) to big data systems.
On the Spark side, I covered batch jobs, streaming, Apache Kafka, an introduction to machine learning, clustering, logistic regression and recommendations systems (collaborative filtering).
The talk was recorded and is available on youtube: https://www.youtube.com/watch?v=_gFgU3phogQ
We run multiple DataStax Enterprise clusters in Azure, each holding 300 TB+ of data, to deeply understand Office 365 users. In this talk, we will deep-dive into some of the key challenges we faced and the takeaways from running these clusters reliably over a year. To name a few: process crashes, ephemeral SSDs contributing to data loss, slow streaming between nodes, mutation drops, compaction strategy choices, schema updates when nodes are down, and backup/restore. We will briefly talk about our contributions back to Cassandra, and our path forward using network-attached disks offered via Azure premium storage.
About the Speaker
Anubhav Kale Sr. Software Engineer, Microsoft
Anubhav is a senior software engineer at Microsoft. His team is responsible for building a big data platform using Cassandra, Spark, and Azure to generate per-user insights about Office 365 users.
Introduction to Real-Time Analytics with Cassandra and HadoopPatricia Gorla
This presentation examines the benefits of using Cassandra to store data, and how the Hadoop ecosystem can fit in to add aggregation functionality to your cluster.
Accompanying code can be found online at bit.ly/1aB8Jy8.
Talk delivered at StrataConf + Hadoop World 2013.
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...DataStax
Cassandra is a distributed database with features including, but not limited to, secondary indexes, UDFs, and materialized views, and with not-so-strict hardware requirements.
It is important to use those features and select hardware correctly to make sure the use of Cassandra in your business can be as painless as possible.
I will address how these features are used in the wrong way, how hardware should be selected, and how to make Cassandra work in the best possible way.
Learning Objective #1:
Learn that Cassandra hardware requirements exist (and why), and the shortcomings of some of its features (secondary indexes, compaction strategies, etc.).
Learning Objective #2:
The most misused features and common hardware errors, and how they might seem harmless at first (on either a small cluster or even a single node).
Learning Objective #3:
How to correctly use Cassandra and its features and achieve trouble-free operation.
About the Speaker
Carlos Rolo Cassandra Consultant, Pythian
Carlos Rolo is a Cassandra MVP with deep expertise in distributed architecture technologies. Carlos is driven by challenge and enjoys opportunities to discover new things. He has become known and trusted by customers and colleagues for his ability to understand complex problems and to work well under pressure. When Carlos isn't working he can be found playing water polo or enjoying his local community.
Cassandra Summit 2014: Apache Cassandra Best Practices at EbayDataStax Academy
Presenter: Feng Qu, Principal DBA at eBay
Cassandra has been adopted widely at eBay in recent years and is used by many end-user-facing applications. I will introduce the best practices we have built over time around system design, capacity planning, deployment automation, monitoring integration, performance analysis, and troubleshooting. I will also share our experience working with DataStax support to provide a highly available, highly scalable data store that fits into eBay's infrastructure.
DynamoDB is a scalable NoSQL database service provided by Amazon that allows developers to purchase throughput rather than storage. It automatically spreads data and traffic across servers and SSDs for predictable performance. While it does not automatically scale, administrators can request more throughput. DynamoDB integrates with other AWS services like EMR for Hadoop and Redshift for data warehousing.
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...Amazon Web Services
Learn how to optimize your NoSQL database on AWS for cost, efficiency, and scale. NoSQL databases are great for modern datasets that require simplicity in design, handle structured and unstructured data, scale horizontally, and offer finer control over availability. With AWS, you have options for running NoSQL on Amazon EC2 with Amazon EBS or on Amazon DynamoDB. This webinar will dive deep into best practices and architectural considerations for designing and managing NoSQL databases like Cassandra, MongoDB, CouchDB, and Aerospike on EC2 and EBS. We will share best practices around instance and volume selection, provide performance tuning hints, and describe cost optimization techniques.
Learning Objectives:
• Learn about common NoSQL database options and use cases for Cassandra, MongoDB, CouchDB, and Aerospike
• Review best practices around architecting on AWS for different NoSQL databases
• Understand the cost vs. performance of different Amazon EC2 instances and Amazon EBS volumes
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...DataStax Academy
The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, that live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. But there are serious advantages to many of the new tools, and this presentation will give an analysis of the current state–including pros and cons as well as what’s needed to bootstrap and operate the various options.
About Robbie Strickland, Software Development Manager at The Weather Channel
Robbie works for The Weather Channel’s digital division as part of the team that builds backend services for weather.com and the TWC mobile apps. He has been involved in the Cassandra project since 2010 and has contributed in a variety of ways over the years; this includes work on drivers for Scala and C#, the Hadoop integration, heading up the Atlanta Cassandra Users Group, and answering lots of Stack Overflow questions.
What are the challenges of running Apache Cassandra on Amazon EC2? Is it a good idea?
In this presentation, we explore reasons for and against running the distributed database Cassandra on EC2. We look at the I/O performance of EC2 and
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.Natalino Busa
Today’s services rely on massive amounts of data to be processed, but are required at the same time to be fast and responsive. Building fast services on big data batch-oriented frameworks is definitely a challenge. At ING, we have worked on a stack that can alleviate this problem. Namely, we materialize data models by map-reducing Hadoop queries from Hive to Cassandra. Instead of sinking the results back to HDFS, we propagate the results into Cassandra key-value tables. Those Cassandra tables are finally exposed via an HTTP API front-end service.
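The materialization pattern described above can be sketched as follows; a plain dict stands in for the Cassandra table, and the row shapes are illustrative, not ING's actual schema. Instead of writing batch results back to HDFS, aggregates are pushed into a key-value table that the HTTP front-end can serve with single-key reads.

```python
# Sketch: materialize batch-query output into a key-value serving table.
raw_rows = [("alice", 3), ("bob", 1), ("alice", 2)]  # e.g. Hive output

def materialize(rows):
    """Fold batch results into a key-value table (Cassandra stand-in)."""
    table = {}
    for key, count in rows:
        table[key] = table.get(key, 0) + count
    return table

serving_table = materialize(raw_rows)
# An HTTP front-end then answers GET /counts/alice with one lookup:
print(serving_table["alice"])  # 5
```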
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBAthiq Ahamed
This document provides a summary of a presentation that benchmarked the performance of three popular NoSQL databases: Apache Cassandra, Apache HBase, and MongoDB. It describes the architectures and data models of each database. Benchmark tests were run using the Yahoo Cloud Serving Benchmark and found that Apache Cassandra consistently outperformed the other databases across different workloads in terms of load time, read and write performance, and latency. The presentation emphasizes the importance of benchmarks for evaluating NoSQL database performance and choosing the right database based on application requirements.
Empowering the AWS DynamoDB™ application developer with AlternatorScyllaDB
Getting started with AWS DynamoDB™ is famously easy, but as an application grows and evolves it often starts to struggle with DynamoDB’s limitations. We introduce Scylla’s Alternator, which provides the same API as DynamoDB but aims to empower the application developer. In this presentation we will survey some of Alternator’s developer-centered features: Alternator lets you test and eventually deploy your application anywhere, on any public cloud or private cluster. It efficiently supports multiple tables so it does not require difficult single-table design. Finally, Alternator provides the developer with strong observability tools. The insights provided by these tools can detect bottlenecks, improve performance and even lower its cost.
A comprehensive introduction to the Big Data world in the AWS cloud: Hadoop, streaming, batch, Kinesis, DynamoDB, HBase, EMR, Athena, Hive, Spark, Pig, Impala, Oozie, Data Pipeline, security, cost, and best practices.
This document provides an overview of Amazon DynamoDB including key concepts like tables, data types, indexes, scaling, data modeling best practices, and example scenarios. It discusses how to design DynamoDB tables for different data access patterns including 1:1, 1:N, and N:M relationships. It also provides recommendations for modeling time series data, popular fast-changing items, and messaging applications.
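The 1:N access-pattern modeling mentioned above can be sketched like this: the N side of the relationship is stored under one partition key with a sort key, so all of a customer's orders come back from a single Query. A plain-Python list stands in for the DynamoDB table, and the attribute names are illustrative.

```python
# Sketch of DynamoDB-style 1:N modeling with partition + sort keys.
items = [
    {"pk": "CUSTOMER#1", "sk": "ORDER#2023-01-05", "total": 40},
    {"pk": "CUSTOMER#1", "sk": "ORDER#2023-02-11", "total": 25},
    {"pk": "CUSTOMER#2", "sk": "ORDER#2023-01-09", "total": 10},
]

def query(partition_key, sk_prefix=""):
    """Emulate a DynamoDB Query: fixed partition key, optional
    sort-key prefix condition, results ordered by sort key."""
    return sorted(
        (i for i in items
         if i["pk"] == partition_key and i["sk"].startswith(sk_prefix)),
        key=lambda i: i["sk"],
    )

print([o["sk"] for o in query("CUSTOMER#1", "ORDER#")])
# ['ORDER#2023-01-05', 'ORDER#2023-02-11']
```

Because the partition key is fixed per query, the read hits a single item collection; a Scan across partitions is never needed for this access pattern.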
This document provides an agenda and overview of Big Data Analytics using Spark and Cassandra. It discusses Cassandra as a distributed database and Spark as a data processing framework. It covers connecting Spark and Cassandra, reading and writing Cassandra tables as Spark RDDs, and using Spark SQL, Spark Streaming, and Spark MLLib with Cassandra data. Key capabilities of each technology are highlighted such as Cassandra's tunable consistency and Spark's fault tolerance through RDD lineage. Examples demonstrate basic operations like filtering, aggregating, and joining Cassandra data with Spark.
Scaling web applications with cassandra presentationMurat Çakal
This document provides an introduction and overview of Cassandra, including:
- Cassandra is a distributed database modeled after Amazon Dynamo and Google Bigtable that is highly scalable and fault tolerant.
- It is used by many large companies for applications that require fast writes, high availability, and elastic scalability.
- Cassandra's data model uses a column-oriented design organized into keyspaces, column families, rows, and columns. It also supports super columns.
- The document discusses Cassandra's features like tunable consistency levels, replication, and its data distribution using consistent hashing.
- An overview of Cassandra's Thrift API and basic operations like get, batch mutate, and
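The consistent hashing mentioned above can be sketched in a few lines. This is a hedged illustration of the idea only (node names and hash choice are arbitrary), not Cassandra's actual partitioner:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a key onto a fixed integer ring using MD5 (illustrative choice).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hash ring: each node owns the arc of the ring
    between the previous node's token and its own."""

    def __init__(self, nodes):
        self.tokens = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        positions = [t for t, _ in self.tokens]
        # First token at or after the key's hash; wrap around at the end.
        i = bisect.bisect(positions, _hash(key)) % len(self.tokens)
        return self.tokens[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # deterministic for a given ring
```

The payoff of this scheme is that adding or removing one node only moves the keys on that node's arc, rather than reshuffling everything as a naive `hash(key) % n` would.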
This talk will walk through the journey of Cassandra at Netflix. It will go into 3-4 specific use cases where Cassandra stands out from the rest of the data stores and is being used at Netflix, bringing a great viewing experience to all customers globally. Roopa will go into the specifics of the data models being used, the places where Cassandra's strengths stand out, and the lessons they learned the hard way. Roopa will then share some of the best practices and the self-service platform used to cater Cassandra to their developers' needs.
Rafael Bagmanov, «Scala in a wild enterprise» (e-Legion)
This document discusses Scala adoption in the enterprise. It describes how Scala was used to build OpenGenesis, an open-source deployment orchestration tool that was successfully deployed in a large financial institution. While Scala works well with common J2EE patterns like Spring MVC, Spring, and JPA/Squeryl, there are challenges around hiring Scala developers and establishing coding standards. The greatest challenges are cultural and involve people.
Architecture best practices and mistakes to avoid (Elasticsearch)
Grow with confidence. From deploying a small development node for application search to managing a large deployment of hundreds of nodes, our Elastic experts will tell you everything you need to know.
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison (Severalnines)
Galera Cluster for MySQL, Percona XtraDB Cluster and MariaDB Cluster (the three “flavours” of Galera Cluster) make use of the Galera WSREP libraries to handle synchronous replication. MySQL Cluster is the official clustering solution from Oracle, while Galera Cluster for MySQL is slowly but surely establishing itself as the de facto clustering solution in the wider MySQL ecosystem.
In this webinar, we will look at all these alternatives and present an unbiased view on their strengths/weaknesses and the use cases that fit each alternative.
This webinar will cover the following:
MySQL Cluster architecture: strengths and limitations
Galera Architecture: strengths and limitations
Deployment scenarios
Data migration
Read and write workloads (Optimistic/pessimistic locking)
WAN/Geographical replication
Schema changes
Management and monitoring
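The agenda item on optimistic versus pessimistic locking is the heart of the Galera comparison: Galera certifies transactions at commit time instead of blocking on row locks, so the application must be prepared to retry on conflict. A minimal single-process sketch of that retry pattern (all names here are illustrative, not any MySQL or Galera API):

```python
class ConflictError(Exception):
    pass

class VersionedRow:
    """A row with a version counter, standing in for a database record."""
    def __init__(self, value):
        self.value = value
        self.version = 0

def commit(row, expected_version, new_value):
    # Optimistic concurrency: the write succeeds only if nobody else
    # committed a change since we read the row.
    if row.version != expected_version:
        raise ConflictError("row changed since read")
    row.value = new_value
    row.version += 1

def add_to_balance(row, delta, max_retries=3):
    for _ in range(max_retries):
        snapshot_version, snapshot_value = row.version, row.value
        try:
            commit(row, snapshot_version, snapshot_value + delta)
            return row.value
        except ConflictError:
            continue  # re-read the row and retry the whole transaction
    raise ConflictError("gave up after retries")

balance = VersionedRow(100)
add_to_balance(balance, -30)  # succeeds on the first attempt here
```

Under pessimistic locking the conflict would instead be prevented up front by holding a row lock for the duration of the transaction, at the cost of blocking other writers.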
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016 (DataStax)
Most web applications start out with a Postgres database, and it serves the application very well for an extended period of time. Depending on the type of application, the data model will have a table that tracks some kind of state for either objects in the system or the users of the application. Common names for this table include logs, messages, or events. The growth in the number of rows in this table is not linear as traffic to the app increases; it's typically exponential.
Over time, the state table comes to dominate the data volume in Postgres (think terabytes) and becomes increasingly hard to query. This use case can be characterized as the one-big-table problem. In this situation, it makes sense to move that table out of Postgres and into Cassandra. This talk will walk through the conceptual differences between the two systems, a bit of data modeling, as well as advice on making the conversion.
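The one-big-table events workload maps onto Cassandra's data model by bucketing the partition key, so that no single partition grows without bound as traffic increases. A sketch of one common bucketing scheme (the function and key shape are hypothetical, not taken from the talk):

```python
from datetime import datetime, timezone

def event_partition_key(user_id: str, ts: datetime, bucket_hours: int = 24) -> tuple:
    """Derive a (user_id, time_bucket) partition key so one user's events
    are spread across bounded partitions instead of one ever-growing row."""
    epoch_hours = int(ts.replace(tzinfo=timezone.utc).timestamp()) // 3600
    return (user_id, epoch_hours // bucket_hours)

# Events from the same UTC day land in the same partition...
k1 = event_partition_key("u1", datetime(2016, 9, 7, 9, 0))
k2 = event_partition_key("u1", datetime(2016, 9, 7, 21, 0))
# ...and the next day's events start a new partition.
k3 = event_partition_key("u1", datetime(2016, 9, 8, 9, 0))
```

The event timestamp then becomes the clustering column inside each bucket, which keeps range scans over a day's events cheap while bounding partition size.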
About the Speaker
Rimas Silkaitis Product Manager, Heroku
Rimas currently runs Product for Heroku Postgres and Heroku Redis but the common thread throughout his career is data. From data analysis, building data warehouses and ultimately building data products, he's held various positions that have allowed him to see the challenges of working with data at all levels of an organization. This experience spans the smallest of startups to the biggest enterprises.
This document discusses using Scala in an enterprise setting. It describes how Scala can be used to build a typical J2EE stack, with Spring for the web and service layers and Squeryl for the data access layer. While Scala's integration with Spring dependency injection works well, using Scala with Spring templates and Aspect-Oriented Programming leaves room for improvement. Squeryl provides benefits as a lightweight ORM, such as good support for Scala collections, but has downsides like hard-to-use native SQL and performance issues. Overall, adopting Scala in an enterprise requires overcoming challenges like hiring Scala developers and establishing code standards and conventions.
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the SparklyR package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
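The embarrassingly-parallel shape the talk describes in R carries over directly to other languages: independent iterations can simply be mapped across workers. A Python sketch using a thread pool to run independent Monte Carlo trials (the rough R equivalents would be `parallel::parLapply` or `foreach`; for CPU-bound work in Python a process pool would be the better fit):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def trial(seed: int) -> float:
    """One independent simulation run: a Monte Carlo estimate of pi."""
    rng = random.Random(seed)
    n = 10_000
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))
    return 4.0 * hits / n

# Each trial depends only on its own seed, so the iterations can run
# concurrently with no coordination -- the "embarrassingly parallel" case.
with ThreadPoolExecutor(max_workers=4) as pool:
    estimates = list(pool.map(trial, range(8)))

mean_estimate = sum(estimates) / len(estimates)
```

Seeding each worker independently keeps the runs reproducible, which matters just as much for simulations and cross-validation in R.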
MySQL Cluster Scaling to a Billion Queries (Bernd Ocklin)
MySQL Cluster is a distributed database that provides extreme scalability, high availability, and real-time performance. It uses an auto-sharding and auto-replicating architecture to distribute data across multiple low-cost servers. Key benefits include scaling reads and writes, 99.999% availability through its shared-nothing design with no single point of failure, and real-time responsiveness. It supports both SQL and NoSQL interfaces to enable complex queries as well as high-performance key-value access.
Getting started with Spark & Cassandra by Jon Haddad of Datastax (Data Con LA)
Massively scalable, always on, and ridiculously fast. Apache Cassandra is the database chosen by Apple, Netflix, and 30 of the Fortune 100 to power their critical infrastructure. How do we analyze petabytes of data, whether in massive batches or as it's ingested via streaming with Apache Kafka? Enter Apache Spark. Challenging MapReduce head on, Apache Spark offers powerful constructs that make it possible to slice and dice your data, whether through machine learning, graph queries, or transformations familiar to people with functional programming backgrounds, such as map, filter, and reduce. Step away ready to rock with the most powerful distributed database, scalable messaging, and analytics platform on the planet.
Watch the video here
https://www.youtube.com/watch?v=X-FKmKc9hkI
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb... (Amazon Web Services)
Get a look under the hood: understand how to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve query delivery and overall database performance. You’ll also hear how the University of Technology Sydney (UTS) is using Redshift: utilizing Amazon Redshift gave UTS agility in dealing with data quality, the capacity to scale when required, and faster development processes through rapid provisioning of data warehouse environments.
Speaker: Ganesh Raja, Solutions Architect, Amazon Web Services with Susan Gibson, Manager, Data and Business Intelligence, UTS
Level: 300
Vinay Chella presents on Cassandra architecture and scalability at Netflix. Some key points include:
1) Netflix uses Cassandra to store 98% of streaming data. Cassandra clusters are managed using Priam to handle backups, configuration, and cluster operations.
2) Challenges in maintaining Cassandra clusters at scale are addressed through tools like Eunomia for monitoring and predictive analysis. Mantis provides real-time health monitoring through streaming cluster data.
3) Cassandra clusters are deployed on AWS for resilience across instances, availability zones, regions, and cloud providers. Priam handles tasks like automated token assignment and backups to S3 for disaster recovery.
Making (Almost) Any Database Faster and Cheaper with Caching (Amazon Web Services)
Redis is an in-memory database that can be used for caching to improve database performance. Amazon ElastiCache provides a fully managed Redis service on AWS. Using ElastiCache for caching provides benefits like 34% greater throughput, automatic operations management, high availability, and reliability compared to self-managed Redis. ElastiCache supports Redis data types and clustering to enable horizontal scaling for large datasets and high throughput workloads.
Making (Almost) Any Database Faster and Cheaper with Caching (Amazon Web Services)
Learn how to make your AWS databases up to 10x faster and up to 90% less expensive with Amazon ElastiCache for Redis. We’ll look at how to determine whether caching will benefit your database environment and show how to easily test and implement a high speed solution.
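The caching strategy ElastiCache is typically used for is cache-aside (lazy loading): read from the cache first, and on a miss fall back to the database, then populate the cache with a TTL. A minimal sketch with a plain dict standing in for Redis (the class and loader names are illustrative, not an AWS or Redis API):

```python
import time

class CacheAside:
    """Cache-aside (lazy loading) with a TTL; a dict stands in for Redis."""

    def __init__(self, loader, ttl_seconds=60):
        self.loader = loader            # fallback to the database on a miss
        self.ttl = ttl_seconds
        self._store = {}                # key -> (value, expiry time)
        self.hits = self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]             # fresh cached value
        self.misses += 1
        value = self.loader(key)        # e.g. a SQL query in real life
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

db_reads = []
def slow_db_lookup(key):
    db_reads.append(key)                # record each "database" access
    return key.upper()

cache = CacheAside(slow_db_lookup, ttl_seconds=60)
cache.get("user:1")   # miss: hits the database and fills the cache
cache.get("user:1")   # hit: served from memory, no database read
```

The TTL bounds staleness; the trade-off between TTL length and database load is exactly the tuning exercise the webinar describes.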
1) Netflix uses Apache Cassandra as its main data store and has hundreds of Cassandra clusters across multiple regions containing terabytes of customer data for services like viewing history and payments.
2) Maintaining and monitoring Cassandra at Netflix's scale presents challenges around configuration, availability across regions and availability zones, and operating Cassandra in public clouds.
3) Netflix addresses these challenges through tools like Priam for automated bootstrapping and backup/restore, monitoring through services like Mantis and Atlas, and capacity planning with tools like NDBench and Unomia.
Performance Testing: Scylla vs. Cassandra vs. DatastaxScyllaDB
Ticketmaster is part of Live Nation Entertainment, the world's leading live entertainment company. Learn why they went with Scylla after conducting performance testing between Scylla, Apache Cassandra and DataStax Enterprise.
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque... (Data Con LA)
Scylla is a new, open-source NoSQL data store with a novel design optimized for modern hardware, capable of 1.8 million requests per second per node, while providing Apache Cassandra compatibility and scaling properties. While conventional NoSQL databases suffer from latency hiccups, expensive locking, and low throughput due to low processor utilization, the Scylla design is based on a modern shared-nothing approach. Scylla runs multiple engines, one per core, each with its own memory, CPU and multi-queue NIC. The result is a NoSQL database that delivers an order of magnitude more performance, with less performance tuning needed from the administrator.
With extra performance to work with, NoSQL projects can have more flexibility to focus on other concerns, such as functionality and time to market. Come for the tech details on what Scylla does under the hood, and leave with some ideas on how to do more with NoSQL, faster.
Speaker bio
Don Marti is technical marketing manager for ScyllaDB. He has written for Linux Weekly News, Linux Journal, and other publications. He co-founded the Linux consulting firm Electric Lichen. Don is a strategic advisor for Mozilla, and has previously served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for Uselinux, Codecon, and LinuxWorld Conference and Expo.
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa... (Databricks)
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spark and Scala
Talk given by Reynold Xin at Scala Days SF 2015
In this talk, Reynold discusses the underlying techniques used to achieve high-performance sorting with Spark and Scala, among which are sun.misc.Unsafe, exploiting cache locality, and high-level resource pipelining.
Similar to Cassandra@Coursera: AWS deploy and MySQL transition (20)
29. Picking a machine
• Memory
• Save some for page cache!
Author: brutalSoCal
Licence: CC BY-NC-ND 2.0
30. On AWS
• Ephemeral disks.
• Please don’t use EBS. Really.
• IOPS usually the problem
• Instance sizes:
• spinning disk: m1.large, m1.xlarge, m2.4xlarge
• ssd: m3.xlarge, c3.2xlarge, i2.*
31. Set up the machine
• Lots of documentation / talks about this
• Recommended reading: Datastax guide [1]
[1] http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html
62. Data modeling consulting
• Build core team proficient at C* data modeling
• Available to consult for trickier use cases
63. Libraries / Patterns
• Abstract away simple (but common) use-cases
• Key-value storage
• Simple time series
• Maybe every developer won’t need deep C* knowledge?
• More radical: data as a service (e.g. STAASH)
STAASH: https://github.com/Netflix/staash
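A key-value facade of the kind slide 63 describes can be very small. This sketch uses an in-memory dict as the backend; a real implementation would issue INSERT/SELECT statements against a two-column Cassandra table via the driver (the class is illustrative, not Coursera's actual library):

```python
class KeyValueStore:
    """Thin key-value facade: application code sees get/put/delete and
    never writes CQL. The dict backend is a stand-in for a Cassandra
    (key, value) table accessed through the driver."""

    def __init__(self, backend=None):
        self._backend = backend if backend is not None else {}

    def put(self, key: str, value: bytes) -> None:
        self._backend[key] = value

    def get(self, key: str, default=None):
        return self._backend.get(key, default)

    def delete(self, key: str) -> None:
        self._backend.pop(key, None)

store = KeyValueStore()
store.put("session:abc", b"payload")
store.get("session:abc")      # returns the stored bytes
store.delete("session:abc")
store.get("session:abc")      # now absent, returns None
```

Hiding the schema behind an interface like this is what lets most developers skip deep C* knowledge, and it is one step short of the data-as-a-service approach (e.g. STAASH) the slide mentions.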
64. It’s a long road
but we’ll get there…
Author: Carissa Rogers
License: CC BY 2.0