MARCH 27, 2017
How to Process 50 Billion Monthly Messages with Full Availability & Performance
Yuan Ren, Head of Data Science
Aug 15, 2017
Agenda
• What mParticle is
• What problem we set out to solve
• Our journey from Cassandra to Scylla
The single, secure API for Growth
mParticle provides a single, secure API to integrate and orchestrate your entire
marketing stack so that brands can enhance analytics and optimize acquisition,
engagement, and monetization in a multi-screen world.
Trusted by the very best brands
mParticle was created to solve modern data challenges
(Platform diagram: inputs from Platforms and Feeds; core capabilities of Identity Resolution, Profile Enrichment, Audience Builder, Rules & Filters, and Security; outputs to Events, Data Warehouses, and Audiences.)
mParticle Platform Stats
• Monthly unique users/devices by major platforms
  • 350M iOS devices monthly
  • 1B Android devices monthly
• Monthly data volume
  • 50B batches
  • 100B events
  • 150TB of data in binary format added to S3
Need a Near Real Time Data Store
• Data streams in and out in near real time
• Handle various types of data load
• Full availability
Data Schema & Ingestion Rate
• Each message is a batch of events, avg 20 kB per message (see the sketch below)
  • An array of events
  • User info
  • App info
  • Device info
• Data is sent to mParticle from user devices and server-to-server (S2S)
  • Avg 20K msg/sec
  • Peak 40K msg/sec
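For illustration, a minimal sketch of what one ingested message might look like; the field names below are assumptions for clarity, not mParticle's actual schema:

# Hypothetical shape of one ingested message (a batch of events); field names
# are illustrative only, not mParticle's real schema.
batch = {
    "user_info":   {"mparticle_userid": 1234567890, "country": "US"},
    "app_info":    {"app_name": "example-app", "app_version": "3.2.1"},
    "device_info": {"platform": "iOS", "os_version": "10.3"},
    "events": [  # the array of events carried by this ~20 kB message
        {"name": "app_open", "timestamp_ms": 1502790000000},
        {"name": "purchase", "timestamp_ms": 1502790030000, "amount": 9.99},
    ],
}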
Data Processing Notes
• Real time processing. On every received batch
•Query all historical batches/events of the user, and process them through a rule
engine
•Read latency must be low
•Write the received batch into the database
• Batch processing
•Whenever there’s a change in rules engine, we query all historical data and reprocess
them
(Diagram: data arrives from the SDK and S2S, is used for user profile enrichment and evaluations based on users' full history, and feeds Events, Data Warehouses, and Audiences.)
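A minimal sketch of the real-time path, assuming the DataStax Python driver (which also speaks to Scylla) and the per-client table shown on the next slide; the contact points, keyspace, table name, and rules-engine hook are hypothetical:

from cassandra.cluster import Cluster  # DataStax Python driver; also speaks to Scylla

cluster = Cluster(["10.0.0.1", "10.0.0.2"])   # hypothetical contact points
session = cluster.connect("events_ks")        # hypothetical keyspace

# Prepared statements against a per-client table (schema on the next slide);
# "client_acme" stands in for the real per-client table name.
read_history = session.prepare(
    "SELECT time, eventdata FROM client_acme WHERE userid = ?")
write_batch = session.prepare(
    "INSERT INTO client_acme (userid, time, eventdata) VALUES (?, ?, ?)")

def on_batch_received(userid, batch_time, eventdata):
    # Real-time path: read the user's full history, run the rules engine,
    # then persist the newly received batch.
    history = session.execute(read_history, (userid,))
    evaluate_rules(userid, history, eventdata)   # hypothetical rules-engine hook
    session.execute(write_batch, (userid, batch_time, eventdata))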
Cassandra got us started
• High read and write throughput
• Horizontal scalability
• A widely adopted technology with a proven track record
• Low cost through the DataStax startup program
• We used Cassandra until Q4 2016
Cassandra Data Model
• Data is partitioned by client, i.e., separate tables per client
• Each table is partitioned by mParticle userid
• For real-time processing, we read/write by mParticle userid
• For batch processing, we split the read into partitions and query partition by partition (example queries follow the schema below)
CREATE TABLE {table_per_client} (
    userid    bigint,
    time      timestamp,
    eventdata blob,
    PRIMARY KEY (userid, time)
) WITH CLUSTERING ORDER BY (time DESC);
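A minimal sketch of the two access patterns against this table, again with the DataStax Python driver; slicing the batch read by Murmur3 token ranges is an assumption about how "split the read into partitions" could be done, and the table name and reprocessing hook are hypothetical:

from cassandra.cluster import Cluster

session = Cluster(["10.0.0.1"]).connect("events_ks")   # hypothetical host/keyspace

# Real-time path: everything for one user, newest first (per the clustering order).
rows = session.execute(
    "SELECT time, eventdata FROM client_acme WHERE userid = %s", (1234567890,))

# Batch path: scan the table in slices of the Murmur3 token range, one query
# per slice; the slice count and the use of token ranges are assumptions here.
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1
num_slices = 256
step = (MAX_TOKEN - MIN_TOKEN) // num_slices
for i in range(num_slices):
    lo = MIN_TOKEN + i * step
    hi = MAX_TOKEN if i == num_slices - 1 else lo + step - 1
    for row in session.execute(
            "SELECT userid, time, eventdata FROM client_acme "
            "WHERE token(userid) >= %s AND token(userid) <= %s", (lo, hi)):
        reprocess(row)   # hypothetical rules-engine reprocessing hook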
Cassandra Stats (Sept 2016)
• We scale our systems up/down to meet our read/write latency requirements
• One Cassandra cluster running on 12 EC2 nodes
  • c3.8xlarge (32 vCPU, 60GB memory, 640GB SSD storage)
• Two factors with the biggest impact on latencies
  • Batch processing
  • S2S data loads
* Latency stats not available for 2016
Cassandra Stats (continued)
• We hit a bottleneck because of compactions
  • If any higher load was pushed to the cluster, compactions would get out of control and either crash the C* service or leave the cluster unresponsive
  • Having a backlog of compactions means read latencies are much worse than they could be
Cassandra Pain Points
• The amount of human labor involved in tuning
• Lack of affordable support from DataStax
• We consulted a third-party Cassandra consulting company, which turned out to be a bad experience
• We ended up with an overcomplicated setup that was hard to modify and scale
• At the end of our Cassandra journey, we had data-processing backlogs of up to 20 hours, on a good day
• It was just not a good fit for us at that time
Scylla POC
• Why Scylla?
  • Compatible with Cassandra; no code changes required
  • Rewritten in C++; we really don't like tuning the JVM
• POC process (see the diagram note and sketch below)
  • We engaged Scylla in a POC as soon as they released version 1.0
  • Tested with real data on the same hardware
  • Scylla beat Cassandra significantly in our case
    • Much lower compaction backlog
    • Ease of configuration
    • Self-tuning during installation
    • Highly responsive and knowledgeable support from Scylla engineers
* Scylla's website has more rigorous performance comparisons between C* and Scylla
(POC diagram: the live C* cluster alongside Test SQS 1 and Test SQS 2 feeding a test Cassandra cluster and a test Scylla cluster.)
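A minimal sketch of how live traffic could be mirrored into the two test queues, as the diagram suggests; the region, queue URLs, and hook point are hypothetical, and this is not necessarily how mParticle wired its POC:

import json
import boto3   # AWS SDK for Python

sqs = boto3.client("sqs", region_name="us-east-1")   # hypothetical region

TEST_QUEUES = [   # hypothetical queue URLs, one feeding each test cluster
    "https://sqs.us-east-1.amazonaws.com/123456789012/test-cassandra-ingest",
    "https://sqs.us-east-1.amazonaws.com/123456789012/test-scylla-ingest",
]

def mirror_to_test_queues(message):
    # Duplicate a live message into both test queues so both clusters
    # receive identical real traffic on identical hardware.
    body = json.dumps(message)
    for url in TEST_QUEUES:
        sqs.send_message(QueueUrl=url, MessageBody=body)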
Cassandra to Scylla Migration
• Essentially we only needed to migrate data
  • No code change
  • No data model change
    • Except that Scylla helped us pick a better data model that should've been used in C* too
• Migration steps (a per-client sketch follows below)
  • Migrated one client at a time
  • Temporarily paused data ingestion for that client
  • Migrated the client's data from C* to Scylla
  • Resumed data ingestion
• After migration, Scylla immediately kept up with our data loads in real time, with at most minimal backlog
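A minimal per-client migration sketch, assuming a straight row-by-row copy through the same driver; the hosts, keyspace, table name, and ingestion-control hooks are hypothetical, and a bulk path such as sstableloader would work as well:

from cassandra.cluster import Cluster

cass   = Cluster(["cassandra-node1"]).connect("events_ks")   # hypothetical hosts/keyspace
scylla = Cluster(["scylla-node1"]).connect("events_ks")

def migrate_client(table):
    # Copy one client's table row by row while that client's ingestion is paused.
    insert = scylla.prepare(
        "INSERT INTO {t} (userid, time, eventdata) VALUES (?, ?, ?)".format(t=table))
    pause_ingestion(table)                      # hypothetical ingestion-control hooks
    try:
        for row in cass.execute("SELECT userid, time, eventdata FROM " + table):
            scylla.execute(insert, (row.userid, row.time, row.eventdata))
    finally:
        resume_ingestion(table)

migrate_client("client_acme")                   # one client at a time

Pausing ingestion per client keeps each copy consistent without taking the whole platform offline, which matches the one-client-at-a-time approach above.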
Scylla Stats - Compactions
• Our current data volume is about 3x what it was in June 2016, and still growing
• One Scylla cluster running on 10 i3.16xlarge nodes
  • 64 vCPU, 488GB RAM, 15TB storage
• Scylla determines the compaction rate dynamically and automatically
  • In C* you can configure compactions to be stronger or weaker, but those settings can become invalid as your workload changes
• Our Scylla cluster showed minimal pending compactions
Scylla Stats - Latencies
• With much lower pending compactions, read/write latency is naturally lower
• With Scylla, we don't have data backlogs
Scylla's Self-Tuning
• Ability to isolate background and foreground tasks and determine the best rate for things like compactions and repairs
• Basically no changes to the scylla.yaml file
• Scylla does kernel tuning at deployment time
Scylla AMIs and i3
• We used Scylla AMIs for AWS EC2
  • For our initial deployment of the cluster, it was as simple as running the ScyllaDB setup utility
• Lessons learned from deploying to i3 instances
  • We started using i3 as soon as it became available
  • Currently we use 10 i3.16xlarge nodes
  • i3 instances were not the most stable instance type
    • Scylla's fast recovery time helps: a node could be brought back within 8 hours, under live data load and while replicating 7TB of data
  • Customizations of the Scylla AMIs were needed, e.g., for kernel tuning
    • Scylla's latest AMI supports i3
• Use a small number of big nodes instead of many small nodes
Scylla Support
• A direct quote from our Director of DevOps:
From a devops perspective, when it comes to getting support from a third-party vendor on their own product, the best you can
hope for is product competency, professionalism and infrastructure competency.
Product competency is relatively common to find – most support teams know their stuff pretty well. Underlying infrastructure
proficiency is not as common as it should be – I have dealt with many support teams that have the attitude of "it's not our
product, it is the OS/hardware/etc. – you should contact their support", which not only does not help resolve the issue quickly,
but may end up causing more trouble in the long run, because of the disconnect between the layers. Lastly, professionalism and
dependability – you want to have the support team be there for you until the issue is resolved, no matter what.
With Scylla's support team, you get all three at 100% and beyond.
Their engineers know the ins and outs of their product, without having to "get back to you" hours later. All engineers are
responsive and on top of an issue or question so you get the response, and ultimately the resolution, as fast as it can possibly be
done.
They are absolutely dedicated and reliable and will make sure that your issue is resolved, or they will work with you 24/7
until you are satisfied.
Lastly, they are experts in the OS and hardware on which their customers run the product. This may seem like a side note or a
"nice to have", but knowing the underlying infrastructure, how it behaves, what you can expect from it, how you can tune it to get
the most out of it, is an absolute gem. It can not only help, but in certain cases it can mean the difference between a quick and
solid resolution, and a prolonged case involving multiple vendors. It can mean the difference between mediocre performance, or
the one Scylla offers.
Summary
• We used Cassandra / Scylla for high read/write throughput and low latencies
• If you struggle with tuning Cassandra, definitely consider Scylla
  • Better performance
  • Makes DevOps life easier
  • Awesome support
• Future plans with Scylla
Thank you!
GET IN TOUCH!
We are hiring!
mParticle
257 Park Avenue South, 9th Floor
New York, NY 10010
@mparticles | http://www.mparticle.com
