Making It To Veteran Cassandra Status

MAKING IT TO VETERAN
CASSANDRA STATUS
Been There, Done That, Survived

Eric Lubow @elubow
PERSONAL VANITY
๏ CTO of SimpleReach
๏ Co-Author of Practical Cassandra
๏ Skydiver, Mixed Martial Artist, Motorcyclist, Dog Dad (IG: @charliedognyc), NY Giants fan

SIMPLEREACH
๏ Identify the best content
๏ Use engagement metrics
๏ Stream processing ingest
๏ Many metrics, time sliced
๏ Multiple data stores

AM I QUALIFIED TO BE A VETERAN?
๏ Started using Cassandra at 0.2 in September 2009
๏ First put Cassandra in production at 1.0
๏ Helped in building multiple drivers
๏ Filed lots of Jira tickets (40+)
๏ Beta tested features
๏ Large counter deployment (largest?)

DID I MENTION I CO-WROTE A BOOK?

What are we actually going to talk about today?

HOW DOES ONE BECOME A VETERAN
It’s not all unicorns and rainbows

HOW DO I LEVEL UP?
๏ Use Cassandra
๏ Dig into the code from time to time (server and drivers)
๏ Know the strengths and weaknesses, and understand why they exist
๏ Follow the changelogs and mailing lists
๏ Stress Cassandra in unconventional ways
๏ Learn the failure scenarios and how to fix them (hang out on IRC)
๏ Break the rules from time to time to see what happens
๏ “Those who cannot remember the past are condemned to repeat it.” - George Santayana

HOW DID SIMPLEREACH GET FROM …

CHOOSING A DATABASE IS EASY, #AMIRITE
๏ What’s the latest cool technology?
๏ What is my data volume?
๏ What are my query patterns?
๏ Is my data (un)structured?
๏ Will data remain consistent?
๏ Am I read heavy or write heavy?
๏ Am I batch loading data?
๏ Is eventually consistent data ok?
๏ Can I have a DR plan?
๏ Legal/compliance requirements?
๏ Are there experts/enterprise support?
๏ What’s the community like?
๏ Easy to administer?
๏ Tooling, monitoring, language support?
๏ Cloud or iron?
๏ High volume ingestion or batch loading?
๏ Fault tolerance?
๏ Open source vs enterprise system?
๏ Employee learning curve vs. learning cost?

LET’S LOOK AT SOME USE-CASES

USE-CASE: READ/WRITE PATTERNS
Goals
๏ WRITE: High volume/High velocity ingestion
๏ READ: Recency, key/value lookups, ETL
Cassandra
๏ Log structured storage; fast writes
๏ Writes do not affect reads
๏ Row creation unaffected by table size
๏ Indexing does not affect writes
๏ No locking; conflicts resolved by timestamp, last write wins (LWW)
Mongo
๏ Document storage; slower writes
๏ MMAP reads affected by writes
๏ Slow document creation in large collections
๏ Poor indexing can destroy entire DB
๏ Server level, db level, collection level locks

HELPERS FOR A MORE AFFORDABLE CLUSTER
[Architecture diagram. Components shown: NSQ, Aggregator, Broadcast, Calculator, Mongo Writer, Redis Writer, Cassandra Writer, Solr Writer, Vertica Writer]

HOW DO WE KNOW WHAT WORKS BEST

USE-CASE: ADMINISTRATION
Goals
๏ BASICALLY JUST ME
Cassandra
๏ Every node is the same base
๏ No master node
๏ All monitoring through JMX
๏ One step to add/remove nodes
๏ Tunables, lots of ’em
๏ Easily wrote our own Chef cookbook
Mongo
๏ Config nodes, Shard nodes, Replica nodes
๏ Master/slave nodes, leader election
๏ Monitoring via mongostat sometimes
๏ Two steps to add/remove nodes
๏ No tunables
๏ Many poorly working Chef cookbooks

CASSANDRA IS OPEN SOURCE
๏ Primarily DataStax
๏ Community contributions
๏ Who is the community?

SERIOUSLY 40+ JIRA TICKETS?
SPARK-6949 Pyspark and datetime
OPSC-6186 Rebalance - while calling decorator (IndexError): list index out of range
CASSANDRA-9871 Cannot replace token does not exist - DN node removed as Fat Client
OPSC-6045 Agent CPU on startup 800 Seconds
OPSC-5346 Opsc Repair service system_traces system_auth
CASSANDRA-7409 LCS improvement
CASSANDRA-8611 Socket timeout shitty default
CASSANDRA-9279 Gossip (and mutations) lock up on Startup
OPSC-4879 OpsC Agent JMX Connections and Cassandra Operations Fail Incessantly
CASSANDRA-8086 Too many connections - Cassandra Defense
CASSANDRA-7122 System peers
CASSANDRA-6506 Counters++ Final Performance
CASSANDRA-7510 Up node gossip messages -- affects drivers
PYTHON-202 More control for metadata updates
PYTHON-201 Optionally randomize contact points
OPSC-3672 OpsC - Repair Service Restarts on Node Flopping
DSP-3059 / SOLR-5463 Solr 4.10 - and Deep Paging
CASSANDRA-8548 Cleanup Dump
DSP-4560 Possible ticket Upgrade from 4.5.2 to 4.5.3
DSP-3341 In-memory Phase 2 (off heap and remove GB limit)
DSP-3970 Solr indexes even when values don't change
CASSANDRA-8150 Stump's JVM Tuning

SIMPLEREACH CONTEXT
๏ 100 million URLs
๏ 350 million Tweets
๏ 50k - 100k events per second (tens of billions of events per day)
๏ 225G new per hour
๏ 700T of total data (10T per month)
๏ 10T of hot data
๏ 72-node Cassandra cluster
๏ 52 Realtime Nodes
๏ 9 Search Nodes
๏ 11 Spark Nodes

[Figure. Data store labels shown: Solr, Solr, Vertica + Cassandra, Vertica + Cassandra, Vertica, Mongo]

CONQUERING COUNTERS
๏ Average over 200k counter writes per second
๏ Pre-aggregate writes (saved us 10x the writes)
๏ Trying to defeat the counter time bomb
๏ Breaking the rules with CASSANDRA-8150
๏ Many, many JVM tuning changes
๏ All things possible through monitoring
๏ Upgraded every node in the cluster by hand one at a time
๏ Upgrading to 2.1 definitely sealed the deal
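
A minimal sketch of what pre-aggregating counter writes can look like: increments for the same key are summed in memory and flushed as one combined delta. This is not SimpleReach's code. The CounterBuffer class, the flush_fn callback, and the max_pending threshold are all hypothetical names chosen for illustration. In a real deployment flush_fn would presumably issue a counter UPDATE through a driver.

```python
from collections import Counter


class CounterBuffer:
    """Sum increments per key in memory and emit one write per key on flush.

    Hypothetical helper for illustration only.
    """

    def __init__(self, flush_fn, max_pending=1000):
        self._pending = Counter()        # key -> accumulated delta
        self._events = 0                 # increments seen since the last flush
        self._flush_fn = flush_fn        # called as flush_fn(key, delta)
        self._max_pending = max_pending  # flush after this many increments

    def incr(self, key, delta=1):
        """Record one increment; flush once enough events have accumulated."""
        self._pending[key] += delta
        self._events += 1
        if self._events >= self._max_pending:
            self.flush()

    def flush(self):
        """Write each key's combined delta once, then reset the buffer."""
        pending, self._pending = self._pending, Counter()
        self._events = 0
        for key, delta in pending.items():
            # Assumption: a real flush_fn would run something like
            # UPDATE <table> SET count = count + <delta> WHERE <key columns>
            self._flush_fn(key, delta)


if __name__ == "__main__":
    writes = []
    buf = CounterBuffer(lambda key, delta: writes.append((key, delta)),
                        max_pending=10)
    for _ in range(10):
        buf.incr(("url-1", "pageviews"))
    print(writes)  # ten increments collapsed into a single write of +10
```

The trade-off is durability: increments held in the buffer are lost if the process dies before a flush, so the flush threshold bounds how much can go missing.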

UNDERSTAND FAILURE SCENARIOS
๏ Nodes might have removed themselves from a cluster because the disk was full
๏ Apps might lose connections to the cluster and then take 45 min to reconnect (or longer on bigger clusters)
๏ A slow node might make the entire cluster unusable
๏ A poorly gossiping node might overwork itself out of the cluster
๏ Adding a node to the cluster might take down all connected apps
๏ Sometimes you just can’t removenode (or bootstrap)
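
The full-disk scenario is one that basic monitoring can catch before a node drops out. The sketch below uses only the Python standard library. The function names and the 0.5 default threshold are assumptions for illustration, not values from the talk, and the path checked should be whatever directory holds the node's data.

```python
import shutil


def disk_usage_fraction(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_alert(path="/", threshold=0.5):
    """True when usage is at or above threshold (an assumed default)."""
    return disk_usage_fraction(path) >= threshold


if __name__ == "__main__":
    # Hypothetical check; point this at the node's data directory.
    if should_alert("."):
        print("disk usage above threshold")
```

A check like this would normally run from whatever scheduler or monitoring agent is already in place, with the threshold set low enough to leave headroom for compaction.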

WHAT SHOULD YOU WALK AWAY WITH?
๏ It is incredibly important to have a deep understanding of your use-cases
๏ Sometimes database tuning has nothing to do with database settings
๏ Understand failure scenarios for your use-cases
๏ Give back, it helps everything get better
๏ Ignoring best practices is almost never a good idea

THANKS FOR LISTENING

QUESTIONS IN LIFE ARE GUARANTEED,
ANSWERS AREN’T.
Eric Lubow
@elubow
NYC Cassandra Day
