1. Cassandra at Mahalo.com
Noah Silas, Backend Developer
noah@mahalo.com, twitter: @noah256
John Watson, Data Systems Architect
johnw@mahalo.com, twitter: @dctrwatson
2. About Mahalo
Mahalo.com is one of the top 200 domains on the net [1]
We serve ~12 million unique visitors per month
Served out of two geographically separate data centers
nginx, Apache, Python, Django stack
Primary Datastore - Replicated MySQL Cluster
[1] Reported by quantcast.com
4. Current Use Case - Activity Log
Near real-time feeds documenting site usage
Appears on user profiles, detailed page change logs
Each action on the site is recorded in anywhere from 4 to 4000 feeds
- requirement: "Stupidly Fast Writes"
Data Model:
Two Column Families
ActivityLog
ActivityLogIndexes
Important Lesson: Pick Unambiguous keys!
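A minimal sketch of the fan-out write and the key lesson, using plain Python dicts as stand-ins for the two column families. The slides do not say what each column family holds, so the split below (events in ActivityLog, per-feed lists of event ids in ActivityLogIndexes) is an assumption, and make_key, record and read_feed are hypothetical names, not Mahalo's code.

```python
SEP = ":"

def make_key(kind, ident):
    """Build a feed key such as 'user:42' from a feed kind and an id."""
    kind, ident = str(kind), str(ident)
    if SEP in kind:
        # The first separator must always mark the end of the kind.
        raise ValueError("kind must not contain the separator")
    return kind + SEP + ident

# Naive concatenation is ambiguous: two different feeds share one key.
assert "user1" + "23" == "user12" + "3"
# With a separator the same inputs stay distinct.
assert make_key("user1", 23) != make_key("user12", 3)

# Assumed layout (not stated in the slides):
activity_log = {}          # event id -> event data
activity_log_indexes = {}  # feed key -> list of event ids

def record(event_id, event, feeds):
    """Store the event once, then append its id to every feed it belongs to."""
    activity_log[event_id] = event
    for kind, ident in feeds:
        key = make_key(kind, ident)
        activity_log_indexes.setdefault(key, []).append(event_id)

def read_feed(kind, ident):
    """Return the events of one feed in the order they were recorded."""
    ids = activity_log_indexes.get(make_key(kind, ident), [])
    return [activity_log[event_id] for event_id in ids]
```

Usage: record("e1", {"verb": "edit"}, [("user", 7), ("page", "some-slug")]) writes the event once and indexes it in two feeds; read_feed("user", 7) then returns that event.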
5. Current Use Case - Content Pages
Mahalo Content Pages provide comprehensive search results
Search results can be curated by our staff of Guides
Curated results must be stored and ordered
- This was leading to large MySQL tables, with one table in
particular exploding to nearly 20 million rows and ~15 GB
of data
Only one query is generally performed against this data: given a
page slug, find the curated results for that page.
When we migrated this table from MySQL into Cassandra we
saw immediate performance gains across our MySQL cluster
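The single-query access pattern above maps naturally onto one row per page. A rough sketch, assuming the page slug is the row key and each curated result is a column named by its position; the slides do not show the real schema, so curated_results, save_results and results_for are hypothetical, and a dict stands in for the column family.

```python
# Hypothetical stand-in for a column family: slug -> {position: result URL}.
curated_results = {}

def save_results(slug, urls):
    """Store the curated results for a page, one column per position."""
    curated_results[slug] = {pos: url for pos, url in enumerate(urls)}

def results_for(slug):
    """Given a page slug, return its curated results in curated order."""
    row = curated_results.get(slug, {})
    return [row[pos] for pos in sorted(row)]
```

The whole read is one key lookup followed by an ordered walk over the row's columns, with no join or secondary index involved.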
6. Our Experiences / Boneheaded Mistakes
Plan Ahead!!!
CASSANDRA-16 - Large Rows
Nagios Monitoring for Cassandra -
http://www.mahalo.com/how-to-monitor-cassandra-with-nagios
Cassandra Upgrades solve problems. Usually.
The CommitLog really does belong on a dedicated disk
Storing data encoded in difficult formats is a bad plan
- example: python pickles
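To illustrate the point about pickles: a short sketch that stores a value as UTF-8 JSON instead, so that any language or command-line tool can read it back. The event fields are made-up sample data, and JSON is only one possible choice here, not necessarily what Mahalo switched to.

```python
import json
import pickle

# Made-up sample event, for illustration only.
event = {"user": "example-user", "verb": "edit", "page": "example-slug"}

# Pickle output is a Python-specific binary format; other clients
# cannot easily inspect or decode it.
pickled = pickle.dumps(event)

# JSON encoded as UTF-8 is readable from any language and by eye.
encoded = json.dumps(event, sort_keys=True).encode("utf-8")

def decode(value):
    """Turn a stored JSON byte string back into a Python object."""
    return json.loads(value.decode("utf-8"))
```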
7. Our Experiences / Boneheaded Mistakes
Problems can be solved by throwing more memory at the JVM
heap, right?
Cluster Load Balancing - HAProxy is Awesome!
- but it sometimes obscures which node is experiencing
issues.
We have found that we don't need a memcached instance in
front of Cassandra
Onboarding Devs for Cassandra is Hard!
- Terminology is overloaded from the RDBMS world