Minnebar 2013 - Scaling with Cassandra

NativeX (formerly W3i) recently transitioned a large portion of its backend infrastructure from MS SQL Server to Apache Cassandra. Today, its Cassandra cluster backs its mobile advertising network, which supports over 10 million daily active users and produces over 10,000 transactions per second with an average database request latency of under 2 milliseconds. Going from relational to NoSQL required NativeX's engineers to re-train, re-tool and re-think the way they architect applications and infrastructure. Learn why Cassandra was selected as a replacement, what challenges were encountered along the way, and what architecture and infrastructure were involved in the implementation.

Published in: Technology

  1. 1. Scaling With Cassandra Jeff Bollinger – CTO - @jbollinger Jeff Smoley – Infrastructure Architect
  2. 2. Agenda
     • About NativeX
     • The Backstory
     • Why Cassandra
     • Cassandra Overview
     • NativeX Cassandra Implementation / Metrics
     • What we Learned
  3. 3. NativeX
     • Formerly W3i
     • Marketing technology platform that enables developers to build successful businesses around their apps.
  4. 4. Vanity Metrics
     • Over 620M unique devices on our network
     • Over 500 apps in network
     • > 100M Monthly Active Users
     • 100 GB of data ingest per week
  5. 5. Backstory
     • A growing mobile advertising network
     • [Chart: API Requests, in billions (axis 0 to 6), by quarter from 2011 Q4 to 2013 Q1]
  6. 6. Infrastructure Intensive Model
     • [Chart: Session Calls by Week After User Acquired, in millions (axis 0 to 12), for weeks 0 to 12; annotated "Lifetime of user"]
  7. 7. Scale Up Architecture
     • Microsoft SQL Server
       • 2 Node Cluster (failover)
       • 12 cores / node
       • 192 GB of RAM / node
     • Compellent SAN
       • 172 Disks (SSD, FC, SATA)
  8. 8. CAP Theorem
     • [Diagram: CAP triangle with corners Consistency, Availability, and Partition Tolerance; systems placed on it: SQL Server, MySQL; MongoDB; Cassandra]
  9. 9. Objectives
     • Scale
       • Horizontal
       • Incremental cost structure
     • Resiliency
       • No single point of failure
       • Geographically distributed
  10. 10. What Needed to Scale
      • Web Application Tier
      • Database Tier
      Web Application Tier is already a server farm that can scale horizontally through our VMware environment.
      Database Tier was one giant monolithic Microsoft SQL Server machine.
  11. 11. What is NoSQL?
      • Stands for Not Only SQL
      • The NoSQL movement is not about silver bullets and black boxes.
      • It’s about understanding problems and focusing on solutions.
      • It’s about using the right tool for the right problem.
  12. 12. Selecting Cassandra
      DB            | Distributed | Maturity | High Availability | Style                     | Documentation | Native Language Drivers | Popularity
      MongoDB       | Yes         | Medium   | Yes               | Document - NoSQL          | Excellent     | Major Languages         | High
      VoltDB        | Yes         | Low      | Yes               | RDBMS - SQL               | Good          | Major Languages         | Low
      MySQL Cluster | Yes         | High     | Yes               | RDBMS - SQL & Key/Value   | Excellent     | Major Languages         | Medium
      MySQL ScaleDB | Yes         | Low      | Yes               | RDBMS - SQL               | Good          | Major Languages         | Low
      Cassandra     | Yes         | Medium   | Yes               | Key/Value - Column Family | Excellent     | Major; Poor .Net        | High
      CouchDB       | No          | Medium   | Yes               | Document - NoSQL          | ?             | No - REST only          | Medium
      RavenDB       | Yes?        | Low      | No                | Document - NoSQL          | Poor          | C#, JS, REST            | Medium
      Couchbase     | Yes         | Medium   | Yes               | Key/Value - Document      | Good          | Major Languages         | Medium
      http://nosql.mypopescu.com/ is a helpful site for discovering and learning about different DB systems.
      *Disclaimer: this data was compiled in spring of 2012 and may not reflect the current state of each database system shown here.
  13. 13. Top Choices
      Considered Multiple DB Providers
      • MySQL Cluster
        • Relational and very familiar.
        • Has physical row limitations.
      • MongoDB
        • Data modeling was simpler than C*.
        • Not very clear if it had multi-cluster support.
      • Cassandra
        • At the very core it’s all about scalability and resiliency.
        • Data modeling a little scary, limited .Net support.
  14. 14. Cassandra
      • Multi-node
      • Multi-cluster
      • Tunable Consistency
      • Highly Available
      • Durable
      • Shared Nothing
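
"Tunable Consistency" on the slide above can be illustrated with the usual replica-overlap rule: a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below is only an illustration; the replication factor of 3 is an assumption, since the deck does not state the replication factor or consistency levels NativeX used.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1

def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

RF = 3  # assumed replication factor, not taken from the slides

q = quorum(RF)
quorum_both = reads_see_latest_write(q, q, RF)  # QUORUM reads + QUORUM writes
one_both = reads_see_latest_write(1, 1, RF)     # ONE reads + ONE writes
print(q, quorum_both, one_both)
```

Lower settings such as reading and writing a single replica trade that guarantee for lower latency and higher availability, which is the tuning knob the slide refers to.
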
  15. 15. C* at NativeX
      • C* was not a replacement DB system, but an addition.
      • C* solves a very specific problem (for us).
        • Writing large volumes of data quickly.
        • Reading very specific data out of a large record set.
      • NoSQL solutions, like C*, are not meant to be a replacement for everything.
        • You will make your life harder if you try!
      • The same should be said about Relational Databases.
        • They don’t solve every problem!
  16. 16. Data Classification
      We have three major classifications of data.
      • Configuration
      • Activity Tracking
      • Device History
  17. 17. Configuration Data
      This data is relatively small in total size and is used to operationally run our products. Examples include:
      • Mobile Apps
      • Offers
      • Campaigns
      • Restrictions
      • Queue Settings
      This data is typically relational and therefore continues to be stored in MS SQL Server.
  18. 18. The Very Basics of C* Data Modeling
      • Data is stored inside of Column Families using nested Key/Value pairs.
      • A Row Key maps to a collection of Columns.
      • A Column Name (AKA Column Key) maps to a Column Value.
      • The Column Name is stored alongside the Value.
      • A common strategy is to store JSON/XML in the Column Value.
      (Side note: if you’ve heard of Super Columns, forget about them; they hurt more than they help.)
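
The nested key/value structure described on the slide above can be pictured as a dictionary of dictionaries. This is a minimal in-memory sketch, not Cassandra client code; the column family name, row key, column names, and JSON payloads are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical in-memory stand-in for one column family:
# row key -> {column name -> column value}
device_history = {}

def put(column_family, row_key, column_name, value):
    """Store a value under (row key, column name), serialized as JSON."""
    row = column_family.setdefault(row_key, {})
    row[column_name] = json.dumps(value)

def get(column_family, row_key, column_name):
    """Read one column back and decode its JSON value."""
    return json.loads(column_family[row_key][column_name])

put(device_history, "device-123", "click:2013-04-06T10:15:00", {"offer_id": 42})
put(device_history, "device-123", "click:2013-04-06T09:00:00", {"offer_id": 7})

# Cassandra keeps the columns of a row sorted by column name;
# sorted() mimics that ordering for this toy model.
columns = sorted(device_history["device-123"])
latest = get(device_history, "device-123", columns[-1])
print(columns, latest)
```

Putting a timestamp in the column name, as assumed here, means the sorted column order doubles as time order within a row.
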
  19. 19. Activity Tracking Data
      Raw tracking data for all activities used by the ETL process to produce OLAP data on an hourly basis.
      Synonymous with Time Series, Event Series, or Logging data.
      Examples include:
      • Running of Mobile Apps
      • Viewing Offers
      • Clicking on Offers
      • Receiving Rewards
  20. 20. Device History Data
      Historical activities that each device has performed while being part of NativeX’s network.
      Used for offer classification for a given device.
      Examples include:
      • Clicking on Offers
      • Running Mobile Apps
      • Redeeming Rewards
  21. 21. Hardware
      • 12 Nodes
      • Cisco UCS Blades
        • 12 Cores @ 2.0GHz with Hyper-threading
        • 64GB of RAM
      • 2 x 480GB Intel commodity SSDs in RAID 0
        • 10.5 TB total, ~7 TB usable
      • Red Hat Linux
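
The "10.5 TB total" figure on the slide above is consistent with the raw SSD capacity if drive sizes are decimal gigabytes and the total is expressed in binary terabytes. That unit reading is an assumption, not something the deck states. A short check:

```python
NODES = 12
SSDS_PER_NODE = 2
SSD_GB = 480  # decimal gigabytes, as drive capacities are usually labeled

# RAID 0 stripes without redundancy, so all raw capacity counts.
raw_gb = NODES * SSDS_PER_NODE * SSD_GB

# Convert decimal gigabytes to binary terabytes (TiB); this unit choice is an assumption.
raw_tib = raw_gb * 10**9 / 2**40

print(raw_gb, round(raw_tib, 1))
```

The gap between this raw total and the "~7 TB usable" figure is not explained in the slides, so the sketch does not try to derive it.
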
  22. 22. Commodity Vs. Enterprise
      • We chose to use Enterprise hardware for the servers so that we would have support for them.
      • However, our workload is very read heavy and 15K rpm rotational disks were a bottleneck.
      • We chose to swap out the rotational disks for commodity SSDs. (Enterprise SSDs were 10x as expensive.)
      • We have limited support on the hardware because of this.
  23. 23. Internal C* Cluster Stats
      • 240 peak Writes per second per node (2,880/sec cluster wide)
      • 888 peak Reads per second per node (10,656/sec cluster wide)
      • 0.53 ms average Write Latency per request
      • 1.7 ms average Read Latency per request
      • Almost 3 TB of data, adding 1 TB a month
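
The cluster-wide rates on the slide above are the per-node peaks multiplied by the 12 nodes listed on the Hardware slide. A quick arithmetic check:

```python
NODES = 12  # node count from the Hardware slide

peak_writes_per_node = 240
peak_reads_per_node = 888

cluster_writes = peak_writes_per_node * NODES
cluster_reads = peak_reads_per_node * NODES
read_write_ratio = peak_reads_per_node / peak_writes_per_node

print(cluster_writes, cluster_reads, read_write_ratio)
```

The ratio of 3.7 reads per write at peak matches the deck's description of the workload as read heavy.
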
  24. 24. Application Side Latencies
      • MS SQL
        • Writes 12 ms
        • Reads 1.5 ms
      • C*
        • Writes 3 ms
        • Reads 4 ms
  25. 25. Can We Make Reads Faster?
      • We think that in SQL Server, reads were faster because most of the data sat in memory.
      • We might be able to achieve lower latencies in C* if we gave each node just as much memory as our SQL Server.
      • To counteract the increased latencies we used certain techniques, like parallel reads using multi-threading in our web application.
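
The parallel-read technique mentioned on the slide above can be sketched with a thread pool: several independent reads are issued at once, so the total wait approaches the latency of the slowest read instead of the sum of all of them. This is an illustration only, not NativeX's implementation; `fetch_row` is a hypothetical stand-in that sleeps for about 4 ms (the application-side C* read latency quoted in the deck) rather than calling a real database.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_row(row_key):
    """Hypothetical stand-in for one database read; sleeps to simulate ~4 ms latency."""
    time.sleep(0.004)
    return row_key, {"row_key": row_key}

def fetch_rows_parallel(row_keys, max_workers=8):
    """Issue all reads concurrently and collect the results keyed by row key."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch_row, row_keys))

keys = [f"device-{i}" for i in range(8)]
start = time.perf_counter()
rows = fetch_rows_parallel(keys)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"fetched {len(rows)} rows in {elapsed_ms:.1f} ms")
```

Threads suit this case because each read spends its time waiting on I/O, so the workers overlap their waits rather than competing for CPU.
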
  26. 26. Not all Roses
      • There are still challenges with C*, like any complex system.
      • More moving parts and things that need to stay in sync.
      • Misconfigurations can literally destroy your data.
      • Certain config settings cannot be changed after you are live, such as the number of virtual Racks.
  27. 27. Lessons Learned
      • Get into production early
      • Data Import = Reality
      • Break down communication barriers
      • Understanding your IO profile is really important
      • Cassandra changes quickly, you need to keep up
      • Scalable systems like C* have a massive amount of knobs, you need to know them
      • Leverage cloud resources in working toward right sizing your cluster
  28. 28. Thanks
      • We’re hiring: http://nativex.com/careers/
      • Join the MSP C* Meetup: http://www.meetup.com/Minneapolis-St-Paul-Cassandra-Meetup/
      • Email us: Jeff.Smoley@nativex.com, Jeff.Bollinger@nativex.com, or @jbollinger
      • Slide Deck: http://www.slideshare.net/JBollinger/minnebar-2013-scaling-with-cassandra
