
Cassandra @ Sony: The good, the bad, and the ugly part 2


This talk covers scaling Cassandra for a fast-growing user base. Alex and Isaias cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.

Published in: Technology


  1. Cassandra:
  2. Who is talking: Alexander Filipchik (PSN: LaserToy), Principal Software Engineer at Sony Network Entertainment.
  3. Who is talking: Alexander Filipchik (PSN: LaserToy), Principal Software Engineer at Sony Interactive Entertainment.
  4. Me
  5. The Rise of PlayStation 4. PlayStation Network is big and growing:
     – Over 65 million monthly active users.
     – Hundreds of millions of users.
     – A lot of services.
  6. PlayStation 4 growth:
     • Pre-warm – November 2013, a couple thousand PS4s for Taco Bell.
     • Launch day – 1,000,000 PS4s several days later.
     • Adding 1.3 million devices a month.
  7. Let's compare ourselves with
  8. Unicorn's tech vs. our tech, by year:

     Year | Unicorn's Tech                    | Our Tech
     2009 | MySQL                             |
     2011 | MongoDB/MySQL                     |
     2012 | Redis/MySQL                       | PS3: MySQL + Memcached
     2013 | Redis/Postgres                    | MySQL + Memcached/Cassandra
     2014 | Redis/shards for Postgres + MySQL | MySQL + Memcached/Cassandra
     2015 | Riak/shards for Postgres + MySQL  | MySQL + Memcached/Cassandra + Redis
     2016 | ???                               | MySQL + Memcached/Cassandra + Redis – ready for Big Bang
  9. The Problem:
     • The legacy system uses a well-known relational DB to handle our transactions.
     • It is state-of-the-art software that doesn't scale well in our circumstances.
     • We wanted to let clients run any queries without first consulting hundreds of DBAs.
     • Sharding sounds like a pain.
     • Multiple regions should be easy.
  10. Solution
  11. The Bad Axiom: it is not easy to replace a relational database with Cassandra for user-facing traffic.
  12. Simple Digital Store Model (diagram: another hundred tables).
  13. CQL Going to Save Us!!!
      • No joins.
      • No transactions.
      • No search.
      • Just weird.
  14. What if we denormalize? (diagram: purchased items)
  15. Thrift Schema
      Row: Account1 → [Json 1 | Json 2 | … | Json n]
      • Now it is horizontally scalable.
      • We have in-row transactions.
      • Reads are very fast – no joins.
      • But now we need to propagate user purchases from the DB to C*,
      • figure out how to support queries,
      • and sometimes synchronize changes in related objects (metadata).
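The denormalized account-row layout above can be sketched in a few lines. This is a minimal in-memory stand-in for a real Cassandra client, only to illustrate the idea; the names (`WideRowStore`, `put_purchases`, `get_account_row`) are ours, not from the talk.

```python
import json


class WideRowStore:
    """One partition (row) per account; each purchase is one JSON column."""

    def __init__(self):
        self._rows = {}  # account_id -> {column_name: json_blob}

    def put_purchases(self, account_id, purchases):
        # All columns for one account land in the same row, which is what
        # gives the "in-row transaction" property: a mutation batch for a
        # single partition is applied atomically.
        row = self._rows.setdefault(account_id, {})
        for i, purchase in enumerate(purchases, start=len(row) + 1):
            row[f"json_{i}"] = json.dumps(purchase)

    def get_account_row(self, account_id):
        # A read is a single-partition fetch -- no joins needed.
        return {k: json.loads(v)
                for k, v in self._rows.get(account_id, {}).items()}
```

The trade-off is exactly the one the slide lists: reads become one cheap partition fetch, but something else must now push purchases into this row and keep the embedded JSON in sync with the source of truth.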
  16. Solving the Puzzle:
      • There are a number of ways to notify C* about account-level changes in the source of truth – let's not talk about that for now.
      • The same applies to syncing metadata (I'd love to give a separate presentation on how Apache Samza can do it).
      • Let's talk about queries.
  17. Going deeper:
      • What the client wants: search, sort, filter.
      • What we can do: use a secondary index; use the Solr integration; fetch everything into memory and process it.
  18. Can We Do Better? We can index, and writing an indexer sounds like a lot of fun. Wait, someone already had the fun and made:
  19. Thrift Schema v2
      v1: Account1 → [Json 1 | Json 2 | … | Json n]
      v2: Account1 → [Json 1 | … | Json n | Version]
      • Now we can search on anything inside the row that represents the user.
      • The index is small and it is fast to pull it from C*.
      • But we are still pulling all these bytes all the time.
      • And what if 2 servers write to the same row?
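The "what if 2 servers write to the same row?" problem is the classic lost-update race, and the Version column in the v2 schema is what an optimistic check would use: only apply a write if the version has not moved underneath you. A minimal in-memory sketch of that check (the store and all names here are hypothetical, not from the talk):

```python
class VersionConflict(Exception):
    """Raised when another writer bumped the row version first."""


class VersionedRowStore:
    def __init__(self):
        self._rows = {}  # account_id -> (version, data)

    def read(self, account_id):
        # Returns (version, data); version 0 means the row does not exist yet.
        return self._rows.get(account_id, (0, {}))

    def write(self, account_id, expected_version, data):
        current_version, _ = self.read(account_id)
        if current_version != expected_version:
            # Someone else won the race: the caller must re-read,
            # merge its change, and retry with the new version.
            raise VersionConflict(account_id)
        self._rows[account_id] = (current_version + 1, data)
        return current_version + 1
```

Against real Cassandra, a check-then-write like this would need a lightweight transaction (a conditional `IF version = ?` update) or a guarantee that only one writer touches a given row, which is where the semi-sticky routing idea comes in.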
  20. Distributed Cache?
      • It is nice to keep things as close to our microservice as possible.
      • In something that can do fast reads.
      • And we have a lot of RAM these days.
      • So we can have a beefy Memcached/Redis box.
      • And still pay the network penalty and think about scaling them.
      • What if…
  21. Semi-Sticky Approach
      • The cache lives inside the microservice, so there is no network penalty.
      • Requests for the same user are processed on the same instance, so we save a network round trip and can apply some optimizations (sequencing).
      • Changes to state are also replicated to the storage (C*) and identified with a version number.
      • If an instance goes down, the user's session is automatically moved to another live instance.
      • It is much easier to scale up microservices than C*.
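One way to get the "requests for the same user land on the same instance" property is a consistent-hash ring over the live instances: an account always maps to the same instance while membership is stable, and when an instance dies only its accounts move elsewhere. The talk does not say how the routing is actually implemented; this is one plausible sketch using only the standard library, with names of our choosing:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class StickyRouter:
    def __init__(self, instances, replicas=64):
        # Each instance gets `replicas` virtual nodes for smoother balance.
        self._ring = sorted(
            (_hash(f"{inst}:{i}"), inst)
            for inst in instances
            for i in range(replicas)
        )

    def route(self, account_id: str) -> str:
        # Walk clockwise to the first virtual node at or after hash(key).
        keys = [h for h, _ in self._ring]
        idx = bisect.bisect(keys, _hash(account_id)) % len(self._ring)
        return self._ring[idx][1]

    def remove(self, instance: str) -> None:
        # Instance died: drop its virtual nodes; only its accounts re-route.
        self._ring = [(h, i) for h, i in self._ring if i != instance]
```

A plain `hash(account_id) % n` would also stick requests, but it reshuffles almost every account when `n` changes; the ring keeps the failover blast radius limited to the dead instance's users, matching the "moved to another live instance automatically" bullet.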
  22. Or in Other Words (diagram: Instances 1–3 each hold a versioned in-memory copy for a subset of accounts – Account 1 Version, Account 2 Version, … – while Cassandra holds the corresponding versioned JSON rows for Account 1 … Account n).
  23. My Fish Phrase: give a man a fish and you will have to give him one every day. Teach a man how to fish and move on to something more interesting.
  24. Personalized Search (Friends) – diagram: a real-time indexer feeds the friends graph into a local cache holding an in-memory personal index; queries go through Get Friends.
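A toy version of the personalized-search idea: the indexer keeps a small per-user inverted index over documents from that user's friends, so a search only touches one user's in-memory structure. All names and the tokenization here are illustrative, not the production design:

```python
from collections import defaultdict


class PersonalIndex:
    def __init__(self, friend_ids):
        self.friends = set(friend_ids)
        self._postings = defaultdict(set)  # token -> {doc_id}
        self._docs = {}                    # doc_id -> original text

    def index(self, doc_id, owner_id, text):
        # The real-time indexer only folds in documents from friends,
        # which is what keeps each personal index small.
        if owner_id not in self.friends:
            return
        self._docs[doc_id] = text
        for token in text.lower().split():
            self._postings[token].add(doc_id)

    def search(self, query):
        # AND-semantics: a document must contain every query token.
        tokens = query.lower().split()
        if not tokens:
            return []
        hits = set.intersection(*(self._postings[t] for t in tokens))
        return sorted(self._docs[d] for d in hits)
```

Because each index covers only one user's friends, it fits in RAM next to the microservice, which is consistent with the stats on the next slide: heavy indexing throughput on a modest number of machines.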
  25. Some Stats:
      • Around 1 million documents indexed per second.
      • Tens of thousands of searches per second.
      • A couple dozen moderately powered EC2 instances.
  26. Astyanax/Thrift Memory Leak – connections, buffers, … max connections per node.
  27. Astyanax Row Slice …
  28. Astyanax Row Slice, Improved …
  29. The most important link: /jira/browse/cassandra/ – check it daily.
  30. Invisible Assassin:
      • A small keyspace in a medium cluster (30 rows, 1 KB).
      • CQL: select * from BlockList.
      • Cached locally for 5 minutes.
      • CPU at 100%, timeouts across the cluster.
      • A cluster of 20 nodes DIED after 3 hours.
      • The root cause was never found.
  31. Non-vnodes to vnodes migration (diagram: dc1 uses assigned tokens, dc2 uses vnodes; applications sit on top).
  32. Went Wrong For Cache (diagram): applications read at CL_ONE with Local=dc1; CL_ONE picks the fastest replica, but it is empty, and the downstream dependency is now in trouble.
  33. Conclusion:
      • Pretty stable and scalable.
      • Important link: sandra/
      • Keeps you in shape.
      • Easy to fork and experiment with.
  34. But How To Get Replication Info?
  35. Replication Logs Example:
      17:06:52 Received from DC1, R1: update KS Test CF test K 1000 C hello Size 76 Timestamp 1456333612729000 at 1456333612735000. Diff is: 6000
      17:06:53 Received from DC2, R1: update KS Test CF test K 1000 C hello Size 76 Timestamp 1456333613344000 at 1456333613345000. Diff is: 10000
      17:06:53 Received from DC1, R2: update KS Test CF test K 1000 C hello Size 76 Timestamp 1456333613698000 at 1456333613700000. Diff is: 2000
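Reading these lines: `Timestamp` is the write timestamp of the mutation and `at` is when the replica received it, both in microseconds, so `Diff` is the replication lag. A small parser for this log shape (the regex and the function name are ours, built only from the example lines above):

```python
import re

# Matches the example log format:
# "... Received from <DC>, <R>: ... Timestamp <micros> at <micros>. ..."
LINE_RE = re.compile(
    r"Received from (?P<dc>\w+), (?P<replica>\w+): .* "
    r"Timestamp (?P<ts>\d+) at (?P<received>\d+)\."
)


def replication_lag_micros(line: str) -> int:
    """Return receive time minus write timestamp, in microseconds."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("unrecognized replication log line")
    return int(match.group("received")) - int(match.group("ts"))
```

Feeding the first example line through this gives the 6000-microsecond diff shown in the log.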