Cassandra:
Who is talking
Alexander Filipchik (PSN: LaserToy)
Principal Software Engineer
at Sony Interactive Entertainment (formerly Sony Network Entertainment)
Me
The Rise of PlayStation4
PlayStation Network is big and growing.
– Over 65 million monthly active users.
– Hundreds of millions of users.
– A Lot of Services.
PlayStation 4 growth
• Pre-warm – November 2013, a couple thousand PS4s for Taco Bell.
• Launch day – 1,000,000 PS4s several days later.
• Adding 1.3 million devices a month.
Let’s compare us with
Year  Unicorn’s Tech                      Our Tech
2009  MySQL                               -
2011  MongoDB/MySQL                       -
2012  Redis/MySQL                         PS3: MySQL + Memcached
2013  Redis/Postgres                      MySQL + Memcached/Cassandra
2014  Redis/Shards for Postgres + MySQL   MySQL + Memcached/Cassandra
2015  Riak/Shards for Postgres + MySQL    MySQL + Memcached/Cassandra + Redis
2016  ???                                 MySQL + Memcached/Cassandra + Redis
Ready for BigBang
The Problem
• The legacy system uses a well-known relational DB to
handle our transactions.
• It is state-of-the-art software that doesn’t scale
well in our circumstances.
• We wanted to allow clients to run any query
without consulting hundreds of DBAs first.
• Sharding sounds like a pain.
• Multiple regions should be easy.
Solution
The Bad
Axiom
It is not easy to replace a relational database
with Cassandra for user-facing traffic.
Simple Digital Store Model
Another hundred tables
CQL Going to Save Us!!!
• No Joins.
• No Transactions.
• No search.
• Just weird.
What if we denormalize?
Purchased
Thrift Schema
Account1 Json 1 Json 2 …. Json n
• Now it is horizontally scalable.
• We have in-row transactions.
• Reads are very fast – no joins.
• Now we need to propagate user purchases from the DB to C*.
• And figure out how to support queries.
• And sometimes synchronize changes in related objects (metadata).
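A minimal sketch of the denormalized layout above, assuming one wide row per account with one JSON column per purchase. The in-memory `purchased` dictionary, the item fields, and both function names are hypothetical stand-ins, not the schema or API used at Sony.

```python
import json

# Hypothetical stand-in for the "Purchased" column family:
# row key (account id) -> {column name (item id) -> JSON string}.
purchased = {}

def record_purchase(account_id, item):
    """Store one purchase as a single JSON column in the account's row."""
    purchased.setdefault(account_id, {})[item["id"]] = json.dumps(item)

def read_purchases(account_id):
    """Read every purchase for an account from one row, with no joins."""
    row = purchased.get(account_id, {})
    return [json.loads(column) for column in row.values()]

record_purchase("Account1", {"id": "game-1", "title": "Example Game", "price": 59.99})
record_purchase("Account1", {"id": "dlc-7", "title": "Example Add-on", "price": 4.99})
print(read_purchases("Account1"))
```

Everything a read needs sits under one row key, which is why the read path has no joins and why the row becomes the natural unit for partitioning.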
Solving the Puzzle
• There are a number of ways to notify C* about
account-level changes in the source of truth -
let’s not talk about them for now.
• The same applies to syncing metadata (I’d love to have a
separate presentation on how we can use Apache
Samza to do it).
• Let’s talk about queries.
Going deeper
• What the client wants:
– Search, sort, filter.
• What we can do:
– Use a secondary index.
– Use the Solr integration.
– Fetch everything into memory and process it there.
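The last option can be sketched as follows, assuming the account's JSON documents have already been fetched and decoded into dictionaries. The `query` helper and the sample fields are hypothetical; this only illustrates filtering and sorting in application memory.

```python
def query(docs, predicate=None, sort_key=None, reverse=False):
    """Filter and sort already-fetched documents in application memory."""
    rows = [d for d in docs if predicate is None or predicate(d)]
    if sort_key is not None:
        rows.sort(key=sort_key, reverse=reverse)
    return rows

# Hypothetical documents, as if decoded from one account's row.
docs = [
    {"id": "game-1", "title": "Example Game", "type": "game", "price": 59.99},
    {"id": "dlc-7", "title": "Example Add-on", "type": "addon", "price": 4.99},
    {"id": "game-2", "title": "Another Game", "type": "game", "price": 19.99},
]

# Filter to games, then sort by ascending price.
games_by_price = query(
    docs,
    predicate=lambda d: d["type"] == "game",
    sort_key=lambda d: d["price"],
)
print([d["id"] for d in games_by_price])
```

This works for any query shape, but every request pays for fetching and decoding the whole row.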
Can We Do Better?
• We can index, and writing an indexer sounds like
a lot of fun.
• Wait, someone already had the fun and made:
Account1 Json 1 Json 2 …. Json n
Thrift Schema v2
Account1 Json 1 Json n Version
• Now we can search on anything inside the row that represents the user.
• The index is small, and it is fast to pull it from C*.
• But we are still pulling all these bytes all the time.
• And what if 2 servers write to the same row?
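To make "search on anything inside the row" concrete, here is a toy inverted index over one account's documents. It is a deliberately simple stand-in written for illustration, not the indexing library the slides refer to; the tokenizer, field names, and sample documents are assumptions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map every token found in any field value to the ids of documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for value in doc.values():
            for token in tokenize(str(value)):
                index[token].add(doc["id"])
    return index

def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical documents for one account.
docs = [
    {"id": "game-1", "title": "Racing Legends", "genre": "racing"},
    {"id": "game-2", "title": "Space Racing", "genre": "shooter"},
    {"id": "dlc-7", "title": "Legends Map Pack", "genre": "addon"},
]
index = build_index(docs)
print(sorted(search(index, "racing")))
print(sorted(search(index, "racing legends")))
```

Because the index covers only one user's documents, it stays small, which matches the point about it being fast to pull from C*.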
Distributed Cache?
• It is nice to keep things as close to our MicroService as
possible.
• In something that can do fast reads.
• And we have a lot of RAM these days.
• So we can have a beefy Memcached/Redis box.
• And still pay the network penalty and have to think about
scaling it.
• What if
Semi Sticky Approach
• The cache lives inside the MicroService, so there is no network penalty.
• Requests for the same user are processed on the same
instance, so we save a network roundtrip and can also apply
some optimizations (sequencing).
• Changes to state are also replicated to storage (C*) and
are identified by a version number.
• If an instance goes down, the user session is moved to
another live instance automatically.
• It is much easier to scale up MicroServices than C*.
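A rough sketch of the semi-sticky idea, under several assumptions: routing is a plain hash over the live instances, storage is an in-memory stand-in for the C* table, and a write is rejected when the caller's version is stale. The slides do not say how routing or conflict detection was implemented, so `Storage`, `Instance`, `route`, and `VersionConflict` are all hypothetical names. With real Cassandra, a conditional update (a lightweight transaction) would be one possible way to get a similar check.

```python
import hashlib

class VersionConflict(Exception):
    """Raised when a writer's cached version is older than the stored one."""

class Storage:
    """In-memory stand-in for the C* table: account id -> (version, state)."""
    def __init__(self):
        self.rows = {}

    def read(self, account_id):
        return self.rows.get(account_id, (0, {}))

    def write(self, account_id, expected_version, state):
        current_version, _ = self.read(account_id)
        if current_version != expected_version:
            raise VersionConflict(account_id)
        self.rows[account_id] = (expected_version + 1, dict(state))
        return expected_version + 1

class Instance:
    """A service instance that keeps account state in a local cache."""
    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.cache = {}

    def update(self, account_id, key, value):
        if account_id not in self.cache:  # cold cache: load from storage once
            self.cache[account_id] = self.storage.read(account_id)
        version, state = self.cache[account_id]
        new_state = {**state, key: value}
        new_version = self.storage.write(account_id, version, new_state)
        self.cache[account_id] = (new_version, new_state)
        return new_version

def route(account_id, live_instances):
    """Send the same account to the same instance while the instance list is unchanged."""
    digest = hashlib.sha256(account_id.encode()).digest()
    return live_instances[int.from_bytes(digest[:8], "big") % len(live_instances)]

storage = Storage()
instances = [Instance(f"instance-{i}", storage) for i in (1, 2, 3)]

owner = route("Account1", instances)
v1 = owner.update("Account1", "wallet", 10)

# The owner is considered down: the session moves to another live instance.
survivors = [i for i in instances if i is not owner]
new_owner = route("Account1", survivors)
v2 = new_owner.update("Account1", "wallet", 5)

# If the old owner comes back with a stale cache, the version check catches it.
try:
    owner.update("Account1", "wallet", 99)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
print(v1, v2, conflict_detected)
```

The version number is what makes the "2 servers write to the same row" case detectable: the stale writer fails instead of silently overwriting newer state.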
Or in Other Words
[Diagram: Instance 1, Instance 2, and Instance 3 each hold a few accounts in memory (Account 1 through Account 6, each with its Version). Cassandra holds one row per account (Account1 through Account n), each with its jsons and Version.]
My Fish Phrase
Give a man a fish and you will have
to give him one every day.
Teach a man how to fish
and move on to something
more interesting.
Personalized Search (Friends)
[Diagram labels: Real-time Indexer, Friends Graph, Local Cache, In-Memory Personal Index, Get Friends]
Some Stats
• Around 1 million documents are indexed
per second.
• Tens of thousands of searches per second.
• A couple dozen moderately powered EC2 instances.
Astyanax/Thrift. Memory Leak
[Diagram labels: Connections, Buffers, …, Max connections per node]
Astyanax Row Slice
…
Astyanax Row Slice. Improved
…
The most important link:
https://issues.apache.org/jira/browse/cassandra/
Check it daily.
Invisible Assassin
• Small key space in a medium cluster (30 rows,
1 KB).
• CQL: select * from BlockList.
• Cache it in a local cache for 5 minutes.
• CPU 100%, timeouts across the cluster.
• Cluster of 20 nodes DIED after 3 hours.
• Root cause was never found.
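For reference, the access pattern described above (load the whole small table, keep it in a local cache for 5 minutes) might look like the sketch below. It illustrates only the caching pattern, not the failure, whose root cause was never found. `TtlCache`, `load_block_list`, and the injectable clock are hypothetical, and the loader returns canned data instead of querying a cluster.

```python
import time

class TtlCache:
    """Cache a single loaded value and reload it after ttl_seconds."""
    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()  # e.g. a full read of the small table
            self._expires_at = now + self._ttl
        return self._value

# Hypothetical loader standing in for "select * from BlockList".
load_calls = []
def load_block_list():
    load_calls.append(1)
    return ["user-a", "user-b"]

fake_now = [0.0]  # controllable clock so the example does not sleep
cache = TtlCache(load_block_list, ttl_seconds=300.0, clock=lambda: fake_now[0])

cache.get()
cache.get()            # served from the local cache
fake_now[0] = 301.0
cache.get()            # TTL expired, the loader runs again
print(len(load_calls))
```

With this pattern each service instance issues one full read every 5 minutes, so the load on the cluster depends on how many instances hold such a cache.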
Non-VNodes to VNodes migration
[Diagram: Applications; dc1 with assigned tokens; dc2 with VNodes]
Went Wrong For Cache
[Diagram: Applications; dc1 with assigned tokens; dc2 with VNodes]
• CL_ONE with Local=dc1.
• CL_ONE goes to the fastest replica, but it is empty.
• The downstream dependency is now in trouble.
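One reading of this slide can be modelled with a toy simulation: a read at consistency level ONE is answered by whichever replica responds fastest, and if that replica sits in the new, not yet populated datacenter, the caller sees a miss. The replica list, latencies, and `read_one` helper are invented for illustration and do not reproduce Cassandra's actual replica selection.

```python
def read_one(replicas, key, allowed_dcs=None):
    """Return the answer of the fastest eligible replica, as a single-replica read would."""
    eligible = [r for r in replicas if allowed_dcs is None or r["dc"] in allowed_dcs]
    fastest = min(eligible, key=lambda r: r["latency_ms"])
    return fastest["data"].get(key)

# Hypothetical cluster state during the migration.
replicas = [
    {"dc": "dc1", "latency_ms": 3.0, "data": {"k": "cached-value"}},
    {"dc": "dc1", "latency_ms": 4.0, "data": {"k": "cached-value"}},
    {"dc": "dc2", "latency_ms": 1.0, "data": {}},  # new replica, still empty
]

any_dc_result = read_one(replicas, "k")                         # hits the empty dc2 replica
dc1_only_result = read_one(replicas, "k", allowed_dcs={"dc1"})  # stays in dc1
print(any_dc_result, dc1_only_result)
```

When Cassandra is used as a cache, every such miss turns into a call to the downstream dependency, which is how an empty replica can put that dependency in trouble.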
Conclusion
• Pretty Stable and Scalable.
• Important link: https://issues.apache.org/jira/browse/cassandra/
• Keeps you in shape.
• Easy To Fork to experiment.
But How To Get Replication Info?
Replication Logs Example
17:06:52 Received from DC1, R1: update KS Test CF test K 1000 C hello Size 76 Timestamp 1456333612729000 at 1456333612735000. Diff is: 6000
17:06:53 Received from DC2, R1: update KS Test CF test K 1000 C hello Size 76 Timestamp 1456333613344000 at 1456333613345000. Diff is: 10000
17:06:53 Received from DC1, R2: update KS Test CF test K 1000 C hello Size 76 Timestamp 1456333613698000 at 1456333613700000. Diff is: 2000
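A small sketch for pulling replication lag out of log lines shaped like the ones above. The regular expression is derived only from these three sample lines, so the field names and the assumption that both timestamps are microseconds since the epoch are guesses. The lag is recomputed from the two timestamps rather than read from the "Diff is" field: in the second sample line the timestamps differ by 1000 while the logged value is 10000.

```python
import re

# Pattern inferred from the sample lines; group names are hypothetical.
LINE = re.compile(
    r"(?P<clock>\d{2}:\d{2}:\d{2}) Received from (?P<origin>[^:]+): "
    r"update KS (?P<keyspace>\S+) CF (?P<column_family>\S+) "
    r"K (?P<key>\S+) C (?P<column>\S+) Size (?P<size>\d+) "
    r"Timestamp (?P<written>\d+) at (?P<received>\d+)\. "
    r"Diff is: (?P<logged_diff>\d+)"
)

def parse(line):
    """Parse one log line and add the lag computed from its two timestamps."""
    match = LINE.match(" ".join(line.split()))  # tolerate wrapped or padded lines
    if match is None:
        return None
    record = match.groupdict()
    for field in ("size", "written", "received", "logged_diff"):
        record[field] = int(record[field])
    record["lag"] = record["received"] - record["written"]
    return record

sample = [
    "17:06:52 Received from DC1, R1: update KS Test CF test K 1000 C hello Size 76 "
    "Timestamp 1456333612729000 at 1456333612735000. Diff is: 6000",
    "17:06:53 Received from DC2, R1: update KS Test CF test K 1000 C hello Size 76 "
    "Timestamp 1456333613344000 at 1456333613345000. Diff is: 10000",
    "17:06:53 Received from DC1, R2: update KS Test CF test K 1000 C hello Size 76 "
    "Timestamp 1456333613698000 at 1456333613700000. Diff is: 2000",
]
records = [parse(line) for line in sample]
for r in records:
    print(r["origin"], r["lag"], r["logged_diff"])
```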
Cassandra @ Sony: The good, the bad, and the ugly part 2