CASSANDRA
IN E-COMMERCE

Alexander Solovyev
solovyov.a.g@gmail.com

Alexander Solovyev, "Cassandra in e-commerce". Talk at Cassandra Conf 2013


Transcript of "Alexander Solovyev, "Cassandra in e-commerce". Talk at Cassandra Conf 2013"

  1. CASSANDRA IN E-COMMERCE
     Alexander Solovyev, solovyov.a.g@gmail.com
  2. A PILOT PROJECT: ONLINE PRODUCT CATALOG FOR AN E-COMMERCE PLATFORM, MIGRATION TO CASSANDRA
     The previous version was based on the in-memory data grid Oracle Coherence. All data from the primary storage (a relational database) was cached in the data grid.
     Goals of the migration:
     •  minimize the time required for a system restart
     •  keep at least two copies of the data in different data centers
     •  quick and simple backup
  3. ARCHITECTURE IN A NUTSHELL
     •  Application server: all business logic + web services
        •  stateless
        •  with local caches
     •  Data storage
        •  Oracle Coherence, then Cassandra via the DataStax Java Driver
     •  Batch data loading based on Spring Batch
  4. WHAT SHOULD A PRODUCT LOOK LIKE TO MEET THE REQUIREMENTS?
     Some hypotheses:
     •  Data is on disk: available immediately after restart
     •  OS disk cache brings all the data into memory
     •  Key-value storage to simplify migration of the codebase
     Nice to have:
     •  Simple deployment configuration
     •  Java-based solution
  5. BASIC REQUIREMENTS / USE-CASES
     •  reads: ~5K TPS
        •  transactions can include more than one round-trip to the storage, as well as more than one key in a query ("multi-gets")
        •  ~50K TPS on the storage side
     •  full data reload (once per 24 hours)
     •  partial update of values (e.g. of product attributes)
     •  availability 24x7
     •  millions of products
     •  tens of millions of related entities (product attributes etc.)
  6. CANDIDATES
     •  MongoDB
     •  HBase
     •  Oracle Coherence + data persistence (a la Riak)
     •  Cassandra
  7. PERFORMANCE TESTING ENVIRONMENT
     •  Production-ready implementation
     •  4 boxes (16 CPU, 24 GB) x 1 Cassandra instance
     •  2 boxes x 2 app servers
     •  100 GB of test data, which fits in memory
     •  Main test is read queries:
        •  one hour
        •  up to 500 users
        •  even distribution of requested keys
  8. WHAT DID HELP
     •  configure your Cassandra cluster
        •  "OS swap off"
        •  different physical disks for different file sets, e.g. data vs. commit log
        •  choose the right ("private") network interface
     •  async queries for multi-gets + token-aware routing on the app server side: 15% better TPS and latency
     •  use the latest Cassandra version
        •  a good example: 1.2.6 => 1.2.8 gave +15% TPS, latency 2x better
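
As a rough illustration of the "async queries for multi-gets + token-aware routing" item, the sketch below splits a multi-get into single-key reads, routes each read to the node that owns the key's token, and issues them all concurrently. It is a toy model in plain Python, not the DataStax driver: the four-node ring, the MD5-based `token_for`, the `STORE` dictionary and the `fetch_one` helper are all assumptions made up for this example.

```python
import asyncio
import bisect
import hashlib

# Hypothetical 4-node ring of (token, node name) pairs. A real cluster uses
# the tokens of its configured partitioner; MD5 is only a stand-in here.
RING = sorted((i * (2**128 // 4), f"node{i}") for i in range(4))
TOKENS = [token for token, _ in RING]

# Hypothetical in-memory stand-in for the data held by the cluster.
STORE = {f"product:{i}": {"id": i} for i in range(100)}


def token_for(key: str) -> int:
    """Hash a key to a token in the range 0 .. 2**128 - 1."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


def owner_of(key: str) -> str:
    """Return the node owning the key: the first node token >= the key token,
    wrapping around to the first node at the end of the ring."""
    idx = bisect.bisect_left(TOKENS, token_for(key)) % len(RING)
    return RING[idx][1]


async def fetch_one(node: str, key: str):
    """Simulate one asynchronous single-key read sent directly to `node`."""
    await asyncio.sleep(0.001)  # pretend network round-trip
    return key, STORE.get(key)


async def multi_get(keys):
    """Issue all single-key reads concurrently, each routed to its owner."""
    reads = [fetch_one(owner_of(key), key) for key in keys]
    return dict(await asyncio.gather(*reads))


if __name__ == "__main__":
    print(asyncio.run(multi_get(["product:1", "product:2", "product:3"])))
```

The point of the design is that the total latency of a multi-get approaches that of the slowest single read rather than the sum of all reads, and no read needs an extra hop through a coordinator that does not own the data.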
  9. WHAT DID HELP
     •  Use the key of the parent entity as the first component of the children's keys: PRIMARY KEY (parent-ID, child-ID)
        •  to minimize the number of queries / disk seeks
        •  +15% TPS, latency 2x better
     •  use local ("near") caches on the app server side: +15% TPS
        •  local EHCache
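
The effect of PRIMARY KEY (parent-ID, child-ID) can be sketched with a toy table model: when the parent key is the first key component, all children of a parent land in one partition and can be read with a single query. The `Table` class, the `product-42` key and the attribute names below are hypothetical and only count partition reads; they do not model Cassandra's storage engine.

```python
from collections import defaultdict


class Table:
    """Toy model: one partition per partition key, rows ordered inside it."""

    def __init__(self):
        self.partitions = defaultdict(dict)
        self.partition_reads = 0

    def insert(self, partition_key, clustering_key, value):
        self.partitions[partition_key][clustering_key] = value

    def read_partition(self, partition_key):
        """Read every row of one partition, ordered by clustering key."""
        self.partition_reads += 1
        rows = self.partitions.get(partition_key, {})
        return [rows[key] for key in sorted(rows)]


attributes = {f"attr-{i}": f"value-{i}" for i in range(5)}

# Layout A, PRIMARY KEY (child_id): every child lives in its own partition.
by_child = Table()
for child_id, value in attributes.items():
    by_child.insert(child_id, "", value)
rows_a = [row for child_id in attributes
          for row in by_child.read_partition(child_id)]

# Layout B, PRIMARY KEY (parent_id, child_id): children share one partition.
by_parent = Table()
for child_id, value in attributes.items():
    by_parent.insert("product-42", child_id, value)
rows_b = by_parent.read_partition("product-42")

print(by_child.partition_reads, by_parent.partition_reads)  # prints "5 1"
```

Both layouts return the same rows, but layout B needs one partition read instead of one per child, which is where the reduction in queries and disk seeks comes from.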
  10. WHAT DID NOT HELP
      •  Java GC monitoring on Cassandra boxes
         •  with the recommended settings, GC takes at most 7% of the overall test time
      •  caching == ALL
         •  all data is in the OS disk cache
  11. INTERESTING EXPERIMENTS
      •  another implementation of token-aware query routing
      •  JSON or any other data format, if partial updates are not needed: a pure key-value model
         •  avoids creating tombstones on updates when values contain Cassandra collections
         •  another option is tuning tombstone GC
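
The "pure key-value model" experiment can be pictured as follows: the whole product, including what would otherwise be a Cassandra collection, is serialized into one opaque value, so an update is a plain overwrite of a single value. This is a minimal sketch with a dictionary standing in for a two-column table; `kv_table`, `put_product`, `get_product` and `update_attribute` are hypothetical names, and JSON is just one possible format.

```python
import json

# Hypothetical stand-in for a table with two columns: key text, value text.
kv_table = {}


def put_product(product_id, product):
    """Store the whole product as one serialized value (a single overwrite)."""
    kv_table[product_id] = json.dumps(product, sort_keys=True)


def get_product(product_id):
    """Load and deserialize the whole product, or return None if absent."""
    raw = kv_table.get(product_id)
    return None if raw is None else json.loads(raw)


def update_attribute(product_id, name, value):
    """A partial update becomes a read-modify-write of the whole blob,
    which is why this model fits best when partial updates are not needed."""
    product = get_product(product_id) or {}
    product.setdefault("attributes", {})[name] = value
    put_product(product_id, product)


put_product("product-42", {"name": "Kettle", "attributes": {"color": "red"}})
update_attribute("product-42", "volume", "1.7 l")
print(get_product("product-42"))
```

The trade-off is visible in `update_attribute`: changing one attribute costs a read plus a rewrite of the full value, in exchange for never replacing a collection column.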
  12. SUMMARY
      •  Cassandra is a stable and sufficiently mature product
      •  It can compete with in-memory caches and data grids, at least if the dataset is small enough to fit in memory
      •  Actively developed, with a large community and good commercial support from DataStax
  13. THANK YOU
      …and your questions :)