Your SlideShare is downloading. ×
Cassandra in e-commerce
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Saving this for later?

Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime - even offline.

Text the download link to your phone

Standard text messaging rates apply

Cassandra in e-commerce

861
views

Published on

Slides from Moscow BigData/Cassandra September 2013 meetup

Slides from Moscow BigData/Cassandra September 2013 meetup

Published in: Technology

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
861
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
10
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. CASSANDRA IN E-COMMERCE Alexander Solovyev solovyov.a.g@gmail.com
  • 2. A PILOT PROJECT: ONLINE PRODUCT CATALOG FOR A E-COMMERCE PLATFORM, MIGRATION TO CASSANDRA Previous version was based on In-Memory Data Grid Oracle Coherence. All data from the primary storage (a relational database) is cached in the data grid. Goals of the migration: •  minimization of time required for system restart •  at least two copies of the data in different data-centers •  quick and simple backup
  • 3. ARCHITECTURE IN A NUTSHELL •  Application server: all business logic + web-services •  stateless •  with local caches •  Data storage •  Oracle Coherence, then Cassandra via DataStax Java Driver •  Batch data loading based on Spring Batch
  • 4. HOW A PRODUCT SHOULD LOOK LIKE TO MEET THE REQUIREMENTS? Some hypotheses: •  Data is on disk – available immediately after restart •  OS disk cache brings all the data to memory •  Key-value storage to simplify migration of the codebase Nice to have: •  Simple deployment configuration as a plus •  Java-based solution as a plus
  • 5. BASIC REQUIREMENTS / USE-CASES •  reads: ~5K TPS •  transactions can include more that one round-trip to the storage, as well as more than one key in a query (“multi-gets”) •  ~50K TPS on side of the storage •  full data reload (once per 24 hours) •  partial update of values (e.g. of product attributes) •  availability 24x7 •  millions of products •  tens of millions of related entities (product attributes etc.)
  • 6. CANDIDATES •  MongoDB •  HBase •  Oracle Coherence + data persistence (a la Riak) •  Cassandra
  • 7. PERFORMANCE TESTING ENVIRONMENT •  Production-ready implementation •  4 boxes (16 CPU, 24 GB) x 1 Cassandra instance •  2 boxes x 2 app servers •  100 GB of test data - fits in memory •  Main test is read queries: •  one hour •  up to 500 users •  even distribution of requested keys
  • 8. WHAT DID HELP •  configure your Cassandra cluster •  “OS swap off” •  different physical disks for different file-sets - e.g. data vs. commit log •  choose right (“private”) network interface •  async queries for multi-gets + token-aware rouring on the app server side: +15% TPS and latency •  use last Cassandra version •  a good example: 1.2.6 => 1.2.8 – 15% TPS, latency 2x better
  • 9. WHAT DID HELP •  Use the key of a parent entity as a first component of the children keys: PRIMARY KEY (parent-ID, child-ID) •  to minimize number of queries / disk seeks •  +15% TPS, latency 2x better •  use local (“near”) caches on app server side: +15% TPS •  local EHCache
  • 10. WHAT DID NOT HELP •  Java GC monitoring on Cassandra boxes •  with recommended settings GC takes 7% maximum from overall time of the tests •  caching == ALL •  all data in OS disk cache
  • 11. INTERESTING EXPERIMENTS •  another implementation of the token-aware query routing •  JSON or any other data format, if partial updates are not needed – a pure key-value model •  allows to avoid creation of tombstones in the case of updates, if values contain Cassandra collections •  another option is tuning of tombstone GC
  • 12. SUMMARY •  Cassandra is stable and mature enough product •  Can compete with in-memory caches and data grids, at least if dataset is small enough to be placed into memory •  Actively developing. Has a large community. Good commercial support from DataStax
  • 13. THANK YOU …and your questions J