Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration

Over the last few years, the dominance of traditional RDBMS databases has eroded across many domains. The rapid adoption of NoSQL databases, especially Cassandra, has opened up discussion of the major challenges faced when implementing Cassandra and how to mitigate them. Teams often conclude that a migration or proof of concept (POC) was unsuccessful when the real flaw lies in the data model, the hardware configuration, the database parameters, the chosen consistency level and so on. No single model or configuration fits all use cases and applications. Performance tuning an application is truly an art and requires perseverance. This paper delves into the performance tuning considerations and anti-patterns to weigh during a Cassandra migration or implementation, in order to reap the benefits that made Cassandra a ‘Visionary’ in Gartner’s 2014 Magic Quadrant for Operational Database Management Systems.

Published in: Technology

  1. Performance tuning - A key to successful Cassandra migration
  2. 1.0 Abstract
     2.0 Dominance of traditional RDBMS and adoption of NoSQL
     3.0 DataStax Cassandra – ‘The Visionary’
     4.1 Our journey through Cassandra optimization: Data Model
     4.2 Our journey through Cassandra optimization: Integration
     4.3 Our journey through Cassandra optimization: DB Parameters
     5.0 The only thing constant is change
     6.0 Performance tuning - Key to success
     © 2015. All Rights Reserved.
  3. Abstract (text as in the description above)
  4. Dominance of RDBMS and NoSQL adoption
     • Storage of high-volume data
     • Transaction control
     • Security management
     • Common key concepts
     • Evolved over a period
     • Common construct for querying
     "Why don't I try if these databases can offer more?"
     • Support for clusters
     • Cost
     • Impedance mismatch
     • Adaptability to newer workloads
  5. DataStax Cassandra – ‘The Visionary’ ...
     • As per Gartner's Magic Quadrant, DataStax Cassandra is listed as a ‘Visionary’
     • The Magic Quadrant clearly calls out the differentiating factors:
       - High performance
       - In-memory options
       - Search capabilities
       - Integration with Spark and Hadoop
       - Experience in doing business with the vendor
     Source: www.gartner.com
  6. ... But
     • One of the major challenges listed in the Gartner Magic Quadrant analysis is poor performance during POCs
     Two major pitfalls:
     • POCs are conducted as quick and dirty
       - No capacity planning
       - Performance tuning
     • Moving to production without enough performance testing
  7. Don't be in the dark...
     Have you tried out all possible tuning techniques before concluding the results?
       - Data model
       - Integration best practices
       - Database parameters
  8. Performance tuning - Key to success
     • For a successful migration / implementation, due diligence needs to be done on all the different aspects:
     Data Model
       • Distribution
       • De-normalization
       • Indexing
       • Query patterns
     Integration
       • ‘Batch’ statements
       • Consistency levels
       • Load balancing
       • Tombstones
     DB Parameters
       • Hidden data
       • Compaction
       • Cache
  9. Our journey through Cassandra optimization: Data Model
  10. Data model
      • Equal distribution of data across partitions
      • De-normalization
      • Redundancy of data is acceptable to cater to different read use cases
      • Reduce client-side joins
      Think out of the box (RDBMS)!
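
The point about equal distribution can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: it hashes each partition key with md5 and takes the result modulo the node count, which stands in for Cassandra's real partitioner and token ranges. The column names `region` and `cust_id`, the node count and the row count are all hypothetical.

```python
import hashlib
from collections import Counter


def node_for(partition_key: str, node_count: int) -> int:
    """Map a partition key to a node index by hashing.

    Assumption: md5 + modulo is a stand-in for Cassandra's partitioner,
    used here only to show how key cardinality affects data spread.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def rows_per_node(partition_keys, node_count):
    """Count how many rows land on each node for the given partition keys."""
    counts = Counter(node_for(key, node_count) for key in partition_keys)
    return [counts.get(node, 0) for node in range(node_count)]


NODES = 6       # hypothetical cluster size
ROWS = 60_000   # hypothetical number of rows written

# Low-cardinality partition key (hypothetical 'region'): only 3 partitions exist,
# so at most 3 of the 6 nodes can ever hold data.
by_region = rows_per_node((f"region-{i % 3}" for i in range(ROWS)), NODES)

# High-cardinality partition key (hypothetical 'cust_id'): 10,000 partitions
# spread across all nodes.
by_customer = rows_per_node((f"cust-{i % 10_000}" for i in range(ROWS)), NODES)

print("rows per node, keyed by region:  ", by_region)
print("rows per node, keyed by customer:", by_customer)
```

With the low-cardinality key, some nodes stay empty while others carry very large partitions; with the high-cardinality key, every node receives a comparable share.
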
  11. Data model contd.
      • Limit secondary indexes
      • Do clustering based on the read pattern
      A table / CF that supports reads of the most recent customer interactions:
          CREATE TABLE cust_interaction (
              cust_id text,
              intr_id timeuuid,
              intr_tx text,
              PRIMARY KEY (cust_id, intr_id)
          ) WITH CLUSTERING ORDER BY (intr_id DESC);
  12. Our journey through Cassandra optimization: Integration
  13. ‘Batch’ is not for performance improvement
      • Batching statements can really harm performance
      • Use individual inserts wherever possible
      [Figure: nodes N1 to N6, comparing individual inserts with batch inserts]
  14. Consistency levels
      • Decide consistency levels based on:
        - Workload
        - Need for immediate consistency

                                     | Read heavy             | Write heavy            | Mixed workload
        -----------------------------+------------------------+------------------------+----------------------------
        High consistency (immediate) | RC: ONE, WC: ALL       | RC: ALL, WC: ONE       | RC: QUORUM, WC: QUORUM
        Relaxed consistency          | RC: ONE, WC: ONE / TWO | RC: ONE / TWO, WC: ONE | RC: ONE / TWO, WC: ONE / TWO

      Considered RF = 3
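
The high-consistency row follows from a simple overlap rule: a read is guaranteed to see the latest acknowledged write when the replicas contacted for the read (RC) plus the replicas contacted for the write (WC) exceed the replication factor (RF). The sketch below checks a few combinations from the slide for RF = 3. The function names are illustrative and not part of any driver API.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond at the given consistency level."""
    by_level = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return by_level[level]


def reads_see_latest_write(read_cl: str, write_cl: str, replication_factor: int) -> bool:
    """True when the read and write replica sets must overlap, i.e. R + W > RF."""
    r = replicas_required(read_cl, replication_factor)
    w = replicas_required(write_cl, replication_factor)
    return r + w > replication_factor


RF = 3  # replication factor assumed on the slide

combinations = [
    ("ONE", "ALL"),        # read heavy, high consistency
    ("ALL", "ONE"),        # write heavy, high consistency
    ("QUORUM", "QUORUM"),  # mixed workload, high consistency
    ("ONE", "ONE"),        # relaxed
    ("ONE", "TWO"),        # relaxed
]

for read_cl, write_cl in combinations:
    overlap = reads_see_latest_write(read_cl, write_cl, RF)
    print(f"RC={read_cl:<6} WC={write_cl:<6} immediate consistency: {overlap}")
```

The first three combinations satisfy R + W > RF; the relaxed ones do not, which is the trade they make for lower latency.
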
  15. Load balancing strategy
      • Consider topology
      • Be aware of the distribution of clients / users
        - TokenAwarePolicy acts as a wrapper
        - With multiple data centers, the most preferred approach is DCAwareRoundRobinPolicy with TokenAwarePolicy
        - For single data center installations, RoundRobinPolicy with TokenAwarePolicy can be considered
  16. Beware of tombstones
      • Querying data that has columns with a tombstone set can bring down performance
      • A marker in a row indicates the delete
      • Compaction removes the tombstone based on GC
      • Do not insert NULL into Cassandra
      • Set IGNORE_NULLS to TRUE
      Image source: www.datastax.com
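
Writing an explicit null for a column stores a tombstone for it, so one way to follow the "do not insert NULL" advice on the client side is to leave unset columns out of the INSERT altogether. The helper below is a minimal sketch, not a driver feature: `insert_without_nulls` is a hypothetical name, the `%s` placeholders assume the positional style used by the DataStax Python driver for simple statements, and the sample values are made up.

```python
import uuid


def insert_without_nulls(table: str, row: dict):
    """Build an INSERT that names only the columns that actually have a value.

    Columns whose value is None are dropped, so no null (and hence no
    tombstone) is written for them.
    """
    present = {column: value for column, value in row.items() if value is not None}
    columns = ", ".join(present)
    placeholders = ", ".join(["%s"] * len(present))
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return statement, tuple(present.values())


# Hypothetical row for the cust_interaction table; intr_tx has no value yet.
statement, params = insert_without_nulls(
    "cust_interaction",
    {"cust_id": "C1001", "intr_id": uuid.uuid1(), "intr_tx": None},
)

print(statement)
print(len(params), "bound values")
```

The resulting statement and parameter tuple can then be handed to whatever client is in use; the missing column is simply never written.
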
  17. Our journey through Cassandra optimization: DB Parameters
  18. Watch for hidden data
      • TTL and gc_grace_seconds go hand in hand
      • Even after data is deleted (tombstone is set), it still occupies space until gc_grace_seconds has passed
      • Direct impact on storage and performance
      • Default gc_grace_seconds is 10 days
      Image source: www.datastax.com
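
As a worked example of the arithmetic, the sketch below computes the earliest moment a tombstone becomes eligible for removal after an explicit delete. It assumes the default gc_grace_seconds of 864000 (10 days), uses a made-up deletion time, and does not model TTL expiry. The space is only reclaimed once a compaction actually processes the relevant SSTables after that moment.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default: 10 days


def earliest_purge_time(deleted_at: datetime,
                        gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> datetime:
    """Earliest time at which compaction may drop a tombstone written at deleted_at."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


# Hypothetical deletion time.
deleted_at = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)

print(earliest_purge_time(deleted_at))          # 2015-06-11 12:00:00+00:00
print(earliest_purge_time(deleted_at, 86_400))  # 2015-06-02 12:00:00+00:00
```

Until that time the deleted data and its tombstone both remain on disk, which is the "hidden data" the slide warns about.
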
  19. Compaction
      • Size-tiered compaction
      • Leveled compaction
      • Date-tiered compaction
      • Full replacement is default
      • Incremental replacement
      • Anti-compaction
      • Clients can read data directly from the new SSTable even before it finishes writing
      • Reduce compaction I/O contention
      Image source: www.datastax.com
  20. Compaction cont.
      • Default is size-tiered
      • Alter the column family to change the compaction type
      Image source: www.datastax.com
  21. Compaction cont.
      • Handle time-series-like data: DateTiered Compaction Strategy
      Image source: www.datastax.com
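
Changing the compaction type comes down to one ALTER TABLE statement per column family. The sketch below only renders that statement as text. The helper name and the short strategy keys are hypothetical, the strategy class names are the ones Cassandra ships for the three strategies named in the slides, and the table name is borrowed from the data model slide.

```python
# Short, hypothetical keys mapped to Cassandra's compaction strategy class names.
COMPACTION_CLASSES = {
    "size_tiered": "SizeTieredCompactionStrategy",
    "leveled": "LeveledCompactionStrategy",
    "date_tiered": "DateTieredCompactionStrategy",
}


def alter_compaction(table: str, strategy: str) -> str:
    """Render the CQL statement that switches a table to the given strategy."""
    strategy_class = COMPACTION_CLASSES[strategy]
    return f"ALTER TABLE {table} WITH compaction = {{'class': '{strategy_class}'}};"


print(alter_compaction("cust_interaction", "leveled"))
print(alter_compaction("cust_interaction", "date_tiered"))
```

Each strategy also accepts its own sub-options, which are left out here; they would go inside the same map as additional keys.
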
  22. Cache what you need
      Cassandra read path = a lot of in-memory components. Be optimal...
      [Figure: row cache hit. Image source: https://academy.datastax.com/]
      • Row cache: turned OFF by default
        - Caches the complete data
        - Earlier versions used to load the whole partition
        - From 2.1, the number of rows cached per partition is configurable
        - Optimal for low-volume data that is frequently accessed
  23. Cache what you need contd.
      [Figure: key cache hit. Image source: https://academy.datastax.com/]
      • Key cache: turned ON by default
        - Caches just the key
        - Turning it OFF increases the response time for retrieves
        - Place frequently and sparsely read data in different CFs
      No one configuration fits all. Tuning has to be iterative.
  24. The only thing constant is change
      2011-2012
        - Secondary indexes
        - Online schema changes
        - Introduction of CQL
        - Zero-downtime upgrade
        - Leveled compaction
      2013-2014
        - Virtual nodes
        - Inter-node communication
        - Lightweight transactions
        - Triggers
        - Change in data and log location
        - User-defined data types
      2015
        - Commit log compression
        - JSON support
        - Role-based authorization
        - User-defined functions
        - Windows support
        - Monthly versions
      Keep up with the pace. Changes can impact performance a lot.
  25. Performance tuning - Key to success
      Traditional DBMS world: DBA, Developer, Sys Admin
      NoSQL world: Database Engineer
      The boundary between different roles has blurred.
      The onus is on ‘us’ to tune, tune and tune the system to make the Cassandra implementation successful!
  26. Questions & Answers
  27. Authors
      • Tiju Francis, Principal Technology Architect, Infosys Ltd (https://www.linkedin.com/in/tijufrancis)
      • Ramkumar Nottath, Technology Architect, Infosys Ltd (https://www.linkedin.com/in/ramnottath)
      • Arunshankar Arjunan, Technology Architect, Infosys Ltd (https://www.linkedin.com/in/arunshankararjunan)
  28. Thanks
      • Thanks to all the great minds who contributed towards this presentation:
        - Srivas J, Infosys Ltd
        - Srivas G, Infosys Ltd
        - Lakshman G, Infosys Ltd
        - Kiran N G, Infosys Ltd
        - Sivaram K, Infosys Ltd
        - Chethan Danivas, Infosys Ltd
        - Badrinath Narayanan, Infosys Ltd
        - Gautam Tiwari, Infosys Ltd
        - Shailesh Janrao Barde, Infosys Ltd
  29. References
      • NoSQL Distilled by Pramod J. Sadalage and Martin Fowler
      • https://academy.datastax.com/courses
      • http://www.gartner.com/
      • Mastering Apache Cassandra
      • http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
      • http://www.planetcassandra.org/cassandra/
      • http://jonathanhui.com/cassandra-performance-tuning-and-monitoring
      Source: www.gartner.com
  30. Thank you
