Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics



This webinar walks through the solution CIGNEX developed for a real-time event logging application, along with the key technical considerations, such as selecting the proper shard key. Yash will explain the key decision factors and the performance statistics behind the solution. With the correct shard key, MongoDB is able to handle approximately 30 million inserts and 5 million updates per hour. The case study covers everything from hardware recommendations to cluster configuration management at scale.
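To relate the hourly figures above to the per-second rates quoted in the benchmark slides, the conversion can be sketched as follows. This is plain arithmetic on the two numbers in the summary; the function name is illustrative and not part of the presentation.

```python
SECONDS_PER_HOUR = 3600


def per_second(ops_per_hour: float) -> float:
    """Convert an hourly operation count into an average per-second rate."""
    return ops_per_hour / SECONDS_PER_HOUR


# Figures quoted in the webinar summary.
inserts_per_sec = per_second(30_000_000)
updates_per_sec = per_second(5_000_000)

print(f"inserts: ~{inserts_per_sec:,.0f} per second")  # inserts: ~8,333 per second
print(f"updates: ~{updates_per_sec:,.0f} per second")  # updates: ~1,389 per second
```

These are averages over an hour; the summary does not say how the load was distributed within that hour.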

Published in: Technology


  1. Scaling MongoDB with Sharding: A Case Study
     Presented by Yash Badiani and Rahul Nair
     CIGNEX Datamatics Confidential, www.cignex.com
  2. About CIGNEX Datamatics
     A subsidiary of Datamatics Global Services Limited
  3. Introduction of Datamatics (DGSL)
     • Mission
       – Experts in improving enterprise productivity through Process Engineering & Information Management Solutions
     • Key Highlights
       – Founded in 1975
       – Publicly listed in India
       – Annual consolidated revenue of US$100 million
       – Fortune 500 clients
       – 4,400+ employees across 22 offices in 9 countries
     • Strategic Alliances [partner logos]
  4. What Does CIGNEX Datamatics Do?
     Since 2000, making Open Source work for the enterprise through adoption and integration to:
     • Address business goals
     • Increase business velocity
     • Lower the cost of doing business
     • Reduce TCO
     • Gain competitive advantage
     Solution areas: Portal Solutions, Content Solutions, Big Data Solutions
     400+ implementations worldwide across industries
  5. Where We Can Help You
     SOLUTIONS
     • Portals / User Experience Platform
       – Intranet, Extranet, Social Collaboration, Mobile Portals, EAI, SOA
       – Liferay, Drupal, JBoss, ZK, HTML5, MuleSoft
     • Enterprise Content Management Solutions
       – WCM, DM, RM, CMS, DAM, Imaging, E-Commerce, E-learning, ERP
       – Alfresco, Adobe CQ, Drupal, Magento, JBoss, Moodle, Ephesoft, Liferay
     • Big Data: Making Data Work
       – Analytics, DW - BI, Mobile, Social, Log Processing and Analysis, Web, Enterprise, Real-time Search
       – Hadoop, MongoDB, Neo4j, Flume, Hive, Solr, Pentaho, Jaspersoft
     SERVICES
     • UI, Development, Integration, Customization, Migration, Testing, Training, Support (24*7)
     • Managed Cloud Services - Develop, Deploy, Manage
     • VAR/Annual Product Subscription - Liferay, Alfresco, Cloudera Hadoop, MongoDB
     • Extended Development Center - Center of Excellence
  6. About the Presenters
     • Yash Badiani is the Big Data Practice Lead at CIGNEX Datamatics and focuses on Big Data technologies including MongoDB & Hadoop. He has worked extensively on large data warehousing & business intelligence projects with tools such as Business Objects, Microsoft SQL Server, MicroStrategy, and IBM Cognos.
     • Gaurav Khambhala works at CIGNEX Datamatics as Technical Lead. He is the senior member of the PHP Practice at CIGNEX Datamatics and is involved in various technology initiatives like Big Data, where he focuses on integration of PHP with NoSQL sources like MongoDB. He has wide industry experience in software development & management in Open Source technologies such as Drupal & Moodle.
  7. Agenda
     • CIGNEX Datamatics: Introduction & Offerings
     • Use Case & Database Requirements
     • Challenges with Traditional Databases
     • Why MongoDB?
     • Solution
       – Approach
       – Architecture and Hardware Sizing
     • Scaling with Sharding
       – Sharding Basics
       – Sharding: Choosing the RIGHT Shard Key
       – Benchmarking with Results
     • Key Takeaways
  8. Big Data Practice at CIGNEX Datamatics: Brief Snapshot
     • ~40-employee Big Data Practice focused on Hadoop, MongoDB, Neo4j, Solr
     • Professionals formally trained / certified by Cloudera and 10gen
     • Expertise in the Hadoop ecosystem (HBase, Pig, Hive, Flume, Sqoop, Oozie, Zookeeper)
     • Strong partnerships:
       – System Integration partner with Cloudera for CDH
       – Global partner with 10gen for MongoDB; multiple webinars on different solutions
     • Technology Partnership [partner logos]
  9. Our Offerings: Big Data
     • Consulting
       – Business Analysis
       – Technology Evaluation
       – Architecture
       – Design Framework
       – Cluster sizing
       – Deployment planning
       – Proof-of-Concept
       – Performance Benchmarking
     • Implementation
       – UI Development
       – Application Integration
       – Customization
       – Migration
       – Testing
       – Performance Tuning
       – Health Check
     • Support & Training
       – DBA Support
       – Application Support
       – Enhancements
       – 24*7 Production Support (Tier 1/2/3)
       – Trainings
  10. Use Case
      Flow: Users → Devices → Load Balancer (App. Layer) → Database (Data Storage)
      • End Users
        – 7 million users
        – Spread across geography
      • Devices
        – 8 devices / user
        – Home / Office / Anywhere
      • Load Balancer
        – Receives high volume of concurrent CRUD requests
        – Routes request traffic to DB cluster
      • MongoDB cluster
        – Sharding
        – Replication with Automatic Failover
        – Indexes
  11. Database Requirements
      • Flexibility in Schema
      • High Performance
      • Agility in Development & Deployment
      • Availability
      • Enterprise Level Support
  12. Limitations of RDBMS
      • Support limited to terabytes
      • Manage only structured data
      • RDBMS doesn't scale inherently
      • Feature rich but slow performance
      • Complex to shard/partition due to maintenance of schema
      • Limitations in scaling high volume of concurrent CRUD
      • Specialized hardware: expensive
      • Vertical scaling: expensive and difficult to scale
      RDBMS can't manage all dimensions of data with speed & at lower cost.
  13. Why MongoDB?
      • Flexibility in Schema (Schema free)
        – Easy integration
        – Ease of schema design
        – Document oriented storage
      • High Performance (Indexes & Sharding)
        – Concurrent CRUD
        – Fast updates
        – Write distribution with Sharding
      • Agility in Development & Deployment (Driver Support)
        – Programming language drivers
        – Shorter dev cycle
        – Faster deployment
      • Availability (Replication)
        – Automatic failover
        – Redundancy
        – 100% uptime
      • Enterprise Level Support (Strong Community)
        – Global coverage
        – 24x7 support
        – Ease of maintenance
  14. Solution: Approach
      • Schema
        – Schema design
        – Collections and field definitions
      • Database Size
        – Document size
        – Total expected data size
      • Concurrent Load
        – Frequency of CRUD operations
        – Read/Write ratio
      • Availability
        – Automatic failover
        – Replication and backup
      • Indexing
        – Working set
        – Access patterns
      • Sharding
        – Horizontal scaling
        – Query performance
      • Hardware Sizing
        – Cluster sizing
        – RAM and disk storage
  15. Solution: Architecture
      • Load Balancer
      • App Tier: application servers, each paired with a mongos router
      • Data Tier
        – Config Servers (mongod)
        – Shards 1 to 4, each with a mongod Primary, a mongod Secondary, and a mongod Arbiter
        – A separate Replica Set (mongod Primary, mongod Secondary, mongod Arbiter) for non-sharded collections
      • Requests are routed from mongos to the shards; requests for non-sharded collections are routed to the separate Replica Set
  16. Sharding: What is it?
      • Distributes a single logical database system across a cluster of machines
      • Lets you partition a collection across a number of mongod instances (shards)
      • Advantages:
        – Increases write capacity
        – Ability to support larger working sets
        – Raises limits of data size beyond a single node
  17. Sharding: Features
      • Range-based data partitioning
      • Automatic data volume distribution
      • Transparent query routing
      • Horizontal capacity
        – Additional write capacity through distribution
        – Right shard key allows expansion of working set
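Range-based partitioning with transparent routing, as listed on the slide above, can be illustrated with a toy router. This is a simplified sketch, not MongoDB's implementation: the class name, the split points, and the one-range-per-shard layout are assumptions made for illustration (a real cluster splits the key space into many chunks and balances them across shards).

```python
from bisect import bisect_right


class RangeRouter:
    """Toy router: maps a shard-key value to the shard that owns its range.

    N split points define N + 1 contiguous ranges. Each range includes its
    lower bound and excludes its upper bound.
    """

    def __init__(self, split_points, shards):
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = list(split_points)
        self.shards = list(shards)

    def route(self, key):
        """Return the shard whose range contains `key`."""
        return self.shards[bisect_right(self.split_points, key)]


# Hypothetical layout: four shards, matching the four shards in the architecture slide.
router = RangeRouter([250, 500, 750], ["shard1", "shard2", "shard3", "shard4"])

print(router.route(100))  # shard1
print(router.route(250))  # shard2 (lower bound is inclusive)
print(router.route(999))  # shard4
```

The caller only supplies a key; which shard holds the data is decided by the router, which is the sense in which routing is transparent to the application.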
  18. Sharding: When to use?
      • Your data set approaches or exceeds the storage capacity of a single node in your system
      • The size of your system's active working set will soon exceed the maximum amount of RAM for your system
      • Your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have not reduced contention
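The three conditions above can be turned into a simple checklist. The sketch below is only an illustration: the `NodeStats` fields, the 80% headroom threshold, and the sample numbers are assumptions, not values from the presentation.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical measurements for a single, unsharded MongoDB node."""
    data_size_gb: float
    disk_capacity_gb: float
    working_set_gb: float
    ram_gb: float
    required_writes_per_sec: float
    max_writes_per_sec: float


def sharding_reasons(stats: NodeStats, headroom: float = 0.8) -> list:
    """Return which of the three conditions from the slide currently apply.

    `headroom` is an assumed safety margin: a resource counts as
    "approaching" its limit once usage reaches 80% of capacity.
    """
    reasons = []
    if stats.data_size_gb >= headroom * stats.disk_capacity_gb:
        reasons.append("storage")
    if stats.working_set_gb >= headroom * stats.ram_gb:
        reasons.append("working set")
    if stats.required_writes_per_sec > stats.max_writes_per_sec:
        reasons.append("write throughput")
    return reasons


# Illustrative numbers only.
node = NodeStats(
    data_size_gb=300, disk_capacity_gb=1000,
    working_set_gb=7, ram_gb=8,
    required_writes_per_sec=6000, max_writes_per_sec=2900,
)
print(sharding_reasons(node))  # ['working set', 'write throughput']
```

The third condition on the slide also requires that other approaches to reducing contention have been tried first; that judgment is not something this checklist can capture.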
  19. Shard Keys
      • Shard keys: a field that exists in every document in a collection and that MongoDB uses to distribute documents among the shards. Like indexes, they can be either a single field or a compound key.
      • The ideal shard key:
        – Is easily divisible, which makes it easy for MongoDB to distribute content among the shards
        – Has higher "randomness"
        – Allows targeted queries
        – May need to be computed
  20. Choosing the Right Shard Key
      Two approaches to the shard key were compared:
      • Approach 1: Random key: UserId
      • Approach 2: Coarsely ascending key + random key: YearMonth + UserId
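A minimal sketch of how the compound key from Approach 2 might be computed and declared. The field names `yearMonth` and `userId`, the integer encoding of the month, and the namespace `eventlog.events` are hypothetical; the slides only name the key parts as YearMonth + UserId. The command document follows the shape of MongoDB's `shardCollection` admin command; it is built here as a plain dictionary and not sent to a server.

```python
from datetime import datetime


def shard_key_fields(user_id: int, event_time: datetime) -> dict:
    """Compute the two shard-key fields for one event document.

    yearMonth is the coarsely ascending part (e.g. 201303 for March 2013);
    userId is the high-cardinality part that spreads writes within a month.
    """
    return {
        "yearMonth": event_time.year * 100 + event_time.month,
        "userId": user_id,
    }


def shard_collection_command(namespace: str) -> dict:
    """Build a shardCollection command document for the compound key.

    With pymongo, a document like this would typically be run against the
    admin database of a sharded cluster (assumption: not shown in the slides).
    """
    return {"shardCollection": namespace, "key": {"yearMonth": 1, "userId": 1}}


doc = {"event": "login", **shard_key_fields(42, datetime(2013, 3, 15))}
print(doc)  # {'event': 'login', 'yearMonth': 201303, 'userId': 42}
print(shard_collection_command("eventlog.events"))
```

The usual reasoning for this kind of key is that new documents cluster in the current month's portion of the index, which keeps the hot part of the index small, while the user ID still distributes writes within that month. The slides report the outcome (sustained results and better RAM utilization) without spelling out this mechanism.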
  21. Benchmarking / Load Testing Approach
      Automated scripts with varied load
  22. Results: INSERTS
      • Approach 1: Over 80 million documents inserted, with a decreasing threshold over 10 million
      • Approach 2: Over 225 million documents inserted at a stable rate of 6000 documents/sec
      Benchmarks done on 8GB test hardware machines
  23. Results: UPDATES
      • Approach 1: Over 50 million documents updated at an average of 400 documents/sec
      • Approach 2: Over 100 million documents updated at up to 4000 documents/sec
      Benchmarks done on 8GB test hardware machines
  24. Results: INSERT, UPDATE (Approach 2)
      • Simultaneous INSERT: >6000 documents/second, >70 million records
      • Simultaneous UPDATE: >6000 documents/second, >50 million records
      Benchmarks done on 8GB test hardware machines
  25. Benchmarking: Sharding vs. Non-Sharding

      Operation          Sharding (YearMonth + UserId)       Non-Sharding
      INSERTS            ~6000 docs/sec                      ~2900 docs/sec
      UPDATES            ~4000 docs/sec                      ~620 updates/sec
      INSERT & UPDATES   ~6000 docs/sec & ~6100 docs/sec     ~2000 docs/sec & ~600 docs/sec

      Benchmarks done on 8GB test hardware machines
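The relative gain from sharding in the comparison above can be computed directly from the quoted rates. The dictionary keys below are labels chosen for this sketch; the numbers are the approximate values from the slide.

```python
# (sharded rate, non-sharded rate) in documents per second, as quoted on the slide.
benchmarks = {
    "inserts": (6000, 2900),
    "updates": (4000, 620),
    "mixed inserts": (6000, 2000),
    "mixed updates": (6100, 600),
}

# Speedup factor of the sharded setup over the non-sharded one.
speedups = {
    operation: sharded / unsharded
    for operation, (sharded, unsharded) in benchmarks.items()
}

for operation, factor in speedups.items():
    print(f"{operation}: {factor:.1f}x")
# inserts: 2.1x
# updates: 6.5x
# mixed inserts: 3.0x
# mixed updates: 10.2x
```

Since the source rates are approximate, these factors are rough. They suggest updates benefit more from sharding than inserts do, most visibly under the mixed workload.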
  26. Key Takeaways
      • Take a comprehensive approach to performance tuning
      • Plan early for performance
      • MongoDB scales & shines
      • Sharding scales INSERTS/UPDATES compared with non-sharding
      • Sharding with Approach 2 (coarsely ascending key + random key) provides sustained results & better utilization of the RAM
      • Use a different set of servers for non-sharded collections
      • Define indexes carefully
      • Keep the number of indexes on sharded collections to a minimum
  27. Thank You. Any Questions?
      Making Open Source Work
      For queries reach out to us at info@cignex.com
      www.cignex.com