
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to Cassandra by Mohit Anchlia



This session covers the journey of Intuit’s Consumer Financial Platform, which is built to scale to petabytes of data. The original system used a major RDBMS; from there, we redesigned it to take advantage of the distributed nature of Cassandra. The talk walks through our transition, including the data model used for the final product. As with any large system transition, we learned many hard lessons, and we will discuss them and share our experiences.

Published in: Technology


  1. The Consumer Financial Platform (CFP)
     Mohit Anchlia, Architect, Intuit
     Intuit Proprietary & Confidential
  2. Agenda
       • Background
       • Problem statement
       • Idea of a Platform
       • Why Cassandra?
       • CFP Stack
       • CFP Cassandra Data Model
       • Learning in Production
       • Q&A
  3. Background
       • Intuit is the maker of TurboTax, Quicken, QuickBooks, and many other products for SBUs.
       • Many services work together to deliver an awesome product experience.
  4. Problem Statement (Service explosion)
       • Service explosion over the years
           – Code duplication
           – Cross-cutting concerns
           – Data silos (information silos)
           – Operational challenges: schema design, installs
           – Added overhead to test, and to repeat tests in production, which slows prototyping
  5. Idea of a Platform
       • Brings information together to avoid data silos
       • Quick turnaround time
       • Plug and play service framework
       • Don’t need IT and operations
       • Highly personalized experience
       • Security
       • Share data between products, between users
     [Graphic label: plug ‘n’ play]
  6. Data Platform/Tier
       • Principles: highly available, highly scalable, fast, and easy to operate; a software-only solution for structured and unstructured data (blobs)
       • Projection: a petabyte in 2-3 years
       • Support: critical application with a 99.99% SLA (labeled "5 nines" on the slide, though 99.99% is four nines)
       • But wait … no stress
  7. Traditional RDBMS?
       • Challenges with availability and scalability
       • Sharding works well, but introduces new challenges as well
  8. NoSQL?
       • Easy?
       • Core use cases: most of the use cases don’t need transactions, and with good design, consistency can be managed properly.
       • Evaluated HBase, MongoDB, and Cassandra.
  9. Why Cassandra?
       • Scalable
           – Easy to scale horizontally
       • Availability
           – Highly available; can be designed for no SPOF
           – Easy to set up clusters and replication between DCs
           – Fast snapshots
           – Rolling upgrades
       • Operations
           – Easy to install and operate
           – Easy to make schema changes
       • Fast
           – Given the right hardware, Cassandra provides low-latency response times.
  10. High Level CFP Stack
     [Diagram labels: Data Platform, Services Platform, Analytics Platform; Mule ESB, Mule ESB (services), Queue Service, Cache Service, Cassandra, RedHat Storage (DFS), HBase, Hadoop, Search Engine, MPP, Flume]
       • MuleSoft ESB for business logic orchestration, with frameworks for additional authoring
       • Cassandra-powered schemaless database wrapped in entity and relationship logic
       • RHS: a distributed file system for blob storage
       • Hadoop/HBase/Solr/CEP: to meet batch processing and near-real-time analytics
  11. CFP Active/Active Multi-Data Center
     [Diagram labels: two data centers, DC-A and DC-B, each showing Load Balancer, Services Platform, Data Platform, Analytics Platform, Mule, Cassandra, RedHat Storage (DFS), and Hadoop; three "Replication" links between the data centers; a Global Load Balancer]
       • 30-minute session stickiness
       • Provides HA
       • Low latency
  12. CFP Schema
       • Represented as a graph
           – Entity
           – Relationships
       • Additional CF for indexes
           – Inverted indexes driven by schema
     [Diagram labels: Entity User, Entity Document, Index CF]
  13. Learning in Production
       • Monitor heap usage
           – High and uneven CPU usage
           – Add nodes if you can
           – Reduce Bloom filters
           – Increase heap if you have to; don’t be scared
         [Graphs: Before, After]
       • Monitor data per node, most importantly keys per node
       • Monitor disk I/O
  14. The End
     We are hiring. Contact @
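
Slide 6 quotes a 99.99% SLA and also calls it "5 nines". The two targets differ by a factor of ten in allowed downtime, which the following minimal sketch works out. The function name and the 365-day year are assumptions made for illustration; nothing here comes from the deck.

```python
# Downtime budget implied by an availability target.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year that a target still allows."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for label, pct in [("four nines", 99.99), ("five nines", 99.999)]:
        minutes = downtime_minutes_per_year(pct)
        print(f"{label} ({pct}%): {minutes:.2f} minutes/year")
```

Four nines leaves a little under an hour of downtime per year, while five nines leaves only a few minutes.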
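
Slide 7 notes that sharding "introduces new challenges" and slide 9 lists easy horizontal scaling as a reason for choosing Cassandra. The deck does not say which sharding challenges were meant; one common one, assumed here, is data movement when a shard is added. The sketch below compares naive modulo sharding with a toy consistent-hash ring of the kind Cassandra's token ring is based on. `HashRing`, the node names, the key names, and the vnode count are all hypothetical, and this is not Cassandra's actual partitioner.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash of a string (MD5 is used only for distribution)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def modulo_owner(key: str, n_shards: int) -> int:
    """Naive sharding: the shard index is hash modulo the shard count."""
    return h(key) % n_shards


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each node owns `vnodes` tokens scattered around the ring.
        self._ring = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    def owner(self, key: str) -> str:
        # A key belongs to the first token after its hash, wrapping around.
        idx = bisect.bisect_right(self._tokens, h(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(10_000)]

# Fraction of keys whose owner changes when going from 4 to 5 shards/nodes.
moved_modulo = sum(modulo_owner(k, 4) != modulo_owner(k, 5) for k in keys) / len(keys)

old_ring = HashRing(["n1", "n2", "n3", "n4"])
new_ring = HashRing(["n1", "n2", "n3", "n4", "n5"])
moved_ring = sum(old_ring.owner(k) != new_ring.owner(k) for k in keys) / len(keys)

if __name__ == "__main__":
    print(f"modulo sharding: {moved_modulo:.0%} of keys moved")
    print(f"hash ring:       {moved_ring:.0%} of keys moved")
```

With modulo sharding most keys change shard when the count goes from four to five, whereas on the ring only keys taken over by the new node move, which is roughly its fair share of the data.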
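
Slide 11 mentions session stickiness at the global load balancer (read here as 30 minutes). The deck does not describe how it is implemented, so the following is only a sketch of the idea: a session is pinned to one data center and the pin is renewed on each request. The class name, the round-robin assignment, and the sliding renewal are all assumptions.

```python
from dataclasses import dataclass, field

STICKY_SECONDS = 30 * 60  # assumed reading of the slide's "30mt"


@dataclass
class GlobalLoadBalancer:
    """Hypothetical global load balancer that pins sessions to a data center."""

    data_centers: list[str]
    _pins: dict[str, tuple[str, float]] = field(default_factory=dict)
    _next: int = 0

    def route(self, session_id: str, now: float) -> str:
        """Return the data center for this request; `now` is in seconds."""
        pin = self._pins.get(session_id)
        if pin is not None and now < pin[1]:
            dc = pin[0]  # pin still valid: stay on the same data center
        else:
            # New or expired session: assign the next data center round-robin.
            dc = self.data_centers[self._next % len(self.data_centers)]
            self._next += 1
        # Sliding window (an assumption): each request renews the pin.
        self._pins[session_id] = (dc, now + STICKY_SECONDS)
        return dc


if __name__ == "__main__":
    glb = GlobalLoadBalancer(["DC-A", "DC-B"])
    print(glb.route("session-1", now=0))
    print(glb.route("session-1", now=600))  # same data center as the first call
```

Keeping a user's requests in one data center for the length of a session avoids depending on cross-data-center replication lag within that session.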
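
Slide 12 describes the CFP schema as a graph of entities and relationships plus an extra column family (CF) holding schema-driven inverted indexes. The deck gives no row-key layout, so the sketch below is one possible shape, with each CF modelled as an in-memory dictionary of rows. Every name in it (`INDEXED_ATTRIBUTES`, the key formats, the `owns` relation, the sample attributes) is hypothetical.

```python
from collections import defaultdict

# Each "column family" is modelled as {row_key: {column_name: value}}.
entity_cf = defaultdict(dict)
relationship_cf = defaultdict(dict)
index_cf = defaultdict(dict)

# Hypothetical schema: which attributes of each entity type are indexed.
INDEXED_ATTRIBUTES = {"User": ["email"], "Document": ["tax_year"]}


def put_entity(entity_type, entity_id, attributes):
    """Write an entity row and the inverted-index columns the schema asks for."""
    row_key = f"{entity_type}:{entity_id}"
    entity_cf[row_key].update(attributes)
    for attr in INDEXED_ATTRIBUTES.get(entity_type, []):
        if attr in attributes:
            # Index row: one row per (type, attribute, value), one column per entity.
            index_cf[f"{entity_type}:{attr}:{attributes[attr]}"][row_key] = ""
    return row_key


def relate(source_key, relation, target_key):
    """Store a graph edge as a column in the source entity's relationship row."""
    relationship_cf[f"{source_key}:{relation}"][target_key] = ""


def find(entity_type, attr, value):
    """Look up entity row keys through the inverted index."""
    return sorted(index_cf.get(f"{entity_type}:{attr}:{value}", {}))


def neighbours(source_key, relation):
    """Follow one relationship from an entity."""
    return sorted(relationship_cf.get(f"{source_key}:{relation}", {}))


if __name__ == "__main__":
    user = put_entity("User", "u1", {"email": "a@example.com"})
    doc = put_entity("Document", "d1", {"tax_year": 2012})
    relate(user, "owns", doc)
    print(find("User", "email", "a@example.com"))
    print(neighbours(user, "owns"))
```

The sketch does not remove stale index columns when an indexed attribute changes; a real design would have to handle that on update.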
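
Slide 13 advises reducing Bloom filters to relieve heap pressure and stresses monitoring keys per node. The deck does not say how the filters were reduced. One option, assumed here, is to accept a higher false-positive rate, which Cassandra exposes as the `bloom_filter_fp_chance` table option. The sketch below uses the textbook sizing formula for an optimally configured Bloom filter, bits per key = -ln(p) / (ln 2)^2, to show why both the key count and the false-positive rate matter. Cassandra's real memory use will differ from this theoretical figure.

```python
import math


def bloom_filter_bytes(num_keys: int, fp_chance: float) -> float:
    """Theoretical size in bytes of an optimal Bloom filter.

    Uses bits_per_key = -ln(p) / (ln 2)^2; actual implementations round
    and bucket differently, so treat the result as an estimate.
    """
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return num_keys * bits_per_key / 8


if __name__ == "__main__":
    keys_per_node = 1_000_000_000  # hypothetical key count on one node
    for p in (0.01, 0.1):
        gib = bloom_filter_bytes(keys_per_node, p) / 2**30
        print(f"fp_chance={p}: about {gib:.2f} GiB of Bloom filter")
```

Under this formula, raising the false-positive rate from 0.01 to 0.1 halves the filter size, and the size grows linearly with the number of keys on the node.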