
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to Cassandra by Mohit Anchlia



This session describes the journey of Intuit’s Consumer Financial Platform (CFP), which is built to scale to petabytes of data. The original system ran on a major RDBMS; we then redesigned it to take advantage of Cassandra’s distributed architecture. The talk walks through the transition, including the data model used in the final product. As with any large system migration, we learned many hard lessons, and we will discuss them and share our experiences.


  1. The Consumer Financial Platform (CFP)
     Mohit Anchlia, Architect, Intuit
     Intuit Proprietary & Confidential
  2. Agenda
     • Background
     • Problem statement
     • Idea of a platform
     • Why Cassandra?
     • CFP stack
     • CFP Cassandra data model
     • Learning in production
     • Q&A
  3. Background
     • Intuit is the maker of TurboTax, Quicken, QuickBooks, and many other products for its SBUs.
     • Many services work together to deliver a great product experience.
  4. Problem Statement (Service Explosion)
     • Service explosion over the years, leading to:
       – Code duplication
       – Cross-cutting concerns
       – Data silos (information silos)
       – Operational challenges: schema design, installs
       – Added overhead to test, and to repeat tests in production; slow prototyping
  5. Idea of a Platform
     • Brings information together to avoid data silos
     • Quick turnaround time
     • Plug-and-play service framework
     • No need to involve IT and operations
     • Highly personalized experience
     • Security
     • Data shared between products and between users, so services can plug ’n’ play
  6. Data Platform/Tier
     • Principles: a highly available, highly scalable, fast, easy-to-operate, software-only solution for structured and unstructured data (blobs)
     • Projection: petabyte scale within 2-3 years
     • Support: critical applications with a 99.99% (four nines) SLA
     • But wait... no stress
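To make the 99.99% target concrete, the arithmetic below converts an availability percentage into a yearly downtime budget. It is a minimal sketch, assuming a 365-day year and that every minute of unavailability counts equally against the SLA; the function name is hypothetical. Four nines leaves roughly 52.6 minutes of downtime per year, while five nines would leave only about 5.3.

```python
# Minutes in a 365-day year (assumption: leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_budget_minutes(availability_pct: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum allowed downtime over a period for a given availability target."""
    return period_minutes * (1 - availability_pct / 100)

for target in (99.9, 99.99, 99.999):
    print(f"{target}%: {downtime_budget_minutes(target):.1f} minutes/year")
```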
  7. Traditional RDBMS?
     • Challenges with availability and scalability
     • Sharding works well, but it introduces new challenges of its own
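The slide does not list which sharding challenges Intuit hit. One commonly cited example is rebalancing: with simple `hash(key) % num_shards` placement, adding a single shard remaps most keys. The sketch below measures that effect; it is illustrative only, assuming MD5-based modulo sharding and hypothetical key names, and is not a description of Intuit's original RDBMS setup.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    # Stable hash (Python's built-in hash() is salted per process for str).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
print(f"4 -> 5 shards: {fraction_moved(keys, 4, 5):.0%} of keys move")
```

Going from 4 to 5 shards, a key stays put only when `h % 4 == h % 5`, which holds for 4 of every 20 hash values, so about 80% of the data has to be copied between shards.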
  8. NoSQL?
     • Easy?
     • Core use cases: most of our use cases don’t need transactions, and with good design, consistency can be managed properly
     • Evaluated HBase, MongoDB, and Cassandra
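In Cassandra, one common way to "manage consistency with good design" is to choose per-request consistency levels so that read and write replica sets overlap: with replication factor N, R read acknowledgements, and W write acknowledgements, R + W > N means every read touches at least one replica that acknowledged the latest write. The talk does not say which levels CFP used; this is a minimal sketch of the overlap rule, with hypothetical function names. A quorum is a majority, `N // 2 + 1`.

```python
def quorum(replicas: int) -> int:
    # Majority of replicas, the size used by QUORUM-style consistency levels.
    return replicas // 2 + 1

def reads_see_latest_write(replication_factor: int, read_acks: int, write_acks: int) -> bool:
    # Every read replica set overlaps every write replica set when R + W > N.
    return read_acks + write_acks > replication_factor

rf = 3
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))  # True: 2 + 2 > 3
print(reads_see_latest_write(rf, 1, 1))                    # False: 1 + 1 <= 3
```

Trading this guarantee away (for example ONE/ONE) buys lower latency and higher availability, which fits the slide's point that most use cases can tolerate relaxed consistency when designed for it.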
  9. Why Cassandra?
     • Scalable
       – Easy to scale horizontally
     • Available
       – Highly available; can be designed with no single point of failure (SPOF)
       – Easy to set up clusters and replication between data centers
       – Fast snapshots
       – Rolling upgrades
     • Operations
       – Easy to install and operate
       – Easy to make schema changes
     • Fast
       – Given the right hardware, Cassandra provides low-latency response times
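Part of why horizontal scaling is easy in Cassandra is that data is placed on a token ring (consistent hashing): a new node takes over only the token ranges it is assigned, instead of forcing a global reshuffle. The sketch below is a simplified ring with several tokens per node (virtual nodes). It is an illustration under stated assumptions: MD5 stands in for Cassandra's actual partitioner, and the node names, key names, and 64 tokens per node are hypothetical.

```python
import bisect
import hashlib

def token(value: str) -> int:
    # Stable 64-bit hash; MD5 is a stand-in, not Cassandra's actual partitioner.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

class TokenRing:
    """Simplified consistent-hash ring where each node owns several tokens (virtual nodes)."""

    def __init__(self, nodes, tokens_per_node: int = 64):
        self.tokens_per_node = tokens_per_node
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.tokens_per_node):
            bisect.insort(self._ring, (token(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        # The first token at or after the key's token owns it; wrap around past the last token.
        idx = bisect.bisect_left(self._ring, (token(key), ""))
        return self._ring[idx % len(self._ring)][1]

keys = [f"user:{i}" for i in range(10_000)]
ring = TokenRing(["node1", "node2", "node3", "node4"])
before = {k: ring.owner(k) for k in keys}
ring.add_node("node5")
moved = [k for k in keys if ring.owner(k) != before[k]]
print(f"4 -> 5 nodes: {len(moved) / len(keys):.0%} of keys move, all to node5")
```

Only about one fifth of the keys move when growing from four to five nodes, and every moved key lands on the new node, which is what keeps adding capacity cheap.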
  10. High-Level CFP Stack
     [Diagram: Data Platform, Services Platform, and Analytics Platform, with components including Mule ESB (services), Queue Service, Cache Service, Cassandra, Red Hat Storage (DFS), HBase, Hadoop, Search Engine, MPP, and Flume]
     • MuleSoft ESB for business-logic orchestration, with frameworks for additional authoring
     • Cassandra-powered schemaless database wrapped in entity and relationship logic
     • RHS (Red Hat Storage): a distributed file system for blob storage
     • Hadoop/HBase/Solr/CEP to meet batch-processing and near-real-time analytics needs
  11. CFP Active/Active Multi-Data Center
     [Diagram: two data centers, DC-A and DC-B, each running the Data Platform, Services Platform, and Analytics Platform (Cassandra, Red Hat Storage DFS, Hadoop, Mule) behind its own load balancer, with replication between the data centers and a global load balancer in front]
     • 30-minute session stickiness
     • Provides HA
     • Low latency
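The slide lists 30-minute session stickiness in front of the two active data centers. One plausible benefit in an active/active design is that a user's requests keep reaching the DC that received their writes while cross-DC replication catches up. The sketch below models that routing decision. It is hypothetical throughout: a real global load balancer is configured rather than coded, round-robin stands in for its actual selection policy, and treating the 30 minutes as a sliding window (extended on each request) is an assumption.

```python
import itertools

STICKINESS_SECONDS = 30 * 60  # the slide's 30-minute session stickiness

class StickyRouter:
    """Hypothetical global load balancer that pins each session to one data center."""

    def __init__(self, data_centers):
        # Round-robin is a placeholder for the real selection policy (health, latency, geography).
        self._next_dc = itertools.cycle(data_centers)
        self._bindings = {}  # session_id -> (data_center, expires_at)

    def route(self, session_id, now):
        binding = self._bindings.get(session_id)
        if binding is None or binding[1] <= now:
            data_center = next(self._next_dc)  # new or expired session: pick a DC
        else:
            data_center = binding[0]           # still sticky: reuse the same DC
        # Assumption: sliding window, so every request extends stickiness by 30 minutes.
        self._bindings[session_id] = (data_center, now + STICKINESS_SECONDS)
        return data_center

router = StickyRouter(["DC-A", "DC-B"])
print(router.route("s1", now=0))    # DC-A
print(router.route("s2", now=0))    # DC-B
print(router.route("s1", now=600))  # DC-A (still sticky; window now ends at 2400)
```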
  12. CFP Schema
     • Represented as a graph
       – Entities
       – Relationships
     • Additional column families (CFs) for indexes
       – Inverted indexes driven by the schema
     [Diagram: a User entity, a Document entity, and an index CF]
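The slide describes data stored as a graph of entities and relationships, plus separate CFs holding inverted indexes whose contents are driven by the schema. The sketch below models that shape in memory so the write path is visible: storing an entity also writes index rows for the attributes the schema marks as indexed. Everything concrete here is hypothetical, since the slide shows no column names or key formats: plain dicts stand in for the three CFs, and `INDEXED_ATTRIBUTES`, the `Type:attr:value` index key format, and the sample data are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical schema: which attributes of each entity type get an inverted index.
INDEXED_ATTRIBUTES = {
    "User": {"email"},
    "Document": {"owner_id", "doc_type"},
}

def index_keys(entity_type, attributes):
    """Row keys in the index CF for the schema-declared attributes of one entity."""
    return [
        f"{entity_type}:{attr}:{attributes[attr]}"
        for attr in sorted(INDEXED_ATTRIBUTES.get(entity_type, ()))
        if attr in attributes
    ]

class GraphStore:
    """In-memory stand-in for three CFs: entities, relationships, and an inverted index."""

    def __init__(self):
        self.entity_cf = {}                      # entity_id -> {"type": ..., column: value}
        self.relationship_cf = defaultdict(set)  # from_id -> {(relation, to_id), ...}
        self.index_cf = defaultdict(set)         # "Type:attr:value" -> {entity_id, ...}

    def put_entity(self, entity_type, entity_id, attributes):
        old = self.entity_cf.get(entity_id)
        if old is not None:
            # Drop stale index entries before re-indexing the new values.
            for key in index_keys(old["type"], old):
                self.index_cf[key].discard(entity_id)
        self.entity_cf[entity_id] = {"type": entity_type, **attributes}
        for key in index_keys(entity_type, attributes):
            self.index_cf[key].add(entity_id)

    def relate(self, from_id, relation, to_id):
        self.relationship_cf[from_id].add((relation, to_id))

    def neighbors(self, from_id, relation):
        return sorted(to for rel, to in self.relationship_cf.get(from_id, set()) if rel == relation)

    def find(self, entity_type, attr, value):
        return sorted(self.index_cf.get(f"{entity_type}:{attr}:{value}", set()))

store = GraphStore()
store.put_entity("User", "u1", {"email": "ann@example.com"})
store.put_entity("Document", "d1", {"owner_id": "u1", "doc_type": "W-2"})
store.relate("u1", "owns", "d1")
print(store.find("User", "email", "ann@example.com"))  # ['u1']
print(store.neighbors("u1", "owns"))                   # ['d1']
```

Because the index is maintained by the application rather than the database, every update must remove the old index row as well as add the new one, as `put_entity` does here.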
  13. Learning in Production
     • Monitor heap usage
       – Watch for high and uneven CPU usage
       – Add nodes if you can
       – Reduce bloom filter memory
       – Increase the heap if you have to; don’t be scared
       [Chart: before and after]
     • Monitor data per node, most importantly keys per node
     • Monitor disk I/O
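The advice to reduce bloom filters and to watch keys per node are connected: a bloom filter's size grows linearly with the number of keys it covers, and its bits per key depend only on the target false-positive probability p, via the standard optimum m/n = -ln(p) / (ln 2)². In Cassandra the knob for this is the table property `bloom_filter_fp_chance`. The estimate below is a rough sketch: the one-billion-keys-per-node figure is hypothetical, it ignores per-SSTable overhead and keys that appear in several SSTables, and whether these filters count against the JVM heap depends on the Cassandra version, which the talk does not state.

```python
import math

def bloom_bits_per_key(fp_chance: float) -> float:
    """Optimal bits per key for a Bloom filter with false-positive probability fp_chance."""
    return -math.log(fp_chance) / (math.log(2) ** 2)

def bloom_filter_mib(num_keys: int, fp_chance: float) -> float:
    """Approximate filter size in MiB for num_keys keys."""
    return num_keys * bloom_bits_per_key(fp_chance) / 8 / 2**20

keys_per_node = 1_000_000_000  # hypothetical node holding one billion partition keys
for p in (0.01, 0.1):
    print(f"fp_chance={p}: {bloom_bits_per_key(p):.1f} bits/key, "
          f"~{bloom_filter_mib(keys_per_node, p):,.0f} MiB")
```

Raising the false-positive chance from 1% to 10% roughly halves the filter (about 9.6 versus 4.8 bits per key), at the cost of more reads that check SSTables which turn out not to contain the key.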
  14. The End
     • We are hiring. Contact: mohit_anchlia@intuit.com
