Intuit Proprietary & Confidential
The Consumer Financial Platform (CFP)
Mohit Anchlia
Architect, Intuit
Agenda
•  Background
•  Problem statement
•  Idea of a Platform
•  Why Cassandra?
•  CFP Stack
•  CFP Cassandra Data Model
•  Learning in Production
•  Q&A
Background
•  Intuit is the maker of TurboTax, Quicken, QuickBooks, and many other products for SBUs.
•  Many services work together to deliver an awesome product experience.
Problem Statement (Service explosion)
•  Service explosion over the years
–  Code duplication
–  Cross-cutting concerns
–  Data silos (information silos)
–  Operational challenges: schema design, installs
–  Added overhead to test, and to repeat tests in production, which slows prototyping
Idea of a Platform
•  Brings information together to avoid data silos
•  Quick turnaround time
•  Plug and play service framework
•  Don’t need IT and operations
•  Highly personalized experience
•  Security
•  Share data between products, between users
[Figure: diagram labeled "to plug ‘n’ play"]
Data Platform/Tier
•  Principles – Highly available, highly scalable, fast, easy to operate; a software-only solution for structured and unstructured data (blobs)
•  Projection – A petabyte in 2-3 years
•  Support – Critical applications with a 99.99% (four nines) SLA
•  But Wait …No Stress
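As a quick reference for what an availability target like this implies, the sketch below converts an availability percentage into a yearly downtime budget. It is plain arithmetic, not something taken from the talk; 99.99% is four nines and 99.999% is five nines. The function name is illustrative.

```python
# Allowed downtime per year for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability SLA."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for label, pct in [("four nines", 99.99), ("five nines", 99.999)]:
    print(f"{pct}% ({label}): {downtime_minutes_per_year(pct):.2f} min/year")
```

Four nines leaves a budget of roughly 53 minutes a year; five nines leaves about 5 minutes.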
Traditional RDBMS?
•  Challenges with availability and scalability
•  Sharding works well, but introduces new challenges of its own
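One commonly cited sharding challenge is resharding. The slide does not say which scheme or which challenges Intuit ran into, so the sketch below is only an illustration under an assumed scheme: naive hash-modulo placement. `shard_for`, `fraction_moved`, and the key names are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash and modulo placement."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]  # hypothetical row keys
print(f"moved when going from 4 to 5 shards: {fraction_moved(keys, 4, 5):.0%}")
```

With a uniform hash, a key stays put only when its hash has the same remainder modulo 4 and modulo 5, so about four in five keys have to be relocated when a fifth shard is added.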
NoSQL?
•  Easy?
•  Core use cases – Most of the use cases don’t need transactions, and with good design, consistency can be managed properly.
•  Evaluated HBase, MongoDB, and Cassandra.
Why Cassandra?
•  Scalable
–  Easy to scale horizontally
•  Availability
–  Highly available; can be designed for no single point of failure (SPOF)
–  Easy to set up clusters and replication between data centers (DCs)
–  Fast snapshots
–  Rolling upgrades
•  Operations
–  Easy to install and operate
–  Easy to make schema changes
•  Fast
–  Given the right hardware, Cassandra provides low-latency response times.
High Level CFP Stack
[Figure: stack diagram with a Services Platform, a Data Platform, and an Analytics Platform. Component labels: Mule ESB, Mule ESB (services), Queue Service, Cache Service, Cassandra, RedHat Storage (DFS), HBase, Hadoop, Search Engine, MPP, Flume]
•  MuleSoft ESB for business logic orchestration, with frameworks for additional authoring
•  Cassandra-powered schemaless database wrapped in entity and relationship logic
•  RHS (RedHat Storage) – a distributed file system for blob storage
•  Hadoop/HBase/Solr/CEP – to meet batch processing and near-real-time analytics
CFP Active/Active Multi-Data Center
[Figure: two data centers, DC-A and DC-B, behind a Global Load Balancer. Each DC has its own Load Balancer, a Services Platform, a Data Platform, and an Analytics Platform (component labels: Mule, Cassandra, RedHat Storage (DFS), Hadoop). Three "Replication" links connect the two DCs]
•  30-minute session stickiness
•  Provides HA
•  Low latency
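The stickiness behavior can be sketched as a toy router. This is not Intuit's load balancer configuration. It assumes a 30-minute window that slides on every request, round-robin placement of new sessions, and failover when the pinned data center is unhealthy. The class and parameter names are hypothetical.

```python
import time

STICKY_TTL_SECONDS = 30 * 60  # assumed 30-minute stickiness window


class GlobalLoadBalancer:
    """Toy router that pins a session to one data center for a sliding TTL."""

    def __init__(self, datacenters, ttl=STICKY_TTL_SECONDS, clock=time.monotonic):
        self.datacenters = list(datacenters)
        self.ttl = ttl
        self.clock = clock
        self._pins = {}  # session_id -> (datacenter, expiry timestamp)
        self._next = 0   # round-robin cursor for sessions that need a new pin

    def route(self, session_id, healthy=None):
        """Return the data center that should serve this request."""
        healthy = set(self.datacenters if healthy is None else healthy)
        now = self.clock()
        pin = self._pins.get(session_id)
        if pin and pin[1] > now and pin[0] in healthy:
            dc = pin[0]  # still sticky: reuse the pinned data center
        else:
            dc = self._pick(healthy)  # new, expired, or failed-over session
        self._pins[session_id] = (dc, now + self.ttl)  # refresh the window
        return dc

    def _pick(self, healthy):
        """Round-robin over data centers, skipping unhealthy ones."""
        for _ in range(len(self.datacenters)):
            dc = self.datacenters[self._next % len(self.datacenters)]
            self._next += 1
            if dc in healthy:
                return dc
        raise RuntimeError("no healthy data center available")


lb = GlobalLoadBalancer(["DC-A", "DC-B"])
first = lb.route("session-1")
print(first, lb.route("session-1"))  # both requests land in the same DC
```

Because both data centers are active, a session can be re-pinned to the other DC on failure; how quickly it sees its own earlier writes there depends on replication between the DCs.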
CFP Schema
•  Represented as a graph
–  Entity
–  Relationships
•  Additional column family (CF) for indexes
–  Inverted indexes driven by schema
[Figure: an Entity "User" and an Entity "Document", with an Index CF]
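A minimal in-memory sketch of this layout follows, assuming one column family for entities, one for relationships, and one for a schema-driven inverted index. The slide does not show the real row-key or column formats, so the `type:id` keys, the `GraphStore` class, and the field names are all hypothetical.

```python
from collections import defaultdict


class GraphStore:
    """In-memory stand-in for three column families: entities, relationships, index."""

    def __init__(self, indexed_fields):
        self.indexed_fields = indexed_fields    # schema: entity type -> fields to index
        self.entities = {}                      # row key -> {column: value}
        self.relationships = defaultdict(dict)  # source key -> {"label:target key": ""}
        self.index = defaultdict(set)           # "type:field:value" -> {row keys}

    def put_entity(self, entity_type, entity_id, columns):
        """Write an entity row and the index rows its schema asks for."""
        key = f"{entity_type}:{entity_id}"
        self.entities[key] = dict(columns)
        # Note: overwriting an entity does not clean up stale index entries here.
        for field in self.indexed_fields.get(entity_type, ()):
            if field in columns:
                self.index[f"{entity_type}:{field}:{columns[field]}"].add(key)
        return key

    def relate(self, source_key, label, target_key):
        """Store an edge as a column on the source entity's relationship row."""
        self.relationships[source_key][f"{label}:{target_key}"] = ""

    def neighbors(self, source_key, label):
        """Follow edges with the given label from one entity."""
        prefix = f"{label}:"
        row = self.relationships.get(source_key, {})
        return [col[len(prefix):] for col in row if col.startswith(prefix)]

    def find(self, entity_type, field, value):
        """Look up entity keys through the inverted index."""
        return sorted(self.index.get(f"{entity_type}:{field}:{value}", ()))


store = GraphStore({"User": ["email"]})
user = store.put_entity("User", "u1", {"email": "a@example.com"})
doc = store.put_entity("Document", "d1", {"title": "W-2"})
store.relate(user, "owns", doc)
print(store.find("User", "email", "a@example.com"), store.neighbors(user, "owns"))
```

The point of the separate index rows is that a lookup by field value becomes a single row read rather than a scan over all entities.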
Learning in Production
•  Monitor heap usage
–  High and uneven CPU usage
–  Add nodes if you can
–  Reduce Bloom filters
–  Increase heap if you have to; don’t be scared
[Figure: "Before" and "After" charts]
•  Monitor data per node – most importantly, keys per node
•  Monitor disk I/O
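To see why keys per node and Bloom filter settings both bear on memory, the sketch below applies the textbook sizing formula for an optimally sized Bloom filter, bits per key = -ln(p) / (ln 2)^2, where p is the false-positive rate. This is general Bloom filter math, not a measurement from the talk; Cassandra's actual filter implementation and where it is allocated vary by version, and the key count is a made-up figure.

```python
import math


def bloom_bits_per_key(false_positive_rate: float) -> float:
    """Bits per key for an optimally sized Bloom filter at the given FP rate."""
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def bloom_size_mib(num_keys: int, false_positive_rate: float) -> float:
    """Approximate filter size in MiB for num_keys keys."""
    bits = num_keys * bloom_bits_per_key(false_positive_rate)
    return bits / 8 / (1024 ** 2)


keys_per_node = 500_000_000  # hypothetical; the slide gives no figures
for p in (0.01, 0.1):
    print(
        f"fp={p}: {bloom_bits_per_key(p):.1f} bits/key, "
        f"{bloom_size_mib(keys_per_node, p):.0f} MiB"
    )
```

Filter size grows linearly with the number of keys, and relaxing the false-positive rate from 1% to 10% halves it, at the cost of more unnecessary disk reads.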
The End
We are hiring.
Contact: mohit_anchlia@intuit.com

C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to Cassandra by Mohit Anchlia
