Using Cassandra In Building A Reporting Platform
Javed Roshan – Director, Data Services
Mukaram Aziz – Sr. Manager, Data Services
1 Use Case
2 New Data Platform
3 Design Decisions
4 Solution Stack
5 Challenges
© 2015. All Rights Reserved.
Use Case
•  Fast Data requirements in an Operational Space
–  Metrics and Reports for intra-day business decisions
–  Process Monitoring
•  Current Landscape
–  Multiple data sources
–  Traditional batched ETL
–  Multiple data destinations
–  Various reporting tools
•  Opportunity Areas
–  Make reports near real time
–  Achieve 99.99% SLAs
–  Faster time-to-market delivery
–  Make enhancements inexpensive
Component          Existing
Data Sources       RDBMS, Files
ETL                File based
Data Distribution  Files
Data Destination   RDBMS
Reporting Tools    Various
New Data Platform
•  Platform
–  Data Distribution: Kafka
–  Data Processing: Go / Docker
–  Data Store: Cassandra
–  AWS
•  Design Decisions
–  Move data as soon as it is available
–  Transform once all required data is available
•  Cassandra
–  CAP: emphasis on Availability and Partition tolerance, with tunable Consistency
–  Wide-row tables
–  Linear scalability to handle large data volumes
–  Out-of-the-box multi-datacenter (multi-DC) deployment
Component          Existing        New
Data Sources       RDBMS, Files    RDBMS, Files
ETL                File based      Go / Docker
Data Distribution  Files           Kafka
Data Destination   RDBMS           Cassandra
Reporting Tools    Various         Streamlined
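The design decision "move data when available, transform when all data available" can be illustrated with a small gatherer: records are accepted the moment they arrive, and a key is handed to the transform step only when every required source has reported. This is a minimal sketch under stated assumptions: the `Record` type, the `Gatherer` name, the source names, and the in-memory buffer are all hypothetical. Given the slides' logical separation of raw and processed data, a real implementation might keep the raw records in Cassandra rather than in memory.

```go
package main

import "fmt"

// Record is one piece of data for a business key, arriving from a single
// source (for example, one Kafka topic). The fields are illustrative.
type Record struct {
	Key    string
	Source string
	Value  string
}

// Gatherer accepts records as soon as they arrive and releases a joined row
// for a key only when every required source has reported for that key.
type Gatherer struct {
	required []string
	pending  map[string]map[string]string // key -> source -> value
}

// NewGatherer creates a Gatherer that waits for all of the named sources.
func NewGatherer(required ...string) *Gatherer {
	return &Gatherer{required: required, pending: make(map[string]map[string]string)}
}

// Add buffers r ("move data when available"). When the key becomes complete
// it returns the joined row and true ("transform when all data available");
// otherwise it returns nil and false.
func (g *Gatherer) Add(r Record) (map[string]string, bool) {
	parts, ok := g.pending[r.Key]
	if !ok {
		parts = make(map[string]string)
		g.pending[r.Key] = parts
	}
	parts[r.Source] = r.Value
	for _, src := range g.required {
		if _, have := parts[src]; !have {
			return nil, false
		}
	}
	delete(g.pending, r.Key) // the key is complete, so stop buffering it
	return parts, true
}

// Pending reports how many keys are still waiting for data.
func (g *Gatherer) Pending() int { return len(g.pending) }

func main() {
	g := NewGatherer("orders", "payments")
	for _, r := range []Record{
		{Key: "acct-1", Source: "orders", Value: "o-17"},
		{Key: "acct-2", Source: "orders", Value: "o-18"},
		{Key: "acct-1", Source: "payments", Value: "p-9"},
	} {
		if row, done := g.Add(r); done {
			fmt.Println("transform", r.Key, row)
		}
	}
	fmt.Println("still waiting:", g.Pending())
}
```

In this run, acct-1 is transformed as soon as its payment arrives, while acct-2 stays pending until its own payment shows up.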
Design Decisions
•  Data Modeling
–  Partition key and partition size
–  Read response time
–  Handling consistency
–  Collection columns: sets and maps
–  Logical separation of raw and processed data
–  All lookup data in a single table
•  Indexes
–  Primary key, inverted, secondary, and DSE Search indexes
•  DSE Search supports
–  Range queries
–  Regular-expression queries
–  Non-equality queries
–  Faceted search
Design Decisions
•  Consistency
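–  W + R > RF: the number of replicas that acknowledge a write (W) plus the number consulted on a read (R) must exceed the replication factor (RF), so every read overlaps at least one replica holding the latest write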
•  Indexes
Data Access Options
Option                        Data/Index  Storage  Response Time    Maintained By  Cardinality  Search     Consistency
Primary Key                   Data        High     Very fast        Application    High         Limited    Tunable
Duplicate Data (Primary Key)  Data        High     Very fast        Application    High         Limited    Tunable
Inverted Index                Index       Low      Fast             Application    High         Limited    Tunable
Secondary Index               Index       Low      Medium           System         Low          Limited    Tunable
DSE Search                    Index       Medium   Slow (relative)  System         Any          Versatile  One (R)
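The W + R > RF rule from the Consistency bullet, together with the consistency column of the table, reduces to simple arithmetic. This sketch models only the ONE, QUORUM, and ALL levels within a single data center (QUORUM = floor(RF/2) + 1) and ignores LOCAL_* and multi-DC levels. The `Level` type and the function names are hypothetical.

```go
package main

import "fmt"

// Level is a simplified model of a Cassandra consistency level.
type Level string

const (
	One    Level = "ONE"
	Quorum Level = "QUORUM"
	All    Level = "ALL"
)

// Replicas returns how many replicas a level requires for replication factor rf.
func Replicas(l Level, rf int) int {
	switch l {
	case One:
		return 1
	case Quorum:
		return rf/2 + 1
	case All:
		return rf
	}
	panic("unknown consistency level: " + string(l))
}

// StronglyConsistent reports whether every read is guaranteed to overlap at
// least one replica that acknowledged the latest write: W + R > RF.
func StronglyConsistent(write, read Level, rf int) bool {
	return Replicas(write, rf)+Replicas(read, rf) > rf
}

func main() {
	const rf = 3
	for _, w := range []Level{One, Quorum, All} {
		for _, r := range []Level{One, Quorum, All} {
			fmt.Printf("W=%-6s R=%-6s overlap=%v\n", w, r, StronglyConsistent(w, r, rf))
		}
	}
}
```

With the replication factor of 3 used in the benchmarks, QUORUM writes plus QUORUM reads give 2 + 2 > 3. A read at ONE after a QUORUM write gives 2 + 1 = 3, which does not satisfy the rule. If "One (R)" in the table means DSE Search reads are served at ONE, search results may therefore briefly lag the latest writes.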
Benchmarking: Indexes
Figure: response time in seconds by index type (bar values: Primary Index 0.02, Inverted Index 0.04, DSE Search 1.6, Secondary Index 0.6)
•  Data set: 22.6 million rows
•  6-node cluster
Performance
•  Replication factor of 3
•  Write-heavy workload
–  Increased concurrent writes (concurrent_writes) to 64 (from 32)
–  Decreased concurrent reads (concurrent_reads) to 16 (from 32)
–  Size-tiered compaction strategy
•  Cassandra cluster with DSE Search enabled on all nodes
•  Virtual nodes (num_tokens) set to 16
•  All caches disabled except the filter cache
•  EC2 snitch (Ec2Snitch) on AWS across 3 Availability Zones
•  DSE Search soft auto-commit maximum time set to 10 s
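Setting virtual nodes to 16 gives each physical node 16 positions on the token ring, so a 6-node cluster exposes 96 token ranges. This sketch shows how a key is mapped to its replicas with a replication factor of 3. It is illustrative only: Cassandra's default partitioner uses Murmur3 rather than the FNV-1a hash used here, real token assignment differs, and the replica walk ignores racks. With Ec2Snitch the racks would be Availability Zones, and NetworkTopologyStrategy would spread replicas across them. All names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// token hashes a string onto a 64-bit ring. Cassandra's default partitioner
// uses Murmur3; FNV-1a is only a standard-library stand-in for illustration.
func token(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

type vnode struct {
	tok  uint64
	node string
}

// Ring holds every virtual node of every physical node, sorted by token.
type Ring struct{ vnodes []vnode }

// NewRing gives each physical node numTokens positions on the ring
// (the role played by num_tokens in cassandra.yaml).
func NewRing(nodes []string, numTokens int) *Ring {
	r := &Ring{}
	for _, n := range nodes {
		for i := 0; i < numTokens; i++ {
			r.vnodes = append(r.vnodes, vnode{tok: token(fmt.Sprintf("%s#%d", n, i)), node: n})
		}
	}
	sort.Slice(r.vnodes, func(i, j int) bool { return r.vnodes[i].tok < r.vnodes[j].tok })
	return r
}

// Replicas walks clockwise from the key's token and collects rf distinct
// physical nodes. Like SimpleStrategy, it ignores racks (AZs with Ec2Snitch).
func (r *Ring) Replicas(key string, rf int) []string {
	t := token(key)
	start := sort.Search(len(r.vnodes), func(i int) bool { return r.vnodes[i].tok >= t })
	seen := make(map[string]bool)
	var out []string
	for k := 0; k < len(r.vnodes) && len(out) < rf; k++ {
		n := r.vnodes[(start+k)%len(r.vnodes)].node
		if !seen[n] {
			seen[n] = true
			out = append(out, n)
		}
	}
	return out
}

func main() {
	ring := NewRing([]string{"n1", "n2", "n3", "n4", "n5", "n6"}, 16)
	fmt.Println("virtual nodes on ring:", len(ring.vnodes))
	fmt.Println("replicas for acct-42:", ring.Replicas("acct-42", 3))
}
```

Because each node's 16 tokens are scattered around the ring, the data a node holds is spread across many small ranges. This is what lets load rebalance more evenly when nodes are added or removed.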
Solution Stack
Figure: Solution Stack diagram showing Mesosphere, Marathon, Docker containers running Go services (Ingestion and Processor), Kafka, API Server, and Consul.
Solution Stack: Plug-In Framework
•  Go Service: Plugins chained in a single process
•  Packaged & deployed in a Docker Container
•  Bootstrapped from a config
•  100% developed in-house
Figure: a Go service containing three chained Runners; each Runner wraps a Plugin with an In Channel and an Out Channel.
Solution Stack
•  Cassandra: data storage
•  Go-based plugin framework: Go services for data ingestion and processing
•  Docker: packaging and deployment
•  Mesosphere: single view of infrastructure
•  Marathon: launching containers
•  Kafka: data transfer and distribution
•  Consul: service discovery and configuration management
•  Jenkins: continuous integration
Benchmarking: Data Processing
•  Test scope: one functional group
•  Cassandra: 6-node cluster
•  Kafka: 6-node cluster
•  Go services: 3
•  Primary data source: Oracle
•  Run time: 360 minutes
•  Data volume: 1 year of data
Description                          Measure
Total rows processed                 450 million
De-normalized rows                   11.8 million
Rate of processing (Go services)     ~300k tps
Rate of processing (platform)        ~21k tps
% of time waiting on data ingestion  75%
Challenges
•  Not all query patterns are known in advance
•  Index rebuilds are costly
•  Business users adjusting to near-real-time data
•  Operational support adjustments
•  Backup/Restore
•  Finding Talent – We are hiring!
Thank you

Capital One: Using Cassandra In Building A Reporting Platform

  • 1. Using Cassandra In Building A Reporting Platform
      Javed Roshan – Director, Data Services
      Mukaram Aziz – Sr. Manager, Data Services
  • 2. 1 Use Case
      2 New Data Platform
      3 Design Decisions
      4 Solution Stack
      5 Challenges
      © 2015. All Rights Reserved.
  • 3. Use Case
      • Fast Data requirements in an operational space
          – Metrics and reports for intra-day business decisions
          – Process monitoring
      • Current landscape
          – Multiple data sources
          – Traditional batched ETL
          – Multiple data destinations
          – Reporting tools
      • Opportunity areas
          – Make reports near real time
          – Achieve 99.99% SLAs
          – Time-to-market delivery
          – Make enhancements inexpensive
      • Existing
          – Data sources: RDBMS, files
          – ETL: file based
          – Data distribution: files
          – Data destination: RDBMS
          – Reporting tools: various
  • 4. New Data Platform
      • Platform
          – Data distribution: Kafka
          – Data processing: Go / Docker
          – Data store: Cassandra
          – AWS
      • Design decisions
          – Move data when available
          – Transform when all data available
      • Cassandra
          – CAP: emphasis on availability and partition tolerance, with tunable consistency
          – Wide-row tables
          – Linear scalability to handle large data sizes
          – Out-of-the-box multi-DC deployment
      • Existing → New
          – Data sources: RDBMS, files → RDBMS, files
          – ETL: file based → Go / Docker
          – Data distribution: files → Kafka
          – Data destination: RDBMS → Cassandra
          – Reporting tools: various → streamlined
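The two design decisions above (move data when available, transform when all data available) can be illustrated with a small Go sketch. It assumes that records from different sources share a business key and that the set of required sources is known up front; `Fragment`, `Assembler` and the source names are hypothetical, and the in-memory map stands in for persistent staging, which the slides do not describe.

```go
package main

import (
	"fmt"
	"sort"
)

// Fragment is one piece of a business record arriving from a single source
// (hypothetical shape; the slides do not describe the record format).
type Fragment struct {
	Key    string            // business key shared across sources
	Source string            // e.g. "orders", "payments"
	Fields map[string]string // raw attributes from that source
}

// Assembler stores fragments as soon as they arrive ("move data when
// available") and only builds the merged row once every required source has
// reported for a key ("transform when all data available").
type Assembler struct {
	required []string
	pending  map[string]map[string]Fragment // key -> source -> fragment
}

func NewAssembler(required ...string) *Assembler {
	return &Assembler{required: required, pending: map[string]map[string]Fragment{}}
}

// Add records a fragment and returns the denormalized row plus true when the
// key is complete; otherwise it returns nil, false and keeps waiting.
func (a *Assembler) Add(f Fragment) (map[string]string, bool) {
	if a.pending[f.Key] == nil {
		a.pending[f.Key] = map[string]Fragment{}
	}
	a.pending[f.Key][f.Source] = f
	for _, src := range a.required {
		if _, ok := a.pending[f.Key][src]; !ok {
			return nil, false
		}
	}
	row := map[string]string{"key": f.Key}
	for _, src := range a.required {
		for k, v := range a.pending[f.Key][src].Fields {
			row[src+"."+k] = v
		}
	}
	delete(a.pending, f.Key) // the key is transformed; free its staging entry
	return row, true
}

func main() {
	a := NewAssembler("orders", "payments")
	_, done := a.Add(Fragment{"A1", "orders", map[string]string{"amount": "10"}})
	fmt.Println(done) // false: payments not yet seen
	row, done := a.Add(Fragment{"A1", "payments", map[string]string{"status": "paid"}})
	keys := make([]string, 0, len(row))
	for k := range row {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(done, keys) // true [key orders.amount payments.status]
}
```

Separating the two steps means ingestion never blocks on a slow source; only the transform waits, and only for the keys that are still incomplete.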
  • 5. Design Decisions
      • Data modeling
          – Partition key / partition size
          – “Read” response time
          – Handling consistency
          – Collection columns: sets & maps
          – Logical separation of raw & processed data
          – All lookup data in a single table
      • Indexes
          – Primary, inverted, secondary, DSE Search indexes
      • DSE Search
          – Range queries
          – Regular expressions
          – Non-equality queries
          – Faceted search
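An inverted index in this sense is an extra table that the application maintains itself: it is keyed by a non-key attribute and lists the matching primary keys, here in a set, in line with the "collection columns: sets & maps" item. The Go sketch below models both tables in memory under that assumption; `Account`, `Store` and the `accounts_by_status` table name are hypothetical, and a real implementation would issue the corresponding CQL writes to Cassandra instead of updating maps.

```go
package main

import (
	"fmt"
	"sort"
)

// Account is a hypothetical base-table row, partitioned by ID.
type Account struct {
	ID     string
	Status string
}

// Store mimics two Cassandra tables held in memory:
//   accounts:           base table, primary key (id)
//   accounts_by_status: inverted index, primary key (status), with the
//                       matching IDs kept in a set, like a set<text> column.
type Store struct {
	accounts map[string]Account
	byStatus map[string]map[string]struct{}
}

func NewStore() *Store {
	return &Store{accounts: map[string]Account{}, byStatus: map[string]map[string]struct{}{}}
}

// Put writes the base row and keeps the inverted index in sync. Because the
// application, not Cassandra, maintains the index, it must also remove the
// ID from the old status entry when the status changes.
func (s *Store) Put(a Account) {
	if old, ok := s.accounts[a.ID]; ok && old.Status != a.Status {
		delete(s.byStatus[old.Status], a.ID)
	}
	s.accounts[a.ID] = a
	if s.byStatus[a.Status] == nil {
		s.byStatus[a.Status] = map[string]struct{}{}
	}
	s.byStatus[a.Status][a.ID] = struct{}{}
}

// ByStatus answers a non-key query with key lookups only: one read of the
// index partition, then one read per matching base row.
func (s *Store) ByStatus(status string) []Account {
	var out []Account
	for id := range s.byStatus[status] {
		out = append(out, s.accounts[id])
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	s := NewStore()
	s.Put(Account{ID: "a1", Status: "open"})
	s.Put(Account{ID: "a2", Status: "open"})
	s.Put(Account{ID: "a1", Status: "closed"})
	fmt.Println(s.ByStatus("open"), s.ByStatus("closed")) // [{a2 open}] [{a1 closed}]
}
```

The explicit cleanup in `Put` is the price of an application-maintained index; in exchange, every read is a fast key lookup rather than a cluster-wide scan.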
  • 6. Design Decisions
      • Consistency
          – W consistency + R consistency > replication factor: the replicas that acknowledge a write (W) plus the replicas queried by a read (R) must exceed the replication factor for reads to see the latest write
      • Indexes: data access options
      Option                        Data/Index  Storage  Response Time    Maintained By  Cardinality  Search     Consistency
      Primary Key                   Data        High     Very fast        App            High         Limited    Tunable
      Duplicate Data (Primary Key)  Data        High     Very fast        App            High         Limited    Tunable
      Inverted Index                Index       Low      Fast             App            High         Limited    Tunable
      Secondary Index               Index       Low      Medium           System         Low          Limited    Tunable
      DSE Search Index              Index       Medium   Slow (relative)  System         Any          Versatile  ONE (read)
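The rule W + R > RF guarantees that the set of replicas acknowledging a write and the set consulted by a read overlap, so at least one replica in every read holds the latest acknowledged write. A Go sketch of the check for the ONE, QUORUM and ALL levels follows; the type and function names are illustrative, not a driver API.

```go
package main

import "fmt"

// Level is a Cassandra consistency level, reduced here to the three levels
// whose replica counts depend only on the replication factor.
type Level int

const (
	One Level = iota
	Quorum
	All
)

func (l Level) String() string {
	return [...]string{"ONE", "QUORUM", "ALL"}[l]
}

// replicas returns how many replicas must respond at level l for a keyspace
// with replication factor rf.
func replicas(l Level, rf int) int {
	switch l {
	case One:
		return 1
	case Quorum:
		return rf/2 + 1 // a strict majority
	default:
		return rf
	}
}

// overlaps reports whether W + R > RF, i.e. every read set must intersect
// every write set.
func overlaps(w, r Level, rf int) bool {
	return replicas(w, rf)+replicas(r, rf) > rf
}

func main() {
	const rf = 3
	pairs := [][2]Level{{Quorum, Quorum}, {One, All}, {All, One}, {One, One}, {Quorum, One}}
	for _, p := range pairs {
		fmt.Printf("W=%-6v R=%-6v -> %v\n", p[0], p[1], overlaps(p[0], p[1], rf))
	}
}
```

With the replication factor of 3 listed on slide 8, QUORUM needs 2 replicas, so QUORUM writes with QUORUM reads satisfy the rule (2 + 2 > 3). A read at ONE, as listed for the DSE Search index, satisfies it only when writes use ALL.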
  • 7. Benchmarking: Indexes
      [Chart: query response time in seconds by index type]
          – Primary index: 0.02 s
          – Inverted index: 0.04 s
          – DSE Search: 1.6 s
          – Secondary index: 0.6 s
      • 22.6 million rows
      • 6-node cluster
  • 8. Performance
      • Replication factor: 3
      • Write heavy
          – Increased concurrent writes to 64 (from 32)
          – Decreased concurrent reads to 16 (from 32)
          – Size-tiered compaction strategy
      • Cassandra cluster with DSE Search enabled on all nodes
      • Virtual nodes set to 16
      • All caches disabled except the filter cache
      • EC2 Snitch on AWS across 3 availability zones (AZs)
      • DSE Search soft auto-commit max time set to 10 s
  • 9. Solution Stack
      [Diagram components: Mesosphere, Marathon, Docker, Go services (ingestion), Go services (processor), Kafka, API server, Consul]
  • 10. Solution Stack: Plug-In Framework
      • Go service: plugins chained in a single process
      • Packaged & deployed in a Docker container
      • Bootstrapped from a config
      • 100% developed in-house
      [Diagram: a Go service containing three chained stages, each a runner and a plugin with an in channel and an out channel]
  • 11–15. Solution Stack (same diagram as slide 9)
  • 16. Solution Stack
      • Cassandra: data storage
      • Go-based plugin framework: Go services for data ingestion & processing
      • Docker: packaging and deployment
      • Mesosphere: single view of infrastructure
      • Marathon: launch containers
      • Kafka: data transfer and distribution
      • Consul: service discovery and configuration management
      • Jenkins: continuous integration
  • 17. Benchmarking: Data Processing
      • Test for a functional group
      • Cassandra: 6-node cluster
      • Kafka: 6-node cluster
      • Go services: 3
      • Primary data source: Oracle
      • Time: 360 minutes
      • Data size: 1 year
      Description                         Measure
      Total rows processed                450 million
      De-normalized rows                  11.8 million
      Rate of processing (Go services)    ~300k tps
      Rate of processing (platform)       ~21k tps
      % time waiting on data ingestion    75%
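The platform rate follows from the table's totals, reading "tps" as rows processed per second over the 360-minute run (an assumption about the unit):

```latex
\text{platform rate} \approx
\frac{450 \times 10^{6}\ \text{rows}}{360\ \text{min} \times 60\ \text{s/min}}
= \frac{450 \times 10^{6}\ \text{rows}}{21{,}600\ \text{s}}
\approx 20{,}833\ \text{rows/s} \approx 21\text{k tps}
```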
  • 18. Challenges
      • Not all query patterns are known in advance
      • Index rebuilds are costly
      • Business adjusting to near-real-time data
      • Operational support adjustments
      • Backup/restore
      • Finding talent: we are hiring!