SlideShare a Scribd company logo
1 of 23
Download to read offline
CASSANDRA for BigData Event
logging
Ramesh Veeramani
PROBLEM SCENARIO
• Event logs and Permission logs Storage , Retrieval and Maintenance
poor performance and scalability issue
• Currently HBase and Oracle DB used.
• Need for better storage and query capability
• Data are immutable.
• Eventual consistency acceptable.
• New Infra to scale to 1 PB data.
Option n Cassandra
• Why Cassandra ?
• How with Cassandra?
• Internal working
• Setup
• Tools
• Maintenance
• Cassandra constraints
• Comparison
Why Cassandra
• Scales Incrementally
• Highly Available
• Uses Peer – Peer topology ; instead of naive Master slave.
• Good for OLTP and for data changes.
• Availability a high priority
• Write Throughput higher than Read Throughput.
• Lot like Relational model easy to develop.
• Well established since 2008
CAP THEOREM
Cassandra - Internal
• Tokens and Hashing
• Virtual nodes
• Allows for equal CPU utilization for all the server in case of removing and adding nodes.
• Token ID assignments automatic.
• Configured in Cassandra.yaml [num_nodes =256]
• Replication
• Gossip / Snitch
• Read and Write ?
• Compaction Strategy.
• Add and Remove nodes
TOKEN AND HASHING
REPLICATION
Constraints with Replication
• Consistency an issue.
• Aims for Eventual Consistency.
• Read = Write Possible with strict consistency.
• Configurable Consistency to different level
• ConfigurationLevel = {All, Quorum, One}
• Discretion of the Co-Ordinator node to enforces Replication and CL
• There is possibility of stale data in production
• Operation effort to synchronize the data (nodetool repair <node>)
• Synchonizes the data on each node is timely operation to be done
Gossip /Snitch
• Cassandra uses Gossip Protocol instead of naïve ping or other comm
protocol
• Gossip is epidemic and probabilistic protocol .
• Gossip is not deterministic.
Why write
are faster…
Why read are relatively slower….
• Retrieving of rows and columns from the datastore.
• If all columns present in the MemTable. The results are returned.
• If data not found control pass through the SSTable in order of entry.
• Bloom filter expedites the search in SSTable.
• BF is a Hash table Datastructure signifying if criteria in the SSTable
COMPACTION STATERGY
• Four compaction strategy
• Size
• Data
• Time
• Level [** Recommended for read intensive workload ]
SETUP
• 2 NODE EACH
• processor Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
• processor CPU
• memory 8188MiB System Memory
• memory 8188MiB DIMM RAM
• CentOS Linux release 7.5.1804 (Core)
• sudo vim /etc/yum.repos.d/cassandra.repo
[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS
• yum -y install cassandra
• systemctl start cassandra
• systemctl enable cassandra
• sudo systemctl enable cassandra
• nodetool status
*** Extra configuration needed to setup MULTI NODE CLUSTER
CREATING A KEYSPACE. (A.K.A Db)
CREATE KEYSPACE events WITH
REPLICATION = {‘class’:’SimpleStratergy’,’replication_factor’:1};
CREATE TABLE
• CREATE TABLE events (
ID UUID,
USER_TYPE text,
ACCOUNT_ID text,
CLASS_NAME text,
CREATE_DATE timestamp,
PRIMARY KEY((ID),ACCOUNT_ID)
);
CREATING SASI (SECONDARY INDEX)
• CREATE CUSTOM INDEX class_user ON events (class_name,user_type)
USING 'org.apache.cassandra.index.sasi.SASIIndex’
WITH OPTIONS ={'mode':'contains’};
• CREATE CUSTOM INDEX user_index ON events
(user_type)
USING 'org.apache.cassandra.index.sasi.SASIIndex’
WITH OPTIONS ={'mode':'contains'};
CREATING MATERIALIZED VIEW
CREATE MATERIALIZED VIEW MV AS
SELECT account_id, class_name,id,create_date,user_type from events2
where account_id='MYSPACE' AND user_type= 'VIP’
PRIMARY KEY ((user_type,class_name),account_id);
CREATE MATERIALIZED VIEW MV AS
SELECT account_id, class_name,id,create_date,user_type from events2 where
account_id='MYSPACE' and user_type is not null and class_name is not null
PRIMARY KEY ((user_type,class_name),account_id);
Benchmarking
• A million record in 444 seconds when it’s a single threaded
sequential.
• A million record in 240 seconds for 2 concurrent threads that are
sequential
Other benchmarking
Maintenance tools
• calls – python-based tool to query cassandra using CQL (Cassandra's
query language)
• cassandra-stress – benchmarking tool
• nodetool – command line administration tool that uses JMX to get
operational information from Cassandra nodes and to kick off
administration tasks (repair, compaction, cleanup)
• DSE OpsCenter DSE *– Visual monitoring with Enterprise license
• Paid Monitoring
Conclusion
• A good candidate for logging system
• Easy to scale.
• Native driver for PHP
• Data model design to be given lot of thought
• Data query to be known and designed as per application
Apache Ignite

More Related Content

What's hot

CrateDB - Giacomo Ceribelli
CrateDB - Giacomo CeribelliCrateDB - Giacomo Ceribelli
CrateDB - Giacomo Ceribelliceribbo
 
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...Instaclustr
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandraAaron Ploetz
 
Cassandra
Cassandra Cassandra
Cassandra Pooja GV
 
Cassandra@Coursera: AWS deploy and MySQL transition
Cassandra@Coursera: AWS deploy and MySQL transitionCassandra@Coursera: AWS deploy and MySQL transition
Cassandra@Coursera: AWS deploy and MySQL transitionDaniel Jin Hao Chia
 
Cassandra Distributions and Variants
Cassandra Distributions and VariantsCassandra Distributions and Variants
Cassandra Distributions and VariantsAnant Corporation
 
Migrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and HowMigrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and HowAnant Corporation
 
Kafka website activity architecture
Kafka website activity architectureKafka website activity architecture
Kafka website activity architectureOmid Vahdaty
 
How to Monitor and Size Workloads on AWS i3 instances
How to Monitor and Size Workloads on AWS i3 instancesHow to Monitor and Size Workloads on AWS i3 instances
How to Monitor and Size Workloads on AWS i3 instancesScyllaDB
 
Cassandra Workshop - Cassandra from scratch in one day
Cassandra Workshop - Cassandra from scratch in one dayCassandra Workshop - Cassandra from scratch in one day
Cassandra Workshop - Cassandra from scratch in one dayCarlos Alonso Pérez
 
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...Omid Vahdaty
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)Karthik .P.R
 
Empowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with AlternatorEmpowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with AlternatorScyllaDB
 
Migrating Data Pipeline from MongoDB to Cassandra
Migrating Data Pipeline from MongoDB to CassandraMigrating Data Pipeline from MongoDB to Cassandra
Migrating Data Pipeline from MongoDB to CassandraDemi Ben-Ari
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real WorldJeremy Hanna
 
Zeppelin and spark sql demystified
Zeppelin and spark sql demystifiedZeppelin and spark sql demystified
Zeppelin and spark sql demystifiedOmid Vahdaty
 
Cassandra Lunch #92: Securing Apache Cassandra - Managing Roles and Permissions
Cassandra Lunch #92: Securing Apache Cassandra - Managing Roles and PermissionsCassandra Lunch #92: Securing Apache Cassandra - Managing Roles and Permissions
Cassandra Lunch #92: Securing Apache Cassandra - Managing Roles and PermissionsAnant Corporation
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBAthiq Ahamed
 

What's hot (20)

CrateDB - Giacomo Ceribelli
CrateDB - Giacomo CeribelliCrateDB - Giacomo Ceribelli
CrateDB - Giacomo Ceribelli
 
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandra
 
Cassandra
Cassandra Cassandra
Cassandra
 
Cassandra@Coursera: AWS deploy and MySQL transition
Cassandra@Coursera: AWS deploy and MySQL transitionCassandra@Coursera: AWS deploy and MySQL transition
Cassandra@Coursera: AWS deploy and MySQL transition
 
Cassandra Distributions and Variants
Cassandra Distributions and VariantsCassandra Distributions and Variants
Cassandra Distributions and Variants
 
Cassandra vs Databases
Cassandra vs Databases Cassandra vs Databases
Cassandra vs Databases
 
Migrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and HowMigrating from a Relational Database to Cassandra: Why, Where, When and How
Migrating from a Relational Database to Cassandra: Why, Where, When and How
 
Kafka website activity architecture
Kafka website activity architectureKafka website activity architecture
Kafka website activity architecture
 
How to Monitor and Size Workloads on AWS i3 instances
How to Monitor and Size Workloads on AWS i3 instancesHow to Monitor and Size Workloads on AWS i3 instances
How to Monitor and Size Workloads on AWS i3 instances
 
Cassandra Workshop - Cassandra from scratch in one day
Cassandra Workshop - Cassandra from scratch in one dayCassandra Workshop - Cassandra from scratch in one day
Cassandra Workshop - Cassandra from scratch in one day
 
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r...
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)
 
Empowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with AlternatorEmpowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with Alternator
 
Migrating Data Pipeline from MongoDB to Cassandra
Migrating Data Pipeline from MongoDB to CassandraMigrating Data Pipeline from MongoDB to Cassandra
Migrating Data Pipeline from MongoDB to Cassandra
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
Zeppelin and spark sql demystified
Zeppelin and spark sql demystifiedZeppelin and spark sql demystified
Zeppelin and spark sql demystified
 
Cassandra Lunch #92: Securing Apache Cassandra - Managing Roles and Permissions
Cassandra Lunch #92: Securing Apache Cassandra - Managing Roles and PermissionsCassandra Lunch #92: Securing Apache Cassandra - Managing Roles and Permissions
Cassandra Lunch #92: Securing Apache Cassandra - Managing Roles and Permissions
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 

Similar to Using cassandra as a distributed logging to store pb data

Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataacelyc1112009
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Jon Haddad
 
Micro-batching: High-performance writes
Micro-batching: High-performance writesMicro-batching: High-performance writes
Micro-batching: High-performance writesInstaclustr
 
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...DataStax
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into CassandraBrent Theisen
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Lviv Startup Club
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
Maximizing performance via tuning and optimization
Maximizing performance via tuning and optimizationMaximizing performance via tuning and optimization
Maximizing performance via tuning and optimizationMariaDB plc
 
Maximizing performance via tuning and optimization
Maximizing performance via tuning and optimizationMaximizing performance via tuning and optimization
Maximizing performance via tuning and optimizationMariaDB plc
 
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag JambhekarC* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag JambhekarDataStax Academy
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka MeetupCliff Gilmore
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introductionfardinjamshidi
 
My Database Skills Killed the Server
My Database Skills Killed the ServerMy Database Skills Killed the Server
My Database Skills Killed the ServerColdFusionConference
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructureharendra_pathak
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1sqlserver.co.il
 
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with CassandraDataStax Academy
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical dataOleksandr Semenov
 

Similar to Using cassandra as a distributed logging to store pb data (20)

Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 
Micro-batching: High-performance writes
Micro-batching: High-performance writesMicro-batching: High-performance writes
Micro-batching: High-performance writes
 
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
Maximizing performance via tuning and optimization
Maximizing performance via tuning and optimizationMaximizing performance via tuning and optimization
Maximizing performance via tuning and optimization
 
Maximizing performance via tuning and optimization
Maximizing performance via tuning and optimizationMaximizing performance via tuning and optimization
Maximizing performance via tuning and optimization
 
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag JambhekarC* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introduction
 
My Database Skills Killed the Server
My Database Skills Killed the ServerMy Database Skills Killed the Server
My Database Skills Killed the Server
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructure
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1
 
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical data
 

Recently uploaded

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Using cassandra as a distributed logging to store pb data

  • 1. CASSANDRA for BigData Event logging Ramesh Veeramani
  • 2. PROBLEM SCENARIO • Event logs and Permission logs Storage , Retrieval and Maintenance poor performance and scalability issue • Currently HBase and Oracle DB used. • Need for better storage and query capability • Data are immutable. • Eventual consistency acceptable. • New Infra to scale to 1 PB data.
  • 3. Option n Cassandra • Why Cassandra ? • How with Cassandra? • Internal working • Setup • Tools • Maintenance • Cassandra constraints • Comparison
  • 4. Why Cassandra • Scales Incrementally • Highly Available • Uses Peer – Peer topology ; instead of naive Master slave. • Good for OLTP and for data changes. • Availability a high priority • Write Throughput higher than Read Throughput. • Lot like Relational model easy to develop. • Well established since 2008
  • 6. Cassandra - Internal • Tokens and Hashing • Virtual nodes • Allows for equal CPU utilization for all the server in case of removing and adding nodes. • Token ID assignments automatic. • Configured in Cassandra.yaml [num_nodes =256] • Replication • Gossip / Snitch • Read and Write ? • Compaction Strategy. • Add and Remove nodes
  • 9. Constraints with Replication • Consistency an issue. • Aims for Eventual Consistency. • Read = Write Possible with strict consistency. • Configurable Consistency to different level • ConfigurationLevel = {All, Quorum, One} • Discretion of the Co-Ordinator node to enforces Replication and CL • There is possibility of stale data in production • Operation effort to synchronize the data (nodetool repair <node>) • Synchonizes the data on each node is timely operation to be done
  • 10. Gossip /Snitch • Cassandra uses Gossip Protocol instead of naïve ping or other comm protocol • Gossip is epidemic and probabilistic protocol . • Gossip is not deterministic.
  • 12. Why read are relatively slower…. • Retrieving of rows and columns from the datastore. • If all columns present in the MemTable. The results are returned. • If data not found control pass through the SSTable in order of entry. • Bloom filter expedites the search in SSTable. • BF is a Hash table Datastructure signifying if criteria in the SSTable
  • 13. COMPACTION STATERGY • Four compaction strategy • Size • Data • Time • Level [** Recommended for read intensive workload ]
  • 14. SETUP • 2 NODE EACH • processor Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz • processor CPU • memory 8188MiB System Memory • memory 8188MiB DIMM RAM • CentOS Linux release 7.5.1804 (Core) • sudo vim /etc/yum.repos.d/cassandra.repo [cassandra] name=Apache Cassandra baseurl=https://www.apache.org/dist/cassandra/redhat/311x/ gpgcheck=1 repo_gpgcheck=1 gpgkey=https://www.apache.org/dist/cassandra/KEYS • yum -y install cassandra • systemctl start cassandra • systemctl enable cassandra • sudo systemctl enable cassandra • nodetool status *** Extra configuration needed to setup MULTI NODE CLUSTER
  • 15. CREATING A KEYSPACE. (A.K.A Db) CREATE KEYSPACE events WITH REPLICATION = {‘class’:’SimpleStratergy’,’replication_factor’:1};
  • 16. CREATE TABLE • CREATE TABLE events ( ID UUID, USER_TYPE text, ACCOUNT_ID text, CLASS_NAME text, CREATE_DATE timestamp, PRIMARY KEY((ID),ACCOUNT_ID) );
  • 17. CREATING SASI (SECONDARY INDEX) • CREATE CUSTOM INDEX class_user ON events (class_name,user_type) USING 'org.apache.cassandra.index.sasi.SASIIndex’ WITH OPTIONS ={'mode':'contains’}; • CREATE CUSTOM INDEX user_index ON events (user_type) USING 'org.apache.cassandra.index.sasi.SASIIndex’ WITH OPTIONS ={'mode':'contains'};
  • 18. CREATING MATERIALIZED VIEW CREATE MATERIALIZED VIEW MV AS SELECT account_id, class_name,id,create_date,user_type from events2 where account_id='MYSPACE' AND user_type= 'VIP’ PRIMARY KEY ((user_type,class_name),account_id); CREATE MATERIALIZED VIEW MV AS SELECT account_id, class_name,id,create_date,user_type from events2 where account_id='MYSPACE' and user_type is not null and class_name is not null PRIMARY KEY ((user_type,class_name),account_id);
  • 19. Benchmarking • A million record in 444 seconds when it’s a single threaded sequential. • A million record in 240 seconds for 2 concurrent threads that are sequential
  • 21. Maintenance tools • calls – python-based tool to query cassandra using CQL (Cassandra's query language) • cassandra-stress – benchmarking tool • nodetool – command line administration tool that uses JMX to get operational information from Cassandra nodes and to kick off administration tasks (repair, compaction, cleanup) • DSE OpsCenter DSE *– Visual monitoring with Enterprise license • Paid Monitoring
  • 22. Conclusion • A good candidate for logging system • Easy to scale. • Native driver for PHP • Data model design to be given lot of thought • Data query to be known and designed as per application