Cassandra
Robert Koletka
What is Cassandra
●   Basically a key-value store
    ●   With extra structure on top (column families, indexes, tunable consistency).


●   It is a NoSQL database that is
    ●   Decentralized : No single point of failure
    ●   Elastic : Linear Scalability
    ●   Fault Tolerant : Replication
    ●   Optimized for writes, though reads also perform
        well.
What is Cassandra
●   Based on two papers
    ●   Bigtable: Google
    ●   Dynamo: Amazon
●   Dynamo partitioning and replication
●   Bigtable data model
●   CAP Theorem
    ●   Consistent NO
    ●   Available YES
    ●   Partition Tolerant YES
What is Cassandra
●   Uses consistent hashing
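A minimal sketch of consistent hashing, assuming a ring with one token per node. The `HashRing` class, the `node-*` names, and the use of MD5 here are illustrative assumptions, not Cassandra's actual code. It shows that adding a node only moves the keys that fall into the new node's range.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring: each node owns the keys up to and including its token."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, row_key: str) -> str:
        # First node whose token is >= the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(row_key)) % len(self._ring)
        return self._ring[i][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owners_before = {k: ring.node_for(k) for k in ("Rk", "Js", "Ac", "Bb")}

# Adding a node only moves the keys that fall into the new node's range.
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k, owner in owners_before.items() if bigger.node_for(k) != owner]
print(owners_before, moved)
```

Every key in `moved` now belongs to the new node; all other keys keep their original owner.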
Data Model
●   Cluster
●   Keyspace : like a DB
●   Column Families : like a Table
●   Super Columns (optional)
●   Columns
●   Values
Data Model
●   Keyspace groups column families together
●   Column Family groups data together
●   Example :
    ●   User Keyspace has
        –   UserProfiles Column Family
        –   Friends Column Family
Data Model
●   Cassandra doesn't require a fixed schema like
    traditional DBs
●   UserProfiles Example
    ●   Rk = {Name:Robert, Surname:Koletka,
        Gender:Male}
    ●   Js = {Name:John, Surname:Smith, Location:WC}
●   Rk & Js are both valid entries in the UserProfiles
    Column Family even though they have different columns.
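As a rough mental model (plain Python dictionaries, not a Cassandra client), a column family can be pictured as a map from row key to a map of column names to values. The sketch below reuses the two rows from the slide and shows that code reading a row has to cope with columns that are absent.

```python
# A column family sketched as: row key -> {column name -> value}.
user_profiles = {
    "Rk": {"Name": "Robert", "Surname": "Koletka", "Gender": "Male"},
    "Js": {"Name": "John", "Surname": "Smith", "Location": "WC"},
}

# Rows need not share columns, so readers must handle missing ones.
for row_key, columns in user_profiles.items():
    print(row_key, columns.get("Location", "<no Location column>"))

# Columns that both rows happen to have in common.
shared = set(user_profiles["Rk"]) & set(user_profiles["Js"])
print(sorted(shared))
```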
Data Model
●   Think about QUERIES, not about normalizing data; denormalize to fit the query.
●   Use Case: “I want to get friends' names and
    surnames for a given UserID”
●   Name & Surname need to be in the friend
    column family.
●   Js = {Ac:{Name:Alice,Surname:Cook}, Bb:
    {Name:Betty,Surname:Blah}} (Super)
●   Js = {Ac:"Alice Cook", Bb:"Betty Blah"}
Data Model
●   Column
    ●   Rowkey = {ColumnName:Value, CN:V, CN:V}
●   Super Column
    ●   Rowkey = {SuperColumnName:{CN:V, CN:V},
               SCN:{CN:V, CN:V}}
●   Super Columns group columns together
●   Cannot index on a Sub column.
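The two layouts can be compared side by side with nested dictionaries. This is only a sketch of the shapes involved, using the friends example from the earlier slide; `friend_names` is a hypothetical helper, not part of any Cassandra API.

```python
# Super column family: row key -> super column name -> {column name -> value}.
friends_super = {
    "Js": {
        "Ac": {"Name": "Alice", "Surname": "Cook"},
        "Bb": {"Name": "Betty", "Surname": "Blah"},
    }
}

# Standard column family: row key -> {column name -> value}, names pre-joined.
friends_flat = {"Js": {"Ac": "Alice Cook", "Bb": "Betty Blah"}}


def friend_names(user_id):
    """Answer the use case with a single row lookup in the super column family."""
    return [f"{c['Name']} {c['Surname']}" for c in friends_super[user_id].values()]


print(friend_names("Js"))
print(list(friends_flat["Js"].values()))
```

Both layouts answer the query from one row; the flat version gives up separate access to Name and Surname in exchange for a simpler structure.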
Define Keyspace
●   create keyspace <keyspace> with <att1>=<value1> and
    <att2>=<value2> ...;
●   create keyspace UserKeyspace with placement_strategy
    = 'org.apache.cassandra.locator.SimpleStrategy' and
    strategy_options = {replication_factor:2};
●   SimpleStrategy – place replicas on the next nodes around the ring
●   NetworkTopologyStrategy – for multiple data centers
●   OldNetworkTopologyStrategy – different data centers and
    different racks
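The "next node" placement of SimpleStrategy can be sketched as a walk around the ring. The function name and node list below are assumptions made for illustration; the list is taken to be already ordered by token.

```python
def simple_strategy_replicas(ring_nodes, primary_index, replication_factor):
    """Return the primary node plus the next nodes clockwise on the ring.

    ring_nodes is assumed to be already sorted by token; primary_index is the
    position of the node that owns the row key.
    """
    rf = min(replication_factor, len(ring_nodes))
    return [ring_nodes[(primary_index + i) % len(ring_nodes)] for i in range(rf)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
# With replication_factor 2, a key owned by the last node wraps around to the first.
print(simple_strategy_replicas(nodes, 3, 2))
```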
Define Column Family
●   create column family <name> with
    <att1>=<value1> and <att2>=<value2>...;
●   create column family UserProfiles with
    comparator=UTF8Type and
    default_validation_class=UTF8Type and
    key_validation_class=UTF8Type and
    column_metadata=[{column_name:Location,
    validation_class:UTF8Type,
    index_type:KEYS}];
Define Column Family
●   comparator = validates column names and
    determines how they are compared (sorted)
●   default_validation_class = validates values
    in columns which are not listed in
    column_metadata
●   key_validation_class = validates row keys

●   Default is BytesType
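Because the comparator decides how column names are ordered, the same names can sort differently under different types. The sketch below imitates a text comparison and a numeric comparison with Python's `sorted`; it is an analogy, not Cassandra's comparator code.

```python
column_names = ["10", "9", "100", "2"]

# UTF8Type-style comparison: names sort as text.
as_text = sorted(column_names)

# LongType-style comparison: names sort by numeric value.
as_long = sorted(column_names, key=int)

print(as_text)
print(as_long)
```

This is why a column family whose column names are numbers or timestamps would normally use a numeric comparator rather than a text one.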
Define Column Family
●   Other Available Types
    ●   AsciiType
    ●   BytesType
    ●   CounterColumnType (distributed counter column; a CF
        either contains only counters or none at all)
    ●   Int32Type
    ●   IntegerType (a generic variable-length integer type)
    ●   LexicalUUIDType
    ●   LongType
    ●   UTF8Type
Define Column Family
●   Many more options
    ●   bloom_filter_fp_chance : target false-positive rate for the SSTable bloom filters
    ●   gc_grace : seconds to wait before garbage-collecting deleted data (tombstones)
    ●   keys_cached
    ●   row_cache_save_period
    ●   max_compaction_threshold
    ●   ...
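To make the bloom filter option more concrete, here is a toy bloom filter. The class, its sizes, and the SHA-256 based hashing are assumptions chosen for a short example and do not reflect Cassandra's implementation. The property to notice is that a bloom filter can report a key that was never added (a false positive) but never misses a key that was added.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter backed by a single Python integer used as a bit set."""

    def __init__(self, size_bits=64, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, key):
        # Derive several bit positions from salted digests of the key.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


bf = BloomFilter()
for row_key in ("Rk", "Js"):
    bf.add(row_key)
print(bf.might_contain("Rk"), bf.might_contain("Js"))
```

A larger bit array lowers the false-positive rate at the cost of memory, which is the trade-off the option controls.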
Reads and Writes
●   Cassandra is optimized for writes
    ●   First written to a commit log
    ●   Then to an in-memory table (memtable)
    ●   Then periodically flushed to disk (SSTable)
●   Reads
    ●   Read from memtables and SSTables, merging the results
    ●   Bloom filters are used to speed up SSTable lookups
●   Compaction
    ●   Periodically Cassandra merges SSTables
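The write path, read path, and compaction described above can be sketched with a toy in-memory store. `TinyStore` and its `memtable_limit` are hypothetical; real Cassandra resolves conflicts by timestamp and consults bloom filters, both of which this simplification leaves out.

```python
class TinyStore:
    """Toy model of the write path: commit log -> memtable -> SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []      # append-only record of every write
        self.memtable = {}        # in-memory, most recent writes
        self.sstables = []        # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable is written out as a new immutable SSTable.
        self.sstables.append(dict(self.memtable))
        self.memtable = {}

    def read(self, key):
        # Check the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

    def compact(self):
        # Merge all SSTables into one; newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [merged]


store = TinyStore()
for key, value in [("Rk", "v1"), ("Js", "v1"), ("Rk", "v2"), ("Ac", "v1")]:
    store.write(key, value)
tables_before = len(store.sstables)
store.compact()
print(tables_before, len(store.sstables), store.read("Rk"))
```

Writes only append and update memory, which is why they are cheap; reads may have to look in several places until compaction merges them.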
Indexes
●   Row Keys
    ●   Cassandra keeps an index of its row keys
●   Column Indexes
    ●   Known as Secondary Indexes, build an index on
        column values.
    ●   Indexes existing data in the background
    ●   Queries need an equality predicate on an indexed column
        –   Additional filters can then be applied
Indexes
●   Get userprofiles where location = 'WC'
●   Get userprofiles where location = 'WC' and age
    > 18
●   NOT (no equality predicate on an indexed column)
    ●   Get userprofiles where age > 18
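The rule behind these three queries can be illustrated with a hand-built index. The rows, ages, and the `get_where` helper below are invented for the example: the equality predicate narrows the candidates through the index, and the range filter is applied only to those candidates.

```python
profiles = {
    "Rk": {"location": "WC", "age": 25},
    "Js": {"location": "WC", "age": 17},
    "Ac": {"location": "GP", "age": 30},
}

# Secondary index on "location": column value -> set of row keys.
location_index = {}
for row_key, columns in profiles.items():
    location_index.setdefault(columns["location"], set()).add(row_key)


def get_where(location, min_age=None):
    """Equality lookup through the index, then an optional range filter."""
    candidates = location_index.get(location, set())
    return sorted(
        key for key in candidates
        if min_age is None or profiles[key]["age"] > min_age
    )


print(get_where("WC"))
print(get_where("WC", min_age=18))
```

A query on age alone has no index entry to start from, so it would have to scan every row, which is why it is not allowed.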
Consistency
●   Allows for configurable consistency settings
    ●   Read
        –   One, Quorum((Replication Factor / 2) +1), Local/Each_Quorum
            (Data Centers), All.
    ●   Write
        –   Any, One, Quorum, Local/Each_Quorum (Data Centers), All.
●   Any means that a write can be accepted by the co-ordinator while the
    replicas are down, and handed to them when they come back up.
●   Quorum gives some consistency while tolerating
    some failures.
●   All requires every replica to be up.
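The quorum formula from the slide, and the common rule of thumb that reads see the latest write when read replicas plus write replicas exceed the replication factor, can be checked with a few lines. The function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Quorum as given on the slide: (replication factor / 2) + 1, rounded down."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q, rf - q)             # replicas needed, replica failures tolerated
print(overlaps(q, q, rf))    # Quorum reads with Quorum writes
print(overlaps(1, 1, rf))    # One reads with One writes
```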
Consistency
●   Read
    ●   At least one replica needs to be up to read data
        from.
    ●   Reads from a number of replicas, returning the
        latest data based on timestamp.
    ●   Read repair keeps data consistent by updating
        out-of-date replicas with the latest data. Runs in the
        background.
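A sketch of timestamp-based resolution with read repair, under the assumption that each replica is a plain dictionary of `key -> (timestamp, value)`. The repair here happens inline for brevity, whereas the slide notes that Cassandra runs it in the background.

```python
def read_with_repair(replicas, key):
    """Return the newest (timestamp, value) for key and repair stale replicas.

    Each replica is modelled as a dict: key -> (timestamp, value).
    """
    responses = [r[key] for r in replicas if key in r]
    if not responses:
        return None
    newest = max(responses, key=lambda pair: pair[0])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest   # bring the out-of-date replica up to date
    return newest


replica_1 = {"Rk": (1, "Good morning all")}
replica_2 = {"Rk": (2, "lunch was good")}
replica_3 = {}
result = read_with_repair([replica_1, replica_2, replica_3], "Rk")
print(result, replica_1 == replica_2 == replica_3)
```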
Cassandra Query Language
●   Allows for
    ●   Select
        –   SELECT [FIRST N] [REVERSED] <SELECT EXPR> FROM <COLUMN FAMILY> [USING
            <CONSISTENCY>] [WHERE <CLAUSE>] [LIMIT N];
        –   SELECT [FIRST N] [REVERSED] name1..nameN FROM
        –   Unlike SQL, there is no guarantee that the requested columns will be returned
        –   SELECT ... WHERE KEY >= startkey AND KEY <= endkey AND name1 = value1
    ●   Insert
    ●   Delete
    ●   Update
    ●   Batch
    ●   Truncate
    ●   Create Keyspace
    ●   Create Column Family
    ●   Create Index
    ●   Drop
Other Stuff
●   Cassandra stores columns in sorted order
    ●   Allows you to get the first or last X number of columns
    ●   Potentially store historical user data
●   A single column value cannot hold more than 2 GB
●   Max number of columns per row is 2 billion
●   Key and Column Names must be < 64 KB
●   Most languages have client libraries (Python,
    Java, Scala, Node.js, PHP, C++...)
●   Try not to use raw Thrift.
Last Example
●   User Statuses
●   Columns stored in sorted order... use timestamp as column
    name
●   Rk = {1:'Good morning all',2:'lunch was good',3:'time to get
    drunk',4:'so many regrets from last night'}
●   create column family UserStatuses with comparator =
    LongType and key_validation_class=UTF8Type and
    default_validation_class=UTF8Type;
●   Get last X number of Columns, Get first X number of
    columns
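The first/last lookups on this row can be imitated with a sorted list. This is a plain Python analogy using the statuses from the slide; `first_columns` and `last_columns` are hypothetical helpers, and Cassandra keeps the columns sorted on write rather than sorting on demand.

```python
statuses = {
    3: "time to get drunk",
    1: "Good morning all",
    4: "so many regrets from last night",
    2: "lunch was good",
}

# Cassandra keeps columns sorted by the comparator; here we sort on demand.
ordered = sorted(statuses.items())


def first_columns(n):
    """Oldest n statuses, in ascending column-name order."""
    return ordered[:n]


def last_columns(n):
    """Newest n statuses, newest first, like a reversed slice."""
    return ordered[::-1][:n]


print(first_columns(2))
print(last_columns(2))
```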
