10. COLUMN FAMILY DATABASES
A column family model is in general…
Suited for:
• Structured and unstructured data
• Large data volumes
• Geographically distributed users
• Horizontal scaling (adding nodes)
• Reliability
Not suited for:
• ACID operations (transactions)
• Ad-hoc and/or complex queries (joins)
13. CASSANDRA’S ARCHITECTURE
A Cassandra cluster…
• Is a distributed system.
• Does not have a leader or master node (all nodes are equal).
• Replicates data across nodes (provides fault tolerance).
• Partitions data across nodes (spreads the load, enabling fast reads and writes).
• Is fault tolerant.
• Is eventually consistent.
• Can be distributed across data centers.
• Is usually represented as a ring, in which each node owns a range of tokens.
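The ring idea can be illustrated with a minimal Python sketch of token-based placement. It is a simplification under stated assumptions: each node gets a single token (real clusters normally use virtual nodes), tokens come from MD5 via `hashlib` instead of Cassandra's default Murmur3 partitioner, and replicas are placed on the next nodes clockwise, similar to `SimpleStrategy`. The `Ring` class, `token` function and node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Assumption: a 64-bit token taken from MD5 (standard library);
    # Cassandra's default partitioner uses Murmur3 instead.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical leaderless ring: one token per node, no master node."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Nodes sorted by their token form the ring.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str) -> list:
        """The first node whose token is >= the key's token owns the partition
        (wrapping around the ring); the next rf - 1 nodes clockwise hold copies."""
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct nodes
```

Because placement is a pure function of the key and the sorted tokens, any node can compute where a partition lives and act as coordinator for the request, which is what makes a leaderless design possible.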
16. CQL
CQL…
• Stands for Cassandra Query Language.
• Is similar to SQL.
• Does not support joins.
• Supports user-defined types.
• Supports user-defined functions.
• Supports materialized views (marked experimental in recent releases).
22. PRIMARY KEYS
A primary key fulfills the following purposes…
• Uniqueness of a record within a table
• Search criteria (what columns can be used to filter the table)
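A small in-memory sketch illustrates the uniqueness purpose. The `videos` dictionary, the `insert` helper and the `(user_id, video_id)` key are hypothetical. The point it models is that Cassandra treats an `INSERT` with an existing primary key as an upsert: the supplied columns are overwritten instead of a duplicate row being created or an error being raised.

```python
# Hypothetical in-memory stand-in for a table with PRIMARY KEY (user_id, video_id).
videos = {}


def insert(row: dict) -> None:
    """Upsert: the primary key identifies at most one row; writing the same
    key again overwrites the supplied columns and keeps the other ones."""
    key = (row["user_id"], row["video_id"])
    videos.setdefault(key, {}).update(row)


insert({"user_id": "u1", "video_id": 10, "title": "Draft", "views": 0})
insert({"user_id": "u1", "video_id": 10, "title": "Final"})  # same key: still one row
```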
25. PARTITION KEYS
Partition keys define…
• How the data is partitioned (distributed) across the cluster
• Which columns require an equality search criterion
• Which columns must always be present in the WHERE clause of a query (all partition key columns)
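The partition key rules can be sketched in Python by storing each partition under the tuple of its key values. The composite key `(country, city)` and the helpers `write` and `select` are hypothetical; the sketch only models the rule that a read must supply every partition key column with an equality condition, because the full key is what locates the partition.

```python
from collections import defaultdict

# Hypothetical table with the composite partition key (country, city).
PARTITION_KEY = ("country", "city")
partitions = defaultdict(list)  # partition key tuple -> rows in that partition


def write(row: dict) -> None:
    # The partition key values decide which partition (and so which node) gets the row.
    partitions[tuple(row[c] for c in PARTITION_KEY)].append(row)


def select(**where) -> list:
    """Equality lookup that requires every partition key column
    (other WHERE conditions are ignored in this sketch)."""
    missing = [c for c in PARTITION_KEY if c not in where]
    if missing:
        # Modelled as an error: without the full key the partition, and the
        # nodes that own it, cannot be located without scanning everything.
        raise ValueError(f"partition key column(s) missing from WHERE: {missing}")
    # A hash-style lookup supports equality only, not ranges such as city > 'M'.
    return list(partitions.get(tuple(where[c] for c in PARTITION_KEY), []))


write({"country": "PT", "city": "Porto", "name": "Ana"})
write({"country": "PT", "city": "Lisbon", "name": "Rui"})
select(country="PT", city="Porto")  # one partition, one lookup
# select(country="PT")              # rejected: 'city' is missing
```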
28. CLUSTERING COLUMNS
Clustering columns define…
• How the data is grouped within a partition
• The ordering of the rows within a partition
Clustering columns also…
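• Can be used as both equality and inequality (range) search criteria
• Are not required in a WHERE clause (but a clustering column can only be restricted if the clustering columns declared before it are also restricted)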
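Ordering and range search within a partition can be sketched with Python's `bisect` module. The `Partition` class, its methods and the clustering columns `(event_date, event_id)` are hypothetical. Keeping rows sorted by clustering key is what turns a range condition on the first clustering column into a contiguous slice instead of a scan.

```python
import bisect


class Partition:
    """Hypothetical partition whose clustering columns are (event_date, event_id)."""

    def __init__(self):
        self.keys = []  # clustering-key tuples, kept sorted
        self.rows = []  # row payloads, aligned with self.keys

    def upsert(self, clustering_key: tuple, row) -> None:
        i = bisect.bisect_left(self.keys, clustering_key)
        if i < len(self.keys) and self.keys[i] == clustering_key:
            self.rows[i] = row  # same clustering key: overwrite in place
        else:
            self.keys.insert(i, clustering_key)
            self.rows.insert(i, row)

    def select_range(self, start=None, end=None) -> list:
        """Rows with start <= event_date < end, in clustering order.
        A 1-tuple such as (start,) sorts before every key beginning with start."""
        lo = 0 if start is None else bisect.bisect_left(self.keys, (start,))
        hi = len(self.keys) if end is None else bisect.bisect_left(self.keys, (end,))
        return self.rows[lo:hi]


p = Partition()
p.upsert(("2024-03-02", 7), "login")
p.upsert(("2024-03-01", 3), "signup")
p.upsert(("2024-03-05", 1), "logout")
p.select_range()                            # all rows, sorted by clustering key
p.select_range("2024-03-01", "2024-03-03")  # ["signup", "login"]
```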
31. DATA MODELING
Data modeling in Cassandra consists of the following steps…
• Definition of the Conceptual Model
• Definition of the Application Workflow
• Derivation and selection of the Logical Model
• Derivation of the Physical Model
• Translation of the Physical Model into CQL scripts
Cassandra data modeling principles:
• Know your data
• Know your queries
• Nest data
• Duplicate data (denormalize)
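The "know your queries" and "duplicate data" principles can be sketched in Python: one table per query, with each write copied into every table that serves a query. The queries, table names and row fields are hypothetical, and plain dictionaries stand in for Cassandra tables. Storing `tags` as a list inside the row also illustrates nesting.

```python
from collections import defaultdict

# Hypothetical queries, each served by its own table:
#   Q1 "videos uploaded by a user" -> videos_by_user, partitioned by user_id
#   Q2 "videos with a given tag"   -> videos_by_tag,  partitioned by tag
videos_by_user = defaultdict(list)
videos_by_tag = defaultdict(list)


def add_video(video: dict) -> None:
    # Duplicate on write: one copy per table that serves a query, so each
    # read is a single-partition lookup with no client-side join.
    videos_by_user[video["user_id"]].append(dict(video))
    for tag in video["tags"]:  # nested collection inside the row
        videos_by_tag[tag].append(dict(video))


add_video({"video_id": 1, "user_id": "u1", "title": "Intro", "tags": ["cassandra", "nosql"]})
add_video({"video_id": 2, "user_id": "u2", "title": "CQL basics", "tags": ["cassandra"]})
```

The trade-off is that a change to a video must be written to every table holding a copy; the usual argument for accepting this in Cassandra is that writes are cheap compared with reads that span many partitions.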
34. CONCEPTUAL MODEL
A conceptual model…
• Is usually described as an Entity Relationship diagram.
• Describes domain entities, their attributes and their relationships.
• Enables collaboration between business and technical people.
• Allows abstraction of the implementation details.
• Facilitates understanding of the domain.
38. APPLICATION WORKFLOW
The application workflow…
• Helps identify tasks and their dependencies.
• Defines which queries are required by the application.
• Helps identify which queries need to run first.
41. LOGICAL MODEL
A logical model…
• Is an abstraction of the physical schema.
• Identifies tables along with their columns, partition keys, clustering columns and other metadata.
• Allows us to think about the data model, partitioning and ordering without worrying about implementation details.
49. PHYSICAL MODEL
The physical model…
• Is the actual implementation of the logical model
• Defines column types
• Is where optimizations are considered
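The last modeling step, translating the physical model into a CQL script, can be sketched as a small Python generator of `CREATE TABLE` statements. The `Table` dataclass, the `videos_by_user` table and its columns and types are hypothetical; the generated statement uses standard CQL syntax (a parenthesised partition key followed by the clustering columns, plus `WITH CLUSTERING ORDER BY`).

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical physical-model description of one table."""
    name: str
    columns: dict        # column name -> CQL type, e.g. "uuid", "timestamp", "text"
    partition_key: list  # partition key column names
    clustering: list = field(default_factory=list)  # (column, "ASC" or "DESC") pairs

    def to_cql(self) -> str:
        cols = ",\n".join(f"    {c} {t}" for c, t in self.columns.items())
        # The partition key is wrapped in its own parentheses, then clustering columns follow.
        key = ", ".join(["(" + ", ".join(self.partition_key) + ")"] + [c for c, _ in self.clustering])
        ddl = f"CREATE TABLE {self.name} (\n{cols},\n    PRIMARY KEY ({key})\n)"
        if self.clustering:
            order = ", ".join(f"{c} {d}" for c, d in self.clustering)
            ddl += f" WITH CLUSTERING ORDER BY ({order})"
        return ddl + ";"


videos_by_user_table = Table(
    name="videos_by_user",
    columns={"user_id": "uuid", "added_date": "timestamp", "video_id": "uuid", "title": "text"},
    partition_key=["user_id"],
    clustering=[("added_date", "DESC"), ("video_id", "ASC")],
)
print(videos_by_user_table.to_cql())
```

In this example, `user_id` is the partition key, and rows within each user's partition are stored newest first by `added_date`, with `video_id` keeping each row unique.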
51. ANTI-PATTERNS
Things we should not do in Cassandra…
• Table scans (too expensive: they search the entire cluster)
• Secondary indexes (each node indexes only its own data, so a query may have to contact every node)
• Multi-table queries (require multiple lookups and client-side joins)
• Reads before writes (reading a row before updating it adds latency and can race with concurrent writes)
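The cost difference behind the first two anti-patterns can be sketched by counting how many nodes a read contacts. The four-node cluster, the `crc32`-modulo placement (a simplification, not Cassandra's token ring) and the helper names are hypothetical, and replication is ignored.

```python
import zlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical 4-node cluster
storage = {n: {} for n in NODES}                  # node -> {partition_key: row}


def owner(partition_key: str) -> str:
    # Simplified placement (crc32 modulo node count); not Cassandra's token ring.
    return NODES[zlib.crc32(partition_key.encode()) % len(NODES)]


def store(partition_key: str, row: dict) -> None:
    storage[owner(partition_key)][partition_key] = row


def read_by_key(partition_key: str):
    """Partition key lookup: exactly one node is contacted."""
    node = owner(partition_key)
    return storage[node].get(partition_key), [node]


def read_by_column(column: str, value):
    """Filtering on a non-key column (a table scan, or a secondary index that each
    node keeps only for its own data): every node has to be contacted."""
    hits, contacted = [], []
    for node, rows in storage.items():
        contacted.append(node)
        hits.extend(r for r in rows.values() if r.get(column) == value)
    return hits, contacted


for i in range(20):
    store(f"user:{i}", {"id": i, "country": "PT" if i % 2 else "ES"})
```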