Apache Cassandra, part 1 – principles, data model

The aim of this presentation is to provide enough information for an enterprise architect to decide whether Cassandra should be the project's data store. The presentation describes each nuance of Cassandra's architecture and ways to design data and work with it.

Presentation Transcript

  • 1. Apache Cassandra, part 1 – principles, data model
  • 2. I. RDBMS Pros and Cons
  • 3. Pros
    Good balance between functionality and usability; powerful tool support.
    SQL has a feature-rich syntax.
    A set of widely accepted standards.
    Consistency
  • 4. Scalability
    RDBMSs were mainstream for decades, until requirements for scalability increased dramatically.
    The complexity of processed data structures also increased dramatically.
  • 5. Scaling
    Two ways to achieve scalability:
    Vertical scaling
    Horizontal scaling
  • 6. CAP Theorem
  • 7. Cons
    Cost of distributed transactions
    No availability support. Two DBs, each with 99.9% availability, together have 100% - 2 * (100% - 99.9%) = 99.8% availability (~86 min of downtime per month).
    Additional synchronization overhead.
    As slow as the slowest DB node, plus network latency.
    2PC is a blocking protocol.
    It is possible to lock resources forever.
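The availability figure above can be reproduced with a few lines of arithmetic; a minimal sketch (the 99.9% per-node figure comes from the slide, the 30-day month is an assumption):

```python
# Two databases that must both be up behave like components in series:
# exact combined availability is A**2; the slide uses the linear
# approximation 100% - 2 * (100% - A).
node_availability = 0.999

exact = node_availability ** 2               # 0.998001
approx = 1 - 2 * (1 - node_availability)     # 0.998

# Downtime per 30-day month implied by the combined availability.
minutes_per_month = 30 * 24 * 60             # 43200 minutes
downtime_minutes = (1 - approx) * minutes_per_month

print(round(approx, 4), round(downtime_minutes, 1))  # 0.998 86.4
```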
  • 8. Cons
    Use of master-slave replication.
    Makes the write side (master) a performance bottleneck and requires additional CPU/IO resources.
    There is no partition tolerance.
  • 9. Sharding
    Feature sharding
    Hash code sharding
    Lookup table - the node that holds the lookup table is a performance bottleneck and a single point of failure.
  • 10. Feature sharding
    DB instances are divided by function (each instance serves one part of the domain).
  • 11. Hash code sharding
    Data is divided across DB instances by hash code ranges.
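A minimal sketch of hash-code sharding, assuming a fixed 32-bit hash split into equal ranges (the node count and key names are made up):

```python
import zlib

HASH_BITS = 32
NUM_NODES = 4
RANGE_SIZE = 2 ** HASH_BITS // NUM_NODES   # each node owns one range

def shard_for(key: str) -> int:
    """Route a key to the node whose hash range contains it."""
    # crc32 gives a stable 32-bit hash (Python's hash() varies per run).
    h = zlib.crc32(key.encode("utf-8"))
    return h // RANGE_SIZE

placement = {k: shard_for(k) for k in ("user:1", "user:2", "order:17")}
print(placement)
```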
  • 12. Sharding consistency
    For efficient sharding, data should be eventually consistent.
  • 13. Feature vs. hash code sharding
    Feature sharding allows consistency tuning at domain-logic granularity, but the load may not be well balanced.
    Hash code sharding gives good load balancing, but does not allow consistency at the domain-logic level.
  • 14. Cassandra sharding
    Cassandra uses hash code load balancing.
    Cassandra is a better fit for reporting than for business logic processing.
    Cassandra + Hadoop == an OLAP server with high performance and availability.
  • 15. II. Apache Cassandra. Overview
  • 16. Cassandra = Amazon Dynamo (architecture) + Google BigTable (data model)
    From Dynamo:
    DHT
    Eventual consistency
    Tunable trade-offs, consistency
    From BigTable:
    Values are structured and indexed
  • 17. Column families and columns
  • 18. Distributed and decentralized
    No master/slave nodes (server symmetry)
    No single point of failure
  • 19. DHT
    Distributed hash table (DHT) is a class of a decentralized distributed system that provides a lookup service similar to a hash table; (key, value) pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key.
  • 20. DHT
    Keyspace
    Keyspace partitioning
    Overlay network
  • 21. Keyspace
    An abstract keyspace, such as the set of 128- or 160-bit strings.
    A keyspace partitioning scheme splits ownership of this keyspace among the participating nodes.
  • 22. Keyspace partitioning
    Keyspace distance function δ(k1,k2) 
    A node with ID ix owns all the keys km for which ix is the closest ID, measured according to δ(km,ix).
  • 23. Keyspace partitioning
    Imagine mapping the range from 0 to 2^128 onto a circle, so the values wrap around.
  • 24. Keyspace partitioning
    Consider what happens if node C is removed
  • 25. Keyspace partitioning
    Consider what happens if node D is added
  • 26. Overlay network
    For any key k, each node either has a node ID that owns k or has a link to a node whose node ID is closer to k
    Greedy algorithm (that is not necessarily globally optimal): at each step, forward the message to the neighbor whose ID is closest to k
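The ring behaviour from the last few slides can be sketched with consistent hashing; a toy version (the node names, a 16-bit ring instead of 2^128, and MD5 as the hash are assumptions for brevity):

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 16   # stand-in for the 2**128 keyspace on the slides

def position(name: str) -> int:
    """Map a node ID or key onto the circular keyspace."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % RING_SIZE

def owner(nodes, key):
    """The first node clockwise from the key's position owns the key."""
    tokens = sorted((position(n), n) for n in nodes)
    idx = bisect_right([t for t, _ in tokens], position(key))
    return tokens[idx % len(tokens)][1]   # wrap around the circle

keys = ["key%d" % i for i in range(8)]
before = {k: owner(["A", "B", "C", "D"], k) for k in keys}
after = {k: owner(["A", "B", "D"], k) for k in keys}   # node C removed

# Only the keys that C owned move; every other key keeps its owner.
moved = sorted(k for k in keys if before[k] != after[k])
print(moved)
```

This is exactly why adding or removing a node only disturbs its ring neighbours.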
  • 27. Elastic scalability
    Adding or removing a node doesn’t require reconfiguring Cassandra, changing application queries, or restarting the system.
  • 28. High availability and fault tolerance
    Cassandra picks A and P from CAP
    Eventual consistency
  • 29. Tunable consistency
    Replication factor (number of copies of each piece of data)
    Consistency level (number of replicas to access on every read/write operation)
  • 30. Quorum consistency level
    R = N/2 + 1 (replicas accessed on read)
    W = N/2 + 1 (replicas accessed on write)
    R + W > N, so every read overlaps every write in at least one replica
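The overlap guarantee can be checked directly; a small sketch (the N values are arbitrary, and N/2 is integer division):

```python
# Quorum consistency: with R = W = N // 2 + 1 replicas touched on each
# read and write, R + W > N holds, so any read quorum intersects any
# write quorum in at least one replica and therefore sees the latest write.
def quorum(n: int) -> int:
    return n // 2 + 1

for n in (1, 2, 3, 4, 5):
    r = w = quorum(n)
    assert r + w > n                    # quorums always overlap
    print("N=%d  R=W=%d" % (n, r))
```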
  • 31. Hybrid orientation
    Column orientation
    columns aren’t fixed
    columns can be sorted
    columns can be queried for a certain range
    Row orientation
    each row is uniquely identifiable by key
    rows group columns and super columns
  • 32. Schema-free
    You don’t have to define columns when you create the data model
    You think of the queries you will run and then organize data around them
  • 33. High performance
    Reading and writing 50 GB of data:
    • Cassandra
    - write: 0.12 ms
    - read: 15 ms
    • MySQL
    - write: 300 ms
    - read: 350 ms
  • 34. III. Data Model
  • 35. Relational data model
    A Database contains Tables (Table1, Table2, …)
  • 36. Cassandra data model
    A Keyspace contains Column Families; each Column Family holds rows:
    RowKey1 → { Column1: Value1, Column2: Value2, Column3: Value3 }
    RowKey2 → { Column1: Value1, Column4: Value4 }
  • 37. Keyspace
    Keyspace is close to a relational database
    Basic attributes:
    replication factor
    replica placement strategy
    column families (tables from relational model)
    Possible to create several keyspaces per application (for example, if you need different replica placement strategy or replication factor)
  • 38. Column family
    Container for collection of rows
    Column family is close to a table from relational data model
    Column Family
    Row: RowKey → { Column1: Value1, Column2: Value2, Column3: Value3 }
  • 39. Column family vs. Table
    The store is a four-dimensional hash map: [Keyspace][ColumnFamily][Key][Column]
    The columns are not strictly defined in column family and you can freely add any column to any row at any time
    A column family can hold columns or super columns (collection of subcolumns)
  • 40. Column family vs. Table
    A column family has a comparator attribute which indicates how columns will be sorted in query results (as long, byte, UTF8, etc.)
    Each column family is stored in a separate file on disk, so it’s useful to keep related columns in the same column family
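The four-dimensional view of the store can be mimicked with nested dictionaries; a toy sketch (the keyspace, column family, and row names are made up):

```python
from collections import defaultdict

def nested():
    return defaultdict(nested)

# store[keyspace][column_family][row_key][column_name] -> value
store = nested()
store["app"]["Users"]["lev"]["email"] = "lev@example.com"
store["app"]["Users"]["lev"]["city"] = "Kiev"
store["app"]["Users"]["andrey"]["email"] = "andrey@example.com"

# Rows need not share a schema: 'andrey' has no 'city' column.
print(sorted(store["app"]["Users"]["lev"]))      # ['city', 'email']
print(sorted(store["app"]["Users"]["andrey"]))   # ['email']
```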
  • 41. Column
    Basic unit of data structure
    Column
    name: byte[]
    value: byte[]
    clock: long
  • 42. Skinny and wide rows
    Wide rows – a huge number of columns and few rows (used to store lists of things)
    Skinny rows – a small number of columns and many different rows (close to the relational model)
  • 43. Disadvantages of wide rows
    Work badly with the row cache
    If you have many rows and many columns you end up with larger indexes
    (~40 GB of data and a 10 GB index)
  • 44. Column sorting
    Column sorting typically matters only with the wide model
    Comparator is an attribute of the column family that specifies how column names are compared for sort order
  • 45. Comparator types
    Cassandra has the following predefined comparator types:
    AsciiType
    BytesType
    LexicalUUIDType
    IntegerType
    LongType
    TimeUUIDType
    UTF8Type
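The effect of the comparator can be seen by sorting the same column names two ways; a small sketch (the numeric names are made up):

```python
# The same column names sort differently under different comparators:
# LongType compares numerically, UTF8Type compares as text.
names = [10, 2, 1, 30]

long_order = sorted(names)                        # LongType-style order
utf8_order = sorted(names, key=lambda n: str(n))  # UTF8Type-style order

print(long_order)   # [1, 2, 10, 30]
print(utf8_order)   # [1, 10, 2, 30]
```

This is why choosing the right comparator matters for range queries over columns.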
  • 46. Super column
    Stores a map of subcolumns
    Super column
    name: byte[]
    cols: Map<byte[], Column>
    • Cannot store a map of super columns (only one level deep)
    • 47. Five-dimensional hash:
    [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
  • 48. Super column
    • Sometimes it is useful to use composite keys instead of super columns.
    • 49. When more than one level of depth is needed
    • 50. Performance issues
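The composite-key alternative mentioned on slide 48 flattens the two levels of a super column into a single column name; a sketch (the ':' separator and the data are assumptions):

```python
# Super column layout: super_column -> sub_column -> value
super_layout = {"address": {"city": "Kiev", "zip": "01001"}}

# Composite-key layout: join the two levels into one column name and
# keep a single-level (standard) column family.
composite_layout = {
    "%s:%s" % (sc, sub): value
    for sc, subcols in super_layout.items()
    for sub, value in subcols.items()
}
print(composite_layout)   # {'address:city': 'Kiev', 'address:zip': '01001'}
```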
  • Super column family
    Column families:
    Standard (default)
    Can combine columns and super columns
    Super
    More strict schema constraints
    Can store only super columns
    Subcomparator can be specified for subcolumns
  • 51. Note that
    There are no joins in Cassandra, so you can
    join data on the client side
    create a denormalized second column family
  • 52. IV. Advanced column types
  • 53. TTL column type
    A TTL column is a column whose value expires after a given period of time.
    Useful for storing session tokens.
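TTL semantics can be sketched in a few lines (the class and its API are illustrative, not Cassandra's):

```python
import time

class TTLColumn:
    """A column whose value disappears once its time-to-live elapses."""

    def __init__(self, value, ttl_seconds, now=None):
        self.value = value
        self.expires_at = (now if now is not None else time.time()) + ttl_seconds

    def get(self, now=None):
        """Return the value, or None after expiry."""
        current = now if now is not None else time.time()
        return None if current >= self.expires_at else self.value

# A session token that lives for one hour.
token = TTLColumn("session-token", ttl_seconds=3600, now=1000.0)
print(token.get(now=2000.0))   # session-token  (still alive)
print(token.get(now=5000.0))   # None           (expired)
```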
  • 54. Counter column
    In an eventually consistent environment, old versions of column values are overridden by new ones, but counters must be cumulative.
    Counter columns are intended to support increment/decrement operations in an eventually consistent environment without losing any of them.
  • 55. CounterColumn internals
    CounterColumn structure:
    name
    …….
    [
    (replicaId1, counter1, logical clock1),
    (replicaId2, counter2, logical clock2),
    ………………..
    (replicaId3, counter3, logical clock3)
    ]
  • 56. CounterColumn write - before
    UPDATE CounterCF SET count_me = count_me + 2
    WHERE key = 'counter1'
    [
    (A, 10, 2),
    (B, 3, 4),
    (C, 6, 7)
    ]
  • 57. CounterColumn write - after
    A is leader
    [
    (A, 10 + 2, 2 + 1),
    (B, 3, 4),
    (C, 6, 7)
    ]
  • 58. CounterColumn Read
    All Memtables and SSTables are read using the following algorithm:
    All tuples with the local replicaId are summed; for each foreign replica, the tuple with the maximum logical clock value is chosen.
    Counters of foreign replicas are updated during read repair, during the replicate-on-write procedure, or by AES (the anti-entropy service)
  • 59. CounterColumn read - example
    Memtable - (A, 12, 4) (B, 3, 5) (C, 10, 3)
    SSTable1 – (A, 5, 3) (B, 1, 6) (C, 5, 4)
    SSTable2 – (A, 2, 2) (B, 2, 4) (C, 6, 2)
    Result:
    (A, 19, 9) + (B, 1, 6) + (C, 5, 4) = 19 + 1 + 5 = 25
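The read algorithm above can be run directly against the example's shards; a sketch (the function name is made up, the data is from the slide):

```python
# Merge per-replica counter shards from all Memtables/SSTables:
# shards of the local replica are summed (counts and clocks), while for
# each foreign replica only the shard with the highest logical clock wins.
def merge_counter(shards, local_replica):
    merged = {}
    for replica, count, clock in shards:
        if replica == local_replica:
            c, cl = merged.get(replica, (0, 0))
            merged[replica] = (c + count, cl + clock)
        elif replica not in merged or clock > merged[replica][1]:
            merged[replica] = (count, clock)
    return sum(count for count, _ in merged.values())

shards = [
    ("A", 12, 4), ("B", 3, 5), ("C", 10, 3),   # Memtable
    ("A", 5, 3),  ("B", 1, 6), ("C", 5, 4),    # SSTable1
    ("A", 2, 2),  ("B", 2, 4), ("C", 6, 2),    # SSTable2
]
print(merge_counter(shards, local_replica="A"))   # 25
```

Replica A's shards sum to (19, 9), while (B, 1, 6) and (C, 5, 4) win on clock, giving 19 + 1 + 5 = 25.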
  • 60. Resources
    Home of Apache Cassandra Project http://cassandra.apache.org/
    Apache Cassandra Wiki http://wiki.apache.org/cassandra/
    Documentation provided by DataStax http://www.datastax.com/docs/0.8/
    A good explanation of creating secondary indexes http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
    Eben Hewitt, “Cassandra: The Definitive Guide”, O’Reilly, 2010, ISBN: 978-1-449-39041-9
  • 61. Authors
    Lev Sivashov - lsivashov@gmail.com
    Andrey Lomakin - lomakin.andrey@gmail.com, twitter: @Andrey_Lomakin, LinkedIn: http://www.linkedin.com/in/andreylomakin
    Artem Orobets - enisher@gmail.com, twitter: @Dr_EniSh
    Anton Veretennik - tennik@gmail.com