Apache Cassandra, part 2 – data model example, machinery

The aim of this presentation is to give an enterprise architect enough information to decide whether Cassandra should be a project's data store. The presentation describes the nuances of the Cassandra architecture and ways to design data and work with it.

  • Endpoint snitch can be wrapped with a dynamic snitch, which will monitor read latencies and avoid reading from hosts that have slowed (due to compaction, for instance)
  • Apache Cassandra, part 2 – data model example, machinery

    1. 1. Apache Cassandra, part 2 – data model example, machinery<br />
    2. 2. V. Data model example - Twissandra<br />
    3. 3. Twissandra Use Cases<br />Get the friends of a username<br />Get the followers of a username<br />Get a timeline of a specific user’s tweets<br />Create a tweet<br />Create a user<br />Add friends to a user<br />
    4. 4. Twissandra – DB User<br />User<br />id<br />user_name<br />password<br />
    5. 5. Twissandra - DB Followers<br />User<br />id<br />user_name<br />password<br />Followers (links two User rows)<br />user_id<br />follower_id<br />
    6. 6. Twissandra - DB Following<br />User<br />id<br />user_name<br />password<br />Following (links two User rows)<br />user_id<br />following_id<br />
    7. 7. Twissandra – DB Tweets<br />User<br />id<br />user_name<br />password<br />Tweet<br />id<br />user_id<br />body<br />timestamp<br />
    8. 8. Twissandra column families<br />User<br />Username<br />Friends, Followers<br />Tweet<br />Userline<br />Timeline<br />
    9. 9. Twissandra – Users CF<br /><<CF>> User<br /><<RowKey>> userid<br />+ username<br />+ password<br /><<CF>> Username<br /><<RowKey>> username<br />+ userid<br />
    10. 10. Twissandra – Friends and Followers CFs<br /><<CF>> Friends<br /><<RowKey>> userid<br />friendid (column name)<br />timestamp (column value)<br /><<CF>> Followers<br /><<RowKey>> userid<br />followerid (column name)<br />timestamp (column value)<br />
    11. 11. Twissandra – Tweet CF<br /><<CF>> Tweet<br /><<RowKey>> tweetid<br /> + userid<br /> + body<br /> + timestamp<br />
    12. 12. Twissandra – Userline and Timeline CFs<br /><<CF>> Userline<br /><<RowKey>> userid<br />timestamp (column name)<br />tweetid (column value)<br /><<CF>> Timeline<br /><<RowKey>> userid<br />timestamp (column name)<br />tweetid (column value)<br />
    13. 13. Cassandra QL – User creation<br />BEGIN BATCH<br />INSERT INTO User (KEY, username, password) VALUES ('id', 'konstantin', '******')<br />INSERT INTO Username (KEY, userid) VALUES ('konstantin', 'id')<br />APPLY BATCH<br />
    14. 14. Cassandra QL – following a friend<br />BEGIN BATCH<br />INSERT INTO Friends (KEY, friendid) VALUES ('userid', 'friendid')<br />INSERT INTO Followers (KEY, userid) VALUES ('friendid', 'userid')<br />APPLY BATCH<br />
    15. 15. Cassandra QL – Tweet creation<br />BEGIN BATCH<br />INSERT INTO Tweet (KEY, userid, body, timestamp) VALUES ('tweetid', 'userid', '@ericflo thanks for Twissandra, it helps!', 123656459847)<br />INSERT INTO Userline (KEY, 123656459847) VALUES ('userid', 'tweetid')<br />INSERT INTO Timeline (KEY, 123656459847) VALUES ('userid', 'tweetid')<br />……..<br />INSERT INTO Timeline (KEY, 123656459847) VALUES ('followerid', 'tweetid')<br />……<br />APPLY BATCH<br />
    16. 16. Cassandra QL – Getting user tweets<br />SELECT * FROM Userline WHERE KEY = 'userid'<br />SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …., 'tweetidn')<br />
    17. 17. Cassandra QL – Getting user timeline<br />SELECT * FROM Timeline WHERE KEY = 'userid'<br />SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …., 'tweetidn')<br />
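The Twissandra use cases above can be sketched with plain Python dictionaries standing in for column families. This is a toy in-memory model under that assumption, not the Twissandra code and not a Cassandra client; the function names (`create_user`, `follow`, `create_tweet`, `get_timeline`) are hypothetical.

```python
from collections import defaultdict

# Each column family is modelled as {row_key: {column_name: value}}.
user = {}                        # userid -> {"username", "password"}
username = {}                    # username -> {"userid"}
friends = defaultdict(dict)      # userid -> {friendid: timestamp}
followers = defaultdict(dict)    # userid -> {followerid: timestamp}
tweet = {}                       # tweetid -> {"userid", "body", "timestamp"}
userline = defaultdict(dict)     # userid -> {timestamp: tweetid}
timeline = defaultdict(dict)     # userid -> {timestamp: tweetid}


def create_user(userid, name, password):
    # Two writes, mirroring the User and Username column families.
    user[userid] = {"username": name, "password": password}
    username[name] = {"userid": userid}


def follow(userid, friendid, ts):
    # The relation is stored twice so that both directions are a single row read.
    friends[userid][friendid] = ts
    followers[friendid][userid] = ts


def create_tweet(tweetid, userid, body, ts):
    tweet[tweetid] = {"userid": userid, "body": body, "timestamp": ts}
    userline[userid][ts] = tweetid
    timeline[userid][ts] = tweetid
    # Fan out on write: copy the tweet id into every follower's timeline.
    for followerid in followers[userid]:
        timeline[followerid][ts] = tweetid


def get_timeline(userid):
    # Two steps, as in the slides: read the tweet ids, then fetch the tweets.
    row = timeline[userid]
    ids = [row[ts] for ts in sorted(row, reverse=True)]
    return [tweet[t]["body"] for t in ids]
```

The design choice the model illustrates is that every query gets its own column family, so reads stay cheap while a single tweet costs one write per follower.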
    18. 18. Design patterns<br />Materialized View<br />Create a second column family to serve an additional query<br />Valueless Column<br />Store the data in the column names and leave the column values empty<br />Aggregate Key<br />If you need to find a sub-item, use a composite key<br />
    19. 19. Indexes<br /><<CF>> Item_Properties<br /><<RowKey>> item_id<br />property_name (column name)<br />property_value (column value)<br /><<CF>> Container_Items<br /><<RowKey>> container_id<br />item_id (column name)<br />insertion_timestamp (column value)<br />
    20. 20. Indexes<br /><<CF>> Container_Items_Property_Index<br /><<RowKey>> container_id + property_name<br />composite(property_value, item_id, entry_timestamp) (column name)<br />item_id (column value)<br />Comparator: compositecomparer.CompositeType<br />
    21. 21. Problem with eventual consistency<br />When we update a value, we should add the new value to the index and remove the old one.<br />However, eventual consistency and the lack of transactions make it impossible to do this reliably.<br />
    22. 22. Solution<br /><<CF>> Container_Item_Property_Index_Entries<br /><<RowKey>> container_id + item_id + property_name<br />entry_timestamp (column name)<br />property_value (column value)<br />
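One way to read the solution slide is that the extra column family keeps a log of every index column written for a property, so an update can remove stale index columns without trusting a read of the current value. The sketch below models the three column families as dictionaries; the functions `set_property` and `find_items` are hypothetical names, and the sequence of steps is an assumption based on the slides rather than a confirmed implementation.

```python
from collections import defaultdict

item_properties = defaultdict(dict)  # item_id -> {property_name: property_value}
# (container_id, property_name) -> {(property_value, item_id, entry_ts): item_id}
index = defaultdict(dict)
# (container_id, item_id, property_name) -> {entry_ts: property_value}
index_entries = defaultdict(dict)


def set_property(container_id, item_id, name, value, ts):
    entries_key = (container_id, item_id, name)
    index_key = (container_id, name)
    # Remove every index column previously recorded for this property.
    for old_ts, old_value in list(index_entries[entries_key].items()):
        index[index_key].pop((old_value, item_id, old_ts), None)
        del index_entries[entries_key][old_ts]
    # Write the new value, its index column, and the log entry for it.
    item_properties[item_id][name] = value
    index[index_key][(value, item_id, ts)] = item_id
    index_entries[entries_key][ts] = value


def find_items(container_id, name, value):
    # In Cassandra this would be a column slice over the composite names.
    columns = index[(container_id, name)]
    return sorted(item for (v, _, _), item in columns.items() if v == value)
```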
    23. 23. VI. Architecture<br />
    24. 24. Partitioners<br />Partitioners decide where a key maps onto the ring.<br />Key 1<br />Key 2<br />Key 3<br />Key 4<br />
    25. 25. Partitioners<br />RandomPartitioner<br />OrderPreservingPartitioner<br />ByteOrderedPartitioner<br />CollatingOrderPreservingPartitioner<br />
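As a rough illustration of how a hash-based partitioner such as RandomPartitioner maps keys onto the ring, the sketch below derives a token from the MD5 hash of the key and assigns the key to the first node whose token is greater than or equal to it, wrapping around at the end. The token arithmetic is simplified and the helper names are hypothetical; it is not Cassandra's own code.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space assumed for this sketch


def token_for(key: str) -> int:
    """Derive a token from the MD5 hash of the key (simplified)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner_of_token(token, ring):
    """ring is a list of (token, node_name) sorted by token."""
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key token; wrap to the first node.
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


def owner(key, ring):
    return owner_of_token(token_for(key), ring)
```

An order-preserving partitioner would instead use the key itself as the token, which keeps range scans cheap but tends to produce uneven load.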
    26. 26. Replication<br />Replication is controlled by the replication_factor setting in the keyspace definition.<br />The actual placement of replicas in the cluster is determined by the replica placement strategy.<br />
    27. 27. Placement Strategies<br />SimpleStrategy - returns the nodes that are next to each other on the ring.<br />
    28. 28. Placement Strategies<br />OldNetworkTopologyStrategy - places one replica in a different data center while placing the others on different racks in the current data center.<br />
    29. 29. Placement Strategies<br />NetworkTopologyStrategy - Allows you to configure the number of replicas per data center as specified in the strategy_options.<br />
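The SimpleStrategy behaviour described above (replicas on nodes that are next to each other on the ring) can be sketched as a walk around a sorted token list. The function name and the ring representation are assumptions made for this example; rack and data center awareness, which the other two strategies add, is left out.

```python
import bisect


def simple_strategy(token, ring, replication_factor):
    """Return replica nodes for a token.

    ring is a list of (token, node_name) sorted by token. The first replica
    is the node that owns the token; the rest are its successors on the ring.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```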
    30. 30. Snitches<br />Give Cassandra information about the network topology of the cluster<br />Endpoint snitch – gives information about network topology.<br />Dynamic snitch – monitors read latencies<br />
    31. 31. Endpoint Snitch Implementations<br />SimpleSnitch (default) - can be efficient for locating nodes in clusters limited to a single data center.<br />
    32. 32. Endpoint Snitch Implementations<br />RackInferringSnitch - extrapolates the topology of the network by analyzing IP addresses.<br />192.168.191.71 and 192.168.191.21: in the same rack<br />192.168.191.71 and 192.168.171.21: in the same datacenter<br />192.78.19.71 and 192.18.11.21: in different datacenters<br />
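The three address pairs on the slide are consistent with reading the second octet of the IP address as the data center and the third octet as the rack. A minimal sketch under that assumption, with hypothetical function names:

```python
def infer_location(ip: str):
    """Treat the second octet as the data center and the third as the rack."""
    octets = ip.split(".")
    return {"datacenter": octets[1], "rack": octets[2]}


def proximity(ip_a: str, ip_b: str) -> str:
    a, b = infer_location(ip_a), infer_location(ip_b)
    if a["datacenter"] != b["datacenter"]:
        return "different datacenters"
    if a["rack"] != b["rack"]:
        return "same datacenter"
    return "same rack"
```

Applied to the slide's examples, `proximity("192.168.191.71", "192.168.171.21")` yields `"same datacenter"` because only the third octet differs.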
    33. 33. Endpoint Snitch Implementations<br />PropertyFileSnitch - determines the location of nodes by referring to a user-defined description of the network details located in the property file cassandra-topology.properties. <br />
    34. 34. Memtables, SSTables, Commit Logs<br />Commit Log<br /><ul><li> Durability</li><li> sequential writes only</li></ul>Memtable<br /><ul><li> no disk access, batched writes</li></ul>SSTable<br /><ul><li> become read-only</li><li> indexes</li></ul>
    37. 37. Write properties<br />No reads<br />No seeks<br />Fast<br />Atomic within ColumnFamily<br />Always writable<br />
    38. 38. Write/Read properties<br />Read properties<br />Read multiple SSTables<br />Slower than writes (but still fast)<br />Seeks can be mitigated with more RAM<br />Scales to billions of rows<br />
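The write and read properties above follow from the storage layout: a write appends to the commit log and updates the memtable, while a read may have to consult the memtable and several SSTables. The class below is a toy model of that flow, with everything held in memory and a hypothetical `memtable_limit` flush threshold; it is a sketch of the idea, not of Cassandra's storage engine.

```python
class ToyStore:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only, stands in for sequential disk writes
        self.memtable = {}     # key -> (value, timestamp), held in memory
        self.sstables = []     # immutable snapshots, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value, timestamp):
        # No read and no seek: append to the log, then update memory.
        self.commit_log.append((key, value, timestamp))
        current = self.memtable.get(key)
        if current is None or timestamp >= current[1]:
            self.memtable[key] = (value, timestamp)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable becomes a read-only SSTable sorted by key.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        # A read may touch the memtable and every SSTable; newest timestamp wins.
        candidates = []
        if key in self.memtable:
            candidates.append(self.memtable[key])
        for table in self.sstables:
            candidates.extend(v for k, v in table if k == key)
        if not candidates:
            return None
        return max(candidates, key=lambda vt: vt[1])[0]
```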
    39. 39. Commit Log durability<br />Durability settings mirror those of PostgreSQL.<br />Periodic sync of the commit log: writes are acknowledged before the log is flushed, so recent data can be lost on a crash.<br />Batch sync of the commit log: a write is acknowledged only after the commit log has been flushed to disk. A separate device for the commit log is strongly recommended in this case.<br />
    40. 40. Gossip protocol<br />Intra-ring communication<br />Runs periodically<br />Failure detection, hinted handoff and exchange of node information<br />
    41. 41. Gossip protocol<br />org.apache.cassandra.gms.Gossiper<br />Has the list of nodes that are alive and dead<br />Chooses a random node and starts “chat” with it. One gossip round requires three messages<br />Failure detection uses a suspicion level to decide whether the node is alive or dead<br />
    42. 42. Hinted handoff<br />Write<br />Hint<br />Cassandra is always available for writes<br />
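Hinted handoff can be pictured as follows: when a replica is down, the write is still accepted and a hint is kept so the missed write can be replayed once the replica is back. In this sketch the coordinator node stores the hint, which is an assumption made for simplicity; the class and function names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target_node, key, value) waiting to be replayed


def write(coordinator, replicas, key, value):
    """Write to every live replica; keep a hint for each replica that is down."""
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
        else:
            coordinator.hints.append((replica, key, value))


def replay_hints(coordinator):
    """Deliver stored hints to targets that have come back up."""
    remaining = []
    for target, key, value in coordinator.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    coordinator.hints = remaining
```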
    43. 43. Consistency level<br />
    44. 44. Tombstones<br />The data is not immediately deleted<br />Deleted values are marked with a tombstone<br />Tombstones are discarded during a later compaction<br />GCGraceSeconds – number of seconds the server waits before garbage-collecting a tombstone<br />
    45. 45. Compaction<br />Merging SSTables into one<br />merging keys<br />combining columns<br />creating new index<br />Main aims:<br />Free up space<br />Reduce number of required seeks<br />
    46. 46. Compaction<br />Minor:<br />Triggered when at least N SSTables have been flushed to disk (N is tunable, 4 by default)<br />Merges SSTables of similar size<br />Major:<br />Merges all SSTables<br />Run manually through nodetool compact<br />Discards tombstones<br />
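The interaction between tombstones and compaction can be sketched as a merge in which the newest timestamp wins for each key and tombstones older than the grace period are dropped. SSTables are modelled as dictionaries and timestamps as plain seconds, both simplifications for this example; `TOMBSTONE` and `compact` are hypothetical names.

```python
TOMBSTONE = object()  # marker stored in place of a deleted value


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables into one.

    sstables is a list of {key: (value, timestamp)} dictionaries.
    """
    merged = {}
    for table in sstables:
        for key, (value, ts) in table.items():
            # Combine columns: keep only the most recent version of each key.
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    # Discard tombstones whose grace period has expired; keep younger ones
    # so that the deletion can still reach replicas that missed it.
    return {
        key: (value, ts)
        for key, (value, ts) in merged.items()
        if not (value is TOMBSTONE and now - ts > gc_grace_seconds)
    }
```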
    47. 47. Replica synchronization<br />Anti-entropy<br />Read repair<br />
    48. 48. Anti-entropy<br />During major compaction the node exchanges Merkle trees (hashes of its data) with other nodes<br />If the trees don’t match, the differing data is repaired<br />Nodes maintain a timestamp index and exchange only the most recent updates<br />
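A Merkle tree lets two replicas compare their data by exchanging hashes instead of rows: if the roots match, nothing needs repair, otherwise the differing leaves identify what to resynchronize. The sketch below hashes individual rows with SHA-256, which is an assumption made to keep the example small; the helper names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(rows):
    """rows is a list of (key, value) strings sorted by key.

    Returns the tree as a list of levels, leaves first and root last.
    """
    level = [_h(f"{k}={v}".encode("utf-8")) for k, v in rows] or [_h(b"")]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its (one or two) children.
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [_h(b"".join(pair)) for pair in pairs]
        levels.append(level)
    return levels


def root(levels):
    return levels[-1][0]


def differing_leaves(a, b):
    """Indexes of leaves that differ; assumes both trees have the same shape."""
    if root(a) == root(b):
        return []  # identical data, nothing to repair
    return [i for i, (x, y) in enumerate(zip(a[0], b[0])) if x != y]
```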
    49. 49. Read repair<br />During a read operation, replicas with stale values are brought up to date<br />Weak consistency level (ONE):<br /> after the data is returned<br />Strong consistency level (QUORUM, ALL):<br /> before the data is returned<br />
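Read repair can be sketched as: collect the versions held by the replicas, pick the one with the newest timestamp, and write it back to any replica that holds something older. Replicas are plain dictionaries here and the repair runs inline, so the sketch does not model the difference between repairing before and after the result is returned; `read_with_repair` is a hypothetical name.

```python
def read_with_repair(replicas, key):
    """replicas is a list of {key: (value, timestamp)} dictionaries."""
    answers = [replica[key] for replica in replicas if key in replica]
    if not answers:
        return None
    newest = max(answers, key=lambda vt: vt[1])
    # Bring stale or missing replicas up to date with the newest version.
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest
    return newest[0]
```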
    50. 50. Bloom filters<br />A bit array<br />Test whether value is a member of set<br />Reduce disk access (improve performance)<br />
    51. 51. Bloom filters<br />On write:<br />several hashes are generated per key<br />the bit for each hash is set<br />On read:<br />hashes are generated for the key<br />if all bits for these hashes are set, the key may exist in the SSTable<br />if at least one bit is not set, the key has never been written to the SSTable<br />
    52. 52. Bloom filters<br />[Diagram: keys (Key1, Key2) pass through Hash1, Hash2 and Hash3 on the write and read paths; the resulting positions are set or tested in the bit array in front of the SSTable]<br />
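The write and read steps described for Bloom filters translate into a small class. The array size, the number of hashes, and the use of salted SHA-256 digests as hash functions are assumptions chosen for this sketch, not the hash functions Cassandra uses.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=64, hash_count=3):
        self.size = size
        self.hash_count = hash_count
        self.bits = [0] * size

    def _positions(self, key):
        # Derive several bit positions per key by salting the hash input.
        for i in range(self.hash_count):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        # On write: set the bit for each hash of the key.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # On read: all bits set means "maybe present";
        # any unset bit means the key was definitely never added.
        return all(self.bits[pos] for pos in self._positions(key))
```

A `False` answer lets the read skip the SSTable entirely, while a `True` answer still requires the disk lookup because false positives are possible.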
    53. 53. Resources<br />Home of the Apache Cassandra project: http://cassandra.apache.org/<br />Apache Cassandra Wiki: http://wiki.apache.org/cassandra/<br />Documentation provided by DataStax: http://www.datastax.com/docs/0.8/<br />A good explanation of creating secondary indexes: http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html<br />Eben Hewitt, “Cassandra: The Definitive Guide”, O’Reilly, 2010, ISBN: 978-1-449-39041-9<br />
    54. 54. Authors<br />Lev Sivashov - lsivashov@gmail.com<br />Andrey Lomakin - lomakin.andrey@gmail.com, twitter: @Andrey_Lomakin, LinkedIn: http://www.linkedin.com/in/andreylomakin<br />Artem Orobets - enisher@gmail.com, twitter: @Dr_EniSh<br />Anton Veretennik - tennik@gmail.com<br />
