NoSQL with Cassandra

Published in: Technology

  1. 1. NoSQL with Cassandra [email_address]
  2. 2. Agenda
       • Introduction
       • How it works
       • Data Model
       • Roadmap
  3. 3. Cassandra?
       • A highly scalable, distributed, structured key-value database
       • Apache Top Level Project
       • Open sourced by Facebook in 2008
       • BigTable (data model) + Dynamo (distribution design)
  4. 5. How does it work?
       • Decentralized (no single point of failure)
       • Fault tolerant
       • Eventually consistent
  5. 8. Partitioner
       • RandomPartitioner
       • OrderPreservingPartitioner
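
The difference between the two partitioners can be sketched in a few lines of Python. This is a simplified model rather than Cassandra's own code: the node names, the ring tokens and the `owner` helper are invented for illustration, RandomPartitioner is modeled as an MD5 hash of the key, and OrderPreservingPartitioner as using the key itself as the token.

```python
import bisect
import hashlib


def random_token(key: str) -> int:
    """Model of RandomPartitioner: the token is the MD5 hash of the key."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def order_preserving_token(key: str) -> str:
    """Model of OrderPreservingPartitioner: the key itself is the token."""
    return key


def owner(ring, token):
    """Return the node owning `token`: the first node whose token is >= it,
    wrapping around to the first node of the ring."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


# Hypothetical three-node rings, sorted by token (assumption for illustration).
MAX = 2 ** 128
random_ring = [(MAX // 3, "node-a"), (2 * MAX // 3, "node-b"), (MAX - 1, "node-c")]
ordered_ring = [("h", "node-a"), ("p", "node-b"), ("z", "node-c")]

keys = ["alice", "bob", "carol", "dave"]
for k in keys:
    print(k,
          owner(random_ring, random_token(k)),
          owner(ordered_ring, order_preserving_token(k)))
```

With these made-up tokens, all four keys sort before "h", so the order-preserving ring places every one of them on node-a. That is the usual trade-off: ordered tokens allow range scans over keys but can create hot spots, while hashed tokens spread keys without regard to their order.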
  6. 9. Read/Write
  7. 10. On write
       • Write to an on-disk commit log (sequential)
       • Replicate
       • Memtable
       • SSTable (Sorted Strings Table)
       • Compaction
       • Tombstone
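
The write steps listed above can be sketched as a toy, single-node model in Python. It is an illustration under stated assumptions, not Cassandra's implementation: the `Store` class and its method names are hypothetical, replication is left out, and tombstones are dropped at the first compaction, whereas a real cluster keeps them for a grace period.

```python
class Store:
    """Toy single-node model of the write path (hypothetical, for illustration)."""

    TOMBSTONE = object()  # marker stored in place of a deleted value

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log, written sequentially
        self.memtable = {}     # in memory: key -> (timestamp, value)
        self.sstables = []     # immutable, key-sorted tables "on disk"
        self.memtable_limit = memtable_limit

    def write(self, key, value, timestamp):
        self.commit_log.append((key, value, timestamp))   # 1. log for durability
        current = self.memtable.get(key)
        if current is None or timestamp >= current[0]:
            self.memtable[key] = (timestamp, value)       # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                                  # 3. flush when full

    def delete(self, key, timestamp):
        # A delete is just a write of a tombstone marker.
        self.write(key, self.TOMBSTONE, timestamp)

    def flush(self):
        # Freeze the memtable into a new SSTable sorted by key.
        self.sstables.append(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.memtable = {}

    def compact(self):
        # Merge all SSTables, keep the newest entry per key, drop tombstones.
        merged = {}
        for table in self.sstables:
            for key, (ts, value) in table:
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        live = {k: v for k, v in merged.items() if v[1] is not self.TOMBSTONE}
        self.sstables = [sorted(live.items(), key=lambda kv: kv[0])]


store = Store(memtable_limit=2)
store.write("ma19", "v1", 1)
store.write("small_ben", "v2", 2)   # memtable is full: flushed to the first SSTable
store.delete("ma19", 3)             # tombstone, newer than the value in the first SSTable
store.write("gasol", "v3", 4)       # second flush
store.compact()                     # one SSTable remains; "ma19" is gone
```

Nothing is ever updated in place: writes only append to the log and the memtable, and old values or deleted rows disappear when compaction merges the SSTables.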
  8. 12. On read
       • Any node
       • Wait for R responses
       • Read repair
       • Hinted handoff
       • Slower than writes (but still fast)
       • Row cache, key cache
       • Scales to billions of rows
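
The "wait for R responses" and read-repair steps can be illustrated with a small Python sketch. The assumptions are deliberate simplifications: each replica is modeled as a plain dict of key -> (timestamp, value), the `read_with_repair` function is hypothetical, and only the replicas contacted for the read are repaired.

```python
def read_with_repair(replicas, key, r):
    """Read `key` from the first `r` replicas, return the newest value, and
    push it to any contacted replica that returned stale or missing data."""
    responses = [(node, node.get(key)) for node in replicas[:r]]
    found = [cell for _, cell in responses if cell is not None]
    if not found:
        return None
    newest = max(found, key=lambda cell: cell[0])   # cell = (timestamp, value)
    for node, cell in responses:
        if cell != newest:
            node[key] = newest                      # read repair
    return newest[1]


# Three hypothetical replicas holding different versions of one row.
a = {"ma19": (2, "new value")}
b = {"ma19": (1, "old value")}
c = {}
value = read_with_repair([a, b, c], "ma19", r=2)
```

After the call, replica `b` holds the newer version, while `c` was not contacted and stays empty until a later read or another repair mechanism reaches it.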
  9. 13. CAP theorem
       • Consistency - all nodes see the same data at the same time
       • Availability - node failures do not prevent survivors from continuing to operate
       • Partition tolerance - the system continues to operate despite arbitrary message loss
       (from Wikipedia)
  10. 14. Consistency
       • Write
           • ZERO - asynchronous
           • ANY
           • ONE
           • QUORUM - N / 2 + 1
           • ALL
       • Read
           • ONE - first node to respond
           • QUORUM - most recent timestamp
       • If W + R > N, you will have consistency
           • W=1, R=N
           • W=N, R=1
           • W=Q, R=Q where Q = N / 2 + 1
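
The arithmetic behind the W + R > N rule is easy to check in Python. This is a minimal sketch; the function names `quorum` and `is_consistent` are made up here, and N / 2 + 1 is read as integer division.

```python
def quorum(n: int) -> int:
    """Quorum size for N replicas: N / 2 + 1, using integer division."""
    return n // 2 + 1


def is_consistent(w: int, r: int, n: int) -> bool:
    """True when every read set must overlap every write set (W + R > N)."""
    return w + r > n


n = 3
q = quorum(n)
for w, r in [(1, n), (n, 1), (q, q), (1, 1)]:
    print(f"N={n} W={w} R={r} consistent={is_consistent(w, r, n)}")
```

The first three combinations are the ones from the slide; the last one (W=1, R=1) shows a fast setting that gives up the overlap guarantee when N is 3.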
  11. 15. Data Model
       • Column
       • SuperColumn
       • Row
       • ColumnFamily
       • Keyspace
  12. 16. Column
          {
              name: "mail",
              value: "ieon@pixnet.tw",
              timestamp: 123456789
          }
  13. 17. ColumnFamily
          User {                        // Standard CF
              ma19: {                   // row key
                  name: "馬一九",       // columns
                  phone: "1919119",
                  mail: "ma@foo"
              },
              small_ben: {
                  name: "陳小扁",
                  phone: "4848448",
                  mail: "chen@bar",
                  is_jailed: "true"
              }
          }
  14. 18. Traditional RDBMS
          #   name     phone      address
          1   王小明   40666888   台北市
          2   王中明   28825252   台中市
          3   王大明   4129889    台南市
  15. 19. Flexible Schema
          Row 1:  name = 王小明,  phone = 40666888,  address = 台北市
          Row 2:  name = 王中明,  phone = 28825252,  address = 台中市,  msn = [email_address]
          Row 3:  name = 王大明,  mail = [email_address],  address = 台南市
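
One way to picture the flexible schema is as a map of maps, sketched here in Python. The `users` dict and the `column` helper are illustrative names only, and the redacted addresses are kept as the placeholder text from the slide.

```python
# Each row is its own map of column name -> value; rows need not share columns.
users = {
    "1": {"name": "王小明", "phone": "40666888", "address": "台北市"},
    "2": {"name": "王中明", "phone": "28825252", "address": "台中市",
          "msn": "[email_address]"},
    "3": {"name": "王大明", "mail": "[email_address]", "address": "台南市"},
}


def column(row_key, name, default=None):
    """Look up one column of one row; a missing column is not an error."""
    return users.get(row_key, {}).get(name, default)


print(column("2", "msn"))           # present only in row 2
print(column("1", "msn"))           # row 1 has no such column
print(column("3", "phone", "n/a"))  # row 3 stores a mail column instead
```

Unlike the RDBMS table, there is no NULL-filled column for every row: a column simply exists in a row or it does not.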
  16. 20. Super Column
          Contact {                     // Super CF
              gasol: {                  // row key
                  __all__: {            // super column
                      dad: "",          // columns
                      beer: "",
                      ronny: ""
                  },
                  pixnet: {             // super column
                      beer: "",
                      ronny: ""
                  },
                  family: {             // super column
                      dad: ""
                  }
              }
          }
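
The Contact example above adds one more level of nesting than a standard column family: row key -> super column -> column. A small Python sketch, with `groups_of` as a hypothetical helper, shows how that extra level can be queried.

```python
# Super column family modeled as three nested maps:
# row key -> super column name -> column name -> value.
contact = {
    "gasol": {
        "__all__": {"dad": "", "beer": "", "ronny": ""},
        "pixnet": {"beer": "", "ronny": ""},
        "family": {"dad": ""},
    }
}


def groups_of(cf, row_key, member):
    """Names of the super columns in a row that contain `member` as a column."""
    return sorted(sc for sc, cols in cf[row_key].items() if member in cols)


print(groups_of(contact, "gasol", "dad"))
print(groups_of(contact, "gasol", "beer"))
```

Here the column names carry the data (the contact names) and the values are empty, which is the pattern the slide uses.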
  17. 21. Sorting - Comparator
       • BytesType - no validation
       • AsciiType - like BytesType, but validates as ASCII
       • LongType - a 64-bit long
       • UTF8Type - a string encoded as UTF-8
       • LexicalUUIDType - a 128-bit UUID, usually version 4
       • TimeUUIDType - a 128-bit version 1 UUID, compared by timestamp
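
Why the comparator matters can be seen by sorting the same column names two ways. The Python sketch below models BytesType as unsigned byte-wise comparison and LongType as comparison of signed 64-bit values; the helper names are invented, and this is an approximation of the behaviour rather than Cassandra's code.

```python
import struct


def pack_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as 8 big-endian bytes."""
    return struct.pack(">q", n)


def bytes_type_key(name: bytes) -> bytes:
    """BytesType model: compare the raw bytes lexicographically."""
    return name


def long_type_key(name: bytes) -> int:
    """LongType model: decode the 8 bytes and compare numerically."""
    return struct.unpack(">q", name)[0]


names = [pack_long(n) for n in (10, -1, 2)]
by_bytes = [long_type_key(b) for b in sorted(names, key=bytes_type_key)]
by_long = [long_type_key(b) for b in sorted(names, key=long_type_key)]
print(by_bytes)  # [2, 10, -1]: the bytes of -1 are all 0xff, so it sorts last
print(by_long)   # [-1, 2, 10]: numeric order
```

Since slices are taken over columns in comparator order, picking the wrong comparator changes which columns a range query returns.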
  18. 22. Client API
       • THRIFT-601: sending random data crashes the Thrift service
       • THRIFT-347: PHP TSocket timeout issues
       • Thrift sucks and is ugly
       • Apache Avro support is in trunk

          struct SliceRange {
              1: required binary start,
              2: required binary finish,
              3: required bool reversed=0,
              4: required i32 count=100,
          }

          struct SlicePredicate {
              1: optional list<binary> column_names,
              2: optional SliceRange   slice_range,
          }
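
What the two structs select can be sketched with an in-memory model in Python. This is not the Thrift client: the dataclasses only mirror the field names above, `apply_predicate` is a hypothetical helper, columns are assumed to sort as raw bytes, and an empty `start` or `finish` is treated as unbounded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SliceRange:
    start: bytes
    finish: bytes
    reversed: bool = False
    count: int = 100


@dataclass
class SlicePredicate:
    column_names: Optional[List[bytes]] = None
    slice_range: Optional[SliceRange] = None


def apply_predicate(columns, predicate):
    """Return the (name, value) pairs of one row selected by the predicate."""
    if predicate.column_names is not None:
        # Explicit list of names: return the ones that exist.
        return [(n, columns[n]) for n in predicate.column_names if n in columns]
    rng = predicate.slice_range
    if rng is None:
        raise ValueError("predicate needs column_names or slice_range")
    names = sorted(columns, reverse=rng.reversed)
    # In reversed order the slice runs from `start` down to `finish`.
    lo, hi = (rng.finish, rng.start) if rng.reversed else (rng.start, rng.finish)
    selected = [n for n in names if (not lo or n >= lo) and (not hi or n <= hi)]
    return [(n, columns[n]) for n in selected[:rng.count]]


row = {b"mail": b"ma@foo", b"name": "馬一九".encode("utf-8"), b"phone": b"1919119"}

first_two = apply_predicate(row, SlicePredicate(slice_range=SliceRange(b"", b"", count=2)))
backwards = apply_predicate(
    row, SlicePredicate(slice_range=SliceRange(b"phone", b"name", reversed=True)))
by_name = apply_predicate(row, SlicePredicate(column_names=[b"phone", b"missing"]))
```

`first_two` holds the `mail` and `name` columns, `backwards` holds `phone` then `name`, and `by_name` holds only `phone` because the other requested column does not exist.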
  19. 23.
       • get(keyspace, key, ColumnPath)
       • get_slice(keyspace, key, ColumnParent, SlicePredicate)
       • multiget() *
       • multiget_slice(keyspace, keys, ColumnParent, SlicePredicate)
       • get_count() !
       • get_range_slice() *
       • get_range_slices(keyspace, ColumnParent, SlicePredicate, KeyRange)
       • insert(keyspace, key, ColumnPath, value, timestamp)
       • batch_insert() *
       • remove(keyspace, key, ColumnPath, timestamp)
       • batch_mutate(keyspace, map<key, map<CF, list<Mutation>>>)
       Note: the consistency_level argument is omitted from every signature above.
       * deprecated
       ! slow, deserializes all columns
  20. 24. Roadmap
       • SSTable compression
       • Dynamic column family changes
       • Vector clock support
       • Truncate support
       • Memory-efficient compactions
       • Avro
       0.7
  21. 25. Thank you  
