Progressive NOSQL: Cassandra


Tom Wilkie's talk at Progressive NOSQL conference in London on 11/05/12.

Published in: Technology, Business

Transcript of "Progressive NOSQL: Cassandra"

  1. A tunably consistent, highly-available, distributed database. Tom Wilkie, @tom_wilkie, Founder & VP Engineering, Acunu
  2. • Overview • Distribution • Storage • Datamodel • Usecases
  3. • Overview • Distribution • Storage • Datamodel • Usecases
  4. • A distributed database for Big Data • Scale out on commodity servers • Best of breed performance • Multi-master architecture, no SPOF • Powerful multi data centre support
  5. (No text on this slide.)
  6. BigTable, 2006 • Dynamo, 2007 • Open sourced, 2008 • Incubator, 2009 • TLP, 2010 • v1.0, 2011
  7. BigTable: ... • Simple but powerful datamodel • Write-optimised storage system • Consistent, available but not partition tolerant • Master-slave distribution system, SPOF • http://goo.gl/7T1Ej
  8. Dynamo: ... • Sophisticated distribution system with tradable consistency and availability • Over-simple datamodel • http://goo.gl/Q80b4
  9. • Overview • Distribution • Storage • Datamodel • Usecases
  10. Distribution: Consistent Hashing → r1, c1 v1 → r2, c2 v2 → r3, c3 v3
  11. Distribution: Scaling
  12. Distribution: Scaling
  13. Distribution: Scaling
  14. Distribution: Scaling
  15. Distribution: Scaling
  16. Distribution: Replication → r1, c1 v1
  17. Distribution: Replication
  18. Distribution: Consistency • Tuneable, per-operation consistency • Timestamped values; R + W > N for a read to overlap the latest write (N replicas, W write acknowledgements, R read responses) • Diagram labels: W, R
  19. Distribution: Read Repair
  20. Distribution: Read Repair
  21. Distribution: Read Repair
  22. Distribution: Read Repair
  23. • Overview • Distribution • Storage • Datamodel • Usecases
  24. Writing to Cassandra: Row Key, Column, Column, Column, Column
  25. Writing to Cassandra. In the JVM: Memtable (Row, Column, Column, Column, Column). On disk: Commit log
  26. Writing to Cassandra. In the JVM: Full Memtable. On disk: Commit log
  27. Writing to Cassandra. In the JVM: New Memtable. On disk: Commit log, SSTable
  28. Writing to Cassandra. On disk: Commit log, SSTable, SSTable, SSTable, SSTable, SSTable, SSTable
  29. Writing to Cassandra. On disk: Commit log, SSTable
  30. Reading from Cassandra
  31. Reading from Cassandra (diagram, steps numbered 1 to 6). In the JVM: (1) Memtable; (3, 4, 5) Bloom filter, Key cache, SSTable index. Off-heap: (2) Row cache (no GC). On disk: (6) SSTable; Commit log
  32. • Overview • Distribution • Storage • Datamodel • Usecases
  33. SQL → Cassandra: Database → Keyspace; Table → Column Family. (Diagram: tables with row/key, col_1 and col_2 columns.)
  34. Sparse table: columns col1 to col7, rows row1 to row7; each row has values (x) in only some of the columns.
  35. alice: { m2: { Sender: bob, Subject: ‘paper!’, ... } } bob: { m1: { Sender: alice, Subject: ‘rock?’, ... } } charlie: { m1: { Sender: alice, Subject: ‘rock?’, ... }, m2: { Sender: bob, Subject: ‘paper!’, ... } }
  36. • Overview • Distribution • Storage • Datamodelling • Usecases
  37. Perfect for high velocity data: Web, SCM, Retail • Location Services • Cloud Monitoring • Social Gaming • Social Media • Ad Marketplaces • Fraud Detection • Smart Metering • Oil/Gas Sensors
  38. Not Covered... • Distribution: Hinted Handoff, Anti-entropy repair, Counter distribution • Storage: Counter storage, different compaction strategies, partitioning etc. • Datamodel: de-normalisation, TTLs, secondary indexes, CQL, super-columns, schema optional • Operations: backup, nodetool, performance tuning • Integration: Hadoop, Client Libraries etc.
  39. • Distributed, scalable database • Open source, widely used • Tunably consistent • Highly-available • Partition tolerant • Write-optimised • Schema-optional
  40. Data Platform
  41. Data Platform: Data driven applications • Web UI • Acunu Analytics • Control Center • Apache Cassandra • Acunu Storage Engine • Configured and tuned OS • Commodity Hardware
  42. Control Center: “I've had the EC2 instance running for a little while and I have to say, I'm impressed. You guys have done well with this product.” - Lloyd, JustDevelopIt
  43. Control Center: “The new UI has been critical in helping us work out what is wrong in our code” - Matt, TellyBug
  44. Castle: Built for Big Data • Storage engine optimized for large slow disks, many cores, Big Data workloads • Enterprise density on commodity hardware • Lightning disk rebuilds: 10x faster than RAID • Open source (GPLv2, MIT for user libraries) • http://bitbucket.org/acunu • Loadable Kernel Module, targeting CentOS’s 2.6.18 • http://goo.gl/gzihe • http://www.acunu.com/blogs/andy-twigg/why- • (Architecture diagram labels: userspace interface with shared memory ring, shared buffers and streaming interface; in-kernel doubling array mapping layer with Bloom filters, insert queues, array management and merges; modlist btree mapping layer with version tree; cache layer with block cache, prefetcher and flusher; "Extent" layer with extent allocator and freespace manager.)
  45. (No text on this slide.)
  46. Rebuild time (chart): Rebuild Time (Hours), axis from 0 to 5, for RAID10, 8 Disks; RAID5, 8 Disks; RDA, 8 Disks; RDA, 15 Disks
  47. Analytics • Diagram labels: Click stream, Sensor data, etc.; events; Acunu Analytics; counter updates • Simple, real-time, incremental analytics • Push processing into ingest phase
  48. Questions? tom@acunu.com @tom_wilkie www.acunu.com
  49. Introduction: Live & historical aggregates...
  50. Realtime trends...
  51. Drill downs and roll ups
  52. Solution / Con (table; solution names not captured): Scalability; $$$; Not realtime; Inefficient; Recomputation; Spartan query semantics => complex, DIY solutions
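
Sketch for slides 10 to 17 (consistent hashing, scaling, replication). The slides only show ring diagrams, so this is a minimal illustration under stated assumptions rather than Cassandra's actual partitioner: the `Ring` class, the `token` helper, the node names, and the use of MD5 as the hash are all hypothetical. Each node gets one position on the ring, a row key is hashed to a position, and its replicas are the next `replication_factor` nodes found walking around the ring.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position. MD5 is an assumption made for this sketch."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hash ring with one token per node."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sorted (token, node) pairs describe the ring in walking order.
        self._entries = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._entries]

    def add_node(self, node):
        """Scaling out: the new node takes over part of one neighbour's range."""
        entry = (token(node), node)
        index = bisect.bisect(self._entries, entry)
        self._entries.insert(index, entry)
        self._tokens.insert(index, entry[0])

    def replicas(self, row_key):
        """Return the nodes that store row_key, primary first."""
        if not self._entries:
            return []
        # First node at or after the key's position, wrapping around the ring.
        start = bisect.bisect_left(self._tokens, token(row_key))
        count = min(self.replication_factor, len(self._entries))
        return [
            self._entries[(start + i) % len(self._entries)][1]
            for i in range(count)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("r1")  # three distinct nodes, primary first
```

Adding a node with `ring.add_node("node-e")` moves only the keys that fall in the new node's range; every other key keeps its primary, which is the property the scaling slides illustrate.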
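
Sketch for slides 18 to 22 (tunable consistency, timestamped values, read repair). This is a toy model, not Cassandra's implementation: `Replica`, `quorum_write` and `quorum_read` are hypothetical names, and the choice of which replicas answer is fixed so that the worst case is easy to see. Writes go to the first W of N replicas and reads go to the last R, so a read is only guaranteed to see the latest write when R + W > N. The newest timestamp wins, and the read pushes that value back to any stale replica it contacted.

```python
class Replica:
    """Hypothetical replica holding key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, timestamp, value):
        # Keep a value only if it is newer than what is already stored.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        return self.data.get(key)


def quorum_write(replicas, w, key, timestamp, value):
    """Write to the first w replicas (stand-in for waiting on w acknowledgements)."""
    for replica in replicas[:w]:
        replica.write(key, timestamp, value)


def quorum_read(replicas, r, key):
    """Read from the last r replicas, return the newest value, repair stale copies."""
    contacted = replicas[-r:]
    answers = [replica.read(key) for replica in contacted]
    found = [answer for answer in answers if answer is not None]
    if not found:
        return None
    newest = max(found, key=lambda answer: answer[0])
    for replica in contacted:
        replica.write(key, *newest)  # read repair: no-op on up-to-date replicas
    return newest[1]


replicas = [Replica(), Replica(), Replica()]          # N = 3
quorum_write(replicas, 3, "r1:c1", 1, "v1")           # old value everywhere
quorum_write(replicas, 2, "r1:c1", 2, "v2")           # W = 2, third replica is stale
stale = quorum_read(replicas, 1, "r1:c1")             # R = 1: R + W = N, may be stale
fresh = quorum_read(replicas, 2, "r1:c1")             # R = 2: R + W > N, sees v2
```

After the second read, the third replica also holds the newer value, because the read repaired it.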
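
Sketch for slides 24 to 31 (write path and read path). The slides show a write landing in the commit log and the memtable, a full memtable being flushed to an SSTable, several SSTables being reduced to one, and reads consulting the memtable and SSTables. The toy `ColumnStore` below follows that shape under loud assumptions: everything lives in memory, a `frozenset` of row keys stands in for the Bloom filter, write order stands in for timestamps, and the row cache, key cache and SSTable index from slide 31 are left out. None of the names are Cassandra APIs.

```python
class ColumnStore:
    """Hypothetical in-memory model of the commit log / memtable / SSTable flow."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # rows held before a flush
        self.commit_log = []                  # append-only list stands in for the on-disk log
        self.memtable = {}                    # row key -> {column: value}
        self.sstables = []                    # oldest first, treated as immutable

    def write(self, row_key, column, value):
        # Append to the commit log first, then update the memtable.
        self.commit_log.append((row_key, column, value))
        self.memtable.setdefault(row_key, {})[column] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Turn the full memtable into a sorted SSTable and start a new memtable."""
        if not self.memtable:
            return
        rows = dict(sorted(self.memtable.items()))
        self.sstables.append({"keys": frozenset(rows), "rows": rows})
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying

    def read(self, row_key):
        """Merge columns from SSTables (oldest first), then the memtable."""
        merged = {}
        for table in self.sstables:
            if row_key in table["keys"]:  # stand-in for the Bloom filter check
                merged.update(table["rows"][row_key])
        merged.update(self.memtable.get(row_key, {}))
        return merged

    def compact(self):
        """Merge all SSTables into one, newer columns overwriting older ones."""
        merged = {}
        for table in self.sstables:
            for row_key, columns in table["rows"].items():
                merged.setdefault(row_key, {}).update(columns)
        if merged:
            rows = dict(sorted(merged.items()))
            self.sstables = [{"keys": frozenset(rows), "rows": rows}]


store = ColumnStore(memtable_limit=2)
store.write("alice", "m1", "rock?")
store.write("bob", "m1", "rock?")      # second row fills the memtable and triggers a flush
store.write("alice", "m2", "paper!")   # lands in the new memtable
row = store.read("alice")              # columns merged from SSTable and memtable
```

A write never modifies an existing SSTable here; it only appends and later flushes, which is the sense in which the slides call the storage write-optimised.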
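
Sketch for slides 34 and 35 (sparse rows and the message example). Slide 35 shows each user's row holding a full copy of every message they received, which reads as the de-normalisation that slide 38 lists under "Not Covered". The snippet below rebuilds that structure with plain dictionaries; the `deliver` function and the recipient lists are assumptions inferred from the slide, not something it states.

```python
from collections import defaultdict

# Row key (user) -> column name (message id) -> column value (message fields).
inbox = defaultdict(dict)


def deliver(message_id, sender, recipients, subject):
    """Write one copy of the message into every recipient's row."""
    for recipient in recipients:
        inbox[recipient][message_id] = {"Sender": sender, "Subject": subject}


deliver("m1", "alice", ["bob", "charlie"], "rock?")
deliver("m2", "bob", ["alice", "charlie"], "paper!")

# Reading a user's inbox is a single-row lookup; rows have different columns.
charlie_messages = inbox["charlie"]
```

Each write costs one column per recipient, and in exchange a read never has to join across rows.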