Cassandra as Memcache Edward Capriolo Media6Degrees.com
Cassandra with TTL, used as Memcache.


What we learned in Operating Systems
  • CPU (and registers) - Super FAST!
  • Main Memory - Fast
  • Hard Disks - Slow

What has changed since my first computer
  • Then
      • 100 MHz
      • 8 MB RAM
      • 1 GB Disk
      • 14.4 kbps Modem
      • 686 Windowz 3.11
      • Packard Bell
  • Now
      • Multiple cores @ 4 GHz
      • 2 GB RAM
      • 2 TB Disk
      • 1/10 Gb Ethernet
      • 64-bit FC 14
      • Sadly no more Packard Bell

The Present Situation
  • Computers are not and never will be fast or big enough
  • Until they take over, and then they will be too fast and too big

Traditional two-tier web application
  • User-facing tier
      • Usually Apache|Tomcat|...
      • Speaks some CGI alternative: php|jsp|cfm|...
      • Logging
      • Display
  • Back end
      • Usually an RDBMS
      • Stores and indexes data
      • Supports a data abstraction and manipulation language

Simple Schema
    create table user (
      id int auto_increment primary key,
      name varchar(25) unique,
      pass varchar(25)
    );
    create table book (
      id int auto_increment primary key,
      name varchar(25) unique,
      author varchar(25)
    );
    create table users_books (
      uid int, bid int,
      unique (uid, bid), index (bid)
    );

Some queries you might see (user login)
    select id, pass from user where user.name = ?
  • Access is effectively random: it depends on which user logs in
  • Not often read - may not be helpful to cache

Some queries you might see (books a user has read)
    select user.name, book.name
    from user
    join users_books on user.id = users_books.uid
    join book on book.id = users_books.bid
    where user.id = ?
  • More complex query
  • Two join conditions
  • Result might be on the user's start page
  • Result might be often used by algorithms

Some queries you might see (count all the read books)
    select users_books.bid, book.name, count(*)
    from users_books
    inner join book on users_books.bid = book.id
    group by users_books.bid, book.name
  • No where clause!
  • Possible table scan
  • Possible intermediate results written to a temp file
  • Result displayed on main index page
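
The schema and the two heavier queries above can be tried end to end with a small sketch. This one assumes Python's built-in sqlite3 as a stand-in for the RDBMS, so auto_increment becomes an integer primary key; the sample rows (alice, bob, Dune, Emma) are made up for illustration and are not from the slides.

```python
import sqlite3

# SQLite stands in for the RDBMS; "integer primary key" replaces auto_increment.
SCHEMA = """
create table user (id integer primary key, name varchar(25) unique, pass varchar(25));
create table book (id integer primary key, name varchar(25) unique, author varchar(25));
create table users_books (uid int, bid int, unique (uid, bid));
create index users_books_bid on users_books (bid);
"""

# Books one user has read: two joins, filtered by user id.
BOOKS_FOR_USER = """
select user.name, book.name
from user
join users_books on user.id = users_books.uid
join book on book.id = users_books.bid
where user.id = ?
order by book.name
"""

# Read count per book: no where clause, so every users_books row is visited.
READ_COUNTS = """
select users_books.bid, book.name, count(*)
from users_books
join book on users_books.bid = book.id
group by users_books.bid, book.name
order by users_books.bid
"""

def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("insert into user (id, name, pass) values (?, ?, ?)",
                   [(1, "alice", "x"), (2, "bob", "y")])
    db.executemany("insert into book (id, name, author) values (?, ?, ?)",
                   [(1, "Dune", "Herbert"), (2, "Emma", "Austen")])
    db.executemany("insert into users_books (uid, bid) values (?, ?)",
                   [(1, 1), (1, 2), (2, 1)])
    return db

db = build_demo_db()
books_for_alice = db.execute(BOOKS_FOR_USER, (1,)).fetchall()
read_counts = db.execute(READ_COUNTS).fetchall()
```

On data this small both queries are instant, which is the point of the next slide: the cost only shows up with more rows, slower disks, or more concurrent requests.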

How fast are these queries?
  • Trick question!
  • How much data?
      • The big-O cost for 'small' data sets is negligible
  • How fast are the disks?
      • Streaming is much faster than seeking*
  • How many QPS?
      • More requests means more contention
  • How much RAM?
      • Unallocated RAM works as page cache...

Wait... Page Cache... what?
  • Virtual File System (VFS) cache
  • RAM not in use by a process
  • Used to cache disk
  • Blocks read often get cached in RAM
  • A large disk-to-RAM ratio reduces the hit chance
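
The last point can be illustrated with a rough simulation. This is only a sketch under loud assumptions: reads are uniformly random over the disk, and the cache is a plain LRU, which is simpler than what a real kernel page cache does. The function name and block counts are hypothetical.

```python
import random
from collections import OrderedDict

def hit_rate(disk_blocks, cache_blocks, reads=20000, seed=1):
    """Fraction of uniformly random block reads served from an LRU cache."""
    rng = random.Random(seed)
    cache = OrderedDict()  # block number -> True, oldest entry first
    hits = 0
    for _ in range(reads):
        block = rng.randrange(disk_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / reads
```

Under these assumptions the hit rate tracks the RAM-to-disk ratio: caching half the blocks gives roughly a 50% hit rate, caching a tenth gives roughly 10%. Real workloads are skewed toward hot blocks, so actual hit rates are usually better than this uniform model suggests.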

Scaling RDBMS challenges
  • Scaling up
      • More RAM, disk
      • Upper limit
  • Adding slaves
      • Adds read capacity
      • Does not add write capacity
      • Monitoring/fixing replication
  • Sharding
      • Possibly giving up DB features
      • Re-shard with growth
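
Why re-sharding hurts can be shown with a minimal sketch. It assumes the simplest possible scheme, integer keys placed by modulo on the shard count; the slides do not say which scheme is in use, and the function names are hypothetical.

```python
def shard_for(key, shard_count):
    """Place an integer key on a shard by simple modulo."""
    return key % shard_count

def moved_on_reshard(keys, old_count, new_count):
    """Keys whose shard changes when the shard count changes."""
    return [k for k in keys
            if shard_for(k, old_count) != shard_for(k, new_count)]

# Growing from 4 to 5 shards with keys 0..19.
moved = moved_on_reshard(range(20), 4, 5)
```

With this scheme, adding one shard to four relocates 16 of the 20 keys, so growth means copying most of the data. Schemes such as consistent hashing exist to shrink that fraction.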

Enter Memcache
  • Key-value store with no persistence*
  • Works with memory slabs
  • Set a key, a value, and a Time To Live
  • Typically client-controlled sharding
  • Normal use case
      • Check cache
      • If found in cache, return
      • Else query and save in cache
  • Saves resources by not re-querying mostly static, non-transactional, and non-time-sensitive data
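
The normal use case above (check cache, return on hit, else query and save) might be sketched as follows. TTLCache is a hypothetical in-process stand-in, not a real memcache client API, and the injectable clock exists only so that expiry can be exercised without sleeping.

```python
import time

class TTLCache:
    """Tiny in-process stand-in for a memcache client (hypothetical API)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: behave as a miss
            return None
        return value

def cached_query(cache, key, run_query, ttl_seconds=60):
    """Check the cache; on a miss run the query and save the result."""
    value = cache.get(key)
    if value is not None:
        return value
    value = run_query()
    cache.set(key, value, ttl_seconds)
    return value
```

A caller would wrap an expensive query, for example `cached_query(cache, "books:1", lambda: fetch_books(1))`, where `fetch_books` is whatever function runs the SQL. Note that a result of None is never cached in this sketch, so queries that legitimately return nothing are re-run every time.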

Memcache... Good Things
  • More control of the cache than with the VFS cache
  • Saves web server memory vs HttpSession
  • Fast to store and access data
  • Simple to use
  • Clients for many languages

Memcache (possibly not so good things)
  • Memcache is empty after a shutdown
  • Is an 8 GB hash table better than 8 GB more RAM in your database machine?
  • Another tier to manage
  • Is it scalable?...

A highly un-suggested deployment

Enter Cassandra...
  • Data sharding and replication
  • Writing
      • Structured log format
      • Linear writes to sorted memtable
      • Memtables flush (time, size, ops)
  • Reading
      • VFS cache
      • Bloom filters
      • Row cache
      • Key cache
  • 0.7.X brings TTL fields!
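
The write path listed above can be pictured with a toy model. This is a sketch under assumptions, not Cassandra's implementation: the class and attribute names are hypothetical, the memtable is a plain dict that is sorted only at flush time, and the flush trigger is entry count alone, where the slide also lists time and size.

```python
import bisect

class TinyWritePath:
    """Toy commit log + memtable + flushed sorted runs (hypothetical model)."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.commit_log = []  # append-only record of unflushed writes
        self.memtable = {}    # in-memory view of recent writes
        self.sstables = []    # immutable sorted runs, newest last

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append, no seek
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted run and start fresh."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs the log

    def read(self, key):
        """Newest data wins: memtable first, then runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None
```

The model shows why writes are cheap (an append plus an in-memory update) and why reads may touch several places, which is what the bloom filters and the key and row caches on the read side are there to reduce.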

So then... Cassandra is faster than memcache?
  • No!
      • Memcache is an in-memory datastore
      • Cassandra has to persist data
  • But it may be faster, more efficient, and easier to manage than a separate memcache + database tier

Configuration 1: De facto standard
  • 5 nodes
  • Replication Factor = 3
  • Key cache
  • Results in:
      • Good performance
      • Strong consistency
      • Highly fault tolerant
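
The slide does not say which consistency level gives the strong consistency. A common way to get it, assumed here, is quorum reads and quorum writes: if the replicas written plus the replicas read exceed the replication factor, every read overlaps the latest write. The helper names below are hypothetical.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1

def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor

def replicas_that_can_fail(replication_factor):
    """Replicas that may be down while quorum operations still succeed."""
    return replication_factor - quorum(replication_factor)
```

With a replication factor of 3, a quorum is 2, quorum reads and writes overlap, and one replica per key can be down without failing requests. Reading and writing at a single replica each does not guarantee overlap.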

Configuration 2: Do not care about stale reads
  • 5 nodes
  • Replication Factor = 3
  • Row cache
  • Read Repair Chance = 0%
  • Results in:
      • One third of the read traffic
      • Minor possibility of not-found or out-of-sync data (not much different than memcache)
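
The "one third" figure follows from simple arithmetic, sketched here under stated assumptions: each client read needs only one replica to answer, and read repair, when it fires, also contacts every other replica. The comparison baseline of a 100% read repair chance is an assumption, and the function name is hypothetical.

```python
def expected_replica_reads(replication_factor, read_repair_chance):
    """Average replicas touched per client read that needs one replica to answer.

    Assumes read repair, when triggered, contacts all remaining replicas.
    """
    return 1 + read_repair_chance * (replication_factor - 1)

always_repair = expected_replica_reads(3, 1.0)  # every read touches all 3 replicas
never_repair = expected_replica_reads(3, 0.0)   # every read touches 1 replica
```

Under this model a replication factor of 3 goes from three replica reads per request to one, which matches the slide's claim. The price is that replicas are no longer compared on reads, so a stale or missing copy can be served.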

Configuration 3: Snitches get stitches
  • 5 nodes
  • Replication Factor = 3
  • Row cache
  • Read Repair Chance = 0%
  • Dynamic snitches + pinning
  • Results in:
      • Reads should hit the same node, not a random replica
      • Caches on each node have less duplication
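
The cache-duplication effect can be seen in a toy simulation. Everything here is an assumption made for illustration: the replica placement rule, the modelling of pinning as "always read the first replica", and the function names. A real dynamic snitch picks replicas from latency scores rather than a fixed rule.

```python
import random

NODES, RF = 5, 3

def replicas_for(key):
    """Hypothetical placement: RF consecutive nodes starting at key % NODES."""
    start = key % NODES
    return [(start + i) % NODES for i in range(RF)]

def cached_entries(keys, reads, pinned, seed=7):
    """Total row-cache entries across all nodes after the given number of reads."""
    rng = random.Random(seed)
    caches = [set() for _ in range(NODES)]
    for i in range(reads):
        key = keys[i % len(keys)]  # cycle through the keys in order
        owners = replicas_for(key)
        node = owners[0] if pinned else rng.choice(owners)
        caches[node].add(key)      # the node that serves the read caches the row
    return sum(len(cache) for cache in caches)

keys = list(range(100))
pinned_total = cached_entries(keys, 2000, pinned=True)
random_total = cached_entries(keys, 2000, pinned=False)
```

With pinning each key ends up cached on exactly one node; with random replica choice the same row tends to be cached on all three replicas, so the same amount of RAM holds about a third as many distinct rows.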

Configuration 4: Little data, big request load!
  • 20 nodes
  • Replication Factor = 20! (only this keyspace)
  • Row cache
  • Read Repair Chance = 0%
  • Results in:
      • 20 nodes capable of serving these reads!
      • Writes do not scale (like master-slave replication)
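
The trade-off in this configuration is plain arithmetic, sketched below. The model assumes writes are spread evenly and that every replica of a key can serve reads for it; the per-node read rate used in the example is a made-up number, and the function names are hypothetical.

```python
def writes_per_node(total_writes, nodes, replication_factor):
    """Average writes each node applies when every write goes to RF replicas."""
    return total_writes * replication_factor / nodes

def max_reads_per_key(replication_factor, node_read_qps):
    """Upper bound on reads of one hot key: each of its replicas can serve it."""
    return replication_factor * node_read_qps

everywhere = writes_per_node(1000, nodes=20, replication_factor=20)
usual = writes_per_node(1000, nodes=20, replication_factor=3)
```

With a replication factor of 20 on 20 nodes, each node applies all 1000 writes, against 150 per node at a replication factor of 3, so adding nodes adds no write capacity. In exchange, a single hot key can be read from all 20 nodes instead of 3.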

To recap... Cassandra
  • 0.7.X brings Time To Live
  • 0.7.X brings Read Repair Chance
  • Can serve purely from memory
  • Can serve from disk
  • Replication factor, caching, sharding: many ways to tune
  • General awesomeness