NoSQL, No sweat with JBoss Data Grid

How clustered caches evolved into data grids via NOSQL and Big Data.

  1. NoSQL: No sweat with JBoss Data Grid. Shane Johnson, Technical Marketing Manager; Tristan Tarrant, Principal Software Engineer. 10/08/2012
  2. NoSQL / NOSQL
  3. Agenda
      ● Data Stores
      ● Data Grid
      ● NOSQL
      ● Cache
      ● Big Data
      ● Use Cases
      ● Q&A
  4. Data Stores
      ● Key / Value
      ● Document
      ● Graph
      ● Column Family
      ● And more...
  5. Data Grid?
  6. (diagram)
  7. (diagram)
  8. (diagram)
  9. NOSQL
      ● Elasticity
      ● Distributed Data
      ● Concurrency
      ● CAP Theorem
      ● Flexibility
  10. Elasticity
      ● Node Discovery
      ● Failure Detection
  11. How?
  12. JBoss Data Grid is built on a reliable group membership protocol: JGroups.
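
A minimal, hedged sketch of what that looks like in practice: starting a clustered cache manager whose node discovery and failure detection are delegated to a JGroups protocol stack. This uses the Infinispan builder API (the project underlying JBoss Data Grid); the stack file name "jgroups-udp.xml" is an assumption, and exact builder methods differ between versions.

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.configuration.global.GlobalConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;

    public class ClusterBootstrap {
        public static void main(String[] args) {
            // Clustered transport: JGroups handles discovery, failure detection and view changes.
            GlobalConfigurationBuilder global = GlobalConfigurationBuilder.defaultClusteredBuilder();
            global.transport().addProperty("configurationFile", "jgroups-udp.xml"); // assumed stack file

            // A distributed, synchronous cache as the default cache configuration.
            ConfigurationBuilder cache = new ConfigurationBuilder();
            cache.clustering().cacheMode(CacheMode.DIST_SYNC);

            DefaultCacheManager manager = new DefaultCacheManager(global.build(), cache.build());
            manager.getCache().put("hello", "grid"); // nodes started with the same stack form one grid
        }
    }
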
  13. Distributed Data
  14. Replicated
  15. Distributed
  16. How?
  17. Consistent Hashing. JBoss Data Grid implementation: MurmurHash3
  18. Hash Wheel
  19. Virtual Nodes
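
Slides 17-19 can be read together: keys and nodes are hashed onto a wheel, each physical node occupies several positions on it (virtual nodes), and a key is owned by the first node found walking clockwise from the key's hash. The toy sketch below is illustrative only; it uses String.hashCode() rather than MurmurHash3 and is not the JBoss Data Grid implementation.

    import java.util.SortedMap;
    import java.util.TreeMap;

    public class HashWheel {
        private final SortedMap<Integer, String> wheel = new TreeMap<>();
        private final int virtualNodes;

        public HashWheel(int virtualNodes) {
            this.virtualNodes = virtualNodes;
        }

        public void addNode(String node) {
            // Each physical node appears several times on the wheel, which smooths
            // the key distribution and limits reshuffling when nodes join or leave.
            for (int i = 0; i < virtualNodes; i++) {
                wheel.put((node + "#" + i).hashCode(), node);
            }
        }

        public String ownerOf(Object key) {
            // Walk clockwise from the key's hash to the next position on the wheel.
            SortedMap<Integer, String> tail = wheel.tailMap(key.hashCode());
            return tail.isEmpty() ? wheel.get(wheel.firstKey()) : tail.get(tail.firstKey());
        }
    }
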
  20. Linear Scaling
  21. Concurrency
  22. How?
  23. Multi Version Concurrency Control
  24. Internals
      ● Transactions (2PC)
      ● Isolation Level: Read Committed / Repeatable Read
      ● Locking: Optimistic / Pessimistic
      ● Write Skew
      ● Version: Vector Clocks
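
A hedged sketch of how these internals surface in configuration, using the Infinispan builder API; package locations and method names vary across versions, so treat this as an outline rather than the definitive setup.

    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.configuration.cache.VersioningScheme;
    import org.infinispan.transaction.LockingMode;
    import org.infinispan.transaction.TransactionMode;
    import org.infinispan.util.concurrent.IsolationLevel;

    public class TxCacheConfig {
        public static Configuration build() {
            return new ConfigurationBuilder()
                    .transaction()
                        .transactionMode(TransactionMode.TRANSACTIONAL)
                        .lockingMode(LockingMode.OPTIMISTIC)            // or PESSIMISTIC
                    .locking()
                        .isolationLevel(IsolationLevel.REPEATABLE_READ) // or READ_COMMITTED
                        .writeSkewCheck(true)                           // detect write skew under optimistic locking
                    .versioning()
                        .enable()                                       // entry versions back the write-skew check
                        .scheme(VersioningScheme.SIMPLE)
                    .build();
        }
    }
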
  25. Consistency
  26. CAP Theorem (Eric Brewer)
  27. CAP Theorem
      ● Consistency
      ● Availability
      ● Partition Tolerance
  28. JBoss Data Grid + CAP Theorem
      ● No Physical Partition: Consistent and Available (C + A)
      ● Physical Partition: Available (A + P)
      ● Pseudo Partition (e.g. Unresponsive Node): Consistent or Available (C + P / A + P)
  29. Flexibility
  30. Flexibility
      ● Replicated Data
        ● Replication Queue
        ● State Transfer: Enable / Disable
      ● Distributed Data
        ● Number of Owners
        ● Rehash: Enable / Disable
      ● Communication: Synchronous / Asynchronous
      ● Isolation: Read Committed / Repeatable Read
      ● Locking: Optimistic / Pessimistic
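
The distribution-side options map onto the same builder API. Again a hedged sketch (method names may differ slightly between versions): number of owners, rehashing via state transfer, and asynchronous communication.

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;

    public class DistCacheConfig {
        public static Configuration build() {
            return new ConfigurationBuilder()
                    .clustering()
                        .cacheMode(CacheMode.DIST_ASYNC)   // distributed, asynchronous communication
                        .hash()
                            .numOwners(2)                  // keep two copies of every entry
                        .stateTransfer()
                            .fetchInMemoryState(true)      // rehash state when the topology changes
                    .build();
        }
    }
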
  31. (diagram)
  32. Caching and Data Grids for JEE: Caching (JSR-107) and Data Grids (JSR-347)
  33. Caching in Java
      ● Developers have been doing it forever
        ● To increase performance
        ● To offload legacy data stores from unnecessary requests
      ● Home-brew approaches based on Hashtables and Maps
      ● Many free and commercial libraries, but... no standard!
  34. JSR-107: Caching for JEE
      ● Local (single JVM) and Distributed (multiple JVMs) caches
      ● CacheManager: a way to obtain caches
      ● Cache, “inspired” by the Map API, with extensions for entry expiration and additional atomic operations
      ● A cache lifecycle (starting, stopping)
      ● Entry listeners for specific events
      ● Optional features: JTA support and annotations
      ● One of the oldest JSRs, dormant for a long time, recently revived by JSR-347
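
A hedged sketch of the programming model, written against the javax.cache API as it was eventually finalized (the draft being discussed at the time of this talk differed in places): obtain a CacheManager, create a Cache with an expiry policy, and use the Map-like operations plus the additional atomic ones.

    import javax.cache.Cache;
    import javax.cache.CacheManager;
    import javax.cache.Caching;
    import javax.cache.configuration.MutableConfiguration;
    import javax.cache.expiry.CreatedExpiryPolicy;
    import javax.cache.expiry.Duration;

    public class Jsr107Example {
        public static void main(String[] args) {
            // CacheManager: the way to obtain caches.
            CacheManager manager = Caching.getCachingProvider().getCacheManager();

            // Entry expiration is part of the cache configuration.
            MutableConfiguration<String, String> config = new MutableConfiguration<String, String>()
                    .setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(Duration.ONE_MINUTE));

            Cache<String, String> cache = manager.createCache("sessions", config);
            cache.put("user-42", "logged-in");                 // Map-inspired API...
            boolean added = cache.putIfAbsent("user-42", "x"); // ...plus additional atomic operations
            System.out.println(cache.get("user-42") + " / " + added);
        }
    }
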
  35. And now?
      ● Now that I've put a lot of data in my distributed cache, what can I do with it?
      ● And most importantly...
      ● HOW?
  36. Multiple clustering options
      ● Replication
        ● All nodes have all of the data
        ● Grid Size == smallest node
      ● Distribution
        ● The grid maintains n copies of each item of data on different nodes
        ● Grid Size == total size / n
  37. We like asynchronous
      ● So much that we want it in the API:
      ● Future<V> getAsync(K);
      ● Future<V> getAndPut(K, V);
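
A hedged usage sketch of the asynchronous operations (method names as in the Infinispan / JBoss Data Grid Cache API; the concrete Future subtype returned has changed across versions, so plain java.util.concurrent.Future is used here):

    import java.util.concurrent.Future;

    import org.infinispan.Cache;

    public class AsyncOps {
        static void demo(Cache<String, String> cache) throws Exception {
            Future<String> put = cache.putAsync("key", "value"); // completes with the previous value
            Future<String> get = cache.getAsync("key");          // non-blocking read
            System.out.println(put.get() + " -> " + get.get());  // block only when the result is needed
        }
    }
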
  38. Keeping things close together
      ● If I need to access semantically-close data quickly, why not keep it on the same node?
      ● Grouping API
      ● Distribution per-group and not per-key
      ● Via annotations
      ● Via a Grouper class
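
A hedged sketch of the annotation flavour: marking a method on the key class with @Group makes the grid co-locate every entry whose key returns the same group value. OrderLineKey is a hypothetical example, not from the slides; grouping also has to be enabled in the cache configuration, where a Grouper class can be registered instead of using annotations.

    import org.infinispan.distribution.group.Group;

    public class OrderLineKey {
        private final String orderId;
        private final int line;

        public OrderLineKey(String orderId, int line) {
            this.orderId = orderId;
            this.line = line;
        }

        @Group
        public String getOrderId() {
            // All lines of the same order share a group value,
            // so their entries end up on the same node.
            return orderId;
        }
    }
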
  39. Eventual consistency
      ● One step further than asynchronous clustering, for higher performance
      ● Entries are tagged with a version (e.g. a timestamp or a time-based UUID): newer versions will eventually replace all older versions in the cluster
      ● Applications retrieving data may get an older entry, which may be “good enough”
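
A toy illustration of the idea (not JBoss Data Grid code): when two replicas of the same entry meet, the one carrying the newer version wins, so every node eventually converges on the latest write.

    public final class VersionedValue<V> {
        final V value;
        final long version; // e.g. a timestamp, or an ordering derived from a time-based UUID

        VersionedValue(V value, long version) {
            this.value = value;
            this.version = version;
        }

        // Last-writer-wins reconciliation: the newer version replaces the older one.
        static <V> VersionedValue<V> merge(VersionedValue<V> a, VersionedValue<V> b) {
            return a.version >= b.version ? a : b;
        }
    }
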
  40. Big Data
  41. Remote Query
  42. Distributed Query
  43. Performing parallel computation
      ● Distributed Executors
      ● Run on all nodes where a cache exists
      ● Each executor works on the slice of data local to itself
      ● Fastest access
      ● Parallelization of operations
      ● Usually returns
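
A hedged sketch of the distributed executor API (org.infinispan.distexec): submitEverywhere() runs the task on every node holding the cache, and each task works against the data local to that node. LocalEntryCount is a hypothetical example task.

    import java.io.Serializable;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.Future;

    import org.infinispan.Cache;
    import org.infinispan.distexec.DefaultExecutorService;
    import org.infinispan.distexec.DistributedCallable;
    import org.infinispan.distexec.DistributedExecutorService;

    public class LocalEntryCount implements DistributedCallable<String, String, Integer>, Serializable {
        private transient Cache<String, String> cache;

        @Override
        public void setEnvironment(Cache<String, String> cache, Set<String> inputKeys) {
            this.cache = cache; // each node hands the task its local view of the cache
        }

        @Override
        public Integer call() {
            return cache.size(); // on versions contemporary with this talk, size() reports the local entries
        }

        public static int totalEntries(Cache<String, String> cache) throws Exception {
            DistributedExecutorService executor = new DefaultExecutorService(cache);
            List<Future<Integer>> partials = executor.submitEverywhere(new LocalEntryCount());
            int total = 0;
            for (Future<Integer> partial : partials) {
                total += partial.get(); // combine the per-node results
            }
            return total;
        }
    }
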
  44. Map / Reduce
      ● A mapper function iterates through a set of key/values, transforming them and sending them to a collector:
        void map(KIn, VIn, Collector<KOut, VOut>)
      ● A reducer works through the collected values for each key, returning a single value:
        VOut reduce(KOut, Iterator<VOut>)
      ● Finally, a collator processes the reduced key/values and returns a result to the invoker:
        R collate(Map<KOut, VOut> reducedResults)
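
Putting those three signatures together, a hedged word-count sketch over a cache of documents (using the MapReduceTask API from the Infinispan versions contemporary with this talk; later releases replaced it with distributed streams):

    import java.util.Iterator;
    import java.util.Map;

    import org.infinispan.Cache;
    import org.infinispan.distexec.mapreduce.Collector;
    import org.infinispan.distexec.mapreduce.MapReduceTask;
    import org.infinispan.distexec.mapreduce.Mapper;
    import org.infinispan.distexec.mapreduce.Reducer;

    public class WordCount {
        static class WordMapper implements Mapper<String, String, String, Integer> {
            @Override
            public void map(String key, String document, Collector<String, Integer> collector) {
                for (String word : document.split("\\s+")) {
                    collector.emit(word, 1); // one count per occurrence
                }
            }
        }

        static class WordReducer implements Reducer<String, Integer> {
            @Override
            public Integer reduce(String word, Iterator<Integer> counts) {
                int total = 0;
                while (counts.hasNext()) {
                    total += counts.next(); // sum the counts collected for this word
                }
                return total;
            }
        }

        static Map<String, Integer> run(Cache<String, String> documents) {
            return new MapReduceTask<String, String, String, Integer>(documents)
                    .mappedWith(new WordMapper())
                    .reducedWith(new WordReducer())
                    .execute(); // returns the reduced map; a Collator could condense it into a single result
        }
    }
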
  45. Use Cases
  46. Replicated Use Case
      ● Finance
      ● Master / Slave
      ● High Availability
      ● Failover
      ● Performance + Consistency
      ● Data: Lifespan
      ● Servers: Few
      ● Memory: Medium
  47. Distributed Use Case #1
      ● Telecom / Media
      ● Performance > Consistency
      ● Data: Infinite, Calculated
      ● Servers: Few
      ● Memory: Large
  48. Distributed Use Case #2
      ● Telecom
      ● Consistency > Performance
      ● Data: Continuous, Limited Lifespan
      ● Servers: Many
      ● Memory: Normal
  49. 49. Q&A Look for a follow up on the howtojboss.com blog.49 Shane K Johnson / Tristan Tarrant
  50. Thanks for joining us.
