NoSQL, No sweat with JBoss Data Grid

How clustered caches evolved into data grids via NOSQL and Big Data.

1. NoSQL: No sweat with JBoss Data Grid
   Shane Johnson, Technical Marketing Manager
   Tristan Tarrant, Principal Software Engineer
   10/08/2012
   Shane K Johnson / Tristan Tarrant

2. NoSQL / NOSQL

3. Agenda
   ● Data Stores
   ● Data Grid
   ● NOSQL
   ● Cache
   ● Big Data
   ● Use Cases
   ● Q&A

4. Data Stores
   ● Key / Value
   ● Document
   ● Graph
   ● Column Family
   ● And more...

5. Data Grid?

6.–8. (image-only slides)

9. NOSQL
   ● Elasticity
   ● Distributed Data
   ● Concurrency
   ● CAP Theorem
   ● Flexibility

10. Elasticity
    ● Node Discovery
    ● Failure Detection

11. How?

12. JBoss Data Grid is built on a reliable group membership protocol: JGroups.

13. Distributed Data

14. Replicated

15. Distributed

16. How?

17. Consistent Hashing
    JBoss Data Grid implementation: MurmurHash3

18. Hash Wheel

19. Virtual Nodes
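The hash wheel and virtual nodes of slides 17–19 can be sketched in plain Java. This is a toy illustration, not the JBoss Data Grid implementation: the class name is invented, and `String.hashCode()` stands in for MurmurHash3.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy consistent-hash wheel with virtual nodes. Each physical node
// claims several positions on the wheel, which smooths the key
// distribution and limits how many keys move when a node joins or leaves.
public class HashWheel {
    private final SortedMap<Integer, String> wheel = new TreeMap<>();
    private final int virtualNodes;

    public HashWheel(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    private int hash(String s) {
        // Placeholder hash; a production grid would use MurmurHash3.
        return s.hashCode() & 0x7fffffff;
    }

    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            wheel.put(hash(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            wheel.remove(hash(node + "#" + i));
        }
    }

    public String nodeFor(String key) {
        // Walk clockwise from the key's position to the next node marker,
        // wrapping around to the start of the wheel if necessary.
        SortedMap<Integer, String> tail = wheel.tailMap(hash(key));
        return tail.isEmpty() ? wheel.get(wheel.firstKey()) : tail.get(tail.firstKey());
    }
}
```

Removing a node only reassigns the keys that pointed at its markers; all other keys keep their owner, which is the property behind the "Linear Scaling" slide that follows.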
20. Linear Scaling

21. Concurrency

22. How?

23. Multi Version Concurrency Control

24. Internals
    ● Transactions
        ● 2PC
        ● Isolation Level
            ● Read Committed
            ● Repeatable Read
        ● Locking
            ● Optimistic
            ● Pessimistic
        ● Write Skew
            ● Version – Vector Clocks
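The write-skew entry on slide 24 hinges on versioned data: an optimistic transaction may commit only if the version it read has not moved in the meantime. A minimal sketch of that check using a single compare-and-set cell (class and record names are invented; a real grid compares vector clocks, not a plain counter):

```java
import java.util.concurrent.atomic.AtomicReference;

// Versioned cell with an optimistic commit rule: the write is applied
// only if no other writer has replaced the snapshot we read, which is
// the check that catches write skew.
public class VersionedCell {
    public record Versioned(int value, long version) {}

    private final AtomicReference<Versioned> cell =
        new AtomicReference<>(new Versioned(0, 0));

    public Versioned read() {
        return cell.get();
    }

    // Commit succeeds only if the cell still holds the snapshot we read.
    public boolean commit(Versioned seen, int newValue) {
        return cell.compareAndSet(seen, new Versioned(newValue, seen.version() + 1));
    }
}
```

A second commit based on the same stale snapshot fails, forcing that transaction to retry instead of silently overwriting the first writer.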
25. Consistency

26. CAP Theorem (Eric Brewer)

27. CAP Theorem
    ● Consistency
    ● Availability
    ● Partition Tolerance

28. JBoss Data Grid + CAP Theorem
    ● No physical partition
        ● Consistent and available (C + A)
    ● Physical partition
        ● Available (A + P)
    ● Pseudo partition (e.g. unresponsive node)
        ● Consistent or available (C + P / A + P)

29. Flexibility

30. Flexibility
    ● Replicated Data
        ● Replication Queue
        ● State Transfer – Enabled / Disabled
    ● Distributed Data
        ● Number of Owners
        ● Rehash – Enabled / Disabled
    ● Communication – Synchronous / Asynchronous
    ● Isolation – Read Committed / Repeatable Read
    ● Locking – Optimistic / Pessimistic

31. (image-only slide)

32. Caching and Data Grids for JEE
    ● Caching: JSR-107
    ● Data Grids: JSR-347

33. Caching in Java
    ● Developers have been doing it forever
        ● To increase performance
        ● To offload legacy data stores from unnecessary requests
    ● Home-brew approaches based on Hashtables and Maps
    ● Many free and commercial libraries, but...
    ● ...no standard!

34. JSR-107: Caching for JEE
    ● Local (single JVM) and distributed (multiple JVMs) caches
    ● CacheManager: a way to obtain caches
    ● Cache, "inspired" by the Map API, with extensions for entry expiration and additional atomic operations
    ● A cache lifecycle (starting, stopping)
    ● Entry listeners for specific events
    ● Optional features: JTA support and annotations
    ● One of the oldest JSRs, dormant for a long time, recently revived by JSR-347
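To make the JSR-107 ideas concrete, here is a deliberately tiny, stdlib-only sketch of a Map-like cache with entry expiration and one of the atomic extensions. It is not the javax.cache API; the class and its names are invented for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy cache in the spirit of JSR-107: Map-like get/put with per-entry
// expiration and an atomic put-if-absent. Expired entries are evicted
// lazily on read.
public class ExpiringCache<K, V> {
    private record Entry<T>(T value, long expiresAt) {}

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public ExpiringCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    public void put(K key, V value) {
        store.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public V get(K key) {
        Entry<V> e = store.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() > e.expiresAt()) {
            store.remove(key);   // lazily evict the expired entry
            return null;
        }
        return e.value();
    }

    // One of the atomic operations JSR-107 layers over Map:
    // returns true only if the key was absent and the value was stored.
    public boolean putIfAbsent(K key, V value) {
        Entry<V> fresh = new Entry<>(value, System.currentTimeMillis() + ttlMillis);
        return store.putIfAbsent(key, fresh) == null;
    }
}
```

The real API adds what this toy omits: listeners, a cache lifecycle, and a CacheManager that hands out named caches.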
35. And now?
    ● Now that I've put a lot of data in my distributed cache, what can I do with it?
    ● And most importantly...
    ● HOW?

36. Multiple clustering options
    ● Replication
        ● All nodes have all of the data
        ● Grid size == smallest node
    ● Distribution
        ● The grid maintains n copies of each item of data on different nodes
        ● Grid size == total size / n

37. We like asynchronous
    ● So much that we want it in the API:
        ● Future<V> getAsync(K);
        ● Future<V> getAndPut(K, V);
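A sketch of how such an asynchronous facade might look over a plain local map, using CompletableFuture. The class is hypothetical; in a real grid the futures would complete when the network operation finishes, not on a local thread pool.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Async cache facade mirroring the slide's signatures: both calls
// return immediately, and the caller decides when (or whether) to block.
public class AsyncCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();

    public CompletableFuture<V> getAsync(K key) {
        return CompletableFuture.supplyAsync(() -> store.get(key));
    }

    // Stores the value and yields the previous one, like Map.put,
    // without blocking the caller.
    public CompletableFuture<V> getAndPut(K key, V value) {
        return CompletableFuture.supplyAsync(() -> store.put(key, value));
    }
}
```

Callers can chain work with `thenApply`/`thenAccept` instead of blocking, which is the point of pushing asynchrony into the API rather than hiding it.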
38. Keeping things close together
    ● If I need to access semantically-close data quickly, why not keep it on the same node?
    ● Grouping API
        ● Distribution per group and not per key
        ● Via annotations
        ● Via a Grouper class
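The effect of grouping can be illustrated without any grid at all: route by a group extracted from the key instead of by the whole key, so related entries land on one node. The key format and extractor below are invented for the example; the real API expresses the same idea through a @Group annotation or a Grouper implementation.

```java
// Placement by group rather than by key: every key in the same group
// hashes to the same node, keeping semantically-close data together.
public class GroupedPlacement {
    // Hypothetical convention: "order:1:line:7" belongs to group "order:1".
    static String groupOf(String key) {
        String[] parts = key.split(":");
        return parts[0] + ":" + parts[1];
    }

    static int nodeFor(String key, int numNodes) {
        // Hash the group, not the key, so a whole group co-locates.
        return (groupOf(key).hashCode() & 0x7fffffff) % numNodes;
    }
}
```

The trade-off: per-group placement gives fast local access to related entries but makes a large group a hotspot on one node.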
39. Eventual consistency
    ● One step further than asynchronous clustering, for higher performance
    ● Entries are tagged with a version (e.g. a timestamp or a time-based UUID): newer versions will eventually replace all older versions in the cluster
    ● Applications retrieving data may get an older entry, which may be "good enough"
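The version-tag rule fits in a few lines: when two replicas disagree, the higher version wins, so the order in which updates arrive stops mattering. A minimal sketch with an invented class name and a plain long standing in for the timestamp or time-based UUID:

```java
// Version-tagged entry with the merge rule a replica applies on receiving
// an update: keep whichever side carries the higher version. Because the
// rule is symmetric, replicas converge regardless of delivery order.
public class VersionedEntry {
    final String value;
    final long version;   // a timestamp or time-based UUID in practice

    VersionedEntry(String value, long version) {
        this.value = value;
        this.version = version;
    }

    static VersionedEntry merge(VersionedEntry a, VersionedEntry b) {
        return a.version >= b.version ? a : b;
    }
}
```

Until the newest version has propagated everywhere, a read may still return the older entry, which is exactly the "good enough" staleness the slide describes.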
40. Big Data

41. Remote Query

42. Distributed Query

43. Performing parallel computation
    ● Distributed executors
        ● Run on all nodes where a cache exists
        ● Each executor works on the slice of data local to itself
            ● Fastest access
        ● Parallelization of operations
    ● Usually returns
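Threads can stand in for grid nodes to show the shape of the idea: each executor works only on its local slice, and the caller combines the partial results. The class below is an invented, single-JVM illustration, not the distributed executor API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// One task per "node", each summing only the data local to it;
// the caller then folds the partial sums into a total.
public class SliceExecutor {
    public static int sumSlices(List<List<Integer>> slicesPerNode) {
        ExecutorService pool = Executors.newFixedThreadPool(slicesPerNode.size());
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (List<Integer> slice : slicesPerNode) {
                // Each executor touches only its own slice: fastest access,
                // and the slices are processed in parallel.
                partials.add(pool.submit(
                    () -> slice.stream().mapToInt(Integer::intValue).sum()));
            }
            int total = 0;
            for (Future<Integer> f : partials) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In the grid version, the slices are the cache segments each node owns, so the computation moves to the data instead of the data moving to the computation.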
44. Map / Reduce
    ● A mapper function iterates through a set of key/values, transforming them and sending them to a collector:
      void map(KIn, VIn, Collector<KOut, VOut>)
    ● A reducer works through the collected values for each key, returning a single value:
      VOut reduce(KOut, Iterator<VOut>)
    ● Finally, a collator processes the reduced key/values and returns a result to the invoker:
      R collate(Map<KOut, VOut> reducedResults)
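The three signatures above translate directly into a single-JVM word count; the collector is modeled as a plain map of lists, and everything besides the slide's signatures is invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Word count in the slide's three phases: map emits (word, 1) pairs to a
// collector, reduce sums the counts per word, collate hands back the result.
public class WordCount {
    // void map(KIn, VIn, Collector<KOut, VOut>) — KIn (the line number)
    // is unused by word count but kept to match the signature.
    static void map(Integer lineNo, String line, Map<String, List<Integer>> collector) {
        for (String word : line.split("\\s+")) {
            collector.computeIfAbsent(word, w -> new ArrayList<>()).add(1);
        }
    }

    // VOut reduce(KOut, Iterator<VOut>)
    static Integer reduce(String word, Iterator<Integer> counts) {
        int sum = 0;
        while (counts.hasNext()) sum += counts.next();
        return sum;
    }

    // R collate(Map<KOut, VOut> reducedResults) — trivial collator here.
    static Map<String, Integer> collate(Map<String, Integer> reduced) {
        return reduced;
    }

    public static Map<String, Integer> run(Map<Integer, String> lines) {
        Map<String, List<Integer>> collected = new HashMap<>();
        lines.forEach((lineNo, line) -> map(lineNo, line, collected));
        Map<String, Integer> reduced = new HashMap<>();
        collected.forEach((word, counts) -> reduced.put(word, reduce(word, counts.iterator())));
        return collate(reduced);
    }
}
```

On a grid, the map phase runs on each node over its local entries and only the collected pairs cross the network, which is what makes the pattern scale.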
45. Use Cases

46. Replicated Use Case
    ● Finance
        ● Master / Slave
        ● High Availability
        ● Failover
        ● Performance + Consistency
    ● Data – Lifespan
    ● Servers – Few
    ● Memory – Medium

47. Distributed Use Case #1
    ● Telecom / Media
        ● Performance > Consistency
    ● Data
        ● Infinite
        ● Calculated
    ● Servers – Few
    ● Memory – Large

48. Distributed Use Case #2
    ● Telecom
        ● Consistency > Performance
    ● Data
        ● Continuous
        ● Limited Lifespan
    ● Servers – Many
    ● Memory – Normal

49. Q&A
    Look for a follow-up on the howtojboss.com blog.

50. Thanks for joining us.