NoSQL, No sweat with JBoss Data Grid
How clustered caches evolved into data grids via NOSQL and Big Data.
Presentation Transcript

    • NoSQL: No sweat with JBoss Data Grid
      Shane Johnson, Technical Marketing Manager
      Tristan Tarrant, Principal Software Engineer
      10/08/2012
    • NoSQL / NOSQL
    • Agenda
      ● Data Stores
      ● Data Grid
      ● NOSQL
      ● Cache
      ● Big Data
      ● Use Cases
      ● Q&A
    • Data Stores
      ● Key / Value
      ● Document
      ● Graph
      ● Column Family
      ● And more...
    • Data Grid?
    • NOSQL
      ● Elasticity
      ● Distributed Data
      ● Concurrency
      ● CAP Theorem
      ● Flexibility
    • Elasticity
      ● Node Discovery
      ● Failure Detection
    • How?
    • JBoss Data Grid is built on a reliable group membership protocol: JGroups.
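
      As a rough sketch of what that membership layer looks like in plain JGroups
      (the cluster name and the sleep are illustrative values, not from the talk):
      a channel joins a named cluster and receives a callback whenever the
      membership view changes, which covers both node discovery and failure
      detection.

        import org.jgroups.JChannel;
        import org.jgroups.ReceiverAdapter;
        import org.jgroups.View;

        public class MembershipDemo {
            public static void main(String[] args) throws Exception {
                JChannel channel = new JChannel(); // default protocol stack
                channel.setReceiver(new ReceiverAdapter() {
                    @Override
                    public void viewAccepted(View view) {
                        // Fired on every membership change: nodes joining
                        // (discovery) and suspected nodes leaving (failure detection).
                        System.out.println("Cluster view: " + view.getMembers());
                    }
                });
                channel.connect("demo-cluster");
                Thread.sleep(60000); // stay in the cluster for a minute
                channel.close();
            }
        }
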
    • Distributed Data
    • Replicated
    • Distributed
    • How?
    • Consistent Hashing
      ● JBoss Data Grid implementation: MurmurHash3
    • Hash Wheel
    • Virtual Nodes
    • Linear Scaling
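
      A toy illustration of those three ideas together, using only the standard
      library: nodes are hashed onto a wheel at several virtual positions each,
      and a key belongs to the next node clockwise from its own hash.
      String.hashCode() stands in for MurmurHash3 purely for brevity.

        import java.util.List;
        import java.util.SortedMap;
        import java.util.TreeMap;

        public class HashWheel {
            private final TreeMap<Integer, String> ring = new TreeMap<>();

            public HashWheel(List<String> nodes, int virtualNodes) {
                for (String node : nodes) {
                    // Each physical node occupies several positions on the wheel,
                    // which evens out how many keys each node ends up owning.
                    for (int i = 0; i < virtualNodes; i++) {
                        ring.put((node + "#" + i).hashCode(), node);
                    }
                }
            }

            public String ownerOf(Object key) {
                // Walk clockwise from the key's position to the next node,
                // wrapping around to the start of the wheel if necessary.
                SortedMap<Integer, String> tail = ring.tailMap(key.hashCode());
                return tail.isEmpty() ? ring.firstEntry().getValue()
                                      : tail.get(tail.firstKey());
            }
        }

      Scaling is roughly linear because a new node only claims the arcs it lands
      on: most keys keep their owner, and only a proportional slice of the data
      has to move.
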
    • Concurrency
    • How?
    • Multi-Version Concurrency Control
    • Internals
      ● Transactions
        ● 2PC
      ● Isolation Level
        ● Read Committed
        ● Repeatable Read
      ● Locking
        ● Optimistic
        ● Pessimistic
      ● Write Skew
        ● Version: Vector Clocks
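
      As a hedged sketch of where those knobs live, this is roughly how isolation
      level and locking mode are selected with Infinispan's programmatic
      configuration API (Infinispan 5.x-era class names, the codebase JBoss Data
      Grid builds on; exact builder methods vary between versions):

        import org.infinispan.configuration.cache.Configuration;
        import org.infinispan.configuration.cache.ConfigurationBuilder;
        import org.infinispan.transaction.LockingMode;
        import org.infinispan.util.concurrent.IsolationLevel;

        public class TxConfigSketch {
            static Configuration repeatableReadOptimistic() {
                return new ConfigurationBuilder()
                    .transaction()
                        .lockingMode(LockingMode.OPTIMISTIC) // or PESSIMISTIC
                    .locking()
                        .isolationLevel(IsolationLevel.REPEATABLE_READ) // or READ_COMMITTED
                        .writeSkewCheck(true) // detect write skew under repeatable read
                    .build();
            }
        }
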
    • Consistency
    • CAP Theorem (Eric Brewer)
    • CAP Theorem
      ● Consistency
      ● Availability
      ● Partition Tolerance
    • JBoss Data Grid + CAP Theorem
      ● No Physical Partition
        ● Consistent and Available (C + A)
      ● Physical Partition
        ● Available (A + P)
      ● Pseudo Partition (e.g. Unresponsive Node)
        ● Consistent or Available (C + P / A + P)
    • Flexibility
    • Flexibility
      ● Replicated Data
        ● Replication Queue
        ● State Transfer: Enable / Disable
      ● Distributed Data
        ● Number of Owners
        ● Rehash: Enable / Disable
      ● Communication: Synchronous / Asynchronous
      ● Isolation: Read Committed / Repeatable Read
      ● Locking: Optimistic / Pessimistic
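
      And a companion sketch for the clustering side of those options, under the
      same assumed Infinispan 5.x-style API:

        import org.infinispan.configuration.cache.CacheMode;
        import org.infinispan.configuration.cache.Configuration;
        import org.infinispan.configuration.cache.ConfigurationBuilder;

        public class ClusteringSketch {
            // Distributed: two copies of each entry, asynchronous communication.
            static Configuration distributed() {
                return new ConfigurationBuilder()
                    .clustering()
                        .cacheMode(CacheMode.DIST_ASYNC)
                        .hash().numOwners(2)
                    .build();
            }

            // Replicated: every node holds all the data; updates are batched
            // through a replication queue.
            static Configuration replicated() {
                return new ConfigurationBuilder()
                    .clustering()
                        .cacheMode(CacheMode.REPL_ASYNC)
                        .async().useReplQueue(true)
                    .build();
            }
        }
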
    • Caching and Data Grids for JEE
      ● Caching: JSR-107
      ● Data Grids: JSR-347
    • Caching in Java
      ● Developers have been doing it forever
        ● To increase performance
        ● To offload legacy data stores from unnecessary requests
      ● Home-brew approaches based on Hashtables and Maps
      ● Many free and commercial libraries, but...
      ● ...no standard!
    • JSR-107: Caching for JEE
      ● Local (single JVM) and distributed (multiple JVMs) caches
      ● CacheManager: a way to obtain caches
      ● Cache, "inspired" by the Map API, with extensions for entry expiration and additional atomic operations
      ● A cache lifecycle (starting, stopping)
      ● Entry listeners for specific events
      ● Optional features: JTA support and annotations
      ● One of the oldest JSRs, dormant for a long time, recently revived by JSR-347
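
      To make that API shape concrete, a minimal sketch against javax.cache
      (method names follow JSR-107 as eventually finalized; the draft circulating
      at the time of this talk differed in details, and the cache name "users" is
      made up):

        import javax.cache.Cache;
        import javax.cache.CacheManager;
        import javax.cache.Caching;
        import javax.cache.configuration.MutableConfiguration;

        public class Jsr107Sketch {
            public static void main(String[] args) {
                // The CacheManager is the way to obtain caches.
                CacheManager manager = Caching.getCachingProvider().getCacheManager();
                Cache<String, String> cache =
                        manager.createCache("users", new MutableConfiguration<String, String>());

                cache.put("jdoe", "John Doe"); // Map-like basic operation
                // One of the additional atomic operations beyond plain Map:
                boolean added = cache.putIfAbsent("jdoe", "Jane Doe");
                System.out.println("second put won? " + added); // false
            }
        }
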
    • And now?
      ● Now that I've put a lot of data in my distributed cache, what can I do with it?
      ● And most importantly...
      ● HOW?
    • Multiple clustering options
      ● Replication
        ● All nodes have all of the data
        ● Grid size == smallest node
      ● Distribution
        ● The grid maintains n copies of each item of data on different nodes
        ● Grid size == total size / n
    • We like asynchronous
      ● So much that we want it in the API:
        ● Future<V> getAsync(K);
        ● Future<V> getAndPut(K, V);
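
      A sketch of using that style of API (Infinispan's actual method names are
      getAsync and putAsync; the returned Future lets the caller decide when, or
      whether, to block):

        import java.util.concurrent.Future;
        import org.infinispan.Cache;

        public class AsyncSketch {
            static void demo(Cache<String, String> cache) throws Exception {
                Future<String> previous = cache.putAsync("greeting", "hello"); // fire and continue
                Future<String> value = cache.getAsync("greeting");
                // ...do other work while the grid replicates in the background...
                System.out.println("replaced: " + previous.get()
                        + ", read back: " + value.get()); // block only here
            }
        }
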
    • Keeping things close together
      ● If I need to access semantically close data quickly, why not keep it on the same node?
      ● Grouping API
        ● Distribution per group and not per key
        ● Via annotations
        ● Via a Grouper class
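
      A sketch of the annotation flavour: every key reporting the same group value
      is placed on the same node, so distribution happens per group rather than
      per key. The UserKey and accountId names are invented for the example.

        import org.infinispan.distribution.group.Group;

        public class UserKey {
            private final String accountId;
            private final String userId;

            public UserKey(String accountId, String userId) {
                this.accountId = accountId;
                this.userId = userId;
            }

            @Group
            public String group() {
                // All keys for the same account hash together, so an account's
                // data can be read without hopping across nodes.
                return accountId;
            }
        }
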
    • Eventual consistency
      ● One step further than asynchronous clustering, for higher performance
      ● Entries are tagged with a version (e.g. a timestamp or a time-based UUID): newer versions will eventually replace all older versions in the cluster
      ● Applications retrieving data may get an older entry, which may be "good enough"
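
      A toy illustration of that versioning rule, assuming a plain numeric version
      for simplicity (real grids use richer clocks than a single timestamp):

        public class VersionedValue<V> {
            final V value;
            final long version; // e.g. a timestamp

            VersionedValue(V value, long version) {
                this.value = value;
                this.version = version;
            }

            VersionedValue<V> merge(VersionedValue<V> incoming) {
                // Last writer wins: a replica keeps whichever copy carries the
                // newer version, so the newest value eventually replaces all
                // older ones everywhere in the cluster.
                return incoming.version > this.version ? incoming : this;
            }
        }
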
    • Big Data
    • Remote Query
    • Distributed Query
    • Performing parallel computation
      ● Distributed Executors
        ● Run on all nodes where a cache exists
        ● Each executor works on the slice of data local to itself
          ● Fastest access
          ● Parallelization of operations
        ● Usually returns
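
      A sketch of that executor API (Infinispan 5.x era; the class and variable
      names here are made up): the task is shipped to every node holding the
      cache, and each copy works against its local slice.

        import java.io.Serializable;
        import java.util.Set;
        import java.util.concurrent.Future;
        import org.infinispan.Cache;
        import org.infinispan.distexec.DefaultExecutorService;
        import org.infinispan.distexec.DistributedCallable;
        import org.infinispan.distexec.DistributedExecutorService;

        public class LocalEntryCounter
                implements DistributedCallable<String, String, Integer>, Serializable {

            private transient Cache<String, String> cache;

            @Override
            public void setEnvironment(Cache<String, String> cache, Set<String> inputKeys) {
                this.cache = cache; // the grid injects the local cache on each node
            }

            @Override
            public Integer call() {
                return cache.size(); // in this era of the API, size() counts the local slice
            }

            static int totalEntries(Cache<String, String> cache) throws Exception {
                DistributedExecutorService executor = new DefaultExecutorService(cache);
                int total = 0;
                for (Future<Integer> part : executor.submitEverywhere(new LocalEntryCounter())) {
                    total += part.get(); // one partial count per node
                }
                return total;
            }
        }
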
    • Map / Reduce
      ● A mapper function iterates through a set of key/values, transforming them and sending them to a collector:
        void map(KIn, VIn, Collector<KOut, VOut>)
      ● A reducer works through the collected values for each key, returning a single value:
        VOut reduce(KOut, Iterator<VOut>)
      ● Finally, a collator processes the reduced key/values and returns a result to the invoker:
        R collate(Map<KOut, VOut> reducedResults)
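
      The canonical word count, written against those signatures (Infinispan
      5.x-era Map/Reduce API; the cache is assumed to hold document text keyed
      by document name):

        import java.util.Iterator;
        import java.util.Map;
        import org.infinispan.Cache;
        import org.infinispan.distexec.mapreduce.Collector;
        import org.infinispan.distexec.mapreduce.MapReduceTask;
        import org.infinispan.distexec.mapreduce.Mapper;
        import org.infinispan.distexec.mapreduce.Reducer;

        public class WordCount {

            static class WordMapper implements Mapper<String, String, String, Integer> {
                @Override
                public void map(String key, String document, Collector<String, Integer> collector) {
                    for (String word : document.split("\\s+")) {
                        collector.emit(word, 1); // one count per occurrence
                    }
                }
            }

            static class CountReducer implements Reducer<String, Integer> {
                @Override
                public Integer reduce(String word, Iterator<Integer> counts) {
                    int total = 0;
                    while (counts.hasNext()) {
                        total += counts.next();
                    }
                    return total;
                }
            }

            static Map<String, Integer> run(Cache<String, String> documents) {
                return new MapReduceTask<String, String, String, Integer>(documents)
                        .mappedWith(new WordMapper())
                        .reducedWith(new CountReducer())
                        .execute(); // the reduced word -> count map
            }
        }
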
    • Use Cases
    • Replicated Use Case
      ● Finance
      ● Master / Slave
      ● High Availability
      ● Failover
      ● Performance + Consistency
      ● Data: Lifespan
      ● Servers: Few
      ● Memory: Medium
    • Distributed Use Case #1
      ● Telecom / Media
      ● Performance > Consistency
      ● Data
        ● Infinite
        ● Calculated
      ● Servers: Few
      ● Memory: Large
    • Distributed Use Case #2
      ● Telecom
      ● Consistency > Performance
      ● Data
        ● Continuous
        ● Limited Lifespan
      ● Servers: Many
      ● Memory: Normal
    • Q&A
      ● Look for a follow-up on the howtojboss.com blog.
    • Thanks for joining us.