Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007

Breaking the Clustering Limits Baruch Sadogursky Consultant, AlphaCSP

Agenda Clustering Definition Why Clustering? Evolution of Clustering in Java Grids Implementations Other Solutions

Clustering Definition Group of tightly coupled computers Work together closely Viewed as single computer Commonly connected through fast LANs

Motivation Deployed to improve Scalability & load-balancing Throughput (e.g. hits per second) Fail-over Availability (e.g. 99.999%) Resource virtualization Much more cost-effective than single computers

Why Clustering? Why not single machine? Moore’s Law is dead Why not adding CPUs? Threads, locks and context switches are expensive Why not via DB? DB access sloooow Single point of failure Cluster DB?

Accessing Data According to the 80/20 rule (the Pareto principle), 20% of objects are used 80% of the time We need distributed access to those 20%

Evolution of Clustering in Java In the beginning there were dinosaurs (application servers) and the J2EE programming model The clustering aspect never made it into the Java EE spec Proprietary solutions

Classical Clustering Replicate the state between the nodes Provides stateful beans scalability Provides entity beans caching Provides HTTP session replication Balance the load Smart Java client HTTP load-balancer Central node manages the cluster topology Slow detection of topology changes New coordinator elected by voting (slow)

Coordinating the Cluster According to the Eight Fallacies of Distributed Computing: The network is reliable Topology doesn't change According to real life Communication fails Nodes leave and join Coordinator election in case of failure is expensive

Scary, scary clustering “Avoid broken mirrors, Friday the 13th, multithreading and clustered stateful applications” Poor implementations gave clustering a bad name

Clustered Caches Drawbacks Copying all the data across cluster can’t provide linear scalability More nodes you have, more copying occurs Topology communication slows the cluster down Cache needs eviction policy to deal with stale data
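The eviction policy mentioned above can be sketched with the JDK alone. This is a minimal illustration, assuming least-recently-used (LRU) eviction as the policy; the class name `LruCacheSketch` is hypothetical and not taken from any product in this talk. A time-to-live policy would target stale data more directly, and LRU is shown only because it fits in a few lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-process LRU cache; not the API of any cache named in the slides.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put; returning true drops the least recently used entry
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a", so "b" becomes the eldest entry
        cache.put("c", "3"); // exceeds the limit and evicts "b"
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```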

Clustered Caches Drawbacks Operates only on simple and serializable types Mutated objects have to be returned to the cache Coarse-grained (whole object is replicated) Can’t handle object graphs Serialization issue
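Why mutated objects have to be returned to the cache can be shown with a small sketch. It assumes a cache that stores serialized copies, as replication over the network effectively requires; `SerializingCacheSketch` is a hypothetical name, not the API of any cache in this talk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache that keeps whole serialized objects, the way a replicating cache would ship them.
class SerializingCacheSketch {
    private final Map<String, byte[]> store = new HashMap<>();

    void put(String key, Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value); // coarse-grained: the whole object is written every time
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        store.put(key, bytes.toByteArray());
    }

    @SuppressWarnings("unchecked")
    <T> T get(String key) {
        byte[] data = store.get(key);
        if (data == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (T) in.readObject(); // every get returns a fresh copy
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializingCacheSketch cache = new SerializingCacheSketch();
        ArrayList<String> cart = new ArrayList<>();
        cart.add("book");
        cache.put("cart", cart);

        List<String> copy = cache.get("cart");
        copy.add("pen"); // mutates only the local copy
        List<String> unchanged = cache.get("cart");
        System.out.println(unchanged.size()); // prints 1

        cache.put("cart", (Serializable) copy); // the change is visible only after putting it back
        List<String> updated = cache.get("cart");
        System.out.println(updated.size()); // prints 2
    }
}
```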

Evolution of Clustering in Java Spring, JBoss micro-container, Pico container and others brought the POJO to enterprise world The rise of the POJO standardized the clustering services Clustering market is on fire

From Cache to Grid Computing “Caches” are out, “Grids” are in… So what is “Grid Computing”? There is no technology called “Grid Computing”

From Cache to Grid Computing A definition of a set of distributed computing use cases that have certain technical aspects in common Data Grids Computational Grids On-Demand Grids The first two are relevant for clustering Java Enterprise applications On-Demand Grid is about leasing computing time

Data Grids Split large data sets into subsets Each node gets only the subset of data it currently needs Combine results from the different nodes Also natural fail-over State replication
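The data grid idea above can be sketched in one process. This is a minimal illustration under stated assumptions: in-memory maps stand in for nodes, keys are assigned by hash, and each entry gets one backup copy on the next node. The class `PartitionedGridSketch` and its methods are hypothetical; real data grids use their own partitioning and failure detection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process model of a partitioned data grid with one backup per entry.
class PartitionedGridSketch {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    PartitionedGridSketch(int nodeCount) {
        if (nodeCount < 2) {
            throw new IllegalArgumentException("need at least two nodes for a backup copy");
        }
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    int primaryFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    int backupFor(String key) {
        return (primaryFor(key) + 1) % nodes.size();
    }

    void put(String key, String value) {
        nodes.get(primaryFor(key)).put(key, value); // each node holds only its own subset
        nodes.get(backupFor(key)).put(key, value);  // replicated once for fail-over
    }

    // downNode simulates a failed node; pass -1 when every node is up
    String get(String key, int downNode) {
        int owner = primaryFor(key) == downNode ? backupFor(key) : primaryFor(key);
        return nodes.get(owner).get(key);
    }

    int copiesOf(String key) {
        int copies = 0;
        for (Map<String, String> node : nodes) {
            if (node.containsKey(key)) {
                copies++;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        PartitionedGridSketch grid = new PartitionedGridSketch(3);
        grid.put("user:42", "Alice");
        int primary = grid.primaryFor("user:42");
        System.out.println(grid.get("user:42", -1));      // served by the primary node
        System.out.println(grid.get("user:42", primary)); // primary down: served by the backup
        System.out.println(grid.copiesOf("user:42"));     // 2 of the 3 nodes hold the entry
    }
}
```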

Computational Grids Split long task into multiple sub-tasks Execute each sub-task in parallel on a separate computer Combine results from the sub-tasks
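The split, execute and combine steps can be sketched with the JDK's `ExecutorService`. This is only an illustration: threads in one JVM stand in for separate computers, and the task (a sum of squares) and the class name `ComputeGridSketch` are assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: worker threads play the role of grid nodes.
class ComputeGridSketch {
    static long sumOfSquares(int from, int to, int parts) {
        ExecutorService workers = Executors.newFixedThreadPool(parts);
        try {
            // Split: ceiling of (to - from + 1) / parts numbers per sub-task
            int chunk = (to - from + parts) / parts;
            List<Future<Long>> partials = new ArrayList<>();
            for (int start = from; start <= to; start += chunk) {
                final int lo = start;
                final int hi = Math.min(to, start + chunk - 1);
                // Execute: each sub-task gets all its data via lo and hi, with no shared state
                Callable<Long> subTask = () -> {
                    long sum = 0;
                    for (int i = lo; i <= hi; i++) {
                        sum += (long) i * i;
                    }
                    return sum;
                };
                partials.add(workers.submit(subTask));
            }
            // Combine: add up the partial results
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1, 100, 4)); // prints 338350
    }
}
```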

Functional Languages and Grids Functional languages are considered the best tool for grid programming Full statelessness Isolated functions Get all the needed data via parameters Scala compiles to JVM bytecode www.scala-lang.org

Map/Reduce Example Input for mapping: <data, “two witches watch two watches; which witch watch which watch?”> Map output (and reduce input): <two, 1> <witch, 1> <watch, 1> <two, 1> <watch, 1> <which, 1> <witch, 1> <watch, 1> <which, 1> <watch, 1> Reduce output: <two, 2> <witch, 2> <watch, 4> <which, 2>

Map/Reduce Example Both map() and reduce() can be easily distributed, since they are stateless Google uses their implementation for analyzing the Internet labs.google.com/papers/mapreduce.html
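The word count example can be sketched in plain Java. This is a single-process illustration, not Google's implementation: `WordCountSketch` is a hypothetical name, the grouping step is done in a local map, and the normalizer that turns "witches" into "witch" is a toy assumption made only to reproduce the counts on the slide.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process word count in the map/reduce style.
class WordCountSketch {
    // Toy normalizer (an assumption): lower-case and strip a trailing "es"
    static String normalize(String token) {
        String word = token.toLowerCase();
        return word.endsWith("es") ? word.substring(0, word.length() - 2) : word;
    }

    // map(): stateless, emits <word, 1> for every token
    static List<Map.Entry<String, Integer>> map(String text) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : text.split("[^A-Za-z]+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(normalize(token), 1));
            }
        }
        return pairs;
    }

    // reduce(): stateless, sums all counts emitted for one word
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    static Map<String, Integer> run(String text) {
        // Group the map output by key, then reduce each group independently
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : map(text)) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "two witches watch two watches; which witch watch which watch?";
        System.out.println(run(text)); // prints {two=2, witch=2, watch=4, which=2}
    }
}
```

Because `map()` and `reduce()` take everything they need as parameters, each call could run on a different node; only the grouping step needs to see data from all of them.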

Java ComputeGrid Vision Jini: Sun spec for Service Oriented Architectures www.jini.org Released in 1998(!) and was totally ahead of its time Didn’t make it into the J2EE spec and was pretty much abandoned Basis for JavaSpaces The concept is sending code over the wire Pure Java Code executed locally No network exceptions during the execution

Java ComputeGrid Vision JavaSpaces - “Space” based technology javaspaces.org/ “Space” definition: A place on the network to share and store objects Both data and tasks Associative shared memory for the network Unifies storage and communications
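The "associative shared memory" idea can be sketched with a toy in-memory space. This is not the JavaSpaces API: `ToySpaceSketch`, its map-based entries and its non-blocking `take` are all assumptions made for illustration. What it keeps from the concept is that entries are found by matching a template rather than by address, and that the same space carries both tasks and results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical toy "space"; entries are plain maps and lookups are by template, not by key.
class ToySpaceSketch {
    private final List<Map<String, Object>> entries = new ArrayList<>();

    synchronized void write(Map<String, Object> entry) {
        entries.add(new HashMap<>(entry));
    }

    // Removes and returns the first entry containing every field of the template, or null
    synchronized Map<String, Object> take(Map<String, Object> template) {
        Iterator<Map<String, Object>> it = entries.iterator();
        while (it.hasNext()) {
            Map<String, Object> entry = it.next();
            if (entry.entrySet().containsAll(template.entrySet())) {
                it.remove();
                return entry;
            }
        }
        return null;
    }

    // A worker step: take one task from the space, compute, write the result back
    static void workOnce(ToySpaceSketch space) {
        Map<String, Object> task = space.take(Collections.<String, Object>singletonMap("type", "task"));
        if (task != null) {
            Map<String, Object> result = new HashMap<>();
            result.put("type", "result");
            result.put("value", (Integer) task.get("payload") * 2);
            space.write(result);
        }
    }

    public static void main(String[] args) {
        ToySpaceSketch space = new ToySpaceSketch();
        Map<String, Object> task = new HashMap<>();
        task.put("type", "task");
        task.put("payload", 21);
        space.write(task);  // the master shares a task

        workOnce(space);    // a worker picks it up and stores the result

        Map<String, Object> result = space.take(Collections.<String, Object>singletonMap("type", "result"));
        System.out.println(result.get("value")); // prints 42
    }
}
```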

EHCache OpenSource ehcache.sourceforge.net Fast In-process caching Asynchronous replication Small 110KB Simple RMI communication Map based API Inc. JCache (JSR 107) implementation Never released

GlassFish Shoal Backbone for GlassFish AS clustering Open Source at dev.java.net Can be used standalone Group Management Service (GMS) centric GMS Themes Group Sensory-Action Theme Lifecycle notifications Group Communication Theme Group communications provider SPI JXTA - default Can plug in JGroups instead Group communications API Send and receive plain messages Shared or Distributed Storage Theme Map implementation Concurrent

Oracle Tangosol Coherence DataGrid Fast! Planned as clustering backbone for Oracle AS Can be used standalone Commercial Oracle product now Single JAR

Oracle Tangosol Coherence “Organic cluster” – all the nodes are equal Partitioned Topology Every node holds subset of data Replicated for fail-over Replicated Topology Behaves like cache Every node holds all the data Fast elimination (no voting)

Oracle Tangosol Coherence Supports queries and indices Map interface implementation Lifecycle listeners Drawbacks Usual cache drawbacks Closed source Costly

JBoss POJO Cache Subproject of JBossCache Clustering backbone of JBoss AS OpenSource at JBoss labs http://labs.jboss.com/jbosscache Transactional Bytecode instrumented POJOs Don’t have to be serializable

JBoss POJO Cache Fine-grained replication Graphs are allowed Changes detection POJOs need to be annotated and attached to the cache Tree implementation JGroups communication

GigaSpaces JavaSpaces implementation gigaspaces.com OpenSpaces JavaSpaces implementation Spring configuration OpenSource Enterprise DataGrid Map interface Queries Lifecycle listeners Etc. Commercial

GigaSpaces XAP XAP – eXtreme Application Platform Kind of application server Processing Units have strongly defined directory structure (like container) Total solution Relies on “OpenSpaces” Commercial Special free license for start-ups

OpenTerracotta The JVM takes care of cross-platform concerns, garbage collection, threading, etc. Terracotta moves the clustering concern out of the application and into the JVM

OpenTerracotta Clustered JVM semantics OpenSource terracotta.org Network Attached Memory Looks like RAM to the application Runs both at the JVM level (JVM plugin) and as a separate process Two level cache

JVM Level Simulation JVM abstracts multi-platform concerns It should also abstract multi-nodes concerns Terracotta adds it to the JVM Simulation of single JVM semantics: Garbage collection References Threads synchronization Object identity

OpenTerracotta Bytecode instrumentation is used to mimic JVM behavior Currently supports only Sun’s JVM Support for IBM and JRockit planned soon Features Low development impact - no in-advance clustering planning needed Linear scalability No APIs Declarative - marking what is clustered No serialization
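"No APIs" means the application code stays ordinary Java. The sketch below is plain JDK code with no Terracotta calls; the class `SharedCounterSketch` is hypothetical. On a single JVM it is simply thread-safe code. The slides' claim is that the same object identity and `synchronized` semantics would hold across nodes once the object is marked as clustered in configuration, which is not shown here.

```java
// Hypothetical example of code that contains no clustering API at all.
class SharedCounterSketch {
    private int count;

    // An ordinary monitor lock; nothing here refers to a cluster
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Runs several threads against one counter and returns the final value
    static int runWorkers(int threads, int incrementsEach) {
        SharedCounterSketch counter = new SharedCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsEach; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1000)); // prints 4000
    }
}
```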

OpenTerracotta Architecture The Client Nodes - run on a standard JVM Terracotta is installed to the JVM The Terracotta Server Cluster - provides the clustering intelligence Each server is a Java process One Active Server One or many Passive Servers Shared Storage - shares the state with the passive server(s) Server/Client architecture is considered by some a drawback of Terracotta

Other Solutions GridGain – map/reduce computation grid gridgain.com Hadoop – map/reduce Java implementation lucene.apache.org/hadoop Globus Toolkit - Open Grid Services Architecture RI globus.org

Conclusion Cache, Data grid, Compute grid or Clustered VM? Open source or commercial? API driven or API less? Container or JAR?
