Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007
 

    Presentation Transcript

    • Breaking the Clustering Limits Baruch Sadogursky Consultant, AlphaCSP
    • Cluster at NASA
    • Agenda
      • Clustering Definition
        • Why Clustering?
      • Evolution of Clustering in Java
      • Grids
      • Implementations
      • Other Solutions
    • Clustering Definition
      • Group of tightly coupled computers
      • Work together closely
      • Viewed as single computer
      • Commonly connected through fast LANs
    • Motivation
      • Deployed to improve
        • Scalability & load-balancing
          • Throughput (e.g. hits per second)
        • Fail-over
          • Availability (e.g. 99.999%)
        • Resource virtualization
      • Much more cost-effective than single computers
    • Why Clustering?
      • Why not single machine?
        • Single-core performance scaling has stalled (“Moore’s Law is dead”)
      • Why not add CPUs?
        • Threads, locks and context switches are expensive
      • Why not via DB?
        • DB access sloooow
        • Single point of failure
          • Cluster DB?
    • Accessing Data
      • Per the Pareto principle (the 80/20 rule), 20% of objects are used 80% of the time
      • We need distributed access to those 20%
    • Evolution of Clustering in Java
      • In the beginning there were dinosaurs: application servers and the J2EE programming model
      • Clustering aspect never made it to the Java EE spec
      • Proprietary solutions
    • Classical Clustering
      • Replicate the state between the nodes
        • Provides stateful beans scalability
        • Provides entity beans caching
        • Provides HTTP session replication
      • Balance the load
        • Smart Java client
        • HTTP load-balancer
      • Central node manages the cluster topology
        • Slow detection of topology changes
        • New coordinator elected by voting (slow)
    • Coordinating the Cluster
      • According to the Eight Fallacies of Distributed Computing:
        • The network is reliable
        • Topology doesn't change
      • According to real life
        • Communication fails
        • Nodes leave and join
      • Coordinator election in case of failure is expensive
    • Scary, scary clustering
      • “Avoid broken mirrors, Friday the 13th, multithreading and clustered stateful applications”
      • Poor implementations gave clustering a bad name
    • Clustered Caches Drawbacks
      • Copying all the data across the cluster can’t provide linear scalability
        • The more nodes you have, the more copying occurs
      • Topology communication slows the cluster down
      • Cache needs eviction policy to deal with stale data
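The eviction-policy bullet can be sketched with plain JDK code; a minimal LRU cache built on `LinkedHashMap` (an illustrative stand-in, not the eviction implementation of any product discussed here):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU eviction policy: once the cache exceeds its capacity,
// the least-recently-accessed entry is dropped.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true -> LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a", so "b" becomes the eldest entry
        cache.put("c", "3"); // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```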
    • Clustered Caches Drawbacks
      • Operates only on simple and serializable types
      • Mutated objects have to be returned to the cache
      • Coarse-grained (whole object is replicated)
      • Can’t handle object graphs
        • Serialization issue
    • Evolution of Clustering in Java
      • Spring, JBoss micro-container, Pico container and others brought the POJO to enterprise world
      • The rise of the POJO standardized the clustering services
      • Clustering market is on fire
    • From Cache to Grid Computing
      • “Caches” are out, “Grids” are in…
      • So what is “Grid Computing”?
      • There is no technology called “Grid Computing”
    • From Cache to Grid Computing
      • A set of distributed computing use cases that have certain technical aspects in common
        • Data Grids
        • Computational Grids
        • On-Demand Grids
      • The first two are relevant for Java enterprise application clustering
        • On-Demand Grid is about leasing computing time
    • Grid Types
    • Data Grids
    • Data Grids
      • Split lots of data into subsets
      • Each node gets only the subset of data it currently needs
      • Combine results from the different nodes
      • Also natural fail-over
        • State replication
    • Computational Grids
    • Computational Grids
      • Split long task into multiple sub-tasks
      • Execute each sub-task in parallel on a separate computer
      • Combine results from the sub-tasks
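Within a single JVM the split/execute/combine steps above map directly onto an `ExecutorService`; a hypothetical sketch (a real computational grid would ship the sub-tasks to remote nodes instead of threads):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SumGrid {
    // Split a long summation into sub-tasks, run them in parallel,
    // then combine the partial results.
    public static long parallelSum(long[] data, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers;
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 0; i < data.length; i += chunk) {
                final int from = i, to = Math.min(i + chunk, data.length);
                tasks.add(() -> {                 // one sub-task per chunk
                    long sum = 0;
                    for (int j = from; j < to; j++) sum += data[j];
                    return sum;
                });
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) total += f.get(); // combine
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(parallelSum(data, 4)); // 500500
    }
}
```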
    • Functional Languages and Grids
      • Functional languages are considered the best tool for grid programming
      • Full statelessness
      • Isolated functions
        • Get all the needed data via parameters
      • Scala compiles to JVM bytecode
        • www.scala-lang.org
    • Master/Worker
    • Map/Reduce
    • Map/Reduce Example
      • Input for mapping:
        • <data, “two witches watch two watches; which witch watch which watch?”>
      • Map output (and reduce input) – one <word, 1> pair per occurrence, plurals stemmed:
        • <two, 1>
        • <witch, 1>
        • <watch, 1>
        • <two, 1>
        • <watch, 1>
        • <which, 1>
        • <witch, 1>
        • <watch, 1>
        • <which, 1>
        • <watch, 1>
      • Reduce output:
        • <two, 2>
        • <witch, 2>
        • <watch, 4>
        • <which, 2>
    • Map/Reduce Example
      • Both map() and reduce() can be easily distributed, since they are stateless
      • Google uses their implementation for analyzing the Internet
        • labs.google.com/papers/mapreduce.html
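The word-count example can be sketched in plain Java; map() emits one <word, 1> pair per occurrence and reduce() sums per key. This is a single-JVM illustration with hypothetical method names, not the Hadoop or Google API, and it skips the stemming the slide applies (so “witches” and “witch” count separately):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordCount {
    // map(): emit one <word, 1> pair per occurrence -- stateless.
    static List<Map.Entry<String, Integer>> map(String text) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) pairs.add(new SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // reduce(): sum the values emitted for each key -- also stateless.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String data = "two witches watch two watches; which witch watch which watch?";
        System.out.println(reduce(map(data)));
        // {two=2, witches=1, watch=3, watches=1, which=2, witch=1}
    }
}
```

Because both functions are pure, the pairs could be partitioned across nodes and the partial counts merged, which is exactly what makes them easy to distribute.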
    • Java ComputeGrid Vision
      • Sun spec for Service Oriented Architectures
        • www.jini.org
      • Released in 1998(!) and was totally ahead of its time
        • Didn’t make it into the J2EE spec and was largely abandoned
      • Basis for JavaSpaces
      • The concept is sending code over the wire
        • Pure Java
        • Code executed locally
          • No network exceptions during the execution
    • Java ComputeGrid Vision
      • JavaSpaces - “Space” based technology
        • javaspaces.org/
      • “Space” definition:
        • A place on the network to share and store objects
          • Both data and tasks
        • Associative shared memory for the network
        • Unifies storage and communications
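The real JavaSpaces API (`JavaSpace.write`/`read`/`take`) shares entries over the network with template matching, leases and transactions; within one JVM the write/take idea can be approximated with a `BlockingQueue`. This is a deliberately simplified stand-in, not the Jini API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A toy "space": producers write objects into it, workers take them out.
// take() blocks until an entry is available, just as JavaSpace.take
// blocks until a matching entry appears.
public class ToySpace {
    private final BlockingQueue<Object> entries = new LinkedBlockingQueue<>();

    public void write(Object entry) {            // analogue of JavaSpace.write
        entries.add(entry);
    }

    public Object take() throws InterruptedException { // analogue of JavaSpace.take
        return entries.take();
    }

    public static void main(String[] args) throws Exception {
        ToySpace space = new ToySpace();
        space.write("task-1");                   // master puts a task in the space
        new Thread(() -> {                       // a worker takes and processes it
            try {
                System.out.println("took " + space.take());
            } catch (InterruptedException ignored) { }
        }).start();
    }
}
```

The point of the pattern is that storage and communication collapse into one primitive: a task “sent” to a worker is just an object written into the space.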
    • Implementations
    • EHCache
    • EHCache
      • OpenSource
        • ehcache.sourceforge.net
      • Fast
        • In-process caching
        • Asynchronous replication
      • Small
        • 110KB
      • Simple
      • RMI communication
      • Map based API
        • Including a JCache (JSR 107) implementation
          • Never released
    • EHCache Example
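The example slide’s content is not in the transcript; a hypothetical `ehcache.xml` sketch of a replicated cache, matching the RMI-communication and asynchronous-replication bullets above (element and class names follow the Ehcache 1.x schema, values are illustrative):

```xml
<!-- Hypothetical ehcache.xml sketch -->
<ehcache>
  <!-- Discover peers and communicate over RMI -->
  <cacheManagerPeerProviderFactory
      class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
      properties="peerDiscovery=automatic,
                  multicastGroupAddress=230.0.0.1,multicastGroupPort=4446"/>

  <cache name="users"
         maxElementsInMemory="10000"
         eternal="false"
         timeToLiveSeconds="600">
    <!-- Replicate puts/updates asynchronously to the other nodes -->
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
        properties="replicateAsynchronously=true"/>
  </cache>
</ehcache>
```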
    • GlassFish Shoal
    • GlassFish Shoal
      • Backbone for GlassFish AS clustering
      • Open Source at dev.java.net
      • Can be used standalone
      • Group Management Service (GMS) centric
      • GMS Themes
        • Group Sensory-Action Theme
          • Lifecycle notifications
        • Group Communication Theme
          • Group communications provider SPI
            • JXTA - default
            • Can plug in JGroups instead
          • Group communications API
            • Send and receive plain messages
        • Shared or Distributed Storage Theme
          • Map implementation
          • Concurrent
    • JXTA Usage in Shoal
    • Oracle Tangosol Coherence
    • Oracle Tangosol Coherence
      • DataGrid
      • Fast!
      • Planned as clustering backbone for Oracle AS
      • Can be used standalone
      • Commercial
      • Oracle product now
      • Single JAR
    • Coherence Data Grid
    • Oracle Tangosol Coherence
      • “Organic cluster” – all the nodes are equal
      • Partitioned Topology
        • Every node holds subset of data
          • Replicated for fail-over
      • Replicated Topology
        • Behaves like cache
          • Every node holds all the data
      • Fast elimination of failed nodes (no voting)
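The partitioned topology boils down to a deterministic key-to-node mapping, so every node owns only its subset of the data. A minimal illustration in plain Java (not Coherence’s actual partitioning algorithm, which also maintains backup copies):

```java
import java.util.List;

// Each key is owned by exactly one node; lookups for that key always
// go to the same node, so no node needs to hold all the data.
public class Partitioner {
    static int ownerOf(Object key, int nodeCount) {
        // floorMod avoids a negative index for negative hash codes
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("alpha", "beta", "gamma", "delta");
        for (String k : keys) {
            System.out.println(k + " -> node " + ownerOf(k, 3));
        }
    }
}
```

A modulo scheme like this reshuffles most keys when the node count changes; real data grids use partition tables or consistent hashing to limit that movement.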
    • Oracle Tangosol Coherence
      • Supports queries and indices
      • Map interface implementation
      • Lifecycle listeners
      • Drawbacks
        • Usual cache drawbacks
        • Closed source
        • Costly
    • JBoss POJO Cache
    • JBoss POJO Cache
      • Subproject of JBossCache
        • Clustering backbone of JBoss AS
      • OpenSource at JBoss labs
        • http://labs.jboss.com/jbosscache
      • Transactional
      • Bytecode instrumented POJOs
      • POJOs don’t have to be serializable
    • JBoss POJO Cache
      • Fine-grained replication
      • Graphs are allowed
      • Change detection
      • POJOs need to be annotated and attached to the cache
      • Tree implementation
      • JGroups communication
    • JBoss POJO Cache Usage
    • JGroups Configuration
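The configuration slide’s content is not in the transcript; a hypothetical sketch of a JGroups protocol stack of the kind JBossCache uses (protocol names come from the JGroups distribution; tuning attributes are omitted):

```xml
<!-- Hypothetical JGroups stack sketch -->
<config>
  <UDP mcast_addr="228.10.10.10" mcast_port="45588"/>
  <PING/>             <!-- discover the initial members -->
  <MERGE2/>           <!-- merge subgroups after a network split -->
  <FD_SOCK/>          <!-- failure detection via TCP sockets -->
  <VERIFY_SUSPECT/>   <!-- double-check suspected members -->
  <pbcast.NAKACK/>    <!-- reliable, ordered multicast -->
  <UNICAST/>          <!-- reliable unicast -->
  <pbcast.STABLE/>    <!-- garbage-collect delivered messages -->
  <pbcast.GMS/>       <!-- group membership -->
  <FRAG2/>            <!-- fragment large messages -->
</config>
```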
    • GigaSpaces
    • GigaSpaces
      • JavaSpaces implementation
      • gigaspaces.com
      • OpenSpaces
        • JavaSpaces implementation
        • Spring configuration
        • OpenSource
      • Enterprise DataGrid
        • Map interface
        • Queries
        • Lifecycle listeners
        • Etc.
        • Commercial
    • GigaSpaces XAP
      • XAP – eXtreme Application Platform
        • Kind of application server
        • Processing Units have strongly defined directory structure (like container)
        • Total solution
      • Relies on “OpenSpaces”
      • Commercial
        • Special free license for start-ups
    • GigaSpaces XAP
    • OpenTerracotta
    • OpenTerracotta
      • The JVM takes care of cross-platform concerns, garbage collection, threading, etc.
      • Terracotta moves the clustering concern into the JVM as well
    • OpenTerracotta
      • Clustered JVM semantics
      • OpenSource
        • terracotta.org
      • Network Attached Memory
        • Looks like RAM to the application
        • Runs both at the JVM level (as a JVM plugin) and as a separate process
          • Two level cache
    • JVM Level Simulation
      • JVM abstracts multi-platform concerns
      • It should also abstract multi-node concerns
        • Terracotta adds it to the JVM
      • Simulation of single JVM semantics:
        • Garbage collection
        • References
        • Thread synchronization
        • Object identity
    • OpenTerracotta
      • Bytecode instrumentation is used to mimic JVM behavior
      • Currently supports only Sun’s JVM
        • Support for IBM and JRockit planned soon
      • Features
        • Low development impact - no in-advance clustering planning needed
        • Linear scalability
        • No APIs
        • Declarative - marking what is clustered
        • No serialization
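“No APIs, declarative” means the clustered code stays plain Java: a field is declared a clustered root in `tc-config.xml`, and synchronized blocks become cluster-wide locks. A hypothetical example of the kind of POJO code Terracotta can cluster as-is (class and field names are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Plain Java, no clustering API in sight. Under Terracotta, "counts"
// could be declared a root in tc-config.xml, and the synchronized
// methods would then take cluster-wide locks instead of local ones.
public class SharedCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    public synchronized void increment(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    public synchronized int get(String key) {
        return counts.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        SharedCounter c = new SharedCounter();
        c.increment("hits");
        c.increment("hits");
        System.out.println(c.get("hits")); // 2
    }
}
```

The same source runs unchanged on one node or many; only the external configuration decides what is shared.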
    • OpenTerracotta Architecture
      • The Client Nodes - run on a standard JVM
        • Terracotta is installed to the JVM
      • The Terracotta Server Cluster - provides the clustering intelligence
        • Each server is a Java process
        • One Active Server
        • One or many Passive Servers
      • Shared Storage - share the state for the passive server(s)
      • Server/Client architecture is considered by some to be a drawback of Terracotta
    • Terracotta Client/Server
    • Terracotta Demo
    • Other Solutions
      • GridGain – map/reduce computation grid
        • gridgain.com
      • Hadoop – map/reduce Java implementation
        • lucene.apache.org/hadoop
      • Globus Toolkit - Open Grid Services Architecture RI
        • globus.org
    • Conclusion
      • Cache, Data grid, Compute grid or Clustered VM?
      • Open source or commercial?
      • API driven or API less?
      • Container or JAR?
      • Q&A