Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007


Transcript

  • 1.  
  • 2. Breaking the Clustering Limits Baruch Sadogursky Consultant, AlphaCSP
  • 3. Cluster at NASA
  • 4. Agenda
    • Clustering Definition
      • Why Clustering?
    • Evolution of Clustering in Java
    • Grids
    • Implementations
    • Other Solutions
  • 5. Clustering Definition
    • Group of tightly coupled computers
    • Work together closely
    • Viewed as single computer
    • Commonly connected through fast LANs
  • 6. Motivation
    • Deployed to improve
      • Scalability & load-balancing
        • Throughput (e.g. hits per second)
      • Fail-over
        • Availability (e.g. 99.999%)
      • Resource virtualization
    • Much more cost-effective than single computers
  • 7. Why Clustering?
    • Why not single machine?
      • Moore’s Law is dead
    • Why not add CPUs?
      • Threads, locks and context switches are expensive
    • Why not via DB?
      • DB access sloooow
      • Single point of failure
        • Cluster DB?
  • 8. Accessing Data
    • According to the Long Tail theory, 20% of the objects are used 80% of the time
    • We need distributed access to those 20%
  • 9. Evolution of Clustering in Java
    • In the beginning there were dinosaurs: application servers and the J2EE programming model
    • The clustering aspect never made it into the Java EE spec
    • Proprietary solutions
  • 10. Classical Clustering
    • Replicate the state between the nodes
      • Provides stateful beans scalability
      • Provides entity beans caching
      • Provides HTTP session replication
    • Balance the load
      • Smart Java client
      • HTTP load-balancer
    • Central node manages the cluster topology
      • Slow detection of topology changes
      • New coordinator elected by voting (slow)
  • 11. Coordinating the Cluster
    • According to the Eight Fallacies of Distributed Computing:
      • The network is reliable
      • Topology doesn't change
    • According to real life
      • Communication fails
      • Nodes leave and join
    • Coordinator election in case of failure is expensive
  • 12. Scary, scary clustering
    • “Avoid broken mirrors, Friday the 13th, multithreading and clustered stateful applications”
    • Poor implementations gave clustering a bad name
  • 13. Clustered Caches Drawbacks
    • Copying all the data across the cluster can’t provide linear scalability
      • The more nodes you have, the more copying occurs
    • Topology communication slows the cluster down
    • Cache needs eviction policy to deal with stale data
  • 14. Clustered Caches Drawbacks
    • Operates only on simple and serializable types
    • Mutated objects have to be returned to the cache
    • Coarse-grained (whole object is replicated)
    • Can’t handle object graphs
      • Serialization issue
  • 15. Evolution of Clustering in Java
    • Spring, JBoss micro-container, Pico container and others brought the POJO to enterprise world
    • The rise of the POJO standardized the clustering services
    • Clustering market is on fire
  • 16. From Cache to Grid Computing
    • “Caches” are out, “Grids” are in…
    • So what is “Grid Computing”?
    • There is no technology called “Grid Computing”
  • 17. From Cache to Grid Computing
    • A set of distributed computing use cases that share certain technical aspects
      • Data Grids
      • Computational Grids
      • On-Demand Grids
    • The first two are relevant for Java enterprise application clustering
      • On-Demand Grids are about leasing computing time
  • 18. Grid Types
  • 19. Data Grids
  • 20. Data Grids
    • Split a large data set into subsets
    • Each node gets only the subset of data it currently needs
    • Combine results from the different nodes
    • Also natural fail-over
      • State replication
  • 21. Computational Grids
  • 22. Computational Grids
    • Split long task into multiple sub-tasks
    • Execute each sub-task in parallel on a separate computer
    • Combine results from the sub-tasks
  • 23. Functional Languages and Grids
    • Functional languages considered the best tool for grid programming
    • Full statelessness
    • Isolated functions
      • Get all the needed data via parameters
    • Scala compiles to JVM bytecode
      • www.scala-lang.org
  • 24. Master/Worker
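The Master/Worker pattern on this slide can be sketched in plain Java with the JDK executor framework. This is a single-process illustration (in a real grid the workers would be remote nodes); the task of summing a range of numbers is chosen only as an example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Master/Worker in a single process: the master splits one long task
// (summing 1..n) into sub-tasks, a pool of workers runs them in
// parallel, and the master combines the partial results.
public class MasterWorker {

    static long parallelSum(long n, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Master: split the range 1..n into equal chunks
            // (for simplicity, n must divide evenly by the worker count).
            List<Callable<Long>> tasks = new ArrayList<>();
            long chunk = n / workers;
            for (int w = 0; w < workers; w++) {
                final long from = w * chunk + 1, to = (w + 1) * chunk;
                tasks.add(() -> {
                    long sum = 0;
                    for (long i = from; i <= to; i++) sum += i;
                    return sum;
                });
            }
            // Workers run in parallel; the master gathers and combines.
            long total = 0;
            for (Future<Long> result : pool.invokeAll(tasks)) total += result.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(1_000_000, 4)); // 500000500000
    }
}
```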
  • 25. Map/Reduce
  • 26. Map/Reduce Example
    • Input for mapping:
      • <data, “two witches watch two watches; which witch watch which watch?”>
    • Map output (and reduce input):
      • <two, 1>
      • <witch, 1>
      • <watch, 1>
      • <two, 2>
      • <watch, 2>
      • <which, 1>
      • <witch, 2>
      • <watch, 3>
      • <which, 2>
      • <watch, 4>
    • Reduce output:
      • <two, 2>
      • <witch, 2>
      • <watch, 4>
      • <which, 2>
  • 27. Map/Reduce Example
    • Both map() and reduce() can be easily distributed, since they are stateless
    • Google uses their implementation for analyzing the Internet
      • labs.google.com/papers/mapreduce.html
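The word count above can be sketched as a single-process Java program. The crude suffix-stripping is only there so the output matches the slide's <witch, n> and <watch, n> pairs; real implementations tokenize and stem properly, and run each phase on many nodes in parallel:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process map/reduce word count: map() emits <word, 1> pairs,
// reduce() sums them per distinct word.
public class WordCount {

    // Map phase: emit a <word, 1> pair for every word in the input.
    static List<Map.Entry<String, Integer>> map(String data) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : data.toLowerCase().split("\\W+")) {
            if (word.endsWith("es")) {                       // crude stemming, just
                word = word.substring(0, word.length() - 2); // enough for this input
            }
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the counts of each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String data = "two witches watch two watches; which witch watch which watch?";
        System.out.println(reduce(map(data))); // {two=2, watch=4, which=2, witch=2}
    }
}
```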
  • 28. Java ComputeGrid Vision
    • Sun spec for Service Oriented Architectures
      • www.jini.org
      • Released in 1998(!), way ahead of its time
      • Didn’t make it into the J2EE spec and was largely abandoned
    • Basis for JavaSpaces
    • The concept is sending code over the wire
      • Pure Java
      • Code executed locally
        • No network exceptions during the execution
  • 29. Java ComputeGrid Vision
    • JavaSpaces - “Space” based technology
      • javaspaces.org/
    • “Space” definition:
      • A place on the network to share and store objects
        • Both data and tasks
      • Associative shared memory for the network
      • Unifies storage and communications
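A minimal sketch of the JavaSpaces write/take idiom described above. The `Task` entry class is a hypothetical example, and the space lookup (via Jini discovery) is elided, so this is an illustration rather than a complete program:

```java
import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

public class SpaceExample {

    // Entries are plain objects with public fields and a no-arg
    // constructor; null fields in a template match anything.
    public static class Task implements Entry {
        public String id;
        public Task() {}
        public Task(String id) { this.id = id; }
    }

    public static void run(JavaSpace space) throws Exception {
        // The space itself would be found via Jini lookup/discovery (elided).
        space.write(new Task("t-1"), null, Lease.FOREVER);    // share a task
        Task template = new Task(null);
        Task taken = (Task) space.take(template, null, 5000); // a worker claims it
        System.out.println(taken.id);
    }
}
```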
  • 30. Implementations
  • 31. EHCache
  • 32. EHCache
    • OpenSource
      • ehcache.sourceforge.net
    • Fast
      • In-process caching
      • Asynchronous replication
    • Small
      • 110KB
    • Simple
    • RMI communication
    • Map based API
      • Incl. JCache (JSR 107) implementation
        • The spec was never released
  • 33. EHCache Example
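A hedged sketch of what the EHCache example likely showed, using the Map-style API of the ehcache 1.x era. The cache name "userCache" is an assumption; it would be declared in ehcache.xml along with its replication settings:

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EhCacheExample {
    public static void main(String[] args) throws Exception {
        CacheManager manager = CacheManager.create();   // reads ehcache.xml
        Cache cache = manager.getCache("userCache");

        cache.put(new Element("user:42", "Baruch"));    // replicated asynchronously
        Element hit = cache.get("user:42");
        if (hit != null) {
            System.out.println(hit.getValue());
        }
        manager.shutdown();
    }
}
```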
  • 34. GlassFish Shoal
  • 35. GlassFish Shoal
    • Backbone for GlassFish AS clustering
    • Open Source at dev.java.net
    • Can be used standalone
    • Group Management Service (GMS) centric
    • GMS Themes
      • Group Sensory-Action Theme
        • Lifecycle notifications
      • Group Communication Theme
        • Group communications provider SPI
          • JXTA - default
          • Can plug in JGroups instead
        • Group communications API
          • Send and receive plain messages
      • Shared or Distributed Storage Theme
        • Map implementation
        • Concurrent
  • 36. JXTA Usage in Shoal
  • 37. Oracle Tangosol Coherence
  • 38. Oracle Tangosol Coherence
    • DataGrid
    • Fast!
    • Planned as clustering backbone for Oracle AS
    • Can be used standalone
    • Commercial
    • Oracle product now
    • Single JAR
  • 39. Coherence Data Grid
  • 40. Oracle Tangosol Coherence
    • “Organic cluster” – all the nodes are equal
    • Partitioned Topology
      • Every node holds subset of data
        • Replicated for fail-over
    • Replicated Topology
      • Behaves like cache
        • Every node holds all the data
    • Fast dead-node elimination (no voting)
  • 41. Oracle Tangosol Coherence
    • Supports queries and indices
    • Map interface implementation
    • Lifecycle listeners
    • Drawbacks
      • Usual cache drawbacks
      • Closed source
      • Costly
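Coherence's Map interface implementation mentioned above might look like this minimal sketch. The cache name "orders" is an assumption; the topology (partitioned vs. replicated) comes from the cache configuration, not the code:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CoherenceExample {
    public static void main(String[] args) {
        // getCache() joins the cluster transparently on first use.
        NamedCache cache = CacheFactory.getCache("orders");

        cache.put("order:1", "pending");      // stored on the owning node
        Object status = cache.get("order:1"); // fetched from wherever it lives
        System.out.println(status);

        CacheFactory.shutdown();
    }
}
```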
  • 42. JBoss POJO Cache
  • 43. JBoss POJO Cache
    • Subproject of JBossCache
      • Clustering backbone of JBoss AS
    • OpenSource at JBoss labs
      • http://labs.jboss.com/jbosscache
    • Transactional
    • Bytecode instrumented POJOs
    • Don’t have to be serializable
  • 44. JBoss POJO Cache
    • Fine-grained replication
    • Graphs are allowed
    • Change detection
    • POJOs need to be annotated and attached to the cache
    • Tree implementation
    • JGroups communication
  • 45. JBoss POJO Cache Usage
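A hedged sketch of the usage this slide likely showed: the POJO is annotated, attached once, and subsequent field changes replicate fine-grained. The `Person` class, the id path and the config file name are illustrative assumptions:

```java
import org.jboss.cache.pojo.PojoCache;
import org.jboss.cache.pojo.PojoCacheFactory;
import org.jboss.cache.pojo.annotation.Replicable;

public class PojoCacheExample {

    @Replicable // marks the POJO for fine-grained replication
    public static class Person {
        public String name;
        public int age;
    }

    public static void main(String[] args) {
        // The config file (name is illustrative) defines the JGroups stack.
        PojoCache cache = PojoCacheFactory.createCache("replSync-service.xml", true);

        Person joe = new Person();
        cache.attach("/people/joe", joe); // joe is now managed by the cache
        joe.age = 30;                     // this single field change is replicated

        cache.detach("/people/joe");
        cache.stop();
    }
}
```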
  • 46. JGroups Configuration
  • 47. GigaSpaces
  • 48. GigaSpaces
    • JavaSpaces implementation
    • gigaspaces.com
    • OpenSpaces
      • JavaSpaces implementation
      • Spring configuration
      • OpenSource
    • Enterprise DataGrid
      • Map interface
      • Queries
      • Lifecycle listeners
      • Etc.
      • Commercial
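A minimal OpenSpaces sketch of the JavaSpaces-style write/take cycle mentioned above. The `Order` class and the embedded space URL are illustrative assumptions:

```java
import org.openspaces.core.GigaSpace;
import org.openspaces.core.GigaSpaceConfigurer;
import org.openspaces.core.space.UrlSpaceConfigurer;

public class OpenSpacesExample {

    // Illustrative entry class; null fields in a template match anything.
    public static class Order {
        public String id;
        public String status;
        public Order() {}
        public Order(String id, String status) { this.id = id; this.status = status; }
    }

    public static void main(String[] args) {
        // "/./mySpace" starts an embedded space; the name is an assumption.
        GigaSpace gigaSpace = new GigaSpaceConfigurer(
                new UrlSpaceConfigurer("/./mySpace").space()).gigaSpace();

        gigaSpace.write(new Order("order-1", "pending"));         // share the object
        Order taken = gigaSpace.take(new Order(null, "pending")); // claim a match
        System.out.println(taken.id);
    }
}
```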
  • 49. GigaSpaces XAP
    • XAP – eXtreme Application Platform
      • Kind of application server
      • Processing Units have strongly defined directory structure (like container)
      • Total solution
    • Relies on “OpenSpaces”
    • Commercial
      • Start-ups special free license
  • 50. GigaSpaces XAP
  • 51. OpenTerracotta
  • 52. OpenTerracotta
    • The JVM takes care of cross-platform concerns, garbage collection, threading, etc.
    • Terracotta moves the clustering concern into the JVM as well
  • 53. OpenTerracotta
    • Clustered JVM semantics
    • OpenSource
      • terracotta.org
    • Network Attached Memory
      • Looks like RAM to the application
      • Runs both in JVM level (JVM plugin) and as separate process
        • Two level cache
  • 54. JVM Level Simulation
    • JVM abstracts multi-platform concerns
    • It should also abstract multi-node concerns
      • Terracotta adds it to the JVM
    • Simulation of single JVM semantics:
      • Garbage collection
      • References
      • Thread synchronization
      • Object identity
  • 55. OpenTerracotta
    • Bytecode instrumentation is used to mimic JVM behavior
    • Currently supports only Sun’s JVM
      • Support for IBM and JRockit planned soon
    • Features
      • Low development impact - no in-advance clustering planning needed
      • Linear scalability
      • No APIs
      • Declarative - marking what is clustered
      • No serialization
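The "no APIs" point can be illustrated with plain Java. The class below is ordinary synchronized code that runs unchanged on a single JVM; under Terracotta, a declarative config entry (details elided here) would mark `registry` as a shared root and its monitor as a clustered lock, so the very same code would span nodes:

```java
import java.util.HashMap;
import java.util.Map;

// Plain Java with ordinary synchronization: exactly the kind of code
// Terracotta clusters without an API. Without Terracotta this is simply
// a process-local, thread-safe map.
public class SharedRegistry {
    private static final Map<String, String> registry = new HashMap<>();

    public static void put(String key, String value) {
        synchronized (registry) { // would become a clustered lock under Terracotta
            registry.put(key, value);
        }
    }

    public static String get(String key) {
        synchronized (registry) {
            return registry.get(key);
        }
    }

    public static void main(String[] args) {
        put("node", "alive");
        System.out.println(get("node"));
    }
}
```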
  • 56. OpenTerracotta Architecture
    • The Client Nodes - run on a standard JVM
      • Terracotta is installed to the JVM
    • The Terracotta Server Cluster - provides the clustering intelligence
      • Each server is a Java process
      • One Active Server
      • One or many Passive Servers
    • Shared Storage - share the state for the passive server(s)
    • The client/server architecture is considered by some to be Terracotta’s drawback
  • 57. Terracotta Client/Server
  • 58. Terracotta Demo
  • 59. Other Solutions
    • GridGain – map/reduce computation grid
      • gridgain.com
    • Hadoop – map/reduce Java implementation
      • lucene.apache.org/hadoop
    • Globus Toolkit - Open Grid Services Architecture RI
      • globus.org
  • 60. Conclusion
    • Cache, Data grid, Compute grid or Clustered VM?
    • Open source or commercial?
    • API driven or API less?
    • Container or JAR?
  • 61.
    • Q&A