Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007
Transcript

  • 1.  
  • 2. Breaking the Clustering Limits Baruch Sadogursky Consultant, AlphaCSP
  • 3. Cluster at NASA
  • 4. Agenda
    • Clustering Definition
      • Why Clustering?
    • Evolution of Clustering in Java
    • Grids
    • Implementations
    • Other Solutions
  • 5. Clustering Definition
    • Group of tightly coupled computers
    • Work together closely
    • Viewed as single computer
    • Commonly connected through fast LANs
  • 6. Motivation
    • Deployed to improve
      • Scalability & load-balancing
        • Throughput (e.g. hits per second)
      • Fail-over
        • Availability (e.g. 99.999%)
      • Resource virtualization
    • Much more cost-effective than single computers
  • 7. Why Clustering?
    • Why not a single machine?
      • Moore’s Law is dead
    • Why not just add CPUs?
      • Threads, locks and context switches are expensive
    • Why not via the DB?
      • DB access is sloooow
      • Single point of failure
        • Cluster the DB?
  • 8. Accessing Data
    • According to the Long Tail theory, 20% of the objects are used 80% of the time
    • We need distributed access to those 20%
  • 9. Evolution of Clustering in Java
    • In the beginning there were dinosaurs application servers and the J2EE programming model
    • The clustering aspect never made it into the Java EE spec
    • Proprietary solutions
  • 10. Classical Clustering
    • Replicate the state between the nodes
      • Provides stateful beans scalability
      • Provides entity beans caching
      • Provides HTTP session replication
    • Balance the load
      • Smart Java client
      • HTTP load-balancer
    • Central node manages the cluster topology
      • Slow detection of topology changes
      • New coordinator elected by voting (slow)
  • 11. Coordinating the Cluster
    • According to the Eight Fallacies of Distributed Computing:
      • The network is reliable
      • Topology doesn't change
    • According to real life
      • Communication fails
      • Nodes leave and join
    • Coordinator election in case of failure is expensive
  • 12. Scary, scary clustering
    • “Avoid broken mirrors, Friday the 13th, multithreading and clustered stateful applications”
    • Poor implementations gave clustering a bad name
  • 13. Clustered Caches Drawbacks
    • Copying all the data across the cluster can’t provide linear scalability
      • The more nodes you have, the more copying occurs
    • Topology communication slows the cluster down
    • Cache needs eviction policy to deal with stale data
  • 14. Clustered Caches Drawbacks
    • Operates only on simple and serializable types
    • Mutated objects have to be returned to the cache
    • Coarse-grained (whole object is replicated)
    • Can’t handle object graphs
      • Serialization issue
  • 15. Evolution of Clustering in Java
    • Spring, the JBoss micro-container, PicoContainer and others brought the POJO to the enterprise world
    • The rise of the POJO standardized the clustering services
    • Clustering market is on fire
  • 16. From Cache to Grid Computing
    • “Caches” are out, “Grids” are in…
    • So what is “Grid Computing”?
    • There is no technology called “Grid Computing”
  • 17. From Cache to Grid Computing
    • Definition of a set of distributed computing use cases that share certain technical aspects
      • Data Grids
      • Computational Grids
      • On-Demand Grids
    • The first two are relevant for Java enterprise application clustering
      • On-Demand Grids are about leasing computing time
  • 18. Grid Types
  • 19. Data Grids
  • 20. Data Grids
    • Split a large data set into subsets
    • Each node gets only the subset of data it currently needs
    • Combine results from the different nodes
    • Also natural fail-over
      • State replication
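The split/combine flow above can be sketched in a few lines of plain Java; the hash-based routing and every name below are illustrative, not any particular product's API:

```java
import java.util.*;
import java.util.stream.*;

// Illustrative data-grid partitioning: each "node" owns only the subset of
// keys that hash to it, and results are combined across all nodes.
public class PartitionSketch {
    // Route a key to one of n nodes by hashing (a common way grids split data).
    static int ownerOf(String key, int nodes) {
        return Math.floorMod(key.hashCode(), nodes);
    }

    // Split the full data set into per-node subsets.
    static Map<Integer, List<String>> partition(List<String> data, int nodes) {
        return data.stream().collect(Collectors.groupingBy(k -> ownerOf(k, nodes)));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> byNode = partition(
                List.of("alpha", "beta", "gamma", "delta"), 3);
        // Each node works on its own subset; combining recovers the whole set.
        int combined = byNode.values().stream().mapToInt(List::size).sum();
        System.out.println(combined); // every item is owned by exactly one node
    }
}
```

For fail-over, a real grid additionally keeps a backup copy of each subset on another node, which is what the "state replication" bullet refers to.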
  • 21. Computational Grids
  • 22. Computational Grids
    • Split a long task into multiple sub-tasks
    • Execute each sub-task in parallel on a separate computer
    • Combine results from the sub-tasks
  • 23. Functional Languages and Grids
    • Functional languages are considered the best tool for grid programming
    • Full statelessness
    • Isolated functions
      • Get all the needed data via parameters
    • Scala compiles to JVM bytecode
      • www.scala-lang.org
  • 24. Master/Worker
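The diagram for this slide isn't captured in the transcript. Within a single JVM the pattern can be sketched with an ExecutorService standing in for the worker nodes (a deliberate simplification: a real grid dispatches tasks over the network):

```java
import java.util.*;
import java.util.concurrent.*;

// Master/Worker sketch: the master splits the work into tasks, workers run
// them in parallel, and the master combines the partial results.
public class MasterWorker {
    static int run(List<Integer> inputs) {
        ExecutorService workers = Executors.newFixedThreadPool(4); // the "workers"
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int n : inputs) {
                futures.add(workers.submit(() -> n * n)); // one task per input
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // master collects and combines results
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3))); // 1 + 4 + 9 = 14
    }
}
```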
  • 25. Map/Reduce
  • 26. Map/Reduce Example
    • Input for mapping:
      • <data, “two witches watch two watches; which witch watch which watch?”>
    • Map output (and reduce input):
      • <two, 1>
      • <witch, 1>
      • <watch, 1>
      • <two, 2>
      • <watch, 2>
      • <which, 1>
      • <witch, 2>
      • <watch, 3>
      • <which, 2>
      • <watch, 4>
    • Reduce output:
      • <two, 2>
      • <witch, 2>
      • <watch, 4>
      • <which, 2>
  • 27. Map/Reduce Example
    • Both map() and reduce() can be easily distributed, since they are stateless
    • Google uses their implementation for analyzing the Internet
      • labs.google.com/papers/mapreduce.html
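The word count above can be written in plain Java. Note that the slide's output also stems plurals (witches → witch); this sketch skips stemming to stay short, so plural forms count separately:

```java
import java.util.*;

// Word-count Map/Reduce sketch: map() emits <word, 1> pairs and reduce()
// sums the counts per word. Both steps are stateless, so both distribute.
public class WordCount {
    // map: turn the input text into <word, 1> pairs
    static List<Map.Entry<String, Integer>> map(String input) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : input.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // reduce: sum the counts emitted for each distinct word
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(reduce(map("which witch watch which watch")));
        // {watch=2, which=2, witch=1}
    }
}
```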
  • 28. Java ComputeGrid Vision
    • Jini – Sun’s spec for service-oriented architectures
      • www.jini.org
    • Released in 1998(!) and totally ahead of its time
      • Didn’t make it into the J2EE spec and was largely abandoned
    • Basis for JavaSpaces
    • The concept is sending code over the wire
      • Pure Java
      • Code executed locally
        • No network exceptions during the execution
  • 29. Java ComputeGrid Vision
    • JavaSpaces - “Space” based technology
      • javaspaces.org/
    • “Space” definition:
      • A place on the network to share and store objects
        • Both data and tasks
      • Associative shared memory for the network
      • Unifies storage and communications
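The write/take interaction of a space can be mimicked in-JVM with a few lines (a toy sketch, not the real JavaSpaces interface, which also has templates, leases and transactions):

```java
import java.util.concurrent.*;

// Toy tuple-space sketch: a shared place on which any node can write entries
// (data or tasks) and from which any node can take them.
public class ToySpace {
    private final BlockingQueue<Object> entries = new LinkedBlockingQueue<>();

    // write: publish an entry into the space
    public void write(Object entry) {
        entries.add(entry);
    }

    // take: remove and return an entry, blocking until one is available
    public Object take() {
        try {
            return entries.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToySpace space = new ToySpace();
        space.write("task-1");            // one node publishes a task...
        System.out.println(space.take()); // ...another node picks it up
    }
}
```

The real API matches entries associatively by template rather than in FIFO order, which is what makes the space "associative shared memory".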
  • 30. Implementations
  • 31. EHCache
  • 32. EHCache
    • OpenSource
      • ehcache.sourceforge.net
    • Fast
      • In-process caching
      • Asynchronous replication
    • Small
      • 110KB
    • Simple
    • RMI communication
    • Map based API
      • Incl. a JCache (JSR 107) implementation
        • The spec was never released
  • 33. EHCache Example
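The example slide itself isn't captured in the transcript. The sketch below uses only the JDK to illustrate the two traits called out on the previous slide, a Map-based API plus an eviction policy; it is not the EHCache API:

```java
import java.util.*;

// Map-style cache with LRU eviction, built on LinkedHashMap's access order.
// Real EHCache adds named caches, replication and configurable policies.
public class LruCacheSketch {
    static <K, V> Map<K, V> lruCache(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = lruCache(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3");                // capacity is 2, so "a" is evicted
        System.out.println(cache.keySet()); // [b, c]
    }
}
```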
  • 34. GlassFish Shoal
  • 35. GlassFish Shoal
    • Backbone for GlassFish AS clustering
    • Open Source at dev.java.net
    • Can be used standalone
    • Group Management Service (GMS) centric
    • GMS Themes
      • Group Sensory-Action Theme
        • Lifecycle notifications
      • Group Communication Theme
        • Group communications provider SPI
          • JXTA - default
          • Can plug in JGroups instead
        • Group communications API
        • Send and receive plain messages
      • Shared or Distributed Storage Theme
        • Map implementation
        • Concurrent
  • 36. JXTA Usage in Shoal
  • 37. Oracle Tangosol Coherence
  • 38. Oracle Tangosol Coherence
    • DataGrid
    • Fast!
    • Planned as clustering backbone for Oracle AS
    • Can be used standalone
    • Commercial
    • Oracle product now
    • Single JAR
  • 39. Coherence Data Grid
  • 40. Oracle Tangosol Coherence
    • “Organic cluster” – all the nodes are equal
    • Partitioned Topology
      • Every node holds subset of data
        • Replicated for fail-over
    • Replicated Topology
      • Behaves like cache
        • Every node holds all the data
    • Fast elimination of dead nodes (no voting)
  • 41. Oracle Tangosol Coherence
    • Supports queries and indices
    • Map interface implementation
    • Lifecycle listeners
    • Drawbacks
      • Usual cache drawbacks
      • Closed source
      • Costly
  • 42. JBoss POJO Cache
  • 43. JBoss POJO Cache
    • Subproject of JBossCache
      • Clustering backbone of JBoss AS
    • OpenSource at JBoss labs
      • http://labs.jboss.com/jbosscache
    • Transactional
    • Bytecode instrumented POJOs
    • Don’t have to be serializable
  • 44. JBoss POJO Cache
    • Fine-grained replication
    • Graphs are allowed
    • Change detection
    • POJOs need to be annotated and attached to the cache
    • Tree implementation
    • JGroups communication
  • 45. JBoss POJO Cache Usage
  • 46. JGroups Configuration
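The configuration slide isn't captured in the transcript. A JGroups protocol stack of that era was configured as an ordered list of protocols, roughly like the abbreviated, illustrative fragment below (addresses and timeouts are made up):

```xml
<!-- Illustrative JGroups stack: transport at the bottom, then discovery,
     failure detection, reliable delivery and group membership. -->
<config>
  <UDP mcast_addr="228.10.10.10" mcast_port="45588"/>
  <PING timeout="2000" num_initial_members="3"/>
  <FD_SOCK/>
  <VERIFY_SUSPECT timeout="1500"/>
  <pbcast.NAKACK retransmit_timeout="600,1200,2400,4800"/>
  <pbcast.STABLE desired_avg_gossip="20000"/>
  <pbcast.GMS join_timeout="5000" print_local_addr="true"/>
</config>
```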
  • 47. GigaSpaces
  • 48. GigaSpaces
    • JavaSpaces implementation
    • gigaspaces.com
    • OpenSpaces
      • JavaSpaces implementation
      • Spring configuration
      • OpenSource
    • Enterprise DataGrid
      • Map interface
      • Queries
      • Lifecycle listeners
      • Etc.
      • Commercial
  • 49. GigaSpaces XAP
    • XAP – eXtreme Application Platform
      • Kind of application server
      • Processing Units have a strongly defined directory structure (like a container)
      • Total solution
    • Relies on “OpenSpaces”
    • Commercial
      • Start-ups special free license
  • 50. GigaSpaces XAP
  • 51. OpenTerracotta
  • 52. OpenTerracotta
    • The JVM takes care of cross-platform concerns, garbage collection, threading, etc.
    • Terracotta moves the clustering concern into the JVM as well
  • 53. OpenTerracotta
    • Clustered JVM semantics
    • OpenSource
      • terracotta.org
    • Network Attached Memory
      • Looks like RAM to the application
      • Runs both at the JVM level (JVM plugin) and as a separate process
        • Two level cache
  • 54. JVM Level Simulation
    • The JVM abstracts multi-platform concerns
    • It should also abstract multi-node concerns
      • Terracotta adds this to the JVM
    • Simulation of single JVM semantics:
      • Garbage collection
      • References
      • Thread synchronization
      • Object identity
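Concretely, Terracotta's promise is that ordinary single-JVM code like the sketch below keeps its semantics across nodes: the map becomes a clustered root and the monitor a cluster-wide lock (which objects are shared is declared in configuration, not in this code):

```java
import java.util.*;

// Plain single-JVM code: a shared map guarded by a synchronized block.
// Locally the monitor is an ordinary lock; clustered under Terracotta,
// the same construct acts as a distributed lock on shared state.
public class SharedCounter {
    private final Map<String, Integer> hits = new HashMap<>();

    public int increment(String key) {
        synchronized (hits) {
            int next = hits.getOrDefault(key, 0) + 1;
            hits.put(key, next);
            return next;
        }
    }

    public static void main(String[] args) {
        SharedCounter counter = new SharedCounter();
        counter.increment("page");
        System.out.println(counter.increment("page")); // 2
    }
}
```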
  • 55. OpenTerracotta
    • Bytecode instrumentation is used to mimic JVM behavior
    • Currently supports only Sun’s JVM
      • Support for IBM and JRockit planned soon
    • Features
      • Low development impact - no in-advance clustering planning needed
      • Linear scalability
      • No APIs
      • Declarative - marking what is clustered
      • No serialization
  • 56. OpenTerracotta Architecture
    • The Client Nodes - run on a standard JVM
      • Terracotta is installed to the JVM
    • The Terracotta Server Cluster - provides the clustering intelligence
      • Each server is a Java process
      • One Active Server
      • One or many Passive Servers
    • Shared Storage - shares state with the passive server(s)
    • The server/client architecture is considered by some to be a drawback of Terracotta
  • 57. Terracotta Client/Server
  • 58. Terracotta Demo
  • 59. Other Solutions
    • GridGain – map/reduce computation grid
      • gridgain.com
    • Hadoop – map/reduce Java implementation
      • lucene.apache.org/hadoop
    • Globus Toolkit - Open Grid Services Architecture RI
      • globus.org
  • 60. Conclusion
    • Cache, Data grid, Compute grid or Clustered VM?
    • Open source or commercial?
    • API driven or API less?
    • Container or JAR?
  • 61.
    • Q&A