Breaking The Clustering Limits @ AlphaCSP JavaEdge 2007

Published in: Technology

Transcript

  • 2. Breaking the Clustering Limits. Baruch Sadogursky, Consultant, AlphaCSP
  • 3. Cluster at NASA
  • 4. Agenda
      • Clustering Definition
          • Why Clustering?
      • Evolution of Clustering in Java
      • Grids
      • Implementations
      • Other Solutions
  • 5. Clustering Definition
      • Group of tightly coupled computers
      • Work together closely
      • Viewed as a single computer
      • Commonly connected through fast LANs
  • 6. Motivation
      • Deployed to improve
          • Scalability & load-balancing
              • Throughput (e.g. hits per second)
          • Fail-over
              • Availability (e.g. 99.999%)
          • Resource virtualization
      • Much more cost-effective than single computers
  • 7. Why Clustering?
      • Why not a single machine?
          • Moore’s Law is dead
      • Why not add CPUs?
          • Threads, locks and context switches are expensive
      • Why not via the DB?
          • DB access is slow
          • Single point of failure
              • Cluster the DB?
  • 8. Accessing Data
      • According to the Pareto principle (the 80/20 rule), 20% of the objects are used 80% of the time
      • We need distributed access to those 20%
  • 9. Evolution of Clustering in Java
      • In the beginning there were "dinosaurs": application servers and the J2EE programming model
      • The clustering aspect never made it into the Java EE spec
      • Proprietary solutions
  • 10. Classical Clustering
      • Replicate the state between the nodes
          • Provides stateful beans scalability
          • Provides entity beans caching
          • Provides HTTP session replication
      • Balance the load
          • Smart Java client
          • HTTP load-balancer
      • Central node manages the cluster topology
          • Slow detection of topology changes
          • New coordinator elected by voting (slow)
  • 11. Coordinating the Cluster
      • According to the Eight Fallacies of Distributed Computing:
          • The network is reliable
          • Topology doesn't change
      • According to real life:
          • Communication fails
          • Nodes leave and join
      • Coordinator election in case of failure is expensive
  • 12. Scary, scary clustering
      • “Avoid broken mirrors, Friday the 13th, multithreading and clustered stateful applications”
      • Poor implementations gave clustering a bad name
  • 13. Clustered Caches Drawbacks
      • Copying all the data across the cluster can’t provide linear scalability
          • The more nodes you have, the more copying occurs
      • Topology communication slows the cluster down
      • Cache needs an eviction policy to deal with stale data
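The "more nodes, more copying" point can be made concrete with a little arithmetic. The sketch below is a toy cost model, not a benchmark: the class and method names are hypothetical, and it assumes every write in a fully replicated cache is copied once to each other node, while a partitioned layout (as described for data grids later in the deck) copies each write to a fixed number of backups.

```java
// Toy cost model (an assumption for illustration, not measured data).
class ReplicationCost {

    // Full replication: every write is copied to every other node,
    // so the copy count grows with the size of the cluster.
    static long fullReplicationCopies(int nodes, long writes) {
        return writes * (nodes - 1);
    }

    // Partitioned layout with a fixed number of backups:
    // copies per write stay constant as nodes are added.
    static long partitionedCopies(int backups, long writes) {
        return writes * backups;
    }

    public static void main(String[] args) {
        long writes = 1_000;
        for (int nodes : new int[] {2, 4, 8, 16}) {
            System.out.println(nodes + " nodes: full replication copies="
                    + fullReplicationCopies(nodes, writes)
                    + ", partitioned (1 backup) copies="
                    + partitionedCopies(1, writes));
        }
    }
}
```

Under these assumptions the replication traffic of a fully replicated cache grows linearly with the node count for the same workload, which is why adding nodes does not add capacity at the same rate.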
  • 14. Clustered Caches Drawbacks
      • Operates only on simple and serializable types
      • Mutated objects have to be returned to the cache
      • Coarse-grained (the whole object is replicated)
      • Can’t handle object graphs
          • Serialization issue
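Why mutated objects have to be returned to the cache follows from serialization: the caller gets a copy, not the cached instance. The following is a minimal single-process sketch using only standard Java serialization; `CopyOnReadCache` and `Counter` are hypothetical names invented for this illustration and do not model any particular product.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache that stores values in serialized form,
// as a serialization-based clustered cache would ship them between nodes.
class CopyOnReadCache {

    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    private final Map<String, byte[]> store = new HashMap<>();

    // Serializes the whole object: replication is coarse-grained.
    void put(String key, Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        store.put(key, bytes.toByteArray());
    }

    // Returns a fresh deserialized copy on every call, or null if absent.
    Object get(String key) {
        byte[] data = store.get(key);
        if (data == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CopyOnReadCache cache = new CopyOnReadCache();
        cache.put("hits", new Counter());

        Counter copy = (Counter) cache.get("hits");
        copy.value++; // mutates the local copy only
        System.out.println("before put-back: " + ((Counter) cache.get("hits")).value);

        cache.put("hits", copy); // the change becomes visible only after returning it
        System.out.println("after put-back: " + ((Counter) cache.get("hits")).value);
    }
}
```

The same mechanism explains the other bullets: only serializable types can be stored, and the whole object is rewritten even when a single field changes.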
  • 15. Evolution of Clustering in Java
      • Spring, JBoss micro-container, Pico container and others brought the POJO to the enterprise world
      • The rise of the POJO standardized the clustering services
      • The clustering market is on fire
  • 16. From Cache to Grid Computing
      • “Caches” are out, “Grids” are in…
      • So what is “Grid Computing”?
      • There is no technology called “Grid Computing”
  • 17. From Cache to Grid Computing
      • A definition of a set of distributed computing use cases that have certain technical aspects in common
          • Data Grids
          • Computational Grids
          • On-Demand Grids
      • The first two are relevant for clustering Java Enterprise applications
          • On-Demand Grid is about leasing computing time
  • 18. Grid Types
  • 19. Data Grids
  • 20. Data Grids
      • Split lots of data into subsets
      • Each node gets only the subset of data it currently needs
      • Combine results from the different nodes
      • Also natural fail-over
          • State replication
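The data grid idea (each node holds a subset, with state replication for fail-over) can be sketched in a single process. This is an illustration under stated assumptions, not how any specific product works: `PartitionedGrid` is a hypothetical class, in-memory maps stand in for nodes, keys are placed by hash, and each entry gets exactly one backup on the next node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-process model of a partitioned data grid.
class PartitionedGrid {

    // Each map stands in for the memory of one node.
    private final List<Map<String, String>> nodes = new ArrayList<>();

    PartitionedGrid(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // The primary owner is chosen by hashing the key.
    int primaryFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    // Assumption: one backup copy, kept on the next node in the ring.
    int backupFor(String key) {
        return (primaryFor(key) + 1) % nodes.size();
    }

    void put(String key, String value) {
        nodes.get(primaryFor(key)).put(key, value);
        nodes.get(backupFor(key)).put(key, value);
    }

    // Reads go to the primary and fall back to the backup.
    String get(String key) {
        String value = nodes.get(primaryFor(key)).get(key);
        return value != null ? value : nodes.get(backupFor(key)).get(key);
    }

    // Simulates a node crash by dropping everything it held.
    void failNode(int index) {
        nodes.get(index).clear();
    }

    // Number of nodes currently holding the key.
    int copiesOf(String key) {
        int copies = 0;
        for (Map<String, String> node : nodes) {
            if (node.containsKey(key)) {
                copies++;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        PartitionedGrid grid = new PartitionedGrid(4);
        grid.put("user:42", "Alice");
        System.out.println("copies: " + grid.copiesOf("user:42"));
        grid.failNode(grid.primaryFor("user:42"));
        System.out.println("after primary failure: " + grid.get("user:42"));
    }
}
```

A real grid would also rebalance partitions and create a new backup after a failure; the sketch leaves that out to keep the placement logic visible.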
  • 21. Computational Grids
  • 22. Computational Grids
      • Split a long task into multiple sub-tasks
      • Execute each sub-task in parallel on a separate computer
      • Combine results from the sub-tasks
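The split / execute / combine steps can be sketched with the standard `java.util.concurrent` API. This is a single-JVM stand-in: threads in a pool play the role of separate computers, and the class name, the sum-of-squares workload and the chunking scheme are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: threads stand in for grid nodes.
class SplitAndCombine {

    // One sub-task: sum of i*i for i in [from, to).
    static long sumOfSquares(int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    static long parallelSumOfSquares(int n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Split: cut [0, n) into roughly equal chunks.
            int chunk = Math.max(1, (n + workers - 1) / workers);
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < n; start += chunk) {
                final int from = start;
                final int to = Math.min(n, start + chunk);
                Callable<Long> subTask = () -> sumOfSquares(from, to);
                // Execute: each sub-task runs independently of the others.
                parts.add(pool.submit(subTask));
            }
            // Combine: add up the partial results.
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSumOfSquares(1_000, 4));
    }
}
```

The sub-tasks share no state and receive everything they need as parameters, which is what makes them easy to move to other machines.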
  • 23. Functional Languages and Grids
      • Functional languages are considered the best tool for grid programming
      • Full statelessness
      • Isolated functions
          • Get all the needed data via parameters
      • Scala compiles to JVM bytecode
          • www.scala-lang.org
  • 24. Master/Worker
  • 25. Map/Reduce
  • 26. Map/Reduce Example
      • Input for mapping:
          • <data, “two witches watch two watches; which witch watch which watch?”>
      • Map output (and reduce input):
          • <two, 1>
          • <witch, 1>
          • <watch, 1>
          • <two, 2>
          • <watch, 2>
          • <which, 1>
          • <witch, 2>
          • <watch, 3>
          • <which, 2>
          • <watch, 4>
      • Reduce output:
          • <two, 2>
          • <witch, 2>
          • <watch, 4>
          • <which, 2>
  • 27. Map/Reduce Example
      • Both map() and reduce() can be easily distributed, since they are stateless
      • Google uses its own implementation for analyzing the Internet
          • labs.google.com/papers/mapreduce.html
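The word-count example from the two slides above can be sketched in plain Java. Several things here are assumptions: the class and method names are hypothetical, the `normalize` helper is a crude stand-in for whatever stemming the slide applied (it only strips a trailing "es" so that "witches" counts as "witch"), and `map()` emits `<word, 1>` for every token and leaves all summing to `reduce()`, which is the usual formulation, rather than the running counts shown on the slide.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process word count in map/reduce style.
class WordCountMapReduce {

    // Assumption: crude normalization so "witches" and "watches"
    // are counted as "witch" and "watch", as on the slide.
    static String normalize(String word) {
        String lower = word.toLowerCase();
        return lower.endsWith("es") ? lower.substring(0, lower.length() - 2) : lower;
    }

    // map(): stateless, emits one <word, 1> pair per token.
    static List<Map.Entry<String, Integer>> map(String text) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : text.split("[^A-Za-z]+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(normalize(token), 1));
            }
        }
        return pairs;
    }

    // reduce(): sums the values per key, keeping first-seen key order.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String input = "two witches watch two watches; which witch watch which watch?";
        System.out.println(reduce(map(input)));
        // prints {two=2, witch=2, watch=4, which=2}
    }
}
```

Because neither function keeps state between calls, the input could be split across machines, mapped independently, and the pairs for each key sent to whichever machine reduces that key.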
  • 28. Java ComputeGrid Vision
      • Sun spec for Service Oriented Architectures
          • www.jini.org
      • Released in 1998(!) and totally ahead of its time
          • Didn’t make it into the J2EE spec and was pretty much abandoned
      • Basis for JavaSpaces
      • The concept is sending code over the wire
          • Pure Java
          • Code executed locally
              • No network exceptions during the execution
  • 29. Java ComputeGrid Vision
      • JavaSpaces: “Space”-based technology
          • javaspaces.org/
      • “Space” definition:
          • A place on the network to share and store objects
              • Both data and tasks
          • Associative shared memory for the network
          • Unifies storage and communications
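"Associative shared memory" means entries are looked up by content through a template rather than by address or key. The toy below models that idea in one process. It is not the JavaSpaces API: `ToySpace` is a hypothetical class, entries are plain maps instead of typed entry objects, and the operations return immediately instead of blocking with a timeout. The one convention borrowed from JavaSpaces is that a null field in a template acts as a wildcard.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical in-process model of a "space"; not the real JavaSpaces API.
class ToySpace {

    private final List<Map<String, Object>> entries = new ArrayList<>();

    // Stores a copy of the entry (data or task) in the space.
    synchronized void write(Map<String, Object> entry) {
        entries.add(new HashMap<>(entry));
    }

    // A template matches when each of its non-null fields equals the entry's field.
    private static boolean matches(Map<String, Object> template, Map<String, Object> entry) {
        for (Map.Entry<String, Object> field : template.entrySet()) {
            Object wanted = field.getValue();
            if (wanted != null && !wanted.equals(entry.get(field.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Returns a copy of the first matching entry and leaves it in the space.
    synchronized Map<String, Object> read(Map<String, Object> template) {
        for (Map<String, Object> entry : entries) {
            if (matches(template, entry)) {
                return new HashMap<>(entry);
            }
        }
        return null;
    }

    // Removes and returns the first matching entry, so only one worker gets it.
    synchronized Map<String, Object> take(Map<String, Object> template) {
        Iterator<Map<String, Object>> it = entries.iterator();
        while (it.hasNext()) {
            Map<String, Object> entry = it.next();
            if (matches(template, entry)) {
                it.remove();
                return entry;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToySpace space = new ToySpace();

        Map<String, Object> task = new HashMap<>();
        task.put("type", "task");
        task.put("payload", 21);
        space.write(task);

        Map<String, Object> template = new HashMap<>();
        template.put("type", "task");
        template.put("payload", null); // wildcard: any payload

        System.out.println("taken: " + space.take(template));
        System.out.println("second take: " + space.take(template));
    }
}
```

Writing tasks into the space and letting workers take them is how one structure serves as both storage and communication channel.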
  • 30. Implementations
  • 31. EHCache
  • 32. EHCache
      • OpenSource
          • ehcache.sourceforge.net
      • Fast
          • In-process caching
          • Asynchronous replication
      • Small
          • 110KB
      • Simple
      • RMI communication
      • Map based API
          • Incl. JCache (JSR 107) implementation
              • Never released
  • 33. EHCache Example
  • 34. GlassFish Shoal
  • 35. GlassFish Shoal
      • Backbone for GlassFish AS clustering
      • Open Source at dev.java.net
      • Can be used standalone
      • Group Management Service (GMS) centric
      • GMS Themes
          • Group Sensory-Action Theme
              • Lifecycle notifications
          • Group Communication Theme
              • Group communications provider SPI
                  • JXTA (default)
                  • Can plug in JGroups instead
              • Group communications API
              • Send and receive plain messages
          • Shared or Distributed Storage Theme
              • Map implementation
              • Concurrent
  • 36. JXTA Usage in Shoal
  • 37. Oracle Tangosol Coherence
  • 38. Oracle Tangosol Coherence
      • DataGrid
      • Fast!
      • Planned as clustering backbone for Oracle AS
      • Can be used standalone
      • Commercial
      • Oracle product now
      • Single JAR
  • 39. Coherence Data Grid
  • 40. Oracle Tangosol Coherence
      • “Organic cluster”: all the nodes are equal
      • Partitioned Topology
          • Every node holds a subset of the data
              • Replicated for fail-over
      • Replicated Topology
          • Behaves like a cache
              • Every node holds all the data
      • Fast elimination (no voting)
  • 41. Oracle Tangosol Coherence
      • Supports queries and indices
      • Map interface implementation
      • Lifecycle listeners
      • Drawbacks
          • Usual cache drawbacks
          • Closed source
          • Costly
  • 42. JBoss POJO Cache
  • 43. JBoss POJO Cache
      • Subproject of JBossCache
          • Clustering backbone of JBoss AS
      • OpenSource at JBoss labs
          • http://labs.jboss.com/jbosscache
      • Transactional
      • Bytecode instrumented POJOs
      • Don’t have to be serializable
  • 44. JBoss POJO Cache
      • Fine-grained replication
      • Graphs are allowed
      • Change detection
      • POJOs need to be annotated and attached to the cache
      • Tree implementation
      • JGroups communication
  • 45. JBoss POJO Cache Usage
  • 46. JGroups Configuration
  • 47. GigaSpaces
  • 48. GigaSpaces
      • JavaSpaces implementation
      • gigaspaces.com
      • OpenSpaces
          • JavaSpaces implementation
          • Spring configuration
          • OpenSource
      • Enterprise DataGrid
          • Map interface
          • Queries
          • Lifecycle listeners
          • Etc.
          • Commercial
  • 49. GigaSpaces XAP
      • XAP: eXtreme Application Platform
          • Kind of application server
          • Processing Units have a strongly defined directory structure (like a container)
          • Total solution
      • Relies on “OpenSpaces”
      • Commercial
          • Special free license for start-ups
  • 50. GigaSpaces XAP
  • 51. OpenTerracotta
  • 52. OpenTerracotta
      • The JVM takes care of cross-platform concerns, garbage collection, threading, etc.
      • Terracotta pushes the clustering concern down into the JVM as well
  • 53. OpenTerracotta
      • Clustered JVM semantics
      • OpenSource
          • terracotta.org
      • Network Attached Memory
          • Looks like RAM to the application
          • Runs both at the JVM level (JVM plugin) and as a separate process
              • Two-level cache
  • 54. JVM Level Simulation
      • JVM abstracts multi-platform concerns
      • It should also abstract multi-node concerns
          • Terracotta adds that to the JVM
      • Simulation of single JVM semantics:
          • Garbage collection
          • References
          • Thread synchronization
          • Object identity
  • 55. OpenTerracotta
      • Bytecode instrumentation is used to mimic JVM behavior
      • Currently supports only Sun’s JVM
          • Support for IBM and JRockit planned soon
      • Features
          • Low development impact: no in-advance clustering planning needed
          • Linear scalability
          • No APIs
          • Declarative: marking what is clustered
          • No serialization
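The "No APIs" and "thread synchronization" points imply that application code stays ordinary Java. The sketch below is plain single-JVM code with a hypothetical `SharedCounter` class and no Terracotta calls at all. The assumption, taken from the slides rather than verified here, is that declaring such an object as clustered would extend the same `synchronized` semantics across nodes; that declaration is not shown because its format is not given in the deck.

```java
// Plain Java with ordinary monitor locking; nothing here is Terracotta-specific.
class SharedCounter {

    private int hits;

    // Standard Java synchronization guards the shared state.
    synchronized void hit() {
        hits++;
    }

    synchronized int hits() {
        return hits;
    }

    // Runs several threads against one counter and returns the final value.
    static int runDemo(int threads, int hitsPerThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < hitsPerThread; n++) {
                    counter.hit();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.hits();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(2, 1_000));
    }
}
```

Contrast this with the serialization-based caches earlier in the deck, where the object would have to be fetched, changed and explicitly put back.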
  • 56. OpenTerracotta Architecture
      • The Client Nodes: run on a standard JVM
          • Terracotta is installed into the JVM
      • The Terracotta Server Cluster: provides the clustering intelligence
          • Each server is a Java process
          • One Active Server
          • One or many Passive Servers
      • Shared Storage: shares the state with the passive server(s)
      • The server/client architecture is considered by some to be a drawback of Terracotta
  • 57. Terracotta Client/Server
  • 58. Terracotta Demo
  • 59. Other Solutions
      • GridGain: map/reduce computation grid
          • gridgain.com
      • Hadoop: map/reduce Java implementation
          • lucene.apache.org/hadoop
      • Globus Toolkit: Open Grid Services Architecture RI
          • globus.org
  • 60. Conclusion
      • Cache, Data grid, Compute grid or Clustered VM?
      • Open source or commercial?
      • API driven or API less?
      • Container or JAR?
  • 61. Q&A
