Waters Grid & HPC Course

This is the course that was presented by James Liddle and Adam Vile for Waters in September 2008.

The book of this course can be found at: http://www.lulu.com/content/4334860

Published in: Technology
  • Waters Grid & HPC Course

    1. 1. Data cache Dealing with compromise <ul><ul><li>Dr Adam Vile </li></ul></ul><ul><ul><li>Head of Grid, HPC and Technical Computing </li></ul></ul>http://www.excelian.com <ul><ul><li>Jim Liddle </li></ul></ul><ul><ul><li>CEO Jana Technology Services </li></ul></ul><ul><ul><li>Dr Adam Vile </li></ul></ul><ul><ul><li>Head of Grid, HPC and Technical Consulting </li></ul></ul><ul><ul><li>Excelian </li></ul></ul>
    2. 2. Agenda <ul><li>We are running 2 connected sessions about data grid today </li></ul><ul><li>Introductions </li></ul><ul><li>Brainstorm: Objectives </li></ul><ul><li>Session 1 – Moving data on the grid </li></ul><ul><li>Break </li></ul><ul><li>Session 2 – Building a Data Grid </li></ul><ul><li>Summary and wrap up </li></ul>http://www.excelian.com
    3. 3. Who are we, and why are we here? Introductions http://www.excelian.com
    4. 4. Possible Objectives <ul><li>Some suggested objectives </li></ul><ul><ul><li>Be aware of the variety of approaches to moving data around a large distributed system </li></ul></ul><ul><ul><li>Understand the limitations and benefits of these approaches </li></ul></ul><ul><ul><li>Understand the different data cache topologies and replication strategies </li></ul></ul><ul><ul><li>Know the compromises that must be made in combining scalability, low latency and data movement on a grid </li></ul></ul><ul><ul><li>Understand, at a high level, which architectures and topologies are appropriate for each problem </li></ul></ul><ul><ul><li>Understand the data centre requirements to meet growth in grids in relation to data </li></ul></ul><ul><ul><li>Have a view on utility computing and its applicability for performance and efficiency in compute and data grid </li></ul></ul><ul><li>We are flexible, so let's focus on some of the things you are interested in </li></ul>http://www.excelian.com
    5. 5. Session 1 – Moving Data on the grid Agenda <ul><li>Session 1 – Moving data on the grid </li></ul><ul><li>Presentation: Approaches to data movement on the grid </li></ul><ul><li>Brainstorm: Data storage and movement use cases </li></ul><ul><li>Break </li></ul><ul><li>Presentation: Data cache topologies - issues for scaling data </li></ul><ul><li>Brainstorm: Data cache: scenarios, use cases and applications </li></ul><ul><li>Summary </li></ul>http://www.excelian.com
    6. 6. Brainstorm Objectives http://www.excelian.com
    7. 7. Presentation Approaches to data movement on the grid http://www.excelian.com
    8. 8. Compute grid - Where are we? <ul><li>(Compute) Grid has addressed a set of needs of the finance industry: </li></ul><ul><ul><li>More (and more) resource </li></ul></ul><ul><ul><li>Scalability </li></ul></ul><ul><ul><li>Robustness </li></ul></ul><ul><ul><li>Higher utilisation </li></ul></ul><ul><ul><li>Control of the hardware cost base </li></ul></ul><ul><li>Putting this in context – Grid has enabled business: </li></ul><ul><ul><li>Pricing and risking of more complex instruments </li></ul></ul><ul><ul><li>Pricing and risking of more instruments </li></ul></ul><ul><ul><li>Making completion of risk runs overnight and T0 P&L a reality </li></ul></ul><ul><ul><li>Keeping up with increased volumes </li></ul></ul><ul><li>There is a set of new issues to address </li></ul><ul><ul><li>Scalability is not unlimited (cf. Amdahl's law) </li></ul></ul><ul><ul><ul><li>As the grid gets wider, data movement becomes a problem </li></ul></ul></ul><ul><ul><li>Low latency requirements cannot be satisfied by grid </li></ul></ul>
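The reference to Amdahl's law can be made concrete with a short calculation. This is a minimal sketch: the function name and the 95% parallel fraction are illustrative assumptions, not figures from the course.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part never shrinks; only the parallel part is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed workload: 95% of the run parallelises across compute engines.
    for engines in (10, 100, 1000, 10000):
        print(f"{engines:>6} engines -> speedup {amdahl_speedup(0.95, engines):.1f}x")
```

With a 5% serial portion the speedup can never exceed 20x however wide the grid becomes, which is the sense in which scalability "is not unlimited".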
    9. 9. Scaling the compute grid <ul><li>Compute problems in finance are embarrassingly distributed </li></ul><ul><li>To achieve maximum scalability the compute time must outweigh any grid overhead, which is made up of: </li></ul><ul><ul><li>Resource allocation </li></ul></ul><ul><ul><li>Task transfer time </li></ul></ul><ul><ul><li>Task start-up time </li></ul></ul><ul><ul><li>Data transfer time </li></ul></ul><ul><li>To make grid work effectively communication must be kept to a minimum. Hence tasks should be: </li></ul><ul><ul><li>Independent </li></ul></ul><ul><ul><ul><li>data is not shared between tasks </li></ul></ul></ul><ul><ul><li>Stateless </li></ul></ul><ul><ul><ul><li>data is not persistent on compute engines </li></ul></ul></ul><ul><ul><li>While desirable, this is not always possible </li></ul></ul><ul><li>The problem of scalability becomes a problem of how to get the data to the right place, at the right time </li></ul>
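One way to read the overhead list above is as a simple efficiency ratio per task. The sketch below is a simplified model under the assumption that the four overheads are additive and not overlapped with compute; the function name and the timings are hypothetical.

```python
def grid_efficiency(compute_s: float,
                    allocation_s: float,
                    task_transfer_s: float,
                    startup_s: float,
                    data_transfer_s: float) -> float:
    """Fraction of a task's wall-clock time spent on useful compute."""
    overhead_s = allocation_s + task_transfer_s + startup_s + data_transfer_s
    return compute_s / (compute_s + overhead_s)


if __name__ == "__main__":
    # Same overhead (1 second in total), different task granularity.
    for compute_s in (0.5, 9.0, 99.0):
        eff = grid_efficiency(compute_s, 0.25, 0.25, 0.25, 0.25)
        print(f"compute {compute_s:>5.1f}s -> efficiency {eff:.0%}")
```

In this model, short tasks are dominated by overhead, which is why compute time needs to outweigh the grid overhead before adding engines pays off.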
    10. 10. Getting the data to the right place at the right time <ul><li>The key to reducing job turnaround time is to ensure that data and compute have a good locality of reference </li></ul><ul><li>Data movement patterns for grid </li></ul><ul><ul><li>Move the data with the compute </li></ul></ul><ul><ul><ul><li>Data and compute task are packaged together in the client and then distributed by the grid. </li></ul></ul></ul><ul><ul><ul><li>This is fine for small grids and small data packets (KB to low MBs) </li></ul></ul></ul><ul><ul><li>Move the data and the compute to the same place at the same time </li></ul></ul><ul><ul><ul><li>This is difficult to achieve currently as it requires communication between grid and data delivery vendor software </li></ul></ul></ul><ul><ul><li>Move the data to where the compute is </li></ul></ul><ul><ul><ul><li>Fine for smaller size data packets </li></ul></ul></ul><ul><ul><ul><li>Typically achieved by use of a shared file system, which has limited scale </li></ul></ul></ul><ul><ul><li>Move the compute to where the data is </li></ul></ul><ul><ul><ul><li>Often the most efficient and achievable </li></ul></ul></ul><ul><ul><ul><li>A good use case for this pattern is calibration </li></ul></ul></ul>
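The "move the compute to where the data is" pattern can be sketched as a placement step that groups tasks by the node already holding their data. This is an illustration only: the task, dataset and node names are hypothetical, and a real grid scheduler would also weigh load and availability.

```python
from collections import defaultdict


def place_tasks(tasks, data_location):
    """Group task ids by the node that already holds each task's data set.

    tasks:         iterable of (task_id, dataset_name) pairs
    data_location: mapping of dataset_name -> node name
    """
    placement = defaultdict(list)
    for task_id, dataset in tasks:
        placement[data_location[dataset]].append(task_id)
    return dict(placement)


if __name__ == "__main__":
    # Hypothetical calibration tasks, each needing one market data set.
    tasks = [("calib-1", "usd_curve"), ("calib-2", "eur_curve"), ("calib-3", "usd_curve")]
    data_location = {"usd_curve": "node-a", "eur_curve": "node-b"}
    print(place_tasks(tasks, data_location))
```

Only the small task description travels over the network; the data sets stay where they are.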
    11. 11. Data movement mechanisms – file systems <ul><li>Shared file systems </li></ul><ul><ul><li>Are at best a temporary solution </li></ul></ul><ul><ul><li>Performance and scalability issues </li></ul></ul><ul><ul><li>Eventually create a network bottleneck </li></ul></ul><ul><ul><li>A low number of simultaneous reads supported (500) </li></ul></ul><ul><li>Parallel file systems </li></ul><ul><ul><li>Good for large amounts of data (> 50 GB) </li></ul></ul><ul><ul><li>Simple, well understood interface </li></ul></ul><ul><ul><li>Good scalability (4026 nodes on GPFS) </li></ul></ul><ul><li>File system limitations </li></ul><ul><ul><li>Single point of failure and contention (although there is clustering) </li></ul></ul><ul><ul><li>Disk based, increasing read and write time </li></ul></ul><ul><ul><li>Limited support for Windows (for parallel file systems) </li></ul></ul><ul><ul><li>Limited support for file replication across regions </li></ul></ul><ul><ul><li>Infrastructure, rather than application, centric </li></ul></ul>
    12. 12. Data movement mechanisms – Data Grid <ul><li>Level 0 Data Grid </li></ul><ul><ul><li>Distribute large sets of static data to compute nodes </li></ul></ul><ul><ul><li>Focus is on moving and sharing Terabytes to Petabytes of data </li></ul></ul><ul><ul><li>Examples – CERN (High energy physics), ROADNet (Real time observatories) </li></ul></ul><ul><ul><li>Typically level 0 data grids rely on a Storage Resource Broker (SRB) – middleware that provides an interface to heterogeneous data storage resources over a network </li></ul></ul><ul><ul><ul><li>Supports shared file systems, databases, real time data sources etc. </li></ul></ul></ul><ul><li>Level 1 Data Grid </li></ul><ul><ul><li>Distributes and manages dynamic data over large sets of compute nodes </li></ul></ul><ul><ul><li>Supports transactions and events </li></ul></ul><ul><ul><li>The focus is on ensuring that data is available in near real time </li></ul></ul><ul><ul><li>Examples – Real time pricer </li></ul></ul><ul><ul><li>Technology – Data Cache </li></ul></ul>
    13. 13. What do you get in a level 1 Data Grid technology? <ul><li>Access methods </li></ul><ul><ul><li>Map and/or minimal SQL interface </li></ul></ul><ul><li>Management capabilities and policies </li></ul><ul><ul><li>Data integrity </li></ul></ul><ul><ul><li>Data recoverability </li></ul></ul><ul><li>Event notification </li></ul><ul><li>Transactional support </li></ul><ul><ul><li>Can support distributed transactions </li></ul></ul><ul><ul><li>Both two phase commit and compensating transactions </li></ul></ul><ul><li>Synchronization of data </li></ul><ul><ul><li>Optimistic vs pessimistic (locking / version control) </li></ul></ul><ul><ul><li>Synchronous or asynchronous </li></ul></ul><ul><ul><li>Peer to peer or centralized </li></ul></ul>
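The optimistic versus pessimistic distinction can be illustrated with two small in-process classes. This is a sketch, not any vendor's API: the class and method names are hypothetical, and a real data grid would apply the same ideas across nodes rather than threads.

```python
import threading


class VersionedCache:
    """Optimistic: writers state the version they read; stale writes are rejected."""

    def __init__(self):
        self._data = {}                  # key -> (version, value)
        self._guard = threading.Lock()   # protects the dict itself, held only briefly

    def get(self, key):
        with self._guard:
            return self._data.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        with self._guard:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False             # someone else updated first; caller must retry
            self._data[key] = (current_version + 1, value)
            return True


class PessimisticCache:
    """Pessimistic: the lock is held for the whole read-modify-write."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def update(self, key, fn):
        with self._lock:
            self._data[key] = fn(self._data.get(key))
            return self._data[key]


if __name__ == "__main__":
    cache = VersionedCache()
    version, _ = cache.get("px")
    print(cache.put_if_version("px", version, 100.0))   # first writer wins
    print(cache.put_if_version("px", version, 101.0))   # stale version is rejected
```

The optimistic form avoids holding locks while a client works, at the cost of retries under contention; the pessimistic form never retries but serialises all writers.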
    14. 14. A Word on Data Cache technology! <ul><li>Used to reduce contention on Database </li></ul><ul><li>Used to handle transient data state </li></ul><ul><ul><li>Not all data needs to be persisted </li></ul></ul><ul><li>Used to increase performance and reduce latency to read and write data </li></ul><ul><ul><li>Locate data near to computation to be performed </li></ul></ul><ul><ul><li>Held in memory for speed </li></ul></ul><ul><li>Represents the state of data at a point in time </li></ul>http://www.excelian.com
    15. 15. Brainstorm <ul><li>What use cases are supported by the following technologies in relation to grid and large scale distribution of data: </li></ul><ul><ul><li>A central database </li></ul></ul><ul><ul><li>Replicated databases </li></ul></ul><ul><ul><li>Shared file system </li></ul></ul><ul><ul><li>GridFTP </li></ul></ul><ul><ul><li>Data Grid </li></ul></ul><ul><ul><li>Data Cache </li></ul></ul>http://www.excelian.com
    16. 16. You say Compute Grid I say Data Grid! <ul><li>Often used interchangeably to describe distributed computing, but they are not the same </li></ul><ul><ul><li>If compute power is the limiting resource then a Compute Grid is needed </li></ul></ul><ul><ul><ul><li>e.g. DataSynapse, Platform </li></ul></ul></ul><ul><ul><ul><li>e.g. computing a Monte Carlo simulation </li></ul></ul></ul><ul><ul><li>If access to data or computing lots of data is the limiting resource then a Data Grid is needed </li></ul></ul><ul><ul><ul><li>Combination of Compute Grid and Cache or could be a single product </li></ul></ul></ul><ul><ul><ul><ul><li>Oracle Coherence, Gemstone Gemfire, GigaSpaces XAP </li></ul></ul></ul></ul><ul><ul><ul><ul><li>e.g. foreign currency exchange </li></ul></ul></ul></ul>http://www.excelian.com
    17. 17. Compute Grid Topology: Master Worker http://www.excelian.com
    18. 18. Caching Topologies <ul><li>Embedded Local Cache </li></ul>http://www.excelian.com Master / Local Cache <ul><ul><li>Load </li></ul></ul>
    19. 19. Caching Topologies <ul><li>Master / Local </li></ul>http://www.excelian.com Master / Local Cache <ul><ul><li>Load </li></ul></ul><ul><ul><li>Load on Demand </li></ul></ul><ul><ul><li>Data Tier </li></ul></ul><ul><ul><li>Load </li></ul></ul><ul><ul><li>Read on Demand </li></ul></ul><ul><ul><li>Read on Demand </li></ul></ul>
    20. 20. Caching Topologies <ul><li>Replicated Cache </li></ul>http://www.excelian.com <ul><ul><li>Put </li></ul></ul><ul><ul><li>Put </li></ul></ul><ul><ul><li>Get </li></ul></ul><ul><ul><li>Get </li></ul></ul>
    21. 21. Caching Topologies <ul><li>Partitioned Cache </li></ul>http://www.excelian.com <ul><ul><li>Put </li></ul></ul><ul><ul><li>Put </li></ul></ul><ul><ul><li>Get </li></ul></ul>
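The difference between the replicated and partitioned topologies can be sketched with plain dictionaries standing in for cache nodes. This is an in-process illustration under stated assumptions: the class names are hypothetical, the hash-modulo partitioning is one simple choice among many, and real products add backups, rebalancing and network transport.

```python
import zlib


class ReplicatedCache:
    """Every put is copied to every node; any node can serve a get locally."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def put(self, key, value):
        for node in self.nodes:          # write cost grows with the node count
            node[key] = value

    def get(self, key, node_index=0):
        return self.nodes[node_index].get(key)


class PartitionedCache:
    """Each key lives on exactly one owning node, chosen by a stable hash."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def _owner(self, key):
        # crc32 is stable across processes, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key):
        return self.nodes[self._owner(key)].get(key)


if __name__ == "__main__":
    replicated, partitioned = ReplicatedCache(3), PartitionedCache(3)
    replicated.put("trade-1", 100)
    partitioned.put("trade-1", 100)
    print([len(n) for n in replicated.nodes])    # one copy per node
    print([len(n) for n in partitioned.nodes])   # one copy in total
```

Replication favours read-heavy data that fits on every node; partitioning lets total capacity grow with the number of nodes, at the price of a network hop for keys owned elsewhere.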
    22. 22. Caching Topologies <ul><li>Hierarchical Cache </li></ul>http://www.excelian.com <ul><ul><li>Get </li></ul></ul><ul><ul><li>Exists in cache after request </li></ul></ul>
    23. 23. Caching Topologies <ul><li>Write Through Cache </li></ul>http://www.excelian.com <ul><ul><li>Exists in cache after write </li></ul></ul>
    24. 24. Caching Topologies <ul><li>Read Through Cache </li></ul>http://www.excelian.com <ul><ul><li>If not in Cache, exists in cache after read </li></ul></ul>
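Read-through and write-through behaviour can be sketched in a few lines. This is a minimal illustration, assuming a plain dictionary as a stand-in for the database or data tier; the class name and the `store_reads` counter are hypothetical.

```python
class ReadWriteThroughCache:
    """Cache in front of a backing store (a dict stands in for the database)."""

    def __init__(self, backing_store):
        self._store = backing_store
        self._cache = {}
        self.store_reads = 0             # counts how often the store is actually hit

    def get(self, key):
        # Read through: on a miss, load from the store and keep the value cached.
        # A key missing from the store raises KeyError in this sketch.
        if key not in self._cache:
            self.store_reads += 1
            self._cache[key] = self._store[key]
        return self._cache[key]

    def put(self, key, value):
        # Write through: update the store and the cache in the same call.
        self._store[key] = value
        self._cache[key] = value


if __name__ == "__main__":
    database = {"T1": 100}
    cache = ReadWriteThroughCache(database)
    cache.get("T1")
    cache.get("T1")
    print(cache.store_reads)             # the second read was served from the cache
    cache.put("T2", 250)
    print(database["T2"])                # the write reached the store as well
```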
    25. 25. Workshop 1 – The Scenario <ul><li>A bank wants to build a new Risk Application that calculates risk across all books within the global market. A common enough scenario. </li></ul><ul><li>To achieve this we want to implement a distributed application that has access to real-time data. </li></ul><ul><li>The business wants the system to be scalable enough to cope with all current deal scenarios but also to be able to cope with 5 times the volume growth over the next 3 years. </li></ul><ul><li>We have 4 different topologies we could use to approach this </li></ul><ul><ul><li>What are the pros / cons of each? </li></ul></ul><ul><ul><li>Are there any more topologies we could use? </li></ul></ul>http://www.excelian.com
    26. 26. Topology 1 http://www.excelian.com
    27. 27. Topology 2 http://www.excelian.com
    28. 28. Topology 3 http://www.excelian.com
    29. 29. Topology 4 http://www.excelian.com
    30. 30. Workshop 2 <ul><li>The business wants a new Trading client system to allow "traders" to monitor the market and submit trades. </li></ul><ul><li>The read/write ratio is extremely high </li></ul><ul><li>Events have to be delivered in as close to real-time as possible. </li></ul><ul><li>There are three trading locations in which data is monitored and trades are executed: London, New York and Singapore </li></ul><ul><li>The current approach mostly uses messaging (IIOP, JMS, Sockets) to implement the system, but it suffers from broadcast issues and is difficult to scale </li></ul><ul><li>What are the challenges of designing such a system and how could you implement it using a caching based solution? </li></ul>
    31. 31. We started you off http://www.excelian.com
    32. 32. Challenges: Bandwidth (diagram: NY and London linked by 10Mbs replication) <ul><li>Solution: </li></ul><ul><li>Batching </li></ul><ul><li>Compression </li></ul><ul><li>Async replication </li></ul><ul><li>Data is kept local </li></ul><ul><li>Updates are local, based on ownership </li></ul>
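Batching and compression, two of the measures listed for a constrained inter-site link, can be sketched with the Python standard library. The function names, the JSON encoding and the shape of the update records are assumptions made for illustration; they are not taken from the course or from any vendor product.

```python
import json
import zlib


def pack_batch(updates):
    """Serialise a list of updates as one message and compress it for the WAN link."""
    payload = json.dumps(updates, separators=(",", ":")).encode("utf-8")
    return zlib.compress(payload)


def unpack_batch(blob):
    """Inverse of pack_batch, run at the receiving site."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical cache updates queued for asynchronous replication to another site.
    updates = [{"key": f"trade-{i}", "px": 100.0 + i} for i in range(500)]
    raw_size = len(json.dumps(updates).encode("utf-8"))
    packed_size = len(pack_batch(updates))
    print(f"{raw_size} bytes uncompressed, {packed_size} bytes as one compressed batch")
```

Sending one compressed batch instead of many small messages reduces both per-message overhead and bytes on the wire, at the cost of some added replication delay.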
    33. 33. Challenges: Reliability (diagram: clusters in NY and London; sync replication within site, async replication between sites over 10Mbs)
    34. 34. Challenges: Audit of Record (diagram: NY and London, each with a DB; load and update)
    35. 35. What other challenges? http://www.excelian.com
    36. 36. Workshop 3 <ul><li>The business wants to build a new algorithmic trading application which will give them scale and performance and allow them to concentrate on the analysis and algorithms </li></ul><ul><li>What are the options for building this using caching? </li></ul><ul><li>What other products could they look at? </li></ul><ul><li>What are the pros/cons of the approaches? </li></ul>
    37. 37. Summary Synchronisation and Replication http://www.excelian.com
    38. 38. Brainstorm: <ul><li>Come up with a matrix of scenarios, topologies and replication strategies that match the following use cases: </li></ul><ul><ul><li>I want to run my risk reports on a snapshot of data taken at 5:00 pm. I run them on a grid separated between London and Singapore and collect data from both locations. </li></ul></ul><ul><ul><li>I want to cache results from intra-day pricing calculations for a period of time so that I can avoid re-calculating them if I need the price </li></ul></ul><ul><ul><li>I want to run my overnight batch on 10000 nodes and write my results back to the results database as I calculate them </li></ul></ul>http://www.excelian.com
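The second use case, keeping intra-day pricing results for a period of time, corresponds to a cache with expiry. The sketch below is one possible single-process illustration; the class name, the 60 second lifetime and the instrument key are assumptions, and a data cache product would typically offer expiry or lease settings of its own.

```python
import time


class ExpiringCache:
    """Keeps computed values for `ttl_seconds`, then recomputes on the next request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so the expiry logic can be tested
        self._entries = {}               # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # still fresh: skip the recalculation
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    prices = ExpiringCache(ttl_seconds=60)

    def slow_pricer():
        # Stand-in for an expensive pricing calculation.
        return 42.0

    print(prices.get_or_compute("swap-123", slow_pricer))
    print(prices.get_or_compute("swap-123", slow_pricer))   # served from the cache
```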
    39. 39. Session 1 ends See you back for session 2 after coffee http://www.excelian.com
    40. 40. Session 2 – Building a data grid <ul><li>Presentation: Achieving low latency </li></ul><ul><li>Brainstorm: Data cache infrastructure </li></ul><ul><li>Brainstorm: Utility and cloud computing </li></ul><ul><li>Presentation: Data Cache vendors, open source and selection criteria </li></ul><ul><li>Summary </li></ul>http://www.excelian.com
    41. 41. Achieving really low latency Presentation http://www.excelian.com
    42. 42. There are three aspects of data in a distributed architecture that are difficult to manage simultaneously <ul><li>Scalability </li></ul><ul><ul><li>Is all of the data required for all tasks or can we benefit from partitioning and cache regions? </li></ul></ul><ul><ul><li>Peer to peer replication implies unlimited scalability </li></ul></ul><ul><li>Consistency </li></ul><ul><ul><li>Does every compute task have to have the same data available? </li></ul></ul><ul><ul><li>If one task writes data to the store, does every node need that data (transactions)? </li></ul></ul><ul><ul><li>Hierarchical cache improves consistency </li></ul></ul><ul><li>Low latency </li></ul><ul><ul><li>Requires data and compute task to be in the same place at the same time </li></ul></ul><ul><ul><ul><li>A good locality of reference and/or data affinity </li></ul></ul></ul><ul><ul><li>Near cache makes access faster </li></ul></ul>
    43. 43. But what can we really do to achieve low latency at scale with consistency? <ul><li>It is all to do with what you move and how you move it </li></ul><ul><ul><li>Preferably don’t move anything anywhere : </li></ul></ul><ul><ul><ul><li>Use a big machine with loads of CPU and large amounts of onboard RAM </li></ul></ul></ul><ul><ul><ul><ul><li>Multicore </li></ul></ul></ul></ul><ul><ul><ul><ul><li>SMP </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Supercomputers </li></ul></ul></ul></ul><ul><ul><li>If you must move it: </li></ul></ul><ul><ul><ul><li>Move it as fast as you can: </li></ul></ul></ul><ul><ul><ul><ul><li>Use data caching with Infiniband, 10G Ethernet or Myrinet </li></ul></ul></ul></ul><ul><ul><ul><li>And Only move what you need to </li></ul></ul></ul><ul><ul><ul><ul><li>Reduce the granularity of data </li></ul></ul></ul></ul><ul><ul><ul><li>Once it has moved, keep it there and don’t change it </li></ul></ul></ul><ul><ul><ul><ul><li>Capitalise on Data Cache and grid capabilities to support temporal and geographical affinity </li></ul></ul></ul></ul>
    44. 44. Laying out the compromise <ul><li>“Large scale data distribution” and “near real time” are largely incompatible </li></ul><ul><li>It’s more about compromise </li></ul><ul><ul><li>Which do you want? </li></ul></ul><ul><li>Scalability X Consistency = High latency </li></ul><ul><li>Scalability X Low latency = Inconsistency </li></ul><ul><li>Consistency X Low latency = Low scalability </li></ul>
    45. 45. Brainstorm: <ul><li>Add to your matrix of scenarios the following: </li></ul><ul><ul><li>I want to drive algorithmic pricing off of a small grid (I need a small grid as some of the models would take too long to run without parallelisation) </li></ul></ul><ul><ul><li>I want to recalculate my model prices in relation to changes in market data as soon as possible using a 100 node grid. What happens if I need to use a 1000 node grid? </li></ul></ul>http://www.excelian.com
    46. 46. Discussion Physical DataGrid infrastructure http://www.excelian.com
    47. 47. Discussion <ul><li>Assume you want to take the load off a central database store that the grid compute nodes access during a calculation, and that you place a data cache in front of the database. What do you think the correct ratio of compute to data grid nodes would be for: </li></ul><ul><ul><li>Read only </li></ul></ul><ul><ul><li>Read and write </li></ul></ul><ul><li>Justify your decision </li></ul><ul><li>What physical infrastructure would you need to build out to enable this? Think about </li></ul><ul><ul><li>CPU </li></ul></ul><ul><ul><li>Memory </li></ul></ul><ul><ul><li>I/O </li></ul></ul><ul><ul><li>Network </li></ul></ul>http://www.excelian.com
    48. 48. Exploiting Cloud and Utility compute – the case for data grid? http://www.excelian.com
    49. 49. Cloud & Utility compute Overview http://www.excelian.com
    50. 50. Vendors to look out for <ul><li>CohesiveFT </li></ul><ul><ul><li>Servers as a Service </li></ul></ul><ul><ul><li>Hypervisor transformation management </li></ul></ul><ul><li>RPath </li></ul><ul><ul><li>Appliances as a service </li></ul></ul><ul><ul><li>Appliance Hypervisor transformation </li></ul></ul><ul><li>FlexiScale </li></ul><ul><ul><li>Enterprise Cloud </li></ul></ul><ul><ul><li>Based on Xen </li></ul></ul><ul><li>RightScale </li></ul><ul><ul><li>Fine tuning the cloud </li></ul></ul><ul><li>Elastra </li></ul><ul><ul><li>Deploy and manage services on public and private clouds </li></ul></ul><ul><li>vCloud </li></ul><ul><ul><li>VMware cloud computing </li></ul></ul><ul><ul><li>Operating system for the data centre </li></ul></ul><ul><li>GigaSpaces </li></ul><ul><ul><li>Scale out Application server for the Cloud </li></ul></ul>http://www.excelian.com
    51. 51. Brainstorm: <ul><li>Add to your matrix of scenarios the following: </li></ul><ul><ul><li>I want to make use of my outsourced compute facility to run grid calculations. For this I need current market and static data. My enterprise market and static data store is 70GB in size. </li></ul></ul><ul><li>Now create other scenarios and add them to your matrix </li></ul>http://www.excelian.com
    52. 52. Presentation Practical considerations – the vendors and selection criteria http://www.excelian.com
    53. 53. The Main vendors <ul><li>GigaSpaces </li></ul><ul><ul><li>XAP </li></ul></ul><ul><ul><li>Originally based on JavaSpaces </li></ul></ul><ul><ul><li>Can function as Compute Grid + DataGrid </li></ul></ul><ul><ul><li>API is Java, based on POJO and Spring </li></ul></ul><ul><ul><li>Can still use JavaSpaces API and supports .NET, C++, Scripting, JDBC </li></ul></ul><ul><li>Oracle </li></ul><ul><ul><li>Coherence, originally from a JCache implementation </li></ul></ul><ul><ul><li>Early success as a bolt-on to J2EE </li></ul></ul><ul><ul><li>API is Java, based on a distributed HashMap </li></ul></ul><ul><ul><li>Also supports .NET, C++ </li></ul></ul><ul><li>Gemstone </li></ul><ul><ul><li>Gemfire </li></ul></ul><ul><ul><li>Two versions: Java and C++ </li></ul></ul><ul><ul><li>Native C++ is a big selling point </li></ul></ul><ul><ul><li>Native C++ API but also supports .NET and JDBC </li></ul></ul><ul><ul><li>All vendors partner with DataSynapse </li></ul></ul>http://www.excelian.com
    54. 54. What is the Developers view http://www.excelian.com
    55. 55. Reading a Trade Object <ul><li>Simple Scenario to read and write a trade object using major vendors </li></ul>http://www.excelian.com
    56. 56. Reading/Writing a Trade Object GigaSpaces <ul><li>GigaSpaces </li></ul>http://www.excelian.com
    57. 57. Reading/Writing a Trade Object Gemstone <ul><li>Gemstone </li></ul>http://www.excelian.com
    58. 58. Reading/Writing a Trade Object using Coherence <ul><li>Coherence </li></ul>http://www.excelian.com
    59. 59. So what does this tell us? <ul><li>Results </li></ul><ul><ul><li>All implementations are easy to use </li></ul></ul><ul><ul><ul><li>Easy to create a cache </li></ul></ul></ul><ul><ul><ul><li>Easy to write to and read from the cache </li></ul></ul></ul><ul><ul><li>All provide in-memory implementations </li></ul></ul><ul><ul><li>All of the implementations provide ways to add indexing for fast reads </li></ul></ul><ul><ul><li>All of them provide mechanisms for advanced querying </li></ul></ul><ul><ul><ul><li>GigaSpaces: Supports SQL </li></ul></ul></ul><ul><ul><ul><li>Gemstone: Supports SQL </li></ul></ul></ul><ul><ul><ul><li>Coherence: SQL like </li></ul></ul></ul><ul><ul><li>All of them provide the ability to add listeners on cache data change </li></ul></ul><ul><ul><li>All of them provide transactions </li></ul></ul><ul><ul><li>All of them provide locking </li></ul></ul>http://www.excelian.com
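The vendor code from the preceding slides is not present in this transcript. As a vendor-neutral stand-in, the sketch below shows the general shape of the operations being compared: writing a trade object, reading it back, and registering a listener for changes. The `Trade` fields and the `ObservableCache` class are hypothetical and do not reproduce the GigaSpaces, Gemstone or Coherence APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    # Hypothetical fields chosen for illustration.
    trade_id: str
    symbol: str
    quantity: int
    price: float


class ObservableCache:
    """Map-style cache that notifies registered listeners whenever an entry changes."""

    def __init__(self):
        self._entries = {}
        self._listeners = []

    def add_listener(self, callback):
        # callback receives (key, old_value, new_value)
        self._listeners.append(callback)

    def put(self, key, value):
        old_value = self._entries.get(key)
        self._entries[key] = value
        for callback in self._listeners:
            callback(key, old_value, value)

    def get(self, key):
        return self._entries.get(key)


if __name__ == "__main__":
    cache = ObservableCache()
    cache.add_listener(lambda key, old, new: print(f"{key}: {old} -> {new}"))
    cache.put("T1", Trade("T1", "VOD.L", 100, 1.5))
    print(cache.get("T1"))
```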
    60. 60. What are the open source Data Caching choices http://www.excelian.com
    61. 61. Open Source choices <ul><li>EHCache </li></ul><ul><ul><li>Pure Java in-process cache </li></ul></ul><ul><ul><li>Acts as a pluggable cache for Hibernate 2.1 </li></ul></ul><ul><li>Cache4J </li></ul><ul><ul><li>Cache for Java objects </li></ul></ul><ul><ul><li>Simple API </li></ul></ul><ul><ul><li>Designed for multi-threading </li></ul></ul><ul><li>SwarmCache </li></ul><ul><ul><li>Simple distributed cache </li></ul></ul><ul><ul><li>Optimised for read only </li></ul></ul><ul><li>JCache </li></ul><ul><ul><li>Reference implementation of JSR-107 </li></ul></ul><ul><ul><li>JSR-107 has been static for a long time </li></ul></ul><ul><li>Memcached </li></ul><ul><ul><li>In-memory hash table </li></ul></ul><ul><ul><li>Lacks security / authentication </li></ul></ul>http://www.excelian.com
    62. 62. What are the considerations for choosing a vendor ? http://www.excelian.com
    63. 63. What should you think about when choosing a caching product? <ul><li>What topologies does it support? </li></ul><ul><li>HA / Resiliency? </li></ul><ul><li>What management and monitoring features does it have? </li></ul><ul><li>API support </li></ul><ul><li>Versioning </li></ul><ul><li>Performance </li></ul><ul><li>Scalability </li></ul><ul><li>Product and API integration </li></ul><ul><li>Replication strategies </li></ul><ul><li>Authentication / Security </li></ul><ul><li>Largest supported cluster size </li></ul><ul><li>Number of clients that can connect </li></ul><ul><li>Network requirements </li></ul><ul><ul><li>Unicast / multicast </li></ul></ul><ul><li>Is the workload read mostly, read / write or write mostly? </li></ul><ul><li>Think about the features you need now and in the future </li></ul><ul><ul><li>Collections support, lease management, queries, continuous queries etc. </li></ul></ul>http://www.excelian.com
    64. 64. Review http://www.excelian.com
