Caching principles-solutions


Published in: Technology

  1. Principles of Caching and Distributed Caching (Java). Praveen Manvi
  2. whoami: Currently working as Senior Technologist @ Thomson Reuters. 13+ years of Java development experience (portals, web services, content management...)
  3. Goals of this presentation
     • Learn the basic principles of caching and distributed caching
     • Get familiar with prominent use cases and the list of solutions available in this space
     • Be 'distributed cache' buzzword compliant
  4. Two Hard Things: "There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton. There is no more efficient way to increase throughput and scalability than caching.
  5. Why Distributed Cache: a 2-minute video from Terracotta that explains caching nicely (
  6. Some fundamentals
     • Cache: a temporary storage area where frequently accessed data can be stored for rapid access. Deciding which data counts as frequently accessed is a matter of judgment and engineering.
     • Why cache: the costs of fetching data from a CPU register, RAM, disk, and the network differ by many orders of magnitude, so keeping the most frequently used data in the closest location reduces latency. There is no better way to improve performance.
     • Memory hierarchy: the pyramid pictures in the next slides should help in remembering the memory architecture.
  7. Memory Hierarchy
  8. In terms of clock cycles
  9. RAM vs. Disk
  10. Latency Numbers
      • L1 cache reference: 0.5 ns
      • L2 cache reference: 7 ns
      • Mutex lock/unlock: 25 ns
      • Main memory reference: 100 ns
      • Send 2K bytes over 1 Gbps network: 20,000 ns
      • Read 1 MB sequentially from memory: 250,000 ns
      • Round trip within same datacenter: 500,000 ns
      • Disk seek: 10,000,000 ns
      • Read 1 MB sequentially from disk: 20,000,000 ns
      1 ns = 10^-9 seconds; 1 ms = 10^-3 seconds
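To make the gaps in the latency list concrete, here is a minimal Java sketch that divides a few of the slide's figures by one another. The class name `LatencyRatios` and its method are made up for illustration; only the nanosecond values come from the slide.

```java
// Latency figures copied from the slide, in nanoseconds.
// The class and method names are hypothetical, chosen for this sketch.
class LatencyRatios {
    static final long MAIN_MEMORY_REF_NS = 100;
    static final long DATACENTER_ROUND_TRIP_NS = 500_000;
    static final long DISK_SEEK_NS = 10_000_000;

    /** Returns how many times slower the first operation is than the second. */
    static long timesSlower(long slowNs, long fastNs) {
        return slowNs / fastNs;
    }

    public static void main(String[] args) {
        System.out.println("disk seek vs. main memory reference: "
                + timesSlower(DISK_SEEK_NS, MAIN_MEMORY_REF_NS) + "x");
        System.out.println("datacenter round trip vs. main memory reference: "
                + timesSlower(DATACENTER_ROUND_TRIP_NS, MAIN_MEMORY_REF_NS) + "x");
        System.out.println("disk seek vs. datacenter round trip: "
                + timesSlower(DISK_SEEK_NS, DATACENTER_ROUND_TRIP_NS) + "x");
    }
}
```

By these numbers a single disk seek costs as much as 100,000 main memory references, and even a round trip to another machine in the same datacenter is 20 times cheaper than a disk seek.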
  11. ...and a pictorial representation
  12. Distributed Cache
      • One machine cannot manage a huge amount of data
      • Hundreds of servers need to be treated as a single unit that manages partitions, transactions, security, and the speed of concurrent access
      • Distributed caching solutions take care of much of the hard work required by distributed programming
  13. Shared Memory in Java
      No direct support for shared memory (sharing memory across different processes)
      • Java is designed to be a (virtual) machine unto itself. It doesn't really support the idea of separate processes. It has robust support for lightweight independent execution paths through threads that share the same memory space.
      • Java's memory guarantees are a more fine-grained version of sharing memory, with type and privacy control and built-in robust concurrency features.
      • There are ways to get at shared memory through the Java NIO APIs and through JNI.
      Why memory-mapped IO
      • Memory-mapped IO lets us map a file on disk to a memory byte buffer, so that reading the buffer reads the data from the file. This avoids the overhead of a system call per read and of creating and copying buffers. More importantly, from the Java perspective, the memory buffer resides in native space and not in the JVM's heap.
      Why it's not so important for Java
      • Java is a general-purpose language. It provides a layer over memory that relieves programmers from dealing with page faults and inappropriate access to disk sectors.
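The memory-mapped IO idea from the slide can be sketched with the standard `java.nio` API. This is a minimal illustration, not a shared-memory implementation: it maps a small temporary file read-only and decodes its contents. The class and helper names (`MappedFileSketch`, `writeTempFile`, `readViaMapping`) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: read a file through a memory mapping instead of read() calls.
class MappedFileSketch {

    /** Creates a temporary file holding the given text (helper for the demo). */
    static Path writeTempFile(String text) {
        try {
            Path file = Files.createTempFile("mapped-sketch", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps the whole file read-only and decodes the mapped bytes as UTF-8. */
    static String readViaMapping(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The returned buffer is backed by the file's pages, outside the Java heap.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return StandardCharsets.UTF_8.decode(buffer).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTempFile("hello cache");
        System.out.println(readViaMapping(file));
    }
}
```

Two processes that map the same file can observe each other's writes, which is the usual route to sharing memory between JVMs without JNI. Coordinating those writes is a separate problem that this sketch does not address.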
  14. Use cases
      • Share data/state among many servers for better performance
      • Clustering of applications
      • Partition your in-memory data
      • Send/receive messages among applications on demand
      • Distribute workload onto many servers
      • Take advantage of parallel processing
      • Provide fail-safe data management
      • Provide secure communication among servers
      • Better utilization of CPU and network bandwidth
  15. ...and some economics
      • A blade with 64 GB of RAM costs ~$1.5K
      • For $30K we can have 1 TB of RAM capacity (20 blades)
      • Gartner estimates that by 2014 at least 40% of large organizations will deploy an IMDG (In-Memory Data Grid) product, with the market reaching $1 billion
  16. Caching topologies
      • Partitioned: a partitioned cache is a clustered, fault-tolerant cache that has linear scalability.
      • Replication: a replicated cache is a clustered, fault-tolerant cache where data is fully replicated to every member in the cluster.
      • Near Cache: a near cache is a hybrid cache; it typically fronts a distributed cache or a remote cache with a local cache.
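As a rough illustration of the partitioned topology, the sketch below routes each key to exactly one of several in-process maps that stand in for cluster members. The hash-modulo routing rule and the class name `PartitionedCacheSketch` are assumptions made for this example. Real products use their own partitioning schemes and keep backup copies for fault tolerance, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each HashMap stands in for one cluster member's partition.
class PartitionedCacheSketch<K, V> {
    private final List<Map<K, V>> partitions = new ArrayList<>();

    PartitionedCacheSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    /** Assumed routing rule: hash code modulo partition count (never negative). */
    int partitionFor(K key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(K key, V value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    V get(K key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    int sizeOfPartition(int index) {
        return partitions.get(index).size();
    }
}
```

Because every key lives in one partition only, adding members spreads both the data and the request load, which is where the linear scalability claim comes from. A replicated cache would instead write every entry to all of the maps.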
  17. Cache load techniques
      • Cache Through: synchronous
      • Write Behind: asynchronous
      • Read Through (lazy loading): if get(key) returns null, load the value from the backing source and cache it; otherwise return the result obtained from the cache
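The read-through and write-behind techniques can be sketched in a few lines of plain Java. This is a single-threaded illustration under stated assumptions: a `Map` stands in for the database, and `flush()` is called explicitly where a real write-behind cache would drain its queue on a background thread. All names (`LoadingCacheSketch`, `loads`, `flush`) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, non-thread-safe sketch of read-through and write-behind.
class LoadingCacheSketch<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Map<K, V> backingStore;            // stands in for a database
    private final Queue<K> dirtyKeys = new ArrayDeque<>();
    int loads = 0;                                   // trips made to the backing store

    LoadingCacheSketch(Map<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    /** Read-through: on a miss, load from the backing store and keep the value. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            loads++;
            value = backingStore.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    /** Write-behind: update the cache now, remember the key for a later store write. */
    void put(K key, V value) {
        cache.put(key, value);
        dirtyKeys.add(key);
    }

    /** Writes all queued changes to the backing store. */
    void flush() {
        K key;
        while ((key = dirtyKeys.poll()) != null) {
            backingStore.put(key, cache.get(key));
        }
    }
}
```

A synchronous variant would write to `backingStore` inside `put` itself, trading write latency for the guarantee that the store is never behind the cache.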
  18. Caching patterns
      • Minimize the number of hops to locate and access data: separate data and metadata, provide hints, and avoid cache-to-cache transfer
      • Do not slow down; cache data close to the client: location hints
      • Share data among many caches: separate data paths and metadata paths, location hints and index
  19. Cache performance characteristics
      • Application throughput/latency
      • JVM: threads, heap memory, GC; also CPU, memory, and disk at the OS level
      The main performance characteristic of a cache is its hit/miss ratio: the number of cache hits divided by the number of cache misses, using hit and miss counters accumulated over a period of time. (A closely related and commonly reported metric is the hit rate, hits divided by total lookups.) A high hit/miss ratio means that the cache performs well. A low hit/miss ratio means that the cache is applied to data that should not be cached, or that the cache is too small to capture the temporal locality of data access.
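The counters behind these metrics are simple to keep. The sketch below is a hypothetical helper (`CacheStatsSketch` is not from any library) that computes the hit/miss ratio as defined on the slide, plus the hit rate, hits / (hits + misses), for comparison.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper that accumulates hit and miss counters for a cache.
class CacheStatsSketch {
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    void recordHit() {
        hits.incrementAndGet();
    }

    void recordMiss() {
        misses.incrementAndGet();
    }

    /** Hits divided by misses, as defined on the slide; infinite while there are no misses. */
    double hitMissRatio() {
        long m = misses.get();
        return m == 0 ? Double.POSITIVE_INFINITY : (double) hits.get() / m;
    }

    /** Hits divided by total lookups; 0.0 before any lookup has been recorded. */
    double hitRate() {
        long h = hits.get();
        long total = h + misses.get();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```

With 3 hits and 1 miss, the hit/miss ratio is 3.0 and the hit rate is 0.75. The hit rate is often easier to compare across caches because it always falls between 0 and 1.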
  20. Sample usage (Hazelcast)
      A plain in-JVM map:
          Map<String, User> users = new ConcurrentHashMap<String, User>();
          users.put("praveen", new AdminUser("praveen", "yahoo123"));
          users.put("suresh", new ClientUser("suresh", "wipro123"));
      A single change does the magic, and the users become available in different JVMs and hosts:
          Map<String, User> users = Hazelcast.getMap("users");
          users.put("praveen", new AdminUser("praveen", "yahoo123"));
          users.put("suresh", new ClientUser("suresh", "wipro123"));
      (The static Hazelcast.getMap call is the API of early Hazelcast releases; later versions obtain the map from a HazelcastInstance.)
  21. Stand-alone JVM caching
      The JVM's built-in mechanisms for handling references are important concepts to understand.
      • Soft Reference
      • Weak Reference
      Ex: the MapMaker class from the Guava library.
          Map<Key, Graph> graphs = new MapMaker()
              .concurrencyLevel(4)
              .weakKeys()
              .maximumSize(10000)
              .expireAfterWrite(10, TimeUnit.MINUTES)
              .makeComputingMap(
                  new Function<Key, Graph>() {
                      public Graph apply(Key key) {
                          return createExpensiveGraph(key);
                      }
                  });
      (Later Guava releases moved this functionality from MapMaker to CacheBuilder.)
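The soft-reference idea can also be shown with the JDK alone. The sketch below is a hypothetical minimal cache (`SoftValueCacheSketch` is not a library class) whose values are held through `SoftReference`, so the garbage collector may reclaim them under memory pressure. A reclaimed value is simply recomputed on the next lookup.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: values are softly reachable, so the GC may clear them
// when memory runs low; a cleared value is recomputed by the loader.
class SoftValueCacheSketch<K, V> {
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SoftValueCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();   // null if never loaded or cleared by GC
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

A `WeakReference` would be used the same way but is cleared much more eagerly, as soon as no strong reference to the value remains, which makes it a better fit for canonicalizing maps than for caches. This sketch does not remove map entries whose references have been cleared; a fuller version would use a `ReferenceQueue` for that.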
  22. Distributed Caching Solutions
  23. Some non-Java-based options
  24. Job Trends
  25. Questions?