Data Grids for Extreme Performance, Scalability and Availability JavaOne 2011 Steve Millidge
Published in: Technology, Business
  • If you don’t have these non-functional requirements then no amount of whizzy functionality will win over customers. Take each one in turn.
  • 97% availability means about 11 days of downtime a year.
  • Two key techniques. Redundancy is doubling up the components, thereby reducing the failure rate. Decoupling is removing the lifecycle dependency for service availability through messaging, caching, etc.
  • Algorithmic performance can be helped with profilers etc.; in my experience it is very rarely the performance issue in a production system. Resource limitations require tuning and the ability to scale yourself out of the problem (see Scalability); if you can’t scale you will hit a wall. Resource contention requires careful tuning, as you can’t scale your way out of this problem; you must architect and design your system to
  • Latency factors in distributed applications. Pinging a server in London could take 60ms, whereas the US could take 150ms. Mobile networks are more unreliable, and data rates to the same client can fluctuate rapidly. Data size: shifting large amounts of data takes longer than small amounts. Making many fine-grained calls means you hit the network latency many times, whereas one or two large calls reduce round trips but may increase data latency. Contention on resources like locks.
  • Obviously we all want increased volume so that we know our service is successful. Mobile has massively increased the load on some systems because always-on users now access systems when they never would have before (e.g. the National Rail website). Periodic load variation throughout the day. Cloud computing, whereby servers can be started on demand, enables elastic scaling; remember your capacity planning and HA requirements.
  • Scale Out means adding more and more servers and nodes into your cluster using a load balancer. Scale Up means adding more CPUs and memory to a single server.
  • Typically Scalability is non-linear due to other limits in the system – Database, Network, Concurrency Hot Spots
  • Goal is for the load balancer to redirect to “any” node as they are all homogeneous. Sometimes sticky sessions are used to redirect to the state holder for non-critical state.
  • Pseudo-stateless services give typical non-linear scalability issues as the database must be scaled.
  • INCREASES AVAILABILITY VIA DECOUPLING
  • REDUCES LATENCY BY MOVING DATA CLOSER TO PROCESSING
  • CLUSTERING REQUIRED FOR HA AND SCALABILITY !! Data is not in Node B so need to do Read Through Caching. Cluster Cache needs to be preloaded. Subsequent requests will use cached data.
  • GIVES SCALABILITY
  • NOT HA. ADD DUPLICATES FOR HA.
  • Not a lot of hardware! Some customers I know (not necessarily Coherence) are looking at 200 hardware nodes with 32Gb RAM each for some big data clusters.
  • Often “Off Heap Storage”. Not a database!
  • Consider what needs to be truly persistent: financial records; customer details and accounts. Don’t need: who’s sitting at what table; what cards they have been dealt; how much they raised the bet; what positions they took up.
  • Without computation you just have a big cache! Good but not radical! Very expensive to pull all the data across the grid.
  • Moves the processing to the data, not the other way around! Much more efficient, as the processor will likely have a small amount of data associated with it whereas the cache size may be very large! Massively REDUCES LATENCY through not sending the data. Massively INCREASES PARALLELISM. Can REDUCE LOCK overhead as lock acquisition is local.
  • Coherence supports programmatic querying of the grid. CohQL allows straight queries but there is also a programmatic API. CohQL was new in Coherence 3.6 (July 2010).
  • Used for chat. Could be used for new trades for a trader, or bids or offers on a security. Asynchronous notification. Management.
  • Huge scalable grid with huge data storage capacity, backed by a database with Write Behind Processing. Events to PUSH data to clients via Web Sockets or Ajax. Parallel computation and updates. Write Behind Processing for async storage. Elastically expand capacity through rebalancing the partitions.
  • THEORETICAL; in practice may hit network limits.
  • Transcript

    • 1. Data Grids for Extreme Performance, Scalability and Availability. Steve Millidge, Director, C2B2. © C2B2 Consulting Limited 2011 www.c2b2.co.uk All Rights Reserved
    • 2. “Reliability, Availability, Scalability and Performance are prerequisites for functionality!” They are Priority 1 Requirements.
    • 3. Availability
        • System is available for customers to use
        • No availability results in no transactions
        • Transactions = $$$
        • Receive your Pink Slip if you can’t sort it!
    • 4. Multipliers in Availability
        [Diagram: System 1, System 2 and System 3 in series, each with 99% availability]
        Overall Availability = 0.99 × 0.99 × 0.99 = 97%
    • 5. HA Techniques
        • Redundancy: each of Systems 1, 2 and 3 is a pair of 99% available systems
            – Pair = 1 – (0.01 × 0.01) = 99.99%
            – Overall = 0.9999 × 0.9999 × 0.9999 ≈ 99.97%
        • Decoupling: Systems 1, 2 and 3 are each 99% available
            – Overall = 99%
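
The availability arithmetic on slides 4 and 5 can be reproduced with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and it assumes that component failures are independent, which the slides imply but do not state.

```java
public class Availability {

    /** Components in series: the chain is up only when every component is up. */
    public static double inSeries(double... availabilities) {
        double overall = 1.0;
        for (double a : availabilities) {
            overall *= a;
        }
        return overall;
    }

    /** Redundant copies: the group is down only when every copy is down. */
    public static double redundant(double availability, int copies) {
        return 1.0 - Math.pow(1.0 - availability, copies);
    }

    public static void main(String[] args) {
        double chain = inSeries(0.99, 0.99, 0.99);          // slide 4
        double pair = redundant(0.99, 2);                   // slide 5, one pair
        double redundantChain = inSeries(pair, pair, pair); // slide 5, three pairs

        System.out.printf("three systems in series: %.4f%n", chain);
        System.out.printf("one redundant pair:      %.4f%n", pair);
        System.out.printf("three pairs in series:   %.4f%n", redundantChain);
        System.out.printf("downtime at 97%%: %.1f days per year%n", (1 - 0.97) * 365);
    }
}
```

Doubling each component moves the chain from roughly 97% to roughly 99.97%, which is why redundancy is listed first among the HA techniques.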
    • 6. Performance: how fast does a single transaction execute?
        • Faster Performance = Happier Customers
        • Faster Performance = More Transactions
    • 7. Barriers to Performance
        • Raw Algorithmic Performance
        • Resource Limitations – Not enough CPU, disk, memory
        • Resource Contention – Locks
        • IO Latency – Network, Disk
    • 8. Latency: the time delay between requesting an operation and it being initiated
        • Key factor in large scale distributed applications
        • Typically not taken into account during development
    • 9. Latency Factors
        • Network Distance
        • Network Reliability
        • Data Size
        • Operation Granularity
        • Resource Contention
        • JVM GC
    • 10. Move the Data and Processing Close Together
    • 11. Scalability: the ability to add more hardware in response to more demand, without a reduction in performance!
    • 12. Business Imperatives
        • Success of the Business or Service
        • Growth of Mobile
        • Huge Variation of Load through a period
        • Sudden Large Spikes due to events
        Cloud Enables Elastic Scalability
    • 13. Scaling Out (Horizontal Scaling)
        • Add Additional Servers
        • Add Load Balancer
        • Distribute traffic across the servers
        • Much Cheaper than Scale Up
        • Has HA benefits
    • 14. Linear Scalability (Nirvana)
        [Chart: Users (0 to 900) against Cluster Nodes (1 to 4), comparing Linear Scalability with Typical Scalability]
    • 15. Typical Scale Out Architecture
        [Diagram: a Load Balancer in front of Nodes 1 to 4, which host stateless services; a Database contains the persistent state]
    • 16. Stateless Services
        • True Stateless Services
            – Static HTML Serving
            – Basic Calculations
            – State Received from Client
        • Pseudo Stateless
            – Read, Update and Store state in the DB
            – Use sticky session to route to non critical state
            – Typical of Most Online applications
            – Push scalability issue to the database
    • 17. Scaling a Stateless Middle Tier is easy; however, Scaling Databases is hard and very expensive
    • 18. Radical Idea: Put state back into the Middleware
    • 19. Caching
    • 20. Read Through Cache
        [Diagram: Application, GET A, Cache, Cache Loader, Data Store holding A]
    • 21. Write Through Cache
        [Diagram: Application, GET B and PUT B, Cache holding B, Cache Writer, Data Store]
    • 22. Write Behind Cache
        [Diagram: Application, GET B and PUT B, Cache holding B, Write Behind Processor, Data Store]
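
The three cache patterns on slides 20 to 22 differ only in when the data store is touched. The sketch below is a single-node illustration, not any product's API: `CachePatterns`, `putWriteThrough`, `putWriteBehind` and `flush` are hypothetical names, the loader and writer stand in for the Cache Loader and Cache Writer boxes, and an explicit `flush()` stands in for the Write Behind Processor, which in a real grid would run asynchronously.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

public class CachePatterns<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Queue<K> dirtyKeys = new ArrayDeque<>(); // keys awaiting write-behind
    private final Function<K, V> loader;                   // reads from the data store
    private final BiConsumer<K, V> writer;                 // writes to the data store

    public CachePatterns(Function<K, V> loader, BiConsumer<K, V> writer) {
        this.loader = loader;
        this.writer = writer;
    }

    /** Read through: a miss loads from the data store, later reads hit the cache. */
    public V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Write through: the data store is updated before the call returns. */
    public void putWriteThrough(K key, V value) {
        writer.accept(key, value);
        cache.put(key, value);
    }

    /** Write behind: only the cache is updated now; the store write is deferred. */
    public synchronized void putWriteBehind(K key, V value) {
        cache.put(key, value);
        dirtyKeys.add(key);
    }

    /** Drains the deferred writes and returns how many store writes were made. */
    public synchronized int flush() {
        int written = 0;
        K key;
        while ((key = dirtyKeys.poll()) != null) {
            writer.accept(key, cache.get(key));
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>(); // stands in for the database
        store.put("A", "a1");
        CachePatterns<String, String> c = new CachePatterns<String, String>(store::get, store::put);
        System.out.println("read through A: " + c.get("A"));
        c.putWriteBehind("B", "b1");
        System.out.println("store has B before flush: " + store.containsKey("B"));
        c.flush();
        System.out.println("store has B after flush: " + store.containsKey("B"));
    }
}
```

With write behind, the application's PUT does not wait for the database, which is the decoupling the notes credit for increased availability.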
    • 23. Caches
        • Caches aren’t New
            – Hibernate Session Cache
            – Entity Bean Cache
            – JPA Cache
            – Custom Caches
            – Open Source Caches
        • Typically Cache Database Data or Page Fragments
    • 24. JSR 107 JCACHE - Java Temporary Caching API
        • Been around a Long Time – 10 years
        • Focussed on Java SE
            – With some JEE Integration for JEE7
        • Caching API
            – V get(Object key) throws CacheException;
            – void put(K key, V value) throws CacheException;
    • 25. JSR 107 Get Involved
        • Google Group for Discussion
            – http://groups.google.com/group/jsr107
        • Google Docs for Spec
            – https://docs.google.com/document/d/1YZ-lrH6nW871Vd9Z34Og_EqbX_kxxJi55UrSn4yL2Ak
        • GitHub for Code
            – https://github.com/jsr107/jsr107spec
    • 26. Local Caching (Roll your Own)
        • Benefits
            – Pretty Simple to Write (Concurrent Hashmap)
            – Used in many applications
            – Use JCache API
        • Challenges
            – Cache Eviction
            – Cache Loading/Storing
            – Cache Prefetching
            – Cache Refresh
            – Write Behind Processing
            – Clustering
        !! THINK LONG AND HARD !!
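
To make slide 26 concrete: a map is easy to write, but even the first challenge on the list, cache eviction, needs a deliberate choice. The sketch below picks least-recently-used eviction using the JDK's `LinkedHashMap` in access order; `LruCache` and `create` are illustrative names, and LRU is an assumption, since the slide does not say which eviction policy to use.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache {

    /**
     * Returns a synchronized map that drops its least recently used entry
     * whenever an insertion takes it past maxEntries.
     */
    public static <K, V> Map<K, V> create(final int maxEntries) {
        // accessOrder = true makes get() move an entry to the "most recent" end.
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = LruCache.create(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" is now more recently used than "b"
        cache.put("c", 3); // pushes the cache past its limit
        System.out.println(cache.keySet());
    }
}
```

This covers eviction only; loading, refresh, write behind and clustering from the Challenges column are all still missing, which is the point of the slide's warning.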
    • 27. Clustering Challenges
        [Diagram: two Application/Cache nodes, each receiving GET B, with copies of B in both caches and in the Data Store]
    • 28. Update Replication
        [Diagram: two Application/Cache nodes and a Data Store; UPDATE B replaces B1 with B2 on one node and B2 is replicated to the other cache]
    • 29. Update Invalidation
        [Diagram: two Application/Cache nodes and a Data Store; UPDATE B replaces B1 with B2 on one node and an Invalidate message is sent to the other cache holding B1]
    • 30. Replication Write Performance
        [Diagram: four Application/Cache nodes; PUT B on one node]
    • 31. Cache Partitioning
        [Diagram: four Application/Cache nodes; GET B, PUT B and PUT C, with B and C held on different nodes]
    • 32. Elasticity in Partitioned Caches
        [Diagram: Application/Cache nodes]
    • 33. HA Cache Partitioning
        [Diagram: four Application/Cache nodes; PUT B; NODE B CRASH !!!]
    • 34. Partitioned Cache
        • Linear Scalability
            – 2 hops for Read (Worst Case)
            – 2 hops for Write (Worst Case)
        • High Availability
            – Configurable Duplicates
        • Location Independent Access
            – Grid knows where data is
        • More Nodes = More Data in Memory
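
"Grid knows where data is" on slide 34 comes down to a deterministic mapping from key to partition to owning node, so any member can route a request directly to the owner. The sketch below shows one simple way to do that. It is not the algorithm of Coherence or any other product: `PartitionMap`, the round-robin assignment and the single backup on the next node are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionMap {
    private final int partitionCount;
    private final List<String> nodes;

    public PartitionMap(int partitionCount, List<String> nodes) {
        if (partitionCount < 1 || nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one partition and one node");
        }
        this.partitionCount = partitionCount;
        this.nodes = new ArrayList<>(nodes);
    }

    /** Every member computes the same partition for the same key. */
    public int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Node holding the primary copy (partitions assigned round-robin). */
    public String primaryOwner(Object key) {
        return nodes.get(partitionOf(key) % nodes.size());
    }

    /** Node holding the duplicate, always different from the primary when there are 2+ nodes. */
    public String backupOwner(Object key) {
        return nodes.get((partitionOf(key) + 1) % nodes.size());
    }

    public static void main(String[] args) {
        PartitionMap map = new PartitionMap(257,
                Arrays.asList("node1", "node2", "node3", "node4"));
        for (String key : Arrays.asList("A", "B", "C")) {
            System.out.println(key + ": primary=" + map.primaryOwner(key)
                    + " backup=" + map.backupOwner(key));
        }
    }
}
```

Because the lookup is a local calculation, the cost of a read or write does not grow with the number of nodes; adding a node only changes which partitions each node owns.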
    • 35. Consider a Large Cache
        [Diagram: a grid of 21 Application/Cache nodes]
    • 36. How Much Can We Store
        • 21 Amazon xLargeMemory Instances
            – 17Gb RAM
        • 3 Nodes Per Instance
            – 4Gb 64bit JVM Heap + 5 Gb OS
        • 63 Cluster Nodes
        • 252 Gb JVM Heap Available
        • Approx 125Gb Data in the Grid!
        • Cost per Month $9000
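
The figures on slide 36 follow from simple arithmetic, worked through below. The last line is an interpretation rather than something the slide states: halving the heap matches the quoted "approx 125Gb" if, for example, each entry is held twice (a primary plus one duplicate), but the slide does not say how the figure was derived.

```latex
\begin{align*}
\text{RAM per instance} &= 3 \times 4\,\text{Gb (JVM heaps)} + 5\,\text{Gb (OS)} = 17\,\text{Gb} \\
\text{cluster nodes}    &= 21 \times 3 = 63 \\
\text{total JVM heap}   &= 63 \times 4\,\text{Gb} = 252\,\text{Gb} \\
\text{data in the grid} &\approx 252\,\text{Gb} / 2 = 126\,\text{Gb} \approx 125\,\text{Gb}
\end{align*}
```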
    • 37. Grids can Even Overflow
        [Diagram: Application, Cache, Local Drive]
        • Passivates Data to a Local Backing Store (NIO memory mapped file)
        • Use Java NIO for Off Heap Storage
        • Berkeley DB local Storage
        • Reduces GC overhead
    • 38. HA In Memory Data
        [Diagram: two Data Centres, each with Server Rack 1 and Server Rack 2 of Application/Cache nodes]
        Do We Need the Database?
    • 39. Database as Business Audit
        [Diagram: a grid of Application/Cache nodes sending Business Audit Data to a Data Store]
    • 40. Now you Have a DATA GRID
    • 41. So Much More than an L2 Cache
    • 42. Computation on the Grid
        [Diagram: four Application/Grid Node pairs and a Process]
    • 43. In Place Processing
        [Diagram: four Application/Grid Node pairs and a Process]
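
The difference between slides 42 and 43 is where the update logic runs. With in place processing the caller ships a small function to the node that owns the entry, instead of pulling the value across the network, changing it and pushing it back. The sketch below models one grid node in a single JVM; `GridNode` and `invoke` are hypothetical names, not a real data grid API, and `ConcurrentHashMap.compute` stands in for the node-local lock the notes mention.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

public class GridNode<K, V> {
    // The entries this node owns; in a real grid these never leave the node
    // unless a caller explicitly asks for them.
    private final Map<K, V> localData = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        localData.put(key, value);
    }

    public V get(K key) {
        return localData.get(key);
    }

    /**
     * Runs the processor against the entry where it is stored. The update is
     * atomic per key, and only the result travels back to the caller.
     */
    public V invoke(K key, UnaryOperator<V> processor) {
        return localData.compute(key, (k, current) -> processor.apply(current));
    }

    public static void main(String[] args) {
        GridNode<String, Integer> node = new GridNode<>();
        node.put("account-1", 100);
        // Ship the logic to the data rather than the data to the logic.
        Integer balance = node.invoke("account-1", v -> v + 50);
        System.out.println("new balance: " + balance);
    }
}
```

Running the same processor on every node at once, each against its own partitions, is what gives the parallelism described in the notes.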
    • 44. Querying the Grid
        [Diagram: four Application/Grid Node pairs and a Query]
    • 45. Data Grid Events Subsystem
        [Diagram: four Application/Grid Node pairs and a Map Listener]
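
The Map Listener on slide 45 is an observer registered against the cache: every change is pushed to interested parties instead of being polled for. The following single-JVM sketch shows the idea only. `ObservableCache` and `addListener` are invented names and do not reproduce the listener interface of Coherence or any other grid, and listeners here are called synchronously on the writing thread, whereas a grid would deliver events asynchronously across nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

public class ObservableCache<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final List<BiConsumer<K, V>> listeners = new CopyOnWriteArrayList<>();

    /** Registers a callback that receives the key and new value of every put. */
    public void addListener(BiConsumer<K, V> listener) {
        listeners.add(listener);
    }

    public void put(K key, V value) {
        data.put(key, value);
        for (BiConsumer<K, V> listener : listeners) {
            listener.accept(key, value);
        }
    }

    public V get(K key) {
        return data.get(key);
    }

    public static void main(String[] args) {
        ObservableCache<String, String> bids = new ObservableCache<>();
        List<String> pushed = new ArrayList<>();
        // In the architecture on the later slides this is where an update
        // would be pushed to a browser over Web Sockets or Ajax.
        bids.addListener((security, bid) -> pushed.add(security + "=" + bid));
        bids.put("ACME", "101.5");
        bids.put("ACME", "102.0");
        System.out.println(pushed);
    }
}
```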
    • 46. Putting it All Together
        [Diagram: Web Sockets and a Load Balancer in front of five JEE Cluster Nodes, with a Process, backed by a Data Grid of Application/Cache nodes]
    • 47. BE RADICAL: Build New Architectures With Data Grids!
    • 48. Extreme Performance
        • Reduced Latency
            – Data close to processing
            – Reduce roundtrips and expensive calculations
        • Parallel Processing
            – Distributed Processing (Map-Reduce-like)
            – Distributed Query Processing
    • 49. Extreme Scalability
        • O(1) Writes and Reads
            – Worst Case two hops
            – No increase with number of nodes
        • Data Volume Increases with Nodes
            – Large data volumes stored in the Data Grid
        • Elastic Topology
            – Clusters Rebalance with node changes
    • 50. Extreme Availability
        • No Single Point of Failure
            – Duplicates prevent data loss
            – Duplicate Numbers Configurable
        • Write Behind
            – decouples Database Availability
        • Self Healing
            – Removing Nodes causes rebalancing