Scaling mature systems
  • I am going to talk to you about software architecture and scalability. Most talks I have attended, and most blogs and books I have read on this subject, make the same assumptions:
    - You are developing a new system.
    - You don’t need to migrate existing functionality.
    - You have unlimited engineering resources.
    These assumptions are typical of large corporations or academia. As software architects, however, we have to deal with large, mature systems, and it is obviously very expensive to rewrite these from scratch. The same holds for refactoring or migrating functionality to a new platform. As an architect, you may take a lot of stick over scalability issues with your system. So why don’t our systems scale?
  • The most common reason is that the environment your software runs in has changed. This can be due to many factors; the most obvious are:
    - The number of users or the volume of traffic has increased.
    - The number of products you offer has increased.
    The purpose of a software component may also have changed over time. Your marketing or sales department may have decided that your internal test tool is something they can sell. This has happened in every single job I have had since college; the worst case was the first tool I ever wrote, a phone call scanner to be used as evidence in court cases. All these environmental changes put new demands on your software platform. Today I will talk about how we can make our existing systems scalable.
  • Today’s focus is scalability. I will go through my interpretation of the term, so that we share a common understanding of it.
  • Performance is a single point on this graph; scalability is the shape of the graph. In this case we see a system that scales linearly: its resource requirements increase linearly with the load on the system. This doesn’t necessarily mean that performance, in terms of response times etc., remains constant. This graph shows a system that isn’t scalable: its performance is better than that of our first system (green), but at a certain point its resource requirements skyrocket. This is the ideal scenario, where a system scales better than linearly.
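One way to put this distinction into symbols follows. The notation is ours, not the slides': L is the load on the system and R(L) is the resources needed to handle it.

```latex
\begin{aligned}
\text{performance at load } L_0 &: \text{the single point } \bigl(L_0,\ R(L_0)\bigr)\\
\text{scalability} &: \text{the shape of the curve } R(L)\\
\text{linear scaling} &: R(L) \approx R_0 + kL, \quad \tfrac{dR}{dL} = k \text{ (constant)}\\
\text{not scalable} &: \tfrac{dR}{dL} \text{ rises sharply beyond some load } L^{*}\\
\text{better than linear} &: \tfrac{dR}{dL} \text{ decreases as } L \text{ grows}
\end{aligned}
```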
  • Over the next 30 minutes I will take you through these common scalability blockers. There are quick wins in all of these areas, and I hope that what I present will help you make your system scale.
  • Unless you want to argue about old vs. new Pink Floyd, ACID (Syd Barrett) is bad and BASE (Roger Waters) is good.
    ACID principle:
    - Data consistency is the highest priority
    - Availability is less important
    - Pessimistic approach
    - Complex, with a large overhead
    BASE principle:
    - Consistency is less important
    - Availability is the highest priority
    - Optimistic approach
    - Simple and fast
    Approach:
    - Assume that data will be eventually consistent, OR
    - Build processes to handle consistency exceptions
  • Distributed transactions come at a cost:
    - Latency: overhead of coordination
    - Performance: operations are executed sequentially
    - Scalability: large resource requirements for a single operation
    - Availability: the operation fails if any single system fails
    Allowing your system to “fail” by the standards of the traditional ACID methodology may actually make business sense. Do you really need your transactions to run as a single atomic operation?
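The next block shows why coordination and sequential processing hurt. Assume, for illustration, that one transaction calls n external systems with response times t_1 to t_n. Assume also that each system succeeds independently with probability p_i, and ignore the extra round trips of the commit protocol itself. The numbers in the last two lines are made up, not measurements.

```latex
\begin{aligned}
T_{\text{sequential}} &= \sum_{i=1}^{n} t_i, \qquad T_{\text{parallel}} \approx \max_{1 \le i \le n} t_i, \qquad P(\text{all succeed}) = \prod_{i=1}^{n} p_i\\
t = (2.0,\ 3.0,\ 1.5)\ \text{s} &: \quad T_{\text{sequential}} = 6.5\ \text{s}, \quad T_{\text{parallel}} \approx 3.0\ \text{s}\\
p_i = 0.99,\ n = 5 &: \quad P(\text{all succeed}) = 0.99^{5} \approx 0.951
\end{aligned}
```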
  • OpenJaw’s Internet Booking Engine manages 2-phase transactions across multiple external systems. Most of these external systems are outside our control. They include travel suppliers and other services such as:
    - Expedia, HotelBeds, GTA
    - Amadeus, Sabre, SITA
    - Avis/Hertz
    - Payment gateways
    - Confirmation e-mail
    Checking out a single shopping cart requires interaction with all of these systems, but from the consumer’s perspective this is a single booking with a single payment. We can run these as a 2-phase transaction, where payment and booking are done as a single all-or-nothing process. This makes every booking slower than it needs to be, because the operations are processed sequentially. In most cases all systems return a positive response. The possible exception is payment, which can have a relatively high failure rate due to consumers’ credit limits. So why would we want a process that is consistently slow in order to cater for the 2-3% of requests that fail? A better approach is to charge payment first and then run all bookings in parallel. We then build processes around failed bookings, which could include alerting a call center agent to ring the customer back and propose an alternative product.
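Below is a minimal Java sketch of the payment-first flow. All names are hypothetical: `CheckoutSketch`, `chargePayment` and `bookWithSupplier` stand in for the real payment gateway and supplier integrations, which are not shown. Payment runs first. Bookings then run in parallel, and failures are returned for a follow-up process rather than rolled back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BooleanSupplier;
import java.util.function.Predicate;

class CheckoutSketch {
    /**
     * Charges payment once, then books every cart item in parallel.
     * Returns the items whose booking failed so that a follow-up process
     * (for example a call center callback) can handle them; nothing is rolled back.
     */
    static List<String> checkout(BooleanSupplier chargePayment,
                                 List<String> cartItems,
                                 Predicate<String> bookWithSupplier) {
        // Payment is the step most likely to fail, so it runs first and stops
        // the checkout before any supplier has been contacted.
        if (!chargePayment.getAsBoolean()) {
            throw new IllegalStateException("payment declined");
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, cartItems.size()));
        try {
            // Launch all supplier bookings at once instead of one after another.
            List<CompletableFuture<Boolean>> bookings = new ArrayList<>();
            for (String item : cartItems) {
                bookings.add(CompletableFuture.supplyAsync(() -> {
                    try {
                        return bookWithSupplier.test(item);
                    } catch (RuntimeException e) {
                        return false; // a supplier error counts as a failed booking
                    }
                }, pool));
            }
            // Wait for every booking and collect the failures for follow-up.
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < cartItems.size(); i++) {
                if (!bookings.get(i).join()) {
                    failed.add(cartItems.get(i));
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real booking engine the returned list would feed the failure-handling process, for example a call center work queue. The thread pool would also be shared rather than created per checkout; creating one here just keeps the sketch self-contained.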
  • This is a summary of the changes we made to our transaction process.
  • You will have contention around internal and external resources.
  • It is very easy to build and integrate a cache. There is a range of potential platforms, and you must choose the one that suits your needs:
    - In-memory databases like Redis offer extreme performance, but have limitations such as size and failover.
    - Distributed stores like Cassandra offer high scalability and failover, but do not perform as well as in-memory databases.
    - SQL databases offer security if the cached data is sensitive.
    Basic caching can be implemented in three steps:
    - Build the cache key
    - Cache lookup
    - Cache storage
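The three steps can be sketched in Java as a cache-aside lookup. This is an illustration under several assumptions. An in-process `ConcurrentHashMap` stands in for Redis, Cassandra or a SQL table, and entries expire after a fixed time-to-live. The key fields (origin, destination, date, passengers) and the class name `SearchCache` are hypothetical.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class SearchCache<V> {
    // Cached value plus the absolute time after which it is considered stale.
    private record Entry<T>(T value, long expiresAtMillis) {}

    private final Map<String, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;

    SearchCache(Duration ttl) {
        this.ttlMillis = ttl.toMillis();
    }

    // Step 1: build the cache key from every request field that affects the result.
    static String buildKey(String origin, String destination, String date, int passengers) {
        return String.join("|", origin, destination, date, Integer.toString(passengers));
    }

    // Step 2: cache lookup. Step 3: on a miss (or an expired entry), call the slow
    // external system through `loader` and store the answer for later requests.
    V get(String key, Function<String, V> loader) {
        long now = System.currentTimeMillis();
        Entry<V> cached = store.get(key);
        if (cached != null && cached.expiresAtMillis() > now) {
            return cached.value(); // hit
        }
        V fresh = loader.apply(key); // miss
        store.put(key, new Entry<>(fresh, now + ttlMillis));
        return fresh;
    }
}
```

Two concurrent misses on the same key can both call the loader. For slow supplier calls you might prefer to coalesce them, at the cost of making callers wait.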
  • A cache is a simple way of buying performance by sacrificing data freshness or quality. To cache effectively you must measure both how well your cache performs (hit/miss rate) and how good the data quality is. The latter is very often missed or ignored. You may also need to measure these for various product types and variations. In our case we measure per product type, and per route (for flights) or destination (for hotels). This allows us to tune the cache properly and achieve the ideal balance between performance and data quality.
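Here is a sketch of per-segment measurement. It assumes hypothetical segment keys such as "flight:DUB-LHR" or "hotel:Paris". It also assumes that data quality is sampled by occasionally comparing a cached answer with a live one; how that comparison is done is not covered here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class CacheMetrics {
    // Counters for one segment (product type plus route or destination).
    private static final class Counters {
        final LongAdder hits = new LongAdder();
        final LongAdder misses = new LongAdder();
        final LongAdder qualityChecks = new LongAdder();
        final LongAdder stale = new LongAdder();
    }

    private final Map<String, Counters> bySegment = new ConcurrentHashMap<>();

    private Counters of(String segment) {
        return bySegment.computeIfAbsent(segment, s -> new Counters());
    }

    // Cache performance: record each lookup outcome.
    void recordHit(String segment)  { of(segment).hits.increment(); }
    void recordMiss(String segment) { of(segment).misses.increment(); }

    // Data quality: record whether a sampled cached answer matched the live answer.
    void recordQualityCheck(String segment, boolean cachedMatchedLive) {
        Counters c = of(segment);
        c.qualityChecks.increment();
        if (!cachedMatchedLive) {
            c.stale.increment();
        }
    }

    double hitRate(String segment) {
        Counters c = of(segment);
        long hits = c.hits.sum();
        long total = hits + c.misses.sum();
        return total == 0 ? 0.0 : (double) hits / total;
    }

    double staleRate(String segment) {
        Counters c = of(segment);
        long checks = c.qualityChecks.sum();
        return checks == 0 ? 0.0 : (double) c.stale.sum() / checks;
    }
}
```

Tuning then means adjusting, for example, the time-to-live per segment until the hit rate is acceptable without the stale rate climbing.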
  • This is the theory! How do you do this in practice? In our case we had a stateless middleware platform that scaled linearly. The stateful front-end used standard J2EE session management, and became a barrier to scalability. J2EE sessions were accessed by our MVC platform, Java beans and JSPs. Changing how we managed sessions would have meant a huge refactoring task, costing us months of engineering effort.
  • There are two ways of attacking this problem:
    - You can renew or replace your platform, which means you will have to refactor your application.
    - You can change how your existing platform manages sessions.
    Our front-end application conformed to the J2EE spec, which allowed us to move to Tomcat. Tomcat is open source and allows custom session managers to be implemented. On any platform where session/state is accessed through a single interface, you have the option of changing how session/state is persisted and shared.
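The sketch below illustrates the single-interface point. It uses a hypothetical `SessionStore` interface (not Tomcat's actual `Manager` API) with a node-local implementation. Because the hypothetical `CartController` depends only on the interface, a remote implementation could replace `LocalSessionStore` without touching application code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single interface through which all session state is accessed.
interface SessionStore {
    Optional<Object> getAttribute(String sessionId, String name);
    void setAttribute(String sessionId, String name, Object value);
    void invalidate(String sessionId);
}

// Node-local implementation: state lives in this JVM, so users are tied to one node.
class LocalSessionStore implements SessionStore {
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    public Optional<Object> getAttribute(String sessionId, String name) {
        return Optional.ofNullable(sessions.getOrDefault(sessionId, Map.of()).get(name));
    }

    public void setAttribute(String sessionId, String name, Object value) {
        // ConcurrentHashMap rejects null values, so callers must not store null.
        sessions.computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>()).put(name, value);
    }

    public void invalidate(String sessionId) {
        sessions.remove(sessionId);
    }
}

// Application code sees only the interface, so the store can be swapped.
class CartController {
    private final SessionStore store;

    CartController(SessionStore store) {
        this.store = store;
    }

    int addToCart(String sessionId) {
        int count = (Integer) store.getAttribute(sessionId, "cartSize").orElse(0);
        store.setAttribute(sessionId, "cartSize", count + 1);
        return count + 1;
    }
}
```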
  • Our custom Tomcat session manager stores session data in Cassandra. This is a common approach in the PHP/LAMP world. Cassandra is the master for session data; session data sits in the Tomcat session manager only for the duration of a single request. The session manager contains an in-memory cache (write-back or write-through) that exists for the lifetime of a single request. This allows the use of a stateless (non-sticky) load balancer. The session manager is available as open source: Google “tomcat Cassandra”.
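Here is a minimal sketch of the request-scoped write-back cache. It assumes a hypothetical `RemoteSessionStore` interface in place of the actual Cassandra client. Reads go to the remote store at most once per attribute per request. Writes are held and flushed when the request ends, so nothing stays on the node between requests.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical remote store (stands in for Cassandra): the master copy of session data.
interface RemoteSessionStore {
    byte[] read(String sessionId, String attribute);
    void write(String sessionId, String attribute, byte[] value);
}

// One instance per request, so plain HashMaps are enough; concurrent requests
// for the same session each get their own view.
class RequestSessionView {
    private final RemoteSessionStore remote;
    private final String sessionId;
    private final Map<String, byte[]> cache = new HashMap<>();
    private final Set<String> dirty = new HashSet<>();

    RequestSessionView(RemoteSessionStore remote, String sessionId) {
        this.remote = remote;
        this.sessionId = sessionId;
    }

    // Read through the cache: the remote store is hit once per attribute.
    byte[] get(String attribute) {
        return cache.computeIfAbsent(attribute, a -> remote.read(sessionId, a));
    }

    // Write-back: remember the new value and defer the remote write.
    void put(String attribute, byte[] value) {
        cache.put(attribute, value);
        dirty.add(attribute);
    }

    // Called once when the request completes; afterwards the node holds no session state.
    void flush() {
        for (String attribute : dirty) {
            remote.write(sessionId, attribute, cache.get(attribute));
        }
        dirty.clear();
        cache.clear();
    }
}
```

A write-through variant would call `remote.write` inside `put` instead of deferring it. That costs more round trips but narrows the window in which a failed node loses updates.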
  • These are considerations for any distributed session/state management!
    Session object serialization must perform well:
    - Ensure that you only serialize what you need.
    - Optimize serialization, for instance by serializing XML using Xerces/Xalan.
    Session objects cannot be synchronized using Java locking, because concurrent requests for the same session may run on different nodes. You may have to rewrite some session objects to be inherently thread-safe. You may also have to build code to handle inconsistencies between concurrent Ajax requests.
    One common pitfall when creating scalable systems is assuming that the network is always available and has unlimited bandwidth. With this session manager, network bandwidth may become an issue! Consider using three network segments:
    - One in front of your application
    - One for the middle tier or back end
    - One for Tomcat-Cassandra traffic
    We implemented tunable compression in the Tomcat session manager: a trade-off between CPU and network bandwidth.
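Tunable compression can be sketched with the JDK's `Deflater` levels. This rests on two assumptions. The sketch uses standard Java serialization rather than the XML serialization via Xerces/Xalan mentioned above, and `SessionCodec` is a hypothetical name. Level 0 spends no CPU on compression; level 9 spends the most CPU to save bandwidth.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class SessionCodec {
    // 0 (Deflater.NO_COMPRESSION) .. 9 (Deflater.BEST_COMPRESSION): the CPU vs bandwidth knob.
    private final int level;

    SessionCodec(int level) {
        if (level < Deflater.NO_COMPRESSION || level > Deflater.BEST_COMPRESSION) {
            throw new IllegalArgumentException("compression level must be 0-9: " + level);
        }
        this.level = level;
    }

    // Serialize, then compress at the configured level before sending to the store.
    byte[] encode(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(level);
        try (ObjectOutputStream out = new ObjectOutputStream(new DeflaterOutputStream(bytes, deflater))) {
            out.writeObject(value);
        } finally {
            deflater.end(); // release native memory; runs after the streams are closed
        }
        return bytes.toByteArray();
    }

    // Decompress and deserialize a value previously produced by encode().
    Object decode(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new InflaterInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        }
    }
}
```

Java deserialization should only be applied to data your own nodes wrote, as is the case for a private session store.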
  • Making this happen isn’t that difficult!
  • Your application should be just that: an application. All data must be extracted to external stores:
    - Most applications store product data in back-end systems/databases.
    - You should store your session state in external storage.
    - You should also store all your configuration in external storage.
    Use a single URI or environment variable to point to your configuration store. Your hypervisor will allow you to set this variable per application node. The configuration store then points your application to the appropriate back end and session store. This allows a single VM image to be used across all your environments: live, pre-live, UAT, test, etc.
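Below is a sketch of the bootstrap step. It assumes a hypothetical environment variable `APP_CONFIG_URI` and hypothetical configuration keys `backend.uri` and `session.store.uri`. Fetching the key/value pairs from the configuration store is left out, since that depends on the store.

```java
import java.net.URI;
import java.util.Map;

// Endpoints a node needs at start-up, resolved from the configuration store.
record NodeSettings(URI backend, URI sessionStore) {}

class BootstrapConfig {
    // Hypothetical variable name, set per node by the hypervisor or VM launcher.
    static final String CONFIG_VAR = "APP_CONFIG_URI";

    // The only environment-specific input a node receives.
    static URI configStoreUri(Map<String, String> env) {
        String value = env.get(CONFIG_VAR);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(CONFIG_VAR + " is not set");
        }
        return URI.create(value);
    }

    // Given the key/value pairs fetched from the configuration store,
    // pick out the back-end and session store endpoints.
    static NodeSettings resolve(Map<String, String> storedConfig) {
        return new NodeSettings(
                URI.create(require(storedConfig, "backend.uri")),
                URI.create(require(storedConfig, "session.store.uri")));
    }

    private static String require(Map<String, String> config, String key) {
        String value = config.get(key);
        if (value == null) {
            throw new IllegalStateException("missing config key: " + key);
        }
        return value;
    }
}
```

At start-up a node would call `BootstrapConfig.configStoreUri(System.getenv())`, fetch its settings from that URI, and pass them to `resolve`. Nothing environment-specific is baked into the image itself.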

Presentation Transcript

  • Scaling Mature Systems
  • Teaching an old dog new tricks (© OPENJAW TECHNOLOGIES 2012)
  • Change in environment
    - Different usage or purpose
    - Increase in products
    - Increase in traffic
  • Morten Jørgensen, Chief Architect, OpenJaw Technologies
    Focus:
    - Support product functionality
    - Operational efficiency
    - Scalability and fault tolerance
  • Scalability vs. Performance (graph: resource consumption plotted against system load)
  • Common Scalability Blockers
    - Distributed transactions
    - Resource contention
    - State/session management
    - Deployment process
  • Distributed Transactions
  • ACID vs. BASE
    - ACID principle: pessimistic, slow
    - BASE principle: optimistic, fast
  • Distributed Transactions
    - Come at a cost: latency, performance, scalability, availability
    - Software “failure” may make business sense!
  • Distributed Transactions
  • Distributed Transactions
    - Reconfigure transaction management
    - Implement failure detection
    - Report or log failures
    - Implement failure handling processes
    - Convince the business owner
  • Resource Contention
  • Resource Contention
    - Contention around internal and external systems
    - Consumer experience limited by the availability, performance and scalability of these systems
    - Break the dependency by using asynchronous processes
    - Caching is the easiest form of asynchronous decoupling
  • Cache Implementation
    - Implement using a SQL or NoSQL database
    - Choose a platform based on your needs
    - Simple approach:
      - Build cache key
      - Cache lookup
      - Cache storage
  • Cache Tuning
    - Trade-off between data quality and performance
    - Measure cache hit rate
    - Measure data quality
    - Tune the cache according to both!
  • State & Session Management
  • Stateless Applications
    - Make your application stateless
    - Remove session data from the application
    - An increase in user sessions will not impact a single node
    - Add new nodes when traffic increases
  • Stateless Applications
    - Two options:
      - Move to a new platform
      - Change the existing platform
    - J2EE conformance allowed us to use Tomcat
    - Tomcat allowed a session manager plugin
  • Data model (diagram labels): Cassandra: KeySpace, Column Family, Key, Column, SuperColumn. Relational equivalent: Schema, Table, Row, Column. Session Manager: Session ID, Session Attribute Name, Session Object Value.
  • Cassandra Session Manager (diagram): a stateless load balancer in front of three Tomcat application nodes, each running the session manager plugin, which talks to the Cassandra “ring” over Thrift
  • Cassandra Session Manager
    - Session object serialisation
    - Session object synchronisation
    - Network bandwidth
  • Deployment Process
  • Deployment Process
    - Your application is decoupled
    - Your application is stateless
    - You can add nodes when demand increases
    - How quickly can you bring a new node up?
    - Is your application an elastic resource?
  • Deployment Process
    - Remove everything but code!
    - Extract session state
    - Extract configuration
    - Create a VM image as a template
    - Instantiate VMs from the image on demand
  • Deployment Process (diagram): a client and an admin console, a stateless load balancer, and four VMs
  • Results
  • Our Results
    - Product built over 10 years
    - Large solution with high complexity
    - Made Internet-scalable with a small amount of engineering resources
    - Ability to scale is now a key selling point
  • Thank you!