The economies of scaling software - Abdel Remani



You spend your precious time building the perfect application. You do everything right. You carefully craft every piece of code and rigorously follow the best practices and design patterns, you apply the most successful methodologies software engineering has to offer with discipline, and you pay attention to the most minuscule of details to produce the best user experience possible. It all pays off eventually, and you end up with a beautiful code base that is not only reliable but also performs well. You proudly watch your baby grow, as new users come in bringing more traffic your way and craving new features. You keep them happy and they keep coming back. One morning, you wake up to servers crashing under load, and data stores failing to keep up with all the demand. You panic. You throw in more hardware and try to optimize, but the hungry crowd that was once your happy user base catches up to you. Your success is slipping through your fingers. You find yourself stuck between having to rewrite the whole application and a hard place. It's frustrating, dreadful, and painful to say the least. Don't be that guy! Save your soul before it's too late, and come learn how to build, deploy, and maintain enterprise-grade Java applications that scale from day one. Topics covered include: parallelism, load distribution, state management, caching, big data, asynchronous processing, and static content delivery. Leveraging cloud computing, scaling teams, and DevOps will also be discussed. P.S. This session is more technical than you might think.

Published in: Technology


  1. The Economies of Scaling Software • Abdelmonaim Remani • @PolymathicCoder
  2. About Me • Platform Architect at Inc. • JavaOne RockStar and frequent speaker at many developer events and conferences, including JavaOne, JAX, OSCON, OREDEV, 33rd Degree, etc. • Open-source advocate and contributor • Active community member • The NorCal Java User Group • The Silicon Valley Spring User Group • The Silicon Valley Dart Meetup • Bio: • Twitter: @PolymathicCoder • Email: • SlideShare:
  3. License • Creative Commons Attribution Non-Commercial License 3.0 Unported • The graphics and logos in this presentation belong to their rightful owners
  4. @PolymathicCoder
  5. Let’s Go!
  6. What’s up with the title? • The Economies of Scale • “In microeconomics, economies of scale are the cost advantages that enterprises obtain due to size [...] Often operational efficiency is [...] greater with increasing scale [...]” - Wikipedia
  7. The line is blurred! • There was a time when only the enterprise worried about issues like scalability • The rise of social and the abundance of mobile are responsible for • Not only an exponential growth of Internet traffic • But also the creation of a spoiled user base that wants answers to questions like • “I want to see the closest Moroccan restaurants to my current location on a map, along with consumer ratings and whether any of my friends has checked in within the last 30 days”
  8. The bar is high! • Scalability is everyone’s problem
  9. What is Scalability?
  10. The Common Definition • The ability of software to handle an increasing amount of work without performance degradation
  11. I have a problem with that definition... • It implies that a scalable system is one that is capable of sustaining its scalability forever • Not realistic: it fails to recognize the external constraints imposed on the system • It fails to acknowledge that scalability is relative • It does not take into account that a system • Need not be capable of handling the work • But simply capable of evolving to handle the work
  12. A better definition • The ability of an application to gracefully evolve within the constraints of its ecosystem in order to handle the maximum potential amount of work without performance degradation
  13. Easier said than done! • A black art • No surprise here! • An application that supports 1 million users • You add one new feature • 500,000 users crash your system
  14. The Bottlenecks
  15. The Bottlenecks • Scaling is about relieving or managing the limitations or constraints that we call bottlenecks • When we talk about bottlenecks in computing, we talk about the usual suspects • The CPU • Storage or I/O • The network • All inter-related • The rest of this talk is structured around these bottlenecks, to make the case that one’s scalability needs are to be addressed in that fashion
  16. The CPU Bottleneck
  17. The CPU Bottleneck • Nothing affects the CPU more than the instructions it is summoned to execute • In other words, this is about the very code of your application
  18. A Scalable Architecture
  19. Architecture? • Architecture • “Things that people perceive as hard-to-change” - Martin Fowler • Decisions you commit to; the ones that will stick with you forever
  20. Be wise... Think twice... • Choosing the right technologies • Platform • Languages • Frameworks • Libraries • Making the right abstractions • Technical abstractions • Functional abstractions • Make sure that the former is subordinate to the latter and not the other way around
  21. Write Good Code
  22. Write Good Code • Think your algorithms through and mind their complexity (asymptotic complexity, cyclomatic complexity, etc.) • SOLIDify your design • Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion • Understand the limitations of your technology and leverage its strengths • Don’t be afraid to be polyglot • Obsess over testing • TDD/BDD • Tools • Static code analyzers (PMD, FindBugs, etc.) • Profilers (detect memory leaks, bottlenecks, etc.) • Etc.
  23. Know Your S#!t • Read • The classics: “The Mythical Man-Month” • GoF’s “Design Patterns” • Eric Evans’ “Domain-Driven Design” • Every book by Martin Fowler • Uncle Bob’s “Clean Code” • Josh Bloch’s “Effective Java” • Brian Goetz’s “Java Concurrency in Practice” • Etc. • The list is long...
  24. We do all that... and still end up with this... • The fading tradition of making cow dung piles
  25. Still much better than this...
  26. Technical Debt is a Reality • It is inevitable... You will incur it one way or another, deliberately or not • The quick-and-dirty you are not proud of • Things you would/should do differently • Anyway, after a while it starts to smell... • The bright side • The fact that it is recognized as a debt is good • Keep track and refactor • For the fearless... be wise and think twice before you do it • Cut the right corners • Don’t lock yourself out • Don’t make it a part of your architecture
  27. Scaling Up Your Application
  28. Vertical Scaling • Vertical Scaling (Scaling up) • A single-node system • Adding more computing resources to the node (get a beefier machine) • Writing code to harness the full power of the one node
  29. Parallelism • Parallelism? • Writing concurrent code, or simultaneously executing code • Most of us write code that runs within web containers by extending framework classes that are already multi-threaded • Sometimes the complexity of the business logic demands that we break it into smaller steps, execute them in parallel, then aggregate the data back to get a result within a reasonable amount of time • This is not easy! • It often requires synchronizing state, which is a nightmare
  30. Easier said than done... • On a single machine, we have been reaping the benefits of Moore’s Law • Performance gains are automatically realized by software (in other words, code runs faster on faster hardware) • The end of Moore’s Law as a free lunch (clock speeds stopped climbing): the birth of the multi-core chip • We actually need to write code to take advantage of this • Good news! There are frameworks and libraries that make it a lot easier • Fork/Join in Java • Akka • Etc.
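As an illustration of the Fork/Join framework named above, here is a minimal sketch that sums a large array by recursively splitting it, forking one half and computing the other half in the current thread. The class name `ParallelSum` and the `THRESHOLD` cut-off are illustrative assumptions, not taken from the talk; a real threshold should come from profiling.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: sums a long[] by splitting it in halves until the
// chunks are small enough to sum sequentially; halves run in parallel.
class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed cut-off; tune by profiling
    private final long[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelSum(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(values, from, mid);
        ParallelSum right = new ParallelSum(values, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // wait for the left half, then aggregate
    }

    static long sum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(values, 0, values.length));
    }
}
```

`ParallelSum.sum(values)` returns the same total as a sequential loop, but spreads the work across the cores available to the common pool. Note that no state is shared between subtasks: each one reads a disjoint slice and returns its own partial result.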
  31. Easier said than done... • Challenges • What about dependencies and 3rd-party code? • Synchronizing state just got HARDER across cores! Too many cooks! • Frankly, this shared-state deal is a real pain • Get a life and do without • Go immutable (not always straightforward, and sometimes not even possible) • Go “Functional” (No guts... no glory...)
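A minimal sketch of the “go immutable” advice, using a hypothetical `Cart` value class: every field is final, the incoming list is defensively copied and exposed read-only, and each “change” returns a new instance. Because an instance never changes after construction, it can be shared across cores without any synchronization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable value class: final class, final fields, a defensive
// copy of the incoming list, and a read-only view handed out to callers.
final class Cart {
    private final String owner;
    private final List<String> items;

    Cart(String owner, List<String> items) {
        this.owner = owner;
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    String owner() {
        return owner;
    }

    List<String> items() {
        return items;
    }

    // "Modifying" a Cart returns a new instance and leaves this one untouched.
    Cart withItem(String item) {
        List<String> copy = new ArrayList<>(items);
        copy.add(item);
        return new Cart(owner, copy);
    }
}
```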
  32. It gets more interesting... • Amdahl’s Law • Throwing more cores at a problem does not necessarily result in a performance gain • The serial portion of the work caps the speedup, so we end up with diminishing returns at some point no matter how many cores we throw in
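Amdahl’s Law can be written as a formula. Assuming a fraction $p$ of the work can be parallelized and the remaining $1 - p$ must run serially, the speedup on $N$ cores is:

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

With $p = 0.9$, for example, 8 cores give $S(8) = 1/(0.1 + 0.1125) \approx 4.7$, 64 cores give $S(64) \approx 8.8$, and no number of cores can push the speedup past $1/(1 - 0.9) = 10$. That ceiling is the diminishing return described above.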
  33. Scaling Out Your Application
  34. Horizontal Scaling • Horizontal Scaling (Scaling out) • A distributed system (a cluster) • Adding more nodes • Writing code to harness the full power of the cluster
  35. Topology • A typical cluster consists of • A number of identical application server nodes behind a load balancer • A number? • It depends on how many you actually need and can afford • Elastic Scaling / Auto-Scaling • The number of live nodes within the cluster shrinks and grows depending on the load • New ones are provisioned or terminated as needed • Identical? • Application nodes are cloned from image files (e.g. AWS EC2 AMIs, etc.) • Configuration management tools (Chef, Puppet, Salt, etc.) • Load balancer? • Load is evenly distributed across live nodes according to some algorithm (typically round-robin)
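Round-robin, the typical distribution algorithm mentioned above, fits in a few lines. This is a sketch with a hypothetical `RoundRobinBalancer` class and made-up node names; real load balancers also health-check nodes and track nodes joining and leaving an auto-scaled cluster, which this omits.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector over a fixed list of node addresses.
// Thread-safe: the counter is atomic and the node list is immutable.
final class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes");
        }
        this.nodes = List.copyOf(nodes);
    }

    String pick() {
        // floorMod keeps the index non-negative after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }
}
```

With nodes `node-a`, `node-b`, `node-c`, successive calls to `pick()` cycle a, b, c, a, ... so each node receives an even share of requests, which only holds if the nodes are identical, as the slide requires.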
  36. Topology
  37. Managing State • Session data • Session Replication • Session Affinity/Sticky Sessions • Requests from the same client always get routed back to the same server • When the node dies, its session data dies with it • Shared/Distributed Session • Session data is kept in a centralized location • Do yourself a favor and go stateless! • No session data • Any server will do
  38. Parallelism • Leverage MapReduce • “A programming model for processing large data sets with a parallel, distributed algorithm on a cluster” • Hadoop
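To make the MapReduce model concrete, here is the canonical word-count example written with Java streams on a single JVM: the map step splits documents into words, and the shuffle-and-reduce step groups identical words and counts them. This only illustrates the programming model; it is not Hadoop’s API, which runs the same two phases across the nodes of a cluster. The `WordCount` name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical single-JVM illustration of MapReduce word count.
final class WordCount {
    static Map<String, Long> count(List<String> documents) {
        return documents.parallelStream()
                // map: split each document into lower-case words
                .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                // shuffle + reduce: group identical words and count each group
                .collect(Collectors.groupingByConcurrent(word -> word, Collectors.counting()));
    }
}
```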
  39. Misc • Distributed Lock Manager (DLM) • Synchronized access to shared resources • Google Chubby • ZooKeeper • Hazelcast • Terracotta • Etc. • Distributed Transactions • X/Open XA • HTTPS • Terminate SSL at the load balancer • Wildcard SSL certificates • Leverage probabilistic data structures and algorithms • Bloom filters • Quotient filters • Etc.
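A Bloom filter answers “definitely not present” or “possibly present” from a fixed-size bit array, which makes it useful for skipping expensive lookups. The sketch below is a hypothetical minimal version that derives its k bit positions from two base hashes (the double-hashing technique of Kirsch and Mitzenmacher); the bit-array size and hash count are left to the caller, and in practice a library implementation such as Guava’s `BloomFilter` is the safer choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical minimal Bloom filter. mightContain() never returns false for
// an added element, but may return true for one never added (false positive).
final class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String value) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(value, i));
        }
    }

    boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(value, i))) {
                return false; // definitely not present
            }
        }
        return true; // possibly present
    }

    // i-th bit position: h1 + i * h2, wrapped into [0, size)
    private int index(String value, int i) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(bytes);
        int h2 = fnv1a(bytes);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a, used as the second base hash
    private static int fnv1a(byte[] bytes) {
        int hash = 0x811c9dc5;
        for (byte b : bytes) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```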
  40. Deployment
  41. Deployment • Environments • Multiple: Development, Test, Stage, and Production • Automatic configuration management • Practice Continuous Delivery • Leverage the cloud • IaaS, PaaS, SaaS, and NaaS
  42. The Storage Bottleneck
  43. The Storage Bottleneck • Storage or I/O is usually the most significant bottleneck
  44. The Persistent Datastore
  45. What datastore to use? What kind of question is that? • There was a time when the obvious choice was the relational model • A schema that guarantees data integrity • Normalized data (minimized redundancy, no modification anomalies, etc.) • ACIDity (Atomicity, Consistency, Isolation, and Durability) • Data is stored in a way that is independent of how it is to be accessed (not biased towards any particular query pattern) • Flexible query language • As our datasets grew, we scaled vertically • Buying beefier machines • Database tuning / query optimization • Creating materialized views • De-normalizing • Etc.
  46. Mucho Data! • We hit the limit of the one machine • We attempted to scale the RDBMS horizontally • Master/Slave clusters • Data sharding • We failed... Why? • Eric Brewer’s CAP Theorem on distributed systems • Pick 2 out of 3 • Consistency • Availability • Partition Tolerance • The relational model is designed to favor CA over P • It cannot easily be scaled horizontally
  47. NoSQL • A wide range of specialized data stores with the goal of addressing the challenges of the relational model • “The whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for” - Eric Evans • A wide variety • Key-value data stores • Columnar data stores • Document data stores • Graph data stores
  48. Polyglot Persistence • Acknowledging • The complexity and variety of data and data access patterns within a single application • The absurdity of the idea that all data should be fitted into one storage model • Proposing solutions that • Leverage multiple data stores within a single application, based on the specific way the data is stored and accessed
  49. For more details... • Check out my talk from JAX Conf 2012 • The Rise of NoSQL and Polyglot Persistence • YouTube Video:
  50. Caching
  51. Caching • A cache is typically a simple key-value data structure • Instead of incurring the overhead of data retrieval or computation every time, you check the cache first • Since we can’t cache everything, caches can be configured to use different eviction algorithms depending on the use case (LRU, policies approximating Bélády’s optimal algorithm, etc.) • Use aggressively! • What to cache? • Frequently accessed data (session data, feeds, etc.) • Results of long computations
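LRU, the most common of these eviction policies, can be sketched in Java on top of `LinkedHashMap`, whose access-order constructor and `removeEldestEntry` hook exist for exactly this purpose. The `LruCache` class is a hypothetical example; it is not thread-safe, so a production system would use a concurrent cache library instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache. In access-order mode every get()/put()
// moves the entry to the tail, so the head is always the least recently used.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.capacity = capacity;
    }

    // Called by put(); evicts the eldest (least recently used) entry when full.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```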
  52. Caching • Where to cache? • On disk • File system: slow and sequential access • DB: a little bit better (data is arranged in structures designed for efficient access, indexes, etc.) • Generally a terrible idea • SSDs make things a little better • In-memory: fast and random access, but volatile • Something in between: persistent caches (Redis, etc.)
  53. Caching • Types of Caches • Local • Replicated • Distributed • Clustered
  54. Caching • How to cache? • Most caches implement a very simple interface • Always attempt to get from the cache first using a key • If it is a hit, you saved yourself the overhead • If it is a miss, compute or read from the datastore, then put the result in the cache for subsequent gets • When you update, you can evict stale data • You can set a TTL when you put • Many other common operations...
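The get/miss/put/evict flow above is often called the cache-aside pattern. Here is a minimal sketch, assuming a hypothetical `CacheAside` class backed by an in-process `ConcurrentHashMap`; the `loader` function stands in for a real datastore query, and TTL handling is omitted for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache-aside wrapper: try the cache first, fall back to the
// datastore (the loader) on a miss, and store the result for later gets.
final class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;               // hit: no datastore round-trip
        }
        V loaded = loader.apply(key);    // miss: read from the datastore
        if (loaded != null) {
            cache.put(key, loaded);      // populate for subsequent gets
        }
        return loaded;
    }

    // Call after writing to the datastore so readers don't see stale data.
    void evict(K key) {
        cache.remove(key);
    }
}
```

Two concurrent misses on the same key may both hit the datastore here; that is usually acceptable for a cache, and avoiding it would require per-key locking.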
  55. Caching Patterns • Caching query results • Key: hash of the query itself • How about parameterized complex queries? • Key: hash of the query itself + hash of the parameter values • Method/function memoization • Key: method name • How about parameterized methods? • Key: hash of the method name + hash of the parameter values • Caching objects • Key: identity of the object
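Memoization keyed on the method name plus its parameter values can be sketched as follows. The `Memoizer` class is hypothetical, and it makes one deliberate change to the slide’s scheme: the key is the full list of name and arguments rather than a hash of them, so two different calls that happen to share a hash can never be served each other’s results.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical memoizer keyed on (method name, [argument values]).
// Arguments must implement equals()/hashCode() consistently.
final class Memoizer {
    private final Map<List<Object>, Object> cache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    <R> R call(String methodName, Function<Object[], R> method, Object... args) {
        List<Object> key = Arrays.asList(methodName, Arrays.asList(args));
        // Runs the method only on a miss. The method must not call back into
        // this memoizer (ConcurrentHashMap forbids recursive computeIfAbsent).
        return (R) cache.computeIfAbsent(key, k -> method.apply(args));
    }
}
```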
  56. Caching Pattern • Time-series datasets (e.g. a real-time feed) • Sometimes pseudo/near real-time is enough • Use caching to throttle access to the source • Cache the query result with an expiry of t • Fresh data is then read at most once every t
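Throttling a feed with an expiry of t can be sketched as a `Supplier` that re-reads its source only once the cached value is at least t old. `ExpiringSupplier` is a hypothetical name; the clock is injectable (defaulting to `System.nanoTime`) so the behavior can be tested deterministically.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical time-based throttle: the source is read at most once per
// ttlNanos window; callers in between get the cached copy.
final class ExpiringSupplier<T> implements Supplier<T> {
    private final Supplier<T> source;
    private final long ttlNanos;
    private final LongSupplier clock;
    private T value;
    private long loadedAt;
    private boolean loaded;

    ExpiringSupplier(Supplier<T> source, long ttlNanos, LongSupplier clock) {
        this.source = source;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    ExpiringSupplier(Supplier<T> source, long ttlNanos) {
        this(source, ttlNanos, System::nanoTime);
    }

    // synchronized so concurrent callers trigger at most one refresh per window
    @Override
    public synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlNanos) {
            value = source.get(); // fresh read from the slow source
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```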
  57. Caching Gotchas • Profile your code to assess what to cache, and whether you need to cache to begin with • Stale state might bite you hard • Incoherence: inconsistent copies of objects cached under multiple keys • Stale nested aggregates • The network overhead of misses might outweigh the performance gain of the hits • Consider writing/updating the cache when you write to the data store
  58. Featured Solutions • Ehcache • Memcached • Oracle Coherence • Redis • A persistent NoSQL store • Supports built-in data structures like sets and lists • Supports intelligent keys and namespaces
  59. The Network Bottleneck
  60. Asynchronous Processing
  61. Asynchronous Processing • Resource-intensive tasks are not practical to handle during an HTTP request window • Synchronous processing is overused and unnecessary most of the time
  62. Asynchronous Processing Patterns • Pseudo-Asynchronous Processing • Flow • Preprocess data/operations in advance • Request data or operation • Respond synchronously with the preprocessed result • Sometimes not possible (dynamic content, etc.)
  63. Asynchronous Processing Patterns • True Asynchronous Processing • Flow • Request data or operation • Acknowledge • E.g. a REST endpoint that returns a “202 Accepted” HTTP status code • Do the processing at your own convenience • Allow the user to check progress • Optionally notify when processing is complete
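The accept-then-process flow behind a “202 Accepted” response can be sketched without any web framework. `JobTracker` is a hypothetical class: a hypothetical HTTP handler would call `submit()`, reply 202 with the returned job id (for example in a `Location` header), and expose `status(id)` for the client to poll. The pool size of 4 is an arbitrary assumption.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical job tracker: acknowledge immediately, process in the
// background, and let clients poll for progress by job id.
final class JobTracker {
    enum Status { ACCEPTED, RUNNING, DONE, FAILED }

    private final Map<String, Status> statuses = new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    String submit(Runnable work) {
        String id = UUID.randomUUID().toString();
        statuses.put(id, Status.ACCEPTED);        // acknowledge right away
        workers.submit(() -> {
            statuses.put(id, Status.RUNNING);
            try {
                work.run();                       // process at our own convenience
                statuses.put(id, Status.DONE);    // a notification could be sent here
            } catch (RuntimeException e) {
                statuses.put(id, Status.FAILED);
            }
        });
        return id;
    }

    Status status(String id) {
        return statuses.get(id);
    }

    void shutdown() {
        workers.shutdown();
    }
}
```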
  64. Techniques • Job/Work/Task Queues • JMS • AMQP (RabbitMQ, ActiveMQ, etc.) • AWS SQS • Redis Lists • Etc. • Task Scheduling • Jobs triggered periodically (Cron, Quartz, etc.) • Batch Processing
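The core idea behind all of these job queues, producers enqueue and return immediately while workers drain the queue in the background, can be shown in-process with a `BlockingQueue`. This is a hypothetical stand-in only: a real broker (JMS, AMQP, SQS, Redis lists) adds durability and lets workers run on other machines. The `TaskQueue` class, the stop marker, and the upper-casing “work” are all illustrative assumptions.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-process job queue with a single background worker.
final class TaskQueue {
    private static final String STOP = "__STOP__"; // hypothetical shutdown marker
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> processed = new CopyOnWriteArrayList<>();
    private final Thread worker = new Thread(this::drain, "task-worker");

    void start() {
        worker.start();
    }

    // Producer side: returns immediately, never waits for the work itself.
    void enqueue(String task) {
        queue.add(task);
    }

    // Enqueue the stop marker and wait for the worker to finish the backlog.
    void stopAndWait() throws InterruptedException {
        queue.add(STOP);
        worker.join();
    }

    List<String> processed() {
        return processed;
    }

    private void drain() {
        try {
            while (true) {
                String task = queue.take();        // blocks until a task arrives
                if (STOP.equals(task)) {
                    return;
                }
                processed.add(task.toUpperCase()); // placeholder for real work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```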
  65. Content Delivery Network (CDN)
  66. CDN • Static content • Binary (video, audio, etc.) • Web objects (HTML, JavaScript, CSS, etc.) • Do not serve it through your application server • Use a CDN • “A large distributed system of servers deployed in multiple data centers across the Internet” • Akamai • AWS CloudFront
  67. CDN Gotchas • Versioning and caching • Assume that you have a script file named script.js deployed on a CDN • Copies of script.js will be replicated across all edge nodes • Clients will also cache copies of script.js in their local caches
  68. CDN Gotchas • Versioning and caching • When script.js is updated while keeping the same URI as the old version • The new content is NOT propagated across the edge nodes (until their cached copies expire or are invalidated) • New clients end up being served the old version, now stale • Old clients continue to use their local cache containing the old version, now stale
  69. CDN Gotchas • Versioning and caching • What to do? • Simply append version numbers to file names • script-v1.js, script-v2.js, etc. • Force invalidation of the file on edge nodes • Set HTTP caching headers properly
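A common variation on appending version numbers is to derive the version from the file’s content, so a new URI is produced automatically whenever, and only when, the file changes. The sketch below assumes a hypothetical build step named `AssetVersioning` and uses the first 8 hex characters of a SHA-256 digest. Files named this way can then be served with a long-lived header such as `Cache-Control: public, max-age=31536000, immutable`, because a changed file never reuses an old name.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical build-time helper: script.js -> script-<8 hex chars>.js,
// where the suffix is a prefix of the SHA-256 digest of the file content.
final class AssetVersioning {
    static String versionedName(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        String base = dot < 0 ? fileName : fileName.substring(0, dot);
        String ext = dot < 0 ? "" : fileName.substring(dot);
        return base + "-" + shortHash(content) + ext;
    }

    private static String shortHash(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {                  // 4 bytes = 8 hex chars
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```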
  70. Domain Name System (DNS)
  71. DNS • DNS • Do not rely on your free domain name registrar’s DNS services • Use a scalable DNS solution • AWS Route 53 • DynECT • UltraDNS • Etc.
  72. Quantifying Scalability
  73. Quantifying Scalability • Instrumentation • Bake it into the code early • Monitoring • Application health • Cluster • Individual nodes • System resources • JVM • Track Key Performance Indicators (KPIs) • Number of requests handled • Throughput • Latency • Apdex index • Etc. • Logs • Testing • Load/stress testing
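The Apdex index mentioned above condenses response times into a single 0.0 to 1.0 score against a target threshold T: responses at or under T count as satisfied, those up to 4T count as tolerating (half credit), and anything slower counts as frustrated (no credit). A minimal sketch, with the hypothetical class name `Apdex` and millisecond samples assumed:

```java
// Hypothetical Apdex calculator:
// score = (satisfied + tolerating / 2) / total samples
final class Apdex {
    static double score(long[] responseTimesMillis, long thresholdMillis) {
        if (responseTimesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int satisfied = 0;
        int tolerating = 0;
        for (long t : responseTimesMillis) {
            if (t <= thresholdMillis) {
                satisfied++;                       // <= T
            } else if (t <= 4 * thresholdMillis) {
                tolerating++;                      // T < t <= 4T
            }                                      // > 4T: frustrated, counts zero
        }
        return (satisfied + tolerating / 2.0) / responseTimesMillis.length;
    }
}
```

For example, with T = 500 ms and samples of 100, 200, 600, 900, and 3000 ms, two requests are satisfied, two are tolerating, and one is frustrated, giving (2 + 1) / 5 = 0.6.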
  74. Disaster Recovery
  75. When disaster hits... • Goal: • A fault-tolerant system • In case of disaster, recover and restore service ASAP • Be proactive • Develop a Disaster Recovery Plan (DRP) • Test the DRP in failure drills
  76. Scaling Teams
  77. Scaling Teams • Hiring • Always hire top talent • You are only as strong as your weakest link • Develop a process to bring people in • Turnkey hardware/software setup (tools like Vagrant, etc.) • Arrange for proper access/accounts • Develop a knowledge base (architecture documentation, FAQs, etc.) • Development Process • Be Agile • Refine in the spirit of Six Sigma
  78. Scaling Teams • Teams • Form small ad-hoc teams from pools of Agile breeds • Product Owners • Team Members • Team Lead (Scrum Master) • Engineers • QAs • Architecture Owners • Keep them small • Give them ownership of their DevOps
  79. The Take-home
  80. The Take-home Message • The early bird gets the worm • Design to scale from day one • Plan for capacity early • Your needs determine how scalable is scalable enough • Do not over-engineer • Do not bite off more than you can chew • Building a scalable system is a process • Commit to a road map around bottlenecks • Guided by planned business features • Learn from others’ experiences (Twitter, Netflix, etc.)
  81. Take it slow... You’ll get there... • Work smarter, not harder
  82. Questions?
  83. Thank You