Your SlideShare is downloading. ×

Scalability, Availability & Stability Patterns


Published on

Overview of scalability, availability and stability patterns, techniques and products.

Overview of scalability, availability and stability patterns, techniques and products.

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide


  • 1. Scalability,Availability &StabilityPatternsJonas BonérCTO Typesafetwitter: @jboner
  • 2. Outline
  • 3. Outline
  • 4. Outline
  • 5. Outline
  • 6. Outline
  • 7. Introduction
  • 8. Scalability Patterns
  • 9. Managing Overload
  • 10. Scale up vs Scale out?
  • 11. Generalrecommendations• Immutability as the default• Referential Transparency (FP)• Laziness• Think about your data:• Different data need different guarantees
  • 12. Scalability Trade-offs
  • 13. Trade-offs•Performance vs Scalability•Latency vs Throughput•Availability vs Consistency
  • 14. PerformancevsScalability
  • 15. How do I know if I have aperformance problem?
  • 16. How do I know if I have aperformance problem?If your system isslow for a single user
  • 17. How do I know if I have ascalability problem?
  • 18. How do I know if I have ascalability problem?If your system isfast for a single userbut slow under heavy load
  • 19. LatencyvsThroughput
  • 20. You should strive formaximal throughputwithacceptable latency
  • 21. AvailabilityvsConsistency
  • 22. Brewer’sCAPtheorem
  • 23. You can only pick 2ConsistencyAvailabilityPartition toleranceAt a given point in time
  • 24. Centralized system• In a centralized system (RDBMS etc.)we don’t have network partitions, e.g.P in CAP• So you get both:•Availability•Consistency
  • 25. AtomicConsistentIsolatedDurable
  • 26. Distributed system• In a distributed system we (will) havenetwork partitions, e.g. P in CAP• So you get to only pick one:•Availability•Consistency
  • 27. CAP in practice:• ...there are only two types of systems:1. CP2. AP• ...there is only one choice to make. Incase of a network partition, what doyou sacrifice?1. C: Consistency2. A:Availability
  • 28. Basically AvailableSoft stateEventually consistent
  • 29. Eventual an interesting trade-off
  • 30. Eventual an interesting trade-offBut let’s get back to that later
  • 31. Availability Patterns
  • 32. •Fail-over•Replication• Master-Slave• Tree replication• Master-Master• Buddy ReplicationAvailability Patterns
  • 33. What do we mean withAvailability?
  • 34. Fail-over
  • 35. Fail-overCopyrightMichael Nygaard
  • 36. Fail-overBut fail-over is not always this simpleCopyrightMichael Nygaard
  • 37. Fail-overCopyrightMichael Nygaard
  • 38. Fail-backCopyrightMichael Nygaard
  • 39. Network fail-over
  • 40. Replication
  • 41. • Active replication - Push• Passive replication - Pull• Data not available, read from peer,then store it locally• Works well with timeout-basedcachesReplication
  • 42. • Master-Slave replication• Tree Replication• Master-Master replication• Buddy replicationReplication
  • 43. Master-Slave Replication
  • 44. Master-Slave Replication
  • 45. Tree Replication
  • 46. Master-Master Replication
  • 47. Buddy Replication
  • 48. Buddy Replication
  • 49. Scalability Patterns:State
  • 50. •Partitioning•HTTP Caching•RDBMS Sharding•NOSQL•Distributed Caching•Data Grids•ConcurrencyScalability Patterns: State
  • 51. Partitioning
  • 52. HTTP CachingReverse Proxy• Varnish• Squid• rack-cache• Pound• Nginx• Apache mod_proxy• Traffic Server
  • 53. HTTP CachingCDN,Akamai
  • 54. Generate Static ContentPrecompute content• Homegrown + cron or Quartz• Spring Batch• Gearman• Hadoop• Google Data Protocol• Amazon Elastic MapReduce
  • 55. HTTP CachingFirst request
  • 56. HTTP CachingSubsequent request
  • 57. Service of RecordSoR
  • 58. Service of Record•Relational Databases (RDBMS)•NOSQL Databases
  • 59. How toscale outRDBMS?
  • 60. Sharding•Partitioning•Replication
  • 61. Sharding: Partitioning
  • 62. Sharding: Replication
  • 63. ORM + rich domain modelanti-pattern•Attempt:• Read an object from DB•Result:• You sit with your whole database in your lap
  • 64. Think about your data• When do you need ACID?• When is Eventually Consistent a better fit?• Different kinds of data has different needsThink again
  • 65. When isa RDBMSnotgood enough?
  • 66. Scaling readsto a RDBMSis hard
  • 67. Scaling writesto a RDBMSis impossible
  • 68. Do wereally needa RDBMS?
  • 69. Do wereally needa RDBMS?Sometimes...
  • 70. Do wereally needa RDBMS?
  • 71. Do wereally needa RDBMS?But many times we don’t
  • 72. NOSQL(Not Only SQL)
  • 73. •Key-Value databases•Column databases•Document databases•Graph databases•Datastructure databasesNOSQL
  • 74. Who’s ACID?• Relational DBs (MySQL, Oracle, Postgres)• Object DBs (Gemstone, db4o)• Clustering products (Coherence,Terracotta)• Most caching products (ehcache)
  • 75. Who’s BASE?Distributed databases• Cassandra• Riak• Voldemort• Dynomite,• SimpleDB• etc.
  • 76. • Google: Bigtable• Amazon: Dynamo• Amazon: SimpleDB• Yahoo: HBase• Facebook: Cassandra• LinkedIn: VoldemortNOSQL in the wild
  • 77. But first some background...
  • 78. • Distributed Hash Tables (DHT)• Scalable• Partitioned• Fault-tolerant• Decentralized• Peer to peer• Popularized• Node ring• Consistent HashingChord & Pastry
  • 79. Node ring with Consistent HashingFind data in log(N) jumps
  • 80. “How can we build a DB on top of GoogleFile System?”• Paper: Bigtable:A distributed storage systemfor structured data, 2006• Rich data-model, structured storage• Clones:HBaseHypertableNeptuneBigtable
  • 81. “How can we build a distributedhash table for the data center?”• Paper: Dynamo:Amazon’s highly available key-value store, 2007• Focus: partitioning, replication and availability• Eventually Consistent• Clones:VoldemortDynomiteDynamo
  • 82. Types of NOSQL stores• Key-Value databases (Voldemort, Dynomite)• Column databases (Cassandra,Vertica, Sybase IQ)• Document databases (MongoDB, CouchDB)• Graph databases (Neo4J,AllegroGraph)• Datastructure databases (Redis, Hazelcast)
  • 83. Distributed Caching
  • 84. •Write-through•Write-behind•Eviction Policies•Replication•Peer-To-Peer (P2P)Distributed Caching
  • 85. Write-through
  • 86. Write-behind
  • 87. Eviction policies• TTL (time to live)• Bounded FIFO (first in first out)• Bounded LIFO (last in first out)• Explicit cache invalidation
  • 88. Peer-To-Peer• Decentralized• No “special” or “blessed” nodes• Nodes can join and leave as they please
  • 89. •EHCache•JBoss Cache•OSCache•memcachedDistributed CachingProducts
  • 90. memcached• Very fast• Simple• Key-Value (string -­‐>  binary)• Clients for most languages• Distributed• Not replicated - so 1/N chancefor local access in cluster
  • 91. Data Grids / Clustering
  • 92. Data Grids/ClusteringParallel data storage• Data replication• Data partitioning• Continuous availability• Data invalidation• Fail-over• C + P in CAP
  • 93. Data Grids/ClusteringProducts• Coherence• Terracotta• GigaSpaces• GemStone• Tibco Active Matrix• Hazelcast
  • 94. Concurrency
  • 95. •Shared-State Concurrency•Message-Passing Concurrency•Dataflow Concurrency•Software Transactional MemoryConcurrency
  • 96. Shared-StateConcurrency
  • 97. •Everyone can access anything anytime•Totally indeterministic•Introduce determinism at well-definedplaces...•...using locksShared-State Concurrency
  • 98. •Problems with locks:• Locks do not compose• Taking too few locks• Taking too many locks• Taking the wrong locks• Taking locks in the wrong order• Error recovery is hardShared-State Concurrency
  • 99. Please use java.util.concurrent.*• ConcurrentHashMap• BlockingQueue• ConcurrentQueue  • ExecutorService• ReentrantReadWriteLock• CountDownLatch• ParallelArray• and  much  much  more..Shared-State Concurrency
  • 100. Message-PassingConcurrency
  • 101. •Originates in a 1973 paper by CarlHewitt•Implemented in Erlang, Occam, Oz•Encapsulates state and behavior•Closer to the definition of OOthan classesActors
  • 102. Actors• Share NOTHING• Isolated lightweight processes• Communicates through messages• Asynchronous and non-blocking• No shared state… hence, nothing to synchronize.• Each actor has a mailbox (message queue)
  • 103. • Easier to reason about• Raised abstraction level• Easier to avoid–Race conditions–Deadlocks–Starvation–Live locksActors
  • 104. • Akka (Java/Scala)• scalaz actors (Scala)• Lift Actors (Scala)• Scala Actors (Scala)• Kilim (Java)• Jetlang (Java)• Actor’s Guild (Java)• Actorom (Java)• FunctionalJava (Java)• GPars (Groovy)Actor libs for the JVM
  • 105. DataflowConcurrency
  • 106. • Declarative• No observable non-determinism• Data-driven – threads block untildata is available• On-demand, lazy• No difference between:• Concurrent &• Sequential code• Limitations: can’t have side-effectsDataflow Concurrency
  • 107. STM:SoftwareTransactional Memory
  • 108. STM: overview• See the memory (heap and stack)as a transactional dataset• Similar to a database• begin• commit• abort/rollback•Transactions are retriedautomatically upon collision• Rolls back the memory on abort
  • 109. • Transactions can nest• Transactions compose (yipee!!)atomic  {              ...              atomic  {                    ...                }        }  STM: overview
  • 110. All operations in scope ofa transaction:l Need to be idempotentSTM: restrictions
  • 111. • Akka (Java/Scala)• Multiverse (Java)• Clojure STM (Clojure)• CCSTM (Scala)• Deuce STM (Java)STM libs for the JVM
  • 112. Scalability Patterns:Behavior
  • 113. •Event-Driven Architecture•Compute Grids•Load-balancing•Parallel ComputingScalability Patterns:Behavior
  • 114. Event-DrivenArchitecture“Four years from now,‘mere mortals’ will begin toadopt an event-driven architecture (EDA) for thesort of complex event processing that has beenattempted only by software gurus [until now]”--Roy Schulte (Gartner), 2003
  • 115. • Domain Events• Event Sourcing• Command and Query ResponsibilitySegregation (CQRS) pattern• Event Stream Processing• Messaging• Enterprise Service Bus• Actors• Enterprise Integration Architecture (EIA)Event-Driven Architecture
  • 116. Domain Events“Its really become clear to me in the lastcouple of years that we need a new buildingblock and that is the Domain Events”-- Eric Evans, 2009
  • 117. Domain Events“Domain Events represent the state of entitiesat a given time when an important eventoccurred and decouple subsystems with eventstreams. Domain Events give us clearer, moreexpressive models in those cases.”-- Eric Evans, 2009
  • 118. Domain Events“State transitions are an important part ofour problem space and should be modeledwithin our domain.”-- GregYoung, 2008
  • 119. Event Sourcing• Every state change is materialized in an Event• All Events are sent to an EventProcessor• EventProcessor stores all events in an Event Log• System can be reset and Event Log replayed• No need for ORM, just persist the Events• Many different EventListeners can be added toEventProcessor (or listen directly on the Event log)
  • 120. Event Sourcing
  • 121. “A single model cannot be appropriatefor reporting, searching andtransactional behavior.”-- GregYoung, 2008Command and QueryResponsibility Segregation(CQRS) pattern
  • 122. BidirectionalBidirectional
  • 123. UnidirectionalUnidirectionalUnidirectional
  • 124. CQRSin a nutshell• All state changes are represented by Domain Events• Aggregate roots receive Commands and publish Events• Reporting (query database) is updated as a result of thepublished Events•All Queries from Presentation go directly to Reportingand the Domain is not involved
  • 125. CQRSCopyright by Axis Framework
  • 126. CQRS: Benefits• Fully encapsulated domain that only exposesbehavior• Queries do not use the domain model• No object-relational impedance mismatch• Bullet-proof auditing and historical tracing• Easy integration with external systems• Performance and scalability
  • 127. Event Stream Processingselect  *  from  Withdrawal(amount>=200).win:length(5)
  • 128. Event Stream ProcessingProducts• Esper (Open Source)• StreamBase• RuleCast
  • 129. Messaging• Publish-Subscribe• Point-to-Point• Store-forward• Request-Reply
  • 130. Publish-Subscribe
  • 131. Point-to-Point
  • 132. Store-ForwardDurability, event log, auditing etc.
  • 133. Request-ReplyF.e.AMQP’s ‘replyTo’ header
  • 134. Messaging• Standards:• AMQP• JMS• Products:• RabbitMQ (AMQP)• ActiveMQ (JMS)• Tibco• MQSeries• etc
  • 135. ESB
  • 136. ESB products• ServiceMix (Open Source)• Mule (Open Source)• Open ESB (Open Source)• Sonic ESB• WebSphere ESB• Oracle ESB• Tibco• BizTalk Server
  • 137. Actors• Fire-forget• Async send• Fire-And-Receive-Eventually• Async send + wait on Future for reply
  • 138. Enterprise IntegrationPatterns
  • 139. Enterprise IntegrationPatternsApache Camel• More than 80 endpoints• XML (Spring) DSL• Scala DSL
  • 140. Compute Grids
  • 141. Compute GridsParallel execution• Divide and conquer1. Split up job in independent tasks2. Execute tasks in parallel3. Aggregate and return result• MapReduce - Master/Worker
  • 142. Compute GridsParallel execution• Automatic provisioning• Load balancing• Fail-over• Topology resolution
  • 143. Compute GridsProducts• Platform• DataSynapse• Google MapReduce• Hadoop• GigaSpaces• GridGain
  • 144. Load balancing
  • 145. • Random allocation• Round robin allocation• Weighted allocation• Dynamic load balancing• Least connections• Least server CPU• etc.Load balancing
  • 146. Load balancing• DNS Round Robin (simplest)• Ask DNS for IP for host• Get a new IP every time• Reverse Proxy (better)• Hardware Load Balancing
  • 147. Load balancing products• Reverse Proxies:• Apache mod_proxy (OSS)• HAProxy (OSS)• Squid (OSS)• Nginx (OSS)• Hardware Load Balancers:• BIG-IP• Cisco
  • 148. Parallel Computing
  • 149. • UE: Unit of Execution• Process• Thread• Coroutine• ActorParallel Computing• SPMD Pattern• Master/Worker Pattern• Loop Parallelism Pattern• Fork/Join Pattern• MapReduce Pattern
  • 150. SPMD Pattern• Single Program Multiple Data• Very generic pattern, used in manyother patterns• Use a single program for all the UEs• Use the UE’s ID to select differentpathways through the program. F.e:• Branching on ID• Use ID in loop index to split loops• Keep interactions between UEs explicit
  • 151. Master/Worker
  • 152. Master/Worker• Good scalability• Automatic load-balancing• How to detect termination?• Bag of tasks is empty• Poison pill• If we bottleneck on single queue?• Use multiple work queues• Work stealing• What about fault tolerance?• Use “in-progress” queue
  • 153. Loop Parallelism•Workflow1.Find the loops that are bottlenecks2.Eliminate coupling between loop iterations3.Parallelize the loop•If too few iterations to pull its weight• Merge loops• Coalesce nested loops•OpenMP• omp  parallel  for
  • 154. What if task creation can’t be handled by:• parallelizing loops (Loop Parallelism)• putting them on work queues (Master/Worker)
  • 155. What if task creation can’t be handled by:• parallelizing loops (Loop Parallelism)• putting them on work queues (Master/Worker)EnterFork/Join
  • 156. •Use when relationship between tasksis simple•Good for recursive data processing•Can use work-stealing1. Fork:Tasks are dynamically created2. Join:Tasks are later terminated anddata aggregatedFork/Join
  • 157. Fork/Join•Direct task/UE mapping• 1-1 mapping between Task/UE• Problem: Dynamic UE creation is expensive•Indirect task/UE mapping• Pool the UE• Control (constrain) the resource allocation• Automatic load balancing
  • 158. Java 7 ParallelArray (Fork/Join DSL)Fork/Join
  • 159. Java 7 ParallelArray (Fork/Join DSL)ParallelArray  students  =      new  ParallelArray(fjPool,  data);double  bestGpa  =  students.withFilter(isSenior)                                                    .withMapping(selectGpa)                                                    .max();Fork/Join
  • 160. • Origin from Google paper 2004• Used internally @ Google• Variation of Fork/Join• Work divided upfront not dynamically• Usually distributed• Normally used for massive data crunchingMapReduce
  • 161. • Hadoop (OSS), used @Yahoo• Amazon Elastic MapReduce• Many NOSQL DBs utilizes itfor searching/queryingMapReduceProducts
  • 162. MapReduce
  • 163. Parallel Computingproducts• MPI• OpenMP• JSR166 Fork/Join• java.util.concurrent• ExecutorService, BlockingQueue etc.• ProActive Parallel Suite• CommonJ WorkManager (JEE)
  • 164. Stability Patterns
  • 165. •Timeouts•Circuit Breaker•Let-it-crash•Fail fast•Bulkheads•Steady State•ThrottlingStability Patterns
  • 166. TimeoutsAlways use timeouts (if possible):• Thread.wait(timeout)• reentrantLock.tryLock• blockingQueue.poll(timeout,  timeUnit)/offer(..)• futureTask.get(timeout,  timeUnit)• socket.setSoTimeOut(timeout)• etc.
  • 167. Circuit Breaker
  • 168. Let it crash• Embrace failure as a natural state inthe life-cycle of the application• Instead of trying to prevent it;manage it• Process supervision• Supervisor hierarchies (from Erlang)
  • 169. Restart StrategyOneForOne
  • 170. Restart StrategyOneForOne
  • 171. Restart StrategyOneForOne
  • 172. Restart StrategyAllForOne
  • 173. Restart StrategyAllForOne
  • 174. Restart StrategyAllForOne
  • 175. Restart StrategyAllForOne
  • 176. Supervisor Hierarchies
  • 177. Supervisor Hierarchies
  • 178. Supervisor Hierarchies
  • 179. Supervisor Hierarchies
  • 180. Fail fast• Avoid “slow responses”• Separate:• SystemError - resources not available• ApplicationError - bad user input etc• Verify resource availability beforestarting expensive task• Input validation immediately
  • 181. Bulkheads
  • 182. Bulkheads• Partition and toleratefailure in one part• Redundancy• Applies to threads as well:• One pool for admin tasksto be able to perform taskseven though all threads areblocked
  • 183. Steady State• Clean up after you• Logging:• RollingFileAppender (log4j)• logrotate (Unix)• Scribe - server for aggregating streaming log data• Always put logs on separate disk
  • 184. Throttling• Maintain a steady pace• Count requests• If limit reached, back-off (drop, raise error)• Queue requests• Used in for example Staged Event-DrivenArchitecture (SEDA)
  • 185. ?
  • 186. thanksfor listening
  • 187. Extra material
  • 188. Client-side consistency• Strong consistency• Weak consistency• Eventually consistent• Never consistent
  • 189. Client-sideEventual Consistency levels• Casual consistency• Read-your-writes consistency (important)• Session consistency• Monotonic read consistency (important)• Monotonic write consistency
  • 190. Server-side consistencyN = the number of nodes that store replicas ofthe dataW = the number of replicas that need toacknowledge the receipt of the update before theupdate completesR = the number of replicas that are contactedwhen a data object is accessed through a read operation
  • 191. Server-side consistencyW + R > N strong consistencyW + R <= N eventual consistency