High Availability and Scalability: Too Expensive! Architectures for Future Enterprise Systems

High availability and scalability used to be solved in hardware, but that is quite expensive. This presentation shows how modern technologies like virtualization, cloud, NoSQL and new software architectures provide new and cheaper solutions that are probably even better than the traditional approaches.

Published in: Technology

  1. High Availability and Scalability: Too Expensive! Architectures for Future Enterprise Systems
     Eberhard Wolff - @ewolff
     Freelance Consultant / Trainer
     Head Technology Advisory Board, adesso AG
  2. The Dream
     Photo: http://www.vaxman.de/
  3. [Slide without text]
  4. [Slide without text]
  5. [Slide without text]
  6. Where Are We?
  7. Non-functional Requirements
  8. Availability, Performance
  9. Availability, Performance
  10. Availability: Traditional Approach
  11. •  Buy highly reliable hardware
      •  Build a small cluster
      •  2 machines
      •  Maybe add a stand-by data center
  12. •  Eventually the system will fail
      •  …and you are in real trouble
  13. True Story
      •  “Machine rebooted over night.”
      •  “Several times.”
      •  “No idea how often.”
      •  “No idea why…”
  14. Let’s look at an example
  15. [Slide without text]
  16. •  Server fails
      •  Application fails
      •  No service to the customer
      •  Can we do better?
  17. [Slide without text]
  18. What You Have Just Seen
  19. •  Failing systems do not impact the user
      •  Failing systems are just restarted
      •  Restarts happen automatically
      •  Systems run in different data centers
      •  e.g. eu-west-1a / b / c
  20. [Diagram: an Elastic Load Balancer in front of three System instances in EU West 1a, EU West 1b and EU West 1c]
  21. What It Takes…
      •  Virtualization
      •  + API to start new servers
      •  Watchdog to detect failed servers
      •  Redundant data centers if needed
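The combination of a watchdog and an API to start new servers can be sketched in a few lines. This is only an illustration: `replace_failed` and `start_server` are hypothetical names, the health status is passed in as a plain dictionary, and a real setup would call the API of the virtualization or cloud platform instead.

```python
import itertools
from typing import Callable, Dict


def replace_failed(health: Dict[str, bool],
                   start_server: Callable[[], str]) -> Dict[str, bool]:
    """One watchdog pass: keep healthy servers, start a new one per failed server."""
    result = {}
    for name, healthy in health.items():
        if healthy:
            result[name] = True
        else:
            # A failed server is not repaired; a fresh one is started instead.
            result[start_server()] = True
    return result


# Hypothetical stand-in for the "API to start new servers".
_counter = itertools.count(1)


def start_server() -> str:
    return f"replacement-{next(_counter)}"


fleet = {"eu-west-1a": True, "eu-west-1b": False, "eu-west-1c": True}
fleet = replace_failed(fleet, start_server)
print(sorted(fleet))  # ['eu-west-1a', 'eu-west-1c', 'replacement-1']
```

Running such a pass periodically is what makes restarts automatic; it only works if a new server can be created without manual steps.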
  22. Can be implemented in your datacenter!
      I have none.
      So I used the Amazon Cloud
  23. Alternatives
  24. Hardware
      •  As cheap as it gets
      •  Not highly available
      •  Availability in Software
  25. Traditional Servers
  26. Traditional Servers
  27. Highly customized
      Hard to reproduce
  28. •  Depends on details
      •  True story:
      •  Order of patch installations matters
  29. Stateful
  30. Redundancy in Hardware
  31. Traditional Servers
  32. Phoenix Servers
  33. Easy to create a new server
  34. Reliably reproducible
  35. Stateless
  36. Stateless
      •  No data is lost
      •  New server can take load immediately
  37. Redundancy in Software
  38. Implementations
      •  Might use a VM image
      •  …or a PaaS
      •  …or provisioning tools
  39. Provisioning Tools
  40. •  Easy to create test environments
      •  …with other software versions
  41. Chaos Monkey
      •  Tool by Netflix
      •  Video streaming
      •  #1 in Internet usage in the US
  42. Chaos Monkey
      •  Kill random machines
      •  To ensure the system survives hardware failures
  43. Would you rather rely on…
      …highly available hardware
      …or a Chaos Monkey tested system?
  44. Resilience
  45. Availability, Performance
  46. Availability, Performance
  47. Performance: Traditional Approach
  48. •  Estimate
      •  #Users
      •  Use cases
      •  Data volume
      •  Etc.
      •  Add a little bit
      •  Order servers
  49. Performance: Problems
  50. Problem: Estimate & Scaling
      •  Performance hard to estimate
      •  Coarse grained scaling
      •  Backfires
  51. True Story
      •  Initial estimate wrong
      •  Just need a little more
      •  Cluster: two servers
      •  Add one
      •  About 50% higher costs
      •  Ordering / installing a server takes time
      •  Bad performance until the server is delivered
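The "about 50% higher costs" in this story is the arithmetic of coarse grained scaling. As a rough sketch, assuming every server costs the same (an assumption, not stated on the slide), the relative cost increase of adding servers is simply added / existing; `relative_cost_increase` is a hypothetical helper name.

```python
def relative_cost_increase(current_servers: int, added_servers: int = 1) -> float:
    """Relative cost increase when adding servers, assuming equal cost per server."""
    return added_servers / current_servers


print(relative_cost_increase(2))   # 0.5  -> the two-server cluster from the story
print(relative_cost_increase(20))  # 0.05 -> many small servers scale in finer steps
```

The smaller the individual unit, the smaller the step: with many small servers, "just a little more" really costs only a little more.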
  52. Problem: Load Peak
      •  Business has load peaks
      •  e.g. events that people register for
      •  Need to have enough hardware for load peaks
      •  Costly
  53. Problem: Testing
      •  Testing
      •  Need production-like infrastructure
      •  Prohibitive costs
      •  Only needed during tests
  54. [Slide without text]
  55. [Diagram: an Elastic Load Balancer in front of four System instances, one in EU West 1b and three in EU West 1c]
  56. What You Have Just Seen
      •  System tunes itself depending on load
      •  Same approach as for availability
      •  + Watchdog for load
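A "watchdog for load" boils down to a rule that maps the measured load to a number of servers. The following is a minimal sketch under assumed numbers: `desired_servers`, the capacity of 100 requests per second per server and the minimum of two servers are all illustrative, not values from the talk.

```python
import math


def desired_servers(load: float, capacity_per_server: float, minimum: int = 2) -> int:
    """Number of servers needed for the load, never below a redundancy minimum."""
    needed = math.ceil(load / capacity_per_server)
    return max(minimum, needed)


# Load in requests per second (assumed unit).
print(desired_servers(load=50, capacity_per_server=100))   # 2  (quiet period)
print(desired_servers(load=950, capacity_per_server=100))  # 10 (load peak)
```

The difference between the desired and the current number of servers is then handed to the same mechanism that replaces failed servers: start new ones or shut surplus ones down.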
  57. Easy to create a new server ✔
      Redundancy in Software ✔
      Reliably reproducible ✔
      Stateless ?
  58. Stateless
      •  Stateless web servers: best practice
      •  Some Java frameworks don’t follow the approach
      •  Can store HTTP session externally
      •  e.g. RDBMS, NoSQL, Cache
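Storing the HTTP session externally can be illustrated as follows. This is a sketch, not a real web framework: `SessionStore` is a hypothetical stand-in for an RDBMS, NoSQL database or cache (here just an in-memory dictionary), and `WebServer` keeps no session data of its own.

```python
from typing import Any, Dict, List


class SessionStore:
    """Stand-in for an external store; a real one would live outside the web servers."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}

    def load(self, session_id: str) -> Dict[str, Any]:
        return dict(self._data.get(session_id, {}))

    def save(self, session_id: str, session: Dict[str, Any]) -> None:
        self._data[session_id] = dict(session)


class WebServer:
    """Stateless server: every request loads and saves the session externally."""

    def __init__(self, store: SessionStore) -> None:
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> List[str]:
        session = self.store.load(session_id)
        session["cart"] = session.get("cart", []) + [item]
        self.store.save(session_id, session)
        return session["cart"]


store = SessionStore()
server_a = WebServer(store)
server_a.add_to_cart("s1", "book")

# server_a fails; a freshly started server continues the same session.
server_b = WebServer(store)
print(server_b.add_to_cart("s1", "pen"))  # ['book', 'pen']
```

Because no request depends on which server handled the previous one, a new server can take load immediately and a failed one loses no data.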
  59. What about Databases?
  60. Databases
      •  Often assumed to be just “fast and scalable”
      •  Large scale doable, e.g. Data Warehouse
      •  Often use traditional approach
      •  Cluster with two nodes
      •  Highly available hardware
  61. Database: Problems
      •  Availability
      •  Highly available hardware
      •  Performance
      •  Limited scaling
      •  Costly
  62. Databases
      •  New approaches
      •  Used by NoSQL databases
      •  But also e.g. MySQL
      •  …or in system architecture
  63. Databases
      •  Replication
         •  Read performance
         •  Availability
      •  Sharding
         •  Spread data across servers
         •  Write performance
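How replication and sharding divide the work can be shown with a toy in-memory store. This is a simplified sketch, not how any particular database implements it: `ShardedStore` and `shard_for` are hypothetical names, each replica is a plain dictionary, and writes are copied to all replicas synchronously for simplicity.

```python
import hashlib
from typing import Dict, List, Optional


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    def __init__(self, shard_count: int, replicas_per_shard: int) -> None:
        # shards[i] is the list of replicas (dicts) holding shard i.
        self.shards: List[List[Dict[str, str]]] = [
            [{} for _ in range(replicas_per_shard)] for _ in range(shard_count)
        ]
        self._next_read = 0

    def write(self, key: str, value: str) -> None:
        # Sharding: only one shard is touched, so writes spread across shards.
        for replica in self.shards[shard_for(key, len(self.shards))]:
            replica[key] = value

    def read(self, key: str) -> Optional[str]:
        # Replication: any replica of the shard can answer, so reads spread too.
        replicas = self.shards[shard_for(key, len(self.shards))]
        self._next_read = (self._next_read + 1) % len(replicas)
        return replicas[self._next_read].get(key)


db = ShardedStore(shard_count=2, replicas_per_shard=3)
db.write("user:1", "Alice")
print(db.read("user:1"))  # Alice
```

If one replica is lost, the other replicas of the same shard still hold the data, which is the availability benefit listed under replication.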
  64. Scaling MongoDB
      [Diagram: Shard 1 and Shard 2, each with Replica 1, Replica 2 and Replica 3]
  65. Availability
      [Diagram: Shard 1 and Shard 2, each with Replica 1, Replica 2 and Replica 3]
  66. Scaling MongoDB
      [Diagram: Shard 1, Shard 2 and Shard 3, each with Replica 1, Replica 2 and Replica 3]
  67. Scaling MongoDB
      [Diagram: the six replicas of Shard 1 and Shard 2 plus one position marked “?”]
  68. Replicas & Shards
      •  Easy to understand
      •  But: Coarse grained scaling
      •  Adding another shard means
      •  Moving lots of data
      •  Adding quite a few servers
  69. Amazon Dynamo Model
      [Diagram: Servers A, B, C and D, each holding copies of some of Shard1 to Shard4; each shard appears three times]
  70. Amazon Dynamo Model
      [Diagram: Servers A, B, C and D, each holding copies of some of Shard1 to Shard4; each shard appears three times]
  71. Amazon Dynamo Model
      [Diagram: the same layout with a New Server added]
  72. Amazon Dynamo Model
      •  Published in the Dynamo paper
      •  Implementations: Riak, Cassandra etc.
      •  Fine grained scaling
      •  Can immediately write to new node
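The fine grained scaling of this model comes from partitioning with consistent hashing, which the Dynamo paper describes. The sketch below is a simplified illustration and not the implementation used by Riak or Cassandra: the `Ring` class, the 64 virtual nodes per server and the key names are assumptions, and replication is left out.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Consistent hash ring with several virtual nodes per server."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._points: List[Tuple[int, str]] = []  # sorted (position, server)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # The first point at or after the key's position owns the key (wrapping around).
        index = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[index % len(self._points)][1]


keys = [f"key-{i}" for i in range(10_000)]
ring = Ring(["A", "B", "C", "D"])
before = {k: ring.node_for(k) for k in keys}

ring.add("E")  # the "New Server"
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of the keys moved, all of them to server E")
```

Only the keys that the new server takes over change their location; everything else stays where it is. That is why a single server can be added at a time, in contrast to adding a whole shard with all its replicas.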
  73. Hardware
      •  Not highly reliable
      •  Scales by distributing load across servers
      •  No NAS, SAN, RAID…
      •  As cheap as it gets
  74. Sum Up
      •  Virtualization
      •  + Phoenix server
      •  = Better availability
      •  = Better performance
      •  = Lower costs
      •  Stateless servers
      •  NoSQL
  75. Thank You!