Building A Scalable Architecture

This is a presentation I delivered at the Great Indian Developer Summit 2008. It covers a wide-array of topics and a plethora of lessons we have learnt (some the hard way) over the last 9 years in building web apps that are used by millions of users serving billions of page views every month. Topics and Techniques include Vertical scaling, Horizontal Scaling, Vertical Partitioning, Horizontal Partitioning, Loose Coupling, Caching, Clustering, Reverse Proxying and more.

Published in: Technology, Education

  1. Building a Scalable Architecture for Web Apps - Part I (Lessons Learned @ Directi) <ul><li>By Bhavin Turakhia </li></ul><ul><li>CEO, Directi </li></ul><ul><li>( http://www.directi.com | http://wiki.directi.com | http://careers.directi.com ) </li></ul>Licensed under Creative Commons Attribution Sharealike Noncommercial
  2. Agenda <ul><li>Why is Scalability important </li></ul><ul><li>Introduction to the Variables and Factors </li></ul><ul><li>Building our own Scalable Architecture (in incremental steps) </li></ul><ul><ul><li>Vertical Scaling </li></ul></ul><ul><ul><li>Vertical Partitioning </li></ul></ul><ul><ul><li>Horizontal Scaling </li></ul></ul><ul><ul><li>Horizontal Partitioning </li></ul></ul><ul><ul><li>… etc </li></ul></ul><ul><li>Platform Selection Considerations </li></ul><ul><li>Tips </li></ul>
  3. Why is Scalability Important in a Web 2.0 world <ul><li>Viral marketing can result in instant successes </li></ul><ul><li>RSS / Ajax / SOA </li></ul><ul><ul><li>pull based / polling type </li></ul></ul><ul><ul><li>XML protocols - Meta-data > data </li></ul></ul><ul><ul><li>Number of requests grows exponentially with the user base </li></ul></ul><ul><li>RoR / Grails – Dynamic language landscape gaining popularity </li></ul><ul><li>In the end you want to build a Web 2.0 app that can serve millions of users with ZERO downtime </li></ul>
  4. The Variables <ul><li>Scalability - Number of users / sessions / transactions / operations the entire system can perform </li></ul><ul><li>Performance – Optimal utilization of resources </li></ul><ul><li>Responsiveness – Time taken per operation </li></ul><ul><li>Availability - Probability of the application or a portion of the application being available at any given point in time </li></ul><ul><li>Downtime Impact - The impact of a downtime of a server/service/resource - number of users, type of impact etc </li></ul><ul><li>Cost </li></ul><ul><li>Maintenance Effort </li></ul><ul><li>High: scalability, availability, performance & responsiveness </li></ul><ul><li>Low: downtime impact, cost & maintenance effort </li></ul>
  5. The Factors <ul><li>Platform selection </li></ul><ul><li>Hardware </li></ul><ul><li>Application Design </li></ul><ul><li>Database/Datastore Structure and Architecture </li></ul><ul><li>Deployment Architecture </li></ul><ul><li>Storage Architecture </li></ul><ul><li>Abuse prevention </li></ul><ul><li>Monitoring mechanisms </li></ul><ul><li>… and more </li></ul>
  6. Let's Start … <ul><li>We will now build an example architecture for an example app using the following iterative, incremental steps – </li></ul><ul><ul><li>Inspect current Architecture </li></ul></ul><ul><ul><li>Identify Scalability Bottlenecks </li></ul></ul><ul><ul><li>Identify SPOFs and Availability Issues </li></ul></ul><ul><ul><li>Identify Downtime Impact Risk Zones </li></ul></ul><ul><ul><li>Apply one of - </li></ul></ul><ul><ul><ul><li>Vertical Scaling </li></ul></ul></ul><ul><ul><ul><li>Vertical Partitioning </li></ul></ul></ul><ul><ul><ul><li>Horizontal Scaling </li></ul></ul></ul><ul><ul><ul><li>Horizontal Partitioning </li></ul></ul></ul><ul><ul><li>Repeat process </li></ul></ul>
  7. Step 1 – Let's Start … [Diagram: AppServer and DBServer on a single node]
  8. Step 2 – Vertical Scaling [Diagram: the combined AppServer / DBServer node with its CPU and RAM]
  9. Step 2 - Vertical Scaling <ul><li>Introduction </li></ul><ul><ul><li>Increasing the hardware resources without changing the number of nodes </li></ul></ul><ul><ul><li>Referred to as “Scaling up” the Server </li></ul></ul><ul><li>Advantages </li></ul><ul><ul><li>Simple to implement </li></ul></ul><ul><li>Disadvantages </li></ul><ul><ul><li>Finite limit </li></ul></ul><ul><ul><li>Hardware does not scale linearly (diminishing returns for each incremental unit) </li></ul></ul><ul><ul><li>Requires downtime </li></ul></ul><ul><ul><li>Increases Downtime Impact </li></ul></ul><ul><ul><li>Incremental costs increase exponentially </li></ul></ul>[Diagram: the combined AppServer / DBServer node with additional CPU and RAM]
  10. Step 3 – Vertical Partitioning (Services) <ul><li>Introduction </li></ul><ul><ul><li>Deploying each service on a separate node </li></ul></ul><ul><li>Positives </li></ul><ul><ul><li>Increases per application Availability </li></ul></ul><ul><ul><li>Task-based specialization, optimization and tuning possible </li></ul></ul><ul><ul><li>Reduces context switching </li></ul></ul><ul><ul><li>Simple to implement for out of band processes </li></ul></ul><ul><ul><li>No changes to App required </li></ul></ul><ul><ul><li>Flexibility increases </li></ul></ul><ul><li>Negatives </li></ul><ul><ul><li>Sub-optimal resource utilization </li></ul></ul><ul><ul><li>May not increase overall availability </li></ul></ul><ul><ul><li>Finite Scalability </li></ul></ul>[Diagram: AppServer and DBServer on separate nodes]
  11. Understanding Vertical Partitioning <ul><li>The term Vertical Partitioning denotes – </li></ul><ul><ul><li>Increase in the number of nodes by distributing the tasks/functions </li></ul></ul><ul><ul><li>Each node (or cluster) performs separate Tasks </li></ul></ul><ul><ul><li>Each node (or cluster) is different from the other </li></ul></ul><ul><li>Vertical Partitioning can be performed at various layers (App / Server / Data / Hardware etc) </li></ul>
  12. Step 4 – Horizontal Scaling (App Server) <ul><li>Introduction </li></ul><ul><ul><li>Increasing the number of nodes of the App Server through Load Balancing </li></ul></ul><ul><ul><li>Referred to as “Scaling out” the App Server </li></ul></ul>[Diagram: a Load Balancer in front of three AppServers, backed by one DBServer]
  13. Understanding Horizontal Scaling <ul><li>The term Horizontal Scaling denotes – </li></ul><ul><ul><li>Increase in the number of nodes by replicating the nodes </li></ul></ul><ul><ul><li>Each node performs the same Tasks </li></ul></ul><ul><ul><li>Each node is identical </li></ul></ul><ul><ul><li>Typically the collection of nodes may be known as a cluster (though the term cluster is often misused) </li></ul></ul><ul><ul><li>Also referred to as “Scaling Out” </li></ul></ul><ul><li>Horizontal Scaling can be performed for any particular type of node (AppServer / DBServer etc) </li></ul>
  14. Load Balancer – Hardware vs Software <ul><li>Hardware Load Balancers are faster </li></ul><ul><li>Software Load Balancers are more customizable </li></ul><ul><li>With HTTP Servers, load balancing is typically combined with HTTP accelerators </li></ul>
  15. Load Balancer – Session Management <ul><li>Sticky Sessions </li></ul><ul><ul><li>Requests for a given user are sent to a fixed App Server </li></ul></ul><ul><ul><li>Observations </li></ul></ul><ul><ul><ul><li>Asymmetrical load distribution (especially during downtimes) </li></ul></ul></ul><ul><ul><ul><li>Downtime Impact – Loss of session data </li></ul></ul></ul>[Diagram: Sticky Sessions, with the Load Balancer sending User 1 and User 2 each to a fixed AppServer]
  16. Load Balancer – Session Management <ul><li>Central Session Store </li></ul><ul><ul><li>Introduces SPOF </li></ul></ul><ul><ul><li>An additional variable </li></ul></ul><ul><ul><li>Session reads and writes generate Disk + Network I/O </li></ul></ul><ul><ul><li>Also known as a Shared Session Store Cluster </li></ul></ul>[Diagram: Central Session Storage, with three load-balanced AppServers sharing one Session Store]
  17. Load Balancer – Session Management <ul><li>Clustered Session Management </li></ul><ul><ul><li>Easier to set up </li></ul></ul><ul><ul><li>No SPOF </li></ul></ul><ul><ul><li>Session reads are instantaneous </li></ul></ul><ul><ul><li>Session writes generate Network I/O </li></ul></ul><ul><ul><li>Network I/O grows rapidly (roughly quadratically) with the number of nodes, since every session write is sent to every other node </li></ul></ul><ul><ul><li>In very rare circumstances a request may get stale session data </li></ul></ul><ul><ul><ul><li>The user's next request reaches another node before the inter-node message does </li></ul></ul></ul><ul><ul><ul><li>Inter-node communication fails </li></ul></ul></ul><ul><ul><li>AKA Shared-nothing Cluster </li></ul></ul>[Diagram: Clustered Session Management across three load-balanced AppServers]
  18. Load Balancer – Session Management <ul><li>Sticky Sessions with Central Session Store </li></ul><ul><ul><li>Downtime does not cause loss of data </li></ul></ul><ul><ul><li>Session reads need not generate network I/O </li></ul></ul><ul><li>Sticky Sessions with Clustered Session Management </li></ul><ul><ul><li>No specific advantages </li></ul></ul>[Diagram: Sticky Sessions, with the Load Balancer sending User 1 and User 2 each to a fixed AppServer]
  19. Load Balancer – Session Management <ul><li>Recommendation </li></ul><ul><ul><li>Use Clustered Session Management if you have – </li></ul></ul><ul><ul><ul><li>A small number of App Servers </li></ul></ul></ul><ul><ul><ul><li>Few Session writes </li></ul></ul></ul><ul><ul><li>Use a Central Session Store elsewhere </li></ul></ul><ul><ul><li>Use sticky sessions only if you have to </li></ul></ul>
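The session options above can be illustrated with a minimal sketch. It assumes sticky routing is done by hashing the session id, and it uses an in-memory dictionary as a stand-in for a Central Session Store; the names `pick_server` and `CentralSessionStore` and the server labels are hypothetical and do not come from the slides.

```python
import zlib


def pick_server(session_id, servers):
    """Sticky routing: the same session id always maps to the same live server."""
    return servers[zlib.crc32(session_id.encode("utf-8")) % len(servers)]


class CentralSessionStore:
    """Stand-in for a shared store (a dict here, a networked service in practice)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return self._data.get(session_id, {})

    def save(self, session_id, session):
        self._data[session_id] = dict(session)


servers = ["app1", "app2", "app3"]  # hypothetical App Server names
store = CentralSessionStore()

first = pick_server("user-1", servers)
store.save("user-1", {"cart": ["book"]})

# Simulate downtime of the pinned server: the user lands on another node,
# but the session survives because it lives in the central store.
remaining = [s for s in servers if s != first]
fallback = pick_server("user-1", remaining)
session_after_failover = store.load("user-1")
```

With sticky sessions alone, the session held in the memory of `first` would be lost at this point; the central store is what preserves it, at the cost of an extra network hop on session reads and writes.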
  20. Load Balancer – Removing SPOF <ul><li>In a Load Balanced App Server Cluster the LB is an SPOF </li></ul><ul><li>Set up the LB in Active-Active or Active-Passive mode </li></ul><ul><ul><li>Note: Active-Active nevertheless assumes that each LB is independently able to take up the load of the other </li></ul></ul><ul><ul><li>If one wants ZERO downtime, then Active-Active becomes truly cost beneficial only if multiple LBs (more than 3 to 4) are daisy-chained as Active-Active, forming an LB Cluster </li></ul></ul>[Diagrams: an Active-Passive LB pair and an Active-Active LB pair, each between Users and three AppServers]
  21. Step 4 – Horizontal Scaling (App Server) <ul><li>Our deployment at the end of Step 4 </li></ul><ul><li>Positives </li></ul><ul><ul><li>Increases Availability and Scalability </li></ul></ul><ul><ul><li>No changes to App required </li></ul></ul><ul><ul><li>Easy setup </li></ul></ul><ul><li>Negatives </li></ul><ul><ul><li>Finite Scalability </li></ul></ul>[Diagram: Load Balanced App Servers and one DBServer]
  22. Step 5 – Vertical Partitioning (Hardware) <ul><li>Introduction </li></ul><ul><ul><li>Partitioning out the Storage function using a SAN </li></ul></ul><ul><li>SAN config options </li></ul><ul><ul><li>Refer to “Demystifying Storage” at http://wiki.directi.com -> Dev University -> Presentations </li></ul></ul><ul><li>Positives </li></ul><ul><ul><li>Allows “Scaling Up” the DB Server </li></ul></ul><ul><ul><li>Boosts Performance of DB Server </li></ul></ul><ul><li>Negatives </li></ul><ul><ul><li>Increases Cost </li></ul></ul>[Diagram: Load Balanced App Servers, one DBServer and a SAN]
  23. Step 6 – Horizontal Scaling (DB) <ul><li>Introduction </li></ul><ul><ul><li>Increasing the number of DB nodes </li></ul></ul><ul><ul><li>Referred to as “Scaling out” the DB Server </li></ul></ul><ul><li>Options </li></ul><ul><ul><li>Shared nothing Cluster </li></ul></ul><ul><ul><li>Real Application Cluster (or Shared Storage Cluster) </li></ul></ul>[Diagram: Load Balanced App Servers, three DBServers and a SAN]
  24. Shared Nothing Cluster <ul><li>Each DB Server node has its own complete copy of the database </li></ul><ul><li>Nothing is shared between the DB Server Nodes </li></ul><ul><li>This is achieved through DB Replication at DB / Driver / App level or through a proxy </li></ul><ul><li>Supported by most RDBMSs natively or through 3rd-party software </li></ul>[Diagram: three DBServer nodes, each with its own Database] Note: Actual DB files may be stored on a central SAN
  25. Replication Considerations <ul><li>Master-Slave </li></ul><ul><ul><li>Writes are sent to a single master which replicates the data to multiple slave nodes </li></ul></ul><ul><ul><li>Replication may be cascaded </li></ul></ul><ul><ul><li>Simple setup </li></ul></ul><ul><ul><li>No conflict management required </li></ul></ul><ul><li>Multi-Master </li></ul><ul><ul><li>Writes can be sent to any of the multiple masters which replicate them to other masters and slaves </li></ul></ul><ul><ul><li>Conflict Management required </li></ul></ul><ul><ul><li>Deadlocks possible if same data is simultaneously modified at multiple places </li></ul></ul>
  26. Replication Considerations <ul><li>Asynchronous </li></ul><ul><ul><li>Guaranteed, but out-of-band replication from Master to Slave </li></ul></ul><ul><ul><li>Master updates its own db and returns a response to client </li></ul></ul><ul><ul><li>Replication from Master to Slave takes place asynchronously </li></ul></ul><ul><ul><li>Faster response to a client </li></ul></ul><ul><ul><li>Slave data is marginally behind the Master </li></ul></ul><ul><ul><li>Requires modification to App to send critical reads and writes to master, and load balance all other reads </li></ul></ul><ul><li>Synchronous </li></ul><ul><ul><li>Guaranteed, in-band replication from Master to Slave </li></ul></ul><ul><ul><li>Master updates its own db, and confirms all slaves have updated their db before returning a response to client </li></ul></ul><ul><ul><li>Slower response to a client </li></ul></ul><ul><ul><li>Slaves have the same data as the Master at all times </li></ul></ul><ul><ul><li>Requires modification to App to send writes to master and load balance all reads </li></ul></ul>
  27. Replication Considerations <ul><li>Replication at RDBMS level </li></ul><ul><ul><li>Support may exist in the RDBMS or through a 3rd-party tool </li></ul></ul><ul><ul><li>Faster and more reliable </li></ul></ul><ul><ul><li>App must send writes to Master, reads to any db and critical reads to Master </li></ul></ul><ul><li>Replication at Driver / DAO level </li></ul><ul><ul><li>Driver / DAO layer ensures </li></ul></ul><ul><ul><ul><li>Writes are performed on all connected DBs </li></ul></ul></ul><ul><ul><ul><li>Reads are load balanced </li></ul></ul></ul><ul><ul><ul><li>Critical reads are sent to a Master </li></ul></ul></ul><ul><ul><li>In most cases RDBMS agnostic </li></ul></ul><ul><ul><li>Slower and in some cases less reliable </li></ul></ul>
  28. Real Application Cluster <ul><li>All DB Servers in the cluster share a common storage area on a SAN </li></ul><ul><li>All DB servers mount the same block device </li></ul><ul><li>The filesystem must be a clustered file system (eg GFS / OCFS) </li></ul><ul><li>Currently only supported by Oracle Real Application Cluster </li></ul><ul><li>Can be very expensive (licensing fees) </li></ul>[Diagram: three DBServers sharing one Database on a SAN]
  29. Recommendation <ul><li>Try and choose a DB which natively supports Master-Slave replication </li></ul><ul><li>Use Master-Slave Async replication </li></ul><ul><li>Write your DAO layer to ensure </li></ul><ul><ul><li>Writes are sent to a single DB </li></ul></ul><ul><ul><li>Reads are load balanced </li></ul></ul><ul><ul><li>Critical reads are sent to a master </li></ul></ul>[Diagram: Load Balanced App Servers send Writes & Critical Reads to one DBServer and load balance Other Reads across the DBServers]
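The DAO-layer routing rule recommended above could be sketched roughly as follows. This is only an illustration under assumptions: the class name `ReplicatedDao`, the node names, and the choice to load balance ordinary reads round-robin across the slaves only are not taken from the slides.

```python
import itertools


class ReplicatedDao:
    """Routes each query to a node in a master-slave (async) replicated cluster."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin iterator over the slave nodes (assumed balancing policy).
        self._slaves = itertools.cycle(slaves)

    def route(self, is_write=False, critical=False):
        # Writes and critical reads need the freshest data, so they go to the master.
        if is_write or critical:
            return self.master
        # Other reads tolerate slightly stale data and are load balanced.
        return next(self._slaves)


dao = ReplicatedDao("db-master", ["db-slave-1", "db-slave-2"])
write_target = dao.route(is_write=True)
critical_read_target = dao.route(critical=True)
```

A real DAO would hold connections rather than names, but the decision itself stays this small: one flag for writes, one for reads that cannot tolerate replication lag.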
  30. Step 6 – Horizontal Scaling (DB) <ul><li>Our architecture now looks like this </li></ul><ul><li>Positives </li></ul><ul><ul><li>As Web servers grow, Database nodes can be added </li></ul></ul><ul><ul><li>DB Server is no longer SPOF </li></ul></ul><ul><li>Negatives </li></ul><ul><ul><li>Finite limit </li></ul></ul>[Diagram: Load Balanced App Servers, a DB Cluster of three DB nodes and a SAN]
  31. Step 6 – Horizontal Scaling (DB) <ul><li>Shared nothing clusters have a finite scaling limit </li></ul><ul><ul><li>Every write must be applied on every node, so only the reads are divided among the nodes </li></ul></ul><ul><ul><li>Reads to Writes – 2:1 </li></ul></ul><ul><ul><li>So 8 Reads => 4 writes </li></ul></ul><ul><ul><li>2 DBs </li></ul></ul><ul><ul><ul><li>Per db – 4 reads and 4 writes </li></ul></ul></ul><ul><ul><li>4 DBs </li></ul></ul><ul><ul><ul><li>Per db – 2 reads and 4 writes </li></ul></ul></ul><ul><ul><li>8 DBs </li></ul></ul><ul><ul><ul><li>Per db – 1 read and 4 writes </li></ul></ul></ul><ul><li>At some point adding another node brings in negligible incremental benefit </li></ul>[Diagram: Reads and Writes arriving at DB1 and DB2]
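The arithmetic on this slide can be reproduced with a few lines of code. The sketch assumes, as the slide's numbers imply, that reads are spread evenly across the nodes while each write is repeated on every node; the function name `per_node_load` is made up for the example.

```python
def per_node_load(total_reads, total_writes, nodes):
    """Return (reads, writes) handled by each node of a shared-nothing cluster."""
    reads_per_node = total_reads / nodes  # reads are load balanced
    writes_per_node = total_writes        # every write is replicated to every node
    return reads_per_node, writes_per_node


# Reads to writes at 2:1, using the slide's workload of 8 reads and 4 writes.
for nodes in (2, 4, 8):
    reads, writes = per_node_load(8, 4, nodes)
    print(f"{nodes} DBs: {reads:g} reads and {writes} writes per DB")
```

Going from 4 to 8 nodes saves each node only one read while the write load stays fixed at 4, which is why adding nodes eventually brings negligible benefit.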
  32. Step 7 – Vertical / Horizontal Partitioning (DB) <ul><li>Introduction </li></ul><ul><ul><li>Increasing the number of DB Clusters by dividing the data </li></ul></ul><ul><li>Options </li></ul><ul><ul><li>Vertical Partitioning - Dividing tables / columns </li></ul></ul><ul><ul><li>Horizontal Partitioning - Dividing by rows (value) </li></ul></ul>[Diagram: Load Balanced App Servers, a DB Cluster of three DB nodes and a SAN]
  33. Vertical Partitioning (DB) <ul><li>Take a set of tables and move them onto another DB </li></ul><ul><ul><li>Eg in a social network - the users table and the friends table can be on separate DB clusters </li></ul></ul><ul><li>Each DB Cluster has different tables </li></ul><ul><li>Application code or DAO / Driver code or a proxy knows where a given table is and directs queries to the appropriate DB </li></ul><ul><li>Can also be done at a column level by moving a set of columns into a separate table </li></ul>[Diagram: App Cluster using DB Cluster 1 (Table 1, Table 2) and DB Cluster 2 (Table 3, Table 4)]
  34. Vertical Partitioning (DB) <ul><li>Negatives </li></ul><ul><ul><li>One cannot perform SQL joins or maintain referential integrity across DB clusters (referential integrity is as such over-rated) </li></ul></ul><ul><ul><li>Finite Limit </li></ul></ul>[Diagram: App Cluster using DB Cluster 1 (Table 1, Table 2) and DB Cluster 2 (Table 3, Table 4)]
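A small sketch may help show what "the application knows where a given table is" means, and what replaces a SQL join once tables live on different clusters. The nested dictionaries stand in for real database clusters, and the table contents, cluster names and function names are invented for illustration.

```python
# In-memory stand-ins for two DB clusters holding different tables.
CLUSTERS = {
    "db-cluster-1": {"users": {1: "asha", 2: "ravi", 3: "meera"}},
    "db-cluster-2": {"friends": {1: [2, 3], 2: [1]}},
}

# The routing knowledge that lives in application / DAO / proxy code.
TABLE_TO_CLUSTER = {"users": "db-cluster-1", "friends": "db-cluster-2"}


def table(name):
    """Look up which cluster owns the table and return the table from it."""
    return CLUSTERS[TABLE_TO_CLUSTER[name]][name]


def friend_names(user_id):
    """Application-level join: one query per cluster, combined in code."""
    friend_ids = table("friends").get(user_id, [])
    users = table("users")
    return [users[friend_id] for friend_id in friend_ids]
```

Calling `friend_names(1)` touches both clusters: the friend ids come from one and the names from the other, which is the work a single SQL join would have done on an unpartitioned database.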
  35. Horizontal Partitioning (DB) <ul><li>Take a set of rows and move them onto another DB </li></ul><ul><ul><li>Eg in a social network – each DB Cluster can contain all data for 1 million users </li></ul></ul><ul><li>Each DB Cluster has identical tables </li></ul><ul><li>Application code or DAO / Driver code or a proxy knows where a given row is and directs queries to the appropriate DB </li></ul><ul><li>Negatives </li></ul><ul><ul><li>SQL unions for search type queries must be performed within code </li></ul></ul>[Diagram: App Cluster using DB Cluster 1 and DB Cluster 2, each holding Tables 1 to 4 for 1 million users]
  36. Horizontal Partitioning (DB) <ul><li>Techniques </li></ul><ul><ul><li>FCFS </li></ul></ul><ul><ul><ul><li>The 1st million users are stored on cluster 1 and the next million on cluster 2 </li></ul></ul></ul><ul><ul><li>Round Robin </li></ul></ul><ul><ul><li>Least Used (Balanced) </li></ul></ul><ul><ul><ul><li>Each time a new user is added, a DB cluster with the least users is chosen </li></ul></ul></ul><ul><ul><li>Hash based </li></ul></ul><ul><ul><ul><li>A hashing function is used to determine the DB Cluster in which the user data should be inserted </li></ul></ul></ul><ul><ul><li>Value Based </li></ul></ul><ul><ul><ul><li>User ids 1 to 1 million stored in cluster 1 OR </li></ul></ul></ul><ul><ul><ul><li>all users with names starting from A-M on cluster 1 </li></ul></ul></ul><ul><ul><li>Except for Hash based and Value based, all other techniques also require an independent lookup map – mapping user to Database Cluster </li></ul></ul><ul><ul><li>This map itself will be stored on a separate DB (which may further need to be replicated) </li></ul></ul>
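Three of the techniques listed above can be sketched side by side. This is an illustrative sketch only: the cluster names, the use of CRC32 as the hashing function, and the `LeastUsedMap` class are assumptions rather than anything specified in the slides.

```python
import zlib

CLUSTERS = ["cluster-1", "cluster-2"]  # hypothetical DB cluster names


def hash_based(user_id):
    """Hash based: the cluster is computed from the key, so no lookup map is needed."""
    return CLUSTERS[zlib.crc32(str(user_id).encode("utf-8")) % len(CLUSTERS)]


def value_based(user_id, users_per_cluster=1_000_000):
    """Value based: ids 1 to 1 million on cluster 1, the next million on cluster 2."""
    return CLUSTERS[(user_id - 1) // users_per_cluster]


class LeastUsedMap:
    """Least Used: needs an independent lookup map from user to cluster."""

    def __init__(self, clusters):
        self.counts = {cluster: 0 for cluster in clusters}
        self.lookup = {}

    def add_user(self, user_id):
        # Pick the cluster that currently holds the fewest users.
        cluster = min(self.counts, key=self.counts.get)
        self.counts[cluster] += 1
        self.lookup[user_id] = cluster
        return cluster

    def cluster_for(self, user_id):
        return self.lookup[user_id]
```

In this sketch `value_based` raises an `IndexError` once ids exceed the capacity of the configured clusters, and the simple modulo in `hash_based` would remap most users if a cluster were added; both are limitations of the example, not recommendations.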
  37. Step 7 – Vertical / Horizontal Partitioning (DB) <ul><li>Our architecture now looks like this </li></ul><ul><li>Positives </li></ul><ul><ul><li>As App servers grow, Database Clusters can be added </li></ul></ul><ul><li>Note: This is not the same as table partitioning provided by the db (eg MSSQL) </li></ul><ul><li>We may actually want to further segregate these into Sets, each serving a collection of users (refer to the next slide) </li></ul>[Diagram: Load Balanced App Servers with a Lookup Map, two DB Clusters of three DB nodes each, and a SAN]
  38. Step 8 – Separating Sets <ul><li>Now we consider each deployment as a single Set serving a collection of users </li></ul>[Diagram: a Global Redirector with a Global Lookup Map in front of SET 1 (10 million users) and SET 2 (10 million users); each Set has its own Lookup Map, Load Balanced App Servers, two DB Clusters and a SAN]
  39. Creating Sets <ul><li>The goal behind creating sets is easier manageability </li></ul><ul><li>Each Set is independent and handles transactions for a set of users </li></ul><ul><li>Each Set is architecturally identical to the other </li></ul><ul><li>Each Set contains the entire application with all its data structures </li></ul><ul><li>Sets can even be deployed in separate datacenters </li></ul><ul><li>Users may even be added to a Set that is closer to them in terms of network latency </li></ul>
  40. Step 8 – Horizontal Partitioning (Sets) <ul><li>Our architecture now looks like this </li></ul><ul><li>Positives </li></ul><ul><ul><li>Infinite Scalability </li></ul></ul><ul><li>Negatives </li></ul><ul><ul><li>Aggregation of data across sets is complex </li></ul></ul><ul><ul><li>Users may need to be moved across Sets if sizing is improper </li></ul></ul><ul><ul><li>Global App settings and preferences need to be replicated across Sets </li></ul></ul>[Diagram: a Global Redirector in front of SET 1 and SET 2, each with an App Servers Cluster, two DB Clusters and a SAN]
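The role of the Global Redirector can be pictured with a very small sketch. Everything concrete here is assumed for illustration: the Set names, the example.com URLs, the usernames and the function name `redirect_url` are hypothetical, and a real Global Lookup Map would live in a replicated datastore rather than a dictionary.

```python
# Hypothetical entry points for two architecturally identical Sets.
SETS = {
    "set-1": "https://set1.example.com",
    "set-2": "https://set2.example.com",
}

# Stand-in for the Global Lookup Map: which Set serves which user.
GLOBAL_LOOKUP = {"alice": "set-1", "bob": "set-2"}


def redirect_url(username, path="/"):
    """Send the user to the Set that holds all of their data."""
    set_name = GLOBAL_LOOKUP[username]
    return SETS[set_name] + path
```

After this one redirect, every later request from the user is handled inside a single Set, which is what keeps the Sets independent of each other.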
  41. Step 9 – Caching <ul><li>Add caches within App Server </li></ul><ul><ul><li>Object Cache </li></ul></ul><ul><ul><li>Session Cache (especially if you are using a Central Session Store) </li></ul></ul><ul><ul><li>API cache </li></ul></ul><ul><ul><li>Page cache </li></ul></ul><ul><li>Software </li></ul><ul><ul><li>Memcached </li></ul></ul><ul><ul><li>Terracotta (Java only) </li></ul></ul><ul><ul><li>Coherence (expensive commercial data grid by Oracle) </li></ul></ul>
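As a rough illustration of an object cache inside the App Server, the sketch below implements a read-through cache with a time-to-live. It uses a plain in-process dictionary rather than any of the products listed above, and the class name, the `loader` callback and the default TTL are assumptions made for the example.

```python
import time


class ReadThroughCache:
    """In-process object cache: serve from memory, fall back to a loader on a miss."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # e.g. a function that queries the DB
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (value, expiry time)

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[1] > now:
            return entry[0]        # cache hit, no DB access
        value = self._loader(key)  # cache miss or expired entry
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Drop an entry, for example right after the underlying row is written."""
        self._entries.pop(key, None)
```

The same get / invalidate shape applies when the dictionary is replaced by a networked cache; only the storage behind `_entries` changes.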
  42. Step 10 – HTTP Accelerator <ul><li>If your app is a web app you should add an HTTP Accelerator or a Reverse Proxy </li></ul><ul><li>A good HTTP Accelerator / Reverse proxy performs the following – </li></ul><ul><ul><li>Redirect static content requests to a lighter HTTP server (lighttpd) </li></ul></ul><ul><ul><li>Cache content based on rules (with granular invalidation support) </li></ul></ul><ul><ul><li>Use Async NIO on the user side </li></ul></ul><ul><ul><li>Maintain a limited pool of Keep-alive connections to the App Server </li></ul></ul><ul><ul><li>Intelligent load balancing </li></ul></ul><ul><li>Solutions </li></ul><ul><ul><li>Nginx (HTTP / IMAP) </li></ul></ul><ul><ul><li>Perlbal </li></ul></ul><ul><ul><li>Hardware accelerators plus Load Balancers </li></ul></ul>
  43. Step 11 – Other cool stuff <ul><li>CDNs </li></ul><ul><li>IP Anycasting </li></ul><ul><li>Async Nonblocking IO (for all Network Servers) </li></ul><ul><li>If possible - Async Nonblocking IO for disk </li></ul><ul><li>Incorporate multi-layer caching strategy where required </li></ul><ul><ul><li>L1 cache – in-process with App Server </li></ul></ul><ul><ul><li>L2 cache – across network boundary </li></ul></ul><ul><ul><li>L3 cache – on disk </li></ul></ul><ul><li>Grid computing </li></ul><ul><ul><li>Java – GridGain </li></ul></ul><ul><ul><li>Erlang – natively built in </li></ul></ul>
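The multi-layer caching idea can be sketched as a lookup that tries the fastest layer first and copies a value into the faster layers when it is found further down. The three dictionaries below merely stand in for an in-process cache, a networked cache and an on-disk cache; the function name `layered_get` and the promote-on-hit policy are assumptions for the example.

```python
def layered_get(key, layers, load):
    """Check caches from fastest (L1) to slowest (L3); load from source on a full miss."""
    for index, layer in enumerate(layers):
        if key in layer:
            value = layer[key]
            # Promote the value into every faster layer for subsequent requests.
            for faster in layers[:index]:
                faster[key] = value
            return value
    value = load(key)  # full miss: go to the origin (e.g. the DB)
    for layer in layers:
        layer[key] = value
    return value


l1, l2, l3 = {}, {"profile:7": "cached-profile"}, {}
hit = layered_get("profile:7", [l1, l2, l3], load=lambda k: "from-db")
miss = layered_get("profile:9", [l1, l2, l3], load=lambda k: "from-db")
```

Here the first call is served from the L2 stand-in and promoted into L1, while the second call misses every layer and falls through to the loader.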
  44. Platform Selection Considerations <ul><li>Programming Languages and Frameworks </li></ul><ul><ul><li>Dynamic languages are typically slower than static languages </li></ul></ul><ul><ul><li>Compiled code runs faster than interpreted code -> use accelerators or pre-compilers </li></ul></ul><ul><ul><li>Frameworks that provide Dependency Injection, Reflection and Annotations have a marginal performance impact </li></ul></ul><ul><ul><li>ORMs hide DB querying which can in some cases result in poor query performance due to non-optimized querying </li></ul></ul><ul><li>RDBMS </li></ul><ul><ul><li>MySQL, MSSQL and Oracle support native replication </li></ul></ul><ul><ul><li>Postgres supports replication through 3rd-party software (Slony) </li></ul></ul><ul><ul><li>Oracle supports Real Application Clustering </li></ul></ul><ul><ul><li>MySQL's MyISAM engine uses table-level locking, while InnoDB, Postgres and Oracle use MVCC (MSSQL introduced row versioning in SQL Server 2005) </li></ul></ul><ul><li>Cache </li></ul><ul><ul><li>Terracotta vs memcached vs Coherence </li></ul></ul>
  45. Tips <ul><li>All the techniques we learnt today can be applied in any order </li></ul><ul><li>Try and incorporate Horizontal DB partitioning by value from the beginning into your design </li></ul><ul><li>Loosely couple all modules </li></ul><ul><li>Implement a REST-ful framework for easier caching </li></ul><ul><li>Perform application sizing on an ongoing basis to ensure optimal utilization of hardware </li></ul>
  46. Questions? bhavin.t@directi.com | http://directi.com | http://careers.directi.com | Download slides: http://wiki.directi.com