Scalable Web Architecture

Scalable Web Architecture: LAMP, AWS, S3, CloudFront, EC2, caching strategy, database scaling, high availability, fault tolerance, horizontal scalability

Published in: Technology

  1. 1. Scalable web architecture LAMP (& AWS infrastructure) Aleksandr Tsertkov
  2. 2. LAMP
  3. 3. LAMP: Platform <ul><li>L - Linux / Unix </li></ul><ul><li>A - Apache / lighttpd / nginx </li></ul><ul><li>M - MySQL / PostgreSQL / SQLite </li></ul><ul><li>P - PHP / Python / Perl / Ruby </li></ul>
  4. 4. LAMP: Why LAMP <ul><ul><li>We know it! </li></ul></ul><ul><ul><li>Proven by very large sites (Facebook, YouTube, LiveJournal, etc.) </li></ul></ul><ul><ul><li>Plenty of shared experience on the web </li></ul></ul><ul><ul><li>It's flexible and extensible </li></ul></ul><ul><ul><li>Easy to find engineers </li></ul></ul><ul><ul><li>Easy to maintain </li></ul></ul><ul><ul><li>It's cheap </li></ul></ul>
  5. 5. LAMP: Who uses it <ul><li>Let’s take the top 10 Internet sites according to Alexa. </li></ul><ul><li>Google </li></ul><ul><li>Yahoo! </li></ul><ul><li>YouTube </li></ul><ul><li>Facebook </li></ul><ul><li>Windows Live </li></ul><ul><li>MSN </li></ul><ul><li>Wikipedia </li></ul><ul><li>Blogger </li></ul><ul><li>Baidu </li></ul><ul><li>Yahoo! Japan </li></ul>Who uses LAMP? 5 of 10 
  6. 6. System design (SD)
  7. 7. SD: Key points <ul><li>Scalability </li></ul><ul><li>HA (High Availability) </li></ul><ul><ul><li>Backup & restore strategy!!! </li></ul></ul><ul><li>Fault tolerance </li></ul><ul><li>SN (Shared Nothing) </li></ul>
  8. 8. SD: Load scalability <ul><li>“ Load scalability: The ability for a distributed system to easily expand and contract its resource pool to accommodate heavier or lighter loads. ” </li></ul><ul><li>-Wikipedia </li></ul>
  9. 9. SD: Horizontal scalability <ul><li>Adding new nodes to a system for handling growing load and removing nodes when load decreases. </li></ul>
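
Adding and removing nodes as load changes can be reduced to a simple sizing rule. The following is a minimal sketch, not a production policy: the function name, the requests-per-second metric, the per-node capacity and the node limits are all assumptions made for illustration.

```python
import math


def desired_nodes(load_rps, capacity_per_node_rps, min_nodes=2, max_nodes=20):
    """Return how many nodes a pool should have for the current load.

    All parameters are hypothetical: load and capacity are measured in
    requests per second, and the pool is kept within [min_nodes, max_nodes].
    """
    if capacity_per_node_rps <= 0:
        raise ValueError("capacity_per_node_rps must be positive")
    # Round up so the pool always has enough capacity for the load.
    needed = math.ceil(load_rps / capacity_per_node_rps)
    # Never scale below the floor (redundancy) or above the ceiling (cost).
    return max(min_nodes, min(max_nodes, needed))
```

Keeping a floor of at least two nodes means the pool still has redundancy when the load is light.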
  10. 10. SD: High Availability <ul><li>Complex: High availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period. </li></ul><ul><li>Simple: maximum uptime, minimum downtime. </li></ul>
  11. 11. SD: Fault tolerance <ul><li>The system should be able to continue to function normally even if some of its components fail. </li></ul><ul><li>No single point of failure. </li></ul>
  12. 12. SD: Shared Nothing <ul><li>“ A shared nothing architecture (SN) is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. ” </li></ul><ul><li>- Wikipedia </li></ul>
  13. 13. SD: Shared Nothing <ul><li>Can be achieved on different application layers separately: </li></ul><ul><li>Database: data partitioning / sharding </li></ul><ul><li>Cache: memcached client-side partitioning </li></ul><ul><li>Computing: job queues </li></ul>
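
The "degree of operational continuity during a given measurement period" is commonly expressed as a ratio of uptime to total time. The sketch below works through that arithmetic; the function names and the one-year default period are assumptions chosen for illustration.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of the measurement period during which the system was up."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("measurement period must be positive")
    return uptime_hours / total


def allowed_downtime_hours(target, period_hours=365 * 24):
    """Downtime budget, in hours, for a target availability over the period.

    The default period of one (non-leap) year is an illustrative assumption.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return (1.0 - target) * period_hours
```

For example, a target of 0.999 ("three nines") over one year leaves a downtime budget of about 8.76 hours.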
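
Client-side cache partitioning means the client, not the cache servers, decides which node holds a key. One common way to do this is consistent hashing, sketched below. This technique is an illustration and is not necessarily what the presentation had in mind; the class name, the node addresses and the number of virtual points per node are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring that maps cache keys to nodes on the client side."""

    def __init__(self, nodes, replicas=100):
        # Each node gets `replicas` virtual points so keys spread evenly.
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value):
        # A stable hash is required: every client must compute the same point.
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the node owning `key`: the next point clockwise on the ring."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        index = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


# Hypothetical node addresses, for illustration only.
ring = HashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
owner = ring.node_for("user:42")
```

With this scheme, removing a node only remaps the keys that node owned; keys on the remaining nodes stay where they are, so no node depends on shared state.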
  14. 14. SD: Typical web architecture <ul><li>Load balancer </li></ul><ul><li>Several web servers </li></ul><ul><li>Database server(s) </li></ul><ul><li>Shared file server (NAS) (if really needed) </li></ul>
  15. 15. SD: Typical web architecture
  16. 16. SD: Typical web architecture <ul><li>Each of these components / layers can be scaled separately. </li></ul><ul><li>Database is usually the toughest part to scale. </li></ul>
  17. 17. SD: Scaling database <ul><li>Master-slave replication variants: </li></ul><ul><li>Reads and writes go to the master; slaves serve additional reads only </li></ul><ul><li>Only writes go to the master; all reads go to the slaves </li></ul>
  18. 18. SD: Scaling database <ul><li>The master should be a very powerful machine, but sooner or later it will hit its I/O limit. </li></ul><ul><li>Data partitioning / sharding is used to distribute data across a number of masters, spreading the load between them (horizontal scaling). </li></ul>
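
Both replication variants come down to a routing decision made per query. Below is a minimal sketch of such a router. The class name and server names are hypothetical, and classifying a query by its first keyword is a deliberate simplification.

```python
import itertools


class ReplicatedRouter:
    """Route writes to the master and spread reads across a read pool."""

    def __init__(self, master, slaves, master_serves_reads=False):
        self.master = master
        read_pool = list(slaves)
        # Variant 1: the master also serves reads. It is also the fallback
        # when no slaves are configured.
        if master_serves_reads or not read_pool:
            read_pool.append(master)
        self._reads = itertools.cycle(read_pool)

    def route(self, sql):
        """Return the server that should execute `sql`."""
        words = sql.lstrip().split(None, 1)
        verb = words[0].upper() if words else ""
        # Simplification: only plain SELECT statements count as reads.
        if verb == "SELECT":
            return next(self._reads)
        return self.master


router = ReplicatedRouter("db-master", ["db-slave-1", "db-slave-2"])
target = router.route("UPDATE users SET name = 'x' WHERE id = 1")
```

A real router would also need to account for replication lag and for reads that must see the latest write, which this sketch ignores.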
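
Sharding needs a rule that maps each record to exactly one master. The sketch below uses the simplest such rule, a modulo on a numeric user ID. The shard names, the choice of user ID as the partitioning key and the function names are assumptions for illustration.

```python
# Hypothetical shard masters; in practice these would be connection settings.
SHARDS = ["db-master-0", "db-master-1", "db-master-2", "db-master-3"]


def shard_for(user_id, shards=SHARDS):
    """Return the master that stores all data for `user_id`."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return shards[user_id % len(shards)]


def group_by_shard(user_ids, shards=SHARDS):
    """Group IDs by shard so each master is queried once per batch."""
    groups = {}
    for user_id in user_ids:
        groups.setdefault(shard_for(user_id, shards), []).append(user_id)
    return groups
```

Modulo partitioning is easy to reason about, but changing the number of shards moves most records, so the shard count is worth planning ahead.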
  19. 19. SD: Scaling database
  20. 20. SD: Caching strategy <ul><li>A hierarchy of caches should be used for optimal performance and efficiency. </li></ul><ul><li>Local memory -> memcached -> local disk </li></ul>
  21. 21. SD: Caching hierarchy <ul><li>Local in-memory cache on the app server for very frequently used items (speeds up script bootstrapping) </li></ul><ul><li>Distributed cache system (memcached) for caching database queries and as a general-purpose cache </li></ul><ul><li>File cache on the app server for large items </li></ul>
  22. 22. SD: High-CPU app servers <ul><li>For CPU-intensive operations such as audio/video processing, dedicated application servers should be used. </li></ul><ul><li>Good control over them can be achieved using a job queue. </li></ul><ul><li>Video: check YouTube Platform ;-) </li></ul>
  23. 23. SD: Web servers optimization <ul><li>General web servers (Apache) </li></ul><ul><li>Comet web servers </li></ul><ul><li>Static content web servers </li></ul><ul><li>A Content Delivery Network (CDN) should be used for static public content. </li></ul>
  24. 24. SD: Static files strategy <ul><li>Network attached storage (NAS) </li></ul><ul><ul><li>Distributed network file system (Lustre, GlusterFS, MogileFS) </li></ul></ul><ul><ul><li>Non-distributed (NFS) </li></ul></ul><ul><li>Fault tolerance and data redundancy are required! </li></ul>
  25. 25. SD: Static files strategy <ul><li>A distributed filesystem is complex, but in a perfect world it should give us what we need: performance, redundancy, fault tolerance. </li></ul><ul><li>Static content web servers can run on DFS nodes! </li></ul>
  26. 26. SD: Load balancers <ul><li>Software or hardware load balancers </li></ul><ul><li>Traffic is distributed between several load balancers using round-robin DNS </li></ul><ul><li>HA solution for load balancers </li></ul>
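
The lookup order "local memory -> memcached -> local disk" can be sketched as a read-through cache that tries each tier in turn. In this illustration plain dictionaries stand in for all three tiers; a real setup would use a memcached client and a file store, and the class and method names here are assumptions.

```python
class TieredCache:
    """Read-through cache over several tiers, ordered fastest first."""

    def __init__(self, tiers):
        # Each tier is any dict-like object supporting `in`, [] and []=.
        self.tiers = tiers

    def get(self, key, loader):
        """Return the cached value, calling `loader(key)` only on a full miss."""
        for position, tier in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                # Promote the value into every faster tier that missed.
                for faster in self.tiers[:position]:
                    faster[key] = value
                return value
        # Full miss: load from the source (e.g. the database), fill all tiers.
        value = loader(key)
        for tier in self.tiers:
            tier[key] = value
        return value


# Dictionaries standing in for local memory, memcached and local disk.
cache = TieredCache([{}, {}, {}])
page = cache.get("page:home", lambda key: "rendered " + key)
```

Expiry and invalidation are left out; in practice each tier would carry its own time-to-live and size limits.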
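
A job queue decouples the web tier, which only enqueues work, from the dedicated servers that perform it. The sketch below models this inside a single process with threads. In a real deployment the queue would be an external service and the workers separate machines; the function name and the file names are hypothetical.

```python
import queue
import threading


def run_jobs(jobs, handler, workers=2):
    """Process `jobs` with a fixed pool of workers fed from one queue.

    Returns a dict mapping each job to its result, or to the exception
    that `handler` raised for it. Jobs must be hashable.
    """
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            job = tasks.get()
            if job is None:  # Sentinel: no more work for this worker.
                tasks.task_done()
                break
            try:
                outcome = handler(job)
            except Exception as exc:  # Keep the worker alive on a bad job.
                outcome = exc
            with lock:
                results[job] = outcome
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # One sentinel per worker.
    tasks.join()
    for thread in threads:
        thread.join()
    return results


# Hypothetical "transcoding" job that just renames the file.
converted = run_jobs(["a.avi", "b.avi"], lambda name: name.replace(".avi", ".mp4"))
```

The number of workers caps how much CPU-heavy work runs at once, which is the kind of control the slide refers to.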
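
Whether implemented in software or hardware, a load balancer has to choose a backend for each request and skip backends that are down. A minimal round-robin sketch with health marking follows. The class name, method names and backend names are assumptions, and real balancers detect failures through health checks rather than manual calls.

```python
class RoundRobinBalancer:
    """Rotate through backends in order, skipping those marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark_down(self, backend):
        """Take a backend out of rotation (e.g. after a failed health check)."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Return a known backend to rotation."""
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend, or raise if none is available."""
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
balancer.mark_down("web2")
first = balancer.pick()
```

The balancer itself remains a single point of failure in this sketch, which is why the slide calls for several balancers and an HA solution around them.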
  27. 27. SD: Complete picture
  28. 28. Amazon Web Services
  29. 29. AWS: What is AWS? <ul><li>Amazon is not only about books </li></ul><ul><li>Amazon Web Services provides an infrastructure web services platform in the cloud. </li></ul>
  30. 30. AWS: Why AWS? <ul><li>Because it has everything we need: </li></ul><ul><ul><li>EC2: Elastic Compute Cloud </li></ul></ul><ul><ul><ul><li>EBS: Elastic Block Store </li></ul></ul></ul><ul><ul><ul><li>CloudWatch: monitoring service with auto scaling </li></ul></ul></ul><ul><ul><ul><li>Elastic Load Balancing </li></ul></ul></ul><ul><ul><li>S3: Simple Storage Service </li></ul></ul><ul><ul><li>CloudFront (CDN) </li></ul></ul><ul><ul><li>SQS: Simple Queue Service </li></ul></ul>
  31. 31. AWS: EC2 <ul><li>Easy to deploy (OS images) </li></ul><ul><li>Easy to scale up and down on demand (deals with peaks) with Auto Scaling </li></ul><ul><li>Out-of-the-box monitoring with CloudWatch </li></ul><ul><li>Out-of-the-box load balancing with Elastic Load Balancing http://aws.amazon.com/loadbalancing </li></ul>
  32. 32. AWS: S3 & CloudFront <ul><li>Out of the box CDN with CloudFront </li></ul><ul><li>DFS (sort of) with S3 </li></ul><ul><li>Very reliable </li></ul>
  33. 33. AWS: Services on top of AWS <ul><li>Some like AWS so much that they have created their own cloud services on top of it </li></ul><ul><li>RightScale www.rightscale.com </li></ul><ul><li>GoGrid www.gogrid.com </li></ul>
  34. 34. AWS: Panacea? <ul><li>AWS is indeed good to start with, since it's fast and cheap. </li></ul><ul><li>In the long term, if everything goes as expected and profit increases, it might be better to build your own cloud infrastructure and migrate to it at some point. </li></ul>
  35. 35. <ul><li>Thank you! </li></ul>
