LAMP applications Scalability (Advanced Web Applications Scalability)

  1. Advanced Scalability and High-Level Architectures
     Mohamed Almasry
     CitPoint workshop - May 18, 2008
  2. What's all this then?
     • Initial thoughts
     • The most common scalability scenario
     • How to make a scalability plan and overcome the bottlenecks
     • Hardware scalability and components of highly scalable architectures
     • Going much deeper
     • Live examples
  3. Initial thoughts: why do you need to scale?
     – “The Web has infinite space.” Chris Anderson explains this very well in “The Long Tail”.
     – The market is heading toward open applications (“open APIs”) where users contribute.
     – APIs will fuel the growth of the next phase of the Web’s evolution.
     – The barriers keep getting lower.
  4. The most common scalability scenario (the public one)
  5. How do they usually start out?
  6. Second step: two servers
  7. Third step: scaling the DB (1)
  8. And then: scaling the DB (2)
  9. Scaling the web server
  10. Load balancing
      • Requires some changes to the app server
      • More code work will be required
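The slide does not say which balancing policy is meant. As a minimal sketch, assuming plain round-robin rotation over a fixed list of app servers (the class name and the `app1:8080` style addresses are hypothetical), request dispatch could look like this:

```python
import itertools


class RoundRobinBalancer:
    """Hand out backend addresses in a fixed rotation (illustrative only)."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        # itertools.cycle repeats the list forever, one backend per call.
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._cycle)


# Hypothetical backend addresses, not taken from the slides.
balancer = RoundRobinBalancer(["app1:8080", "app2:8080", "app3:8080"])
picks = [balancer.pick() for _ in range(4)]
```

Because consecutive requests from one user can land on different servers, anything kept in a single server's memory (sessions, for example) usually has to move to shared storage, which is one plausible reading of the "more code work" bullet.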
  11. (Diagram slide; no text)
  12. In time … the previous scenario starts facing complex bottlenecks
  13. Ways to “think scalable” rather than end-all-be-all solutions
      • Conflicting advice ahead
      • Not everything is applicable to every situation
      • Don’t use a bazooka to …
      • Don’t wait till it’s too late
  14. How to make a scalability plan and overcome bottlenecks?
      • Lesson number 1: Think horizontal! (Make full use of your weapons)
  15. • Everything in your architecture, not just the front-end web servers
      • Micro-optimizations and other implementation details: Bzzzzt! Boring!
  16. Benchmarking techniques
      • Scalability isn’t the same as processing time
        – Not “how fast” but “how many”
        – Test “force”, not speed. Think amps, not voltage
        – Test scalability, not just performance
      • Use a realistic load
      • Test with “slow clients”
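To make the "how many, not how fast" distinction concrete, here is a rough sketch that measures completed requests per second at two concurrency levels. The `fake_request` function is a stand-in that only sleeps; a real test would call the system under load, and all names here are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(_):
    """Stand-in for one request; the sleep imitates network and server time."""
    time.sleep(0.01)
    return True


def measure_throughput(handler, total_requests, concurrency):
    """Return completed requests per second at the given concurrency level."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        completed = sum(1 for ok in pool.map(handler, range(total_requests)) if ok)
    elapsed = time.perf_counter() - start
    return completed / elapsed


# Each request takes the same time in both runs; only the number of
# requests in flight at once changes.
serial = measure_throughput(fake_request, 40, 1)
parallel = measure_throughput(fake_request, 40, 8)
```

Per-request latency is identical in both runs, yet the second run finishes more requests per second. That gap, and the point at which it stops growing as concurrency rises, is what a scalability test is looking for.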
  17. Vertical scaling
      • “Get a bigger server”
      • “Use faster CPUs”
      • Can only help so much (with poor scale-per-dollar value)
      • A server twice as fast is more than twice as expensive
      • Supercomputers are horizontally scaled!
  18. Horizontal scaling
      • “Just add another box” (or another thousand or ..)
      • Good to great ...
        – Implementation: scale your system a few times
        – Architecture: scale dozens or hundreds of times
      • Get the big picture right first, do micro-optimizations later
  19. • Lesson number 2: Use your resources wisely (simplicity is beauty)
  20. Resource management
      • Balance how you use the hardware
      • Use memory to save CPU or IO
      • Balance your resource use (CPU vs RAM vs IO)
      • Don’t swap memory to disk. Ever.
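One common way to "use memory to save CPU or IO" is to memoize expensive results. The sketch below uses Python's standard `functools.lru_cache`; `render_profile` and the call counter are hypothetical, added only to show that the second call never reaches the expensive code.

```python
from functools import lru_cache

# Counts how often the expensive body actually runs.
calls = {"count": 0}


@lru_cache(maxsize=1024)  # bounded, so the cache cannot grow without limit
def render_profile(user_id):
    """Pretend to be an expensive render or database lookup."""
    calls["count"] += 1
    return f"<div>profile {user_id}</div>"


first = render_profile(42)   # computed
second = render_profile(42)  # served from memory
render_profile(7)            # computed (different argument)
```

The `maxsize` bound matters for the last bullet above: an unbounded cache trades CPU for memory until the machine starts swapping, which defeats the purpose.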
  21. • Lesson number 3: Partition the data (divide and conquer!)
  22. Partition the data
      • Partitioning is great, but not so easy!
  23. Sharding the Hibernate way
      • What is Hibernate Shards?
      • Schema design for shards
      • The sharding code’s relationship to Hibernate
      • Pluggable strategies determine how data are split across shards
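The idea of pluggable strategies can be sketched independently of Hibernate Shards. The code below is not that library's API; it is a small Python illustration in which the shard names, `hash_strategy`, `range_strategy`, and the `ids_per_shard` default are all assumptions.

```python
import hashlib

# Hypothetical shard names; in practice these would be connection settings.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def hash_strategy(key, shard_count):
    """Spread keys evenly by hashing them, then taking the remainder."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def range_strategy(key, shard_count, ids_per_shard=1_000_000):
    """Keep contiguous numeric ids together; overflow goes to the last shard."""
    return min(int(key) // ids_per_shard, shard_count - 1)


def shard_for(key, strategy=hash_strategy):
    """Pick a shard for a key using whichever strategy is plugged in."""
    return SHARDS[strategy(key, len(SHARDS))]
```

For example, `shard_for(2_500_000, range_strategy)` selects `db-shard-2`, while `shard_for("alice")` always maps the same key to the same shard. The "not so easy" part shows up when the shard count changes: with modulo hashing most keys move to a different shard, so resharding needs a migration plan.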
  24. • Lesson number 4: Do the work in parallel
  25. Do the work in parallel
      • Split the work into smaller (but reasonable) pieces and run them on different boxes
      • Send the sub-requests off as soon as possible, do something else, and then retrieve the results
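The "send early, do something else, retrieve later" pattern can be sketched with futures. Here threads stand in for calls to other boxes, and `fetch_friends`, `fetch_recent_posts`, and `build_page` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_friends(user_id):
    """Stand-in for a call to a remote friends service."""
    return [user_id + 1, user_id + 2]


def fetch_recent_posts(user_id):
    """Stand-in for a call to a remote posts service."""
    return [f"post-{user_id}-{n}" for n in range(3)]


def build_page(user_id):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Send both sub-requests off as soon as possible.
        friends_future = pool.submit(fetch_friends, user_id)
        posts_future = pool.submit(fetch_recent_posts, user_id)
        # Do local work while the sub-requests are in flight.
        header = f"<h1>User {user_id}</h1>"
        # Only now block on the results.
        friends = friends_future.result()
        posts = posts_future.result()
    return {"header": header, "friends": friends, "posts": posts}


page = build_page(10)
```

The total wait is roughly that of the slowest sub-request rather than the sum of all of them, provided the pieces are large enough to be worth the dispatch overhead.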
  26. • Lesson number 5: Use job queues
  27. • Processing time too long for the user to wait?
      • Can only do N jobs in parallel?
      • Use queues (and an external worker process)
      • AJAX can make this really spiffy
      • Database “queue”
        – Web server submits the job
        – First available “worker” picks it up and returns the result to the queue
        – Web server polls for status
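The three database "queue" steps can be sketched with an in-memory SQLite table. The table layout, function names, and the `resize:cat.jpg` payload are assumptions for illustration; the slides do not specify a schema.

```python
import sqlite3


def open_queue():
    """Create an in-memory database holding a single jobs table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " result TEXT)"
    )
    return db


def submit_job(db, payload):
    """Web server side: enqueue a job and return its id immediately."""
    cur = db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    db.commit()
    return cur.lastrowid


def work_one(db, handler):
    """Worker side: process the oldest pending job; False if the queue is empty."""
    row = db.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    job_id, payload = row
    db.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
        (handler(payload), job_id),
    )
    db.commit()
    return True


def poll(db, job_id):
    """Web server side: check on a job; returns (status, result)."""
    return db.execute(
        "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()


db = open_queue()
job_id = submit_job(db, "resize:cat.jpg")
before = poll(db, job_id)   # still pending, result not set
work_one(db, str.upper)     # str.upper stands in for the real work
after = poll(db, job_id)    # done, with the handler's result
```

This sketch assumes a single worker. With several workers, the claim step needs to be atomic (for instance, an `UPDATE ... WHERE status = 'pending'` followed by a row-count check) so two workers never pick up the same job.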
  28. Going much deeper
      • Planned architectures
  29. Architecture so far
  30. Non-cached architectures
  31. Cached architectures
  32. A little fact about cached architectures
  33. Making the most of your assets
  34. How far can it go?
  35. Live examples
      • Digg architecture
        – Digg platform
        – Digg stats
        – Their experience
        – What’s sitting behind?
        – Digg architecture
      • YouTube
  36. Digg (dot) com
      • Digg is a place where people can share anything (content) from anywhere (imagine the enormous size of the data).
      • Digg now receives more than 230 million page views and 26 million unique visitors per month.
  37. Digg (dot) com: Digg platform
      • MySQL
      • Linux
      • PHP
      • Lucene
      • APC PHP Accelerator
      • MCache
  38. Digg (dot) com: Their stats
      • 100 servers hosted in multiple data centers
        – 20 database servers
        – 30 web servers
        – A few search servers running Lucene
        – The rest are used for redundancy
      • 30 GB of data
  39. Digg (dot) com: Their experience
      • None of the scaling challenges Digg faced had anything to do with PHP. The biggest issues were database related.
      • The lightweight nature of PHP allowed them to move processing tasks from the database to PHP in order to improve scaling. eBay does this in a radical way: they moved nearly all work out of the database and into applications, including joins, an operation we normally think of as the job of the database.
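Moving a join out of the database means fetching each table with a simple query and combining the rows in application code. The following is a minimal sketch of an application-side hash join, not eBay's or Digg's actual code; the `stories` and `users` rows are invented sample data.

```python
def hash_join(left_rows, right_rows, left_key, right_key):
    """Join two lists of dict rows on equal key values (an inner join)."""
    # Build an index over the right-hand rows once.
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    # Probe the index with each left-hand row.
    joined = []
    for left in left_rows:
        for right in index.get(left[left_key], []):
            joined.append({**left, **right})
    return joined


# Invented sample rows, as if returned by two separate single-table queries.
stories = [
    {"story_id": 1, "title": "Scaling PHP", "user_id": 10},
    {"story_id": 2, "title": "MySQL tips", "user_id": 11},
]
users = [
    {"user_id": 10, "name": "alice"},
    {"user_id": 11, "name": "bob"},
]
rows = hash_join(stories, users, "user_id", "user_id")
```

The trade is deliberate: the database serves two cheap lookups that are easy to cache and shard, while the join cost lands on the web tier, which is the tier that is easiest to scale horizontally.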
  40. Digg (dot) com: What’s inside?
      • A load balancer in front sends queries to the PHP servers.
      • Uses a MySQL master-slave setup.
        – Transaction-heavy servers use the InnoDB storage engine.
        – OLAP-heavy servers use the MyISAM storage engine.
        – They did not notice a performance degradation moving from MySQL 4.1 to version 5.
      • Memcached is used for caching.
      • Sharding is used to break the database into several smaller ones.
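In a master-slave setup the application has to decide where each statement goes. The sketch below assumes the simplest rule, sending `SELECT` statements to replicas in rotation and everything else to the master; the class and server names are hypothetical and this is not Digg's code.

```python
import itertools


class ReplicatedRouter:
    """Route reads to replicas and writes to the master (illustrative only)."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        """Return the server name that should run this statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Writes, and anything unrecognized, go to the master to be safe.
        return self.master


router = ReplicatedRouter("master", ["replica1", "replica2"])
targets = [
    router.route("SELECT * FROM stories"),
    router.route("select title from stories"),
    router.route("UPDATE stories SET diggs = diggs + 1 WHERE id = 1"),
]
```

Replication is asynchronous, so a read sent to a replica right after a write may return stale data; reads that must see the user's own write are usually pinned to the master.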
  41. Digg (dot) com: What’s inside?
      • Digg’s usage pattern makes it easier for them to scale. Most people just view the front page and leave, so 98% of Digg’s database accesses are reads. With this balance of operations they don’t have to worry about the complex work of architecting for writes.
      • They had problems with their storage system reporting that writes were on disk when they really weren’t. Controllers do this to make their performance look better, but it leaves a giant data-integrity hole in failure scenarios. This is a fairly common problem and can be hard to fix, depending on your hardware setup.
  42. Digg (dot) com: What’s inside?
      • To lighten their database load they used the APC PHP accelerator and MCache.
      • You can configure PHP not to parse and compile on each load by using a combination of Apache 2’s worker threads, FastCGI, and a PHP accelerator. On a page’s first load the PHP code is compiled, so subsequent page loads are very fast.
  43. Useful links
      • http://goldengate.com/ Transactional Data Management (TDM) solutions
      • http://squid-cache.org/ the web proxy and load balancer
      • http://danga.com/memcached/ the advanced memory caching system
      • http://sqlrelay.sourceforge.net/ the database connection manager
      • http://lighttpd.net the lightweight file server
      • http://highscalability.com
