Planning For High Performance Web Application

This deck was prepared for Beijing Open Party (a monthly unconference in Beijing, China). It covers some important points to consider when building scalable websites. A few slides were written in Chinese.

  1. Planning for performance <ul><li>For web developers, an open discussion </li></ul>Tin@Beijing Open Party
  2. A teacher need not be better than you; you need not be worse than your teacher.
  3. Agenda <ul><li>Basic programming practice </li></ul><ul><li>Hardware platform </li></ul><ul><li>Software platform </li></ul><ul><li>System essentials </li></ul><ul><li>Optimizations </li></ul><ul><li>Load Balancing </li></ul>
  4. Basic practices <ul><li>Use proper SCM </li></ul><ul><ul><li>CVS </li></ul></ul><ul><ul><li>SVN </li></ul></ul><ul><ul><li>Mercurial </li></ul></ul><ul><ul><li>Git </li></ul></ul>
  5. Basic practices <ul><li>Use an automated build system </li></ul><ul><ul><li>Shell scripts </li></ul></ul><ul><ul><li>Make </li></ul></ul><ul><ul><li>Ant, NAnt </li></ul></ul><ul><ul><li>Rake </li></ul></ul>
  6. Basic practices <ul><li>Use a Continuous Integration tool </li></ul><ul><ul><li>So first you need a lot of tests </li></ul></ul><ul><ul><li>Add automated test and compile jobs as daily tasks </li></ul></ul><ul><ul><li>Use CI tools to monitor the health of your code base </li></ul></ul><ul><ul><ul><li>CruiseControl, Luntbuild, Continuum, Hudson </li></ul></ul></ul><ul><ul><ul><li>Cruise, TeamCity, Bamboo </li></ul></ul></ul><ul><ul><li>Use the CCTray and CCMenu desktop widgets </li></ul></ul>
  7. Basic practices <ul><li>Use an issue tracker </li></ul><ul><ul><li>Trac (SVN only) </li></ul></ul><ul><ul><li>Bugzilla, Mantis Bug Tracker </li></ul></ul><ul><ul><li>Jira </li></ul></ul><ul><ul><li>Mingle </li></ul></ul><ul><ul><li>BugFree </li></ul></ul>
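  8. Voices from Twitter <ul><li>You must test! You must test early! You must test early! Otherwise you're dead. </li></ul><ul><li>Test every part. </li></ul><ul><li>Leave performance testing to your users; only then is it meaningful. So keep good logs. </li></ul>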
  9. Basic practices <ul><li>Lifecycle control </li></ul><ul><ul><li>Develop -> Test -> Deploy </li></ul></ul><ul><li>Release management </li></ul><ul><ul><li>Trunk, Branch, Tag </li></ul></ul><ul><ul><li>Milestone, Release candidate </li></ul></ul>
  10. Basic practices <ul><li>Use Agile methodologies </li></ul><ul><ul><li>XP practices </li></ul></ul><ul><ul><ul><li>TDD </li></ul></ul></ul><ul><ul><ul><li>Pair programming </li></ul></ul></ul><ul><ul><li>Scrum </li></ul></ul><ul><ul><li>Hybrid agile </li></ul></ul>
  11. Hardware platform <ul><li>Use economical hardware </li></ul><ul><ul><li>CPU and memory </li></ul></ul><ul><ul><li>Disk and disk I/O (RAID) </li></ul></ul><ul><ul><li>NIC </li></ul></ul><ul><ul><li>Power and fans </li></ul></ul><ul><ul><li>1U, 2U, 3U, 4U? </li></ul></ul>
  12. Hardware platform <ul><li>Brand </li></ul><ul><ul><li>Dell, IBM, HP, Lenovo, Asus? </li></ul></ul><ul><ul><li>Service quality </li></ul></ul><ul><li>Hardware redundancy </li></ul><ul><ul><li>Part redundancy </li></ul></ul><ul><ul><li>Availability and lead time (critical parts) </li></ul></ul><ul><ul><li>Capacity redundancy </li></ul></ul><ul><ul><li>Future plans? </li></ul></ul>
  13. Network & hosting <ul><li>VPS (virtual hosting) </li></ul><ul><li>Co-located hardware (colo) </li></ul><ul><ul><li>Bandwidth, dual lines, air conditioning </li></ul></ul><ul><ul><li>Geo-location </li></ul></ul><ul><li>Self-hosting </li></ul><ul><li>How to choose network hardware (switch/router)? </li></ul><ul><ul><li>Cisco, Huawei, Foundry </li></ul></ul>
  14. Software platform <ul><li>Use pre-compiled OS and software </li></ul><ul><li>Choose an OS </li></ul><ul><ul><li>CentOS, Red Hat, SUSE </li></ul></ul><ul><ul><li>FreeBSD </li></ul></ul><ul><ul><li>Solaris </li></ul></ul><ul><ul><li>No Ubuntu Server (from Nicholas Ding) </li></ul></ul>
  15. Software platform <ul><li>Choose a language (a scripting language is better) </li></ul><ul><ul><li>PHP </li></ul></ul><ul><ul><li>Python </li></ul></ul><ul><ul><li>Perl </li></ul></ul><ul><ul><li>Ruby </li></ul></ul><ul><ul><li>Java </li></ul></ul><ul><ul><li>Many, many more... but not C... </li></ul></ul>
  16. Software platform <ul><li>Choose a database (or data provider) </li></ul><ul><ul><li>MySQL </li></ul></ul><ul><ul><li>PostgreSQL </li></ul></ul><ul><ul><li>A Bigtable implementation? </li></ul></ul>
  17. Now, let’s go
  18. System essentials <ul><li>Web server </li></ul><ul><ul><li>Apache </li></ul></ul><ul><ul><li>Lighttpd </li></ul></ul><ul><ul><li>Nginx </li></ul></ul><ul><ul><li>Tux, Cherokee, LiteSpeed </li></ul></ul><ul><ul><li>Tomcat, Jetty </li></ul></ul><ul><ul><li>Mongrel, Thin </li></ul></ul>
  19. System essentials <ul><li>Different deployment styles (Python/Ruby) </li></ul><ul><ul><li>Apache + mod_python (mod_rails, Passenger) </li></ul></ul><ul><ul><li>FastCGI, SCGI, CGI </li></ul></ul><ul><ul><li>Proxy (load balancing) + multiple server instances </li></ul></ul><ul><ul><li>Threads? Processes? </li></ul></ul>
  20. System essentials <ul><li>Monitoring your system </li></ul><ul><ul><li>Web server logs </li></ul></ul><ul><ul><ul><li>Webalizer, Report Magic </li></ul></ul></ul><ul><ul><ul><li>Beacon (tracking via a separate static file server) </li></ul></ul></ul><ul><ul><ul><li>Error log analysis </li></ul></ul></ul><ul><ul><li>AWStats & Google Analytics </li></ul></ul>
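The log analyzers above (Webalizer, AWStats) essentially aggregate access-log lines. The following is a minimal Python sketch of the same idea. It assumes Apache's Common Log Format. `summarize` and the sample lines are hypothetical and are not taken from any of those tools.

```python
import re
from collections import Counter

# Apache Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')

def summarize(lines):
    """Count requests per status code and per path (hypothetical helper)."""
    statuses, paths = Counter(), Counter()
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed lines instead of failing
        _host, _method, path, status, _size = m.groups()
        statuses[status] += 1
        paths[path] += 1
    return statuses, paths

sample = [
    '10.0.0.1 - - [10/Oct/2008:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2008:13:55:40 +0800] "GET /missing.png HTTP/1.1" 404 209',
    '10.0.0.1 - - [10/Oct/2008:13:56:01 +0800] "GET /index.html HTTP/1.1" 200 2326',
]
statuses, paths = summarize(sample)
print(statuses.most_common())  # [('200', 2), ('404', 1)]
print(paths.most_common(1))    # [('/index.html', 2)]
```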
  21. System essentials <ul><li>Monitoring your system </li></ul><ul><ul><li>Monit (RubyWorks uses runit) </li></ul></ul><ul><ul><ul><li>Monitors process status </li></ul></ul></ul><ul><ul><ul><li>Automatically restarts your important processes </li></ul></ul></ul><ul><ul><ul><li>Better than cron for monitoring </li></ul></ul></ul><ul><ul><li>Munin & Nagios </li></ul></ul><ul><ul><ul><li>Distributed monitoring of all your systems </li></ul></ul></ul><ul><ul><ul><li>The administrator’s eyes, the developer’s friend </li></ul></ul></ul>
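Monit's core job is restarting a process after it dies. The idea fits in a few lines. This sketch illustrates only that idea; it is not how Monit or runit are implemented. `ensure_running` and the child command are hypothetical.

```python
import subprocess
import sys

def ensure_running(proc, cmd):
    """Return a live process: `proc` if it is still running, else a restarted one.

    Popen.poll() returns None while the child runs and its exit code once it
    has terminated; a non-None result is our signal to restart.
    """
    if proc is None or proc.poll() is not None:
        return subprocess.Popen(cmd)
    return proc

# Hypothetical "important process": a child that exits immediately.
cmd = [sys.executable, "-c", "pass"]
proc = subprocess.Popen(cmd)
proc.wait()                          # simulate the process dying
new_proc = ensure_running(proc, cmd)
print(new_proc is not proc)          # True: it was restarted
new_proc.wait()
```

A real watchdog would call `ensure_running` in a loop with a sleep between checks. It would also check health, for example with an HTTP ping, not just whether the process exists.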
  22. System essentials <ul><ul><li>Munin & Nagios (continued) </li></ul></ul><ul><ul><ul><li>Munin has a server and nodes; it generates web pages that report your servers' statistics at regular intervals </li></ul></ul></ul><ul><ul><ul><li>Munin and Nagios can be integrated to watch: </li></ul></ul></ul><ul><ul><ul><ul><li>Memory usage, CPU, processes, disk usage </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Services: HTTP, SMTP, POP3, NNTP, ping </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Hardware temperature and other data </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Network statistics </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Custom scripts (plugins): database metrics, number of users </li></ul></ul></ul></ul>
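Custom Munin plugins are small executables. Below is a sketch of a "number of users" plugin in Python. It assumes the usual Munin plugin protocol: the `config` argument prints graph metadata, and no argument prints `field.value N` lines. `count_online_users` is a hypothetical placeholder that returns a fixed number.

```python
import sys

def count_online_users():
    """Hypothetical data source; replace with a real query (e.g. a session table)."""
    return 42

def render(args):
    """Build the plugin output following Munin's plugin protocol."""
    if args and args[0] == "config":
        # Munin calls the plugin with "config" to learn how to draw the graph.
        return "\n".join([
            "graph_title Online users",
            "graph_vlabel users",
            "graph_category app",
            "users.label users",
        ])
    # Without arguments, Munin expects "<field>.value <number>" lines.
    return f"users.value {count_online_users()}"

if __name__ == "__main__":
    print(render(sys.argv[1:]))
```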
  23. System essentials <ul><li>Protect your system (management is more important than tools) </li></ul><ul><ul><li>SSH brute-force attack protection </li></ul></ul><ul><ul><ul><li>SSH key login </li></ul></ul></ul><ul><ul><ul><li>blockhost (scripts + pf/iptables) </li></ul></ul></ul><ul><ul><li>Audit: SELinux... </li></ul></ul><ul><ul><li>Firewall (port blocking and auditing) </li></ul></ul><ul><ul><li>Use a secure OS? (NetBSD, FreeBSD) </li></ul></ul><ul><ul><li>Network security (but no hardware firewall for websites) </li></ul></ul>
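The "scripts + pf/iptables" approach comes down to counting failed logins per IP and blocking repeat offenders. This minimal sketch assumes OpenSSH's usual `Failed password for ... from <ip>` log message. `offenders`, `iptables_rule`, and the threshold of 5 are hypothetical choices. The sketch prints rules rather than running them.

```python
import re
from collections import Counter

# Matches OpenSSH messages such as
# "Failed password for root from 203.0.113.5 port 52144 ssh2" and
# "Failed password for invalid user admin from 198.51.100.7 port 40022 ssh2"
FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+)")

def offenders(log_lines, threshold=5):
    """Return IPs with at least `threshold` failed logins, worst first."""
    hits = Counter()
    for line in log_lines:
        m = FAILED.search(line)
        if m:
            hits[m.group(1)] += 1
    return [ip for ip, n in hits.most_common() if n >= threshold]

def iptables_rule(ip):
    """Build (not run) a rule that drops SSH traffic from `ip`; needs root to apply."""
    return f"iptables -A INPUT -s {ip} -p tcp --dport 22 -j DROP"

log = ["Oct 10 13:55:36 web1 sshd[811]: Failed password for root "
       "from 203.0.113.5 port 52144 ssh2"] * 6
log.append("Oct 10 13:56:02 web1 sshd[815]: Failed password for invalid user "
           "admin from 198.51.100.7 port 40022 ssh2")
for ip in offenders(log):
    print(iptables_rule(ip))
# iptables -A INPUT -s 203.0.113.5 -p tcp --dport 22 -j DROP
```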
  24. System essentials <ul><li>SNA (Shared-Nothing Architecture) (a relative term) </li></ul><ul><ul><li>All static files, synced with rsync </li></ul></ul><ul><ul><li>Database-centric SNA </li></ul></ul><ul><ul><li>Memcached + DB persistence </li></ul></ul><ul><ul><li>Server hashing, clustering, partitioning </li></ul></ul><ul><ul><li>Amazon/Blogger/Craigslist/Facebook/Google/LiveJournal/Slashdot/Wikipedia/Yahoo/YouTube </li></ul></ul><ul><li>Sticky sessions </li></ul>
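"Server hashing" means choosing which memcached or database node owns a key. One common technique is consistent hashing. It is shown here as an illustration; the slide does not prescribe it. The sketch assumes MD5 ring positions and 100 virtual points per node. `HashRing` and the node names are hypothetical.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: a key belongs to the first node point after it."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas   # virtual points per node, evens out the spread
        self._points = []          # sorted hash positions on the ring
        self._owner = {}           # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, key):
        # Walk clockwise to the next point, wrapping around at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("cache3:11211")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(all(before[k] == "cache3:11211" for k in moved))  # True
```

With a plain `hash(key) % n`, removing one node remaps most keys. On a ring, only the keys that lived on the removed node move.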
  25. System essentials <ul><li>Make your modules independent </li></ul><ul><ul><li>Layers, packages </li></ul></ul><ul><ul><li>Easy to replace modules </li></ul></ul><ul><ul><li>Easy to deploy </li></ul></ul><ul><ul><li>Easy to profile and improve </li></ul></ul>
  26. Optimizations <ul><li>Split static and dynamic content onto separate servers </li></ul><ul><ul><li>Use a lightweight web server to serve static content </li></ul></ul><ul><ul><li>Use different domains for different servers </li></ul></ul><ul><li>Caching </li></ul><ul><ul><li>Memcached </li></ul></ul><ul><ul><ul><li>Query results, domain objects, sessions </li></ul></ul></ul><ul><ul><ul><li>Page tiles, template tiles </li></ul></ul></ul><ul><ul><ul><li>Everything that you need </li></ul></ul></ul>
  27. Optimizations <ul><li>Caching </li></ul><ul><ul><ul><li>Optimize your code (lazy evaluation, cached results) </li></ul></ul></ul><ul><ul><ul><li>Cache, and update asynchronously (cron updates) </li></ul></ul></ul><ul><ul><ul><li>Target: a hit rate above 90%! </li></ul></ul></ul><ul><ul><ul><li>But cache invalidation is a critical problem! </li></ul></ul></ul><ul><ul><ul><li>Use asynchronous messaging to keep caches valid </li></ul></ul></ul><ul><ul><ul><ul><li>No blocking! </li></ul></ul></ul></ul><ul><ul><ul><ul><li>ActiveMQ, RabbitMQ, DRb (for Ruby) </li></ul></ul></ul></ul>
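This sketch shows the two ideas on this slide. It measures the hit rate against the 90% target. It also invalidates cache entries through a message queue, so writers never block on the cache. `queue.Queue` and a thread stand in for ActiveMQ/RabbitMQ. `CountingCache`, `update_article`, and the keys are hypothetical.

```python
import queue
import threading

class CountingCache:
    """Dict-backed cache that tracks its hit rate (stand-in for memcached)."""

    def __init__(self):
        self.data = {}
        self.hits = 0
        self.misses = 0
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.data:
                self.hits += 1
                return self.data[key]
            self.misses += 1
            return None

    def set(self, key, value):
        with self.lock:
            self.data[key] = value

    def delete(self, key):
        with self.lock:
            self.data.pop(key, None)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = CountingCache()
invalidations = queue.Queue()   # stand-in for an ActiveMQ/RabbitMQ queue

def invalidation_worker():
    """Consume invalidation messages and drop the stale cache entries."""
    while True:
        key = invalidations.get()
        try:
            if key is None:         # sentinel: shut down
                return
            cache.delete(key)
        finally:
            invalidations.task_done()

def update_article(article_id, body, db):
    db[article_id] = body                         # write to the "database"
    invalidations.put(f"article:{article_id}")    # publish and return at once

threading.Thread(target=invalidation_worker, daemon=True).start()

db = {}
cache.set("article:1", "old body")
for _ in range(9):
    cache.get("article:1")       # 9 hits
cache.get("article:2")           # 1 miss
print(f"hit rate: {cache.hit_rate():.0%}")  # hit rate: 90%

update_article(1, "new body", db)
invalidations.join()             # demo only: wait until the worker has run
print(cache.get("article:1"))    # None: the stale entry is gone
```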
  28. Optimizations <ul><li>Caching </li></ul><ul><ul><li>Better client-side caching </li></ul></ul><ul><ul><ul><li>Use expiration headers: Cache-Control max-age, Expires </li></ul></ul></ul><ul><ul><ul><li>ETag? (Not recommended: default ETags often differ between servers in a cluster) </li></ul></ul></ul><ul><ul><ul><li>Use HEAD requests and 304 (Not Modified) responses to detect changes (for Squid or other proxy scenarios) </li></ul></ul></ul><ul><ul><ul><li>Compress and concatenate JS and CSS </li></ul></ul></ul>
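Proxies such as Squid revalidate cached copies with conditional requests. The server answers `304 Not Modified` when nothing has changed. Here is a minimal sketch of the server side using only the standard library. The one-day `MAX_AGE` and the `respond` helper are assumptions for illustration.

```python
from email.utils import formatdate, parsedate_to_datetime

MAX_AGE = 86400  # one day; an assumed policy for static assets

def respond(last_modified, request_headers, now):
    """Return (status, headers) for a static file, honoring If-Modified-Since.

    `last_modified` and `now` are Unix timestamps in seconds.
    """
    headers = {
        "Cache-Control": f"max-age={MAX_AGE}",
        "Expires": formatdate(now + MAX_AGE, usegmt=True),
        "Last-Modified": formatdate(last_modified, usegmt=True),
    }
    ims = request_headers.get("If-Modified-Since")
    if ims:
        try:
            since = parsedate_to_datetime(ims).timestamp()
        except (TypeError, ValueError):
            since = None            # unparseable date: ignore the header
        # HTTP dates have one-second resolution, so compare whole seconds.
        if since is not None and int(last_modified) <= since:
            return 304, headers     # Not Modified: the cached copy is still valid
    return 200, headers

mtime = 1_200_000_000
status, first = respond(mtime, {}, now=mtime + 10)
print(status)  # 200
status, _ = respond(mtime, {"If-Modified-Since": first["Last-Modified"]}, now=mtime + 20)
print(status)  # 304
```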
  29. Optimizations <ul><li>SQL optimizations </li></ul><ul><ul><li>Add indexes (especially on columns in the WHERE clause) </li></ul></ul><ul><ul><li>Denormalize the schema </li></ul></ul><ul><ul><ul><li>Useful redundancy (use duplication to avoid joins) </li></ul></ul></ul><ul><ul><li>Don’t rely on an ORM, whether Data Mapper, Active Record, or Unit of Work </li></ul></ul><ul><ul><li>Don’t use the database's full-text search </li></ul></ul><ul><ul><ul><li>Use a separate search engine module (Lucene) </li></ul></ul></ul>
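To check that an index on a `WHERE` column is actually used, look at the query plan. This sketch uses SQLite (bundled with Python) rather than MySQL; in MySQL, `EXPLAIN` plays the same role. The table, data, and index name are hypothetical. The exact plan text varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO posts (author_id, title) VALUES (?, ?)",
    [(i % 100, f"post {i}") for i in range(10_000)],
)

QUERY = "SELECT title FROM posts WHERE author_id = ?"

def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (42,))]

before = plan(QUERY)   # a full table scan, e.g. ['SCAN posts']
conn.execute("CREATE INDEX idx_posts_author ON posts (author_id)")
after = plan(QUERY)    # e.g. ['SEARCH posts USING INDEX idx_posts_author (author_id=?)']
print(before)
print(after)
```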
  30. Optimizations <ul><li>Choose the proper database storage engine </li></ul><ul><ul><li>MySQL: MyISAM? InnoDB? BDB? Heap? </li></ul></ul><ul><li>Accelerators </li></ul><ul><ul><li>PHP: APC, Zend Optimizer, XCache, eAccelerator, ionCube PHP Accelerator, Turck MMCache </li></ul></ul><ul><ul><li>Python: Psyco </li></ul></ul><ul><ul><li>Ruby: Joyent accelerator </li></ul></ul>
  31. But the most important thing: find the bottleneck before you start optimizing your application.
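A profiler answers "where is the bottleneck?" with measurements instead of guesses. Below is a sketch using Python's built-in `cProfile`. The three functions are hypothetical stand-ins, and `time.sleep` simulates work.

```python
import cProfile
import pstats
import time

def render_template():
    time.sleep(0.01)      # hypothetical fast step

def query_database():
    time.sleep(0.10)      # hypothetical slow step: the real bottleneck

def handle_request():
    query_database()
    render_template()

profiler = cProfile.Profile()
profiler.enable()
for _ in range(3):
    handle_request()
profiler.disable()

# Sorted by cumulative time, query_database accounts for most of
# handle_request's time, which points at where to optimize first.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```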
  32. Next: scaling, if time permits
  33. What is scaling? <ul><li>Three basic properties: </li></ul><ul><ul><li>Usable capacity increases </li></ul></ul><ul><ul><li>Data capacity increases </li></ul></ul><ul><ul><li>The system stays maintainable </li></ul></ul>
  34. Scaling, 2 ways <ul><li>Vertical scaling </li></ul><ul><ul><li>Upgrade your hardware </li></ul></ul><ul><ul><ul><li>More CPU, memory... </li></ul></ul></ul><ul><li>Horizontal scaling </li></ul><ul><ul><li>Buy more of the same hardware, deploy more server instances </li></ul></ul><ul><ul><li>Distribute your system </li></ul></ul><ul><ul><li>But this generally requires modifying your code </li></ul></ul>
  35. Scaling-Load Balancing <ul><li>DNS-GSLB </li></ul><ul><ul><li>Use DNS round-robin to rotate the returned IP addresses </li></ul></ul><ul><ul><ul><li>xBayDNS </li></ul></ul></ul><ul><ul><li>Can’t handle failures well (cached records live until their TTL expires) </li></ul></ul><ul><ul><li>Hard to manage precisely </li></ul></ul><ul><li>CDN (content delivery network) </li></ul><ul><ul><li>Transparent service provided by third-party companies </li></ul></ul><ul><ul><li>Expensive, and not suitable for dynamic content </li></ul></ul>
  36. Scaling-Load Balancing <ul><li>Hardware LB </li></ul><ul><ul><li>Citrix: NetScaler, Foundry: ServerIron, F5 (4-7) </li></ul></ul><ul><ul><li>Expensive </li></ul></ul><ul><li>Software LB </li></ul><ul><ul><li>Perlbal (7), Pound (7) </li></ul></ul><ul><ul><li>LVS (4) </li></ul></ul>
  37. Scaling-Load Balancing <ul><li>Layer 2, Layer 4 and Layer 7 LB </li></ul><ul><ul><li>Layer 2: link aggregation; provides redundancy and fault tolerance, improves access speed </li></ul></ul><ul><ul><li>Layer 4: round-robin on TCP (with port info) </li></ul></ul><ul><ul><li>Layer 7 </li></ul></ul><ul><ul><ul><li>Sticky sessions possible </li></ul></ul></ul><ul><ul><ul><li>Easy to write complicated hashing logic </li></ul></ul></ul><ul><ul><ul><li>Good for Squid (enables Squid clusters) </li></ul></ul></ul>
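The difference between the two styles shows up in how a backend is chosen. This simplified sketch compares round-robin rotation (what an L4 balancer or DNS round-robin does) with hashing a session ID read from the request. Hashing the session ID is possible only at L7, because it needs the HTTP payload. Backend addresses and helper names are hypothetical.

```python
import hashlib
import itertools

BACKENDS = ["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"]  # hypothetical

# Round-robin: rotate through backends without looking at the request.
_rotation = itertools.cycle(BACKENDS)

def pick_round_robin():
    return next(_rotation)

# Sticky: hash the session ID taken from the HTTP request (cookie, URL...),
# so the same session always lands on the same backend.
def pick_sticky(session_id):
    digest = hashlib.md5(session_id.encode("utf-8")).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

print([pick_round_robin() for _ in range(4)])
# ['10.0.0.11:8080', '10.0.0.12:8080', '10.0.0.13:8080', '10.0.0.11:8080']
print(pick_sticky("sess-abc") == pick_sticky("sess-abc"))  # True
```

A plain modulo keeps sessions sticky only while the backend list stays fixed. Adding or removing a server remaps most sessions.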
  38. Scaling-Load Balancing <ul><li>Huge-scale LB </li></ul><ul><ul><li>GSLB -> DNS round robin </li></ul></ul><ul><ul><li>Virtual IP -> L4 or L7 LB (SNAT) </li></ul></ul><ul><ul><li>Example </li></ul></ul><ul><ul><ul><li>Level-1 LB uses GSLB to return geo-located DNS results </li></ul></ul></ul><ul><ul><ul><li>The VIP is dispatched by F5 </li></ul></ul></ul><ul><ul><ul><li>F5 -> Squid (reverse proxy) </li></ul></ul></ul><ul><ul><ul><li>Squid forwards to the real dynamic or static servers </li></ul></ul></ul>
  39. Scaling-Proxy Cache <ul><li>Reverse proxy </li></ul><ul><ul><li>Squid </li></ul></ul><ul><ul><ul><li>Uses the HTTP HEAD method to validate content </li></ul></ul></ul><ul><ul><ul><li>Caches content in memory: very fast </li></ul></ul></ul><ul><ul><ul><li>Mature, fast, industry standard </li></ul></ul></ul>
  40. Scaling-Database <ul><li>Scaling MySQL </li></ul><ul><ul><li>MySQL replication (watch for failures and lag) </li></ul></ul><ul><ul><ul><li>Master/Slave </li></ul></ul></ul><ul><ul><ul><li>Tree replication </li></ul></ul></ul><ul><ul><li>Data partitioning </li></ul></ul><ul><ul><ul><li>MySQL Proxy </li></ul></ul></ul><ul><ul><li>Data sharding </li></ul></ul>
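Master/slave replication and sharding usually meet in an application-side router. Writes go to the shard's master, and reads are spread over its slaves. This minimal sketch uses hypothetical host names and simple modulo sharding by user ID. It returns host names only and does not open connections.

```python
import itertools

class Router:
    """Route SQL to a shard's master or one of its slaves (hypothetical helper)."""

    def __init__(self, shards):
        # shards: list of {"master": host, "slaves": [hosts]}
        self.shards = shards
        self._slave_cycles = [itertools.cycle(s["slaves"]) for s in shards]

    def shard_index(self, user_id):
        return user_id % len(self.shards)      # simple modulo sharding

    def host_for(self, user_id, sql):
        i = self.shard_index(user_id)
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.shards[i]["slaves"]:
            return next(self._slave_cycles[i])  # spread reads over slaves
        return self.shards[i]["master"]         # writes must go to the master

router = Router([
    {"master": "db0-master", "slaves": ["db0-slave1", "db0-slave2"]},
    {"master": "db1-master", "slaves": ["db1-slave1"]},
])
print(router.host_for(42, "SELECT * FROM posts WHERE user_id = 42"))    # db0-slave1
print(router.host_for(42, "UPDATE users SET name = 'x' WHERE id = 42"))  # db0-master
print(router.host_for(7, "SELECT 1"))                                   # db1-slave1
```

Replication lag means a read issued right after a write may not see that write on a slave. Such read-your-own-writes queries are often sent to the master instead.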
  41. Scaling-File System <ul><li>Single disk (array) </li></ul><ul><ul><li>RAID 1, RAID 0, RAID 5 </li></ul></ul><ul><ul><li>Partition table type (GPT, MBR) </li></ul></ul><ul><ul><li>File system format (ext2, ext3, ReiserFS, XFS, ZFS) </li></ul></ul><ul><li>Cluster </li></ul><ul><ul><li>A single disk (array) has limits, but a cluster can keep growing </li></ul></ul><ul><ul><li>NetApp Filer (NAS, network-attached storage) </li></ul></ul><ul><ul><li>Many, many choices </li></ul></ul>
  42. Scaling-File System Sharing <ul><li>Hardware-based sharing: NAS (previous slide) </li></ul><ul><li>NFS: the simplest way to share a file system </li></ul><ul><li>Samba: much the same as NFS, worth a try </li></ul><ul><li>MogileFS (for the web; no cursor-based random access) </li></ul><ul><li>GFS, Hadoop FS (chunk-based) </li></ul>
  43. We've come a long way, baby
  44. Thanks!
