Planning For High Performance Web Application

Planning for performance: for web developers, an open discussion. Tin @ Beijing Open Party

The teacher need not be better than I am, and I need not be inferior to the teacher.

Agenda: basic programming practice; hardware platform; software platform; system essentials; optimizations; load balancing

Basic practices: use a proper SCM (CVS, SVN, Mercurial, Git)

Basic practices: use an auto-build system (shell scripts; Make; Ant, NAnt; Rake)

Basic practices: use a Continuous Integration tool. First you need a lot of tests. Add automated test and compile jobs as daily tasks. Use CI tools to monitor the health of your code base: CruiseControl, Luntbuild, Continuum, Hudson, Cruise, TeamCity, Bamboo. Use a desktop widget such as cc-tray or cc-menu.

Basic practices: use an issue tracker: Trac (SVN only); Bugzilla, Mantis Bug Tracker; Jira; Mingle; BugFree

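Voice from Twitter: You must test! You must test early! You must test early! Otherwise you are dead. Test every part. Leave performance testing to the users; only then does it mean anything. So make sure your logging is good.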

Basic practices: lifecycle control (Develop -> Test -> Deploy); release management (trunk, branch, tag; milestone, release candidate)

Basic practices: use Agile methodologies: XP practices (TDD, pair programming); Scrum; hybrid agile

Hardware platform: use economical hardware. Consider CPU and memory; disk and disk I/O (RAID); NIC; power and fan; form factor (1U, 2U, 3U, 4U?)

Hardware platform: brand (Dell, IBM, HP, Lenovo, Asus?); service quality; hardware redundancy (part redundancy; availability and lead time of critical parts; capacity redundancy); future plan?

Network & hosting: VPS or virtual hosting; co-located hardware (colo); bandwidth, dual lines, air conditioning; geo-location; self-hosting. How to choose network hardware (switch/router)? Cisco, Huawei, Foundry

Software platform: use a pre-compiled OS and software. Choose an OS: CentOS, Red Hat, SUSE; FreeBSD; Solaris; no Ubuntu Server (from Nicholas Ding)

Software platform: choose a language (a scripting language is better): PHP, Python, Perl, Ruby, Java, and many more, but not C

Software platform: choose a database (or data provider): MySQL; PostgreSQL; a Bigtable implementation?

System essentials: web server: Apache; lighttpd; Nginx; TUX, Cherokee, LiteSpeed; Tomcat, Jetty; Mongrel, Thin

System essentials: different deployment styles (Python/Ruby): Apache + mod_python (mod_rails, Passenger); FastCGI, SCGI, CGI; proxy (load balancing) + multiple server instances; thread or process?

System essentials: monitoring your system: web server logs (Webalizer, Report Magic); beacon (a separate static file server used as a tracker); error log analysis; AWStats & Google Analytics

System essentials: monitoring your system. Monit (RubyWorks uses runit): monitors process status; auto-restarts your important processes; better than cron for monitoring. Munin & Nagios: distributed monitoring of your whole system; the administrator's eyes, the developer's friends.

System essentials: Munin & Nagios, continued. Munin has a server and nodes; it generates a site that reports your servers' statistics at intervals. Munin and Nagios can be integrated. Metrics: memory usage, CPU, processes, disk usage; services (HTTP, SMTP, POP3, NNTP, ping); hardware temperature and other data; network statistics; custom scripts (plugins), e.g. database-related figures or number of users.
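A custom Munin plugin is an executable that prints graph settings when called with the argument "config" and prints current values otherwise. The following is a minimal sketch of a "number of users" plugin; count_users and its fixed return value are hypothetical placeholders for a real database query.

```python
import sys


def count_users() -> int:
    """Hypothetical placeholder: a real plugin would query the database here."""
    return 42


def render(argv) -> str:
    """Return what the plugin should print for the given command line."""
    if len(argv) > 1 and argv[1] == "config":
        # Munin asks for graph metadata once, with the "config" argument.
        return "\n".join([
            "graph_title Logged-in users",
            "graph_vlabel users",
            "graph_category application",
            "users.label users",
        ])
    # On a normal run the plugin reports one "<field>.value <number>" line per field.
    return f"users.value {count_users()}"


if __name__ == "__main__":
    print(render(sys.argv))
```

Keeping the output in a function that returns a string makes the plugin easy to check without a running Munin node.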

System essentials: protect your system (management is more important than tools). SSH brute-force attack protection: SSH key login, blockhost (scripts + pf/iptables). Audit: SELinux... Firewall (port blocking and audit). Use a safe OS? (NetBSD, FreeBSD). Network safety (but no hardware firewall for websites).

System essentials: SNA (Shared Nothing Architecture; this is a relative term). All static files and rsync; database-centric SNA; Memcached + DB persistence; server hash, cluster, partition. Examples: Amazon/Blogger/Craigslist/Facebook/Google/LiveJournal/Slashdot/Wikipedia/Yahoo/YouTube. Session sticky.
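"Server hash" can be read as mapping each cache key to one of several cache servers, so that no server needs to know about the others. The following is a minimal sketch using a simple modulo hash; the server addresses are hypothetical, and it is not how any particular Memcached client picks servers.

```python
import hashlib

# Hypothetical cache server addresses, for illustration only.
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]


def pick_server(key: str, servers=SERVERS) -> str:
    """Map a cache key to one server using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process and would send the same key to different servers after a restart.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    for key in ("user:42", "session:abc", "page:/home"):
        print(key, "->", pick_server(key))
```

With modulo hashing, adding or removing a server remaps most keys; consistent hashing is the usual way to reduce that churn.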

System essentials: make your modules independent: layers, packages; easy to replace a module; easy to deploy; easy to profile and improve.

Optimizations: split your static content and dynamic content servers. Use a lightweight web server to serve static content; use different domains for different servers. Caching with Memcached: query results, domain objects, sessions; page tiles, template tiles; everything that you need.

Optimizations: caching. Optimize your code (lazy evaluation, cached results). Cache and asynchronous update (cron update). Target: a hit rate of 90% or more! But cache invalidation is a critical problem! Asynchronous messaging keeps the cache valid. No blocking! ActiveMQ, RabbitMQ, DRb (for Ruby).
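The hit-rate target and the invalidation problem can be illustrated with a small in-process cache. This is a sketch only: CountingCache and its loader callback are hypothetical names, and a plain dict stands in for Memcached.

```python
class CountingCache:
    """Cache-aside store that counts hits and misses."""

    def __init__(self, loader):
        self._loader = loader  # called on a miss, e.g. a database query
        self._store = {}       # stand-in for Memcached
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._loader(key)
        self._store[key] = value
        return value

    def invalidate(self, key):
        """Drop a key after the underlying data changes."""
        self._store.pop(key, None)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    database = {"user:1": "Tin"}
    cache = CountingCache(database.__getitem__)
    for _ in range(10):
        cache.get("user:1")
    print(cache.hit_rate())  # one miss, then nine hits

    database["user:1"] = "Tin (updated)"
    cache.invalidate("user:1")  # without this, readers keep seeing the old value
    print(cache.get("user:1"))
```

In a real deployment the invalidate call would typically be triggered by the code path that writes to the database, or by a message from a queue, so that readers are never blocked on it.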

Optimizations: caching. Better client-side caching: use expiry headers (max-age, Expires); ETag? (not recommended, IE doesn't support it); use the HEAD method and 304 to detect changes (for Squid or other proxy scenarios); compress (concatenate JS, CSS).
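As a rough illustration of expiry headers and change detection, the sketch below builds Cache-Control, Expires and Last-Modified headers and decides whether a revalidation request can be answered with 304 Not Modified. It uses the standard If-Modified-Since comparison rather than the HEAD-based check named on the slide; caching_headers and status_for are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def caching_headers(last_modified: datetime, max_age: int, now: datetime) -> dict:
    """Build response headers that let clients and proxies cache for max_age seconds.

    Both datetimes must be timezone-aware UTC values.
    """
    return {
        "Cache-Control": f"max-age={max_age}",
        "Expires": format_datetime(now + timedelta(seconds=max_age), usegmt=True),
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }


def status_for(if_modified_since, last_modified: datetime) -> int:
    """Return 304 if the client's copy is still current, else 200."""
    if if_modified_since is None:
        return 200
    since = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return 304 if last_modified.replace(microsecond=0) <= since else 200


if __name__ == "__main__":
    modified = datetime(2009, 1, 1, tzinfo=timezone.utc)
    headers = caching_headers(modified, max_age=3600, now=datetime.now(timezone.utc))
    for name, value in headers.items():
        print(f"{name}: {value}")
    print(status_for(headers["Last-Modified"], modified))
```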

Optimizations: SQL optimizations. Add indexes (especially on columns used in the WHERE clause). Denormalize: useful redundancy (use duplication to avoid joins). Don't rely on the ORM, whether it is Data Mapper, Active Record, or Unit of Work. Don't use full-text search; use a separate search engine module (Lucene).
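The effect of indexing a WHERE column can be seen in a query plan. The sketch below uses SQLite from the Python standard library as a stand-in for MySQL; the users table and idx_users_email index are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the description


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO users (email, name) VALUES (?, ?)",
        [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
    )
    sql = "SELECT name FROM users WHERE email = ?"
    params = ("user500@example.com",)

    before = query_plan(conn, sql, params)  # full table scan
    conn.execute("CREATE INDEX idx_users_email ON users (email)")
    after = query_plan(conn, sql, params)   # lookup through the index
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

MySQL offers the same kind of check through its own EXPLAIN statement.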

Optimizations: choose a proper database storage engine (MySQL: MyISAM? InnoDB? BDB? Heap?). Accelerators: PHP: APC, Zend Optimizer, XCache, eAccelerator, ionCube PHP Accelerator, Turck MMCache; Python: Psyco; Ruby: Joyent accelerator.

But the most important thing: find the bottleneck before you start to optimize your application.
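One way to find a bottleneck in Python code is the standard library profiler. The sketch below profiles a made-up request handler; handle_request, slow_part and fast_part are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_part(n):
    """Deliberately expensive work, standing in for a slow query or template."""
    return sum(i * i for i in range(n))


def fast_part(n):
    return n * 2


def handle_request():
    slow_part(200_000)
    fast_part(10)


def profile_report(func, limit=10) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(handle_request))
```

The report lists slow_part near the top by cumulative time, which is the signal to optimize it first and leave fast_part alone.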

Next: scaling, if there is enough time.

What is scaling? Three basic properties: increasing usable capacity; increasing data capacity; keeping the system maintainable.

Scaling, 2 ways. Vertical scaling: upgrade your hardware (more CPU, memory, ...). Horizontal scaling: buy more of the same hardware, deploy more server instances, and distribute your system; but this way generally requires you to modify your code.

Scaling: load balancing. DNS-GSLB: use DNS round-robin to randomize the IP result (xBayDNS); can't deal with failure (TTL); hard to manage accurately. CDN (content delivery network): a transparent service provided by some companies; expensive, and not suitable for dynamic content.

Scaling: load balancing. Hardware LB: Citrix NetScaler, Foundry ServerIron, F5 (4-7); expensive. Software LB: Perlbal (4), Pound (7), LVS (4).

Scaling: load balancing at Layer 2, Layer 4 and Layer 7. Layer 2: link aggregation; provides redundancy and fault tolerance, improves access speed. Layer 4: round-robin on TCP (with port info). Layer 7: session stickiness enabled; easy to write complicated hash logic; good for Squid (Squid cluster enabled).
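The difference between round-robin dispatch and session-sticky dispatch can be sketched in a few lines. This is an illustration of the two selection rules only, not of any real load balancer; the backend addresses and function names are hypothetical.

```python
import itertools
import zlib

# Hypothetical backend addresses, for illustration only.
BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]


class RoundRobin:
    """Layer 4 style: hand out backends in turn, ignoring request content."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)


def sticky_pick(session_id: str, backends=BACKENDS) -> str:
    """Layer 7 style: hash the session ID so a session always hits one backend."""
    return backends[zlib.crc32(session_id.encode("utf-8")) % len(backends)]


if __name__ == "__main__":
    balancer = RoundRobin(BACKENDS)
    print([balancer.pick() for _ in range(4)])
    print(sticky_pick("session-abc"), sticky_pick("session-abc"))
```

Sticky selection needs the balancer to read the request (for example a cookie), which is why it belongs to Layer 7 rather than Layer 4.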

Scaling: load balancing at huge scale. GSLB -> DNS round robin; virtual IP -> L4 or L7 LB (SNAT). Example: the level 1 LB uses GSLB to give geo-located DNS results; the VIP is dispatched by F5; F5 -> Squid, as reverse proxy; Squid delegates to the real dynamic or static server.

Scaling: proxy cache. Reverse proxy: Squid. Uses the HTTP HEAD method to validate content; uses memory to cache content, at light speed; mature, fast, industry standard.

Scaling: database. Scaling MySQL: replication/duplication (failure, lag); master/slave; tree replication; data partitioning; MySQL Proxy; data sharding.
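Master/slave replication and sharding both come down to choosing a database server per query. The sketch below shows one possible routing rule for each; ReadWriteRouter, shard_for and the host names are hypothetical, and the need_fresh flag is one simple way to work around replication lag.

```python
import random


class ReadWriteRouter:
    """Send writes to the master and spread reads over the replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def pick(self, sql: str, need_fresh: bool = False) -> str:
        words = sql.split()
        is_read = bool(words) and words[0].upper() == "SELECT"
        # Replicas may lag behind the master, so reads that must see a
        # just-written row are sent to the master as well.
        if is_read and self.replicas and not need_fresh:
            return random.choice(self.replicas)
        return self.master


def shard_for(user_id: int, shard_count: int) -> int:
    """Pick the shard holding a user's rows with a simple modulo rule."""
    return user_id % shard_count


if __name__ == "__main__":
    router = ReadWriteRouter("db-master", ["db-replica-1", "db-replica-2"])
    print(router.pick("SELECT * FROM posts"))
    print(router.pick("UPDATE posts SET title = 'x'"))
    print(shard_for(12345, 4))
```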

Scaling: file system. Single disk (array): RAID 1, RAID 0, RAID 5; partition table type (GPT, MBR); partition format (ext2, ext3, ReiserFS, XFS, ZFS). Cluster: a single disk has limits, but a cluster has none; NetApp Filer (NAS, network-attached storage); many, many choices.

Scaling: file system sharing. Hardware-based sharing: NAS (previous page). NFS: the simplest way to share a file system. Samba: almost the same as NFS, nice to try. MogileFS (for the web, no cursor-based random access). GFS, Hadoop FS (chunk-based).

We've come a long way, baby
