Building scalable and reliable websites
  • - hand out cards to the audience

Presentation Transcript

  • Building scalable and reliable websites: Studencki Festiwal Informatyczny (Student IT Festival), Kraków, 12–14.03.2009, Tomasz Napierała
  • Tomasz Napierała, Systems Architecture Engineer, JID/email: tomasz.mapierala@allegro.pl
  • food chain: Naspers/MIH, Tradus, GaduGadu, mail.ru, Tencent QQ, QXL Ricardo, Allegro.pl, QXL Poland
  • we are the Borg: 800 employees (650 in Poland, 125 in IT), 13 services, 10 on the platform
  • .cz, .ro, .ua, .bg, .sk: auctions, ads, payments, shops, inventory
  • meet the team
  • technical dept: Application Unit 1, Application Unit 2, R&D, BI & DWH, PMO, P&L, Infrastructure; Infrastructure: Systems, Networks, DBA, Help Desk, NOC
  • history (source: aukcjostat.pl): Oracle, DC2
  • ranks: site in Alexa: 134th/206th worldwide, 4th in Poland; auctions: 7th worldwide (counting eBay divisions), biggest in Central/Eastern Europe
  • childhood
  • some numbers
    • >500 servers
    • 350TB of storage
    • 270 million images (last 2 months)
    • 6.5TB of images
  • more numbers
    • show_item: about 30%, roughly 100,000 per minute
    • 6 million email notifications per day
    • 100,000 HTTP requests per second
    • 20,000 new HTTP requests per second
  • and even more
    • 5.5TB data in DB
    • 6GB daily growth
    • 2k–10k queries/s, 200 million queries per day
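The daily and per-minute figures above are easier to compare once they are converted to average per-second rates. A small sketch of that arithmetic follows; it only divides the slide numbers by their time windows, so it gives averages, and peaks will be higher. The roughly 2,300 queries/s average sits at the low end of the quoted 2k–10k range, which is consistent with daytime peaks several times the daily mean.

```python
# Convert the slide's per-minute and per-day figures into average per-second rates.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(count: float, window_seconds: float) -> float:
    """Average rate over a time window; real traffic peaks above this."""
    return count / window_seconds


show_item_rps = per_second(100_000, 60)                 # show_item page views
queries_rps = per_second(200_000_000, SECONDS_PER_DAY)  # database queries
emails_rps = per_second(6_000_000, SECONDS_PER_DAY)     # email notifications

for name, rate in [("show_item", show_item_rps), ("db queries", queries_rps), ("emails", emails_rps)]:
    print(f"{name}: {rate:,.0f}/s")
```

This prints about 1,667 show_item views/s, 2,315 queries/s and 69 emails/s.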
    • 1000 tables in database
  • service lifecycle [diagram: Internet, www and DB servers, growing into multiple www and DB servers with a cache]
    • room for growth
    • easy maintenance
    • easy development
    • availability
    • linear scalability
    • simple
    • best value for money
    • redundancy
    problem
  • recipes: do you have any? We've got some...
  • cache everything: browser, client proxy, reverse proxy, memcached, eAccelerator/XCache, DB cache, FS cache
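At the memcached layer, the usual pattern is cache-aside: read from the cache, fall back to the database on a miss, then store the result with an expiry. The sketch below shows that pattern with an in-process `TTLCache` standing in for memcached; `TTLCache`, `get_or_load` and the `loader` callback are hypothetical names for illustration, not the talk's code or a memcached client API.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process stand-in for a memcached-like store: key -> (expiry time, value)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def get_or_load(cache: TTLCache, key: str, loader: Callable[[], Any], ttl: float = 60.0) -> Any:
    """Cache-aside read: try the cache, on a miss call the loader (e.g. a DB query) and store the result."""
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value, ttl)
    return value
```

A real deployment would put a memcached client library in place of `TTLCache`. Note that this sketch treats a stored `None` as a miss, so "not found" results are not cached.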
  • frontend: reduce DNS lookups, optimize images, use ETags, flush the buffer early, get rid of cookies, minimize requests, control caching, gzip, JS at the bottom, CSS at the top
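"use ETags" pays off through conditional requests: the browser sends back the tag it has cached in `If-None-Match`, and the server answers `304 Not Modified` with no body if nothing changed. The sketch below assumes a content-hash ETag, which gives the same value on every web server; some servers' default ETags include file-system details such as inode numbers, which can differ between machines behind a load balancer. `respond` and the `max-age` value are illustrative assumptions.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the response body, so every server produces the same value."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET, returning 304 Not Modified when the client's cached copy is still current."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}  # max-age is an assumed policy
    if if_none_match:
        # If-None-Match may list several tags; weak tags (W/"...") compare by their opaque part.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            return 304, headers, b""  # client copy is valid: send headers only
    return 200, headers, body
```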
  • frontend – the award: let's see how much we can really earn
  • frontend – the award: empty cache vs primed cache, 93% gain
    • empty cache: html/text 1 request (6.5K), javascript 4 (7.3K), css 1 (0.0K), css images 12 (8.3K), images 19 (71.4K); total 37 requests, 93.6K
    • primed cache: html/text 1 request (6.5K); total 1 request, 6.5K
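The 93% figure follows from the per-type breakdown on this slide. The sketch below recomputes it. The per-type sizes add up to 93.5K rather than the slide's 93.6K, presumably because the stylesheet's size is rounded to 0.0K. Either total gives 93%, and the request count drops by about 97%.

```python
from typing import Dict, Tuple

# (number of requests, size in KB) per file type, as listed on the slide.
EMPTY_CACHE: Dict[str, Tuple[int, float]] = {
    "html/text": (1, 6.5),
    "javascript": (4, 7.3),
    "css": (1, 0.0),
    "css images": (12, 8.3),
    "images": (19, 71.4),
}
PRIMED_CACHE: Dict[str, Tuple[int, float]] = {"html/text": (1, 6.5)}


def totals(page: Dict[str, Tuple[int, float]]) -> Tuple[int, float]:
    """Return (number of requests, total size in KB) for one page view."""
    requests = sum(count for count, _ in page.values())
    size_kb = sum(kb for _, kb in page.values())
    return requests, size_kb


empty_requests, empty_kb = totals(EMPTY_CACHE)     # 37 requests, ~93.5 KB
primed_requests, primed_kb = totals(PRIMED_CACHE)  # 1 request, 6.5 KB

byte_gain = 1 - primed_kb / empty_kb
request_gain = 1 - primed_requests / empty_requests
print(f"bytes saved: {byte_gain:.0%}, requests saved: {request_gain:.0%}")  # 93%, 97%
```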
  • frontend – the award: sweets. GET without cookie: 360B, GET with cookie: 1003B, 75% gain
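If the two request sizes on this slide are taken at face value, the per-request saving comes out as shown below. This gives roughly 64%, not the 75% on the slide. The slide's figure may come from a different measurement; the talk does not say. The second line is an upper bound, assuming that every one of the 100,000 HTTP requests per second quoted earlier carried the cookie.

```latex
\begin{aligned}
\text{gain} &= 1 - \frac{360\ \text{B}}{1003\ \text{B}} \approx 1 - 0.359 = 0.641 \approx 64\% \\
\text{saved} &\le (1003 - 360)\ \text{B} \times 100{,}000\ \text{req/s} = 64.3\ \text{MB/s}
\end{aligned}
```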
  • frontend – numbers
  • load balancing … or buy: LACP/bonding, LVS/IPVS, HAProxy, perlbal, varnish/nginx, ...
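A common scheduling policy in the software balancers listed above is least connections: each new request goes to the healthy backend with the fewest requests in flight. HAProxy (`balance leastconn`) and LVS (the `lc` scheduler) both offer this policy. The sketch below shows the policy only. The class and the backend names are hypothetical, and health checking is reduced to explicit `mark_down`/`mark_up` calls.

```python
from typing import Dict, List


class LeastConnectionsBalancer:
    """Send each new request to the healthy backend with the fewest requests in flight."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self.active: Dict[str, int] = {b: 0 for b in backends}  # in-flight requests per backend
        self.healthy = set(backends)

    def acquire(self) -> str:
        candidates = [b for b in self.active if b in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        # min() keeps the first of equal candidates, so ties go to the earliest-listed backend.
        backend = min(candidates, key=lambda b: self.active[b])
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        """Call when a request to this backend finishes."""
        self.active[backend] -= 1

    def mark_down(self, backend: str) -> None:
        """A failed health check takes the backend out of rotation."""
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        self.healthy.add(backend)
```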
  • distribute load [diagram: Internet, cache, load balancers (LB), originals]
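When a tier of cache servers sits in front of the originals, each object should ideally live on exactly one cache node. Otherwise every node ends up holding its own copy of the hot images. One widely used way to achieve this is consistent hashing. The slide does not name this technique; it is swapped in here for illustration. The sketch maps keys such as image URLs to cache nodes, and removing a node remaps only the keys that node owned. `HashRing` and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


def stable_hash(key: str) -> int:
    """64-bit hash that is identical on every machine (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping object keys (e.g. image URLs) to cache nodes."""

    def __init__(self, nodes: List[str], points_per_node: int = 100) -> None:
        if not nodes:
            raise ValueError("need at least one node")
        # Each node owns many virtual points on the ring to even out the key distribution.
        ring: List[Tuple[int, str]] = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(points_per_node)
        )
        self._points = [point for point, _ in ring]
        self._owners = [node for _, node in ring]

    def node_for(self, key: str) -> str:
        # A key belongs to the first point clockwise from its hash, wrapping past the end.
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._points)
        return self._owners[idx]
```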
  • storage
    • security
    • block size
    • data structure
    • hardware
    • filesystem size
    • copies
    • backups
    • I/O constraints
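For "data structure" and "filesystem size", one common approach with hundreds of millions of small files is to spread them across hashed subdirectories, so that no single directory grows huge. This is an assumed technique; the slide does not say how Allegro lays out its files. With two levels of two hex characters there are 256 × 256 = 65,536 leaf directories. If the 270 million images from the earlier slide were spread evenly, each directory would hold about 4,120 files. The root path and `.jpg` suffix below are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def image_path(image_id: int, root: str = "/storage/images", levels: int = 2, width: int = 2) -> PurePosixPath:
    """Place an image under nested directories named after the leading hex digits of a hash of its id.

    With levels=2 and width=2 there are 16**2 * 16**2 = 65,536 leaf directories.
    """
    digest = hashlib.md5(str(image_id).encode("ascii")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, f"{image_id}.jpg")
```

Hashing the id instead of using its leading digits keeps consecutive uploads from piling into the same directory.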
  • storage
    • MogileFS
    • get your own
    • Lustre
    • Hadoop
    • Amazon S3
  • database: tune; replicate, replicate more; sharding
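Master/replica replication only helps read load if the application sends reads to the replicas and writes to the master. The sketch below shows that routing with a deliberately crude rule: plain `SELECT`/`SHOW` statements go round-robin to the replicas, and everything else, including `SELECT ... FOR UPDATE`, goes to the master. The connection names are hypothetical. Because of replication lag, a read that must see the caller's own write should also be sent to the master.

```python
import itertools
from typing import List


class ReplicatedDB:
    """Route writes to the single master and spread plain reads over replicas round-robin."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        self._replicas = itertools.cycle(replicas or [master])  # no replicas: read from master

    def route(self, sql: str) -> str:
        text = sql.strip()
        first_word = text.split(None, 1)[0].upper() if text else ""
        is_plain_read = first_word in ("SELECT", "SHOW") and "FOR UPDATE" not in text.upper()
        return next(self._replicas) if is_plain_read else self.master
```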
  • sharding + denormalization: + scale out, availability, small dataset; - growing can be a pain, expensive joins, planning nightmare, still need to replicate, SPOF: lookup table
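A lookup table records, for each key (here a user id), which shard holds its rows. This makes placement and rebalancing flexible, but the table becomes the single point of failure the slide mentions, so it would itself need to be replicated and cached. The minimal sketch below places new users on the least-loaded shard and treats a move as a table update. The class, the shard names and the placement rule are hypothetical. Because a user's data lives on one shard, queries that join across users on different shards become expensive, which is why sharding goes together with denormalization.

```python
from typing import Dict, List


class ShardDirectory:
    """Lookup-table sharding: an explicit user_id -> shard map."""

    def __init__(self, shards: List[str]) -> None:
        if not shards:
            raise ValueError("need at least one shard")
        self.shards = list(shards)
        self.lookup: Dict[int, str] = {}                  # the lookup table itself (the SPOF)
        self._load: Dict[str, int] = {s: 0 for s in shards}  # users placed per shard

    def shard_for(self, user_id: int) -> str:
        shard = self.lookup.get(user_id)
        if shard is None:
            # New user: place on the least-loaded shard and remember the choice.
            shard = min(self.shards, key=lambda s: self._load[s])
            self.lookup[user_id] = shard
            self._load[shard] += 1
        return shard

    def add_shard(self, shard: str) -> None:
        self.shards.append(shard)
        self._load[shard] = 0

    def move(self, user_id: int, new_shard: str) -> None:
        """Rebalance one user; the rows must be copied to new_shard before the table is updated."""
        if new_shard not in self._load:
            raise ValueError(f"unknown shard {new_shard}")
        old_shard = self.shard_for(user_id)
        self._load[old_shard] -= 1
        self._load[new_shard] += 1
        self.lookup[user_id] = new_shard
```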
  • eee, manageability: easy to manage, easy to deploy, extensible, easy to troubleshoot
  • mines: sometimes better is worse
  • when time is an issue
  • sysinternals
    • 98% Linux shop
    • CentOS / Ubuntu / OpenSolaris / AIX
    • HP / IBM / 3PAR / Onstor / Isilon / Brocade
    • apache, lighttpd, squid, varnish, memcached, sphinx
    • Oracle, MySQL, PostgreSQL
  • sysinternals – storage
    • Isilon: low-impact node failure, InfiniBand intra-cluster interconnect, R: 480MB/s, W: 250MB/s
    • OnStor: 32k IO/s, online reconfiguration, online fix/resize
    • 3PAR: all online, remote copy, thin provisioning, unique architecture
  • netinternals
    • Cisco
    • IronPort
    • F5 – load balancer and much more
    • Juniper
  • monitor and manage
    • svn
    • monarch
    • sauron
    • altiris
    • request tracker
    • otrs
    • jira
    • nagios, gomez, cacti, rancid, collectd, cflowd, SolarWinds
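Nagios checks are small programs that report their result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and one line of output, optionally followed by `|` and performance data that graphing tools can pick up. The sketch below is a disk-usage check following that convention. The 80%/90% thresholds and the `DISK` label are assumptions for illustration.

```python
import shutil
from typing import Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a measurement to a Nagios status code and a status line with performance data."""
    if used_pct >= crit:
        code = CRITICAL
    elif used_pct >= warn:
        code = WARNING
    else:
        code = OK
    # Performance data: label=value[UOM];warn;crit;min;max
    message = f"DISK {STATUS_NAMES[code]} - {used_pct:.1f}% used | used={used_pct:.1f}%;{warn};{crit};0;100"
    return code, message


def main(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    """Print the status line and return the exit code Nagios expects."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(100.0 * usage.used / usage.total, warn, crit)
    print(message)
    return code
```

Wired up as a standalone plugin script, the entry point would end with `sys.exit(main())`, so that Nagios sees the status as the process exit code.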
  • application
    • it's the code that doesn't scale, not the language
    • PHP, C, C++, Java
    • redefining towards more SOA
    • specialized daemons
  • application architecture [diagram: Internet, LB, static, www and images servers, cache, originals, upload, storage, DB with cache/pool]
  • final thoughts
    • KISS
    • research
    • long term perspective
    • scale horizontally
    • be ready for growth
    • focus on what's important
    • check your graphs
    • listen to those wiser
    while (true) {
        identify_and_fix_bottlenecks();
        drink();
        sleep();
        notice_new_bottlenecks();
    }
  • thank you
  • questions?