Scaling Web Applications
       Andrei Gheorghe
          idevelop.ro
Scaling Web Applications - 2009

  1. Scaling Web Applications (Andrei Gheorghe, idevelop.ro)
  2. The most common case: shared hosting
  3. Where problems appear
     • Server processing power: CPU, RAM, etc.
     • Bandwidth
     • Storage capacity
     • Database
  4. Web server + DB server
  5. Load Balancing
  6. Load Balancing
     • Hardware
       • Balancing is done at the packet transport level
       • Expensive, and knows nothing about the application architecture
     • DNS load distribution ("round robin")
       • Statistically, it distributes traffic evenly
       • Knows nothing about server availability
       • DNS caching problems can occur
       • Only a solution at very large scale
     • Reverse proxy
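
The difference between a blind rotation (as with DNS round robin) and a balancer that knows which servers are up can be sketched as follows. This is only an illustration: the host names and the `healthy` set are hypothetical, and a real balancer would learn server health from periodic checks rather than a hard-coded set.

```python
from itertools import cycle

def round_robin(servers):
    """Return an endless rotation over the servers, like DNS round robin."""
    return cycle(servers)

def pick_healthy(rotation, healthy, max_tries):
    """Advance the rotation until a server marked healthy is found."""
    for _ in range(max_tries):
        server = next(rotation)
        if server in healthy:
            return server
    raise RuntimeError("no healthy server available")

servers = ["web1", "web2", "web3"]  # hypothetical host names

# Blind rotation: every server gets traffic, even one that is down.
rotation = round_robin(servers)
blind = [next(rotation) for _ in range(6)]

# Availability-aware rotation: assume web2 is down and skip it.
healthy = {"web1", "web3"}
rotation = round_robin(servers)
aware = [pick_healthy(rotation, healthy, len(servers)) for _ in range(4)]
```
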
  7. Reverse Proxy Load Balancing
     • A single front end for multiple servers
     • Security
     • SSL request acceleration
     • Caching
     • nginx, squid, lighttpd
  8. Relational Databases: tables, columns, joins
  9. MySQL Replication
  10. MySQL Cluster
      • Data node: applications do not interact with these directly
      • Management node: configures and monitors the cluster
      • SQL node (mysqld process): a MySQL server that connects to the data nodes to request or store information
      • Generally, each node will run on a separate host
  11. MySQL Cluster
      • Synchronous replication: data is replicated on several nodes to ensure availability if a data node disconnects
      • Horizontal data partitioning: data is partitioned automatically across all data nodes using an algorithm based on the primary key
      • Hybrid storage: memory / disk
      • Shared nothing: "no single point of failure"
  12. Normalization
      • Means bringing the database to a "normal form"
      • Data is structured into tables with relations between them, and each piece of information appears only once
      • Keeps the information consistent when the database is modified
  13. Normalization / Denormalization
      USERS: user_id, user_name, user_password
      POSTS: post_id, post_author_id
      COMMENTS: c_id, c_post_id, c_text
  14. Normalization / Denormalization
      USERS: user_id, user_name, user_password
      POSTS: post_id, post_author_id, post_author_name
      COMMENTS: c_id, c_post_id, c_text
  15. Normalization / Denormalization
      USERS: user_id, user_name, user_password
      POSTS: post_id, post_author_id, post_author_name, post_comment_count
      COMMENTS: c_id, c_post_id, c_text
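
The three schema slides can be tried out with a small sketch using Python's built-in `sqlite3` module in place of MySQL. The table and column names come from the slides; the sample rows and the `add_comment` helper are assumptions made for illustration. The point is the trade-off: the denormalized read needs no JOIN, but every write must also keep the copied columns up to date.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT, user_password TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_author_id INTEGER,
                    post_author_name TEXT, post_comment_count INTEGER DEFAULT 0);
CREATE TABLE comments (c_id INTEGER PRIMARY KEY, c_post_id INTEGER, c_text TEXT);
""")
db.execute("INSERT INTO users VALUES (1, 'ana', 'hash')")  # sample data
db.execute("INSERT INTO posts (post_id, post_author_id, post_author_name) "
           "VALUES (10, 1, 'ana')")

def add_comment(db, post_id, text):
    """Insert a comment and update the denormalized counter in one transaction."""
    with db:
        db.execute("INSERT INTO comments (c_post_id, c_text) VALUES (?, ?)",
                   (post_id, text))
        db.execute("UPDATE posts SET post_comment_count = post_comment_count + 1 "
                   "WHERE post_id = ?", (post_id,))

add_comment(db, 10, "first")
add_comment(db, 10, "second")

# Normalized read: two JOINs and an aggregate.
normalized = db.execute("""
    SELECT u.user_name, COUNT(c.c_id)
    FROM posts p
    JOIN users u ON u.user_id = p.post_author_id
    LEFT JOIN comments c ON c.c_post_id = p.post_id
    WHERE p.post_id = 10
    GROUP BY p.post_id
""").fetchone()

# Denormalized read: a single primary-key lookup.
denormalized = db.execute(
    "SELECT post_author_name, post_comment_count FROM posts WHERE post_id = 10"
).fetchone()
```
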
  16. Key → Value Databases
  17. Key → Value Databases
      • Distributed, persistent hash tables
        • "Eventual consistency"
      • Allow SELECTs with conditions
      • Require a degree of data denormalization
      • Inconsistencies must be handled manually, and the correct data propagated
      • MemcacheDB, CouchDB, Amazon SimpleDB, Hypertable, Google BigTable
  18. Sharding
  19. Vertical Sharding
      • One server for users, one server for search, etc.
      • JOINs between tables are done manually, in application code
        • Denormalizing the DB reduces the need for JOINs
      (Diagram: separate SEARCH, COMMENTS and USERS servers)
  20. Horizontal Sharding
      • Splitting the records of one table across several servers
      • The splitting algorithm is very important
        • Depending on the algorithm chosen, rebalancing the data when the topology changes can be difficult
      • A central dictionary can be used
        • The algorithm is transparent
        • Easier to rebalance
        • Can create a single point of failure
      (Diagram: user shards USR #1, USR #2, USR #3)
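
A rough sketch of the two placement strategies, under stated assumptions: the shard names are hypothetical, MD5 is used only as a stable hash, and `ShardDirectory` is an illustrative in-memory stand-in for the central dictionary, not a real library.

```python
import hashlib

SHARDS = ["usr1", "usr2", "usr3"]  # hypothetical shard names

def shard_by_hash(user_id, shards):
    """Algorithmic placement: stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

def moved_fraction(user_ids, old_shards, new_shards):
    """Fraction of users whose shard changes when the topology changes."""
    moved = sum(shard_by_hash(u, old_shards) != shard_by_hash(u, new_shards)
                for u in user_ids)
    return moved / len(user_ids)

class ShardDirectory:
    """Central dictionary: an explicit user -> shard mapping.

    Easy to rebalance one user at a time, but every lookup depends on it,
    so it can become a single point of failure.
    """
    def __init__(self, shards):
        self.shards = list(shards)
        self.location = {}

    def assign(self, user_id):
        """Place a new user on the least-loaded shard; keep existing users put."""
        if user_id not in self.location:
            counts = {s: 0 for s in self.shards}
            for s in self.location.values():
                counts[s] = counts.get(s, 0) + 1
            self.location[user_id] = min(self.shards, key=counts.__getitem__)
        return self.location[user_id]

    def move(self, user_id, shard):
        """Rebalance a single user without touching anyone else."""
        self.location[user_id] = shard

# Adding a fourth shard with modulo hashing relocates most users.
fraction = moved_fraction(range(1000), SHARDS, SHARDS + ["usr4"])

directory = ShardDirectory(SHARDS)
placed = [directory.assign(u) for u in (1, 2, 3)]
directory.move(1, "usr3")
```

With modulo hashing, adding one shard changes the result for most keys, which is why rebalancing is hard. The directory moves exactly the users you choose, at the price of an extra lookup and a component that must stay available.
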
  21. Advantages of sharding
      • High availability
        • If one server crashes, the application keeps working (only the data on that shard is unavailable)
      • Faster queries
        • Queries run against smaller chunks of data, so they execute faster
      • Higher write rate
        • Writes execute faster because, with no central server, they run in parallel
  22. Cache
  23. memcached
      memcached -d -u www -m 2048 -l 10.0.0.8 -p 11211
      • Distributed hash table, kept in RAM: set(key, value), get(key), delete(key)
      • value is usually an entire serialized object
        • e.g. article + comments + author information
      • Client libraries for memcached exist for practically every programming language, including PHP
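
The usual way to use these three operations is the cache-aside pattern: look in the cache first, fall back to the database on a miss, then store the serialized result. The sketch below is an assumption-laden illustration: `FakeCache` is an in-memory stand-in that only mimics set/get/delete (it is not a memcached client), and `load_article_from_db` is a hypothetical placeholder for the real query.

```python
import json

class FakeCache:
    """In-memory stand-in exposing the three operations named on the slide."""
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)
    def delete(self, key):
        self._data.pop(key, None)

stats = {"db_reads": 0}

def load_article_from_db(article_id):
    """Hypothetical expensive query: article + comments + author information."""
    stats["db_reads"] += 1
    return {"id": article_id, "title": "Hello", "comments": [],
            "author": {"name": "ana"}}

def get_article(cache, article_id):
    """Cache-aside read: serve from cache, or load and store on a miss."""
    key = f"article:{article_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    article = load_article_from_db(article_id)
    cache.set(key, json.dumps(article))  # store the whole serialized object
    return article

cache = FakeCache()
first = get_article(cache, 7)    # miss: reads the database
second = get_article(cache, 7)   # hit: served from the cache
reads_after_hit = stats["db_reads"]
cache.delete("article:7")        # invalidate, e.g. after a new comment
third = get_article(cache, 7)    # miss again
```
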
  24. memcached
      • "Least Recently Used" eviction
      • In a network with several servers, memcached instances can be pooled into a memcache cluster in which the cache is distributed across the nodes (each key lives on one node; stock memcached does not replicate data)
      • memcached runs on Linux and Windows, and can be started wherever there is free RAM
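
"Least Recently Used" means that when the cache is full, the entry that has gone longest without being read or written is evicted first. A minimal sketch of the idea, with one simplifying assumption: capacity is counted in items here, whereas memcached is limited by memory (the `-m` option).

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: the least recently used key is evicted when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # reading counts as a use
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used

lru = LRUCache(capacity=2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")     # "a" is now more recent than "b"
lru.set("c", 3)  # cache is full, so "b" is evicted
```
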
  25. Session Clustering
  26. Load Balancing Revisited
  27. Session Clustering
      • Store in common filesystem
        • Not useful in multi-server environments
        • NFS will cache pages
      • Store in database
        • Very fast because you are only ever looking up primary keys
        • Make sure the DB has row locking (InnoDB), not table locking
      • Store in memcached
        • Stored across several machines rather than just one
        • A total machine failure now affects only a percentage of users rather than everyone
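
The last point can be checked with a short simulation. It assumes sessions are placed on memcached nodes by hashing the session id, as many client libraries do; the node names and session ids are made up for the example.

```python
import hashlib

def node_for(session_id, nodes):
    """Pick the node that stores a session, by a stable hash of its id."""
    digest = hashlib.md5(session_id.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

nodes = ["mc1", "mc2", "mc3", "mc4"]               # hypothetical nodes
sessions = [f"sess-{i}" for i in range(10000)]     # hypothetical session ids

failed = "mc2"  # assume this machine dies completely
lost = sum(node_for(s, nodes) == failed for s in sessions)
lost_fraction = lost / len(sessions)  # roughly one node's share of the users
```

With four nodes, roughly a quarter of the sessions are lost and those users have to log in again; everyone else is unaffected.
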
  28. Content Delivery Network
      • A collection of web servers distributed across multiple locations to deliver static content more efficiently to users
      • The server selected for delivering content to a specific user is typically based on a measure of network proximity
  29. Multiple Codebases
      • If the server and site architecture allow it, interesting things can be done by running different code on different servers
      • Using a reverse proxy, 10% of visitors can be sent to a 2.0 beta version of the site so you can observe how they interact with it
      • If things do not go as planned, you revert to the original code, and only 10% of visitors were affected
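
One way to pick the 10% is to hash a stable visitor identifier into 100 buckets, so the same visitor always sees the same version. This is a sketch of the routing decision only; the visitor ids and backend names are hypothetical, and in practice the rule would live in the reverse proxy configuration.

```python
import hashlib

def bucket(visitor_id):
    """Map a visitor to a stable bucket in the range 0..99."""
    digest = hashlib.md5(visitor_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100

def choose_backend(visitor_id, beta_percent=10):
    """Send visitors in the lowest buckets to the beta, everyone else to stable."""
    return "beta" if bucket(visitor_id) < beta_percent else "stable"

visitors = [f"visitor-{i}" for i in range(10000)]  # hypothetical ids
beta_share = sum(choose_backend(v) == "beta" for v in visitors) / len(visitors)

# Rolling back is a one-parameter change: no visitor is routed to the beta.
after_rollback = {choose_backend(v, beta_percent=0) for v in visitors}
```
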
  30. Case studies: highscalability.com
  31. LAMP, Shards, Memcached, Squid, Smarty, ImageMagick
  32. • More than 4 billion queries per day
      • ~35M photos in squid cache (total)
      • ~2M photos in squid's RAM
      • ~470M photos, 4 or 5 sizes of each
      • 38k req/sec to memcached (12M objects)
      • 2 PB raw storage (consumed ~1.5 TB on Sunday)
      • Over 400,000 photos being added every day
  33. • Debian Linux, Apache, PHP, MySQL
      • memcached
      • MemcacheDB: distributed key-value storage system which conforms to the memcache protocol → 15,000 writes/second, 64,000 reads/second
      • Lots of servers
  34. • 26 million uniques a month
      • 30 million users
      • Uniques are only half that traffic. Traffic = unique web visitors + APIs + Digg buttons
      • 2 billion requests a month
      • 13,000 requests a second, peak at 27,000 requests a second
  35. • Data are separated into separate clusters: User Actions, Users, Comments, Items, etc.
      • Asynchronous queuing architecture for near-term processing
  36. Amazon Web Services
  37. Simple Storage Service (S3)
      • Cloud storage service
      • Servers in the US / Europe
      • REST API
      • Storage: $0.150 / GB
      • Upload: $0.100 / GB
      • Download: $0.170 / GB
      • Twitter uses S3 for user pictures
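
As a worked example of what these prices mean, the sketch below estimates a monthly bill. The three rates are the ones on the slide; treating the storage rate as per GB per month, and the usage figures themselves, are assumptions made for illustration.

```python
# Rates quoted on the slide, in USD per GB.
STORAGE_PER_GB = 0.150   # assumed to be per GB stored per month
UPLOAD_PER_GB = 0.100
DOWNLOAD_PER_GB = 0.170

def monthly_cost(stored_gb, uploaded_gb, downloaded_gb):
    """Estimate one month's bill from storage and transfer volumes."""
    total = (stored_gb * STORAGE_PER_GB
             + uploaded_gb * UPLOAD_PER_GB
             + downloaded_gb * DOWNLOAD_PER_GB)
    return round(total, 2)

# Hypothetical site: 100 GB stored, 10 GB uploaded, 500 GB downloaded.
example = monthly_cost(100, 10, 500)  # 15.00 + 1.00 + 85.00
```

In this example the download traffic, not the storage, dominates the bill.
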
  38. Elastic Compute Cloud (EC2)
      • On-demand server instances
      • In 5 minutes you can start a server on which you have root access
      • $0.10 / hour, 99.95% guaranteed uptime: about 4 hours of downtime per year
      • Static IP addresses can be allocated and complex architectures built
      • Fast access to S3
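
The "4 hours per year" figure follows directly from the uptime guarantee, and the hourly rate converts to a monthly cost in the same way. A small check, assuming a 365-day year and a 30-day month:

```python
HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours_per_year(uptime_percent):
    """Hours per year a service may be down at a given uptime guarantee."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)

def monthly_instance_cost(hourly_rate, days=30):
    """Cost of running one instance around the clock for a month."""
    return hourly_rate * 24 * days

downtime = downtime_hours_per_year(99.95)  # 8760 * 0.0005, about 4.38 hours
cost = monthly_instance_cost(0.10)         # 0.10 * 24 * 30, about 72 USD
```
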
  39. SimpleDB
      • Distributed hash DB
      • Allows SELECTs with conditions
      • Queries are limited to 5 seconds
  40. thank you, come again
