Migrating LAMP to a Distributed Environment, by Paul Bogdashkin
True North PHP conference 2012


A company can grow rapidly on the strength of a successful idea. To sustain that success, its infrastructure has to grow and evolve into a large-scale or enterprise-grade Web application. This means that a distributed system has to be built to accommodate high availability and high traffic requirements.
This talk covers scaling a simple PHP application that resides in a LAMP environment (a stand-alone server is a common setup) to a solution with redundant load balancers and distributed web and database servers. The most common problems, both in scaling the system and in scaling the PHP application, are covered with fully defined solutions, hints and tricks, and best practices. Real-world examples show how the big players (Flickr and Wikipedia, to name two) handled certain scenarios and overcame major obstacles.


  1. Migrating LAMP to a Distributed Environment, by Paul Bogdashkin
  2. Contact Paul: https://joind.in/talk/view/7460, paul.bogdashkin@ieee.org
  3. Best Interview Answer. Hiring Manager: “For this job, we need someone responsible.” Applicant: “I’m the one you want. On my last job, every time anything went wrong, they said I was responsible.”
  4. Roadmap. Distributed System: Distributed System Defined; Scaling Up vs. Scaling Out; Building a Distributed System. Load Balancing: Load Balancer Types; DNS Round Robin; DFP and Connection Tracking; Server Health; Load Balancing Algorithms; Problems and Solutions. Scaling MySQL: Replication; Sharding; MySQL Clustering; MySQL Cluster Usage. Real World Examples.
  5. Distributed System Defined. A distributed system is a collection of independent computers that appears to its users as a single coherent system. However, just because it is possible to build a distributed system does not mean that it is a good idea. A single stand-alone server can handle 500,000 page impressions per month. The infrastructure of a site exceeding 1 million page visits per month needs to be scaled.
  6. Scaling Up vs. Scaling Out. Scaling up, or vertical scaling, refers to upgrading an existing server by adding more resources (RAM, CPU, disk space, etc.). Scaling out, or horizontal scaling, means building a system with two or more servers where the load is distributed (almost) evenly and the distribution is transparent to the end user.
  7. Building a Distributed System. To build a scalable infrastructure, the system has to support high availability and high traffic requirements. To do so, several layers of redundancy or replication must be present, and the overall performance must be fine-tuned.
  8. Performance Tuning. Caching: APC, Memcache or Redis. Static content: simple HTML pages that do not require server-side processing. Asset servers are used to serve static content. Content Delivery Networks (CDN).
  9. Building a Distributed System: Other Factors to Consider. Exterior network: a fast enough ISP connection. Interior network: very fast between servers. Hardware: virtualization or commodity hardware. Web server: Apache or alternatives like nginx. Database: clustering, replication, NoSQL. Server load: the CPU handles only one instruction at a time; further requests are held in a queue.
  10. Load Balancing
  11. Load Balancing. Load balancing is the capability to balance traffic load across a group of servers. It operates at multiple network layers: layer 2 (MAC), 3 (IP), 4 (TCP) and 5 (HTTP, counting the application layer as layer 5 of the TCP/IP model; it is layer 7 in OSI terms). It provides high availability of the system as a whole. Cost savings: multiple low-cost servers are better than one high-end supercomputer.
  12. Load Balancer Representation
  13. Load Balancer Types. Software: Linux Virtual Server; BalanceNG; HAProxy; Load Balancer Project. Hardware: load balancing functionality is embedded into a router; comes with an enterprise price tag of around $30,000.
  14. DNS Round Robin. The pre-load-balancing era. It assigns multiple IPs to one hostname (URL). Switching is done randomly. Load distribution is uneven. Neither server health nor server load is taken into account.
  15. DFP and Connection Tracking. Dynamic Feedback Protocol: facilitates server-to-load-balancer communication by placing an agent on every server that needs to be monitored. Packet flow lookup and rewrite is done by storing source and destination connection data (interfaces, IPs, protocols, and port numbers). This information is stored in fast-access memory, also known as shortcut tables.
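The connection-tracking idea on this slide can be sketched as an in-memory table keyed by the connection data it lists. This is only an illustration under assumptions: `FlowKey`, `ShortcutTable`, and `route` are hypothetical names, not part of any real load balancer, and a production table would also expire idle flows.

```python
from typing import Callable, Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    """Connection data named on the slide: interface, IPs, protocol, ports."""
    interface: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


class ShortcutTable:
    """Hypothetical in-memory map from a tracked flow to its chosen server."""

    def __init__(self) -> None:
        self._flows: Dict[FlowKey, str] = {}

    def lookup(self, key: FlowKey) -> Optional[str]:
        return self._flows.get(key)

    def remember(self, key: FlowKey, server: str) -> None:
        self._flows[key] = server

    def forget(self, key: FlowKey) -> None:
        # Called when the connection closes; missing keys are ignored.
        self._flows.pop(key, None)


def route(table: ShortcutTable, key: FlowKey,
          pick_server: Callable[[], str]) -> str:
    """Reuse the server already chosen for this flow, else pick and store one."""
    server = table.lookup(key)
    if server is None:
        server = pick_server()
        table.remember(key, server)
    return server
```

Packets of an already tracked connection skip the selection step entirely, which is why the lookup is kept in fast memory.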
  16. Server Health. In-band server health tracking: passive monitoring of packet activity to and from the server, used as an indication of whether the server is active. Out-of-band server health tracking: active probing of servers for specific health information.
  17. Out-of-Band Server Health Tracking. Server availability is determined by a simple ICMP echo request (ping). Application availability is determined by testing that the application is listening on the expected port and that it is responding in the expected manner. Application consistency ensures that application responses do not vary. The initial checksum of return values is
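A minimal sketch of two of the out-of-band checks above. The slide text breaks off after "the initial checksum of return values is", so comparing each response against a stored baseline checksum is one plausible reading and an assumption here, as are the function names and the choice of SHA-256.

```python
import hashlib
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Application availability: can a TCP connection be opened to the port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def checksum(body: bytes) -> str:
    """Checksum of a probe response (SHA-256 is an arbitrary choice here)."""
    return hashlib.sha256(body).hexdigest()


def response_is_consistent(body: bytes, baseline: str) -> bool:
    """Application consistency: does the response still match the baseline?

    `baseline` is assumed to be the checksum recorded from a known-good
    response when the server was first put into rotation.
    """
    return checksum(body) == baseline
```

A probe loop would call `port_is_listening` first and only fetch and checksum a response when the port answers.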
  18. Load Balancing Algorithms. Round Robin: requests are distributed in sequential order across the available servers in the list. Weighted Round Robin: weights are assigned to each server to spread the load more evenly; servers with a higher weight receive more load. Least Connection: using a connection table, look up the server with the fewest concurrent connections. Weighted Least Connection: the number of active connections is divided by the server weight, and the server with the lowest ratio is selected.
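The four algorithms on this slide can be sketched in a few lines each. The function names and the dictionaries of weights and connection counts are illustrative assumptions; the weighted round robin below simply repeats each server by its weight, which is the naive variant rather than the smoothed scheduling some load balancers use.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Hand out servers in list order, wrapping around at the end."""
    return cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Naive variant: each server appears in the rotation `weight` times."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


def least_connection(active: Dict[str, int]) -> str:
    """Pick the server with the fewest concurrent connections."""
    return min(active, key=lambda name: active[name])


def weighted_least_connection(active: Dict[str, int],
                              weights: Dict[str, int]) -> str:
    """Pick the server with the lowest connections-to-weight ratio."""
    return min(active, key=lambda name: active[name] / weights[name])
```

With 6 connections on a weight-3 server and 4 on a weight-1 server, the weighted variant picks the weight-3 server (ratio 2 versus 4), even though it currently holds more connections.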
  19. Problems: What Will Not Work Straight Out of the Box
  20. Problems and Solutions. Sessions: load balancer session management; centralized session server; asynchronous session state management. SSL: use one SSL certificate; enable SSL acceleration; no need for SSL between servers in the DMZ. Geographic distribution: use local ISPs' DNS.
  21. Scaling MySQL
  22. Scaling MySQL: Replication. Performance: the load of database queries can be spread across several database servers, offloading the primary database. Geographic diversity: databases can be placed around the world. Redundancy and backup: if any machine fails, rolling over to another server can be done with almost zero downtime. Replication is storage engine independent (InnoDB and MyISAM can work closely together). Replication is asynchronous, which means that the slave server is not guaranteed to have the data at the moment the master performs the change.
  23. Distributed Read Replicas. The most common type of replication: read and write operations are segregated. Perfect for applications with an 80/20 read/write ratio.
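Read/write segregation is usually decided in the application or a proxy before the query is sent. The sketch below shows the idea under assumptions: the server names are placeholders, the statement classification is deliberately crude, and a real router would also send reads that must see just-written data to the primary, because replication is asynchronous.

```python
import random
from typing import List, Sequence

# Statements treated as reads in this sketch; everything else goes to the primary.
READ_PREFIXES = ("SELECT", "SHOW")


def is_read_query(sql: str) -> bool:
    """Crude classification by the first keyword of the statement."""
    return sql.lstrip().upper().startswith(READ_PREFIXES)


def choose_server(sql: str, primary: str, replicas: Sequence[str]) -> str:
    """Send reads to a random replica and everything else to the primary."""
    if replicas and is_read_query(sql):
        return random.choice(list(replicas))
    return primary


if __name__ == "__main__":
    replicas: List[str] = ["db-replica-1", "db-replica-2"]  # hypothetical hosts
    print(choose_server("SELECT * FROM users", "db-primary", replicas))
    print(choose_server("UPDATE users SET name = 'x'", "db-primary", replicas))
```

Falling back to the primary when the replica list is empty keeps the application working if every replica is taken out of rotation.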
  24. Sharding. Sharding, sometimes referred to as a type of partitioning, is the division of one large database across a series of smaller MySQL servers. The application must then know where to retrieve the data, based on a hashing algorithm or a directory system that indicates where each fragment of data is located.
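A hashing approach to locating a shard can be as small as the sketch below. The shard host names and the `shard_for` helper are assumptions for illustration; SHA-256 is used only because it gives a stable, evenly spread value, not because MySQL requires it.

```python
import hashlib

# Hypothetical shard servers; in practice these would be connection settings.
SHARDS = ["shard0.db.example", "shard1.db.example",
          "shard2.db.example", "shard3.db.example"]


def shard_for(key: str, shard_count: int) -> int:
    """Map a record key to a shard index with a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to keep the mapping repeatable.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    index = shard_for("user:42", len(SHARDS))
    print("user:42 lives on", SHARDS[index])
```

With plain modulo hashing, changing the number of shards remaps most keys, so the shard count is best settled before data is distributed.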
  25. MySQL Clustering. The cluster concept refers to grouping multiple servers together to behave as one, either to enhance the performance of the system as a whole or to provide redundancy and fail-over. The main difference from replication is that it is a shared-nothing, partitioned system that uses synchronous replication to maintain high availability and performance. MySQL makes cluster technology available.
  26. MySQL Cluster. High scalability is achieved by automatically sharding tables with complete application transparency. New nodes are added with no downtime. Data is replicated synchronously between the nodes, ensuring that multiple copies of the data are available. Node replacement usually takes less than 1 second. Schemas can be modified on the fly. Overall cost of ownership is low, as it is designed to run on commodity hardware.
  27. MySQL Cluster Usage. Requires a mysqld server that is compiled with the cluster engine. Detailed instructions are on the MySQL website: http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-installation.html To create a table, append the ENGINE=NDBCLUSTER clause to the end of the statement, like so: CREATE TABLE tbl_name (col_name column_definitions) ENGINE=NDBCLUSTER
  28. Real World Examples
  29. Flickr: more than 50 million users, more than 6 billion pictures. Uses MySQL replication, splitting reads and writes. No DNS load balancing: one IP. Heavy front-end caching.
  30. Wikipedia: more than 20 million articles in more than 250 languages. On average 50,000 requests every second. Built using MediaWiki with MySQL. Geographic load balancing. Static file server: lighttpd. Linux Virtual Server. Memcached.
  31. Appendix
  32. Scaling Out MySQL. ACID guarantees a persistent global state as long as proper constraints have been defined. However, the CAP theorem by Eric Brewer states that there are three desirable database characteristics, of which we can only have two. Consistency: every node in the system contains the same data (e.g., replicas are never out of date). Availability: every request to a non-failing node in the system returns a response. Partition Tolerance: system properties (consistency and/or availability) hold even when the system is partitioned and data is lost.
  33. Scaling Out MySQL, Cont'd. For traditional databases, CAP consistency is the Holy Grail: it is maximized at the expense of availability and partition tolerance. When scaling, failures happen: when a process runs a million times a second, a one-in-a-million failure happens every second. CAP consistency is a luxury that must be sacrificed in order to maintain availability. This should be kept in mind when building a truly distributed database system.
  34. Server Selection. Dispatch mode: layer 2, rewrites the MAC address. This is the fastest method of the three. Server NAT mode: layer 3, rewrites IP headers. Used when an application cannot bind RIP (real IP) and VIP (virtual IP) at the same time, or when the NOS cannot handle ARP. Client NAT mode: translates the original client IP address to one from a pool of addresses. Acts as a full proxy; useful for injecting cookies.
  35. Types of Sharding. By application function: core functionality such as logging or static content can reside on a separate shard server (Wikipedia languages). By hash or key: calculate how many shards are going to be needed, and then distribute the data based on a common key such as the primary key of a record in a table. Via a lookup service: a lookup database where data is divided between a main user table and a series of user profile tables stored across multiple shards.
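The lookup-service variant replaces the hash with a directory that records where each user's data lives. The sketch below keeps that directory in a dictionary purely for illustration; in the setup the slide describes it would be the main user table in a lookup database, and `ShardDirectory` and its methods are hypothetical names.

```python
from typing import Dict


class ShardDirectory:
    """Hypothetical lookup service mapping a user ID to the shard holding its profile."""

    def __init__(self) -> None:
        self._location: Dict[int, str] = {}

    def assign(self, user_id: int, shard: str) -> None:
        """Record which shard stores this user's profile tables."""
        self._location[user_id] = shard

    def locate(self, user_id: int) -> str:
        """Return the shard for a user; raises KeyError for unknown users."""
        return self._location[user_id]

    def move(self, user_id: int, new_shard: str) -> None:
        """Repoint a user after their data has been copied to another shard."""
        if user_id not in self._location:
            raise KeyError(user_id)
        self._location[user_id] = new_shard
```

Unlike hashing, a directory lets individual users be moved between shards without remapping everyone else, at the cost of an extra lookup per request.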
  36. What are your questions?
