Computer Networks                                                11/11/2009

Scalable Internet Servers and Load Balancing
Kai Shen
CSC 257/457 - Fall 2009


Internet Online Applications
    Internet online applications
        Applications accessible to online users through the Internet.
    Examples
        Online keyword search engine: Google.
        Web email: Gmail.
        News: CNN, NBC News.
        Web directory: Yahoo!, MSN.
    Scalability requirements
        Many simultaneous user accesses; large amount of hosted data, …
    Internet servers
        Computer systems that host these online applications.




Internet Servers are at the Application Layer
    Normally on the end hosts, involving no routers
    Built on the transport-layer protocols TCP/UDP
    [Figure: end hosts Google, Yahoo!, and CNN attached to the Internet]


Search Engine as An Example: Step 1 – Crawling
    Crawling – get all these Web pages out there:
        First retrieve some root pages;
        Parse their content and follow hyperlinks to retrieve more pages;
        Depth-first search or breadth-first search? Remove duplicates.
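
The crawling steps (retrieve root pages, parse and follow hyperlinks, remove duplicates) can be sketched as below. This is only an illustration, not the method of any real search engine: it picks breadth-first order, and the `crawl` function, the `fetch` callback, and the `a.example` URLs are all hypothetical names. An in-memory dictionary stands in for real HTTP fetches.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the href targets of <a> tags in one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(roots, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    seen = set(roots)          # duplicate removal: each URL is queued once
    frontier = deque(roots)    # FIFO queue gives breadth-first order
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages


# Tiny in-memory "web" standing in for real HTTP fetches (hypothetical URLs).
FAKE_WEB = {
    "http://a.example/": '<a href="/b">b</a> <a href="/c">c</a>',
    "http://a.example/b": '<a href="/c">c</a> <a href="/">home</a>',
    "http://a.example/c": '<a href="/b#top">b again</a>',
}
crawled = crawl(["http://a.example/"], FAKE_WEB.get)
```

Replacing `frontier.popleft()` with `frontier.pop()` turns the queue into a stack and gives a depth-first crawl instead; the `seen` set does the duplicate removal in either case.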







Performance Analysis for Crawling
    What are the resources involved?
        CPU processing for TCP/HTTP protocol handling and the parsing of page content
        writing to disk storage
        network bandwidth to remote web sites
    Assume average page size 10KB
        raw processing power of a single CPU
            1000 requests/sec
        I/O to a single disk
            100 seeks/sec -> up to 100 requests/sec
        network bandwidth from/to the Internet
            T1 link (1.5Mbit/s) -> 12 requests/sec
            T3 link (45Mbit/s) -> 360 requests/sec


Search Engine as An Example: Step 2 – Indexing
    Indexing
        Crawled raw web pages are not easy to search.
        We index them into formats that are easy to search.
    As part of indexing, we need to give each page an ID using a hash function.
        Computer:   Page #123   Page #357   ……
        Networks:   Page #124   Page #468   ……
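
The crawling performance figures can be checked with a short calculation. The sketch below assumes 1KB = 1000 bytes and that the entire link rate carries page payload, which gives a raw upper bound. The slide's link figures (12 and 360 requests/sec) are lower than that bound, and the slide does not state what allowance accounts for the difference. The helper names `raw_link_bound` and `bottleneck` are made up for this example.

```python
PAGE_BITS = 10 * 1000 * 8   # 10KB average page, assuming 1KB = 1000 bytes


def raw_link_bound(link_bits_per_sec):
    """Pages/sec if the whole link rate carried page payload only."""
    return link_bits_per_sec / PAGE_BITS


# Per-resource rates as quoted on the slide (requests/sec).
slide_rates = {"cpu": 1000, "disk": 100, "t1": 12, "t3": 360}


def bottleneck(resources):
    """The crawl rate is capped by the slowest resource in use."""
    name = min(resources, key=lambda r: slide_rates[r])
    return name, slide_rates[name]


t1_raw = raw_link_bound(1.5e6)   # raw bound for a T1 link
t3_raw = raw_link_bound(45e6)    # raw bound for a T3 link
with_t1 = bottleneck(["cpu", "disk", "t1"])
with_t3 = bottleneck(["cpu", "disk", "t3"])
```

With the slide's numbers, a crawler on a T1 link is limited by the network, while on a T3 link the single disk becomes the limiting resource.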




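
The indexing step (give each page an ID with a hash function, then map words such as "Computer" and "Networks" to page IDs) can be sketched as a small inverted index. The slide does not say what is hashed or which hash function is used; hashing the URL with SHA-1 and keeping 8 hex digits is an assumption made here for illustration, as are the names `page_id` and `build_index` and the example URLs.

```python
import hashlib
import re
from collections import defaultdict


def page_id(url):
    """Derive a stable page ID by hashing the URL (assumed scheme)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]


def build_index(pages):
    """Map each lowercase word to the set of page IDs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        pid = page_id(url)
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(pid)
    return index


# Hypothetical crawled pages: URL -> raw text.
pages = {
    "http://a.example/1": "Computer networks and servers",
    "http://a.example/2": "Computer architecture",
}
index = build_index(pages)
```

A query for one keyword is then a dictionary lookup, for example `index["computer"]`, rather than a scan over every raw page.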




Search Engine as An Example: Step 3 – Online Search
    [Figure: Internet -> Firewall -> Web server/Query handler -> Local-area network -> Index server and Page server]
    Scalability, reliability


Partitioning and Replication
    [Figure: Internet -> Firewall/Router -> Web servers/Query handlers -> Local-area network -> Index servers (partition 1), Index servers (partition 2), and Page servers]
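
One way to picture partitioning and replication of the index servers is sketched below. The slides show only the architecture, so the details are assumptions made for illustration: the index is partitioned by word using a CRC32 hash, every write goes to all replicas of a partition, and reads rotate over the replicas. `IndexCluster` is a hypothetical name.

```python
import itertools
import zlib


class IndexCluster:
    """Index split into partitions; each partition is held by several replicas."""

    def __init__(self, num_partitions, replicas_per_partition):
        # Each replica is modeled as a dict: word -> set of page IDs.
        self.partitions = [
            [dict() for _ in range(replicas_per_partition)]
            for _ in range(num_partitions)
        ]
        # One round-robin cursor per partition spreads reads over replicas.
        self._cursors = [
            itertools.cycle(range(replicas_per_partition))
            for _ in range(num_partitions)
        ]

    def _partition_of(self, word):
        return zlib.crc32(word.encode("utf-8")) % len(self.partitions)

    def add(self, word, page):
        # Writes go to every replica so that any one of them can answer.
        for replica in self.partitions[self._partition_of(word)]:
            replica.setdefault(word, set()).add(page)

    def lookup(self, word):
        p = self._partition_of(word)
        replica = self.partitions[p][next(self._cursors[p])]
        return replica.get(word, set())


cluster = IndexCluster(num_partitions=2, replicas_per_partition=3)
cluster.add("computer", 123)
cluster.add("computer", 357)
results = [cluster.lookup("computer") for _ in range(3)]
```

Partitioning lets the index grow beyond what one machine can hold. Replication lets several machines answer queries for the same partition, and a query can still be answered if one replica is lost.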







Load Balancing over Internet Servers
    Popular sites like Google or CNN receive tens or hundreds of millions of hits per day.
    A large number of replicated servers are used at these sites.
    Key question: how to balance client requests over these servers?


Load Balancing on Internet Servers: Technique 1 - DNS Rotation
    [Figure: two clients across the Internet ask "IP address of CNN.com?"; the DNS server for CNN.com answers one with 128.111.1.2 and the other with 128.111.1.3. The web servers for CNN.com (128.111.1.2, 128.111.1.3, 128.111.1.4) sit behind a Firewall/Router.]
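
DNS rotation can be sketched as a name server that rotates its address list after every query, so successive clients are pointed at different web servers. This is a toy model with no real DNS protocol handling; `RotatingDNS` is a hypothetical class and the addresses are the ones shown in the figure.

```python
from collections import deque


class RotatingDNS:
    """Toy authoritative server that rotates the address list on every query."""

    def __init__(self, records):
        self.records = {name: deque(addrs) for name, addrs in records.items()}

    def resolve(self, name):
        addrs = self.records[name]
        answer = list(addrs)   # clients typically try the first address
        addrs.rotate(-1)       # the next query sees the next server first
        return answer


dns = RotatingDNS({"cnn.com": ["128.111.1.2", "128.111.1.3", "128.111.1.4"]})
first_choices = [dns.resolve("cnn.com")[0] for _ in range(4)]
```

The rotation ignores how busy each server is, and a client or resolver that caches an answer keeps using the same address until the cached entry expires.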




Discussions on DNS Rotation
    Advantages
        Requires almost no change to the existing Internet architecture
    Problems
        DNS caching
        Rigid load balancing policy
            can't balance based on runtime load changes
            slow or no adjustment in response to failures


Load Balancing on Internet Servers: Technique 2 – Cooperative Offloading
    [Figure: Internet, Firewall/Router, and the web servers for CNN.com (128.111.1.2, 128.111.1.3, 128.111.1.4)]







Discussions on Cooperative Offloading
    Can be combined with DNS rotation.
    Advantages:
        More flexible policy is possible
        More responsive to runtime workload and server failures (to a certain degree)
    Problems:
        Need software changes on servers
        Longer delay


Cooperative Offloading with TCP Handoff [Pai et al. ASPLOS1998]
    [Figure: Internet, Firewall/Router, and the web servers for CNN.com (128.111.1.2, 128.111.1.3, 128.111.1.4); packets are labeled with the client IP (clt IP), 1.3, and 1.4]
    What does 1.3 do?
    What does 1.4 do?
    Must all packets in a TCP connection be offloaded to the same server?
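
The trade-offs listed for cooperative offloading can be made concrete with a small model. The slides do not spell out the mechanism, so this is one possible reading and not the scheme from the cited paper: a server that has reached a load threshold passes the request to its least-loaded peer that still has capacity. The `Server` class and its `threshold` parameter are hypothetical.

```python
class Server:
    """Web server that may hand a request to a less-loaded peer."""

    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold   # offload once this many requests are active
        self.active = 0
        self.peers = []

    def handle(self, request):
        """Return the name of the server that ends up serving `request`."""
        live = [p for p in self.peers if p.active < p.threshold]
        if self.active >= self.threshold and live:
            target = min(live, key=lambda p: p.active)
            target.active += 1       # the extra hop is the "longer delay" cost
            return target.name
        self.active += 1
        return self.name


s2 = Server("128.111.1.2", threshold=2)
s3 = Server("128.111.1.3", threshold=2)
s4 = Server("128.111.1.4", threshold=2)
s3.peers = [s2, s4]
served = [s3.handle(i) for i in range(4)]
```

The decision uses runtime load, which a fixed DNS rotation cannot do. It also requires this logic to run on every server, which corresponds to the "software changes on servers" problem.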




Cooperative Offloading vs. TCP Handoff
    Software changes on the servers
    Delays


Load Balancing on Internet Servers: Technique 3 – Load Balancing Router
    [Figure: Internet, Firewall/LB Router (128.111.1.1), and the web servers for CNN.com (128.111.1.2, 128.111.1.3, 128.111.1.4); packets are labeled with the client IP (clt IP), 1.1, and 1.2]








More About Load Balancing Router
    How deep do we look into the network protocol stack?
        Network layer (IP)?
        Transport layer (TCP/UDP)?
        Application layer?
    Load balancing policies in LB routers (Goal: transparency, plug-and-play)
        Simple rotation
        Least number of active requests
        Shortest response time


Summary
    Scalable Internet servers
        partitioning
        replication
    Load balancing for Internet servers
        DNS rotation
        cooperative offloading (w. TCP handoff)
        Load balancing router
    Changes required on the components:
        DNS server??
        Web server??
        client??
        router??
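
The three load balancing policies named for LB routers (simple rotation, least number of active requests, shortest response time) can be compared in a few lines. This is a sketch under assumptions: the `Backend` class, the moving-average response time, and the `alpha` weight are illustrative choices and do not describe any particular router product.

```python
import itertools


class Backend:
    """State an LB router might keep for one web server."""

    def __init__(self, addr):
        self.addr = addr
        self.active = 0          # requests currently in flight
        self.avg_response = 0.0  # smoothed response time in seconds

    def record(self, seconds, alpha=0.5):
        """Fold one observed response time into a moving average."""
        self.avg_response = alpha * seconds + (1 - alpha) * self.avg_response


def simple_rotation(backends):
    """Return an iterator that cycles through backends, ignoring load."""
    return itertools.cycle(backends)


def least_active(backends):
    """Pick the backend with the fewest requests in flight."""
    return min(backends, key=lambda b: b.active)


def shortest_response(backends):
    """Pick the backend with the lowest smoothed response time."""
    return min(backends, key=lambda b: b.avg_response)


pool = [Backend("128.111.1.2"), Backend("128.111.1.3"), Backend("128.111.1.4")]
rotation = simple_rotation(pool)
rotated = [next(rotation).addr for _ in range(4)]

pool[0].active, pool[1].active, pool[2].active = 3, 1, 2
pool[0].record(0.2)
pool[1].record(0.4)
pool[2].record(0.1)
```

Simple rotation needs no state about the servers. The other two policies require the router to track connections or measure responses, which ties back to the question of how deep into the protocol stack the router has to look.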



