#lspe: Dynamic Scaling
Shock Absorbers and APIs

Steve Shah
Sr. Director, Product Management
February 21, 2013
Disclaimer

• I’m going to talk about a product.
 ᵒIt’s kind of necessary in order to make this talk useful.
 ᵒBut a lot of you have this product or know someone that does!
 ᵒThe product is pretty cool…
 ᵒIt can also sing and dance.
 ᵒMaking coffee is on the roadmap.
• Sorry.
 ᵒYes, I am marketing scum.
 ᵒNo, I will not do a hard sell.
• My Competition
 ᵒGoogle it. No really… It’s not hard to find them.
 ᵒTheir product has various approaches too. I encourage you to ask them.
What is NetScaler?




             Availability       Performance      Offload   Security




   NetScaler powers some of the world’s largest infrastructures.
1998 to 2012: From Load Balancing to Virtual
Networking

 1998     1999           2002          2003      2005    2006         2008           2009     2011

 L4 SLB   L7 SLB         SSL         SSLVPN     AppFW    ICA         XML VPX         SDX
          GSLB           CMP         RHI        SIP      IPv6        nCore EdgeSight AppFlow
          MUX            DNS                    AAA-TM                               DataStream


          Secret Decoder Ring:
          SLB = Server Load Balancing
          GSLB = Global Server Load Balancing
          MUX = HTTP Multiplexing
          SSL = SSL Acceleration
          CMP = HTTP Compression
          DNS = DNS Load Balancing / Proxy
          RHI = Route Health Injection
          ICA = App Proxy for ICA
          IPv6 = IPv6 Routing, Switching, LB
          XML = XML Security, Routing
          VPX = Virtual NetScaler
          nCore = multi-core scaling
          SDX = Multi-tenant NetScaler
Agenda

• Things That Impact Scalability
• Shock Absorbers
• Out-Scaling
• Your ADC has an API!
Things That Impact Scalability
Touching on a bit of theory…
Load is Not Linear

• There are startup costs for enabling features in an ADC (memory and CPU)
• However, each incremental request takes a small fraction of resources
• As load increases, some global functions can take resources as well
 ᵒE.g., flushing unused IP fragments, running timers, management overhead, etc.
Data Structures and Big O

• I/O, data structures, and string processing are big factors

• The two that get you are data structures and string processing
 ᵒACLs, VLANs, connection table, connection state, persistence table, etc.
 ᵒHTTP request processing and policy execution

• Know your Big O – understand their impact
 ᵒBig O notation is how programmers describe efficiency of algorithms
 ᵒE.g., O(n) vs. O(log n) vs. O(1)
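
To make the O(n) vs. O(1) point concrete, here is a minimal sketch of a connection-table lookup done two ways. It is not how any particular ADC stores connections; the 4-tuple key, the `ESTABLISHED` state string, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# A connection is keyed by (src_ip, src_port, dst_ip, dst_port); hypothetical layout.
ConnKey = Tuple[str, int, str, int]


def lookup_linear(table: List[Tuple[ConnKey, str]], key: ConnKey) -> Tuple[Optional[str], int]:
    """O(n): scan entries until the key matches. Returns (state, comparisons made)."""
    comparisons = 0
    for entry_key, state in table:
        comparisons += 1
        if entry_key == key:
            return state, comparisons
    return None, comparisons


def lookup_hashed(table: Dict[ConnKey, str], key: ConnKey) -> Optional[str]:
    """O(1) on average: a hash table goes straight to the matching bucket."""
    return table.get(key)


def make_keys(n: int) -> List[ConnKey]:
    """Synthetic connections, one source port each (illustrative data only)."""
    return [("10.0.0.1", 1024 + i, "192.0.2.10", 80) for i in range(n)]


if __name__ == "__main__":
    for n in (1_000, 10_000):
        keys = make_keys(n)
        as_list = [(k, "ESTABLISHED") for k in keys]
        as_dict = dict(as_list)
        # Worst case for the scan: the connection we want is the last entry.
        _, steps = lookup_linear(as_list, keys[-1])
        assert lookup_hashed(as_dict, keys[-1]) == "ESTABLISHED"
        print(f"{n} connections: linear scan needed {steps} comparisons")
```

The scan's cost grows with the number of connections, while the hashed lookup stays roughly flat. That is the difference that matters once a table holds many slow connections.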
Shock Absorbers
Coping with Load
Launching v8: The Role of Data Structures

• Story time… launching a major service and what we learned
• Major new roll-out – expected to double the number of servers to handle
• Early testing revealed that large numbers of slow connections are meh
• Invest in your data structures! We cleaned up several core structures
 ᵒAverage connection lookup time driven to near constant time: O(1)
 ᵒStir in a team that dreams in assembly language and can see cache misalignment by glancing at code, and shave another 20% off connection lookup times (absolute times)
• Lesson: drive your apps to good data structures. Drive your vendors to do
  better.
MaxConns and SurgeQ

[Figure: typical server performance curve plotted against incoming load. Callout at the peak: “Peak perf – we want to stay there”.]
MaxConns and SurgeQ


[Figure: the server performance curve again. Callouts: “Queue incoming requests in the ADC” and “Set max conns here”. Result: server stays operating at maximum throughput.]
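
The idea on this slide can be sketched as a toy model: cap the connections sent to the server and park the overflow in a queue on the ADC. This is an assumption-laden illustration, not NetScaler's implementation; the class name `SurgeProtectedServer` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List


class SurgeProtectedServer:
    """Toy model of an ADC in front of one server with a connection cap."""

    def __init__(self, max_conns: int) -> None:
        self.max_conns = max_conns               # cap, chosen near the server's peak
        self.active: List[str] = []              # requests currently on the server
        self.surge_queue: Deque[str] = deque()   # requests parked in the ADC

    def submit(self, request_id: str) -> str:
        """Send the request to the server if there is room, otherwise queue it."""
        if len(self.active) < self.max_conns:
            self.active.append(request_id)
            return "dispatched"
        self.surge_queue.append(request_id)
        return "queued"

    def complete(self, request_id: str) -> None:
        """Free a slot and hand it to the oldest queued request, if any."""
        self.active.remove(request_id)
        if self.surge_queue:
            self.active.append(self.surge_queue.popleft())


if __name__ == "__main__":
    server = SurgeProtectedServer(max_conns=2)
    for i in range(4):
        print(f"r{i}:", server.submit(f"r{i}"))
    server.complete("r0")
    print("active:", server.active, "queued:", list(server.surge_queue))
```

The server never sees more than `max_conns` requests at once, so it stays near its peak instead of sliding down the far side of the curve. Clients pay with some queueing delay rather than with errors.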
Story time:
When 4 Hurricanes Hit
Out-Scaling
The SR-71 Approach: Go Faster
  Treat a collection of NS devices like a grand unified “big” device
   • Single System
    ᵒconfigured and managed as a single logical system
   • Scalable
    ᵒscales with number of devices (distributes work)
   • Fault Tolerant
    ᵒHandles device failure, addition…
   • Dynamic

  The Sheet-metal Test
  Steps:
   • Take a cluster of NS, and an L2 switch.
   • Configure the devices to your liking.
   • Wrap the whole thing with sheet-metal, such that only the network ports remain exposed.
  Test:
  Must be able to configure and use this contraption as if it were just another NS box.
   • connect wires into any visible port(s), create LAGs at will, enable L2 mode, MBF …
   • point GUI to Cluster’s IP and configure away
Clustering

• Create a single system image out of a collection of instances
 ᵒInstances = virtual machines, physical instances, or instances on multi-tenant boxes
• True shared management + data plane (the sheet metal test)
• Shared state for key data structures (persistence, health check, etc.)
• Linear scale by adding instances (up to 32)
• Ability to manage faults with proportional degradation
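
A rough sketch of what “linear scale” and “proportional degradation” mean in numbers. The hash-modulo flow distribution below is a stand-in chosen for brevity, not the scheme a NetScaler cluster actually uses. The node names and the per-node capacity figure are made up.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def owner(flow_id: str, healthy_nodes: List[str]) -> str:
    """Pick a node for a flow with a stable hash (illustrative scheme only)."""
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(healthy_nodes)
    return healthy_nodes[index]


def spread(flows: List[str], healthy_nodes: List[str]) -> Dict[str, int]:
    """Count how many flows land on each healthy node."""
    counts = Counter(owner(flow, healthy_nodes) for flow in flows)
    return {node: counts.get(node, 0) for node in healthy_nodes}


def capacity(per_node_rps: int, healthy_nodes: List[str]) -> int:
    """Ideal linear scale: total capacity is per-node capacity times node count."""
    return per_node_rps * len(healthy_nodes)


if __name__ == "__main__":
    nodes = ["ns1", "ns2", "ns3", "ns4"]          # hypothetical cluster members
    flows = [f"flow-{i}" for i in range(1000)]
    print("4 nodes:", spread(flows, nodes), "capacity", capacity(100, nodes))
    survivors = nodes[:3]                         # ns4 fails
    print("3 nodes:", spread(flows, survivors), "capacity", capacity(100, survivors))
```

Losing one of four nodes leaves every flow with an owner and three quarters of the capacity. That is the proportional part: the cluster gets smaller, it does not fall over.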
• Real-time Analytics
 ᵒBandwidth
 ᵒConnections
 ᵒTop ‘N Requests
 ᵒResponse Time
 ᵒFrequency
• Policy Based Actions
 ᵒCompress
 ᵒCache
 ᵒLog
 ᵒDrop
 ᵒRespond
• Policy Based Traffic Selection
• Decision Feedback loop
Scaling Globally

[Diagram: Active Site and Mirror Site]

• Global Server Load Balancing (GSLB)
 ᵒNetScaler uses DNS to send users to the closest site based on administrator-defined metrics (geography, topology, site performance, availability)
• Route Health Injection (RHI)
 ᵒNetScaler dynamically updates routing tables to direct clients to the active site based on real-time health monitoring of backend infrastructure.
Your ADC Has an API!
API in a Nutshell: Your ADC Has This


API
• Interfaces
 ᵒSOAP
 ᵒRESTful
• Client Toolkits
 ᵒScripting: Perl/PHP/Python/PowerShell
 ᵒOOP: Java/C#/ASP/.NET based
• Policy
 ᵒReverse Call-Out
 ᵒJSON/XML
• Statistics
 ᵒBulk Reporting
 ᵒGranular Reporting
More RESTful - HTTP Status Code

 REQUEST (success case):
   GET http://<nsip>/nitro/v1/config/lbvserver/lbv1

 RESPONSE (success case):
   HTTP 200 OK

 REQUEST (failure case):
   POST http://<nsip>/nitro/v1/config/lbvserver
   Content-Type:application/vnd.com.citrix.netscaler.lbvserver+json

   {"lbvserver":
      {"name":"lbv111", "servicetype":"HTTP"}
   }

 RESPONSE (failure case):
   HTTP/1.0 409 Conflict

   {
       "errorcode": 273,
       "message": "Resource already exists",
       "severity": "ERROR"
   }
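
A minimal sketch of driving the POST above from a script, using only the Python standard library. The URL path, content type, and error fields come from the slide. The helper names (`build_add_lbvserver`, `send`, `describe_result`) are hypothetical, and authentication is left out because the slide does not show it.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple

LBVSERVER_CONTENT_TYPE = "application/vnd.com.citrix.netscaler.lbvserver+json"


def build_add_lbvserver(nsip: str, name: str, servicetype: str) -> urllib.request.Request:
    """Build (but do not send) the POST that creates an lbvserver."""
    body = json.dumps({"lbvserver": {"name": name, "servicetype": servicetype}})
    return urllib.request.Request(
        url=f"http://{nsip}/nitro/v1/config/lbvserver",
        data=body.encode("utf-8"),
        headers={"Content-Type": LBVSERVER_CONTENT_TYPE},
        method="POST",
    )


def send(request: urllib.request.Request) -> Tuple[int, bytes]:
    """Send the request; return (status, body) for success and HTTP errors alike."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as exc:
        # A 409 Conflict lands here; its body carries the JSON error document.
        return exc.code, exc.read()


def describe_result(status: int, body: bytes) -> str:
    """Turn a status code and optional JSON error body into one readable line."""
    if 200 <= status < 300:
        return "ok"
    try:
        error = json.loads(body.decode("utf-8"))
    except ValueError:
        return f"HTTP {status}"
    if not isinstance(error, dict):
        return f"HTTP {status}"
    return (f"HTTP {status}: {error.get('severity')} "
            f"{error.get('errorcode')}: {error.get('message')}")
```

Typical use would be `status, body = send(build_add_lbvserver(nsip, "lbv111", "HTTP"))` followed by `describe_result(status, body)`. Branching on the HTTP status code first, and only then reading the error body, is what makes the RESTful interface easy to script against.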

Example: Using Java
[Code screenshot not captured in this transcript. Callouts on the slide:]
 ᵒIndicate we want “rollback on failure” in this session
 ᵒPrepare 3 lbvservers to be added in one bulk operation
 ᵒPrint results
 ᵒOutput: no attempt to add “lb3” because of rollback behavior
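
The rollback behavior in those callouts can be modelled in a few lines. This is a toy in-memory simulation, not the NITRO Java client: `FakeAdc` and `bulk_add_with_rollback` are hypothetical names. Having the second item fail because it already exists is an assumption, since the slide only says that “lb3” was never attempted.

```python
from typing import Dict, List, Tuple


class FakeAdc:
    """In-memory stand-in for an ADC config store (not a real API client)."""

    def __init__(self) -> None:
        self.lbvservers: Dict[str, str] = {}

    def add_lbvserver(self, name: str, servicetype: str) -> None:
        if name in self.lbvservers:
            raise ValueError("Resource already exists")
        self.lbvservers[name] = servicetype


def bulk_add_with_rollback(adc: FakeAdc, items: List[Tuple[str, str]]) -> Dict[str, str]:
    """Add items in order; on the first failure undo earlier adds and stop."""
    status = {name: "not attempted" for name, _ in items}
    added: List[str] = []
    for name, servicetype in items:
        try:
            adc.add_lbvserver(name, servicetype)
        except ValueError as exc:
            status[name] = f"failed: {exc}"
            # Undo everything this bulk call created, newest first.
            for done in reversed(added):
                del adc.lbvservers[done]
                status[done] = "rolled back"
            break
        added.append(name)
        status[name] = "added"
    return status


if __name__ == "__main__":
    adc = FakeAdc()
    adc.add_lbvserver("lb2", "HTTP")  # assumed pre-existing, so the bulk add collides
    batch = [("lb1", "HTTP"), ("lb2", "HTTP"), ("lb3", "HTTP")]
    for name, result in bulk_add_with_rollback(adc, batch).items():
        print(name, "->", result)
```

After the run, the store holds only the pre-existing `lb2`. The bulk call either lands completely or leaves the configuration as it found it, which is the property you want when automation is making changes at 3 a.m.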
AutoSense and AutoScale
[Animated diagram: Internet, a group of servers each marked “M”, and CloudStack. The build text overlapped on the slide; the recoverable steps are:]
 ᵒNetScaler monitoring engine auto-detects abnormal behavior in servers
 ᵒNetScaler automatically monitors servers: CPU, Memory, Latency, Throughput …
 ᵒNetScaler triggers AutoScale capability
 ᵒCloudStack “auto-provisions” new server instances based on AutoScale policy
 ᵒOn successful AutoScale, CloudStack provides new service resources and descriptions
 ᵒNetScaler is auto-provisioned with new service bindings
 ᵒTraffic is scaled for the newly added servers
Work better. Live better.


Editor's Notes

  • #4 NetScaler enhances the delivery of your customers’ web applications across four principal dimensions (click). These include Availability, Performance, Offload, and Security; all built on a common IT interface and providing an excellent ROI. Within each feature category there are numerous techniques (CLICK) delivered by NetScaler, and I will elaborate on each. Customers gain:
     ᵒ100% application availability via our world-class L4-L7 load balancing capabilities and intelligent service health monitoring features
     ᵒAccelerated application performance by 5x through static and dynamic content caching and compression
     ᵒAn average of 60% in application infrastructure savings through connection pooling and offloading SSL processing from servers; this is especially important for Web 2.0 applications
     ᵒEnd-to-end application security with integrated Access Gateway Enterprise for secure remote access and an application firewall for protection against application layer attacks