Adjusting Carbon Topology to
Match High Availability Scenario
         Requirements

              Afkham Azeez
          Director of Architecture
                WSO2 Inc




About Me
• PMC member Apache Axis, Committer Synapse
  & Web Services
• Member, Apache Software Foundation
• Co-author, Axis2 Web Services
• Director of Architecture, WSO2 Inc
• Blog: http://blog.afkham.org
• Twitter: afkham_azeez




Agenda
• A brief look at the WSO2 platform
• Carbon clustering for availability
• Cost of availability & related topologies




WSO2 Offerings
• WSO2 Carbon
  •   Full platform of servers for deployment on-premise, in private or public cloud
  •   Products share a consistent architecture and core platform services (e.g.
      logging, management, security, identity, caching) through OSGi and the “Carbon
      Core”
  •   Includes ESB, AppServer, Data Services, Governance, Identity, Business
      Process, and more

• WSO2 Stratos
  •   Platform-as-a-Service (PaaS) Foundation
  •   Supports running servers as elastic, metered, billed, multi-tenant with self-service
       • Including all Carbon Servers, PHP, Jetty, and a growing list through a standard Cartridge
         model

• WSO2 StratosLive
  •   http://stratoslive.wso2.com
  •   WSO2’s Public PaaS
  •   An instance of Stratos running in the cloud with all Carbon Servers available
Consistent Architecture
•       Carbon: A consistent set of class-leading enterprise servers
    •         The same products run either on-premise or in the cloud, single-tenant or multi-tenant
    •         Utilize the same Carbon core runtime for a seamless experience
•       Stratos: A cloud platform for enterprise, hybrid and public deployment
    •         Extends the deployment to support full self-service, elastic scaling, metering and
              billing
    •         Supports Carbon and native server runtimes
          •       Including Java and non-Java servers such as Jetty and PHP
          •       Re-uses the same core Carbon architecture to offer core PaaS services including:
                 •     Identity, Logging, File, Relational Storage, Column Storage, Code Deployment, etc
•       Both projects share a common set of OSGi modules and a core runtime
        architecture


WSO2 SOA Platform




WSO2 Carbon




Availability
The degree to which a system, subsystem, or
 equipment is in a specified operable and
 committable state at the start of a mission, when
 the mission is called for at an unknown, i.e., a
 random, time.


Simply put, availability is the proportion of time a
  system is in a functioning condition.
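
The "proportion of time" definition can be turned into arithmetic. The sketch below is illustrative only: the class and method names are made up for this example, and a 365-day year is assumed.

```java
/** Availability as the proportion of time a system is in a functioning condition. */
final class AvailabilityCalc {
    // Assumption: a 365-day year.
    static final double SECONDS_PER_YEAR = 365.0 * 24 * 60 * 60;

    /** availability = uptime / (uptime + downtime) */
    static double availability(double uptimeSeconds, double downtimeSeconds) {
        return uptimeSeconds / (uptimeSeconds + downtimeSeconds);
    }

    /** Downtime budget per year implied by a target availability in [0, 1]. */
    static double allowedDowntimeSecondsPerYear(double targetAvailability) {
        return (1.0 - targetAvailability) * SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        for (double target : new double[] {0.99, 0.999, 0.9995, 0.9999}) {
            System.out.printf("%.2f%% available -> %.1f hours of downtime per year%n",
                    target * 100, allowedDowntimeSecondsPerYear(target) / 3600.0);
        }
    }
}
```

Each additional "nine" cuts the yearly downtime budget by a factor of ten, which is why the cost of the later topologies rises so quickly.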


Availability




Availability




High Availability (HA)
A system that is designed for continuous operation in the
  event of a failure of one or more components. The system
  may display some degradation of service, but will continue
  to perform correctly.


The proportion of time during which the service is
  accessible with reasonable response times should be
  close to 100%.


All single points of failure should be eliminated

HA, CO & CA
• Continuous Operation (CO)
  •   Ability to avoid planned outages.
  •   Hardware and software maintenance is carried out
      while applications remain available to users.
• Continuous Availability (CA)
  •   Combines the characteristics of HA and CO to keep
      the applications running without any noticeable
      downtime
  •   Hot update/ graceful round-robin restart

High Availability Techniques
• Redundancy
  •   Time – retransmit
  •   Data – e.g. parity bits
  •   Processing – e.g. redundant nodes
• Diversity
  •   e.g. Hybrid deployments, do the same thing using
      different implementations
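
As a small illustration of data redundancy, here is a sketch of a single even-parity bit over one byte. The class and method names are hypothetical. A single parity bit detects any odd number of flipped bits but cannot correct them.

```java
/** Data redundancy sketch: one even-parity bit protecting 8 data bits. */
final class ParityDemo {
    /** Parity bit that makes the total number of set bits (data + parity) even. */
    static int evenParityBit(int data) {
        return Integer.bitCount(data & 0xFF) & 1;
    }

    /** True when the 8 data bits plus the parity bit contain an even number of set bits. */
    static boolean isConsistent(int data, int parityBit) {
        return ((Integer.bitCount(data & 0xFF) + parityBit) & 1) == 0;
    }

    public static void main(String[] args) {
        int data = 0b1011;
        int parity = evenParityBit(data);
        int corrupted = data ^ 0b0001; // flip one bit in transit
        System.out.println("intact word consistent:    " + isConsistent(data, parity));
        System.out.println("corrupted word consistent: " + isConsistent(corrupted, parity));
    }
}
```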




How to decide required availability?
• Average throughput (TPS)
• Max throughput (TPS)
• Monetary value of a transaction
• Average loss & max loss per second of
  downtime
• Decide on how much to invest on availability
  based on cost vs. benefit tradeoff
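
The inputs listed above can be combined into a rough cost vs. benefit check. This is a sketch under simple assumptions (loss is proportional to throughput, a 365-day year), and all names and figures are illustrative rather than taken from the talk.

```java
/** Rough cost vs. benefit estimate for investing in availability. */
final class DowntimeCost {
    static final double SECONDS_PER_YEAR = 365.0 * 24 * 60 * 60;

    /** Loss per second of downtime = throughput (TPS) x monetary value of a transaction. */
    static double lossPerSecond(double tps, double valuePerTransaction) {
        return tps * valuePerTransaction;
    }

    /** Expected yearly loss at a given availability, using average throughput. */
    static double expectedYearlyLoss(double availability, double avgTps, double valuePerTransaction) {
        double downtimeSeconds = (1.0 - availability) * SECONDS_PER_YEAR;
        return downtimeSeconds * lossPerSecond(avgTps, valuePerTransaction);
    }

    /** An upgrade pays off when the yearly loss it avoids exceeds its added yearly cost. */
    static boolean worthUpgrading(double currentAvailability, double targetAvailability,
                                  double avgTps, double valuePerTransaction, double addedYearlyCost) {
        double avoidedLoss = expectedYearlyLoss(currentAvailability, avgTps, valuePerTransaction)
                - expectedYearlyLoss(targetAvailability, avgTps, valuePerTransaction);
        return avoidedLoss > addedYearlyCost;
    }

    public static void main(String[] args) {
        // Hypothetical figures: 10 TPS on average, 1.00 currency unit per transaction.
        System.out.printf("yearly loss at 99%%:   %.0f%n", expectedYearlyLoss(0.99, 10, 1.0));
        System.out.printf("yearly loss at 99.9%%: %.0f%n", expectedYearlyLoss(0.999, 10, 1.0));
        System.out.println("worth 1,000,000 per year? " + worthUpgrading(0.99, 0.999, 10, 1.0, 1_000_000));
    }
}
```

Using max throughput instead of average throughput in the same calculation gives the worst-case loss.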


Patching Production Deployments
[Diagram: Patch Distribution Coordinator]
  1. Check patch list
  2. Pull new patch
  3. Push patch (to each node)
Patching Production Deployments
[Diagram: Patch Distribution Coordinator, nodes handled round-robin]
  4. Maintenance mode
  5. Graceful restart
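
Steps 4 and 5 amount to a rolling restart: one node at a time is put into maintenance mode and restarted gracefully, so the rest of the cluster keeps serving requests. The sketch below is a hypothetical illustration. The Node interface and FakeNode class are invented for this example and are not Carbon APIs, and stopping at the first unhealthy node is an assumed policy.

```java
import java.util.ArrayList;
import java.util.List;

/** Rolling (round-robin) restart sketch for applying a patch without a full outage. */
final class RollingPatch {
    /** Hypothetical handle on a cluster node; not a Carbon API. */
    interface Node {
        String name();
        void enterMaintenanceMode(); // stop accepting new requests
        void gracefulRestart();      // finish in-flight requests, then restart
        boolean isHealthy();
    }

    /** In-memory stand-in used only to exercise the sketch. */
    static final class FakeNode implements Node {
        final String name;
        final boolean healthyAfterRestart;
        int restarts;

        FakeNode(String name, boolean healthyAfterRestart) {
            this.name = name;
            this.healthyAfterRestart = healthyAfterRestart;
        }

        public String name() { return name; }
        public void enterMaintenanceMode() { }
        public void gracefulRestart() { restarts++; }
        public boolean isHealthy() { return healthyAfterRestart; }
    }

    /**
     * Restarts nodes one at a time and stops at the first node that does not
     * come back healthy, leaving the remaining nodes untouched and still serving.
     * Returns the names of the nodes that were restarted successfully.
     */
    static List<String> rollOut(List<? extends Node> nodes) {
        List<String> restarted = new ArrayList<>();
        for (Node node : nodes) {
            node.enterMaintenanceMode();
            node.gracefulRestart();
            if (!node.isHealthy()) {
                break;
            }
            restarted.add(node.name());
        }
        return restarted;
    }

    public static void main(String[] args) {
        List<FakeNode> nodes = new ArrayList<>();
        nodes.add(new FakeNode("node-1", true));
        nodes.add(new FakeNode("node-2", true));
        System.out.println("restarted: " + rollOut(nodes));
    }
}
```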
Clustering
• Clustering for scalability


• Clustering for availability




Clustering for scalability




Clustering for availability




    Group Communication Channel/State replication

Carbon Clustering
• Membership types
 •   Static
 •   Dynamic
 •   Hybrid
• Membership modes
 •   Multicast
 •   Well-known address



Static membership
• Predefined members
• Other (non-predefined) nodes cannot join


[Diagram: Static group of members M1, M2, M3, M4, with a new node N outside the group]
Dynamic membership
• No predefined members
• Nodes can join & leave


[Diagram: Dynamic group of members M1, M2, M3, M4, with a new node N joining]
Hybrid membership
•   Some predefined (well-known) members, and some
    dynamic members
•   Nodes can join & leave
•   Membership revolves around the static members
[Diagram: Hybrid group with dynamic members M5, M6, M7 and static members M1, M2, M3, M4; a new node N joins using (IP, Port)]
Multicast based membership management




[Diagram: new node N sends Join (IP, Port) to the group of members M1, M2, M3, M4]
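Well-known Address (WKA) based membership management

[Diagram: Hybrid group with dynamic members M5, M6, M7 and static well-known members WK1, WK2, WK3, WK4; a new node N sends Join (IP, Port) to a well-known member, and the other members are notified]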
Multicast vs. WKA
Multicast                                   WKA

All nodes should be in the same subnet      Nodes can be in different networks

All nodes should be in the same multicast   No multicasting requirement
domain. Multicasting should not be
blocked

No fixed IP addresses or hosts required     At least one well-known IP address or host
                                            required

Failure of any member does not affect       New members can join with some WKA
membership discovery                        nodes down, but not if all WKA nodes are
                                            down

Does not work on IaaSs such as Amazon       IaaS-friendly
EC2

-                                           Requires keepalived, elastic IPs or some
                                            other mechanism for remapping IP
                                            addresses of WK members in cases of
                                            failure
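
The WKA join rule in the comparison (a new member can join while at least one well-known member is reachable) can be sketched as follows. This is not the Carbon clustering implementation: the class name, the address strings and the reachability predicate are placeholders for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** WKA join sketch: a new node needs at least one reachable well-known member. */
final class WkaJoin {
    /** Returns the first reachable well-known member, or empty if all of them are down. */
    static Optional<String> findSeed(List<String> wellKnownMembers, Predicate<String> isReachable) {
        for (String member : wellKnownMembers) {
            if (isReachable.test(member)) {
                return Optional.of(member);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical addresses; a real check would attempt a connection to each one.
        List<String> wka = Arrays.asList("wk1.example.com:4000", "wk2.example.com:4000");
        Optional<String> seed = findSeed(wka, address -> address.startsWith("wk2"));
        System.out.println(seed.map(s -> "join via " + s).orElse("cannot join: all WKA members are down"));
    }
}
```

This is also why the well-known members need keepalived, elastic IPs or a similar remapping mechanism: if every address in the list is unreachable, no new node can join.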
Multicast vs. WKA – how to decide?
• Multicast
  •   Cluster is going to be set up in a network where
      multicasting is allowed
• WKA
  •   Cloud based deployment
  •   Members are distributed across datacenters &
      regions
  •   Multicasting blocked


HTTP Session Replication
• catalina-server.xml
  •   <Cluster className="org.wso2.carbon.core.session.CarbonTomcatSimpleTcpCluster"/>
  •   <Valve className="org.wso2.carbon.tomcat.ext.valves.CarbonTomcatSessionReplicationValve"/>

• web.xml
  •   <distributable/>
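
For a web application marked distributable, session attributes should be serializable so that the container can copy them to other nodes. The sketch below is a stand-alone illustration using plain Java serialization. The Cart class is hypothetical, and the round trip only approximates what a replicating container does with session state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Session state must survive a serialize/deserialize round trip to be replicated. */
final class SessionStateDemo {
    /** Hypothetical session attribute. */
    static final class Cart implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> items = new ArrayList<>();
    }

    /** Serializes the value to bytes and reads it back, as a copy on another node would be built. */
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Cart cart = new Cart();
        cart.items.add("book");
        Cart replica = (Cart) roundTrip(cart);
        System.out.println("replica items: " + replica.items);
    }
}
```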




State Replication
JSR-107/JCache
  A standard Java Caching API for use by developers and a standard SPI ("Service Provider
     Interface") for use by implementers.



      import javax.cache.*;

      …

      CacheManager cacheMgr = Caching.getCacheManager();

      // cacheName and sampleValue are defined elsewhere
      Cache<String, Integer> cache = cacheMgr.getCache(cacheName);
      cache.put("key", sampleValue);
      Integer i = cache.get("key");
State Replication
CarbonContext based API

  Cache cache = CarbonContext.getCurrentContext().getCache();
  cache.put("key", sampleValue);
  Integer i = cache.get("key");


Axis2 Contexts
  Using Axis2 clustering StateManager – axis2.xml
  <stateManager class="org.apache.axis2.clustering.state.DefaultStateManager" enable="true">
Elastic Load Balancer 2.0
• New sysadmin-friendly configuration language
• High performance PassThrough transport
• Tenant-aware load balancing
• Ability to dedicate clusters for tenants (private
  jet mode)
• Improved auto-scaler
  •   Separate IaaS-aware Cloud controller takes care of
      spawning new instances on different IaaSs

Tenant-aware LB




Private Jet mode

• Analogy
  • Economy class
     • no SLA management, only elasticity
  • Business class
     • elasticity plus SLA guarantees
  • Private Jet
     • Guaranteed isolated VMs or machines for a specific
       tenant
     • Still elastically scaled
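
The routing decision behind private jet mode can be pictured as a lookup: tenants with a dedicated cluster go there, everyone else shares the default cluster. This sketch is purely illustrative. The class, the tenant domains and the cluster names are hypothetical and do not reflect the ELB configuration language or its internals.

```java
import java.util.HashMap;
import java.util.Map;

/** Tenant-aware routing sketch: dedicated ("private jet") clusters with a shared fallback. */
final class TenantRouter {
    private final Map<String, String> dedicatedClusters = new HashMap<>();
    private final String sharedCluster;

    TenantRouter(String sharedCluster) {
        this.sharedCluster = sharedCluster;
    }

    /** Reserves a cluster for a single tenant. */
    void dedicate(String tenantDomain, String cluster) {
        dedicatedClusters.put(tenantDomain, cluster);
    }

    /** Picks the cluster that should serve a request for the given tenant. */
    String clusterFor(String tenantDomain) {
        return dedicatedClusters.getOrDefault(tenantDomain, sharedCluster);
    }

    public static void main(String[] args) {
        TenantRouter router = new TenantRouter("shared-cluster");
        router.dedicate("bigcorp.com", "bigcorp-private-cluster");
        System.out.println("bigcorp.com -> " + router.clusterFor("bigcorp.com"));
        System.out.println("startup.io  -> " + router.clusterFor("startup.io"));
    }
}
```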
Private Jet Mode




Topologies
• Single node
• Multi-node with LB
• Multi-node with elasticity using ELB
• Management & worker node separated
• Multi-zone or multi-datacenter deployment
• Multi-region




Single node
[Chart: Availability and Cost, each rated on a scale from LOWEST to HIGHEST]
Primary-secondary
[Diagram: Primary and Secondary nodes. Chart: Availability and Cost, each rated on a scale from LOWEST to HIGHEST]
Primary-secondary, multiple LB
[Diagram: multiple load balancers with keepalived, Primary and Secondary nodes. Chart: Availability and Cost, each rated on a scale from LOWEST to HIGHEST]
Active cluster, multiple LB
[Diagram: multiple load balancers with keepalived, three Active nodes. Chart: Availability and Cost, each rated on a scale from LOWEST to HIGHEST]
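
A back-of-the-envelope way to compare these topologies is to combine per-component availabilities. The sketch below assumes that components fail independently, which real deployments only approximate, and the names and figures are illustrative. Redundant active nodes are down only when all of them are down, while components in series (for example a load balancer in front of the cluster) must all be up.

```java
/** Combined availability of redundant and chained components, assuming independent failures. */
final class ClusterAvailability {
    /** N redundant active nodes: the cluster is down only if every node is down. */
    static double ofActiveCluster(double nodeAvailability, int nodes) {
        return 1.0 - Math.pow(1.0 - nodeAvailability, nodes);
    }

    /** Components in series: the chain is up only if every component is up. */
    static double ofChain(double... componentAvailabilities) {
        double result = 1.0;
        for (double a : componentAvailabilities) {
            result *= a;
        }
        return result;
    }

    public static void main(String[] args) {
        double cluster = ofActiveCluster(0.99, 3);
        System.out.printf("3 active nodes at 99%% each: %.6f%n", cluster);
        System.out.printf("behind a single 99%% LB:     %.6f%n", ofChain(0.99, cluster));
        System.out.printf("behind two redundant LBs:   %.6f%n", ofChain(ofActiveCluster(0.99, 2), cluster));
    }
}
```

The single load balancer caps the whole chain at its own availability, which is the arithmetic behind eliminating every single point of failure.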
Management & Worker Node Separation
•   Proper separation of concerns - management nodes
    specialize in management of the setup while worker nodes
    specialize in serving requests to deployment artifacts
•   Only management nodes are authorized to add new artifacts
    into the system or make configuration changes
•   Worker nodes can only deploy artifacts & read configuration
•   Lower memory footprint in the worker nodes because the
    management console related OSGi bundles are not loaded
• Improved security - management nodes can be behind the
  internal firewall & be exposed to clients running within the
  organization only, while worker nodes can be exposed to
  external clients.
• Isolation of failures
Management & Worker Node Separation
[Chart: Availability and Cost, each rated on a scale from LOWEST to HIGHEST]
Regions & Zones




Stratos 2.0 Architecture




Multi-zone or multi-datacenter Deployment

[Diagram: Cloud Controller with Zone 1 and Zone 2 in Region X. Chart: Availability and Cost, each rated on a scale from LOWEST to HIGHEST]
Multi-region deployment
[Diagram: Region X with Zone 1 and Zone 2, and Region Y with Zone 1 and Zone 2. Chart: Availability and Cost, each rated on a scale from LOWEST to HIGHEST]
Multi-IaaS Deployment


       Cloud Controller




Multiple IaaS (hybrid) Deployment
[Diagram: Private cloud (data center), Amazon EC2 and Rackspace Cloud, each with Zone 1 and Zone 2. Chart: Availability and Cost, each rated on a scale from LOWEST to HIGHEST]
Cost of Availability
[Diagram: topologies arranged as steps, in this order]
• Single node
• Primary-secondary, single LB
• Primary-secondary, with multiple LBs
• Multi-node active cluster, single zone
• Multi-zone
• Multi-region
• Multi-IaaS
HA for the Load Balancer
• Load balancer cluster
• Keepalived
• Elastic IP address
• Round Robin DNS
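
The effect of round-robin DNS across several load balancer addresses can be approximated by a simple rotation. This sketch is illustrative only: the class and the addresses are hypothetical, and real DNS round robin is done by the name server and resolver, not by application code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Rotates through a fixed list of load balancer addresses. */
final class RoundRobin {
    private final List<String> addresses;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobin(List<String> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        this.addresses = new ArrayList<>(addresses);
    }

    /** Returns the next address in rotation; safe to call from multiple threads. */
    String pick() {
        int index = Math.floorMod(next.getAndIncrement(), addresses.size());
        return addresses.get(index);
    }

    public static void main(String[] args) {
        RoundRobin loadBalancers = new RoundRobin(Arrays.asList("10.0.0.1", "10.0.0.2"));
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " -> " + loadBalancers.pick());
        }
    }
}
```

Plain rotation does not skip an address whose load balancer is down, which is one reason it is usually combined with keepalived or elastic IP remapping.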




Monitoring Servers
• Monit
 •   Automatically provides alerts & restarts processes
     when monitored items (e.g. latency) cross
     certain thresholds.
• New Relic
• Nagios
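
The alert-then-restart behaviour described for Monit can be sketched as a small state machine. This is a hypothetical illustration, not how Monit, New Relic or Nagios are implemented or configured. The policy shown (alert on every breach, restart after a number of consecutive breaches) is an assumption.

```java
/** Threshold watchdog sketch: alert on a breach, restart after repeated breaches. */
final class LatencyWatchdog {
    enum Action { NONE, ALERT, RESTART }

    private final long thresholdMillis;
    private final int restartAfter;
    private int consecutiveBreaches;

    LatencyWatchdog(long thresholdMillis, int restartAfter) {
        this.thresholdMillis = thresholdMillis;
        this.restartAfter = restartAfter;
    }

    /** Feed one latency sample per monitoring cycle and get the action to take. */
    Action observe(long latencyMillis) {
        if (latencyMillis <= thresholdMillis) {
            consecutiveBreaches = 0;
            return Action.NONE;
        }
        consecutiveBreaches++;
        if (consecutiveBreaches >= restartAfter) {
            consecutiveBreaches = 0; // start counting again after the restart
            return Action.RESTART;
        }
        return Action.ALERT;
    }

    public static void main(String[] args) {
        LatencyWatchdog watchdog = new LatencyWatchdog(500, 3);
        for (long sample : new long[] {120, 650, 700, 900, 110}) {
            System.out.println(sample + " ms -> " + watchdog.observe(sample));
        }
    }
}
```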




References
Information on tenant-aware load balancing
http://sanjeewamalalgoda.blogspot.com/2012/03/tenant-aware-load-balancer-is-upcoming.html

http://sanjeewamalalgoda.blogspot.com/2012/05/tenant-aware-load-balancer.html




Scaling Stratos
http://srinathsview.blogspot.com/2012/06/scaling-wso2-stratos.html

http://blog.afkham.org/2011/09/how-to-setup-wso2-elastic-load-balancer.html

http://blog.afkham.org/2011/09/wso2-load-balancer-how-it-works.html




Auto-scaler service deployment
http://nirmalfdo.blogspot.com/2012/07/autoscaler-service-deployment.html




Auto-scaler service
http://nirmalfdo.blogspot.com/2012/07/wso2-autoscaler-service-part-i.html



Automatic failover for WSO2 ELB
http://gonesimple.org/2012/09/24/automatic-fail-over-for-wso2-elb/




Questions?




    http://www.flickr.com/photos/oberazzi/


Editor's Notes

  • #13 Fox Mobile, who ran for two years with zero downtime and multiple updates, including a hardware refresh.
  • #21 Membership modes – multicast & WKA. A look at the cluster configuration.
  • #43 Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. By launching instances in separate Availability Zones, you can protect your applications from failure of a single location. Regions consist of one or more Availability Zones, are geographically dispersed, and will be in separate geographic areas or countries. The Amazon EC2 Service Level Agreement commitment is 99.95% availability for each Amazon EC2 Region. Amazon EC2 is currently available in three Regions: the US East (Northern Virginia) Region and the US West (Northern California) Region in the United States, and the EU (Ireland) Region in Europe.