Acquia Managed Cloud:
Highly Available Architecture for Highly Unpredictable Traffic

Kieran Lal, Technical Director, Acquia
Jess Iandiorio, Sr. Director, Cloud Product Marketing, Acquia

January 19th, 2012

Your Drupal Application Life Stages

Set-up/Launch
     Build
     •  Load balancers
     •  Fast page cache
     •  App Servers
     •  Database
     •  File systems
     •  Web servers
     •  App Configuration
     •  HA architecture
     Deploy
     •  Integrated Git/SVN
     •  Drag and drop content management

Production
     Application updates
     •  Drupal App code
     Infrastructure updates
     •  OS
     •  Debugging
     •  Security
     Operations
     •  24X7 monitoring & alerts
     •  Backups
     •  Load testing

Crisis
     Diagnosis
     •  Site failure
     •  Infrastructure failure
     •  Application errors
     Resolution
     •  Resize
     •  Launch new virtual servers
     •  Multi-region failover

Capacity Planning Options

[Chart: users hitting your site, July through December, compared against three capacity-planning options]

     Options
 1   Over Plan: Over Pay
 2   Under Plan: Expect Outages
 3   Acquia Plan: No Failure

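The trade-off between the three options can be illustrated with a small sketch. The monthly demand figures and capacity values below are invented for illustration and are not read from the chart; `fixed_plan` and `elastic_plan` are hypothetical helper names, and the elastic case is idealized (capacity tracks demand exactly).

```python
# Hypothetical monthly peak demand, in arbitrary "server units" (Jul..Dec).
demand = [2, 3, 9, 4, 3, 6]

def fixed_plan(capacity, demand):
    """Return (idle units paid for, units of unmet demand) for a fixed-size plan."""
    idle = sum(max(capacity - d, 0) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    return idle, unmet

def elastic_plan(demand):
    """Idealized elastic plan: capacity is resized to match demand every month."""
    return 0, 0

over_plan = fixed_plan(max(demand), demand)  # option 1: sized for the peak
under_plan = fixed_plan(4, demand)           # option 2: sized for a typical month
elastic = elastic_plan(demand)               # option 3: resized on demand

print("over plan (idle, unmet):", over_plan)
print("under plan (idle, unmet):", under_plan)
print("elastic (idle, unmet):", elastic)
```

In this toy data, sizing for the peak leaves capacity idle in every other month, while sizing for a typical month leaves demand unmet during the spikes.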
Unpredictable Traffic Victims

Events Businesses
     Challenges
     •  Plagued by prior event stats
     •  Failure extends beyond web
     Consequences of failure
     •  Sales (tickets)
     •  Brand Damage
     •  Missed donation opportunities

News/ M&E Organizations
     Challenges
     •  You never know when you’ll be “Huff Po’d”
     •  Time-to-market is critical
     Consequences of failure
     •  Loss of credibility
     •  Readership
     •  Contractual failures per advertising agreements
     •  Impact to the ad sales cycle

High Growth Sites
     Challenges
     •  Lack of experience/skill set
     •  No prior benchmarking data
     Consequences of failure
     •  Missed opportunities
     •  Discouraged users
     •  Loss of confidence

The Framework

1  Planned Successfully: Test early, often
     Profile
     •  Companies that are experienced with resizing exercises
     •  Allocate 3+ weeks for resizing exercises combined with load testing
     •  Don’t underestimate administrative challenges

2  Planned Unsuccessfully: Best Effort Not Enough
     Profile
     •  Companies that plan to handle it themselves but don’t have the “crisis” speed skill set
     •  Web teams that have no prior experience manually scaling servers
     •  Web teams who don’t have a triage plan in place for evaluating application v. infrastructure failures
     •  Companies that are unlucky

3  Unplanned: “Crisis mode”
     Profile
     •  Companies with truly volatile businesses
     •  Mission-critical sites where failure isn’t an option
     •  Web teams that haven’t invested in HA architecture
     •  Web teams that have separate application and infrastructure support

1  Planned Successfully: Test early, often

     Profile
     •  Advance notice
     •  Work with our team to develop a plan and load test it

     Acquia:
     •  Plan development
     •  Provision resources
     •  Continuous monitoring on the day of the event

1  Planned Successfully: The King Center

     The Players
     Customer: The King Center
     Partner: Palantir, Soasta
     Acquia: Sales, Operations, Support

     Triage to Resolution: 3 Weeks

2  Planned Unsuccessfully: Best Effort Not Enough

     Profile
     •  Advance notice
     •  Tried to plan for the “worst case scenario”
     •  Planning fell short of the worst case scenario

     Acquia:
     •  Immediate detection & resolution of infrastructure issues

2  Planned Unsuccessfully: The BRIT Awards

     The Players
     Customer: The BRIT Awards
     Acquia: Support, Operations, Cloud Engineering

     Triage to Resolution: 20 minutes

2  Planned Unsuccessfully: Lilith Fair (RIP)

3  Unplanned: “Crisis mode”

     Profile
     •  No advance notice
     •  Resources not available
     •  Site goes down
     •  Panic

     Acquia:
     •  Triage the issue: code, attack or capacity?
     •  Resolve

3  Unplanned: Mother Jones

     The Players
     Customer: Mother Jones
     Partner: New Eon Media
     Acquia: Operations, Cloud Engineering, Support, Sales

     Triage to Resolution: 2 months (code base, Drupal upgrade)

3  Unplanned: Foreign Policy

     The Players
     Customer: Foreign Policy
     Acquia: Operations, Cloud Engineering, Sales

3  Unplanned: Al Jazeera

     The Players
     Customer: Al Jazeera
     Acquia: Support, Operations, Sales

     Triage to Resolution: 12 Hours

3  Unplanned: Al-Masry

     The Players
     Customer: Al-Masry
     Acquia: Support, Operations

     Triage to Resolution: 1 Day

When Failure is Not an Option

The Acquia Triage Checklist

Determine nature of the problem (10 to 30 minutes)
     •  Check monitoring
     •  Check logs

Mitigate problem (30 minutes to 2+ hrs)
     Code
     •  Roll back or remediate
     Attack
     •  DOS: Block offending IP
     •  DDOS: Bring in DOSarrest
     Resize
     •  Automatic: Server HA, Web/DB failover
     •  Manual:
           -  Clone site for internal testing (Nagios)
           -  Increase size of DB
           -  Faster load balancers
           -  Larger Varnish Page Caching
           -  File system updates (GlusterFS)
           -  Increase web servers

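The "Check logs" and "Block offending IP" steps can be sketched as a simple count of requests per client address. This is a minimal illustration and not Acquia's tooling: the function name `offending_ips`, the threshold, and the sample log lines are all hypothetical, and the sketch assumes each log line begins with the client IP followed by whitespace.

```python
from collections import Counter

def offending_ips(log_lines, threshold):
    """Return client IPs with more than `threshold` requests, busiest first.

    Assumes the first whitespace-separated field of each line is the client IP.
    """
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return [ip for ip, n in counts.most_common() if n > threshold]

# Invented sample data, using addresses reserved for documentation.
sample_log = [
    '203.0.113.7 "GET /"',
    '203.0.113.7 "GET /"',
    '198.51.100.2 "GET /about"',
    '203.0.113.7 "GET /"',
    '203.0.113.7 "GET /"',
]

suspects = offending_ips(sample_log, threshold=3)
print(suspects)
```

A single noisy address points toward the DOS branch of the checklist; load spread across many addresses would need a different response, which is where the checklist brings in a DDOS mitigation service.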
Platform-as-a-Service Stack

World Class Application Support
     24/7 break-fix, Advisory support, Technical account managers,
     Audits: Site, security, performance

Application Network Services
     Search, Spam, Insight, Mobile, Functional testing, Marketing testing,
     Load testing, Runtime reporting

Application Lifecycle Management
     Customized environment, Analyze, Code management, Workflow, Cloud migration

Platform Features
     Low Cost, Flexible, Reliable

Underlying Elastic Technology Stack

Caching Load Balancer
     Page Caching, Load Balancing

Drupal Application Servers
     Web Servers, Drupal Modules, PHP, Caching

Data Services
     MySQL, File Storage, Memcache, Email

Secure Infrastructure
     International Data Centers, Monitoring, Amazon AWS, Backups

Each layer is composed of multiple redundant servers. If one fails, there is little or no downtime.

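The effect of redundant servers in each layer can be estimated with basic probability. The sketch below assumes server failures are independent, which real deployments only approximate, and the 99% per-server availability is a hypothetical figure rather than a number from the slides.

```python
import math

def layer_availability(server_availability, servers):
    """Probability that at least one of `servers` independent servers is up."""
    return 1 - (1 - server_availability) ** servers

def stack_availability(layers):
    """Availability of layers in series: every layer needs at least one server up.

    `layers` is a list of (per-server availability, server count) pairs.
    """
    total = 1.0
    for availability, servers in layers:
        total *= layer_availability(availability, servers)
    return total

# Hypothetical four-layer stack, each server assumed to be 99% available.
single = stack_availability([(0.99, 1)] * 4)     # one server per layer
redundant = stack_availability([(0.99, 2)] * 4)  # two servers per layer

print(f"single: {single:.6f}  redundant: {redundant:.6f}")
```

Under these assumptions, doubling the servers in every layer raises the availability of the whole stack above that of any single server, because a layer fails only when all of its servers fail at once.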
Multi-region replication & failover
For Back-ups across Borders

•  Acquia can deploy instances in any Amazon EC2 region:
      -  US East
      -  US West
      -  Europe
      -  Singapore
      -  Japan
•  Who is this for?
      -  Organizations that see significant risk in hosting their sites
         out of one geographic location

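Failover across regions amounts to choosing the first healthy region from a preference order. The sketch below is an illustration only: `pick_region` is a hypothetical helper, the set of healthy regions is assumed to come from some external health check, and nothing here reflects how Acquia actually implements failover.

```python
# Region names as listed on the slide, in an assumed order of preference.
REGIONS = ["US East", "US West", "Europe", "Singapore", "Japan"]

def pick_region(preferred, healthy):
    """Return the first region in `preferred` that appears in `healthy`, else None."""
    for region in preferred:
        if region in healthy:
            return region
    return None

# Hypothetical health-check result: the primary region (US East) is down.
active = pick_region(REGIONS, healthy={"US West", "Europe"})
print(active)
```

Returning `None` when no region is healthy leaves the caller to decide what to do next, such as serving a static maintenance page.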
Lessons Learned

1  Planned Successfully: Test early, often
2  Planned Unsuccessfully: Best Effort Not Enough
3  Unplanned: “Crisis mode”

How can I be successful?
•  You need elastic infrastructure
•  You need scaling automation
•  You need a team that can do diagnosis
•  You need 24X7 support

Engage Acquia early and often

Conclusion

Acquia won’t let you fail

We have the talent & infrastructure in place to ensure you’re successful

We’ll find the needle in a haystack, and ensure your best day will never be your worst

Predictable outcomes for unpredictable businesses!

For more information about Managed Cloud
•  Check out our website:
   http://www.acquia.com/products-services/acquia-managed-cloud
•  Speak to a Sales rep

Questions
• For more information visit:
   http://www.acquia.com
• Contact us: sales@acquia.com or 888.9.ACQUIA
• Follow us: @acquia

• Comments welcome:
   Jess.iandiorio@Acquia.com
   Kieran.Lal@Acquia.com

Today's webinar recording will be posted to:
   http://acquia.com/resources/recorded_webinars
