How do you manage performance in the cloud, particularly in Platform as a Service (PaaS) environments like Windows Azure or Heroku, where you don't have a "virtual machine" to manage?
Even in Infrastructure as a Service (IaaS) environments like Amazon EC2, there are limits on the tools you can deploy to assist with performance management, troubleshooting, etc. (e.g. you can't deploy promiscuous-mode network sniffing tools in EC2).
James Smith from Adactus will give us an overview of Cloud Services as a whole, and then drill down into some of the issues they have experienced in deploying their "Pulse" Claims Management Solution into the Azure cloud (http://www.pulseclaims.com/home).
Beyond just looking at page-speed performance, he'll talk about the challenges involved in managing SLAs, cloud "support" (or lack of it!), performance troubleshooting and the whole "performance lifecycle".
10. Infrastructure (IaaS)
• Outsource hardware to support operations
– Storage, servers, networking components
• Service provider owns and hosts equipment
• Service provider responsible for
management & maintenance.
12. Platform (PaaS)
• Paradigm for delivering operating systems
and associated services over the Internet
• No downloads or installation
• Google App Engine, Microsoft Windows
Azure, Heroku & Force.com.
13. Software (SaaS)
• Software distribution model in which
applications are hosted by a vendor or
service provider
• Made available to customers over the
Internet
• Salesforce.com and many, many more.
15. • “Virtualised” infrastructure operated for a
single organisation (single tenant)
• Hosted internally or externally
• Managed internally or by a third-party
• Can be secured to meet compliance requirements
• More expensive, less flexible.
Private Cloud
16. • Service provider makes resources available
to the general public over the Internet
– Compute, Storage, O/S, Applications
• May be free or pay-per-usage model
• Fast deployment, short commitments
• Shared services, less control.
Public Cloud
17. • Core platform on private cloud
• Burstable capability into public cloud
• Brings the best of both private and public
• Brings the problems of both private and public.
Hybrid
18. THE COST OF POOR CLOUD PERFORMANCE
Financial and customer satisfaction
20. Cost
• Compuware survey suggests large business
losses can exceed £500k due to poor cloud
performance
• 57% of European IT Directors believe that
they can’t manage cloud application
performance
• You still have to deliver 2-second response
times.
21. Performance
• 50% of ops teams have suffered more than
one P-1 performance issue in the cloud
• 33% experience a P-1 issue every month
• 60% of incidents took more than 2 hours to
resolve
• Good luck webops (cloudops).
Source: AppDynamics
23. Performance Challenges
• Traditional
• Connectivity
– Bandwidth / Latency
• Bottlenecks
– CPU, IO, Database
• Contemporary
• Bigger scale
– More stuff
• Shared infrastructure
– Not your stuff (entirely).
24. Traditional
• Connectivity
• Latency, jitter & packet loss
• Bandwidth limitations
• Users demand fast access to data
• Bottlenecks
• Will still occur!
• Virtualised hardware
– Host Contention
– Storage.
25. Contemporary
• Bigger Scale
• 10s, 100s, 1,000s, 10,000s of servers
– VM Sprawl
• Dynamically allocated physical resource
• Over-provisioning
• Hidden billing costs
• Shared Resources
• Room for one more?
• Deal with other people’s problems
– DDoS, general stupidity?
– Mi casa es tu casa (my house is your house).
26. • Elasticity
– Planned (scheduled/controlled scaling)
– Unplanned (auto-scaling)
• Global distribution
– Data Centres
– Data
• Less Control.
Paradigm Shift
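The planned/unplanned distinction can be illustrated with a small scaling-decision sketch. It assumes CPU utilisation (as an integer percentage) is the scaling signal, a hypothetical 60% target, and a known daily peak window; the function names, thresholds and window are illustrative, not any provider's auto-scaling API.

```python
from datetime import time


def desired_instances(current: int, cpu_percent: int, target_percent: int = 60,
                      min_instances: int = 2, max_instances: int = 20) -> int:
    """Unplanned (auto) scaling: resize the pool so average CPU moves toward the target."""
    if current <= 0:
        return min_instances
    # Integer ceiling division: ceil(current * cpu_percent / target_percent), exact.
    wanted = -(-current * cpu_percent // target_percent)
    # Clamp so a runaway signal cannot over-provision (and over-bill) without limit.
    return max(min_instances, min(max_instances, wanted))


def scheduled_floor(now: time, peak_start: time, peak_end: time,
                    peak_floor: int, base_floor: int) -> int:
    """Planned scaling: hold a higher minimum during a known busy window."""
    return peak_floor if peak_start <= now < peak_end else base_floor


def target_capacity(current: int, cpu_percent: int, now: time) -> int:
    """Combine both: never drop below the scheduled floor for this time of day."""
    # Hypothetical peak window (an evening rush) and hypothetical floors.
    floor = scheduled_floor(now, time(17, 0), time(20, 0), peak_floor=8, base_floor=2)
    return max(floor, desired_instances(current, cpu_percent))
```

The scheduled floor covers the predictable peaks you can plan for, while the utilisation rule reacts to the ones you cannot; `max_instances` is a simple guard against unbounded over-provisioning.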
30. • Adactus Food Ordering Platform
• Transacts
– > 7 million orders & > US$100M a year
– 30% of daily orders taken in 1 hour
• Adopted as the eCommerce platform for Pizza Hut and KFC globally.
Application
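As a rough illustration of why the peak hour dominates capacity planning, the figures above can be turned into average rates. This assumes the lower bound of 7 million orders spread evenly over 365 days; real traffic inside the peak hour will be burstier than these averages.

```latex
\begin{aligned}
\text{orders/day} &\approx \frac{7\,000\,000}{365} \approx 19\,178 \\
\text{peak-hour orders} &\approx 0.30 \times 19\,178 \approx 5\,753 \\
\text{peak-hour rate} &\approx \frac{5\,753}{3\,600\ \text{s}} \approx 1.6\ \text{orders/s} \\
\frac{\text{peak-hour rate}}{\text{daily-average rate}} &= \frac{0.30 / 1\ \text{h}}{1 / 24\ \text{h}} = 7.2
\end{aligned}
```

In other words, the platform has to be sized for roughly seven times its average daily order rate, before accounting for spikes within the hour.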
31. Platform
• Private
• Global instances all deployed on private clouds
• VMware ESX hosts
– V-Webs
• Dedicated / non-virtualised SQL
• Public
• Rackspace public cloud
• On-Demand
– Load Balancers
– Web Servers
– SQL Servers
• High-scale, high-volume.
32. • Big Scale
– A lot more to manage
• Virtual Platform
– Contention
• End-to-End Application Performance
Management.
Challenges
36. • Adactus Pulse
• Claims management solution for the
insurance industry delivered as SaaS
• Processed over a million claims
• Deployed for ISS and Aviva.
Application
37. Platform
• Deployed into Windows Azure Platform
– Web Roles
– Worker Roles
– SQL Azure
– SQL Azure Reporting Services
• Upgrade of traditional ASP.NET application
• Continuous Deployment Process.
38. Challenges
• Disproving the “shared resource” impact
– Is it the infrastructure?
• Database performance is a black-box
– Limitations and more limitations
• Getting performance data is hard work
– Not easy to access, dispersed everywhere
• Baseline performance is not linear.
45. • Service provider takes responsibility for
installing and maintaining the database.
• Amazon (MySQL)
• Microsoft SQL Azure
• Google App Engine Datastore
• CouchDB, MongoDB.
Overview
46. Challenges
• Most service providers are having
performance issues (even Google!)
• Database is a (performance) black-box
– You will find limitations
• Need to handle transient connections
– Your database will be there, but not always.
47. Solutions
• Do as much tuning outside of the cloud as possible
• Instrument your data access
• DB sharding becomes viable (and easy)
• Build connection resiliency into your data framework.
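A minimal sketch of instrumented, resilient data access, assuming a Python data layer: a decorator that times every attempt and retries transient connection failures with capped exponential backoff and jitter. `TransientDbError` is hypothetical; map it to whatever transient error your driver actually raises, and only retry idempotent operations.

```python
import random
import time
from functools import wraps


class TransientDbError(Exception):
    """Hypothetical stand-in for a driver's transient/connection error."""


def resilient(max_attempts: int = 4, base_delay: float = 0.2,
              max_delay: float = 5.0, sleep=time.sleep):
    """Retry transient DB failures with capped exponential backoff and full jitter,
    recording the duration and outcome of every attempt."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                start = time.perf_counter()
                try:
                    result = func(*args, **kwargs)
                except TransientDbError:
                    elapsed_ms = (time.perf_counter() - start) * 1000
                    wrapper.timings.append((attempt, elapsed_ms, False))
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error to the caller
                    # Exponential backoff capped at max_delay; jitter avoids synchronised reconnects.
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    sleep(random.uniform(0, delay))
                else:
                    elapsed_ms = (time.perf_counter() - start) * 1000
                    wrapper.timings.append((attempt, elapsed_ms, True))
                    return result
        wrapper.timings = []  # (attempt, elapsed_ms, succeeded) for every attempt
        return wrapper
    return decorator
```

In production the `timings` list would be replaced by a call into your metrics or APM tooling; recording failed attempts as well as successes shows when the database is "there, but not always".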
48. • On-premises databases
– Are you sure?
• Could you be about to create your own data storm?
– Too much on-premises data
– Too little bandwidth.
Caution
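Whether on-premises data will cause a "data storm" is largely bandwidth arithmetic. A sketch under stated assumptions: decimal units (1 GB = 8,000 megabits), and an `efficiency` factor for protocol overhead and contention whose 0.8 default is a placeholder, not a measured value.

```python
def transfer_hours(data_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_gb (decimal gigabytes) over a bandwidth_mbps link."""
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits in decimal units
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


def fits_window(data_gb: float, bandwidth_mbps: float, window_hours: float,
                efficiency: float = 0.8) -> bool:
    """Does a transfer of data_gb finish inside the available window?"""
    return transfer_hours(data_gb, bandwidth_mbps, efficiency) <= window_hours
```

With these hypothetical figures, a 500 GB sync over a 100 Mbit/s link needs roughly 14 hours at 80% efficiency, so it would not fit an 8-hour overnight window.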
50. Overview
• Adactus Pulse
– Delivered on a SaaS Model
• We consume SaaS (heavily)
– CRM, Performance, Google Apps, Wiki, Bug Tracking, Testing, Accounting, Planning & Forecasting, Document Management, CMS, Exception Handling, Business Intelligence, Deployment, APM, Collaboration, HRM, ERP and more.
51. Challenges
• Consumer
• Good news
– Performance is out of your control!
• Bad news
– Performance is out of your control!
• Provider
• Expectations are high!
– Response times
• Performance is still king!
– Competitors
– Repeat use.
52. Real User Monitoring
• Consumer
• It’s your new best friend
• Get to know your SLA
– It’s your new best friend
• Simple rules
– Be the first to know
– Get your money back
• Provider
• It’s your new best friend
• You will live & die by your SLAs
• Simple rules
– Be the first to know
– Tell your customers.
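"Be the first to know" can be as simple as checking a high percentile of real-user timings against the 2-second response-time target from earlier in the deck. A minimal sketch, assuming you already collect per-page response times in milliseconds from a RUM tool; the nearest-rank 95th percentile and the names used are illustrative choices.

```python
from dataclasses import dataclass


def percentile(samples_ms: list[float], p: int) -> float:
    """Nearest-rank percentile (p in 1..100) of real-user timing samples."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p * n / 100) with integer maths
    return ordered[rank - 1]


@dataclass
class RumCheck:
    percentile_ms: float
    threshold_ms: float
    breached: bool


def check_rum(samples_ms: list[float], threshold_ms: float = 2000, p: int = 95) -> RumCheck:
    """Flag when the p-th percentile of real-user response times exceeds the target."""
    value = percentile(samples_ms, p)
    return RumCheck(value, threshold_ms, value > threshold_ms)
```

A percentile rather than a mean is used so that a slow tail of real users is not hidden by many fast requests.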
57. Service-Level Agreements
• Critical element for both provider and
consumer
• Don’t waste time on detailed numerical
service level agreements
• SLAs need to be based on end-user
experience.
58. Service-Level Agreements
1. Establish system availability
2. Establish system response time
3. Establish error resolution time
4. Establish a failover window for disaster recovery
5. Ensure that you can get your data back.
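The first four items can be written down as measurable terms and checked each reporting period. A sketch under stated assumptions: the field names, units and example thresholds are hypothetical, and the response-time figure is assumed to come from end-user (RUM) measurements rather than server-side timings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    availability_pct: float   # 1. system availability
    response_ms: float        # 2. end-user response time (e.g. a RUM percentile)
    resolution_hours: float   # 3. error resolution time
    failover_minutes: float   # 4. DR failover window
    # 5. "get your data back" is a contractual/export check, so it is not modelled here.


@dataclass(frozen=True)
class Observed:
    availability_pct: float
    response_ms: float
    worst_resolution_hours: float
    last_failover_minutes: float


def sla_breaches(sla: Sla, obs: Observed) -> list[str]:
    """Return the names of the SLA terms the observed period failed to meet."""
    breaches = []
    if obs.availability_pct < sla.availability_pct:
        breaches.append("availability")
    if obs.response_ms > sla.response_ms:
        breaches.append("response_time")
    if obs.worst_resolution_hours > sla.resolution_hours:
        breaches.append("error_resolution")
    if obs.last_failover_minutes > sla.failover_minutes:
        breaches.append("failover_window")
    return breaches
```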
59. Service-Level Agreements
• IaaS
– The O/S is your responsibility
• Managed Cloud Platforms are available
• PaaS
– SLAs stop at the O/S
• Your application still remains your responsibility
• SaaS
– Know your SLA inside out. It’s your responsibility.
60. Disaster Recovery
• It’s hard in the cloud
• DR strategies are still emerging
• Bandwidth & network capacity limits
• Security is still a concern.
61. Disaster Recovery
• There isn’t a single blueprint
• Identify critical resources and recovery
methods
• Architect for redundancy
• Back up to/from and restore to/from the cloud
• Most cloud SLAs offer > 99.5% availability
– Up to about 3 hours, 39 minutes of downtime in an average month.
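The downtime an availability figure permits follows directly from the percentage. A minimal sketch, assuming an average month of 365.25 × 24 / 12 = 730.5 hours; a 30-day month gives slightly less.

```python
HOURS_PER_AVERAGE_MONTH = 365.25 * 24 / 12  # 730.5 hours


def allowed_downtime_minutes(availability_pct: float,
                             period_hours: float = HOURS_PER_AVERAGE_MONTH) -> float:
    """Minutes of downtime that an availability percentage permits over a period."""
    return (100 - availability_pct) / 100 * period_hours * 60


def as_hours_minutes(minutes: float) -> str:
    """Format minutes as 'H hours, M minutes' (rounded to the nearest minute)."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours} hours, {mins} minutes"
```

For 99.5% this gives 219.15 minutes, i.e. 3 hours, 39 minutes in an average month (3 hours, 36 minutes in a 30-day month). SLAs often define the measurement period and exclusions such as scheduled maintenance differently, so check the exact definition in yours.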
The cloud is the perfect (the natural) environment for distributed applications and the idea of service orientation. Amazon played a key role in the development of cloud computing by modernising their data centres, which were using as little as 10% of their capacity at any one time just to leave room for occasional spikes.
Centralised logging and reporting: over 60% are still using log files (Source: CloudFoundry). http://www.virtualizationpractice.com/rackspace-buys-cloudkick-implications-for-iaas-performance-management-8697/
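One low-effort step from scattered log files towards centralised logging is to emit structured (JSON) log lines that a collector can ship and index. A sketch using only Python's standard `logging` and `json` modules; the field names and passing context via `extra={"fields": ...}` are illustrative conventions, and the shipping agent and central store are assumed rather than shown.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Merge structured context passed as extra={"fields": {...}}, if any.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to stream (e.g. stdout, read by a collector)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Tagging each line with context such as role and instance makes it possible to correlate events across dynamically allocated instances once the lines reach a central store.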