1 Good Cloud/Bad Cloud
Q: How many cloud experts does it take to change a lightbulb?
A: Define "lightbulb."
If you're like me, perhaps you've felt a little anxiety when serious business conversations turn to the cloud. Threatened, even. Bipolar?
2 "Stuck in the middle with you" (clowns to the left, jokers to the right)
Vahe Torossian is corporate vice president of worldwide small and midmarket solutions and partners for Microsoft ... careens between two positions: on one hand, taking a hard line on security; on the other, questioning how that stance meshes with people’s inevitable and unavoidable habits.
“People say you have better technology at home than at work. That’s true: 37% of U.S. info workers are solving customer and business problems using technology they master first at home, then bring to work.”
“It has become a work-the-way-you-want-to world. Right?”
“Wrong. That’s not to say it isn’t happening, but it can’t be that simple for the companies that must provide such flexibility securely and safely. At least not yet, and IT pros are stuck in the middle, balancing the interests and expectations of the organization and its employees.”
“Most of the time, consumers don’t realize the challenges behind privacy,” he said, citing theft, security, privacy, compliance and intellectual property protection as business risks.
3 Security Issues
Integration: getting them all to work with the same data. (The cloud makes it effortless to sign up for online sales automation, email management, invoicing and expense management. But it’s a different story if you want new subscribers to your email newsletter to flow automatically into the sales prospect database, or invoices and expenses to update your bookkeeping package automatically.)
Lock-in: platform-as-a-service providers, to date, can’t talk to each other. These platforms are proprietary. (You can’t just swap Azure for Force.com, and guess what: you probably never will. You’re betting on a platform, a language and that company’s ability to keep running.)
Cost: many enterprises that dipped a toe into private cloud are fleeing en masse (SaavisPres). Infrastructure-as-a-service costs can add up with usage, especially if you add other services like storage, load balancing, monitoring, content delivery and other items. You still have operational costs: you still have to manage, secure, back up and recover cloud deployments.
Availability: the recent Amazon Web Services outage, along with other cloud mishaps of late, is raising questions about as-a-service maturity, lock-in and customer recourse.
4 The 9 lives of Netflix
“Our architecture avoids using EBS.”
5 What if our information...
...became widely public and widely distributed?
...were manipulated by an outsider?
...failed to provide expected results?
...were unexpectedly changed?
...were unavailable for a period of time?
...could not satisfy regulatory/compliance requirements?
Whenever mission-critical applications are concerned, how "secure" cloud providers claim to be matters a great deal less than the clawback terms of the service-level agreements (SLAs) they provide, or whether auditors can adequately evaluate their offerings against regulatory compliance criteria.
The AWS outage: for everyone involved (not least Amazon’s own operations staff), it’s been a very long four days. What are the lessons to learn?
1. Read your cloud provider’s SLA very carefully. Amazingly, the four-day outage did not breach Amazon’s EC2 SLA, which, as the FAQ explains, “guarantees 99.95% availability of the service within a Region over a trailing 365 [day] period.” Since it has been the EBS (Elastic Block Store) and RDS (Relational Database Service) services, rather than EC2 itself, that have failed (and all the failures have been restricted to Availability Zones within a single Region), the SLA has not been breached, legally speaking. That’s no consolation for those affected, of course, nor is it any excuse for the disruption they’ve suffered. But it certainly gives pause for thought.
2. Don’t take your provider’s assurances for granted. Many of the affected customers were paying extra to host their instances in more than one Availability Zone (AZ), as Amazon recommends to ensure resilience against failure. (Each AZ, according to Amazon’s FAQ, “runs on its own physically distinct, independent infrastructure, and is engineered to be highly reliable. Common points of failures like generators and cooling equipment are not shared across Availability Zones. Additionally, they are physically separate, such that even extremely uncommon disasters such as fires, tornados or flooding would only affect a single Availability Zone.”) Unfortunately, this turned out to be a technical specification rather than a contractual guarantee. It will take Amazon considerable effort to repair the reputational damage this event has brought upon it.
Justin Santa Barbara, founder and CEO of FathomDB, was forthright in his blog post “Why the sky is falling”: “AWS broke their promises on the failure scenarios for Availability Zones … The sites that are down were correctly designing to the ‘contract’; the problem is that AWS didn’t follow their own specifications. Whether that happened through incompetence or dishonesty or something a lot more forgivable entirely, we simply don’t know at this point.” While it’s easy to be wise after the event, Amazon’s vulnerability to this type of failure might have been visible to a sufficiently deep due-diligence exercise. As Jason Hoffman, Chief Scientist at Amazon competitor Joyent, notes on the company’s blog, “This is not a ‘speed bump’ or a ‘cloud failure’ or ‘growing pains’, this is a foreseeable consequence of fundamental architectural decisions made by Amazon.”
3. Most customers will still forgive Amazon its failings. However badly they’ve been affected, providers have sung Amazon’s praises in recognition of how much it has helped them run a powerful infrastructure at lower cost and effort. Many prefaced their criticisms with gratitude for what Amazon had made possible, such as BigDoor CEO Keith Smith: “AWS has allowed us to scale a complex system quickly, and extremely cost effectively. At any given point in time, we have 12 database servers, 45 app servers, six static servers and six analytics servers up and running. Our systems auto-scale when traffic or processing requirements spike, and auto-shrink when not needed in order to conserve dollars.”
4. There are many ways you can supplement a cloud provider’s resilience. As O’Reilly’s George Reese points out, “if your systems failed in the Amazon cloud this week, it wasn’t Amazon’s fault.
You either deemed an outage of this nature an acceptable risk or you failed to design for Amazon’s cloud computing model.” It’s useful to review the techniques customers have used to minimize their exposure to failures at Amazon.
(Twilio, for example, didn’t go down. Although the company hasn’t explained exactly what its exposure was to the affected Northern Virginia Availability Zones, it has described its architectural design principles in the first entry on its new engineering blog, by co-founder and CTO Evan Cooke. These include decomposing resources into independent pools, building in support for quick timeouts and retries, and having idempotent interfaces that allow failed requests to be retried multiple times. Of course, all this is easier said than done if all your experience is in designing tightly coupled enterprise application stacks that assume a resilient local area network. Cooke’s post goes on to describe some of the characteristics that let Twilio’s architecture operate in this more fault-tolerant manner. To start with, “Separate business logic into small stateless services that can be organized in simple homogeneous pools.” Another step is to partition the reading and writing of data: “if there is a large pool of data that is written infrequently, separate the reads and writes to that data … For example, by writing to a database master and reading from database slaves, you can scale up the number of read slaves to improve availability and performance.”) Another site that didn’t go down is Netflix, which runs all its infrastructure in the Amazon cloud.
5. Building in extra resilience comes at a cost. (Bob Warfield describes how a previous company used Amazon.com infrastructure in a way that allowed it to “bring back the service in another region if the one we were in totally failed within 20 minutes and with no more than 5 minutes of data loss.” As he goes on to say, the length of outage you’re prepared to tolerate has consequences for the cost your customers or enterprise must fund. “Smart users and PaaS vendors will look into packaging several options because you should be backed up to S3 regardless, so what you’re basically arguing about and paying extra for is how ‘warm’ the alternate site is and how much has to be spun up from scratch via S3.”)
6. Understanding the trade-offs helps you frame what to ask. There are questions you should be asking to satisfy yourself that a cloud service you rely on is not exposing you to a similar failure (or, if it is, that you understand this and are willing to bear the consequences in return for a lower cost). Referring to Netflix’s practice of randomly killing resources and services to test its resilience, Bob Warfield adds this advice: “That’s likely another good question to ask your PaaS and Cloud vendors: ‘Do you take down production infrastructure to test your failover?’ Of course you’d like to see that and not just take their word for it too.”
7. Lack of transparency may be Amazon’s Achilles heel. Several affected customers have complained about the lack of useful information from Amazon during the outage. BigDoor CEO Keith Smith wrote, “If Amazon had been more forthcoming with what they are experiencing, we would have been able to restore our systems sooner.” GoodData’s Roman Stanek called on Amazon to tear down its wall of secrecy: “Our dev-ops people can’t read from the tea-leaves how to organize our systems for performance, scalability and most importantly disaster recovery.
The difference between ‘reasonable’ SLAs and ‘five-9s’ is the difference between improvisation and the complete alignment of our respective operational processes … There should not be communication walls between IaaS, PaaS, SaaS and customer layers of the cloud infrastructure.”
Amazon’s challenge in the coming weeks is to show that it is prepared to give its customers the information they need to build in that resilience reliably. If it does not meet that need and allows others to do better, it may gradually lose the dominant position it holds today in IaaS.
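The 99.95% figure in point 1 can be turned into a downtime budget with simple arithmetic: (1 − 0.9995) × 8,760 hours ≈ 4.38 hours per year. The sketch below is a simplification. It treats availability as a plain fraction of wall-clock time over a 365-day window and compares the result with the "five-9s" Roman Stanek mentions. As point 1 explains, whether an outage counts against an SLA at all depends on how the SLA scopes "unavailable" (per Region, per covered service). This arithmetic ignores that scoping, so the function names are illustrative only.

```python
# Convert an availability target into a yearly downtime budget.
# Assumption: availability is a simple fraction of wall-clock time over a
# trailing 365-day window; real SLAs define "unavailable" more narrowly.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a trailing 365-day window


def downtime_budget_hours(availability: float, window_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime, in hours, that an availability target permits."""
    return (1.0 - availability) * window_hours


def exceeds_budget(outage_hours: float, availability: float) -> bool:
    """True if an outage, counted in full against the target, blows the budget."""
    return outage_hours > downtime_budget_hours(availability)


if __name__ == "__main__":
    for target in (0.9995, 0.99999):  # the EC2 figure and "five 9s"
        minutes = downtime_budget_hours(target) * 60
        print(f"{target:.3%} -> {minutes:.1f} minutes of downtime per year")
    # A four-day outage is about 96 hours, far beyond a 99.95% budget,
    # *if* it were counted against the SLA at all.
    print(exceeds_budget(4 * 24, 0.9995))
```

The gap between roughly 4.4 hours and roughly 5 minutes a year is one way to read Stanek's point: the stricter target leaves no room for improvisation.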
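Twilio's principles in point 4 include quick timeouts and retries, backed by idempotent interfaces so that a retried request is safe. Here is a minimal sketch of how those pieces fit together, assuming a hypothetical `FlakyChargeService` (not Twilio's API) that deduplicates requests by an idempotency key. In a real client the timeout would come from a short socket or HTTP deadline. Here it is simulated by raising `TimeoutError` after the work has already been done, which is exactly the case where a non-idempotent retry would repeat the side effect.

```python
import time
import uuid
from typing import Callable, Dict


class FlakyChargeService:
    """Hypothetical remote service with an idempotent interface.

    The first request with a given key performs the side effect and stores
    the result. Repeats of that key return the stored result without
    repeating the side effect. `lost_replies` simulates responses that never
    reach the client in time.
    """

    def __init__(self, lost_replies: int) -> None:
        self._lost_replies = lost_replies
        self._results: Dict[str, str] = {}
        self.charges = 0  # how many times the side effect actually ran

    def charge(self, idempotency_key: str, amount: int) -> str:
        if idempotency_key not in self._results:
            self.charges += 1
            self._results[idempotency_key] = f"charged {amount}"
        if self._lost_replies > 0:
            self._lost_replies -= 1
            raise TimeoutError("reply did not arrive before the client timeout")
        return self._results[idempotency_key]


def call_with_retries(fn: Callable[[], str], attempts: int = 4, base_delay: float = 0.01) -> str:
    """Retry a call that times out, with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    service = FlakyChargeService(lost_replies=2)
    key = str(uuid.uuid4())  # one key per logical request, reused on every retry
    print(call_with_retries(lambda: service.charge(key, 100)))  # charged 100
    print(service.charges)  # 1: three attempts, but only one charge
```

The key design choice is that the client generates the idempotency key once per logical request, not once per attempt. Without that, every retry after a lost reply would charge again.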
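Point 6 refers to Netflix's practice of randomly killing production resources to test resilience. The idea can be illustrated in miniature, together with Twilio's "small stateless services in simple homogeneous pools." The sketch below is hypothetical and in-memory. It is not Netflix's tooling, and a real drill would target live infrastructure through the provider's own APIs. A round-robin pool fails over past dead instances, and a "chaos round" terminates random instances to check whether requests are still served.

```python
import random


class Instance:
    """A hypothetical stateless server in a homogeneous pool."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def handle(self, request: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class Pool:
    """Round-robin router that fails over to the next instance on error."""

    def __init__(self, size: int) -> None:
        self.instances = [Instance(f"i-{n}") for n in range(size)]
        self._next = 0

    def handle(self, request: str) -> str:
        # Try each instance at most once per request before giving up.
        for _ in range(len(self.instances)):
            inst = self.instances[self._next % len(self.instances)]
            self._next += 1
            try:
                return inst.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy instances left")


def chaos_round(pool: Pool, rng: random.Random, kills: int) -> None:
    """Terminate up to `kills` randomly chosen live instances."""
    live = [inst for inst in pool.instances if inst.alive]
    for victim in rng.sample(live, k=min(kills, len(live))):
        victim.alive = False


def survives_chaos(pool_size: int, kills: int, requests: int, seed: int = 0) -> bool:
    """Run one chaos round, then check that every request is still served."""
    rng = random.Random(seed)
    pool = Pool(pool_size)
    chaos_round(pool, rng, kills)
    try:
        for n in range(requests):
            pool.handle(f"req-{n}")
    except RuntimeError:
        return False
    return True


if __name__ == "__main__":
    print(survives_chaos(pool_size=5, kills=2, requests=100))  # True: 3 survivors
    print(survives_chaos(pool_size=3, kills=3, requests=10))   # False: pool wiped out
```

Because the instances hold no state, any survivor can take over a dead peer's traffic. That is what makes random termination a safe test rather than an outage.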
7 It’s the business model behind the application
Ultimately, understanding the business model behind the application is how you know whether cloud economics are a fit. While business buyers may provision certain cloud applications themselves, it still takes developers and IT administrators to activate infrastructure and platform cloud services. More importantly, it takes a CIO who can bridge the understanding between IT and the business to make this a reality. (How many companies are tying applications to the business models that underpin them?) Yikes.
8 In a nutshell, the no-brainer cloud use cases:
Some development
Short-term and cyclical applications
Apps that fluctuate and have variable use cases
Companies that can throttle resources
Enterprises or business units that can use the cloud to leverage new businesses and save on capital spending
9 Good Cloud
Driver 1: for those prepared to innovate, there’s the power of the cloud to increase competitiveness and realize new opportunities.
Driver 2: there’s the threat posed by the cloud (and those fast, nimble, cheap startups) to established businesses that are unable to innovate fast enough (e.g., consumerization trends. BUSINESS CASE: something we’ll come back to).
There is ample evidence that forward-looking enterprises are thinking carefully about these threats and opportunities. They’re changing the way they organize IT and its relationship with the business.
Intuit generates more than half of its $4+ billion in revenue from connected services (SaaS). Ginny Lee, CIO, explains that to support this growth in on-demand capabilities, she “had to turn the IT organization from a service provider into a change agent … I had to change the mindsets of people within IT to make sure they know that their mission is to enable growth and a great customer experience.”
Financial services giant Fidelity is using the cloud to provide its clients with employee portals that combine customer HR data and benefit plans with relevant information about 401(k) investment planning. Xerox is cloud-enabling its high-volume printing systems to serve its customers better and to open up opportunities to provide turnkey marketing services to smaller companies. Postage meter provider Pitney Bowes faces falling spending on stamps and so is building a secure mailbox in the cloud.