Hi everyone. My name is Lisa, and welcome to my presentation. Today I will be talking about cloud computing, specifically its availability issues and controls.
I will start by discussing the availability issue of cloud outages, with detailed reference to Amazon’s cloud services, as well as the general reaction from consumers. After that, I will cover other availability issues, including data lock-in and vendor shut-down. Finally, I will discuss the role CAs can play in mitigating the risks associated with cloud unavailability.
To start us off, let us take a look at the well-known cloud services provider Amazon. Amazon offers two main types of cloud services, namely Elastic Compute Cloud (EC2) and Amazon Web Services. By most accounts, Amazon had 5 cloud outages throughout 2008-2009. However, from 2010 until now, Amazon has had only one outage, which took place on April 21, 2011. Per Amazon, this was an issue of “stuck data volume”. In essence, consumers could not access data stored on Amazon’s cloud servers. The outage lasted 2 days.
To examine the root cause of Amazon’s cloud outages, let us take a close look at Amazon’s cloud infrastructure, which was built upon the concept of redundancy. Amazon’s data warehouses are located in 5 Regions: the East Coast and West Coast of the United States, Ireland, Tokyo, and Singapore. Within each Region, Amazon also has several Availability Zones built in different locations. Amazon explained that by launching instances in separate Availability Zones, consumers’ applications would be protected from the failure of a single location, as backups can be retrieved from a different location. This is known as redundancy. However, Amazon’s Availability Zones are structured so that redundancies are built within the same Region. This is because inter-regional redundancy can introduce latency and is often more expensive.
Theoretically, there is a flaw in Amazon’s redundancy architecture. In the event of a natural disaster, like the recent earthquake in Japan, redundancies built within the same Region would offer little protection, since the backup locations could be destroyed as well. Realistically speaking, though, it did not take an act of God to cause the cloud outage of April 2011.
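To put rough numbers on the redundancy argument, here is a minimal sketch in Python. It assumes every location fails independently with the same probability, which is exactly the assumption a Region-wide event breaks. The 99% and 99.9% figures and both function names are illustrative assumptions, not Amazon’s numbers or terminology.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of a service that stays up as long as at least one
    of `replicas` independent locations is up."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at once.
    return 1.0 - (1.0 - single) ** replicas


def availability_with_shared_risk(single: float, replicas: int,
                                  region_up: float) -> float:
    """Same model, except every replica also depends on one shared Region
    being up (a hypothetical way to model a Region-wide failure)."""
    return region_up * combined_availability(single, replicas)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "zones, independent:", combined_availability(0.99, n))
    # With a shared Region, adding zones can never lift availability
    # above the availability of the Region itself.
    print("3 zones, shared Region:",
          availability_with_shared_risk(0.99, 3, 0.999))
```

Under this simplified model, extra zones help a great deal against isolated failures, but the result is capped by whatever all the zones have in common, which is why redundancy within a single Region does not protect against a Region-wide event.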
In an effort to minimize the negative effects of Amazon’s cloud outage, Quora took matters into its own hands. During the April 21, 2011 outage, Quora brought up a new database from its most recent backup, which it had performed at the company level on April 19, 2011. However, not every Amazon client did the same. Reddit, for instance, simply posted a note at the top of its website to inform users that data was inaccessible because of Amazon’s cloud outage.
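As a small illustration of what restoring from an older backup implies, the sketch below computes the gap between the last backup and the outage, using the two dates mentioned above. The function name is hypothetical, and the calculation treats both dates as whole days because the talk gives no times.

```python
from datetime import date


def backup_gap_days(last_backup: date, outage_start: date) -> int:
    """Number of days of changes that a restore from `last_backup`
    would not contain, measured in whole days."""
    if outage_start < last_backup:
        raise ValueError("the backup cannot be later than the outage")
    return (outage_start - last_backup).days


# Dates from the Quora example: backup on April 19, outage on April 21.
gap = backup_gap_days(date(2011, 4, 19), date(2011, 4, 21))
print(gap)  # 2
```

Anything written during that gap would have to be re-synchronized or accepted as lost, which is one reason more frequent off-cloud backups narrow the exposure.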
In reality, cloud outages occur quite often at other big-name cloud vendors as well. According to the statistics shown on this slide, Google’s and Microsoft’s clouds are not performing any better than Amazon’s in terms of availability.
It is interesting, however, that the many cloud outages did not scare companies away. According to an IDC report, expenditure on cloud-related technologies will grow to 45 billion by 2013. In addition, in a Harris Interactive survey of IT executives, 43% said they expect to increase their usage of cloud. Why, you may ask, do people continue to turn to the cloud in spite of all these damaging outages?
One explanation could be the data lock-in effect of employing cloud services. This term describes the inability to switch cloud vendors, or to move data back to one’s own data warehouse, because of high conversion costs. For instance, SalesForce.com has a proprietary programming language called Apex that only runs on SalesForce’s platform. Although consumers can retrieve the data that they legitimately own, they will lose all the formatting, and the data will become unmanageable. More often than not, consumers choose to remain with the same cloud vendor, since pulling data back from the cloud would put them at risk of non-compliance with industry standards.
When EMC shut down its Atmos Online cloud storage service only a year after its launch, consumers started to worry about the going concern of cloud vendors’ businesses. Although consumers usually hire cloud vendors with big corporate structures and good reputations, the case of EMC shows that even big-name cloud vendors can easily shut down their own cloud services, putting the consumer’s data, infrastructure, or platform at risk.
In recent years, there has been heated debate over whether CAs should have a share in cloud computing by providing assurance services over the cloud. While I would personally be against auditing the cloud because of its questionable auditability, I believe that CAs can still have a share of the cloud by providing consultancy services on the preventative measures vendors and consumers can take to mitigate the risks associated with cloud unavailability.
This list contains some of the controls vendors should implement to sustain the availability of their cloud services. One of the most important is ensuring that redundancies are built in geographically diversified locations to guard against natural disasters. Vendors should also use reliable equipment that has already been stress-tested. Furthermore, vendors should establish and regularly monitor their own Service Health Dashboard in order to promptly receive and address cloud issues that consumers have reported. Lastly, vendors should regularly monitor the health of their own virtual servers. As an alternative to doing this internally, the cloud vendor can employ 3rd party services such as Amazon’s CloudWatch and Nimsoft’s Cloud Monitor to monitor the health of its virtual servers.
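The idea of regularly monitoring the health of virtual servers can be sketched roughly as follows. This is a minimal illustration, not how CloudWatch or Cloud Monitor work: the function names, the server names, and the `probe` callback are all assumptions made for the example, and a real probe would issue a network request instead of returning a canned answer.

```python
from typing import Callable, Dict, List


def check_servers(servers: List[str],
                  probe: Callable[[str], bool]) -> Dict[str, bool]:
    """Run `probe` against every server and record whether it is healthy.
    A probe that raises an exception is treated as an unhealthy server."""
    results: Dict[str, bool] = {}
    for name in servers:
        try:
            results[name] = bool(probe(name))
        except Exception:
            results[name] = False
    return results


def unhealthy(results: Dict[str, bool]) -> List[str]:
    """Names of the servers that failed their check, in sorted order."""
    return sorted(name for name, ok in results.items() if not ok)


def fake_probe(name: str) -> bool:
    """Stand-in for a real health check (hypothetical server names)."""
    if name == "vm-c":
        raise TimeoutError("no response")
    return name != "vm-b"


report = check_servers(["vm-a", "vm-b", "vm-c"], fake_probe)
print(unhealthy(report))  # ['vm-b', 'vm-c']
```

Passing the probe in as a parameter keeps the checking logic separate from how a server is actually contacted, so the same loop could be driven by an internal check or by a 3rd party service.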
This slide shows a list of things consumers can do to manage issues of cloud availability. Note, however, that the degree of control needed depends on how mission-critical the cloud is to the consumer’s core business processes. For instance, Twitter as a consumer may not feel an urgent need to resolve unavailability issues, as people may not have a big problem with losing access to their tweets for a couple of days. In contrast, Netflix may view unavailability as a serious issue, since its success depends on a continuous supply of video streaming.
At minimum, consumers should monitor various media to keep themselves informed of reported cloud outages. It is also important for consumers to fully understand the vendor’s Service Level Agreement, since some vendors, like Google Apps, provide consumers with service credits when they fail to meet their promised uptime. If cloud services are critical to the success of a consumer’s business, the consumer should set up business continuity plans, such as performing off-cloud backups or employing a second cloud services provider. Lastly, consumers can self-monitor the vendor’s virtual servers by hiring 3rd party consultants such as uptime for this type of service.
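To make the Service Level Agreement point concrete, here is a rough sketch of how a consumer might translate an outage into an uptime percentage and a service credit. The credit tiers below are entirely hypothetical and are not Google’s or any other vendor’s actual terms; the real schedule has to be read from the agreement itself.

```python
def uptime_percent(downtime_hours: float, period_hours: float) -> float:
    """Share of the billing period during which the service was up."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must fit inside the period")
    return 100.0 * (1.0 - downtime_hours / period_hours)


# Hypothetical schedule: (minimum uptime %, credit % of the monthly fee),
# checked from the best tier downward.
CREDIT_TIERS = [(99.9, 0), (99.0, 10), (95.0, 25)]
LOWEST_TIER_CREDIT = 50  # hypothetical credit when uptime is below 95%


def service_credit_percent(uptime: float) -> int:
    """Credit owed under the hypothetical schedule above."""
    for minimum_uptime, credit in CREDIT_TIERS:
        if uptime >= minimum_uptime:
            return credit
    return LOWEST_TIER_CREDIT


# A 2-day outage in a 30-day month: 48 hours down out of 720.
month_uptime = uptime_percent(48, 720)
print(round(month_uptime, 2), service_credit_percent(month_uptime))
```

Running the numbers this way before signing shows what an outage like the two-day one in April 2011 would actually be worth in credits, which is often far less than the business impact.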
It appears that the significant cost reductions associated with the cloud are highly attractive to the Chief Information Officers of many organizations, since many of them are either already using the cloud or considering it, despite all the availability concerns surrounding it. While cloud computing is still a fairly new topic, keep in mind that there are controls to mitigate the risks of cloud unavailability. Since many CIOs do not understand how that can be done, this in turn gives us CAs the opportunity to step into the picture and save the day!
That concludes my presentation. I hope you enjoyed it!
Cloud Computing - Availability Issues and Controls
By: Lisa Cheng

Agenda
Cloud Outages
Amazon
Impact of unavailability on consumers
Data Lock-in
Vendor Shut-down
Linkage to the CA profession
Preventative measures

Amazon Cloud Outages
Two main types of services
EC2 and Amazon Web Services
From 2008-2009
Had 5 instances of cloud outages
From 2010-2011
Had 1 instance of cloud outage
April 21, 2011 outage
“Stuck data volume”
Lasted 2 days
Data were inaccessible, although websites could still function

Amazon Cloud Structure Critique
“Act of God”
Japanese Earthquake
Cannot make back-ups within the same Region
Recent April 21, 2011 outage
All availability zones within the same region failed simultaneously
Amazon’s competency in building redundancies is questionable

Amazon Clients
Quora
Brought up new database from the latest back-up at the company’s level
Synchronization issue
Reddit
Did nothing

Cloud outages elsewhere
Google:
12 outages from 2008-2009
6 outages from 2010-2011 (now)
Microsoft:
4 outages from 2008-2009
6 outages from 2010-2011 (now)
Others
Playstation Network
Intuit
Twitter

Impact of unavailability on consumers
Companies are still resorting to cloud services to cut down costs
IDC report:
17 billion spent on cloud-related technologies in 2009
By 2013, it’ll grow to 45 billion
Harris Interactive:
43% of IT executives expect to increase their usage of cloud
Why?

Data Lock-in
Vendors with proprietary technologies
High conversion costs
SalesForce.com
Proprietary programming language named Apex
Microsoft Azure & Amazon Web Services
Data are the only portable items
To consumers:
Risk of paying high prices for poor services
No compatible technology to retrieve data from cloud
Risk of non-compliance with standards

Vendor Shut-down
Atmos Online Cloud Storage
Shut down its business after one month of operation
Offered multiple migration options
Potential impact on consumers
Worried about reliance on 3rd party service providers

Linkage to the CA profession
Heated debate over auditing cloud
Advantage: opportunities
Disadvantage: auditability of cloud
Provide consultancy services over preventative measures vendors/consumers can take to mitigate risks of unavailability

Preventative measures: Vendors
Geographically diversified architecture
Reliable internet connection
Reliable and redundant hardware/software
Effective business continuity plans
Make web console and API available to consumers
Establish Service Health Dashboard
Regularly monitor CCID – cloud outages database
Regularly monitor the health of virtual servers

Preventative measures: Consumers
Monitor vendors’ service health dashboard
Monitor CCID
Monitor customer mailing list of recent changes
Monitor RSS feed hosted by the vendor
Understand/negotiate Service Level Agreement
Set up business continuity plans
Hire a second cloud services provider
Off-cloud backup
Periodical updates to reflect expansion
Self-monitor vendors’ virtual servers