Corsis provides insight into the April 2011 outage of Amazon's EC2 cloud hosting service in this September 2011 article in Government Technology Magazine, "Securing Data in the Cloud" by Brian Heaton.
Securing Data in the Cloud
August 30, 2011
By Brian Heaton

When part of Amazon’s Elastic Compute Cloud (EC2) crashed on April 21, government agencies in the midst of moving to the cloud received a grim reminder of the need to secure critical databases and files. Although cloud technology is new to many, experts say the concepts behind disaster recovery and prevention remain essentially unchanged.

From having multiple backup contingencies in place to making sure cloud provider service agreements are clear on system redundancies, the same due diligence performed in pre-cloud times is required to ensure that data stays accessible in the event of a crash.

Though many technology professionals want to take advantage of virtualized cloud computing, Terry Weipert, a partner with technology consulting and outsourcing company Accenture, said stronger planning models should be in place first. “You still have to do all the backup and archiving that is needed, just like if you were managing your own hardware or data center,” she explained. “If you are requiring Amazon or another provider to [do] that, you [must] have governance and policy in place to know that they have followed those procedures.”

Thomas Shelford, president of IT services firm Corsis, agrees. He said companies that experienced disruption when Amazon’s EC2 service went down may have been under the false impression that Amazon was too big to fail. “It’s really a cultural issue where a lot of companies feel they don’t need to have a system administrator in-house because the cloud [provider] takes care of redundancies,” he said.

However, that’s a dangerous misconception. “The cloud provides capacity on demand, but not architecture on demand,” Shelford said. “The types of backups you did in the old days still apply.”

While setting policy and enforcing agreements may seem simple enough, the message clearly wasn’t received by all, given the number of private- and public-sector users that went down during Amazon’s crash. Commercial websites, such as the popular location-tagging mobile platform Foursquare, the online knowledge database Quora and social news website Reddit, were all temporarily offline.

The U.S. Department of Energy’s OpenEI website was another outage casualty. The site, which promotes collaboration in clean energy research, was out of commission for almost two days.

Online movie rental giant Netflix and ShareFile, a file storage and transmittal firm, are both Amazon customers that got through the outage relatively unscathed. They did it by following the old adage of “don’t put all your eggs in one basket.” Both companies had
detailed plans in place to handle outages and designed system architectures assuming that failures would ultimately occur, making the situation easier to manage.

Sidebar: Six Ways to Manage the Risk of Cloud Crashes

1. Incorporate failover for all points in the system. Every server image should be deployable in multiple regions and data centers, so the system can keep running even if there are outages in more than one region.
2. Develop the right architecture for your software. Architectural nuances can make a huge difference to a system’s failover response. A carefully created system will keep the database in sync with a copy of the database elsewhere, allowing for a seamless failover.
3. Carefully negotiate service-level agreements. SLAs should provide reasonable
compensation for the business losses you may suffer from an outage. Simply receiving prorated credit for your hosting costs during downtime won’t compensate for the costs of a large system failure.
4. Design, implement and test a disaster recovery strategy. One component of such a plan is the ability to draw on resources, like failover instances, at a secondary provider. Provisions for data recovery and backup servers are also essential. Run simulations and periodic testing to ensure your plans will work.
5. In coding your software, plan for worst-case scenarios. In every part of your code, assume that the resources it needs to work might become unavailable, and that any part of the environment could go haywire. Simulate potential problems in your code, so that the software will respond correctly to cloud outages.
6. Keep your risks in perspective, and plan accordingly. In cases where even a brief downtime would incur massive costs or impair vital government services, multiple redundancies and split-second failover can be worth the investment, but it can be quite costly to eliminate the risk of a brief failure.

Spread It Out

ShareFile has a farm of server instances spread out across Amazon’s East Coast and West Coast data centers. Although the company hosts its operational databases at a co-located data center near its headquarters in North Carolina, all of its clients’ files are in the cloud.

To protect those files, and to ensure that client uploads and downloads are done without interruption, ShareFile created a proprietary “heartbeat” system that pings each of the servers in the cloud to verify that they’re online and responding to the requests. It’s a technology that’s been around for decades. While the system gives ShareFile more information than a simple “yes, I’m here” response, that’s all it really boils down to.
If the response is less than satisfactory, or there isn’t a response at all, the company drops that server.

Amazon has a variety of backup options now, but ShareFile CEO Jesse Lipson said that when his company agreed to be an Amazon cloud beta customer years ago, there wasn’t a backup system in place, so the company developed its own.

“The good thing about it is that the system is pretty fault tolerant,” Lipson said. “If a server goes offline for any reason, it’s likely to disrupt only a small number of customers because we’re heartbeating the servers every minute. It’ll be dropped out, and even if, by chance, a customer caught it in that minute, all they’d have to do is try again, and the upload and download will work.”

ShareFile also saves every file that’s uploaded or downloaded by customers into a disaster recovery data center outside the cloud. Though Lipson admitted that the practice is duplicative and expensive, it’s an extra layer of security that adds to the peace of mind of ShareFile and its customers. “We didn’t have to use it during the EC2 crash,” Lipson said, “but the long-term idea is that we could recover files from completely outside of Amazon.”

Netflix’s story is similar to ShareFile’s. When Netflix moved to the cloud, its staff foresaw the likelihood of such a cloud crash and designed its system around the possibility. In its Tech Blog, Netflix representatives said the company’s IT architecture avoids using Elastic Block Store, which provides Amazon cloud users with persistent storage, as its main storage service. Instead, Netflix uses a mix of storage services and a distributed management system to ensure redundancy.

While staff at Netflix admitted in a blog post that there was a bit of internal scrambling to manually reroute customer traffic, the company is looking at automating much more of the process.

Don’t Be Intimidated

Despite the Amazon crash, experts were unanimous in their opinion that the cloud is still the way to go in the future.
Weipert said that while the process of backing up data is “definitely more involved,” the learning curve can be somewhat overcome by keeping the process simple.

“Just like your current environment, you still have the same issues of trying to dynamically manage an event,” Weipert said of a potential cloud crash. “You don’t lose data in the cloud. There is a way to trace in the cloud computing environment, but you really have to have a plan and be able to do things dynamically.”

Shelford agreed and stressed that the Amazon crash and others should be treated as lessons learned.

“Cloud computing offers significant cost-savings opportunities for government institutions that should be taken advantage of,” Shelford said. “The lesson here is that countermeasures were relatively easy to implement. The moral of the story is, you’ve got to stick with traditional best practices.”
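
The heartbeat approach that Lipson describes in the "Spread It Out" section can be illustrated with a minimal Python sketch. ShareFile's actual system is proprietary and the article gives no implementation details, so everything here is an assumption made for illustration: the `HeartbeatPool` class, the `probe` callback, the `simulated_probe` helper and the server names are all hypothetical. The sketch only shows the core idea of pinging every server and dropping any that answer unsatisfactorily or not at all.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatPool:
    """Hypothetical pool that tracks which servers passed the last heartbeat."""
    servers: List[str]
    probe: Callable[[str], bool]  # True means the server answered acceptably
    healthy: List[str] = field(default_factory=list)

    def beat(self) -> List[str]:
        """Ping every server once and keep only those that respond well."""
        alive = []
        for server in self.servers:
            try:
                ok = self.probe(server)
            except Exception:
                ok = False  # no response at all is treated as a failure
            if ok:
                alive.append(server)
        self.healthy = alive  # dropped servers can rejoin on a later beat
        return alive


def simulated_probe(status: Dict[str, bool]) -> Callable[[str], bool]:
    """Build a fake probe from a status table (stand-in for a network check)."""
    def probe(server: str) -> bool:
        if server not in status:
            raise ConnectionError(f"{server} unreachable")
        return status[server]
    return probe


# Made-up server names; "west-2" is missing from the table, so it never answers.
status = {"east-1": True, "east-2": False, "west-1": True}
pool = HeartbeatPool(
    servers=["east-1", "east-2", "west-1", "west-2"],
    probe=simulated_probe(status),
)
print(pool.beat())  # ['east-1', 'west-1']
```

In a real deployment the probe would presumably be a network request with a timeout, and `beat()` would run on a schedule (the article mentions once a minute). A client that hits a server in the window before it is dropped simply retries against the refreshed `healthy` list, which matches the "try again" behavior Lipson describes.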

You may use or reference this story with attribution and a link to http://www.govtech.com/policy-management/Securing-Data-in-the-Cloud.html

Comments

Phil Cox | Commented September 6, 2011
As I read this article, I am once again reminded that "good security is good security". As the article implicitly and explicitly stated: You should be doing the things you have always done (or should have been doing ;). The cloud is not a technology panacea, and tried and true things like design and DR need to be considered. One other point that the author made, and that I feel is often understated, is the need for those doing the design and DR to understand the technology, so you are not "forklifting traditional apps to the cloud" (I believe an Adrian Cockcroft of Netflix analogy). That is one of the things that RightScale stresses to our customers: architect with understanding. It is imperative if you want to get it done right with no surprises down the road.