Calculating Downtime Costs: How Much Should You Spend on DR?

Published in: Technology, Business
  • Good afternoon and welcome. My name is Paul Croteau. I’m currently in my 9th year at Rackspace; the first 7 of those were spent as an Enterprise Solution Engineer. These days I work as a member of our product team, helping to create technical content for our customers and the market as a whole. At the end of this session you will hopefully have a better understanding of how to assess your cost of downtime, you will have more clarity on the role a hosting provider plays in protecting your business, and you will see where EMC fits into the equation. A QUICK SHOW OF HANDS: Raise your hand if you are here representing a small to medium-sized business. <WAIT> OK. My content today is very high level; I’m not going to dive into specific EMC storage management apps or hardware configuration settings. My goal today is to get you thinking about how to properly assess your cost of downtime and to give you some tools and scenarios to help set expectations. Your homework will be to take what you’ve learned and use it to frame your DR solution architecture discussions back at the office. Let’s get started.
  • Here’s our agenda.
  • We know that data center outages and unplanned downtime are inevitable. IT downtime is like traffic: it’s not a matter of if it will happen, but when. There have been numerous public examples; we see it all the time in the news. Netflix suffered a very public and painful multi-hour outage last Christmas Eve. A variety of Amazon cloud outages have hit some very large and very popular web and social media properties. And thanks to social media, we can learn about and track the status of these outages and their recovery (or lack thereof) in real time. InformationWeek published a study showing that IT downtime costs us $26.5 billion in lost revenue every year. In another study, by Dun & Bradstreet, 59% of Fortune 500 companies experience a minimum of 1.6 hours of downtime PER WEEK. That works out to more than 6 hours per month per company. LET’S LOOK AT THE NUMBERS MORE CLOSELY.
    Sources:
      • Ronni J. Colville and George Spafford, “Configuration Management for Virtual and Cloud Infrastructures.”
      • Alan Arnold, “Assessing The Financial Impact Of Downtime,” Analysis, April 20, 2010. http://www.businesscomputingworld.co.uk/assessing-the-financial-impact-of-downtime/
      • Chandler Harris, “IT Downtime Costs $26.5 Billion In Lost Revenue,” InformationWeek, May 24, 2011. http://www.informationweek.com/storage/disaster-recovery/it-downtime-costs-265-billion-in-lost-re/229625441
      • “How much will you spend on application downtime this year?” Network World, Aug. 2, 2009. http://www.networkworld.com/newsletters/nsm/2009/080309nsm1.html
      • “Online Banking Upgrade Contributed to Bank of America Outage,” Oct. 2011. Bank of America Corp., whose website has been down sporadically since last Friday, says the problem stems from technical hiccups, not a hack attack.
      • “Target’s Online President Departs Following Website Crash,” Bloomberg/Businessweek, 10/13/2011. http://www.businessweek.com/news/2011-10-13/target-s-online-president-departs-following-website-crash.html Target generates about 2% (or 1.35B) of its $67.4 billion revenue online. The day the revamped site went live, links such as “learn all about what’s new” didn’t work. On Sept. 13, the online store crashed when demand for products from the Italian fashion house Missoni exceeded expectations.
      • Teresa Ooi, “Navitaire booking glitch earns Virgin $20m in compo,” The Australian, April 05, 2011. http://www.theaustralian.com.au/business/navitaire-booking-glitch-earns-virgin-20m-in-compo/story-e6frg8zx-1226033624246 Reservations management company Navitaire is understood to have compensated Virgin Blue for up to $20 million for a customer service meltdown that resulted in 130 cancelled flights and delays for more than 60,000 passengers in September.
  • Here is some 2012 financial data for the Fortune 500. Total revenue was almost $12T, total profit almost $1T. <NEXT> Here are the averages and medians. (ELABORATE) These are large numbers, so let’s break them down into smaller, more digestible chunks. <NEXT> These are the HOURLY averages and medians for the F500, obtained by dividing the annual figures by the 8,760 hours in a year. (ELABORATE) If we go back to that Dun & Bradstreet study I mentioned on the previous slide, 6 hours of downtime per month at the median costs more than $7M in revenue loss, or more than $450k in lost profit, per month. LET’S LOOK AT SOME MORE NUMBERS. <NEXT>
    Back-of-envelope: $1.2M/hr × 295 companies (59% × 500) = $354M/wk; × 52 = $18.4B/yr.
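A minimal sketch of the arithmetic on this slide, for anyone who wants to rerun it with their own figures. The function names are illustrative, and the $12T total is the rounded "almost $12T" from the notes, so the per-company average is only approximate.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, as used in the notes


def hourly_rate(annual_amount: float) -> float:
    """Spread an annual figure evenly over every hour of the year."""
    return annual_amount / HOURS_PER_YEAR


def aggregate_weekly_loss(hourly_loss: float, companies: int,
                          share_affected: float,
                          hours_down_per_week: float = 1.0) -> float:
    """Weekly loss across all affected companies."""
    return hourly_loss * companies * share_affected * hours_down_per_week


# Approximate average F500 revenue per hour (assumes ~$12T total revenue).
avg_hourly_revenue = hourly_rate(12e12 / 500)

# The back-of-envelope figure from the notes: $1.2M/hr x 295 companies.
weekly = aggregate_weekly_loss(1_200_000, 500, 0.59)
yearly = weekly * 52

print(f"Average revenue per hour: ${avg_hourly_revenue:,.0f}")
print(f"Aggregate loss per week:  ${weekly:,.0f}")
print(f"Aggregate loss per year:  ${yearly:,.0f}")
```

Note that the back-of-envelope line corresponds to `hours_down_per_week=1.0`; passing the study's 1.6 hours instead scales the result proportionally.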
  • <NEXT> According to InformationWeek, the average cost of unplanned downtime is about $5,600 per minute. Let’s take a look at a couple of outage scenarios, and the average cost associated with each. <NEXT> For a partial DC outage, the average downtime is about an hour and costs approximately $260,000. <NEXT> For a total DC outage, the average recovery time is over two hours and costs, on average, $680,000. For larger companies, or companies with an ecommerce business model, that number could easily go much higher. What would that cost look like if you were down for a few days? Here’s a sobering factoid: <NEXT> “93% of companies that lost their data for 10 days or more filed for bankruptcy within one year of the disaster, and 50% filed for bankruptcy immediately.” (Source: National Archives & Records Administration in Washington.) What’s the main takeaway here? Your company’s survival depends on quantifying the impact of downtime. SO, DEPLOYING MORE TECHNOLOGY IS THE FIX FOR THIS? NOT NECESSARILY. <NEXT>
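To get a feel for "what would that cost look like if you were down for a few days," here is a rough sketch that simply extrapolates the quoted $5,600-per-minute average at a flat rate. The flat rate is an assumption made for illustration; it will not reproduce the separately reported scenario averages ($260,000 and $680,000), and real costs rarely scale linearly.

```python
COST_PER_MINUTE = 5_600  # USD, the average quoted in the notes


def flat_rate_cost(minutes_down: float,
                   cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Downtime cost assuming the same cost for every minute of outage."""
    return minutes_down * cost_per_minute


one_hour = flat_rate_cost(60)
three_days = flat_rate_cost(3 * 24 * 60)

print(f"One hour down:   ${one_hour:,.0f}")
print(f"Three days down: ${three_days:,.0f}")
```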
  • We also know that humans make mistakes. Gartner predicts that over the next several years the vast majority of outages impacting mission-critical services will be caused by people and process issues, and more than half of these will be caused by change/configuration/release integration and hand-off issues. You can deploy wonderfully resilient technology and still suffer downtime. What I want to do today is help you understand how much your inevitable downtime might cost your company so that you can determine how much to spend based on your business needs, financial needs, and your tolerance for risk. At the end of the day, the cost of any DR solution is analogous to an insurance policy. We may not like paying for it, but when we need it we are very happy we did. <NEXT>
    Source: Ronni J. Colville and George Spafford, “Configuration Management for Virtual and Cloud Infrastructures.”
  • OK, four slides of data, is anyone nervous yet? ;-) I’m building a case to help you understand the potential impact of downtime in real terms. We all understand that downtime is a problem, but the data I’m presenting here should move us past the emotional aspects of the topic and help quantify the impact of downtime, so that you can go to your technical and financial stakeholders with a compelling case to take action to protect your business and your customers. In my many years as an engineer/architect I’ve talked to thousands of customers of all sizes. One thing that has remained constant in these interactions is that so many of the people I’ve talked to had been so busy running their businesses that they never pulled the trigger on a DR plan. And sadly, some of those conversations happened after a disaster, when customers were trying to save what they could after the fact. TIME FOR SOME MORE NUMBERS. <NEXT>
  • Here is some data from a 2011 Symantec SMB Disaster Preparedness Survey. When I first read this I was surprised. <NEXT> 41% of SMBs never thought about putting together a DR plan. <NEXT> Fewer than half back up weekly, and fewer than ¼ back up daily. Granted, this may not be a surprise to some of you, as you know that depending on the amount of data you’re backing up, restores can take many hours or even days. <NEXT> Backups are NOT enough. Your data may be protected, but this approach doesn’t address downtime well. For some perspective: it’s difficult to say how long a restore will take because it depends on what kind of data you’re restoring, but on average, on a 1Gbps network, restores run at about 60GB/hr. <NEXT> Disaster recovery is a holistic strategy comprised of people, process, policies and technology. Its focus is to restore IT systems critical to business function. In other words, it helps keep the business running after a major disruption. Remember, a disaster could be Mother Nature’s wrath or a guy named Bob who installed a patch that broke a critical application. A DR plan helps to keep the lights on and the company open for business. SO LET’S TALK ABOUT RECOVERY PERSPECTIVE. <NEXT>
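A quick sketch of what the 60GB/hr average means in practice. The data sizes are hypothetical examples, and the rate is only the rough average quoted above; actual throughput depends on the data and the environment.

```python
RESTORE_RATE_GB_PER_HOUR = 60.0  # rough average quoted for a 1Gbps network


def restore_hours(data_gb: float,
                  rate_gb_per_hour: float = RESTORE_RATE_GB_PER_HOUR) -> float:
    """Estimated hours to restore data_gb at a constant restore rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("restore rate must be positive")
    return data_gb / rate_gb_per_hour


# Hypothetical data set sizes, in GB.
for size_gb in (500, 3_000, 10_000):
    hours = restore_hours(size_gb)
    print(f"{size_gb:>6,} GB -> {hours:6.1f} hours ({hours / 24:.1f} days)")
```

Even a few terabytes turns into days of restore time, which is why backups alone do not address downtime.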
  • You don’t get to choose the type of disaster that hits your business. <NEXT> You might suffer a localized failure like a single device, or a cabinet or two of damage due to a burst pipe, fire, or electrical surge. <NEXT> Or, you might face a facility-wide failure like a flood, cooling failure (melting servers), or massive explosion (bomb, plane, volcano). So, depending on the scale of the event, your IT team or outsourcing provider faces the task of replacing thousands of devices, or maybe even tens of thousands. (ELABORATE on channel issues, manpower issues, MRR stack ranking, etc.) AND, if you don’t have your data in more than a single data center, I don’t want to see you out there on Twitter griping about downtime. You can RAID this and cluster that, but when downtime hits, a single data center is still a single point of failure. Alright. We’ve seen how much downtime can cost, and we’ve seen that downtime is unpredictable. SO, WHAT THIS ALL BOILS DOWN TO: <NEXT>
  • If you don’t know your actual cost of downtime, you are wasting time talking about or designing a DR solution. And you may be wasting money (maybe lots of money) if you spend too much on DR. Let’s look at some specific business scenarios with real dollars tied to them to help gain even more perspective. <NEXT>
  • Here we have a company with annual revenue of $15M. Let’s assume this is an ecommerce site with limited retail space, where the vast majority of revenue (90%) comes from online sales. Since their market is mainly in the US, most of the committed transactions take place during business/daylight hours. So, assuming 12 hours of shopping every day, 365 days per year, that gives us 4,380 peak shopping hours and a cost of downtime of just over $3,000 per hour (that’s $15M × 90%, divided by the 4,380 peak hours). A 12-hour outage would mean more than $36k in lost revenue. AND, don’t forget to include the damage to your brand name, or lost future transactions from customers that went to a different vendor, not just for this purchase but for future purchases. LET’S LOOK AT ANOTHER SCENARIO. <NEXT>
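The ecommerce scenario can be sketched as follows, using the numbers from the slide ($15M revenue, 90% online, 12 peak hours a day). The function and parameter names are illustrative, not taken from any Rackspace tool.

```python
def peak_hour_downtime_cost(annual_revenue: float, online_share: float,
                            peak_hours_per_day: float,
                            days_per_year: int = 365) -> float:
    """Online revenue lost per peak shopping hour of downtime."""
    peak_hours = peak_hours_per_day * days_per_year
    return annual_revenue * online_share / peak_hours


hourly_cost = peak_hour_downtime_cost(15_000_000, 0.90, 12)
outage_cost = hourly_cost * 12  # a 12-hour outage during peak hours

print(f"Cost per peak hour:   ${hourly_cost:,.2f}")
print(f"Cost of 12-hour loss: ${outage_cost:,.2f}")
```

This covers lost revenue only; brand damage and lost future purchases would have to be estimated separately.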
  • Here’s some math for a single online event. This could be a weekend charity fundraiser, or an annual pledge drive. Assuming a goal of $1M, every hour of this 72-hour event should generate an average of more than $13k. And with an event this short, you had better have a quickly scalable solution, something that lets you move fast in more than one data center location. <NEXT>
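As a small sketch of the single-event math, assuming donations arrive evenly across the event (a simplification): the hourly target and the shortfall from a hypothetical three-hour outage.

```python
def hourly_target(goal: float, event_hours: float) -> float:
    """Average amount each hour of the event must bring in to hit the goal."""
    return goal / event_hours


def outage_shortfall(goal: float, event_hours: float,
                     hours_down: float) -> float:
    """Expected shortfall if hours_down of the event are lost to an outage."""
    return hourly_target(goal, event_hours) * hours_down


per_hour = hourly_target(1_000_000, 72)
lost = outage_shortfall(1_000_000, 72, 3)  # 3-hour outage is hypothetical

print(f"Target per hour:        ${per_hour:,.2f}")
print(f"Lost to 3-hour outage:  ${lost:,.2f}")
```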
  • Here’s a different view on a single event, this time from the perspective of sales lost rather than the pure revenue generation of the previous slide. Assume a four-day online event, perhaps something over a holiday weekend. Lots of advertising dollars spent: print, television, online, etc. In this scenario we are using numbers that any good retail business should have readily available: things like historical web traffic stats, conversion rate percentages from click-through traffic, etc. Here we expect to see half a million visitors hitting our web property. We know that in the past we’ve had a great conversion rate of 6%. If the average price of our goods is $500 (maybe one of those fancy purses, or a wildly popular electronic device), we expect to generate $15M in sales over this four-day period. And the math works out to a downtime cost of more than $150k per hour. <NEXT>
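The lost-sales view can be sketched from traffic and conversion figures, here using the slide's numbers (500,000 visitors, 6% conversion, $500 average price, four days). The names are illustrative, and sales are assumed to be spread evenly over the event.

```python
def expected_sales(visitors: int, conversion_rate: float,
                   average_price: float) -> float:
    """Expected sales: visitors who convert, times the average order price."""
    return visitors * conversion_rate * average_price


def sales_lost_per_hour(visitors: int, conversion_rate: float,
                        average_price: float, event_days: float) -> float:
    """Expected sales spread evenly across every hour of the event."""
    total = expected_sales(visitors, conversion_rate, average_price)
    return total / (event_days * 24)


total_sales = expected_sales(500_000, 0.06, 500)
hourly_loss = sales_lost_per_hour(500_000, 0.06, 500, event_days=4)

print(f"Expected sales:         ${total_sales:,.0f}")
print(f"Downtime cost per hour: ${hourly_loss:,.0f}")
```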
  • OK, last one. This one looks at downtime from a productivity basis. Instead of focusing on sales or e-commerce, let’s assume we are talking about an outsourced back office application (financial, CRM, email, etc.). Take your annual revenue and divide it by the number of employees you have and by the number of working hours in a year. This gives us an average revenue per hour per employee. Now multiply that by the number (or percentage) of employees affected by the outage and you get $120k/hr in this example; multiply by the hours of downtime for the total. Now, these examples have been very general; you can poke all sorts of holes in them or throw exceptions out there. These aren’t meant to be specific examples. They are meant to show averages and, more importantly, to show different ways of thinking about this topic. Now, wouldn’t it be great if you had a worksheet or app that you could play with to enter your own numbers and see how much downtime might cost your business? <NEXT>
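A sketch of the productivity calculation. The notes do not give the inputs behind the $120k/hr figure, so the revenue, headcount, and 2,080 working hours per year below are hypothetical values chosen only so the result lands on the same number.

```python
WORK_HOURS_PER_YEAR = 2_080  # assumption: 40 hours/week x 52 weeks


def productivity_loss_per_hour(annual_revenue: float, employees: int,
                               share_affected: float,
                               work_hours: float = WORK_HOURS_PER_YEAR) -> float:
    """Revenue-equivalent productivity lost per hour of downtime."""
    revenue_per_employee_hour = annual_revenue / employees / work_hours
    affected_employees = employees * share_affected
    return revenue_per_employee_hour * affected_employees


# Hypothetical company: $249.6M revenue, 1,000 employees, everyone affected.
loss_per_hour = productivity_loss_per_hour(249_600_000, 1_000, 1.0)
loss_for_outage = loss_per_hour * 4  # a hypothetical 4-hour outage

print(f"Loss per hour:         ${loss_per_hour:,.0f}")
print(f"Loss for 4-hour outage: ${loss_for_outage:,.0f}")
```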
  • This little web-based tool is available right now for you to try out. It’s a simple calculator with three of the business scenarios we just walked through. This link gets you to our DR Planning page, the link to the calc is down the page just a bit. Give it a try, see how things look, and feel free to use it to help get your point across to decision makers back home. <NEXT>
  • I’ve spent a lot of time talking about financial numbers, now let’s look at things from a process perspective. We all agree that downtime needs to be avoided and that it can get expensive very fast. Therefore, businesses need to determine how fast they want to get back online after an outage, and how far back in time they need to go to recover data and resume normal operations. <NEXT>
  • Here’s a common DR timeline of a generic business. Every week this company performs full data backups. <NEXT> Then an outage hits. Since this company isn’t using something really cool like VMware for virtualization with Site Recovery Manager, they have to rely on recovering data from backup tapes. Maybe the tapes are still on site and were not damaged. Or, perhaps the tapes are off-site. <NEXT> The company has a documented goal of resuming online operations within 60 hours of an outage. Plenty of time to get your tapes delivered and reload all of your data. <NEXT> Unless your tape machine was destroyed by the disaster, making your tapes useless until you replace that hardware. So, <NEXT> this company missed its desired RTO by 18 hours. <NEXT>
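The timeline can be sketched with a few timestamps. The dates are hypothetical; only the 60-hour goal and the 18-hour miss come from the notes. Here "RPO" is the recovery point (how far back the last usable backup is) and "RTO" is the recovery time (how long until service is restored).

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup: datetime, outage_start: datetime,
                     service_restored: datetime, rto_target: timedelta):
    """Return (achieved RPO, achieved RTO, amount the RTO target was missed by)."""
    achieved_rpo = outage_start - last_backup       # data in this window is lost
    achieved_rto = service_restored - outage_start  # time spent offline
    rto_miss = max(achieved_rto - rto_target, timedelta(0))
    return achieved_rpo, achieved_rto, rto_miss


# Hypothetical timeline: weekly full backup, outage four days later,
# service restored 78 hours after the outage began.
last_backup = datetime(2013, 5, 5, 2, 0)
outage_start = last_backup + timedelta(days=4)
service_restored = outage_start + timedelta(hours=78)

rpo, rto, miss = recovery_metrics(last_backup, outage_start,
                                  service_restored, timedelta(hours=60))
print(f"Data lost since last backup: {rpo}")
print(f"Time to recover:             {rto}")
print(f"RTO missed by:               {miss}")
```

With weekly backups the recovery point can be up to seven days old, regardless of how quickly the systems themselves come back.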
  • So, when measuring how far back you need to go to get useful data, and how fast you want to resume business operations after a disaster, different technologies get you there in different amounts of time. <NEXT> Tape is at one end of the spectrum, on the outer limits of this timeline, while things like replication and clustering are closer to the center. <NEXT> And as you might expect, the faster you want to recover, the more money you will need to spend. And, the longer it takes you to recover, the deeper the potential financial impact. <NEXT>
  • Here’s a graphic to help set cost expectations around certain technologies. Pricing ranges from hot to cold; the tier rankings are just a way to group things. The RTO and RPO numbers here are telling, because while DR scenarios are unique, you generally find that each objective falls into one of three timeframes: <ELABORATE ON RTO/RPO ROWS>.
    Array-based vs. vSphere/VM replication:
      • Layer: array-based replication works at the storage layer; vSphere/VM replication works at the hypervisor layer.
      • Physical servers: array-based replication can replicate them; vSphere cannot.
      • Configuration: array-based is configured per LUN or volume; host-based is configured per VM.
      • Owner: array-based is handled by a storage engineer; VM replication by a sys admin.
  • OK, we’ve covered lots of numbers so far. Let’s take a look at some architecture drawings. <NEXT>
  • Walk the audience through the basics of redundancy throughout the typical hosting tiers, while pointing out EMC products in the mix. BUT, this focuses on a single data center, which is still a single point of failure.
  • This slide expands the config to a second data center, points out a smaller DR config, plus the EMC technologies in play.
  • Earlier I mentioned how we don’t get to choose our disasters. You might suffer a localized failure like a cabinet or two of damage due to a burst pipe, fire, or electrical surge. Or, you might face a facility-wide failure like a flood, cooling failure (melting servers), or massive explosion (bomb, plane, volcano). Now your provider faces replacing thousands of devices, or even tens of thousands, along with channel issues, manpower issues, MRR stack ranking, etc. Don’t complain about downtime if you are running out of a single data center.
  • Showing how you can deploy Dev/Staging in a second DC and then use it as your DR site. Get more bang for your hosting dollar.
  • If you need to depend on your DR location longer than expected, you can make it more robust.
  • Here are the areas where Rackspace can lend a helping hand, and the areas that the customer must own. The top level is the holistic DR strategy. This is owned by the customer. Remember when we defined disaster recovery? It encompasses more than just the technology; it also covers the policies, people, and process. The customer is responsible for creating the DR plan, training the appropriate employees, creating the failover run book, testing the failover often, making the “go-time” decision to fail over after a disruption occurs, and then deciding to fail back once the primary DC comes online. Rackspace is responsible for failing over to the target DC once the authorization has been given by the customer. Rackspace also monitors the VM Replication virtual appliance, and alerts the customer when a replication fails to complete. As part of the Managed Virtualization service, Rackspace also manages the VM, guest OS, and hypervisor layer. In addition to the software layers, the dedicated hardware, network, and DC are also covered. Failover is the customer’s responsibility, but we assist and are on call during the process.