The idea of moving key parts of your business into the cloud is scary. You lose control, visibility, and a neck to wring when something goes wrong. What does it take to make the leap? More important than the ROI and the many other benefits is the ability to trust the cloud. This presentation focuses on why that is, how to build that trust, and what to look for when moving to the cloud.
4. Rate the challenges/issues ascribed to the cloud
Security 87.5%
Performance 83.3%
Availability 82.9%
Worried on-demand will cost more 81%
Lack of interoperability standards 80.2%
Bringing back in-house may be difficult 79.8%
Hard to integrate with in-house IT 76.8%
Not enough major suppliers 76%
Source: IDC Enterprise Panel, 3Q09
5. (Same chart, with the top three concerns bracketed as "Lack of Trust")
20. Rate the benefits ascribed to the cloud
Pay only for what you use 77.9%
Easy/fast to deploy 77.7%
Monthly payments 75.3%
Encourages standard systems 68.5%
Requires less in-house IT staff, costs 67%
Always offers latest functionality 64.6%
Sharing systems with partners simpler 63.9%
Seems like the way of the future 54%
Source: IDC Enterprise Panel, 3Q09
21.-24. (Same chart, with builds grouping the benefits into four categories: Cost, Agility, Architecture, and Core Competencies)
26. Key data center objectives for 2010
Cost is #1
Source: Symantec State of the Datacenter 2010
27. Scaling with and without the Cloud
[chart: Load vs. Non-Cloud and Cloud capacity, Jan through Dec, 0 to 60 servers]
28.-31. (Same chart, with builds annotating "Wasted capacity" where the non-cloud line sits above the load, and "Under-capacity" where the load rises above it)
55. “Enabling customers to ensure the confidentiality, integrity, and availability of their data is of the utmost importance to AWS, as is maintaining trust and confidence.”
-- Amazon Web Services
Hi, my name is Lenny and I’m going to be talking about trust.
Specifically, trust in the cloud. I believe that trust is the biggest barrier to companies adopting cloud computing.
These are the results of a survey done last year by an analyst firm called IDC, asking companies to rate their biggest concerns when considering using cloud computing in their business. It includes about eight different issues ranging from security to cost to integration.
It turns out that the three biggest concerns are all based on a lack of trust. Specifically, they don’t trust that the cloud providers will be secure enough, perform fast enough, or be reliable enough.
Why is this? Why are companies worried? Why is there so little trust?
Because cloud computing is scary. It’s a paradigm shift in how IT functions. You move your data and your applications outside your firewall, and that’s not something most companies have ever thought about doing. Why?
http://sethgodin.typepad.com/seths_blog/2010/03/the-wordperfect-axiom.html
Because you give up control over some of your company’s most important assets. Especially if you are an online services or e-commerce company, where downtime and response time are critical, it’s hard to give up that control.
You also lose visibility into what’s going on with your infrastructure. You can’t walk up to your servers and look at the blinking lights or check if the cable is plugged in. You can’t listen to the machines for problems or kick them when they’re acting up. You need to rely on the cloud provider for these things, and again that takes getting used to.
One of the most difficult things to get used to is not having someone to yell at when something breaks. You have to count on your cloud provider to respond to issues, and getting mad at your own IT people won’t help. Not like yelling at IT has ever helped anything.
Lastly, you need to trust your data won’t disappear, and that the provider will be stable. Problems that the cloud provider has end up being your problems, especially in the eyes of your customers. They don’t care that it isn’t really your fault, all they know is that they can’t use your service and they will blame you.
The big question is whether this fear is rational, and whether this whole cloud idea is worth it. Is the cloud worth it?
In spite of all of these concerns, in the end, for most companies, using the cloud for at least some elements of their architecture makes too much sense.
No question people are excited about the cloud. This graph shows the number of searches on Google for the term “cloud computing”. Three years ago it was barely a blip on the radar, and today we have conferences like this all around the world.
Many of the largest companies in the world, from Google to Microsoft to Amazon, are building cloud platforms for the world to use.
Some of the most successful companies in the world are building their companies on the cloud
Note: Not sure how many of these brands will be recognizable in China.
When looking at the IT industry as a whole, you see the same pattern. This graph shows that today companies are spending about 5.7% of their IT budget on cloud computing, but in 5 years...
...they are projected to be spending over 13%, which is more than double.
This is another projection done by Gartner showing the total money spent on cloud computing, which is estimated to increase from about $56 billion today to over $150 billion in three years. Now, the question is...why is everyone so interested?
This is another survey done by IDC, asking companies why they are moving to the cloud, and what benefits they see in cloud computing. We can divide the benefits into four main categories.
Cost
Agility
Architecture
And being able to focus on your core competencies
Let’s start with cost. No question the biggest motivator right now for companies has been cost. With cloud computing you pay as you go, you never have to pay for hardware until you actually need it, and you can shut down unused servers anytime.
This is a survey that Symantec did recently, asking IT departments what their biggest objectives for 2010 were. Reducing cost was the #1 goal, and that number went up from last year. That isn’t a big surprise, and it means they’ll be looking at their options around the cloud.
One of the main advantages of using cloud computing is the ability to scale on demand. Let’s walk through a quick example of how that changes your strategy around scaling and buying servers. Say this orange line represents the amount of load you’re getting over the next year.
Today, you buy a bunch of servers, set them up, and put them in production on a certain schedule. You need to plan the number of servers, estimate the load you expect to get, and try to make sure you have enough servers, which you buy weeks in advance.
In reality, you end up wasting a lot of money on excess capacity that never gets used,
and you end up not being able to predict how your traffic will really look, which ends up hurting your users or bringing your site down.
With cloud computing, which is represented by the green line, you only launch your virtual servers when you need them, and you can scale down when necessary. Notice how closely your cloud infrastructure matches the actual load demands throughout the scaling process. This is what cloud computing allows you to do.
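To make the tradeoff concrete, here is a rough sketch of the arithmetic behind that chart. The monthly load figures and the fixed capacity of 45 servers are invented for illustration:

```python
# Illustrative comparison of fixed provisioning vs. on-demand scaling.
# All numbers here are made up for the example.

monthly_load = [12, 15, 22, 30, 41, 55, 48, 36, 28, 33, 50, 58]  # servers needed, Jan-Dec

fixed_capacity = 45  # servers bought up front, a guess made months in advance

# Server-months paid for but never used (the area above the load line).
wasted = sum(max(fixed_capacity - need, 0) for need in monthly_load)

# Server-months where load exceeded capacity (users hurt, site slow or down).
short = sum(max(need - fixed_capacity, 0) for need in monthly_load)

# In the cloud you launch only what the load demands each month.
cloud_capacity = sum(monthly_load)

print(f"Server-months wasted by over-provisioning: {wasted}")
print(f"Server-months of under-capacity: {short}")
print(f"Server-months actually paid for in the cloud: {cloud_capacity}")
```

With these sample numbers, the fixed fleet both wastes capacity for most of the year and still falls short at the peaks, while the cloud bill tracks the load exactly.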
The second main benefit is agility, because companies that use the cloud are able to move faster, introduce new services quicker, and respond to change more effectively. You have
instant access to resources, anytime you need them, without a commitment. You can try things out, experiment with different ideas, and shift resources without spending a lot of time building and managing servers.
Say that Britney Spears says something about your company on Twitter and your traffic jumps like this graph. Or say you had a huge sales month and your customer base doubled. With the cloud you’re able to scale up to keep the visitors happy and the revenue flowing, and then scale back down when the traffic goes away. Your operations department doesn’t need to scramble to find more servers, and then have them sit idle for the rest of the year.
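As a sketch of what that scale-up and scale-down decision might look like once automated: the thresholds, the 50% step, and the request counts below are invented for illustration, and a real system would call the provider’s API to launch or terminate instances rather than just return a number.

```python
# A toy version of the autoscaling decision, with made-up thresholds.

def desired_servers(current, requests_per_server, low=200, high=800):
    """Return how many servers we should be running given per-server load."""
    if requests_per_server > high:
        return current + max(1, current // 2)   # overloaded: grow by ~50%
    if requests_per_server < low and current > 1:
        return current - 1                      # mostly idle: shrink gently
    return current                              # within the comfort band

print(desired_servers(4, 900))   # traffic spike: add servers
print(desired_servers(4, 100))   # traffic gone: remove one
print(desired_servers(4, 500))   # normal load: no change
```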
Just as important is how the cloud helps companies develop products more efficiently. In a modern software company where development is done in an agile way, you try not to plan too far ahead, and you deliver short iterations of product on a regular interval. With a traditional infrastructure, developers are often stuck waiting for approval of servers or getting them configured, and they end up holding on to those servers in case they ever need them again in the future. IT people get stuck dealing with these short-notice requests, and waste their time on non-customer-facing work. If you give your developers access to a cloud, they can try things out more quickly, test their stuff more easily, and get products out the door faster.
The third biggest benefit I see to cloud computing is the way it almost forces you to architect your system well.
Since the cloud providers generally give you access to a large number of limited-power boxes, each of which generally loses all of its data when rebooted, you end up building your system to be able to scale by simply launching more servers, where each server bootstraps itself as it comes online. This type of architecture automatically gives you a lot of redundancy, and the ability to scale further without having to rewrite your entire system. You also end up with a very loosely coupled system, where unrelated pieces are kept separate.
Part of the reason for that is that the cloud providers offer a lot of specialized components that you can build your system around. For example, beyond just servers and storage, Amazon offers a CDN service, a highly scalable database, load balancing, and queuing systems. You don’t have to spend time building these yourself, which gives you more time to focus on your actual business.
You also get pretty much for free the global footprint of the cloud provider, bringing your application closer to your users, which reduces latency. And the larger cloud providers get, the more distribution your apps will automatically get. Which brings us to the last main benefit...
...the fact that using the cloud allows your company to have more time to focus on the things you’re best at, the things that set you apart from your competition. Your company shouldn’t have to be an expert at managing hardware, or spend money on building out your datacenters. Your IT people can be a lot more efficient if they don’t have to worry about power supplies and cooling, and instead can focus on keeping the software running efficiently, making sure that performance is solid, and catching problems before they explode.
The reality of it is:
- Cloud providers love servers a lot more than you do
- They are better at security than you
- They are better at high availability than you
- and they take downtime just as seriously as you do, maybe even more so
Their entire business is based on providing a good service, which includes being as reliable as possible, and also as cheap as possible. Google is famous for building their own computers which use tiny amounts of power and run as efficiently as possible.
Companies like Amazon, Microsoft, and Google are building datacenters near power plants to get the cheapest possible power and energy. Unless your company is in the business of hosting, you’re never going to be able to compete with these platforms, and you shouldn’t try to. Let them do the hard work and take advantage of it.
Now, let’s get back to the trust issue. How do you get the benefit of all of these great things that the cloud can do for you, while avoiding the concerns we talked about earlier?
The way I see it the key is to begin with some basic trust in the cloud, but then do everything you can to confirm that trust.
You will need to start with some basic faith in your cloud provider, which is not that different from the faith you put in your existing datacenter. You aren’t worried that they’ll randomly turn the power off, or start looking through your data, or all of a sudden disappear. You can pretty much trust those companies because you’ve had good experiences with them in the past, and they’ve been around long enough to build a reputation. The same thing will happen with cloud providers over time. However, you don’t want to trust blindly, and luckily there is a tremendous amount that you can do to verify that trust, and to keep both the provider and your own staff accountable.
I like to group the strategies into four layers of trust. If you implement these four strategies effectively, you’ll end up feeling a lot more confident using the cloud and being a lot more successful in your cloud projects.
The first layer I call Educate
Followed by Monitor
Process,
and finally Failover. Let me dive into each of these and explain what I mean by them
The first step to any project around the cloud is to educate both yourself and your company.
Most of you here are probably the cloud evangelist at your company, and that means you need to know as much as possible about both the cloud concepts and the specific cloud providers you are evaluating. There’s a lot of information out there, and it can be a bit overwhelming, so let’s focus on the most important elements.
You’ll want to make sure to read the terms of service for each cloud provider you’re looking at, to understand how they address things like their internal security, encryption, audits, certifications, and so on.
Each cloud provider has different policies and guarantees, and there are a lot of details you’ll want to specifically look for. Things like what certifications they’ve attained, privacy policies, physical security access, and how the system architecture protects you from other customers. Unfortunately I don’t have time to dig into the specifics of each cloud, and these things change, so I highly recommend you visit your favorite cloud provider’s site and read through these policies.
The bottom line is that these providers know that you have a lot of concerns about security and availability, and so they are doing everything they can to build that trust. This is a quote from the Amazon web services security page showing us how much emphasis they put on security and trust. In the end, it’s up to you to understand their policies, and decide what is acceptable to you and your company.
You’ll also want to see what others are saying about the cloud providers. There’s no better way to find out about the realities than talking to someone who’s been in the trenches. That’s what’s great about conferences like this, where you can talk to people who have done this and come out alive.
You’ll also want to do some health checks on the provider, just like you would with any other key vendor, by looking at their revenue, customers, and past history, to make sure they aren’t going out of business anytime soon.
Maybe the most important element of the terms of service of any cloud provider is the service level agreement, which you’ll want to make sure you truly understand. This is the part that guarantees a certain uptime and performance, and holds the provider accountable to you. SLAs are one of the biggest reasons that companies are still wary about the cloud, and this is slowly improving.
Each provider has a different approach to SLAs, and the differences are really important to understand. This is a simple comparison between the SLAs of Rackspace Cloud Servers and Amazon EC2. Notice the differences, such as the uptime guarantee and the maximum number of credits you can get for the downtime. One potentially huge difference which isn’t shown here is the definition of “downtime”, which isn’t as obvious as you would think. Some providers only count downtime when their service returns an error, while others consider any sort of connectivity issue downtime.
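As a rough illustration of how these SLA terms play out, here is a sketch of an uptime and credit calculation. The 99.95% guarantee and the 10%-per-30-minutes credit schedule are invented numbers for the example, not any provider’s actual terms:

```python
# Hypothetical SLA math: the guarantee and credit schedule are examples only.

def uptime_percent(downtime_minutes, days_in_month=30):
    """Uptime as a percentage of a 30-day month."""
    total = days_in_month * 24 * 60
    return 100.0 * (total - downtime_minutes) / total

def sla_credit(downtime_minutes, guarantee=99.95, credit_per_30min=10, max_credit=100):
    """Percent of the monthly bill credited for a given outage."""
    if uptime_percent(downtime_minutes) >= guarantee:
        return 0
    # Credit accrues per started 30-minute block of downtime, capped.
    credit = (downtime_minutes // 30 + 1) * credit_per_30min
    return min(credit, max_credit)

# A 3-hour outage in a 30-day month:
print(uptime_percent(180))   # just below a 99.95% guarantee
print(sla_credit(180))       # the credit you would have to claim
```

Small-sounding differences in the guarantee or the cap change the outcome dramatically, which is why reading the actual SLA text matters.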
Something that surprises a lot of people is that in most cases it is your responsibility as the customer to notify the provider of the downtime and request your credits, which brings us to the second layer...
Monitor. This is the layer where you as the customer hold the provider accountable, and catch issues before they turn into big problems. This involves monitoring your web site and your infrastructure for uptime and performance.
The most important kind of monitoring you’ll want to do is external performance monitoring, which simulates real end users coming to your site and running through your most important transactions with a real browser, testing things like logging in, searching, or doing a checkout. This kind of monitoring is key because it validates the entire process from the perspective of the end user, and notifies you of any customer-facing issues. You also don’t have to install anything on your servers, because this type of monitoring happens from the outside, so it’s easy to set up and maintain.
You’ll want to set up monitoring of your most important transactions, such as your checkout or signup process, and test that transaction all day every day from as many locations as possible using your monitoring service.
You’ll also want to set up monitoring that tests each specific component of your infrastructure, such as your cloud servers, your database, your CDN, and your merchant accounts. This way when there’s a problem with your site, you can quickly tell which part of your system is the root cause and start working on it. This is especially important now that you have more moving parts in your system.
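A minimal version of this kind of per-component check might look like the following. The URLs are placeholders for your own endpoints, and a real monitoring service does this from many locations with a full browser rather than a bare HTTP request:

```python
# A minimal external probe: hit each component over HTTP and time the response.
import time
import urllib.request

def check(url, timeout=5):
    """Return (ok, seconds) for a single probe of `url`."""
    start = time.time()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except Exception:
        ok = False          # DNS failure, timeout, connection refused, 5xx...
    return ok, time.time() - start

# Probe each component separately so a failure points at the root cause.
components = {
    "homepage": "https://www.example.com/",
    "search":   "https://www.example.com/search?q=test",
    "cdn":      "https://cdn.example.com/health",
}

for name, url in components.items():
    ok, elapsed = check(url)
    print(f"{name:10s} {'OK' if ok else 'FAILED'} ({elapsed:.2f}s)")
```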
This is the Neustar Webmetrics monitoring dashboard, which gives you a big picture view into your system health, including uptimes for the past week, month, and year, and the current status and performance of each of your key transactions. Each of these monitoring services monitors a specific transaction of your site simulating a real user and notifies you if anything looks broken, while also gathering performance data that you can run reports on.
The monitoring can happen from over 100 locations around the world, as often as every minute, giving you a global perspective on your site’s performance and the health of your infrastructure.
Using the monitoring data you can see how your site performs over a period of time, and track the types of issues that your users experience. With these reports, you can go to your cloud providers and either get credit for downtime, or simply ask for help on improving your application performance. You can also share this data with any of your partners or coworkers to collaborate on whatever issues you have.
You can also drill into the performance of an individual page on your site, which is especially helpful when you have reason to believe that the cloud provider is the source of your performance problems.
The other side of monitoring involves monitoring the low level resources on your virtual servers. A lot of cloud providers include tools to monitor this type of thing, such as CPU and memory and network usage, so you should definitely take advantage of that.
This will help you watch for performance bottlenecks and catch problems before customers experience them.
This is a screenshot of Amazon CloudWatch measuring CPU utilization and disk access and network usage on your virtual servers. Other cloud providers have similar tools, or you can use the same tools you’re already using with your existing servers, such as Nagios, on the cloud servers.
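For a taste of what this low-level monitoring does, here is a standard-library-only sketch that samples the Unix load averages and flags any window over a threshold. The thresholds are arbitrary, and in practice you would ship these numbers to CloudWatch, Nagios, or a similar tool rather than print them:

```python
# Sketch of the in-server side of monitoring (Unix only: uses getloadavg).
import os

def load_alerts(thresholds=None):
    """Return the load-average windows that exceed their thresholds."""
    thresholds = thresholds or {"1min": 4.0, "5min": 3.0, "15min": 2.0}
    one, five, fifteen = os.getloadavg()
    samples = {"1min": one, "5min": five, "15min": fifteen}
    return {win: val for win, val in samples.items() if val > thresholds[win]}

alerts = load_alerts()
if alerts:
    print("Over threshold:", alerts)
else:
    print("Load looks normal")
```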
Once you have your monitoring in place, it’s not enough to simply be gathering data. You’ll need to have a process in place to deal with downtime events and customer issues and communication, which brings us to the next layer...
Process. This is the human element of the equation, which includes things like
training,
and escalation to the right people in your organization
and documentation on how to manage and troubleshoot your infrastructure now that it includes cloud servers.
It’s really not that different from how you manage your systems today, except that you have less direct access to the physical hardware, which means you need to rely on the cloud provider for help.
The first thing you’ll want to do is train your IT people on the cloud. They need to feel comfortable with the concept, and with how it fits into the architecture. There are a lot of good books and articles online that you can use to get your people up to speed. There are also a number of discussion forums and mailing lists that you can participate in online, asking questions as they come up.
Most importantly your IT people will need to know how to get help when they need it.
The type of help you can get from the cloud provider ranges from totally self-service to fanatical, so you need to understand your level of support, and how to take advantage of it. The last thing you want your IT people to be doing during a disaster is scrambling for contact information, or wasting time calling Rackspace when you don’t have that level of support.
One of your biggest friends will be the public health dashboard that hopefully your cloud provider offers, which shows you the real-time status of every service that they offer. This is a screenshot of Amazon’s public health dashboard, which shows that everything is functioning normally. If there was a problem right now, you’d see a yellow or red status light, and an explanation of what is going on. As an on-call person, if you have any reason to believe that the cloud is behaving badly, all you need to do is visit this page and see if they confirm it or not.
Part of your training will probably involve changes to your on-call process, since you’re changing some fundamental parts of your architecture. I don’t think this should be overly complicated, especially if you’ve done a good job with the documentation and training, and could be as simple as...
...training the operations or sysadmin staff on which issues should be directed to the cloud provider, and how they can go about contacting them. This way you don’t need to get engineering involved, which makes for happy engineers.
Before we move on from process, I wanted to also mention the importance of automation. One of the larger benefits of being able to control your infrastructure through code is that it’s a lot easier to automate work that is done over and over. Things like code releases, and scaling servers up and down, and patch updates. Everyone has always wanted to automate as much as possible, but with the cloud and virtualized hardware, it’s a lot easier to actually do it. The goal should be to automate everything, and let your IT people focus on higher level things. But as with anything new, you always have to ready for the unexpected, which brings us to the final layer...
Failover. This final layer is where you prepare for issues with your cloud, or your applications, or any of the other third parties that your system relies on. No matter how reliable your cloud provider is, one day things will fail, and all you can do is be prepared for those days.
The foundation for any failover strategy is to have reliable backups of your data and your applications. You basically should never assume that the cloud provider is perfect and won’t lose your data. Especially if that data is critical to your business. You can back it up locally behind your firewall, or to another cloud, or to some other third party service.
The other important part of backups is running drills. You need to verify backups by restoring them and running drills on them regularly. Just like with your existing system, you don’t want to be testing your backups the day you desperately need them. It’s really no different from what you’re probably doing today.
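A drill can be as simple as restoring a backup to a scratch location and verifying it byte for byte. This sketch uses temporary files so it is self-contained; a real drill would restore from your actual backup store into a real environment:

```python
# A minimal backup-and-restore drill with checksum verification.
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path):
    """Checksum a file so the restored copy can be verified."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    original = tmp / "customers.db"
    original.write_text("id,name\n1,Alice\n2,Bob\n")

    backup = tmp / "backup" / "customers.db"
    backup.parent.mkdir()
    shutil.copy2(original, backup)       # the "backup" step

    restored = tmp / "restore" / "customers.db"
    restored.parent.mkdir()
    shutil.copy2(backup, restored)       # the drill: actually restore it

    assert sha256(original) == sha256(restored), "restore drill failed!"
    print("Backup drill passed: restored copy verified")
```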
Now, let’s talk about what happens when something goes wrong. There are three main ways to handle downtime and failover.
The first is where you fail over within the same cloud. This happens when one of your virtual instances has a problem and needs to be taken down, or some data is corrupted and you refresh your system. This should be pretty quick and painless, and built into your automation system. You simply boot up a new virtual server and shut down the old one.
The second approach is to fail over to another cloud provider, if your main cloud provider is having system-wide issues. What’s nice about this strategy is that you don’t have to have any hardware sitting around waiting for an event, and ideally you can run the exact same architecture in the backup cloud. This isn’t at all easy, and it’s still a new concept, and it assumes that if one cloud is having issues the chances of another also having issues are low, which may not always be true.
But if you can find two clouds that are not related, and make sure your application works well in both, you’ll have a really solid failover strategy. This is a quick diagram of what it would look like to have two clouds running the same application, either at the same time, or when a failover happens and one of these blocks goes down. You simply route traffic to the other and keep humming along.
The third and last approach to handling a failover is to fail over to physical hardware that you own, inside your firewall, that you can manage directly. This approach is easy to understand, and easy to get buy-in on because it feels the safest. A lot of times companies start with this by simply replicating their architecture in the cloud, while keeping the existing system running locally, and so they automatically have this local backup ready to go. The problem obviously is that you have to keep unused hardware sitting around in your datacenter, which kills a lot of the cost benefits of moving to the cloud.
My guess is that the best approach is a combination of all three, where you store your data and a minimal infrastructure in-house, and plan to fail over into a different cloud for the majority of the work. This is a pretty new concept, and one of the questions you’ll need to answer is how to actually accomplish the failover.
We’ve found that one of the easiest and most robust ways to handle failing over across clouds or to your own infrastructure is to use DNS.
One of the products that we offer at Neustar is called Site Backer, which redirects traffic to a different location as soon as it detects a problem with the primary site. You can use this to point users to your secondary cloud, or your local backup system, as soon as the primary cloud goes down. Here’s a quick sample. Say your site is mysite.com and your visitor is trying to connect to it. The user’s machine queries your DNS server...
and gets back an IP that points to your primary cloud provider, say Rackspace.
The user would then hit your primary cloud. Then say that cloud goes down...
The user would then get back a different IP address, and this time be given the IP of your backup cloud service, say Amazon. Then if that goes down too,
you can have a third IP address that points to your local system. All of this is invisible to your user, or at least as invisible as it can be with a whole datacenter going down, and would happen automatically. You can use this same service for load balancing traffic during normal operations, or if you want to run multiple clouds at the same time.
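The logic the DNS service performs here can be sketched as "answer with the first healthy endpoint." The names, the IPs (taken from the reserved documentation ranges), and the health-check stub below are all placeholders for illustration:

```python
# Sketch of DNS-based failover: answer queries with the first healthy endpoint.

ENDPOINTS = [
    ("primary-cloud", "203.0.113.10"),   # e.g. your main cloud provider
    ("backup-cloud",  "203.0.113.20"),   # e.g. a second, unrelated cloud
    ("local-dc",      "198.51.100.5"),   # your own in-house system
]

def resolve(is_healthy):
    """Return the IP of the first healthy endpoint, falling back to the
    last one as a best effort if everything is down."""
    for name, ip in ENDPOINTS:
        if is_healthy(name):
            return ip
    return ENDPOINTS[-1][1]

# Normal operation: the primary is up, so users get its IP.
print(resolve(lambda name: True))

# Primary down: traffic silently moves to the backup cloud.
print(resolve(lambda name: name != "primary-cloud"))
```

The same ordering trick doubles as load balancing if you rotate or weight the healthy entries instead of always returning the first.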
So that’s the four layers of trust. Educate, monitor, process, and failover.
Realistically, there are always going to be things you don’t want to put in the cloud, no matter how much you trust it, but the way I’ve heard it described is that we’ll move from asking “What can we put in the cloud?” to...
...“What can’t we put in the cloud?” And the more things we put in the cloud, the more important that trust is going to be.
Especially now that the cloud is becoming such a surprisingly important part of our everyday lives, consumers are going to expect things to just work. It’s up to us to make that happen, and to help us all love the cloud.
-- The cloud is playing a larger role in consumers’ lives
--- devices tied to the cloud (iPhone, iPad)
--- SaaS (Google Apps, Gmail, Facebook, Twitter)
--- services (Yelp, Siri, Google Maps)
-- Reference
--- http://news.cnet.com/8301-19413_3-10133487-240.html
--- http://lifehacker.com/400268/do-you-trust-the-cloud
Now before I wrap up, two things that Neustar can help you with are external performance monitoring, from over 100 locations around the world (including a handful in China), and industry-leading managed DNS with points of presence all over the world.
Thank you, and if you have any questions here’s my contact information.