My customers are internal Amazon employees. Our internal IT apps are probably a lot like yours:
Financial Systems: Accounting, Shared Services, Financial Planning & Analysis, Tax
HR Systems: Recruiting, on-boarding, training/development, payroll
Developer Tools: Service lifecycle mgmt, shared libraries, source control, build & deploy, change mgmt, issue tracking
Knowledge Management: Intranet, search, communities, blogs, wiki, collaboration
Employee Tools: Laptops, phones, email, calendar, remote access
In total, we have about 200 applications. All of these systems process and store data that we classify as ‘private’ (I’ll speak to our data classification policy in just a bit).
What’s motivating us to move to AWS? Clearly, it’s about reducing our Total Cost of Ownership. Dealing with computing hardware infrastructure isn’t the core competency of our IT shop: hardware vendor relationships, negotiations, purchasing, shipping, receiving, racking, cabling, powering, cooling, securing, etc. Handing the “muck” of hardware provisioning over to a trusted provider sounds good. I would prefer to have the ‘easy’ button: provision hardware with the push of a button. That all sounds good, and you’ve been hearing that all day. But we found ourselves asking: is this really just about cost reduction? What else do we get? And what’s really motivating us to move? [JENN NOTES: At this point, I acknowledged that most of the customers in the room were already using AWS and many of them had already heard the ‘marketing’ speak about AWS, so I wanted to push through to the next slides.]
We want to unleash innovation. We all know that hiring great engineers is arguably the most important thing you can do. So once you’ve done that, what’s the best thing you can do for them? Empower them to build. Free them to innovate. We have quickly learned that infrastructure on demand is a powerful catalyst. When you remove barriers and make it easy to build, engineers are more motivated and more inspired. With infrastructure on demand and the freedom to try, they “just do it”. Not only do they just do it, they talk about it and show it to others, and then others get excited. If it doesn’t work, tear it down: the total cost could be less than a pizza.
What else do we get? We want to reduce hardware administration overhead. Enterprise IT is a lot more than software innovation; we all know this. Operations plays a big part. Dial-tone-like availability is expected. Operational efficiency is paramount. Heck, it’s complex. At Amazon, we run on leased hardware. We’re swapping out hardware continuously as leases turn over. Imagine what we could have produced had we not been spending so much time on hardware management. In my previous role running IT Ops at Sun, I had over 1000 people (employees and contractors) running 9 data centers around the world and dozens of “computer rooms”. I had outsourced about half of the data centers, but the outsourcing model is really just taking the data centers that you already own and letting an outsourcer run them for you. I was still exposed to, or worse, deciding, how many people were running them, what type of people, and how often the HW was refreshed. All the details of running the data centers were things that I had to know, because I was usually paying for every little service, and if I didn’t stay on top of it, the costs would spiral out of control. DO WE DO THIS TODAY? Another motivating factor for our move that helps run our operations is AWS Auto Scaling. Auto Scaling allows you to automatically scale your Amazon EC2 capacity up or down according to conditions you define. As your defined thresholds are breached, EC2 instances are launched or terminated as needed. You can seamlessly scale up during demand spikes to maintain performance, or scale down automatically during demand lulls to minimize costs. Many of the engineers in our group thought “that’s great, but many of our apps are ‘steady state’; they don’t see significant spikes or troughs.” Regardless, those apps still need to be available, so we use Auto Scaling to automate the response to host failure.
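To make the ‘steady state’ case concrete, here is a minimal sketch of the idea, assuming a fixed-size group where the desired capacity never changes and the only job is to replace failed hosts. This is a toy in-memory model, not the AWS Auto Scaling API; the class name `FixedSizeGroup` and the host ids are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FixedSizeGroup:
    """Toy model of a steady-state group: capacity is fixed, failures are replaced."""
    desired: int
    healthy: set = field(default_factory=set)
    _ids: count = field(default_factory=lambda: count(1))

    def fail(self, host: str) -> None:
        """Simulate a host failure by removing it from the healthy set."""
        self.healthy.discard(host)

    def reconcile(self) -> list:
        """Launch hosts until the healthy count matches desired; return the new ids."""
        launched = []
        while len(self.healthy) < self.desired:
            host = f"host-{next(self._ids)}"  # hypothetical id scheme
            self.healthy.add(host)
            launched.append(host)
        return launched


group = FixedSizeGroup(desired=3)
initial = group.reconcile()        # first pass launches all three hosts
group.fail(initial[0])             # one host dies
replacements = group.reconcile()   # exactly one replacement is launched
```

The point of the sketch is that no demand signal is involved: the same reconcile step that handles scaling also restores capacity after a failure.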
Visibility into hardware utilization rates: At Amazon, we look very closely at our hardware utilization rates: hardware used divided by hardware held. We work very hard to understand our hardware utilization patterns. The easiest way to increase utilization rates is to release unused hardware. One of the steps we took to reduce our TCO was to move to Xen technology, and it is helping us prepare for our migration to AWS. We’re also leveraging AWS Auto Scaling to hold just the capacity we need. Just like the bill from your utility company, we’re getting reliable, auditable, metered usage data from the AWS platform sent directly to service owners. Giving direct visibility encourages action and a sense of ownership of the data, and ultimately drives improved software efficiency. [JENN NOTES: I also talked about our current utilization rates compared to industry standard. Then someone in the audience raised their hand and asked what our utilization rates were before we started moving to virts. I had no idea.]
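As a small worked example, taking the utilization rate as hardware used divided by hardware held (the usual definition), releasing unused hardware raises the rate without any change in workload. The host counts below are made up purely for illustration.

```python
def utilization_rate(used_hosts: float, held_hosts: float) -> float:
    """Utilization = hardware used / hardware held, as a fraction between 0 and 1."""
    if held_hosts <= 0:
        raise ValueError("held_hosts must be positive")
    return used_hosts / held_hosts


# Hypothetical numbers: 40 hosts' worth of work on a fleet of 100 hosts.
before = utilization_rate(40, 100)

# Same workload after releasing 50 unused hosts.
after = utilization_rate(40, 50)
```

Here `before` is 0.4 and `after` is 0.8: the work did not change, only the amount of hardware held.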
When I ran the data centers for Sun, the provisioning process was cost optimized, which in that environment meant in practice that we held no HW in reserve and every system was custom designed. Lead times were approximately 6 months from request to provisioned and ready for application deployment. Even in an escalated case, we had 8-10 week lead times. But with AWS the lead time is zero.
Our starting point, which is probably just like everyone else’s. We are:
A classic IT shop with a secure internal network and fixed capacity in owned datacenters
Employees running apps
Engineering teams deploying and supporting apps within our own firewall, on a mix of dedicated hardware and virts (on Xen technology)
The direction we’ve taken is to extend our internal network to the cloud, utilizing a VPC (Virtual Private Cloud) to maintain our security and privacy standards.
When we started the program last year, we came up with some core tenets or guiding principles for the program.
Amazon is an enterprise customer of AWS: we will drive requirements into AWS accordingly.
We are a software vendor: wherever possible, we build our tools to benefit all AWS enterprise customers, not Amazon-specific solutions.
We will not take any steps backwards on our key metrics: we will meet or exceed our existing availability and latency SLAs when moving to AWS.
Customer trust is maintained by our strict adherence to enterprise security requirements: we will drive requirements to ensure compliance with enterprise security and governance standards.
Amazon values frugality: we will reduce the cost of managing our capacity.
There are many customer-facing applications across Amazon that have run for a long time on AWS, and more every month. The following are some of the best practices we’re following in my group, and that we would recommend to others, as we go through the ongoing process of moving substantially all of our apps to AWS.
Phase 1: Pre-Migration Readiness. Before we started moving applications over, the first thing we did was set up a program infrastructure and hire a Technical Program Manager to run the program and manage dependencies (IT Security, Networking, etc.). We also needed a single voice for priorities and requirements into the AWS organization. Then we did a rough system assessment: as stated earlier, we have about 200 applications for consideration, so we did rough cuts to get a sense of what was ahead of us. For our 3rd party vendors, we immediately started looking at licensing and AWS certifications.
Phase 2: Experiment and Get Our Hands Dirty. Start learning and educating yourself. Get an account. Go. We quickly started using S3 for data backups. Then we identified 2 pilot apps to deploy in EC2 (via a VPC) to understand operational procedures, etc.
Phase 3: Phased Migration. We’ve set aggressive internal goals around migration for both internal applications and 3rd party applications to give us momentum on a multi-year phased application approach. In addition, we always look at AWS first for new development. Above all, we’ve looked for places where we can leverage work across the enterprise, making it easy for all Amazonians to deploy to AWS.
[JENN NOTES: I spoke to this slide at a high level and have some detail slides to follow. This probably needs another update based on the migration paper from Jinesh. I also got the feedback from Jinesh that I used “experiment” too much; it made it sound like it wasn’t ready for prime time.]
Phase 1: Pre-Migration Readiness. What we assessed for each application:
Data Classification: Top Secret, Secret, Private, Public
Application Criticality (Availability, SLAs): Mission Critical, Business Critical, Business Operational, Administrative
Dependencies: we found that applications that had componentized architectures or were SOAs were easier to move. Applications with complicated dependencies could see big latency impacts and could be rewritten to be faster.
Compliance Requirements (SOX, PCI)
HW Component Usage (Disk, I/O, Memory)
Current TCO
We also looked at 3rd party vendors (transition).
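The assessment criteria above can be captured as a simple record per application. The sketch below is one possible shape, not our actual tooling: the type names, the `pilot_candidates` filter, the sample app names, and the criticality values assigned to them are all hypothetical. The filter rule (private-or-lower data, componentized, no compliance scope, least critical first) is an assumption chosen to mirror the ‘low risk first’ idea.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataClass(IntEnum):
    PUBLIC = 0
    PRIVATE = 1
    SECRET = 2
    TOP_SECRET = 3


class Criticality(IntEnum):
    ADMINISTRATIVE = 0
    BUSINESS_OPERATIONAL = 1
    BUSINESS_CRITICAL = 2
    MISSION_CRITICAL = 3


@dataclass(frozen=True)
class AppAssessment:
    name: str
    data_class: DataClass
    criticality: Criticality
    componentized: bool      # componentized / SOA apps were easier to move
    compliance: tuple = ()   # e.g. ("SOX",) or ("PCI",)


def pilot_candidates(apps):
    """Hypothetical rule: low data sensitivity, componentized, no compliance scope."""
    picks = [
        a for a in apps
        if a.data_class <= DataClass.PRIVATE and a.componentized and not a.compliance
    ]
    # Least critical first, then by name for a stable order.
    return sorted(picks, key=lambda a: (a.criticality, a.name))


# Made-up inventory for illustration only.
apps = [
    AppAssessment("mailing-list-generator", DataClass.PRIVATE,
                  Criticality.BUSINESS_OPERATIONAL, True),
    AppAssessment("build-metadata-service", DataClass.PRIVATE,
                  Criticality.ADMINISTRATIVE, True),
    AppAssessment("general-ledger", DataClass.SECRET,
                  Criticality.MISSION_CRITICAL, False, ("SOX",)),
]
names = [a.name for a in pilot_candidates(apps)]
```

With this made-up inventory, `names` contains the two low-risk services and excludes the SOX-scoped ledger.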
We are collaborating with 3rd party software vendors and AWS business development to:
Adapt license models to the paradigm of elastic capacity
Expand AWS support of 3rd party vendors’ system requirements (Microsoft support of Windows OS)
Test AWS hosted systems against vendors’ system performance benchmarks
Work with your 3rd party vendors.
Phase 2: Experiment and Get Our Hands Dirty. The first thing we did was get an account and get going. We started using S3 for backups. Then we identified 2 pilot apps to deploy in EC2 (via a VPC) to validate latency, understand operational procedures, etc. The pilot apps were considered low risk: they were simple services handling data classified as private. One was an HR system that generates mailing lists off of reporting hierarchies, and the other was a metadata service used by our software build systems. Once we had the two pilot apps running, we decided to move more, which brings us to phase 3.
Phase 3: Phased Migration. We continue to refine our application assessments, looking at criticality and compliance requirements. We’ve set aggressive internal goals around migration for both internal applications and 3rd party applications; these goals give us momentum and are based on a multi-year phased application approach. What we found was that as we learned more, we wanted to share more and make it even easier for teams across Amazon to deploy to AWS.
As an example of leveraging synergies across the organization, the first thing we did was make encryption easy. Amazon takes care of the physical security of the data center; it’s our job to encrypt our data. We built a client library that is used to store data according to our own Amazon Security Data Handling Policy. The library is designed to be easy to use, requiring minimal effort from developers to get all of the functionality that the Security team requires for data handling, while maintaining the scalability that many services need in their day-to-day operations. S3 gives us a really simple file storage solution, such as for automated data backups. We have a simple website to enable users to put/get files and create hyperlinks to them.
Internal web application to host internal video for Amazonians: our internal YouTube. Videos include tech talks, presentations, training, and company events. The old solution required manual intervention by the audio/video team to encode and post QuickTime videos. We had 2 software engineers in our KM organization who wanted to fix the problem. They went off and, in 3 weeks’ time, had completely refactored how employees post and download internal videos: self-service publication, automatic encoding, and automatic publication.
Our Web front end was launched on existing hardware. Videos stream within a Flash-based embedded player.
Encoding technology used in Broadcast: FFmpeg (http://www.ffmpeg.org/)
Automatic encoding pipeline to re-render legacy and new video, hosted within Amazon EC2
Over 900 hours of video re-encoded
Ordinarily, 900 hours of video * 3 hours of encoding per hour of video = 2,700 hours, or about 112 days
With Amazon EC2, we were able to parallelize encoding and finish within one week
Storing and serving “unlimited” video using Amazon S3
Massive productivity increase: 2 software engineers, 3 weeks, 1 application
Engineers empowered to build the solution on their own, no requisition process involved
[JENN NOTES: one of the customer case studies was almost exactly the same: video rendering using parallel encoding in EC2]
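The encoding numbers can be checked with a little arithmetic. This sketch assumes the “3 hours to encode” figure means 3 hours of encoding per hour of video (the reading under which 900 hours works out to about 112 days), and it assumes the work splits perfectly across instances with no overhead. The function names are hypothetical, and the instance counts are derived from those assumptions, not taken from the actual project.

```python
import math

VIDEO_HOURS = 900
ENCODE_HOURS_PER_VIDEO_HOUR = 3  # assumption: a ratio, not a per-file figure

# Total compute needed, and how long a single encoder would take.
total_encode_hours = VIDEO_HOURS * ENCODE_HOURS_PER_VIDEO_HOUR  # 2700 hours
serial_days = total_encode_hours / 24                           # 112.5 days


def wall_clock_days(workers: int) -> float:
    """Elapsed days with `workers` parallel encoders, assuming perfect splitting."""
    return total_encode_hours / workers / 24


def workers_needed(target_days: float) -> int:
    """Smallest number of parallel encoders that finishes within `target_days`."""
    return math.ceil(total_encode_hours / (target_days * 24))


five_day_fleet = workers_needed(5)   # encoders needed to finish in 5 days
one_week_fleet = workers_needed(7)   # encoders needed to finish in 7 days
```

Under these assumptions, roughly two dozen instances running in parallel are enough to bring the job from months down to days, which is why the pay-for-what-you-use model fits this kind of one-off batch so well.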
Five applications running on Remedy v7.1 Mid-tier. The mid-tier is Remedy’s out-of-the-box “web tier”. The license model was adapted so that it is not bound to hardware. These 5 web applications all run on the mid-tier and interact with the BMC Remedy AR System application tier. [JENN NOTE: one of the customer case studies was this exact example, with one of the 3 data centers being hosted by EC2. Felt like a let-down example of a case study! I spoke to the fact that since they had seen it before, they should consider starting out this way.]
We were comfortable in our own firewall. We needed everyone to be comfortable in the cloud. Get security and auditors involved early on. Everyone needs to understand our access control policies and how we handle data security. We changed the question from “is it secure to run in the cloud?” to “what do we need to feel secure in the cloud?” Make it easy. You need to invest time. I already talked about how we made encryption easy. We also looked at integrating our own software deployment system. When we started, Amazon’s software deployment system and infrastructure automation tools weren’t fully integrated with EC2. We want to enable Amazon service owners to easily and securely migrate their applications to EC2 and to take full advantage of AWS’s cloud management offerings, including the new Auto Scaling capability. Service owners wishing to move to the cloud will be able to simply click on a “Move to EC2” button, and the new user interface walks them through a few simple steps to set up their EC2 configuration, create an Auto Scaling group, and execute their first cloud deployment.
A PRACTICAL APPROACH TO MIGRATING INTERNAL IT APPS TO THE AWS CLOUD Jerry Hunter, VP Amazon IT
CASE STUDY 1: BROADCAST – THE AWS VERSION
[Slide diagram: Users, Web Front-End, MySQL, Rendering Job Manager, Dynamically Scaled Video Rendering, Unlimited Video Storage, Encryption, Amazon Internal Network. Callouts: 900 hrs of video encoded in 5 days (would have taken 112 days); 2 software engineers, 3 weeks.]
CASE STUDY 2: BMC REMEDY MID-TIER
[Slide diagram: Amazon EC2 instances hosting part of the Remedy mid-tier server fleet, spread across three data centers. Labels: Employees, Load Balancer, DC1, DC2, DC3, Amazon VPC, Amazon Internal Network.]