Migrating enterprise workloads to AWS


Published in: Technology
  • Module objectives. By the end of this training you will be able to: identify the Oracle and AWS alliance timeline; describe how to identify opportunities that can be solved by AWS products and services, and what other customers have done before; verify some common best practices for using Oracle and AWS products and services; and describe the support and licensing policies and other online resources.
  • Now that you know some of the main problems our customers are solving on AWS, we’d like to talk a bit about why they choose AWS cloud
  • Cloud computing is a better way to run your business. The cloud helps companies of all sizes become more agile. Instead of running your applications yourself, you can run them on the cloud, where IT infrastructure is offered as a service, like a utility. With the cloud, your company saves money: there are no up-front capital expenses, as you don’t have to buy hardware for your projects. The massive scale and fast pace of innovation of the cloud drive the costs down for you. In the cloud, you pay only for what you use, just like electricity. The cloud can also help your company save time and improve agility. It’s faster to get started: you can build new environments in minutes, as you don’t need to wait for new servers to arrive. The elastic nature of the cloud makes it easy to scale up and down as needed. At the end of the day you have more resources left for innovation, which allows you to focus on projects that can really impact your business, like building and deploying more applications. “With the high growth nature of our business, we were looking for a cloud solution to enable us to scale fast. Think twice before buying your next server. Cloud computing is the way forward.” - Sami Lababidi, CTO, Playfish
  • Amazon Web Services provides highly scalable computing infrastructure that enables organizations around the world to requisition compute power, storage, and other on-demand services in the cloud.  These services are available on demand so a customer doesn’t need to think about controlling them, maintaining them or even where they are located. Our approach has always been to be a customer focused company.  We constantly look to develop services in line with the needs of our customers to make sure they get the flexibility and usability out of the service that they need to be successful. 
  • Without getting into the industry debate about public vs. private cloud, it’s clear that most cloud benefits cannot be realized with on-premise virtualization technologies. In the on-premise virtualization model, you often have to buy expensive hardware and software, which virtually eliminates the cost benefits of cloud computing. Although on-premise virtualization allows you to quickly provision new servers, your ability to scale up is limited to your physical infrastructure. You still need to buy physical servers to grow. If you want to scale down, you won’t see significant cost savings, as you already paid for the hardware. These limitations of the on-premise virtualization model impact your ability to innovate fast and free up money to invest in new projects.
    Definitions: NAS is file based; SAN is block based. MPLS, short for Multiprotocol Label Switching, is an IETF initiative that integrates Layer 2 information about network links (bandwidth, latency, utilization) into Layer 3 (IP) within a particular autonomous system (or ISP) in order to simplify and improve IP-packet exchange. MPLS gives network operators a great deal of flexibility to divert and route traffic around link failures, congestion, and bottlenecks.
  • Many architecture diagrams have all the latest and greatest services in them along with a fully scalable, available, loosely coupled, fault tolerant, and multi-tier design. In some cases, customers are moving a very basic implementation with 5 to 20 users. This is the case for the architecture shown above. It is an Oracle PeopleSoft implementation with minimal availability and DR requirements. It is a lightweight and low-cost solution for hosting PeopleSoft on AWS. The things that stand out about the architecture are:
    1. No load balancing, as there are only 5 concurrent online users.
    2. No long-term archiving, as there are no regulatory compliance needs.
    3. No auto scaling for the application tier, as the application server can be recovered manually using the Amazon EC2 instance snapshots.
    4. No automatic HA/Multi-AZ for the database tier, as RDS backups can be used to recover the Oracle database.
    5. No session recovery, as there are limited online transactions and the users can resubmit a failed session.
    PeopleSoft is hosted on an Amazon EC2 instance. This is an Amazon Elastic Block Store (EBS) based Amazon EC2 Large instance with 7.5 GB of memory and 4 Amazon EC2 Compute Units. The database is hosted on an Amazon RDS Oracle instance. This is an Amazon EBS based Amazon RDS Large instance with 7.5 GB of memory and 4 Amazon EC2 Compute Units. Amazon RDS is backed up automatically, and the backup schedule can be configured. A backup snapshot can be taken at any time, but I/O will be suspended for a few minutes unless Multi-AZ is set for Amazon RDS. Amazon EBS snapshots will be used for application server high availability and potentially disaster recovery. The snapshots can be used in the same region in a different AZ, or copied to another region for additional protection. AWS Spot Instances (spare Amazon EC2 instances that you bid on) can be used when there are extremely large batch files to process and load into the database.
  • On the other end of the spectrum from the minimal PeopleSoft configuration is a highly available and scalable Oracle E-Business Suite implementation. These implementations can be complex and expensive. Dense peak periods and wild swings in traffic patterns typically result in low utilization rates of expensive hardware. Amazon Web Services provides the reliable, scalable, secure, and high-performance infrastructure required for Oracle E-Business Suite while enabling an elastic, scale-out and scale-down infrastructure to match IT costs in real time as customer traffic fluctuates.
    The database server is a High-Memory Quadruple Extra Large instance with 68.4 GB of memory and 8 virtual cores (26 EC2 Compute Units). The application server instances also need ample memory, as a minimum of 6 GB of memory is recommended for Oracle E-Business Suite; we will use High-CPU Extra Large instances, which have 7 GB of memory and 8 virtual cores. The HTTP servers can be High-CPU Medium instances with 1.7 GB of memory and 2 virtual cores.
    The user's DNS requests are served by Amazon Route 53, a highly available Domain Name System (DNS) service. Network traffic is routed to infrastructure running in Amazon Web Services. The HTTP requests are first handled by Elastic Load Balancing, which automatically distributes incoming application traffic across multiple Amazon EC2 instances across AZs. It enables even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic.
    The Oracle web, application, and database servers are deployed on Amazon EC2 instances. This will be a custom AMI using Oracle Enterprise Linux 5.3 and Oracle E-Business Suite 12.1.3. Amazon Spot Instances or Auto Scaling can be used to support batch processing. Web and application servers are deployed in an Auto Scaling group. Auto Scaling automatically adjusts your capacity according to conditions you define. This ensures that the number of Amazon EC2 instances increases seamlessly during demand spikes.
    Oracle database backups and the batch flat files for integration with the corporate data center are stored on Amazon S3. The storage volumes for the application servers will be standard Amazon EBS volumes. The Oracle database storage volumes will be Amazon EBS PIOPS volumes. These provide up to 1000 IOPS per volume and will be striped using Oracle ASM. Spot Instances can be used to handle large batch loads.
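The striping note above (PIOPS volumes at up to 1000 IOPS each, striped with Oracle ASM) implies a simple sizing calculation. The following is a minimal sketch, assuming that striping aggregates IOPS linearly across equally provisioned volumes, which is an idealization; the function name `volumes_for_target_iops` is hypothetical and is not part of any AWS or Oracle tool.

```python
import math


def volumes_for_target_iops(target_iops: int, iops_per_volume: int = 1000) -> int:
    """Return the smallest number of equally provisioned volumes whose
    combined IOPS meet the target, assuming ideal linear aggregation."""
    if target_iops <= 0 or iops_per_volume <= 0:
        raise ValueError("IOPS values must be positive")
    return math.ceil(target_iops / iops_per_volume)


if __name__ == "__main__":
    # The default of 1000 IOPS per volume is the per-volume figure from the notes.
    for target in (1000, 2500, 8000):
        print(target, "IOPS ->", volumes_for_target_iops(target), "volume(s)")
```

Real striped throughput also depends on instance bandwidth to EBS and on the workload, so a result like this is only a starting point for testing.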
  • 6. IDS: An intrusion detection system (IDS) is a device or software application that monitors network or system activities for malicious activities or policy violations and produces reports to a management station. Some systems may attempt to stop an intrusion attempt, but this is neither required nor expected of a monitoring system.
    7. IPS: Intrusion prevention systems (IPS), also known as intrusion detection and prevention systems (IDPS), are network security appliances that monitor network and/or system activities for malicious activity. The main functions of intrusion prevention systems are to identify malicious activity, log information about this activity, attempt to block or stop it, and report it. Intrusion prevention systems are considered extensions of intrusion detection systems because they both monitor network traffic and/or system activities for malicious activity.
    A host-based intrusion detection system (HIDS) is an intrusion detection system that monitors and analyzes the internals of a computing system, and in some cases the network packets on its network interfaces (just like an NIDS). A host-based IDS monitors all or parts of the dynamic behavior and the state of a computer system. HIDS was first designed for the mainframe. HIDS uses agents (sometimes referred to as sensors) located on each host; they are typically installed on a machine that is deemed to be susceptible to possible attacks. The term “host” refers to an individual computer or virtual host, which means that a separate sensor is needed for every machine or virtual host. Sensors work by collecting data about events taking place on the system being monitored. This data is recorded by the operating system in audit trails. Therefore, HIDS is very log intensive.
    Network-based intrusion detection systems (NIDS) offer a different approach. NIDS collects information from the network itself rather than from each separate host. They operate essentially on a “wiretapping” concept (network taps). Information is collected from the network traffic stream as data travels on the network. The intrusion detection system checks for attacks or irregular behavior by inspecting the contents and header information of all the packets moving across the network. The network sensors come equipped with “attack signatures”, which are rules on what constitutes an attack, and most network-based systems allow advanced users to define their own signatures. This method is also known as packet sniffing, and it allows the sensor to identify hostile traffic.
    I still don't believe that we are injecting a 0/0 route, but I haven't personally tried setting up a no-BGP tunnel to an ASA. I will try to find one to test and reach out to the VPC team to ask.
    On the HIPS/HIDS question, the typical FUD is around additional resources being used by the HIPS agent, i.e. Amazon wants you to run HIPS so that you need to run more instances (and pay more) because the IPS agent will use a lot of resources. In fact the HIPS solution we recommend, Trend Micro Deep Security, is really lightweight because it only loads the signatures that are required for that instance, based on the software and OS that are running. It also has the advantage of being able to stop attacks as well as reducing false positives, since the signature set is automatically tuned for that particular instance. This is a huge benefit in my opinion, because typical NIDS create a great deal of noise and thus typically no one ever looks at the output, resulting in a lower security posture in many cases. Also, if they really want NIDS, the Alert Logic Threat Manager product is fairly lightweight. It does add network overhead, but since few instances are ever 100% network bound, the additional bandwidth has a negligible impact.
    Cisco ASA and SonicWall: dedicated device for AWS VPC. When the VPN is configured on the AWS side, it generates an ACL that the tunnel requests; the ACL needs to be on both devices, and then all traffic on that device will go only to AWS. When BGP is available this is not an issue. It is only an issue when using an ASA (specific routes).
    Migrate R5 demo application. What is required to be Active/Active: how to use shopping cart session data (DynamoDB), AZ to AZ using ELB, Auto Scaling, Route 53. The database is only running in one AZ. How do they manage?
    ·       How should the specific application design be modified to utilize AWS for items such as shared data, shopping carts, and content delivery (S3)?
    ·       Requires an application architect resource to provide direction to the THG development team to modify application code to be Active/Active.
  • Physical Security
    Amazon has many years of experience in designing, constructing, and operating large-scale datacenters. This experience has been applied to the AWS platform and infrastructure. AWS datacenters are housed in nondescript facilities. Physical access is strictly controlled both at the perimeter and at building ingress points by professional security staff utilizing video surveillance, intrusion detection systems, and other electronic means. Authorized staff must pass two-factor authentication a minimum of two times to access datacenter floors. All visitors and contractors are required to present identification and are signed in and continually escorted by authorized staff. AWS only provides datacenter access and information to employees and contractors who have a legitimate business need for such privileges. When an employee no longer has a business need for these privileges, his or her access is immediately revoked, even if they continue to be an employee of Amazon or Amazon Web Services. All physical access to datacenters by AWS employees is logged and audited routinely.
    Network Security
    ·       Distributed Denial of Service (DDoS): standard mitigation techniques in effect
    ·       Man in the Middle (MITM): all API endpoints protected by SSL
    ·       IP Spoofing: prohibited at host OS level
    ·       Unauthorized Port Scanning: violation of TOS; detected, stopped, and blocked
    ·       Packet Sniffing: promiscuous mode ineffective; protection at hypervisor level
    Storage Device Decommissioning
    ·       Uses techniques from DoD 5220.22-M (“National Industrial Security Program Operating Manual”) and NIST 800-88 (“Guidelines for Media Sanitization”)
    ·       Ultimately, all devices are degaussed and physically destroyed
    Virtual Memory and Local Disk
    ·       Proprietary disk management prevents one instance from reading the disk contents of another
    ·       Disk is wiped upon creation
    ·       Disks can be encrypted by the customer
    AWS Third-Party Attestations, Reports, and Certifications (AWS environment)
    ·       Service Organization Controls (SOC) reports: SOC 1 Type II (SSAE 16/ISAE 3402/formerly SAS70), SOC 2 Type II, SOC 3
    ·       Payment Card Industry Data Security Standard (PCI DSS) Level 1 certification
    ·       ISO 27001 certification
    ·       FedRAMP(SM)
    ·       DIACAP and FISMA
    ·       ITAR
    ·       FIPS 140-2
    Additional information is available at https://aws.amazon.com/compliance/.
    Customers have deployed various compliant applications: Sarbanes-Oxley (SOX), HIPAA (healthcare), FedRAMP(SM) (US public sector), FISMA (US public sector), ITAR (US public sector), DIACAP MAC III Sensitive IATO.
  • Oracle ASM disk groups provide three types of redundancy: normal, high, and external. With normal and high redundancy, files are replicated within the disk group. With external redundancy, ASM does not provide any redundancy for the disk group. When setting up ASM for a group of volumes, we recommend using external redundancy, since Amazon EBS volumes are already redundant within an Availability Zone. Oracle ASM best practices, like having different disk groups for data and log files and for work and recovery areas, also apply on Amazon EBS. Because this architecture is targeted at a medium-sized enterprise-class database, we recommend using fewer than 10 total volumes.
    To provide a benefit, a Provisioned IOPS volume must maintain an average queue length (rounded up to the nearest whole number) of 1 for every 200 provisioned IOPS, averaged over a minute. If the queue length is less than 1 per 200 provisioned IOPS, your volume will not consistently deliver the IOPS that you've provisioned. Setting the queue length too far above the recommended setting won't affect the IOPS your volume delivers; however, per-request latencies will increase. For a volume with 500 Provisioned IOPS, the average queue length must be 3. If the average queue length is less than 3 for this volume, you aren't consistently sending enough I/O requests.
    Instance store:
    ·       Zero network overhead; local, direct-attached resource
    ·       No network variability
    ·       Not optimized for random I/O; generally better for sequential I/O
    ·       Root volume and data volume are lost on physical disk failure, or when the instance is stopped or terminated
    ·       Ideal for storing temporary data like buffers, caches, scratch data, and other temporary content, or for data that is replicated across a fleet of instances, such as a load-balanced pool of web servers
    Maintain a number of pending I/O requests to get the most out of your Provisioned IOPS volume. The volumes must maintain an average queue length of 1 (rounded up to the nearest whole number) for every 200 provisioned IOPS in a minute:
    ·       Maintain a queue depth of 10 for a 2,000 Provisioned IOPS volume
    ·       Maintain a queue depth of 3 for a 500 Provisioned IOPS volume
    Example: a 2,000 Provisioned IOPS volume can handle 2,000 16 KB reads/writes per second, or 1,000 32 KB reads/writes per second, or 500 64 KB reads/writes per second. You will get a consistent 32 MB/sec throughput (with 16 KB or larger I/Os). If an index creation sends 32 KB I/Os, IOPS becomes 1,000 and you still get 32 MB/sec throughput. On a best-effort basis, you may get up to 40 MB/sec throughput.
    Benchmarking tools:
    ·       fio (Linux, Windows): for benchmarking I/O performance. (Note that fio has a dependency on libaio-devel.)
    ·       Oracle ORION (Linux, Windows): for calibrating the I/O performance of storage systems to be used with Oracle databases.
    ·       SQLIO (Windows): for calibrating the I/O performance of storage systems to be used with Microsoft SQL Server.
    File systems: we like ext3/4, but we love XFS.
    ·       High performance, consistent
    ·       Robust, with lots of options for tweaking and adjusting as needed
    ·       Our favorite mount options (your mileage may vary): inode64, noatime, nodiratime, attr2, nobarrier, logbufs=8, logbsize=256k, osyncisdsync, nobootwait, noauto
    ·       Yields great performance, reduces unnecessary writes, stable
    ·       We like ZFS a lot too, but we want to see more runtime on Linux first; FreeBSD/ZFS would be a fine choice
    ·       However: test your workload! File systems behave differently under different workloads
    An EC2 instance comes with a certain amount of “local” storage, which is ephemeral. Any data placed on those devices will not be available after that instance is terminated by the customer, or if the underlying hardware fails, which would cause an instance restart to happen on a different server. This characteristic makes instance storage ill-suited for persistent database storage. AWS offers a storage service called Amazon EBS (Elastic Block Store), which provides persistent block-level storage volumes. Amazon EBS volumes are off-instance storage that persists independently from the life of an instance. Amazon EBS volumes are designed to be highly available and reliable. Amazon EBS volume data is replicated across multiple servers in an Availability Zone (datacenter) to prevent the loss of data from the failure of any single component. For all these reasons, we recommend using EBS for data files, log files, and the flash recovery area. Using ephemeral storage intelligently can boost performance: it can be used for many kinds of temp files and for regularly backed-up static files.
    For high I/O workloads, an alternative to Provisioned IOPS EBS volumes is to use High I/O instances, which contain SSD drives as internal storage and address the most demanding database workloads. The High I/O Quadruple Extra Large instance can provide up to 120,000 random read IOPS and 85,000 random write IOPS. The High Memory Cluster Eight Extra Large instance offers 244 GB of memory in addition to 240 GB of local SSD storage. Note, however, that this SSD storage is internal to the instance and will be lost if the instance is stopped or if the underlying hardware fails. When using this type of storage for databases, you should make sure that you have a solid strategy to avoid loss of data, for example by frequently backing up your data to Amazon S3. In addition to storage performance, High I/O and High Memory Cluster instances also have very high I/O performance via 10 Gigabit Ethernet, which allows for increased EBS performance.
    An EBS-optimized m3.2xlarge instance has 1 Gb/s of bandwidth dedicated to EBS; more than 12 PIOPS volumes at 500 IOPS each will saturate the 1 Gb/s link. 16 KB per I/O = up to 64 MB/sec. It can burst up to 40 MB/sec on a best-effort basis.
    High-performance SSD option: hi1.4xlarge EC2 instance type
    ·       2 x 1 TB SSD local to the instance
    ·       ~120,000 random read IOPS (4 KB blocks)
    ·       ~10,000-85,000 random write IOPS (4 KB blocks)
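The queue-length rule and the 2,000 IOPS example above are plain arithmetic, so they can be checked with a short script. This is a minimal sketch under two assumptions taken from the notes: one queue slot per 200 provisioned IOPS (rounded up), and I/Os accounted in 16 KB units with 1 MB treated as 1,000 KB. The function names are hypothetical.

```python
import math

IOPS_PER_QUEUE_SLOT = 200   # from the notes: queue length of 1 per 200 provisioned IOPS
ACCOUNTING_UNIT_KB = 16     # from the notes: I/Os are counted in 16 KB units


def recommended_queue_depth(provisioned_iops: int) -> int:
    """Average queue length to sustain, rounded up to a whole number."""
    return math.ceil(provisioned_iops / IOPS_PER_QUEUE_SLOT)


def effective_iops(provisioned_iops: int, io_size_kb: int) -> int:
    """I/O operations per second actually delivered for a given I/O size.
    An I/O larger than 16 KB consumes several accounting units."""
    units_per_io = max(1, math.ceil(io_size_kb / ACCOUNTING_UNIT_KB))
    return provisioned_iops // units_per_io


def throughput_mb_per_sec(provisioned_iops: int, io_size_kb: int) -> float:
    """Approximate throughput, treating 1 MB as 1,000 KB as the notes do."""
    return effective_iops(provisioned_iops, io_size_kb) * io_size_kb / 1000


if __name__ == "__main__":
    for iops in (500, 2000):
        print(iops, "PIOPS -> queue depth", recommended_queue_depth(iops))
    for size_kb in (16, 32, 64):
        print(size_kb, "KB I/O:", effective_iops(2000, size_kb), "IOPS,",
              throughput_mb_per_sec(2000, size_kb), "MB/sec")
```

Under these assumptions, I/Os smaller than 16 KB still consume a full unit, so throughput falls below the 32 MB/sec figure for small-block workloads.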
  • AMIs: You need to use an AMI (Amazon Machine Image) to start an EC2 instance. There are a lot of options. We recommend using the AMIs that are published by Oracle, available at http://aws.amazon.com/amis/Oracle. There are AMIs containing Oracle Enterprise Linux and Oracle Database 11g Release 2 in the following editions: Standard Edition One, Standard Edition, and Enterprise Edition. You get the benefit of having a fully pre-installed Oracle database. Alternatively, customers can start an EC2 instance running the operating system of their choice and install Oracle manually, just like they would do on an internal server at their company. As the number of Oracle-supplied AMIs has not kept up with demand, and as Oracle has not been providing AMIs for the latest and greatest releases, it might be a good idea to give options to the users.
    Sizing: The amount of CPU and memory, as well as the network bandwidth available to the database, depends on the type of instance on which it is deployed. If migrating an existing database from on-premises to EC2, you can pick the closest instance type, use that as the starting point, and then monitor the performance to determine whether it is a good match or whether you need to pick a bigger or smaller instance type. When running always-on high-performance databases, it is best to choose the high-memory instance class, as this allows you to maximize the amount of memory available to the SGA of the database. Larger instance types may also have the added benefit of providing higher throughput to the attached EBS volumes. (Note: mention the advantages of the new CC and High I/O instances.)
    Instance type: Increasing the performance of a database requires an understanding of which of the server’s resources is the performance constraint. If the database performance is limited by CPU or memory, users can scale up the memory, compute, and network resources by choosing a larger instance type.
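The sizing advice above (pick the closest instance type, then monitor and adjust) can be illustrated with a small lookup. This is only a sketch: the catalog lists the instance classes and memory sizes quoted elsewhere in these notes, and the selection rule (smallest instance whose memory covers the requirement) is an assumption made for illustration, not an AWS sizing method.

```python
from typing import Optional

# Instance classes and memory sizes (GB) as quoted in these notes.
INSTANCE_MEMORY_GB = {
    "High-CPU Medium": 1.7,
    "High-CPU Extra Large": 7.0,
    "Large": 7.5,
    "High-Memory Quadruple Extra Large": 68.4,
    "High Memory Cluster Eight Extra Large": 244.0,
}


def pick_starting_instance(required_memory_gb: float) -> Optional[str]:
    """Return the smallest listed instance class with enough memory,
    or None if nothing in the catalog is large enough."""
    candidates = sorted(INSTANCE_MEMORY_GB.items(), key=lambda item: item[1])
    for name, memory_gb in candidates:
        if memory_gb >= required_memory_gb:
            return name
    return None


if __name__ == "__main__":
    for need in (6, 7.5, 60, 300):
        print(need, "GB ->", pick_starting_instance(need))
```

Memory is only one constraint; CPU, network, and EBS throughput should be checked by monitoring once the database is running on the chosen type.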
    The three architectures we've discussed cover most Oracle database use cases on the AWS platform. In the rare case that you want to run an OLTP application whose database needs very high IOPS, in the range of 100,000 to 200,000, this architecture uses the local SSD-based volumes that are available in the Amazon EC2 instance itself. Because these are ephemeral disks, there is the potential to lose the entire database if the instance fails. To prevent any potential for data loss and ensure reliability, this architecture employs a second instance in the same Availability Zone and uses Oracle Data Guard to replicate data to this instance from the primary instance. We may also want to introduce the Oracle Flash Cache feature to extend database performance on High Memory instance types with SSD disks. In short, we can utilize the Oracle Flash Cache feature on Oracle 11g to extend the database buffer cache onto 240 GB of SSD on top of the existing 240 GB of RAM. This is useful for high-memory database requirements and also in-memory database requirements.
    For simple bootstrapping, user data text or scripts may be adequate. Keep in mind that the size limit for user data is 16 KB. s3cmd is often used to load the bootstrap scripts from S3. More on this can be found here: http://s3tools.org/s3cmd and https://github.com/s3tools/s3cmd. A very good document on using user data, CloudFormation, Chef, Puppet, and other tools to bootstrap EC2 instances can be found here: https://s3.amazonaws.com/cloudformation-examples/BoostrappingApplicationsWithAWSCloudFormation.pdf
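The 16 KB user data limit mentioned above is easy to check before launching an instance. The following is a minimal sketch, assuming the limit applies to the raw script before base64 encoding; the bucket name and paths in the sample script are hypothetical, and `encode_user_data` is an illustrative helper, not an AWS API.

```python
import base64

USER_DATA_LIMIT_BYTES = 16 * 1024  # 16 KB limit on raw user data


def encode_user_data(script: str) -> str:
    """Validate the script size and return it base64-encoded."""
    raw = script.encode("utf-8")
    if len(raw) > USER_DATA_LIMIT_BYTES:
        raise ValueError(
            f"user data is {len(raw)} bytes; the limit is {USER_DATA_LIMIT_BYTES}"
        )
    return base64.b64encode(raw).decode("ascii")


# A small stub that pulls the real bootstrap script from S3 with s3cmd.
# The bucket name and file paths are hypothetical placeholders.
BOOTSTRAP_STUB = """#!/bin/bash
s3cmd get s3://example-bucket/bootstrap.sh /tmp/bootstrap.sh
bash /tmp/bootstrap.sh
"""

if __name__ == "__main__":
    encoded = encode_user_data(BOOTSTRAP_STUB)
    print(len(BOOTSTRAP_STUB.encode("utf-8")), "raw bytes,", len(encoded), "encoded characters")
```

Keeping the user data to a short stub like this and storing the full script in S3 is one way to stay well under the limit.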
  • Use Route 53 to manage Oracle database endpoints as seen by applications. This makes it easier to maintain HA in an environment where the Oracle instances themselves may be transient.
    Vertical scaling: For many customers, increasing the performance of a single DB instance is the easiest way to increase the performance of their application overall. In the Amazon EC2 or Amazon RDS environments, you can simply stop an instance, increase the instance size, and restart the instance. This is particularly true if you have a set maintenance window and can tolerate system downtime. This technique is often referred to as scaling up.
    Advanced setups can benefit from the elastic nature of Amazon Web Services. By monitoring the usage of the primary database with Amazon CloudWatch, you can receive notifications indicating that a heavy load threshold has been met or exceeded. In this situation, you can create new standby databases on demand to lower the load on the primary. Once this heavy usage period is finished, the standby instances and the resources they consume can be disposed of. Data Guard can be used only with Enterprise Edition. There are many third-party solutions that provide the same functionality even for Standard Edition and Standard Edition One (like SharePlex and Dbvisit); it would be a good idea to mention those too.
    Active-active replication: Commercially available active-active database replication technologies can also be used to boost the overall throughput of an application. This can be especially useful if there is a way to divide up the workload between multiple DB instances such that, even when they share the same schema, the updates they make are mostly exclusive to each other. An example is handling customer orders based on the location of the customer, with all US-based orders going into one database and non-US orders going to a second database.
    However, the application would need to handle any conflict resolution scenarios. For instance, if a total count of orders is being maintained, it needs to be updated outside of these replicated databases. It would be good to explain multi-master setups too. Oracle GoldenGate can also be used for this purpose, as can Oracle Streams. Oracle is putting emphasis on GoldenGate on its roadmap, so it would be a good idea to cover that too.
    AWS-specific tactics for implementing HA best practices:
    1. Fail over gracefully using Elastic IPs: An Elastic IP is a static IP that is dynamically re-mappable. You can quickly remap and fail over to another set of servers so that your traffic is routed to the new servers. It works great when you want to upgrade from old to new versions or in case of hardware failures.
    2. Utilize multiple Availability Zones: Availability Zones are conceptually like logical datacenters. By deploying your architecture to multiple Availability Zones, you can ensure high availability. Utilize Amazon RDS Multi-AZ [21] deployment functionality to automatically replicate database updates across multiple Availability Zones.
    3. Maintain an Amazon Machine Image so that you can restore and clone environments very easily in a different Availability Zone; maintain multiple database slaves across Availability Zones and set up hot replication.
    4. Utilize Amazon CloudWatch (or various real-time open source monitoring tools) to get more visibility and take appropriate actions in case of hardware failure or performance degradation. Set up an Auto Scaling group to maintain a fixed fleet size so that it replaces unhealthy Amazon EC2 instances with new ones.
    5. Utilize Amazon EBS and set up cron jobs so that incremental snapshots are automatically uploaded to Amazon S3 and data is persisted independently of your instances.
    6. Utilize Amazon RDS and set the retention period for backups, so that it can perform automated backups.
    This implementation sets up Data Guard for Fast-Start Failover, so that failover to the standby instance can be achieved quickly. In this architecture the primary instance uses an Elastic Network Interface (ENI), which can be leveraged for an even faster failover by swapping the ENI from the primary instance to the standby instance, because both instances are in the same Availability Zone. This requires a third, observer instance to monitor the primary instance and swap the ENI in case of a failure.
    Oracle Active Data Guard is an Oracle Database add-on that allows you to set up standby databases that can be open for read-only requests while continuing to archive transactions from the primary database. The standby databases can be used as read replicas of your primary database. The replication between the primary and the standby databases can be configured to be synchronous. This allows you to scale your database layer horizontally by adding read replicas and to offload read-only queries from the primary database. This setup is often valuable because most applications generate more reads to the database than writes. Also, read-heavy clients like business intelligence applications can be run against a standby instance, with no impact on the primary production database.
    You can use Active Data Guard to build an elastic database infrastructure. By monitoring the usage of the primary database with Amazon CloudWatch, you can receive notifications indicating that a heavy load threshold has been met or exceeded. In this situation, you can create new standby databases on demand to lower the load on the primary.
    Once this heavy usage period is over, the standby instances and the resources they consume can be disposed of. Note: Oracle Active Data Guard is only available for Oracle Database Enterprise Edition, not for Standard Edition or Standard Edition One.
    It is also possible to use active-active replication to boost performance. In this scenario, you create one or more database replicas that can be both written to and read from, in effect implementing a distributed database where all replicas are synchronized. These technologies are covered in the “High Availability” section.
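The elastic standby approach described above (add standby databases when a load threshold is exceeded, dispose of them afterwards) amounts to a simple threshold rule. Below is a minimal sketch of that decision logic only. The thresholds, the limits, and the function name are hypothetical, and the CPU samples stand in for metrics that would come from Amazon CloudWatch; no AWS or Oracle API is called.

```python
from typing import Sequence


def desired_standby_count(
    cpu_samples: Sequence[float],
    current_standbys: int,
    high_threshold: float = 80.0,   # hypothetical "heavy load" threshold (% CPU)
    low_threshold: float = 30.0,    # hypothetical "load has passed" threshold (% CPU)
    min_standbys: int = 0,
    max_standbys: int = 4,
) -> int:
    """Return how many standby databases should exist after this evaluation.
    Adds one standby under heavy load and removes one when load is low."""
    if not cpu_samples:
        return current_standbys
    average = sum(cpu_samples) / len(cpu_samples)
    if average >= high_threshold:
        return min(current_standbys + 1, max_standbys)
    if average <= low_threshold:
        return max(current_standbys - 1, min_standbys)
    return current_standbys


if __name__ == "__main__":
    print(desired_standby_count([85, 90, 95], current_standbys=1))
    print(desired_standby_count([10, 20, 15], current_standbys=2))
    print(desired_standby_count([50, 55, 60], current_standbys=2))
```

Using separate high and low thresholds leaves a band in which nothing changes, which avoids creating and disposing of standbys on every small fluctuation.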
  • The Blueprint offers a step-by-step approach to cloud migration and has been proven successful. When customers follow this blueprint and focus on creating a proof of concept, they immediately see value in their proof-of-concept projects and see tremendous potential in the AWS cloud. After they move their first application to the cloud, they get new ideas and want to move more applications into the cloud.
  • Applications that are very interesting, easy to experiment with, simple sel
  • We have noticed that some of the SMBs and startup companies in our ecosystem skipped the classification and other stages I discussed above and dove right into a proof of concept. There is no doubt that a proof of concept will answer tons of questions very quickly. During the proof of concept it is important that you get your feet wet with Amazon Web Services and get trained by Amazon (we have AWS University and have launched a training course in Seattle). Andy started multiple projects in parallel. He regularly focused on proofs of concept.
  • Talk about relative costs, but highlight that this is about getting data there fast… Rectangles, not ovals. Borderline in size (GB vs TB) and speed (hours vs days). Backup… can use Storage Gateway if less than 5 TB a day, as this is the max with Storage Gateway (also need backup software to get data from disk to Storage Gateway). Riverbed is a great solution, as they offer 2 TB an hour and no backup storage is needed. CommVault is another
  • Add more lines for operating costs and flexibility
  • Add Security Group definitions. Storage area network (SAN): access to a SAN is at the block level. NAS is practically an array of hard disk drives with a network interface. HDD volumes on a NAS are treated by network users as shared network resources. Access to NAS-stored data is provided at the file level. A NAS (see Figure 2) is typically composed of networked file servers that make use of Ethernet and TCP/IP, handling data at the file level. You attach NAS devices to an existing TCP/IP network (usually Ethernet) to add storage. A simple way to remember the difference between SAN and NAS is to think about how each technology is implemented. NAS is commonly found in server farms (application servers, e-mail servers, and so on) where increasing storage volume is as easy as attaching another system to the network. A SAN is usually deployed for e-commerce applications, data backup and other cases in which large amounts of data must be stored and transmitted over a network; a SAN lets you offload such high-volume traffic, sparing your Ethernet network from congestion. High Performance Computing (HPC): http://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ConfigWindowsHPC.html http://aws.amazon.com/hpc-applications/ Network-attached storage (NAS) is file-level computer data storage connected to a computer network providing data access to a heterogeneous group of clients. NAS not only operates as a file server, but is specialized for this task either by its hardware, software, or configuration of those elements.
NAS is often manufactured as a computer appliance (a specialized computer built from the ground up for storing and serving files) rather than simply a general purpose computer being used for the role. As of 2010 NAS devices are gaining popularity, as a convenient method of sharing files among multiple computers. Potential benefits of network-attached storage, compared to file servers, include faster data access, easier administration, and simple configuration. NAS systems are networked appliances which contain one or more hard drives, often arranged into logical, redundant storage containers or RAID. Network-attached storage removes the responsibility of file serving from other servers on the network. They typically provide access to files using network file sharing protocols such as NFS, SMB/CIFS, or AFP. NAS vs. SAN: NAS provides both storage and a file system. This is often contrasted with SAN (Storage Area Network), which provides only block-based storage and leaves file system concerns on the "client" side. SAN protocols include Fibre Channel, iSCSI, ATA over Ethernet (AoE) and HyperSCSI. One way to loosely conceptualize the difference between a NAS and a SAN is that NAS appears to the client OS (operating system) as a file server (the client can map network drives to shares on that server) whereas a disk available through a SAN still appears to the client OS as a disk, visible in disk and volume management utilities (along with the client's local disks), and available to be formatted with a file system and mounted. Despite their differences, SAN and NAS are not mutually exclusive, and may be combined as a SAN-NAS hybrid, offering both file-level protocols (NAS) and block-level protocols (SAN) from the same system. An example of this is Openfiler, a free software product running on Linux-based systems.
A shared disk file system can also be run on top of a SAN to provide filesystem services. We provide an Amazon DNS server. To use your own DNS server, update the DHCP options set for your VPC. For more information, see DHCP Options Sets. To enable an EC2 instance to be publicly accessible, it must have a public IP address, a DNS hostname, and DNS resolution. http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-dns.html
  • What period are you amortizing hardware across? Are you using the same RI term? Are you comparing vs. heavy utilization RIs? How much buffer capacity are you planning on carrying? If small, what is your plan if you need to add more? What if you need less capacity? What is your plan to be able to scale down costs? Are you taking labor into account? What about maintenance (broken disks, patching hosts, servers going offline, etc.)? What are you assuming for network gear? What if you need to scale beyond a single rack? What about availability? Are you accounting for 2N power? If not, what happens when you have a power issue to your rack? What is your bandwidth peak to average ratio? Have you modeled in AWS lowering prices over time? Your purchased gear will never get cheaper, and hosting (power and cooling) is not getting cheaper.
  • Smarter Agent powers more highly rated and downloaded real estate app titles in the Android, iPhone and Blackberry marketplaces than any other in the real estate vertical. This includes the #1 and #2 downloaded and rated large franchisor apps, the most highly downloaded and rated independent brokerage office app and many of the top downloaded Multiple Listing Service (MLS) apps.
  • As many metrics as you can manage. OS-level, database, and application metrics. Time long-running activities, and clearly note runtimes in the launch plan.
  • Migrating enterprise workloads to AWS
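The throughput figures quoted in the notes above (about 5 TB a day as the ceiling for AWS Storage Gateway, about 2 TB an hour for Riverbed) can be turned into a rough transfer-time estimate. The following is a minimal sketch, assuming those speaker-note rates hold; the names `TB_PER_HOUR` and `transfer_hours` are hypothetical and not part of any AWS or vendor API.

```python
# Rough transfer-time estimate from the throughput figures quoted in the notes.
# The rates are the speaker's approximate numbers (assumptions), not measurements.

TB_PER_HOUR = {
    "storage_gateway": 5 / 24,  # notes: about 5 TB a day at most
    "riverbed": 2.0,            # notes: about 2 TB an hour
}


def transfer_hours(dataset_tb: float, method: str) -> float:
    """Return the hours needed to move dataset_tb terabytes with the given method."""
    if dataset_tb < 0:
        raise ValueError("dataset size must be non-negative")
    return dataset_tb / TB_PER_HOUR[method]


if __name__ == "__main__":
    for method in TB_PER_HOUR:
        print(f"{method}: {transfer_hours(10, method):.1f} h for 10 TB")
```

An estimate like this helps place a workload on the size-versus-speed chart before choosing between online transfer and a physical import.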

    1. 1. • Why Enterprises Choose AWS • Enterprise Applications Architectures • Seven design principles for AWS • Best Practices • Migration Approach • Calculating Total Cost of Ownership (TCO) • Customer Project: Migration lessons learned • Next steps
    2. 2. No Up-Front Capital Expense Low Cost Pay Only for What You Use Self-Service Infrastructure Easily Scale Up and Down Improve Agility & Time-to-Market Deploy
    3. 3. Technology stack (On premise solution vs. AWS) • Network – On premise: VPN, MPLS; AWS: AWS VPC, VPN, AWS Direct Connect • Security – On premise: Firewalls, NACLs, routing tables, disk encryption, SSL, IDS, IPS; AWS: AWS Security Groups, AWS CloudHSM, NACLs, routing tables, disk encryption, SSL, IDS, IPS • Storage – On premise: DAS, SAN, NAS, SSD; AWS: AWS EBS, AWS S3, AWS EC2 Instance storage (SSD), GlusterFS • Computer – On premise: Hardware, Virtualization; AWS: AWS EC2 • Content Delivery – On premise: CDN Solutions; AWS: AWS CloudFront • Databases – On premise: DB2, MS SQL Server, MySQL, Oracle, PostgreSQL, MongoDB, Couchbase; AWS: AWS RDS, AWS DynamoDB, DB2, MS SQL Server, MySQL, PostgreSQL, Oracle, MongoDB, Couchbase • Load Balancing – On premise: Hardware and software load balancers, HA Proxy; AWS: AWS Elastic Load Balancer, software load balancers, HA Proxy • Scaling – On premise: Hardware and software clustering, Apache ZooKeeper; AWS: AWS Auto Scaling, software clustering, Apache ZooKeeper • Domain Name Services – On premise: DNS providers; AWS: AWS Route 53
    4. 4. Technology stack (On premise solution vs. AWS) • Analytics – On premise: Hadoop, Cassandra; AWS: AWS Elastic MapReduce, Hadoop, Cassandra • Data Warehousing – On premise: Specialized hardware and software solutions; AWS: AWS Redshift • Messaging and workflow – On premise: Messaging and workflow software; AWS: AWS Simple Queue Service, AWS Simple Notification Service, AWS Simple Workflow Service • Caching – On premise: Memcached, SAP Hana; AWS: AWS ElastiCache, Memcached, SAP Hana • Archiving – On premise: Tape library, off site tape storage; AWS: AWS Glacier • Email – On premise: Email software; AWS: AWS Simple Email Service • Identity Management – On premise: LDAP; AWS: AWS IAM, LDAP • Deployment – On premise: Chef, Puppet; AWS: AWS AMIs, AWS CloudFormation, AWS OpsWorks, AWS Elastic Beanstalk, Chef, Puppet • Management and Monitoring – On premise: CA, BMC, Rightscale; AWS: AWS CloudWatch, CA, BMC, Rightscale
    5. 5. 1. Design for failure and nothing fails 2. Loose coupling sets you free 3. Implement elasticity
    6. 6. 4. Build security in every layer 5. Don’t fear constraints 6. Think parallel 7. Leverage different storage options
    7. 7. Design for failure – Avoid single points of failure – Assume everything fails and design backwards • Goal: Applications should continue to function even if the underlying physical hardware fails or is removed/replaced. Automatic failover App Server Database Server (Primary) Database Server (Secondary )
    8. 8. Loose coupling sets you free – Use a queue to pass messages between components Web Servers Queue Video Processing Servers App Servers Decouple tiers with a queue
    9. 9. Implement elasticity – Elasticity is a fundamental property of the cloud – Don’t assume the health, availability, or fixed location of components – Use designs that are resilient to reboot and re-launch – Bootstrap your instances • When an instance launches, it should ask “Who am I and what is my role?” – Favor dynamic configuration
    10. 10. Build security in every layer Security is a shared responsibility. You decide how to: – Encrypt data in transit and at rest – Enforce principle of least privilege – Create distinct, restricted Security Groups for each application role • Restrict external access via these security groups – Use multi-factor authentication
    11. 11. Don’t fear constraints – Need more RAM? • Horizontal: Consider distributing load across machines or a shared cache • Vertical: Stop and restart instance – Need better IOPS for database? • Instead, consider multiple read replicas, sharding, or DB clustering – Hardware failed or config got corrupted? • “Rip and replace”: simply toss bad instances and instantiate replacements
    12. 12. Think parallel – Experiment with parallel architectures Same cost (i.e., 4 instance hours), but parallel is 4x faster Hour 1 Hour 2 Hour 3 Hour 4
    13. 13. Auto Scaling and Elasticity “AWS enables Netflix to quickly deploy thousands of servers and terabytes of storage within minutes. Users can stream Netflix shows and movies from anywhere in the world, including on the web, on tablets, or on mobile devices such as iPhones.” From 40 EC2 instances to 5k instances after launching the Facebook application
    14. 14. High Availability Within Amazon EC2, Airbnb is using Elastic Load Balancing, which automatically distributes incoming traffic between multiple Amazon EC2 instances. HA using Elastic Load Balancer with Apache-WLS, Oracle WebLogic and Oracle RAC in a multi-AZ configuration
    15. 15. Disaster Recovery Washington Trust Bank and AWS Advanced Consulting Provider ITLifeline use the AWS cloud to cut disaster recovery costs, reduce overhead, and improve recovery time in a compliance-driven industry. DiskAgent protects their healthcare industry customers against physical systems damage by storing backed-up records offsite, in multiple Amazon data centers.
    16. 16. VPC • Use it…VPC by default for new accounts • Database in private subnet IDS/IPS • Trend Micro, AlertLogic, Snort • Host based • Conduct penetration test : prior approval from AWS VPN • Redundant connections • Consider two Customer Gateways • Dynamic routing (BGP) over static (ASA) Dedicated, secure connection • Direct Connect - 1 Gbps or 10 Gbps NAT • Set up multi-AZ NAT Fail over • ELB : Multi-AZ • Route 53 : Geo/region
    17. 17. Next Session
    18. 18. Storage • Use Instance storage for temporary storage or database EBS • PIOPS (applies to I/O with a block size of 16KB) • Stripe using RAID 0, 10, LVM, or ASM • RAID 10 (can decrease performance) • Snapshot often : Single volume DB • 20 TB DB size (potential max) : Depends upon IOPS and instance type (1 Gbps or 10 Gbps) File system • ext3/4, XFS (less mature) • Try different block sizes : start with 64K Striping • Stripe multiple volumes for more IOPS (e.g., (20) x 2,000 IOPS volumes in RAID0 for 40,000 IOPS) • ASM (Oracle) with external redundancy • More difficult to Snapshot : Use OSB, database backup solution Tuning • Maintain an average queue length of 1 for every 200 provisioned IOPS in a minute • Pre-warm (new, empty volumes only) $ dd of=/dev/md0 if=/dev/zero • fio, Oracle ORION • Database Compression
    19. 19. AMIs • Use vendor provided • Build your own AMI EC2 • EBS optimized, cluster compute and storage optimized instances • SSD backed for high performance IO : hi1.4xlarge has 2 TB of SSD attached storage • EBS • Install software binaries on a separate EBS volume SSD backed, high memory instance for cached database using Oracle Smart Flash Cache: cr1.8xlarge has 240 GB of SSD plus 244 GB of memory and 88 ECUs • Bootstrapping • User data/scripts • CloudFormation • Consider Chef, Puppet, OpsWorks Turn off (stop) when not using https://s3.amazonaws.com/cloudformationexamples/BoostrappingApplicationsWithAWSCloudFormation.pdf
    20. 20. Scaling • Vertical Scaling with EC2 : stop instance and change instance type • Horizontal scaling for web and application servers : auto scaling • Horizontal Scaling for Database with Read Replicas and multi-AZ • This will need to be configured using Oracle Active Data Guard, Oracle GoldenGate, 3rd party technology • Amazon CloudWatch : detailed monitoring, custom metrics • Amazon Route 53 : Latency based routing to route traffic to region closest to the user Requires replicated, sharded, or geo dispersed databases HA • Elastic IPs and Elastic Network Interfaces (ENIs) • Active-passive multi-AZ using Oracle Data Guard or other replication solutions • Active-Active multi-AZ using Oracle GoldenGate or other replication solutions • Amazon Route 53 : Now supports health checks for multi-region HA • ELB : Web and Application Server for multi-AZ HA. Health checks (HTML file) to see if Oracle DB is up and running. Associate ENI / Elastic IP to new Oracle DB.
    21. 21. http://aws.amazon.com/whitepapers
    22. 22. Questions to ask? Existing Applications “No-brainer to move” Apps Planned Phased Migration • Is it a technology fit? • Is there a pressing business need the migration would address? • Is there an immediate or potentially big business impact the migration may have? Examples • Dev/Test applications • Self-contained Web Applications • Social Media Product Marketing Campaigns • Customer Training Sites • Video Portals (Transcoding and Hosting) • Pre-sales Demo Portal • Software Downloads • Trial Applications
    23. 23. Proof of concept will answer tons of questions quickly • Get your feet wet with Amazon Web Services – Learning AWS – Build reference architecture – Be aware of the security features • Build a Prototype/Pilot – Build support in your organization – Validate the technology – Test legacy software in the cloud – Perform benchmarks and set expectations
    24. 24. Plan • Select apps • Test platform • Plan migration Deploy • Migrate data • Migrate components • Cutover Optimize • Embrace AWS services • Re-factor architecture
    25. 25. Data Velocity Required GBs One-time upload w/ constant delta updates TBs UDP Transfer Software (e.g., Aspera, Tsunami, …) Attunity Cloudbeam AWS Storage Gateway Riverbed Hours Days Transfer to S3 Over Internet AWS Import / Export Data Size* * relative to internet bandwidth and latency
    26. 26. Forklift, Embrace, Optimize (compared by Effort, Scalability, and Operational Burden) Forklift • May be only option for some apps • Run AWS like a virtual co-lo (low effort) • Does not optimize for on-demand (over-provisioned) Embrace AWS • Minor modifications to improve cloud usage • Automating servers can lower operational burden • Leveraging more scalable storage Optimize for AWS • Re-design with AWS in mind (high effort) • Embrace scalable services (reduce admin) • Closer to fully utilized resources at all times
    27. 27. ELB Forklift steps: Match resources and build AMIs • Think about application needs not server specs • Build out custom AMI for application roles AMI-1 @ C1.Medium AMI-1 @ C1.Medium AMI-2 @ M2.XLarge AMI-6 @ M2.XLarge AMI-3 @ C1.Medium AMI-2 @ M2.XLarge AMI-5 @ M2.2XLarge AMI-4 @ M1.Large Convert appliances: • Map appliances to AWS services or virtual appliance AMIs Deploy supporting components: • NAS replacements • DNS • Domain controllers Secure the application components: • Use layered security groups to replicate firewalls
    28. 28. App Tier Auto-scaling Group Web Tier Auto-scaling Group ELB Web Web Server Server Web Web Server Server App Server App Server App Server App Server Master Database Network Filesystem Network Filesystem DNS Config Management Server Domain Controller Steps to Embrace AWS: Rethink storage: • Leverage S3 for scalable storage • Edge cache with CloudFront • Consider RDS for HA RDBMS Parallelize processing: • Bootstrap AMIs for autodiscovery • Pass in bootstrapping parameters • Leverage configuration management tools for automated build out Scale out and in on-demand: • Use CloudWatch and Autoscaling to auto-provision the fleet
    29. 29. Web Tier Auto-scaling Group App Tier Auto-scaling Group A Phased Migration to AWS - Optimize Steps to Optimize for AWS: Web Web Server Server Web Web Server Server App Server App Server SQS App Server Re-Rethink storage: • Break up datasets across storage solutions based on best fit and scalability App Server App Server Parallelize processing: • Spread load across multiple resources • Decouple components for parallel processing EMR Use Spot where possible to reduce costs Config Management Server Network Filesystem DNS Route 53 Domain Controller Embrace scalable on-demand services • Scale out systems with minimal effort • Route53 • SES, SQS, SNS • …
    30. 30. #1 Start with a use case or an application – compare apples to apples, capacity utilization, networking, availability, peak to average, DR costs, power etc. #2 Take all the fixed costs into consideration (Don’t forget administration, maintenance and redundancy costs) #3 Use Updated Pricing (compute, storage and bandwidth) Price cuts, Tiered Pricing and Volume Discounts #4 Use variable capacity & reserved instances where they fit the business needs #5 Intangible Costs – Take a closer look at what is built in with AWS – security, elasticity, innovation, flexibility
    31. 31. DOs DON’Ts BONUS • 3 or 5 Year Amortization • Use 3-Year Heavy RIs or Fixed RIs • Use Volume RI Discounts • Ratios (VM:Physical, Servers:Racks, People:Servers) • Mention Tiered Pricing (Less expensive at every Tier : network IO, storage) • Cost Benefits of Automation (Auto scaling, APIs, Cloud Formation, OpsWorks, Trusted Advisor, Optimization)
    32. 32. DOs DON’Ts BONUS • Forget Power/Cooling (compute, storage, shared network) • Forget Administration Costs (procurement, design, build, operations, network, security personnel) • Forget Rent/Real Estate (building depreciation, taxes, shared services staff) • Forget VMware Licensing and Maintenance Costs • Forget to mention Cost of “Redundancy”, Multi-AZ Facility
    33. 33. DOs DON’Ts BONUS • Time from ordering to procurement (Releasing early = Increased Revenue) • Cost of “capacity on shelf” (top of step) • Incremental cost of adding an on-premises server when physical space is maxed out • Real cost of resource shortfalls (bottom of step) • Cost of disappointed or lost customers when unable to scale fast enough
    34. 34. • Trusted Advisor: Draws upon best practices learned from AWS’ aggregated operational history of serving hundreds of thousands of AWS customers. The AWS Trusted Advisor inspects your AWS environment and makes recommendations when opportunities exist to save money, improve system performance, or close security gaps. • Apptio: Leader in technology business management (TBM), a new category and discipline backed by global IT leaders that helps you understand the cost, quality, and value of the services you provide. • CloudHealth: Delivers business insight for your cloud ecosystem. Designed for management and executive teams to optimize AWS performance and costs.
    35. 35. • Leading provider of white label mobile applications and services to real estate industry • Powers more real estate app titles than any other in the real estate vertical • Multi-level marketing platform © 2013 smartShift. All rights 10/24/2013 47
    36. 36. DNS Provider (R53, DNSMade Easy) Internet Apache+ HAProxy 2 Apache+ HAProxy 1 Auto scaling Group JBoss Node 1 Primary Oracle 11g DB From RETS system 3rd Party protocol Windows Downloader Server PREPROC Oracle 11g DB Availability Zone JBoss Node n JBoss Node 3 JBoss Node 2 Active Standby Oracle 11g DB Redo Log Shipping From RETS system Windows Downloader Server PREPROC Oracle 11g DB Availability Zone Application Code Bucket Daily Database Backup EBS Snapshots
    37. 37. • Choose great partners • Understand the cloud capabilities trajectory (rapid pace of innovation) • Have a strong methodology • Implement rich and detailed monitoring • Plan for, and perform as many launch rehearsals as possible • EBS provisioned IOPS works as promised • AWS continues to rapidly improve services (4K IOPS now available) and reduce costs • Multi-AZ implementation • Rehearsed DB restorations
    38. 38. • The cloud-based system operates as expected in terms of performance and cost • Cloud costs as per our projection (with the use of reserved instances) • Project delivered on budget • Operational staff requirements reduced • Incidentally, physical infrastructure failed on 07/10/13 – would have resulted in a total service outage • Lower overall incident rate • Application and storage performance highly consistent • Infrastructure now a selling point for the business
    39. 39. Here are some additional resources: • Get started with a free trial – http://aws.amazon.com/free • White papers – http://aws.amazon.com/whitepapers/ • Reference Architectures – http://aws.amazon.com/architecture/ • Enterprise on AWS – http://aws.amazon.com/enterprise-it/ • Executive level Overview : Extending Your Infrastructure to the AWS Cloud (4 minutes) – http://www.youtube.com/watch?v=CsGqu5L_PFI • Simple Monthly Pricing Calculator – http://calculator.s3.amazonaws.com/calc5.html • TCO Calculator for Web Applications – http://aws.amazon.com/tco-calculator/ tomlasz@amazon.com
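The storage slide's striping arithmetic (20 volumes at 2,000 provisioned IOPS each in RAID 0 for 40,000 IOPS) and its tuning rule (an average queue length of 1 for every 200 provisioned IOPS) can be sketched as below. This is an illustrative calculation only: it assumes ideal linear scaling across the stripe, and the function names `striped_iops` and `target_queue_length` are hypothetical helpers, not AWS or Oracle APIs.

```python
# Striping and queue-depth arithmetic taken from the storage slide.
# Assumes ideal RAID 0 scaling with no controller or network overhead.


def striped_iops(volumes: int, iops_per_volume: int) -> int:
    """Aggregate provisioned IOPS of a RAID 0 stripe in the ideal case."""
    if volumes < 1 or iops_per_volume < 0:
        raise ValueError("need at least one volume and non-negative IOPS")
    return volumes * iops_per_volume


def target_queue_length(provisioned_iops: int, iops_per_slot: int = 200) -> float:
    """Average queue length suggested by the slide: 1 per 200 provisioned IOPS."""
    return provisioned_iops / iops_per_slot


if __name__ == "__main__":
    total = striped_iops(20, 2000)
    print(total)                      # 40000, matching the slide's example
    print(target_queue_length(2000))  # 10.0 for a single 2,000 IOPS volume
```

In practice the achievable figure also depends on the instance type and its network bandwidth, as the slide notes, so the result is an upper bound rather than a prediction.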