Amazon Web Services- AWS Educational Guide Last updated: Mar'14
AWS Educational Guide
Saurabh Bangad
Last updated : Mar'14...
AWS Educational Guide


This is a getting started guide to AWS. It might help you understand the variety of solutions available with cloud computing.

Things are readily available; you just have to know they exist!

Published in: Technology
AWS Educational Guide
Saurabh Bangad
Last updated : Mar'14
Latest version can be found at :
1. Abstract :
This paper is a getting started guide to cloud computing with Amazon Web Services (AWS). A new user is most likely unaware of the wide range of available solutions. It can also be a good starting point for someone who is confused by the many services (more than 25), each with its own purpose(s). This paper describes some common use-cases and outlines how each can be handled with one or more services. Everyone from startups to some of the biggest players in the industry has been using these services, and this paper is about the lessons learned from their successful approaches. It might help you tackle the traditional, well-known issues, allowing you to focus on your business logic. This is like the 'Menu' of a restaurant: you actually need to try things out in order to find your preference.

Covers :
-Getting started with Amazon Web Services (introduction to 25+ individual services)
-General common scenarios, and the service(s) that might solve those issues
-How AWS services communicate with one another to make a complete solution

Does NOT cover :
-Best practices and cost optimizations
-Any particular service in detail, or its features
-Compliance specific details

Whatever this document describes is a SUBSET of what AWS provides. For more details, checking the individual service documentation is the best idea. This is just a starting point for designing an architecture with AWS.
For the latest version of this document visit:

2. Audience :
This is a guide for beginners to Cloud Computing. It is an attempt to educate the reader about the variety of services provided by AWS. If you have used some of these services but are still not really clear about the others, then this can be a good guide for you as well. If you know everything but are still looking for something particular, you can check the documentation.

What do I get from this paper?
An understanding of most of the AWS services, with examples based on successful AWS customer stories. It highlights the independent services and what each one does.

3. What is so different about Cloud computing?
Before computers, we had issues with electricity. Later, electricity became an assumption, and we had new issues such as CPU, memory, and Internet access. Now those are assumptions too. Then came the issues of managing data-centers, securing the systems in all aspects, employing various system administrators, and, worst of all, the costs associated with upgrading the hardware. Cloud Computing changes these assumptions. You can just work on your idea and assume everything else is handled by your cloud resources. The worst part described earlier turns into the best part: you do not have to worry about the underlying hardware, and you pay only for your usage. It also gives you the flexibility to add more hardware when required and get rid of it when it is not.

4. AWS basic building blocks :
To understand physics, you make use of basic units such as the 'meter' for length, the 'kilogram' for mass, the 'second' for time, etc. Similarly, to understand AWS services you make use of some basic units. Let's go through those first. These basic blocks can be used independently, and many other services are built on them, so you end up using them either way.
4.1 Regions and Availability Zones (AZ) :
Amazon EC2 is hosted in multiple locations world-wide. These locations are composed of regions and Availability Zones. Each region is a separate geographic area, and each region has multiple, isolated locations known as Availability Zones. This helps you serve users based on their location and design highly available solutions. More information on AWS Infrastructure can be checked here:
e.g. Oregon is a region and us-west-2b could be one of the AZs of Oregon

4.2 Elastic Compute Cloud (EC2) :
Amazon EC2 is a web service that provides re-sizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers. Linux and Windows are the two currently supported platforms. This is like having a second computer somewhere in the world that can have any type of hardware and operating system, with whatever configuration you need. Changes can be made at any time, and you control who can make them.
e.g. installing Apache on Red Hat 6.5
e.g. using Windows 2008 R2 with IIS

4.3 Virtual Private Cloud (VPC) :
Amazon Virtual Private Cloud (Amazon VPC) lets you provision a logically private, isolated section of the AWS Cloud where you can launch AWS resources in virtual network(s) that you define. You control the IP addressing, routing, and access over the Internet.
e.g. having subnets in the Sydney region inside a VPC
More about VPC network connectivity options at:

4.4 Databases and/or storage options :

4.4.1 Instance Store :
Temporary block-level storage, available on many Amazon EC2 instance types, that provides high I/O performance. It is like a hard disk directly attached to the EC2 instance. Ideally used for buffers, caches, scratch data, and other temporary content. You get high I/O performance at the cost of durability.
e.g. temporary storage of a configuration file which is needed only for the lifetime of the instance

4.4.2 Elastic Block Store (EBS) :
Persistent block storage which can be formatted with a specific file system type such as NTFS, vFAT, ext4, xfs, etc. Ideally used for data that changes frequently and needs long-term persistence. It is also replicated within the same AZ, which offers high availability.
e.g. primary storage for a relational database

4.4.3 Simple Storage Service (S3) :
Provides a simple web service interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. Ideally used for bulk object storage that is written once and read many times. S3 uses 'Buckets' for logical partitioning (like folders in file systems). This is like having an infinite amount of storage over the Internet.
e.g. media files, backups, huge uploads stored in a bucket called 'Bucket_bulk'

4.4.4 Glacier :
Deep archive storage service. It is designed to handle large volumes of data that are infrequently accessed. Typically used for storing files for compliance purposes.
e.g. backup logs of the years 2009-2012
4.4.5 Relational Database Service (RDS) :
A web service that makes it easy to set up, operate, and scale a relational database in the cloud. You can have a MySQL, Oracle, Microsoft SQL, or PostgreSQL relational database server in the cloud. You eliminate the administrative overheads associated with launching, managing, and scaling your own relational database.
e.g. an employee database on RDS with MySQL

4.4.6 DynamoDB :
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. It helps you get rid of the administrative burdens associated with operating and scaling a highly available distributed database cluster. Typically used when a predictable I/O throughput is expected. It allows low-latency read and write access to items ranging from 1 byte up to 64 KB.
e.g. serving ads based on a database of user clicks, with primary key 'user_location'

4.4.7 Redshift :
Data warehouse service to analyze all your data using your existing business intelligence tools. It uses column-based storage to provide a solution for data warehousing and analytics with cluster computing. Amazon Redshift has a massively parallel processing (MPP) architecture that parallelizes and distributes SQL operations to take advantage of all available resources.
e.g. analyzing the collected game data to identify user patterns

4.4.8 ElastiCache :
A web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. Typically used as a session store for volatile data and as a caching layer. Most commonly used as a database front end for read-heavy applications. ElastiCache supports two popular open-source caching engines: Memcached and Redis. Memcached is a widely adopted memory object caching system. Redis is a popular open-source in-memory key-value store that supports data structures such as sorted sets and lists.
e.g. most of the users from Japan query about thing X, which requires join operations on the RDS tables; the result can be cached

Detailed storage options with best practices are described at :

Now, once you understand basic units like the meter and the second, you can go to the next level, that is speed : meters per second. In the same way, you can combine the basic blocks, working with each other in an appropriate manner, to create a solution which works for a given situation.

Quick question : Do I get to try it out, like a trial version?
Answer : Yes! Try the Free tier! It is actually free. Check this for more information :
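The caching-layer idea from section 4.4.8 can be sketched as a cache-aside lookup. This is only an illustration under stated assumptions: `CacheAside` is a hypothetical in-process stand-in for a cache engine such as Memcached or Redis, `slow_query` is a placeholder for a join-heavy RDS query, and the TTL values are arbitrary.

```python
import time


class CacheAside:
    """Hypothetical in-process stand-in for a cache such as Memcached or Redis."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db   # slow path, e.g. a join-heavy database query
        self._ttl = ttl_seconds     # how long a cached value stays fresh
        self._clock = clock         # injectable so expiry can be simulated
        self._store = {}            # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1          # fresh copy found in memory
            return entry[1]
        self.misses += 1            # absent or expired: go to the database
        value = self._load(key)
        self._store[key] = (now + self._ttl, value)
        return value


def slow_query(key):
    # Placeholder for an expensive relational query (assumption for illustration).
    return f"result-for-{key}"


cache = CacheAside(slow_query, ttl_seconds=30.0)
first = cache.get("thing-X")    # miss: computed by the "database"
second = cache.get("thing-X")   # hit: served from memory
```

The second lookup never reaches the database, which is the whole point of putting a cache in front of a read-heavy relational store.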
5. Common scenarios :

5.01 I need more memory, and I don't care about the CPU :

5.01.1 EC2 various instance types :
AWS has a family of EC2 instance types. Any application may need some particular resource type more than another (e.g. '.tar.bz2' vs '.tar.gz' ; the requirements differ). You have various options such as general purpose (balanced compute, memory, and network resources), compute optimized for heavy computations, memory optimized, storage optimized for higher I/O operations, and GPU optimized (also described in section 6.04.1). More information is available at :
Benchmarking the application requirements and finding the right instance type is the ideal beginning for any project.
e.g. It's a video encoding application, so it needs more compute power than memory; I will use the C1 instance type

5.02 Don't you have pre-installed configurations?

5.02.1 AWS Marketplace :
AWS Marketplace is an on-line store that helps you find pre-configured software and services that run in the EC2 cloud. The software is provided and maintained directly by trusted vendors such as Red Hat, SAP, Microsoft, Canonical, and IBM. Many widely used open source offerings such as WordPress and Drupal are available too. Check this for all available ready configurations:
e.g. using Red Hat Enterprise Linux (RHEL) provided and maintained directly by Red Hat

5.03 Now, I know what my application needs, I will do it all with just the basic building blocks
Now that you have understood the basic building blocks, you can manually configure each basic block as per your requirements.
e.g. using an EC2 instance to host a Python based application, with RDS-MySQL as the relational database and a cron job that pushes all the logs to an S3 bucket every Sunday at midnight

5.04 I have used the basic blocks, but how do I monitor?

5.04.1 CloudWatch :
A web service which lets you monitor resource utilization, operational performance, and overall demand patterns, including metrics such as CPU utilization, disk reads and writes, and network traffic. You can compare graphs of various metrics. You can set alarms on defined conditions to take particular actions. Fun fact: You can monitor billing as well!
e.g. CPU utilization graph of the EC2 instance

5.04.2 Simple Notification Service (SNS) :
SNS is a web service that makes it easy to set up, operate, and send notifications from the cloud. It can send email notifications using SES (described in section 5.04.3), and it can also send SMS notifications or post to any HTTP endpoint. Typically used with CloudWatch alarms.
e.g. Notify me when the instance has hit 70% CPU utilization for 3 minutes

5.04.3 Simple Email Service (SES) :
SES is a highly scalable and cost-effective bulk and transactional email-sending service for businesses and developers. Amazon SES eliminates the complexity and expense of building an in-house email solution or licensing, installing, and operating a third-party email service. It also helps with sending emails from any application hosted on services such as Amazon EC2.
e.g. send an email to a set of users when the instance has hit 70% CPU utilization for 3 minutes
Best practices for SES at:
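The "70% CPU utilization for 3 minutes" examples in sections 5.04.2 and 5.04.3 can be sketched as a small threshold check followed by a notification fan-out. This is a simplified model written for this guide, not the CloudWatch or SNS API: `alarm_breached`, `notify`, the one-sample-per-minute assumption, and the subscriber addresses are all hypothetical.

```python
def alarm_breached(samples, threshold=70.0, periods=3):
    """Return True if the most recent `periods` samples are all at or above threshold.

    Assumes one CPU utilization sample per minute, oldest first.
    """
    if len(samples) < periods:
        return False  # not enough data yet to evaluate the condition
    return all(value >= threshold for value in samples[-periods:])


def notify(subscribers, message):
    """Stand-in for a notification fan-out; returns what would be delivered."""
    return [(endpoint, message) for endpoint in subscribers]


cpu_per_minute = [35.0, 72.5, 81.0, 90.2]
subscribers = ["email:ops@example.com", "sms:+10000000000"]  # placeholder endpoints

deliveries = []
if alarm_breached(cpu_per_minute):
    deliveries = notify(subscribers, "CPU at or above 70% for 3 minutes")
```

Requiring several consecutive samples, rather than a single one, keeps a short spike from triggering a notification.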
5.05 I need to make sure that it is secured from outsiders :

5.05.1 Access control list (ACL) :
A list of permissions or rules for accessing an object or network resource. In Amazon EC2, security groups act as ACLs at the instance level, controlling which users (IP address ranges) have permission to access specific instances. In Amazon S3, you can use ACLs to give read or write access on buckets or objects to groups of users. In Amazon VPC, ACLs act like network firewalls and control access at the subnet level.
e.g. do not allow anyone to write to bucket_X
e.g. do not allow anyone on port 22 on our entire network (VPC)

5.05.2 Security Groups :
A security group gives you control over the protocols, ports, and source IP address ranges that are allowed to reach your Amazon EC2 instances; in other words, it defines the firewall rules for your instance. These rules specify which incoming network traffic should be delivered to your instances.
e.g. only keep port 80 open to everyone and allow all the outbound traffic
More about security best practices can be read at the following link:

5.06 How can I make it robust, and flexible?

5.06.1 Elastic Load Balancer (ELB) :
It automatically distributes incoming application traffic across multiple Amazon EC2 instances, depending upon their capacity to serve load. You can have multiple Amazon EC2 instances behind a single ELB node to serve the requests. ELB also monitors the health of the back-end EC2 instances and routes traffic only to healthy instances.
e.g. balancing the incoming traffic for my website across multiple EC2 instances

5.06.2 AutoScale (AS) :
Allows you to scale your Amazon EC2 capacity up or down automatically according to conditions you define. It also monitors the health of each EC2 instance; if an instance is not healthy, it replaces it with a new one. You get to define everything about the instances in use, which includes the region, the type, the minimum number, the number to maintain (called desired), and the maximum number.
e.g. when CPU usage is more than 60% for 3 minutes add one more instance, and when it is lower than 40% for 3 minutes remove one instance, but at any point keep at least two instances

5.07 I want to control the DNS to make it easy to remember :

5.07.1 Route53 :
Amazon Route 53 is a highly available DNS service that is available from all AWS regions and edge locations worldwide. You can manage name to IP mappings. Route 53 also provides you with various routing options for DNS entries such as simple, latency based, weighted, and primary-secondary. It is also capable of monitoring the health of an endpoint. Also, described in section
e.g. for the Sydney region it should serve ELBNAME_SYD and for the USA it should be ELBNAME_USA

5.08 I believe in faster delivery of my contents :
Usually, media content is stored at some particular remote location, or even in S3. When someone requests it (e.g. example.jpeg), it is delivered from there; if someone else requests the same thing, it is fetched from the same location again. Even if the same person requests it again, it is still fetched from the same location. What if some location closer to the requesting person kept a copy for future requests?

5.08.1 CloudFront :
Amazon CloudFront is a content delivery service that operates from the numerous AWS edge locations worldwide. CloudFront delivers customer data through configuration sets called distributions. Each distribution has one (or more, in the case of cache behaviors) configured origins. Each origin may be an Amazon S3 bucket or a web server, including web servers running within Amazon EC2, or even an ELB.
e.g. cache the image file of the company's logo for the website, as every page uses it
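The scaling rule given as the example in section 5.06.2 can be written down as a small decision function. This is a sketch of the rule only, not of how the Auto Scaling service is implemented or configured: `next_capacity` is a hypothetical helper, the samples are assumed to arrive once per minute, and the maximum of 6 instances is an assumption, since the example states only the minimum.

```python
def next_capacity(current, cpu_samples, minimum=2, maximum=6,
                  scale_out_at=60.0, scale_in_at=40.0, periods=3):
    """Return the instance count after applying the scaling rule once.

    cpu_samples: average CPU utilization per minute, oldest first (assumption).
    """
    if len(cpu_samples) >= periods:
        recent = cpu_samples[-periods:]
        if all(value > scale_out_at for value in recent):
            current += 1   # sustained high load: add one instance
        elif all(value < scale_in_at for value in recent):
            current -= 1   # sustained low load: remove one instance
    # Never leave the configured bounds, whatever the metrics say.
    return max(minimum, min(maximum, current))


busy = next_capacity(2, [65.0, 70.0, 80.0])   # scales out by one
quiet = next_capacity(2, [30.0, 20.0, 10.0])  # already at the minimum, stays put
```

The gap between the two thresholds (60% and 40%) stops the group from adding and removing instances on every small fluctuation.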
5.09 I am not alone anymore, I have a 'team' :
Section 5.05 discussed protection from outsiders, but what about insiders?

5.09.1 Identity and Access Management (IAM) :
You can centrally manage users, security credentials such as passwords and access keys, and the permissions and policies that control which AWS services and resources users can access. You can create access keys to use when you make programmatic calls to AWS using the command line interface or API calls, also described in section 6.10. A multi-factor authentication (MFA) token can be enabled to provide two-factor authentication for the root AWS account. You can also give credentials to AWS resources, letting them do whatever a user could do; this is called a role. Here is the link to get started with the policies :
e.g. user_A should not have access to S3 bucket_1 and user_B should not have access to change Route 53 entries

5.09.2 CloudTrail :
AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you. You can start logging the actions and API calls made by the users. It records the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the AWS service. This log is stored in an S3 bucket. Typically an SNS notification is set up for whenever new logs are available.
e.g. after last Thursday's production issue, going through the API calls to identify the exact events

5.10 I am a developer! I would just work on my idea, need everything else automated

5.10.1 Elastic Beanstalk :
A fully managed deployment and management tool that automates capacity provisioning, load balancing, auto scaling, and health monitoring for your applications. It just needs your deployable code, which allows developers to keep their attention on the business logic. It supports various popular languages.
e.g. a PHP application, written and version controlled with 'git', is deployed using Beanstalk

5.10.2 OpsWorks :
An application management service that helps you control the complete application life-cycle. It allows you to automate and manage all the processes involved in the deployment of your applications, including resource provisioning, configuration management, application deployment, software updates, monitoring, and access control. It also supports 'Chef', which is an open source systems integration framework that automates the deployment of software.
e.g. deploy software_A on some instances and software_B on other instances. Allow both to use Python 2.7. Keep them configured with ELB_A and ELB_B.

5.11 I would like to have semi-automation to configure my systems

5.11.1 CloudFormation :
It gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion. You define the AWS resources needed to run your application in a simple text file called a template, which can be used repeatedly to create identical copies of the same resource stack (or used as a foundation to start a new stack). To be precise, it is a JSON formatted text file. AWS resources (EC2 instances, etc.) can be used as generic resource types/units to create new templates. Just like integer, float, and string, you suddenly have new data types: 'Auto Scaling Groups', 'Amazon RDS Database', etc. You can even use Elastic Beanstalk as part of this semi-automation. It gives you both control and power. A complete list of the resources which can be used can be found by searching: 'What resource types does AWS CloudFormation support?'
Fun fact : Elastic Beanstalk itself uses CloudFormation in order to provide its automation
e.g. a website has been designed for University X; it is launched in the N. Virginia region with 2 instances of type C1, and they are behind ELB_1 with AutoScale driven by CPU usage alarms. Now, this JSON file can be used to update the whole stack. The same website can be altered for University Y with the help of the templates.
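To make the template idea from section 5.11.1 concrete, here is a minimal sketch that builds a JSON template for the two C1 instances of the University X example. It is deliberately incomplete: the load balancer and Auto Scaling parts of the example are left out, `build_template` is a hypothetical helper, `c1.medium` is one assumed size within the C1 family, and the image ID is a placeholder, not a real AMI.

```python
import json


def build_template(description, instance_type="c1.medium",
                   image_id="ami-PLACEHOLDER", count=2):
    """Return a dict shaped like a CloudFormation template with `count` EC2 instances."""
    resources = {}
    for index in range(1, count + 1):
        # Logical resource names are our own choice (alphanumeric).
        resources[f"WebServer{index}"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": instance_type,
                "ImageId": image_id,  # placeholder; a real AMI ID is required
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": description,
        "Resources": resources,
    }


template_x = build_template("Website stack for University X (illustrative)")
template_json = json.dumps(template_x, indent=2)

# Reusing the same helper for another stack only changes the parameters.
template_y = build_template("Website stack for University Y (illustrative)", count=3)
```

Because the stack is plain data, creating a similar stack for University Y is a matter of changing parameters rather than repeating manual steps.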
5.12 I need to make my system distributed in co-ordination :
Whenever you do architectural planning, you need to get away from single points of failure and get rid of bottlenecks.

5.12.1 Simple Queue Service (SQS) :
A service providing reliable, highly scalable, hosted queues for storing messages as they travel between computers. Typically, it is used to build loosely coupled modules. Messages can be buffered as per the application requirements. Basically, it helps solve the classic producer-consumer problem, and makes the system asynchronous and stateless.
e.g. instances of AutoScaling group 'AS1' queue requests and instances of another AutoScaling group 'AS2' process those messages. Even if an instance from AS1 gets terminated after putting a request into the queue, it doesn't affect the system. Also, even if the instance processing a request from the queue gets terminated, another instance can pick up the same message from the queue.

5.12.2 Amazon Simple WorkFlow Service (SWF) :
Amazon SWF makes it easy to build applications that coordinate work across distributed components. Using SWF, you can structure the various processing steps in an application as 'tasks' that drive work in distributed applications, and Amazon SWF coordinates these tasks in a reliable and scalable manner. Amazon SWF manages the task execution dependencies, scheduling, and concurrency based on a developer's application logic. The service stores tasks, dispatches them to application components, tracks their progress, and keeps their latest state. You can develop SWF based programs using the AWS Flow Framework. Here is the link to get started with the AWS Flow Framework:
e.g. you have hundreds of machines which upload their logs to S3 at random intervals, and the logs are to be analyzed. In this situation those instances can be reporting workers, and a few more separate workers can pick up the ready logs and start analyzing them. Another group of workers can compress the logs which have been analyzed.
Multiple detailed examples are given at

6. Special needs :

6.01 I don't want to work on search algorithms, and overheads related to that :

6.01.1 CloudSearch :
Amazon CloudSearch is a fully-managed service in the cloud that makes it easy to set up, manage, and scale a search solution for your system. It lets you index textual content and provides basic full text search. You do not even have to manage the hosts or deal with data-scaling issues.
e.g. search engine for your website

6.02 I use lot of heavy Video/Audio conversions :

6.02.1 Amazon Elastic Transcoder (ETS) :
The Amazon Elastic Transcoder (ETS) service simplifies and automates what is usually a complex process of converting media files from one format, size, and/or quality to another. It allows clipping of audio and video files. You can also add watermarks to your processed jobs.
e.g. create files compatible with Kindle Fire. They should also have the watermark ''

6.03 What are my options for processing huge amount of data?
“In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.”- Grace Hopper
Whenever you have to deal with a huge amount of data, you need to make use of clustering and parallelism. With cluster computing you can do lots of things such as:
-Data mining (Log processing, click stream analysis, similarity algorithms, etc.)
-Bio-informatics (Genome analysis)
-Web indexing
-File processing (resize jpegs)
-Financial simulation (Monte Carlo simulation)
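The producer-consumer decoupling from the example in section 5.12.1 can be modelled in a few lines. `ToyQueue` is a hypothetical in-memory class written for this guide, not the SQS API; it only imitates the idea that a received message is hidden from other consumers and becomes visible again if it is never deleted (the behaviour SQS calls a visibility timeout).

```python
from collections import deque


class ToyQueue:
    """In-memory model of a queue whose messages reappear unless acknowledged."""

    def __init__(self):
        self._visible = deque()   # messages any consumer may receive
        self._in_flight = {}      # received but not yet deleted
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self._visible.append((self._next_id, body))
        return self._next_id

    def receive(self):
        if not self._visible:
            return None
        message_id, body = self._visible.popleft()
        self._in_flight[message_id] = body   # hidden from other consumers for now
        return message_id, body

    def delete(self, message_id):
        # Called by a consumer once the work is really done.
        self._in_flight.pop(message_id, None)

    def expire_in_flight(self):
        """Simulate the timeout elapsing: unacknowledged messages become visible again."""
        for message_id, body in sorted(self._in_flight.items()):
            self._visible.append((message_id, body))
        self._in_flight.clear()


queue = ToyQueue()
queue.send("analyze request 1")          # an instance in AS1 queues a request
msg_id, body = queue.receive()           # a worker in AS2 picks it up, then is terminated
queue.expire_in_flight()                 # no delete arrived before the timeout
retry_id, retry_body = queue.receive()   # another worker receives the same message
queue.delete(retry_id)                   # this worker finishes and acknowledges
```

The producer never needs to know which worker handled the request, or that the first attempt was lost.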
Amazon provides you with lots of options for cluster computing. Intensive jobs can get started within minutes. Redshift is one of the options, and it was already described in section 4.4.7.

6.03.1 I will do it manually with just EC2 instances :
Placement group :
Cluster instances can be launched together in what is called a placement group. They are launched with low latency and a bandwidth of 10 Gbps between the instances. Applications need to handle race conditions, synchronization delays, and partial failures. More about clustering:
You can have GPU clusters as well. More about GPU instances is described in section 6.04.1.
e.g. launch a cluster of 5 EC2 instances in a placement group called 'cluster_1'

6.03.2 Elastic MapReduce (EMR) :
Amazon EMR is a fully managed Hadoop framework for processing vast amounts of data. It runs on the web-scale infrastructure of Amazon EC2 and Amazon S3. Hadoop is an open source Java software framework that supports data-intensive distributed applications running on large clusters of commodity hardware. Hadoop implements a programming model called 'Map-Reduce', where the data is divided into many small fragments of work, each of which may be executed on any node in the cluster. Amazon EMR lets you focus on crunching or analyzing your data without having to worry about the time-consuming set-up, management, and/or tuning of Hadoop clusters or the compute capacity upon which they sit.
e.g. create a job for log analysis of an S3 bucket with the latest Hadoop version. I need a cluster of 10 instances which are ready for heavy computation
EMR best practices can be read at:

6.03.3 Kinesis :
Collect and store, in real-time, hundreds of terabytes of data per hour from hundreds of thousands of sources, such as web site click-streams, operational logs, digital marketing data, and many more, enabling you to easily write applications that process that information in real-time.
e.g. for a gaming website: multiple streams such as logs from the ad server, the user preferences database, user actions, and location based data can be combined to identify better targeted ads

Section 6.06.1 also discusses 'Data Pipeline', another service which can be used to process very large amounts of data using other services (EMR, S3, etc.) in a very disciplined manner.

6.04 I want to have a workaround for GPU intensive work (e.g. CUDA) :

6.04.1 GPU instance family :
AWS provides lots of instance types for different requirements (described in detail in section 5.01.1). The GPU instance family is designed for applications that require 3D graphics capabilities. These instances are backed by a high-performance NVIDIA GPU, making them ideally suited for video creation services, 3D visualizations, streaming graphics-intensive applications, and other server-side workloads requiring massive parallel processing power. With this instance type, you can build high-performance DirectX, OpenGL, CUDA, and OpenCL applications and services without making expensive up-front capital investments.
e.g. for a rendering application I need 2 G2 instances in the Sydney region

6.05 I am designing a graphics intensive game, or something but I don't want to put any constraints on my users

6.05.1 AppStream :
AppStream is designed to stream resource intensive applications and games. All the graphical and CPU intensive work is handled by AppStream. In general, if you design for thin clients, then you end up handling the CPU intensive and storage tasks yourself. The assumptions about the end-user device are minimal and you support a larger customer base, but the end result is heavy resources at your end. On the contrary, if you want to get rid of heavy resources at your end, then you design for thick clients. The application then assumes lots of things about the end-user device, which comes at the cost of a smaller customer base. AppStream helps you overcome these issues. Regardless of the end-user's hardware limitations, your application can keep functioning. It also makes updates easy and keeps processing at the user's end to a minimum.
e.g. A high resolution game which runs on a smart TV
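The 'Map-Reduce' model described in section 6.03.2 can be illustrated on a single machine. This is a plain Python sketch of the programming model only, not Hadoop or the EMR API; the function names and the log format (HTTP status code as the last field of each line) are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    # Emit (status_code, 1) per log line; assumes a non-empty line whose
    # last whitespace-separated field is the HTTP status code.
    status = line.split()[-1]
    yield status, 1


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    return key, sum(values)


def run_job(fragments):
    # On a cluster each fragment could be mapped on a different node;
    # here everything runs sequentially in one process.
    pairs = [pair
             for fragment in fragments
             for line in fragment
             for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


fragments = [
    ["GET /index.html 200", "GET /missing 404"],
    ["GET /index.html 200", "POST /login 500"],
]
counts = run_job(fragments)
```

Since each fragment is mapped independently, the work can be spread over as many nodes as there are fragments, which is what makes the model a good fit for log analysis on a cluster.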
  10. 10. Amazon Web Services- AWS Educational Guide Last updated: Mar'14 6.06 I am looking for a solution for movement of data between various data stores 6.06.1 Data Pipeline : AWS has various types of data storage options depending on application needs. Data in one form is not compatible with the other storage.(e.g. data in S3 vs data in RDS vs Data in EBS). AWS Data Pipeline enables moving data between different data stores, and scheduling chained transformations. You can define data-driven workflows which are dependent on one another. e.g. daily backup web server logs from EBS to S3 bucket and once a month schedule EMR job to generate reports. 6.07 I already have some machines, and I want to use them with my new invisible machines : Many times you would like to integrate your Corporate data-center or just few machines which have been doing some kind of work for you. AWS has lots of options depending on your needs to merge the two environments. Read about extension of infrastructure with vpc : 6.07.1 VM Import/Export : VM import/Export allows importing existing Virtual Machine(VM) images. Allowing your existing configurations to create EC2 instances. You can export EC2 instances to VMware ESX VMDK, VMware ESX OVA, Microsoft Hyper-V VHD or Citrix Xen VHD images. 6.07.2 Virtual Private Network (VPN) : Amazon VPC allows VPN tunnels creation. All traffic to and from instances in your VPC can be routed to your corporate data-center over an industry standard, encrypted IPsec hardware VPN connection. e.g. Using Cisco ASA device creating a tunnel to a VPC in Tokyo region. 6.07.3 Direct Connect : AWS Direct Connect is a service that links physical infrastructure to the AWS services. One or more fiber connections are provisioned in an AWS Direct Connect location facility. Data between AWS and your data-center can travel through private lines. It also empowers with the bandwidth and provides more consistent experience than the Internet based connections. 
1 Gbps and 10 Gbps ports are available, so connections of up to 10 Gbps can be established with AWS Direct Connect.
e.g. connecting your data-center located in Ireland to a Direct Connect location in Ireland

6.07.4 Storage gateway :
AWS Storage Gateway connects an on-premises software appliance with AWS storage to provide seamless and secure integration. It supports industry-standard storage protocols that work with your existing applications. It provides low-latency performance by keeping frequently accessed data on-premises while securely storing all of your data, encrypted, in Amazon S3 or Glacier.
e.g. keep the contents of the S3 bucket 'bucket_freq' locally in the data-center because those files will be accessed frequently.

6.07.5 AWS Import/Export :
This service helps you move large amounts of data into and out of AWS using portable storage devices for transport. AWS transfers the data onto and off of the storage devices over Amazon's internal network, bypassing the Internet.
e.g. getting 100 TB of files from S3 buckets to your office machines

6.07.6 Ready for Disaster recovery :
With AWS, you can plan disaster recovery in a few possible ways, with the corporate data-center as the primary solution and AWS as the backup plan.

Amazon Machine Image (AMI) :
An AMI is a special type of pre-configured operating system image used to create a virtual machine (an Amazon EC2 instance) within the Amazon EC2 environment. An EC2 instance with a running application can be captured as an AMI and used to create an army of Amazon EC2 instances.
Snapshots of database :
You can back up database snapshots or EBS volume snapshots to S3. This is useful for point-in-time recovery. Snapshots are saved incrementally, so you can make sure that the most up-to-date version of the volume is at hand in case of failure.

Route 53 :
Route 53 was described in section 5.07.1. You can use the DNS fail-over option with Route 53: monitor an endpoint in your data-center as the primary, and point a secondary DNS entry to AWS resources. This way, a failure at the data-center will not cause an interruption of service.
More about disaster recovery can be read at :

6.08 You know, actually I don't even have my own personal computers :

6.08.1 WorkSpaces :
Amazon WorkSpaces lets customers easily provision cloud-based desktops that allow end users to access the documents, applications and resources they need with the device of their choice, including laptops, iPad, Kindle Fire, or Android tablets. You can use a shared high-end desktop and have multiple users sharing all the resources at the same time. It also integrates securely with your corporate Active Directory, so your users can continue to use their existing enterprise credentials to access company resources seamlessly.
e.g. using WorkSpaces to train students for particular sessions.

6.09 I need separation of the resources for a Government project :

6.09.1 AWS GovCloud (US) Region – Government Cloud Computing :
AWS GovCloud (US) is an isolated AWS Region designed to allow US government agencies and customers to move sensitive workloads into the cloud by addressing their specific regulatory and compliance requirements.
More information can be found at :
e.g. running an RDS instance for a government agency X

6.10 I need to automate things with scripting :
Typically, the AWS Management Console is a Graphical User Interface (GUI).
Sometimes end users prefer a Textual User Interface (TUI), and a TUI is often preferable for automation. AWS has multiple options for accessing resources.

6.10.1 AWS Command Line Interface (CLI) :
The AWS Command Line Interface (CLI) is a unified tool to manage your AWS services. With just one tool to download and configure, you can control multiple AWS services from the command line and automate them through scripts. It is typically used in bash, PowerShell and similar scripts.
e.g. # aws ec2 terminate-instances --instance-ids i-4312234c

6.10.2 AWS SDK :
The AWS SDKs take the complexity out of coding by providing APIs for most AWS services. SDKs are available for various popular programming languages. Some services also have their own libraries, such as the Simple Workflow framework.
e.g. AWS SDK for Java
You can read more about development and testing at :

6.11 I need help understanding my usage, and mis-configurations :

6.11.1 Trusted advisor :
AWS Trusted Advisor offers a one-view snapshot of your system and helps you identify common security misconfigurations, suggestions for improving system performance, and underutilized resources. The best part is that it helps you save money on unused resources.
e.g. RDS idle DB instances: 2 out of 5 instances appear to be idle. Annual savings of up to $1,045 are available by minimizing idle RDS DB instances.
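As an illustration of the scripting approach in section 6.10.1, here is a minimal Python sketch that assembles the same `aws ec2 terminate-instances` command for a list of instances and prints it without running it. The helper names `build_terminate_command` and `render` are hypothetical, and the second instance ID is made up for the example; running the command for real would require the AWS CLI to be installed and configured.

```python
import shlex


def build_terminate_command(instance_ids):
    """Return the argument list for `aws ec2 terminate-instances` (hypothetical helper)."""
    if not instance_ids:
        raise ValueError("at least one instance ID is required")
    # The CLI accepts several instance IDs after a single --instance-ids flag.
    return ["aws", "ec2", "terminate-instances", "--instance-ids", *instance_ids]


def render(command):
    """Render an argument list as one shell-safe command line (hypothetical helper)."""
    return " ".join(shlex.quote(part) for part in command)


if __name__ == "__main__":
    # i-4312234c is the example ID from section 6.10.1; i-0000000a is invented.
    cmd = build_terminate_command(["i-4312234c", "i-0000000a"])
    # Dry run: only print the command that a script would execute.
    print(render(cmd))
    # To execute it for real, one could pass `cmd` to subprocess.run(cmd, check=True).
```

Building the command as a list and printing it first is a cautious choice for destructive operations such as terminating instances: the script can be reviewed before anything is actually executed.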
6.11.2 Support :
AWS also provides support that is available 24x7x365, staffed by experienced technical support engineers. Depending on your requirements, you can choose the appropriate Support plan.
More about AWS Support can be read at :
e.g. I have a production issue and need help; a call would be the fastest way to convey the problem.

6.12 I am falling under two, or more scenarios :
It is possible that something which was not described in the architectural section is actually really important for your application.
e.g. an ad serving architecture may have Data Pipeline playing a very important role even though it was listed under special needs.

7. Future directions :
Understanding your application's needs is the crucial part of deciding which block(s) to use. You might end up using just an EC2 instance, or you might end up using most of the things mentioned in this paper. The true power of the Cloud lies in scalability, high availability, fault tolerance and creating self-healing parts.
Latest releases for AWS at :
AWS sample architectures for common use-cases :
AWS white papers are the best resources for best practices :

8. References :
1) Definitions for some of the services from Migrating AWS Resources to a New Region by Simon Elisha, James Bromberger & Peter Stanski
2) Architecting for the Cloud: Best Practices by Jinesh Varia
3) Storage Options in the AWS Cloud by Joseph Baron & Sanjay Kotecha
4) AWS Security Best Practices by Dob Todorov & Yinal Ozkan
5) Amazon Web Services: Overview of Security Processes
6)
7)
8) {service name}
9) {service name}/faqs