How to Migrate and
Govern Applications on
Cloud Infrastructure
Object Oriented Infrastructure Approach
Manuj Bawa
www.linkedin.com/in/bmanuj | manujbawa@gmail.com
Application Deployments in B.C. era (Before
Cloud) – An Architect’s perspective
u Hardware/Environment Planning sessions!
u Design sessions with Development/Test and COTS teams to get Application
requirements, disk usage, CPU requirements, communication requirements
u Translate all the requirements for Infrastructure Teams who would then add
firewalls, security and other standards that need to be implemented
u Estimate Costs/prepare detailed plans/get approvals/purchase/track…
u Track installation of software and finally the first environment is up
u Repeat the exercise for every procurement if data-centers were different
between Dev and Production
u Software Installations were repeated every time. Manual configuration errors
were high
u Firewall ports were a pain! Had to be repeated every time with the same set
of questions from Infrastructure Team. No Context!
u Entire process was a broken integration with no efficiencies.
u Large pool of diverse skilled resources maintained on the projects for
maintenance and peace of mind
Did this change in A.C. (After Cloud) era?
u Somewhat…
u Architects still have to conduct design sessions for requirements within the
teams
u Infrastructure Teams still treat the cloud as just another data-center, with
someone else managing rack-and-stack for them.
u The re-usability provided by Cloud ecosystems has not been leveraged within
Infrastructure
u Deployment Designs have not matured
u Design contributions are limited to large companies like Netflix, CapitalOne
etc.
u B.C. era Infrastructure Teams are still figuring out what skills are required and
how to use the existing personnel skills
u Large gap between the B.C. and A.C. thought process
Reality Check..
[Slide diagram: the B.C.-era delivery phases (Design, Procure and Deploy, Installations, Application Dev/Test, Production/Monitoring) mapped to A.C.-era practices]
u Design: Design Cloud Agnostic Abstractions; Design for Failure; Design with Application Context in mind
u Procure and Deploy: Reduced time to procure, no Rack and Stack; Don’t procure and deploy for eternity; Utilize Agile principles to deploy as you go
u Installations: Using Cloud Provided Services/COTS; Repeatable and Reusable components; Automation with CI/CD approach
u Application Dev/Test: Using Cloud Provided Services; Repeatable Environments
u Production/Monitoring: Utilize the ecosystem tools to maintain elasticity and redundancy; Always run in Disaster Recovery mode!; Maintain compliance with standards (FISMA/ISO etc.)
u Overarching Governance spans all phases (Compliance/Security/Processes/Standards/Controlled Environments/Audit/Business Continuity)
Before you start implementing:
u Understand Cloud
u Not just hardware for rent, but an expansive eco-system of tools.
u Compare these eco-systems to an “App Store” where developers can utilize the tools to build
out their apps
u The line between System Administrators and Software Developers is diminished. You need both
skills to effectively design and govern
u Decide on a Cloud Provider
u Evaluate Services, tools and costs
u Evaluate your product/implementation path and whether the Cloud Platform meets the
requirements (OS/Firewalls/Security)
u Compliance Requirements: Do you need to comply with any standards? FISMA/HIPAA/ISO?
u Work towards a budget: What are you spending right now? Take into account your Disaster
Recovery/Business Continuity facilities
u Map Services that will be used, establish that you have skills available within the teams or
hire, if required. Think of HW appliances or Software that can be replaced
u Prototype away! Cannot stress enough the importance of this step. This should provide
indicators on your performance requirements/COTS/your own product’s compatibility.
u For the purposes of this presentation, I am going to pick Amazon Web Services. Its eco-system
is far more mature than other providers’ and it provides a wide variety of services that can
cover 90% of implementations
Amazon Web Services – Thinking Cloud
implementation..
u Organized into multiple Regions, further broken down into Availability Zones
u Some Regions comply with standards you will need (FedRAMP/ISO) right down
to the service level
u Each Region provides Services. Not all Regions provide all services
Map your Services..
u Networking/Firewall: Route53, Routing Tables, AWS Shield, WAF, ELB = F5, Network
ACLs
u Storage: Global object storage (S3), NFS storage (Elastic File Store) = NAS/NFS, Disks
u Managed Application/Container Service: Elastic Beanstalk, Lightsail = None
u Security: Identity Access Management, Key Management Services, Granular Policies =
LDAP/Local Accounts
u Monitoring: CloudWatch, CloudTrail, Load Balancer Logs = Network logs, System Logs,
DB Logs
u Application Integration: Lambda, Simple Notification Service, Simple Queue Services,
Workflows = SMTP, JMS
u CRM: Amazon Connect = Avaya, Cisco, Verint
u Antivirus service = Sophos etc.
(Items after “=” are the traditional data center equivalents)
Create a Catalog of Services you will
use..
u EC2/EBS: Compute instances and block storage volumes for servers
u S3, EFS and Glacier: For Storage of files that includes File Sharing, File
Archival. Also the store for Database archive logs
u Management Tools: For designing and maintaining the Cloud Data Center
u IAM, CloudTrail: For Security policies, PIV and SSO integration
u Certificate Manager: For Data Encryption: Replacement for Oracle’s
Encryption of data at rest module
u Networking/Firewall: VPC, Route53, Amazon Shield, WAF, ELB
u Integration Services: Lambda, SNS, SQS, Workflow
u Monitoring: CloudWatch, LogRhythm, CloudTrail, SNS
u AWS Command Line Interface, CloudFormation: For automated management
of servers
u *Relational Database Service: Database for development and small
environments where standing up a separate instance is not cost effective
u *Amazon Connect/Lex and Kinesis: for IVR, speech recognition and call flow
logs
Map Team’s Skills
[Slide diagram: the service catalog, color-coded by required skill set]
- Red markers require System/Network Admin skills
- Blue dots require some SA, but mostly development skills
u Do I need to cross-train my System/Network administrators in development skills?
u What governance do I need to manage the deployment?
u There is room for error because developers will now access these services
Can I forklift my implementation to the
cloud?
u Prohibitive Costs: Traditional Data Centers have pre-provisioned capacity for the
contract duration. Forklift, and you will end up spending excess dollars
u High Labor Costs per Environment: For example: Network Administrators looking
for F5 software appliance in the cloud, just because there was a F5 hardware
appliance in the private Data Center. The administration overhead will still be the
same in the cloud. Ask: Can your requirements be met with an equivalent service?
u DR/Redundancy: Not required at the same level as in Private Data Centers
u Operations Monitoring and Fault Recovery: Recovery is manual and often delayed
in Private Data Centers. Bringing in the same set of tools into Cloud does not
inculcate automation/process improvements/efficiencies that come with using the
cloud platform
u The probability that your servers will fail is higher: AWS’ annual failure rate for
disks is between 0.1% and 0.2% of hundreds of thousands of disks. You do the math. Is
that number a significant percentage of the total disks in your Private Data Center?
u Continuous monitoring and manual recovery from faults in the Operations
phase will require manual tasks through Cloud Consoles
AWS Recommended Best Practices
u Design for Failure: AWS assures that the “resources” for deployment will
be available, but you should design your servers to tolerate and recover from
faults. Because of the sheer volume of servers in a region, the probability of faults
is extremely high if you compare numbers to a traditional data center.
u IT Assets are just programmable resources: All servers/managed services are
treated as disposable, temporary resources that can be initiated in seconds. Your
approach to change and configuration management changes: the networks,
servers, storage and security rules are all programs that can execute and
replicate an entire data center within seconds.
u Automation: Improve systems’ stability and efficiency with the automation tools
provided by AWS
u Services not Servers: Architectures that do not leverage that breadth (e.g., if
they use only Amazon EC2) might not be making the most of cloud computing and
might be missing an opportunity to increase developer productivity and
operational efficiency.
Ref: https://d1.awsstatic.com/whitepapers/AWS_Cloud_Best_Practices.pdf
Object Oriented Infrastructure Design
u Template of Objects: Create a template of all the components required
including: Networks, Firewalls, Routing Tables, Security Groups, Servers, disks
etc. Each environment is just another instance of the object
u Build once, replicate using automation: Utilize AWS Management and
Automation tools to create templates for Data Centers, Networking,
Firewalls, disks and servers and then replicate for all environments
u Utilize what you need based on CPU and Memory utilization in your current
environment and prototypes. You could be using only 2% of the CPU, 95% of the
time
u Redundancy on-Demand: Build with minimum requirements with no
redundancy. The Orchestration Framework ensures that replacement servers
are available on-demand at the click of a button or a monitored event
u Scale on Demand: add servers/volumes based on requirements and do not
pre-provision high capacity servers that go under-utilized
u Infrastructure as code: Create infrastructure as code that can be version
controlled. Configuration files can be applied to all existing infrastructure
code under a build process that is similar to Continuous
Integration/Deployment practices
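The “Infrastructure as code” bullet above can be sketched in Python. This is a minimal, hedged illustration (the resource names, CIDRs and environment labels are made up, and a real template would use CloudFormation’s full schema); the point is only that an environment becomes another instance of versioned code:

```python
import json

def make_vpc_template(cidr_block: str, env_name: str) -> str:
    """Render a minimal CloudFormation-style template as JSON text.

    Keeping this function in version control means every environment's
    network is a reviewable, diffable artifact rather than a console click.
    """
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Base network for the {env_name} environment",
        "Resources": {
            "ProjectVpc": {  # logical name is illustrative
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": cidr_block,
                    "Tags": [{"Key": "Environment", "Value": env_name}],
                },
            }
        },
    }
    return json.dumps(template, indent=2)

# Dev and Production are two instances of the same object:
dev_json = make_vpc_template("10.0.0.0/16", "DEV")
prod_json = make_vpc_template("10.1.0.0/16", "PROD")
```

The rendered JSON is what a build pipeline would hand to CloudFormation, exactly like a compiled artifact in a CI/CD flow.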
An Object Oriented View
u The base Red Hat Linux AMI is the
super class we use
u Base-Project-OS-AMI inherits from
base RHEL AMI, and adds CIS
standards, local accounts and other
core infrastructure configurations
required for the project.
u The Web-Server/App-Server/Database
or other COTS Servers inherit all basic
attributes/hooks from Base Project AMI
and add app specific configurations
and become their own AMIs
u The Orchestration Framework reads
the database configuration for
inventory and instantiates each AMI as
configured
u The core methods in the base AMIs are
scripts/lifecycle hooks configured in
each EC2 instance to change default
configuration
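The inheritance view above can be sketched as a Python class hierarchy. The class and hook names are illustrative stand-ins for AMIs and their lifecycle scripts, not real AWS constructs:

```python
class BaseRhelAmi:
    """Super class: the vendor Red Hat Linux AMI."""
    def boot_hooks(self):
        return ["mount_disks"]

class BaseProjectAmi(BaseRhelAmi):
    """Adds CIS standards, local accounts and core infra configuration."""
    def boot_hooks(self):
        return super().boot_hooks() + ["apply_cis_baseline", "create_local_accounts"]

class WebServerAmi(BaseProjectAmi):
    """Inherits all base hooks, then adds app-specific configuration."""
    def boot_hooks(self):
        return super().boot_hooks() + ["configure_httpd"]

class DatabaseAmi(BaseProjectAmi):
    def boot_hooks(self):
        return super().boot_hooks() + ["configure_oracle_listener"]

# The orchestration framework would read its inventory from the
# configuration database and instantiate each AMI as configured:
inventory = {"web01": WebServerAmi(), "db01": DatabaseAmi()}
all_hooks = {name: ami.boot_hooks() for name, ami in inventory.items()}
```

Every server type runs the same base hooks in the same order, which is exactly why a CIS or PIV change made once in the base class propagates to all child images.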
Step 1 – Defining Classes
u Define each Infrastructure component as a “Class”. Define “is-a” and “has-a” relationships between
Classes. For example, a VPC has subnets, a subnet has an EC2 Server, and an EC2 Server can be
an Oracle Server or a Web-Server.
u Define Infrastructure attributes: Security Groups, NACLs, Internet gateways, route tables etc.
u Define Abstractions: Define the base AMIs that will only account for OS, Security patches,
local user accounts that will be required to run services. Any CaC/PIV requirements can be
baked into the AMIs. The next abstraction would be specific application servers
u Define polymorphic behaviors for various application AMIs that inherit their base from OS
level AMIs. Example: both application and database servers inherit their OS from a base AMI
with all infrastructure requirements met.
u Define initializations, automation scripts that are required to instantiate the servers.
Example: I can take a base JBOSS Machine Image, and dynamically create profiles and
configure other required libraries as part of boot script in EC2 lifecycle. Define your
initialization scripts which operate on a machine image and plug in all the parameters from a
configuration store in a database
u Extend Initializations: There will be scenarios where servers will require initialization-time
installations of COTS libraries or IP addresses/hostnames that need to be updated. This can
be done by tapping into EC2 Lifecycle events, with S3 storing the libraries
u Finalize Application Layer AMIs: Make sure that these object definitions have all the
attributes that can be changed later during actual instantiations e.g. IP
addresses/HostNames/Subnets/Security Groups
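The “Define initializations” step above could look like the following sketch, assuming a plain dict stands in for the database-backed configuration store; the hostname, profile and bucket values are hypothetical:

```python
from string import Template

# A boot script template whose parameters are plugged in from the
# configuration store at instantiation time (EC2 user-data style).
BOOT_SCRIPT = Template("""#!/bin/bash
hostnamectl set-hostname $hostname
echo "JBOSS_PROFILE=$jboss_profile" >> /etc/environment
aws s3 cp s3://$artifact_bucket/libs/ /opt/libs/ --recursive
""")

def render_boot_script(config: dict) -> str:
    """Substitute one environment's configuration row into the template."""
    return BOOT_SCRIPT.substitute(config)

config_store_row = {            # illustrative values, not real hosts
    "hostname": "app01.dev.internal",
    "jboss_profile": "ha",
    "artifact_bucket": "project-cots-libraries",
}
script = render_boot_script(config_store_row)
```

The same template renders a different script per environment, which is how one machine image serves Dev, Test and Production without manual edits.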
Step 2 – Orchestration
u Create an inventory of all infrastructure components, AMI IDs, CIDRs, Security
Groups, desired hostnames, internal domains, DNS entries and Tags for each
item. Define a durable store for this configuration, possibly a database. This
will store all instances, their types and other attributes in an environment
u Define CloudFormation Templates using the objects you defined.
u AMI IDs, CIDR blocks, Number of Subnets, Security Groups, Auto-scaling
groups, Elastic Load Balancers and S3 buckets are all parameters that can be
dynamically provided to the CloudFormation Template
u Tap into the CloudFormation APIs with custom Python/Java
programming to instantiate an environment by reading the Configuration
Store
u Utilize EC2 lifecycle hooks, Custom Events, CloudWatch events to trigger
automation scripts that initialize your application layer components
u Tag your resources per the Configuration Store values
u Register each resource with CloudWatch’s custom events to create an
inventory of running services/instances in your environment. This dynamic list
will be handed over to the Redundancy Framework for monitoring and
scalability
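The parameter hand-off described above can be sketched as a small transform from a configuration-store row into the Parameters list that CloudFormation's create_stack API expects; the keys and values below are illustrative:

```python
def to_cfn_parameters(env_config: dict) -> list:
    """Shape one environment's config row into CloudFormation parameters."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(env_config.items())
    ]

env_config = {
    "AmiId": "ami-0abc12345",   # assumed AMI ID from the approved catalog
    "CidrBlock": "10.0.0.0/16",
    "SubnetCount": 3,
}
params = to_cfn_parameters(env_config)

# With boto3, this list would feed the actual stack creation, e.g.:
#   boto3.client("cloudformation").create_stack(
#       StackName="dev-env", TemplateBody=template_json, Parameters=params)
```

Because every value is read from the store rather than typed by hand, standing up UAT versus Production is a different row, not a different procedure.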
[Slide diagram: layered view of the Object Oriented Infrastructure]
u Base Infrastructure Layer (Base Infrastructure Template) – Infrastructure Objects:
u VPC; Networking (VPNs/Internet Gateways/Subnets/NACLs/SGs); Security (CIS standards); Storage (S3/EFS)
u Core Services: PIV/KMS; LDAP/DNS/NTP/SMTP; Users
u AMI Layer – Golden Images (AMIs): Linux Servers, Windows Servers, Database, Application Servers (Servlet Containers/J2EE), Web-Servers, ESBs/Reporting etc.
u Golden Template: Golden Images, DNS Entries, Security Groups, Inter-LAN Communication Ports, Routing Tables, Linux PIV Integration, Local service accounts, Antivirus, Swap Disks, Storage (S3/EFS), Archival Policies, Key Management Service
u CI/CD Layer – Automation Scripts: CI/CD automation scripts, Backup/Archive Log Automation, Security Certs/logical Host scripts, DB Config Scripts
u Frameworks: Orchestration and Configuration Framework; Continuous Monitoring and Redundancy Framework
u Configuration and Monitoring: replicates environments based on the initial template; applies additional URL/Load Balancing configurations; reacts to CloudWatch events and spins up replacement servers based on Golden Images and Configuration Scripts; all environment configurations automatically managed
u Environments: DEVELOPMENT, INTEGRATION TEST, UAT, PRODUCTION
Environment Instances – Deployment Flow
[Slide diagram: how the frameworks instantiate and monitor an environment instance]
u Orchestration and Configuration Framework (CloudFormation + Lambda + Python + AWS API SDK):
u Read the Golden Image IDs and the Configuration > Instantiate Servers and Services > Apply Configuration, Security, Audits > Setup Database, Applications > Tie all new services to CloudWatch for Monitoring and release the environment
u AMI Layer: Network Template, Database Servers, Application Servers, Web-Servers, ESB/Reports
u Environment Instance (Object Instance): App/Web Servers, ESBs, Reporting, Web-Servers, Amazon S3, Amazon Connect, Amazon Glacier
u Continuous Monitoring (CloudWatch + CloudFormation + Lambda + Python + AWS API SDK):
u Continuously monitors the environment and detects faults based on CloudWatch Events
u Reads configuration; integrates with other COTS such as Splunk to emit actionable events
u Performs failovers between servers; writes to SQS/SNS for custom events
u Streams logs to other services if required (Kinesis); notifies the Redundancy Framework
Step 3 – Monitoring and Redundancy
u Continuously monitor the CloudWatch events and check with the
configuration store for desired number of instances
u Utilize Splunk or Kinesis to analyze Application/COTS logs, which can detect
degradation of a service
u React to events by automatically terminating the degraded instances or adding
instances dynamically in response to a scheduled event or outage. This part of
the framework reads a subset of the CloudFormation template that contains
the service being replaced
u The redundancy framework monitors/replaces/reduces/increases the number
of instances to keep an environment running
u Provides a history view that helps isolate the errors and fix issues in the
application
u Generates custom reports at the OS, Application and Accounting levels
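The monitor-and-reconcile loop above can be reduced to one decision function; this is a sketch with illustrative names and instance IDs, not the actual framework:

```python
def reconcile(service: str, desired: int, healthy_instances: list) -> dict:
    """Compare CloudWatch's healthy count to the configuration store's
    desired count and decide what the redundancy framework should do."""
    delta = desired - len(healthy_instances)
    if delta > 0:
        return {"service": service, "action": "launch", "count": delta}
    if delta < 0:
        return {"service": service, "action": "terminate", "count": -delta}
    return {"service": service, "action": "none", "count": 0}

# A degraded web tier: the config store wants 3, CloudWatch sees 2 healthy.
decision = reconcile("web", 3, ["i-0aa", "i-0bb"])
```

In practice this decision would be taken inside a Lambda triggered by a CloudWatch event, and the "launch" branch would re-run the relevant subset of the CloudFormation template.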
Components of the Framework
u CloudFormation Template for a VPC with all object skeletons (ports, security groups, VLANs, ELBs etc.)
u Configuration Store: Database for saving variables:
u CIDR Blocks, Security Groups, Ports, ELBs, VLANs and their AZs
u Number of Servers, databases
u IP Addresses, HostNames
u AMI IDs
u Location of installers, scripts, EARs and other COTS products
u Server Types (m4, m5, t4 etc.)
u S3 bucket names
u SHA Keys
u Route53 entries
u CloudWatch Event Handlers, Queues where events will post data
u Lambda: Provides CloudFormation with all the required configuration parameters after reading
configuration from database and subsequently executing the lifecycle hooks or other initialization scripts
for each object
u CloudWatch: To monitor the health of Servers, traffic and overall VPC. Events will be monitored and
configured event handlers will be invoked
How does it help your Implementation?
u No pre-provisioned redundancy: The redundancy framework heals any deficiencies within the
environments by using the “Orchestration Framework” and stands up replacement servers in case of
failures
u Always latest hardware: The latest server instance types are just a variable in the “Orchestration
Framework”. Switch to the latest hardware by just changing variables!
u No Active DR site: The AWS regions have multiple Availability Zones (AZ), which in turn have
multiple data-centers. In AWS Architecture, your services should be spread across multiple AZs. If an
AZ goes down, “Redundancy Framework” heals the environment by instantly bringing up services in
other AZ.
u 50%-75% reduction in direct labor costs: Labor costs incurred in building once. Infrastructure labor
costs are limited to monitoring and patching golden images once the templates and framework are
functional
u Scale Up and Scale Out, automatically with flexibility: Changing the CPU, Memory and even the
type of servers (Memory Intensive vs Compute Intensive) for various use cases is just another
automation script that is executed from the Orchestration Framework
u Always running in DR mode: The data-centers run in constant Disaster Recovery mode with
failovers and replacements happening instantly
u Version Controlled Infrastructure: Running version controlled Infrastructure helps with Federal
compliance and yearly audits, and eventually with Governance and Compliance. One such
example is the Authority to Operate certification and re-certification process that needs to be
assessed during the project lifecycle of Federal Projects
u Federal Projects: FISMA Compliant ATO: ATO includes all environments for a project (Dev, Test, GAT,
PROD) as well as DR. Identical environments and policies lead to easier route to certifications
How can this be implemented and
Tested?
u AWS Automation Toolset:
u CloudFormation: A service that helps model and set up Amazon Web Services
resources in a template
u Lambda: Automation Service that will be tied with CloudFormation templates to
create/destroy DataCenter infrastructure and services
u DynamoDB or MySQL: Configuration Store
u AWS API SDK for Python or Java 1.8
u Programming Language: Python 3.x or Java 1.8
u Testing: Other than the existing tools for Applications
u Chaos Monkey/Gorilla: For Resilience, Redundancy and DR Testing of IT
Infrastructure
u StresStimulus/Stress-ng: Cloud/Linux stress testing tools
u The key is treating Infrastructure implementation as a development activity and
tracking it using the same Agile process (Sprints).
https://medium.com/netflix-techblog
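A Chaos Monkey-style resilience test can be as simple as selecting a random victim instance and then asserting that the Redundancy Framework replaces it; a minimal sketch with made-up instance IDs:

```python
import random

def pick_victim(instances: list, rng: random.Random) -> str:
    """Choose one running instance at random to kill, Chaos Monkey style."""
    return rng.choice(instances)

running = ["i-web01", "i-web02", "i-app01"]   # illustrative inventory
victim = pick_victim(running, random.Random(42))  # seeded for repeatable tests

# In a real test harness this would call
#   ec2.terminate_instances(InstanceIds=[victim])
# and then poll CloudWatch until a replacement appears within the recovery SLA.
```

Running this continuously against lower environments is what turns "always in DR mode" from a slogan into a tested property.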
How does this help your implementation
in 5 years
u Upgrade to the latest hardware available anytime without any downtime
u Hardware patches are orchestrated via the framework and AWS tools (Systems
Manager), which can be tracked through various environments (DEV > SYSTEST
> UAT > PROD)
u Infrastructure Management framework has built in controls required for
Operational Audits that can be available to Operations Team
u Always running in DR mode means continuous testing of failovers. No special
time-window for DR tests that take away important resources from regular
operations
u Federal Projects: ATO Re-Certification is required every 3 years. Maintaining
the baseline version of infrastructure, monitoring with automation ensures no
variables get introduced into the environments without validation in lower
environments
Stakeholders and their Roles
u Security Team:
u Review and audit reports from environments
u Review and approve AMI IDs to use
u Complete access to the environment to run and review Security and Compliance
assessments e.g. Nessus scans
u Project Team
u Provides the specifications for AWS Infrastructure
u Reviews and Approves AWS project templates for deployments
u Tests and accepts the Infrastructure framework and AWS Environments
u Maintains overall documentation as it pertains to ATO and System Security Plan
u SA/DBA/Networking Team
u Develops the framework and servers per specifications
u Creates new environments based on framework and project approved templates
u Maintains AWS environments and documentation trail of all changes performed to the
VPCs
u Administers servers
u Maintains IAM Policies and change logs for roles required by the Application to access various
AWS services

Migrate and Govern Applications on Cloud Infrastructure

  • 1.
    How to Migrateand Govern Applications on Cloud Infrastructure Object Oriented Infrastructure Approach Manuj Bawa www.linkedin.com/in/bmanuj | manujbawa@gmail.com
  • 2.
    Application Deployments inB.C. era (Before Cloud) – An Architect’s perspective u Hardware/Environment Planning sessions! u Design sessions with Development/Test and COTS teams to get Application requirements, disk usage, CPU requirements, communication requirements u Translate all the requirements for Infrastructure Teams who would then add firewalls, security and other standards that need to be implemented u Estimate Costs/prepare detailed plans/get approvals/purchase/track… u Track installation of software and finally the first environment is up u Repeat the exercise for every procurement if data-centers were different between Dev and Production u Software Installations were repeated every time. Manual configuration errors were high u Firewall ports were a pain! Had to be repeated every time with the same set of questions from Infrastructure Team. No Context! u Entire process was a broken integration with no efficiencies. u Large pool of diverse skilled resources maintained on the projects for maintenance and peace of mind
  • 3.
    Did this changein A.C. (After Cloud) era? u Somewhat… u Architects still have to conduct design sessions for requirements within the teams u Infrastructure Team still translates cloud to be just a data-center with someone managing rack-and-stack for them. u The re-usability provided by Cloud ecosystems has not been leveraged within Infrastructure u Deployment Designs have not matured u Design contributions are limited to large companies like Netflix, CapitalOne etc. u B.C. era Infrastructure Teams are still figuring out what skills are required and how to use the existing personnel skills u Large gap between the B.C. and A.C. thought process
  • 4.
    Production/ Monitoring Application Dev/Test Installations Procure and Deploy Design Reality Check.. UsingCloud Provided Services Reduced time to procure, No Rack and Stack Using Cloud Provided Services/COTS Design Cloud Agnostic Abstractions Design for Failure Design with Application Context in mind Repeatable and Reusable components Automation with CI/CD approach Repeatable Environments Utilize the ecosystem tools to maintain elasticity and redundancy Always run in Disaster Recovery mode! Maintain compliance with standards (FISMA/ISO etc.) Don’t procure and deploy for eternity Utilize Agile principles for deploy as you go Overarching Governance (Compliance/Security/Processes/Standards/Controlled Environments/Audit/Business Continuity)
  • 5.
    Before you startimplementing: u Understand Cloud u Not just hardware for rent, but an expansive eco-system of tools. u Compare these eco-systems to “Apple Store” where developers can utilize the tools to build out their apps u The line between System Administrators and Software Developers is diminished. You need both skills to effectively design and govern u Decide on a Cloud Provider u Evaluate Services, tools and costs u Evaluate your product/implementation path and if the Cloud Platform meets the requirements (OS/Firewalls/Security) u Compliance Requirements: Do you need to comply with any standards? FISMA/HIPAA/ISO? u Work towards a budget: What are you spending right now? Take into account your Disaster Recovery/Business Continuity facilities u Map Services that will be used, establish that you have skills available within the teams or hire, if required. Think of HW appliances or Software that can be replaced u Prototype away! Cannot stress enough the importance of this step. This should provide indicators on your performance requirements/COTS/your own product’s compatibility. u For the purposes of this presentation, I am going to pick Amazon Web Services. Its eco-system is far more mature than other providers and it provides a wide variety of services that can cover 90% of implementations
  • 6.
    Amazon Web Services– Thinking Cloud implementation.. u Organized into multiple Regions, further broken down into Availability Zones u Some Regions comply with standards you will need (FedRAMP/ISO) right down to the service level u Each Region provides Services. Not all Regions provide all services
  • 7.
    Map your Services.. uNetworking/Firewall: Route53, Routing Tables, AWS Shield, WAF, ELB = F5, Network ACLs u Storage: Global object storage (S3), NFS storage (Elastic File Store) = NAS/NFS, Disks u Managed Application/Container Service: Elastic Bean Stalk, Light Sail = None u Security: Identity Access Management, Key Management Services, Granular Policies = LDAP/Local Accounts u Monitoring: CloudWatch, CloudTrail, Load Balancer Logs = Network logs, System Logs, DB Logs u Application Integration: Lambda, Simple Notification Service, Simple Queue Services, Workflows = SMTP, JMS u CRM: Amazon Connect = Avaya, Cisco, Verint u Antivirus service = Sophos etc. = Traditional Data Center Equivalents
  • 8.
    Create a Catalogof Services you will use.. u EC2/EBS: Container Services for server instances u S3, EFS and Glacier: For Storage of files that includes File Sharing, File Archival. Also the store for Database archive logs u Management Tools: For designing and maintaining the Cloud Data Center u IAM, CloudTrail: For Security policies, PIV and SSO integration u Certificate Manager: For Data Encryption: Replacement for Oracle’s Encryption of data at rest module u Networking/Firewall: VPC, Route53, Amazon Shield, WAF, ELB u Integration Services: Lambda, SNS, SQS, Workflow u Monitoring: CloudWatch, LogRythm, CloudTrail, SNS u AWS Command Line Interface, CloudFormation: For automated management of servers u * Relational Database Service: Database for development and small environments where standing up a separate instance is not cost effective u *Amazon Connect/Lex and Kenesis: for IVR, speech recognition and call flow logs
  • 9.
    Map Team’s Skills -Red Markers require System/Network Admin Skills - Blue Dots require some SA, but mostly development skills Do I need to cross-train my System/Network administrators for development skills? What governance do I need to manage the deployment? Room for error because developers will access these services now
  • 10.
    Can I forkliftmy implementation the cloud? u Prohibitive Costs: Traditional Data Centers have pre-provisioned capacity for the contract duration. Forklift, and you will end up spending excess dollars u High Labor Costs per Environment: For example: Network Administrators looking for F5 software appliance in the cloud, just because there was a F5 hardware appliance in the private Data Center. The administration overhead will still be the same in the cloud. Ask: Can your requirements be met with an equivalent service? u DR/Redundancy: Not required at the same level as in Private Data Centers u Operations Monitoring and Fault Recovery: Recovery is manual and often delayed in Private Data Centers. Bringing in the same set of tools into Cloud does not inculcate automation/process improvements/efficiencies that come with using the cloud platform u The probability that your servers will fail is higher: AWS’ annual failure rate for disks is between 0.1% - 0.2% of hundreds of thousands of disks. You do the math. Is that number significant percentage of total disks in your Private Data Center? u Continuous monitoring and manual recoveries from fault failure in Operations phase will require manual tasks through Cloud Consoles
  • 11.
    AWS Recommended BestPractices u Design for Failure: AWS assures that the ”resources” for deployment will be available, but you should design your servers to tolerate and recover from faults. Because of a sheer volume of servers in a region, the probability of faults is extremely high if you compare numbers to a traditional data center. u IT Assets are just programmable resources: All servers/managed services are treated as disposable temporary resources that can be initiated in seconds, your approach to change and configuration management changes as now the networks, servers, storage, security rules are all programs that execute and replicate an entire data center within seconds. u Automation: Improving Systems’ stability and efficiency with Automation tools provided by AWS u Services not Servers: Architectures that do not leverage that breadth (e.g., if they use only Amazon EC2) might not be making the most of cloud computing and might be missing an opportunity to increase developer productivity and operational efficiency. Ref: https://d1.awsstatic.com/whitepapers/AWS_Cloud_Best_Practices.pdf
  • 12.
    Object Oriented InfrastructureDesign u Template of Objects: Create a template of all the components required including: Networks, Firewalls, Routing Tables, Security Groups, Servers, disks etc. Each environment is just another instance of the object u Build once, replicate using automation: Utilize AWS Management and Automation tools to create templates for Data Centers, Networking, Firewalls, disks and servers and then replicate for all environments u Utilize what you need based on CPU and Memory utilization in your current environment and prototypes. You could be using on 2% of the CPU, 95% of the times u Redundancy on-Demand: Build with minimum requirements with no redundancy. The Orchestration Framework ensures that replacement servers are available on-demand at the click of a button or a monitored event u Scale on Demand: add servers/volumes based on requirements and not pre- provision high capacity servers that go under utilized u Infrastructure as code: Create infrastructure as code that can be version controlled. Configuration files can be applied to all existing infrastructure code under a build process that is similar to Continuous Integration/Deployment practices
  • 13.
    An Object OrientedView u The base Red Hat Linux AMI is the super class we use u Base-Project-OS-AMI inherits from base RHEL AMI, and adds CIS standards, local accounts and other core infrastructure configurations required for the project. u The Web-Server/App-Server/Database or other COTS Servers inherit all basic attributes/hooks from Base Project AMI and add app specific configurations and become their own AMIs u The Orchestration Framework reads the database configuration for inventory and instantiates each AMI as configured u The core methods in base AMIs are scripts/lifecycle hooks configure in each EC2 Instance to change default configuration
Step 1 – Defining Classes
u Define each piece of infrastructure as a "Class". Define "is-a" and "has-a" relationships between classes. For example, a VPC has subnets, a subnet has an EC2 server, and an EC2 server can be an Oracle server or a web server.
u Define infrastructure attributes: Security Groups, NACLs, Internet gateways, route tables etc.
u Define Abstractions: Define the base AMIs that account only for the OS, security patches and the local user accounts required to run services. Any CAC/PIV requirements can be baked into these AMIs. The next abstraction would be the specific application servers.
u Define polymorphic behaviors for the various application AMIs that inherit from the OS-level AMIs. Example: both application and database servers inherit their OS from a base AMI with all infrastructure requirements met.
u Define the initializations and automation scripts required to instantiate the servers. Example: take a base JBoss machine image, then dynamically create profiles and configure other required libraries as part of a boot script in the EC2 lifecycle. Define initialization scripts that operate on a machine image and plug in all the parameters from a configuration store in a database.
u Extend Initializations: There will be scenarios where servers require initialization-time installation of COTS libraries, or IP addresses/hostnames that need to be updated. This can be done by tapping into EC2 lifecycle events, with S3 storing the libraries.
u Finalize Application Layer AMIs: Make sure these object definitions expose all the attributes that can be changed later during actual instantiation, e.g. IP addresses, hostnames, subnets, security groups.
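The "is-a" and "has-a" relationships above can be sketched as composition plus polymorphism. All names here are illustrative stand-ins: a VPC has subnets, a subnet has servers, and each server type overrides the shared boot behaviour.

```python
# Sketch: "has-a" composition (VPC -> Subnet -> Server) and
# polymorphic boot behaviour per server type. Illustrative names only.
class Server:
    def boot_script(self):
        return ["harden-os", "join-domain"]      # shared base initialisation

class OracleServer(Server):                      # an Oracle server is-a Server
    def boot_script(self):
        return super().boot_script() + ["install-oracle", "create-db-profile"]

class WebServer(Server):                         # a web server is-a Server
    def boot_script(self):
        return super().boot_script() + ["install-httpd"]

class Subnet:                                    # a subnet has servers
    def __init__(self, cidr, servers):
        self.cidr, self.servers = cidr, servers

class VPC:                                       # a VPC has subnets
    def __init__(self, cidr, subnets):
        self.cidr, self.subnets = cidr, subnets

vpc = VPC("10.0.0.0/16",
          [Subnet("10.0.1.0/24", [WebServer(), OracleServer()])])
scripts = [srv.boot_script() for sn in vpc.subnets for srv in sn.servers]
```

The orchestration layer only needs to walk the composition tree and call the same method on every server; polymorphism supplies the type-specific steps.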
Step 2 – Orchestration
u Create an inventory of all infrastructure components: AMI IDs, CIDRs, security groups, desired hostnames, internal domains, DNS entries and tags for each item. Define a durable store for this configuration, ideally a database. It will store every instance, its type and its other attributes in an environment.
u Define CloudFormation templates using the objects you defined.
u AMI IDs, CIDR blocks, number of subnets, security groups, Auto Scaling groups, Elastic Load Balancers and S3 buckets are all parameters that can be supplied dynamically to the CloudFormation template.
u Tap into the CloudFormation APIs with custom Python/Java programming to instantiate an environment by reading the configuration store.
u Utilize EC2 lifecycle hooks, custom events and CloudWatch events to trigger the automation scripts that initialize your application layer components.
u Tag your resources per the configuration store values.
u Register each resource with CloudWatch custom events to create an inventory of running services/instances in your environment. This dynamic list is handed over to the Redundancy Framework for monitoring and scalability.
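Feeding the configuration store into CloudFormation can be sketched as a small mapping step. The `config_store` dict below is a stand-in for the real database; the `ParameterKey`/`ParameterValue` shape is the one boto3's CloudFormation `create_stack` call expects, but no AWS call is made here.

```python
# Sketch: turn configuration-store rows into CloudFormation parameters.
# config_store is a fake stand-in for the real database; keys are
# illustrative.
config_store = {
    "AmiId": "ami-0abc1234",
    "VpcCidr": "10.0.0.0/16",
    "SubnetCount": "3",
    "Environment": "UAT",
}

def to_cfn_parameters(config):
    """Map config rows to CloudFormation ParameterKey/ParameterValue pairs."""
    return [{"ParameterKey": k, "ParameterValue": v}
            for k, v in sorted(config.items())]

params = to_cfn_parameters(config_store)

# With boto3, this list would be handed to CloudFormation roughly as:
#   boto3.client("cloudformation").create_stack(
#       StackName="uat-env", TemplateURL=..., Parameters=params)
```

Keeping the mapping in one pure function makes the orchestration step trivially testable without touching a live account.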
Infrastructure Objects (architecture diagram)
u BASE INFRASTRUCTURE LAYER – Infrastructure Objects: VPC; Networking (VPNs/Internet Gateways/Subnets/NACLs/SGs); Security (CIS standards); Linux Servers; Windows Servers; Database; Application Servers (Servlet Containers/J2EE); Web Servers; PIV/KMS; LDAP/DNS/NTP/SMTP; Users; Storage (S3/EFS); Core Services
u AMI LAYER – Golden Images (AMIs). Golden Template: Golden Images, DNS entries, Security Groups, inter-LAN communication ports, routing tables, Linux PIV integration, local service accounts, antivirus, swap disks, storage (S3/EFS), archival policies, Key Management Service
u CI/CD LAYER – Automation Scripts: CI/CD automation scripts, backup/archive, log automation, security certs/logical host scripts, DB config scripts, ESBs/Reporting etc.
u FRAMEWORKS – Orchestration and Configuration Framework; Continuous Monitoring and Redundancy Framework. Configuration and Monitoring: replicates environments based on the initial template; applies additional URL/load-balancing configurations; reacts to CloudWatch events and spins up replacement servers based on Golden Images and configuration scripts; all environment configurations are automatically managed
u ENVIRONMENTS – DEVELOPMENT, INTEGRATION TEST, UAT, PRODUCTION
Environment Instances – Deployment Flow (diagram)
u Orchestration and Configuration Framework (CloudFormation + Lambda + Python + AWS API SDK): reads the Golden Image IDs and the configuration; instantiates servers and services from the network template (database, application servers, web servers, ESB/reports); applies configuration, security and audits; sets up the database and applications; ties all new services to CloudWatch for monitoring and releases the environment.
u Environment Instance (the object instance): app/web servers, ESBs, reporting, database servers, Amazon S3, Amazon Connect, Amazon Glacier.
u Monitoring (CloudWatch + CloudFormation + Lambda + Python + AWS API SDK): continuously monitors the environment; detects faults based on CloudWatch events; reads the configuration; integrates with COTS such as Splunk to emit actionable events; performs failovers between servers; writes to SQS/SNS for custom events; streams logs to other services if required (Kinesis); notifies the Redundancy Framework.
Step 3 – Monitoring and Redundancy
u Continuously monitor CloudWatch events and check against the configuration store for the desired number of instances.
u Utilize Splunk or Kinesis to analyze application/COTS logs, which can detect degradation of a service.
u React to events by automatically terminating degraded instances or adding instances dynamically in response to a scheduled event or outage. This part of the framework reads the subset of the CloudFormation template that contains the service being replaced.
u The Redundancy Framework monitors, replaces, reduces or increases the number of instances to keep an environment running.
u Provides a history view that helps isolate errors and fix issues in the application.
u Generates custom reports at the OS, application and accounting levels.
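The core of the redundancy step is a reconciliation loop: compare desired counts from the configuration store with what monitoring reports, and emit launch/terminate actions. The function below is a minimal sketch with invented names; in the real framework the `running` counts would come from CloudWatch events.

```python
# Sketch of the redundancy check. desired comes from the configuration
# store, running from monitoring events; both are {service: count}.
# Names are illustrative.
def reconcile(desired, running):
    """Return a list of (action, service, delta) tuples that would
    bring the environment back to its desired state."""
    actions = []
    for service, want in desired.items():
        have = running.get(service, 0)
        if have < want:
            actions.append(("launch", service, want - have))    # heal/scale up
        elif have > want:
            actions.append(("terminate", service, have - want)) # scale down
    return actions

plan = reconcile({"web": 3, "db": 2}, {"web": 1, "db": 3})
```

Driving all healing and scaling from one declarative diff is what lets the same code handle an AZ outage, a degraded instance, or a deliberate scale-down.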
Components of the Framework
u CloudFormation template for a VPC with all object skeletons (ports, security groups, VLANs, ELBs etc.)
u Configuration Store: database for saving variables:
u CIDR blocks, security groups, ports, ELBs, VLANs and their AZs
u Number of servers, databases
u IP addresses, hostnames
u AMI IDs
u Location of installers, scripts, EARs and other COTS products
u Server types (m4, m5, t4 etc.)
u S3 bucket names
u SSH keys
u Route 53 entries
u CloudWatch event handlers, and the queues where events will post data
u Lambda: provides CloudFormation with all the required configuration parameters after reading the configuration from the database, and subsequently executes the lifecycle hooks or other initialization scripts for each object
u CloudWatch: monitors the health of servers, traffic and the overall VPC. Events are monitored and the configured event handlers are invoked.
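The Lambda component can be sketched as an ordinary handler that looks up the configuration store and returns the parameters CloudFormation needs. The `event`/`context` pair follows the real AWS Lambda handler signature; `FAKE_CONFIG_STORE` and its keys are illustrative stand-ins for the database.

```python
# Minimal sketch of the Lambda glue between the configuration store
# and CloudFormation. FAKE_CONFIG_STORE stands in for the database;
# all keys and values are illustrative.
FAKE_CONFIG_STORE = {
    "DEV":  {"AmiId": "ami-dev123",  "CidrBlock": "10.1.0.0/16"},
    "PROD": {"AmiId": "ami-prod456", "CidrBlock": "10.9.0.0/16"},
}

def handler(event, context=None):
    """Lambda entry point: read config for the requested environment
    and return a CloudFormation-ready parameter payload."""
    env = event["environment"]
    config = FAKE_CONFIG_STORE[env]
    return {
        "StackName": f"{env.lower()}-vpc",
        "Parameters": [{"ParameterKey": k, "ParameterValue": v}
                       for k, v in sorted(config.items())],
    }

result = handler({"environment": "DEV"})
```

Because the handler is pure lookup-and-transform, it can be unit tested locally with a fake event long before it is deployed behind a CloudWatch trigger.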
How does it help your Implementation?
u No pre-provisioned redundancy: The Redundancy Framework heals any deficiencies within the environments by using the Orchestration Framework and stands up replacement servers in case of failures.
u Always the latest hardware: The server instance type is just a variable in the Orchestration Framework. Switch to the latest hardware by changing a variable!
u No active DR site: AWS regions have multiple Availability Zones (AZs), which in turn have multiple data centers. In AWS architecture, your services should be spread across multiple AZs. If an AZ goes down, the Redundancy Framework heals the environment by instantly bringing up services in another AZ.
u 50%-75% reduction in direct labor costs: Labor costs are incurred in building once. Once the templates and framework are functional, infrastructure labor costs are limited to monitoring and patching golden images.
u Scale Up and Scale Out, automatically with flexibility: Changing the CPU, memory and even the type of server (memory-intensive vs compute-intensive) for various use cases is just another automation script executed from the Orchestration Framework.
u Always running in DR mode: The data centers run in constant Disaster Recovery mode, with failovers and replacements happening instantly.
u Version-Controlled Infrastructure: Running version-controlled infrastructure helps with Federal compliance and yearly audits, and eventually with governance. One such example is the Authority to Operate (ATO) certification and re-certification process that is assessed during the project lifecycle of Federal projects.
u Federal Projects – FISMA-compliant ATO: The ATO includes all environments for a project (Dev, Test, GAT, PROD) as well as DR. Identical environments and policies lead to an easier route to certification.
How can this be implemented and Tested?
u AWS Automation Toolset:
u CloudFormation: a service that helps model and set up AWS resources in a template
u Lambda: automation service tied to CloudFormation templates to create/destroy data-center infrastructure and services
u DynamoDB or MySQL: configuration store
u AWS API SDK for Python or Java 1.8
u Programming Language: Python 3.x or Java 1.8
u Testing (in addition to the existing tools for applications):
u Chaos Monkey/Gorilla: for resilience, redundancy and DR testing of IT infrastructure
u StresStimulus/stress-ng: cloud/Linux stress-testing tools
u The key is treating infrastructure implementation as a development activity and tracking it using the same Agile process (sprints).
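The Chaos Monkey idea can be exercised locally against the framework's own logic before any real instances are killed. The sketch below is illustrative only: it "terminates" a random instance from an in-memory inventory (seeded for determinism), which is exactly the input the reconciliation loop would then have to heal.

```python
# Sketch of a chaos-monkey style drill against a fake inventory.
# chaos_terminate is an invented name; running is {service: count}.
import random

def chaos_terminate(running, rng):
    """Pick one service at random and remove one of its instances.
    Returns (victim service, surviving counts)."""
    victim = rng.choice(sorted(running))          # deterministic with a seeded rng
    survivors = dict(running)
    survivors[victim] -= 1
    return victim, survivors

rng = random.Random(42)                           # seeded for repeatable drills
victim, after = chaos_terminate({"web": 2, "db": 2}, rng)
```

Running this repeatedly under the same seed gives a reproducible failure script, so a resilience test can assert that the Redundancy Framework restores the desired counts after every injected fault.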
How does this help your implementation in 5 years?
u Upgrade to the latest hardware available at any time without downtime.
u Hardware patches are orchestrated via the framework and AWS tools (Systems Manager), and can be tracked through the environments (DEV > SYSTEST > UAT > PROD).
u The infrastructure management framework has built-in controls required for operational audits, available to the Operations Team.
u Always running in DR mode means continuous testing of failovers. No special time window for DR tests that takes important resources away from regular operations.
u Federal Projects: ATO re-certification is required every 3 years. Maintaining a baseline version of the infrastructure and monitoring with automation ensures no variables get introduced into the environments without validation in lower environments.
Stakeholders and their Roles
u Security Team:
u Reviews and audits reports from the environments
u Reviews and approves the AMI IDs to use
u Has complete access to the environment to run and review security and compliance assessments, e.g. Nessus scans
u Project Team:
u Provides the specifications for the AWS infrastructure
u Reviews and approves AWS project templates for deployments
u Tests and accepts the infrastructure framework and AWS environments
u Maintains overall documentation as it pertains to the ATO and System Security Plan
u SA/DBA/Networking Team:
u Develops the framework and servers per specifications
u Creates new environments based on the framework and project-approved templates
u Maintains the AWS environments and a documentation trail of all changes performed to the VPCs
u Administers servers
u Maintains IAM policies and change logs for the roles the application needs to access various AWS services