Migrate and Govern Applications on Cloud Infrastructure

How to Migrate and
Govern Applications on
Cloud Infrastructure
Object Oriented Infrastructure Approach
Manuj Bawa
www.linkedin.com/in/bmanuj | manujbawa@gmail.com

Application Deployments in B.C. era (Before
Cloud) – An Architect’s perspective
u Hardware/Environment Planning sessions!
u Design sessions with Development/Test and COTS teams to get Application
requirements, disk usage, CPU requirements, communication requirements
u Translate all the requirements for Infrastructure Teams who would then add
firewalls, security and other standards that need to be implemented
u Estimate Costs/prepare detailed plans/get approvals/purchase/track…
u Track installation of software and finally the first environment is up
u Repeat the exercise for every procurement if data-centers were different
between Dev and Production
u Software Installations were repeated every time. Manual configuration errors
were high
u Firewall ports were a pain! Had to be repeated every time with the same set
of questions from Infrastructure Team. No Context!
u Entire process was a broken integration with no efficiencies.
u Large pool of diverse skilled resources maintained on the projects for
maintenance and peace of mind

Did this change in A.C. (After Cloud) era?
u Somewhat…
u Architects still have to conduct design sessions for requirements within the
teams
u Infrastructure Team still translates cloud to be just a data-center with
someone managing rack-and-stack for them.
u The re-usability provided by Cloud ecosystems has not been leveraged within
Infrastructure
u Deployment Designs have not matured
u Design contributions are limited to large companies like Netflix, CapitalOne
etc.
u B.C. era Infrastructure Teams are still figuring out what skills are required and
how to use the existing personnel skills
u Large gap between the B.C. and A.C. thought process

Production/
Monitoring
Application
Dev/Test
Installations
Procure and
Deploy
Design
Reality Check..
Using Cloud
Provided
Services
Reduced time to
procure, No
Rack and Stack
Using Cloud
Provided
Services/COTS
Design Cloud
Agnostic
Abstractions
Design for
Failure
Design with
Application
Context in mind
Repeatable and
Reusable
components
Automation with
CI/CD approach
Repeatable
Environments
Utilize the ecosystem
tools to maintain
elasticity and redundancy
Always run in Disaster
Recovery mode!
Maintain compliance with
standards (FISMA/ISO
etc.)
Don’t procure
and deploy for
eternity
Utilize Agile
principles for
deploy as you go
Overarching Governance (Compliance/Security/Processes/Standards/Controlled Environments/Audit/Business Continuity)

Before you start implementing:
u Understand Cloud
u Not just hardware for rent, but an expansive eco-system of tools.
u Compare these eco-systems to “Apple Store” where developers can utilize the tools to build
out their apps
u The line between System Administrators and Software Developers is diminished. You need both
skills to effectively design and govern
u Decide on a Cloud Provider
u Evaluate Services, tools and costs
u Evaluate your product/implementation path and if the Cloud Platform meets the requirements
(OS/Firewalls/Security)
u Compliance Requirements: Do you need to comply with any standards? FISMA/HIPAA/ISO?
u Work towards a budget: What are you spending right now? Take into account your Disaster
Recovery/Business Continuity facilities
u Map Services that will be used, establish that you have skills available within the teams or
hire, if required. Think of HW appliances or Software that can be replaced
u Prototype away! Cannot stress enough the importance of this step. This should provide
indicators on your performance requirements/COTS/your own product’s compatibility.
u For the purposes of this presentation, I am going to pick Amazon Web Services. Its eco-system
is far more mature than other providers and it provides a wide variety of services that can
cover 90% of implementations

Amazon Web Services – Thinking Cloud
implementation..
u Organized into multiple Regions, further broken down into Availability Zones
u Some Regions comply with standards you will need (FedRAMP/ISO) right down
to the service level
u Each Region provides Services. Not all Regions provide all services

Map your Services..
u Networking/Firewall: Route53, Routing Tables, AWS Shield, WAF, ELB = F5, Network
ACLs
u Storage: Global object storage (S3), NFS storage (Elastic File Store) = NAS/NFS, Disks
u Managed Application/Container Service: Elastic Bean Stalk, Light Sail = None
u Security: Identity Access Management, Key Management Services, Granular Policies =
LDAP/Local Accounts
u Monitoring: CloudWatch, CloudTrail, Load Balancer Logs = Network logs, System Logs,
DB Logs
u Application Integration: Lambda, Simple Notification Service, Simple Queue Services,
Workflows = SMTP, JMS
u CRM: Amazon Connect = Avaya, Cisco, Verint
u Antivirus service = Sophos etc.
= Traditional Data Center Equivalents

Create a Catalog of Services you will
use..
u EC2/EBS: Container Services for server instances
u S3, EFS and Glacier: For Storage of files that includes File Sharing, File
Archival. Also the store for Database archive logs
u Management Tools: For designing and maintaining the Cloud Data Center
u IAM, CloudTrail: For Security policies, PIV and SSO integration
u Certificate Manager: For Data Encryption: Replacement for Oracle’s
Encryption of data at rest module
u Networking/Firewall: VPC, Route53, Amazon Shield, WAF, ELB
u Integration Services: Lambda, SNS, SQS, Workflow
u Monitoring: CloudWatch, LogRythm, CloudTrail, SNS
u AWS Command Line Interface, CloudFormation: For automated management
of servers
u * Relational Database Service: Database for development and small
environments where standing up a separate instance is not cost effective
u *Amazon Connect/Lex and Kenesis: for IVR, speech recognition and call flow
logs

Map Team’s Skills
- Red Markers require
System/Network
Admin Skills
- Blue Dots require
some SA, but mostly
development skills
Do I need to cross-train my
System/Network administrators for
development skills?
What governance do I need to manage
the deployment?
Room for error because developers will
access these services now

Can I forklift my implementation the
cloud?
u Prohibitive Costs: Traditional Data Centers have pre-provisioned capacity for the
contract duration. Forklift, and you will end up spending excess dollars
u High Labor Costs per Environment: For example: Network Administrators looking
for F5 software appliance in the cloud, just because there was a F5 hardware
appliance in the private Data Center. The administration overhead will still be the
same in the cloud. Ask: Can your requirements be met with an equivalent service?
u DR/Redundancy: Not required at the same level as in Private Data Centers
u Operations Monitoring and Fault Recovery: Recovery is manual and often delayed
in Private Data Centers. Bringing in the same set of tools into Cloud does not
inculcate automation/process improvements/efficiencies that come with using the
cloud platform
u The probability that your servers will fail is higher: AWS’ annual failure rate for
disks is between 0.1% - 0.2% of hundreds of thousands of disks. You do the math. Is
that number significant percentage of total disks in your Private Data Center?
u Continuous monitoring and manual recoveries from fault failure in Operations
phase will require manual tasks through Cloud Consoles

AWS Recommended Best Practices
u Design for Failure: AWS assures that the ”resources” for deployment will
be available, but you should design your servers to tolerate and recover from
faults. Because of a sheer volume of servers in a region, the probability of faults
is extremely high if you compare numbers to a traditional data center.
u IT Assets are just programmable resources: All servers/managed services are
treated as disposable temporary resources that can be initiated in seconds, your
approach to change and configuration management changes as now the networks,
servers, storage, security rules are all programs that execute and replicate an
entire data center within seconds.
u Automation: Improving Systems’ stability and efficiency with Automation tools
provided by AWS
u Services not Servers: Architectures that do not leverage that breadth (e.g., if
they use only Amazon EC2) might not be making the most of cloud computing and
might be missing an opportunity to increase developer productivity and
operational efficiency.
Ref: https://d1.awsstatic.com/whitepapers/AWS_Cloud_Best_Practices.pdf

Object Oriented Infrastructure Design
u Template of Objects: Create a template of all the components required
including: Networks, Firewalls, Routing Tables, Security Groups, Servers, disks
etc. Each environment is just another instance of the object
u Build once, replicate using automation: Utilize AWS Management and
Automation tools to create templates for Data Centers, Networking,
Firewalls, disks and servers and then replicate for all environments
u Utilize what you need based on CPU and Memory utilization in your current
environment and prototypes. You could be using on 2% of the CPU, 95% of the
times
u Redundancy on-Demand: Build with minimum requirements with no
redundancy. The Orchestration Framework ensures that replacement servers
are available on-demand at the click of a button or a monitored event
u Scale on Demand: add servers/volumes based on requirements and not pre-
provision high capacity servers that go under utilized
u Infrastructure as code: Create infrastructure as code that can be version
controlled. Configuration files can be applied to all existing infrastructure
code under a build process that is similar to Continuous
Integration/Deployment practices

An Object Oriented View
u The base Red Hat Linux AMI is the
super class we use
u Base-Project-OS-AMI inherits from
base RHEL AMI, and adds CIS
standards, local accounts and other
core infrastructure configurations
required for the project.
u The Web-Server/App-Server/Database
or other COTS Servers inherit all basic
attributes/hooks from Base Project AMI
and add app specific configurations
and become their own AMIs
u The Orchestration Framework reads
the database configuration for
inventory and instantiates each AMI as
configured
u The core methods in base AMIs are
scripts/lifecycle hooks configure in
each EC2 Instance to change default
configuration

Step 1 – Defining Classes
u Define each Infrastructure as a “Class”. Define “is-a” and “has-a” relationships between
Classes. For example, a VPC has subnets, a subnet is has an EC2 Server, an EC2 Server can be
an Oracle Server or a Web-Server.
u Define Infrastructure attributes: Security Groups, NACLs, Internet gateways, route tables etc.
u Define Abstractions: Define the base AMIs that will only account for OS, Security patches,
local user accounts that will be required to run services. Any CaC/PIV requirements can be
baked into the AMIs. The next abstraction would be specific application servers
u Define polymorphous behaviors for various application AMIs that inherit their base from OS
level AMIs. Example: both application and database servers inherit their OS from a base AMI
with all infrastructure requirements met.
u Define initializations, automation scripts that are required to instantiate the servers.
Example: I can take a base JBOSS Machine Image, and dynamically create profiles and
configure other required libraries as part of boot script in EC2 lifecycle. Define your
initialization scripts which operate on a machine image and plug in all the parameters from a
configuration store in a database
u Extend Initializations: There will be scenarios where servers will require initialization-time
installations of COTS libraries or IP addresses/hostnames that need to be updated. This can
be done by tapping into EC2 Lifecycle events, with S3 storing the libraries
u Finalize Application Layer AMIs: Make sure that these object definitions have all the
attributes that can be changed later during actual instantiations e.g. IP
addresses/HostNames/Subnets/Security Groups

Step 2 – Orchestration
u Create an inventory of all infrastructure components, AMI IDs, CIDRs, Security
Groups, desired hostnames, internal domains, DNS entries, Tags for each
item. Define a durable store for this configuration, possibly a database. This
will store all instances, its type and other attributes in an environment
u Define Cloud Formation Templates using the objects you defined.
u AMI IDs, CIDR blocks, Number of Subnets, Security Groups, Auto-scaling
groups, Elastic Load Balancers, S3 buckets are all parameters that can be
dynamically provided to the Cloud Formation Template
u Tap into the Cloud Formation Template APIs and custom Python/Java
programming to instantiate an environment by reading the Configuration
Store
u Utilize EC2 lifecycle hooks, Custom Events, CloudWatch events to trigger
automation scripts that initialize your application layer components
u Tag your resources per the Configuration Store values
u Register each resource with CloudWatch’s custom events to create an
inventory of running services/instances in your environment. This dynamic list
will be handed over to the Redundancy Framework for monitoring and
scalability

Infrastructure
Objects
VPC
Networking (VPNs/
Internet Gateways/
Subnets/NACLs/SGs)
Security (CIS standards)
Linux ServersWindows Servers
Database
Application Servers
(Servlet Containers/J2EE)
Web-Servers
PIV/KMS
LDAP/DNS/NTP/SMTP Users
Golden Template:
- Golden Images
- DNS Entries
- Security Groups
- Inter LAN Communication Ports
- Routing Tables
- Linux PIV Integration
- Local service accounts
- Antivirus
- Swap Disks
- Storage: S3/EFS
- Archival Policies
-Key Management Service
Storage (S3/EFS)
CI/CD automation
scripts
Backup/Archive Log
Automation
Security Certs/logical
Host scripts
DB Config Scripts
ESBs/Reporting etc.
Golden Images (AMIs)
Automation Scripts
Base Infrastructure Template
Orchestration and Configuration Framework
DEVELOPMENT INTEGRATION TEST UAT PRODUCTION
Continuous Monitoring and Redundancy Framework
Configuration and Monitoring:
- Replicates environments based
on initial template
- Applies additional URL/Load
Balancing Configurations
- Reacts to CloudWatch events
and spins up replacement
servers based on Golden Images
and Configuration Scripts
- All environment configurations
automatically managed
Core Services
BASE
INFRASTRUCTURE
LAYER
AMI LAYER
CI/CD Layer
FRAMEWORKS
ENVIRONMENTS

Environment
Instances
Deployment
Flow
Orchestration and Configuration Framework
Database
Application Servers
Web-Servers
ESB/Reports
Network Template
Read the Golden
Image IDs and
the Configuration
Instantiate
Servers and
Services
Apply
Configuration,
Security, Audits
Tie all new services to
Cloudwatch for
Monitoring and
release the
environment
Setup
Database,
Applications
Environment Instance
App/Web Servers
ESBs
ReportingWeb-Servers Amazon S3
Amazon Connect
Amazon Glacier
OBJECT INSTANCE
Continues Monitoring the environment
Detects faults based on CloudWatch Events
Reads configuration
Integrates with other COTS suck as Splunk to emit
actionable events
Performs failovers between servers
Writes to SQS/SNS for custom events
Streams logs to other services if required (Kinesis)
Notifies Redundancy Framework
CloudFormation + Lambda + Python + AWS API SDK
CloudWatch+ CloudFormation + Lambda + Python +
AWS API SDK
AMI LAYER
Database Servers

Step 3 – Monitoring and Redundancy
u Continuously monitor the CloudWatch events and check with the
configuration store for desired number of instances
u Utilize Splunk or Kenisis to analyze Application/COTS logs which can detect
degradation of a service
u React events by automatically terminating the degraded instances or adding
instances dynamically in response to a scheduled event or outage. This part of
the framework reads a subset of the CloudFormation template that contains
the service being replaced
u The redundancy framework monitors/replaces/reduces/increases the number
of instances to keep an environment running
u Provides a history view that helps isolate the errors and fix issues in the
application
u Generates custom reports at OS, Application and Accounting levels to

Components of the Framework
u CloudFormation Template for a VPC with all object skeletons (ports, security groups, VLANS, ELBs etc.)
u Configuration Store: Database for saving variables:
u CIDR Blocks, Security Groups, Ports, ELB, VLANs and their AZs, ELBs
u Number of Servers, databases
u IP Addresses, HostNames
u AMI IDs
u Location of installers, scripts, EARs and other COTS products
u Server Types (m4, m5, t4 etc.)
u S3 bucket names
u SHA Keys
u Route53 entries
u CloudWatch Event Handlers, Queues where events will post data
u Lambda: Provides CloudFormation with all the required configuration parameters after reading
configuration from database and subsequently executing the lifecycle hooks or other initialization scripts
for each object
u CloudWatch: To monitor the health of Servers, traffic and overall VPC. Events will be monitored and
configured event handlers will be invoked

How does it help your Implementation?
u No pre-provisioned redundancy: The redundancy framework heals any deficiencies within the
environments by using the “Orchestration Framework” and stands up replacement servers in case of
failures
u Always latest hardware: The latest server instance types are just a variable in Orchestration
Framework”. Switch to latest hardware by just changing variables!
u No Active DR site: The AWS regions have multiple Availability Zones (AZ), which in turn have
multiple data-centers. In AWS Architecture, your services should be spread across multiple AZs. If an
AZ goes down, “Redundancy Framework” heals the environment by instantly bringing up services in
other AZ.
u 50%-75% reduction in direct labor costs: Labor costs incurred in building once. Infrastructure labor
costs are limited to monitoring and patching golden images once the templates and framework are
functional
u Scale Up and Scale Out, automatically with flexibility: Changing the CPU, Memory and even the
type of servers (Memory Intensive vs Compute Intensive) for various use cases is just another
automation script that is executed from the Orchestration Framework
u Always running in DR mode: The data-centers run in constant Disaster Recovery mode with
failovers and replacements happening instantly
u Version Controlled Infrastructure: Running version controlled Infrastructure helps with Federal
compliance and yearly audits. Eventually helps with Governance and Compliance. Once such
example is the Authority to Operate Certification and re-certification process that needs to be
assessed during the project lifecycle of Federal Projects
u Federal Projects: FISMA Compliant ATO: ATO includes all environments for a project (Dev, Test, GAT,
PROD) as well as DR. Identical environments and policies lead to easier route to certifications

How can this be implemented and
Tested?
u AWS Automation Toolset:
u CloudFormation: A service that helps model and set up Amazon Web Services
resources in a template
u Lambda: Automation Service that will be tied with CloudFormation templates to
create/destroy DataCenter infrastructure and services
u DynamoDB or mySQL: Configuration Store
u AWS API SDK for Python or Java 1.8
u Programming Language: Python 3.x or Java 1.8
u Testing: Other than the existing tools for Applications
u Chaos Monkey/Gorilla: For Resilience, Redundancy and DR Testing of IT
Infrastructure
u StresStimulus/Stress-ng: Cloud/Linux Stress Testing tool
u Key is treating Infrastructure Implementation as development activity and
track the implementation using the same Agile process (Sprints).

https://medium.com/netflix-techblog

How does this help your implementation
in 5 years
u Upgrade to the latest hardware available anytime without any downtime
u Hardware patches are orchestrated via the framework and AWS tools (Systems
Manager), which can be tracked through various environments (DEV > SYSTEST
> UAT > PROD)
u Infrastructure Management framework has built in controls required for
Operational Audits that can be available to Operations Team
u Always running in DR mode means continuous testing of failovers. No special
time-window for DR tests that take away important resources from regular
operations
u Federal Projects: ATO Re-Certification is required every 3 years. Maintaining
the baseline version of infrastructure, monitoring with automation ensures no
variables get introduced into the environments without validation in lower
environments

Stakeholders and their Roles
u Security Team:
u Review and audit reports from environments
u Review and approve AMI IDs to use
u Complete access to the environment to run and review Security and Compliance
assessments e.g. Nessus scans
u Project Team
u Provides the specifications for AWS Infrastructure
u Reviews and Approves AWS project templates for deployments
u Tests and accepts the Infrastructure framework and AWS Environments
u Maintains overall documentation as it pertains to ATO and System Security Plan
u SA/DBA/Networking Team
u Develops the framework and servers per specifications
u Creates new environments based on framework and project approved templates
u Maintains AWS environments and documentation trail of all changes performed to the
VPCs
u Administers servers
u IAM Policies and change logs for roles required by Application to access various AWS
services

Migrate and Govern Applications on Cloud Infrastructure

More Related Content

What's hot

Similar to Migrate and Govern Applications on Cloud Infrastructure

Recently uploaded

Migrate and Govern Applications on Cloud Infrastructure