Damien Gallagher Dissertation

Architecting Cloud Applications for High Availability
Damien Gallagher
M.Sc. in Computing in DevOps
2022

Computing Department, Atlantic Technological University, Port Road, Letterkenny, Co. Donegal,
Ireland.
Architecting Cloud Applications for High Availability
Author: Damien Gallagher
Supervised by: Ruth G. Lennon
A thesis submitted in partial fulfilment of the requirements for the
Master of Science in Computing in DevOps
Submitted to Quality and Qualifications Ireland (QQI)
Dearbhú Cáilíochta agus Cáilíochtaí Éireann January 2022

Declaration
I hereby certify that the material, which I now submit for assessment on the programmes of
study leading to the award of Master of Science in Computing in DevOps, is entirely my own
work and has not been taken from the work of others except to the extent that such work has
been cited and acknowledged within the text of my own work. No portion of the work
contained in this thesis has been submitted in support of an application for another degree
or qualification to this or any other institution. I understand that it is my responsibility to
ensure that I have adhered to ATU’s rules and regulations.
I hereby certify that the material on which I have relied for the purpose of my assessment
is not deemed as personal data under the GDPR Regulations. Personal data is any data from
living people that can be identified. Any personal data used for the purpose of my assessment
has been pseudonymised and the data set and identifiers are not held by ATU. Alternatively,
personal data has been anonymised in line with the Data Protection Commissioner’s
Guidelines on Anonymisation.
I consent that my work will be held for the purposes of education assistance to future students
and will be shared on the ATU Computing website (www.lyitcomputing.com) and Research
THEA website (https://research.thea.ie/). I understand that documents once uploaded onto
the website can be viewed throughout the world and not just in Ireland. Consent can be
withdrawn for the publishing of material online by emailing Thomas Dowling, Head of
Department, at Thomas.Dowling@atu.ie to remove items from the ATU Computing website
and by emailing Denise McCaul, Systems Librarian, at Denise.McCaul@atu.ie to remove items
from the Research THEA website. Material will continue to appear in printed formats once
published and, as websites are a public medium, ATU cannot guarantee that the material has
not been saved or downloaded.
Signature of Candidate Date

Acknowledgements
I would like to thank my supervisor, Ruth Lennon, for all the guidance and support she has
provided throughout this process. I would also like to mention the colleagues with whom I
worked throughout the master’s program. We did not know each other prior to the program but
we have become close over the past few years. Finally, I would like to dedicate this
dissertation to my wonderful wife Tracy as well as my amazing sons, Jayden and Logan. They
gave me the time I needed to complete this program and listened to my thoughts about
whichever topic I was exploring at the time. I look forward to watching them grow and pursue
their own careers in the future.

D. Gallagher
Abstract
Outages at a cloud provider can make applications unavailable, costing organizations lost
earnings and damaging their reputation. Avoidable application outages can also occur due to
misaligned DevOps processes.
This dissertation aims to prove that consistent DevOps processes, together with region-based
application monitoring, can help organizations maintain highly available applications.
Cloud providers aim to guarantee a certain level of uptime for their services, but they too
encounter unforeseen outages.
This dissertation discusses the various tools and techniques that can be employed to help
maintain the high availability of an application. An accompanying research artifact
demonstrates the techniques that are discussed and outlines the various tests that
were executed against the developed solution.
The practical element was successful in that outages due to inconsistent DevOps
processes and outages at a regional level were mitigated. It was identified that some cloud
services were not suitable for a certain subset of applications that require high availability
and that, overall, the costs can increase exponentially.
This research has proven that a highly available application architecture is possible when
deployed on a cloud platform. This research focuses on one small area of the overall problem
domain, but it has the potential to form a basis for experimenting with highly available
applications in other problem domains, as well as for managing highly available applications
deployed on other cloud providers.

Table of Contents
Declaration
Acknowledgements
Abstract
Table of Contents
Table of Figures
Table of Tables
Table of Code Listings
Nomenclature
1. Introduction
1.1. Purpose
1.2. Background
1.3. Problem Statement
1.4. Research Question
1.5. Scope and Limitations
1.6. Methodological Approach
1.7. Report Outline
2. Literature Survey
2.1. Overview of Cloud Computing
2.1.1. Why deploy code to the Cloud vs On Premises
2.1.2. Challenges faced when deploying to the Cloud
2.1.3. Types of Outages
2.2. Infrastructure
2.2.1. Virtual Machines
2.2.2. Cloud Service Offerings
2.2.3. Cloud Service Decision Matrix
2.3. High Availability and Disaster Recovery
2.3.1. Disaster Recovery Metrics
2.3.2. Backup and Restore
2.3.3. Pilot Light
2.3.4. Warm Standby
2.3.5. Multi-site Active / Active
2.3.6. Comparison of Disaster Recovery Options
2.4. Multi Region Architecture Considerations
2.4.1. Application
2.4.2. Database
2.5. Region Failover Considerations
2.6. Code Deployment
2.6.1. Manual
2.6.2. Automated
2.6.3. Hybrid
2.7. Code Management
2.7.1. Single Developer Projects
2.7.2. Source Control
2.7.2.1. Branching Model: GitFlow
2.7.2.2. Branching Model: Trunk
2.7.3. DevOps Code Pipelines
2.8. Cloud Application Architecture
2.8.1. Frontend Application
2.8.1.1. Monolithic Frontend
2.8.1.2. Micro-frontend
2.8.1.3. Single Page Application (SPA)
2.8.2. Backend Application
2.8.2.1. Monolith
2.8.2.2. Microservice
2.8.2.3. Serverless
2.8.3. Database
2.8.3.1. Relational Database
2.8.3.2. NoSQL
2.8.3.3. File / Object Storage
2.9. Networking
2.9.1. Load Balancing
2.9.2. Content Delivery Network (CDN)
2.10. Monitoring
2.10.1. Active Monitoring
2.10.2. Passive Monitoring
2.11. Observability
2.12. Security and Regulatory Compliance
2.12.1. Security
2.12.2. Compliance
2.13. Chapter Conclusions
3. Design
3.1. System Context Diagram
3.2. Cloud Provider
3.2.1. Amazon Web Services (AWS)
3.2.2. Azure
3.2.3. Google Cloud Platform (GCP)
3.2.4. Selection Criteria
3.2.5. Final Selection Justification
3.3. Cloud Infrastructure
3.3.1. Virtual Machines (VM)
3.3.2. Containers
3.3.3. Serverless
3.4. DevOps Tooling
3.4.1. Source Control
3.4.1.1. GitHub
3.4.1.2. AWS CodeCommit
3.4.1.3. Gitea
3.4.1.4. Selection Criteria
3.4.1.5. Final Selection Justification
3.4.2. Pipelines
3.4.2.1. GitHub Actions
3.4.2.2. AWS CodePipeline
3.4.2.3. Jenkins
3.4.3. Infrastructure As Code (IAC)
3.4.3.1. AWS CloudFormation
3.4.3.2. Cloud Development Kit (CDK)
3.4.3.3. Terraform
3.5. Application Architecture
3.5.1.1. HTML and JavaScript
3.5.1.2. React
3.5.1.3. Angular
3.5.2.1. Java
3.5.2.2. NodeJS
3.5.2.3. Python
3.5.3. Database
3.5.3.1. Relational Database
3.5.3.2. NoSQL
3.5.3.3. File / Object Storage
3.6. Application High Availability
3.6.1. Static IP Address
3.6.2. Load Balancer
3.6.3. AWS Global Accelerator
3.6.4. Amazon API Gateway
3.6.5. Amazon Route53
3.6.6. Amazon CloudFront
3.7. Application Monitoring and Observability
3.7.3. Observability
3.8. Pre-Implementation Details
3.9. Chapter Conclusions
4. Implementation
4.1. System Context Diagram
4.1.1. Single Region Deployment
4.1.2. Multi Region Deployment
4.2. Cloud Provider
4.3. Cloud Infrastructure
4.4. DevOps Tooling
4.4.1. Source Control
4.4.2. Pipelines
4.4.3. Infrastructure As Code (IAC)
4.5. Application Architecture
4.5.3. Database
4.6. Application High Availability
4.7. Application Monitoring and Observability
4.8. Chapter Conclusion
5. Results
5.1. Test Strategy
5.2. Application Unit Testing
5.3. Using Integration Tests to Assist with High Availability
5.4. Using Functional Tests to Validate a DevOps Pipeline
5.5. Performance Testing of Cloud Applications
5.5.1. Performance Test Results
5.5.2. Performance Test Observations
5.6. Load Testing Highly Available Cloud Applications
5.6.1. Load Test Results
5.6.2. Load Test Observations
5.7. Performing Security Testing on the Cloud
5.8. UI Testing of Multi Region Cloud Applications
5.9. API Testing of Multi Region Cloud Applications
5.10. Chaos Testing of Multi Region Cloud Applications
5.11. Cost Analysis of Highly Available Cloud Applications
5.12. Chapter Conclusion
6. Conclusions
6.1. Conclusions on the State of the Art
6.2. Conclusions on Practical Element
6.2.1. Technologies to Enable High Availability
6.2.2. Tools and Techniques for Maintaining High Availability
6.2.2. Effects of Developing for High Availability
6.3. Limitations Discussion
6.4. Further Work
Appendices
Appendix A: References
Appendix B: Code Listing
Appendix C: Test Project Locally
Appendix D: Configure AWS Real User Monitoring

Table of Figures
FIGURE 1.1. SOFTWARE DELIVERY SCHEDULE
FIGURE 2.1. CLOUD DISASTER RECOVERY OPTIONS
FIGURE 2.2. GITHUB ACTIONS - MANUAL APPROVAL
FIGURE 2.3. GITFLOW BRANCHING STRATEGY.
FIGURE 2.4. TRUNK BASED BRANCHING MODEL
FIGURE 2.5. MONOLITHIC FRONTEND
FIGURE 2.6. MICRO FRONTEND ARCHITECTURE
FIGURE 2.7. SYNTHETIC MONITORING
FIGURE 3.1. SYSTEM CONTEXT DIAGRAM – HIGH LEVEL
FIGURE 3.2. CLOUD PROVIDERS MARKET SHARE (‘INFOGRAPHIC: CLOUD MARKET SHARE’ N.D.)
FIGURE 3.3. GITHUB ACTIONS - SAMPLE WORKFLOW
FIGURE 4.1. SYSTEM CONTEXT DIAGRAM - SINGLE REGION DEPLOYMENT
FIGURE 4.2. SYSTEM CONTEXT DIAGRAM - MULTI REGION DEPLOYMENT
FIGURE 4.3. FRONTEND APPLICATION PIPELINE.
FIGURE 4.4. GITHUB ACTIONS BACKEND PIPELINE - DEPLOY APPLICATION STEPS
FIGURE 4.5. TERRAFORM CLOUD CONSOLE.
FIGURE 4.6. FRONTEND APPLICATION
FIGURE 4.7. DYNAMODB GLOBAL TABLES.
FIGURE 4.8. FRONTEND GLOBAL ACCELERATOR ENDPOINT GROUPS
FIGURE 4.9. CLOUDFRONT DISTRIBUTIONS FOR FRONTEND APPLICATION.
FIGURE 4.10. BACKEND APPLICATION LOAD BALANCER - US-EAST-1 REGION
FIGURE 4.11. FRONTEND APPLICATION CANARY TEST RESULTS – NO ISSUES.
FIGURE 4.12. FRONTEND APPLICATION CANARY TEST RESULTS – ISSUE IN A REGION
FIGURE 4.13. FRONTEND APPLICATION CANARY TEST RESULTS – ISSUE RESOLVED
FIGURE 4.14. REAL USER MONITORING REPORT - US-EAST-1.
FIGURE 5.1. PIPELINE FAILED - FRONTEND TEST COVERAGE THRESHOLD NOT SATISFIED
FIGURE 5.2. PIPELINE SUCCESSFUL - FRONTEND TEST COVERAGE THRESHOLD SATISFIED
FIGURE 5.3. PIPELINE FAILED - BACKEND TEST COVERAGE THRESHOLD NOT SATISFIED
FIGURE 5.4. PIPELINE SUCCESSFUL - BACKEND TEST COVERAGE THRESHOLD SATISFIED
FIGURE 5.5. INTEGRATION TESTS - SYSTEM RUNNING NORMALLY
FIGURE 5.6. INTEGRATION TESTS – FRONTEND APPLICATION NOT FUNCTIONING CORRECTLY
FIGURE 5.7. FUNCTIONAL TEST FAILED AFTER APPLICATION DEPLOYMENT
FIGURE 5.8. PERFORMANCE TEST RESULTS - PERCENTILE RESPONSE TIMES.
FIGURE 5.9. PERFORMANCE TEST RESULTS - HIGH LEVEL STATISTICS.
FIGURE 5.10. LOAD TEST RESULTS - HIGH LEVEL STATISTICS
FIGURE 5.11. LOAD TEST RESULTS - PERCENTILE RESPONSE TIMES
FIGURE 5.12. CURRENT SECURITY ISSUES REPORTED BY SNYK.
FIGURE 5.13. ISSUE INFORMATION PROVIDED BY SNYK.
FIGURE 5.14. SELENIUM TEST FAILED.
FIGURE 5.15. SELENIUM TEST PASSED.
FIGURE 5.16. POSTMAN USER INTERFACE
FIGURE 5.17. POSTMAN REPORTS FAILED TESTS IN PIPELINE.
FIGURE 5.18. POSTMAN REPORTS ALL TESTS SUCCESSFUL IN PIPELINE

D. Gallagher
10
Table of Tables
TABLE 1. CLOUD SERVICE DECISION MATRIX..............................................................................................................11
TABLE 2. AWS CONTAINER OFFERINGS....................................................................................................................47
TABLE 3. AWS NOSQL OFFERINGS.........................................................................................................................63
TABLE 4. AWS LOAD BALANCER OPTIONS. ...............................................................................................................66
TABLE 5. AWS OBSERVABILITY SERVICES ..................................................................................................................70
TABLE 6. FRONTEND APPLICATION COST ESTIMATION. ...............................................................................................106
TABLE 7. BACKEND APPLICATION COST ESTIMATION..................................................................................................106
TABLE 8. COMBINED APPLICATION COST ESTIMATION – SINGLE REGION.........................................................................107
TABLE 9. COMBINED APPLICATION COST ESTIMATION – MULTIPLE REGIONS. ..................................................................107
TABLE 10. CODE REPOSITORIES. ...............................................................................................................................6
Table of Code Listings
CODE LISTING 1. PSEUDO CODE FOR CODE DEPLOYMENTS............................................................................................80
CODE LISTING 2. SNIPPET OF BACKEND LAMBDA CODE.................................................................................................83
CODE LISTING 3. TEST FRONTEND CODE LOCALLY..........................................................................................................7
CODE LISTING 4. INSTALL RUM JAVASCRIPT LIBRARY.....................................................................................................8
CODE LISTING 5. RUM FRONTEND CODE SNIPPET. .......................................................................................................8

Nomenclature
Acronym | Definition | Page
AWS | Amazon Web Services | 2
Azure | Microsoft Azure | 2
AWS Region | An AWS region is a physical location where AWS clusters its data centres. Regions are dispersed around the world. | 4
AWS Availability Zone | A logical grouping of data centres in an AWS Region. | 4
React | React is a JavaScript library that is used for creating rich graphical user interfaces. | 5
DynamoDB | DynamoDB is a NoSQL database provided by AWS that is both fast and flexible. | 5
Container | A container is a lightweight, standalone piece of software for packaging code along with the required dependencies. Containers enable an application to run quickly and consistently when moved from one computing environment to another. | 6
Serverless | Serverless is a development model for the cloud that allows developers to create, build, run and support applications without having to manage servers in the backend. | 6
Active Directory (AD) | Active Directory contains detailed information about user objects on a network and makes it straightforward for administrators and users to access this information. | 8
RTO | Recovery Time Objective (RTO) is the time within which a process must be restored to normal production service after a disaster, to mitigate any adverse consequences associated with the disaster. | 12
RPO | Recovery Point Objective (RPO) is the tolerable quantity of data which could be lost or must be reinstated after application downtime. | 12
DR | Disaster recovery (DR) is the ability of an organization to respond to and recover from a disaster event which could impact upon the normal running of the business's processes. The goal of DR is to resume normal operations with an organization's IT infrastructure as quickly as possible after a disaster occurs. | 13
DNS | Domain Name System (DNS) is how IP addresses associated with online services are mapped to domain names, which are more human readable. DNS translates the domain name to IP addresses so the required content can be loaded in a browser. | 15
MVP | A minimum viable product (MVP) is a variant of a product that contains a subset of features to be evaluated by early customers to obtain feedback for future product development. | 21
API | API is the abbreviation for Application Programming Interface. It is a collection of protocols and definitions for the development and integration of software applications. | 24
YAML | YAML originally stood for ‘Yet Another Markup Language’ and now stands for ‘YAML Ain’t Markup Language’. It is a data-serialization language that is human-readable and extensively used in data transmission as well as storage applications. YAML is also frequently used in application configuration files. | 51
JSON | JavaScript Object Notation (JSON) is based on the object syntax of JavaScript. It is a text-based format for encoding structured data. | 54
PWA | PWA stands for Progressive Web Application. PWAs are applications developed with manifests, service workers and other web-platform features to give end users an experience similar to that of native applications. | 58
OSI Model | The Open Systems Interconnection model or OSI model serves as a common communication standard that can be used for communication between different computer systems. | 66
RESTful API | Also known as a REST API. It is an application programming interface that satisfies the constraints of the REST architectural style and allows for communication between various systems. | 67
WebSocket API | The WebSocket API is a technology that facilitates an open 2-way interactive communication session between the browser of a user and a backend server. Messages are sent to a backend server and the browser can receive event-driven responses without the need to poll the backend server for a reply. | 67
CRUD | The term CRUD refers to applications that perform simple create, read, update and delete operations on a database table. | 81
UI | The UI or User Interface is the entry point for human-computer communication and interaction on a device. This can include keyboards, a mouse, display screens and the appearance of a desktop. | 101
GUI | A GUI or Graphical User Interface is a screen through which a user interacts with electronic devices such as smartphones or computers using menus, icons and other graphics or visual indicators. | 101

1. Introduction
Recent improvements in software development tooling have allowed small companies as well
as large organizations to deliver software faster than ever before. The path from defining the
initial requirements, through development of the feature, to shipping the feature can take
anything from hours to months. Some companies have advanced DevOps processes with
thorough, reliable automated test suites which allow features to be
delivered without human interaction (‘Continuous Integration, Delivery, and Deployment’
n.d.:21). Other companies have no pipelines or automated tests, with much of the testing
completed manually. This becomes an issue for regression testing, as defects can quite often
be introduced and go unnoticed when most of the testing is done manually. Figure 1.1 shows
an image that is commonplace in presentation decks for large organizations, depicting the
expected delivery date of a feature. Often, tasks such as pipeline development or code quality
checks are neglected to hit pre-defined deadlines. This may result in the code artifact getting
shipped on schedule, but over time, when code changes need to be introduced, the tasks that
were skipped to meet the original schedule come back to haunt the development teams.
Figure 1.1. Software Delivery Schedule

A goal that organizations have in common is that their software solutions must be highly
available to their end users. If a system is not available when a customer wants to use it, they
may never return to use that system again.
Reasons a system may not be available include:
• A code defect may have been introduced
• A database may become unresponsive
• A server the code is deployed to is experiencing issues.
These issues occur when the code is already deployed; issues could also arise during the
deployment of new code or features. Any of these issues will have a negative impact on
the reputation of that organization. For systems in the social media space, this may not be
such a significant problem, but reputation is extremely important when working in sectors
such as financial services.
A review of cloud providers such as Azure (‘Cloud Computing Services |
Microsoft Azure’ n.d.) and AWS (‘Cloud Services - Amazon Web Services (AWS)’ n.d.) shows that
services are available which limit the impact of servers becoming unavailable. Both providers offer a
significant range of services which are being expanded on constantly. AWS and Azure both
have Well Architected frameworks (‘AWS Well-Architected Framework’ n.d.; ‘Microsoft Azure
Well-Architected Framework’ n.d.) which can be used by clients as a template to improve
the quality of their workloads. The aim for systems is to ensure they are highly available. It is
also not uncommon for issues to exist with the services provided by cloud providers. This
dissertation introduces various approaches and patterns that can be followed to ensure high
availability for many aspects of a cloud application.
1.1. Purpose
High availability is vitally important for applications, regardless of the size of the customer
base. Systems such as social media or banking platforms are developed to be used by
customers on the customers’ schedule. As discussed in the introduction, there are reasons
why a system may not be available. This research presents an approach for architecting
applications along with the associated DevOps processes to minimize any disruptions to a
system. Following typical architectures will provide a good level of availability but, as seen
with recent AWS outages (Moss n.d.), it is possible for an issue with the cloud provider
to bring down an application. This dissertation will explore approaches that can be followed
with pipelines to ensure a code deployment will not affect an application. It will also examine a
fault-tolerant architecture for a 3-tier application, including how issues may be identified in an
automated fashion before being reported by the end users.
1.2. Background
With the ever-increasing need for software features to be delivered quickly and to perform in a
highly available fashion, it is imperative that solutions are set up in a way that facilitates this. This
starts with the DevOps process around how the code is stored and the pipelines that are used to
package, test, quality check and deploy the code. Within the chosen cloud provider,
how that code is deployed and served to end customers is also a key
consideration. It is straightforward to deploy code onto a single server and expose an
endpoint for customers. However, if the traffic to that service increases, or there is a
disruption with the cloud provider, that application could easily become unavailable.
Techniques such as application auto scaling can be set up to avoid this situation, but there are
other factors that need to be managed to avoid a complete outage.
End users may become frustrated when an application they want to use is not available.
Ideally, system issues should be resolved before impacting upon users. Techniques such as
synthetic monitoring (‘Using synthetic monitoring - Amazon CloudWatch’ n.d.) provide a
mechanism to identify issues in a scheduled fashion. Systems such as static websites are often
neglected when it comes to automated monitoring, whereas backend applications tend to
have monitoring set up. As with automated testing, the more automated monitoring that is in
place, the more beneficial it is for an application. Automated testing increases the likelihood
that errors or outages will have been identified automatically before users try to access the
system.
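To make the idea of a scheduled synthetic check concrete, the following is a minimal Python sketch. It is not the monitoring implemented later in this dissertation. It assumes that a check counts as healthy when the endpoint returns a 2xx status within a latency budget; the names ProbeResult, http_status and probe, the URL and the 2000 ms budget are all illustrative assumptions. A scheduler such as a cron job or a managed canary service would run the check at a fixed interval.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    url: str
    healthy: bool
    status: Optional[int]  # None when no HTTP response was received
    latency_ms: float


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses still carry a status code.
        return err.code


def probe(url: str,
          fetch: Callable[[str], int] = http_status,
          max_latency_ms: float = 2000.0) -> ProbeResult:
    """Run one synthetic check (hypothetical health rule: 2xx within budget)."""
    start = time.perf_counter()
    try:
        status: Optional[int] = fetch(url)
    except OSError:
        # DNS failures, refused connections and timeouts are OSError subclasses.
        status = None
    latency_ms = (time.perf_counter() - start) * 1000.0
    healthy = (status is not None
               and 200 <= status < 300
               and latency_ms <= max_latency_ms)
    return ProbeResult(url, healthy, status, latency_ms)
```

The fetch parameter is injectable so that the check can be exercised without network access. For example, probe("https://example.invalid/health", fetch=lambda url: 503).healthy evaluates to False.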
1.3. Problem Statement
With the vast adoption of cloud services, organizations are deploying their code with a trusted
cloud provider. This brings its own benefits, such as not having to manage servers, reduced
costs and the ability to spin up systems quickly without having to purchase servers.
However, as was seen with the AWS issues on the 6th December 2021 (Moss n.d.) and on the
22nd December 2021 (Moss n.d.), more needs to be done to ensure applications are highly
available. Both outages in the us-east-1 region caused major disruptions for hundreds of
reputable brands around the world. An AWS region is a physical area where AWS data centres
are clustered; regions are located throughout the world (‘Global Infrastructure Regions & AZs’ n.d.).
The problem statement is:
Recent outages in the AWS us-east-1 region lasting 5 and 8 hours
respectively have impacted on applications such as Slack, IMDb and
McDonald’s. Architecting cloud applications to be highly available will limit
the impact of regional outages in the future.
As the outages showed, reliance on the us-east-1 region is huge. It is the default
region when a new AWS account is created. AWS has the concept of availability zones. An
availability zone is an isolated location within an AWS region (‘Global Infrastructure Regions
& AZs’ n.d.). An application can be deployed in numerous availability zones within a specific
AWS region to help with high availability, but this will not guarantee high availability. If a
natural disaster such as an earthquake or tsunami occurred close to the AWS data centres for a
region, there is every possibility that every availability zone in that region could become
compromised. It is important to remember that achieving high availability can be costly.
Implementing high availability must align with the main objectives that a business is aiming
to achieve (Sarkar and Shah 2018). This dissertation will explore what it takes to deploy
applications across multiple cloud regions.
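The benefit of redundant deployments can be illustrated with a short calculation. The sketch below uses the standard formula for redundant copies and assumes that the copies fail independently of each other. A regional disaster breaks exactly this assumption for availability zones in the same region, which is the motivation for multiple regions. The function names and the 0.99 availability figure are illustrative assumptions and are not published AWS figures.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant deployments, assuming independent failures."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The system is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime in hours over a 365-day year."""
    return (1.0 - availability) * 365 * 24
```

Under these assumptions, moving from one deployment at 0.99 availability to two independent deployments raises the combined availability to approximately 0.9999, which reduces expected downtime from roughly 87.6 hours per year to under one hour. Correlated failures would make the real improvement smaller.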
This will raise questions about how data is stored in databases across multiple
regions and how traffic is managed between these regions. How code is deployed to these regions
in a safe, efficient manner will also be explored.
This dissertation will also look at how a region can be taken out of service if issues are
occurring in that region or scheduled maintenance is occurring. Overall, the goal is to ensure
that if a user wants to interact with the system, it is available no matter what event is
occurring across the network.
1.4. Research Question
Ensuring that an application is highly available and accounts for most outage scenarios, including
those with a cloud provider, is not a straightforward task. Costs may well increase by
architecting an application to account for every outage scenario, but it may be a small price
to pay to maintain a solid reputation.

The research question considered in this research is:
Can DevOps play a part in architecting an application to withstand various outages
which may occur in a cloud environment?
To answer this research question, 3 aims were identified. These are:
1. Identify key characteristics which indicate high availability of an application with
reference to how it is different in a cloud-based environment.
2. Identify key processes that can be applied to enable high availability of applications
deployed in the cloud.
3. Design and implement a solution to deploy a highly available application that will
function in the event of sample test outages.
1.5. Scope and Limitations
GitHub Actions will be used to create the necessary pipelines. This will integrate with the
source code located in GitHub to provide a seamless integration between the pipelines and
the code.
AWS will be used as the cloud provider to deploy a 3-tier application in multiple geographically
dispersed regions. Azure and others provide similar functionality but, due to limited scope, this
dissertation will be restricted to AWS.
AWS tooling such as CloudWatch will be used to monitor the application whilst also ensuring
any outages are identified before consumers identify the issues. In the proposed solution, if
an issue is encountered, scripts will be executed to bring an AWS region out of the pool which
accepts traffic. Research will be done on open-source tooling that can be used to assist with
monitoring.
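The decision to bring a region out of the traffic pool can be sketched as a small pure function. This is an illustration only and not the scripts used in the proposed solution. It assumes a hypothetical rule that a region is removed after a number of consecutive failed health checks, and it fails open when every region looks unhealthy. The names RegionHealth and regions_in_service, the threshold of 3 and the region eu-west-1 are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegionHealth:
    region: str
    consecutive_failures: int  # failed health checks in a row (hypothetical metric)


def regions_in_service(checks: List[RegionHealth],
                       failure_threshold: int = 3) -> List[str]:
    """Return the regions that should keep receiving traffic."""
    healthy = [c.region for c in checks
               if c.consecutive_failures < failure_threshold]
    # Fail open: if every region looks unhealthy, keep all of them in the pool
    # rather than routing traffic nowhere (an assumed design choice).
    return healthy if healthy else [c.region for c in checks]
```

With checks of [RegionHealth("us-east-1", 4), RegionHealth("eu-west-1", 0)], the function returns ["eu-west-1"], so us-east-1 would be taken out of the pool until its checks recover.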
1.6. Methodological Approach
A sample 3-tier application will be created which will have a React frontend and a Python backend
talking to a DynamoDB database. This application will be stored in GitHub, with separate
repositories for the frontend and backend. A series of GitHub Actions will be created for the
various tiers which will be used for deploying the application onto AWS. Any cloud
infrastructure created will use an Infrastructure as Code tool such as Terraform or AWS CDK
to ensure the process is repeatable.

For testing of outages or errors, the AWS CloudWatch toolset will be used, which will also be
set up with the chosen Infrastructure as Code tool. The final delivered artifact will be a
reference implementation which can be used for any programming language to architect
highly available applications.
1.7. Report Outline
A review of current literature related to the subject matter was conducted and is discussed in
chapter 2, covering cloud computing, microservices infrastructure, containers, serverless as
well as DevOps. The design and implementation of a highly available application
is covered in chapters 3 and 4, answering the research question posed in chapter 1. Chapter
6 concludes the dissertation by discussing the architected solution for highly
available applications and suggesting further work and experiments which could occur in the
future.

2. Literature Survey
The research in this chapter will initially focus on cloud computing and how companies have
evolved from on-premises deployments to using the cloud. There is a focus on highlighting the
challenges that exist when deploying to the cloud before focusing on the various cloud service
offerings. Disaster recovery is discussed in terms of how it relates to cloud environments,
along with the various considerations for multi-region
deployments and region failover scenarios. The research then looks at the DevOps
process around code management and the pipelines involved, before discussing the makeup of a
cloud application which can prove out the research question. This dissertation will have a
focus on best practices and techniques for architecting applications that are highly available
when deployed to a cloud environment.
2.1. Overview of Cloud Computing
2.1.1. Why deploy code to the Cloud vs On Premises
Whether to deploy an application to the cloud or to on-premises
servers is a decision that needs to be evaluated on a per organization basis. In the past, larger
organizations would have had the resources to spin up their own datacentres to deploy their
applications. Applications as well as databases for the organization would live in that one
datacentre, and there might also be a disaster recovery site. However, those servers
had to be accounted for, and they came with costs associated with licensing, power and
rack space, as well as costs associated with the upkeep of the building the servers were located
in. The time taken to provision a new server in a datacentre ranged from weeks
to months, which meant that substantial planning had to be done in advance of an
application being deployed to production.
When deploying applications on the cloud, a different mindset applies. The per-server costs
for power and rack space are no longer present when deploying to the cloud. The upfront
cost of purchasing a server does not have to be paid, but cloud providers do allow you to
purchase credits on a server to keep the overall costs down. In relation to cloud, organizations
have the option to adopt a pay as you go model.

“Using cloud infrastructures and platforms is convenient because services
on demand offers high flexibility and pay as you go pricing offers low costs.”
(Toivonen 2013, p.17)
Unlike with on-premises servers, there is no lengthy wait to provision a server on the
cloud, which leads to shorter times to turn around new applications or proofs of concept. With
the evolution of new cloud services, it is possible to design and build applications in a fashion
which is not possible in a traditional datacentre.
2.1.2. Challenges faced when deploying to the Cloud
Deploying to the cloud versus deploying to on-premises servers does bring its own challenges.
For security conscious organizations, access to source code is
restricted. Deployment of code to any cloud provider needs to be performed in a safe fashion to
ensure that proprietary code does not make its way into the public domain. When working
on premises, access to servers is controlled with systems such as Active Directory.
Different authentication mechanisms need to be set up to ensure appropriate access is
provided to the correct individuals when accessing cloud resources (Garrison et al. 2012).
Services on the cloud bring about their own costs with regard to data transfer, storage and
backups that are not an issue on-premises. It is important when deploying to the cloud that
consideration is given to setting up automated backups as well as testing strategies for
performing rollbacks if required.
With on-premises servers, the organization oversees the software installed. With cloud
services, it is the cloud provider who oversees the core software installed on platforms.
Organizations need to be vigilant when it comes to monitoring the cloud provider roadmaps
to ensure they are complying with the supported versions of software they are using.
Finally, a common challenge with the vast array of service offerings in the cloud is ensuring that
the correct service is used to satisfy the requirements of an application. It is straightforward
to write a hello world application that uses a particular service but supporting that service in
a production environment can lead to its own array of challenges. The ability to manage peak
loads, disaster recovery situations or being able to manage scheduled downtime during
patching of certain services are among items that need to be considered.

2.1.3. Types of Outages
There are multiple distinct types of outages that can occur on any cloud provider. From an
application perspective, if deployed code is not thoroughly tested, a rogue deployment could
make an application unresponsive. The development team is responsible for
ensuring they follow a rigorous DevOps process to mitigate against untested code getting
deployed. An outage could also occur on an application during a code deployment. The outage
may not last long, but for end users, this is still an unexpected outage that will affect the
brand’s reputation. This research will examine how outages can be avoided as part of the
DevOps process as well as when an application is running in a production setting.
From a cloud provider perspective, there are occasions where a region may become unstable
or unresponsive (Moss n.d.). Incidents such as data centre power loss, natural disasters such
as earthquakes or tornados could bring an entire region offline. Scheduled maintenance of a
managed cloud provider offering could make the service unusable. When deploying to the
cloud, it is important to consider that all services can have outages. It is imperative to have
this mindset to make an application as resistant to outages as possible. In an article related
to cloud storage mechanisms, the authors evaluate various outages, where they arrived at
the following conclusion.
“The reason after the post-investigation of most of these outages revealed
that the main root cause was the expected and predicted failures, while
others happened due to the failure of correct components in the recovery
process.” (Tahir et al. 2020)
In this statement, the words ‘expected and predicted failures’ stand out to this author. Teams
need to ensure that expected and predicted failure situations are accounted for before trying
to manage the unexpected failures that could potentially occur. This research will examine
the techniques that can be used for a 3-tier application to ensure there are no application
outages.
The survey of research into the area of cloud computing indicates the need for research into
the role cloud computing can play in achieving high availability.

2.2. Infrastructure
2.2.1. Virtual Machines
There are a multitude of different options for deploying applications on the cloud, with one of
those options being virtual machines. This closely resembles deploying an application on an
on-premises server. The main difference with virtual machines on the cloud is that they are
meant to be ephemeral in nature, meaning they should not have an exceedingly long lifespan.
As mentioned in a previous section, it is the cloud provider who oversees the underlying
operating system on the virtual machines. There are no guarantees that a virtual machine
which is launched on a given day will be compliant the following day. To get around this, it is
imperative that the process for deploying applications to these virtual machines is automated
with a rigorous CI/CD process.
With virtual machines, there is a vast array of different instance types to choose from.
Each instance type has its own characteristics: some instance types are compute
optimised, whilst others are storage optimised. Newer versions of these instance types can
offer increased performance or greater cost savings. It is important to periodically benchmark
the instance types (Akioka and Muraoka 2010) that are in use against the newer offerings from
a cloud provider to validate that the applications are running as efficiently as possible.
2.2.2. Cloud Service Offerings
Every cloud provider offers a vast array of services that can be used. These can range from
virtual machines that were discussed in the previous section, to containers (Merkel 2015), to
serverless (Eismann et al. 2021) offerings. Again, choosing the correct service is a choice which
is made based on the organization’s needs and the application that is being deployed.
Containers are useful for applications that maintain state or need to be long running in nature.
A container is a way of packaging code in a format that can run on any operating system in a
uniform way. The packaged form of a container is called an image. That image contains the
necessary steps required to install the software. When deploying containers, they can be
deployed in a fashion that is always on or deployed in a serverless fashion using newer
offerings as provided by the larger cloud providers.
On the other side, serverless is a solid option for short lived REST calls, standard CRUD
operations and for deploying individual microservices. Serverless is an excellent option when
you want the ability to scale from little or no traffic to huge spikes in traffic. The main selling
point with serverless is that you pay only for the resources that you use. This makes it an attractive
option for cost conscious organizations.
2.2.3. Cloud Service Decision Matrix
The following table displays a decision matrix that can be used to determine the correct cloud
solution to be adopted, based on the author’s experiences. This table evaluates the current
state of an application and the deployment choices that can be made with or without code
changes. The correct service should be chosen based on the characteristics of the deployed
application.
Table 1. Cloud Service Decision Matrix.
Development Team Choice | Use Virtual Machines | Use Containers | Use Serverless
Are you deploying an existing application that requires no code changes? | Yes | No | No
Is there a Docker image for the application or are there immediate plans to create a Docker image? | Yes | Yes | Yes
Has the development / DevOps team the required bandwidth to learn a new cloud provider service such as serverless? | No | Yes | Yes
Is there potential for the existing application to grow into multiple smaller applications in the future? | No | Yes | Yes
Is keeping costs low with guaranteed high availability a requirement for this application? | No | Yes | Yes
Is there sufficient time available for the development team to create a correct solution for perceived future growth of the application? | No | Yes | Yes
Does the application contain tasks that may be long running (over 15 minutes)? | Yes | Yes | No
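One way to apply Table 1 is to encode it as data and count, for each option, how many of a team's answers match the corresponding cell. The sketch below is an illustration under that assumed reading of the matrix, in which each cell is the answer under which the option fits. The question keys, OPTIONS, MATRIX and score_options are hypothetical names, and the equal weighting of questions is an assumption rather than part of the table.

```python
from typing import Dict

OPTIONS = ("Virtual Machines", "Containers", "Serverless")

# Each row copies Table 1: the answer (True = Yes) under which each option fits,
# in the order given by OPTIONS.
MATRIX = {
    "existing_app_no_code_changes": (True, False, False),
    "docker_image_available_or_planned": (True, True, True),
    "bandwidth_to_learn_new_service": (False, True, True),
    "may_grow_into_smaller_apps": (False, True, True),
    "low_cost_with_high_availability": (False, True, True),
    "time_to_design_for_future_growth": (False, True, True),
    "long_running_tasks_over_15_min": (True, True, False),
}


def score_options(answers: Dict[str, bool]) -> Dict[str, int]:
    """Count, per option, the questions where the team's answer matches Table 1."""
    scores = {option: 0 for option in OPTIONS}
    for question, answer in answers.items():
        for option, fits_when in zip(OPTIONS, MATRIX[question]):
            if answer == fits_when:
                scores[option] += 1
    return scores
```

A team that answers Yes to every question except the first and the last would score 1 for Virtual Machines, 6 for Containers and 7 for Serverless. The scores are a prompt for discussion, not a substitute for judgement about the application.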
The choice of infrastructure will determine which DevOps techniques to use. The DevOps
process for deploying applications to a virtual machine differs from the process for deploying to a
container in the cloud. To focus the research, it is important to choose an overall
infrastructure that requires processes to maintain high availability.
2.3. High Availability and Disaster Recovery
With regard to deploying applications in an organization’s data centre, managing high
availability and disaster recovery may involve switching traffic from one data centre to another
or periodically switching the active data centre to be the previously idle data centre. In these
situations, the organization is in control of the infrastructure and operations that occur within
the datacentre.
In terms of handling disaster recovery in the cloud, there are more considerations that need
to be made. The cloud provider oversees the infrastructure and how the disaster recovery
situations are managed. It is the responsibility of the organization to design their applications
in such a way that they can manage disaster recovery situations and hence be highly available.
Disaster scenarios can occur due to human error within the datacentres, natural disasters
such as earthquakes or tornados destroying a datacentre or even just loss of power to a
datacentre within a specific region.
The following sections discuss disaster recovery terms as well as various disaster recovery
options in the cloud, with a focus on how each approach impacts on high availability and cost.
2.3.1. Disaster Recovery Metrics
In terms of disaster recovery metrics, there are 2 metrics which can be used to measure an
application. Recovery Time Objective or RTO is a term used to define the time it takes to
restore an organization’s process to the agreed upon service levels after a disruption or
disaster (Hamadah and Aqel 2019:1). For example, if a disaster were to occur at 1PM
and the RTO is 8 hours, the disaster recovery process should recover the organization’s service
to the previously accepted service level by 9PM.
Recovery Point Objective or RPO is the tolerable quantity of data loss for a system measured
in units of time (Mendonça et al. 2019:2). For example, were a disaster situation to occur at
2PM and the agreed RPO is 1 hour. The system needs to be capable of recovering the entire
dataset that was in the system before 1PM. In this situation, the data loss will be for 1 hour –
1PM to 2PM.
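The two worked examples above can be checked with simple date arithmetic. The following is a minimal sketch only; the function names and the calendar date are illustrative assumptions and do not come from the cited definitions.

```python
from datetime import datetime, timedelta
from typing import Tuple


def recovery_deadline(disaster_at: datetime, rto: timedelta) -> datetime:
    """Latest time by which service must be restored to satisfy the RTO."""
    return disaster_at + rto


def data_loss_window(disaster_at: datetime, rpo: timedelta) -> Tuple[datetime, datetime]:
    """Interval of data that may be lost while still satisfying the RPO."""
    return disaster_at - rpo, disaster_at


# RTO example: disaster at 1PM with an RTO of 8 hours (the date is arbitrary).
print(recovery_deadline(datetime(2022, 1, 10, 13, 0), timedelta(hours=8)))

# RPO example: disaster at 2PM with an RPO of 1 hour.
print(data_loss_window(datetime(2022, 1, 10, 14, 0), timedelta(hours=1)))
```

The first call yields 9PM on the same day and the second yields the interval from 1PM to 2PM, matching the examples in the text.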
2.3.2. Backup and Restore
Backup and Restore is a technique where backups of data are stored in a region and, if a
disaster were to occur in that region, the data is exported to a separate region (Robinson et
al. 2014:9). In addition to exporting the data, the configuration, infrastructure and
application code must be redeployed in the new target region. As a consequence of this process,
the RTO and RPO would be high. There is a potential for data loss which may not be acceptable
in domains such as financial services. This approach would not guarantee high availability as
the time to copy the data to the new region as well as provision the required infrastructure
would lead to application downtime. On the other hand, this approach would be cost effective as
the applications are only deployed in 1 region at a time. A backup and restore
strategy may be suitable for charity organizations, static websites or problem domains
where data loss and downtime are acceptable.
2.3.3. Pilot Light
Pilot Light is a technique where data is replicated from 1 region to another region where a
core minimal version of an organization’s workload infrastructure is in operation (Trovato et
al. 2019:5). Processes to replicate databases or file storage are always turned on. Application
servers are pre-installed with application code and configuration, and left in a turned off state
unless testing is taking place or there is a disaster recovery situation. Unlike Backup and
Restore, the core infrastructure is always ready to be turned on. The RTO and RPO for Pilot
Light are lower than for Backup and Restore, but there is still the possibility of data loss when
switching regions as well as the application being unavailable. This approach will cost more
than Backup and Restore, but it allows organizations to recover business critical applications
in a more timely fashion. Pilot Light may be suitable for organizations that have a small set of
critical applications, with other applications being deemed non mission critical. The critical
applications will be in a ready to launch state in a separate region at any stage.
2.3.4. Warm Standby
Warm Standby is an approach where a fully functional, scaled down copy of an organization’s
production environment is available in a separate region (Robinson et al. 2014:14). This
approach extends the Pilot Light concept and decreases the time it takes to recover from a
disaster situation as the workload is always running in another region. The environment in
the DR region can be scaled up when required so that it can manage the expected traffic
volumes. This approach will be more expensive than Pilot Light as the infrastructure in the DR
region is always running. It does however offer the benefit of a decreased RTO and RPO.
Warm Standby may be suitable for organizations that have business critical applications and
require high availability.
2.3.5. Multi-site Active / Active
Multi-Site Active/Active is an approach where an organization runs its
workload in multiple regions at the same time in an active/active or a hot standby active/
passive strategy (Robinson et al. 2014:16). The active/active strategy is used to serve traffic
from every region to which the application has been deployed. The hot standby strategy
is used when serving traffic from a single region only, with the other regions used in a disaster
recovery situation. This approach is the most complex and expensive, but it is the only
approach which will guarantee high availability if following active/active. In the hot standby
approach, there is the possibility that users may not be able to access an application whilst
the hot standby version of the application becomes the primary version of the application.
This approach is the preferred option for organizations that require high availability and
cannot tolerate any level of downtime for their applications. It also offers the benefit of being
able to route customers to a specific cloud region based on their geographic location.
2.3.6. Comparison of Disaster Recovery Options
The diagram in Figure 2.1 displays a comparison of the various disaster recovery options
with specific emphasis on the impact in relation to RTO and RPO.
Figure 2.1. Cloud Disaster Recovery Options
Understanding the various disaster recovery techniques is key when measuring
the high availability characteristics of an application. For mission critical applications, a
multi-site active/active approach may be needed. For low-traffic applications that are not
mission critical, the backup and restore technique may suffice. Understanding the criticality
of an application helps to decide on the required high availability characteristics. The research
in this dissertation will outline the processes that can be developed to achieve high
availability.
2.4. Multi Region Architecture Considerations
2.4.1. Application
When developing applications to be deployed in a multi-region configuration, some
considerations need to be made. The application needs to be developed in such a way that it
can be deployed to separate regions, including new regions, with no code changes. Any items
that require changes should belong to configuration that is specific to a region. There is a
set of guidelines called
the Twelve Factor Application (Wurster et al. 2017:4) which should be followed for every
cloud application. These guidelines become more important as organizations discover the
need to deploy an application across multiple cloud regions.
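One way to keep region-specific items out of the code, in the spirit of the Twelve Factor guidelines, is to read them from environment variables at start-up. The sketch below is illustrative only; the variable names (APP_REGION, APP_DATABASE_HOST, APP_QUEUE_URL) and the example hostnames are assumptions and are not taken from this research.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RegionConfig:
    region: str
    database_host: str
    queue_url: str


def load_region_config(env: Mapping[str, str] = os.environ) -> RegionConfig:
    """Build the region-specific settings from the environment.

    A missing variable raises KeyError, so a misconfigured deployment
    fails at start-up rather than at the first request.
    """
    return RegionConfig(
        region=env["APP_REGION"],
        database_host=env["APP_DATABASE_HOST"],
        queue_url=env["APP_QUEUE_URL"],
    )


# The same code serves two regions; only the supplied configuration differs.
primary = load_region_config({
    "APP_REGION": "region-a",
    "APP_DATABASE_HOST": "db.example.internal",
    "APP_QUEUE_URL": "https://queue.region-a.example.internal",
})
secondary = load_region_config({
    "APP_REGION": "region-b",
    "APP_DATABASE_HOST": "db.example.internal",
    "APP_QUEUE_URL": "https://queue.region-b.example.internal",
})
print(primary.region, secondary.region)
```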
Another consideration is how the application connects to services like databases
or message queues. In a scenario where these services are also deployed in a multi-region
fashion, organizations need to guarantee that if those services fail over to another region, the
application can manage this situation. Using techniques such as top-level DNS entries can help
to ensure that applications are not concerned with what region a database or service is
deployed in. The responsibility for managing situations where a database or other service fails
over rests with the application developer. They must ensure the application can manage this
situation gracefully and continue responding successfully to user requests.
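A common way for an application to ride out a service failover is to retry the failed operation with an increasing delay, giving the region-neutral DNS entry time to point at the new region. The following is a minimal sketch under that assumption; the function name, the default attempt count and the delays are hypothetical choices and not values recommended by this research.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")
```

The sleep function is passed as a parameter so that the behaviour can be exercised in tests without real delays.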
2.4.2. Database
When deploying databases, it is important to think of how the database will behave in the
event of a DR situation. For high availability, it is important the database service is deployed
in a multi-region fashion. Deciding which region contains the primary database is important
so that every database replica can stay synchronized with the primary copy. The speed of
replication of data between regions is important to guarantee data consistency. As discussed
earlier, how applications connect to the database needs to be considered. In the world of
microservices with smaller services getting deployed, managing the number of connections
to the database is important to ensure the database does not get overloaded. It is vital to set
the maximum allowed connections on the database to an acceptable level, then work with
the application teams to guarantee this will satisfy the projected connection request
demands.
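The connection budget described above can be estimated with simple arithmetic: in the worst case every instance of every service fills its connection pool. The sketch below assumes each service instance holds a fixed-size pool; the service names and all figures are hypothetical.

```python
from typing import Dict


def required_connections(instances: Dict[str, int], pool_size: Dict[str, int]) -> int:
    """Worst-case connection count: every instance fills its pool."""
    return sum(count * pool_size[service] for service, count in instances.items())


def fits_within_limit(required: int, max_connections: int, reserved: int = 0) -> bool:
    """Check the demand against the database limit, keeping some
    connections reserved (for example for administration)."""
    return required <= max_connections - reserved


# Hypothetical figures for two microservices.
instances = {"orders": 4, "catalogue": 2}
pool_size = {"orders": 10, "catalogue": 5}
needed = required_connections(instances, pool_size)
print(needed, fits_within_limit(needed, max_connections=100, reserved=10))
```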
Application teams should consider deploying read replica versions of databases across regions
to serve read only requests. This will free up the main database for write requests by taking
away the load which would have been generated by read requests.
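The read and write split can be sketched as a small router that sends read only requests to the replicas in turn and everything else to the primary. This is an illustrative sketch only; the class name and the endpoint names are assumptions, and a real deployment would more likely rely on the database driver or a proxy for this.

```python
import itertools
from typing import Sequence


class EndpointRouter:
    """Send read-only work to replicas (round robin) and writes to the primary."""

    def __init__(self, primary: str, read_replicas: Sequence[str] = ()):
        self.primary = primary
        # cycle() repeats the replicas forever; None means "no replicas".
        self._replicas = itertools.cycle(read_replicas) if read_replicas else None

    def endpoint_for(self, read_only: bool) -> str:
        if read_only and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = EndpointRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.endpoint_for(read_only=True))
print(router.endpoint_for(read_only=False))
```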
Finally, it is important that a database backup strategy is in place so that, if an issue arises
in any region, the database can be restored to a known good state.
It is important to consider the multi-region architecture decisions for each layer of an
application. What will work for a frontend application will not necessarily work for a database.
This section has been included to highlight that this research will look at high availability
across every tier of an application.
2.5. Region Failover Considerations
A region failover needs to be planned and evaluated before
the event happens unexpectedly in production. By testing out the process, minor issues
such as missing credentials or invalid paths for application source code can be found and
rectified. When failing over an application between regions, topics such as the database
connection string and the top-level DNS entry to use need to be seriously considered by
application teams.
It is important that a region failover can occur as efficiently as possible in a production-like
environment. Any delay in performing a failover can have an adverse impact on the overall
availability of an application, which in turn may lead to disgruntled customers. In the ideal
scenario, a region failover should be transparent to the end customer and every step should
occur in an automated fashion. DR strategies such as Warm Standby and Multi-site
Active/Active can help to make the process of a region failover smoother.
When architecting applications, it is imperative to choose cloud service offerings that will
work for the application whilst also supporting multi-region capabilities. Choosing the
cloud services wisely can simplify the regional failover process.
For UI based applications and REST based services, high availability when it comes to regional
failover can be obtained using load balancers. In a paper which discusses high availability in
the cloud, the authors discuss using the Hadoop software library for managing high
availability.
“Rather than rely on hardware to deliver high availability, the library itself is
designed to detect and manage failures at the application layer, so delivering a
highly available service(s) on top of a cluster of computers, each of which may be
prone to failures” (Singh et al. 2012)
The paper further discusses how hardware can fail, which could in effect make nodes inactive
when in fact they could service traffic. Choosing a software-based load balancing approach
over a hardware-based approach allows the load balancing to be tweaked to suit the
application’s needs.
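The core decision made by a software-based load balancer during a regional outage can be sketched as choosing the first healthy region from a priority list. The sketch below is a simplified illustration; the region names and the health check are hypothetical, and it is not the load balancing solution built in this research.

```python
from typing import Callable, Optional, Sequence


def pick_region(regions: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy region in priority order, or None if all are down."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


# Hypothetical health status: the preferred region is currently unavailable.
status = {"region-a": False, "region-b": True}
print(pick_region(["region-a", "region-b"], lambda name: status[name]))
```

Because the selection logic is ordinary code, the priority order or the health check can be adjusted per application, which is the flexibility referred to above.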
The focus of this research is the ability to divert traffic to different regions in the event of a
disaster situation. It is important to understand that any layer within the application can fail.
The ability to handle this failure gracefully will prove crucial in validating the success of this
research. The practical element of this dissertation will outline a solution to show that this
approach is feasible.
2.6. Code Deployment
2.6.1. Manual
When it comes to deploying code to a cloud service, the quickest approach is to package the
code up on the developer’s machine and manually deploy to the cloud service. This approach
is sufficient for quick proof of concept projects or demonstrations, but it soon becomes very
inefficient. The time taken to package the code, run the automated tests,
log into the cloud provider console, upload the packaged artifact and deploy adds
up daily. If the process takes 10 minutes and the developer attempts 6 deployments a day,
this is an hour taken up in that developer’s day.
This approach is error prone and can lead to issues further on in a project’s lifecycle. Required
dependencies to build an artifact or run tests may exist only on the developer’s machine. The
steps to properly execute the process may not be documented or properly defined. Overall, this
makes tasks more complex for future developers who may inherit this work.
The longer a manual process is followed, the harder it is to obtain buy-in from
management to spend time on automating the task.
2.6.2. Automated
In well-structured teams, there is evidence of rigorous CI/CD processes:
• Code is stored in a code management tool
• Code is built using pipelines
• Automatic testing of the code is performed in the pipeline
• Every deployment to the cloud is automated
Where an automated pipeline exists, it is more straightforward to extend the
pipeline to add code quality check tools, vulnerability checkers and other tools which may
improve the overall codebase.
Removing manual steps from the process of deploying code ensures there is an accurate,
repeatable process in place for deploying code to a production environment. As will be
discussed in a subsequent section, there are many benefits to using pipelines, not least the
amount of developer time that will be saved by not having to manually deploy code.
2.6.3. Hybrid
In a hybrid approach, there is an automated pipeline in place but certain steps in the process
require manual approval. Being fully confident of shipping code directly from source control
to production with no manual checking requires a full suite of unit tests, integration tests and
performance tests. If a project is not at that stage of its evolution, the best that can be done
is to deploy code to a non-production environment and perform sanity checks / testing in that
environment before approving the deployment to production.
It would be ideal to be able to automatically deploy code to production but, in cases where
this is not possible, the manual approval is a safeguard to ensure rogue code does not
inadvertently find its way into production.
The hybrid approach may be used in organizations that have a rigid change control process
involved for production installations. In this scenario, the manual step could be to enter a
ticket number for a fully approved change ticket before the change is deployed.
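The change ticket gate described above can be sketched as a check that runs before the production deployment step. The ticket format (CHG- followed by digits) and the set of approved tickets are assumptions made for illustration; a real pipeline would query the organization’s change management system.

```python
import re
from typing import Set

# Hypothetical ticket format, e.g. "CHG-1042".
TICKET_FORMAT = re.compile(r"CHG-\d+")


def may_deploy_to_production(ticket_id: str, approved_tickets: Set[str]) -> bool:
    """Allow the deployment only for a well-formed, fully approved change ticket."""
    well_formed = TICKET_FORMAT.fullmatch(ticket_id) is not None
    return well_formed and ticket_id in approved_tickets


approved = {"CHG-1042"}
print(may_deploy_to_production("CHG-1042", approved))
print(may_deploy_to_production("CHG-2000", approved))
```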
The following diagram highlights what an approval may look like in a sample GitHub Actions
pipeline.
Figure 2.2. GitHub Actions - Manual Approval
Having an approach for managing code deployment is vital to ensuring there are processes in
place to automatically handle regional outages. Multiple developers can work on the overall
process, and it can be refined over time as well as shared with other groups. These approaches
play a small part in the overall process of achieving high availability.
2.7. Code Management
2.7.1. Single Developer Projects
When it comes to projects that involve just 1 developer, often speed of development is
treated as priority over following standards. It is effortless to develop code on a developer’s
machine, ignore unit tests and deploy the same code from the developer’s machine. In cases
where the developer may be developing a proof of concept for a larger design, this approach
is justified. In larger projects intended for production use, the pitfalls of ignoring standards
could decrease the quality of the generated project which over time may impact on the
product. Potential pitfalls that may be encountered by not following a set of standards
include:
• No source control system in place:
o Harder to onboard new developers to the project
o Potential loss of code if the developer’s machine is lost, stolen or damaged
• No unit tests developed for project:
o Issues that could have been found and resolved with unit tests make their
way to production
o Potential to introduce defects with every release
o Low level of confidence that a slight code change will not have a negative effect
on the rest of the codebase
• Deploying code to production from a developer’s machine:
o The process to deploy code to production is only known by 1 developer
o Potential inconsistency in the artifact(s) deployed to production
In an article that focuses on software development for individual developers, the authors
explain that standards which apply to team projects (MIDS in this case) can be easily applied to
single developer projects without changing the core essence of the standard (de León-Sigg et al.
2018).
It is important that standards are followed where possible. An adoption of standards will not
only improve the overall quality of the code delivered, but it will also help to simplify the
onboarding of new developers to the project. When multiple developers are on the project,
the process of managing code changes will be simplified. In an article related to coding
practices, the authors discuss some techniques which can be used to improve code readability
which in turn will help to devise the coding standards for a project (dos Santos and Gerosa
2018). Using techniques in the paper by dos Santos and Gerosa (2018) will help to improve a
project whilst also helping to move away from the single developer mindset.
2.7.2. Source Control
The importance of using source control for any project cannot be overstated. As discussed
in the previous section, using source control can help
move a project away from the single developer mindset. A source control system
makes the process of collaboration amongst a team more straightforward. The collaboration
benefits of using a source control tool like Git are evident in an article where the author
discusses using Git to foster teamwork in the South African classroom (Blauw 2018).
Git can store code for small projects as well as large enterprise grade projects. It can be used
for projects developed in any language and has features such as branching which, together with
the pull requests offered by Git hosting platforms, support developers collaborating on
projects. When starting with Git,
it is important the team members decide on the branching strategy to follow.
2.7.2.1. Branching Model: GitFlow
GitFlow is a branching strategy that employs the use of feature branches and multiple primary
branches (Atlassian n.d.). GitFlow utilizes branches that are longer lived and contain larger
commits. When using this strategy, developers can create feature branches and delay the
merging of code into the main branch until the feature is fully implemented. A downside of
this approach with long-lived feature branches is the increase in the collaboration required
amongst developers to merge changes. It is also easy for conflicting updates to be
introduced by developers. Refer to the diagram in Figure 2.3 for an overview of the GitFlow
Branching Strategy.
GitFlow works best:
• For managing an open-source project as all code must be checked in pull requests
• When there are mostly junior developers on the team who can preview their changes
on long lived feature branches before merging into the main branch
• When the product that you are maintaining is well established as future changes are
minimal and need to be monitored closely.
Cases to avoid GitFlow are:
• When you are starting a project as the pull request process can slow down the task of
generating an MVP
• When you need to iterate quickly as the pull request process can get in the way
• When there are mostly senior developers on the team as they are trusted and should
be given the autonomy to do their job
Figure 2.3. GitFlow Branching Strategy.
2.7.2.2. Branching Model: Trunk
Trunk based development is a source control branching model which allows developers to
merge smaller, more frequent updates to the core main or trunk branch (paul-hammant n.d.).
As the trunk-based approach streamlines the merging and integration phases, it helps bring
about continuous integration and continuous deployment as well as increasing the speed of
software delivery. The diagram in Figure 2.4 gives an overview of the Trunk Based
Branching Model. High-performing engineering teams use the trunk-based development
strategy as it sets and maintains a simplified Git branching strategy for teams. It also gives
teams flexibility and control over how and when software is delivered to customers.
Trunk based development works best (‘Trunk-based Development vs. Git Flow’ n.d.):
• When a project is just starting up as it offers maximum development speed
• When you need to iterate quickly as the trunk-based approach allows you to change
the product quickly when required
• When there are mostly senior developers on the team
Cases to avoid the Trunk based approach are:
• When you run open-source projects as those projects are more suited to GitFlow
• When the product is established, or you have large teams as strict control is required.
GitFlow is recommended for this scenario.
• When there are mostly junior developers on the team
Figure 2.4. Trunk Based Branching Model.
2.7.3. DevOps Code Pipelines
When it comes to deploying code, it is imperative to implement a pipeline strategy to achieve
the goal. A pipeline can take the manual steps away from deploying code and replace those
steps with a repeatable process. A pipeline not only provides structure for
deployments, but can also be used to run code quality checks and various forms of tests,
as well as drastically decreasing the time required for a developer to deploy code to a
production environment.
The use of a pipeline is necessary when implementing Continuous Integration and Continuous
Deployment for a project. A pipeline that is executed regularly can supply a
benchmark for improving the overall quality of the project. A pipeline can be treated like a
code artifact which can evolve over time. In a recently reviewed article, the authors highlight
the importance of pipelines by stating they are mainly used for continuously executing steps
to ensure an application is deployable at any time (Beetz and Harrer 2021).
A pipeline can be basic at the beginning with iterations taking place to add extra features for
tasks such as code validation or running tests. Once a pipeline structure is in place, there is
no limit to what can be achieved during the lifetime of the pipeline.
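The idea of a pipeline as an ordered set of steps that can grow over time can be sketched in a few lines. The sketch below models each stage as a named function and stops at the first failure, so a failed test stage prevents deployment. It is a conceptual illustration only; the stage names are hypothetical and real pipeline technologies define stages in their own configuration syntax.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns one log line per stage that was actually executed.
    """
    log = []
    for name, step in stages:
        passed = step()
        log.append(f"{name}: {'passed' if passed else 'failed'}")
        if not passed:
            break  # later stages (such as deploy) are skipped
    return log


# A basic pipeline; extra stages can be appended as the pipeline evolves.
stages = [
    ("build", lambda: True),
    ("test", lambda: False),   # simulate a failing test run
    ("deploy", lambda: True),
]
print(run_pipeline(stages))
```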
There are benefits to implementing pipelines but there are also challenges including:
• Choosing the pipeline technology to use. There are various open source as well as
commercial pipeline options available, and choosing between them for a
project can be difficult.
• The ramp up time for developers to learn a particular pipeline technology or syntax
needs to be factored in when deciding on the technology to use.
• Maintaining the pipeline infrastructure if a self-hosted pipeline technology is chosen
• Managing the security for integrations (e.g., credentials for deploying to a cloud
provider)
• Finally, switching between pipeline options is not a trivial task and has the potential
to introduce the need for re-work on the code pipelines.
This research will show how a combination of DevOps processes and techniques forms the
grounding for building a highly available solution.
2.8. Cloud Application Architecture
This section will examine the various application types that can be architected and developed
as part of this research. A key aim of this research is to implement a solution for deploying a
highly available application that will function in the event of sample test outages. It is this
author’s opinion that the best way of achieving this is to develop a 3-tier application. A 3-tier
application consists of a presentation layer (frontend), application layer (backend code) and
data layer (data storage). A benefit of a 3-tier application is the ability to foster the reuse
of software components between different applications (Abdelrahman et al. 2020).
Each application tier has its own responsibilities. The
frontend is the gateway to the world: it contains any user interfaces which
can be used by end customers. The frontend will contain visual screens which simplify the
process of interacting with the backend. There is a vast array of programming languages
and frameworks available to develop frontend applications, with further technologies being
developed on a regular basis.
The backend performs the heavy lifting for the application. The backend runs any business
logic in response to events from the frontend. With the frontend being the gateway for
customers, the backend is the gateway to the required data. The backend can be accessed by
advanced users or systems using API calls but, for most users, the interaction with the backend
is via the frontend. As with the frontend, there is a vast array of technologies and frameworks
which can be used to develop backend applications.
Finally, the data layer (database) is the most important part of any application. The data layer
has the responsibility of storing the data which is accessed by the backend processes and is
subsequently rendered to customers in the frontend. Every application has its own data
requirements, and these will be touched on in this section. There is a vast array of different
technology choices which are available when it comes to the data layer. This research will
expand on the 3-tier application and highlight key considerations that need to be
implemented to make an application highly available.
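The separation of responsibilities between the three tiers can be illustrated with a deliberately small sketch in which each tier only talks to the tier directly below it. All class names, function names and data are hypothetical; in practice each tier would be a separately deployed component rather than code in a single file.

```python
from typing import Dict, Optional


class DataLayer:
    """Data tier: stores and retrieves data (an in-memory stand-in for a database)."""

    def __init__(self) -> None:
        self._products: Dict[int, str] = {1: "Keyboard"}

    def get_product(self, product_id: int) -> Optional[str]:
        return self._products.get(product_id)


class Backend:
    """Application tier: runs business logic and is the gateway to the data."""

    def __init__(self, data: DataLayer) -> None:
        self._data = data

    def product_name(self, product_id: int) -> str:
        name = self._data.get_product(product_id)
        if name is None:
            raise KeyError(f"unknown product {product_id}")
        return name


def render_product_page(backend: Backend, product_id: int) -> str:
    """Presentation tier: turns backend results into something a user can view."""
    return f"<h1>{backend.product_name(product_id)}</h1>"


print(render_product_page(Backend(DataLayer()), 1))
```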
2.8.1. Frontend Application
2.8.1.1. Monolithic Frontend
A monolithic frontend application is a feature-rich, powerful browser-based application which
interacts with micro services in the backend. Over time the frontend layer grows and may be
developed by separate teams. In this situation, the frontend application becomes more
difficult to maintain as it grows and adds new functionality.
The diagram in Figure 2.5 depicts a high-level architecture of a monolithic frontend for
a Shop application. As shown, there are multiple microservices in the backend but only 1
frontend application, which may be maintained and developed by multiple teams.
Figure 2.5. Monolithic Frontend
The monolithic frontend is an anti-pattern which can occur over time on a frontend project.
Pavlenko et al. discuss how a frontend with a monolithic architectural style is difficult and, in
some cases, impossible to scale (Pavlenko et al. 2020). Teams may still want to develop
features concurrently, but this may not be possible in all cases. Pavlenko et al. argue that the
use of micro frontends is a solution to this problem.
2.8.1.2. Micro-frontend
A micro-frontend is a pattern where web application user interfaces are composed from
independent fragments which may be built by different teams using a broad array of
technologies. A micro-frontend architecture resembles a microservice backend architecture
where the backend is composed of independent microservices.
Various approaches exist in which micro frontends can be implemented in terms of splitting
up functionality (Mezzalira 2021). In the horizontal split approach, multiple micro-frontends
can exist within the same UI view. Multiple teams will be responsible for distinct parts of the
view and must coordinate their efforts. This approach offers flexibility in that teams can share
functionality, but teams need to be careful not to introduce unnecessary micro-frontends
within the same project. This approach is suitable for large sites with an extensive feature set
such as shopping sites. One team could develop the catalogue while another team
develops product recommendations.
The second approach is a vertical split, where the individual teams are accountable for a
particular problem domain. In this approach, it is harder to share code between teams, but it
allows flexibility in terms of deployments. These systems are developed as individual systems
but branded with a company header and footer to give the appearance of the systems
belonging together. This approach is suitable for systems such as company intranets where
different teams develop different intranet sites, but they all utilize a common theme such as
colours and fonts.
The horizontal and vertical split approaches both have the same end goal, which is to split up
the frontend code into smaller more manageable chunks. Various teams can potentially work
on different codebases.
Benefits of using micro-frontends include:
• Micro sites are technology agnostic – teams can use different technologies
• The generated applications are independent and self-contained
• Multiple teams can work on distinctive features
• Development and deployment of the individual micro-frontends may be faster
The diagram in Figure 2.6 depicts how the code for micro frontends can exist in
the same source control repository or different source control repositories. The goal of the
CI/CD pipeline process is to build the individual micro frontends, which are combined
into the overall deployed frontend application.
Figure 2.6. Micro Frontend Architecture
2.8.1.3. Single Page Application (SPA)
A user interface that operates directly inside the browser and does not require a page
reload when navigating between pages is referred to as a Single Page Application. This is
achieved by the browser loading JavaScript chunks on page load which contain all the
required logic that the browser will depend on. Any requests to the backend for
data are made in an asynchronous fashion using AJAX requests. Jadhav et al. discuss
creating a single page application using AngularJS; however, they do not delve into topics such
as server side rendering or authentication (Jadhav et al. n.d.). The focus of Jadhav et al.’s
article is purely on getting started with developing a single page application. In an article with
comparable topics on creating a single page application, the authors delve further into topics
such as performance and reuse of components which are important for modern day
applications (Gavrilă et al. 2019).
2.8.2. Backend Application
2.8.2.1. Monolith
Like the frontend monolith, a backend monolith is essentially 1 project which contains all the
business logic. All teams work on the same project and need to coordinate changes between
each other. Any code change to 1 module in the monolith could influence the other services
within the project. As the project grows, changes become harder to develop as well as cover
with automated testing. Routine maintenance of the project could become a more complex
task for developers.
When deploying monolithic backend applications to the cloud, the deployment options are
limited due to the size of the artifact to deploy and other constraints within the application.
Creating monolithic applications makes the development process more straightforward than
creating microservices. Monolithic applications offer easier development and deployment
options but are lacking when it comes to ease of maintenance, reliability, availability and
scalability (Gos and Zabierowski 2020).
2.8.2.2. Microservice
A microservice architecture is where an application is structured as a collection of smaller
services that have the following attributes:
• Are highly testable and maintainable
• Have a more straightforward development process than with monolith applications
• Can be deployed independently of the other services
• Are loosely coupled
• Are focused on a business capability
• Are owned by a small team
In the world of agile, microservices are a great enabler for rapid, frequent, and reliable delivery
of applications to a production environment. In cases where microservices need to
communicate with each other, challenges can arise. In those scenarios, queues or message
bus technologies can be used for asynchronous communication. For synchronous
communication, HTTP calls with associated retries will need to be implemented.
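Asynchronous communication through a queue can be illustrated with a small in-process sketch, where the producing service hands a message to the queue and carries on while a consuming service processes it separately. Python’s standard queue.Queue stands in here for a real message broker, and the message contents are hypothetical.

```python
import queue
import threading
from typing import List


def consume(inbox: "queue.Queue", handled: List[str]) -> None:
    """Consumer service: process messages until a None sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:
            break
        handled.append(f"processed {message}")


inbox: "queue.Queue" = queue.Queue()
handled: List[str] = []
worker = threading.Thread(target=consume, args=(inbox, handled))
worker.start()

# Producer service: put() returns immediately, without waiting for processing.
for order_id in ("order-1", "order-2"):
    inbox.put(order_id)

inbox.put(None)  # signal the consumer to stop
worker.join()
print(handled)
```

Because the producer never waits on the consumer, a temporary outage of the consuming service delays processing rather than failing the request, which is relevant when services fail over between regions.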
There is a wide array of technologies which can be used for microservices, as well as steps
for migrating to microservices (Larrucea et al. 2018). Larrucea et al. discuss the pitfalls of
microservices, but it is this author’s opinion they could have discussed the complexities of
managing highly available applications when it relates to microservices.