Architecting Cloud Applications for High Availability
Damien Gallagher
M.Sc. in Computing in DevOps 2022
Computing Department, Atlantic Technological University, Port Road, Letterkenny, Co. Donegal,
Ireland.
Author: Damien Gallagher
Supervised by: Ruth G. Lennon
A thesis submitted in partial fulfilment of the requirements for the
Master of Science in Computing in DevOps
Submitted to Quality and Qualifications Ireland (QQI)
Dearbhú Cáilíochta agus Cáilíochtaí Éireann
January 2022
Declaration
I hereby certify that the material, which I now submit for assessment on the programmes of
study leading to the award of Master of Science in Computing in DevOps, is entirely my own
work and has not been taken from the work of others except to the extent that such work has
been cited and acknowledged within the text of my own work. No portion of the work
contained in this thesis has been submitted in support of an application for another degree
or qualification to this or any other institution. I understand that it is my responsibility to
ensure that I have adhered to ATU’s rules and regulations.
I hereby certify that the material on which I have relied for the purpose of my assessment
is not deemed as personal data under the GDPR Regulations. Personal data is any data from
living people that can be identified. Any personal data used for the purpose of my assessment
has been pseudonymised and the data set and identifiers are not held by ATU. Alternatively,
personal data has been anonymised in line with the Data Protection Commissioner’s
Guidelines on Anonymisation.
I consent that my work will be held for the purposes of education assistance to future students
and will be shared on the ATU Computing website (www.lyitcomputing.com) and Research
THEA website (https://research.thea.ie/). I understand that documents once uploaded onto
the website can be viewed throughout the world and not just in Ireland. Consent can be
withdrawn for the publishing of material online by emailing Thomas Dowling, Head of
Department, at Thomas.Dowling@atu.ie to remove items from the ATU Computing website
and by emailing Denise McCaul, Systems Librarian, at Denise.McCaul@atu.ie to remove items
from the Research THEA website. Material will continue to appear in printed formats once
published and, as websites are a public medium, ATU cannot guarantee that the material has
not been saved or downloaded.
Signature of Candidate Date
Acknowledgements
I would like to thank my supervisor, Ruth Lennon, for all the guidance and support she has
provided throughout this process. I would also like to mention the colleagues with whom I
worked throughout the master’s program. We did not know each other prior to the program, but
we have become close over the past few years. Finally, I would like to dedicate this
dissertation to my wonderful wife, Tracy, and to my amazing sons, Jayden and Logan. They
gave me the time I needed to complete this program and listened to my thoughts about
whichever topic I was exploring at the time. I look forward to watching them grow and pursue
their own careers in the future.
D. Gallagher
Abstract
Outages at a cloud provider can make applications unavailable, costing organizations lost
earnings and damaging their reputation. Avoidable application outages can also occur due to
misaligned DevOps processes.
This dissertation aims to prove that consistent DevOps processes, together with region-based
application monitoring, can help organizations to maintain highly available applications.
Cloud providers aim to guarantee a certain level of uptime for their services, but they too
encounter unforeseen outages.
This dissertation discusses the various tools and techniques that can be employed to help
maintain the high availability of an application. An accompanying research artifact
demonstrates the techniques discussed and outlines the tests that were executed against the
developed solution.
The practical element was successful in that outages due to inconsistent DevOps
processes and outages at a regional level were mitigated. It was identified that some cloud
services are not suitable for a certain subset of applications that require high availability
and that, overall, costs can increase exponentially.
This research has proven that a highly available application architecture is possible when
deployed on a cloud platform. It focuses on one small area of the overall problem domain,
but it has the potential to form a basis for experimenting with highly available applications
in other problem domains, as well as for managing highly available applications deployed on
other cloud providers.
Table of Contents
Declaration
Acknowledgements
Abstract
Table of Contents
Table of Figures
Table of Tables
Table of Code Listings
Nomenclature
1. Introduction
1.1. Purpose
1.2. Background
1.3. Problem Statement
1.4. Research Question
1.5. Scope and Limitations
1.6. Methodological Approach
1.7. Report Outline
2. Literature Survey
2.1. Overview of Cloud Computing
2.1.1. Why deploy code to the Cloud vs On Premises
2.1.2. Challenges faced when deploying to the Cloud
2.1.3. Types of Outages
2.2. Infrastructure
2.2.1. Virtual Machines
2.2.2. Cloud Service Offerings
2.2.3. Cloud Service Decision Matrix
2.3. High Availability and Disaster Recovery
2.3.1. Disaster Recovery Metrics
2.3.2. Backup and Restore
2.3.3. Pilot Light
2.3.4. Warm Standby
2.3.5. Multi-site Active / Active
2.3.6. Comparison of Disaster Recovery Options
2.4. Multi Region Architecture Considerations
2.4.1. Application
2.4.2. Database
2.5. Region Failover Considerations
2.6. Code Deployment
2.6.1. Manual
2.6.2. Automated
2.6.3. Hybrid
2.7. Code Management
2.7.1. Single Developer Projects
2.7.2. Source Control
2.7.2.1. Branching Model: GitFlow
2.7.2.2. Branching Model: Trunk
2.7.3. DevOps Code Pipelines
2.8. Cloud Application Architecture
2.8.1. Frontend Application
2.8.1.1. Monolithic Frontend
2.8.1.2. Micro-frontend
2.8.1.3. Single Page Application (SPA)
2.8.2. Backend Application
2.8.2.1. Monolith
2.8.2.2. Microservice
2.8.2.3. Serverless
2.8.3. Database
2.8.3.1. Relational Database
2.8.3.2. NoSQL
2.8.3.3. File / Object Storage
2.9. Networking
2.9.1. Load Balancing
2.9.2. Content Delivery Network (CDN)
2.10. Monitoring
2.10.1. Active Monitoring
2.10.2. Passive Monitoring
2.11. Observability
2.12. Security and Regulatory Compliance
2.12.1. Security
2.12.2. Compliance
2.13. Chapter Conclusions
3. Design
3.1. System Context Diagram
3.2. Cloud Provider
3.2.1. Amazon Web Services (AWS)
3.2.2. Azure
3.2.3. Google Cloud Platform (GCP)
3.2.4. Selection Criteria
3.2.5. Final Selection Justification
3.3. Cloud Infrastructure
3.3.1. Virtual Machines (VM)
3.3.2. Containers
3.3.3. Serverless
3.3.4. Selection Criteria
3.3.5. Final Selection Justification
3.4. DevOps Tooling
3.4.1. Source Control
3.4.1.1. GitHub
3.4.1.2. AWS CodeCommit
3.4.1.3. Gitea
3.4.1.4. Selection Criteria
3.4.1.5. Final Selection Justification
3.4.2. Pipelines
3.4.2.1. GitHub Actions
3.4.2.2. AWS CodePipeline
3.4.2.3. Jenkins
3.4.2.4. Selection Criteria
3.4.2.5. Final Selection Justification
3.4.3. Infrastructure As Code (IAC)
3.4.3.1. AWS CloudFormation
3.4.3.2. Cloud Development Kit (CDK)
3.4.3.3. Terraform
3.4.3.4. Selection Criteria
3.4.3.5. Final Selection Justification
3.5. Application Architecture
3.5.1. Frontend Application
3.5.1.1. HTML and JavaScript
3.5.1.2. React
3.5.1.3. Angular
3.5.1.4. Selection Criteria
3.5.1.5. Final Selection Justification
3.5.2. Backend Application
3.5.2.1. Java
3.5.2.2. NodeJS
3.5.2.3. Python
3.5.2.4. Selection Criteria
3.5.2.5. Final Selection Justification
3.5.3. Database
3.5.3.1. Relational Database
3.5.3.2. NoSQL
3.5.3.3. File / Object Storage
3.5.3.4. Selection Criteria
3.5.3.5. Final Selection Justification
3.6. Application High Availability
3.6.1. Static IP Address
3.6.2. Load Balancer
3.6.3. AWS Global Accelerator
3.6.4. Amazon API Gateway
3.6.5. Amazon Route53
3.6.6. Amazon CloudFront
3.6.7. Selection Criteria
3.6.8. Final Selection Justification
3.7. Application Monitoring and Observability
3.7.1. Active Monitoring
3.7.2. Passive Monitoring
3.7.3. Observability
3.7.4. Selection Criteria
3.7.5. Final Selection Justification
3.8. Pre-Implementation Details
3.9. Chapter Conclusions
4. Implementation
4.1. System Context Diagram
4.1.1. Single Region Deployment
4.1.2. Multi Region Deployment
4.2. Cloud Provider
4.3. Cloud Infrastructure
4.4. DevOps Tooling
4.4.1. Source Control
4.4.2. Pipelines
4.4.3. Infrastructure As Code (IAC)
4.5. Application Architecture
4.5.1. Frontend Application
4.5.2. Backend Application
4.5.3. Database
4.6. Application High Availability
4.6.1. Frontend Application
4.6.2. Backend Application
4.7. Application Monitoring and Observability
4.7.1. Active Monitoring
4.7.2. Passive Monitoring
4.8. Chapter Conclusion
5. Results
5.1. Test Strategy
5.2. Application Unit Testing
5.2.1. Frontend Application
5.2.2. Backend Application
5.3. Using Integration Tests to Assist with High Availability
5.4. Using Functional Tests to Validate a DevOps Pipeline
5.5. Performance Testing of Cloud Applications
5.5.1. Performance Test Results
5.5.2. Performance Test Observations
5.6. Load Testing Highly Available Cloud Applications
5.6.1. Load Test Results
5.6.2. Load Test Observations
5.7. Performing Security Testing on the Cloud
5.8. UI Testing of Multi Region Cloud Applications
5.9. API Testing of Multi Region Cloud Applications
5.10. Chaos Testing of Multi Region Cloud Applications
5.11. Cost Analysis of Highly Available Cloud Applications
5.12. Chapter Conclusion
6. Conclusions
6.1. Conclusions on the State of the Art
6.2. Conclusions on Practical Element
6.2.1. Technologies to Enable High Availability
6.2.2. Tools and Techniques for Maintaining High Availability
6.2.2. Effects of Developing for High Availability
6.3. Limitations Discussion
6.4. Further Work
Appendices
Appendix A: References
Appendix B: Code Listing
Appendix C: Test Project Locally
Appendix D: Configure AWS Real User Monitoring
Table of Figures
Figure 1.1. Software Delivery Schedule
Figure 2.1. Cloud Disaster Recovery Options
Figure 2.2. GitHub Actions - Manual Approval
Figure 2.3. Gitflow Branching Strategy
Figure 2.4. Trunk Based Branching Model
Figure 2.5. Monolithic Frontend
Figure 2.6. Micro Frontend Architecture
Figure 2.7. Synthetic Monitoring
Figure 3.1. System Context Diagram – High Level
Figure 3.2. Cloud Providers Market Share (‘Infographic: Cloud Market Share’ n.d.)
Figure 3.3. GitHub Actions - Sample Workflow
Figure 4.1. System Context Diagram - Single Region Deployment
Figure 4.2. System Context Diagram - Multi Region Deployment
Figure 4.3. Frontend Application Pipeline
Figure 4.4. GitHub Actions Backend Pipeline - Deploy Application Steps
Figure 4.5. Terraform Cloud Console
Figure 4.6. Frontend Application
Figure 4.7. DynamoDB Global Tables
Figure 4.8. Frontend Global Accelerator Endpoint Groups
Figure 4.9. CloudFront Distributions for Frontend Application
Figure 4.10. Backend Application Load Balancer - us-east-1 Region
Figure 4.11. Frontend Application Canary Test Results – No Issues
Figure 4.12. Frontend Application Canary Test Results – Issue in a Region
Figure 4.13. Frontend Application Canary Test Results – Issue Resolved
Figure 4.14. Real User Monitoring Report - us-east-1
Figure 5.1. Pipeline Failed - Frontend Test Coverage Threshold Not Satisfied
Figure 5.2. Pipeline Successful - Frontend Test Coverage Threshold Satisfied
Figure 5.3. Pipeline Failed - Backend Test Coverage Threshold Not Satisfied
Figure 5.4. Pipeline Successful - Backend Test Coverage Threshold Satisfied
Figure 5.5. Integration Tests - System Running Normally
Figure 5.6. Integration Tests – Frontend Application Not Functioning Correctly
Figure 5.7. Functional Test Failed After Application Deployment
Figure 5.8. Performance Test Results - Percentile Response Times
Figure 5.9. Performance Test Results - High Level Statistics
Figure 5.10. Load Test Results - High Level Statistics
Figure 5.11. Load Test Results - Percentile Response Times
Figure 5.12. Current Security Issues Reported by Snyk
Figure 5.13. Issue Information Provided by Snyk
Figure 5.14. Selenium Test Failed
Figure 5.15. Selenium Test Passed
Figure 5.16. Postman User Interface
Figure 5.17. Postman Reports Failed Tests in Pipeline
Figure 5.18. Postman Reports All Tests Successful in Pipeline
Table of Tables
Table 1. Cloud Service Decision Matrix
Table 2. AWS Container Offerings
Table 3. AWS NoSQL Offerings
Table 4. AWS Load Balancer Options
Table 5. AWS Observability Services
Table 6. Frontend Application Cost Estimation
Table 7. Backend Application Cost Estimation
Table 8. Combined Application Cost Estimation – Single Region
Table 9. Combined Application Cost Estimation – Multiple Regions
Table 10. Code Repositories
Table of Code Listings
Code Listing 1. Pseudo Code for Code Deployments
Code Listing 2. Snippet of Backend Lambda Code
Code Listing 3. Test Frontend Code Locally
Code Listing 4. Install RUM JavaScript Library
Code Listing 5. RUM Frontend Code Snippet
Nomenclature
Acronym | Definition
AWS | Amazon Web Services
Azure | Microsoft Azure
AWS Region | A geographic location, one of many dispersed around the world, where AWS clusters its data centres
AWS Availability Zone | A logical grouping of data centres in an AWS Region
React | React is a JavaScript library that is used for creating rich graphical user interfaces
DynamoDB | DynamoDB is a NoSQL database provided by AWS that is both fast and flexible
Container | A container is a lightweight, standalone piece of software for packaging code along with the required dependencies. Containers enable an application to move quickly from one computing environment to another
Serverless | Serverless is a development model for the cloud that allows developers to create, build, run and support applications without having to manage servers in the backend
Active Directory (AD) | Active Directory contains detailed information about user objects on a network and makes it straightforward for administrators and users to access this information
RTO | Recovery Time Objective (RTO) is the time within which a process must be restored to normal production operation after a disaster, to mitigate any adverse consequences associated with the disaster
RPO | Recovery Point Objective (RPO) is the tolerable quantity of data which could be lost or must be reinstated after application downtime
DR | Disaster recovery (DR) is the ability of an organization to respond to and recover from a disaster event which could impact the normal running of the business’s processes. The goal of DR is to resume normal operation of an organization’s IT infrastructure as quickly as possible after a disaster occurs
DNS | Domain Name System (DNS) is how IP addresses associated with online services are mapped to domain names which are more human readable. DNS translates the domain name to IP addresses so the required content can be loaded in a browser
MVP | A minimum viable product (MVP) is a variant of a product that contains a subset of features to be evaluated by early customers to obtain feedback for future product development
API | API is the abbreviation for Application Programming Interface. It is a collection of protocols and definitions for the development and integration of software applications
YAML | YAML originally stood for ‘Yet Another Markup Language’ and now stands for ‘YAML Ain’t Markup Language’. It is a data-serialization language that is human-readable and extensively used in data transmission as well as storage applications. YAML is also frequently used in application configuration files
JSON | JavaScript Object Notation (JSON) is based on the object syntax of JavaScript. It is a text-based format for encoding structured data
PWA | PWA stands for Progressive Web Application. PWAs are applications developed with manifests, service workers and other web-platform features to give end users an experience similar to that of native applications
OSI Model | The Open Systems Interconnection model or OSI model serves as a common standard that different computer systems can use to communicate with each other
RESTful API | Also known as a REST API. It is an application programming interface that satisfies the constraints of the REST architectural style and allows for communication between various systems
WebSocket API | The WebSocket API is a technology that facilitates an open 2-way interactive communication session between the browser of a user and a backend server. Messages are sent to a backend server and the browser can receive event-driven responses without the need to poll the backend server for a reply
CRUD | The term CRUD refers to applications that perform simple create, read, update and delete operations on a database table
UI | The UI or User Interface is the entry point for human-computer communication and interaction on a device. This can include keyboards, a mouse, display screens and the appearance of a desktop
GUI | A GUI or Graphical User Interface is a screen through which a user interacts with electronic devices such as smartphones or computers using menus, icons and other graphics or visual indicators
1. Introduction
Recent improvements in software development tooling have allowed small companies as well
as large organizations to deliver software faster than ever before. The path from defining the
initial requirements, through developing the feature, to shipping it can take anything from
hours to months. Some companies have advanced DevOps processes with thorough, reliable
automated test suites which allow features to be delivered without human interaction
(‘Continuous Integration, Delivery, and Deployment’ n.d.:21). Other companies have no
pipelines or automated tests, with much of the testing completed manually. This becomes an
issue for regression testing, as defects can quite often be introduced and go unnoticed when
most of the testing is done manually. Figure 1.1 shows an image that is commonplace in
presentation decks for large organizations, depicting the expected delivery date of a feature.
Often, tasks such as pipeline development or code quality checks are neglected to hit
pre-defined deadlines. This may result in the code artifact getting shipped on schedule, but
over time, when code changes need to be introduced, the tasks skipped to meet the original
schedule come back to haunt the development teams.
Figure 1.1. Software Delivery Schedule
A goal common to most organizations is that their software solutions are highly available to
their end users. If a system is not available when a customer wants to use it, they may never
return to use that system again.
Reasons a system may not be available include:
• A code defect may have been introduced
• A database may become unresponsive
• A server the code is deployed to is experiencing issues.
These issues arise when the code is already deployed; issues could also arise during the
deployment of new code or features. Any of these issues will have a negative impact on the
reputation of that organization. For systems in the social media space, this may not be such
a significant problem, but reputation is extremely important when working in sectors such as
financial services.
Cloud providers such as Azure (‘Cloud Computing Services | Microsoft Azure’ n.d.) and AWS
(‘Cloud Services - Amazon Web Services (AWS)’ n.d.) offer services which limit the impact of
servers becoming unavailable. Both providers offer a significant range of services which is
being expanded constantly. AWS and Azure both have Well-Architected frameworks (‘AWS
Well-Architected Framework’ n.d.; ‘Microsoft Azure Well-Architected Framework’ n.d.) which
can be used by clients as a template to improve the quality of their workloads. The aim for
systems is to ensure they are highly available. However, it is not uncommon for issues to exist
with the services provided by the cloud providers themselves. This dissertation introduces
various approaches and patterns that can be followed to ensure high availability for many
aspects of a cloud application.
1.1. Purpose
High availability is vitally important for applications, regardless of the size of the customer
base. Systems such as social media or banking platforms are developed to be used by
customers on the customers’ schedule. As discussed in the introduction, there are reasons
why a system may not be available. This research presents an approach for architecting
applications along with the associated DevOps processes to minimize any disruptions to a
system. Following typical architectures will provide a good level of availability but as we have
seen with recent AWS outages (Moss n.d.), it is possible for an issue with the cloud provider
to bring down your application. This dissertation will explore approaches that can be followed
with pipelines to ensure a code deploy will not affect an application. It will also examine a
fault tolerant architecture for a 3-tier application including how issues may be identified in an
automated fashion before being reported by the end users.
1.2. Background
With the ever-increasing need for software features to be delivered quickly and to perform in
a highly available fashion, it is imperative that solutions are set up in a way that facilitates
this. This starts with the DevOps process around how the code is stored and the pipelines that
are used to package, test, quality check and deploy the code. Within the chosen cloud
provider, how that code is deployed, including how it is served to end customers, is also a key
consideration. It is straightforward to deploy code onto a single server and expose an
endpoint for customers. However, if the traffic to that service increases, or there is a
disruption with the cloud provider, that application could easily become unavailable.
Techniques such as application auto scaling can be set up to avoid this situation, but there are
other factors that need to be managed to avoid a complete outage.
End users may become frustrated when an application they want to use is not available.
Ideally, system issues should be resolved before they impact users. Techniques such as
synthetic monitoring (‘Using synthetic monitoring - Amazon CloudWatch’ n.d.) provide a
mechanism to identify issues on a schedule. Systems such as static websites are often
neglected when it comes to automated monitoring, whereas backend applications tend to
have monitoring set up. As with automated testing, the more automated monitoring that is in
place, the more beneficial it is for an application. Automated monitoring increases the
likelihood that errors or outages will have been identified automatically before users try to
access the system.
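To illustrate the idea of a scheduled check, the following is a minimal sketch of a synthetic probe written in Python. It is not the CloudWatch Synthetics API; the names CheckResult, run_check and http_status, as well as the 2000 ms latency budget, are assumptions introduced purely for this example. The fetch function is passed in so that the check logic can be exercised without network access.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.request import urlopen


@dataclass
class CheckResult:
    url: str
    healthy: bool
    status: Optional[int]   # None when the request itself failed
    latency_ms: float


def http_status(url: str) -> int:
    """Fetch the URL and return its HTTP status (raises on 4xx/5xx or network errors)."""
    with urlopen(url, timeout=5) as response:
        return response.status


def run_check(url: str,
              fetch: Callable[[str], int] = http_status,
              max_latency_ms: float = 2000.0) -> CheckResult:
    """Probe one endpoint; healthy means a 2xx status within the latency budget."""
    started = time.perf_counter()
    try:
        status = fetch(url)
    except Exception:
        # Any failure to obtain a response counts as an unhealthy check.
        status = None
    latency_ms = (time.perf_counter() - started) * 1000.0
    healthy = (status is not None
               and 200 <= status < 300
               and latency_ms <= max_latency_ms)
    return CheckResult(url, healthy, status, latency_ms)
```

A scheduler of any kind (a cron job or a managed canary service) could call run_check for each endpoint at a fixed interval and raise an alert when healthy is False, which is the behaviour that makes static websites as observable as backend applications.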
1.3. Problem Statement
With the widespread adoption of cloud services, organizations are deploying their code with
a trusted cloud provider. This brings its own benefits, such as not having to manage servers,
reduced costs and the ability to spin up systems quickly without having to purchase servers.
However, as was seen with the AWS issues on the 6th December 2021 (Moss n.d.) and on the
22nd December 2021 (Moss n.d.), more needs to be done to ensure applications are highly
available. Both outages in the us-east-1 region caused major disruptions for hundreds of
reputable brands around the world. An AWS region is a physical area where AWS data centres
are clustered throughout the world (‘Global Infrastructure Regions & AZs’ n.d.).
The problem statement is:
Recent outages in the AWS us-east-1 region lasting 5 and 8 hours
respectively have impacted on applications such as Slack, IMDb and
McDonald’s. Architecting cloud applications to be highly available will limit
the impact of regional outages in the future.
As the outages showed, reliance on the us-east-1 region is huge. It is the default
region when a new AWS account is created. AWS has the concept of availability zones. An
availability zone is an isolated location within an AWS region (‘Global Infrastructure Regions
& AZs’ n.d.). An application can be deployed in numerous availability zones within a specific
AWS region to help with high availability, but this will not guarantee high availability. If a
natural disaster such as an earthquake or tsunami occurred close to the AWS data centres for
a region, there is every possibility that every availability zone in that region could become
compromised. It is important to remember that achieving high availability can be costly.
Implementing high availability must align with the main objectives that a business is aiming
to achieve (Sarkar and Shah 2018). This dissertation will explore what it takes to deploy
applications across multiple cloud regions.
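To put the two outages in the problem statement into perspective, the short sketch below converts downtime into an availability figure and estimates the effect of adding regions. It rests on two loudly stated assumptions that are not claims of this dissertation: that the 5 and 8 hour outages were the only downtime in a year, and that regions fail independently of each other. Under the first assumption, 13 hours of downtime in an 8,760 hour year corresponds to roughly 99.85% availability for a single-region deployment. The function names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def availability(downtime_hours: float,
                 period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the system was available."""
    return 1.0 - downtime_hours / period_hours


def combined_availability(per_region: float, regions: int) -> float:
    """Availability of an active-active deployment across several regions.

    Assumes regions fail independently, so the system is down only when
    every region is down at the same time.
    """
    return 1.0 - (1.0 - per_region) ** regions


single_region = availability(5 + 8)                    # the two us-east-1 outages
two_regions = combined_availability(single_region, 2)  # idealised upper bound
```

The two-region figure is an upper bound rather than a prediction: shared dependencies such as global control planes, deployment pipelines or data replication can cause correlated failures, which is why the traffic management and data storage questions raised in this section matter.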
This raises questions about how data is stored in databases across multiple regions and how
traffic is managed between these regions. How code is deployed to these regions in a safe,
efficient manner will also be explored.
This dissertation will also look at how a region can be taken out of service if issues are
occurring in that region or scheduled maintenance is occurring. Overall, the goal is to ensure
that if a user wants to interact with the system, it is available no matter what event is
occurring across the network.
1.4. Research Question
Ensuring that an application is highly available and accounts for most outage scenarios,
including those originating with a cloud provider, is not a straightforward task. Costs may well
increase by architecting an application to account for every outage scenario, but it may be a
small price to pay to maintain a solid reputation.
The research question considered in this research is:
Can DevOps play a part in architecting an application to withstand various outages
which may occur in a cloud environment?
To answer this research question, 3 aims were identified. These are:
1. Identify key characteristics which indicate high availability of an application, with
reference to how it is different in a cloud-based environment.
2. Identify key processes that can be applied to enable high availability of applications
deployed in the cloud.
3. Design and implement a solution to deploy a highly available application that will
function in the event of sample test outages.
1.5. Scope and Limitations
GitHub Actions will be used to create the necessary pipelines. This will integrate with the
source code located in GitHub to provide a seamless integration between the pipelines and
the code.
AWS will be used as the cloud provider to deploy a 3-tier application in multiple geographically
dispersed regions. Azure and others provide similar functionality but, due to limited scope,
this dissertation will be restricted to AWS.
AWS tooling such as CloudWatch will be used to monitor the application whilst also ensuring
any outages are identified before consumers notice the issues. In the proposed solution, if
an issue is encountered, scripts will be executed to take an AWS region out of the pool which
accepts traffic. Open-source tooling that can assist with monitoring will also be
researched.
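The decision to take a region out of the traffic pool can be sketched as a small piece of pure logic, shown below in Python. This is an illustrative sketch and not the scripts developed in this dissertation: the function name plan_traffic, the threshold of 3 consecutive failed checks and the 0/100 traffic percentages are all assumptions. Applying the resulting plan to a real traffic manager (for example a DNS weight or a load balancer setting) is deliberately left out.

```python
from typing import Dict


def plan_traffic(consecutive_failures: Dict[str, int],
                 threshold: int = 3) -> Dict[str, int]:
    """Map each region to the percentage of traffic it should accept.

    A region is removed from the pool (0) once its monitoring checks have
    failed `threshold` times in a row; otherwise it stays fully in (100).
    """
    plan = {
        region: 0 if failures >= threshold else 100
        for region, failures in consecutive_failures.items()
    }
    # Safety guard (an assumption of this sketch): never drain every region.
    # If all regions look unhealthy, keep the least-failing one in service,
    # since the monitoring itself may be at fault.
    if plan and all(percent == 0 for percent in plan.values()):
        fallback = min(consecutive_failures, key=consecutive_failures.get)
        plan[fallback] = 100
    return plan
```

For example, plan_traffic({"us-east-1": 5, "eu-west-1": 0}) drains us-east-1 and leaves eu-west-1 in service. Requiring several consecutive failures is a design choice that trades slower reaction for fewer false removals caused by a single failed check.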
1.6. Methodological Approach
A sample 3-tier application will be created which will have a React frontend and a Python
backend talking to a DynamoDB database. This application will be stored in GitHub, with
separate repositories for the frontend and backend. A series of GitHub Actions workflows will
be created for the various tiers and used for deploying the application onto AWS. Any cloud
infrastructure created will use an Infrastructure as Code tool such as Terraform or AWS CDK
to ensure the process is repeatable.
For testing of outages or errors, the AWS CloudWatch toolset will be used, which will also be
set up with the chosen Infrastructure as Code tool. The final delivered artifact will be a
reference implementation which can be used with any programming language to architect
highly available applications.
1.7. Report Outline
A review of current literature related to the subject matter was conducted and is discussed in
chapter 2, covering cloud computing, microservices infrastructure, containers and serverless,
as well as DevOps. The design and implementation of a highly available application are
covered in chapters 3 and 4, answering the research question posed in chapter 1. Chapter
6 concludes the dissertation by discussing the architected solution and suggesting further
work and experiments which could occur in the future.
2. Literature Survey
The research in this chapter will initially focus on cloud computing and how companies have
evolved from on-premises deployments to using the cloud. There is a focus on highlighting the
challenges that exist when deploying to the cloud before focusing on the various cloud service
offerings. Disaster recovery is discussed in terms of how it relates to cloud environments,
along with the various considerations for multi-region deployments and region failover
scenarios. The research then looks at the DevOps process around code management and the
pipelines involved, before discussing the makeup of a cloud application which can be used to
answer the research question. This dissertation will focus on best practices and techniques
for architecting applications that are highly available when deployed to a cloud environment.
2.1. Overview of Cloud Computing
2.1.1. Why deploy code to the Cloud vs On Premises
The decision to deploy an application on the cloud or on on-premises servers needs to be
evaluated on a per-organization basis. In the past, larger organizations would have had the
resources to spin up their own datacentres to deploy their applications. Applications as well
as databases for the organization would live in that one datacentre, and there may also have
been a disaster recovery site. However, those servers had to be accounted for: they came
with costs associated with licensing, power and rack space, as well as costs associated with
the upkeep of the building the servers were located in. Provisioning a new server in a
datacentre took from weeks to months, which meant substantial planning had to be done in
advance of an application being deployed to production.
When deploying applications on the cloud, a different mindset applies. The per-server costs
for power and rack space are no longer present when deploying to the cloud. The upfront
cost of purchasing a server does not have to be paid, but cloud providers do allow you to
purchase credits on a server to keep the overall costs down. In relation to cloud, organizations
have the option to adopt a pay as you go model.
“Using cloud infrastructures and platforms is convenient because services
on demand offers high flexibility and pay as you go pricing offers low costs.”
(Toivonen 2013, p.17)
The lengthy provisioning times described for on-premises servers do not apply on the cloud,
which leads to shorter turnaround times for new applications or proofs of concept. With
the evolution of new cloud services, it is possible to design and build applications in a fashion
which is not possible in a traditional datacentre.
2.1.2. Challenges faced when deploying to the Cloud
Deploying to the cloud versus deploying to on-premises servers does bring its own challenges.
For security-conscious organizations, access to source code is restricted. Deployment of code
to any cloud provider needs to be performed in a safe fashion to ensure that proprietary code
does not make its way into the public domain. When working on premises, access to servers
is controlled with systems such as Active Directory. Different authentication mechanisms
need to be set up to ensure appropriate access is provided to the correct individuals when
accessing cloud resources (Garrison et al. 2012). Services on the cloud bring about their own
costs with regard to data transfer, storage and backups that are not an issue on-premises. It
is important when deploying to the cloud that consideration is given to setting up automated
backups as well as testing strategies for performing rollbacks if required.
With on-premises servers, the organization oversees the software installed. With cloud
services, it is the cloud provider who oversees the core software installed on platforms.
Organizations need to be vigilant when it comes to monitoring the cloud provider roadmaps
to ensure they are complying with the supported versions of software they are using.
Finally, a common challenge with the vast array of service offerings in the cloud is ensuring
that the correct service is used to satisfy the requirements of an application. It is
straightforward to write a hello world application that uses a particular service, but
supporting that service in a production environment can lead to its own array of challenges.
The ability to manage peak loads, disaster recovery situations and scheduled downtime during
patching of certain services are among the items that need to be considered.
2.1.3. Types of Outages
There are multiple distinct types of outages that can occur on any cloud provider. From an
application perspective, if deployed code is not thoroughly tested, a rogue deployment could
make an application unresponsive. The development team is responsible for
ensuring they follow a rigid DevOps process to mitigate against untested code getting
deployed. An outage could also occur on an application during a code deployment. The outage
may not last long, but for end users, this is still an unexpected outage that will affect the
brand’s reputation. This research will examine how outages can be avoided as part of the
DevOps process as well as when an application is running in a production setting.
From a cloud provider perspective, there are occasions where a region may become unstable
or unresponsive (Moss n.d.). Incidents such as data centre power loss, natural disasters such
as earthquakes or tornados could bring an entire region offline. Scheduled maintenance of a
managed cloud provider offering could make the service unusable. When deploying to the
cloud, it is important to consider that all services can have outages. It is imperative to have
this mindset to make an application as resistant to outages as possible. In an article related
to cloud storage mechanisms, the authors evaluate various outages, where they arrived at
the following conclusion.
“The reason after the post-investigation of most of these outages revealed
that the main root cause was the expected and predicted failures, while
others happened due to the failure of correct components in the recovery
process.” (Tahir et al. 2020)
In this statement, the words “expected and predicted failures” stand out to this author. Teams
need to ensure that expected and predicted failure situations are accounted for before trying
to manage the unexpected failures that could potentially occur. This research will examine
the techniques that can be used for a 3-tier application to ensure there are no application
outages.
The survey of research into the area of cloud computing indicates the need for research into
the role cloud computing can play in achieving high availability.
2.2. Infrastructure
2.2.1. Virtual Machines
There are a multitude of different options for deploying applications on the cloud, with one
of those options being virtual machines. This closely resembles deploying an application on an
on-premises server. The main difference with virtual machines on the cloud is that they are
meant to be ephemeral in nature, meaning they should not have an exceedingly long lifespan.
As mentioned in a previous section, it is the cloud provider who oversees the underlying
operating system on the virtual machines. There are no guarantees that a virtual machine
which is launched on a given day will be compliant the following day. To get around this, it is
imperative that the process for deploying applications to these virtual machines is automated
with a rigorous CI/CD process.
With virtual machines, there is a vast array of different instance types to choose from.
Each instance type has its own characteristics: some instance types are compute
optimised, whilst others are storage optimised. Newer versions of these instance types can
offer increased performance or greater cost savings. It is important to periodically benchmark
the instance types (Akioka and Muraoka 2010) that are in use against the newer offerings from
a cloud provider to validate that the applications are running as efficiently as possible.
2.2.2. Cloud Service Offerings
Every cloud provider offers a vast array of services that can be used. These can range from
virtual machines that were discussed in the previous section to containers (Merkel 2015), to
serverless (Eismann et al. 2021) offerings. Again, choosing the correct service is a choice which
is made based on the organization’s needs and the application that is being deployed.
Containers are useful for applications that maintain state or need to be long running in nature.
A container is a way of packaging code in a format that can run in a uniform way wherever a
compatible container runtime is available. The packaged form of a container is called an
image. That image contains the software together with the dependencies it requires. When
deploying containers, they can be deployed in a fashion that is always on or deployed in a
serverless fashion using newer offerings as provided by the larger cloud providers.
On the other side, serverless is a solid option for short lived REST calls, standard CRUD
operations and for deploying individual microservices. Serverless is an excellent option when
you want the ability to scale from little or no traffic to huge spikes in traffic. The main selling
point with serverless is you pay for the resources that you use. This makes it an attractive
option for cost conscious organizations.
2.2.3. Cloud Service Decision Matrix
The following table displays a decision matrix, based on the author's experience, that can be
used to determine the correct cloud solution to adopt. This table evaluates the current
state of an application and the deployment choices that can be made with or without code
changes. The correct service should be chosen based on the characteristics of the deployed
application.
Table 1. Cloud Service Decision Matrix.
Development Team Choice | Use Virtual Machines | Use Containers | Use Serverless
Are you deploying an existing application that requires no code changes? | Yes | No | No
Is there a Docker image for the application or are there immediate plans to create a Docker image? | Yes | Yes | Yes
Has the development / DevOps team the required bandwidth to learn a new cloud provider service such as serverless? | No | Yes | Yes
Is there potential for the existing application to grow into multiple smaller applications in the future? | No | Yes | Yes
Is keeping costs low with guaranteed high availability a requirement for this application? | No | Yes | Yes
Is there sufficient time available for the development team to create a correct solution for perceived future growth of the application? | No | Yes | Yes
Does the application contain tasks that may be long running (over 15 minutes)? | Yes | Yes | No
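One way to apply the matrix in Table 1 is to score each option by how many of the team's answers match the answer listed for that option. The sketch below is illustrative only: the short question keys, the score_options function and the matching rule are assumptions introduced here, as the table itself does not prescribe a scoring method.

```python
OPTIONS = ("Virtual Machines", "Containers", "Serverless")

# One row per question in Table 1. Each tuple holds the answer listed under
# (Virtual Machines, Containers, Serverless). The keys are hypothetical
# shorthand for the full questions.
MATRIX = {
    "existing_app_no_code_changes": (True, False, False),
    "docker_image_available_or_planned": (True, True, True),
    "bandwidth_to_learn_new_service": (False, True, True),
    "may_split_into_smaller_apps": (False, True, True),
    "low_cost_with_high_availability": (False, True, True),
    "time_to_design_for_future_growth": (False, True, True),
    "long_running_tasks_over_15_minutes": (True, True, False),
}


def score_options(answers: dict[str, bool]) -> dict[str, int]:
    """Count, per option, how many team answers match the table's entry."""
    scores = {option: 0 for option in OPTIONS}
    for question, answer in answers.items():
        for option, listed in zip(OPTIONS, MATRIX[question]):
            if answer == listed:
                scores[option] += 1
    return scores


# Example: an existing application, no spare team bandwidth, long running jobs.
scores = score_options({
    "existing_app_no_code_changes": True,
    "bandwidth_to_learn_new_service": False,
    "long_running_tasks_over_15_minutes": True,
})
print(scores)
```

For the example answers, Virtual Machines matches all three rows, Containers matches one and Serverless matches none. A score like this is only a starting point for discussion and does not replace the judgement described above.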
The choice of infrastructure will determine which DevOps techniques to use. The DevOps
process for deploying applications to a virtual machine differs from the process for deploying to a
container in the cloud. To focus the research, it is important to choose an overall
infrastructure that requires processes to maintain high availability.
2.3. High Availability and Disaster Recovery
When deploying applications in an organization's data centre, managing high
availability and disaster recovery may involve switching traffic from one data centre to another
or periodically switching the active data centre to be the previously idle data centre. In these
situations, the organization is in control of the infrastructure and operations that occur within
the data centre.
In terms of handling disaster recovery in the cloud, there are more considerations that need
to be made. The cloud provider oversees the infrastructure and how the disaster recovery
situations are managed. It is the responsibility of the organization to design their applications
in such a way that they can manage disaster recovery situations and hence be highly available.
Disaster scenarios can occur due to human error within the datacentres, natural disasters
such as earthquakes or tornados destroying a datacentre or even just loss of power to a
datacentre within a specific region.
The following sections discuss disaster recovery terms as well as various disaster recovery
options in the cloud, with a focus on how each approach impacts on high availability and cost.
2.3.1. Disaster Recovery Metrics
In terms of disaster recovery metrics, there are two metrics which can be used to measure an
application. Recovery Time Objective (RTO) is the time it takes to
restore an organization's process to the agreed-upon service levels after a disruption or
disaster (Hamadah and Aqel 2019:1). For example, if a disaster were to occur at 1PM
and the RTO is 8 hours, the disaster recovery process should restore the organization's service
to the previously accepted service level by 9PM.
Recovery Point Objective (RPO) is the tolerable quantity of data loss for a system, measured
in units of time (Mendonça et al. 2019:2). For example, if a disaster were to occur at
2PM and the agreed RPO is 1 hour, the system needs to be capable of recovering the entire
dataset that was in the system before 1PM. In this situation, up to 1 hour of data, from
1PM to 2PM, may be lost.
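The two worked examples can be checked with a few lines of date arithmetic. The sketch below is illustrative only: the function names and the calendar date are assumptions introduced here, and only the times of day and the RTO and RPO values come from the examples in the text.

```python
from datetime import datetime, timedelta


def recovery_deadline(disaster_time: datetime, rto: timedelta) -> datetime:
    """Latest time by which service must be back at the agreed level."""
    return disaster_time + rto


def recovery_point(disaster_time: datetime, rpo: timedelta) -> datetime:
    """Data written before this time must be recoverable; later data may be lost."""
    return disaster_time - rpo


day = datetime(2022, 1, 10)  # hypothetical date, chosen only for illustration

# RTO example: disaster at 1PM with an RTO of 8 hours.
deadline = recovery_deadline(day.replace(hour=13), timedelta(hours=8))

# RPO example: disaster at 2PM with an RPO of 1 hour.
point = recovery_point(day.replace(hour=14), timedelta(hours=1))

print("Service restored by:", deadline.time())
print("Data recoverable up to:", point.time())
```

The deadline falls at 21:00 (9PM) and the recovery point at 13:00 (1PM), matching the two examples.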
2.3.2. Backup and Restore
Backup and Restore is a technique where backups of data are stored in a region; if a disaster
were to occur in that region, the data is exported to a separate region (Robinson et al. 2014:9).
In addition to exporting the data, the configuration, infrastructure and application code
must be redeployed in the new target region. A consequence of this process is that
the RTO and RPO would be high. There is a potential for data loss which may not be acceptable
in domains such as financial services. This approach would not guarantee high availability as
the time to copy the data to the new region as well as provision the required infrastructure
would lead to application downtime. On the flipside, this approach would be cost effective as
the applications are only deployed in one region at a time. Following a backup and restore
strategy may be suitable for charity organizations, static websites or problem domains
where data loss and downtime are acceptable.
2.3.3. Pilot Light
Pilot Light is a technique where data is replicated from one region to another region where a
core minimal version of an organization's workload infrastructure is in operation (Trovato et
al. 2019:5). Processes to replicate databases or file storage are always turned on. Application
servers are pre-installed with application code and configuration and left in a turned off state
unless testing is taking place or there is a disaster recovery situation. Systems in the DR region
will only be switched on when a disaster recovery situation occurs. Unlike Backup and
Restore, the core infrastructure is always ready to be turned on. The RTO and RPO for Pilot
Light are lower than for Backup and Restore but there is still the possibility of data loss when
switching regions as well as the application being unavailable. This approach will cost more
than Backup and Restore but it allows organizations to recover business critical applications
in a timelier fashion. Pilot Light may be suitable for organizations that have a small set of
critical applications with other applications being deemed non-mission-critical. The critical
applications will be in a ready to launch state in a separate region at any stage.
2.3.4. Warm Standby
Warm Standby is an approach where a fully functional, scaled-down copy of an organization's
production environment is available in a separate region (Robinson et al. 2014:14). This
approach extends the Pilot Light concept and decreases the time it takes to recover from a
disaster situation as the workload is always running in another region. The environment in
the DR region can be scaled up when required to guarantee it can manage the expected traffic
volumes. This approach will be more expensive than Pilot Light as the infrastructure in the DR
region is always running. It does however offer the benefit of a decreased RTO and RPO.
Warm Standby may be suitable for organizations that have business critical applications and
require high availability.
2.3.5. Multi-site Active / Active
Multi-Site Active/Active is an approach where an organization runs its
workload in multiple regions at the same time in an active/active or a hot standby
active/passive strategy (Robinson et al. 2014:16). The active/active strategy is used to serve traffic
from every region in which the application has been deployed. The hot standby strategy
is used when serving traffic from a single region only, with different regions used in a disaster
recovery situation. This approach is the most complex and expensive, but it is the only
approach which will guarantee high availability if following active/active. In the hot standby
approach, there is the possibility that users may not be able to access an application whilst
the hot standby version of the application becomes the primary version of the application.
This approach is the preferred option for organizations that require high availability and
cannot tolerate any level of downtime for their applications. It also offers the benefit of being
able to serve customers from the cloud region closest to their geographic
location.
2.3.6. Comparison of Disaster Recovery Options
The diagram in Figure 2.1 displays a comparison of the various disaster recovery options
with specific emphasis on the impact in relation to RTO and RPO.
Figure 2.1. Cloud Disaster Recovery Options
Understanding the various disaster recovery techniques is key to measuring
the high availability characteristics of an application. For mission critical applications, a
multi-site active / active approach may be needed. For low traffic applications that are not mission
critical, the backup and restore technique may suffice. Understanding the criticality of an
application helps to decide on the required high availability characteristics. The research
in this dissertation will outline the processes that can be developed to achieve high
availability.
2.4. Multi Region Architecture Considerations
2.4.1. Application
When developing applications to be deployed in a multi-region configuration, some
considerations need to be made. The application needs to be developed in such a way that it
can be deployed to separate regions with no code changes. Any items that require changes
should belong to configuration that is specific to a region. It is imperative that the application
can be deployed to net new regions with no code changes. There is a set of guidelines called
the Twelve Factor Application (Wurster et al. 2017:4) which should be followed for every
cloud application. These guidelines become more important as organizations discover the
need to deploy an application across multiple cloud regions.
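Keeping region-specific values in configuration rather than in code can be sketched as follows. This is a minimal illustration under stated assumptions: the environment variable names (APP_REGION, APP_DATABASE_HOST, APP_QUEUE_URL), the RegionConfig type and the example hostnames are all hypothetical and are not taken from the thesis or from any cloud provider.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RegionConfig:
    region: str
    database_host: str
    queue_url: str


def load_config(env: Mapping[str, str] = os.environ) -> RegionConfig:
    """Build the configuration from the environment; the code is identical in every region."""
    return RegionConfig(
        region=env["APP_REGION"],
        database_host=env["APP_DATABASE_HOST"],
        queue_url=env["APP_QUEUE_URL"],
    )


# The same code deployed to two regions; only the supplied environment differs.
primary = load_config({
    "APP_REGION": "region-a",
    "APP_DATABASE_HOST": "db.region-a.example.internal",
    "APP_QUEUE_URL": "https://queue.region-a.example.internal/orders",
})
secondary = load_config({
    "APP_REGION": "region-b",
    "APP_DATABASE_HOST": "db.region-b.example.internal",
    "APP_QUEUE_URL": "https://queue.region-b.example.internal/orders",
})
print(primary.region, secondary.region)
```

Deploying to a net new region then means supplying a new set of values, with no change to the application code.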
Another consideration that needs to be made is when connecting to services like databases
or message queues. In a scenario where these services are also deployed in a multi-region
fashion, organizations need to guarantee that if those services failover to another region, the
application can manage this situation. Using techniques such as top-level DNS entries can help
to ensure that applications are not concerned with what region a database or service is
deployed in. The responsibility of managing situations where a database or other service fails
over rests with the application developer. They must ensure the application can manage this
situation gracefully and continue responding successfully to user requests.
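One common way for an application to ride out a failover is to retry a failed call against the same top-level name until the service answers again. The sketch below assumes a hypothetical ServiceUnavailable error and a simulated query function; a real application would catch the connection errors raised by its own database or queue client.

```python
import time


class ServiceUnavailable(Exception):
    """Raised when the backing service cannot be reached (hypothetical error type)."""


def call_with_retry(operation, attempts=3, delay_seconds=0.0):
    """Call operation(), retrying with exponential backoff while the service is unavailable."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ServiceUnavailable:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            time.sleep(delay_seconds * (2 ** attempt))  # back off before retrying


# Simulated service that is unreachable for the first two calls, as it might
# be while a database fails over behind a stable DNS name.
calls = {"count": 0}


def query_orders():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ServiceUnavailable("db.example.internal is failing over")
    return ["order-1", "order-2"]


orders = call_with_retry(query_orders)
print(orders, "after", calls["count"], "attempts")
```

Because the caller only ever uses the stable name, it does not need to know which region eventually served the request.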
2.4.2. Database
When deploying databases, it is important to think of how the database will behave in the
event of a DR situation. For high availability, it is important the database service is deployed
in a multi-region fashion. Deciding which region contains the primary database is important
so every other database replica can keep coordinated with the main copy. The speed of
replication of data between regions is important to guarantee data consistency. As discussed
earlier, how applications connect to the database needs to be considered. In the world of
microservices with smaller services getting deployed, managing the number of connections
to the database is important to ensure the database does not get overloaded. It is vital to set
the maximum allowed connections on the database to an acceptable level, then work with
the application teams to guarantee this will satisfy the projected connection request
demands.
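The check against the connection limit is simple arithmetic: the sum over all services of instances multiplied by pool size must stay below the database maximum. The service names, instance counts, pool sizes and the limit of 100 below are hypothetical values chosen only to illustrate the calculation.

```python
def total_connections(pools: dict[str, tuple[int, int]]) -> int:
    """Sum of instances * pool size across all services."""
    return sum(instances * pool_size for instances, pool_size in pools.values())


# service name -> (number of instances, connections per instance); all hypothetical
pools = {
    "orders": (4, 10),
    "catalogue": (6, 5),
    "payments": (2, 10),
}
max_connections = 100  # hypothetical database limit

demand = total_connections(pools)
headroom = max_connections - demand
print(f"projected connections: {demand}, headroom: {headroom}")
```

With these figures the projected demand is 90 connections, which leaves a headroom of 10. Scaling any one service out further would require either a higher limit or smaller pools.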
Application teams should consider deploying read replica versions of databases across regions
to serve read only requests. This will free up the main database for write requests by taking
away the load which would have been generated by read requests.
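Routing reads to replicas and writes to the primary can be sketched as a small router. This is an illustration under stated assumptions: the DatabaseRouter class, the endpoint names and the rule of treating any statement that starts with SELECT as a read are all simplifications introduced here.

```python
import itertools


class DatabaseRouter:
    """Send read-only statements to replicas in turn; send everything else to the primary."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)  # round-robin across the replicas
        return self.primary


router = DatabaseRouter(
    primary="primary.db.example.internal",
    replicas=["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
first_read = router.endpoint_for("SELECT * FROM products")
second_read = router.endpoint_for("select name FROM products WHERE id = 1")
write = router.endpoint_for("INSERT INTO orders (id) VALUES (1)")
print(first_read, second_read, write)
```

Because replication between regions takes time, a read that must observe the most recent write may still need to go to the primary.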
Finally with databases, it is important a backup strategy is in place to ensure if an issue arises
in any region, the database can be restored to a known good state.
It is important to consider the multi region architecture decisions for each layer of an
application. What will work for a frontend application will not work for a database. This
section has been included to highlight that this research will look at high availability across an
entire application tier.
2.5. Region Failover Considerations
Performing a region failover needs to be a task that has been planned and evaluated before
the event happens unexpectedly in production. By testing out the process, minor issues
such as missing credentials or invalid paths for application source code can be found and
rectified. When failing over an application between regions, topics such as the database
connection string as well as the top-level DNS to use, need to be seriously considered by
application teams.
It is important that a region failover can occur as efficiently as possible in a production-like
environment. Any delay in performing a failover can result in an adverse impact on the overall
availability of an application, which in turn may lead to disgruntled customers. For the ideal
scenario, a region failover should be transparent to the end customer and every step should
occur in an automated fashion. DR strategies such as Warm Standby and Multi-site
Active/Active can help to make the process of a region failover smoother.
When architecting applications, it is imperative to choose cloud service offerings that will
work for the application whilst also supporting multi-region capabilities. Choosing the
cloud services wisely can simplify the regional failover process.
For UI based applications and REST based services, high availability when it comes to regional
failover can be obtained using load balancers. In a paper which discusses high availability in
the cloud, the authors discuss using the Hadoop software library for managing high
availability.
“Rather than rely on hardware to deliver high availability, the library itself is
designed to detect and manage failures at the application layer, so delivering a
highly available service(s) on top of a cluster of computers, each of which may be
prone to failures” (Singh et al. 2012)
The paper further discusses how hardware can fail, which could in effect make nodes inactive
when in fact they could service traffic. Choosing a software-based load balancing approach
over a hardware-based approach allows the load balancing to be tweaked to suit the
application's needs.
The focus of this research is the ability to divert traffic to different regions in the event of a
disaster situation. It is important to understand that any layer within the application can fail.
The ability to handle this failure gracefully will prove crucial in validating the success of this
research. The practical element of this dissertation will outline a solution to prove this
approach is feasible.
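The idea of diverting traffic away from a failed region can be sketched as a priority list combined with a health check. This is not the solution developed in the practical element; the choose_region function, the region names and the health data are hypothetical and stand in for whatever a load balancer or DNS service would provide.

```python
from typing import Callable


def choose_region(regions: list[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy region in priority order."""
    for region in regions:
        if is_healthy(region):
            return region
    raise RuntimeError("no healthy region available")


# Hypothetical health status, as a monitoring probe might report it.
health = {"region-a": False, "region-b": True}

target = choose_region(["region-a", "region-b"], lambda region: health.get(region, False))
print("routing traffic to", target)
```

With region-a reported as unhealthy, traffic is routed to region-b. Once region-a recovers, the same call would select it again because it comes first in the priority order.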
2.6. Code Deployment
2.6.1. Manual
When it comes to deploying code to a cloud service, the quickest approach is to package the
code up on the developer’s machine and manually deploy to the cloud service. This approach
is sufficient for quick proof of concept projects or demonstrations, but it soon becomes very
inefficient. The time taken to package the code, run the automated tests,
log into the cloud provider console, upload the packaged artifact, and deploy adds
up daily. If the process takes 10 minutes and the developer attempts 6 deployments a day,
this is an hour taken up in that developer’s day.
This approach is error prone and can lead to issues further on in a project’s lifecycle. Required
dependencies to build an artifact or run tests may exist only on the developer’s machine. The steps
to properly execute the process may not be documented or properly defined. Overall, this
makes tasks more complex for future developers who may inherit this work.
The longer a manual process is followed, the more complex it is to obtain buy-in from
management to spend time on automating this task.
2.6.2. Automated
In well-structured teams, there is evidence of rigorous CI/CD processes:
• Code is stored in a code management tool
• Code is built using pipelines
• Automatic testing of the code is performed in the pipeline
• Every deployment to the cloud is automated
In a situation where an automated pipeline exists, it is more straightforward to extend a
pipeline to add code quality check tools, vulnerability checkers and other tools which may
improve the overall codebase.
Removing manual steps from the process of deploying code ensures there is an accurate,
repeatable process in place for deploying code to a production environment. As will be
discussed in a subsequent section, there are many benefits to using pipelines, not least the
amount of developer time that will be saved by not having to manually deploy code.
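The core behaviour of an automated pipeline, running stages in order and stopping at the first failure so that broken code is never deployed, can be sketched in a few lines. The run_pipeline function and the stage names are hypothetical; real pipeline tools express the same idea in their own configuration syntax.

```python
from typing import Callable


def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run stages in order and stop at the first failure; return the stages that passed."""
    completed = []
    for name, step in stages:
        if not step():
            print(f"stage '{name}' failed; stopping pipeline")
            break
        completed.append(name)
    return completed


executed = []  # records which stages actually ran


def make_stage(name: str, succeeds: bool) -> Callable[[], bool]:
    """Build a stand-in stage that records its execution and reports a fixed result."""
    def step() -> bool:
        executed.append(name)
        return succeeds
    return step


stages = [
    ("build", make_stage("build", True)),
    ("test", make_stage("test", False)),  # simulate a failing test run
    ("deploy", make_stage("deploy", True)),
]
completed = run_pipeline(stages)
print("completed:", completed, "executed:", executed)
```

Because the test stage fails, the deploy stage is never executed. Adding a code quality or vulnerability check amounts to adding another entry to the list of stages.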
2.6.3. Hybrid
In a hybrid approach, there is an automated pipeline in place but certain steps in the process
require manual approval. Being fully confident of shipping code directly from source control
to production with no manual checking requires a full suite of unit tests, integration tests and
performance tests. If a project is not at that stage of its evolution, the best that can be done
is to deploy code to a non-production environment and perform sanity checks / testing in that
environment before approving the deployment to production.
It would be ideal to be able to automatically deploy code to production but in cases where
this is not possible, the manual approval is a safeguard to ensure rogue code does not
inadvertently find its way into production.
The hybrid approach may be used in organizations that have a rigid change control process
involved for production installations. In this scenario, the manual step could be to enter a
ticket number for a fully approved change ticket before the change is deployed.
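A change ticket gate of this kind can be sketched as a check that runs before the production deployment step. Everything specific here is an assumption made for illustration: the CHG- ticket format, the may_deploy_to_production function and the in-memory set of approved tickets, which in practice would be a lookup against the organization's change management system.

```python
import re

TICKET_PATTERN = re.compile(r"CHG-\d{4,}")  # hypothetical ticket number format


def may_deploy_to_production(ticket_id: str, approved_tickets: set[str]) -> bool:
    """Allow the deployment only for a well-formed ticket that has been fully approved."""
    well_formed = TICKET_PATTERN.fullmatch(ticket_id) is not None
    return well_formed and ticket_id in approved_tickets


approved = {"CHG-1024"}  # stand-in for a change management system lookup

print(may_deploy_to_production("CHG-1024", approved))
print(may_deploy_to_production("CHG-9999", approved))
print(may_deploy_to_production("1024", approved))
```

Only the first call permits the deployment: the second ticket is well formed but not approved, and the third is not a valid ticket number.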
The following diagram highlights what an approval may look like in a sample GitHub Actions
pipeline.
Figure 2.2. GitHub Actions - Manual Approval
Having an approach for managing code deployment is vital to ensuring there are processes in
place to automatically handle regional outages. Multiple developers can work on the overall
process, and it can be refined over time as well as shared with other groups. These approaches
play a small part in the overall process of achieving high availability.
2.7. Code Management
2.7.1. Single Developer Projects
When it comes to projects that involve just one developer, speed of development is often
treated as a priority over following standards. It is effortless to develop code on a developer’s
machine, ignore unit tests and deploy the same code from the developer’s machine. In cases
where the developer may be developing a proof of concept for a larger design, this approach
is justified. In larger projects intended for production use, the pitfalls of ignoring standards
could decrease the quality of the generated project which over time may impact on the
product. Potential pitfalls that may be encountered by not following a set of standards
include:
• No source control system in place:
o Harder to onboard new developers to the project
o Potential loss of code if a developer’s machine is lost, stolen or damaged
• No unit tests developed for project:
o Issues that could have been found and resolved with unit tests make their
way to production
o Potential to introduce defects with every release
o Low level of confidence that a slight code change will not have a negative effect
on the rest of the codebase
• Deploying code to production from a developer’s machine:
o The process to deploy code to production is only known by 1 developer
o Potential inconsistency in the artifact(s) deployed to production
In an article that focuses on software development for individual developers, the authors explain
that standards which apply to team projects (MIDS in this case) can be easily applied in single
developer projects without changing the core essence of the standard (de León-Sigg et al.
2018).
It is important that standards are followed where possible. An adoption of standards will not
only improve the overall quality of the code delivered, but it will also help to simplify the
onboarding of new developers to the project. When multiple developers are on the project,
the process of managing code changes will be simplified. In an article related to coding
practices, the authors discuss some techniques which can be used to improve code readability
which in turn will help to devise the coding standards for a project (dos Santos and Gerosa
2018). Using techniques in the paper by dos Santos and Gerosa (2018) will help to improve a
project whilst also helping to move away from the single developer mindset.
2.7.2. Source Control
The importance of using source control for any project cannot be overstated. As discussed
in the previous section, using source control can help
move a project away from the single developer mindset. Using a source control system
makes the process of collaboration amongst a team more straightforward. The collaboration
benefits of using a source control tool like Git are evident in an article where the author
discusses using Git to foster teamwork in the South African classroom (Blauw 2018).
Git can store code for small projects as well as large enterprise grade projects. It can be used
for projects developed in any language and has many features such as branching and pull
requests which can be used by developers collaborating on projects. When starting with Git,
it is important that the team members decide on the branching strategy to follow.
2.7.2.1. Branching Model: GitFlow
GitFlow is a branching strategy that employs the use of feature branches and multiple primary
branches (Atlassian n.d.). GitFlow utilizes branches that are longer lived and contain larger
commits. When using this strategy, developers can create feature branches and delay the
merging of code into the main branch until the feature is fully implemented. A downside of
this approach with long-lived feature branches is the increase in the collaboration required
amongst developers to merge changes. It is also easy for conflicting updates to be
introduced by developers. Refer to the diagram in Figure 2.3 for an overview of the GitFlow
Branching Strategy.
GitFlow works best:
• For managing an open-source project as all code must be checked in pull requests
• When there are mostly junior developers on the team who can preview their changes
on long lived feature branches before merging into the main branch
• When the product that you are maintaining is well established as future changes are
minimal and need to be monitored closely.
Cases to avoid GitFlow are:
• When you are starting a project as the pull request process can slow down the task of
generating an MVP
• When you need to iterate quickly as the pull request process can get in the way
• When there are mostly senior developers on the team as they are trusted and should
be given the autonomy to do their job
Figure 2.3. GitFlow Branching Strategy.
2.7.2.2. Branching Model: Trunk
Trunk based development is a source control branching model which allows developers to
merge smaller, more frequent updates to the core main or trunk branch (paul-hammant n.d.).
As the trunk-based approach streamlines the merging and integration phases, it helps bring
about continuous integration and continuous deployment as well as increasing the pace of
software delivery. The diagram represented in Figure 2.4 gives an overview of the Trunk Based
Branching Model. High-performing engineering teams use the trunk-based development
strategy as it sets and maintains a simplified Git branching strategy for teams. It also gives
teams the flexibility and control over how and when software is delivered to customers.
Trunk based development works best (‘Trunk-based Development vs. Git Flow’ n.d.):
• When a project is just starting up as it offers maximum development speed
• When you need to iterate quickly as the trunk-based approach allows you to change
the product quickly when required
• When there are mostly senior developers on the team
Cases to avoid the Trunk based approach are:
• When you run open-source projects as those projects are more suited to GitFlow
• When the product is established, or you have large teams as strict control is required.
GitFlow is recommended for this scenario.
• When there are mostly junior developers on the team
Figure 2.4. Trunk Based Branching Model.
2.7.3. DevOps Code Pipelines
When it comes to deploying code, it is imperative to implement a pipeline strategy to achieve
the goal. A pipeline can take the manual steps away from deploying code and replace those
steps with a repeatable process. A pipeline not only provides structure for
deployments, but can also be used to run code quality checks, run various forms of tests
and drastically decrease the time required for a developer to deploy code to a
production environment.
The use of a pipeline is necessary when implementing Continuous Integration and Continuous
Deployment for a project. By having a pipeline that is executed regularly, it can supply a
benchmark for improving the overall quality of the project. A pipeline can be treated like a
code artifact which can evolve over time. In a recently reviewed article, the authors highlight
the importance of pipelines by stating they are mainly used for continuously executing steps
to ensure an application can be deployable at any time (Beetz and Harrer 2021).
A pipeline can be basic at the beginning with iterations taking place to add extra features for
tasks such as code validation or running tests. Once a pipeline structure is in place, there is
no limit to what can be achieved during the lifetime of the pipeline.
There are benefits to implementing pipelines but there are also challenges including:
• Choosing the pipeline technology to use. There are various open-source as well as
commercial pipeline options available. Choosing the pipeline technology to use for a
project can be difficult.
• The ramp up time for developers to learn a particular pipeline technology or syntax
needs to be factored in when deciding on the technology to use.
• Maintaining the pipeline infrastructure if a self-hosted pipeline technology is chosen
• Managing the security for integrations (e.g., credentials for deploying to a cloud
provider)
• Finally, switching between pipeline options is not a trivial task and has the potential
to introduce the need for re-work on the code pipelines.
This research will show how a combination of DevOps processes and techniques will form
the grounding for building a highly available solution.
2.8. Cloud Application Architecture
This section will examine the various application types that can be architected and developed
as part of this research. A key aim of this research is to implement a solution for deploying a
highly available application that will function in the event of sample test outages. It is this
author's opinion that the best way of achieving this is to develop a 3-tier application. A 3-tier
application consists of a presentation layer (frontend), an application layer (backend code) and
a data layer (data storage). A benefit of a 3-tier application is the ability to foster the reuse
of software components between different applications (Abdelrahman et al. 2020).
When it comes to the various application tiers, each tier has its own responsibilities. The
frontend is the gateway to the world: it contains any user interfaces which
can be used by end customers. The frontend will contain visual screens which simplify the
process of interacting with the backend. There is a vast array of programming languages
and frameworks available to develop frontend applications with further technologies being
developed on a regular basis.
The backend performs the heavy lifting for the application. The backend runs any business
logic in response to events from the frontend. With the frontend being the gateway for
customers, the backend is the gateway to the required data. The backend can be accessed by
advanced users or systems using API calls but for most users, the interaction with the backend
is via the frontend. Like the frontend, there is a vast array of technologies and frameworks
which can be used to develop backend applications.
Finally, the data layer (database) is the most important part of any application. The data layer
has the responsibility of storing the data which is accessed by the backend processes and is
subsequently rendered to customers in the frontend. Every application has its own data
requirements, and these will be touched on in this section. There is a vast array of different
technology choices which are available when it comes to the data layer. This research will
expand on the 3-tier application and highlight key considerations that need to be
implemented to make an application highly available.
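The separation of responsibilities between the three tiers can be sketched in a single file. This is a toy illustration only: the class names, the product data and the plain-text "page" are assumptions introduced here, and in a real deployment each tier would be a separately deployed component.

```python
class DataLayer:
    """Data tier: stores and retrieves data (an in-memory stand-in for a database)."""

    def __init__(self):
        self._products = {1: "Keyboard", 2: "Monitor"}

    def get_product(self, product_id: int):
        return self._products.get(product_id)


class Backend:
    """Application tier: runs business logic in response to frontend requests."""

    def __init__(self, data: DataLayer):
        self._data = data

    def product_details(self, product_id: int) -> dict:
        name = self._data.get_product(product_id)
        if name is None:
            return {"status": 404, "error": "product not found"}
        return {"status": 200, "product": {"id": product_id, "name": name}}


def render_product_page(backend: Backend, product_id: int) -> str:
    """Presentation tier: turns a backend response into something a customer sees."""
    response = backend.product_details(product_id)
    if response["status"] != 200:
        return "Sorry, that product does not exist."
    return f"Product: {response['product']['name']}"


backend = Backend(DataLayer())
print(render_product_page(backend, 1))
print(render_product_page(backend, 3))
```

The frontend function never touches the data layer directly, which is what allows each tier to be replaced, scaled or failed over independently.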
2.8.1. Frontend Application
2.8.1.1. Monolithic Frontend
A monolithic frontend application is a feature-rich, powerful browser-based application which
interacts with micro services in the backend. Over time the frontend layer grows and may be
developed by separate teams. In this situation, the frontend application becomes more
difficult to maintain as it grows and adds new functionality.
The diagram below in Figure 2.5 depicts a high-level architecture of a monolithic frontend for
a Shop application. As visible, there are multiple microservices in the backend but only one
frontend application, which may be maintained and developed by multiple teams.
The monolithic frontend is an anti-pattern which can occur over time on a frontend project.
Pavlenko et al. discuss how a frontend with a monolithic architectural style is difficult to scale
and in some cases, impossible to scale (Pavlenko et al. 2020). Teams may still want to develop
features concurrently, but this may not be possible in all cases. Pavlenko et al. argue that the
use of micro frontends is a solution to this problem.
Figure 2.5. Monolithic Frontend
2.8.1.2. Micro-frontend
A micro-frontend is a pattern where web application user interfaces are composed from
independent fragments which may be built by different teams using a broad array of
technologies. A micro-frontend architecture resembles a microservice backend architecture
where the backend is composed of independent microservices.
Various approaches exist in which micro frontends can be implemented in terms of splitting
up functionality (Mezzalira 2021). In the horizontal split approach, multiple micro-frontends
can exist within the same UI view. Multiple teams will be responsible for distinct parts of the
view and must coordinate their efforts. This approach offers flexibility in that teams can share
functionality, but teams also need to be careful not to introduce unnecessary micro-frontends
within the same project. This approach is suitable for large sites with an extensive feature set
such as shopping sites. One team could develop the catalogue while another team could
develop product recommendations.
The second approach is a vertical split, where the individual teams are accountable for a
particular problem domain. In this approach, it is harder to share code between teams, but it
allows flexibility in terms of deployments. These systems are developed as individual systems
but branded with a company header and footer to give the appearance of the systems
belonging together. This approach is suitable for systems such as company intranets where
different teams develop different intranet sites, but they all utilize a common theme such as
colours and fonts.
The horizontal and vertical split approaches both have the same end goal, which is to split up
the frontend code into smaller more manageable chunks. Various teams can potentially work
on different codebases.
Benefits of using micro-frontends include:
• Micro sites are technology agnostic – teams can use different technologies
• The generated applications are independent and self-contained
• Multiple teams can work on distinctive features
• Development and deployment of the individual micro-frontends may be faster
Figure 2.6 depicts how the code for micro-frontends can exist in the same source control
repository or in different source control repositories. The goal of the CI/CD pipelines is to
build the individual micro-frontends, which are then combined into the overall deployed
frontend application.
2.8.1.3. Single Page Application (SPA)
A user interface that operates directly inside the browser and does not require a page
reload when navigating between pages is referred to as a Single Page Application. This is
achieved by the browser loading JavaScript chunks on page load which contain all the
logic that the browser depends on. Any requests to the backend for data are made
asynchronously using AJAX requests. Jadhav et al. discuss
creating a single page application using AngularJS; however, they do not delve into topics such as
server-side rendering or authentication (Jadhav et al. n.d.). The focus of their article is
purely on getting started with developing a single page application. In an article on
comparable topics, the authors delve further into topics
such as performance and reuse of components, which are important for modern-day
applications (Gavrilă et al. 2019).
2.8.2. Backend Application
2.8.2.1. Monolith
Like the frontend monolith, a backend monolith is essentially one project which contains all the
business logic. All teams work on the same project and need to coordinate changes with
each other. Any code change to one module in the monolith could affect the other services
within the project. As the project grows, changes become harder to develop and to cover
with automated testing. Routine maintenance of the project could become a more complex
task for developers.
Figure 2.6. Micro Frontend Architecture
When deploying monolithic backend applications to the cloud, the deployment options are
limited due to the size of the artifact to deploy and other constraints within the application.
Creating monolithic applications makes the development process more straightforward than
creating microservices. Monolithic applications offer easier development and deployment
but fall short in terms of maintainability, reliability, availability and
scalability (Gos and Zabierowski 2020).
2.8.2.2. Microservice
A microservice architecture is where an application is structured as a collection of smaller
services that have the following attributes:
• Are highly testable and maintainable
• Have a more straightforward development process than with monolith applications
• Can be deployed independently of the other services
• Are loosely coupled
• Are focused on a business capability
• Are owned by a small team
In the world of agile, microservices are a great enabler for rapid, frequent, and reliable delivery
of applications to a production environment. Challenges can arise in cases where microservices
need to communicate with each other. In those scenarios, queues or message
bus technologies can be used for asynchronous communication. For synchronous
communication, HTTP calls with associated retries will need to be implemented.
There is a wide array of technologies which can be used for microservices, as well as steps
for migrating to microservices (Larrucea et al. 2018). Larrucea et al. discuss the pitfalls of
microservices, but it is this author's opinion that they could have discussed the complexities of
managing highly available applications as they relate to microservices.
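The synchronous case mentioned above, a call with associated retries, can be sketched as follows. This is one possible approach under stated assumptions rather than a prescribed implementation: `TransientError`, `call_with_retries` and `flaky_inventory_lookup` are hypothetical names, exponential backoff is one common choice of retry policy, and the remote HTTP call is simulated by a local function so that the example runs without a network.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. a timeout."""


def call_with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Invoke `call`, retrying transient failures with exponential backoff.

    The delay doubles after each failed attempt. The final failure is
    re-raised so the caller can fall back or report the outage.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Simulated downstream service: fails twice, then succeeds.
calls = {"count": 0}


def flaky_inventory_lookup() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("service unavailable")
    return "in stock"


# Record the delays instead of sleeping so the example runs instantly.
delays: List[float] = []
result = call_with_retries(flaky_inventory_lookup, attempts=3, sleep=delays.append)
print(result, calls["count"], delays)
```

Only errors classed as transient are retried here; retrying every failure could repeat requests that are not safe to repeat, so which errors qualify is a design decision for each service.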


Damien Gallagher Dissertation

  • 1. Architecting Cloud Applications for High Availability Damien Gallagher M.Sc. in Computing in DevOps 2022
  • 2. Computing Department, Atlantic Technological University, Port Road, Letterkenny, Co. Donegal, Ireland. Architecting Cloud Applications for High Availability Author: Damien Gallagher Supervised by: Ruth G. Lennon A thesis submitted in partial fulfilment of the requirements for the Master of Science in Computing in DevOps Submitted to Quality and Qualifications Ireland (QQI) Dearbhú Cáilíochta agus Cáilíochtaí Éireann January 2022
  • 3. 1 Declaration I hereby certify that the material, which l now submit for assessment on the programmes of study leading to the award of Master of Science in Computing in DevOps, is entirely my own work and has not been taken from the work of others except to the extent that such work has been cited and acknowledged within the text of my own work. No portion of the work contained in this thesis has been submitted in support of an application for another degree or qualification to this or any other institution. I understand that it is my responsibility to ensure that I have adhered to ATU’s rules and regulations. I hereby certify that the material on which I have relied on for the purpose of my assessment is not deemed as personal data under the GDPR Regulations. Personal data is any data from living people that can be identified. Any personal data used for the purpose of my assessment has been pseudonymised and the data set and identifiers are not held by ATU. Alternatively, personal data has been anonymised in line with the Data Protection Commissioners Guidelines on Anonymisation. I consent that my work will be held for the purposes of education assistance to future students and will be shared on the ATU Computing website (www.lyitcomputing.com) and Research THEA website (https://research.thea.ie/). I understand that documents once uploaded onto the website can be viewed throughout the world and not just in Ireland. Consent can be withdrawn for the publishing of material online by emailing Thomas Dowling; Head of Department at Thomas.Dowling@atu.ie to remove items from the ATU Computing website and by emailing Denise McCaul; Systems Librarian at Denise.McCaul@atu.ie to remove items from the Research THEA website. Material will continue to appear in printed formats once published and as websites are public medium, ATU cannot guarantee that the material has not been saved or downloaded. Signature of Candidate Date
  • 4. 1 Acknowledgements I would like to thank my supervisor, Ruth Lennon for all the guidance and support she has provided throughout this process. I would also like to mention the colleagues who I worked with throughout the master’s program. We did not know each other prior to the program but we have become close over the past few years. Finally, I would like to dedicate this dissertation to my wonderful wife Tracy as well as my amazing sons, Jayden and Logan. They gave me the time I needed to complete this program as well as listen to my thoughts about the topic I was currently exploring. I look forward to watching them grow and pursue their own careers in the future.
  • 5. D. Gallagher 2 Abstract Outages with a cloud provider can lead to applications becoming unavailable and costing organizations in terms of loss of earnings as well as damaging their reputation. Avoidable application outages can also occur due to misaligned DevOps processes. This dissertation aims to prove that consistent DevOps processes as well as regional based application monitoring can help to maintain highly available applications for organizations. Cloud providers aim to a guarantee a certain level of uptime for their services, but they too encounter unforeseen outages. This dissertation discusses the various tools and techniques that can be employed to help with maintaining the high availability of an application. An accompanying research artifact also demonstrates the techniques that are discussed as well as outlining the various tests that were executed against the developed solution. The output of the practical was successful in that outages due to inconsistent DevOps processes and outages at a regional level were mitigated. It was identified that some cloud services were not suitable for a certain subset of applications that require high availability and overall, the costs can increase exponentially. This research has proven that a highly available application architecture is possible when deployed on a cloud platform. This research focuses on 1 small area of the overall problem domain, but it has the potential to form a basis to experiment in highly available applications for other problem domains as well managing highly available applications when deployed on other cloud providers.
  • 6. D. Gallagher 3 Table of Contents
    Declaration ... 1
    Acknowledgements ... 1
    Abstract ... 2
    Table of Contents ... 3
    Table of Figures ... 9
    Table of Tables ... 10
    Table of Code Listings ... 10
    Nomenclature ... 11
    1. Introduction ... 1
    1.1. Purpose ... 2
    1.2. Background ... 3
    1.3. Problem Statement ... 3
    1.4. Research Question ... 4
    1.5. Scope and Limitations ... 5
    1.6. Methodological Approach ... 5
    1.7. Report Outline ... 6
    2. Literature Survey ... 7
    2.1. Overview of Cloud Computing ... 7
    2.1.1. Why deploy code to the Cloud vs On Premises ... 7
    2.1.2. Challenges faced when deploying to the Cloud ... 8
    2.1.3. Types of Outages ... 9
    2.2. Infrastructure ... 10
    2.2.1. Virtual Machines ... 10
    2.2.2. Cloud Service Offerings ... 10
    2.2.3. Cloud Service Decision Matrix ... 11
    2.3. High Availability and Disaster Recovery ... 11
    2.3.1. Disaster Recovery Metrics ... 12
    2.3.2. Backup and Restore ... 12
    2.3.3. Pilot Light ... 13
    2.3.4. Warm Standby ... 13
    2.3.5. Multi-site Active / Active ... 14
    2.3.6. Comparison of Disaster Recovery Options ... 14
  • 7. D. Gallagher 4
    2.4. Multi Region Architecture Considerations ... 15
    2.4.1. Application ... 15
    2.4.2. Database ... 15
    2.5. Region Failover Considerations ... 16
    2.6. Code Deployment ... 17
    2.6.1. Manual ... 17
    2.6.2. Automated ... 18
    2.6.3. Hybrid ... 18
    2.7. Code Management ... 19
    2.7.1. Single Developer Projects ... 19
    2.7.2. Source Control ... 20
    2.7.2.1. Branching Model: GitFlow ... 21
    2.7.2.2. Branching Model: Trunk ... 22
    2.7.3. DevOps Code Pipelines ... 23
    2.8. Cloud Application Architecture ... 24
    2.8.1. Frontend Application ... 25
    2.8.1.1. Monolithic Frontend ... 25
    2.8.1.2. Micro-frontend ... 25
    2.8.1.3. Single Page Application (SPA) ... 27
    2.8.2. Backend Application ... 27
    2.8.2.1. Monolith ... 27
    2.8.2.2. Microservice ... 28
    2.8.2.3. Serverless ... 29
    2.8.3. Database ... 30
    2.8.3.1. Relational Database ... 30
    2.8.3.2. NoSQL ... 30
    2.8.3.3. File / Object Storage ... 31
    2.9. Networking ... 32
    2.9.1. Load Balancing ... 33
    2.9.2. Content Delivery Network (CDN) ... 33
    2.10. Monitoring ... 34
    2.10.1. Active Monitoring ... 34
    2.10.2. Passive Monitoring ... 36
    2.11. Observability ... 36
  • 8. D. Gallagher 5
    2.12. Security and Regulatory Compliance ... 37
    2.12.1. Security ... 37
    2.12.2. Compliance ... 38
    2.13. Chapter Conclusions ... 39
    3. Design ... 41
    3.1. System Context Diagram ... 41
    3.2. Cloud Provider ... 42
    3.2.1. Amazon Web Services (AWS) ... 42
    3.2.2. Azure ... 43
    3.2.3. Google Cloud Platform (GCP) ... 43
    3.2.4. Selection Criteria ... 44
    3.2.5. Final Selection Justification ... 44
    3.3. Cloud Infrastructure ... 45
    3.3.1. Virtual Machines (VM) ... 45
    3.3.2. Containers ... 46
    3.3.3. Serverless ... 47
    3.3.4. Selection Criteria ... 48
    3.3.5. Final Selection Justification ... 48
    3.4. DevOps Tooling ... 48
    3.4.1. Source Control ... 49
    3.4.1.1. GitHub ... 49
    3.4.1.2. AWS CodeCommit ... 49
    3.4.1.3. Gitea ... 50
    3.4.1.4. Selection Criteria ... 50
    3.4.1.5. Final Selection Justification ... 50
    3.4.2. Pipelines ... 51
    3.4.2.1. GitHub Actions ... 51
    3.4.2.2. AWS CodePipeline ... 52
    3.4.2.3. Jenkins ... 52
    3.4.2.4. Selection Criteria ... 53
    3.4.2.5. Final Selection Justification ... 53
    3.4.3. Infrastructure As Code (IAC) ... 54
    3.4.3.1. AWS CloudFormation ... 54
    3.4.3.2. Cloud Development Kit (CDK) ... 54
  • 9. D. Gallagher 6
    3.4.3.3. Terraform ... 55
    3.4.3.4. Selection Criteria ... 56
    3.4.3.5. Final Selection Justification ... 56
    3.5. Application Architecture ... 56
    3.5.1. Frontend Application ... 56
    3.5.1.1. HTML and JavaScript ... 57
    3.5.1.2. React ... 57
    3.5.1.3. Angular ... 58
    3.5.1.4. Selection Criteria ... 59
    3.5.1.5. Final Selection Justification ... 59
    3.5.2. Backend Application ... 59
    3.5.2.1. Java ... 59
    3.5.2.2. NodeJS ... 60
    3.5.2.3. Python ... 60
    3.5.2.4. Selection Criteria ... 61
    3.5.2.5. Final Selection Justification ... 61
    3.5.3. Database ... 62
    3.5.3.1. Relational Database ... 62
    3.5.3.2. NoSQL ... 63
    3.5.3.3. File / Object Storage ... 64
    3.5.3.4. Selection Criteria ... 64
    3.5.3.5. Final Selection Justification ... 64
    3.6. Application High Availability ... 65
    3.6.1. Static IP Address ... 65
    3.6.2. Load Balancer ... 65
    3.6.3. AWS Global Accelerator ... 66
    3.6.4. Amazon API Gateway ... 67
    3.6.5. Amazon Route53 ... 67
    3.6.6. Amazon CloudFront ... 68
    3.6.7. Selection Criteria ... 68
    3.6.8. Final Selection Justification ... 68
    3.7. Application Monitoring and Observability ... 69
    3.7.1. Active Monitoring ... 69
    3.7.2. Passive Monitoring ... 69
  • 10. D. Gallagher 7
    3.7.3. Observability ... 70
    3.7.4. Selection Criteria ... 70
    3.7.5. Final Selection Justification ... 70
    3.8. Pre-Implementation Details ... 71
    3.9. Chapter Conclusions ... 71
    4. Implementation ... 72
    4.1. System Context Diagram ... 72
    4.1.1. Single Region Deployment ... 72
    4.1.2. Multi Region Deployment ... 74
    4.2. Cloud Provider ... 76
    4.3. Cloud Infrastructure ... 77
    4.4. DevOps Tooling ... 77
    4.4.1. Source Control ... 77
    4.4.2. Pipelines ... 77
    4.4.3. Infrastructure As Code (IAC) ... 80
    4.5. Application Architecture ... 81
    4.5.1. Frontend Application ... 81
    4.5.2. Backend Application ... 82
    4.5.3. Database ... 83
    4.6. Application High Availability ... 84
    4.6.1. Frontend Application ... 85
    4.6.2. Backend Application ... 86
    4.7. Application Monitoring and Observability ... 87
    4.7.1. Active Monitoring ... 87
    4.7.2. Passive Monitoring ... 88
    4.8. Chapter Conclusion ... 89
    5. Results ... 91
    5.1. Test Strategy ... 91
    5.2. Application Unit Testing ... 91
    5.2.1. Frontend Application ... 91
    5.2.2. Backend Application ... 92
    5.3. Using Integration Tests to Assist with High Availability ... 92
    5.4. Using Functional Tests to Validate a DevOps Pipeline ... 94
    5.5. Performance Testing of Cloud Applications ... 95
  • 11. D. Gallagher 8
    5.5.1. Performance Test Results ... 96
    5.5.2. Performance Test Observations ... 97
    5.6. Load Testing Highly Available Cloud Applications ... 97
    5.6.1. Load Test Results ... 98
    5.6.2. Load Test Observations ... 99
    5.7. Performing Security Testing on the Cloud ... 99
    5.8. UI Testing of Multi Region Cloud Applications ... 101
    5.9. API Testing of Multi Region Cloud Applications ... 103
    5.10. Chaos Testing of Multi Region Cloud Applications ... 105
    5.11. Cost Analysis of Highly Available Cloud Applications ... 105
    5.12. Chapter Conclusion ... 108
    6. Conclusions ... 109
    6.1. Conclusions on the State of the Art ... 109
    6.2. Conclusions on Practical Element ... 109
    6.2.1. Technologies to Enable High Availability ... 110
    6.2.2. Tools and Techniques for Maintaining High Availability ... 111
    6.2.2. Effects of Developing for High Availability ... 112
    6.3. Limitations Discussion ... 113
    6.4. Further Work ... 114
    Appendices ... 1
    Appendix A: References ... 1
    Appendix B: Code Listing ... 6
    Appendix C: Test Project Locally ... 7
    Appendix D: Configure AWS Real User Monitoring ... 8
  • 12. D. Gallagher 9 Table of Figures
    FIGURE 1.1. SOFTWARE DELIVERY SCHEDULE ... 1
    FIGURE 2.1. CLOUD DISASTER RECOVERY OPTIONS ... 14
    FIGURE 2.2. GITHUB ACTIONS - MANUAL APPROVAL ... 19
    FIGURE 2.3. GITFLOW BRANCHING STRATEGY ... 21
    FIGURE 2.4. TRUNK BASED BRANCHING MODEL ... 22
    FIGURE 2.5. MONOLITHIC FRONTEND ... 25
    FIGURE 2.6. MICRO FRONTEND ARCHITECTURE ... 27
    FIGURE 2.7. SYNTHETIC MONITORING ... 35
    FIGURE 3.1. SYSTEM CONTEXT DIAGRAM – HIGH LEVEL ... 42
    FIGURE 3.2. CLOUD PROVIDERS MARKET SHARE (‘INFOGRAPHIC: CLOUD MARKET SHARE’ N.D.) ... 45
    FIGURE 3.3. GITHUB ACTIONS - SAMPLE WORKFLOW ... 51
    FIGURE 4.1. SYSTEM CONTEXT DIAGRAM - SINGLE REGION DEPLOYMENT ... 73
    FIGURE 4.2. SYSTEM CONTEXT DIAGRAM - MULTI REGION DEPLOYMENT ... 75
    FIGURE 4.3. FRONTEND APPLICATION PIPELINE ... 78
    FIGURE 4.4. GITHUB ACTIONS BACKEND PIPELINE - DEPLOY APPLICATION STEPS ... 79
    FIGURE 4.5. TERRAFORM CLOUD CONSOLE ... 81
    FIGURE 4.6. FRONTEND APPLICATION ... 82
    FIGURE 4.7. DYNAMODB GLOBAL TABLES ... 84
    FIGURE 4.8. FRONTEND GLOBAL ACCELERATOR ENDPOINT GROUPS ... 85
    FIGURE 4.9. CLOUDFRONT DISTRIBUTIONS FOR FRONTEND APPLICATION ... 86
    FIGURE 4.10. BACKEND APPLICATION LOAD BALANCER - US-EAST-1 REGION ... 86
    FIGURE 4.11. FRONTEND APPLICATION CANARY TEST RESULTS – NO ISSUES ... 87
    FIGURE 4.12. FRONTEND APPLICATION CANARY TEST RESULTS – ISSUE IN A REGION ... 88
    FIGURE 4.13. FRONTEND APPLICATION CANARY TEST RESULTS – ISSUE RESOLVED ... 88
    FIGURE 4.14. REAL USER MONITORING REPORT - US-EAST-1 ... 89
    FIGURE 5.1. PIPELINE FAILED - FRONTEND TEST COVERAGE THRESHOLD NOT SATISFIED ... 91
    FIGURE 5.2. PIPELINE SUCCESSFUL - FRONTEND TEST COVERAGE THRESHOLD SATISFIED ... 91
    FIGURE 5.3. PIPELINE FAILED - BACKEND TEST COVERAGE THRESHOLD NOT SATISFIED ... 92
    FIGURE 5.4. PIPELINE SUCCESSFUL - BACKEND TEST COVERAGE THRESHOLD SATISFIED ... 92
    FIGURE 5.5. INTEGRATION TESTS - SYSTEM RUNNING NORMALLY ... 93
    FIGURE 5.6. INTEGRATION TESTS – FRONTEND APPLICATION NOT FUNCTIONING CORRECTLY ... 94
    FIGURE 5.7. FUNCTIONAL TEST FAILED AFTER APPLICATION DEPLOYMENT ... 95
    FIGURE 5.8. PERFORMANCE TEST RESULTS - PERCENTILE RESPONSE TIMES ... 96
    FIGURE 5.9. PERFORMANCE TEST RESULTS - HIGH LEVEL STATISTICS ... 96
    FIGURE 5.10. LOAD TEST RESULTS - HIGH LEVEL STATISTICS ... 98
    FIGURE 5.11. LOAD TEST RESULTS - PERCENTILE RESPONSE TIMES ... 98
    FIGURE 5.12. CURRENT SECURITY ISSUES REPORTED BY SNYK ... 100
    FIGURE 5.13. ISSUE INFORMATION PROVIDED BY SNYK ... 100
    FIGURE 5.14. SELENIUM TEST FAILED ... 102
    FIGURE 5.15. SELENIUM TEST PASSED ... 102
    FIGURE 5.16. POSTMAN USER INTERFACE ... 103
    FIGURE 5.17. POSTMAN REPORTS FAILED TESTS IN PIPELINE ... 104
    FIGURE 5.18. POSTMAN REPORTS ALL TESTS SUCCESSFUL IN PIPELINE ... 104
  • 13. D. Gallagher 10 Table of Tables
    TABLE 1. CLOUD SERVICE DECISION MATRIX ... 11
    TABLE 2. AWS CONTAINER OFFERINGS ... 47
    TABLE 3. AWS NOSQL OFFERINGS ... 63
    TABLE 4. AWS LOAD BALANCER OPTIONS ... 66
    TABLE 5. AWS OBSERVABILITY SERVICES ... 70
    TABLE 6. FRONTEND APPLICATION COST ESTIMATION ... 106
    TABLE 7. BACKEND APPLICATION COST ESTIMATION ... 106
    TABLE 8. COMBINED APPLICATION COST ESTIMATION – SINGLE REGION ... 107
    TABLE 9. COMBINED APPLICATION COST ESTIMATION – MULTIPLE REGIONS ... 107
    TABLE 10. CODE REPOSITORIES ... 6
    Table of Code Listings
    CODE LISTING 1. PSEUDO CODE FOR CODE DEPLOYMENTS ... 80
    CODE LISTING 2. SNIPPET OF BACKEND LAMBDA CODE ... 83
    CODE LISTING 3. TEST FRONTEND CODE LOCALLY ... 7
    CODE LISTING 4. INSTALL RUM JAVASCRIPT LIBRARY ... 8
    CODE LISTING 5. RUM FRONTEND CODE SNIPPET ... 8
  • 14. D. Gallagher 11
Nomenclature
Acronym | Definition
AWS | Amazon Web Services
Azure | Microsoft Azure
AWS Region | A geographic location in which AWS clusters its data centres; regions are dispersed around the world
AWS Availability Zone | A logical grouping of data centres in an AWS Region
React | React is a JavaScript library that is used for creating rich graphical user interfaces
DynamoDB | DynamoDB is a NoSQL database provided by AWS that is both fast and flexible
Container | A container is a lightweight, standalone piece of software for packaging code along with the required dependencies. Containers enable an application to run quickly from one computing environment to another
Serverless | Serverless is a development model for the cloud that allows developers to create, build, run and support applications without having to manage servers in the backend
Active Directory (AD) | Active Directory contains detailed information about user objects on a network and makes it straightforward for administrators and users to access this information
RTO | Recovery Time Objective (RTO) is the time within which a process must be restored to normal production after a disaster situation to mitigate any adverse consequences associated with the disaster
RPO | Recovery Point Objective (RPO) is the tolerable quantity of data which could be lost or must be re-instated after application downtime
DR | Disaster recovery (DR) is the ability of an organization to respond to and recover from a disaster event which could impact the normal running of the business's processes. The goal of DR is to resume normal operations with an organization's IT infrastructure as quickly as possible after a disaster occurs
DNS | Domain Name System (DNS) is how IP addresses associated with online services are mapped to domain names, which are more human readable. DNS translates the domain name to IP addresses so the required content can be loaded in a browser
MVP | A minimum viable product (MVP) is a variant of a product that contains a subset of features to be evaluated by early customers to obtain feedback for future product development
API | API is the abbreviation for Application Programming Interface. It is a collection of protocols and definitions for the development and integration of software applications
  • 15. D. Gallagher 12
YAML | YAML originally stood for ‘Yet Another Markup Language’ and now stands for ‘YAML Ain’t Markup Language’. It is a data-serialization language that is human-readable and extensively used in data transmission as well as storage applications. YAML is also frequently used in application configuration files
JSON | JavaScript Object Notation (JSON) is based on the object syntax of JavaScript. It is a text-based format for encoding structured data
PWA | PWA stands for Progressive Web Application. PWAs are applications developed with manifests, service workers and other web-platform features to give end users an experience similar to that of native applications
OSI Model | The Open Systems Interconnection model, or OSI model, serves as a common standard for communication between different computer systems
RESTful API | Also known as a REST API. It is an application programming interface that satisfies the constraints of the REST architectural style and allows for communication between various systems
WebSocket API | The WebSocket API is a technology that facilitates an open 2-way interactive communication session between the browser of a user and a backend server. Messages are sent to a backend server and the browser can receive event-driven responses without the need to poll the backend server for a reply
CRUD | The term CRUD refers to applications that perform simple create, read, update and delete operations on a database table
UI | The UI or User Interface is the entry point for human-computer communication and interaction on a device. This can include keyboards, a mouse, display screens and the appearance of a desktop
GUI | A GUI or Graphical User Interface is a screen through which a user interacts with electronic devices such as smartphones or computers using menus, icons and other graphics or visual indicators
  • 16. D. Gallagher 1
1. Introduction
Recent improvements in software development tooling have allowed small companies as well as large organizations to deliver software faster than ever before. The path from defining the initial requirements, through development of the feature, to shipping it can take anything from hours to months. Some companies have advanced DevOps processes with thorough, reliable automated test suites that allow features to be delivered without human interaction (‘Continuous Integration, Delivery, and Deployment’ n.d.:21). Other companies have no pipelines or automated tests, with much of the testing completed manually. This becomes an issue for regression testing, as defects can quite often be introduced and go unnoticed when most of the testing is done manually. Figure 1.1 shows an image that is commonplace in presentation decks for large organizations, depicting the expected delivery date of a feature. Often, tasks such as pipeline development or code quality checks are neglected to hit pre-defined deadlines. This may result in the code artifact getting shipped on schedule, but over time, when code changes need to be introduced, skipping those tasks to meet the original schedule comes back to haunt the development teams.
Figure 1.1. Software Delivery Schedule
  • 17. D. Gallagher 2 Organizations share a common goal: it is vital that their software solutions are highly available to their end users. If a system is not available when a customer wants to use it, they may never return to use that system again. Reasons a system may not be available include:
• A code defect may have been introduced
• A database may become unresponsive
• A server the code is deployed to is experiencing issues.
These issues arise once the code is already deployed; issues could also arise during the deployment of new code or features. Any of these issues will have a negative impact on the reputation of the organization. For systems in the social media space, this may not be such a significant problem, but reputation is extremely important when working in sectors such as financial services. A review of cloud providers such as Azure (‘Cloud Computing Services | Microsoft Azure’ n.d.) and AWS (‘Cloud Services - Amazon Web Services (AWS)’ n.d.) shows that services are available which limit the impact of servers becoming unavailable. Both providers offer a significant range of services which are being expanded constantly. AWS and Azure both have Well Architected frameworks (‘AWS Well-Architected Framework’ n.d.; ‘Microsoft Azure Well-Architected Framework’ n.d.) which can be used by clients as a template to improve the quality of their workloads. The aim for systems is to ensure they are highly available. However, it is not uncommon for issues to exist with the services provided by cloud providers. This dissertation introduces various approaches and patterns that can be followed to ensure high availability for many aspects of a cloud application.
1.1. Purpose
High availability is vitally important for applications, regardless of the size of the customer base. Systems such as social media or banking platforms are developed to be used by customers on the customers’ schedule.
As discussed in the introduction, there are reasons why a system may not be available. This research presents an approach for architecting applications along with the associated DevOps processes to minimize any disruptions to a system. Following typical architectures will provide a good level of availability but as we have seen with recent AWS outages (Moss n.d.), it is possible for an issue with the cloud provider
  • 18. D. Gallagher 3 to bring down your application. This dissertation will explore approaches that can be followed with pipelines to ensure a code deploy will not affect an application. It will also examine a fault tolerant architecture for a 3-tier application, including how issues may be identified in an automated fashion before being reported by the end users.
1.2. Background
With the ever-increasing need for software features to be delivered quickly and to perform in a highly available fashion, it is imperative that solutions are set up in a way that facilitates this. This starts with the DevOps process: how the code is stored, and the pipelines that are used to package, test, quality check and deploy the code. Within the chosen cloud provider, how that code is deployed and served to end customers is also a key consideration. It is straightforward to deploy code onto a single server and expose an endpoint for customers. However, if the traffic to that service increases, or there is a disruption with the cloud provider, that application could easily become unavailable. Techniques such as application auto scaling can be set up to avoid this situation, but there are other factors that need to be managed to avoid a complete outage. End users may become frustrated when an application they want to use is not available. Ideally, system issues should be resolved before they impact users. Techniques such as synthetic monitoring (‘Using synthetic monitoring - Amazon CloudWatch’ n.d.) provide a mechanism to identify issues in a scheduled fashion. Systems such as static websites are often neglected when it comes to automated monitoring, whereas backend applications tend to have monitoring set up. As with automated testing, the more automated monitoring that is in place, the more beneficial it is for an application.
Automated testing increases the likelihood that errors or outages will have been identified automatically before users try to access the system. 1.3. Problem Statement With the vast adoption of cloud services, organizations are deploying their code with a trusted cloud provider. This brings its own benefits such as not having to manage servers, reduced costs, plus the ability to spin up systems in a fast manner without having to purchase servers. However, as was seen with the AWS issues on the 6th December 2021 (Moss n.d.) and on the 22nd December 2021 (Moss n.d.), more needs to be done to ensure applications are highly
  • 19. D. Gallagher 4 available. Both outages in the us-east-1 region caused major disruptions for hundreds of reputable brands around the world. An AWS region is a physical area where AWS data centres are clustered throughout the world (‘Global Infrastructure Regions & AZs’ n.d.). The problem statement is: Recent outages in the AWS us-east-1 region lasting 5 and 8 hours respectively have impacted applications such as Slack, IMDb and McDonalds; architecting cloud applications to be highly available will limit the impact of regional outages in the future. As the outages showed, reliance on the us-east-1 region is huge. It is the default region when a new AWS account is created. AWS has the concept of availability zones. An availability zone is an isolated location within an AWS region (‘Global Infrastructure Regions & AZs’ n.d.). An application can be deployed in numerous availability zones within a specific AWS region to help with high availability, but this will not guarantee high availability. If a natural disaster such as an earthquake or tsunami occurred close to an AWS datacentre for a region, there is every possibility that every availability zone in that region could become compromised. It is important to remember that achieving high availability can be costly. Implementing high availability must align with the main objectives that a business is aiming to achieve (Sarkar and Shah 2018). This dissertation will explore what it takes to deploy applications across multiple cloud regions. This will raise questions about how data is stored in databases across multiple regions and how traffic is managed between these regions. How code is deployed to these regions in a safe, efficient manner will also be explored. This dissertation will also look at how a region can be taken out of service if issues are occurring in that region or scheduled maintenance is occurring.
Overall, the goal is to ensure that if a user wants to interact with the system, it is available no matter what event is occurring across the network.
1.4. Research Question
Ensuring that an application is highly available and accounts for most outage scenarios, including those with a cloud provider, is not a straightforward task. Costs may well increase by architecting an application to account for every outage scenario, but it may be a small price to pay to maintain a solid reputation.
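To put the 5 and 8 hour outages from the problem statement into perspective, the following minimal Python sketch converts downtime into an availability figure and shows how availability could improve when an application runs in more than one region. The function names are hypothetical, and the calculation assumes that regions fail independently of one another, which is a simplification rather than a guarantee.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability_from_downtime(downtime_hours: float,
                               period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the system was available."""
    return 1 - downtime_hours / period_hours


def combined_availability(single: float, copies: int) -> float:
    """Availability when any one of `copies` deployments can serve traffic.

    Assumption (illustrative only): the deployments fail independently.
    """
    return 1 - (1 - single) ** copies


# The two us-east-1 outages cited above lasted 5 and 8 hours.
single_region = availability_from_downtime(5 + 8)
two_regions = combined_availability(single_region, copies=2)

print(f"Single region: {single_region:.4%}")
print(f"Two regions:   {two_regions:.6%}")
```

Under these assumptions, 13 hours of downtime in a year corresponds to roughly 99.85% availability for a single region, while a second independent region raises the theoretical figure above 99.999%. Real deployments share dependencies between regions, so the second figure should be read as an upper bound.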
  • 20. D. Gallagher 5 The research question considered in this research is: Can DevOps play a part in architecting an application to withstand various outages which may occur in a cloud environment? To answer this research question, 3 aims were identified. These are:
1. Identify key characteristics which indicate high availability of an application, with reference to how it is different in a cloud-based environment.
2. Identify key processes that can be applied to enable high availability of applications deployed in the cloud.
3. Design and implement a solution to deploy a highly available application that will function in the event of sample test outages.
1.5. Scope and Limitations
GitHub Actions will be used to create the necessary pipelines. This will integrate with the source code located in GitHub to provide a seamless integration between the pipelines and the code. AWS will be used as the cloud provider to deploy a 3-tier application in multiple geographically dispersed regions. Azure and others provide similar functionality but, due to limited scope, this dissertation will be restricted to AWS. AWS tooling such as CloudWatch will be used to monitor the application whilst also ensuring any outages are identified before consumers identify the issues. In the proposed solution, if an issue is encountered, scripts will be executed to bring an AWS region out of the pool which accepts traffic. Research will be done on open-source tooling that can be used to assist with monitoring.
1.6. Methodological Approach
A sample 3-tier application will be created with a React frontend and a Python backend talking to a DynamoDB database. This application will be stored in GitHub, with separate repositories for the frontend and backend. A series of GitHub Actions workflows will be created for the various tiers, which will be used for deploying the application onto AWS.
Any cloud infrastructure created will use an Infrastructure as Code tool such as Terraform or AWS CDK to ensure the process is repeatable.
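The step described in section 1.5, where a script brings a region out of the pool that accepts traffic, could be sketched roughly as follows. This is an illustration in plain Python rather than the implementation used in this dissertation: the `RegionPool` class, its method names and the failure threshold of 3 are all hypothetical, and no AWS API is called.

```python
from dataclasses import dataclass, field


@dataclass
class RegionPool:
    """Tracks which regions accept traffic (hypothetical helper, not an AWS API)."""
    regions: set[str]
    failure_threshold: int = 3  # assumed value, for illustration only
    _failures: dict[str, int] = field(default_factory=dict)

    def record_check(self, region: str, healthy: bool) -> None:
        """Record one monitoring result; drop the region after repeated failures."""
        if healthy:
            self._failures[region] = 0
            return
        self._failures[region] = self._failures.get(region, 0) + 1
        if self._failures[region] >= self.failure_threshold:
            self.take_out_of_service(region)

    def take_out_of_service(self, region: str) -> None:
        """Remove a region from the pool, but never remove the last one."""
        if region in self.regions and len(self.regions) > 1:
            self.regions.remove(region)

    def restore(self, region: str) -> None:
        """Return a region to the pool and reset its failure count."""
        self.regions.add(region)
        self._failures[region] = 0


pool = RegionPool({"us-east-1", "eu-west-1"})
for _ in range(3):
    pool.record_check("us-east-1", healthy=False)
print(sorted(pool.regions))
```

Requiring several consecutive failed checks before removing a region is one way to avoid reacting to a single transient error, and refusing to remove the last remaining region keeps the application reachable even when monitoring itself misbehaves.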
  • 21. D. Gallagher 6 For testing of outages or errors, the AWS CloudWatch toolset will be used, which will also be set up with the chosen Infrastructure as Code tool. The final delivered artifact will be a reference implementation which can be used for any programming language to architect highly available applications.
1.7. Report Outline
A review of current literature related to the subject matter was conducted and is discussed in chapter 2, covering Cloud Computing, Microservices infrastructure, containers, serverless as well as DevOps. The design and implementation of architecting a highly available application is covered in chapters 3 and 4, answering the research question posed in chapter 1. Chapter 6 concludes the dissertation by discussing the architected solution and suggesting further work and experiments which could occur in the future.
  • 22. D. Gallagher 7
2. Literature Survey
The research in this chapter will initially focus on cloud computing and how companies have evolved from on premise deployments to using the cloud. There is a focus on highlighting the challenges that exist when deploying to the cloud before focusing on the various cloud service offerings. Disaster recovery is discussed in terms of how it relates to cloud environments, along with the various considerations which need to be made for multi region deployments as well as region failover scenarios. The research then looks at the DevOps process around code management and the pipelines involved, before discussing the makeup of a cloud application which can prove out the research question. This dissertation will have a focus on best practices and techniques for architecting applications that are highly available when deployed to a cloud environment.
2.1. Overview of Cloud Computing
2.1.1. Why deploy code to the Cloud vs On Premises
The decision to deploy an application on the cloud or on premise servers needs to be evaluated on a per organization basis. In the past, larger organizations would have had the resources to spin up their own datacentres to deploy their applications. Applications as well as databases for the organization would live in that one datacentre, and there may also have been a disaster recovery site. However, those servers had to be accounted for, and they came with costs associated with licensing, power and rack space, as well as the upkeep of the building in which they were located. Provisioning a new server in a datacentre took from weeks to months, which meant substantial planning had to be done in advance of an application being deployed to production. When deploying applications on the cloud, it is a different mindset.
The per server costs for power and rack space are no longer present when deploying to the cloud. The upfront cost of purchasing a server does not have to be made, but cloud providers do allow you to purchase credits on a server to keep the overall costs down. In relation to cloud, organizations have the option to adopt a pay as you go model.
  • 23. D. Gallagher 8 “Using cloud infrastructures and platforms is convenient because services on demand offers high flexibility and pay as you go pricing offers low costs.” (Toivonen 2013, p.17) The provisioning delay described for on premise servers does not exist on the cloud, which leads to shorter turnaround times for new applications or proofs of concept. With the evolution of new cloud services, it is possible to design and build applications in a fashion which is not possible in a traditional datacentre.
2.1.2. Challenges faced when deploying to the Cloud
Deploying to the cloud versus deploying to on premise servers does bring its own challenges. For security conscious organizations, access to source code is restricted. Deployment of code to any cloud provider needs to be performed in a safe fashion to ensure that proprietary code does not make its way into the public domain. When working on premise, the access to those servers is controlled with systems such as Active Directory. Different authentication mechanisms need to be set up to ensure appropriate access is provided to the correct individuals when accessing cloud resources (Garrison et al. 2012). Services on the cloud bring about their own costs with regards to data transfer, storage and backups that are not an issue on-premises. It is important when deploying to the cloud that considerations are made for setting up automated backups as well as testing strategies for performing rollbacks if required. With on-premises servers, the organization oversees the software installed. With cloud services, it is the cloud provider who oversees the core software installed on platforms. Organizations need to be vigilant when it comes to monitoring the cloud provider roadmaps to ensure they are complying with the supported versions of software they are using.
Finally, a common danger with the vast array of service offerings in the cloud lies in ensuring that the correct service is used to satisfy the requirements of an application. It is straightforward to write a hello world application that uses a particular service, but supporting that service in a production environment can lead to its own array of challenges. The ability to manage peak loads, disaster recovery situations and scheduled downtime during patching of certain services are among the items that need to be considered.
  • 24. D. Gallagher 9
2.1.3. Types of Outages
There are multiple distinct types of outages that can occur on any cloud provider. From an application perspective, if deployed code is not thoroughly tested, this could make an application unresponsive after a rogue deployment. The development team is responsible for ensuring they follow a rigid DevOps process to mitigate against untested code getting deployed. An outage could also occur on an application during a code deployment. The outage may not last long, but for end users, this is still an unexpected outage that will affect the brand's reputation. This research will examine how outages can be avoided as part of the DevOps process as well as when an application is running in a production setting. From a cloud provider perspective, there are occasions where a region may become unstable or unresponsive (Moss n.d.). Incidents such as data centre power loss or natural disasters such as earthquakes or tornados could bring an entire region offline. Scheduled maintenance of a managed cloud provider offering could make the service unusable. When deploying to the cloud, it is important to consider that all services can have outages. It is imperative to have this mindset to make an application as resistant to outages as possible. In an article related to cloud storage mechanisms, the authors evaluate various outages and arrive at the following conclusion. “The reason after the post-investigation of most of these outages revealed that the main root cause was the expected and predicted failures, while others happened due to the failure of correct components in the recovery process.” (Tahir et al. 2020) In this statement, the words ‘expected and predicted failures’ stand out to this author. Teams need to ensure that expected and predicted failure situations are accounted for before trying to manage the unexpected failures that could potentially occur.
This research will examine the techniques that can be used for a 3-tier application to ensure there are no application outages. The survey of research into the area of cloud computing indicates the need for research into the role cloud computing can play in achieving high availability.
  • 25. D. Gallagher 10
2.2. Infrastructure
2.2.1. Virtual Machines
There are a multitude of different options for deploying applications on the cloud, with one of those options being virtual machines. This closely resembles deploying an application on an on-premises server. The main difference with virtual machines on the cloud is that they are meant to be ephemeral in nature, meaning they should not have an exceedingly long lifespan. As mentioned in a previous section, it is the cloud provider who oversees the underlying operating system on the virtual machines. There are no guarantees that a virtual machine which is launched on a given day will be compliant the following day. To get around this, it is imperative that the process for deploying applications to these virtual machines is automated with a rigorous CI/CD process. With virtual machines, there is a vast array of different instance types to choose from. Each instance type has its own characteristics: some instance types are compute optimised, whilst others are storage optimised. Newer versions of these instance types can offer increased performance or greater cost savings. It is important to periodically benchmark the instance types in use (Akioka and Muraoka 2010) against the newer offerings from a cloud provider to validate that the applications are running as efficiently as possible.
2.2.2. Cloud Service Offerings
Every cloud provider offers a vast array of services that can be used. These range from the virtual machines discussed in the previous section, to containers (Merkel 2015), to serverless offerings (Eismann et al. 2021). Again, choosing the correct service is a choice which is made based on the organization's needs and the application that is being deployed. Containers are useful for applications that maintain state or need to be long running in nature. A container is a way of packaging code in a format that can run on any operating system in a uniform way.
The packaged form of a container is called an image. That image contains the necessary steps required to install the software. Containers can be deployed in a fashion that is always on, or deployed in a serverless fashion using newer offerings as provided by the larger cloud providers. On the other side, serverless is a solid option for short lived REST calls, standard CRUD operations and for deploying individual microservices. Serverless is an excellent option when you want the ability to scale from little or no traffic to huge spikes in traffic. The main selling
  • 26. D. Gallagher 11 point with serverless is you pay for the resources that you use. This makes it an attractive option for cost conscious organizations.
2.2.3. Cloud Service Decision Matrix
The following table displays a decision matrix, based on the author's experiences, that can be used to determine the correct cloud solution to adopt. This table evaluates the current state of an application and the deployment choices that can be made with or without code changes. The correct service should be chosen based on the characteristics of the deployed application.
Table 1. Cloud Service Decision Matrix.
Development Team Choice | Use Virtual Machines | Use Containers | Use Serverless
Are you deploying an existing application that requires no code changes? | Yes | No | No
Is there a docker image for the application or are there immediate plans to create a docker image? | Yes | Yes | Yes
Has the development / DevOps team the required bandwidth to learn a new cloud provider service such as serverless? | No | Yes | Yes
Is there potential for the existing application to grow into multiple smaller applications in the future? | No | Yes | Yes
Is keeping costs low with guaranteed high availability a requirement for this application? | No | Yes | Yes
Is there sufficient time available for the development team to create a correct solution for perceived future growth of the application? | No | Yes | Yes
Does the application contain tasks that may be long running (over 15 minutes)? | Yes | Yes | No
The choice of infrastructure will determine which DevOps techniques to use. The DevOps process for deploying applications to a virtual machine differs from that for deploying to a container in the cloud. To focus the research, it is important to choose an overall infrastructure that requires processes to maintain high availability.
2.3. High Availability and Disaster Recovery
With regard to deploying applications in an organization's data centre, managing high availability and disaster recovery may involve switching traffic from one data centre to another or periodically switching the active data centre to be the previously idle data centre. In these
  • 27. D. Gallagher 12 situations, the organization is in control of the infrastructure and operations that occur within the datacentre. In terms of handling disaster recovery in the cloud, there are more considerations that need to be made. The cloud provider oversees the infrastructure and how the disaster recovery situations are managed. It is the responsibility of the organization to design their applications in such a way that they can manage disaster recovery situations and hence be highly available. Disaster scenarios can occur due to human error within the datacentres, natural disasters such as earthquakes or tornados destroying a datacentre, or even just loss of power to a datacentre within a specific region. The following sections discuss disaster recovery terms as well as various disaster recovery options in the cloud, with a focus on how each approach impacts high availability and cost.
2.3.1. Disaster Recovery Metrics
In terms of disaster recovery metrics, there are 2 metrics which can be used to measure an application. Recovery Time Objective or RTO is a term used to define the time it takes to restore an organization's process to the agreed upon service levels after a disruption or disaster (Hamadah and Aqel 2019:1). For example, if a disaster were to occur at 1PM and the RTO is 8 hours, the disaster recovery process should recover the organization's service to the previously accepted service level by 9PM. Recovery Point Objective or RPO is the tolerable quantity of data loss for a system measured in units of time (Mendonça et al. 2019:2). For example, if a disaster situation were to occur at 2PM with an agreed RPO of 1 hour, the system needs to be capable of recovering the entire dataset that was in the system before 1PM. In this situation, the data loss will be for 1 hour, from 1PM to 2PM.
2.3.2. Backup and Restore
Backup and Restore is a technique where backups of data are stored in a region; if a disaster were to occur in that region, the data is exported to a separate region (Robinson et al. 2014:9). In addition to exporting the data, the configuration, infrastructure and application code must be redeployed in the new target region. As a result of this process, the RTO and RPO would be high. There is a potential for data loss which may not be acceptable in domains such as financial services. This approach would not guarantee high availability as the time to copy the data to the new region as well as provision the required infrastructure
  • 28. D. Gallagher 13 would lead to application downtime. On the flipside, this approach would be cost effective as the applications are only deployed in one region at a time. Following a backup and restore strategy may be suitable for charity organizations, static websites or problem domains where data loss and downtime are acceptable.
2.3.3. Pilot Light
Pilot Light is a technique where data is replicated from one region to another region in which a core minimal version of an organization's workload infrastructure is in operation (Trovato et al. 2019:5). Processes to replicate databases or file storage are always turned on. Application servers are pre-installed with application code and configuration, and left in a turned off state unless testing is taking place or there is a disaster recovery situation. Systems in the DR region will only be switched on when a disaster recovery situation occurs. Unlike Backup and Restore, the core infrastructure is always ready to be turned on. The RTO and RPO for Pilot Light are lower than for Backup and Restore, but there is still the possibility of data loss when switching regions as well as the application being unavailable. This approach will cost more than Backup and Restore, but it allows organizations to recover business critical applications in a timelier fashion. Pilot Light may be suitable for organizations that have a small set of critical applications, with other applications being deemed non mission critical. The critical applications will be in a ready to launch state in a separate region at any stage.
2.3.4. Warm Standby
Warm Standby is an approach where a fully functional, scaled down copy of an organization's production environment is available in a separate region (Robinson et al. 2014:14). This approach extends the Pilot Light concept and decreases the time it takes to recover from a disaster situation as the workload is always running in another region.
The environment in the DR region can be scaled up when required to guarantee it can manage the expected traffic volumes. This approach will be more expensive than Pilot Light as the infrastructure in the DR region is always running. It does however offer the benefit of a decreased RTO and RPO. Warm Standby may be suitable for organizations that have business critical applications and require high availability.
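The comparisons of RTO and RPO across Backup and Restore, Pilot Light and Warm Standby rest on the arithmetic introduced in section 2.3.1. As a minimal sketch, the two worked examples from that section can be reproduced in Python as follows. The function names and the calendar date are hypothetical and chosen only for illustration; the times and objectives are those used in the examples.

```python
from datetime import datetime, timedelta


def recovery_deadline(disaster_at: datetime, rto: timedelta) -> datetime:
    """Latest time by which service must be restored to meet the RTO."""
    return disaster_at + rto


def recovery_point(disaster_at: datetime, rpo: timedelta) -> datetime:
    """Earliest point from which data may be lost while still meeting the RPO."""
    return disaster_at - rpo


def meets_objectives(disaster_at: datetime, restored_at: datetime,
                     last_backup_at: datetime,
                     rto: timedelta, rpo: timedelta) -> bool:
    """True if a recovery satisfied both the RTO and the RPO."""
    return (restored_at <= recovery_deadline(disaster_at, rto)
            and last_backup_at >= recovery_point(disaster_at, rpo))


# RTO example: disaster at 1PM with an RTO of 8 hours (date is arbitrary).
print(recovery_deadline(datetime(2022, 1, 10, 13, 0), timedelta(hours=8)))

# RPO example: disaster at 2PM with an RPO of 1 hour.
print(recovery_point(datetime(2022, 1, 10, 14, 0), timedelta(hours=1)))
```

The first call yields 9PM on the same day and the second yields 1PM, matching the examples in section 2.3.1. A helper such as `meets_objectives` could be used after a disaster recovery test to check whether the measured recovery stayed within the agreed objectives.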
  • 29.

2.3.5. Multi-site Active/Active

Multi-site Active/Active is an approach where an organization runs its workload in multiple regions at the same time, in either an active/active or a hot standby (active/passive) strategy (Robinson et al. 2014:16). The active/active strategy serves traffic from every region to which the application has been deployed. The hot standby strategy serves traffic from a single region only, with the other regions used in a disaster recovery situation. This approach is the most complex and expensive but, when active/active is followed, it is the only approach which will guarantee high availability. In the hot standby approach, there is the possibility that users may not be able to access an application while the hot standby version of the application becomes the primary version. Multi-site Active/Active is the preferred option for organizations that require high availability and cannot tolerate any level of downtime for their applications. It also offers the benefit of serving customers from the cloud region closest to their geographic location.

2.3.6. Comparison of Disaster Recovery Options

The diagram in Figure 2.1 displays a comparison of the various disaster recovery options, with specific emphasis on their impact on RTO and RPO.

Figure 2.1. Cloud Disaster Recovery Options

Understanding the various disaster recovery techniques is key to measuring the high availability characteristics of an application. For mission critical applications, a multi-site active/active approach may be needed. For small traffic applications that are not mission
  • 30. critical, the backup and restore technique may suffice. Understanding the criticality of an application helps to decide on the required high availability characteristics. The research in this dissertation will outline the processes that can be developed to achieve high availability.

2.4. Multi Region Architecture Considerations

2.4.1. Application

When developing applications to be deployed in a multi-region configuration, some considerations need to be made. The application needs to be developed in such a way that it can be deployed to separate regions, including net new regions, with no code changes. Any items that require changes should belong to configuration that is specific to a region. There is a set of guidelines called the Twelve Factor Application (Wurster et al. 2017:4) which should be followed for every cloud application. These guidelines become more important as organizations discover the need to deploy an application across multiple cloud regions.

Another consideration is how the application connects to services such as databases or message queues. In a scenario where these services are also deployed in a multi-region fashion, organizations need to guarantee that if those services fail over to another region, the application can manage this situation. Using techniques such as top-level DNS entries can help to ensure that applications are not concerned with which region a database or service is deployed in. The responsibility for managing situations where a database or other service fails over rests with the application developer. They must ensure the application can manage this situation gracefully and continue responding successfully to user requests.

2.4.2. Database

When deploying databases, it is important to think of how the database will behave in the event of a DR situation.
For high availability, it is important the database service is deployed in a multi-region fashion. Deciding which region contains the primary database is important so every other database replica can keep coordinated with the main copy. The speed of replication of data between regions is important to guarantee data consistency. As discussed earlier, how applications connect to the database needs to be considered. In the world of microservices with smaller services getting deployed, managing the number of connections
  • 31. to the database is important to ensure the database does not get overloaded. It is vital to set the maximum allowed connections on the database to an acceptable level, then work with the application teams to guarantee this will satisfy the projected connection demands. Application teams should consider deploying read replicas of databases across regions to serve read-only requests. This frees up the main database for write requests by taking away the load which would have been generated by read requests. Finally, it is important that a backup strategy is in place so that, if an issue arises in any region, the database can be restored to a known good state.

It is important to consider the multi-region architecture decisions for each layer of an application. What works for a frontend application will not necessarily work for a database. This section has been included to highlight that this research will look at high availability across every tier of an application.

2.5. Region Failover Considerations

A region failover needs to be planned and evaluated before the event happens unexpectedly in production. By testing the process, minor issues such as missing credentials or invalid paths for application source code can be found and rectified. When failing over an application between regions, topics such as the database connection string and the top-level DNS entry to use need to be seriously considered by application teams. It is important that a region failover can occur as efficiently as possible in a production-like environment. Any delay in performing a failover can have an adverse impact on the overall availability of an application, which in turn may lead to disgruntled customers. In the ideal scenario, a region failover is transparent to the end customer and every step occurs in an automated fashion.
DR strategies such as Warm Standby and Multi-site Active/Active can help to make the process of a region failover smoother. When architecting applications, it is imperative to choose cloud service offerings that will work for the application whilst also supporting multi-region capabilities. By choosing the cloud services wisely, it can simplify the regional failover process.
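An automated failover decision of the kind described above can be sketched as choosing the first healthy region from a priority list. This is a simplified illustration: the region names and the `is_healthy` callback are hypothetical, and a real system would typically delegate this decision to DNS or load balancer health checks instead of application code.

```python
from typing import Callable, Optional, Sequence


def pick_active_region(regions: Sequence[str],
                       is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy region in priority order, or None.

    `regions` is ordered from most preferred (primary) to least preferred.
    `is_healthy` is a caller-supplied health check (an assumption here).
    """
    for region in regions:
        if is_healthy(region):
            return region
    # No region passed its health check, so there is nowhere to fail over to.
    return None
```

Keeping the health check as a parameter means the same routine can be exercised in a production-like test environment with simulated outages before it is relied upon in production.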
  • 32. For UI-based applications and REST-based services, high availability during a regional failover can be obtained using load balancers. In a paper on high availability in the cloud, the authors discuss using the Hadoop software library for managing high availability: “Rather than rely on hardware to deliver high availability, the library itself is designed to detect and manage failures at the application layer, so delivering a highly available service(s) on top of a cluster of computers, each of which may be prone to failures” (Singh et al. 2012). The paper further discusses how hardware can fail, which could in effect make nodes inactive when in fact they could service traffic. Choosing a software-based load balancing approach over a hardware-based approach ensures the load balancing can be tuned to suit the application's needs.

The focus of this research is the ability to divert traffic to different regions in the event of a disaster situation. It is important to understand that any layer within the application can fail. The ability to handle this failure gracefully will prove crucial in validating the success of this research. The practical element of this dissertation will outline a solution to prove this approach is feasible.

2.6. Code Deployment

2.6.1. Manual

When it comes to deploying code to a cloud service, the quickest approach is to package the code on the developer’s machine and manually deploy it to the cloud service. This approach is sufficient for quick proof of concept projects or demonstrations, but it soon becomes very inefficient. Once the time taken to package the code, run the automated tests, log in to the cloud provider console, upload the packaged artifact, and deploy is factored in, the cost adds up daily. If the process takes 10 minutes and the developer attempts 6 deployments a day, an hour of that developer’s day is taken up.
This approach is error prone and can lead to issues further on in a project’s lifecycle. Dependencies required to build an artifact or run tests may exist only on the developer’s machine. The steps needed to carry out the deployment correctly may not be documented or properly defined. Overall, this makes the work more complex for future developers who may inherit it.
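The time cost of manual deployment quoted above (10 minutes per deployment, 6 deployments a day) can be checked with a short calculation. The function name and the five-day working week used for the weekly figure are assumptions added for illustration.

```python
def daily_deploy_minutes(minutes_per_deploy: int, deploys_per_day: int) -> int:
    """Return the minutes per day spent on manual deployments."""
    return minutes_per_deploy * deploys_per_day


# Figures from the text: 10 minutes per deployment, 6 deployments a day.
per_day = daily_deploy_minutes(10, 6)

# Assumption for illustration: a five-day working week.
per_week_hours = per_day * 5 / 60

print(f"{per_day} minutes per day, {per_week_hours:.0f} hours per week")
```

Under these assumptions the single hour per day grows to five hours per week for one developer, which is the kind of figure that can support a case for automation.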
  • 33. The longer a manual process is followed, the more difficult it is to obtain buy-in from management to spend time on automating the task.

2.6.2. Automated

In well-structured teams, there is evidence of rigid CI/CD processes:
• Code is stored in a code management tool
• Code is built using pipelines
• Automatic testing of the code is performed in the pipeline
• Every deployment to the cloud is automated

Where an automated pipeline exists, it is more straightforward to extend the pipeline with code quality checks, vulnerability checkers and other tools which may improve the overall codebase. Removing manual steps from the process of deploying code ensures there is an accurate, repeatable process in place for deploying code to a production environment. As will be discussed in a subsequent section, there are many benefits to using pipelines, not least the developer time saved by not having to manually deploy code.

2.6.3. Hybrid

In a hybrid approach, there is an automated pipeline in place, but certain steps in the process require manual approval. Being fully confident in shipping code directly from source control to production with no manual checking requires a full suite of unit tests, integration tests and performance tests. If a project is not at that stage of its evolution, the best that can be done is to deploy code to a non-production environment and perform sanity checks and testing in that environment before approving the deployment to production. It would be ideal to deploy code to production automatically, but in cases where this is not possible, the manual approval is a safeguard to ensure rogue code does not inadvertently find its way into production. The hybrid approach may be used in organizations that have a rigid change control process for production installations.
In this scenario, the manual step could be to enter the ticket number of a fully approved change ticket before the change is deployed. The following diagram highlights what an approval may look like in a sample GitHub Actions pipeline.
  • 34. Figure 2.2. GitHub Actions - Manual Approval

Having an approach for managing code deployment is vital to ensuring there are processes in place to automatically handle regional outages. Multiple developers can work on the overall process, and it can be refined over time as well as shared with other groups. These approaches play a small part in the overall process of achieving high availability.

2.7. Code Management

2.7.1. Single Developer Projects

In projects that involve just one developer, speed of development is often treated as a priority over following standards. It is effortless to develop code on a developer’s machine, ignore unit tests and deploy the same code from that machine. In cases where the developer is building a proof of concept for a larger design, this approach is justified. In larger projects intended for production use, ignoring standards could decrease the quality of the project, which over time may impact the product. Potential pitfalls that may be encountered by not following a set of standards include:
• No source control system in place:
  o Harder to onboard new developers to the project
  o Potential loss of code if the developer’s machine is lost, stolen or damaged
• No unit tests developed for the project:
  o Issues that could have been found and resolved with unit tests make their way to production
  • 35.
  o Potential to introduce defects with every release
  o Low level of confidence that a slight code change will not have a negative effect on the rest of the codebase
• Deploying code to production from a developer’s machine:
  o The process to deploy code to production is only known by one developer
  o Potential inconsistency in the artifact(s) deployed to production

In an article that focuses on software development for individual developers, the authors explain that standards which apply to team projects (MIDS in this case) can easily be applied in single developer projects without changing the core essence of the standard (de León-Sigg et al. 2018). It is important that standards are followed where possible. Adopting standards will not only improve the overall quality of the code delivered, but will also help to simplify the onboarding of new developers to the project. When multiple developers are on the project, the process of managing code changes will be simplified. In an article related to coding practices, the authors discuss techniques which can be used to improve code readability, which in turn help to devise the coding standards for a project (dos Santos and Gerosa 2018). Using these techniques will help to improve a project whilst also helping to move away from the single developer mindset.

2.7.2. Source Control

The importance of using source control for any project cannot be overstated. As discussed in the previous section, using source control helps to move a project away from the single developer mindset. A source control system makes collaboration amongst a team more straightforward. The collaboration benefits of using a source control tool like Git are evident in an article where the author discusses using Git to foster teamwork in the South African classroom (Blauw 2018).
Git can store code for small projects as well as large enterprise grade projects. It can be used for projects developed in any language and has many features such as branching and pull requests which can be used for developers collaborating on projects. When starting with git, it is important the team members decide on the branching strategy to follow.
  • 36.

2.7.2.1. Branching Model: GitFlow

GitFlow is a branching strategy that employs feature branches and multiple primary branches (Atlassian n.d.). GitFlow utilizes branches that are longer lived and contain larger commits. When using this strategy, developers can create feature branches and delay the merging of code into the main branch until the feature is fully implemented. A downside of long-lived feature branches is the increased collaboration required amongst developers to merge changes. It is also easy for developers to introduce conflicting updates. Refer to the diagram in Figure 2.3 for an overview of the GitFlow Branching Strategy.

Figure 2.3. GitFlow Branching Strategy.

GitFlow works best:
• For managing an open-source project, as all code must be checked in pull requests
• When there are mostly junior developers on the team, who can preview their changes on long-lived feature branches before merging into the main branch
• When the product being maintained is well established, as future changes are minimal and need to be monitored closely

Cases to avoid GitFlow are:
• When you are starting a project, as the pull request process can slow down the task of generating an MVP
• When you need to iterate quickly, as the pull request process can get in the way
  • 37.
• When there are mostly senior developers on the team, as they are trusted and should be given the autonomy to do their job

2.7.2.2. Branching Model: Trunk

Trunk based development is a source control branching model which allows developers to merge smaller, more frequent updates to the core main or trunk branch (paul-hammant n.d.). As the trunk-based approach streamlines the merging and integration phases, it helps bring about continuous integration and continuous deployment and increases the speed of software delivery. The diagram in Figure 2.4 gives an overview of the Trunk Based Branching Model.

Figure 2.4. Trunk Based Branching Model.

High-performing engineering teams use the trunk-based development strategy as it sets and maintains a simplified Git branching strategy for teams. It also gives teams flexibility and control over how and when software is delivered to customers.

Trunk based development works best (‘Trunk-based Development vs. Git Flow’ n.d.):
• When a project is just starting up, as it offers maximum development speed
• When you need to iterate quickly, as the trunk-based approach allows you to change the product quickly when required
• When there are mostly senior developers on the team

Cases to avoid the trunk-based approach are:
• When you run open-source projects, as those projects are more suited to GitFlow
• When the product is established, or you have large teams, as strict control is required. GitFlow is recommended for this scenario.
  • 38.
• When there are mostly junior developers on the team

2.7.3. DevOps Code Pipelines

When it comes to deploying code, it is imperative to implement a pipeline strategy. A pipeline takes the manual steps away from deploying code and replaces them with a repeatable process. A pipeline not only provides structure for deployments, it can also run code quality checks and various forms of tests, and drastically decrease the time required for a developer to deploy code to a production environment. A pipeline is necessary when implementing Continuous Integration and Continuous Deployment for a project. A pipeline that is executed regularly can supply a benchmark for improving the overall quality of the project. A pipeline can be treated like a code artifact which evolves over time. In a recently reviewed article, the authors highlight the importance of pipelines by stating they are mainly used for continuously executing steps to ensure an application is deployable at any time (Beetz and Harrer 2021). A pipeline can be basic at the beginning, with later iterations adding features for tasks such as code validation or running tests. Once a pipeline structure is in place, it can keep being extended throughout its lifetime.

There are benefits to implementing pipelines, but there are also challenges, including:
• Choosing the pipeline technology to use. There are various open-source as well as commercial pipeline options available, and choosing between them for a project can be difficult.
• The ramp-up time for developers to learn a particular pipeline technology or syntax needs to be factored in when deciding on the technology to use.
• Maintaining the pipeline infrastructure if a self-hosted pipeline technology is chosen • Managing the security for integrations (e.g., credentials for deploying to a cloud provider) • Finally, switching between pipeline options is not a trivial task and has the potential to introduce the need for re-work on the code pipelines. This research will show how a combination of DevOps processes and techniques will form grounding for building a highly available solution.
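The idea of a pipeline as a repeatable, ordered set of steps can be sketched in a few lines. This is a conceptual illustration only and does not represent the syntax of any particular pipeline technology: the step names and the `run_pipeline` function are hypothetical.

```python
from typing import Callable, List, Tuple

# A step is a name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first failure.

    Returns (succeeded, names_of_completed_steps).
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            # A failed step blocks everything after it, e.g. deployment.
            return False, completed
        completed.append(name)
    return True, completed


if __name__ == "__main__":
    # Hypothetical steps; real ones would invoke build and test tooling.
    result = run_pipeline([
        ("build", lambda: True),
        ("unit-tests", lambda: True),
        ("deploy", lambda: True),
    ])
    print(result)
```

Because the steps are plain data, extra stages such as code quality checks can be added to the list over time without changing how the pipeline runs, which mirrors the iterative growth of a pipeline described above.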
  • 39.

2.8. Cloud Application Architecture

This section examines the various application types that can be architected and developed as part of this research. A key aim of this research is to implement a solution for deploying a highly available application that will function in the event of sample test outages. It is this author's opinion that the best way of achieving this is to develop a 3-tier application. A 3-tier application consists of a presentation layer (frontend), an application layer (backend code) and a data layer (data storage). A benefit of a 3-tier application is the ability to foster the reuse of software components between different applications (Abdelrahman et al. 2020). Each tier has its own responsibilities.

The frontend is the gateway to the world. It contains the user interfaces used by end customers, with visual screens which simplify the process of interacting with the backend. There is a vast array of programming languages and frameworks available to develop frontend applications, with further technologies being developed on a regular basis.

The backend performs the heavy lifting for the application. It runs any business logic in response to events from the frontend. With the frontend being the gateway for customers, the backend is the gateway to the required data. The backend can be accessed by advanced users or systems using API calls, but for most users the interaction with the backend is via the frontend. Like the frontend, there is a vast array of technologies and frameworks which can be used to develop backend applications.

Finally, the data layer (database) is the most important part of any application. The data layer has the responsibility of storing the data which is accessed by the backend processes and is subsequently rendered to customers in the frontend.
Every application has its own data requirements, and these will be touched on in this section. There is a vast array of different technology choices which are available when it comes to the data layer. This research will expand on the 3-tier application and highlight key considerations that need to be implemented to make an application highly available.
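The separation of responsibilities between the three tiers can be illustrated with a minimal in-memory sketch. The class and method names are hypothetical and chosen for the example; in a deployed system each tier would be a separate service, and the data layer would be a real database.

```python
from typing import Dict, Optional


class DataLayer:
    """Data tier: stores and retrieves records (in memory for this sketch)."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}

    def save(self, key: int, value: str) -> None:
        self._rows[key] = value

    def load(self, key: int) -> Optional[str]:
        return self._rows.get(key)


class Backend:
    """Application tier: runs business logic and is the gateway to the data."""

    def __init__(self, data: DataLayer) -> None:
        self._data = data

    def create_item(self, item_id: int, name: str) -> None:
        cleaned = name.strip()
        if not cleaned:
            raise ValueError("name must not be empty")
        self._data.save(item_id, cleaned)

    def get_item(self, item_id: int) -> Optional[str]:
        return self._data.load(item_id)


class Frontend:
    """Presentation tier: renders backend results for the end customer."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend

    def render_item(self, item_id: int) -> str:
        name = self._backend.get_item(item_id)
        return f"Item {item_id}: {name if name is not None else 'not found'}"
```

Note that the frontend never touches the data layer directly. That boundary is what allows each tier to be deployed, scaled and failed over with its own strategy.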
  • 40.

2.8.1. Frontend Application

2.8.1.1. Monolithic Frontend

A monolithic frontend application is a feature-rich, powerful browser-based application which interacts with microservices in the backend. Over time the frontend layer grows and may be developed by separate teams. In this situation, the frontend application becomes more difficult to maintain as it grows and adds new functionality. The diagram in Figure 2.5 depicts a high-level architecture of a monolithic frontend for a Shop application.

Figure 2.5. Monolithic Frontend

As visible, there are multiple microservices in the backend but only one frontend application, which may be maintained and developed by multiple teams. The monolithic frontend is an anti-pattern which can emerge over time on a frontend project. Pavlenko et al. discuss how a frontend with a monolithic architectural style is difficult, and in some cases impossible, to scale (Pavlenko et al. 2020). Teams may still want to develop features concurrently, but this may not be possible in all cases. Pavlenko et al. argue that the use of micro-frontends is a solution to this problem.

2.8.1.2. Micro-frontend

A micro-frontend is a pattern where web application user interfaces are composed from independent fragments which may be built by different teams using a broad array of
  • 41. technologies. A micro-frontend architecture resembles a microservice backend architecture, where the backend is composed of independent microservices. Various approaches exist for splitting up functionality when implementing micro-frontends (Mezzalira 2021).

In the horizontal split approach, multiple micro-frontends can exist within the same UI view. Multiple teams are responsible for distinct parts of the view and must coordinate their efforts. This approach offers flexibility in that teams can share functionality, but teams need to be careful not to introduce unnecessary micro-frontends within the same project. This approach is suitable for large sites with an extensive feature set, such as shopping sites. One team could develop the catalogue while another team develops product recommendations.

The second approach is a vertical split, where the individual teams are accountable for a particular problem domain. In this approach, it is harder to share code between teams, but it allows flexibility in terms of deployments. These systems are developed as individual systems but branded with a company header and footer to give the appearance of the systems belonging together. This approach is suitable for systems such as company intranets, where different teams develop different intranet sites which all utilize a common theme such as colours and fonts.

The horizontal and vertical split approaches have the same end goal, which is to split the frontend code into smaller, more manageable chunks. Various teams can potentially work on different codebases. Benefits of using micro-frontends include:
• Micro sites are technology agnostic: teams can use different technologies
• The generated applications are independent and self-contained
• Multiple teams can work on distinct features
• Development and deployment of the individual micro-frontends may be faster
  • 42. The diagram in Figure 2.6 depicts how the code for micro-frontends can exist in the same source control repository or in different source control repositories.

Figure 2.6. Micro Frontend Architecture

The goal of the CI/CD pipeline process is to build the individual micro-frontends, which are then combined into the overall deployed frontend application.

2.8.1.3. Single Page Application (SPA)

A user interface that operates directly inside the browser and does not require a page reload when navigating between pages is referred to as a Single Page Application. This is achieved by the browser loading JavaScript chunks on page load which contain all the logic that the browser depends on. Any requests to the backend for data are made asynchronously using AJAX requests. Jadhav et al. discuss creating a single page application using AngularJS; however, they do not delve into topics such as server-side rendering or authentication (Jadhav et al. n.d.). The focus of their article is purely on getting started with developing a single page application. In an article on comparable topics, the authors delve further into areas such as performance and reuse of components, which are important for modern-day applications (Gavrilă et al. 2019).

2.8.2. Backend Application

2.8.2.1. Monolith

Like the frontend monolith, a backend monolith is essentially one project which contains all the business logic. All teams work on the same project and need to coordinate changes with each other. Any code change to one module in the monolith could influence the other services
  • 43. within the project. As the project grows, changes become harder to develop as well as to cover with automated testing. Routine maintenance of the project could become a more complex task for developers. When deploying monolithic backend applications to the cloud, the deployment options are limited due to the size of the artifact to deploy and other constraints within the application. Creating monolithic applications makes the development process more straightforward than creating microservices. Monolithic applications offer easier development and deployment options, but suffer from complex maintenance, weaker reliability and availability, and difficulty in scaling (Gos and Zabierowski 2020).

2.8.2.2. Microservice

A microservice architecture is one where an application is structured as a collection of smaller services that have the following attributes:
• Are highly testable and maintainable
• Have a more straightforward development process than monolithic applications
• Can be deployed independently of the other services
• Are loosely coupled
• Are focused on a business capability
• Are owned by a small team

In the world of agile, microservices are a great enabler for rapid, frequent, and reliable delivery of applications to a production environment. In cases where microservices need to communicate with each other, challenges can arise. In those scenarios, queues or message bus technologies can be used for asynchronous communication. For synchronous communication, HTTP calls with associated retries need to be implemented. There is a wide array of technologies which can be used for microservices, as well as steps for migrating to microservices (Larrucea et al. 2018). Larrucea et al. discuss the pitfalls of microservices, but it is this author's opinion that they could have discussed the complexities of managing highly available applications built from microservices.
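The synchronous call with associated retries mentioned above can be sketched as a small wrapper. This is an illustrative sketch under stated assumptions: the `call_with_retries` name, the exponential backoff policy and the choice to retry only on `ConnectionError` are assumptions for the example, and `request` stands in for whatever HTTP client call a service actually makes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(request: Callable[[], T],
                      attempts: int = 3,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `request`, retrying on ConnectionError with exponential backoff.

    `request` is a zero-argument callable wrapping the real HTTP call.
    `sleep` is injectable so tests do not have to wait in real time.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == attempts - 1:
                # Out of retries: surface the failure to the caller.
                raise
            # Wait 0.5s, 1s, 2s, ... (with the default base_delay).
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Retrying is only safe when the downstream operation is idempotent. For calls that are not, the asynchronous queue-based approach described above is usually the better fit.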