The age of cloud computing presents software architects with a unique set of opportunities and challenges. Designing, building, and operating applications at cloud scale has changed the very nature of the software architecture discipline, which must now accommodate a much larger set of objectives and skills. Prior to the cloud era, software architecture was primarily about fulfilling functional requirements while maintaining code modularity and meeting a narrow set of non-functional requirements, such as performance. The cloud-era architect needs to accommodate not only functional requirements and customer-defined throughput and performance requirements, but also a large set of non-functional requirements related to cybersecurity, compliance, and, most notably, financial and cost characteristics, which at cloud scale can make or break a software-as-a-service company. The discipline has become wider, to accommodate all of the above, and also deeper: cloud-scale architecture spans all layers of software down to operating system kernel tuning and, in the case of private cloud, also requires hardware know-how and hardware assembly design closely aligned with high-level application workload requirements to achieve reasonable performance and economics at cloud scale.
Software Architecture in the age of Cloud Computing
1. SOFTWARE ARCHITECTURE IN THE AGE OF CLOUD COMPUTING
JAROSLAV GERGIC
Industrial Keynote
16th European Conference on Software Architecture (ECSA), Prague, 19–23 September 2022
2. AGENDA
INTRODUCTION
CLOUD SCALE COMPUTING
ARCHITECTING CLOUD SCALE SAAS
CLOSING THOUGHTS
SUMMARY
3. JAROSLAV GERGIC
Always busy building the next big thing,
now living in the confluence of
cybersecurity, machine learning,
and cloud computing.
2022 – 1995: Cisco, GoodData, Ariba, IBM Research, Reuters, Mobil Server, LCS International
Mentoring: StartupYard, JIC, MSIC
4. CLOUD COMPUTING
LET’S DEFINE THE TERM
6. ENTERPRISE SCALE
is no longer the summit of software architecture
7. LET’S TALK CLOUD SCALE
“Cloud Computing ≠ Public Cloud”
8. LET’S TALK CLOUD SCALE
“Software as a Service (SaaS)”
9. HOW BIG IS CLOUD SCALE?
B2C
Serve millions or billions of users
• Facebook
• YouTube
• TikTok
• Seznam.cz
B2B
Serve (tens of) thousands of businesses
• Salesforce
• Dropbox*)
• Workday
• GoodData*)
10. B2B SAAS: Cloud scale is at least three orders of magnitude bigger than enterprise scale, because you need to serve thousands of enterprises.
11. PUBLIC CLOUD VS PRIVATE DC
Reverse Migration
• Both Dropbox and GoodData originally started on AWS
• As they grew, they sought to reduce costs
• GoodData migrated to Rackspace managed hosting in 2014
• Dropbox migrated to their own datacenters in 2016
12. PUBLIC CLOUD VS PRIVATE DC
Public Cloud
• developer productivity
• time to market
• smaller scale
• high-margin product
Private Datacenter
• operational costs
• steady state product
• extreme scale
• margins under pressure
13. CLOUD SCALE SAAS ARCHITECTURE
WHAT DOES IT TAKE TO ARCHITECT CLOUD SCALE SOFTWARE AS A SERVICE?
15. SCALABILITY
• Horizontal scaling
• Distributed computing
• Redundancy and Fault Tolerance
• Elastic workloads
16. SINGLE CAUSE OF FAILURE
(vs. Single Point of Failure)
17. SINGLE CAUSE OF FAILURE
• DNS issue
• credentials rotation
• kernel update
• networking issue
• Infrastructure-level configuration change
Beware of ubiquitous things, which seemingly always work fine!
18. AVOIDING THE PITFALLS
• avoid singletons at any cost*)
• always think of the blast radius when any component, service, or piece of underlying infrastructure fails
• pro tip: check out a service mesh such as Istio (https://istio.io/)
• it allows us to operate multiple interconnected K8S clusters
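One way to reason about blast radius is to count how many tenants a single failure takes down. The sketch below is illustrative only: the tenant-to-cluster mapping, the cluster names, and the `blast_radius` helper are assumptions made for this example and do not come from the talk.

```python
def blast_radius(tenant_to_cluster: dict[str, str], failed_clusters: set[str]) -> float:
    """Return the fraction of tenants served by any of the failed clusters."""
    if not tenant_to_cluster:
        return 0.0
    affected = sum(
        1 for cluster in tenant_to_cluster.values() if cluster in failed_clusters
    )
    return affected / len(tenant_to_cluster)


# Hypothetical placements: one shared singleton cluster vs. four independent clusters.
tenants = [f"tenant-{i}" for i in range(8)]
singleton = {t: "k8s-a" for t in tenants}
sharded = {t: f"k8s-{'abcd'[i % 4]}" for i, t in enumerate(tenants)}

print(blast_radius(singleton, {"k8s-a"}))  # 1.0: every tenant is affected
print(blast_radius(sharded, {"k8s-a"}))    # 0.25: a quarter of the tenants
```

With a singleton, one failure reaches every tenant; spreading tenants over several clusters caps the damage of any single cluster failure.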
*) there can be only one!
19. CAPACITY PLANNING
Why would I need to do capacity planning in a public cloud?
Isn't it elastic by design?
20. COSTS
• Gross Margin in SaaS
• Gross Margin = (Revenue – COGS)/Revenue
• COGS – Cost of Goods Sold
• What is COGS in SaaS?
• All costs needed to operate your SaaS offering.
• HW, SW, operations, support
• What is the benchmark Gross Margin in SaaS?
80%
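The gross margin formula above is easy to turn into a quick check. This is a minimal sketch; the function names and the dollar figures are illustrative assumptions, not data from a real company.

```python
def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue: (revenue - COGS) / revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cogs) / revenue


def max_cogs_for_margin(revenue: float, target_margin: float) -> float:
    """Largest COGS that still meets the target gross margin."""
    return revenue * (1.0 - target_margin)


# Illustrative numbers: $10 of revenue with $2 of COGS.
print(gross_margin(10.0, 2.0))  # 0.8, i.e. the 80% benchmark
```

Read the other way round, `max_cogs_for_margin` gives the budget that hardware, software, operations, and support together must fit into for a given revenue.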
21. COSTS
• Benchmark Gross Margin in SaaS: 80%
• Example: $10 of revenue against $2 of COGS
22. COST SAVINGS
STORIES FROM THE TRENCHES
23. LINUX KERNEL TUNING
Tuning low-level Linux kernel settings, such as huge pages and NUMA options, led to a 35–40% performance boost for the prevailing workloads.
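Before tuning anything, it helps to see what the kernel currently reports. The sketch below reads the huge page counters from /proc/meminfo on Linux. The helper names are made up for this example, and it only inspects settings rather than changing them; the slide does not say which values were tuned, so treat this as a starting point only.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def hugepage_summary(fields: dict[str, int]) -> dict[str, int]:
    """Summarize the explicit huge page pool (page size is reported in kB)."""
    total = fields.get("HugePages_Total", 0)
    free = fields.get("HugePages_Free", 0)
    size_kb = fields.get("Hugepagesize", 0)
    return {
        "total": total,
        "in_use": total - free,
        "pool_mib": total * size_kb // 1024,
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only; silently skipped elsewhere
        print(hugepage_summary(parse_meminfo(meminfo.read_text())))
```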
24. REGULAR EXPRESSIONS 101
Parsing input data at cloud scale…
On multiple occasions we hit performance issues with 3rd-party regex libraries in different programming languages.
The improvement was > 10x.
25. ELASTIC SCALING WITH SPOT INSTANCES
Use case:
• A stateful, compute- and memory-intensive workload driven by incoming telemetry flow.
Solution:
• Fleet of inexpensive spot instances coupled with an ML-based capacity predictor.
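The sizing logic behind such a fleet can be sketched in a few lines. Everything here is an assumption for illustration: a moving average stands in for the ML-based predictor, and the function names, the per-instance capacity, and the 25% headroom for reclaimed spot instances are hypothetical values, not figures from the talk.

```python
import math


def predict_next_load(history: list[float], window: int = 3) -> float:
    """Stand-in predictor: mean of the last `window` load samples."""
    if not history:
        raise ValueError("history must not be empty")
    recent = history[-window:]
    return sum(recent) / len(recent)


def required_instances(
    predicted_load: float,
    per_instance_capacity: float,
    interruption_headroom: float = 0.25,
) -> int:
    """Instances needed for the predicted load plus spare capacity
    to absorb spot instances being reclaimed by the provider."""
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    needed = predicted_load * (1.0 + interruption_headroom) / per_instance_capacity
    return max(1, math.ceil(needed))


# Hypothetical telemetry rates (events per second) from recent intervals.
load = predict_next_load([900.0, 1000.0, 1100.0])
print(load, required_instances(load, per_instance_capacity=100.0))  # 1000.0 13
```

Because the workload is stateful, the headroom matters: capacity must already be in place when an instance is reclaimed, rather than being added after the fact.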
26. SLAs
• SLAs – Service Level Agreements
• Uptime, Latency, Throughput
• Recovery Time/Point Objectives (RTO/RPO)
• Requires supporting infrastructure
• Monitoring – metrics, dashboards
• Logging – instrumentation,
troubleshooting, auditing
• Alerting – 24/7 reliable notification with
duty rotation and escalation paths
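An uptime commitment translates directly into a downtime budget, which is what monitoring and alerting have to defend. A minimal sketch follows; the function names, the 30-day period, and the 99.9% target are illustrative assumptions.

```python
def allowed_downtime_minutes(uptime_target: float, period_days: int = 30) -> float:
    """Downtime budget in minutes for an uptime target given as a fraction."""
    if not 0.0 < uptime_target <= 1.0:
        raise ValueError("uptime_target must be in (0, 1]")
    return (1.0 - uptime_target) * period_days * 24 * 60


def within_budget(
    observed_downtime_minutes: float, uptime_target: float, period_days: int = 30
) -> bool:
    """True if the observed downtime still fits the budget for the period."""
    return observed_downtime_minutes <= allowed_downtime_minutes(
        uptime_target, period_days
    )


# A 99.9% target over 30 days leaves roughly 43 minutes of downtime.
print(round(allowed_downtime_minutes(0.999), 1))
print(within_budget(30.0, 0.999), within_budget(60.0, 0.999))
```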
27. SECURITY & COMPLIANCE
Security
• Threat Modeling
• Vulnerability Management
• Access Controls
• Supply chain attack prevention
• Security Monitoring
Compliance
• SOC 2, ISO 27001, HIPAA, GDPR, Accessibility
• SOC – System and Organization Controls
• SOC 2 – Security, Availability, Processing Integrity, Confidentiality, or Privacy
• Objectives -> Controls -> Assessments
• PII protection
~30% of R&D effort
28. (DEVELOPER) PRODUCTIVITY
• Continuous Integration / Continuous
Delivery pipelines (CI/CD)
• Development, Testing and Release
Processes
• Quality Assurance, Cycle Time
• Making sure the above scale to many
R&D teams – avoiding bottlenecks.
30. WHAT NEXT?
NOW THAT I AM DONE ARCHITECTING AND BUILDING MY CLOUD SCALE SAAS OFFERING?
31. EVOLVING: PUBLIC CLOUD
• Periodically review and benchmark new
instance types.
• Review, evaluate and benchmark new
services provided by the vendor.
• Issue recommendations and develop
blueprints for R&D teams.
• Plan migration.
• Rinse & Repeat.
32. EVOLVING: PRIVATE DC
• Periodically perform capacity planning and
maintain HW order book based on up-to-date
predictions.
• Periodically review and benchmark new HW
generations. Negotiate prices with the
vendor(s).
• Issue recommendations and develop
blueprints for R&D teams.
• Plan migration.
• Rinse & Repeat.
33. START ME UP!
PRODUCTION WORKLOADS IN PUBLIC CLOUD
NEWCOMER GUIDE 2022 EDITION
34. WHERE TO START?
https://googlecloudcheatsheet.withgoogle.com/
35. PUBLIC CLOUDS IN 2022
• All three leading public cloud providers
(AWS, Azure, GCP) exhibit increasing
complexity.
• It is relatively easy to spin up proof of
concepts or play with technologies.
• But launching production workloads is a
whole different story. It is not just about
getting started, but also about doing
things right.
36. SOFTWARE ARCHITECTURE IN THE AGE OF CLOUD COMPUTING
The cloud-era software architect needs to accommodate not only functional requirements and customer-defined throughput and performance requirements at cloud scale, but also a large set of non-functional requirements related to cybersecurity, compliance, developer productivity, and, most notably, financial and cost characteristics, which at cloud scale can make or break a software-as-a-service company. The role of a software architect has thus become interdisciplinary by nature, and the architect's mental model needs to accommodate all of the above non-functional aspects while maintaining a full picture across all layers of the software stack, from user-facing features all the way down to the operating system and the underlying hardware platform.
Interdisciplinary “decathlon”
37. OUTLOOK
SIMPLIFICATION
There are vast opportunities for simplification
and codification of best practices as the cloud
computing industry matures.
HYBRID CLOUD
Hybrid cloud deployments will become more
prevalent to protect margins.
39. THANK YOU
Jaroslav Gergic
@jgergic jaroslavgergic
https://cognitive.cisco.com/
https://conf.researchr.org/home/ecsa-2022