Architecting Distributed Cloud Applications
Jeffrey Richter, Software Architect (Azure), Microsoft
Jeffrey Richter: Microsoft Azure Software Architect,
Wintellect Co-Founder, & Author
JeffreyR@Microsoft.com
www.linkedin.com/in/JeffRichter
@JeffRichter
 We must do things differently when building
cost-effective, failure-resilient solutions
Why cloud apps?
 | Past | Present
Clients | Enterprise/Intranet | Public/Internet
Demand | Stable (small) | Dynamic (small → massive)
Datacenter | Single tenant | Multi-tenant
Operations | People (expensive) | Automation (cheap)
Scale | Up via few reliable (expensive) PCs | Out via lots of (cheap) commodity PCs
Failure | Unlikely but possible | Very likely
Machine loss | Catastrophic | Normal (no big deal)

Examples | Past | Present
Exceptions | Catch, swallow & keep running | Crash & restart
Communication | In order, exactly once | Out of order; clients must retry & servers must be idempotent
 Some reasons why a service instance may fail (stop)
 Developer: Unhandled exception
 DevOps: Scaling the number of service instances down
 DevOps: Updating service code to a new version
 Orchestrator: Moving service code from one machine to another
 Force majeure: Hardware failure (power supply, fans [overheating], hard disk,
network controller, router, bad network cable, etc.)
 Force majeure: Data center outages (natural disasters, attacks)
 Since failure is inevitable & unavoidable, embrace it
 Architect assuming failures will happen
 Operate services using infrastructure that avoids single points of failure
 Run multiple instances of services, replicate data, etc.
Cloud computing is all about embracing failure
 Infrastructure/Platform/Containers/Functions as a Service
 Manage lifecycle, health, scaling, & upgrades for PC/VM, networking & service code
Orchestrators
[Diagram: Region with a Load Balancer and six PC/VMs]
Applications consist of many (micro)services
[Diagram: E-Commerce Application with a Load Balancer; Web Site #1-#3; Orders #1-#4; Inventory #1-#2]
Each service solves a domain-specific problem & has exclusive access to its own data store
4 reasons to split a monolith into microservices
[Diagram labels: multiple Photo Share Service and Thumbnail Service instances; Photo Share Service (SharedLib-v1) and Thumbnail (SharedLib-v7); Photo Share Service (node.js) and Thumbnail Service (.NET); Photo Share Service (V1) and Video Share Service (V1) with Thumbnail Service V1 and V2]
Backward compatibility must be maintained
 Myth: Microservices offer small,
easy-to-understand/manage code bases
 A monolith can use OOP & libraries (requires developer discipline)
 Library changes cause build failures (not runtime failures)
 Myth: A failing service doesn’t impact other services
 Many services require dependencies be fully functioning
 Hard to write/test code that gracefully recovers when dependency fails
 We run multiple service instances so there is no such thing as “failure”
 A monolith is up/down completely; no recovery code
 Infrastructure restarts failed instances keeping them up
Microservice architecture benefits myths
Composing SLAs for dependent services
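[Diagram: Service-A, Service-B, Service-C, Service-D]
Dependent services in chain | Each service at 99.995% | Each service at 99.999%
1 | 99.995% (132s/month) | 99.999% (26s/month)
2 | 99.990% (264s/month) | 99.998% (52s/month)
3 | 99.985% (396s/month) | 99.997% (78s/month)
4 | 99.980% (528s/month) | 99.996% (104s/month)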
What about the network’s SLA?
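The composite figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming each service fails independently and that a request succeeds only when every service in the chain is up. The constant SECONDS_PER_MONTH is an assumption inferred from the slide's numbers (264 s at 99.990%), and both function names are illustrative.

```python
from math import prod

# Assumption: the slide's figures (264 s of downtime at 99.990%) imply
# roughly 2.64 million seconds per month.
SECONDS_PER_MONTH = 2_640_000


def composite_availability(availabilities):
    """Availability of a call chain that needs every listed service to be up.

    Assumes the services fail independently of one another.
    """
    return prod(availabilities)


def downtime_seconds_per_month(availability):
    """Expected unavailable seconds per month for a given availability."""
    return (1.0 - availability) * SECONDS_PER_MONTH


if __name__ == "__main__":
    # Chains of 1 to 4 services, each with a 99.995% SLA.
    for count in range(1, 5):
        a = composite_availability([0.99995] * count)
        print(f"{count} service(s): {a:.5%} "
              f"(~{downtime_seconds_per_month(a):.0f}s/month)")
```

The network between the services could be modeled as one more entry in the list, which lowers the composite figure further.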
http://12factor.net
12-Factor Services (Apps)
1. Single root repo; don’t share code with another service
2. Deploy dependent libs with service
3. No config in code; read from environment vars
4. Handle unresponsive service dependencies robustly
5. Strictly separate build, release, & run steps
 Build: Builds a version of the code repo & gathers dependencies
 Release: Combines build with config → ReleaseId (immutable)
 Run: Runs service in execution environment
12-factor services (1-5)
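Factor 3 (no config in code; read from environment vars) might look like the sketch below. The variable names ORDERS_DB_URL and REQUEST_TIMEOUT_S and the Settings type are hypothetical, chosen only for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_url: str
    request_timeout_s: float


def load_settings(env=os.environ):
    """Build settings from environment variables (names are hypothetical).

    A missing required variable raises KeyError, so a misconfigured
    release fails at startup instead of at first use.
    """
    db_url = env["ORDERS_DB_URL"]                        # required
    timeout = float(env.get("REQUEST_TIMEOUT_S", "5"))   # optional, default 5 s
    return Settings(db_url=db_url, request_timeout_s=timeout)
```

Because the same build reads different values in dev, staging, and prod, the build artifact itself stays identical across environments.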
6. Service is 1+ stateless processes & shares nothing
7. Service listens on ports; avoid using (web) hosts
8. Use processes for isolation; multiple for concurrency
9. Processes can crash/be killed quickly & start fast
10. Keep dev, staging, & prod environments similar
11. Log to stdout (dev=console; prod=file & archive it)
12. Deploy & run admin tasks (scripts) as processes
12-factor services (6-12)
Networking Communication
8 fallacies of distributed computing
http://www.rgoarchitects.com/Files/fallacies.pdf
Fallacy | Effect
The network is reliable | App needs error handling/retry
Latency is zero | App must restrict its traffic
Bandwidth is infinite | App must restrict its traffic
The network is secure | App must secure its data/authenticate servers
Topology doesn't change | Changes affect latency & bandwidth
There is one administrator | Changes affect ability to reach destination
Transport cost is zero | Costs must be budgeted
The network is homogeneous | Affects reliability, latency, & bandwidth
 We run multiple instances of a service
 For service failure/recovery & scale up/down
 So, instances’ endpoints dynamically change over the service’s lifetime
 Ideally, we’d like to abstract this from client code
 Each client wants a single stable endpoint as the face of the
dynamically-changing service instance endpoints
 Typically, this is accomplished via a reverse proxy
 NOTE: Every request goes through the RP; causes an extra network hop
 We’re losing some performance to gain a lot of benefits
 Client uses DNS (at well-known static endpoint) to get RP’s stable endpoint
 DNS endpoints are usually cached & re-resolved infrequently
Service high-availability & scalability
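The routing role of a reverse proxy can be sketched as follows: clients only ever see the proxy, while the set of instance endpoints behind it changes over time. This is an illustrative in-memory model, not a real proxy; the class name, its methods, and the round-robin policy are assumptions.

```python
import itertools
import threading


class ReverseProxyRouter:
    """Tracks the current instance endpoints and picks one per request."""

    def __init__(self):
        self._lock = threading.Lock()
        self._endpoints = []
        self._counter = itertools.count()

    def register(self, endpoint):
        """Called when an instance starts (scale up, restart, move)."""
        with self._lock:
            if endpoint not in self._endpoints:
                self._endpoints.append(endpoint)

    def deregister(self, endpoint):
        """Called when an instance stops (failure, scale down, upgrade)."""
        with self._lock:
            if endpoint in self._endpoints:
                self._endpoints.remove(endpoint)

    def pick(self):
        """Round-robin over whatever instances are registered right now."""
        with self._lock:
            if not self._endpoints:
                raise LookupError("no instances registered")
            index = next(self._counter) % len(self._endpoints)
            return self._endpoints[index]
```

A client keeps calling the proxy's stable address; only the router's internal list changes as instances come and go.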
Forward & reverse proxies
[Diagram: Client-1, Client-2; (Forward) Proxy; Reverse Proxy; Server-1, Server-2]
Cluster DNS & service reverse proxy
[Diagram: Load Balancer; Web Site #1-#3; Inventory #1-#3; Orders #1-#2]
⚠ WS #1 could fail before I #3 replies
 Comparing an in-process call to a network request
 Performance: Worse, increases network congestion, unpredictable
 Unreliable: Requires retry loops with exponential backoff/circuit breakers
 Server code must be idempotent
 Security: Requires authentication, authorization, & encryption
 Diagnostics: network issues, perf counters/events/logs, causality/call stacks
Turning a monolith into a microservice
IntelliSense, refactoring & compile-time type-safety)
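A retry loop with exponential backoff, as mentioned in the reliability bullet, could be sketched like this. TransientError, call_with_retry, and the default delays are hypothetical; the sketch also adds random jitter, which the slide does not mention. Retrying is only safe when the server operation is idempotent.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retry(operation, max_attempts=5, base_delay_s=0.1,
                    max_delay_s=5.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with backoff.

    The delay ceiling doubles after each failed attempt (0.1 s, 0.2 s,
    0.4 s, ...) up to max_delay_s; the actual wait is a random value
    below that ceiling so many clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            ceiling = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

The sleep parameter exists so tests can replace time.sleep; a circuit breaker would wrap this loop and stop calling the dependency entirely after repeated failures.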
 Define explicit, formal cross-language API/data contracts
 “Contracts” defined via code do not work; do not do this
 Ex: DateTime can be null in Java but not in .NET
 Use cross-language data transfer formats
 Ex: JSON/XML, Avro, Protocol Buffers, FlatBuffers, Thrift, Bond, etc.
 Consider embedding a version number in the data structure
 Optional: (De)serialize data into language-specific types
 Beware of RAM/CPU costs with this; keep types “disposable” (not contracts)
Defining network API contracts
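One way to picture a versioned, cross-language contract is a JSON document with an embedded version number that is deserialized into a disposable language-specific type. The field names (version, id, takenAt), the Photo type, and the two versions are hypothetical examples; dates travel as ISO 8601 strings so no language-specific date type leaks into the contract.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

SUPPORTED_VERSIONS = {1, 2}  # hypothetical: v2 added the optional takenAt field


@dataclass
class Photo:
    """Disposable in-memory type; the JSON document is the contract."""
    id: str
    taken_at: Optional[datetime] = None


def serialize(photo: Photo) -> str:
    """Write the current (v2) wire format."""
    taken_at = photo.taken_at.isoformat() if photo.taken_at else None
    return json.dumps({"version": 2, "id": photo.id, "takenAt": taken_at})


def deserialize(payload: str) -> Photo:
    """Read any supported version; reject unknown versions explicitly."""
    doc = json.loads(payload)
    version = doc.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported contract version: {version!r}")
    raw = doc.get("takenAt") if version >= 2 else None
    taken_at = datetime.fromisoformat(raw) if raw else None
    return Photo(id=doc["id"], taken_at=taken_at)


if __name__ == "__main__":
    p = Photo("p1", datetime(2018, 3, 31, 12, 0, tzinfo=timezone.utc))
    print(deserialize(serialize(p)))
```

Reading old versions while writing the newest one is one way to keep backward compatibility during a rolling upgrade.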
 Technologies try to map a method call to a network request
 Examples: RPC, RMI, CORBA, DCOM, WCF, etc.
 These frequently don’t work well due to
 Network fallacies (lack of retry/circuit breaker)
 Language-specific data type conversions (ex: dates, times, durations)
 Versioning: Which version to call on the server?
 Authentication: expiring tokens
 Logging: Log request parameters/headers/payload, reply headers/payload?
Beware leaky RPC-like abstractions
http://ReactiveManifesto.org/
Messaging Communication
 The request/reply pattern is frequently not the best
 Client sends to server but selected server may be busy; other server may be idle
 Client may crash/scale down/reconfigure while waiting for server’s reply
 So, consider messaging communication instead
 Resource efficient
 Client doesn’t wait for server reply (no blocked threads/long-lived locks)
 Idle consumers pull work vs busy consumer pushed more work
 Consumers don’t need listening endpoints; producers talk to queue service
 Resilient: Producer/consumer instances can come, go, and move at will
 If consumer fails, another consumer processes the message (1+ delivery, not ordered)
 Consumers/producers can be offline without message loss
 Elastic: Use queue length to determine need to scale up/down
Messaging communication
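Two of the points above can be sketched in code: a consumer that tolerates 1+ delivery by remembering processed message IDs, and a scaling decision driven by queue length. Python's in-process queue.Queue stands in for a real queue service here, and the message shape (id, body), the function names, and the thresholds are all assumptions.

```python
import math
import queue


def consume_all(q, processed_ids, handler):
    """Drain the queue, applying handler at most once per message ID.

    With 1+ delivery the same message can arrive more than once, so the
    consumer skips IDs it has already handled. An ID is recorded only
    after the handler returns, so a handler failure leaves it unrecorded.
    """
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return
        if message["id"] not in processed_ids:
            handler(message["body"])
            processed_ids.add(message["id"])
        q.task_done()


def desired_consumers(queue_length, per_consumer=100,
                      min_consumers=1, max_consumers=10):
    """Pick a consumer count from the backlog size (thresholds are made up)."""
    wanted = math.ceil(queue_length / per_consumer)
    return max(min_consumers, min(max_consumers, wanted))
```

In a real deployment the set of processed IDs would have to live in shared, durable storage, since any consumer instance may receive the redelivered message.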
Messaging with queues
[Diagram labels: Load Balancer; WebSite #1-#3; Service-A #1-#3; Service-B #1-#2]
🛈 Request/reply isn’t required; Service-B #1 could post to Q-WS1; not to Q-A
🛈 All Service-A instances could go down; but not WebSite #1
Storing State
 Building reliable & scalable services that manage state is
substantially harder than building stateless services
 Due to data size/speed, partitioning, replication, consistency, disaster recovery,
backup/restore, costs, administration, security, etc.
 Because of this, most devs do not build their own stateful
services; they use a robust/hardened service instead
 When selecting a stateful service, you must fully understand your service’s
requirements and understand the trade-offs when comparing available services
 It is common to use multiple stateful services within a single solution
Stateful service considerations
 The most frequently-used stateful service
 Used for documents, images, audio, video, etc.
 Fast & inexpensive: GB/month storage, I/O requests, and egress bytes
 All cloud providers offer a file storage service
 No lock-in: It’s relatively easy to move files across providers if you avoid
provider-specific features
 File storage services offer public (read-only) access
 Send clients file URLs for them to access; reduces load on your other services!
 Use a Content Delivery Network (CDN) to improve performance even more
Files (blobs & objects) storage services
 Store many small related entities
 Common: query, joins, indexing, sorting, stored proc, viewers/editors, etc.
 As data increases, relational DBs (SQL) require expensive
hardware to address size & performance
 ACID goal: give impression that 1 thing at a time is happening no matter how
complex the work (looks like a single PC)
 NonRel-DBs (noSQL) spread data across many cheap PCs
 For customer preferences, shopping carts, product catalogs, session state, etc.
 Con: Can’t easily access all data (no sort/join); many are eventually consistent
 Pro: Cheaper & have flexible data models (entity ≈ in-memory object)
 Rel-DBs & NonRel-DBs will co-exist for years to come
DB storage services
Relational DB vs non-relational DB: speed, size, simplicity, & price
[Diagram labels: Relational Database (1 partition) with Service #1-#5: complex CRUD, joins, sorts, stored procs, X-table txns. Non-Relational Database (Partition #1-#3) with Service #1-#5: simple CRUD; joins, sorts, etc.]
 Data is partitioned for size, speed, or both
 Architecting a service’s partitions is often the hardest part of designing a service
 X-partition ops require network hops & different/distributed transactions
 How many partitions depends on how much data you’ll have in the future
 And how you intend to access that data
 Each partition’s data is replicated for reliability
 Replicating state increases chance of data surviving 1+ simultaneous failures
 But, more replicas increase cost & network latency to sync replicas
 For some scenarios, data loss is OK
 Replicas go across fault/update domains; avoids single point of failure
Data partitioning & replicas
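A common way to assign data to partitions is to hash a partition key. The sketch below is one simple scheme under stated assumptions (SHA-256 as the stable hash, modulo as the mapping, a hypothetical partition_for helper); real stores often use other schemes such as range or consistent hashing.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a partition number in [0, partition_count).

    A cryptographic hash is used because it is stable across processes
    and machines; Python's built-in hash() of a str is randomized per
    process and would send the same key to different partitions.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


if __name__ == "__main__":
    for customer in ("customer-1", "customer-2", "customer-3"):
        print(customer, "->", partition_for(customer, 3))
```

With a plain modulo, changing partition_count remaps most keys, which is one reason the partition count has to be planned around future data size.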
 CAP theorem states
 When facing a network Partition (replicas can’t talk to each other):
 You can maintain Consistency by not allowing writes (loss of availability)
 You can maintain Availability by not replicating data (loss of consistency)
 Strong: all replicas see same data at same time
 Done via distributed transactions/locks coordinated across replicas
 Weak: replicas see different data at a moment in time but
eventually see the same data
 There are many factors pushing us towards weak consistency
 Txs rarely work across DBs & each microservice selects its own DB
 Caches improve perf by copying data which is out of sync with the truth
 CQRS pattern: writes data asynchronously but reads data synchronously
Data consistency
A cache can improve performance but introduces stale (inconsistent) data
[Diagram: Load Balancer; Stateless Web; Stateless Compute; Cache; Stateful Data; Other Internal Tiers]
 Concurrency control
 Pessimistic: accessor locks 1+ entries (blocking other accessors), modifies entries,
& then unlock them (unblocking another accessor)
 Bad scalability (1 accessor at a time) & what if locker fails to release the lock?
 Optimistic: accessor gets 1+ entries/version IDs, modifies entries if IDs haven’t
changed (contains the read value)
 Data schema versioning (without downtime)
 Backup & restore: needed due to app bug/hacker
 Recovery Point Objective (RPO): Max data (minutes) business can afford to lose
 Recovery Time Objective (RTO): Max downtime business can afford while data is restored
 NOTE: smaller RPO/RTO increases costs
Other DB concerns
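Optimistic concurrency can be illustrated with a tiny in-memory store: a writer passes the version ID it read, and the write is rejected if the entry has changed since. VersionedStore and VersionConflict are hypothetical names, and the sketch is single-threaded; a real store must perform the compare and the write as one atomic step.

```python
class VersionConflict(Exception):
    """Raised when an entry changed after the caller read it."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); a missing key reads as version 0."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Write only if the entry still has the version the caller read."""
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}")
        new_version = current_version + 1
        self._data[key] = (new_version, value)
        return new_version
```

On a conflict the caller re-reads the entry, reapplies its change to the fresh value, and tries again; no lock is held in between, so a failed accessor cannot block the others.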
Questions?

Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
MayankTawar1
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Hivelance Technology
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 

Recently uploaded (20)

BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 

Jeffrey Richter

  • 1. Architecting Distributed Cloud Applications Jeffrey Richter Software Architect (Azure) Microsoft
  • 2. Jeffrey Richter: Microsoft Azure Software Architect, Wintellect Co-Founder, & Author JeffreyR@Microsoft.com www.linkedin.com/in/JeffRichter @JeffRichter
  • 3. Why cloud apps? We must do things differently when building cost-effective, failure-resilient solutions.
                     Past                                  Present
      Clients        Enterprise/Intranet                   Public/Internet
      Demand         Stable (small)                        Dynamic (small to massive)
      Datacenter     Single tenant                         Multi-tenant
      Operations     People (expensive)                    Automation (cheap)
      Scale          Up via few reliable (expensive) PCs   Out via lots of (cheap) commodity PCs
      Failure        Unlikely but possible                 Very likely
      Machine loss   Catastrophic                          Normal (no big deal)

      Examples       Past                                  Present
      Exceptions     Catch, swallow & keep running         Crash & restart
      Communication  In order; exactly once                Out of order; clients must retry & servers must be idempotent
  • 4. Cloud computing is all about embracing failure
      • Some reasons why a service instance may fail (stop)
          • Developer: Unhandled exception
          • DevOps: Scaling the number of service instances down
          • DevOps: Updating service code to a new version
          • Orchestrator: Moving service code from one machine to another
          • Force majeure: Hardware failure (power supply, fans [overheating], hard disk, network controller, router, bad network cable, etc.)
          • Force majeure: Data center outages (natural disasters, attacks)
      • Since failure is inevitable & unavoidable, embrace it
          • Architect assuming failures will happen
          • Operate services using infrastructure that avoids single points of failure
          • Run multiple instances of services, replicate data, etc.
  • 5. Orchestrators
      • Infrastructure/Platform/Containers/Functions as a Service
      • Manage lifecycle, health, scaling, & upgrades for PC/VM, networking & service code
      [Diagram labels: Region; Load Balancer; PC/VM (six nodes)]
  • 6. Applications consist of many (micro)services
      • Each service solves a domain-specific problem & has exclusive access to its own data store
      [Diagram labels: E-Commerce Application; Load Balancer; Web Site #1 to #3; Orders #1 to #4; Inventory #1 to #2]
  • 7. 4 reasons to split a monolith into microservices
      [Diagram labels: Photo Share Service and Thumbnail Service (multiple instances); SharedLib-v1 and SharedLib-v7; node.js and .NET; Photo Share Service (V1), Video Share Service (V1), Thumbnail Service V1 and V2]
      • Backward compatibility must be maintained
  • 8. Microservice architecture benefits myths
      • Myth: Microservices offer small, easy-to-understand/manage code bases
          • A monolith can use OOP & libraries (requires developer discipline)
          • Library changes cause build failures (not runtime failures)
      • Myth: A failing service doesn’t impact other services
          • Many services require dependencies be fully functioning
          • Hard to write/test code that gracefully recovers when dependency fails
          • We run multiple service instances so there is no such thing as “failure”
          • A monolith is up/down completely; no recovery code
          • Infrastructure restarts failed instances keeping them up
  • 9. Composing SLAs for dependent services
      • Services shown: Service-A, Service-B, Service-C, Service-D
      • Availability (downtime per month), in slide order: 99.990% (264 s), 99.998% (52 s), 99.985% (396 s), 99.997% (78 s), 99.980% (528 s), 99.996% (104 s), 99.995% (132 s), 99.999% (26 s)
      • What about the network’s SLA?
  • 11. 12-factor services (1-5)
      1. Single root repo; don’t share code with another service
      2. Deploy dependent libs with service
      3. No config in code; read from environment vars
      4. Handle unresponsive service dependencies robustly
      5. Strictly separate build, release, & run steps
          • Build: Builds a version of the code repo & gathers dependencies
          • Release: Combines build with config into a ReleaseId (immutable)
          • Run: Runs service in execution environment
  • 12. 12-factor services (6-12)
      6. Service is 1+ stateless processes & shares nothing
      7. Service listens on ports; avoid using (web) hosts
      8. Use processes for isolation; multiple for concurrency
      9. Processes can crash/be killed quickly & start fast
      10. Keep dev, staging, & prod environments similar
      11. Log to stdout (dev=console; prod=file & archive it)
      12. Deploy & run admin tasks (scripts) as processes
  • 14. 8 fallacies of distributed computing (http://www.rgoarchitects.com/Files/fallacies.pdf)
      Fallacy                       Effect
      The network is reliable       App needs error handling/retry
      Latency is zero               App must restrict its traffic
      Bandwidth is infinite         App must restrict its traffic
      The network is secure         App must secure its data/authenticate servers
      Topology doesn't change       Changes affect latency & bandwidth
      There is one administrator    Changes affect ability to reach destination
      Transport cost is zero        Costs must be budgeted
      The network is homogeneous    Affects reliability, latency, & bandwidth
  • 15. Service high-availability & scalability
      • We run multiple instances of a service
          • For service failure/recovery & scale up/down
          • So, instances’ endpoints dynamically change over the service’s lifetime
      • Ideally, we’d like to abstract this from client code
          • Each client wants a single stable endpoint as the face of the dynamically-changing service instance endpoints
      • Typically, this is accomplished via a reverse proxy (RP)
          • NOTE: Every request goes through the RP; causes an extra network hop
          • We’re losing some performance to gain a lot of benefits
      • Client uses DNS (at well-known static endpoint) to get RP’s stable endpoint
          • DNS endpoints are usually cached & re-resolved infrequently
  • 16. Forward & reverse proxies
      [Diagram labels: Client-1; Client-2; (Forward) Proxy; Reverse Proxy; Server-1; Server-2]
  • 17. Cluster DNS & service reverse proxy
      [Diagram labels: Load Balancer; Web Site #1 to #3; Inventory #1 to #3; Orders #1 to #2]
      • ⚠ Web Site #1 could fail before Inventory #3 replies ⚠
  • 18. Turning a monolith into a microservice
      • Comparing an in-process call to a network request
          • Performance: Worse, increases network congestion, unpredictable
          • Unreliable: Requires retry loops with exponential backoff/circuit breakers
              • Server code must be idempotent
          • Security: Requires authentication, authorization, & encryption
          • Diagnostics: network issues, perf counters/events/logs, causality/call stacks
      • [...] IntelliSense, refactoring & compile-time type-safety)
  • 19. 4 reasons to split a monolith into microservices
      [Diagram labels: Photo Share Service and Thumbnail Service (multiple instances); SharedLib-v1 and SharedLib-v7; node.js and .NET; Photo Share Service (V1), Video Share Service (V1), Thumbnail Service V1 and V2]
      • Backward compatibility must be maintained
  • 20. Defining network API contracts
      • Define explicit, formal cross-language API/data contracts
          • “Contracts” defined via code do not work; do not do this
              • Ex: DateTime can be null in Java but not in .NET
      • Use cross-language data transfer formats
          • Ex: JSON/XML, Avro, Protocol Buffers, FlatBuffers, Thrift, Bond, etc.
          • Consider embedding a version number in the data structure
      • Optional: (De)serialize data into language-specific types
          • Beware of RAM/CPU costs with this; keep types “disposable” (not contracts)
  • 21. Beware leaky RPC-like abstractions
      • Technologies try to map a method call to a network request
          • Examples: RPC, RMI, CORBA, DCOM, WCF, etc.
      • These frequently don’t work well due to
          • Network fallacies (lack of retry/circuit breaker)
          • Language-specific data type conversions (ex: dates, times, durations)
          • Versioning: Which version to call on the server?
          • Authentication: expiring tokens
          • Logging: Log request parameters/headers/payload, reply headers/payload?
  • 23. Messaging communication
      • The request/reply pattern is frequently not the best
          • Client sends to server but selected server may be busy; other server may be idle
          • Client may crash/scale down/reconfigure while waiting for server’s reply
      • So, consider messaging communication instead
          • Resource efficient
              • Client doesn’t wait for server reply (no blocked threads/long-lived locks)
              • Idle consumers pull work vs busy consumer pushed more work
              • Consumers don’t need listening endpoints; producers talk to queue service
          • Resilient: Producer/consumer instances can come, go, and move at will
              • If consumer fails, another consumer processes the message (1+ delivery, not ordered)
              • Consumers/producers can be offline without message loss
          • Elastic: Use queue length to determine need to scale up/down
  • 24. Messaging with queues
      [Diagram labels: Load Balancer; WebSite #1 to #3; Service-A #1 to #3; Service-B #1 to #2]
      • 🛈 Request/reply isn’t required; Service-B #1 could post to Q-WS1; not to Q-A
      • 🛈 All Service-A instances could go down; but not WebSite #1
  • 26. Stateful service considerations
      • Building reliable & scalable services that manage state is substantially harder than building stateless services
          • Due to data size/speed, partitioning, replication, consistency, disaster recovery, backup/restore, costs, administration, security, etc.
      • Because of this, most devs do not build their own stateful services; they use a robust/hardened service instead
          • When selecting a stateful service, you must fully understand your service’s requirements and understand the trade-offs when comparing available services
          • It is common to use multiple stateful services within a single solution
  • 27. Files (blobs & objects) storage services
      • The most frequently-used stateful service
          • Used for documents, images, audio, video, etc.
          • Fast & inexpensive: GB/month storage, I/O requests, and egress bytes
      • All cloud providers offer a file storage service
          • No lock-in: It’s relatively easy to move files across providers if you avoid provider-specific features
      • File storage services offer public (read-only) access
          • Send clients file URLs for them to access; reduces load on your other services!
          • Use a Content Delivery Network (CDN) to improve performance even more
  • 28. DB storage services
      • Store many small related entities
          • Common: query, joins, indexing, sorting, stored proc, viewers/editors, etc.
      • As data increases, relational DBs (SQL) require expensive hardware to address size & performance
          • ACID goal: give impression that 1 thing at a time is happening no matter how complex the work (looks like a single PC)
      • NonRel-DBs (NoSQL) spread data across many cheap PCs
          • For customer preferences, shopping carts, product catalogs, session state, etc.
          • Con: Can’t easily access all data (no sort/join); many are eventually consistent
          • Pro: Cheaper & have flexible data models (entity ≈ in-memory object)
      • Rel-DBs & NonRel-DBs will co-exist for years to come
  • 29. Relational DB vs non-relational DB: speed, size, simplicity, & price
      [Diagram labels: Relational Database (1 partition); Non-Relational Database (Partition #1 to #3); Service #1 to #5 shown against each database; captions: Simple CRUD; Joins, sorts, etc.; Complex CRUD, joins, sorts, stored procs, X-table txns]
  • 30. Data partitioning & replicas
      • Data is partitioned for size, speed, or both
          • Architecting a service’s partitions is often the hardest part of designing a service
          • X-partition ops require network hops & different/distributed transactions
          • How many partitions depends on how much data you’ll have in the future
          • And how you intend to access that data
      • Each partition’s data is replicated for reliability
          • Replicating state increases chance of data surviving 1+ simultaneous failures
          • But, more replicas increase cost & network latency to sync replicas
          • For some scenarios, data loss is OK
          • Replicas go across fault/update domains; avoids single point of failure
  • 31. Data consistency
      • CAP theorem states
          • When facing a network Partition (replicas can’t talk to each other):
              • You can maintain Consistency by not allowing writes (loss of availability)
              • You can maintain Availability by not replicating data (loss of consistency)
      • Strong: all replicas see same data at same time
          • Done via distributed transactions/locks coordinated across replicas
      • Weak: replicas see different data at a moment in time but eventually see the same data
      • There are many factors pushing us towards weak consistency
          • Txs rarely work across DBs & each microservice selects its own DB
          • Caches improve perf by copying data which is out of sync with the truth
          • CQRS pattern: writes data asynchronously but reads data synchronously
  • 32. A cache can improve performance but introduces stale (inconsistent) data
      [Diagram labels: Load Balancer; Stateless Web; Stateless Compute; Cache; Other Internal Tiers; Stateful Data]
  • 33. Other DB concerns
      • Concurrency control
          • Pessimistic: accessor locks 1+ entries (blocking other accessors), modifies entries, & then unlocks them (unblocking another accessor)
              • Bad scalability (1 accessor at a time) & what if locker fails to release the lock?
          • Optimistic: accessor gets 1+ entries/version IDs, modifies entries if IDs haven’t changed (contains the read value)
      • Data schema versioning (without downtime)
      • Backup & Restore: needed due to app bug/hacker
          • Recovery Point Objective (RPO): Max data (minutes) business can afford to lose
          • Recovery Time Objective (RTO): Max downtime business can afford to restore data
          • NOTE: smaller RPO/RTO increases costs
  • 34. Questions? Jeffrey Richter, Software Architect (Azure), Microsoft. @JeffRichter www.linkedin.com/in/JeffRichter JeffreyR@Microsoft.com
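
The availability figures on slide 9 appear consistent with multiplying the availabilities of services that depend on one another: one to four services at 99.995% each give 99.995%, 99.990%, 99.985%, and 99.980%, and the same chain at 99.999% each gives 99.999% down to 99.996%. The following Python sketch reproduces that arithmetic. It assumes failures are independent and that a month is about 2,640,000 seconds; that constant is inferred from the slide's "264s/month at 99.990%" and is not stated in the deck. The function names are illustrative.

```python
from math import prod

# Assumption: inferred from the slide's figures (264 s/month at 99.990%);
# the deck itself does not state the length of a month.
SECONDS_PER_MONTH = 2_640_000


def composite_availability(availabilities):
    """Availability of a request that needs every listed service to be up.

    Assumes the services fail independently of one another.
    """
    return prod(availabilities)


def downtime_seconds_per_month(availability):
    """Expected unavailable seconds per month at the given availability."""
    return (1.0 - availability) * SECONDS_PER_MONTH


if __name__ == "__main__":
    for per_service in (0.99995, 0.99999):
        for count in range(1, 5):
            total = composite_availability([per_service] * count)
            print(f"{count} service(s) at {per_service:.3%} each: "
                  f"{total:.3%} overall, "
                  f"{downtime_seconds_per_month(total):.1f} s/month down")
```

The computed downtimes land within about a second of the slide's figures, which look truncated to whole seconds. Adding the network as one more factor in the product lowers the result further, which is the point of the slide's closing question.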

Editor's Notes

  1. Motivation; Embracing failure; When to split a monolith into microservices and when not to; Containers; Networking; Messaging; Versioning & upgrades; Managing state
  2. Jeffrey Richter: Software Architect, Microsoft Azure Jeffrey Richter is a Software Engineer on Microsoft’s Azure team. He is also a co-founder of Wintellect, a software consulting and training company. He has authored many videos available on WintellectNOW, has spoken at many industry conferences, and is the author of several best-selling Windows and .NET Framework programming books including Windows Runtime via C#, CLR via C#, 4th Edition, and Windows via C/C++, 5th Edition. Jeffrey has also been a contributing editor to MSDN Magazine where he authored many feature articles and columns.
  3. http://www.morganclaypool.com/doi/abs/10.2200/S00516ED2V01Y201306CAC024 http://channel9.msdn.com/Shows/ARCast.TV/ARCastTV-Pat-Helland-on-Memories-Guesses-and-Apologies
     Server/Service vs. Cloud
     • Enterprise: intranet services, stable demand, security is within intranet
     • Cloud: scale (plan for phenomenal growth, scale-up & scale-out); availability is business
     • Scale-up: expensive; utilization can be low if the usage pattern varies; geo-aspect (replication); # of instances is lower
     • Scale-out: economical, cost-effective, larger replication presence
  4. http://xiard.wordpress.com/2008/11/06/pdc-2008-designing-for-scale-out/ http://channel9.msdn.com/pdc2008/BB54/ http://mschnlnine.vo.llnwd.net/d1/pdc08/PPTX/BB54.pptx
     Consistency levels
     • Strong: Changes visible now via synchronization
     • Eventual: Changes occur in the future (address change for mail)
     • Optimistic: Changes occur MAYBE in the future (stock ticker)
     Message assurance
     • Exactly once: no loss, no duplicates
     • At least once: no loss, duplicates
     • At most once: loss, no duplicates
     • Best effort: loss & duplicates
  5. https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing
  6. Level 4:
     • Hash-based traffic distribution (5-tuple: Src IP/Port to Dst IP/Port, protocol)
     • TCP/UDP support
     • Port forwarding
     • Idle timeout adjustment
     • Client IP affinity (3-tuple: Src IP to Dst IP, protocol); all requests from a client go to same server
     • TCP & HTTP health monitoring
     • NAT & SNAT
     Level 7:
     • Cookie persistence
     • SSL/TLS offload
     • HTTPS monitoring
     • URL or HTTP path LB
     • WAF rules
     • PaaS scale out
  7. Beware: a new service instance could be assigned a previous instance’s endpoint This requires certificates or some ID/uniqueness so client knows which service it’s communicating with
  8. MS OneAPI document? https://github.com/Microsoft/api-guidelines/blob/master/Guidelines.md JMR MOVE: Method that could return an unbounded collection, must implement paging & may offer filtering/sorting
  9. Use simple data types and shallow object graphs JSON: null, true/false, number, string, array (ordered set of values), object (unordered set of name/value pairs) For richer values (guid, date, time, duration), use string & clearly document format
  10. Encourages scalable, resilient, versioning patterns (ex: CQRS & Event Sourcing) JMR: Election of a single role instance to perform a task
  11. http://msdn.microsoft.com/en-us/library/windowsazure/dd179338.aspx
  12. http://msdn.microsoft.com/en-us/library/windowsazure/dd179338.aspx
  13. https://msdn.microsoft.com/en-us/library/dn589800.aspx
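
The slides state that clients must retry and that server code must be idempotent, and note 4 describes at-least-once delivery as "no loss, duplicates". A minimal Python sketch of how the two fit together follows. Everything in it is hypothetical and chosen for illustration (`OrderServer`, `LossyNetwork`, `call_with_retry`, the request-ID scheme, and the delay values); it simulates the network in-process rather than using any real transport.

```python
import random
import time


class OrderServer:
    """Hypothetical server that deduplicates requests by a client-supplied ID."""

    def __init__(self):
        self.orders = {}  # request_id -> order record

    def place_order(self, request_id, item):
        # Idempotent: a repeated request_id returns the original result
        # instead of creating a second order.
        if request_id not in self.orders:
            self.orders[request_id] = {"item": item,
                                       "order_no": len(self.orders) + 1}
        return self.orders[request_id]


class LossyNetwork:
    """Simulated network that loses the first `drops` replies.

    The server still does the work each time, so the client cannot tell
    whether its request was processed.
    """

    def __init__(self, server, drops):
        self.server = server
        self.drops = drops

    def call(self, request_id, item):
        reply = self.server.place_order(request_id, item)
        if self.drops > 0:
            self.drops -= 1
            raise TimeoutError("reply lost")
        return reply


def call_with_retry(network, request_id, item,
                    attempts=5, base_delay=0.1, sleep=time.sleep):
    """Retry with exponential backoff plus jitter, reusing the same request ID."""
    for attempt in range(attempts):
        try:
            return network.call(request_id, item)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the failure to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    server = OrderServer()
    network = LossyNetwork(server, drops=2)
    print(call_with_retry(network, "req-1", "book"))
    print(len(server.orders), "order(s) stored")
```

Because the client reuses one request ID across retries, the server processes the request three times here but stores a single order. Without the ID check, each lost reply would create a duplicate.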