2. Patterns of Data Management in Microservices
• Database per Service
• Event-driven Architecture
• CQRS (Command Query Responsibility Segregation)
• Database as a queue (Application events)
• Event sourcing
• Mining Transaction Log
From Chris Richardson
3. Agenda
• Monolith vs. Microservices
• Distributed data management in microservices
• Database per Service pattern
• Event-driven architecture
• Patterns to achieve Atomicity
• Database as a queue
• Transaction log mining
• Event sourcing
11. Problem
What’s the database architecture in a microservice
application?
How to deal with distributed data management problems?
12. Forces
• Services must be loosely coupled so that they
can be developed, deployed, and scaled
independently.
13. Patterns of Data Management in Microservices
[Diagram: the pattern map from slide 2; this section covers Database per Service]
From Chris Richardson
16. Shared Database
Services: Order Service, Customer Service, … Service
Shared tables: Order Table (order_total, …), Customer Table (credit_limit, …)
Tight coupling 😭, but simple and ACID ✌️
17. Database per Service
Order Service: Order Table (order_total)
Customer Service: Customer Table (credit_limit)
Loose coupling 🤘🤘, but more complex 😧
18. DB Architecture in a microservice application
• Microservice-based applications often use a mixture of SQL and NoSQL databases, which is called polyglot persistence
• Database per Service (keep a service’s persistent data private)
• Private-tables-per-service
• Schema-per-service
• Database-server-per-service
24. 2PC is not an option
• It’s a BLOCKING protocol
• According to the CAP theorem, you need to choose
between availability and ACID-style consistency
• Availability is usually a better choice
• Moreover, most NoSQL databases do not support
2PC
• BUT maintaining data consistency across services and
databases is essential
25. Patterns of Data Management in Microservices
[Diagram: the pattern map from slide 2; this section covers Event-driven Architecture]
From Chris Richardson
29. The architecture ensures that
• Each service atomically updates its database and publishes events
• The message broker guarantees that events are delivered at least once, so you can implement business transactions that span multiple services
• Trading some consistency for availability can lead to dramatic improvements in scalability (1)
• Eventual consistency instead of strong consistency
30. Use events to maintain materialized views that pre-join data owned by multiple microservices.
The view store could be a NoSQL database, for example a document store such as MongoDB.
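As a rough illustration of such a view, the sketch below folds order and customer events into one pre-joined document per customer. The class name, the event names, and the event fields are assumptions made for this example, and a plain dict stands in for the document store.

```python
class CustomerOrderView:
    """Read model keyed by customer id; a dict stands in for a document store."""

    def __init__(self):
        self.docs = {}

    def _doc(self, customer_id):
        # Create the customer document on first use so events may arrive in any order.
        return self.docs.setdefault(
            customer_id, {"name": None, "credit_limit": None, "orders": {}}
        )

    def apply(self, event):
        kind, data = event["type"], event["data"]
        if kind == "CustomerCreated":      # hypothetical event from the Customer service
            doc = self._doc(data["customer_id"])
            doc["name"] = data["name"]
            doc["credit_limit"] = data["credit_limit"]
        elif kind == "OrderCreated":       # event from the Order service
            orders = self._doc(data["customer_id"])["orders"]
            orders[data["order_id"]] = {"total": data["order_total"], "state": "PENDING"}
        elif kind == "OrderOpened":        # hypothetical follow-up event
            orders = self._doc(data["customer_id"])["orders"]
            orders[data["order_id"]]["state"] = "OPEN"
```

A query service can then answer "customer plus orders" lookups from `docs` alone, without calling either owning service, at the cost of the view lagging behind the latest events.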
31. We’re not done yet…
What if the Order service crashes
after updating the database
but before publishing the event?
We might be losing atomicity
(atomically update state and publish events)
32. • Solution to transactions that span multiple services and provide eventual consistency
• Enables an application to maintain materialized views
• Programming model is more complex than when using ACID transactions
• Implement compensating transactions to recover from application-level failures
• Materialized views may not be up to date
• Duplicate events must be detected
33. Patterns of Data Management in Microservices
[Diagram: the pattern map from slide 2; this section covers Database as a queue (Application events)]
From Chris Richardson
34. Publishing Events Using Local Transactions
1. Create an EVENT table for storing the state of business entities
2. Use one local transaction for both the business logic and the EVENT table insert
3. A separate application thread or process queries the EVENT table, publishes the events to the message broker, then uses a local transaction to mark the events as published
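The three steps can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not the deck's implementation: the table layouts, the NEW/PUBLISHED state values, and the `publish` callback (standing in for a real message broker client) are assumptions for illustration.

```python
import json
import sqlite3


def init_db(conn):
    # Step 1: a business table plus an EVENT table in the same local database.
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total INTEGER, state TEXT)")
    conn.execute(
        "CREATE TABLE event (id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " type TEXT, payload TEXT, state TEXT DEFAULT 'NEW')"
    )


def create_order(conn, order_id, total):
    # Step 2: the order row and the event row commit or roll back together.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?, 'PENDING')", (order_id, total))
        conn.execute(
            "INSERT INTO event (type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id, "total": total})),
        )


def relay_once(conn, publish):
    # Step 3: publish NEW events in insertion order, then mark each as PUBLISHED.
    rows = conn.execute(
        "SELECT id, type, payload FROM event WHERE state = 'NEW' ORDER BY id"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE event SET state = 'PUBLISHED' WHERE id = ?", (event_id,))
    return len(rows)
```

If the relay crashes after `publish` but before the UPDATE commits, the event is published again on the next pass, so this gives at-least-once delivery and consumers need to tolerate duplicates.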
35. Patterns of Data Management in Microservices
[Diagram: the pattern map from slide 2; this section covers Mining Transaction Log]
From Chris Richardson
37. Patterns of Data Management in Microservices
[Diagram: the pattern map from slide 2; this section covers Event sourcing]
From Chris Richardson
38. • Different and unfamiliar style of programming
• The event store only directly supports the lookup of business entities by primary key
• Should use CQRS to implement queries
39. Summary
• In microservices, each service has its own private datastore, and different services may use different SQL/NoSQL databases
• While this database architecture has significant benefits, it also creates distributed data management challenges
• How to implement business transactions that maintain consistency across multiple databases
• If adopting an event-driven architecture, how to atomically update state and publish events
Editor's Notes
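Because this topic covers a great deal of ground, today I will only explain in detail how data management is implemented in microservices; I will not draw many conclusions about microservices themselves.
If we adopt microservices, what is the best practice for database architecture in the microservice world?
Some constraints need to be stated up front: in the microservice world, every service must be independently developable, deployable, and scalable.
First is Database per Service.
On the left is a typical monolith: all business logic lives in a single service and shares one database, usually an RDBMS, so the ACID properties, that is, transactions, are easy to satisfy. In the microservice world, however, independence between services is emphasized, and one pattern is to keep a service’s persistent data private. Each service therefore has its own database, which is a very different architecture from the monolith. Under that premise, how do we achieve ACID?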
ACID
Atomicity – Changes are made atomically: a change to the data is applied either completely or not at all
Consistency – The state of the database is guaranteed to always be consistent
Isolation – Even though transactions are executed concurrently, it appears they are executed serially
Durability – Once a transaction has committed, it is not undone
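Suppose an online store has order and customer services. With a shared database, transaction safety is easy: using the transaction mechanism the database supports is enough to get ACID. But the logic of the different services is tightly bound together.
When each service works against its own database, their logic is no longer bound together, which fits the spirit of microservices.
But you can imagine that there is certainly more to handle, in both the application layer and the data layer.
In microservice applications, different services often use different kinds of data stores: Neo4j for graph data, Riak TS or InfluxDB for time series. The application is therefore not tied to one specific database under this architecture, and implementation is more flexible.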
• Private-tables-per-service – each service owns a set of tables that must only be accessed by that service
• Schema-per-service – each service has a database schema that’s private to that service
• Database-server-per-service – each service has its own database server.
Private-tables-per-service and schema-per-service have the lowest overhead. Using a schema per service is appealing since it makes ownership clearer. Some high throughput services might need their own database server.
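This is one of the hardest problems in microservices: implementing a business transaction that spans multiple databases is highly complex.
The Order service cannot access the Customer table directly; it can only get that information through the Customer API. Suppose a new order is to be created and we need to check whether it would exceed the customer's credit limit. Without a local ACID transaction, how do we ensure data consistency? In other words, if multiple orders update the credit limit at the same time, the data can become inconsistent.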
PHASE 1
The coordinator sends a query-to-commit to all cohorts and waits for every response
Each cohort prepares to execute the transaction: lock resources, reserve resources, write the log
Each cohort replies to the coordinator: vote YES if preparation succeeded, otherwise vote NO
PHASE 2
All YES -> the coordinator sends “commit”; each cohort commits and acknowledges
Not all YES -> the coordinator sends “rollback”; each cohort releases its resources and acknowledges
The coordinator completes (commits or aborts) the global transaction
PostgreSQL: PREPARE TRANSACTION https://www.postgresql.org/docs/current/static/sql-prepare-transaction.html
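The two phases can be sketched as a toy in-process simulation. The `Cohort` class and its `can_commit` flag are hypothetical stand-ins for real resource managers; there is no network, timeout, or crash recovery here, which is exactly where the blocking behaviour of real 2PC comes from.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cohort:
    """Hypothetical participant; `can_commit` simulates its phase-1 vote."""
    name: str
    can_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: lock/reserve resources and write a log record, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        # Release anything reserved during prepare.
        self.state = "rolled_back"


def two_phase_commit(cohorts: List[Cohort]) -> bool:
    # Phase 1: ask every cohort to prepare and collect the votes.
    votes = [c.prepare() for c in cohorts]
    # Phase 2: commit only if every vote was YES, otherwise roll back everyone.
    if all(votes):
        for c in cohorts:
            c.commit()
        return True
    for c in cohorts:
        c.rollback()
    return False
```

A single NO vote rolls back every participant, including those that had already prepared successfully.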
If we do not use the 2PC algorithm to solve data consistency, how do we solve it?
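An event-driven architecture adds a message broker, which plays the key role in achieving data consistency.
When the Order service receives the command, it inserts a row into the Order table, marks the order state as pending, and then publishes an “OrderCreated” event.
The Customer service consumes that event, checks the credit limit, and publishes the result. Once the Order service receives it, it updates the order state to open, which completes the flow.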
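The flow above can be sketched with a toy in-process broker. Only "OrderCreated" comes from the notes; the `Broker` class and the "CreditReserved" and "CreditLimitExceeded" event names are assumptions chosen for this illustration.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-process message broker (stand-in for a real one)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self):
        # Deliver queued events until no handler publishes anything new.
        while self.queue:
            event_type, payload = self.queue.popleft()
            for handler in self.handlers[event_type]:
                handler(payload)


class OrderService:
    def __init__(self, broker):
        self.broker = broker
        self.orders = {}  # order_id -> {"customer_id", "total", "state"}
        broker.subscribe("CreditReserved", self.on_credit_reserved)
        broker.subscribe("CreditLimitExceeded", self.on_credit_exceeded)

    def create_order(self, order_id, customer_id, total):
        self.orders[order_id] = {"customer_id": customer_id, "total": total, "state": "PENDING"}
        self.broker.publish(
            "OrderCreated", {"order_id": order_id, "customer_id": customer_id, "total": total}
        )

    def on_credit_reserved(self, event):
        self.orders[event["order_id"]]["state"] = "OPEN"

    def on_credit_exceeded(self, event):
        # Compensating action: the order is cancelled rather than rolled back.
        self.orders[event["order_id"]]["state"] = "CANCELLED"


class CustomerService:
    def __init__(self, broker, credit_limits):
        self.broker = broker
        self.available = dict(credit_limits)  # customer_id -> remaining credit
        broker.subscribe("OrderCreated", self.on_order_created)

    def on_order_created(self, event):
        cid, total = event["customer_id"], event["total"]
        if self.available.get(cid, 0) >= total:
            self.available[cid] -= total
            self.broker.publish("CreditReserved", {"order_id": event["order_id"]})
        else:
            self.broker.publish("CreditLimitExceeded", {"order_id": event["order_id"]})
```

Until `broker.run()` has delivered the events, an order stays PENDING, which is the eventual-consistency window the pattern accepts in exchange for not using a distributed transaction.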
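Weak consistency: after you write a new value, a read against a replica may or may not return it. Example: some cache systems.
Eventual consistency: after you write a new value, a read may not return it right away, but it is guaranteed to be readable after some time window. Examples: DNS, email, Amazon S3 (prior to December 2020), and the Google search engine.
Strong consistency: once new data is written, the new value can be read from any replica at any time. Examples: file systems, RDBMSs, and Azure Table are all strongly consistent.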
Events can also be used to implement a service for querying customers and orders.
This service queries a materialized view that is built by joining order and customer events and is continuously updated.
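Updating the service's entity state and publishing the event are not atomic.
When the credit check fails, the order may need to be cancelled; the application has to handle such inconsistent data itself.
Or the data in the materialized view may be out of sync.
The receiver may even have to detect or ignore duplicate events -> idempotent receiver.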
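One way to build an idempotent receiver is to remember the ids of events that were already handled. The sketch below assumes every event carries a unique `event_id` field, which the notes do not specify, and keeps the seen-set in memory; a real service would store it durably, ideally in the same transaction as the handler's own update.

```python
class IdempotentReceiver:
    """Wraps a handler so that redelivered events are applied only once."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # ids of events already handled (in-memory for the sketch)

    def receive(self, event):
        event_id = event["event_id"]  # assumed unique per event
        if event_id in self.seen:
            return False              # duplicate delivery: ignore it
        self.handler(event)
        self.seen.add(event_id)       # record only after the handler succeeded
        return True
```

Because the id is recorded after the handler returns, a handler that raises leaves the event eligible for a later redelivery.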
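Create an EVENT table in the local database dedicated to storing entity state.
The application must wrap the business-logic operation and the EVENT table operation in one local transaction; at this point the event state is NEW.
A separate process or thread polls the state list in the EVENT table and, within a transaction, marks the corresponding EVENT as published.
This way, even if the Order service goes down, creating the order and publishing the event are guaranteed to happen together.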
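The Order service updates the ORDER table.
A separate transaction log miner process then continuously tails the log to generate events.
Problems: how do we know how far the process has got? Log formats differ between databases, and may also differ between versions of the same database.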
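One common answer to the "how far has the process got" question is a checkpoint: the miner remembers the position of the next unread log record. The sketch below is only a simulation under stated assumptions: a Python list stands in for the database's transaction log, and the record layout (`table`, `op`, `row`) is invented for illustration, since real log formats are database-specific.

```python
class LogMiner:
    """Toy log miner: `log` is a list standing in for the transaction log."""

    def __init__(self, log, publish, checkpoint=0):
        self.log = log
        self.publish = publish
        self.checkpoint = checkpoint  # index of the next record to process

    def poll(self):
        # Read everything appended since the last checkpoint.
        new_records = self.log[self.checkpoint:]
        for record in new_records:
            # Turn committed inserts on the orders table into domain events.
            if record["table"] == "orders" and record["op"] == "INSERT":
                self.publish("OrderCreated", record["row"])
            self.checkpoint += 1  # advance only after the record was handled
        return len(new_records)
```

If the checkpoint is persisted after publishing, a crash between the two steps republishes the last record on restart, so this also yields at-least-once delivery.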
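Originally the Order service stored the order's current state. In a complete departure from that, it now stores the order's series of state-changing events.
The application then rebuilds the order's current state by replaying these events.
This is a completely different design; the application now only needs to worry about storing events in the event store.
The event store's role here is somewhat like the message broker mentioned earlier: it handles event pub/sub and provides an API for adding and getting events.
As you can imagine, once you adopt an event-sourcing architecture, the implementation is very different, the way queries are written changes too, and complexity rises considerably.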
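A minimal sketch of the replay idea follows, using an in-memory event store with an add/get API. The class, the event names other than "OrderCreated", and the event fields are assumptions for illustration; pub/sub and persistence are left out.

```python
from collections import defaultdict


class EventStore:
    """Toy in-memory event store with an add/get API."""

    def __init__(self):
        self._streams = defaultdict(list)

    def add(self, entity_id, event):
        self._streams[entity_id].append(event)

    def get(self, entity_id):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._streams[entity_id])


def rebuild_order(events):
    """Fold state-changing events, oldest first, into the current state."""
    state = None
    for event in events:
        if event["type"] == "OrderCreated":
            state = {"total": event["total"], "state": "PENDING"}
        elif event["type"] == "OrderOpened":      # hypothetical event name
            state["state"] = "OPEN"
        elif event["type"] == "OrderCancelled":   # hypothetical event name
            state["state"] = "CANCELLED"
    return state
```

Lookups here work only by entity id, which is why queries across entities need a separate read model such as a CQRS view.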