Chapter 5
Event-Driven Data Management
Designing and Deploying Microservices
by Chris Richardson
Rick Hwang
2018/10/23
Microservices and the Problem of Distributed Data Management
● A monolithic application typically has a single relational database.
● Key benefits of using a relational database:
○ Your application can use ACID transactions.
○ A relational database provides SQL, which is a rich, declarative, and standardized query language.
Problem of Distributed Data Management
● Data owned by each microservice is private to that microservice and can only be accessed via its API.
● Encapsulating the data ensures that the microservices are loosely coupled and can evolve independently of one another.
● If multiple services access the same data, schema updates require time-consuming, coordinated updates to all of the services.
Different microservices use different kinds of databases.
A one size fits all database doesn't fit anyone
https://www.allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html
https://www.jamesserra.com/archive/2015/08/relational-databases-vs-non-relational-databases/
Next Generation: NewSQL
● OLTP, ACID, Scalable
● New Architecture: Spanner, CockroachDB
● Transparent Sharding Middleware: MariaDB MaxScale, ScaleArc
● DBaaS: Amazon Aurora, ClearDB
https://db.cs.cmu.edu/papers/2016/pavlo-newsql-sigmodrec2016.pdf
The First Challenge is how to implement business transactions that maintain consistency across multiple services.
Example:
● Customer Service
○ Maintains information about customers, including their credit lines.
● Order Service
○ Manages orders and must verify that a new order doesn’t violate the customer’s credit limit.
● In the monolithic version:
○ The Order Service can simply use an ACID transaction to check the available credit and create the order.
Monolithic Version
● The Order Service can simply use an ACID transaction to check the available credit and create the order.
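As a minimal sketch of the monolithic case, the credit check and the order insert can run inside one local database transaction. The schema, the `place_order` function, and the rule that outstanding orders count against the limit are assumptions made for illustration; SQLite stands in for the monolith's relational database.

```python
import sqlite3

# isolation_level=None gives manual transaction control (explicit BEGIN/COMMIT).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, credit_limit INTEGER)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, status TEXT, total INTEGER)"
)
conn.execute("INSERT INTO customer VALUES (101, 5000)")


def place_order(conn, customer_id, total):
    """Check the available credit and create the order in a single transaction."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        (limit,) = conn.execute(
            "SELECT credit_limit FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        (used,) = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE cust_id = ?",
            (customer_id,),
        ).fetchone()
        if used + total > limit:
            raise ValueError("credit limit exceeded")
        cursor = conn.execute(
            "INSERT INTO orders (cust_id, status, total) VALUES (?, 'OPEN', ?)",
            (customer_id, total),
        )
        conn.execute("COMMIT")
        return cursor.lastrowid
    except Exception:
        conn.execute("ROLLBACK")  # nothing is written if the check fails
        raise


order_id = place_order(conn, 101, 1234)
```

Because both tables live in the same database, the check and the insert either both take effect or neither does. This is the property that is lost once the tables move into separate services.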
Microservice Architecture
1. The ORDER and CUSTOMER tables are private to their respective services.
a. The Order Service cannot access the CUSTOMER table directly.
b. It can only use the API provided by the Customer Service.
2. The Order Service could potentially use distributed transactions, also known as two-phase commit (2PC).
The CAP theorem requires you to choose between availability and ACID-style consistency, and availability is usually the better choice.
Moreover, many modern technologies, such as most NoSQL databases, do not support 2PC. Maintaining data consistency across services and databases is essential, so we need another solution.
Common CAP combinations:
● CA (consistency + availability)
○ RDBMS
○ 2PC (two-phase commit), XA transactions
● CP (consistency + partition tolerance)
○ Focuses on consistency and partition tolerance
○ Consensus algorithms: Paxos, Raft / PBFT
● AP (availability + partition tolerance)
○ Focuses on availability and partition tolerance
○ Dynamo
Source: https://www.w3resource.com/mongodb/nosql.php
The Second Challenge is how to implement queries that retrieve data from multiple services.
For example, the application needs to display a customer and their recent orders.
If the Order Service provides an API for retrieving a customer’s orders, then you can retrieve this data using an application-side join.
The application retrieves the customer from the Customer Service and the customer’s orders from the Order Service.
Suppose, however, that the Order Service only supports the lookup of orders by their primary key. In this situation, there is no obvious way to retrieve the needed data.
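The application-side join described above can be sketched as follows. The two client classes are hypothetical in-memory stand-ins for the services' remote APIs, and "recent" is assumed to mean the highest order IDs; neither detail comes from the original text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Customer:
    id: int
    name: str


@dataclass
class Order:
    id: int
    cust_id: int
    total: int


class CustomerServiceClient:
    """Hypothetical stand-in for the Customer Service API."""

    def __init__(self, customers: Dict[int, Customer]):
        self._customers = customers

    def get_customer(self, customer_id: int) -> Customer:
        return self._customers[customer_id]


class OrderServiceClient:
    """Hypothetical stand-in for an Order Service API that can look up orders by customer."""

    def __init__(self, orders: List[Order]):
        self._orders = orders

    def find_orders_by_customer(self, customer_id: int) -> List[Order]:
        return [o for o in self._orders if o.cust_id == customer_id]


def customer_with_recent_orders(customers, orders, customer_id, limit=10):
    """Join the two results in the application rather than in a database."""
    customer = customers.get_customer(customer_id)
    found = orders.find_orders_by_customer(customer_id)
    recent = sorted(found, key=lambda o: o.id, reverse=True)[:limit]
    return {"customer": customer, "orders": recent}


customers = CustomerServiceClient({101: Customer(101, "Alice")})
orders = OrderServiceClient([Order(998, 101, 50), Order(999, 101, 1234), Order(1000, 7, 10)])
result = customer_with_recent_orders(customers, orders, 101)
```

The join only works because `find_orders_by_customer` exists. If the Order Service offered lookup by primary key alone, the application would have no way to produce the order list.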
Event-Driven Architecture
Pub / Sub
In this architecture, a microservice publishes an event when something notable happens, such as when it updates a business entity.
Other microservices subscribe to those events. When a microservice receives an event, it can update its own business entities, which might lead to more events being published.
You can use events to implement business transactions that span multiple services.
A transaction consists of a series of steps. Each step consists of a microservice updating a business entity and publishing an event that triggers the next step.
Message Broker (intermediary)
● The Order Service
○ creates an Order with status NEW
○ publishes an OrderCreated event
[Figure: a Place Order request reaches the Order Service, which publishes OrderCreated to the Message Broker.]
ORDER table
ID   CUST_ID  STATUS  TOTAL
999  101      NEW     1234
The Customer Service
● consumes the OrderCreated event and reserves credit for the order
● publishes a CreditReserved event
[Figure: the Message Broker delivers OrderCreated to the Customer Service, which publishes CreditReserved.]
ORDER table
ID   CUST_ID  STATUS  TOTAL
999  101      NEW     1234
CUSTOMER table
ID   CREDIT_LIMIT  ...
202  5000
RESERVED_CREDIT table
ID   ORDER_ID  AMOUNT
202  999       1234
The Order Service
● consumes the CreditReserved event
● changes the status of the order to OPEN
[Figure: the Message Broker delivers CreditReserved to the Order Service.]
ORDER table
ID   CUST_ID  STATUS  TOTAL
999  101      OPEN    1234
CUSTOMER table
ID   CREDIT_LIMIT  ...
202  5000
RESERVED_CREDIT table
ID   ORDER_ID  AMOUNT
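The three steps above can be sketched with a synchronous in-memory broker. This is an illustration under stated assumptions, not a production design: the `Broker` class is hypothetical, the `CreditLimitExceeded` event and the CANCELLED status are added here to show what happens when credit cannot be reserved, and the sketch does not make the database update and the publish atomic.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory message broker that delivers events synchronously."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in list(self.subscribers[event_type]):
            handler(payload)


class OrderService:
    def __init__(self, broker):
        self.broker = broker
        self.orders = {}  # stands in for the ORDER table
        broker.subscribe("CreditReserved", self.on_credit_reserved)
        broker.subscribe("CreditLimitExceeded", self.on_credit_limit_exceeded)

    def place_order(self, order_id, cust_id, total):
        # Step 1: create the order with status NEW, then publish OrderCreated.
        self.orders[order_id] = {"cust_id": cust_id, "status": "NEW", "total": total}
        self.broker.publish(
            "OrderCreated", {"order_id": order_id, "cust_id": cust_id, "total": total}
        )

    def on_credit_reserved(self, event):
        # Step 3: credit was reserved, so the order becomes OPEN.
        self.orders[event["order_id"]]["status"] = "OPEN"

    def on_credit_limit_exceeded(self, event):
        # Assumed compensating step: cancel the order.
        self.orders[event["order_id"]]["status"] = "CANCELLED"


class CustomerService:
    def __init__(self, broker, credit_limits):
        self.broker = broker
        self.credit_limits = credit_limits  # stands in for the CUSTOMER table
        self.reserved = {}  # order_id -> (cust_id, amount), the RESERVED_CREDIT table
        broker.subscribe("OrderCreated", self.on_order_created)

    def on_order_created(self, event):
        # Step 2: reserve credit if the limit allows it.
        cust_id, total = event["cust_id"], event["total"]
        used = sum(amount for cust, amount in self.reserved.values() if cust == cust_id)
        if used + total <= self.credit_limits[cust_id]:
            self.reserved[event["order_id"]] = (cust_id, total)
            self.broker.publish("CreditReserved", {"order_id": event["order_id"]})
        else:
            self.broker.publish("CreditLimitExceeded", {"order_id": event["order_id"]})


broker = Broker()
order_service = OrderService(broker)
customer_service = CustomerService(broker, {101: 5000})
order_service.place_order(999, 101, 1234)
order_service.place_order(1000, 101, 9000)
```

Each service touches only its own data, and the only coupling between them is the event names and payloads.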
BASE Model
Provided that
● (a) each service atomically updates the database and publishes an event, and
● (b) the Message Broker guarantees that events are delivered at least once,
then you can implement business transactions that span multiple services.
● It is important to note that these are NOT ACID transactions.
● They offer much weaker guarantees, such as eventual consistency.
● This transaction model has been referred to as the BASE model.
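BASE Model
Eventual consistency, compared with the goals of ACID.
Eventually Consistent - Revisited
By Werner Vogels on 22 December 2008
Eventual Consistency Model
● Client-side Consistency
○ Strong consistency: after an operation completes, subsequent operations are guaranteed to see the updated data.
○ Weak consistency: after an operation completes, subsequent operations are not guaranteed to see the updated data.
● Eventual consistency
○ A special case of weak consistency: if no new updates are made, all reads eventually return the latest data.
○ DNS is a common example of the eventual consistency model.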
[Figure: the Customer Order View, accessed by two services. Event labels shown on the Message Broker: OrderCreated, OrderCancelled, OrderShipped, CustomerCreated, CustomerCancelled, CustomerShipped.]
1. The Customer Order View Updater receives a Customer or Order event and updates the view.
2. The Customer Order View is stored in a document database, such as MongoDB.
3. The Customer Order View Query service handles requests for a customer and recent orders by querying the view.
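A minimal sketch of such a view, assuming a plain dictionary in place of the document database and assuming that every order event carries a `customer_id`. The class name, payload fields, and status strings are illustrative, not taken from the original.

```python
class CustomerOrderView:
    """One document per customer, kept up to date from Customer and Order events."""

    ORDER_STATUS = {"OrderShipped": "SHIPPED", "OrderCancelled": "CANCELLED"}

    def __init__(self):
        self.documents = {}  # customer_id -> document; stands in for a document database

    def _document(self, customer_id):
        return self.documents.setdefault(customer_id, {"name": None, "orders": {}})

    # Updater side: called for every subscribed event.
    def handle_event(self, event_type, event):
        doc = self._document(event["customer_id"])
        if event_type == "CustomerCreated":
            doc["name"] = event["name"]
        elif event_type == "OrderCreated":
            doc["orders"][event["order_id"]] = {"status": "NEW", "total": event["total"]}
        elif event_type in self.ORDER_STATUS:
            doc["orders"][event["order_id"]]["status"] = self.ORDER_STATUS[event_type]

    # Query side: a single lookup returns the customer together with their orders.
    def find_customer_and_orders(self, customer_id):
        return self.documents.get(customer_id)


view = CustomerOrderView()
view.handle_event("CustomerCreated", {"customer_id": 101, "name": "Alice"})
view.handle_event("OrderCreated", {"customer_id": 101, "order_id": 999, "total": 1234})
view.handle_event("OrderShipped", {"customer_id": 101, "order_id": 999})
document = view.find_customer_and_orders(101)
```

The query no longer needs a join across services, at the cost of reading data that may lag behind the latest events.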
The benefits of event-driven architecture
● It enables the implementation of transactions that span multiple services
and provide eventual consistency.
● Another benefit is that it also enables an application to maintain materialized
views.
The drawbacks of event-driven architecture
● The programming model is more complex than when using ACID transactions.
● Often you must implement compensating transactions to recover from application-level failures.
○ For example, you must cancel an order if the credit check fails.
● Applications must deal with inconsistent data.
○ Changes made by in-flight transactions are visible.
○ The application can also see inconsistencies if it reads from a materialized view that is not yet updated.
● Subscribers must detect and ignore duplicate events.
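One way a subscriber might detect and ignore duplicates is to remember the IDs of events it has already handled. The sketch below assumes each event carries a unique `event_id` field, which the original does not specify; a real service would also need to store the processed IDs durably, ideally in the same local transaction as the update.

```python
class IdempotentHandler:
    """Wraps an event handler so that redelivered events are processed only once."""

    def __init__(self, handler):
        self.handler = handler
        self.processed_ids = set()  # in-memory only; assumed durable in a real service

    def __call__(self, event):
        if event["event_id"] in self.processed_ids:
            return False  # duplicate delivery, ignored
        self.handler(event)
        self.processed_ids.add(event["event_id"])
        return True


reserved_amounts = []
reserve_credit = IdempotentHandler(lambda event: reserved_amounts.append(event["amount"]))

event = {"event_id": "evt-1", "order_id": 999, "amount": 1234}
first_delivery = reserve_credit(event)
second_delivery = reserve_credit(event)  # the broker redelivers the same event
```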
Achieving Atomicity
In an event-driven architecture there is also the problem of atomically updating the database and publishing an event. For example, the Order Service must
1. insert a row into the ORDER table and
2. publish an OrderCreated event.
It is essential that these two operations are done atomically.
If the service crashes after updating the database but before publishing the event, the system becomes inconsistent.
The standard way to ensure atomicity is to use a distributed transaction involving the database and the Message Broker. For the reasons given earlier (the CAP trade-off and limited 2PC support), that is the approach we want to avoid.
Publishing Events Using Local Transactions
Multi-step process involving only local transactions
● A local database transaction updates the state of the business entities (INSERT into the ORDER table) and inserts an event (INSERT into the EVENT table).
● The EVENT table functions as a message queue.
● A separate application thread or process queries the EVENT table and publishes the events to the Message Broker.
● Once published, the event's STATE changes from NEW to Published.
ORDER table
ID   CUST_ID  STATUS  TOTAL
999  101      NEW     1234
EVENT table
ID    TYPE  DATA   STATE
9527  101   { … }  NEW
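The process above can be sketched with SQLite. The table and column names follow the slide, while the function names, the JSON payload, and the `publish` callback are assumptions made for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, status TEXT, total INTEGER);
    CREATE TABLE event (id INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT, data TEXT, state TEXT);
    """
)


def create_order(conn, order_id, cust_id, total):
    """Insert the order and its event in one local transaction."""
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, 'NEW', ?)", (order_id, cust_id, total)
        )
        conn.execute(
            "INSERT INTO event (type, data, state) VALUES (?, ?, 'NEW')",
            ("OrderCreated", json.dumps({"order_id": order_id, "total": total})),
        )


def publish_pending(conn, publish):
    """Run by a separate thread or process: publish NEW events, then mark them."""
    rows = conn.execute(
        "SELECT id, type, data FROM event WHERE state = 'NEW' ORDER BY id"
    ).fetchall()
    for event_id, event_type, data in rows:
        publish(event_type, json.loads(data))
        with conn:
            conn.execute("UPDATE event SET state = 'PUBLISHED' WHERE id = ?", (event_id,))
    return len(rows)


published = []
create_order(conn, 999, 101, 1234)
first_pass = publish_pending(conn, lambda t, d: published.append((t, d)))
second_pass = publish_pending(conn, lambda t, d: published.append((t, d)))
```

If the publisher crashes after `publish` but before the UPDATE, the event is sent again on the next pass. Delivery is therefore at least once, which is why subscribers must tolerate duplicates.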
Benefits
1. It guarantees an event is published for each update without relying on 2PC.
2. The application publishes business-level events, which eliminates the need to infer them.
Drawbacks
● It is potentially error-prone since the developer must remember to publish events.
● It is challenging to implement when using some NoSQL databases because of their limited transaction and query capabilities.
Mining a Database Transaction Log
Fig 5-7 A Message broker can arbitrate data transactions
[Figure: the Order Service updates the ORDER table in its datastore; the Transaction Log Miner reads the changes from the transaction log and publishes them to the Message Broker.]
LinkedIn Databus
● Databus mines the Oracle transaction log and publishes events
corresponding to the changes.
● LinkedIn uses Databus to keep various derived data stores consistent with
the system of record.
AWS DynamoDB
● A DynamoDB stream contains the time-ordered sequence of changes
(create, update, and delete operations) made to the items in a DynamoDB
table in the last 24 hours.
● An application can read those changes from the stream and, for example,
publish them as events.
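The general idea of reading a change log and publishing events can be sketched as follows. The record format, the `mine_log` function, and the event names are hypothetical; they do not correspond to the actual Databus or DynamoDB Streams APIs.

```python
# Hypothetical mapping from low-level operations to generic event types.
OPERATION_EVENTS = {"INSERT": "RowInserted", "UPDATE": "RowUpdated", "DELETE": "RowDeleted"}


def mine_log(log, offset, publish):
    """Publish one event per change record at or after `offset`; return the new offset."""
    for record in log[offset:]:
        publish(
            {
                "type": OPERATION_EVENTS[record["op"]],
                "table": record["table"],
                "row": record["row"],
            }
        )
    return len(log)


# A toy transaction log: an ordered list of row-level changes.
transaction_log = [
    {"op": "INSERT", "table": "ORDER", "row": {"id": 999, "status": "NEW", "total": 1234}},
    {"op": "UPDATE", "table": "ORDER", "row": {"id": 999, "status": "OPEN", "total": 1234}},
]

events = []
offset = mine_log(transaction_log, 0, events.append)
offset_after_idle_pass = mine_log(transaction_log, offset, events.append)
```

The miner only sees row-level changes such as "a row in ORDER was updated". Turning that into a business event such as "order approved" requires interpreting the column values.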
Benefits of transaction log mining
● It guarantees that an event is published for each update without using 2PC.
● Transaction log mining can also simplify the application by separating event publishing from the application’s business logic.
Drawbacks of transaction log mining
● The format of the transaction log is proprietary to each database and can even change between database versions.
● It can be difficult to reverse engineer the high-level business events from the low-level updates recorded in the transaction log.
Using Event Sourcing
Event Sourcing
● Event sourcing achieves atomicity without 2PC by using a radically different,
event-centric approach to persisting business entities.
● Rather than store the current state of an entity, the application stores a
sequence of state-changing events.
● The application reconstructs an entity’s current state by replaying the events.
● Whenever the state of a business entity changes, a new event is appended to
the list of events.
● Since saving an event is a single operation, it is inherently atomic.
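A minimal sketch of reconstructing an entity by replaying its events. The event types and the shape of the state dictionary are assumptions chosen to match the order example used in these notes.

```python
from functools import reduce


def apply_event(state, event):
    """Return the order state that results from applying one event."""
    kind = event["type"]
    if kind == "OrderCreated":
        return {"status": "NEW", "total": event["total"]}
    if kind == "OrderApproved":
        return {**state, "status": "APPROVED"}
    if kind == "OrderShipped":
        return {**state, "status": "SHIPPED"}
    if kind == "OrderCancelled":
        return {**state, "status": "CANCELLED"}
    raise ValueError(f"unknown event type: {kind}")


def replay(events):
    """Rebuild the current state by folding over the event sequence."""
    return reduce(apply_event, events, None)


order_events = [
    {"type": "OrderCreated", "total": 1234},
    {"type": "OrderApproved"},
    {"type": "OrderShipped"},
]

current_state = replay(order_events)
state_after_two_events = replay(order_events[:2])
```

Replaying only a prefix of the sequence yields the state at an earlier point in time, which is what makes temporal queries possible.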
Event Sourcing
Fig 5-7 A Message broker can arbitrate data transactions
[Figure: the Order Service adds events for Order 9527 to the Event Store and finds events to rebuild the order; the Customer Service subscribes to events. Events shown: Order Approved, Order Shipped, Order Cancelled.]
ORDER table
ID STATUS TOTAL ...
999 ABCDE NEW ...
Event Store
● Events persist in an Event Store, which is a database of events.
● The store has an API for adding and retrieving an entity’s events.
● The Event Store also behaves like the Message Broker in the architectures
we described previously.
● It provides an API that enables services to subscribe to events.
● The Event Store delivers all events to all interested subscribers.
● The Event Store is the backbone of an event-driven microservices
architecture.
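The Event Store's two roles (a database of events and a broker) can be sketched with a small in-memory class. The class and its method names are hypothetical and do not refer to any particular event store product.

```python
from collections import defaultdict


class EventStore:
    """In-memory sketch: stores events per entity and notifies subscribers."""

    def __init__(self):
        self._streams = defaultdict(list)  # entity_id -> ordered list of events
        self._subscribers = []

    def append(self, entity_id, event):
        """Add an event to the entity's stream, then deliver it to subscribers."""
        self._streams[entity_id].append(event)
        for handler in list(self._subscribers):
            handler(entity_id, event)

    def load(self, entity_id):
        """Retrieve a copy of all events recorded for one entity, in order."""
        return list(self._streams.get(entity_id, []))

    def subscribe(self, handler):
        """Register a callback that receives every event appended from now on."""
        self._subscribers.append(handler)


store = EventStore()
received = []
store.subscribe(lambda entity_id, event: received.append((entity_id, event["type"])))

store.append("order-9527", {"type": "OrderCreated", "total": 1234})
store.append("order-9527", {"type": "OrderApproved"})
history = store.load("order-9527")
```

Because appending and delivering happen through the same component, there is no separate publish step that could be forgotten or fail independently.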
The Benefits of Event Sourcing
● It solves one of the key problems in implementing an event-driven architecture and makes it
possible to reliably publish events whenever state changes.
● As a result, it solves data consistency issues in a microservices architecture.
● Also, because it persists events rather than domain objects, it mostly avoids the object-relational
impedance mismatch problem.
● Event sourcing also provides a 100% reliable audit log of the changes made to a business entity
and makes it possible to implement temporal queries that determine the state of an entity at any
point in time.
● Another major benefit of event sourcing is that your business logic consists of loosely coupled
business entities that exchange events.
● This makes it a lot easier to migrate from a monolithic application to a microservices architecture.
Cloud Architectures - AWS
Architecting for the Cloud (AWS Best Practices)
Drawbacks of Event Sourcing
● It is a different and unfamiliar style of programming, so there is a learning curve.
● The event store only directly supports the lookup of business entities by primary key.
● You must use Command Query Responsibility Segregation (CQRS) to implement queries.
● As a result, applications must handle eventually consistent data.
Summary
● In a microservices architecture, each microservice has its own private datastore.
● Different microservices might use different SQL and NoSQL databases.
○ While this database architecture has significant benefits, it creates some distributed data
management challenges.
○ The first challenge is how to implement business transactions that maintain consistency
across multiple services.
○ The second challenge is how to implement queries that retrieve data from multiple services.
● For many applications, the solution is to use an event-driven architecture.
○ One challenge with implementing an event-driven architecture is how to atomically update state and publish events.
○ There are a few ways to accomplish this, including using the database as a message queue, transaction log mining, and event sourcing.
Reference
● A one size fits all database doesn't fit anyone
● Eventually Consistent - Revisited
● Cloud Architectures - AWS
● Architecting for the Cloud (AWS Best Practices)

Study Notes - Event-Driven Data Management for Microservices
