Asynchronous Processing in Big System
Lê Minh Nghĩa
Solution Architect at Tiki.vn
Big System’s Targets
• High Performance
• High Scalability
• High Reliability
• Low Maintenance Cost
Problems
• IO Bottleneck
• Scale Processing
• Handle a Huge Number of Concurrent Requests
• Availability and Partition Fault Tolerance
• Deal with consistency and concurrency
Split to Scale
• If you can’t split it, you can’t scale it
• From Monolithic System to Distributed System
• From One to Many Processing Nodes
• From One to Many Persistent Stores
• From Single to Parallel Processing
• From Synchronous to Asynchronous
Data Replication
• Every node needs a way to communicate with the others
• Data replication is the most important part of a distributed system
• The reliability of a system depends on how its data is replicated
Replication Models
• Primary-Backup Model
• State Machine Model (Active-Active Model)
What’s Reliable Replication?
• No Lost Data
• Guarantee Ordering
• High Scalability
• Easy Integration
Message Queue
• A message queue is the key to splitting and scaling a system
• It’s the solution for reliable replication
• But it’s not as simple as we think…
Message Queue
1. Guarantee No Lost Data
We usually do both:
- Write data to the DB
- Send a message to the queue
(Diagram: Processing writes to the Database and sends to the Message Queue)
Problems, in fact:
- The write can succeed while the send fails
- The send can succeed while the write fails
1. Guarantee No Lost Data
• Solutions:
• Use a one-way data flow:
Process -> Database -> Message Queue
• Use a transaction log to dispatch data changes
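The one-way flow above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: it assumes SQLite as the database, and the table names `account` and `undispatched_message` and the function `deposit` are hypothetical. The point is that the data change and the outgoing message are written in one local transaction, so neither can exist without the other.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a data table and a message log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE undispatched_message ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL)"
    )
    return conn


def deposit(conn: sqlite3.Connection, account_id: str, amount: int) -> None:
    """Change the balance and record the message in the same transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT OR IGNORE INTO account (id, balance) VALUES (?, 0)",
            (account_id,),
        )
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )
        payload = json.dumps({"type": "deposited", "account": account_id, "amount": amount})
        conn.execute(
            "INSERT INTO undispatched_message (payload) VALUES (?)",
            (payload,),
        )
```

Nothing is sent to the queue here. A separate process reads `undispatched_message` and publishes from it, which is what keeps the flow one-way.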
1. Guarantee No Lost Data
2. Guarantee Sending Ordering
• Problems:
• Each request sends out one message, and requests run at the same time in different threads
• Any one of the messages can fail to send
• As a result, the messages end up out of order
2. Guarantee Sending Ordering
• Use a transaction log to append undispatched messages in order
• Asynchronously send the undispatched messages to the message queue
2. Guarantee Sending Ordering
(Diagram: a Transaction writes to the Undispatched Message log; a Polling Worker moves messages to the Message Bus)
1. Poll Messages
2. Publish Messages
3. Remove Messages
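The three polling-worker steps can be sketched as below. This is a hedged illustration under assumptions: SQLite stands in for the database, the table `undispatched_message` and the functions `create_outbox` and `dispatch_once` are hypothetical names, and `publish` is any callable that sends one payload to the message bus and raises on failure.

```python
import sqlite3
from typing import Callable


def create_outbox(conn: sqlite3.Connection) -> None:
    """Create the ordered log of messages that have not been published yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS undispatched_message ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL)"
    )


def dispatch_once(
    conn: sqlite3.Connection,
    publish: Callable[[str], None],
    batch_size: int = 100,
) -> int:
    """Run one poll/publish/remove cycle and return how many messages were sent."""
    # 1. Poll messages in the order they were appended.
    rows = conn.execute(
        "SELECT seq, payload FROM undispatched_message ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for seq, payload in rows:
        try:
            # 2. Publish the message to the bus.
            publish(payload)
        except Exception:
            # Stop at the first failure so later messages never overtake it.
            break
        # 3. Remove the message only after it was published.
        with conn:
            conn.execute("DELETE FROM undispatched_message WHERE seq = ?", (seq,))
        sent += 1
    return sent
```

Because a message is removed only after it is published, a crash between steps 2 and 3 causes it to be published again on the next cycle. Ordering is preserved, but duplicates are possible.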
3. Guarantee Delivery Ordering
(Diagram: events E1, E2, E3, E4 in one queue, consumed by Worker 1, Worker 2, Worker 3, Worker 4)
- Events are dequeued concurrently by many workers
- The message queue can guarantee first-in, first-out dequeuing
- But a later event can finish processing before an earlier one, so ordering is lost
3. Guarantee Delivery Ordering
• Solutions:
• If using RabbitMQ/ActiveMQ: use only one consumer per queue
• If using Kafka: Kafka guarantees delivery order within each partition, and only one consumer in a consumer group receives messages from a given partition
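Per-partition ordering only helps if related events land in the same partition. The sketch below shows key-based routing in plain Python, with no broker involved. The functions `partition_for` and `route` are hypothetical, and CRC32 is only a stand-in for whatever hash a real client library uses.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str]  # (key, payload)


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a hash that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events: Iterable[Event], num_partitions: int) -> Dict[int, List[Event]]:
    """Append each event to the partition chosen by its key, keeping arrival order."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions
```

With one consumer per partition, all events for the same key (for example one account id) are processed one after another, while different keys can still be processed in parallel.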
4. Idempotency Filtering
• This is about duplicate messages
• A message can be delivered more than once
• Example: a deposit can be applied twice because the deposit message is received twice
4. Idempotency Filtering
• Solutions:
• Use a UUID/GUID v4 as the message id
• Use the timestamp or version of the message to detect duplicates
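The deposit example can be made safe with a filter on message ids. This is a minimal in-memory sketch: the class `DepositHandler` and the helper `new_deposit_message` are hypothetical, and a real system would presumably store the seen ids durably, ideally in the same transaction as the balance update.

```python
import uuid
from typing import Dict, Set


def new_deposit_message(account: str, amount: int) -> dict:
    """Build a deposit message with a random UUID v4 as its id."""
    return {"id": str(uuid.uuid4()), "account": account, "amount": amount}


class DepositHandler:
    """Apply each deposit message at most once, keyed by message id."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.seen_ids: Set[str] = set()

    def handle(self, message: dict) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message["id"] in self.seen_ids:
            return False  # already processed, ignore the redelivery
        account = message["account"]
        self.balances[account] = self.balances.get(account, 0) + message["amount"]
        self.seen_ids.add(message["id"])
        return True
```

Delivering the same message twice changes the balance only once, while two distinct deposits of the same amount are both applied because their ids differ.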
5. Versioning Messages
• Replicated data is always eventually consistent
• Sometimes we need to know how stale the data is
(Diagram: versions V5, V4, V3, V2 are still in the queue; the writer has written V5 while a reader still reads V1)
5. Versioning Messages
• Use a timestamp
• Use an incremental version (integer)
• Guarantee that the version increases consistently whenever data is written
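An integer version lets a replica reject out-of-date updates and report how far behind it is. The sketch below assumes versions start at 1 and increase by one on every write. The names `Versioned`, `Replica`, `apply`, and `staleness` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Versioned:
    version: int
    value: str


class Replica:
    """Keep only the newest version of each key."""

    def __init__(self) -> None:
        self.store: Dict[str, Versioned] = {}

    def apply(self, key: str, incoming: Versioned) -> bool:
        """Apply an update only if it is newer than what the replica holds."""
        current = self.store.get(key)
        if current is not None and incoming.version <= current.version:
            return False  # stale or duplicate update, keep the current value
        self.store[key] = incoming
        return True

    def staleness(self, key: str, primary_version: int) -> int:
        """Number of versions the replica is behind the primary for this key."""
        current = self.store.get(key)
        replica_version = current.version if current is not None else 0
        return primary_version - replica_version
```

In the situation from the diagram, a replica holding V1 while the primary is at V5 reports a staleness of 4.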
6. Non Blocking IO
• How do we handle millions of messages in a queue?
• Solutions:
• Process messages in a pipeline
• Split processing into three separate phases: receiving, handling, and completing a message
• The phases run in parallel
(Diagram: receiving -> handling -> completing)
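The three phases can be sketched with `asyncio` queues between them. This is an illustration under assumptions, not the speaker's design: the stage functions and `run_pipeline` are hypothetical, uppercasing stands in for real handling work, and `asyncio` interleaves the stages on one thread, so this shows concurrency rather than multi-core parallelism.

```python
import asyncio
from typing import Iterable, List

_STOP = object()  # sentinel that tells the next stage to shut down


async def receive(source: Iterable[str], out_q: asyncio.Queue) -> None:
    """Phase 1: pull raw messages and hand them to the handling stage."""
    for raw in source:
        await out_q.put(raw)
    await out_q.put(_STOP)


async def handle(in_q: asyncio.Queue, out_q: asyncio.Queue) -> None:
    """Phase 2: process each message without blocking the other stages."""
    while True:
        raw = await in_q.get()
        if raw is _STOP:
            await out_q.put(_STOP)
            return
        await asyncio.sleep(0)  # stands in for non-blocking I/O
        await out_q.put(raw.upper())


async def complete(in_q: asyncio.Queue, done: List[str]) -> None:
    """Phase 3: finish each message, for example by acknowledging it."""
    while True:
        result = await in_q.get()
        if result is _STOP:
            return
        done.append(result)


async def run_pipeline(source: Iterable[str], queue_size: int = 2) -> List[str]:
    """Run the three phases concurrently, connected by bounded queues."""
    received: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
    handled: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
    done: List[str] = []
    await asyncio.gather(
        receive(source, received),
        handle(received, handled),
        complete(handled, done),
    )
    return done
```

It can be run with `asyncio.run(run_pipeline(["a", "b", "c"]))`. The bounded queues apply back-pressure: a fast receiver waits when the handling stage falls behind instead of filling memory.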
7. Capture Data Changes
• This is a way to capture data changes in the DB and replicate them to the message queue
• Use a DB-specific mechanism to detect data changes
MySQL Bin Log
• Decode the MySQL binlog to learn about new data changes
(Diagram: MySQL -> MySQL Binlog -> Event Handler, which decodes the binlog -> Message Queue)
PostgreSQL Notification
• Use PostgreSQL notifications to signal data changes
(Diagram: PostgreSQL -> Notify -> Notification Receiver -> Message Queue)
Thank You
• Contact: Lê Minh Nghĩa
• Email: nghia.fit@gmail.com
• Facebook: /nghialeminh
