Methods for solving the problems of storing and processing large volumes of postponed emails in parallel
Presentation given by Roman Valihura at Wise TechTalks
2. CONTENTS
1. The evolution of email delivery
2. What our team is doing at Wise
3. Design problems of the scheduling architecture
How to fetch emails in time
How to send emails in time
How to split emails between workers
4. Solutions: conclusion
4. What our team is doing at Wise
We are building an ESP (Email Service Provider) that provides a convenient REST API for sending large newsletters and collects statistics on how users interact with your emails.
5. USE CASE DIAGRAM: REGULAR FLOW
[Diagram: a third-party app sends a "Send email" HTTP request to the ESP API; the request goes through gray-box processing, the email is sent to the user, and events are recorded in Event Storage.]
6. WE NEED SCHEDULING
BETTER CONTROL
Users can schedule emails in advance and monitor the sending process from their side.
LESS LOAD ON THE API
When emails are submitted ahead of their send time, there is more time to process them, so fewer servers and less throughput are needed.
NO PERSONAL CRON
Users don't need their own cron job to deliver postponed emails.
7. USE CASE DIAGRAM: SCHEDULING FLOW (STAGE 0)
[Diagram: a third-party app sends a "Send email" HTTP request to the ESP API; Scheduling Storage is added next to Event Storage, and emails are sent to the user.]
8. MEET THE WORKER (V1)
Source: Wise characters database
Bio
Name: Worker (v1)
Age: 4 months
Sex: Male
Responsibility
Query the data storage at a fixed interval
Send ready-to-send emails to target users
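A minimal sketch of what a Worker (v1) polling cycle could look like, assuming an in-memory list stands in for the data storage and `send` is any callable that delivers one email. `ScheduledEmail`, `poll_once`, and `run_worker` are hypothetical names for illustration, not the team's actual code.

```python
import time
from dataclasses import dataclass


@dataclass
class ScheduledEmail:
    id: int
    send_at: float  # Unix timestamp when the email becomes ready to send
    recipient: str


def poll_once(storage, now, send):
    """One polling cycle: select every ready email, send it, and drop it from storage."""
    ready = [e for e in storage if e.send_at <= now]
    for email in ready:
        send(email)
        storage.remove(email)
    return len(ready)


def run_worker(storage, send, interval_seconds=1.0, cycles=3):
    """Hypothetical Worker (v1) loop: poll the storage at a fixed interval."""
    for _ in range(cycles):
        poll_once(storage, time.time(), send)
        time.sleep(interval_seconds)
```

Every cycle scans the whole storage for ready emails, which is exactly the query that later slides move to ClickHouse.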
10. MEET THE CLICKHOUSE
Bio
Name: ClickHouse
Age: 2 years 4 months
Sex: Undefined
Features
True column-oriented storage
Local and distributed joins
We use ClickHouse for event storage.
Metrics (EC2 c5.large)
Read speed: 1.2 GB/s
Write speed: 50 to 200 MB/s
Finding 335K scheduled emails in a 5M-record dataset takes 0.061 seconds.
Including network transfer latency, we can fetch 3.5M emails per second.
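A quick consistency check using only the numbers on this slide (plain arithmetic, not an additional measurement): the query alone runs at about 5.5M emails per second, and moving all 335K ready emails at the measured 3.5M emails per second takes roughly 0.1 seconds.

```latex
\begin{aligned}
\frac{335\,000\ \text{emails}}{0.061\ \text{s}} &\approx 5.5 \times 10^{6}\ \text{emails/s} && \text{(query only)} \\
\frac{335\,000\ \text{emails}}{3.5 \times 10^{6}\ \text{emails/s}} &\approx 0.096\ \text{s} \approx 0.1\ \text{s} && \text{(including network transfer)}
\end{aligned}
```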
11. USE CASE DIAGRAM: SCHEDULING FLOW (STAGE 1)
[Diagram: a third-party app sends a "Send email" HTTP request to the ESP API; emails are sent to the user; the storage is now ClickHouse.]
12. NOBODY IS PERFECT
Bio
Name: Worker (v1)
Age: 4 months
Sex: Male
Responsibility
Query the data storage at a fixed interval
Send ready-to-send emails to target users
Disadvantages
Works slowly
13. How to send emails in time when we have limited throughput?
14. USE CASE DIAGRAM: SCHEDULING FLOW (STAGE 2)
[Diagram: a third-party app sends a "Send email" HTTP request to the ESP API; components shown: Scheduling Storage and Event Storage; emails are sent to the user.]
15. NOBODY IS PERFECT
Bio
Name: Worker (v1)
Age: 4 months
Sex: Regular
Responsibility
Query the data storage at a fixed interval
Send ready-to-send emails to target users
Disadvantages
Works slowly
Not a team player
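A minimal sketch of why Worker (v1) is not a team player, assuming several v1 instances poll the same storage at the same moment with no coordination. `fetch_ready` and `run_independent_workers` are hypothetical names: each instance sees the same ready set, so each email would be delivered once per worker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledEmail:
    id: int
    send_at: float  # Unix timestamp when the email becomes ready


def fetch_ready(storage, now):
    """What every Worker (v1) instance does: select all ready emails, unaware of other workers."""
    return [e.id for e in storage if e.send_at <= now]


def run_independent_workers(storage, now, worker_count):
    """Hypothetical: N uncoordinated v1 workers poll the same storage at the same time."""
    deliveries = []
    for _ in range(worker_count):
        deliveries.extend(fetch_ready(storage, now))
    return deliveries
```

With two workers, every ready email shows up twice in `deliveries`, which is the duplicate problem the next slide asks about.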
16. How to split emails between workers without duplicates?
17. MEET THE RANGER
Bio
Name: Ranger
Age: 4 months
Sex: Male
Responsibility
The Ranger creates ranges for the workers. Each range has unique properties that let a worker decide which records to fetch from the database. The Ranger builds ranges of ready-to-send emails internally.
Throughput
Paired with ClickHouse as the data storage, this system can create about 700 ranges of 5,000 emails each per second.
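A minimal sketch of how a Ranger could cut ready emails into ranges of 5,000, assuming each email is identified by a `(send_at, id)` key like the `[2018-10-08T07:47:56.144Z, 3]` pairs in the flow diagram. The slides only show `dateFrom`/`dateTo` in the range payload; using the full `(timestamp, id)` pair as range bounds is an assumption made here so that many emails sharing one timestamp can still be split into disjoint ranges. `make_ranges` is a hypothetical name.

```python
from typing import List, Tuple

# Assumed ordering key: (send_at as a fixed-format ISO-8601 UTC string, email id).
# Strings in one fixed format such as "2018-10-08T07:47:56.144Z" sort chronologically.
Key = Tuple[str, int]
RangePayload = Tuple[Key, Key]  # (first key, last key), both inclusive


def make_ranges(ready_keys: List[Key], size: int = 5000) -> List[RangePayload]:
    """Split ready-to-send email keys into consecutive, non-overlapping ranges of at most `size` emails."""
    keys = sorted(ready_keys)
    payloads: List[RangePayload] = []
    for start in range(0, len(keys), size):
        chunk = keys[start:start + size]
        payloads.append((chunk[0], chunk[-1]))
    return payloads
```

Because the keys are sorted and each chunk starts right after the previous one ends, no key falls into two ranges.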
18. USE CASE DIAGRAM: THE RANGER FLOW
[Diagram: the Ranger reads ready emails from Scheduling Storage (ClickHouse), keyed like [2018-10-08T07:47:56.144Z, 3], [2018-10-08T07:47:56.144Z, 4] ..., and puts ranges into a Messaging Queue; Workers take ranges from the queue and send emails to users.]
Range payload:
dateFrom: Date('iso-format')
dateTo: Date('iso-format')
19. MEET THE WORKER (V2)
Bio
Name: Worker (v2)
Age: 3 months
Sex: Male
Responsibility
Pull ranges from the message queue
Fetch emails from ClickHouse by range payload
Send emails to target users
Advantages
Works
Team player
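A minimal sketch of several Worker (v2) instances draining a queue of ranges, assuming Python's in-process `queue.Queue` stands in for the messaging queue and a list of `(send_at, id)` keys stands in for ClickHouse. `fetch_by_range`, `worker_v2`, and `run_workers` are hypothetical names. Because each range is removed from the queue by exactly one worker, and ranges do not overlap, no email is sent twice.

```python
import queue
import threading
from typing import List, Tuple

Key = Tuple[str, int]  # assumed (send_at as ISO-8601 string, email id) ordering key
RangePayload = Tuple[Key, Key]  # (first key, last key), both inclusive


def fetch_by_range(storage: List[Key], payload: RangePayload) -> List[Key]:
    """Stand-in for the ClickHouse query: every email whose key lies inside the range."""
    key_from, key_to = payload
    return [k for k in storage if key_from <= k <= key_to]


def worker_v2(ranges: queue.Queue, storage: List[Key], sent: List[Key], lock: threading.Lock) -> None:
    """Hypothetical Worker (v2): pull a range, fetch its emails, send them; exit when the queue is empty."""
    while True:
        try:
            payload = ranges.get_nowait()
        except queue.Empty:
            return
        for key in fetch_by_range(storage, payload):
            with lock:
                sent.append(key)  # a real worker would deliver the email here
        ranges.task_done()


def run_workers(storage: List[Key], payloads: List[RangePayload], worker_count: int = 3) -> List[Key]:
    """Fill the queue with ranges, then let several workers drain it concurrently."""
    ranges: queue.Queue = queue.Queue()
    for payload in payloads:
        ranges.put(payload)
    sent: List[Key] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker_v2, args=(ranges, storage, sent, lock))
        for _ in range(worker_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent
```

In this sketch the queue is filled before the workers start, so an empty queue means all work is done; a long-running worker would instead block on the queue and wait for new ranges.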
21. USE CASE DIAGRAM: SCHEDULING FLOW (STAGE 3)
[Diagram: a third-party app sends a "Send email" HTTP request to the ESP API; components shown: Scheduling Storage (ClickHouse), Ranger, and Workers, which send emails to users.]
22. CONCLUSION
01 HOW TO FETCH EMAILS IN TIME?
We chose storage suited to this case. With ClickHouse, filtering a 5M-record dataset by ready date and fetching the 335K ready-to-send records takes about 0.1 seconds, including network latency.
02 HOW TO SEND EMAILS IN TIME?
We need to scale out: by running multiple workers, we can reach the throughput we need.
03 HOW TO SPLIT EMAILS BETWEEN WORKERS?
We added one more component that creates ranges for our workers and puts them into a queue. Each worker then takes a different range from the queue.
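To size the worker pool for "send emails in time", a back-of-the-envelope estimate helps. This is a minimal sketch assuming every worker sustains the same send rate and ranges spread the work evenly; `workers_needed` and the per-worker rate used below are hypothetical, not measured figures from the talk.

```python
import math


def workers_needed(email_count: int, per_worker_rate: float, window_seconds: float) -> int:
    """Minimum number of workers to send `email_count` emails within `window_seconds`,
    assuming each worker sends `per_worker_rate` emails per second and the work
    is split evenly between workers (e.g. via ranges taken from a queue)."""
    if per_worker_rate <= 0 or window_seconds <= 0:
        raise ValueError("rate and window must be positive")
    return math.ceil(email_count / (per_worker_rate * window_seconds))
```

For example, with a hypothetical rate of 1,000 emails per second per worker, sending 335K emails within one minute needs `workers_needed(335_000, 1_000, 60)`, which is 6 workers.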