Solving some of the scalability
problems at Booking.com
Ivan Kruglov
YAPC::EU 2017
based on Oleg Sidorov’s slides
Event Graphite Processor
What is an event?
• message with technical and business data
• Sereal encoded
• srl([ $e, $e, $e, … ])
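A minimal sketch of producing such a blob with Sereal::Encoder; the event fields below are made up for illustration, not the real schema.

use Sereal::Encoder;

# An epoch of events: an arrayref of hashes mixing technical and
# business data (field names are illustrative).
my $events = [
    { type => 'web',  dc => 'eu-1', epoch => 1503150000, booking_value => 120 },
    { type => 'cron', dc => 'eu-2', epoch => 1503150000, job => 'cleanup' },
];

# srl([ $e, $e, $e, ... ]): one Sereal document wrapping the whole epoch.
my $srl_events = Sereal::Encoder->new->encode($events);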
[Diagram: event sources (WEB, CRON, e-mail, FAX, MySQL, VoIP) flow into the event Transport, which fans out to consumers: monitoring, hadoop/hive, A/B testing, elastic search]
• Distributed, DC-fault tolerant
• Generates graphite metrics from events
• Runs user-defined code (~260 monitors)
• Processes events second-by-second (epochs)
• up to ~500k events per second
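For context, sending a metric to Graphite is just a line of plaintext over TCP; a minimal sketch, with the host and metric name made up.

use IO::Socket::INET;

# Graphite's plaintext protocol: "<metric.path> <value> <unix_timestamp>\n",
# sent over TCP (port 2003 by default).
my ($count, $epoch) = (42, time());

my $sock = IO::Socket::INET->new(
    PeerAddr => 'graphite.example.com',   # placeholder host
    PeerPort => 2003,
    Proto    => 'tcp',
) or die "connect to graphite: $!";

printf {$sock} "egp.bookings.per_second %d %d\n", $count, $epoch;
close $sock;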
Event Graphite Processor.
epoch_of_events = get_events_for_epoch(now)
foreach monitor : monitors
    result = monitor.run(epoch_of_events)
    graphite.send(result)
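In Perl terms, a monitor under this scheme boils down to a class whose run method receives the whole decoded epoch at once; the class name, metric name, and event fields below are illustrative, not the real EGP API.

package Monitor::BookingsPerSecond;   # illustrative monitor

# Old interface: the framework hands over the entire epoch (an arrayref
# of ~500k decoded events) and expects metrics back.
sub run {
    my ($self, $epoch_of_events) = @_;
    my $count = grep { $_->{type} eq 'booking' } @$epoch_of_events;
    return { 'egp.bookings.per_second' => $count };
}

1;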
Event Graphite Processor.
• The dataset is huge and it’s growing
• Every second of events takes 10–15GB of RAM
• Monitors are split into groups to run faster
• Every group runs in a fork
• Forking provokes COW copies (pages get duplicated on write; see the sketch after this list)
• RAM gets saturated
• No free RAM = the box gets kicked out
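A minimal sketch of the fork-per-group pattern described above, assuming placeholder helpers (get_events_for_epoch, @monitor_groups). The decoded epoch is shared copy-on-write with every child, but Perl touches the data in place (reference count updates) as the monitors read it, so the "shared" pages end up duplicated in each fork.

use POSIX ();

my $epoch_of_events = get_events_for_epoch(time());    # 10-15GB once decoded

my @pids;
for my $group (@monitor_groups) {                       # monitors split into groups
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                                    # child: one group per fork
        # Merely reading the shared structure dirties its pages
        # (refcount updates), so COW memory is copied per child.
        $_->run($epoch_of_events) for @$group;
        POSIX::_exit(0);                                # leave without Perl's cleanup
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;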
Processing. Think different.
Processing. Thinking different.
epoch_of_events = sereal_decoder.parse(srl_events)
foreach (event : epoch_of_events) ...
vs
iterator = sereal_decoder.iterator(srl_events)
while event = iterator.next() ...
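The left-hand side in concrete Perl: Sereal::Decoder materialises every event in the blob before the loop even starts, which is where the 10-15GB per epoch comes from ($srl_events stands for the encoded blob).

use Sereal::Decoder;

# Decode-everything: the whole epoch becomes one big Perl structure up front.
my $epoch_of_events = Sereal::Decoder->new->decode($srl_events);

for my $event (@$epoch_of_events) {
    # every event is already decoded and held in RAM,
    # whether or not any monitor looks at it
}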
Processing. Thinking different.
epoch_of_events = get_events_for_epoch(now)
iterator = epoch_of_events.iterator
while event = iterator.next()
    foreach monitor : monitors
        monitor.process_event(event)
foreach monitor : monitors
    result = monitor.post_process()
    graphite.send(result)
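The same illustrative monitor rewritten for the streaming loop above: per-event work goes into process_event, the final metric into post_process (again a sketch of the interface, not the real EGP code).

package Monitor::BookingsPerSecond;   # same illustrative monitor, new interface

# Called once per event while the framework walks the iterator.
sub process_event {
    my ($self, $event) = @_;
    $self->{count}++ if $event->{type} eq 'booking';
}

# Called once per epoch, after the last event has been seen.
sub post_process {
    my ($self) = @_;
    my $result = { 'egp.bookings.per_second' => $self->{count} // 0 };
    $self->{count} = 0;               # reset state for the next epoch
    return $result;
}

1;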
Problem: need to rewrite user code
epoch_of_events = get_events_for_epoch(now)
foreach monitor : monitors
    result = monitor.run(epoch_of_events)
    graphite.send(result)

vs

epoch_of_events = get_events_for_epoch(now)
iterator = epoch_of_events.iterator
while event = iterator.next()
    foreach monitor : monitors
        monitor.process_event(event)
foreach monitor : monitors
    result = monitor.post_process()
    graphite.send(result)
How?
• Sereal::Path::Iterator (see the usage sketch after this list)
  a. iterate over objects (scalar/arrayref/hashref/blessed/etc.)
  b. decode from current position
• no streaming
• check limitations
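A usage sketch against Sereal::Path::Iterator; I am going from memory of the module's interface, so the method names used here (step_in, eof, decode, next) are assumptions to double-check against its docs.

use Sereal::Path::Iterator;

my $iter = Sereal::Path::Iterator->new($srl_events);   # srl([ $e, $e, $e, ... ])
$iter->step_in();                                      # descend into the top-level array

until ($iter->eof()) {
    my $event = $iter->decode();                       # decode only the current element
    # ... feed $event to the monitors ...
    $iter->next();                                     # skip to the next element
}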
Still need to rewrite user code!
A proof of concept.
FlogCron
1. Identical stack
2. Same problems
3. Easy to migrate
A proof of concept.
FlogCron
First results: promising.
• CPU: no changes
• Processing time: 60sec vs 30sec
• # of boxes: 20 vs 8
• RAM: 10GB vs 100MB
1. RAM is an issue
2. New user monitors
3. New systems
4. More events every day
EGP: Time to act!
1. Implement a proof of concept
2. Freeze EGP development
3. Migrate all monitors
4. Full-scale test
5. Roll out the new system
6. Profit!
EGP migration TODO
1. 8 people
2. All done in 1 day
3. 260 monitors
4. 317 files changed, 10336 insertions, 11288 deletions
5. Ready to run a full-scale test
Migration.
Hackathon.
Results
80s vs 120+s
Chasing the problem.
foreach (@events) {
    …
}
POSIX::exit

vs

while (my $event = $iterator->next()) {
    …
}
POSIX::exit

undef $event # ~500k times, ~15GB in total
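How I read this comparison: both workers end with a POSIX-level exit that skips Perl's cleanup, so the old foreach never paid for freeing the decoded epoch, while in the iterator loop each decoded $event is freed as it is replaced on the next iteration. A tiny illustration of that implicit free (process_event stands in for the monitor call):

while (my $event = $iterator->next()) {
    $monitor->process_event($event);
    # when $event is replaced on the next iteration, its old value's
    # refcount hits zero and Perl tears down the whole nested structure:
    # ~500k frees per epoch, ~15GB in total, work the old
    # foreach + POSIX::exit path never had to do.
}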
1. Xeon E5-2690
2. 56 logical cores
The new hardware.
1. No more RAM constraints
2. More aggressive forking
3. 3x more groups/forks
Parallelization.
Fork me!
2 times faster.
1. Processing time: 80sec vs 40sec
2. RAM: 16GB vs 500MB
3. # of boxes: 80 vs 30
The results.
The new system
1. Engineering is the king, collaboration is the queen
2. The ideas that failed individually might work together
3. Challenge everything
Lessons learned.
Ivan Kruglov
ivan.kruglov@booking.com
Thank you!
