MongoDB World 2019: Event Horizon: Meet Albert Einstein As You Move To The Cloud

André Spiegel, Distinguished Engineer, MongoDB
Event Horizon:
Meet Albert Einstein as you move to the Cloud
@drmirror

Surprise in the Cloud
2,500 ops/s
50 ops/s

AWS Data Centers in the US
[Map: us-west-2, us-west-1, us-east-1, us-east-2]

As the Raven Flies
[Map: us-west-2, us-west-1, us-east-1, us-east-2]
Distance between regions (miles): 300, 2000, 2350, 480

Einstein Was Here
[Map: us-west-2, us-west-1, us-east-1, us-east-2]
Min. light roundtrip time (milliseconds): 3, 22, 25, 5

In theory and in practice...
[Map: us-west-2, us-west-1, us-east-1, us-east-2]
Min. fiber roundtrip time (milliseconds): 5, 32, 37, 8
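
The lower bounds on these slides can be approximated from the straight-line distances alone. The sketch below is only an illustration: the function name `min_rtt_ms` is made up for this example, and the fiber refractive index of 1.468 is an assumed typical value, not a figure from the talk.

```python
MILES_TO_KM = 1.609344
C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum, km/s
FIBER_INDEX = 1.468           # ASSUMPTION: typical refractive index of optical fiber

def min_rtt_ms(distance_miles, refractive_index=1.0):
    """Lower bound on round-trip time (ms) over a straight-line path."""
    distance_km = distance_miles * MILES_TO_KM
    speed_km_s = C_VACUUM_KM_S / refractive_index
    return 2 * distance_km / speed_km_s * 1000

# Distances (miles) as listed on the "As the Raven Flies" slide
for miles in (300, 2000, 2350, 480):
    print(f"{miles:>5} mi  light {min_rtt_ms(miles):5.1f} ms"
          f"  fiber {min_rtt_ms(miles, FIBER_INDEX):5.1f} ms")
```

The results should land within about a millisecond of the slide values. Real paths are longer than the straight line, which is one reason measured ping times are higher still.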

... and what we actually get
[Map: us-west-2, us-west-1, us-east-1, us-east-2]
ICMP ping time (milliseconds): 11, 52, 62, 21

I Don’t Have That Problem
I’ll co-locate my app and my data. One region is good enough
• Some of us don’t have that luxury
• Our users are not where our app is
I wish I got even close to 100ms round trip
• Let’s take a closer look at that app
• 100ms always matters

Latency Matters
• Google: a 500ms increase in page load time led to 25% fewer searches
• Amazon: each additional 100ms costs 1% of sales
• Facebook: pages that are 500ms slower cause a 1% drop-off in traffic
• A one-second delay in page response decreases customer satisfaction by 16%
Source: Campbell / Majors, Database Reliability Engineering

What can we do?
• Latency is significant – and it won't go away
• Avoid, Ignore, Embrace
• Get the most out of every round-trip (batching)
• Do something else during round-trip (async)
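
To make the last two bullets concrete, here is a small simulation that needs no database. It is a sketch under stated assumptions: `round_trip` is a hypothetical stand-in that only sleeps for a fixed time, and the 10 ms value for `RTT` is arbitrary.

```python
import asyncio
import time

RTT = 0.01  # ASSUMPTION: simulated round-trip time in seconds

async def round_trip(payload):
    # Stand-in for one network round trip; no real I/O happens here
    await asyncio.sleep(RTT)
    return len(payload)

async def one_by_one(docs):
    # One round trip per document, each awaited before the next starts
    return sum([await round_trip([d]) for d in docs])

async def batched(docs, size):
    # One round trip per batch of `size` documents
    return sum([await round_trip(docs[i:i + size])
                for i in range(0, len(docs), size)])

async def overlapped(docs):
    # All round trips in flight at the same time
    return sum(await asyncio.gather(*[round_trip([d]) for d in docs]))

def timed(coro):
    start = time.perf_counter()
    sent = asyncio.run(coro)
    return sent, time.perf_counter() - start

docs = list(range(20))
n_single, t_single = timed(one_by_one(docs))
n_batched, t_batched = timed(batched(docs, 5))
n_overlap, t_overlap = timed(overlapped(docs))
print(f"one by one {t_single:.3f}s, batched {t_batched:.3f}s, "
      f"overlapped {t_overlap:.3f}s")
```

All three variants deliver the same 20 documents. Batching pays the round trip once per batch, and the asynchronous version lets the round trips overlap.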

An Experiment
How to Travel Faster than the Speed of Light

Insert documents into MongoDB, with long-range replication across trans-continental links. Some inserts will fail due to duplicate keys. Catch those and report the offending documents.

Setup
• us-east-1 to us-west-1 (2,300mi, 62ms)
• 2-member replica set on m4.16xlarge
• MongoDB 4.0.10
• write concern w:2, j:false
• clients in Python 3.7.3 / PyMongo 3.8.0 / Motor 2.0.0

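sync / single
from random import random
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient(
    "mongodb://localhost:27017/?replicaSet=rs0"
)
# w=2: every write waits until both replica set members have it
coll = client["test"]["coll"].with_options(
    write_concern=WriteConcern(w=2)
)
# One blocking round trip per document (num_docs is set elsewhere)
for i in range(num_docs):
    coll.insert_one({"_id": i, "a": random()})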

sync / single
from pymongo.errors import DuplicateKeyError

errors = []
for i in range(num_docs):
    doc = {"_id": i, "a": random()}
    try:
        coll.insert_one(doc)
    except DuplicateKeyError:
        # Record the offending document and keep going
        errors.append(doc)

sync / bulk
from pymongo import InsertOne

for i in range(0, num_docs, batch_size):
    batch = [
        InsertOne({"_id": j, "a": random()})
        for j in range(i, i + batch_size)
    ]
    # One round trip per batch instead of one per document
    coll.bulk_write(batch)

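from pymongo.errors import BulkWriteError

def get_document(batch, ident):
    # Find the document with the given _id in the batch.
    # Note: _doc is a private attribute of InsertOne.
    for op in batch:
        if op._doc["_id"] == ident:
            return op._doc

errors = []
for i in range(0, num_docs, batch_size):
    batch = [
        InsertOne({"_id": j, "a": random()})
        for j in range(i, i + batch_size)
    ]
    try:
        # ordered=False: keep inserting after a document fails
        coll.bulk_write(batch, ordered=False)
    except BulkWriteError as e:
        for x in e.details["writeErrors"]:
            error_id = x["op"]["_id"]
            errors.append(get_document(batch, error_id))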
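
A possible alternative to scanning the batch for each failed `_id`: the entries in `writeErrors` also carry an `index` field that gives the position of the failed operation in the submitted list. The sketch below assumes that field and uses a hand-made `sample_details` dictionary instead of a real `BulkWriteError`, so it runs without a server. `failed_documents` is a hypothetical helper, not part of PyMongo.

```python
def failed_documents(docs, details):
    """Return the documents whose inserts failed, using each error's batch index."""
    return [docs[err["index"]] for err in details.get("writeErrors", [])]

# Hand-made stand-in for BulkWriteError.details (not captured from a real server)
docs = [{"_id": j, "a": 0.0} for j in range(5)]
sample_details = {
    "writeErrors": [
        {"index": 1, "code": 11000, "errmsg": "duplicate key (sample)"},
        {"index": 3, "code": 11000, "errmsg": "duplicate key (sample)"},
    ]
}

failed = failed_documents(docs, sample_details)
print([d["_id"] for d in failed])
```

Keeping the plain documents in a list next to the `InsertOne` operations makes each lookup a direct index access and avoids touching the private `_doc` attribute.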

sync / bulk: 52,700 op/s @ 100,000 batch

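async / single
import asyncio
from random import random

# coll must be a Motor collection here, not a PyMongo one

async def insert_one(coll, i):
    await coll.insert_one({"_id": i, "a": random()})

async def main(coll, num_docs):
    # Start all inserts at once so their round trips overlap
    await asyncio.gather(*[
        insert_one(coll, i) for i in range(num_docs)
    ])

asyncio.get_event_loop().run_until_complete(
    main(coll, num_docs))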

async def insert_one(coll, i):
    doc = {"_id": i, "a": random()}
    try:
        await coll.insert_one(doc)
        return None
    except DuplicateKeyError:
        # Return the offending document instead of raising
        return doc

async def main(coll, num_docs):
    results = await asyncio.gather(*[
        insert_one(coll, i)
        for i in range(num_docs)])
    # Keep only the documents that failed
    return [doc for doc in results if doc is not None]

async / bulk
async def bulk_write(coll, i, batch_size):
    batch = [
        InsertOne({"_id": q, "a": random()})
        for q in range(i, i + batch_size)
    ]
    await coll.bulk_write(batch, ordered=False)

async def main(coll, num_docs, batch_size):
    # One task per batch; all batches are in flight concurrently
    tasks = [bulk_write(coll, x, batch_size)
             for x in range(0, num_docs, batch_size)]
    await asyncio.gather(*tasks)

async / bulk: 140,000 op/s @ 8,000 batch

Throughput in op/s; bulk columns show op/s @ batch size

link              ping    sync / single   sync / bulk        async / single   async / bulk
east-1 / west-1   62 ms   8               52,700 @ 100,000   490              140,000 @ 8,000
east-1 / east-2   11 ms   39              70,000 @ 30,000    1,400            140,000 @ 2,500
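
To put the results in proportion, the short calculation below expresses each variant as a multiple of the sync / single baseline on the same link. The throughput numbers are copied from the results above; the dictionary layout and the `speedups` helper are only illustrative.

```python
# Throughput in op/s, copied from the results table
results = {
    "east-1 / west-1 (62 ms)": {
        "sync/single": 8, "sync/bulk": 52_700,
        "async/single": 490, "async/bulk": 140_000,
    },
    "east-1 / east-2 (11 ms)": {
        "sync/single": 39, "sync/bulk": 70_000,
        "async/single": 1_400, "async/bulk": 140_000,
    },
}

def speedups(row, baseline="sync/single"):
    """Throughput of each variant relative to the baseline variant."""
    base = row[baseline]
    return {name: ops / base for name, ops in row.items()}

for link, row in results.items():
    print(link)
    for name, factor in speedups(row).items():
        print(f"  {name:<13} {factor:>10,.1f}x")
```

On both links async / bulk reaches the same 140,000 op/s, so the relative gain is largest where the latency is highest.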

Summary
• Computing becomes an intercontinental Game of Chess
• ... and Einstein is on the table
• Understand what your latencies are – they won't go away
• Avoid, Ignore, Embrace
• Batching and asynchronous programming

Thank you.
github.com/drmirror/einstein
