This document discusses bypassing JSON encoding between a MongoDB database and a web service client by passing raw BSON through instead. It describes monkey patching the PyMongo driver to call a custom BSON decoding function that leaves documents encoded instead of decoding them to dictionaries. The client then receives documents still in BSON format and decodes them itself, so the server skips both BSON decoding and JSON encoding. While benchmarks showed improved performance, the author notes the benchmark is flawed because the custom pure-Python decoder is not comparable to the C-coded default. The key idea is to use raw BSON between client and server to reduce serialization overhead.
PyConIT6 - Messing up with pymongo for fun and profit
1. MESSING UP WITH PYMONGO
FOR FUN AND PROFIT
Alessandro Molina
@__amol__
alessandro.molina@axant.it
2. Who am I
● CTO @ AXANT.it, mostly Python company,
(with some iOS and Android)
● TurboGears2 dev team member
● Took over Beaker maintenance in 2015
● Mostly contributing to Python web libraries:
FormEncode, Ming MongoDB ODM, ToscaWidgets
3. The Reason
● EuroPython 2013
○ JSON WebServices with Python best practices
talk
● Question raised
○ “We have a service where our bottleneck is
actually the JSON encoding itself, what can we
do?”
4. First obvious answer
● Avoid encoding whole data in memory
○ iterencode yields one object at a time instead of
encoding everything at once.
● Use a faster encoder!
○ There are projects with custom encoders like
GPSD that are very fast and very memory
conservative.
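A minimal sketch of the iterencode idea, using only the standard library json module. The helper name stream_json and the sample data are made up for illustration; a real service would write each chunk to the response as it is produced.

```python
import json

def stream_json(obj):
    """Yield the JSON encoding of obj as a series of small string chunks."""
    encoder = json.JSONEncoder()
    # iterencode returns a generator of string fragments, so the whole
    # encoded document never has to exist in memory as a single string.
    for chunk in encoder.iterencode(obj):
        yield chunk

# Hypothetical sample data, borrowed from the blog post example in the slides.
posts = [{"author": "Mike", "text": "My first blog post!"}]
chunks = list(stream_json(posts))
```

Joining the chunks gives the same text as json.dumps(posts); the gain is only in peak memory, not in the total encoding work.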
5. Mad answer
● If the JSON encoder is too slow for you
○ Remove JSON encoding
● Looking for the fastest encoding?
○ Don’t encode data at all!
6. MongoDB flow
● BSON is the serialization format used by
mongodb to talk with its clients
● Involves decoding BSON and then re-encoding as JSON
[Diagram: MongoDB -> (BSON) -> Driver -> (native objects) -> WebService -> (JSON) -> Client]
7. Using BSON!
● Can we totally skip the “BSON decoding” and
“JSON encoding” dance and use BSON
directly?
“BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization
of JSON-like documents. Like JSON, BSON supports the embedding of
documents and arrays within other documents and arrays. BSON also contains
extensions that allow representation of data types that are not part of the JSON
spec. For example, BSON has a Date type and a BinData type.”
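To make the format concrete, here is a rough sketch that hand-encodes a tiny document following the layout on bsonspec.org. It only handles string values, and the function names are hypothetical, not part of any BSON library.

```python
import struct

def encode_string_element(name, value):
    """Encode one BSON string element (type byte 0x02)."""
    key = name.encode("utf-8") + b"\x00"       # element name as a C string
    payload = value.encode("utf-8") + b"\x00"  # UTF-8 value plus trailing NUL
    # Layout: type byte, name, int32 payload length (NUL included), payload.
    return b"\x02" + key + struct.pack("<i", len(payload)) + payload

def encode_document(fields):
    """Encode a dict of str -> str as a BSON document (strings only)."""
    body = b"".join(encode_string_element(k, v) for k, v in fields.items())
    # Layout: int32 total size (counting itself and the final NUL), elements, NUL.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

doc = encode_document({"author": "Mike"})
```

The leading int32 is what makes it possible to skip over a document without looking at its contents, which the rest of the talk relies on.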
8. Target Flow
● BSON decoding on the client can happen
using the js-bson library (or equivalent)
● Skipping BSON decoding on server is hard
○ It’s built into the MongoDB driver
[Diagram: MongoDB -> (BSON) -> Driver -> (BSON) -> WebService -> (BSON) -> Client]
10. Custom decoding
● bson.decode_all is the function in charge
of decoding BSON objects.
● We need a decoder that partially decodes
the query response but leaves the actual
documents encoded.
● Full BSON spec available on bsonspec.org
11. Custom bson.decode_all
$ python test.py
{u'text': u'My first blog post!',
u'_id': ObjectId('5267f71a0e9ce56fe55bdc4b'),
u'author': u'Mike'}
$ python test.py
'E\x00\x00\x00\x07_id\x00Rg\xf7\x1a\x0e\x9c\xe5o\xe5
[\xdcK\x02text\x00\x14\x00\x00\x00My first blog post!
\x00\x02author\x00\x05\x00\x00\x00Mike\x00\x00'
13. Custom bson.decode_all
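# Original loop body in bson.decode_all:
# strip the size prefix and trailing NUL, then decode the elements
obj_size = struct.unpack("<i", data[position:position + 4])[0]
elements = data[position + 4:position + obj_size - 1]
position += obj_size
docs.append(_elements_to_dict(elements, as_class, ...))

# Custom loop body:
# keep the whole document (size prefix included) still encoded
obj_size = struct.unpack("<i", data[position:position + 4])[0]
elements = data[position:position + obj_size]
position += obj_size
docs.append(elements)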
14. Enforcing in PyMongo
● Now that we have a custom decoding
function that leaves the documents
encoded in BSON, we need to make
PyMongo use it
● _unpack_response is the function in
charge of calling decode_all;
we must convince it to call our version
16. Hijacking decoding
● _unpack_response
○ Called by pymongo to unpack responses received
from the server.
○ Some information is passed in, such as the current
cursor id in case of getMore, plus other parameters
○ We can use the provided parameters to guess whether
we are decoding a query response or something else.
17. Custom unpack_response
import struct
import pymongo.helpers

_real_unpack_response = pymongo.helpers._unpack_response

def custom_unpack_response(response, cursor_id=None, as_class=None,
                           *args, **kw):
    if as_class is None:  # Not a query, here lies the real trick
        return _real_unpack_response(response, cursor_id, dict, *args, **kw)

    response_flag = struct.unpack("<i", response[:4])[0]
    if response_flag & 2:  # In case it's an error report
        return _real_unpack_response(response, cursor_id, as_class, *args, **kw)

    # Parse the rest of the 20-byte reply header ourselves
    result = {}
    result["cursor_id"] = struct.unpack("<q", response[4:12])[0]
    result["starting_from"] = struct.unpack("<i", response[12:16])[0]
    result["number_returned"] = struct.unpack("<i", response[16:20])[0]
    result["data"] = custom_decode_all(response[20:])  # documents stay raw BSON
    return result

# Monkey patch: PyMongo now calls our version
pymongo.helpers._unpack_response = custom_unpack_response
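The byte offsets used above (0-4, 4-12, 12-16, 16-20) can be checked in isolation. This is only a sketch with a synthetic reply built by hand; parse_reply_header is a hypothetical name, and nothing here touches PyMongo.

```python
import struct

# flags (int32), cursor id (int64), starting_from (int32), number_returned (int32)
REPLY_HEADER = struct.Struct("<iqii")

def parse_reply_header(response):
    """Split a reply into its header fields and the raw document bytes."""
    flags, cursor_id, starting_from, number_returned = REPLY_HEADER.unpack_from(response, 0)
    return {
        "query_failure": bool(flags & 2),  # same bit the slide checks for errors
        "cursor_id": cursor_id,
        "starting_from": starting_from,
        "number_returned": number_returned,
        "data": response[REPLY_HEADER.size:],  # everything after the header
    }

# Synthetic reply: no flags set, cursor 0, one empty BSON document attached.
empty_doc = b"\x05\x00\x00\x00\x00"
reply = REPLY_HEADER.pack(0, 0, 0, 1) + empty_doc
parsed = parse_reply_header(reply)
```

Unpacking all four fields with one format string is equivalent to the four separate struct.unpack calls in the slide; it just makes the 20-byte layout visible in one place.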
18. Custom decode_all
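import struct
import sys
from bson.errors import InvalidBSON

def custom_decode_all(data):
    docs = []
    position = 0
    end = len(data) - 1
    try:
        while position < end:
            # Each document starts with its own total size as a little-endian int32
            obj_size = struct.unpack("<i", data[position:position + 4])[0]
            # Keep the whole document, size prefix included, without decoding it
            elements = data[position:position + obj_size]
            position += obj_size
            docs.append(elements)
        return docs
    except Exception:
        exc_type, exc_value, exc_tb = sys.exc_info()
        # Python 2 syntax: re-raise as InvalidBSON, keeping the original traceback
        raise InvalidBSON, str(exc_value), exc_tb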
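The function above is Python 2 code. For reference, a Python 3 sketch of the same splitting idea could look like the following. It raises ValueError instead of InvalidBSON to stay free of dependencies, and it adds a few size checks that the slide version does not have; the name split_bson_documents is made up.

```python
import struct

def split_bson_documents(data):
    """Return each BSON document in data as raw bytes, without decoding it."""
    docs = []
    position = 0
    while position < len(data):
        if len(data) - position < 5:  # the smallest valid document is 5 bytes
            raise ValueError("truncated BSON document")
        (obj_size,) = struct.unpack_from("<i", data, position)
        if obj_size < 5 or position + obj_size > len(data):
            raise ValueError("invalid BSON document size")
        if data[position + obj_size - 1] != 0:
            raise ValueError("missing document terminator")
        # Keep the size prefix and trailing NUL so each slice is a complete document.
        docs.append(data[position:position + obj_size])
        position += obj_size
    return docs

mike = b"\x16\x00\x00\x00\x02author\x00\x05\x00\x00\x00Mike\x00\x00"
empty = b"\x05\x00\x00\x00\x00"
docs = split_bson_documents(mike + empty)
```

Because every slice is a complete document, joining the list back together reproduces the original byte stream exactly.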
19. Fetching BSON
● Our PyMongo queries will now return
BSON encoded data we can then push to
the client
● Let’s fetch the data from the client to close
the loop
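The slides do not show the server side of this step. One possible shape, sketched as a bare WSGI app, is to concatenate the raw documents and send them as a binary body. Everything here is an assumption for illustration: the name make_bson_app, the Content-Type, and the placeholder documents. The app also ignores the request path, so it would answer /results_bson like any other URL.

```python
def make_bson_app(raw_docs):
    """Build a WSGI app that serves already-encoded BSON documents as one body."""
    def app(environ, start_response):
        # Documents are simply concatenated; nothing is decoded or re-encoded.
        body = b"".join(raw_docs)
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app

# Two empty BSON documents standing in for the patched query results.
raw_docs = [b"\x05\x00\x00\x00\x00", b"\x05\x00\x00\x00\x00"]
app = make_bson_app(raw_docs)

# To try it locally on the port used in the slides:
# from wsgiref.simple_server import make_server
# make_server("localhost", 8080, app).serve_forever()
```

Since each document carries its own length prefix, the client can split the body again without any extra framing.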
20. Fetching BSON
function fetch_bson() {
    var BSON = bson().BSON;
    var oReq = new XMLHttpRequest();
    oReq.open("GET", 'http://localhost:8080/results_bson', true);
    oReq.responseType = "arraybuffer";
    oReq.onload = function(e) {
        var data = new Uint8Array(oReq.response);
        var offset = 0;
        var results = [];
        while (offset < data.length)
            offset = BSON.deserializeStream(data, offset, 1, results, results.length, {});
        show_output(results);
    }
    oReq.send();
}
22. Performance Gain
● It all started as a quest for a performance boost:
how much did it improve?
JSON               BSON
1239.72 req/sec    2079.75 req/sec
23. False Benchmark
● Benchmark is actually pointless
○ as usual ;)
● We replaced bson.decode_all, which is
written in C, with custom_decode_all, which
is written in Python
○ The two are not really comparable
● Wanna try with PyPy?