The talk will cover how to use BSON directly as an exchange protocol to gain speed and advanced types.
BSON is the underlying serialization protocol used by MongoDB to store and represent data.
Whenever we retrieve data from MongoDB we get it as BSON; our drivers then decode it, only for our web service to encode it again as JSON.
We will see how to take advantage of BSON, for fun and speed, by skipping this double step: fetching raw BSON directly and decoding it on the client side.
MongoTorino 2013 - BSON Mad Science for fun and profit
1. BSON MAD SCIENCE
FOR FUN AND PROFIT
Alessandro Molina
@__amol__
alessandro.molina@axant.it
2. Who am I
● CTO @ Axant.it, a mostly Python company
that also does some iOS and Android development.
● Mostly relying on MySQL, MongoDB, Redis
(and sqlite!) for day-to-day data storage
● TurboGears web framework team member
● Contributions to Ming MongoDB ODM
3. The Reason
● EuroPython 2013
○ JSON WebServices with Python best practices
talk
● Question raised
○ “We have a service where our bottleneck is
actually the JSON encoding itself, what can we
do?”
4. First obvious answer
● Avoid encoding whole data in memory
○ iterencode yields the output chunk by chunk
instead of encoding everything at once.
● Use a faster encoder!
○ There are projects with custom encoders like
GPSD that are very fast and very memory
conservative.
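As a minimal sketch of the first suggestion, the standard library's json.JSONEncoder.iterencode returns a generator of string fragments, so a response can be streamed out while it is being encoded. The stream_json name is hypothetical, for illustration only.

```python
import json


def stream_json(records):
    """Yield the JSON encoding of `records` as a sequence of string fragments.

    json.JSONEncoder.iterencode produces the output lazily, so a server can
    send each fragment as soon as it is ready instead of building the whole
    string in memory first.
    """
    encoder = json.JSONEncoder()
    yield from encoder.iterencode(records)


if __name__ == "__main__":
    rows = [{"author": "Mike", "n": i} for i in range(3)]
    chunks = list(stream_json(rows))
    # Joined together, the fragments equal the one-shot json.dumps() output.
    print(len(chunks), "fragments, identical:", "".join(chunks) == json.dumps(rows))
```

This lowers peak memory, but the same amount of encoding work is still done, which is why it does not help when encoding itself is the bottleneck.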
5. Mad answer
● If the JSON encoder is too slow for you
○ Remove JSON encoding
● Looking for the fastest encoding?
○ Don’t encode data at all!
7. Using BSON!
● Can we totally skip the “BSON decoding” and
“JSON encoding” dance and use BSON
directly?
“BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization
of JSON-like documents. Like JSON, BSON supports the embedding of
documents and arrays within other documents and arrays. BSON also contains
extensions that allow representation of data types that are not part of the JSON
spec. For example, BSON has a Date type and a BinData type.”
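To make the format concrete, here is a minimal sketch that hand-encodes a document whose values are all strings, following the layout described on bsonspec.org. The function name encode_string_doc is hypothetical and it only handles the string element type (0x02); real encoders such as PyMongo's bson module cover every type.

```python
import struct


def encode_string_doc(doc):
    """Encode a dict mapping str keys to str values as one BSON document.

    Layout (bsonspec.org), limited to the UTF-8 string element type 0x02:
      document = int32 total_size, elements..., terminating 0x00 byte
      element  = 0x02 type byte, key as NUL-terminated cstring, string
      string   = int32 byte length (counting its NUL), UTF-8 bytes, 0x00
    All integers are little-endian.
    """
    body = b""
    for key, value in doc.items():
        raw_value = value.encode("utf-8") + b"\x00"
        body += b"\x02" + key.encode("utf-8") + b"\x00"        # type tag + key
        body += struct.pack("<i", len(raw_value)) + raw_value  # length-prefixed value
    # total_size covers the 4-byte prefix itself, the elements and the final NUL
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


print(encode_string_doc({"author": "Mike"}))
```

Because every document starts with its own total size, a reader can skip over a whole document without looking inside it, which is exactly what the custom decoder below relies on.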
10. Custom decoding
● bson.decode_all is the function in charge
of decoding BSON documents.
● We need a decoder that parses the query
response but leaves the actual documents
encoded.
● Full BSON spec available on bsonspec.org
11. Custom bson.decode_all
● With the stock bson.decode_all:
$ python test.py
{u'text': u'My first blog post!',
u'_id': ObjectId('5267f71a0e9ce56fe55bdc4b'),
u'author': u'Mike'}
● With custom_decode_all, the document stays raw BSON:
$ python test.py
'E\x00\x00\x00\x07_id\x00Rg\xf7\x1a\x0e\x9c\xe5o\xe5[\xdcK\x02text\x00\x14\x00\x00\x00My first blog post!\x00\x02author\x00\x05\x00\x00\x00Mike\x00\x00'
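Reading the raw output back by hand shows the structure the custom decoder preserves: a length prefix ('E' is 0x45, i.e. 69 bytes), then typed elements, then a terminating NUL byte. This is a sketch assuming Python 3 bytes that handles only the two element types present in this document (0x02 string and 0x07 ObjectId); walk_elements is a hypothetical helper, and it returns the ObjectId as a hex string rather than a bson ObjectId.

```python
import struct


def walk_elements(raw):
    """Decode a raw BSON document holding only strings (0x02) and ObjectIds (0x07).

    Sketch only: any other BSON type raises ValueError.
    """
    size = struct.unpack("<i", raw[:4])[0]
    pos, end = 4, size - 1              # skip the length prefix, stop before the final NUL
    out = {}
    while pos < end:
        etype = raw[pos]                # indexing bytes gives an int in Python 3
        pos += 1
        key_end = raw.index(b"\x00", pos)  # the key is a NUL-terminated cstring
        key = raw[pos:key_end].decode("utf-8")
        pos = key_end + 1
        if etype == 0x02:               # string: int32 length (incl. NUL), bytes, NUL
            n = struct.unpack("<i", raw[pos:pos + 4])[0]
            out[key] = raw[pos + 4:pos + 4 + n - 1].decode("utf-8")
            pos += 4 + n
        elif etype == 0x07:             # ObjectId: 12 raw bytes
            out[key] = raw[pos:pos + 12].hex()
            pos += 12
        else:
            raise ValueError("unsupported BSON type 0x%02x" % etype)
    return out


# The raw document printed by test.py above, as a Python 3 bytes literal.
RAW = (b"E\x00\x00\x00\x07_id\x00Rg\xf7\x1a\x0e\x9c\xe5o\xe5[\xdcK"
       b"\x02text\x00\x14\x00\x00\x00My first blog post!\x00"
       b"\x02author\x00\x05\x00\x00\x00Mike\x00\x00")

print(walk_elements(RAW))
```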
13. Custom bson.decode_all
# Stock bson.decode_all: strip the size prefix and trailing NUL, then decode
obj_size = struct.unpack("<i", data[position:position + 4])[0]
elements = data[position + 4:position + obj_size - 1]
position += obj_size
docs.append(_elements_to_dict(elements, as_class, ...))

# custom_decode_all: keep the whole document, size prefix included, as raw BSON
obj_size = struct.unpack("<i", data[position:position + 4])[0]
elements = data[position:position + obj_size]
position += obj_size
docs.append(elements)
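The slide only contrasts the loop bodies. A complete, standalone version of the custom loop might look like the sketch below; the single-argument signature matches how custom_decode_all is called on slide 17, while the bounds checks are an addition of this sketch, not part of the original.

```python
import struct


def custom_decode_all(data):
    """Split a buffer of concatenated BSON documents without decoding them.

    Each document starts with its own little-endian int32 total size, so we
    can jump from one document to the next and keep every one as raw bytes.
    """
    docs = []
    position = 0
    end = len(data)
    while position < end:
        if end - position < 5:  # the smallest valid document is 5 bytes long
            raise ValueError("truncated BSON at offset %d" % position)
        obj_size = struct.unpack("<i", data[position:position + 4])[0]
        if obj_size < 5 or position + obj_size > end:
            raise ValueError("corrupt BSON size at offset %d" % position)
        docs.append(data[position:position + obj_size])  # raw BSON, size prefix included
        position += obj_size
    return docs
```

Keeping the size prefix is what makes the output directly usable by any BSON decoder later, for example the JavaScript one on the client.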
14. Enforcing in PyMongo
● Now that we have a custom decoding
function that leaves the documents
encoded in BSON, we need to make
PyMongo use it
● _unpack_response is the function in
charge of calling decode_all, so
we must convince it to call our version
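The hijack on the following slides is plain monkeypatching: replace a module attribute and keep a reference to the original for fallback. The sketch below uses a throwaway stand-in module built with types.ModuleType rather than the real pymongo.helpers, and shows the caveat: only callers that look the function up through the module at call time see the replacement, while a reference captured before patching keeps calling the original.

```python
import types

# Stand-in for pymongo.helpers: a throwaway module, NOT the real one.
helpers = types.ModuleType("helpers")


def _stock_unpack_response(response):
    return "stock decoder"


helpers._unpack_response = _stock_unpack_response


def run_query(response):
    # Resolves helpers._unpack_response at call time, so it sees any patch.
    return helpers._unpack_response(response)


# A reference captured before patching (like `from helpers import ...`).
early_bound = helpers._unpack_response

# Keep the original around so the replacement can fall back to it.
_real_unpack_response = helpers._unpack_response


def custom_unpack_response(response):
    if response is None:  # stand-in for "not a query": defer to the original
        return _real_unpack_response(response)
    return "custom decoder"


helpers._unpack_response = custom_unpack_response

print(run_query(b"reply"), "/", run_query(None), "/", early_bound(b"reply"))
```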
16. Hijacking decoding
● _unpack_response
○ Called by pymongo to unpack responses retrieved
by the server.
○ It receives some information, like the current
cursor id in case of getMore, and other parameters
○ We can use the provided parameters to guess whether
we are decoding a query response or something else.
17. Custom unpack_response
import struct
import pymongo.helpers

_real_unpack_response = pymongo.helpers._unpack_response

def custom_unpack_response(response, cursor_id=None, as_class=None,
                           *args, **kw):
    if as_class is None:  # Not a query, here lies the real trick
        return _real_unpack_response(response, cursor_id, dict, *args, **kw)

    response_flag = struct.unpack("<i", response[:4])[0]
    if response_flag & 2:  # QueryFailure bit: the server sent an error report
        return _real_unpack_response(response, cursor_id, as_class, *args, **kw)

    # Parse the reply fields ourselves, but keep the documents as raw BSON
    result = {}
    result["cursor_id"] = struct.unpack("<q", response[4:12])[0]
    result["starting_from"] = struct.unpack("<i", response[12:16])[0]
    result["number_returned"] = struct.unpack("<i", response[16:20])[0]
    result["data"] = custom_decode_all(response[20:])  # custom_decode_all from slide 13
    return result

pymongo.helpers._unpack_response = custom_unpack_response
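To check the offsets used above without a running server, one can fake a reply body. In the MongoDB wire protocol, an OP_REPLY carries, after the 16-byte message header, responseFlags (int32), cursorID (int64), startingFrom (int32) and numberReturned (int32), followed by the documents. The sketch assumes _unpack_response receives the message with that header already stripped, which is what the 20-byte offset above implies; build_reply_body and read_reply_header are hypothetical test helpers.

```python
import struct


def build_reply_body(docs, cursor_id=0, starting_from=0, response_flags=0):
    """Fake the OP_REPLY body that follows the 16-byte message header."""
    # "<iqii": little-endian, no padding -> 4 + 8 + 4 + 4 = 20 bytes
    header = struct.pack("<iqii", response_flags, cursor_id, starting_from, len(docs))
    return header + b"".join(docs)


def read_reply_header(response):
    """Read the same fields, at the same offsets, as custom_unpack_response."""
    return {
        "response_flags": struct.unpack("<i", response[:4])[0],
        "cursor_id": struct.unpack("<q", response[4:12])[0],
        "starting_from": struct.unpack("<i", response[12:16])[0],
        "number_returned": struct.unpack("<i", response[16:20])[0],
    }


body = build_reply_body([b"\x05\x00\x00\x00\x00"], cursor_id=42, starting_from=3)
print(read_reply_header(body), body[20:])
```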
18. Fetching BSON
● Our PyMongo queries will now return
BSON-encoded data that we can push straight to
the client
● Let’s fetch the data from the client side to close
the loop
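On the server side, pushing the raw documents to the client can be as simple as concatenating them, since each BSON document carries its own length. The sketch below is a hypothetical plain-WSGI handler for the /results_bson URL used by the client code; fetch_raw_docs is an assumed callable returning raw BSON documents (such as the "data" list produced by custom_unpack_response), and the app answers every path the same way.

```python
def make_bson_app(fetch_raw_docs):
    """Build a WSGI app that returns raw BSON documents back to back.

    fetch_raw_docs: assumed callable returning a list of raw BSON documents (bytes).
    """
    def app(environ, start_response):
        # Documents are self-delimiting, so plain concatenation is enough.
        payload = b"".join(fetch_raw_docs())
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(len(payload))),
        ])
        return [payload]

    return app
```

To try it locally, wsgiref is enough: `make_server("localhost", 8080, app).serve_forever()` after `from wsgiref.simple_server import make_server`. A page served from a different origin would also need CORS headers.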
19. Fetching BSON
function fetch_bson() {
  var BSON = bson().BSON;
  var oReq = new XMLHttpRequest();
  oReq.open("GET", 'http://localhost:8080/results_bson', true);
  oReq.responseType = "arraybuffer";
  oReq.onload = function(e) {
    var data = new Uint8Array(oReq.response);
    var offset = 0;
    var results = [];
    // Each call deserializes one document and returns the offset of the next
    while (offset < data.length)
      offset = BSON.deserializeStream(data, offset, 1, results, results.length, {});
    show_output(results);
  };
  oReq.send();
}
21. Performance Gain
● It all started as a quest for a performance boost:
how much did it improve?
○ JSON: 1239.72 req/sec
○ BSON: 2079.75 req/sec
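Taking the two figures at face value (same machine and workload, which the slide does not detail), the gain works out to roughly 1.68 times the throughput, or about a third of a millisecond less average time per request:

```latex
\frac{2079.75\ \text{req/s}}{1239.72\ \text{req/s}} \approx 1.68
\qquad
\frac{1000\ \text{ms}}{1239.72} \approx 0.807\ \text{ms}
\;\longrightarrow\;
\frac{1000\ \text{ms}}{2079.75} \approx 0.481\ \text{ms per request}
```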
22. False Benchmark
● Benchmark is actually pointless
○ as usual ;)
● We replaced bson.decode_all, which is
written in C, with custom_decode_all, which
is written in Python
○ The two don’t really compare
● Wanna try with PyPy?