
MongoTorino 2013 - BSON Mad Science for fun and profit


The talk will cover how to use BSON directly as an exchange protocol to gain speed and advanced types.

BSON is the underlying serialization protocol used by MongoDB to store and represent data.
Whenever we retrieve data from MongoDB we get it as BSON, then our drivers decode it just so that our web service can encode it back in JSON.

We will see how to take advantage of BSON for fun and speed, skipping this double step by fetching BSON directly and decoding it on the client side.

Published in: Technology

  1. 1. BSON MAD SCIENCE FOR FUN AND PROFIT Alessandro Molina @__amol__ alessandro.molina@axant.it
  2. 2. Who am I ● CTO @ Axant.it, a mostly Python company with some iOS and Android development. ● Mostly relying on MySQL, MongoDB, Redis (and SQLite!) for day-to-day data storage ● TurboGears web framework team member ● Contributions to Ming MongoDB ODM
  3. 3. The Reason ● EuroPython 2013 ○ JSON WebServices with Python best practices talk ● Question raised ○ “We have a service where our bottleneck is actually the JSON encoding itself, what can we do?”
  4. 4. First obvious answer ● Avoid encoding the whole data set in memory ○ iterencode yields the output one chunk at a time instead of encoding everything at once. ● Use a faster encoder! ○ There are projects with custom encoders like GPSD that are very fast and very memory conservative.
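As a minimal sketch of the iterencode approach mentioned in the slide above, the standard library's `json.JSONEncoder.iterencode` can be wrapped in a generator so that a web framework can stream the output. The `stream_json` helper and the sample `posts` data are assumptions made for illustration, not code from the talk.

```python
import json


def stream_json(obj):
    """Yield the JSON encoding of obj as a series of string chunks."""
    encoder = json.JSONEncoder()
    # iterencode produces the output incrementally, so the full
    # encoded string never has to exist in memory at once.
    for chunk in encoder.iterencode(obj):
        yield chunk


# Hypothetical sample data, modelled on the document shown later in the talk.
posts = [{"author": "Mike", "text": "My first blog post!"}]
chunks = list(stream_json(posts))
```

Joining the chunks gives the same string as `json.dumps(posts)`; the gain is only in peak memory, not in encoding speed.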
  5. 5. Mad answer ● If the JSON encoder is too slow for you ○ Remove JSON encoding ● Looking for the fastest encoding? ○ Don’t encode data at all!
  6. 6. MongoDB flow: MongoDB → (BSON) → Driver → (NATIVE) → WebService → (JSON) → Client ● BSON is the serialization format used by MongoDB to talk with its clients ● Involves decoding BSON and then re-encoding as JSON
  7. 7. Using BSON! ● Can we totally skip the “BSON decoding” and “JSON encoding” dance and directly use BSON? “BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.”
  8. 8. Target Flow: MongoDB → (BSON) → Driver → (BSON) → WebService → (BSON) → Client ● BSON decoding on the client can happen using the js-bson library (or equivalent) ● Skipping BSON decoding on the server is hard ○ It’s built into the MongoDB driver
  9. 9. The Python Driver: MongoDB → Cursor → _unpack_response → bson.decode_all → _elements_to_dict → _element_to_dict
  10. 10. Custom decoding ● bson.decode_all is the function in charge of decoding BSON objects. ● We need a decoder that partially decodes the query response but leaves the actual documents encoded. ● Full BSON spec available on bsonspec.org
  11. 11. Custom bson.decode_all
      $ python test.py
      {u'text': u'My first blog post!', u'_id': ObjectId('5267f71a0e9ce56fe55bdc4b'), u'author': u'Mike'}
      $ python test.py
      'E\x00\x00\x00\x07_id\x00Rg\xf7\x1a\x0e\x9c\xe5o\xe5[\xdcK\x02text\x00\x14\x00\x00\x00My first blog post!\x00\x02author\x00\x05\x00\x00\x00Mike\x00\x00'
  12. 12. BSON format ● Document: SIZE, ONE OR MORE KEY-VALUE ENTRIES, 0 ● Entry: TYPE, KEY NAME, 0, VALUE
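To make the layout in the slide above concrete, here is a rough sketch that walks the top-level entries of the example document from slide 11 (with its escape backslashes restored) using only `struct`. The `walk_document` helper is hypothetical and deliberately handles just the two value types that appear in that document, strings (0x02) and ObjectIds (0x07); it is not a general BSON parser.

```python
import struct

# Example document bytes from slide 11 of the talk.
DOC = (b"E\x00\x00\x00"
       b"\x07_id\x00Rg\xf7\x1a\x0e\x9c\xe5o\xe5[\xdcK"
       b"\x02text\x00\x14\x00\x00\x00My first blog post!\x00"
       b"\x02author\x00\x05\x00\x00\x00Mike\x00"
       b"\x00")


def walk_document(data):
    """Return (type, key, raw_value) for each top-level entry.

    Assumption: only string (0x02) and ObjectId (0x07) values occur.
    """
    # SIZE: little-endian int32 covering the whole document.
    size = struct.unpack("<i", data[:4])[0]
    if size != len(data) or data[-1:] != b"\x00":
        raise ValueError("not a single well-formed BSON document")
    entries = []
    pos = 4
    while pos < size - 1:  # stop before the terminating 0 byte
        type_byte = data[pos]                    # TYPE
        key_end = data.index(b"\x00", pos + 1)   # KEY NAME ends at a 0 byte
        key = data[pos + 1:key_end].decode("utf-8")
        pos = key_end + 1
        if type_byte == 0x02:
            # String value: int32 length (including trailing 0), then bytes.
            str_len = struct.unpack("<i", data[pos:pos + 4])[0]
            value_len = 4 + str_len
        elif type_byte == 0x07:
            value_len = 12                       # ObjectId is 12 raw bytes
        else:
            raise NotImplementedError("type 0x%02x not handled" % type_byte)
        entries.append((type_byte, key, data[pos:pos + value_len]))
        pos += value_len
    return entries


entries = walk_document(DOC)
```

The leading `E` in the dump is simply the SIZE field: 0x45 is 69, the total length of the document in bytes.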
  13. 13. Custom bson.decode_all
      Original:
          obj_size = struct.unpack("<i", data[position:position + 4])[0]
          elements = data[position + 4:position + obj_size - 1]
          position += obj_size
          docs.append(_elements_to_dict(elements, as_class, ...))
      Custom:
          obj_size = struct.unpack("<i", data[position:position + 4])[0]
          elements = data[position:position + obj_size]
          position += obj_size
          docs.append(elements)
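The slide above shows only the loop body. A complete function built around it might look like the following sketch; the name `custom_decode_all` is the one used later in the talk, while the loop, the bounds checks and the two tiny hand-built documents are assumptions added for illustration.

```python
import struct


def custom_decode_all(data):
    """Split concatenated BSON documents into a list of raw byte strings.

    Nothing is decoded: only each document's SIZE prefix is read.
    """
    docs = []
    position = 0
    end = len(data)
    while position < end:
        if end - position < 4:
            raise ValueError("truncated BSON data")
        obj_size = struct.unpack("<i", data[position:position + 4])[0]
        # The smallest valid document is 5 bytes: SIZE plus the final 0.
        if obj_size < 5 or position + obj_size > end:
            raise ValueError("corrupt BSON document size")
        # Keep the size prefix and terminator so the client can decode it.
        docs.append(data[position:position + obj_size])
        position += obj_size
    return docs


# Two hand-built documents: {} and {"a": "b"}.
empty_doc = b"\x05\x00\x00\x00\x00"
small_doc = b"\x0e\x00\x00\x00\x02a\x00\x02\x00\x00\x00b\x00\x00"
raw_docs = custom_decode_all(empty_doc + small_doc)
```

Unlike the original code, the slice starts at `position` rather than `position + 4`, which is what keeps each returned document a valid, self-contained BSON blob.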
  14. 14. Enforcing in PyMongo ● Now that we have a custom decoding function that leaves the documents encoded in BSON, we need to make PyMongo use it ● _unpack_response is the function in charge of calling decode_all; we must convince it to call our version
  15. 15. MonkeyPatching and this is the reason why it’s mad science and you should avoid doing it!
  16. 16. Hijacking decoding ● _unpack_response ○ Called by pymongo to unpack responses received from the server. ○ Some information is passed in, such as the current cursor id in case of getMore, along with other parameters ○ We can use the provided parameters to guess whether we are decoding a query response or something else.
  17. 17. Custom unpack_response
      _real_unpack_response = pymongo.helpers._unpack_response

      def custom_unpack_response(response, cursor_id=None, as_class=None, *args, **kw):
          if as_class is None:
              # Not a query, here lies the real trick
              return _real_unpack_response(response, cursor_id, dict, *args, **kw)

          response_flag = struct.unpack("<i", response[:4])[0]
          if response_flag & 2:
              # In case it's an error report
              return _real_unpack_response(response, cursor_id, as_class, *args, **kw)

          result = {}
          result["cursor_id"] = struct.unpack("<q", response[4:12])[0]
          result["starting_from"] = struct.unpack("<i", response[12:16])[0]
          result["number_returned"] = struct.unpack("<i", response[16:20])[0]
          result["data"] = custom_decode_all(response[20:])
          return result

      pymongo.helpers._unpack_response = custom_unpack_response
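The byte offsets in the slide above correspond to a fixed 20-byte reply header followed by the documents. The sketch below reads that header in one step on a synthetic response, without touching PyMongo; `split_reply` and the fake response are hypothetical, and since `pymongo.helpers._unpack_response` is a private function whose signature has changed across PyMongo releases, the patch itself should be read as tied to the driver version used in the talk.

```python
import struct

# flags (int32), cursor id (int64), starting from (int32),
# number returned (int32); "<" means little-endian with no padding.
REPLY_HEADER = struct.Struct("<iqii")

QUERY_FAILURE = 2  # the flag bit the slide's code treats as an error report


def split_reply(response):
    """Separate the reply header fields from the raw document bytes."""
    flags, cursor_id, starting_from, number_returned = \
        REPLY_HEADER.unpack_from(response)
    return {
        "is_error": bool(flags & QUERY_FAILURE),
        "cursor_id": cursor_id,
        "starting_from": starting_from,
        "number_returned": number_returned,
        # Everything after the header is concatenated BSON documents.
        "raw_documents": response[REPLY_HEADER.size:],
    }


# Synthetic response: no flags, cursor 42, one empty document.
fake_response = REPLY_HEADER.pack(0, 42, 0, 1) + b"\x05\x00\x00\x00\x00"
reply = split_reply(fake_response)
```

The `raw_documents` bytes are what a function like `custom_decode_all` would then split into individual documents.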
  18. 18. Fetching BSON ● Our PyMongo queries will now return BSON encoded data we can then push to the client ● Let’s fetch the data from the client to close the loop
  19. 19. Fetching BSON
      function fetch_bson() {
          var BSON = bson().BSON;
          var oReq = new XMLHttpRequest();
          oReq.open("GET", 'http://localhost:8080/results_bson', true);
          oReq.responseType = "arraybuffer";
          oReq.onload = function(e) {
              var data = new Uint8Array(oReq.response);
              var offset = 0;
              var results = [];
              while (offset < data.length)
                  offset = BSON.deserializeStream(data, offset, 1, results, results.length, {});
              show_output(results);
          }
          oReq.send();
      }
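The client code above requests `/results_bson` and expects a body made of BSON documents placed back to back. The talk does not show the server side, so the following is only a sketch of what such an endpoint could look like as a plain WSGI application. `make_bson_app` and the `fetch_raw_documents` callable are hypothetical names, and `application/octet-stream` is used as a generic binary content type.

```python
def make_bson_app(fetch_raw_documents):
    """Build a WSGI app that serves raw BSON documents back to back.

    fetch_raw_documents is a hypothetical callable returning an iterable
    of byte strings, e.g. the "data" list produced by a patched driver.
    """
    def app(environ, start_response):
        # Concatenate the still-encoded documents; the client walks the
        # stream using each document's own size prefix.
        body = b"".join(fetch_raw_documents())
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app


# Stand-in data source: two empty BSON documents.
app = make_bson_app(lambda: [b"\x05\x00\x00\x00\x00"] * 2)
```

For a quick local try, such an app could be served with `wsgiref.simple_server.make_server("localhost", 8080, app)`.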
  20. 20. See it in action
  21. 21. Performance Gain ● This all started as a search for a performance boost; how much did it improve? ● JSON: 1239.72 req/sec ● BSON: 2079.75 req/sec
  22. 22. False Benchmark ● Benchmark is actually pointless ○ as usual ;) ● Replacing bson.decode_all which is written in C with custom_decode_all which is written in Python ○ The two don’t compare much ● Wanna try with PyPy?
  23. 23. Questions?