MESSING UP WITH PYMONGO
FOR FUN AND PROFIT
Alessandro Molina
@__amol__
alessandro.molina@axant.it
Who am I
● CTO @ AXANT.it, mostly Python company,
(with some iOS and Android)
● TurboGears2 dev team member
● Took over Beaker maintenance in 2015
● Mostly contributing to Python web
libraries: FormEncode, Ming MongoDB ODM, ToscaWidgets
The Reason
● EuroPython 2013
○ JSON WebServices with Python best practices
talk
● Question raised
○ “We have a service where our bottleneck is
actually the JSON encoding itself, what can we
do?”
First obvious answer
● Avoid encoding whole data in memory
○ iterencode yields the encoded output piece by piece
instead of encoding everything at once.
● Use a faster encoder!
○ There are projects with custom encoders like
GPSD that are very fast and very memory
conservative.
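A minimal sketch of the streaming idea, using only the standard library json module. JSONEncoder.iterencode yields the encoded output in chunks, so a web service can write each chunk as it is produced instead of building the whole string first. The stream_json helper name is an assumption made for this example.

```python
import json


def stream_json(payload):
    """Yield the JSON encoding of payload chunk by chunk, as UTF-8 bytes."""
    encoder = json.JSONEncoder()
    for chunk in encoder.iterencode(payload):
        # Each chunk is a small piece of the final document; a WSGI app
        # could return this generator directly as its response body.
        yield chunk.encode("utf-8")
```

Joining the chunks gives the same bytes as encoding the payload in one go; the gain is only that the full string never has to sit in memory at once.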
Mad answer
● If the JSON encoder is too slow for you
○ Remove JSON encoding
● Looking for the fastest encoding?
○ Don’t encode data at all!
MongoDB flow
● BSON is the serialization format used by
mongodb to talk with its clients
● A JSON web service has to decode the BSON and
then re-encode the data as JSON
MongoDB → BSON → Driver → NATIVE → WebService → JSON → Client
Using BSON!
● Can we skip the “BSON decoding” and
“JSON encoding” dance entirely and use
BSON directly?
“BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization
of JSON-like documents. Like JSON, BSON supports the embedding of
documents and arrays within other documents and arrays. BSON also contains
extensions that allow representation of data types that are not part of the JSON
spec. For example, BSON has a Date type and a BinData type.”
Target Flow
● BSON decoding on the client can happen
using the js-bson library (or equivalent)
● Skipping BSON decoding on the server is hard
○ It’s built into the MongoDB driver
MongoDB → BSON → Driver → BSON → WebService → BSON → Client
The Python Driver
MongoDB → Cursor → _unpack_response → bson.decode_all → _elements_to_dict → _element_to_dict
Custom decoding
● bson.decode_all is the function in charge
of decoding BSON objects.
● We need a decoder that partially decodes
the query response but leaves the actual
documents encoded.
● Full BSON spec available on bsonspec.org
Custom bson.decode_all
With the stock bson.decode_all:
$ python test.py
{u'text': u'My first blog post!',
 u'_id': ObjectId('5267f71a0e9ce56fe55bdc4b'),
 u'author': u'Mike'}
With the custom version:
$ python test.py
'E\x00\x00\x00\x07_id\x00Rg\xf7\x1a\x0e\x9c\xe5o\xe5[\xdcK\x02text\x00\x14\x00\x00\x00My first blog post!\x00\x02author\x00\x05\x00\x00\x00Mike\x00\x00'
BSON format
Document: SIZE | ONE OR MORE KEY-VALUE ENTRIES | 0
Entry: TYPE | KEY NAME | 0 | VALUE
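To make the layout concrete, here is a small Python 3 sketch that walks the top level of a BSON document with struct, following the SIZE / TYPE / KEY NAME / VALUE layout. It only understands the two element types that appear in the example blog post document (0x02 string and 0x07 ObjectId). The list_top_level_keys name is an assumption for illustration, not part of any library.

```python
import struct

# The example blog post document from the slides, as raw BSON bytes.
EXAMPLE = (
    b"E\x00\x00\x00"
    b"\x07_id\x00Rg\xf7\x1a\x0e\x9c\xe5o\xe5[\xdcK"
    b"\x02text\x00\x14\x00\x00\x00My first blog post!\x00"
    b"\x02author\x00\x05\x00\x00\x00Mike\x00"
    b"\x00"
)


def list_top_level_keys(doc):
    """Return (type, key) pairs; handles only string and ObjectId values."""
    (size,) = struct.unpack("<i", doc[:4])  # SIZE: little-endian int32
    if size != len(doc) or doc[-1:] != b"\x00":
        raise ValueError("not a single well-formed BSON document")
    keys = []
    pos = 4
    while pos < size - 1:  # stop before the trailing 0 byte
        elem_type = doc[pos]  # TYPE: one byte
        pos += 1
        end = doc.index(b"\x00", pos)  # KEY NAME: C string ending with 0
        key = doc[pos:end].decode("utf-8")
        pos = end + 1
        if elem_type == 0x02:  # string: int32 length, bytes, trailing 0
            (length,) = struct.unpack("<i", doc[pos:pos + 4])
            pos += 4 + length
        elif elem_type == 0x07:  # ObjectId: always 12 bytes
            pos += 12
        else:
            raise ValueError("element type 0x%02x not handled here" % elem_type)
        keys.append((elem_type, key))
    return keys
```

The leading byte "E" is 0x45, so the size prefix says 69 bytes, which matches the length of the whole buffer.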
Custom bson.decode_all
# Stock bson.decode_all: strip the size prefix and trailing 0, then decode
obj_size = struct.unpack("<i", data[position:position + 4])[0]
elements = data[position + 4:position + obj_size - 1]
position += obj_size
docs.append(_elements_to_dict(elements, as_class, ...))

# Custom version: keep the whole document, still encoded
obj_size = struct.unpack("<i", data[position:position + 4])[0]
elements = data[position:position + obj_size]
position += obj_size
docs.append(elements)
Enforcing in PyMongo
● Now that we have a custom decoding
function that leaves the documents
encoded in BSON, we need to make
PyMongo use it
● _unpack_response is the function in
charge of calling decode_all, so
we must convince it to call our version
Monkey patching
...and this is why it’s mad science and you should avoid doing it!
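If monkey patching really is unavoidable, limiting how long the patch stays installed reduces the risk. A small sketch using unittest.mock.patch.object from the standard library. Here fake_helpers is a stand-in namespace invented for the example so that it runs without PyMongo; it is not PyMongo's real module, and the function names are hypothetical.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for a third-party module with a private helper (hypothetical).
fake_helpers = SimpleNamespace(_unpack_response=lambda response: "decoded")


def raw_unpack_response(response):
    """Replacement that hands the payload back untouched."""
    return response


def call_with_raw_decoding(response):
    # The patch is active only inside the with block and is undone on
    # exit, even if the body raises.
    with mock.patch.object(fake_helpers, "_unpack_response", raw_unpack_response):
        return fake_helpers._unpack_response(response)
```

The patch is still process-wide while it is active, so other threads would see it too; scoping it only shortens the window.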
Hijacking decoding
● _unpack_response
○ Called by PyMongo to unpack responses received
from the server.
○ Some information is passed in, such as the current
cursor id in case of getMore, plus other parameters
○ We can use those parameters to guess whether we
are decoding a query response or something else.
Custom unpack_response
import struct

import pymongo.helpers

# Keep a reference to PyMongo's original private helper
_real_unpack_response = pymongo.helpers._unpack_response


def custom_unpack_response(response, cursor_id=None, as_class=None,
                           *args, **kw):
    if as_class is None:  # Not a query, here lies the real trick
        return _real_unpack_response(response, cursor_id, dict, *args, **kw)

    response_flag = struct.unpack("<i", response[:4])[0]
    if response_flag & 2:  # In case it's an error report
        return _real_unpack_response(response, cursor_id, as_class, *args, **kw)

    # Decode only the fixed-size reply header; the documents stay in BSON
    result = {}
    result["cursor_id"] = struct.unpack("<q", response[4:12])[0]
    result["starting_from"] = struct.unpack("<i", response[12:16])[0]
    result["number_returned"] = struct.unpack("<i", response[16:20])[0]
    result["data"] = custom_decode_all(response[20:])
    return result


# Monkey patch: PyMongo now calls our version
pymongo.helpers._unpack_response = custom_unpack_response
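The byte offsets used in custom_unpack_response match the body of a legacy OP_REPLY wire message once the standard message header has been stripped: a 4-byte flags field, an 8-byte cursor id, two 4-byte integers, then the documents. As a sketch, the same four fields can be read in a single struct call. The parse_reply_header name and the fabricated sample bytes in the usage note are assumptions for illustration.

```python
import struct

# flags (int32), cursor id (int64), starting_from (int32), number_returned (int32)
REPLY_HEADER = struct.Struct("<iqii")  # 20 bytes, little endian, no padding

QUERY_FAILURE = 2  # the flag bit tested in custom_unpack_response


def parse_reply_header(response):
    """Split a reply body into its fixed header fields and the raw documents."""
    flags, cursor_id, starting_from, number_returned = REPLY_HEADER.unpack_from(response)
    return {
        "query_failed": bool(flags & QUERY_FAILURE),
        "cursor_id": cursor_id,
        "starting_from": starting_from,
        "number_returned": number_returned,
        "data": response[REPLY_HEADER.size:],  # still BSON, untouched
    }
```

For a quick check, a fake reply can be built with REPLY_HEADER.pack(0, 0, 0, 1) followed by b"\x05\x00\x00\x00\x00", the BSON encoding of an empty document.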
Custom decode_all
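import struct
import sys

from bson.errors import InvalidBSON


def custom_decode_all(data):
    docs = []
    position = 0
    end = len(data) - 1
    try:
        while position < end:
            # Each document starts with its total size as a little-endian int32
            obj_size = struct.unpack("<i", data[position:position + 4])[0]
            # Slice out the whole document without decoding it
            elements = data[position:position + obj_size]
            position += obj_size
            docs.append(elements)
        return docs
    except Exception:
        exc_type, exc_value, exc_tb = sys.exc_info()
        # Python 2 three-argument raise: keeps the original traceback
        raise InvalidBSON, str(exc_value), exc_tb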
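The function above uses Python 2 syntax. As a rough sketch of the same idea in Python 3, using only the standard library, the version below slices a buffer of concatenated BSON documents by their size prefix and adds a couple of sanity checks. split_bson_documents is a hypothetical name, and ValueError stands in for bson.errors.InvalidBSON so that the example does not need PyMongo installed.

```python
import struct


def split_bson_documents(data):
    """Return each BSON document in data as an undecoded bytes object."""
    docs = []
    position = 0
    while position < len(data):
        if len(data) - position < 5:  # the smallest document is 5 bytes
            raise ValueError("truncated BSON data")
        (obj_size,) = struct.unpack_from("<i", data, position)
        end = position + obj_size
        if obj_size < 5 or end > len(data) or data[end - 1] != 0:
            raise ValueError("invalid BSON document at offset %d" % position)
        docs.append(data[position:end])  # keep size prefix and trailing 0
        position = end
    return docs
```

Because each slice keeps its size prefix and terminator, every element of the result is a complete BSON document that a client-side decoder can consume as-is.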
Fetching BSON
● Our PyMongo queries will now return
BSON-encoded data that we can push to
the client
● Let’s fetch the data from the client to close
the loop
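On the server side, pushing the data to the client can be as simple as concatenating the still-encoded documents into the response body. A minimal WSGI sketch, assuming the documents have already been fetched as a list of BSON byte strings. The make_bson_app factory, the fetch_documents callable and the generic application/octet-stream content type are assumptions for illustration, not something the talk specifies.

```python
def make_bson_app(fetch_documents):
    """Build a WSGI app that returns raw BSON documents back to back.

    fetch_documents is a hypothetical callable returning a list of bytes
    objects, each one a complete, still-encoded BSON document.
    """
    def app(environ, start_response):
        body = b"".join(fetch_documents())
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app
```

For local experiments the app could be served with wsgiref.simple_server.make_server from the standard library; the client then reads the body as an array buffer and decodes one document at a time.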
Fetching BSON
function fetch_bson() {
    var BSON = bson().BSON;

    var oReq = new XMLHttpRequest();
    oReq.open("GET", 'http://localhost:8080/results_bson', true);
    oReq.responseType = "arraybuffer";

    oReq.onload = function(e) {
        var data = new Uint8Array(oReq.response);
        var offset = 0;
        var results = [];

        // Decode one document at a time until the buffer is exhausted
        while (offset < data.length)
            offset = BSON.deserializeStream(data, offset, 1, results, results.length, {});

        show_output(results);
    };

    oReq.send();
}
See it in action
Performance Gain
● It all started as a quest for a performance boost:
how much did it improve?
JSON: 1239.72 req/sec
BSON: 2079.75 req/sec
False Benchmark
● The benchmark is actually pointless
○ as usual ;)
● We replaced bson.decode_all, which is
written in C, with custom_decode_all, which
is written in Python
○ The two are hardly comparable
● Wanna try with PyPy?
Questions?

PyConIT6 - Messing up with pymongo for fun and profit
