Python Performance Profiling: The Guts And The Glory

Your Python program is too slow, and you need to optimize it. Where do you start? With the right tools, you can optimize your code where it counts. We’ll explore the guts of the Python profiler “Yappi” to understand its features and limitations. We’ll learn how to find the maximum performance wins with minimum effort.

Published in: Software, Technology

Transcript of "Python Performance Profiling: The Guts And The Glory"

  1. Python Profiling: The Glory & The Guts
     A. Jesse Jiryu Davis
     @jessejiryudavis
     MongoDB
  2. “PyMongo is slower compared to the JavaScript version”
     MongoDB Node.js driver: 88,000 per second
     PyMongo: 29,000 per second
  3. “Why Is PyMongo Slower?”
     From: steve@mongodb.com
     To: jesse@mongodb.com
     CC: eliot@mongodb.com

     Hi Jesse,

     Why is the Node MongoDB driver 3 times
     faster than PyMongo?

     http://dzone.com/articles/mongodb-facts-over-80000
  4. The Python Code
         # Obtain a MongoDB collection.
         import pymongo

         client = pymongo.MongoClient('localhost')
         db = client.random
         collection = db.randomData
         collection.remove()
  5. The Python Code
         n_documents = 80000
         batch_size = 5000
         batch = []

         import time
         start = time.time()
  6. The Python Code
         import random
         from datetime import datetime

         min_date = datetime(2012, 1, 1)
         max_date = datetime(2013, 1, 1)
         delta = (max_date - min_date).total_seconds()
  7. The Python Code
     What?!
         for i in range(n_documents):
             date = datetime.fromtimestamp(
                 time.mktime(min_date.timetuple())
                 + int(round(random.random() * delta)))

             value = random.random()
             document = {
                 'created_on': date,
                 'value': value}

             batch.append(document)
             if len(batch) == batch_size:
                 collection.insert(batch)
                 batch = []
  8. The Python Code
         duration = time.time() - start

         print 'inserted %d documents per second' % (
             n_documents / duration)
     inserted 30,000 documents per second
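The batching pattern on slides 5 to 8 can be tried without a MongoDB server. The following is a minimal sketch in Python 3 (the slides use Python 2), not the author's script: `insert_in_batches` and the `batch_sizes` sink are hypothetical names, and the sink only records batch sizes instead of calling `collection.insert`. It also flushes a final partial batch, a step the slide code does not need because 80000 is an exact multiple of 5000.

```python
import random
import time
from datetime import datetime


def insert_in_batches(insert_batch, n_documents, batch_size):
    """Generate documents and pass them to insert_batch in groups."""
    batch = []
    for _ in range(n_documents):
        batch.append({'created_on': datetime.now(),
                      'value': random.random()})
        if len(batch) == batch_size:
            insert_batch(batch)
            batch = []
    if batch:
        # Flush the final partial batch (not needed in the slide code,
        # where n_documents is a multiple of batch_size).
        insert_batch(batch)


# Hypothetical sink: records batch sizes instead of writing to MongoDB.
batch_sizes = []

start = time.time()
insert_in_batches(lambda batch: batch_sizes.append(len(batch)), 12, 5)
duration = time.time() - start

print('batch sizes:', batch_sizes)
print('elapsed seconds: %.6f' % duration)
```

Swapping the lambda for a real insert call would turn this into an actual benchmark; with the stand-in sink it only measures document generation.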
  9. The Node.js Code
     (not shown)
  10. The Question
      Why is the Python script 3 times slower than the equivalent Node script?
  11. Why Profile?
      • Optimization is like debugging
      • Hypothesis: “The following change will yield a worthwhile improvement.”
      • Experiment
      • Repeat until fast enough
  12. Why Profile?
      Profiling is a way to generate hypotheses.
  13. Which Profiler?
      • cProfile
      • GreenletProfiler
      • Yappi
  14. Yappi
      By Sümer Cip
  15. Yappi
      Compared to cProfile, it is:
      • As fast
      • Also measures functions
      • Can measure CPU time, not just wall-clock time
      • Can measure all threads
      • Can export to callgrind
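For comparison with the Yappi features listed on slide 15, here is a minimal sketch of the standard-library baseline, cProfile, written for Python 3. The `busy_work` function is a hypothetical workload chosen only so the profiler has something to measure; it is not part of the talk.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical workload: sum of squares below n."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = busy_work(100000)
profiler.disable()

# Collect the report in a string so it can be inspected or printed.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative').print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists the five entries with the highest cumulative time, which is usually the first place to look when forming a hypothesis about where the time goes.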
  16. Yappi
          import yappi

          yappi.set_clock_type('cpu')
          yappi.start(builtins=True)

          start = time.time()

          for i in range(n_documents):
              # ... same code ...

          duration = time.time() - start
          stats = yappi.get_func_stats()
          stats.save('callgrind.out', type='callgrind')
      (Same code as before)
  17. KCacheGrind
  18. The Python Code
          for index in range(n_documents):
              date = datetime.fromtimestamp(
                  time.mktime(min_date.timetuple())
                  + int(round(random.random() * delta)))

              value = random.random()
              document = {
                  'created_on': date,
                  'value': value}

              batch.append(document)
              if len(batch) == batch_size:
                  collection.insert(batch)
                  batch = []
      one third of the time
  19. The Python Code
          for index in range(n_documents):
              date = datetime.now()

              value = random.random()
              document = {
                  'created_on': date,
                  'value': value}

              batch.append(document)
              if len(batch) == batch_size:
                  collection.insert(batch)
                  batch = []
  20. The Python Code
      • Before: 30,000 inserts per second
      • After: 50,000 inserts per second
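The change between slides 18 and 19 replaces the random-date computation with `datetime.now()`. One way to check such a hypothesis in isolation is to time both expressions with `timeit`. The sketch below is written for Python 3; the wrapper names `random_date` and `current_date` and the call count are assumptions for illustration, and absolute timings will differ from machine to machine.

```python
import random
import time
import timeit
from datetime import datetime

min_date = datetime(2012, 1, 1)
max_date = datetime(2013, 1, 1)
delta = (max_date - min_date).total_seconds()


def random_date():
    """Date computation from the original loop (slide 18)."""
    return datetime.fromtimestamp(
        time.mktime(min_date.timetuple())
        + int(round(random.random() * delta)))


def current_date():
    """Replacement used in the faster loop (slide 19)."""
    return datetime.now()


n_calls = 20000
slow = timeit.timeit(random_date, number=n_calls)
fast = timeit.timeit(current_date, number=n_calls)

print('random_date:  %.3f s for %d calls' % (slow, n_calls))
print('current_date: %.3f s for %d calls' % (fast, n_calls))
```

Note that the two functions are not equivalent: the replacement gives up random dates entirely, which is acceptable only if the benchmark does not depend on them.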
  21. Why Profile?
      • Generate hypotheses
      • Estimate possible improvement
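The "estimate possible improvement" step can be done as simple arithmetic. Slide 18 attributes about one third of the run time to the date computation. Assuming that share is removed completely and nothing else changes, throughput is bounded by the original rate divided by the remaining fraction of time. The helper name `estimated_rate` below is hypothetical.

```python
def estimated_rate(current_rate, fraction_removed):
    """Throughput if a given fraction of the run time is eliminated."""
    return current_rate / (1.0 - fraction_removed)


before = 30000  # inserts per second, from slide 20
estimate = estimated_rate(before, 1.0 / 3.0)

print('estimated: about %d inserts per second' % round(estimate))
```

This gives roughly 45,000 inserts per second, in the same range as the 50,000 reported on slide 20. Both slide figures are rounded, so the comparison is only approximate.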
  22. How Does Profiling Work?
          int callback(PyFrameObject *frame,
                       int what,
                       PyObject *arg);

          int start(void)
          {
              PyEval_SetProfile(callback);
          }
  23. How Does Profiling Work?
          PyObject *
          PyEval_EvalFrameEx(PyFrameObject *frame)
          {
              if (tstate->c_profilefunc != NULL) {
                  tstate->c_profilefunc(frame,
                                        PyTrace_CALL,
                                        Py_None);
              }

              /* ... execute bytecode in the frame
               * until return or exception... */

              if (tstate->c_profilefunc != NULL) {
                  tstate->c_profilefunc(frame,
                                        PyTrace_RETURN,
                                        retval);
              }
          }
  24. How Does Profiling Work?
          int callback(PyFrameObject *frame,
                       int what,
                       PyObject *arg)
          {
              switch (what) {
              case PyTrace_CALL:
              {
                  PyCodeObject *cobj = frame->f_code;
                  PyObject *filename = cobj->co_filename;
                  PyObject *funcname = cobj->co_name;

                  /* ... record the function call ... */
              }
              break;

              /* ... other cases ... */

              }
          }
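The same mechanism is available from Python through `sys.setprofile`, which installs a callback that the interpreter invokes on function calls and returns. The sketch below, written for Python 3, mirrors the C callback on slide 24 by reading the file name and function name from the frame's code object. The `helper` and `workload` functions are hypothetical, and this is an illustration of the hook rather than how Yappi itself is implemented.

```python
import sys
from collections import Counter

call_counts = Counter()


def callback(frame, event, arg):
    """Profile hook: record each Python-level function call."""
    if event == 'call':
        code = frame.f_code
        call_counts[(code.co_filename, code.co_name)] += 1


def helper():
    return 1


def workload():
    total = 0
    for _ in range(3):
        total += helper()
    return total


sys.setprofile(callback)
try:
    workload()
finally:
    # Always uninstall the hook, even if the workload raises.
    sys.setprofile(None)

# Aggregate by function name for a compact report.
counts_by_name = Counter()
for (filename, funcname), count in call_counts.items():
    counts_by_name[funcname] += count

print('helper calls:', counts_by_name['helper'])
print('workload calls:', counts_by_name['workload'])
```

A real profiler would also handle the return events to accumulate time per function; this sketch only counts calls.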
  25. A. Jesse Jiryu Davis
      @jessejiryudavis
      MongoDB