Python Performance Profiling: The Guts And The Glory

Your Python program is too slow, and you need to optimize it. Where do you start? With the right tools, you can optimize your code where it counts. We’ll explore the guts of the Python profiler “Yappi” to understand its features and limitations. We’ll learn how to find the maximum performance wins with minimum effort.

Published in: Software, Technology

  1. Python Profiling: The Glory & The Guts
     A. Jesse Jiryu Davis
     @jessejiryudavis
     MongoDB
  2. “PyMongo is slower compared to the JavaScript version”
     MongoDB Node.js driver: 88,000 per second
     PyMongo: 29,000 per second
  3. “Why Is PyMongo Slower?”
     From: steve@mongodb.com
     To: jesse@mongodb.com
     CC: eliot@mongodb.com
     Hi Jesse,
     Why is the Node MongoDB driver 3 times faster than PyMongo?
     http://dzone.com/articles/mongodb-facts-over-80000
  4. The Python Code
         # Obtain a MongoDB collection.
         import pymongo

         client = pymongo.MongoClient('localhost')
         db = client.random
         collection = db.randomData
         collection.remove()
  5. The Python Code
         n_documents = 80000
         batch_size = 5000
         batch = []

         import time
         start = time.time()
  6. The Python Code
         import random
         from datetime import datetime

         min_date = datetime(2012, 1, 1)
         max_date = datetime(2013, 1, 1)
         delta = (max_date - min_date).total_seconds()
  7. The Python Code
     Callout: “What?!”
         for i in range(n_documents):
             date = datetime.fromtimestamp(
                 time.mktime(min_date.timetuple())
                 + int(round(random.random() * delta)))

             value = random.random()
             document = {
                 'created_on': date,
                 'value': value}

             batch.append(document)
             if len(batch) == batch_size:
                 collection.insert(batch)
                 batch = []
  8. The Python Code
         duration = time.time() - start

         print 'inserted %d documents per second' % (
             n_documents / duration)
     Output: inserted 30,000 documents per second
  9. The Node.js Code (not shown)
  10. The Question
      Why is the Python script 3 times slower than the equivalent Node script?
  11. Why Profile?
      • Optimization is like debugging
      • Hypothesis: “The following change will yield a worthwhile improvement.”
      • Experiment
      • Repeat until fast enough
  12. Why Profile?
      Profiling is a way to generate hypotheses.
  13. Which Profiler?
      • cProfile
      • GreenletProfiler
      • Yappi
  14. Yappi
      By Sümer Cip
  15. Yappi
      Compared to cProfile, it:
      • Is as fast
      • Also measures functions
      • Can measure CPU time, not just wall-clock time
      • Can measure all threads
      • Can export to callgrind
  16. Yappi
          import yappi

          yappi.set_clock_type('cpu')
          yappi.start(builtins=True)

          start = time.time()

          for i in range(n_documents):
              # ... same code ...

          duration = time.time() - start
          stats = yappi.get_func_stats()
          stats.save('callgrind.out', type='callgrind')
      Callout: Same code as before
  17. KCacheGrind
  18. The Python Code
      Callout: one third of the time
          for index in range(n_documents):
              date = datetime.fromtimestamp(
                  time.mktime(min_date.timetuple())
                  + int(round(random.random() * delta)))

              value = random.random()
              document = {
                  'created_on': date,
                  'value': value}

              batch.append(document)
              if len(batch) == batch_size:
                  collection.insert(batch)
                  batch = []
  19. The Python Code
          for index in range(n_documents):
              date = datetime.now()

              value = random.random()
              document = {
                  'created_on': date,
                  'value': value}

              batch.append(document)
              if len(batch) == batch_size:
                  collection.insert(batch)
                  batch = []
  20. The Python Code
      • Before: 30,000 inserts per second
      • After: 50,000 inserts per second
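
The change between slides 18 and 19 can be checked in isolation, without MongoDB. The following is a minimal sketch that times the two date expressions with the standard-library `timeit` module. The function names `random_date`, `current_date`, and `seconds_per_call` are hypothetical and do not appear in the slides. Note that the two expressions are not equivalent: slide 19 gives up random dates in exchange for speed.

```python
import random
import time
import timeit
from datetime import datetime

min_date = datetime(2012, 1, 1)
max_date = datetime(2013, 1, 1)
delta = (max_date - min_date).total_seconds()


def random_date():
    """Date computation from slide 18: a random moment in 2012."""
    return datetime.fromtimestamp(
        time.mktime(min_date.timetuple())
        + int(round(random.random() * delta)))


def current_date():
    """Replacement from slide 19: just the current time."""
    return datetime.now()


def seconds_per_call(func, number=20000):
    """Average seconds per call of func, measured with timeit."""
    return timeit.timeit(func, number=number) / number


if __name__ == '__main__':
    slow = seconds_per_call(random_date)
    fast = seconds_per_call(current_date)
    print('random_date:  %.2f microseconds per call' % (slow * 1e6))
    print('current_date: %.2f microseconds per call' % (fast * 1e6))
```

The absolute numbers depend on the machine and Python version, so this only measures the cost of each expression alone, not the throughput of the whole insert loop.
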
  21. Why Profile?
      • Generate hypotheses
      • Estimate possible improvement
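
As a rough worked example of estimating an improvement: the profile attributed about one third of the run time to the date computation (slide 18). Assuming the rest of the loop costs the same as before, removing that third entirely predicts about 45,000 inserts per second, which is in the same range as the 50,000 reported on slide 20. The helper name `estimated_rate` is hypothetical.

```python
def estimated_rate(current_rate, fraction_removed):
    """Throughput expected if a fraction of the run time is eliminated.

    Assumes the remaining work takes exactly as long as it did before.
    """
    if not 0 <= fraction_removed < 1:
        raise ValueError('fraction_removed must be in [0, 1)')
    return current_rate / (1.0 - fraction_removed)


if __name__ == '__main__':
    # 30,000 inserts per second, with one third of the time removed.
    print('%.0f inserts per second' % estimated_rate(30000, 1.0 / 3))
```
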
  22. How Does Profiling Work?
          int callback(PyFrameObject *frame,
                       int what,
                       PyObject *arg);

          int start(void)
          {
              PyEval_SetProfile(callback);
          }
  23.     PyObject *
          PyEval_EvalFrameEx(PyFrameObject *frame)
          {
              if (tstate->c_profilefunc != NULL) {
                  tstate->c_profilefunc(frame,
                                        PyTrace_CALL,
                                        Py_None);
              }

              /* ... execute bytecode in the frame
               * until return or exception... */

              if (tstate->c_profilefunc != NULL) {
                  tstate->c_profilefunc(frame,
                                        PyTrace_RETURN,
                                        retval);
              }
          }
  24.     int callback(PyFrameObject *frame,
                       int what,
                       PyObject *arg)
          {
              switch (what) {
              case PyTrace_CALL:
                  {
                      PyCodeObject *cobj = frame->f_code;
                      PyObject *filename = cobj->co_filename;
                      PyObject *funcname = cobj->co_name;

                      /* ... record the function call ... */
                  }
                  break;

              /* ... other cases ... */

              }
          }
  25. A. Jesse Jiryu Davis
      @jessejiryudavis
      MongoDB
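
The C hook on slides 22 to 24 has a pure-Python counterpart, `sys.setprofile`, which is convenient for experimenting with the same idea. The following is a minimal sketch, not how Yappi itself is implemented: it installs a callback, records the filename and function name on each Python-level call as the callback on slide 24 does, and counts the calls. The names `call_counts`, `profile_calls`, and `fib` are hypothetical.

```python
import sys
from collections import Counter

# Maps (filename, function name) to the number of calls observed.
call_counts = Counter()


def callback(frame, event, arg):
    """Python-level counterpart of the C callback on slide 24."""
    if event == 'call':
        code = frame.f_code
        call_counts[(code.co_filename, code.co_name)] += 1
    # Other events ('return', 'c_call', ...) are ignored in this sketch.


def profile_calls(func, *args, **kwargs):
    """Run func with the profile hook installed and return its result."""
    sys.setprofile(callback)
    try:
        return func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook


def fib(n):
    """Deliberately naive recursion, to generate many calls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if __name__ == '__main__':
    profile_calls(fib, 5)
    for (filename, funcname), count in call_counts.most_common():
        print('%s:%s called %d times' % (filename, funcname, count))
```

A callback written in Python runs on every call and return, so it slows the profiled program far more than a C callback does.
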
