Comparing += and join in Python
Transcript

  • 1. Comparing += and join in Python. Speaker: 高國棟
  • 2. Speaking history
      ● 2013/04: talk at taipei.py on the implementation of pdb. Slides: http://www.slideshare.net/ya790026/recoverpdb
      ● 2013/05: talk at pyconf.tw on reading the CPython source code. Slides: http://www.slideshare.net/ya790026/c-python23247730
      ● 2013/08: talk at taipei.py on how Python executes code. Slides: http://www.slideshare.net/ya790026/python27854881
  • 3. Experiment and observation
      ● https://gist.github.com/ya790206/7496787
      On Windows, Linux, and Mac, the results vary: sometimes join is faster and sometimes += is faster. Why?
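The experiment can be approximated with a small timing script. This is only a sketch and is not the code from the gist; the function names `concat_plus_equals`, `concat_join`, and `compare`, and the default sizes, are assumptions chosen for illustration. Which approach wins is expected to depend on platform, allocator, and interpreter version.

```python
import timeit


def concat_plus_equals(pieces):
    """Build one string by repeated in-place concatenation."""
    result = ""
    for piece in pieces:
        result += piece
    return result


def concat_join(pieces):
    """Build one string by collecting pieces in a list and joining once."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


def compare(n=20000, piece="x" * 10, repeat=5):
    """Return the best-of-`repeat` wall time in seconds for each approach."""
    pieces = [piece] * n
    t_plus = min(timeit.repeat(lambda: concat_plus_equals(pieces),
                               number=1, repeat=repeat))
    t_join = min(timeit.repeat(lambda: concat_join(pieces),
                               number=1, repeat=repeat))
    return t_plus, t_join


if __name__ == "__main__":
    t_plus, t_join = compare()
    print("+=   : %.4f s" % t_plus)
    print("join : %.4f s" % t_join)
```

Taking the minimum over several repeats reduces noise from other processes, but the numbers are still only indicative.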
  • 4. PEP 8: "For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations."
  • 5. What is fragile?
  • 6. +=
      ● The opcode for += is INPLACE_ADD
      ● The opcode for + is BINARY_ADD
      ● String addition is carried out by x = string_concatenate(v, w, f, next_instr);
      ● Source file: Python/ceval.c
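The opcodes can be inspected with the standard `dis` module. The sketch below assumes CPython; the helper `opnames` and the two sample functions are hypothetical names used only for illustration. On the CPython 2.7 source discussed in these slides, and up to CPython 3.10, the listing contains INPLACE_ADD and BINARY_ADD; from CPython 3.11 onward both are expressed through a single BINARY_OP instruction, so the exact names depend on the interpreter version.

```python
import dis


def concat_inplace(a, b):
    a += b          # compiled to INPLACE_ADD (BINARY_OP on newer CPython)
    return a


def concat_binary(a, b):
    return a + b    # compiled to BINARY_ADD (BINARY_OP on newer CPython)


def opnames(func):
    """Return the opcode names of a function's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


inplace_ops = opnames(concat_inplace)
binary_ops = opnames(concat_binary)

if __name__ == "__main__":
    print(inplace_ops)
    print(binary_ops)
```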
  • 7. string_concatenate
      if (v->ob_refcnt == 1 && !PyString_CHECK_INTERNED(v)) {
          /* v has a single owner and is not interned: grow it in place */
          if (_PyString_Resize(&v, new_len) != 0) {
              return NULL;
          }
          memcpy(PyString_AS_STRING(v) + v_len,
                 PyString_AS_STRING(w), w_len);
          return v;
      }
      else {
          /* otherwise fall back to building a new string */
          PyString_Concat(&v, w);
          return v;
      }
      https://github.com/ya790206/CPython/blob/master/Python/ceval.c#L4836
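The reference-count condition above is one reason the optimization is fragile: keeping any second reference to the string rules out the in-place path. The sketch below illustrates this under the assumption that the interpreter is a CPython build applying a comparable optimization to its string type; the function names are hypothetical. Both functions return the same string, and only their running time is expected to differ.

```python
import timeit


def build_unshared(n):
    """Append n characters while the accumulator has a single owner."""
    s = ""
    for _ in range(n):
        s += "x"        # no other reference: the in-place path is eligible
    return s


def build_shared(n):
    """Append n characters while a second name also refers to the string."""
    s = ""
    for _ in range(n):
        alias = s       # extra reference: the refcount check fails
        s += "x"        # so a new string is allocated and the old one copied
    return s


def time_both(n=50000):
    """Return (unshared_seconds, shared_seconds) for one run of each."""
    t_unshared = timeit.timeit(lambda: build_unshared(n), number=1)
    t_shared = timeit.timeit(lambda: build_shared(n), number=1)
    return t_unshared, t_shared


if __name__ == "__main__":
    print(time_both())
```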
  • 8. PyString_Concat
      void
      PyString_Concat(register PyObject **pv, register PyObject *w)
      {
          register PyObject *v;
          if (*pv == NULL)
              return;
          if (w == NULL || !PyString_Check(*pv)) {
              Py_CLEAR(*pv);
              return;
          }
          v = string_concat((PyStringObject *) *pv, w);
          Py_DECREF(*pv);
          *pv = v;
      }
      https://github.com/ya790206/CPython/blob/master/Objects/stringobject.c#L3856
  • 9. string_concat
      op = (PyStringObject *)PyObject_MALLOC(PyStringObject_SIZE + size);
      if (op == NULL)
          return PyErr_NoMemory();
      PyObject_INIT_VAR(op, &PyString_Type, size);
      op->ob_shash = -1;
      op->ob_sstate = SSTATE_NOT_INTERNED;
      /* copy both operands into the newly allocated string */
      Py_MEMCPY(op->ob_sval, a->ob_sval, Py_SIZE(a));
      Py_MEMCPY(op->ob_sval + Py_SIZE(a), b->ob_sval, Py_SIZE(b));
      op->ob_sval[size] = '\0';
      return (PyObject *) op;
      https://github.com/ya790206/CPython/blob/master/Objects/stringobject.c#L1014
  • 10. _PyString_Resize
      ● Defined in Objects/stringobject.c
      ● It calls PyObject_REALLOC
      ● Call chain: _PyString_Resize -> PyObject_REALLOC (Include/objimpl.h) -> PyObject_Realloc (Objects/obmalloc.c)
  • 11. PyObject_Realloc return realloc(p, nbytes); https://github.com/ya790206/CPython/blob/master/Objects/obmalloc.c#L1176
  • 12. realloc in uClibc
      if (new_size > size)
        /* Grow the block. */
        {
          size_t extra = new_size - size;

          __heap_lock (&__malloc_heap_lock);
          extra = __heap_alloc_at (&__malloc_heap, base_mem + size, extra);
          __heap_unlock (&__malloc_heap_lock);

          if (extra)
            /* Record the changed size. */
            MALLOC_SET_SIZE (base_mem, size + extra);
          else
            /* Our attempts to extend MEM in place failed, just
               allocate-and-copy. */
            {
              void *new_mem = malloc (new_size - MALLOC_HEADER_SIZE);
              if (new_mem)
                {
                  memcpy (new_mem, mem, size - MALLOC_HEADER_SIZE);
                  free (mem);
                }
              mem = new_mem;
            }
        }
      https://github.com/ya790206/ext_c_lib/blob/master/uClibc-0.9.33/libc/stdlib/malloc/realloc.c#L24
  • 13. realloc in glibc
      if (chunk_is_mmapped(oldp))
      {
        void* newmem;

      #if HAVE_MREMAP
        newp = mremap_chunk(oldp, nb);
        if(newp) return chunk2mem(newp);
      #endif
        /* Note the extra SIZE_SZ overhead. */
        if(oldsize - SIZE_SZ >= nb) return oldmem; /* do nothing */
        /* Must alloc, copy, free. */
        newmem = __libc_malloc(bytes);
        if (newmem == 0) return 0; /* propagate failure */
        MALLOC_COPY(newmem, oldmem, oldsize - 2*SIZE_SZ);
        munmap_chunk(oldp);
        return newmem;
      }
      https://github.com/ya790206/ext_c_lib/blob/master/glibc-2.18/malloc/malloc.c#L2908
  • 14. realloc in glibc
      /* Note the extra SIZE_SZ overhead. */
      if(oldsize - SIZE_SZ >= nb)
        newmem = oldmem; /* do nothing */
      else {
        /* Must alloc, copy, free. */
        if (top_check() >= 0)
          newmem = _int_malloc(&main_arena, bytes+1);
        if (newmem) {
          MALLOC_COPY(newmem, oldmem, oldsize - 2*SIZE_SZ);
          munmap_chunk(oldp);
        }
      }
      https://github.com/ya790206/ext_c_lib/blob/master/glibc-2.18/malloc/hooks.c#L290
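  • 15. Complexity analysis
      Q: What is the worst-case complexity of concatenating n strings of length m?
      A:
      ● With join, the complexity is O(nm).
      ● With +=, every step copies the whole result so far, so the cost satisfies f(n) = f(n-1) + nm, which gives f(n) = m(1 + 2 + ... + n) = O(n^2 m).
  • 16. Complexity analysis
      Q: What is the best-case complexity of concatenating n strings of length m?
      A:
      ● With join, the complexity is O(nm).
      ● With +=, the complexity is O(nm), because each step copies only the newly appended m bytes.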
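The two cases can be checked with a small cost model that counts copied bytes instead of measuring time. This is a simplified sketch: the function names and the `resize_in_place` flag are assumptions for illustration, and the model treats every realloc as either always succeeding in place (best case) or always moving the block (worst case).

```python
def bytes_copied_plus_equals(n, m, resize_in_place):
    """Count bytes copied when appending n pieces of length m with +=."""
    copied = 0
    length = 0
    for _ in range(n):
        if not resize_in_place:
            copied += length    # the existing contents move to a new block
        copied += m             # the new piece is always copied in
        length += m
    return copied


def bytes_copied_join(n, m):
    """join copies each piece once into a buffer of the final size."""
    return n * m


if __name__ == "__main__":
    n, m = 1000, 8
    print("worst +=:", bytes_copied_plus_equals(n, m, resize_in_place=False))
    print("best  +=:", bytes_copied_plus_equals(n, m, resize_in_place=True))
    print("join    :", bytes_copied_join(n, m))
```

In the worst case the count equals m * n * (n + 1) / 2, which matches the recurrence f(n) = f(n-1) + nm.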
  • 17. Why is join slower at first?
      1. Executing INPLACE_ADD on strings is much faster than CALL_FUNCTION (append).
      2. The Python list implementation resembles a C++ vector: when the list runs out of space, it is moved to a location large enough for the new size.
  • 18. list resize
      ● list_resize (Objects/listobject.c) -> PyMem_RESIZE (Include/pymem.h) -> PyMem_REALLOC (Include/pymem.h) -> realloc
  • 19. Why does join catch up later?
      ● A list requests space in progressively larger steps: 4, 8, 16, 25, 35, 46, 58, 72, 88, …
      ● A list stores only pointers, so each move copies only (current list size * pointer size) bytes.
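The growth steps can be observed from Python by watching `sys.getsizeof` change as items are appended. This sketch assumes CPython, where the reported size of a list includes its pointer array; the helper names `capacity` and `growth_steps` are hypothetical. The exact sequence differs between CPython versions, so it may not match the numbers listed above.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")   # size of a C pointer on this build
EMPTY_SIZE = sys.getsizeof([])        # list header without any item slots


def capacity(lst):
    """Estimate how many item slots are currently allocated for lst."""
    return (sys.getsizeof(lst) - EMPTY_SIZE) // POINTER_SIZE


def growth_steps(n_appends):
    """Return each new capacity observed while appending n_appends items."""
    lst = []
    steps = []
    last = capacity(lst)
    for _ in range(n_appends):
        lst.append(None)
        cap = capacity(lst)
        if cap != last:
            steps.append(cap)
            last = cap
    return steps


steps = growth_steps(100)

if __name__ == "__main__":
    print(steps)
```

Because the capacity grows in steps, far fewer than 100 reallocations are needed for 100 appends, and each one moves only pointers.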
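  • 20. Conclusion
      ● If realloc does not return a new memory address, += is efficient, because it saves one memcpy and because invoking string addition is faster than invoking list append.
      ● While the process's memory is not yet badly fragmented (memory fragmentation), realloc is less likely to return a new address.
  • 21. Conclusion
      ● join is preferable to +=, because its performance is predictable (with +=, the longer your process runs, the worse the performance may become).
      ● If you want to benefit from the += optimization, consider running the work in a newly created process. A new process has no memory fragmentation, but creating one has an extra cost.
      ● += and + perform the same.
  • 22. Announcement
      ● PyConf is recruiting venue staff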
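The suggestion to run += in a new process could look roughly like the sketch below. It is an illustration only: `CHILD_CODE` and `build_in_fresh_process` are hypothetical names, the child simply reports the length of the string it built, and the process start-up time is the extra cost mentioned above.

```python
import subprocess
import sys

# Code executed by the child interpreter; it starts with an unfragmented heap.
CHILD_CODE = """
import sys
n = int(sys.argv[1])
s = ''
for _ in range(n):
    s += 'x'
sys.stdout.write(str(len(s)))
"""


def build_in_fresh_process(n):
    """Run the += loop in a new interpreter and return the resulting length."""
    out = subprocess.check_output([sys.executable, "-c", CHILD_CODE, str(n)])
    return int(out.decode())


if __name__ == "__main__":
    print(build_in_fresh_process(100000))
```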
  • 23. Question
  • 24. Thank you