CPython Source Code Analysis (Slides)

This talk covers how the data structures behind the CPython runtime are implemented.

Published in: Technology

  1. CPython Source Code Analysis
     Speaker: 果凍
     http://goo.gl/3mq3Y
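  2. About the speaker
     ● Department of Computer Science and Engineering, National Chung Hsing University
     ● Works at The manx (曼克斯)
     ● Seven years of experience with Python
     ● Enjoys learning new programming languages
     ● C, C++, Java, Go
     ● About me: http://about.me/ya790206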
  3. Outline
     1. How C can emulate inheritance
     2. PyObject, the root object of Python
     3. PyType
     4. PyIntObject
     5. PyStringObject
     6. PyList
     https://github.com/ya790206/CPython
  4. ptr->data (diagram)
     object start address + offset of the data attribute from the object start = address of the data attribute
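The address arithmetic on this slide can be sketched in a few lines of C. This is a minimal illustration, not code from the talk: `struct demo_pair` and `demo_second_address` are hypothetical names, and the standard `offsetof` macro stands in for "the offset of the attribute from the object start".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example struct; not part of the slides. */
struct demo_pair {
    int32_t first;
    int32_t second;
};

/* Compute the address of `second` by hand: object start + field offset.
 * The result is the same address as the expression &p->second. */
static int32_t *demo_second_address(struct demo_pair *p)
{
    assert(p != NULL);
    char *start = (char *)p;                             /* object start address */
    size_t offset = offsetof(struct demo_pair, second);  /* distance from start  */
    return (int32_t *)(start + offset);                  /* address of the field */
}
```

Calling `demo_second_address(&p)` on any `struct demo_pair p` should return the same pointer as `&p.second`.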
  5. #include <stdio.h>
     #include <stdint.h>

     struct classA {
         int32_t data;
     };

     struct classB {
         int8_t data[4];
     };

     struct classC {
         int32_t data;
         int32_t data1;
     };
  6. int main()
     {
         struct classA obj;
         struct classA *pa = &obj;
         struct classB *pb = (struct classB *)&obj;
         struct classC *pc = (struct classC *)&obj;

         obj.data = 0;
         printf("%d, %d, %d, %d, %d, %d\n", pa->data, pb->data[0], pb->data[1],
                pb->data[2], pb->data[3], pc->data);
         obj.data = 1;
         printf("%d, %d, %d, %d, %d, %d\n", pa->data, pb->data[0], pb->data[1],
                pb->data[2], pb->data[3], pc->data);
         obj.data = 1 << 8;
         printf("%d, %d, %d, %d, %d, %d\n", pa->data, pb->data[0], pb->data[1],
                pb->data[2], pb->data[3], pc->data);
         /* Address of data[1] minus the offset of data[1] yields the object start. */
         printf("%p %p\n",
                (void *)((char *)&pb->data[1] - (size_t)&((struct classB *)0)->data[1]),
                (void *)pb);
         return 0;
     }
  7. Output:
     0, 0, 0, 0, 0, 0
     1, 1, 0, 0, 0, 1
     256, 0, 1, 0, 0, 256
     0x7fff613428a0 0x7fff613428a0
  8. Little-Endian (diagram): the same 32-bit obj viewed three ways
     classA: data
     classB: data[0], data[1], data[2], data[3]
     classC: data, data1
  9. Emulating inheritance in C

     #include <stdio.h>

     typedef void (*myfunc)();

     #define Father_HEADER myfunc init;

     struct father {
         Father_HEADER
     };

     struct child1 {
         Father_HEADER
         myfunc custom1;
     };

     struct child2 {
         Father_HEADER
         myfunc custom2;
     };
  10. void father_init()
      {
          printf("call father init\n");
      }

      void child1_init()
      {
          printf("call child1 init\n");
      }

      void child2_init()
      {
          printf("call child2 init\n");
      }

      /* Works for any struct that starts with Father_HEADER. */
      void call_init(struct father *obj)
      {
          obj->init();
      }
  11. int main()
      {
          struct father f_obj = {father_init};
          struct child1 c1_obj = {child1_init, 0};
          struct child2 c2_obj = {child2_init, 0};

          call_init(&f_obj);
          call_init((struct father *)&c1_obj);
          call_init((struct father *)&c2_obj);
          return 0;
      }
  12. Result:
      call father init
      call child1 init
      call child2 init
  13. object.h: the foundations of Python objects
      1. PyObject
         a. e.g. int
         b. PyObject_HEAD
      2. PyVarObject
         a. Adds an ob_size field
         b. e.g. string, list
         c. PyObject_VAR_HEAD
      ref: http://docs.python.org/2/c-api/structures.html
  14. #define PyObject_HEAD                   \
          _PyObject_HEAD_EXTRA                \
          Py_ssize_t ob_refcnt;               \
          struct _typeobject *ob_type;

      #define PyObject_VAR_HEAD               \
          PyObject_HEAD                       \
          Py_ssize_t ob_size; /* Number of items in variable part */
  15. typedef struct _object {
          PyObject_HEAD
      } PyObject;

      typedef struct {
          PyObject_VAR_HEAD
      } PyVarObject;
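The shared-header idea behind PyObject_HEAD and PyObject_VAR_HEAD can be sketched with stand-in macros. Everything prefixed `MOCK_` or `mock_` below is a hypothetical name chosen for illustration; the real headers also carry debug fields and a type pointer rather than a name string.

```c
#include <assert.h>
#include <stddef.h>

/* Mock versions of the header macros; names are illustrative only. */
#define MOCK_OBJECT_HEAD \
    long refcnt;         \
    const char *type_name;

#define MOCK_VAR_HEAD \
    MOCK_OBJECT_HEAD  \
    long size;

typedef struct { MOCK_OBJECT_HEAD } mock_object;
typedef struct { MOCK_VAR_HEAD } mock_var_object;

/* A "subclass" that starts with the variable-size header. */
typedef struct {
    MOCK_VAR_HEAD
    int items[4];
} mock_int_array;

/* Works on any struct that starts with MOCK_OBJECT_HEAD. */
static const char *mock_type_name(void *any)
{
    assert(any != NULL);
    return ((mock_object *)any)->type_name;
}

/* Works on any struct that starts with MOCK_VAR_HEAD. */
static long mock_size(void *any)
{
    assert(any != NULL);
    return ((mock_var_object *)any)->size;
}
```

Because every struct begins with the same fields in the same order, a pointer to `mock_int_array` can be read through a `mock_object *` or `mock_var_object *`, which is the same cast-based trick as the father/child example earlier in the deck.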
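  16. ob_refcnt:
      1. Reference counting

      #define Py_INCREF(op) (                         \
          _Py_INC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
          ((PyObject*)(op))->ob_refcnt++)

      ob_type:
      a. The object's type, which determines the operations the object supports
      b. For the PyType_Type object, this field points to itself
      c. For every other PyTypeObject, this field points to the PyType_Type object
      d. For all other objects, this field points to the PyTypeObject they belong to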
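Reference counting as described on this slide can be sketched as follows. This is a simplified model under stated assumptions: `mock_counted`, `mock_new`, `mock_incref`, `mock_decref` and the `mock_live_objects` counter are hypothetical names, and the real Py_DECREF dispatches to the type's deallocator instead of calling `free` directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; field names are illustrative. */
typedef struct {
    long refcnt;
    int payload;
} mock_counted;

static int mock_live_objects = 0;   /* how many objects are currently allocated */

static mock_counted *mock_new(int payload)
{
    mock_counted *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->refcnt = 1;                /* the creator owns the first reference */
    obj->payload = payload;
    mock_live_objects++;
    return obj;
}

static void mock_incref(mock_counted *obj)
{
    assert(obj != NULL);
    obj->refcnt++;
}

/* Drop one reference; free the object when the last one disappears. */
static void mock_decref(mock_counted *obj)
{
    assert(obj != NULL && obj->refcnt > 0);
    if (--obj->refcnt == 0) {
        mock_live_objects--;
        free(obj);
    }
}
```

An object created with `mock_new` stays alive until the number of `mock_decref` calls exceeds the number of `mock_incref` calls by one.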
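  17. ob_type chain (diagram): the ob_type of a PyIntObject points to PyInt_Type; the ob_type of PyInt_Type points to PyType_Type; the ob_type of PyType_Type points to PyType_Type itself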
  18. Objects built on PyObject
      1. PyClassObject
      2. PyInstanceObject
      3. PyMethodObject
      4. PyCodeObject
      5. PyComplexObject
      6. PyDictObject
      7. PyFileObject
      8. PyFunctionObject
      9. PyIntObject
      10. PySetObject
      and so on
  19. Objects built on PyVarObject
      1. PyByteArrayObject
      2. PyFrameObject
      3. PyListObject
      4. PyStringObject
      5. PyTupleObject
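  20. PyTypeObject
      1. Stores the methods that can be invoked on objects of the type.
      2. For example, PyInt_Type holds the methods supported by the int type. It provides tp_str, so str(5) works: calling str(5) invokes the corresponding C function, int_to_decimal_string.
      3. Because tp_call is 0 in PyInt_Type, an int object such as 5 cannot be called. tp_call corresponds to Python's __call__ method.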
  21. PyTypeObject

      #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)

      PyObject *v;
      Py_TYPE(v)->tp_free((PyObject *)v);
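The type-slot mechanism from the last few slides can be sketched with a small table of function pointers. This is a simplified model, not CPython's layout: `mock_type`, `mock_object`, `MOCK_TYPE` and `mock_is_callable` are hypothetical names, and only one slot (`tp_call`) is modelled.

```c
#include <assert.h>
#include <stddef.h>

struct mock_type;

/* Every object starts with a pointer to its type. */
typedef struct mock_object {
    struct mock_type *type;
} mock_object;

typedef int (*mock_call_slot)(mock_object *self);

/* A type is itself an object, followed by its slots. */
typedef struct mock_type {
    struct mock_type *type;     /* the type of this type object */
    const char *name;
    mock_call_slot tp_call;     /* NULL means instances are not callable */
} mock_type;

static int mock_function_call(mock_object *self) { (void)self; return 42; }

/* The root type points at itself, as the slides describe for PyType_Type. */
static mock_type mock_type_type = { &mock_type_type, "type", NULL };
static mock_type mock_int_type  = { &mock_type_type, "int", NULL };
static mock_type mock_func_type = { &mock_type_type, "function", mock_function_call };

#define MOCK_TYPE(ob) (((mock_object *)(ob))->type)

/* An object is callable only if its type fills in the tp_call slot. */
static int mock_is_callable(void *ob)
{
    assert(ob != NULL);
    return MOCK_TYPE(ob)->tp_call != NULL;
}
```

An object whose type is `mock_int_type` reports as not callable because the slot is empty, while one whose type is `mock_func_type` can be invoked through `MOCK_TYPE(ob)->tp_call(ob)`.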
  22. PyIntObject

      typedef struct {
          PyObject_HEAD
          long ob_ival;
      } PyIntObject;
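  23. PyIntObject
      1. PyInt_FromLong(long ival) is the function that creates an integer object.
      2. In the default CPython implementation, the integer objects from -5 to 256 are singletons.
      3. A free_list is used to reduce unnecessary memory allocation and deallocation.
      4. Each request to the Python memory system asks for enough space to hold N_INTOBJECTS integers.
      5. A chain of n additions creates n - 1 temporary objects, because int_add returns a PyObject.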
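The small-integer singletons in point 2 can be sketched as a preallocated table. This is an illustration under assumptions, not the code of PyInt_FromLong: all `mock_` and `MOCK_` names are hypothetical, and large values are allocated with plain `malloc` here rather than from a block-based free list.

```c
#include <assert.h>
#include <stdlib.h>

#define MOCK_SMALL_MIN (-5)
#define MOCK_SMALL_MAX 256

typedef struct {
    long refcnt;
    long value;
} mock_int;

/* One preallocated object per small value, indexed by value - MOCK_SMALL_MIN. */
static mock_int mock_small_ints[MOCK_SMALL_MAX - MOCK_SMALL_MIN + 1];
static int mock_small_ready = 0;

static void mock_init_small_ints(void)
{
    for (long v = MOCK_SMALL_MIN; v <= MOCK_SMALL_MAX; v++) {
        mock_small_ints[v - MOCK_SMALL_MIN].refcnt = 1;
        mock_small_ints[v - MOCK_SMALL_MIN].value = v;
    }
    mock_small_ready = 1;
}

/* Return the shared object for small values, a fresh one otherwise. */
static mock_int *mock_int_from_long(long value)
{
    if (!mock_small_ready)
        mock_init_small_ints();
    if (value >= MOCK_SMALL_MIN && value <= MOCK_SMALL_MAX) {
        mock_int *shared = &mock_small_ints[value - MOCK_SMALL_MIN];
        assert(shared->value == value);
        shared->refcnt++;               /* one more user of the singleton */
        return shared;
    }
    mock_int *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return NULL;
    fresh->refcnt = 1;
    fresh->value = value;
    return fresh;
}
```

Two requests for 100 return the same pointer, while two requests for 1000 return two distinct objects that the caller must release.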
  24. How fill_free_list works (diagram): a PyIntBlock contains `_intblock *next` and `PyIntObject objects[N_INTOBJECTS];`; each PyIntObject slot begins with ob_refcnt and ob_type; free_list points into the block
  25. Slots:     1 2 3 4 5 6 7 8
      free list: 6 -> 7 -> 8
      After the object in the third slot is deleted:
      Slots:     1 2 3 4 5 6 7 8
      free list: 3 -> 6 -> 7 -> 8
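The free-list behaviour in the two diagrams above can be sketched with a fixed block of eight slots. This is a simplified model: the `mock_` names are hypothetical, the link is stored in a dedicated `next_free` field, and an exhausted block simply returns NULL instead of requesting a new block.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_SLOTS 8

typedef struct mock_slot {
    struct mock_slot *next_free;   /* link used only while the slot is unused */
    long value;
} mock_slot;

static mock_slot mock_block[MOCK_SLOTS];
static mock_slot *mock_free_list = NULL;

/* Thread every slot of the block onto the free list, in order. */
static void mock_fill_free_list(void)
{
    for (int i = 0; i < MOCK_SLOTS - 1; i++)
        mock_block[i].next_free = &mock_block[i + 1];
    mock_block[MOCK_SLOTS - 1].next_free = NULL;
    mock_free_list = &mock_block[0];
}

/* Pop the first unused slot; returns NULL when the block is exhausted. */
static mock_slot *mock_alloc(long value)
{
    mock_slot *slot = mock_free_list;
    if (slot == NULL)
        return NULL;
    mock_free_list = slot->next_free;
    slot->value = value;
    return slot;
}

/* Push a released slot back onto the front of the free list. */
static void mock_release(mock_slot *slot)
{
    assert(slot != NULL);
    slot->next_free = mock_free_list;
    mock_free_list = slot;
}
```

After five allocations the free list starts at the sixth slot; releasing the third slot puts it at the head, so the next allocation reuses it without touching the system allocator.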
  26. PyString

      typedef struct {
          PyObject_VAR_HEAD
          long ob_shash;
          int ob_sstate;
          char ob_sval[1];
      } PyStringObject;
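  27. Layout (diagram): HEADER | ob_shash | ob_sstate | ob_sval[1]
      Empty string: ob_sval holds 0
      "abc": ob_sval holds a b c 0
      1. The allocated size is PyStringObject_SIZE + size.
      2. The remaining characters are reached as ob_sval[1], ob_sval[2], ob_sval[3]; C does not check whether an index goes past the declared array size.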
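The over-allocation trick on this slide can be sketched as follows. This is an illustration with hypothetical names (`mock_string`, `mock_string_new`, `mock_string_chars`), and it assumes the usual compiler behaviour for a trailing one-element array; C99 also offers a flexible array member (`char sval[];`) for the same purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t size;      /* number of characters, excluding the terminator */
    char sval[1];     /* declared with one byte; more are allocated behind it */
} mock_string;

/* Start of the character data inside the allocation. */
static char *mock_string_chars(mock_string *s)
{
    assert(s != NULL);
    return (char *)s + offsetof(mock_string, sval);
}

/* Allocate header and characters in one block: header + size + 1 zero byte. */
static mock_string *mock_string_new(const char *text)
{
    assert(text != NULL);
    size_t size = strlen(text);
    mock_string *s = malloc(offsetof(mock_string, sval) + size + 1);
    if (s == NULL)
        return NULL;
    s->size = size;
    char *dst = mock_string_chars(s);
    memcpy(dst, text, size);
    dst[size] = 0;
    return s;
}
```

`mock_string_new("abc")` allocates room for the header, three characters and the terminator in a single `malloc` call, so the header and the text live in one contiguous block.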
  28. PyString
      1. Certain strings are interned to increase object reuse, although this does not improve efficiency much.
      2. In CPython, one-byte strings and the empty string are singleton objects.
  29. PyObject *
      PyString_InternFromString(const char *cp)
      {
          PyObject *s = PyString_FromString(cp);
          if (s == NULL)
              return NULL;
          PyString_InternInPlace(&s);
          return s;
      }
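  30. PyString_InternInPlace(PyObject **p)
      1. Check the argument; return if it is NULL or not a string.
      2. Return if it is an instance of a string subclass.
      3. Return if the string is already interned.
      4. If the intern dict already holds an equal string, decrement the original string's reference count by 1 and hand back the interned string through the pointer argument.
      5. If the string is not in the intern dict, insert it.
      6. Decrement its reference count by 2.
      7. Set the string's state to SSTATE_INTERNED_MORTAL.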
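  31. Why Py_REFCNT(s) -= 2?
      1. Being stored as a key of the intern dict adds one reference.
      2. Being stored as a value of the intern dict adds another.
      3. Both references are used only by the intern dict. Without subtracting 2, the string would never be destroyed, because the dict's key and value would always point to it.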
  32. What calls PyString_InternInPlace?
      1. Strings whose length is less than or equal to 1
      2. Calls to PyString_InternFromString
      3. The user calling intern (builtin_intern on the C side)
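The lookup-or-insert core of interning (steps 4 and 5 of PyString_InternInPlace) can be sketched with plain C strings. This is a deliberately reduced model: `mock_intern` and the fixed-size linear table are hypothetical stand-ins for the intern dict, and reference counts and the mortal/immortal state are not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MOCK_INTERN_MAX 64

/* A tiny stand-in for the intern dict: a linear table of owned copies. */
static char *mock_interned[MOCK_INTERN_MAX];
static size_t mock_interned_count = 0;

/* Return the canonical copy of `text`, adding one if none exists yet.
 * Returns NULL if the table is full or memory runs out. */
static const char *mock_intern(const char *text)
{
    assert(text != NULL);
    for (size_t i = 0; i < mock_interned_count; i++) {
        if (strcmp(mock_interned[i], text) == 0)
            return mock_interned[i];        /* an equal string is already interned */
    }
    if (mock_interned_count == MOCK_INTERN_MAX)
        return NULL;
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    mock_interned[mock_interned_count++] = copy;
    return copy;
}
```

Interning two separate buffers that both contain "spam" yields the same pointer, so later equality checks between interned strings can compare addresses instead of contents.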
  33. string_concat

      op = (PyStringObject *)PyObject_MALLOC(PyStringObject_SIZE + size);
      PyObject_INIT_VAR(op, &PyString_Type, size);
      op->ob_shash = -1;
      op->ob_sstate = SSTATE_NOT_INTERNED;
      Py_MEMCPY(op->ob_sval, a->ob_sval, Py_SIZE(a));
      Py_MEMCPY(op->ob_sval + Py_SIZE(a), b->ob_sval, Py_SIZE(b));
      op->ob_sval[size] = 0;
      return (PyObject *) op;

      1. Every string concatenation returns a new object.
      2. Every concatenation obtains its memory with PyObject_MALLOC.
  34. Python memory management (diagram): PyIntBlock, PyIntObject, PyString
  35. string_join

      /* Allocate result space. */
      res = PyString_FromStringAndSize((char*)NULL, sz);
      if (res == NULL) {
          Py_DECREF(seq);
          return NULL;
      }

      1. Memory is allocated only once.
  36. /* Catenate everything. */
      p = PyString_AS_STRING(res);
      for (i = 0; i < seqlen; ++i) {
          size_t n;
          item = PySequence_Fast_GET_ITEM(seq, i);
          n = PyString_GET_SIZE(item);
          Py_MEMCPY(p, PyString_AS_STRING(item), n);
          p += n;
          if (i < seqlen - 1) {
              Py_MEMCPY(p, sep, seplen);
              p += seplen;
          }
      }
      Py_DECREF(seq);
      return res;
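The single-allocation strategy of string_join can be sketched on plain C strings. This is a stand-alone illustration, not CPython code: `mock_join` is a hypothetical function that measures the result first, allocates once, and then copies, whereas repeated concatenation allocates a new result at every step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join `count` strings with `sep`, allocating the result exactly once.
 * The caller frees the returned buffer. Returns NULL on allocation failure. */
static char *mock_join(const char *sep, const char *const *items, size_t count)
{
    assert(sep != NULL && (items != NULL || count == 0));
    size_t seplen = strlen(sep);
    size_t total = 0;

    /* First pass: compute the final size. */
    for (size_t i = 0; i < count; i++) {
        total += strlen(items[i]);
        if (i + 1 < count)
            total += seplen;
    }

    char *res = malloc(total + 1);
    if (res == NULL)
        return NULL;

    /* Second pass: copy every item and separator into place. */
    char *p = res;
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(items[i]);
        memcpy(p, items[i], n);
        p += n;
        if (i + 1 < count) {
            memcpy(p, sep, seplen);
            p += seplen;
        }
    }
    *p = '\0';
    return res;
}
```

Joining the three strings "a", "b" and "c" with ", " performs one `malloc` call regardless of how many items there are.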
  37. PyListObject

      typedef struct {
          PyObject_VAR_HEAD
          PyObject **ob_item;
          Py_ssize_t allocated;
      } PyListObject;
  38. Why is ob_item a pointer to a pointer? (diagram: ob_item points to an array of PyObject * entries, and each entry points to a separate PyObject)
  39. PyList_New
      1. static PyListObject *free_list[PyList_MAXFREELIST];
      2. nbytes = size * sizeof(PyObject *);
      3. op = PyObject_GC_New(PyListObject, &PyList_Type);
      4. op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);

      Notes:
      1. op holds the list's bookkeeping information, such as ob_size and ob_refcnt.
      2. op->ob_item stores the addresses of the list's elements.
  40. list_dealloc
      1. Py_XDECREF(op->ob_item[i]);
      2. PyMem_FREE(op->ob_item);
      3. One of the following:
         a. free_list[numfree++] = op;
         b. Py_TYPE(op)->tp_free((PyObject *)op);
  41. app1 (Append)
      1. list_resize(self, n+1)
      2. Py_INCREF(v);
      3. PyList_SET_ITEM(self, n, v);
  42. list_resize
      1. new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
      2. new_allocated += newsize;
      3. PyMem_RESIZE(items, PyObject *, new_allocated);
      4. self->ob_item = items;
      5. Py_SIZE(self) = newsize;
      6. self->allocated = new_allocated;

      In practice, PyMem_RESIZE ends up calling realloc. A Python list behaves much like a C++ vector.
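The over-allocation formula in steps 1 and 2 can be tried out in isolation. The sketch below uses hypothetical helper names (`mock_new_allocated`, `mock_count_resizes`) and models only growth by repeated appends, assuming a resize happens whenever the new size exceeds the current capacity; shrinking is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Capacity chosen when a list must hold `newsize` items, following the
 * two formula lines quoted on the slide. */
static size_t mock_new_allocated(size_t newsize)
{
    size_t new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
    new_allocated += newsize;
    return new_allocated;
}

/* Count how many reallocations `n` successive appends trigger. */
static size_t mock_count_resizes(size_t n)
{
    size_t allocated = 0, resizes = 0;
    for (size_t size = 1; size <= n; size++) {
        if (size > allocated) {          /* out of room: grow with headroom */
            allocated = mock_new_allocated(size);
            resizes++;
        }
    }
    assert(allocated >= n);
    return resizes;
}
```

With this formula the capacity grows through 4, 8, 16, 25, 35 and so on, so 100 appends need only 10 reallocations, which is what makes append cheap on average.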
  43. #define PyList_GET_ITEM(op, i) (((PyListObject *)(op))->ob_item[i])
      #define PyList_SET_ITEM(op, i, v) (((PyListObject *)(op))->ob_item[i] = (v))
      #define PyList_GET_SIZE(op) Py_SIZE(op)
  44. ins1

      if (where > n)
          where = n;
      items = self->ob_item;
      for (i = n; --i >= where; )
          items[i+1] = items[i];
      Py_INCREF(v);
      items[where] = v;
      return 0;
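The shifting loop in ins1 can be sketched on a plain array of pointers. This is an illustration with a hypothetical `mock_insert` function: it assumes the caller has already made room for one more entry (the job of list_resize), uses an unsigned index, and leaves out reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Insert `v` at index `where` in `items`, which currently holds `n` entries
 * and has room for at least n + 1. Returns the new length. */
static size_t mock_insert(const char **items, size_t n, size_t where, const char *v)
{
    assert(items != NULL);
    if (where > n)
        where = n;                       /* past the end means append */
    for (size_t i = n; i > where; i--)   /* shift the tail right by one */
        items[i] = items[i - 1];
    items[where] = v;
    return n + 1;
}
```

Only pointers are moved, never the objects themselves, which is why inserting near the front costs time proportional to the number of entries after the insertion point.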
  45. References
      1. Python C API
      2. Extending and Embedding the Python Interpreter
      3. Python源码剖析
      4. Python source code
  46. Q & A
  47. Thank you, everyone.
      The manx: http://www.themanxgroup.tw/
      The manx production: http://lucky-lane.com/
