Quick tour of PHP from inside
 

    Presentation Transcript

    • { Diving into PHP's heart
    • { Hey!
      ● Julien PAULI
        French guy, author at Eyrolles
        Architect at Comuto (http://www.blablacar.com)
        PHP contributor/dev (not so much :( )
        Studying PHP internals
        Some pictures on these slides haven't been translated (French)
    • { Before we start
      ● We'll talk about PHP and only PHP
      ● No DB, no network
      ● PHP is not the main bottleneck
      ● PHP's performance is good
        As soon as you understand what you write
    • { What we're gonna talk about
      ● PHP
        Interpreted language (what's that?)
        Zend Engine
        Lexer, parser, compiler, executor, memory manager...
      ● Performances
        Find the bottleneck
        Parsing, I/Os, syscalls, error reporting...
        Zend Memory Manager / Garbage Collector
      ● "Dos and don'ts"
    • { Cool challenge
    • { Let's go
    • { PHP
    • { ● PHP kernel: Zend Engine
        ~65,000 LOC
        10% of total LOC (counting extensions)
        Zend Engine License
      ● ZendE VM
      ● ZendE Core
      ● ZendE Tools
      ● Thread-safe TSRM layer
    • { Heart: main and ext/standard
      ● 62,384 LOC
        str_*, array_*, files and streams...
    • { ● Extensions: ext/xxx
        529,778 LOC for ext/
      ● "Extensions" and "Zend extensions"
        Statically or dynamically loaded
        Add features
        Consume resources (memory)
      ● php -m ; php --re
      ● Mandatory extensions (5.3): core / date / ereg / pcre / reflection / SPL / standard
      ● Other extensions: http://pecl.php.net
    • { PHP
      ● Computer program
      ● Written in C, about 800,000 lines in total
      ● Goal: define a higher-level, interpreted language
        Interpreted language is a programming language in which programs are indirectly executed ("interpreted") by an interpreter program. This can be contrasted with a compiled language which is converted into machine code and then directly executed by the host CPU. Theoretically, any language may be compiled or interpreted, so this designation is applied purely because of common implementation practice and not some essential property of a language. Indeed, for some programming languages, there is little performance difference between an interpretive- or compiled-based approach to their implementation. [Wikipedia]
      ● Interpreted language: less efficient than a compiled language, but much easier to handle
    • { PHP from inside
      ● Virtual machine
        Compiler/Executor
        Intermediate OPCode
        Mono thread, mono process
      ● Automatic memory handling
        Memory Manager
        Garbage collector
    • { Steps
      ● Startup (mem alloc)
      ● Compilation
        Lexing and parsing
        Compiling (OPCode generation)
      ● Execution
        OPCode interpretation
        Several VM exec modes
      ● Shutdown (mem freeing)
        "Share nothing architecture"
      (Diagram: Startup, zend_compile_file(), zend_execute(), Shutdown)
    • { Lexing
      ● Recognize characters
      ● Turn characters into tokens
        Lexer generator: re2c
        Was Flex before 5.3
        http://re2c.org/
      ● http://www.php.net/tokens
        highlight_file()
        highlight_string()
        compile_file()
        compile_string()
    • { ● zend_language_scanner.l
        int lex_scan(zval *zendlval TSRMLS_DC)
        /*!re2c
        HNUM "0x"[0-9a-fA-F]+
        LABEL [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*
        TABS_AND_SPACES [ \t]*
        NEWLINE ("\r"|"\n"|"\r\n")
        <ST_IN_SCRIPTING>"("{TABS_AND_SPACES}("int"|"integer"){TABS_AND_SPACES}")" {
            return T_INT_CAST;
        }
      ● re2c is used elsewhere in PHP:
        PDO (PS emulation)
        Dates: strtotime()
        serialize()/unserialize()
      ● Generate zend_language_scanner.c:
        $(RE2C) $(RE2C_FLAGS) --case-inverted -cbdFt $(srcdir)/zend_language_scanner_defs.h -o $(srcdir)/zend_language_scanner.l
    • { Hands on the lexer
      ● You can access the lexer from PHP land:
        https://github.com/sebastianbergmann/phptok
        https://github.com/nikic/PHP-Parser
        ext/tokenizer
      Source:
        <?php
        function display_data(array $data) {
            $buf = '';
            foreach ($data as $k=>$v) {
                $buf .= sprintf("%s: %s \n", $k, $v);
            }
            return $buf;
        }
      Tokens:
        Line  Token          Text
        ---------------------------------
        1     OPEN_TAG       <?php
        2     WHITESPACE
        3     FUNCTION       function
        3     WHITESPACE
        3     STRING         display_data
        3     OPEN_BRACKET   (
        3     ARRAY          array
        3     WHITESPACE
        3     VARIABLE       $data
        3     CLOSE_BRACKET  )
        3     WHITESPACE
        4     OPEN_CURLY     {
        4     WHITESPACE
        …     …              ...
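The token listing above can be reproduced from userland with ext/tokenizer. The following is a minimal sketch, assuming the tokenizer extension is loaded (it is by default); `describe_tokens()` is a hypothetical helper name introduced only for this illustration.

```php
<?php
// Sketch: list the tokens the lexer produces for a small source string.
// describe_tokens() is a hypothetical helper, not part of PHP itself.
function describe_tokens(string $source): array
{
    $rows = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens are [token id, matched text, line number].
            [$id, $text, $line] = $token;
            $rows[] = [$line, token_name($id), $text];
        } else {
            // Single-character tokens such as "(" or ";" come back as plain strings.
            $rows[] = [null, 'CHAR', $token];
        }
    }
    return $rows;
}

$rows = describe_tokens('<?php print "foo";');
foreach ($rows as [$line, $name, $text]) {
    printf("%-4s %-28s %s\n", $line ?? '-', $name, trim($text));
}
```

Note that `token_name()` returns names with the `T_` prefix (for example `T_OPEN_TAG`), whereas the listing on the slide shows them without it.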
    • { Parsing
      ● "Understands" the tokens (rules)
        Defines the language syntax
      ● Generate the parser: GNU/Bison (LALR)
        (Lemon to replace it?)
      ● For each token
        → Launch a compiler function
        → Go to next token (state machine)
      ● Tied to the lexical analyzer
    • { ● zend_language_parser.y
        Used by PHP and ext/tokenizer; zendparse()
        unticked_statement:
            T_FOREACH '(' variable T_AS { zend_do_foreach_begin(&$1, &$2, &$3, &$4, 1 TSRMLS_CC); }
            foreach_variable foreach_optional_arg ')' { zend_do_foreach_cont(&$1, &$2, &$4, &$6, &$7 TSRMLS_CC); }
            foreach_statement { zend_do_foreach_end(&$1, &$4 TSRMLS_CC); }
        ;
        foreach_variable:
            variable { zend_check_writable_variable(&$1); $$ = $1; }
          | '&' variable { zend_check_writable_variable(&$2); $$ = $2; $$.u.EA.type |= ZEND_PARSED_REFERENCE_VARIABLE; }
        ;
        foreach_optional_arg:
            /* empty */ { $$.op_type = IS_UNUSED; }
          | T_DOUBLE_ARROW foreach_variable { $$ = $2; }
        ;
      ● Generate zend_language_parser.c:
        $(YACC) -p zend -v -d $(srcdir)/zend_language_parser.y -o zend_language_parser.c
    • { Wuups
    • { Compiler
      ● Invoked by the parser
      ● Generates an OPCode array
      ● OPCode = low-level VM instruction
        Looks like asm
        Example: ADD (a,b) → c ; CONCAT(c,d) → e ; etc...
      ● The compiling stage is very heavy
        Lots of checks
        Address resolution
    • { Let's see an example
        <?php print 'foo';
    • { Example
        <?php print 'foo';
      ● Lexing
        <ST_IN_SCRIPTING>"print" { return T_PRINT; }
      ● Parsing
        T_PRINT expr { zend_do_print(&$$, &$2 TSRMLS_CC); }
    • { Example
        T_PRINT expr { zend_do_print(&$$, &$2 TSRMLS_CC); }
      ● Compiling
        void zend_do_print(znode *result, const znode *arg TSRMLS_DC) /* {{{ */
        {
            zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
            opline->result.op_type = IS_TMP_VAR;
            opline->result.u.var = get_temporary_variable(CG(active_op_array));
            opline->opcode = ZEND_PRINT;
            opline->op1 = *arg;
            SET_UNUSED(opline->op2);
            *result = opline->result;
        }
    • { What is OPCode?
      ● ext/bytekit-cli
      ● ext/vld
      Source:
        function display_data(array $data) {
            $buf = '';
            foreach ($data as $k=>$v) {
                $buf .= sprintf("%s: %s \n", $k, $v);
            }
            return $buf;
        }
      Function: display_data
      Compiled variables: !0 = $data, !1 = $buf, !2 = $k, !3 = $v
        line  #   opcode            result   operands
        ------------------------------------------------------
        3     0   EXT_NOP
              1   RECV              !0       1, f(0)
        5     2   EXT_STMT
              3   ASSIGN                     !1, ''
        6     4   EXT_STMT
              5   FE_RESET          $1       !0, ->18
              6   FE_FETCH          $2, ~4   $1, ->18
              7   ASSIGN                     !3, $2
              8   ASSIGN                     !2, ~4
        7     9   EXT_STMT
              10  EXT_FCALL_BEGIN
              11  SEND_VAL                   '%s: %s \n', 1
              12  SEND_VAR                   !2, 2
              13  SEND_VAR                   !3, 3
              14  DO_FCALL          $6       'sprintf'
              15  EXT_FCALL_END
              16  ASSIGN_CONCAT              !1, $6
        8     17  JMP                        ->6
              18  SWITCH_FREE                $1
        9     19  EXT_STMT
              20  RETURN                     !1
        10    21  EXT_STMT
              22  RETURN                     null
    • { Execution
      ● Executes OPCode
        Hardest part in ZendEngine
        "The" Virtual Machine
      ● zend_vm_execute.h
      ● zend_vm_skel.h
      ● For each OPCode
        Call a handler
        Zend VM handlers
        Several dispatch modes available
      (Diagram: Startup, zend_compile_file(), zend_execute(), Shutdown)
    • { Example (continued)
        <?php print 'foo';  → ZEND_PRINT
        static int ZEND_FASTCALL ZEND_PRINT_SPEC_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
        {
            zend_op *opline = EX(opline);
            Z_LVAL(EX_T(opline->result.u.var).tmp_var) = 1;
            Z_TYPE(EX_T(opline->result.u.var).tmp_var) = IS_LONG;
            return ZEND_ECHO_SPEC_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS_PASSTHRU);
        }
        static int ZEND_FASTCALL ZEND_ECHO_SPEC_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
        {
            zend_op *opline = EX(opline);
            zval z_copy;
            zval *z = &opline->op1.u.constant;
            zend_print_variable(z); // Some kind of printf()
            ZEND_VM_NEXT_OPCODE();
        }
    • { Still here ?
    • { Performances
    • { PHP Performances
      ● Work on the subjects we've seen:
        Compile (lexing + parsing + compiling)
        Execute
      ● Work on global subjects:
        Memory manager, syscalls & I/O
        Function complexity (Big O)
        Everything that gets repeated (loops)
      ● Examples doing that:
        HipHop for PHP
        ext/bcompiler
        Zend Optimizer; APC …
    • { OPCode Cache
      ● OPCode = PHP syntax that just got compiled, ready to execute through the VM
      ● Compiling time versus exec time?
      ● OPCode caching stores the OPCode somewhere (shmem) so the script does not have to be parsed again next time
    • { Concrete example
        <?php
        function foo() {
            $data = file('/etc/fstab');
            sort($data);
            return $data;
        }
        for($i=0; $i<=$argv[1]; $i++) {
            $a = foo();
            $a[] = range(0, $i);
            $result[] = $a;
        }
        var_dump($result);
    • { Compile / Exec
        function foo() {
            $data = file('/etc/fstab');
            sort($data);
            return $data;
        }
        for($i=0; $i<=$argv[1]; $i++) {
            $a = foo();
            $a[] = range(0, $i);
            $result[] = $a;
        }
        var_dump($result);
    • { Preventing compilation
      ● Use an OPCode cache, and tune it!
        APC / XCache / eAccelerator / ZendOptimizer
      ● Compiling can take really long when there are many source code lines to parse
        Frameworks anyone? Autoload?
    • { Exec performances
      ● Find the slow parts: profiling
        Xdebug
        XHProf
        microtime()
      ● Then optimize your functions
      ● But what about PHP functions?
        PHP structures like loops, array/object accesses?
        file() seems slow? Same for PDO::__construct()?
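As a rough illustration of the microtime() approach named on the slide above, here is a minimal timing sketch. `time_it()` is a hypothetical helper introduced for this example, not a PHP function; a real profiler such as Xdebug or XHProf gives far more detail.

```php
<?php
// Sketch: crude wall-clock timing with microtime(true).
// time_it() is a hypothetical helper; it runs $fn $iterations times
// and returns the elapsed time in seconds.
function time_it(callable $fn, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$data = range(1, 100);
$elapsed = time_it(function () use ($data) {
    return array_sum($data);
}, 10000);

printf("array_sum x 10000: %.6f s\n", $elapsed);
```

The measurement includes the cost of the loop and of the closure call itself, so it is only useful for comparing alternatives measured the same way.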
    • { Low level analysis
      ● PHP functions are C code
        Valgrind & callgrind, gprof, zoom, ...
      ● C functions use syscalls (kernel services)
        strace / ltrace / time / iostat-iotop / perf
      ● C functions use memory accesses
        Memory accesses are slow (page faults)
        Valgrind [memcheck | massif | exp-dhat]
    • { Behind PHP's functions
      ● I/Os, syscalls, mem access / mem copy
        Buffer cache / realpath cache
        DNS lookups
        HTTP calls (DOM DTD, etc...)
        MySQL API calls
        FooBarBaz API calls
    • { ltrace: Oops!
        malloc(1024) = 0x02c3c050
        memset(0x02c3c050, '\000', 1024) = 0x02c3c050
        malloc(72) = 0x02c3c460
        malloc(240) = 0x02c3c4b0
        memcpy(0x02c3c4b0, "\001l^\034\377\177", 240) = 0x02c3c4b0
        tolower('f') = 'f'
        tolower('u') = 'u'
        tolower('n') = 'n'
        tolower('c') = 'c'
        tolower('_') = '_'
        tolower('n') = 'n'
        tolower('u') = 'u'
        tolower('m') = 'm'
        tolower('_') = '_'
        tolower('a') = 'a'
        tolower('r') = 'r'
        tolower('g') = 'g'
        tolower('s') = 's'
        memcpy(0x7f2805b800b0, "func_num_args", 14) = 0x7f2805b800b0
    • { Example: the cost of @
      ● @, for error suppression
      ● OPCodes:
        ZEND_BEGIN_SILENCE
        ZEND_END_SILENCE
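To make the point about @ concrete, the sketch below reads a missing array key twice: once with error suppression and once with an explicit check, so the error path is never entered. The `$config` array and its keys are made up for this illustration, and the `??` operator requires PHP 7 or later.

```php
<?php
// Hypothetical configuration array used only for this example.
$config = ['host' => 'localhost'];

// With @: the expression is wrapped in silence opcodes, and the engine
// still goes through its error handling for the missing key.
$portSilenced = @$config['port'];

// Without @: the null coalescing operator tests for the key first,
// so no error is generated in the first place.
$portChecked = $config['port'] ?? null;

var_dump($portSilenced === $portChecked);
```

Both reads give the same value; the second form simply avoids generating an error that then has to be hidden.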
    • { Strace: syscall traces
      ● strace -c
      ● time
      ● /proc/{pid}/status
      ● syscall = context switch
      ● Syscalls about I/O:
        iotop -p
        vmstat
        The kernel uses a buffer cache
        % time     seconds  usecs/call     calls    errors syscall
        ------ ----------- ----------- --------- --------- ----------------
          -nan    0.000000           0        31           read
          -nan    0.000000           0       201           write
          -nan    0.000000           0        64        36 open
          -nan    0.000000           0        31           close
          -nan    0.000000           0        20        15 stat
          -nan    0.000000           0        38           fstat
          -nan    0.000000           0        15           lstat
          -nan    0.000000           0         8         3 lseek
          -nan    0.000000           0        57           mmap
          -nan    0.000000           0        23           mprotect
          -nan    0.000000           0        16        15 access
          -nan    0.000000           0         2           socket
          -nan    0.000000           0         2         2 connect
          -nan    0.000000           0         1           execve
          -nan    0.000000           0         4           time
          -nan    0.000000           0         3         1 futex
          -nan    0.000000           0         1           set_tid_address
          …       …                  ...
          -nan    0.000000           0         1           set_robust_list
        ------ ----------- ----------- --------- --------- ----------------
        100.00    0.000000                   562        74 total
    • { "perf" tool
      ● Powerful and great tool
      ● A little bit complex to handle
        > perf stat php fooscript.php 10
        Performance counter stats for 'php ../fooscript.php 10':
            18,409965 task-clock              # 0,769 CPUs utilized
                1 150 context-switches        # 0,062 M/sec
                    0 CPU-migrations          # 0,000 M/sec
                2 683 page-faults             # 0,146 M/sec
           44 278 835 cycles                  # 2,405 GHz                     [80,35%]
           26 211 096 stalled-cycles-frontend # 59,20% frontend cycles idle   [80,27%]
           22 097 571 stalled-cycles-backend  # 49,91% backend cycles idle    [57,04%]
           47 509 944 instructions            # 1,07 insns per cycle
                                              # 0,55 stalled cycles per insn  [86,76%]
            8 437 590 branches                # 458,316 M/sec
              203 537 branch-misses           # 2,41% of all branches         [92,68%]
          0,023935557 seconds time elapsed
    • { ● The virtual machine is expensive
        C is way faster than PHP
        Don't code in PHP what PHP already does
        Code critical parts in C
        Perhaps an existing ext can do the job? PECL?
      ● Compile PHP yourself
        GCC, ICC, LLVM… know your compiler
        ACOVEA (Analysis of Compiler Options via Evolutionary Algorithm)
        http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
        > CFLAGS="-march=native -O4" ./configure && make
        Use an up-to-date PHP version
    • { Keep PHP up to date ● zend/micro-bench
    • { Memory consumption
    • { Zend Memory Manager
      ● ZendMM: dynamic allocator
        Handles the process heap (malloc() / mmap())
        Prevents MMU faulting (cost) with large allocations
        Used by the PHP source, but not everywhere
        Tunable
        zend_alloc.c/h
      (Diagram: heap fragments, large blocks, small blocks, block caching, zvals)
    • { Tuning ZendMM
      ● ZendMM allocates the heap in segments
        Segment size is configurable
        Default is 256 KB, which gives good results in most cases
      ● Each segment is divided into blocks holding the real data
    • { Evaluate the consumption
      ● memory_get_usage(): size of all blocks
      ● memory_get_usage(true): size of all segments
      ● Not fully accurate
        Not all code uses ZendMM
        /proc/{pid}/status
        pmap tool
        Take care of shared libs
        php> echo memory_get_usage();
        625272
        php> echo memory_get_usage(1);
        786432
        > cat /proc/13399/status
        Name:   php
        State:  S (sleeping)
        VmPeak: 154440 kB
        VmSize: 133700 kB
        VmLck:  0 kB
        VmPin:  0 kB
        VmHWM:  30432 kB
        VmRSS:  10304 kB
        VmData: 4316 kB
        VmStk:  136 kB
        VmExe:  9876 kB
        VmLib:  13408 kB
        VmPTE:  276 kB
        VmSwap: 0 kB
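The difference between the two memory_get_usage() modes can be observed with a few lines of PHP. This is a minimal sketch; the 1 MB string is an arbitrary size chosen for the example, and the exact numbers depend on the PHP version and allocator configuration (newer versions reserve memory in larger units than the 256 KB segments described in these slides).

```php
<?php
// Bytes currently handed out by the Zend allocator to PHP data.
$before = memory_get_usage();
// Bytes the allocator has reserved from the system.
$beforeReal = memory_get_usage(true);

// Allocate roughly 1 MB (arbitrary size for the illustration).
$big = str_repeat('x', 1024 * 1024);

$after = memory_get_usage();
$afterReal = memory_get_usage(true);

printf("used:     %d -> %d bytes\n", $before, $after);
printf("reserved: %d -> %d bytes\n", $beforeReal, $afterReal);
```

The reserved figure moves in coarse steps, which is why it can stay flat while the used figure grows.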
    • { Make PHP lighter
      ● Compile a mini-PHP
        > ./configure --disable-all
      ● Disable extensions you don't use
        Each ext has 2 startup hooks
        One triggered at PHP process startup (MINIT)
        One triggered for each request to PHP (RINIT)
        The MINIT hook allocates global memory
        Never freed (until the end of the process)
        If you don't use the ext, that's a complete waste
    • { Exts. allocate memory
      ● Example for ext/mbstring RINIT
        PHP_RINIT_FUNCTION(mbstring)
        {
            int n;
            enum mbfl_no_encoding *list=NULL, *entry;
            zend_function *func, *orig;
            const struct mb_overload_def *p;
            n = 0;
            if (MBSTRG(detect_order_list)) {
                list = MBSTRG(detect_order_list);
                n = MBSTRG(detect_order_list_size);
            }
            entry = (enum mbfl_no_encoding *)safe_emalloc(n, sizeof(int), 0);
            /* override original function. */
            if (MBSTRG(func_overload)){
                p = &(mb_ovld[0]);
                while (p->type > 0) {
                    if ((MBSTRG(func_overload) & p->type) == p->type &&
                        zend_hash_find(EG(function_table), p->save_func, strlen(p->save_func)+1, (void **)&orig) != SUCCESS) {
                        ...
    • { Memory in PHP
      ● Free your resources yourself
      ● Free your "big" variables
        More on that later
        Reference counting / copy on write
        Don't overuse references, at least until you really know what they are and how they work
      ● Compute a variable's memory consumption
        comuto_get_var_memory_usage()
        An attempt, not very accurate actually :-p
    • { PHP vars and memory
      ● Copy On Write
        $a = "foo";
        $b = $a;
        $c = $b;
        $b = "bar";
        unset($a);
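Copy on write can be observed indirectly through memory_get_usage(): assigning a large array to a second variable should cost almost nothing, while the first write to the copy triggers the real duplication. This is a sketch under that assumption; `cow_deltas()` is a hypothetical helper and the array size is arbitrary.

```php
<?php
// Sketch: measure memory growth at assignment time and at first write.
// cow_deltas() is a hypothetical helper introduced for this example.
function cow_deltas(): array
{
    $a = range(1, 100000);      // one large array
    $m0 = memory_get_usage();

    $b = $a;                    // assignment: both names share the same data
    $m1 = memory_get_usage();

    $b[] = 0;                   // first write: $b gets its own copy
    $m2 = memory_get_usage();

    return ['assign' => $m1 - $m0, 'write' => $m2 - $m1];
}

$deltas = cow_deltas();
printf("assignment: %d bytes, first write: %d bytes\n", $deltas['assign'], $deltas['write']);
```

The absolute figures vary between PHP versions; the point is the gap between the two deltas.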
    • { PHP vars and memory
      ● References
        Disable COW!
        $a = "string";
        $b = &$a;
        $c = $b;
    • { PHP vars and memory
      ● Function calls
        function foo($var) {
            $var = "bar";
            return $var;
        }
        $a = "foobaz";
        $b = foo($a);
    • { ● Garbage collector (ZendGC)
        GC handles PHP vars (zvals)
        Nothing to do with the Zend Memory Manager
      ● Frees vars that are not used any more but are still in memory
        Circular references, OO
      ● php.ini: zend.enable_gc = 1
      ● or gc_enable() / gc_disable()
      ● Activated by default
    • { Example of a mem leak
        class Foo { }
        class Bar { }
        $f = new Foo;
        $b = new Bar;
        $f->b = $b;
        $b->f = $f;
        unset($f);
        unset($b);
        echo gc_collect_cycles(); // 2
    • { ● How does the GC work?
        When the zval buffer gets full, gc_collect_cycles() is called
        It possibly frees some zvals (garbage)
        GC_ROOT_BUFFER_MAX_ENTRIES = 10000 by default
      ● The GC consumes resources
        Memory (320K) + CPU cycles
        At each zval manipulation (so: every time)
      ● http://www.php.net/gc
    • { Dos and don'ts
    • { ● References
        Don't use references everywhere thinking you are optimizing: you will fail
      ● Cool:
        $a['b']['c'] = array();
        for($i = 0; $i < 5; $i++) { $a['b']['c'][$i] = $i; }

        $ref =& $a['b']['c'];
        for($i = 0; $i < 5; $i++) { $ref[$i] = $i; }
      ● Not cool:
        function foo(&$data) {
            $len = strlen($data); /* Zval copy */
            /* … */
        }
    • { == versus ===
      ● == converts types, === does not
        is_identical_function() vs compare_function()
      ● Conversions are not always light:
        case TYPE_PAIR(IS_STRING, IS_STRING):
            zendi_smart_strcmp(result, op1, op2);
            return SUCCESS;

        ZEND_API void zendi_smart_strcmp(zval *result, zval *s1, zval *s2) /* {{{ */
        {
            int ret1, ret2;
            long lval1, lval2;
            double dval1, dval2;
            if ((ret1=is_numeric_string(Z_STRVAL_P(s1), Z_STRLEN_P(s1), &lval1, &dval1, 0)) &&
                (ret2=is_numeric_string(Z_STRVAL_P(s2), Z_STRLEN_P(s2), &lval2, &dval2, 0))) {
    • { Function call
      ● WTF?? Can you tell why we get this result?
        Script 1:
        const MAX_IT = 1000;
        $time = microtime(1);
        $str = "string";
        for($i=0; $i<=MAX_IT; $i++) {
            strlen($str) == 2;
        }
        echo microtime(1)-$time . "\n";
        Result: 0.00090789794921875

        Script 2:
        const MAX_IT = 1000;
        $time = microtime(1);
        $str = "string";
        for($i=0; $i<=MAX_IT; $i++) {
            isset($str[3]);
        }
        echo microtime(1)-$time . "\n";
        Result: 0.00034594535827637
    • { A function call in the engine
      ● OPCode ZEND_DO_FCALL
        Checks that we can make the call (excerpts):
        else if (UNEXPECTED(zend_hash_quick_find(EG(function_table), Z_STRVAL_P(fname), Z_STRLEN_P(fname)+1, Z_HASH_P(fname), (void **) &EX(function_state).function)==FAILURE)) {
            SAVE_OPLINE();
            zend_error_noreturn(E_ERROR, "Call to undefined function %s()", fname->value.str.val);
        }
        if (UNEXPECTED((fbc->common.fn_flags & (ZEND_ACC_ABSTRACT|ZEND_ACC_DEPRECATED)) != 0)) {
            if (UNEXPECTED((fbc->common.fn_flags & ZEND_ACC_ABSTRACT) != 0)) {
                zend_error_noreturn(E_ERROR, "Cannot call abstract method %s::%s()", fbc->common.scope->name, fbc->common.function_name);
            if (UNEXPECTED((fbc->common.fn_flags & ZEND_ACC_DEPRECATED) != 0)) {
                zend_error(E_DEPRECATED, "Function %s%s%s() is deprecated",
        if (fbc->common.scope && !(fbc->common.fn_flags & ZEND_ACC_STATIC) {
            zend_error(E_STRICT, "Non-static method %s::%s() should not be called statically", fbc->common.scope->name, fbc->common.function_name);
    • { A function call in the engine
      ● … to prepare the argument stack and the function context:
        zend_arg_types_stack_3_pop(&EG(arg_types_stack), &EX(called_scope), &EX(current_object), &EX(fbc));
        EX(function_state).arguments = zend_vm_stack_push_args(opline->extended_value TSRMLS_CC);
        EX(original_return_value) = EG(return_value_ptr_ptr);
        EG(active_symbol_table) = NULL;
        EG(active_op_array) = &fbc->op_array;
        EG(return_value_ptr_ptr) = NULL;
        if (RETURN_VALUE_USED(opline)) {
            temp_variable *ret = &EX_T(opline->result.var);
            ret->var.ptr = NULL;
            EG(return_value_ptr_ptr) = &ret->var.ptr;
            ret->var.ptr_ptr = &ret->var.ptr;
            ret->var.fcall_returned_reference = (fbc->common.fn_flags & ZEND_ACC_RETURN_REFERENCE) != 0;
        }
    • { ● isset() is not a function call
        isset has its own parser rule:
        internal_functions_in_yacc:
            T_ISSET '(' isset_variables ')' { $$ = $3; }
          | T_EMPTY '(' variable ')' { zend_do_isset_or_isempty(ZEND_ISEMPTY, &$$, &$3 TSRMLS_CC); }
      ● Leading to the VM handler zend_isset_isempty_dim_prop_obj_handler
        It does a simple comparison, which is light: 0 <= offset < str_length
        } else if ((*container)->type == IS_STRING && !prop_dim) { /* string offsets */
            if (Z_TYPE_P(offset) == IS_LONG) {
                if (opline->extended_value & ZEND_ISSET) {
                    if (offset->value.lval >= 0 && offset->value.lval < Z_STRLEN_PP(container)) {
                        result = 1;
                    }
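The string-offset check described above means that, for a non-negative offset, `isset($str[$n])` answers the same question as `strlen($str) > $n` without a function call. The sketch below shows the equivalence; both function names are hypothetical and exist only for this illustration. Newer PHP versions may handle strlen() more cheaply than the engine shown in these slides, so measure before relying on the speed difference.

```php
<?php
// Hypothetical helper: "is the string longer than $n bytes?" via strlen().
function longer_than_strlen(string $s, int $n): bool
{
    return strlen($s) > $n;
}

// Hypothetical helper: same question via isset() on a string offset.
// Only equivalent for $n >= 0, since negative offsets count from the end
// in recent PHP versions.
function longer_than_isset(string $s, int $n): bool
{
    return isset($s[$n]);
}

foreach ([0, 3, 5, 6, 10] as $n) {
    printf(
        "n=%d strlen:%s isset:%s\n",
        $n,
        var_export(longer_than_strlen("string", $n), true),
        var_export(longer_than_isset("string", $n), true)
    );
}
```

Which form to prefer is mostly a readability choice; the slide's point is only that the isset() form never goes through the function call machinery.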
    • { Conclusion
    • { Remember
      ● PHP is built on top of an OS
        What if PHP waits for a DB?
        What if PHP waits for the network?
        What if the OS has not been tuned?
      ● Mainly, if PHP waits, don't blame PHP, it's not its fault
        Examples of blocking syscalls:
        open(), accept(), close(), poll() …
        ioctl(), fcntl(), select(), read(), write(), send()...
    • { ● Don't micro-optimize the syntax
        " vs ' vs heredoc?
      ● Trace, analyze, find the bottleneck, know where to code
      ● Do not over-engineer, do not over-design
      ● Do not micro-optimize
      ● Optimize loops, you'll get great results
    • { ● Is PHP the right tool?
        Batch processing
      ● Multiple FS or DB accesses
      ● PHP can show weaknesses in some cases
      ● C
        Parallel programming (fork(), pthread_create())
        Very good memory control
        Low-level access: syscalls, embedded ASM
      ● Java
        Multi-OS
        Parallel programming
    • { Thanks!
        jpauli@php.net
        @julienpauli
        http://julien-pauli.developpez.com
        Technical PHP internals articles written in French
        "Good" French, translates very cleanly