    PHP Compiler Internals: Presentation Transcript

    • (Do not be afraid of) PHP Compiler Internals. Sebastian Bergmann, August 23rd 2009
    • Sebastian Bergmann
        • Co-Founder and Principal Consultant with thePHP.cc
        • Creator of PHPUnit
        • Involved in the PHP project since 2000
    • Under PHP's Hood (this slide contains material by Sara Golemon)
        • Extensions (date, dom, gd, json, mysql, pcre, pdo, reflection, session, standard, …)
        • PHP Core: Request Management, File and Network Operations
        • Zend Engine: Compilation and Execution, Memory and Resource Allocation
        • Server API (SAPI) (mod_php, FastCGI, CLI, ...)
    • How PHP executes code
        • Lexical Analysis: Scan the source for sequences of characters and convert them to a sequence of tokens
        • Syntax Analysis: Parse a sequence of tokens to determine their grammatical structure
        • Bytecode Generation: Generate bytecode based on the information gathered by analyzing the source
        • Bytecode Execution
    • Lexical Analysis: Scan a sequence of characters
      Source:

          <?php
          if (TRUE) {
              print '*';
          }
          ?>

      Resulting token stream (token on the left, matched text on the right):

          T_OPEN_TAG                    <?php
          T_IF                          if
          T_WHITESPACE
          (                             (
          T_STRING                      TRUE
          )                             )
          T_WHITESPACE
          {                             {
          T_WHITESPACE
          T_PRINT                       print
          T_WHITESPACE
          T_CONSTANT_ENCAPSED_STRING    '*'
          ;                             ;
          T_WHITESPACE
          }                             }
          T_WHITESPACE
          T_CLOSE_TAG                   ?>
    • Lexical Analysis: Scanner Generators
        • You do not want to write a scanner by hand, at least when the code for the scanner should be efficient and maintainable
        • Tools such as flex or re2c generate the code for a scanner from a set of rules

          <ST_IN_SCRIPTING>"if" {
              return T_IF;
          }
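To make the scanning step concrete, here is a minimal hand-written sketch of what a generated scanner does for the small if example from the earlier slides. It is not PHP's scanner: `next_token`, `TOK_EOF`, and `TOK_CHAR` are hypothetical names, the numeric token values are made up, and string escapes are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds: the T_* names mirror PHP's, the values do not. */
typedef enum {
    TOK_EOF, TOK_CHAR, T_OPEN_TAG, T_CLOSE_TAG, T_IF, T_PRINT,
    T_STRING, T_WHITESPACE, T_CONSTANT_ENCAPSED_STRING
} token_kind;

/* Scan one token starting at *cursor, advance *cursor past it, and store
 * the token's length in *len. Single-character tokens such as '(' or ';'
 * are reported as TOK_CHAR. */
token_kind next_token(const char **cursor, size_t *len)
{
    assert(cursor != NULL && *cursor != NULL && len != NULL);

    const char *start = *cursor;
    const char *p = start;
    token_kind kind;

    if (*p == '\0') {
        *len = 0;
        return TOK_EOF;
    }
    if (strncmp(p, "<?php", 5) == 0) {
        p += 5;
        if (isspace((unsigned char)*p)) p++;  /* open tag includes one trailing whitespace character */
        kind = T_OPEN_TAG;
    } else if (strncmp(p, "?>", 2) == 0) {
        p += 2;
        kind = T_CLOSE_TAG;
    } else if (isspace((unsigned char)*p)) {
        while (isspace((unsigned char)*p)) p++;
        kind = T_WHITESPACE;
    } else if (*p == '\'') {
        p++;                                   /* opening quote */
        while (*p != '\0' && *p != '\'') p++;  /* no escape handling in this sketch */
        if (*p == '\'') p++;                   /* closing quote */
        kind = T_CONSTANT_ENCAPSED_STRING;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        size_t n = (size_t)(p - start);
        if (n == 2 && strncmp(start, "if", 2) == 0) kind = T_IF;
        else if (n == 5 && strncmp(start, "print", 5) == 0) kind = T_PRINT;
        else kind = T_STRING;                  /* e.g. TRUE */
    } else {
        p++;
        kind = TOK_CHAR;
    }

    *len = (size_t)(p - start);
    *cursor = p;
    return kind;
}
```

Calling `next_token` in a loop until it returns `TOK_EOF` walks the source once from left to right. Every new keyword needs another hand-written comparison here, which is the maintenance burden a rule file for flex or re2c avoids.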
    • Lexical Analysis: PHP Tokens
      T_ABSTRACT, T_AND_EQUAL, T_ARRAY, T_ARRAY_CAST, T_AS, T_BAD_CHARACTER, T_BOOLEAN_AND, T_BOOLEAN_OR, T_BOOL_CAST,
      T_BREAK, T_CASE, T_CATCH, T_CHARACTER, T_CLASS, T_CLASS_C, T_CLONE, T_CLOSE_TAG, T_COMMENT,
      T_CONCAT_EQUAL, T_CONST, T_CONSTANT_ENCAPSED_STRING, T_CONTINUE, T_CURLY_OPEN, T_DEC, T_DECLARE, T_DEFAULT, T_DIR,
      T_DIV_EQUAL, T_DNUMBER, T_DOC_COMMENT, T_DO, T_DOLLAR_OPEN_CURLY_BRACES, T_DOUBLE_ARROW, T_DOUBLE_CAST, T_DOUBLE_COLON, T_ECHO,
      T_ELSE, T_ELSEIF, T_EMPTY, T_ENCAPSED_AND_WHITESPACE, T_ENDDECLARE, T_ENDFOR, T_ENDFOREACH, T_ENDIF, T_ENDSWITCH,
      T_ENDWHILE, T_END_HEREDOC, T_EVAL, T_EXIT, T_EXTENDS, T_FILE, T_FINAL, T_FOR, T_FOREACH,
      T_FUNCTION, T_FUNC_C, T_GLOBAL, T_GOTO, T_HALT_COMPILER, T_IF, T_IMPLEMENTS, T_INC, T_INCLUDE,
      T_INCLUDE_ONCE, T_INLINE_HTML, T_INSTANCEOF, T_INT_CAST, T_INTERFACE, T_ISSET, T_IS_EQUAL, T_IS_GREATER_OR_EQUAL, T_IS_IDENTICAL,
      T_IS_NOT_EQUAL, T_IS_NOT_IDENTICAL, T_IS_SMALLER_OR_EQUAL, T_LINE, T_LIST, T_LNUMBER, T_LOGICAL_AND, T_LOGICAL_OR, T_LOGICAL_XOR,
      T_METHOD_C, T_MINUS_EQUAL, T_ML_COMMENT, T_MOD_EQUAL, T_MUL_EQUAL, T_NAMESPACE, T_NS_C, T_NEW, T_NUM_STRING,
      T_OBJECT_CAST, T_OBJECT_OPERATOR, T_OLD_FUNCTION, T_OPEN_TAG, T_OPEN_TAG_WITH_ECHO, T_OR_EQUAL, T_PAAMAYIM_NEKUDOTAYIM, T_PLUS_EQUAL, T_PRINT,
      T_PRIVATE, T_PUBLIC, T_PROTECTED, T_REQUIRE, T_REQUIRE_ONCE, T_RETURN, T_SL, T_SL_EQUAL, T_SR,
      T_SR_EQUAL, T_START_HEREDOC, T_STATIC, T_STRING, T_STRING_CAST, T_STRING_VARNAME, T_SWITCH, T_THROW, T_TRY,
      T_UNSET, T_UNSET_CAST, T_USE, T_VAR, T_VARIABLE, T_WHILE, T_WHITESPACE, T_XOR_EQUAL
    • Syntax Analysis: Parse a sequence of tokens
        • You do not want to write a parser by hand, at least when the code for the parser should be efficient and maintainable
        • Tools such as bison or lemon generate the code for a parser from a set of rules

          T_IF '(' expr ')' { ... } statement { ... } elseif_list else_single { ... }
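As a rough illustration of what the grammar rule above expresses, the following sketch checks a token sequence against a drastically reduced grammar using hand-written recursive descent. This is a different technique from the table-driven parser that bison generates for PHP, and every name in it (`ptoken`, `parser`, `match`, `parse_program`) is hypothetical. Whitespace tokens are assumed to have been dropped already, and elseif and else are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
typedef enum {
    P_END, P_IF, P_PRINT, P_STRING, P_CONSTANT_STRING,
    P_LPAREN, P_RPAREN, P_LBRACE, P_RBRACE, P_SEMICOLON
} ptoken;

typedef struct {
    const ptoken *tokens;  /* token stream terminated by P_END */
    size_t pos;            /* index of the next unread token */
} parser;

/* Consume the next token if it has the expected kind. */
static int match(parser *p, ptoken kind)
{
    if (p->tokens[p->pos] != kind) return 0;
    p->pos++;
    return 1;
}

/* expr: T_STRING | T_CONSTANT_ENCAPSED_STRING */
static int parse_expr(parser *p)
{
    return match(p, P_STRING) || match(p, P_CONSTANT_STRING);
}

/* statement: '{' statement* '}'
 *          | T_IF '(' expr ')' statement
 *          | T_PRINT expr ';'                */
static int parse_statement(parser *p)
{
    if (match(p, P_LBRACE)) {
        while (p->tokens[p->pos] != P_RBRACE) {
            if (!parse_statement(p)) return 0;  /* also stops at P_END */
        }
        return match(p, P_RBRACE);
    }
    if (match(p, P_IF)) {
        return match(p, P_LPAREN) && parse_expr(p)
            && match(p, P_RPAREN) && parse_statement(p);
    }
    if (match(p, P_PRINT)) {
        return parse_expr(p) && match(p, P_SEMICOLON);
    }
    return 0;
}

/* Return 1 if the whole token stream is exactly one valid statement. */
int parse_program(const ptoken *tokens)
{
    assert(tokens != NULL);
    parser p = { tokens, 0 };
    return parse_statement(&p) && p.tokens[p.pos] == P_END;
}
```

In the real grammar the `{ ... }` blocks between the symbols are actions that call into the compiler; this sketch only accepts or rejects the input.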
    • PHP Bytecode: Using bytekit-cli to disassemble bytecode

          1 <?php
          2 if (TRUE) {
          3     print '*';
          4 }
          5 ?>

          sb@thinkpad ~ % bytekit if.php
          bytekit-cli 1.0.0 by Sebastian Bergmann.

          Filename:          /home/sb/if.php
          Function:          main
          Number of oplines: 8

          line  #  opcode    result  operands
          -----------------------------------------------------------------------------
          2     0  EXT_STMT
                1  JMPZ              true, ->6
          3     2  EXT_STMT
                3  PRINT     ~0      '*'
                4  FREE              ~0
          4     5  JMP               ->6
          6     6  EXT_STMT
                7  RETURN            1
    • PHP Bytecode: Using bytekit-cli to visualize bytecode

          1 <?php
          2 if (TRUE) {
          3     print '*';
          4 }
          5 ?>

          sb@thinkpad ~ % bytekit --graph /tmp --format svg if.php
    • How if is compiled (Zend/zend_compile.c)
      zend_do_if_cond() is called when an if statement is compiled. Its arguments are znode values:

          typedef struct _znode {
              int op_type;
              union {
                  zval constant;
                  zend_uint var;
                  zend_uint opline_num;
                  zend_op_array *op_array;
                  zend_op *jmp_addr;
                  struct {
                      zend_uint var;
                      zend_uint type;
                  } EA;
              } u;
          } znode;

      Each opline in an oparray is a zend_op:

          struct _zend_op {
              opcode_handler_t handler;
              znode result;
              znode op1;
              znode op2;
              ulong extended_value;
              uint lineno;
              zend_uchar opcode;
          };

      The function, built up step by step:

          void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
          {
              /* Allocate a new opline in the current oparray */
              int if_cond_op_number = get_next_op_number(CG(active_op_array));
              zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

              /* Set the opcode of the new opline to JMPZ (jump if zero) */
              opline->opcode = ZEND_JMPZ;

              /* Set the first operand of the new opline to the if condition */
              opline->op1 = *cond;

              /* Perform bookkeeping tasks such as marking the second operand of
                 the new opline as unused or incrementing the backpatching counter
                 for the current oparray */
              closing_bracket_token->u.opline_num = if_cond_op_number;
              SET_UNUSED(opline->op2);
              INC_BPC(CG(active_op_array));
          }
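The reason the opline number is stored in `closing_bracket_token` is that the jump target is not yet known when JMPZ is emitted; it has to be filled in later (backpatched). The sketch below shows that idea with a toy op array. It is loosely modeled on the slide and is not the Zend Engine's code: all `toy_*` names, the `UNPATCHED` marker, and the fixed-size array are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_JMPZ, OP_JMP, OP_PRINT, OP_RETURN };  /* toy opcodes */
#define MAX_OPS 32
#define UNPATCHED (-1)

typedef struct {
    int opcode;
    int operand;  /* condition value or character to print */
    int target;   /* jump target, UNPATCHED until it is known */
} toy_op;

typedef struct {
    toy_op ops[MAX_OPS];
    int count;
} toy_op_array;

/* Append an op to the array and return its index. */
int toy_emit(toy_op_array *a, int opcode, int operand)
{
    assert(a != NULL && a->count < MAX_OPS);
    a->ops[a->count].opcode = opcode;
    a->ops[a->count].operand = operand;
    a->ops[a->count].target = UNPATCHED;
    return a->count++;
}

/* After the condition: emit JMPZ without a target and remember its index. */
int toy_if_cond(toy_op_array *a, int cond)
{
    return toy_emit(a, OP_JMPZ, cond);
}

/* After the body: emit the JMP that leaves the if statement, then
 * backpatch the JMPZ so a false condition skips the body. */
int toy_if_after_statement(toy_op_array *a, int jmpz_index)
{
    int jmp_index = toy_emit(a, OP_JMP, 0);
    a->ops[jmpz_index].target = a->count;
    return jmp_index;
}

/* At the end of the statement: backpatch the JMP to the next op. */
void toy_if_end(toy_op_array *a, int jmp_index)
{
    a->ops[jmp_index].target = a->count;
}
```

Emitting `toy_if_cond`, a PRINT, `toy_if_after_statement`, and `toy_if_end` in that order leaves both jumps pointing at the same op, which matches the shape of the disassembly shown earlier, where JMPZ and JMP both target opline 6.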
    • PHP Bytecode: PHP Opcodes
      NOP, ADD, SUB, MUL, DIV, MOD, SL, SR, CONCAT, BW_OR, BW_AND, BW_XOR, BW_NOT, BOOL_NOT, BOOL_XOR,
      IS_IDENTICAL, IS_NOT_IDENTICAL, IS_EQUAL, IS_NOT_EQUAL, IS_SMALLER, IS_SMALLER_OR_EQUAL, CAST, QM_ASSIGN,
      ASSIGN_ADD, ASSIGN_SUB, ASSIGN_MUL, ASSIGN_DIV, ASSIGN_MOD, ASSIGN_SL, ASSIGN_SR, ASSIGN_CONCAT,
      ASSIGN_BW_OR, ASSIGN_BW_AND, ASSIGN_BW_XOR, PRE_INC, PRE_DEC, POST_INC, POST_DEC, ASSIGN, ASSIGN_REF,
      ECHO, PRINT, JMPZ, JMPNZ, JMPZNZ, JMPZ_EX, JMPNZ_EX, CASE, SWITCH_FREE, BRK, BOOL,
      INIT_STRING, ADD_CHAR, ADD_STRING, ADD_VAR, BEGIN_SILENCE, END_SILENCE, INIT_FCALL_BY_NAME, DO_FCALL, DO_FCALL_BY_NAME,
      RETURN, RECV, RECV_INIT, SEND_VAL, SEND_VAR, SEND_REF, NEW, FREE, INIT_ARRAY, ADD_ARRAY_ELEMENT,
      INCLUDE_OR_EVAL, UNSET_VAR, UNSET_DIM, UNSET_OBJ, FE_RESET, FE_FETCH, EXIT,
      FETCH_R, FETCH_DIM_R, FETCH_OBJ_R, FETCH_W, FETCH_DIM_W, FETCH_OBJ_W, FETCH_RW, FETCH_DIM_RW, FETCH_OBJ_RW,
      FETCH_IS, FETCH_DIM_IS, FETCH_OBJ_IS, FETCH_FUNC_ARG, FETCH_DIM_FUNC_ARG, FETCH_OBJ_FUNC_ARG,
      FETCH_UNSET, FETCH_DIM_UNSET, FETCH_OBJ_UNSET, FETCH_DIM_TMP_VAR, FETCH_CONSTANT,
      EXT_STMT, EXT_FCALL_BEGIN, EXT_FCALL_END, EXT_NOP, TICKS, SEND_VAR_NO_REF, CATCH, THROW, FETCH_CLASS, CLONE,
      INIT_METHOD_CALL, INIT_STATIC_METHOD_CALL, ISSET_ISEMPTY_VAR, ISSET_ISEMPTY_DIM_OBJ,
      PRE_INC_OBJ, PRE_DEC_OBJ, POST_INC_OBJ, POST_DEC_OBJ, ASSIGN_OBJ, INSTANCEOF,
      DECLARE_CLASS, DECLARE_INHERITED_CLASS, DECLARE_FUNCTION, RAISE_ABSTRACT_ERROR, ADD_INTERFACE,
      VERIFY_ABSTRACT_CLASS, ASSIGN_DIM, ISSET_ISEMPTY_PROP_OBJ, HANDLE_EXCEPTION
    • Extending the PHP Compiler: Test First!

          --TEST--
          unless statement
          --FILE--
          <?php
          unless (FALSE) {
              print 'unless FALSE is TRUE, this is printed';
          }

          unless (TRUE) {
              print 'unless TRUE is TRUE, this is printed';
          }
          ?>
          --EXPECT--
          unless FALSE is TRUE, this is printed
    • Extending the PHP Compiler
        • Add token for unless to the scanner
        • Add rule for unless to the parser
        • Implement bytecode generation for unless in the compiler
        • Add token for unless to ext/tokenizer
    • Add unless scanner token (Zend/zend_language_parser.y)

          %token T_NAMESPACE
          %token T_NS_C
          %token T_DIR
          %token T_NS_SEPARATOR
          %token T_UNLESS
    • Add unless scanner token (Zend/zend_language_scanner.l)

          <ST_IN_SCRIPTING>"if" {
              return T_IF;
          }

          <ST_IN_SCRIPTING>"unless" {
              return T_UNLESS;
          }

          <ST_IN_SCRIPTING>"elseif" {
              return T_ELSEIF;
          }

          <ST_IN_SCRIPTING>"endif" {
              return T_ENDIF;
          }

          <ST_IN_SCRIPTING>"else" {
              return T_ELSE;
          }
    • Add unless parser rule (Zend/zend_language_parser.y)

          unticked_statement:
                  '{' inner_statement_list '}'
              |   T_IF '(' expr ')' {
                  .
                  .
              |   T_UNLESS '(' expr ')' { zend_do_unless_cond(&$3, &$4 TSRMLS_CC); }
                  statement { zend_do_if_after_statement(&$4, 1 TSRMLS_CC); }
                  { zend_do_if_end(TSRMLS_C); }
    • Add unless to the compiler (Zend/zend_compile.c)

          void zend_do_unless_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
          {
              int unless_cond_op_number = get_next_op_number(CG(active_op_array));
              zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

              opline->opcode = ZEND_JMPNZ;
              opline->op1 = *cond;
              closing_bracket_token->u.opline_num = unless_cond_op_number;
              SET_UNUSED(opline->op2);
              INC_BPC(CG(active_op_array));
          }

      All we have to do to generate code for the unless statement, compared to generating code for the if statement, is to emit JMPNZ (jump if not zero) instead of JMPZ (jump if zero)
    • Add unless to the compiler: The generated bytecode

          1 <?php
          2 unless (FALSE) {
          3     print '*';
          4 }
          5 ?>

          sb@thinkpad ~ % bytekit unless.php
          bytekit-cli 1.0.0 by Sebastian Bergmann.

          Filename:          /home/sb/unless.php
          Function:          main
          Number of oplines: 8

          line  #  opcode    result  operands
          -----------------------------------------------------------------------------
          2     0  EXT_STMT
                1  JMPNZ             true, ->6
          3     2  EXT_STMT
                3  PRINT     ~0      '*'
                4  FREE              ~0
          4     5  JMP               ->6
          6     6  EXT_STMT
                7  RETURN            1
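To see why swapping JMPZ for JMPNZ is enough to turn if into unless, it can help to execute both variants. The following toy interpreter is only a sketch under simplifying assumptions: the condition is stored directly in the jump op, output goes to a caller-supplied buffer, and all `vm_*` and `VM_*` names plus `run_conditional` are hypothetical. It is not how the Zend Engine executes oplines.

```c
#include <assert.h>
#include <stddef.h>

enum { VM_JMPZ, VM_JMPNZ, VM_JMP, VM_PRINT, VM_RETURN };  /* toy opcodes */

typedef struct {
    int opcode;
    int operand;  /* condition for VM_JMPZ/VM_JMPNZ, character for VM_PRINT */
    int target;   /* jump target for VM_JMPZ, VM_JMPNZ and VM_JMP */
} vm_op;

/* Execute ops until VM_RETURN (or an unknown opcode) is reached.
 * Printed characters are appended to out, which is NUL-terminated.
 * Returns the number of characters written. */
size_t vm_run(const vm_op *ops, char *out, size_t cap)
{
    size_t n = 0;
    int pc = 0;

    assert(ops != NULL && out != NULL && cap > 0);
    for (;;) {
        const vm_op *op = &ops[pc];
        switch (op->opcode) {
        case VM_JMPZ:   /* jump if zero */
            pc = (op->operand == 0) ? op->target : pc + 1;
            break;
        case VM_JMPNZ:  /* jump if not zero */
            pc = (op->operand != 0) ? op->target : pc + 1;
            break;
        case VM_JMP:
            pc = op->target;
            break;
        case VM_PRINT:
            if (n + 1 < cap) out[n++] = (char)op->operand;
            pc++;
            break;
        default:        /* VM_RETURN or unknown opcode: stop */
            out[n] = '\0';
            return n;
        }
    }
}

/* Build "<jump> cond, ->3; PRINT '*'; JMP ->3; RETURN" and run it. */
size_t run_conditional(int jump_opcode, int cond, char *out, size_t cap)
{
    vm_op program[4] = {
        { jump_opcode, cond, 3 },
        { VM_PRINT, '*', 0 },
        { VM_JMP, 0, 3 },
        { VM_RETURN, 0, 0 },
    };
    return vm_run(program, out, cap);
}
```

With `VM_JMPZ` the body runs only for a non-zero condition (if); with `VM_JMPNZ` it runs only for a zero condition (unless). The rest of the program is identical in both cases.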
    • Running the test

          sb@thinkpad php-5.3-unless % make test TESTS=Zend/tests/unless.phpt

          Build complete.
          Don't forget to run 'make test'.

          =====================================================================
          PHP         : /usr/local/src/php/php-5.3-unless/sapi/cli/php
          PHP_SAPI    : cli
          PHP_VERSION : 5.3.1-dev
          ZEND_VERSION: 2.3.0
          PHP_OS      : Linux 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 01:19:55 UTC 2009 i686 GNU/Linux
          INI actual  : /usr/local/src/php/php-5.3-unless/tmp-php.ini
          More .INIs  :
          CWD         : /usr/local/src/php/php-5.3-unless
          Extra dirs  :
          VALGRIND    : Not used
          =====================================================================
          Running selected tests.
          PASS unless statement [Zend/tests/unless.phpt]
          =====================================================================
          Number of tests :    1                 1
          Tests skipped   :    0 (  0.0%) --------
          Tests warned    :    0 (  0.0%) (  0.0%)
          Tests failed    :    0 (  0.0%) (  0.0%)
          Expected fail   :    0 (  0.0%) (  0.0%)
          Tests passed    :    1 (100.0%) (100.0%)
          ---------------------------------------------------------------------
          Time taken      :    0 seconds
          =====================================================================
    • Add unless to ext/tokenizer

          sb@thinkpad tokenizer % ./tokenizer_data_gen.sh
          Wrote tokenizer_data.c
    • The End Thank you for your interest! These slides will be posted on http://slideshare.net/sebastian_bergmann
    • Acknowledgements
        • Thomas Lee, whose Python Language Internals presentation at OSDC 2008 inspired this presentation
        • Stefan Esser for creating the Bytekit extension that provides PHP bytecode access and analysis features
        • Derick Rethans, David Soria Parra, and Scott MacVicar for reviewing these slides
    • References
        • http://www.php.net/manual/en/tokens.php
        • http://www.zapt.info/opcodes.html
        • “Extending and Embedding PHP”, Sara Golemon
        • http://bytekit.org/
        • http://github.com/sebastianbergmann/bytekit-cli/
    • License
      This presentation material is published under the Attribution-Share Alike 3.0 Unported license.
      You are free:
        • to Share: to copy, distribute and transmit the work.
        • to Remix: to adapt the work.
      Under the following conditions:
        • Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
        • Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
      For any reuse or distribution, you must make clear to others the license terms of this work.
      Any of the above conditions can be waived if you get permission from the copyright holder.
      Nothing in this license impairs or restricts the author's moral rights.