PHP Compiler Internals


    PHP Compiler Internals - Presentation Transcript

    1. (Do not be afraid of) PHP Compiler Internals
        Sebastian Bergmann, May 27th 2009
    2. Who I Am
        • Sebastian Bergmann
        • Involved in the PHP project since 2000
        • Creator of PHPUnit
        • Co-Founder and Principal Consultant with thePHP.cc
    3. Under PHP's Hood
        • Extensions (date, dom, gd, json, mysql, pcre, pdo, reflection, session, standard, …)
        • PHP Core: Request Management, File and Network Operations
        • Zend Engine: Compilation and Execution, Memory and Resource Allocation
        • Server API (SAPI) (mod_php, FastCGI, CLI, ...)
        This slide contains material by Sara Golemon
    4. How PHP executes code
        • Lexical Analysis: converts the source from a sequence of characters into a sequence of tokens
    5. How PHP executes code
        • Lexical Analysis
        • Syntax Analysis: analyzes a sequence of tokens to determine their grammatical structure
    6. How PHP executes code
        • Lexical Analysis
        • Syntax Analysis
        • Bytecode Generation: generates bytecode based on the information gathered by analyzing the source code
    7. How PHP executes code
        • Lexical Analysis
        • Syntax Analysis
        • Bytecode Generation
        • Bytecode Execution
    8. Lexical Analysis: Scan a sequence of characters

            <?php
            if (TRUE) {
                print '*';
            }
            ?>

    9. Lexical Analysis: Scan a sequence of characters

            <?php           T_OPEN_TAG
            if (TRUE) {
                print '*';
            }
            ?>

    10. Lexical Analysis: Scan a sequence of characters

            <?php           T_OPEN_TAG
            if (TRUE) {     T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE
                print '*';
            }
            ?>

    11. Lexical Analysis: Scan a sequence of characters

            <?php           T_OPEN_TAG
            if (TRUE) {     T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE
                print '*';  T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ;
            }
            ?>

    12. Lexical Analysis: Scan a sequence of characters

            <?php           T_OPEN_TAG
            if (TRUE) {     T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE
                print '*';  T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE
            }               }
            ?>

    13. Lexical Analysis: Scan a sequence of characters

            <?php           T_OPEN_TAG
            if (TRUE) {     T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE
                print '*';  T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE
            }               } T_WHITESPACE
            ?>              T_CLOSE_TAG

    14. Lexical Analysis: Scan a sequence of characters

            T_OPEN_TAG                  <?php
            T_IF                        if
            T_WHITESPACE
            (
            T_STRING                    TRUE
            )
            T_WHITESPACE
            {
            T_WHITESPACE
            T_PRINT                     print
            T_WHITESPACE
            T_CONSTANT_ENCAPSED_STRING  '*'
            ;
            T_WHITESPACE
            }
            T_WHITESPACE
            T_CLOSE_TAG                 ?>

    15. Lexical Analysis: Scan a sequence of characters
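
The token stream on the preceding slides can be reproduced with a small hand-written scanner. The C sketch below is an illustration only and is not how PHP's generated scanner is implemented: the `TOK_*` names, `struct token`, and `next_token()` are hypothetical, only the tokens of the `if (TRUE)` example are recognised, and keywords are matched case-sensitively for brevity. It assumes, as the slides show, that `T_OPEN_TAG` swallows the single whitespace character following `<?php`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; names mirror PHP's tokens. */
enum token_kind {
    TOK_EOF, TOK_OPEN_TAG, TOK_CLOSE_TAG, TOK_IF, TOK_PRINT, TOK_STRING,
    TOK_CONSTANT_ENCAPSED_STRING, TOK_WHITESPACE,
    TOK_CHAR /* single characters such as ( ) { } ; */
};

struct token {
    enum token_kind kind;
    const char *start;  /* first character of the lexeme */
    size_t length;      /* number of characters in the lexeme */
};

/* Does the lexeme s[0..n) spell exactly the keyword kw? */
static int is_word(const char *s, size_t n, const char *kw) {
    return strlen(kw) == n && strncmp(s, kw, n) == 0;
}

/* Scan one token starting at *cursor and advance the cursor past it. */
struct token next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    struct token t = { TOK_EOF, p, 0 };

    if (*p == '\0') {
        return t;
    }
    if (strncmp(p, "<?php", 5) == 0 && isspace((unsigned char)p[5])) {
        t.kind = TOK_OPEN_TAG;          /* includes one trailing whitespace char */
        t.length = 6;
    } else if (strncmp(p, "?>", 2) == 0) {
        t.kind = TOK_CLOSE_TAG;
        t.length = 2;
    } else if (isspace((unsigned char)*p)) {
        t.kind = TOK_WHITESPACE;        /* a whole run of whitespace is one token */
        while (isspace((unsigned char)p[t.length])) t.length++;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)p[t.length]) || p[t.length] == '_') t.length++;
        if (is_word(p, t.length, "if"))         t.kind = TOK_IF;
        else if (is_word(p, t.length, "print")) t.kind = TOK_PRINT;
        else                                    t.kind = TOK_STRING; /* e.g. TRUE */
    } else if (*p == '\'') {
        t.kind = TOK_CONSTANT_ENCAPSED_STRING;
        t.length = 1;
        while (p[t.length] != '\0' && p[t.length] != '\'') t.length++;
        if (p[t.length] == '\'') t.length++;    /* consume the closing quote */
    } else {
        t.kind = TOK_CHAR;
        t.length = 1;
    }
    *cursor = p + t.length;
    return t;
}
```

Calling `next_token()` repeatedly on the example source until it returns `TOK_EOF` yields the sequence of slide 14. Every additional token needs another hand-maintained branch, which is the maintenance cost a scanner generator avoids.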
    16. Lexical Analysis: Scanner Generators
        • You do not want to write a scanner by hand
        • At least not when the code for the scanner should be efficient and maintainable
        • Tools such as flex or re2c generate the code for a scanner from a set of rules

            <ST_IN_SCRIPTING>"if" { return T_IF; }
    17. Lexical Analysis PHP Tokens  T_ABSTRACT  T_CONCAT_EQUAL  T_ELSE  T_FUNCTION  T_AND_EQUAL  T_CONST  T_ELSEIF  T_FUNC_C  T_ARRAY  T_CONSTANT_ENCAPSED_STRING  T_EMPTY  T_GLOBAL  T_ARRAY_CAST  T_CONTINUE  T_ENCAPSED_AND_WHITESPACE  T_GOTO  T_AS  T_CURLY_OPEN  T_ENDDECLARE  T_HALT_COMPILER  T_BAD_CHARACTER  T_DEC  T_ENDFOR  T_IF  T_BOOLEAN_AND  T_DECLARE  T_ENDFOREACH  T_IMPLEMENTS  T_BOOLEAN_OR  T_DEFAULT  T_ENDIF  T_INC  T_BOOL_CAST  T_DIR  T_ENDSWITCH  T_INCLUDE  T_BREAK  T_DIV_EQUAL  T_ENDWHILE  T_INCLUDE_ONCE  T_CASE  T_DNUMBER  T_END_HEREDOC  T_INLINE_HTML  T_CATCH  T_DOC_COMMENT  T_EVAL  T_INSTANCEOF  T_CHARACTER  T_DO  T_EXIT  T_INT_CAST  T_CLASS  T_DOLLAR_OPEN_CURLY_BRACES  T_EXTENDS  T_INTERFACE  T_CLASS_C  T_DOUBLE_ARROW  T_FILE  T_ISSET  T_CLONE  T_DOUBLE_CAST  T_FINAL  T_IS_EQUAL  T_CLOSE_TAG  T_DOUBLE_COLON  T_FOR  T_IS_GREATER_OR_EQUAL  T_COMMENT  T_ECHO  T_FOREACH  T_IS_IDENTICAL
    18. Lexical Analysis PHP Tokens  T_IS_NOT_EQUAL  T_OBJECT_CAST  T_SR_EQUAL  T_IS_NOT_IDENTICAL  T_OBJECT_OPERATOR  T_START_HEREDOC  T_IS_SMALLER_OR_EQUAL  T_OLD_FUNCTION  T_STATIC  T_LINE  T_OPEN_TAG  T_STRING  T_LIST  T_OPEN_TAG_WITH_ECHO  T_STRING_CAST  T_LNUMBER  T_OR_EQUAL  T_STRING_VARNAME  T_LOGICAL_AND  T_PAAMAYIM_NEKUDOTAYIM  T_SWITCH  T_LOGICAL_OR  T_PLUS_EQUAL  T_THROW  T_LOGICAL_XOR  T_PRINT  T_TRY  T_METHOD_C  T_PRIVATE  T_UNSET  T_MINUS_EQUAL  T_PUBLIC  T_UNSET_CAST  T_ML_COMMENT  T_PROTECTED  T_USE  T_MOD_EQUAL  T_REQUIRE  T_VAR  T_MUL_EQUAL  T_REQUIRE_ONCE  T_VARIABLE  T_NAMESPACE  T_RETURN  T_WHILE  T_NS_C  T_SL  T_WHITESPACE  T_NEW  T_SL_EQUAL  T_XOR_EQUAL  T_NUM_STRING  T_SR
    19. Syntax Analysis: Analyze a sequence of tokens
    20. Syntax Analysis: Parser Generators
        • You do not want to write a parser by hand
        • At least not when the code for the parser should be efficient and maintainable
        • Tools such as bison or lemon generate the code for a parser from a set of rules

            T_IF '(' expr ')' { ... } statement { ... } elseif_list else_single { ... }
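
To show what "determining grammatical structure" means for the `if` rule, here is a minimal hand-written recursive-descent recogniser in C. It is a swapped-in technique for illustration: bison generates a table-driven parser, not code like this. The `TK_*` token kinds, `struct parser`, and all function names are hypothetical, whitespace tokens are assumed to have been dropped already, and the grammar is deliberately tiny (a block, an `if` without `elseif`/`else`, and `print` treated as a statement).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; the input array must end with TK_END. */
enum tok {
    TK_IF, TK_LPAREN, TK_RPAREN, TK_LBRACE, TK_RBRACE,
    TK_PRINT, TK_STRING, TK_LITERAL, TK_SEMICOLON, TK_END
};

struct parser {
    const enum tok *toks;  /* token stream, terminated by TK_END */
    size_t pos;            /* index of the next unread token */
};

/* Consume the next token if it has the given kind. */
static int match(struct parser *p, enum tok kind) {
    if (p->toks[p->pos] == kind) {
        p->pos++;
        return 1;
    }
    return 0;
}

/* expr: T_STRING (a constant such as TRUE) | a string literal */
static int parse_expr(struct parser *p) {
    return match(p, TK_STRING) || match(p, TK_LITERAL);
}

/* statement: '{' statement* '}'
 *          | T_IF '(' expr ')' statement
 *          | T_PRINT expr ';'
 */
static int parse_statement(struct parser *p) {
    if (match(p, TK_LBRACE)) {
        while (p->toks[p->pos] != TK_RBRACE && p->toks[p->pos] != TK_END) {
            if (!parse_statement(p)) {
                return 0;
            }
        }
        return match(p, TK_RBRACE);
    }
    if (match(p, TK_IF)) {
        return match(p, TK_LPAREN) && parse_expr(p)
            && match(p, TK_RPAREN) && parse_statement(p);
    }
    if (match(p, TK_PRINT)) {
        return parse_expr(p) && match(p, TK_SEMICOLON);
    }
    return 0;
}

/* Returns 1 if the whole token stream is a list of valid statements. */
int parse_program(const enum tok *toks) {
    assert(toks != NULL);
    struct parser p = { toks, 0 };
    while (p.toks[p.pos] != TK_END) {
        if (!parse_statement(&p)) {
            return 0;
        }
    }
    return 1;
}
```

In the bison rule on the slide, the `{ ... }` blocks between the grammar symbols are actions that run while the rule is being matched; in this sketch the equivalent place for such actions would be between the `match()` calls of the `TK_IF` branch.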
    21. Bytecode Generation

            1 <?php
            2 if (TRUE) {
            3     print '*';
            4 }
            5 ?>

            filename:       /home/sb/if.php
            function name:  (null)
            number of ops:  8
            compiled vars:  none
            line     #  op          fetch  ext  return  operands
            -------------------------------------------------------------------------------
               2     0  EXT_STMT
                     1  JMPZ                            true, ->6
               3     2  EXT_STMT
                     3  PRINT                   ~0      '%2A'
                     4  FREE                            ~0
               4     5  JMP                             ->6
               6     6  EXT_STMT
                     7  RETURN                          1
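
The listing above can be read as a program for a very small virtual machine. The C sketch below executes exactly these eight ops to illustrate the fourth stage, bytecode execution. It is a toy under stated assumptions and not the Zend Engine's executor: `struct op`, the `OP_*` names, `run()`, and `if_program` are hypothetical, the condition is stored as a plain integer, and `'%2A'` is written as `*` (its URL-decoded form).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes covering only what the listing uses (plus JMPNZ). */
enum opcode { OP_EXT_STMT, OP_JMPZ, OP_JMPNZ, OP_JMP, OP_PRINT, OP_FREE, OP_RETURN };

struct op {
    enum opcode opcode;
    int value;         /* condition for JMPZ/JMPNZ, result for RETURN */
    size_t target;     /* op number to jump to */
    const char *text;  /* operand of PRINT */
};

/* The eight ops of the listing for: if (TRUE) { print '*'; } */
static const struct op if_program[8] = {
    { OP_EXT_STMT, 0, 0, NULL },
    { OP_JMPZ,     1, 6, NULL },  /* JMPZ true, ->6 */
    { OP_EXT_STMT, 0, 0, NULL },
    { OP_PRINT,    0, 0, "*"  },  /* PRINT '%2A' */
    { OP_FREE,     0, 0, NULL },
    { OP_JMP,      0, 6, NULL },  /* JMP ->6 */
    { OP_EXT_STMT, 0, 0, NULL },
    { OP_RETURN,   1, 0, NULL },  /* RETURN 1 */
};

/* Execute ops until OP_RETURN. Printed text is appended to out, which
 * holds at most out_size bytes including the terminating NUL. */
int run(const struct op *ops, size_t count, char *out, size_t out_size) {
    assert(ops != NULL && out != NULL && out_size > 0);
    size_t pc = 0;  /* number of the op to execute next */
    out[0] = '\0';
    while (pc < count) {
        const struct op *op = &ops[pc];
        switch (op->opcode) {
        case OP_JMPZ:   /* jump if the condition is zero (false) */
            pc = op->value ? pc + 1 : op->target;
            break;
        case OP_JMPNZ:  /* jump if the condition is not zero (true) */
            pc = op->value ? op->target : pc + 1;
            break;
        case OP_JMP:
            pc = op->target;
            break;
        case OP_PRINT:
            assert(op->text != NULL);
            strncat(out, op->text, out_size - strlen(out) - 1);
            pc++;
            break;
        case OP_RETURN:
            return op->value;
        default:        /* EXT_STMT and FREE do nothing in this sketch */
            pc++;
            break;
        }
    }
    return 0;
}
```

Running `if_program` falls through the `JMPZ` because the condition is true, prints `*`, and then jumps to op 6. With a false condition the `JMPZ` goes straight to op 6 and the `PRINT` is never reached.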
    22. Bytecode Generation PHP Opcodes  NOP  IS_NOT_EQUAL  POST_INC  ADD_VAR  UNSET_DIM  ADD  IS_SMALLER  POST_DEC  BEGIN_SILENCE  UNSET_OBJ  SUB  IS_SMALLER_OR_EQUAL  ASSIGN  END_SILENCE  FE_RESET  MUL  CAST  ASSIGN_REF  INIT_FCALL_BY_NAME  FE_FETCH  DIV  QM_ASSIGN  ECHO  DO_FCALL  EXIT  MOD  ASSIGN_ADD  PRINT  DO_FCALL_BY_NAME  FETCH_R  SL  ASSIGN_SUB  JMPZ  RETURN  FETCH_DIM_R  SR  ASSIGN_MUL  JMPNZ  RECV  FETCH_OBJ_R  CONCAT  ASSIGN_DIV  JMPZNZ  RECV_INIT  FETCH_W  BW_OR  ASSIGN_MOD  JMPZ_EX  SEND_VAL  FETCH_DIM_W  BW_AND  ASSIGN_SL  JMPNZ_EX  SEND_VAR  FETCH_OBJ_W  BW_XOR  ASSIGN_SR  CASE  SEND_REF  FETCH_RW  BW_NOT  ASSIGN_CONCAT  SWITCH_FREE  NEW  FETCH_DIM_RW  BOOL_NOT  ASSIGN_BW_OR  BRK  FREE  FETCH_OBJ_RW  BOOL_XOR  ASSIGN_BW_AND  BOOL  INIT_ARRAY  FETCH_IS  IS_IDENTICAL  ASSIGN_BW_XOR  INIT_STRING  ADD_ARRAY_ELEMENT  FETCH_DIM_IS  IS_NOT_IDENTICAL  PRE_INC  ADD_CHAR  INCLUDE_OR_EVAL  FETCH_OBJ_IS  IS_EQUAL  PRE_DEC  ADD_STRING  UNSET_VAR  FETCH_FUNC_ARG
    23. Bytecode Generation PHP Opcodes  FETCH_DIM_FUNC_ARG  INIT_STATIC_METHOD_CALL  FETCH_OBJ_FUNC_ARG  ISSET_ISEMPTY_VAR  FETCH_UNSET  ISSET_ISEMPTY_DIM_OBJ  FETCH_DIM_UNSET  PRE_INC_OBJ  FETCH_OBJ_UNSET  PRE_DEC_OBJ  FETCH_DIM_TMP_VAR  POST_INC_OBJ  FETCH_CONSTANT  POST_DEC_OBJ  EXT_STMT  ASSIGN_OBJ  EXT_FCALL_BEGIN  INSTANCEOF  EXT_FCALL_END  DECLARE_CLASS  EXT_NOP  DECLARE_INHERITED_CLASS  TICKS  DECLARE_FUNCTION  SEND_VAR_NO_REF  RAISE_ABSTRACT_ERROR  CATCH  ADD_INTERFACE  THROW  VERIFY_ABSTRACT_CLASS  FETCH_CLASS  ASSIGN_DIM  CLONE  ISSET_ISEMPTY_PROP_OBJ  INIT_METHOD_CALL  HANDLE_EXCEPTION
    24. Extending the Compiler
    25. Test First!
        Zend/tests/unless.phpt

            --TEST--
            unless statement
            --FILE--
            <?php
            unless (FALSE) {
                print 'unless FALSE is TRUE, this is printed';
            }

            unless (TRUE) {
                print 'unless TRUE is TRUE, this is printed';
            }
            ?>
            --EXPECT--
            unless FALSE is TRUE, this is printed

    26. Extending the Compiler
        • Add token for unless to the scanner
        • Add rule for unless to the parser
        • Generate bytecode for unless in the compiler
        • Add token for unless to ext/tokenizer
    27. Add unless scanner token
        Zend/zend_language_scanner.l

            <ST_IN_SCRIPTING>"if" { return T_IF; }
            <ST_IN_SCRIPTING>"unless" { return T_UNLESS; }
            <ST_IN_SCRIPTING>"elseif" { return T_ELSEIF; }
            <ST_IN_SCRIPTING>"endif" { return T_ENDIF; }
            <ST_IN_SCRIPTING>"else" { return T_ELSE; }

    28. Add unless parser rule
        Zend/zend_language_parser.y

            %token T_NAMESPACE
            %token T_NS_C
            %token T_DIR
            %token T_NS_SEPARATOR
            %token T_UNLESS
            .
            .
            unticked_statement:
                    '{' inner_statement_list '}'
                |   T_IF '(' expr ')' {
            .
            .
                |   T_UNLESS '(' expr ')' { zend_do_unless_cond(&$3, &$4 TSRMLS_CC); }
                    statement { zend_do_if_after_statement(&$4, 1 TSRMLS_CC); }
                    { zend_do_if_end(TSRMLS_C); }
            .
            .
    29. How if is compiled
        Zend/zend_compile.c

            void zend_do_if_cond
            (const znode *cond, znode *closing_bracket_token TSRMLS_DC)
            {
            }

        zend_do_if_cond() is called when an if statement is compiled

    30. How if is compiled
        Zend/zend_compile.c

            void zend_do_if_cond
            (const znode *cond, znode *closing_bracket_token TSRMLS_DC)
            {
                int if_cond_op_number = get_next_op_number(CG(active_op_array));
                zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
            }

        Allocate a new opline in the current oparray

    31. How if is compiled
        Zend/zend_compile.c

            void zend_do_if_cond
            (const znode *cond, znode *closing_bracket_token TSRMLS_DC)
            {
                int if_cond_op_number = get_next_op_number(CG(active_op_array));
                zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
                opline->opcode = ZEND_JMPZ;
            }

        Set the opcode of the new opline to JMPZ (jump if zero)

    32. How if is compiled
        Zend/zend_compile.c

            void zend_do_if_cond
            (const znode *cond, znode *closing_bracket_token TSRMLS_DC)
            {
                int if_cond_op_number = get_next_op_number(CG(active_op_array));
                zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
                opline->opcode = ZEND_JMPZ;
                opline->op1 = *cond;
            }

        Set the first operand of the new opline to the if condition

    33. How if is compiled
        Zend/zend_compile.c

            void zend_do_if_cond
            (const znode *cond, znode *closing_bracket_token TSRMLS_DC)
            {
                int if_cond_op_number = get_next_op_number(CG(active_op_array));
                zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
                opline->opcode = ZEND_JMPZ;
                opline->op1 = *cond;
                closing_bracket_token->u.opline_num = if_cond_op_number;
                SET_UNUSED(opline->op2);
                INC_BPC(CG(active_op_array));
            }

        Perform bookkeeping tasks such as marking the second operand of the new opline as unused and incrementing the backpatching counter for the current oparray
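
When the conditional jump is emitted, the compiler does not yet know where the statement body ends, so the jump target has to be filled in later (backpatching). The C sketch below illustrates that idea with a toy op array. It is an assumption-laden illustration and not the Zend code: `struct op_array`, `emit()`, `begin_conditional()`, and `end_conditional()` are hypothetical names, and the sketch assumes the target is patched once the body has been compiled, which is where the parser rule calls `zend_do_if_after_statement()` with the same token that recorded the opline number.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_OPS = 16, TARGET_UNSET = -1 };

/* Hypothetical opcodes for this sketch. */
enum opcode { OP_JMPZ, OP_JMPNZ, OP_JMP, OP_PRINT };

struct op {
    enum opcode opcode;
    int target;  /* op number to jump to, or TARGET_UNSET */
};

struct op_array {
    struct op ops[MAX_OPS];
    int count;   /* number of ops emitted so far */
};

/* Append an op and return its number (its index in the array). */
int emit(struct op_array *a, enum opcode opcode) {
    assert(a != NULL && a->count < MAX_OPS);
    a->ops[a->count].opcode = opcode;
    a->ops[a->count].target = TARGET_UNSET;
    return a->count++;
}

/* Called once the condition has been parsed. The jump target is still
 * unknown, so only the op number is returned for later patching.
 * negate = 0 emits JMPZ (if); negate = 1 emits JMPNZ (unless). */
int begin_conditional(struct op_array *a, int negate) {
    return emit(a, negate ? OP_JMPNZ : OP_JMPZ);
}

/* Called after the body has been compiled: append the closing JMP, then
 * backpatch both jumps to point at the op that will follow it. */
void end_conditional(struct op_array *a, int cond_op_number) {
    assert(a != NULL && cond_op_number >= 0 && cond_op_number < a->count);
    int jmp_op_number = emit(a, OP_JMP);
    a->ops[cond_op_number].target = a->count;
    a->ops[jmp_op_number].target = a->count;
}
```

A caller would invoke `begin_conditional()`, emit the ops of the body, and then pass the remembered op number to `end_conditional()`. Both jumps end up pointing at the op after the closing `JMP`, which matches the shape of the bytecode listing for the `if` example (`JMPZ ->6` and `JMP ->6`).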
    34. Add unless to compiler
        Zend/zend_compile.c

            void zend_do_unless_cond
            (const znode *cond, znode *closing_bracket_token TSRMLS_DC)
            {
                int unless_cond_op_number = get_next_op_number(CG(active_op_array));
                zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
                opline->opcode = ZEND_JMPNZ;
                opline->op1 = *cond;
                closing_bracket_token->u.opline_num = unless_cond_op_number;
                SET_UNUSED(opline->op2);
                INC_BPC(CG(active_op_array));
            }

        Compared to generating code for the if statement, all we have to do to generate code for the unless statement is to use the JMPNZ (jump if not zero) opcode instead of the JMPZ (jump if zero) opcode

    35. Add unless to compiler
        The generated bytecode

            1 <?php
            2 unless (FALSE) {
            3     print '*';
            4 }
            5 ?>

            filename:       /home/sb/unless.php
            function name:  (null)
            number of ops:  8
            compiled vars:  none
            line     #  op          fetch  ext  return  operands
            -------------------------------------------------------------------------------
               2     0  EXT_STMT
                     1  JMPNZ                           false, ->6
               3     2  EXT_STMT
                     3  PRINT                   ~0      '%2A'
                     4  FREE                            ~0
               4     5  JMP                             ->6
               6     6  EXT_STMT
                     7  RETURN                          1
    36. Run the test

            sb@ubuntu php-5.3-unless % make test TESTS=Zend/tests/unless.phpt

            Build complete.
            Don't forget to run 'make test'.

            =====================================================================
            PHP         : /usr/local/src/php/php-5.3-unless/sapi/cli/php
            PHP_SAPI    : cli
            PHP_VERSION : 5.3.0alpha4-dev
            ZEND_VERSION: 2.3.0
            PHP_OS      : Linux - Linux ubuntu 2.6.27-9-generic #1 SMP Thu Nov 20 22:15:32 UTC 2008 x86_64
            INI actual  : /usr/local/src/php/php-5.3-unless/tmp-php.ini
            More .INIs  :
            CWD         : /usr/local/src/php/php-5.3-unless
            Extra dirs  :
            VALGRIND    : Not used
            =====================================================================
            Running selected tests.
            PASS unless statement [Zend/tests/unless.phpt]
            =====================================================================
            Number of tests :    1                 1
            Tests skipped   :    0 (  0.0%) --------
            Tests warned    :    0 (  0.0%) (  0.0%)
            Tests failed    :    0 (  0.0%) (  0.0%)
            Expected fail   :    0 (  0.0%) (  0.0%)
            Tests passed    :    1 (100.0%) (100.0%)
            ---------------------------------------------------------------------
            Time taken      :    0 seconds
            =====================================================================

    37. Add unless to ext/tokenizer
        ext/tokenizer/tokenizer_data.c

            sb@ubuntu tokenizer % ./tokenizer_data_gen.sh
            Wrote tokenizer_data.c
    38. The End Thank you for your interest! These slides will be linked soon from http://sebastian-bergmann.de/
    39. Acknowledgements
        • Thomas Lee, whose Python Language Internals presentation at OSDC 2008 inspired this presentation
        • Derick Rethans, without whose VLD we could not see PHP bytecode
        • Derick Rethans, David Soria Parra, and Scott MacVicar for reviewing these slides
    40. References
        • http://www.php.net/manual/en/tokens.php
        • http://www.zapt.info/opcodes.html
        • "Extending and Embedding PHP" by Sara Golemon
    41. License
        This presentation material is published under the Attribution-Share Alike 3.0 Unported license.
        You are free:
        ✔ to Share – to copy, distribute and transmit the work.
        ✔ to Remix – to adapt the work.
        Under the following conditions:
        ● Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
        ● Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
        For any reuse or distribution, you must make clear to others the license terms of this work.
        Any of the above conditions can be waived if you get permission from the copyright holder.
        Nothing in this license impairs or restricts the author's moral rights.

    Sebastian Bergmann, 6 months ago