@asgrim
Climbing the
Abstract Syntax Tree
James Titcumb
phpDay 2017
Who is this guy?
James Titcumb
www.jamestitcumb.com
www.roave.com
www.phphants.co.uk
www.phpsouthcoast.co.uk
@asgrim
How PHP works
PHP code → Lexer + Parser → Compiler → OpCache → Execute (VM)
@asgrim
The PHP Lexer
zend_language_scanner.l
@asgrim
zend_language_scanner.l
<ST_IN_SCRIPTING>"exit" {
    RETURN_TOKEN(T_EXIT);
}
<ST_IN_SCRIPTING>"die" {
    RETURN_TOKEN(T_EXIT);
}
<ST_IN_SCRIPTING>"function" {
    RETURN_TOKEN(T_FUNCTION);
}
@asgrim
zend_language_scanner.l
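<ST_DOUBLE_QUOTES,ST_BACKQUOTE,ST_HEREDOC>"${" {
    yy_push_state(ST_LOOKING_FOR_VARNAME);
    RETURN_TOKEN(T_DOLLAR_OPEN_CURLY_BRACES);
}
<ST_LOOKING_FOR_VARNAME>{LABEL}[[}] {
    yyless(yyleng - 1);
    zend_copy_value(zendlval, yytext, yyleng);
    yy_pop_state();
    yy_push_state(ST_IN_SCRIPTING);
    RETURN_TOKEN(T_STRING_VARNAME);
}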
@asgrim
The PHP Lexer
zend_language_scanner.l → re2c → zend_language_scanner.c
(re2c generates the C source of the scanner from the rules in the .l file)
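PHP exposes the output of this generated scanner to userland through the built-in `token_get_all()` and `token_name()` functions (tokenizer extension, enabled by default), so you can watch the lexer rules at work. A minimal sketch; the helper name `tokenNames()` is ours, not part of PHP:

```php
<?php
// List the token names PHP's own lexer produces for a snippet of code.
// token_get_all() returns [id, text, line] arrays for most tokens, but bare
// strings (such as ";") for single characters that have no T_* constant.
function tokenNames(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return $names;
}

echo implode(' ', tokenNames('<?php echo "Hello world";')), "\n";
// T_OPEN_TAG T_ECHO T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ;
```

The space right after `<?php` is absorbed into the T_OPEN_TAG token, which is why only one T_WHITESPACE appears.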
@asgrim
The PHP Parser
zend_language_parser.y
@asgrim
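zend_language_parser.y
if_stmt:
    if_stmt_without_else %prec T_NOELSE { $$ = $1; }
  | if_stmt_without_else T_ELSE statement
        { $$ = zend_ast_list_add($1, zend_ast_create(ZEND_AST_IF_ELEM, NULL, $3)); }
;

if_stmt_without_else:
    T_IF '(' expr ')' statement
        { $$ = zend_ast_create_list(1, ZEND_AST_IF,
              zend_ast_create(ZEND_AST_IF_ELEM, $3, $5)); }
  | if_stmt_without_else T_ELSEIF '(' expr ')' statement
        { $$ = zend_ast_list_add($1,
              zend_ast_create(ZEND_AST_IF_ELEM, $4, $6)); }
;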
@asgrim
if ($a == 1)
{
    a();
}
else if ($b == 1)
{
    b();
}
else
{
    c();
}
Using the rules to parse
if_stmt_without_else (A)
if_stmt_without_else (B)
if_stmt
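The example uses `else if` (two words). PHP lexes that as T_ELSE followed by T_IF, so it is parsed through the `if_stmt_without_else T_ELSE statement` alternative with a nested `if` as the statement; only the one-word `elseif` produces T_ELSEIF and uses the T_ELSEIF rule. A small check with the built-in tokenizer (the helper name `keywordTokens()` is illustrative):

```php
<?php
// Show which keyword tokens PHP's lexer emits for "elseif" versus "else if".
// Whitespace and single-character tokens such as "(" or "{" are skipped.
function keywordTokens(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token) && $token[0] !== T_WHITESPACE) {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

echo implode(' ', keywordTokens('<?php if ($a) {} elseif ($b) {}')), "\n";
// T_OPEN_TAG T_IF T_VARIABLE T_ELSEIF T_VARIABLE
echo implode(' ', keywordTokens('<?php if ($a) {} else if ($b) {}')), "\n";
// T_OPEN_TAG T_IF T_VARIABLE T_ELSE T_IF T_VARIABLE
```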
@asgrim
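zend_language_parser.y (PHP 7.0.10)
if_stmt:
    if_stmt_without_else %prec T_NOELSE { $$ = $1; }
  | if_stmt_without_else T_ELSE statement
        { $$ = zend_ast_list_add($1, zend_ast_create(ZEND_AST_IF_ELEM, NULL, $3)); }
;

if_stmt_without_else:
    T_IF '(' expr ')' statement
        { $$ = zend_ast_create_list(1, ZEND_AST_IF,
              zend_ast_create(ZEND_AST_IF_ELEM, $3, $5)); }
  | if_stmt_without_else T_ELSEIF '(' expr ')' statement
        { $$ = zend_ast_list_add($1,
              zend_ast_create(ZEND_AST_IF_ELEM, $4, $6)); }
;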
@asgrim
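zend_language_parser.y (PHP 5.6.26)
T_IF parenthesis_expr { zend_do_if_cond(&$2, &$1 TSRMLS_CC); }
    statement { zend_do_if_after_statement(&$1, 1 TSRMLS_CC); }

/* Called from the grammar action above; defined in zend_compile.c */
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;
    SET_NODE(opline->op1, cond);
    closing_bracket_token->u.op.opline_num = if_cond_op_number;
    SET_UNUSED(opline->op2);
    INC_BPC(CG(active_op_array));
}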
@asgrim
AST is new in PHP 7+ (PHP 5's parser emitted opcodes directly while parsing, with no tree in between)
@asgrim
How PHP works
PHP code → Lexer + Parser → Compiler → OpCache → Execute (VM)
@asgrim
Let’s simplify!
@asgrim
First… WTF is AST?
@asgrim
AST is just a data structure
@asgrim
PHP code
<?php
echo "Hello world";
@asgrim
An AST representation
Echo statement
`-- String, value "Hello world"
@asgrim
PHP code
<?php
echo "Hello " . "world";
@asgrim
An AST representation
Echo statement
`-- Concat
    |-- Left
    |   `-- String, value "Hello "
    `-- Right
        `-- String, value "world"
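To make "just a data structure" concrete, here is a minimal sketch: a hypothetical `AstNode` class (not a PHP built-in) holding a label and an ordered list of children, plus a `dumpAst()` helper that prints an indented ASCII tree like the one above.

```php
<?php
// Hypothetical node type: a label (e.g. "Concat") plus an ordered list of children.
final class AstNode
{
    public $label;
    public $children;

    public function __construct(string $label, array $children = [])
    {
        $this->label = $label;
        $this->children = $children;
    }
}

// Render a node and its children as an ASCII tree, one line per node.
function dumpAst(AstNode $node, string $prefix = ''): string
{
    $out = $node->label . "\n";
    $last = count($node->children) - 1;
    foreach ($node->children as $i => $child) {
        $isLast = ($i === $last);
        $out .= $prefix . ($isLast ? '`-- ' : '|-- ')
            . dumpAst($child, $prefix . ($isLast ? '    ' : '|   '));
    }
    return $out;
}

// echo "Hello " . "world";
$ast = new AstNode('Echo statement', [
    new AstNode('Concat', [
        new AstNode('Left', [new AstNode('String, value "Hello "')]),
        new AstNode('Right', [new AstNode('String, value "world"')]),
    ]),
]);

echo dumpAst($ast);
```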
@asgrim
PHP code
<?php
$a = 5;
$b = 3;
echo $a + ($b * 2);
@asgrim
An AST representation
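Assign statement
|-- Variable $a
`-- Integer, value 5
Assign statement
|-- Variable $b
`-- Integer, value 3
Echo statement
`-- Add operation
    |-- Left
    |   `-- Variable $a
    `-- Right
        `-- Multiply operation
            |-- Left
            |   `-- Variable $b
            `-- Right
                `-- Integer, value 2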
@asgrim
Why?
@asgrim
AST compilation
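Statements
|-- Assign
|   |-- Variable, name: $a
|   `-- Scalar, value: (int)5
|-- Assign
|   |-- Variable, name: $b
|   `-- Scalar, value: (int)3
`-- Echo
    `-- Add op
        |-- Left operand: Variable, name: $a
        `-- Right operand: Multiply op
            |-- Left operand: Variable, name: $b
            `-- Right operand: Scalar, value: (int)2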
@asgrim
AST compilation: pre-order traversal
(The same tree as on the previous slide, walked in pre-order: each node is visited before its children, left to right.)
@asgrim
Pre-order traversal: Polish notation
Assign(Variable $a, Scalar 5)
Assign(Variable $b, Scalar 3)
Echo(
    Add(
        Variable $a,
        Multiply(
            Variable $b,
            Scalar 2
        )
    )
)
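A pre-order walk emits each node before its children, which is exactly what produces this Polish-notation form. A minimal sketch over the `$a`/`$b` tree, encoded as hypothetical nested arrays (a leaf is a string, an inner node is `[name, children]`) rather than PHP's real AST classes:

```php
<?php
// Leaves are plain strings; inner nodes are [name, [children...]].
// This array encoding is a hypothetical stand-in for real AST node classes.
$statements = [
    ['Assign', ['Variable $a', 'Scalar 5']],
    ['Assign', ['Variable $b', 'Scalar 3']],
    ['Echo', [
        ['Add', [
            'Variable $a',
            ['Multiply', ['Variable $b', 'Scalar 2']],
        ]],
    ]],
];

// Pre-order: emit the node itself first, then visit its children left to right.
function polish($node): string
{
    if (is_string($node)) {
        return $node;
    }
    [$name, $children] = $node;
    return $name . '(' . implode(', ', array_map('polish', $children)) . ')';
}

foreach ($statements as $statement) {
    echo polish($statement), "\n";
}
// Assign(Variable $a, Scalar 5)
// Assign(Variable $b, Scalar 3)
// Echo(Add(Variable $a, Multiply(Variable $b, Scalar 2)))
```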
@asgrim
Order of precedence
1 + 2 * 3
= 1 + (2 * 3) = 7?
= (1 + 2) * 3 = 9?
+ 1 * 2 3
Outer: operator +, left operand 1, right operand * 2 3
Inner: operator *, left operand 2, right operand 3
With * binding tighter than +, the answer is 7; the prefix form + 1 * 2 3 captures that grouping without any parentheses.
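Because every operator in Polish notation is immediately followed by its operands, the expression can be evaluated with plain recursion and no precedence rules at all. A minimal sketch (the function name `evalPrefix()` is illustrative):

```php
<?php
// Evaluate a space-separated prefix (Polish notation) expression.
// Each operator is followed by its left operand, then its right operand,
// and either operand may itself be a nested prefix expression.
function evalPrefix(array &$tokens)
{
    $token = array_shift($tokens);
    if ($token === null) {
        throw new InvalidArgumentException('Unexpected end of expression');
    }
    if (is_numeric($token)) {
        return $token + 0; // numeric string to int (or float)
    }
    $left = evalPrefix($tokens);
    $right = evalPrefix($tokens);
    switch ($token) {
        case '+':
            return $left + $right;
        case '-':
            return $left - $right;
        case '*':
            return $left * $right;
        case '/':
            return $left / $right;
    }
    throw new InvalidArgumentException("Unknown operator: $token");
}

$tokens = explode(' ', '+ 1 * 2 3');
echo evalPrefix($tokens), "\n"; // 7
```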
@asgrim
Reverse Polish Notation
1 2 3 * +
Token   The stack (bottom → top)
1       1
2       1 2
3       1 2 3
*       1 6     (pop 3 and 2, push 2 * 3)
+       7       (pop 6 and 1, push 1 + 6)
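The stack walk above translates almost line for line into code. A minimal sketch (the name `evalRpn()` is illustrative); the right operand is popped first, which matters for `-` and `/`:

```php
<?php
// Evaluate a space-separated Reverse Polish Notation expression with a stack:
// numbers are pushed; an operator pops its right operand, then its left
// operand, and pushes the result.
function evalRpn(string $expression)
{
    $stack = [];
    foreach (explode(' ', $expression) as $token) {
        if (is_numeric($token)) {
            $stack[] = $token + 0;
            continue;
        }
        if (count($stack) < 2) {
            throw new InvalidArgumentException("Not enough operands for $token");
        }
        $right = array_pop($stack);
        $left = array_pop($stack);
        switch ($token) {
            case '+': $stack[] = $left + $right; break;
            case '-': $stack[] = $left - $right; break;
            case '*': $stack[] = $left * $right; break;
            case '/': $stack[] = $left / $right; break;
            default: throw new InvalidArgumentException("Unknown operator: $token");
        }
    }
    if (count($stack) !== 1) {
        throw new InvalidArgumentException('Malformed expression');
    }
    return $stack[0];
}

echo evalRpn('1 2 3 * +'), "\n"; // 7
```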
@asgrim
Let’s write a compiler (!!!)
In three easy steps…
@asgrim
Warning: do not use in production
@asgrim
View > Source
https://github.com/asgrim/basic-maths-compiler
@asgrim
Define the language
Tokens
● T_ADD (+)
● T_SUBTRACT (-)
● T_MULTIPLY (*)
● T_DIVIDE (/)
● T_INTEGER (\d+)
● T_WHITESPACE (\s+)
@asgrim
Step 1: Writing a simple lexer
@asgrim
Using regular expressions
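private static $matches = [
    '/^(\+)/' => Token::T_ADD,
    '/^(-)/' => Token::T_SUBTRACT,
    '/^(\*)/' => Token::T_MULTIPLY,
    '/^(\/)/' => Token::T_DIVIDE,
    '/^(\d+)/' => Token::T_INTEGER,
    '/^(\s+)/' => Token::T_WHITESPACE,
];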
@asgrim
Step through the input string
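public function __invoke(string $input) : array
{
    $tokens = [];
    $offset = 0;
    while ($offset < strlen($input)) {
        $focus = substr($input, $offset);
        $result = $this->match($focus);
        $tokens[] = $result;
        $offset += strlen($result->getLexeme());
    }
    return $tokens;
}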
@asgrim
The matching method
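private function match(string $input) : Token
{
    foreach (self::$matches as $pattern => $token) {
        if (preg_match($pattern, $input, $matches)) {
            return new Token($token, $matches[1]);
        }
    }
    throw new \RuntimeException(sprintf(
        'Unmatched token, next 15 chars were: %s',
        substr($input, 0, 15)
    ));
}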
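Putting the token table, the stepping loop and the matching method together gives a lexer you can run on its own. This sketch simplifies the talk's `Token` class to plain `[type, lexeme]` arrays and inlines `match()` into the loop; the function name `lex()` is illustrative:

```php
<?php
// Standalone version of the regex lexer: tokens are [type, lexeme] pairs
// rather than the talk's Token objects (a simplification for this sketch).
function lex(string $input): array
{
    $matches = [
        '/^(\+)/' => 'T_ADD',
        '/^(-)/' => 'T_SUBTRACT',
        '/^(\*)/' => 'T_MULTIPLY',
        '/^(\/)/' => 'T_DIVIDE',
        '/^(\d+)/' => 'T_INTEGER',
        '/^(\s+)/' => 'T_WHITESPACE',
    ];
    $tokens = [];
    $offset = 0;
    while ($offset < strlen($input)) {
        $focus = substr($input, $offset);
        foreach ($matches as $pattern => $type) {
            if (preg_match($pattern, $focus, $m)) {
                $tokens[] = [$type, $m[1]];
                $offset += strlen($m[1]); // advance past the matched lexeme
                continue 2;               // restart the pattern list at the new offset
            }
        }
        throw new RuntimeException(sprintf(
            'Unmatched token, next 15 chars were: %s',
            substr($focus, 0, 15)
        ));
    }
    return $tokens;
}

foreach (lex('1 + 23') as [$type, $lexeme]) {
    echo $type, " '", $lexeme, "'\n";
}
```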
@asgrim
Step 2: Parsing the tokens
@asgrim
Order tokens by operator precedence
/**
 * Higher number is higher precedence.
 * + and - share a level, as do * and /, so that operators of equal
 * precedence are applied left to right (see the >= comparison below).
 * @var int[]
 */
private static $operatorPrecedence = [
    Token::T_SUBTRACT => 1,
    Token::T_ADD => 1,
    Token::T_DIVIDE => 2,
    Token::T_MULTIPLY => 2,
];
@asgrim
Order tokens by operator precedence
/** @var Token[] $stack */
$stack = [];
/** @var Token[] $operators */
$operators = [];
while (false !== ($token = current($tokens))) {
    if ($token->isOperator()) {
        // ...
    }
    $stack[] = $token;
    next($tokens);
}
@asgrim
Order tokens by operator precedence
if ($token->isOperator()) {
    $tokenPrecedence = self::$operatorPrecedence[$token->getToken()];
    // Pop operators of higher *or equal* precedence (>=, not >), so that
    // 8 / 2 * 2 becomes 8 2 / 2 * rather than 8 2 2 * /
    while (
        count($operators)
        && self::$operatorPrecedence[$operators[count($operators) - 1]->getToken()]
            >= $tokenPrecedence
    ) {
        $higherOp = array_pop($operators);
        $stack[] = $higherOp;
    }
    $operators[] = $token;
    next($tokens);
    continue;
}
@asgrim
Order tokens by operator precedence
// Clean up by moving any remaining operators onto the token stack
while (count($operators)) {
    $stack[] = array_pop($operators);
}
return $stack;
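A standalone version of the same algorithm (Dijkstra's shunting-yard), operating on plain string tokens rather than `Token` objects. As assumptions of this sketch, whitespace is already filtered out, `+`/`-` and `*`/`/` share precedence levels, and operators of equal precedence are popped (`>=`), which keeps `8 / 2 * 2` left-associative:

```php
<?php
// Shunting-yard sketch: reorder infix tokens into Reverse Polish Notation.
// Assumes tokens are plain strings with whitespace already filtered out.
function toRpn(array $tokens): array
{
    // + and - share a level, as do * and /; higher binds tighter.
    $precedence = ['+' => 1, '-' => 1, '*' => 2, '/' => 2];
    $output = [];
    $operators = [];
    foreach ($tokens as $token) {
        if (!isset($precedence[$token])) {
            $output[] = $token; // operands go straight to the output
            continue;
        }
        // Pop operators of higher or equal precedence first, so that
        // equal-precedence operators apply left to right.
        while ($operators && $precedence[end($operators)] >= $precedence[$token]) {
            $output[] = array_pop($operators);
        }
        $operators[] = $token;
    }
    // Flush the remaining operators, top of the stack first.
    while ($operators) {
        $output[] = array_pop($operators);
    }
    return $output;
}

echo implode(' ', toRpn(['1', '+', '2', '*', '3'])), "\n"; // 1 2 3 * +
echo implode(' ', toRpn(['8', '/', '2', '*', '2'])), "\n"; // 8 2 / 2 *
```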
@asgrim
Order tokens by operator precedence
1 + 2 * 3
Token   Output stack   Operator stack
1       1
+       1              +
2       1 2            +
*       1 2            + *
3       1 2 3          + *
(end)   1 2 3 *        +
(end)   1 2 3 * +
@asgrim
Create AST
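while ($ip < count($tokenStack)) {
    $token = $tokenStack[$ip++];
    if ($token->isOperator()) {
        // (figure out $nodeType)
        $right = array_pop($astStack);
        $left = array_pop($astStack);
        $astStack[] = new $nodeType($left, $right);
        continue;
    }
    $astStack[] = new NodeScalarIntegerValue((int)$token->getLexeme());
}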
@asgrim
Create AST
NodeBinaryOpAdd (
    NodeScalarIntegerValue(1),
    NodeBinaryOpMultiply (
        NodeScalarIntegerValue(2),
        NodeScalarIntegerValue(3)
    )
)
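Building the tree from RPN uses the same stack idea as evaluating it, except that operators push new nodes instead of numbers. A minimal sketch with hypothetical array nodes (`['Integer', value]` for scalars, `[opName, left, right]` for operators) standing in for the talk's node classes:

```php
<?php
// Build a binary-operator AST from RPN tokens.
// Hypothetical node shapes: ['Integer', value] and [opName, left, right].
function buildAst(array $rpn): array
{
    $opNames = ['+' => 'Add', '-' => 'Subtract', '*' => 'Multiply', '/' => 'Divide'];
    $astStack = [];
    foreach ($rpn as $token) {
        if (isset($opNames[$token])) {
            if (count($astStack) < 2) {
                throw new InvalidArgumentException("Missing operand for $token");
            }
            // The right operand was pushed last, so it is popped first.
            $right = array_pop($astStack);
            $left = array_pop($astStack);
            $astStack[] = [$opNames[$token], $left, $right];
            continue;
        }
        $astStack[] = ['Integer', (int) $token];
    }
    if (count($astStack) !== 1) {
        throw new InvalidArgumentException('Malformed RPN input');
    }
    return $astStack[0];
}

var_export(buildAst(['1', '2', '3', '*', '+']));
```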
@asgrim
Step 3: Executing the AST
@asgrim
Compile & execute AST
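private function compileNode(NodeInterface $node)
{
    if ($node instanceof NodeBinaryOpAbstractBinaryOp) {
        return $this->compileBinaryOp($node);
    }
    if ($node instanceof NodeScalarIntegerValue) {
        return $node->getValue();
    }
    // Fail loudly instead of silently returning null for unknown node types
    throw new \InvalidArgumentException('Unsupported node: ' . get_class($node));
}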
@asgrim
Compile & execute AST
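private function compileBinaryOp(NodeBinaryOpAbstractBinaryOp $node)
{
    $left = $this->compileNode($node->getLeft());
    $right = $this->compileNode($node->getRight());
    switch (get_class($node)) {
        case NodeBinaryOpAdd::class:
            return $left + $right;
        case NodeBinaryOpSubtract::class:
            return $left - $right;
        case NodeBinaryOpMultiply::class:
            return $left * $right;
        case NodeBinaryOpDivide::class:
            return $left / $right;
    }
    // Fail loudly instead of silently returning null for unknown operators
    throw new \InvalidArgumentException('Unsupported operator: ' . get_class($node));
}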
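Evaluating the tree is a recursive walk in the same spirit as `compileNode()` and `compileBinaryOp()`: evaluate both operands, then apply the node's operator. A minimal sketch over hypothetical array nodes (`['Integer', value]` or `[opName, left, right]`), not the talk's node classes:

```php
<?php
// Recursively evaluate an array-based AST.
// Hypothetical node shapes: ['Integer', value] and [opName, left, right].
function evaluate(array $node)
{
    if ($node[0] === 'Integer') {
        return $node[1];
    }
    // Evaluate both operands before applying this node's operator.
    $left = evaluate($node[1]);
    $right = evaluate($node[2]);
    switch ($node[0]) {
        case 'Add':
            return $left + $right;
        case 'Subtract':
            return $left - $right;
        case 'Multiply':
            return $left * $right;
        case 'Divide':
            return $left / $right;
    }
    throw new InvalidArgumentException('Unknown node type: ' . $node[0]);
}

// 1 + 2 * 3, as built from the RPN "1 2 3 * +"
$ast = ['Add', ['Integer', 1], ['Multiply', ['Integer', 2], ['Integer', 3]]];
echo evaluate($ast), "\n"; // 7
```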
@asgrim
What does this mean for me?
@asgrim
AST in userland
@asgrim
php-ast extension
https://github.com/nikic/php-ast
@asgrim
php-ast example usage
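<?php
require 'path/to/util.php';
$code = <<<'EOC'
<?php
$var = 42;
EOC;
echo ast_dump(ast\parse_code($code, $version=35)), "\n";
// Output:
// AST_STMT_LIST
//     0: AST_ASSIGN
//         var: AST_VAR
//             name: "var"
//         expr: 42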
@asgrim
astkit
https://github.com/sgolemon/astkit
@asgrim
astkit example usage
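$if = AstKit::parseString(<<<EOD
if (true) {
  echo "This is a triumph.\n";
} else {
  echo "The cake is a lie.\n";
}
EOD
);
$if->execute(); // First run, program is as-seen above
$const = $if->getChild(0)->getChild(0);
// Replace the "true" constant in the condition with false
$const->graft(0, false);
// Can also graft other AstKit nodes, instead of constants
$if->execute(); // Second run now takes the else path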
@asgrim
PhpParser
https://github.com/nikic/PHP-Parser
@asgrim
PHP Parser
<?php
use PhpParser\ParserFactory;
$parser = (new ParserFactory)
    ->create(ParserFactory::PREFER_PHP7);
print_r($parser->parse(
    file_get_contents('ast-demo-src.php')
));
@asgrim
Better Reflection
https://github.com/Roave/BetterReflection
@asgrim
Better Reflection workflow
Reflector → Source Locator → PhpParser → Reflection
@asgrim
PHP Reflection
$reflection = new ReflectionClass(
    \My\ExampleClass::class
);
$this->assertSame(
    'ExampleClass',
    $reflection->getShortName()
);
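PHP's built-in Reflection can only inspect a class that the engine has already loaded (defined or autoloaded), which is the limitation Better Reflection works around by locating and parsing the source instead. A minimal, runnable sketch of the built-in side, with a hypothetical `My\ExampleClass` defined inline:

```php
<?php
namespace My;

// Built-in Reflection needs the class to be defined (or autoloadable)
// in the running process before it can be inspected.
class ExampleClass
{
    private $bar;

    public function thing()
    {
    }
}

$reflection = new \ReflectionClass(ExampleClass::class);
echo $reflection->getShortName(), "\n";                 // ExampleClass
echo $reflection->getName(), "\n";                      // My\ExampleClass
var_dump($reflection->getProperty('bar')->isPrivate()); // bool(true)
var_dump($reflection->getMethod('thing')->isPublic());  // bool(true)
```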
@asgrim
Better Reflection
$reflection = ReflectionClass::createFromName(
    \My\ExampleClass::class
);
$this->assertSame(
    'ExampleClass',
    $reflection->getShortName()
);
@asgrim
ReflectionClass::createFromName()
// In ReflectionClass:
public static function createFromName($className)
{
    return ClassReflector::buildDefaultReflector()->reflect($className);
}
@asgrim
ClassReflector::buildDefaultReflector()
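// In ClassReflector:
public static function buildDefaultReflector()
{
    return new self(new AggregateSourceLocator([
        new PhpInternalSourceLocator(),
        new EvaledCodeSourceLocator(),
        new AutoloadSourceLocator(),
    ]));
}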
@asgrim
Given a class structure...
<?php
class Foo
{
    private $bar;
    public function thing()
    {
    }
}
@asgrim
… we get the AST!
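Class, name Foo
|-- Statements
|   |-- Property, name bar
|   |   |-- Type [private]
|   |   `-- Attributes [start line: 7, end line: 9]
|   `-- Method, name thing
|       |-- Type [public]
|       |-- Parameters [...]
|       |-- Statements [...]
|       `-- Attributes [start line: 7, end line: 9]
`-- Attributes [start line: 3, end line: 10]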
Any questions?
https://joind.in/talk/a51b7
James Titcumb @asgrim
Climbing the Abstract Syntax Tree (phpDay 2017)

The new Abstract Syntax Tree (AST) in PHP 7 means the way our PHP code is being executed has changed. Understanding this new fundamental compilation step is key to understanding how our code is being run.

To demonstrate, James will show how a basic compiler works and how introducing an AST simplifies this process. We’ll look into how these magical time-warp techniques* can also be used in your code to introspect, analyse and modify code in a way that was never possible before.

After seeing this talk, you'll have a great insight as to the wonders of an AST, and how it can be applied to both compilers and userland code.

(*actual magic or time-warp not guaranteed)
