PHP’s Guts 
Why you should know how PHP works
Architecture 
Yo I heard you like modules in your modules so we gave you three types!
Engine 
 Lexer, Parser, Compiler, Executor
Core 
 Streams 
 Request Management 
 Variables 
 Network
SAPIs
 Server API 
 CLI, CGI, mod_php, phpdbg, embed 
 input args 
 output, flushing, file descriptors, interruptions, system user info 
 input filtering and optionally headers, post data, http specific stuff
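A quick way to see which SAPI is serving the current script, as a minimal sketch. php_sapi_name() and the PHP_SAPI constant are standard; the helper name is_command_line() is made up for illustration.

```php
<?php
// Hypothetical helper: true when the script runs under a command-line SAPI.
function is_command_line(): bool
{
    return in_array(PHP_SAPI, ['cli', 'phpdbg'], true);
}

// php_sapi_name() and PHP_SAPI report the same value,
// for example "cli", "fpm-fcgi" or "apache2handler".
printf(
    "SAPI: %s (command line: %s)\n",
    php_sapi_name(),
    is_command_line() ? 'yes' : 'no'
);
```

Checking the SAPI at runtime is mostly useful for code that has to behave differently in a worker script than in a web request.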
Extensions
 Talk to a C library
 do stuff faster than PHP
 make the engine funny
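To see how much of "PHP" is really extensions, you can list what your build has loaded. A small sketch using only standard functions; the variable names are illustrative.

```php
<?php
// Regular extensions compiled in or loaded for this build.
$regular = get_loaded_extensions();

// Zend extensions: the kind that hook the engine itself
// (Zend OPcache or Xdebug, when they are installed).
$zend = get_loaded_extensions(true);

printf("%d regular extensions, %d zend extensions\n", count($regular), count($zend));

// Even "Core" and "standard" are reported as extensions.
var_dump(extension_loaded('standard'));
```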
Lifecycle
 MINIT
 RINIT
 GINIT
GINIT → MINIT → RINIT → SCRIPT → RSHUTDOWN → MSHUTDOWN → GSHUTDOWN
Who Cares? 
 Pick the right SAPI 
 Fewer extensions = better 
 Static extensions = better 
 Lifecycle is important for sharing stuff 
 Newer PHP = better faster stronger
turn on the “go_fast” ini setting 
Thread, fork, async, very wow
Threading
 Thread safe != reentrant
 Thread safe != parallel
 Thread safe != async
 Thread safe != concurrent
 Thread safe == two threads running at the same time won’t stomp on each other’s data
 yes really, that’s all it means
Reentrant 
 Let’s quit this, and run it again 
 and it will be like we never ran it
Async 
 I’m gonna work on this stuff 
 But I’m not going to block you if you have important stuff to do
Parallel … Concurrent 
 Concurrent – two things at the same time that need communication 
 Parallel – two things at the same time
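The eggs-and-pancakes picture from the speaker notes can be sketched with generators: two tasks take turns on one thread, so they are concurrent but never parallel. This is an illustration only; task() and run_interleaved() are made-up names, not part of any library.

```php
<?php
// Each task yields after every step, handing control back to the scheduler.
function task(string $name, int $steps): Generator
{
    for ($i = 1; $i <= $steps; $i++) {
        yield "$name step $i";
    }
}

// Round-robin scheduler: one thread, tasks interleaved, nothing runs in parallel.
function run_interleaved(array $tasks): array
{
    $log = [];
    while ($tasks) {
        foreach ($tasks as $key => $task) {
            if (!$task->valid()) {
                unset($tasks[$key]); // this task is finished
                continue;
            }
            $log[] = $task->current();
            $task->next();
        }
    }
    return $log;
}

$log = run_interleaved([task('eggs', 2), task('pancakes', 3)]);
echo implode("\n", $log), PHP_EOL;
```

The log alternates between the two tasks until the eggs are done, then the pancakes finish alone. Real parallelism needs a second thread or process doing work at the same instant.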
TSRM 
 Thread safe resource manager 
 global data in extensions 
 making some C re-entrant 
 thread safety
Why do I care?
 react-php (async event loop)
 pecl event (async)
 pthreads (concurrent)
 pcntl (fork and pray)
 proc_open/popen (subprocessing)
 queues and jobs and workers
 native TLS RFC
Welcome to the Engine 
Lexers and Parsers and Opcodes OH MY!
Lexer 
 checks PHP’s spelling 
 turns into tokens 
 see token_get_all for what PHP sees
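A minimal sketch of what the lexer hands to the parser, using token_get_all() and token_name() from the tokenizer extension. The helper name dump_tokens() is made up for illustration.

```php
<?php
// Return the token names the lexer produces for a piece of source code.
function dump_tokens(string $source): array
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array form: [token id, matched text, line number].
            $names[] = token_name($token[0]);
        } else {
            // Single-character tokens such as "=" or ";" come back as plain strings.
            $names[] = $token;
        }
    }
    return $names;
}

$tokens = dump_tokens('<?php $a = 1;');
echo implode(' ', $tokens), PHP_EOL;
// T_OPEN_TAG T_VARIABLE T_WHITESPACE = T_WHITESPACE T_LNUMBER ;
```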
Parser
 checks PHP’s grammar
 E_PARSE means “bad phpish”
 creates opcodes directly (before PHP 7) or an AST (PHP 7+)
Compiler 
 Only with AST 
 Turns AST into Opcodes 
 Allows for fancier grammar
Opcodes
 dump with VLD: http://derickrethans.nl/projects.html
 machine readable language the runtime understands
Opcache (and AST)
 cache opcodes – skip lexing and parsing
 https://support.cloud.engineyard.com/entries/26902267-PHP-Performance-I-Everything-You-Need-to-Know-About-OpCode-Caches
 https://wiki.php.net/rfc/abstract_syntax_tree
Engine (Virtual Machine) 
 reads opcode 
 does something 
 ??? 
 PROFIT
Why do I care? 
 Use an opcode cache 
 If you don’t, you’re crazy, stupid, or lazy 
 Upgrade to get cooler stuff
Variables 
PHP is a C types wrapper
Zvals
/* PHP 5 */
typedef union _zvalue_value {
    long lval;               /* integers and booleans */
    double dval;             /* floats */
    struct {
        char *val;
        int len;
    } str;                   /* strings: pointer plus length */
    HashTable *ht;           /* arrays */
    zend_object_value obj;   /* objects */
} zvalue_value;

/* PHP 7 */
typedef union _zend_value {
    zend_long lval;          /* integers */
    double dval;             /* floats */
    zend_refcounted *counted;
    zend_string *str;
    zend_array *arr;
    zend_object *obj;
    zend_resource *res;
    zend_reference *ref;
    zend_ast_ref *ast;
    zval *zv;
    void *ptr;
    zend_class_entry *ce;
    zend_function *func;
} zend_value;
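From userland, the type tag that travels with each value can be watched with gettype(). A small sketch; the variable names are illustrative.

```php
<?php
// One variable, several values: the zval's type tag changes with each assignment.
$value = 42;
$types = [gettype($value)];   // "integer"

$value = 4.2;
$types[] = gettype($value);   // "double": floats are C doubles underneath

$value = 'forty-two';
$types[] = gettype($value);   // "string"

$value = [4, 2];
$types[] = gettype($value);   // "array"

echo implode(', ', $types), PHP_EOL;
```

Note that gettype() reports "double" rather than "float", a leftover of the C type underneath.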
Numbers 
 Booleans are unsigned char 
 Integers are really signed long integers 
 Longs are platform dependent 
 Floats and doubles are doubles not floats
64 Bit Madness
 LLP64 (Windows)
 short = 16
 int = 32
 long = 32
 long long = 64
 pointer = 64
 LP64 (Unices)
 short = 16
 int = 32
 long = 64
 long long = 64
 pointer = 64
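A quick check of what your own build uses for integers and floats, as a sketch. PHP_INT_SIZE, PHP_INT_MAX and PHP_FLOAT_EPSILON are standard constants (the last one needs PHP 7.2+); the variable names are illustrative.

```php
<?php
// PHP_INT_SIZE is the width of PHP's integer type in bytes on this build.
printf("int size: %d bytes, max: %d\n", PHP_INT_SIZE, PHP_INT_MAX);

// Going past PHP_INT_MAX does not wrap around: the result quietly becomes a float.
$overflow = PHP_INT_MAX + 1;
var_dump(is_float($overflow));                    // bool(true)

// Floats are C doubles, so the usual binary rounding applies.
$sum = 0.1 + 0.2;
var_dump($sum === 0.3);                           // bool(false)
var_dump(abs($sum - 0.3) < PHP_FLOAT_EPSILON);    // bool(true)
```

Compare floats with a tolerance, and reach for GMP or BC Math when you need exact arithmetic.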
Strings
 char *
 Translated to what we see by an algorithm
 ASCII, UTF-8, binary – EVERYTHING has a codepage
 wchar? screw you
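Since a PHP string is a run of bytes plus a length, strlen() counts bytes, not characters. A minimal sketch; the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// "\u{00E9}" is é, which UTF-8 encodes as two bytes.
$word = "caf\u{00E9}";
$byteLength = strlen($word);       // counts bytes: 5, not 4

// The length is stored explicitly, so an embedded NUL byte does not end the string.
$binary = "a\0b";
$binaryLength = strlen($binary);   // 3

echo $byteLength, ' ', $binaryLength, PHP_EOL;   // prints "5 3"
echo bin2hex($word), PHP_EOL;                    // prints "636166c3a9"
```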
Arrays 
 they’re not 
 hashtables 
 and doubly linked lists
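Two things give the hashtable away from userland: iteration follows insertion order rather than key order, and numeric-looking string keys are turned into integers. A small sketch with illustrative variable names:

```php
<?php
$items = [];
$items[10] = 'ten';
$items['name'] = 'php';
$items[2] = 'two';

// Iteration order is insertion order, not sorted key order.
$keys = array_keys($items);        // [10, 'name', 2]

// The string key '1' and the integer key 1 are the same slot.
$lookup = ['1' => 'x'];
$lookup[1] = 'y';                  // overwrites instead of adding
$lookupKeys = array_keys($lookup); // [1], an integer key

var_dump($keys, $lookupKeys);
```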
Resources 
 stores random opaque C data 
 in a giant list of doom 
 sigh
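Resources are entries in one list, identified by a number. A sketch using in-memory streams so it touches no files; the variable names are illustrative.

```php
<?php
$first = fopen('php://memory', 'r+');
$second = fopen('php://memory', 'r+');

$type = get_resource_type($first);   // "stream"

// Casting a resource to int exposes its slot number in the resource list.
$firstId = (int) $first;
$secondId = (int) $second;

fclose($first);
// The variable still exists, but the thing it pointed to is gone.
$stillResource = is_resource($first); // false

printf("%s #%d and #%d, first still open: %s\n",
    $type, $firstId, $secondId, $stillResource ? 'yes' : 'no');
fclose($second);
```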
Objects 
 handlers 
 property tables 
 magic storage
Why do I care? 
 Know the limitations of your data types 
 Remember that arrays aren’t arrays 
 Beware of many many resources 
 Beware of many many objects 
 64 bit can be broken in strange ways
C Moar 
Implementation WTFeries and other fun
Stack? Heap? 
 Stack = scratch space for thread of execution 
 can overflow! 
 slightly faster 
 size determined at thread start 
 Heap = space for dynamic allocation 
 managed by program 
 can fragment 
 leaky!
Zend Memory Manager 
 Internal Heap Allocator 
 frees yo memory (leak management) 
 preallocates blocks in set sizes that PHP uses 
 caches allocations to avoid fragmentation 
 allows monitoring of memory usage
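The monitoring part is visible from userland. A minimal sketch comparing what the script is using with what the memory manager has reserved from the system; the variable names are illustrative.

```php
<?php
$used = memory_get_usage(false);      // bytes currently handed out to the script
$reserved = memory_get_usage(true);   // bytes the memory manager has taken from the system
$peak = memory_get_peak_usage(false); // high-water mark of $used so far

printf("used: %d, reserved: %d, peak: %d\n", $used, $reserved, $peak);
```

The reserved figure is normally larger than the used one because memory is requested from the system in big blocks and carved up internally.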
COW (not moo)
 Copy On Write
 1 zval, many variables
 each variable increases refcount
 destroy when refcount hits 0
 Oh no, a change! copy
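Copy on write can be watched through memory usage: the assignment is nearly free, and the first write pays for the copy. A rough sketch; measure_cow() is a made-up name and the exact byte counts depend on the PHP version.

```php
<?php
// Returns [bytes used by the assignment, bytes used by the first write].
function measure_cow(): array
{
    $original = range(1, 200000);

    $before = memory_get_usage();
    $copy = $original;                // no data copied: two names, one array
    $afterAssign = memory_get_usage();

    $copy[] = 0;                      // first write separates the two arrays
    $afterWrite = memory_get_usage();

    return [$afterAssign - $before, $afterWrite - $afterAssign];
}

[$assignCost, $writeCost] = measure_cow();
printf("assignment: %d bytes, first write: %d bytes\n", $assignCost, $writeCost);
```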
Refcounts, GC, and PHPNG 
 Sometimes you have a refcount but no var to reference it 
 This is a circular reference, this sucks (ask doctrine) 
 GC checks for this periodically and cleans up 
 PHPNG
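A circular reference in its smallest form, as a sketch: an object that points at itself survives unset() until the cycle collector runs. gc_enable() and gc_collect_cycles() are standard; the variable names are illustrative.

```php
<?php
gc_enable();
gc_collect_cycles();        // start from a clean slate

$node = new stdClass();
$node->self = $node;        // the object now references itself
unset($node);               // no variable reaches it, yet its refcount is not 0

// Ask the cycle collector to run now instead of waiting for it.
$collected = gc_collect_cycles();
printf("collected %d cycle(s)\n", $collected);
```

Long-running workers that build object graphs (ORM entities are the classic case) lean on this collector to keep memory flat.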
References are not Pointers 
 PHP is smarter than you are 
 access the same variable content by different names 
 using symbol table aliases 
 variable name != variable content
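"Different names, same content" in a few lines, as a sketch with illustrative variable names. Removing one name leaves the content alone.

```php
<?php
$a = 'first';
$b =& $a;              // $b is a second name for the same content

$b = 'second';         // a write through either name is seen through both
$seenThroughA = $a;    // 'second'

unset($b);             // removes the name $b only
$afterUnset = $a;      // still 'second'

var_dump($seenThroughA, $afterUnset, isset($b));
```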
Side Track – Objects are not References 
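$a = new stdClass;
$b = $a;           // copies the object handle, not the object
$a->foo = 'bar';   // same object, so the change is visible through $b
var_dump($b);      // object(stdClass) with foo => 'bar'
$a = 'baz';        // rebinds $a only
var_dump($b);      // still the object: $b was never an alias of $a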
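To make the contrast explicit, here is the same experiment with a real reference added. A sketch; $c is an extra variable introduced for illustration.

```php
<?php
$a = new stdClass;
$b = $a;           // handle copy: $b points at the same object
$c =& $a;          // reference: $c is another name for the variable $a

$a->foo = 'bar';   // one object, visible through $a, $b and $c

$a = 'baz';        // rebinding $a: the alias $c follows, the handle in $b does not

var_dump($b);      // still the stdClass object with foo => 'bar'
var_dump($c);      // string(3) "baz"
```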
Places to Learn More 
 http://www.phpinternalsbook.com 
 http://php.net 
 http://lxr.php.net 
 http://wiki.php.net 
 http://nikic.github.io/ 
 http://blog.krakjoe.ninja/
About Me 
 http://emsmith.net 
 @auroraeosrose 
 That’s Aurora Eos Rose 
 auroraeosrose@gmail.com 
 freenode in #phpmentoring #phpwomen #phpinternals


Editor's Notes

  • #2 Story of how I got into internals in the first place, and how each new discovery (extensions, SAPIs, the engine, oh look, now I can do it all) led further down the Alice rabbit hole. It also made me a better PHP programmer, because I knew all the WTFs.
  • #3 PHP gets the architecture of its system right: it’s as big as it needs to be and no bigger, but all the important components are pluggable and extendable, which makes it awesome glue. Take a side track about learning more about programming, and how going down or up the stack is usually more valuable than going across stacks.
  • #4 We’ll talk more about this later, but this is the part that actually looks at and analyzes your source code and makes it talk to your CPU and run. Yes, this isn’t really different from, say, C# or Java; the difference is WHAT it compiles to. C compiles to machine code your system can use immediately. C# goes to MSIL, which runs on its own runtime. Java goes to bytecode that runs on the Java VM. Smarty compiles a template to a (horrible) PHP file.
  • #5 The core functionality of PHP (in main) is kind of a mishmash, but generally it’s I/O. The PHP manual lies, though: if you look up “core” functionality, what is actually IN core is not nearly so much. What you’re seeing instead is extensions you can’t “turn off”. Well, you can if you’re nuts. Try it sometime; PHP is really boring without its “standard lib”.
  • #6 SAPIs provide the glue for interfacing PHP into an application. They define the ways in which data is passed between an application and PHP. This is really what sets PHP apart, and it can also make or break you if you choose poorly. A lot of SAPI choice is dictated by server choice, although most have gone to FastCGI at this point; it’s an old protocol, but it works well and is stable and “shared nothing”. Python, for example, invented its own interface (WSGI), which requires a separate server that speaks something your web server actually speaks (FastCGI, SCGI) instead of using a pluggable model. There are those who whine that PHP doesn’t have this “middleware”, but it’s easily doable: you could write a dedicated SAPI or just use the embed SAPI. SAPIs take care of setting up the interpreter context and, optionally, dealing with headers and input args (a SAPI doesn’t need to do any of this; it’s entirely optional if you want your PHP code itself to do that work). We could use more SAPIs! People run away from writing these, which is sad. I recruit: I have a list of ones I’d love to mentor you into writing.
  • #7 Almost everything is an extension. There are two types: regular and “zend” extensions. Zend extensions can “hook” engine behavior using opcodes. 99.9% of the functionality you use comes from extensions.
  • #8 Threading makes this a little weird: MINIT is not run in new threads, so GINIT is called right before RINIT if ZTS is on (annoying).
  • #9 So why is it important to know this stuff? You need to know what extensions are available, and why you would compile your own PHP with all static extensions. Sharing can be limited when requests aren’t shared.
  • #10 This is going to annoy some people, because they go on and on about how PHP is “thread safe”, but really it’s not. It’s kind of almost able to be threaded, when compiled right, most of the time. Others blame things on libraries. No: there are some very bad things in core that totally prevent this.
  • #11 Ah, “thread safety”.
  • #14 Parallelism is the act of taking a large job, splitting it up into smaller ones, and doing them at once. People often use "parallel" and "concurrent" interchangeably, but there is a subtle difference. Concurrency is necessary for parallelism but not the other way around. If I alternate between cooking eggs and pancakes I'm doing both concurrently. If I'm cooking eggs while you are cooking pancakes, we are cooking concurrently and in parallel. Technically if I'm cooking eggs and you are mowing the lawn we are also working in parallel, but since no coordination is needed in that case there's nothing to talk about.
  • #16 Talk about what each of these means.
  • #18 Lexical analysis: converts the source from a sequence of characters into a sequence of tokens.
  • #19 Syntax analysis: analyzes a sequence of tokens to determine their grammatical structure.
  • #20 5.6 and 7+
  • #21 Generate bytecode based on the information gathered by analyzing the source code.
  • #22 Abstract syntax tree: decouples the compiler and parser steps. Even though we compile to opcodes, it’s still a compile. Before PHP 7 we emitted opcodes directly from parsing (a bit eww); now we can do better, cooler stuff.
  • #23 So Zend is actually a “virtual machine”. It interprets OPCODES and does stuff with them: it reads each opcode and performs a specific action, like a giant state machine.
  • #25 Underlying it all, PHP just has some basic types.
  • #26 Every zval stores some value and the type this value has. A union defines multiple members of different types, but only one of them can ever be used at a time. Unions store all their members at the same memory location and just interpret the value located there differently depending on which member you access. The size of the union is the size of its largest member. So why do we care that this is how PHP stores stuff? At the end of the day, those are actual C types underneath that we’re just “mapping” to, with many of the conversion rules that go along with them.
  • #27 char = smallest addressable unit of the machine. float = IEEE 754 single-precision binary floating-point format. double = IEEE 754 double-precision binary floating-point format. That is more precision, but remember to use GMP for real math.
  • #28 The disadvantage of the LP64 model is that storing a long into an int may overflow. Converting a pointer to a long will “work” in LP64, which is useful for the lazy. LLP64 is generally the “safer” route: you can’t safely convert a pointer to a long (WTF WOULD YOU?) and a long to an int won’t overflow (BC considerations), so it was the logical choice for Windows. What this broke in PHP: handling of strings >= 2^31, handling of 64-bit integers, large file support, handling of numeric 64-bit hash keys. Fixed in PHP 7.
  • #29 Null-terminated array of chars. Another way of accessing a contiguous chunk of memory, instead of with an array, is with a pointer; the character array containing the string must already exist (having been either statically or dynamically allocated). C was developed in an environment where the dominant character set was 7-bit ASCII, so the 8-bit byte has been the most common unit of encoding ever since. However, software developed for an international audience has to be able to represent different characters, for example the Indian, Chinese, and Japanese writing systems. The inconvenience of handling such varied multibyte characters can be eliminated by using characters that are simply a uniform number of bytes. ANSI C provides a type that allows manipulation of variable-width characters as uniform-sized data objects, called the wide character (wchar_t). Story of Microsoft’s early adoption, UTF-8 on a napkin.
  • #30 Arrays in C are just regions of memory that can be accessed by offset, and offsets must be consecutive integers. A complex key becomes an integer via a hash function. For hash collisions, PHP stores all the items with the same hash in a linked list.
  • #31 The advantage of resources is that they’re smaller than objects in PHP. The disadvantages are numerous: they’re slow (depending on what you’re doing, much slower than objects) and they’re limited. You can literally run out of resources, and they get shoved into a giant list in the executor. No, seriously, this is why they suck. Sad panda.
  • #32 Like resources, these are stored in executor globals. Like resources, you might eventually run out some day, but you can sure do a lot more with them: store opaque data, deal with opaque data, etc.
  • #35 The stack is the memory set aside as scratch space for a thread of execution. When a function is called, a block is reserved on the top of the stack for local variables and some bookkeeping data. When that function returns, the block becomes unused and can be used the next time a function is called. The stack is always reserved in a LIFO (last in first out) order; the most recently reserved block is always the next block to be freed. This makes it really simple to keep track of the stack; freeing a block from the stack is nothing more than adjusting one pointer. The heap is memory set aside for dynamic allocation. Unlike the stack, there's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns. Each thread gets a stack, while there's typically only one heap for the application (although it isn't uncommon to have multiple heaps for different types of allocation). Things get crazy when you add dynamically loaded modules: they may have their OWN heap.
  • #36 Fewer calls to malloc means less CPU usage, less kernel madness, and less fragmentation. LIBRARIES do not use it! You can turn it off, but 99.9% of the time it’s better with it on.
  • #37 The behavior is very straightforward: when a reference is added, increment the refcount; when a reference is removed, decrement it. If the refcount reaches 0, the zval is destroyed.
  • #38 All values in the existing Zend Engine implementation were allocated on the heap and were subject to reference counting and garbage collection. The Zend Engine mostly operated on pointers to zvals. PHPNG stores data in a totally different way, and even though I saw an interesting talk on it at ZendCon, I’m still wrapping my head around the code. Basically it separates scalar from non-scalar values and uses flags to carry information (type, etc.).
  • #39 Assigning values by reference when you don't need to (in order to later modify the original value through a different label) is NOT a case of you outsmarting the silly engine and gaining speed and performance. It's the opposite: it's you TRYING to outsmart the engine and failing, because the engine is already doing a better job than you think. In other words, references are basically useful for digging into internal nested arrays and for input/output parameters. That’s about it.
  • #40 Objects do behave like references, but remember that a variable assigned an object just holds a pointer to the actual value of the object, which is elsewhere; if you later assign that
  • #41 I should blog more; there’s a lot in my damn head.