PHP 7 performance
From PHP 5, in a Symfony context
Hello everybody
 Julien PAULI
 SensioLabs Blackfire team
 Programming with PHP since early 2000s
 Now : Unix system programmer (C)
 PHP Internals programmer/reviewer
 PHP 5.5 & 5.6 Release Manager
 @julienpauli
 Tech blog at jpauli.github.io
 jpauli@php.net
What we'll cover together
 Profiling a simple SF2 app
 Under PHP 5
 Under PHP 7
 Compare profiles using Blackfire graph comparison
 Analyze numbers
 Dive into PHP 7 performance
 Structure optimizations
 New variable model (zval)
 New HashTable model
 String management with zend_string
 Other ideas...
Profiles
 Done on my laptop (not on prod env)
 LP64
 Done on app_dev.php (debug mode)
 Do not take the absolute numbers at face value
 Only the relative measures matter
 Performed with Blackfire
 On PHP-5.6 latest
 On PHP-7.0.0RC8
Blackfire
 General profiler
 Not only PHP, but works best for PHP
 Free version exists
 Collects many metrics
 memory, CPU, IO, network traffic, SQL ...
 Graphs useful info, trashes useless info
 Immediately spot your perf problems
 Nice graph comparison view
Blackfire collector
 Collector is a PHP extension
 ~ 5000 C lines
 Available for 5.3, 5.4, 5.5 and 5.6
 In beta for PHP 7, but soon to be released
 In beta for Windows platforms, but soon to be released
 Collector has no impact when it is not triggered
 Collector works in production environments
 It is highly optimized for performance
 It is finely tuned for each PHP version
Hello Hangman
Which PHP?
[Figures: profiles of the app under PHP 7 and under PHP 5]
 PHP 7 is slower than PHP 5 ...
 When no OPCode cache is used!
 This is a 15% performance difference
 (Remember: these numbers only apply to this small SF2-based app)
PHP 7 changes
 PHP 7 now uses an AST-based compiler
 The PHP 7 compiler is SLOWER than PHP 5's
 But much better designed
 Creating and compiling an AST is slow
 The AST is hookable from PHP extensions
 The AST is exposed to userland by nikic/php-ast
 The compiler is more complex: it tries to optimize runtime
 ... But since you use an OPCode cache
 This is not a problem for you
 The PHP 7 compiler generates better runtime OPCodes
 Your runtime will be faster than under PHP 5
Profiling with OPCache
[Figures: profiles of the app under PHP 7 and under PHP 5]
Comparing view, with OPCache
 PHP 7 runs this app 23% faster
 PHP 7 CPU usage is 22% lower than PHP 5's
 PHP 7 memory footprint is 38% smaller than PHP 5's
 ~ 3.85 MB less in our case
 Some components benefit more than others from PHP 7 performance optimizations
PHP 7 optimizations
Optimizing CPU time
 Latency Numbers Every Programmer Should Know
 http://lwn.net/Articles/250967/
 http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
2016 numbers (may vary with chip)
------------------------------------------------------------------
L1 cache reference                 1 ns
Branch mispredict                  3 ns
L2 cache reference                 4 ns   4x L1 cache
L3 cache reference                12 ns   3x L2 cache, 12x L1 cache
Main memory reference            100 ns   25x L2 cache, 100x L1 cache
SSD random read               16,000 ns
HDD random read (seek)   200,000,000 ns
Optimizing CPU cache efficiency
 If we can reduce payload size, the CPU will hit its caches more often
 CPU caches prefetch data on a "line" basis
 Improve data locality to improve cache efficiency
 https://software.intel.com/en-us/articles/optimize-data-structures-and-memory-access-patterns-to-improve-data-locality
 In C, that means
 Reduce the number of pointer indirections
 Stick data together (struct hacks, struct merges)
 Use smaller data sizes
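A minimal C sketch of the indirection point. The types and function names (`boxed_item`, `sum_boxed`, `sum_flat`, `layouts_agree`) are hypothetical and are not PHP's real structures. Both functions compute the same sum, but the first one follows a pointer for every element while the second one walks values stored back to back, which is the access pattern a cache line prefetch favors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Layout A: an array of pointers, each value allocated on its own.
 * Every read costs one extra indirection, and the values may end up
 * scattered across the heap. */
typedef struct { int64_t value; } boxed_item;

int64_t sum_boxed(boxed_item *const *items, size_t n)
{
    assert(items != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += items[i]->value;      /* pointer chase per element */
    }
    return total;
}

/* Layout B: values stored inline and contiguously. Eight int64_t
 * values fit in one 64-byte cache line. */
int64_t sum_flat(const int64_t *values, size_t n)
{
    assert(values != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];            /* no indirection */
    }
    return total;
}

/* Builds both layouts for the values 1..n and checks that the two
 * sums agree. Returns 1 on success, 0 on mismatch or allocation
 * failure. */
int layouts_agree(size_t n)
{
    boxed_item **boxed = calloc(n, sizeof *boxed);
    int64_t *flat = calloc(n, sizeof *flat);
    int ok = (boxed != NULL && flat != NULL);

    for (size_t i = 0; ok && i < n; i++) {
        boxed[i] = malloc(sizeof **boxed);
        if (boxed[i] == NULL) {
            ok = 0;
            break;
        }
        boxed[i]->value = (int64_t)i + 1;
        flat[i] = (int64_t)i + 1;
    }
    if (ok) {
        ok = (sum_boxed(boxed, n) == sum_flat(flat, n));
    }
    if (boxed != NULL) {
        for (size_t i = 0; i < n; i++) {
            free(boxed[i]);            /* free(NULL) is a no-op */
        }
    }
    free(boxed);
    free(flat);
    return ok;
}
```

Calling `layouts_agree(1000)` only checks that the two layouts are equivalent. To see the cache effect you would have to time both sums over a large `n`, and the result depends on the allocator and the chip.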
Optimizing CPU cache efficiency
PHP 5.6 (debug)
     456,483483  task-clock                #  0,974 CPUs utilized
          1 405  context-switches          #  0,003 M/sec
              7  CPU-migrations            #  0,000 M/sec
          8 633  page-faults               #  0,019 M/sec
  1 163 771 607  cycles                    #  2,549 GHz
<not supported>  stalled-cycles-frontend
<not supported>  stalled-cycles-backend
  1 247 617 395  instructions              #  1,07 insns per cycle
    181 700 375  branches                  #  398,044 M/sec
      5 257 940  branch-misses             #  2,89% of all branches
      9 085 235  cache-references          #  20,787 M/sec
      1 108 044  cache-misses              #  12,196 % of all cache refs
    0,468451813  seconds time elapsed
Optimizing CPU cache efficiency
PHP 7.0.0RC8 (debug)
     306,006739  task-clock                #  0,916 CPUs utilized
          1 446  context-switches          #  0,005 M/sec
              2  CPU-migrations            #  0,000 M/sec
          4 330  page-faults               #  0,014 M/sec
    787 684 146  cycles                    #  2,574 GHz
<not supported>  stalled-cycles-frontend
<not supported>  stalled-cycles-backend
    817 673 456  instructions              #  1,04 insns per cycle
    121 452 445  branches                  #  396,895 M/sec
      3 356 650  branch-misses             #  2,76% of all branches
      5 741 559  cache-references          #  18,464 M/sec
        873 581  cache-misses              #  15,215 % of all cache refs
    0,334226815  seconds time elapsed
PHP 7 cache efficiency
                      PHP 7.0.0RC8 (debug)   PHP 5.6 (debug)
task-clock                      306,006739        456,483483
context-switches                     1 446             1 405
page-faults                          4 330             8 633
cycles                         787 684 146     1 163 771 607
instructions                   817 673 456     1 247 617 395
branches                       121 452 445       181 700 375
branch-misses                    3 356 650         5 257 940
cache-references                 5 741 559         9 085 235
cache-misses                       873 581         1 108 044
seconds time elapsed           0,334226815       0,468451813
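To read these counters as relative measures, here is a small C sketch that turns four of them into percentages. The struct and function names (`perf_counters`, `percent_reduction`, `miss_rate_percent`) are made up for this illustration; the values are copied from the two runs above. With these inputs, PHP 7 comes out at roughly 33% less task-clock, 34% fewer instructions, 37% fewer cache references and 21% fewer cache misses. The miss rate itself rises (about 12.2% to 15.2%) because references drop faster than misses do.

```c
#include <assert.h>

/* A subset of the perf counters shown on the slides. */
typedef struct {
    double task_clock_ms;
    double instructions;
    double cache_references;
    double cache_misses;
} perf_counters;

/* Values copied from the PHP 5.6 (debug) run. */
const perf_counters php56 = {
    456.483483, 1247617395.0, 9085235.0, 1108044.0
};

/* Values copied from the PHP 7.0.0RC8 (debug) run. */
const perf_counters php70 = {
    306.006739, 817673456.0, 5741559.0, 873581.0
};

/* Percentage by which `after` is lower than `before`. */
double percent_reduction(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

/* Cache misses as a percentage of cache references, i.e. the figure
 * perf prints as "% of all cache refs". */
double miss_rate_percent(const perf_counters *c)
{
    assert(c != 0 && c->cache_references > 0.0);
    return c->cache_misses / c->cache_references * 100.0;
}
```

For example, `percent_reduction(php56.instructions, php70.instructions)` gives the relative drop in executed instructions. As the slides say, these figures describe one small app on one laptop.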
PHP 7 optimizations
 Every variable in PHP is stored in a zval struct
 This struct has been reorganized in PHP 7
 Narrowed / shrunk
 Separated
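PHP 5 variables
[Diagram: the symbol table (HashTable) entry for $a holds a zval * (8 bytes), which points to a heap-allocated zval (32 bytes), which may in turn point to a complex value (XX bytes)]
zval fields: value, refcount, is_ref, type, gc_info
zval_value union: lval, dval, str_val* + str_len, hashtable*, object*, ast*, ...
 40 bytes + complex value size
 2 indirections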
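PHP 7 variables
[Diagram: the symbol table (HashTable) entry for $a embeds the zval itself (16 bytes), which may point to a complex value (XX bytes)]
zval fields: value, type, internal_int
zval_value union: lval, dval, zend_string*, object*, hashtable*, ...
Complex values carry their own refcount, infos/flags and gc_infos
 16 bytes + complex value size
 1 indirection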
PHP 5 vs PHP 7 variable design
 The zval container no longer stores GC info
 No more need to heap-allocate a zval
 GC info is stored in each complex type
 Each complex type may now be shared
 In PHP 5, we had to share the zval containing it
 PHP 7 variables are more CPU cache efficient
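The size difference can be sketched with two simplified C structs. These are not the real Zend definitions: the type and field names (`php5_zval`, `php7_zval`, `php7_refcounted` and so on) are assumptions for this illustration, and the sizes in the comments hold on an LP64 platform only, which is the platform the slides measure on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- PHP 5 style, simplified ---- */
typedef union {
    long   lval;
    double dval;
    struct { char *val; int len; } str;    /* bytes live in another block */
    void  *ht;                             /* stands in for HashTable * */
    struct { unsigned int handle; const void *handlers; } obj;
} php5_value;                              /* 16 bytes on LP64 */

typedef struct {
    php5_value    value;
    unsigned int  refcount;                /* every zval is refcounted */
    unsigned char type;
    unsigned char is_ref;
} php5_zval;                               /* 24 bytes on LP64 */

typedef struct {
    php5_zval z;
    void     *gc_info;                     /* GC bookkeeping kept with the zval */
} php5_zval_gc;                            /* 32 bytes on LP64 */

/* ---- PHP 7 style, simplified ---- */
typedef struct {
    uint32_t refcount;                     /* lives in the complex value itself */
    uint32_t type_info;                    /* type, flags and GC info */
} php7_refcounted;

typedef union {
    int64_t          lval;
    double           dval;
    php7_refcounted *counted;              /* string, array, object, ... */
} php7_value;                              /* 8 bytes */

typedef struct {
    php7_value value;
    uint32_t   type_info;
    uint32_t   extra;                      /* the slide's "internal_int" */
} php7_zval;                               /* 16 bytes */

/* Bytes spent per variable before any complex value is counted:
 * PHP 5 needs the pointer slot plus the heap-allocated zval. */
size_t php5_bytes_per_variable(void)
{
    return sizeof(php5_zval *) + sizeof(php5_zval_gc);
}

/* PHP 7 embeds the zval directly in the slot. */
size_t php7_bytes_per_variable(void)
{
    return sizeof(php7_zval);
}

/* Checks the sizes quoted on the slides. Aborts on platforms that are
 * not LP64, where the numbers differ. */
void check_lp64_sizes(void)
{
    assert(sizeof(php5_zval_gc) == 32);
    assert(sizeof(php7_zval) == 16);
}
```

On LP64 the two helpers return 40 and 16, matching the "40 bytes + complex value size" and "16 bytes + complex value size" figures above.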
Hashtables (arrays)
 In PHP, HashTables are used to represent the PHP array type
 But HashTables are also used internally
 Everywhere
 HashTable optimizations in PHP 7 have a big impact, as HashTables are heavily used internally
HashTables in PHP 5
 Each element needs
 4 pointer indirections
 72 bytes for a bucket + 32 bytes for a zval
[Diagram: $a, zval *, zval, HashTable *, HashTable, bucket *, bucket (72 bytes), zval *, zval; a further size label reads 64 bytes]
HashTables in PHP 7
 Each element needs
 2 pointer indirections
 32 bytes for a bucket
[Diagram: $a, zval, HashTable *, HashTable (56 bytes), bucket *, bucket (32 bytes) with its embedded zval]
PHP 7 Hash
 Memory layout is as contiguous as possible
[Diagram: hash("foo") = 1234, then 1234 | (-table_size) = -3, which is nIndex]
[Diagram: arData points into a single memory block. The hash slots sit at negative offsets (-1, -2, -3, -4, ...) before it and the buckets follow it. The slot at nIndex holds idx, the position of the bucket in arData. Each bucket holds hash, key and zval]
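PHP 7 HashTables memory layout
 $a['foo'] = 42;
[Diagram: hash slots at negative offsets from arData (-6, -7, ...); buckets at arData[0], arData[1], ...; the bucket for 'foo' holds the value 42]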
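The layout can be sketched as a tiny C hash table: one allocation, hash slots at negative offsets from `ar_data`, buckets at positive offsets. Everything named `mini_*`, plus `slot_index` and `SLOT_EMPTY`, is hypothetical and much simpler than the real engine (no resize, no deletion, no duplicate check, and the hash is passed in by the caller). With a table size of 8, `1234 | (-8)` evaluates to -6; the -3 on the earlier slide is best read as illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SLOT_EMPTY UINT32_MAX          /* marker for "no bucket in this slot" */

typedef struct {
    int64_t  lval;
    uint32_t type;
    uint32_t next;                     /* next bucket index in the collision chain */
} mini_zval;                           /* 16 bytes */

typedef struct {
    mini_zval   val;                   /* value embedded in the bucket */
    uint64_t    h;
    const char *key;                   /* borrowed pointer in this sketch */
} mini_bucket;                         /* 32 bytes on LP64 */

typedef struct {
    uint32_t     table_size;           /* power of two, at least 8 */
    uint32_t     used;
    mini_bucket *ar_data;              /* points just past the hash slots */
} mini_hash;

/* Maps a hash to a negative slot index in [-table_size, -1]. */
int32_t slot_index(uint64_t h, uint32_t table_size)
{
    uint32_t mask = (uint32_t)0 - table_size;      /* 8 -> 0xFFFFFFF8 */
    return (int32_t)((uint32_t)h | mask);
}

int mini_hash_init(mini_hash *ht, uint32_t table_size)
{
    assert(table_size >= 8 && (table_size & (table_size - 1)) == 0);
    size_t slots_bytes = (size_t)table_size * sizeof(uint32_t);
    char *block = malloc(slots_bytes + (size_t)table_size * sizeof(mini_bucket));
    if (block == NULL) {
        return -1;
    }
    memset(block, 0xFF, slots_bytes);              /* all slots = SLOT_EMPTY */
    ht->table_size = table_size;
    ht->used = 0;
    ht->ar_data = (mini_bucket *)(block + slots_bytes);
    return 0;
}

void mini_hash_free(mini_hash *ht)
{
    free((char *)ht->ar_data - (size_t)ht->table_size * sizeof(uint32_t));
}

/* Appends a bucket and links it at the head of its slot's chain. */
int mini_hash_add(mini_hash *ht, const char *key, uint64_t h, int64_t lval)
{
    if (ht->used == ht->table_size) {
        return -1;                                 /* full, no resize here */
    }
    uint32_t *slot = &((uint32_t *)ht->ar_data)[slot_index(h, ht->table_size)];
    uint32_t pos = ht->used++;
    mini_bucket *b = &ht->ar_data[pos];
    b->h = h;
    b->key = key;
    b->val.lval = lval;
    b->val.type = 0;
    b->val.next = *slot;                           /* old chain head, or SLOT_EMPTY */
    *slot = pos;
    return 0;
}

const mini_bucket *mini_hash_find(const mini_hash *ht, const char *key, uint64_t h)
{
    uint32_t pos = ((const uint32_t *)ht->ar_data)[slot_index(h, ht->table_size)];
    while (pos != SLOT_EMPTY) {
        const mini_bucket *b = &ht->ar_data[pos];
        if (b->h == h && strcmp(b->key, key) == 0) {
            return b;
        }
        pos = b->val.next;
    }
    return NULL;
}
```

A lookup reads one slot and then walks buckets that sit next to each other in the same block, which is where the "as contiguous as possible" claim comes from. Storing the chain link inside the value's spare 32-bit field keeps the bucket at 32 bytes.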
String management
 In PHP 5, strings don't have their own structure
 String management is hard
 Leads to many string duplications
 And many more memory accesses
 In PHP 7, strings share the zend_string structure
 They are refcounted
 Hashes are precomputed, often at compile time
 The struct hack is used to compact memory
Strings in PHP
PHP 5: zval { char *str; int len; refcount; is_ref; gc_infos; ... }
PHP 7: zval { zend_string *; ... } pointing to zend_string { gc_infos; hash; size_t len; char str[1]; ... }
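Strings in PHP 5
 $a = "foo";
[Diagram: the zval stores the length (3) and a pointer (0X00007FFFF7F8C708) to a separate buffer holding "foo\0"]
Strings in PHP 7
 $a = "foo";
[Diagram: the zend_string stores the length (3) and, via the C struct hack, the bytes "foo\0" inline at the end of the same allocation, up to the memory border]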
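A minimal C sketch of a refcounted string built with the struct hack, written here with a C99 flexible array member instead of the slide's `char str[1]`. The names (`mini_string`, `times33_hash` and the `mini_string_*` functions) are hypothetical. The hash is a DJB-style "times 33" function used as a stand-in, and it is computed eagerly at creation, which is an assumption of this sketch and not a claim about when the engine computes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t refcount;
    uint32_t type_info;    /* the slide's "gc_infos" */
    uint64_t hash;         /* cached, so lookups do not rehash the bytes */
    size_t   len;
    char     val[];        /* bytes stored inline, right after the header */
} mini_string;

/* DJB-style "times 33" hash over len bytes. */
uint64_t times33_hash(const char *s, size_t len)
{
    uint64_t h = 5381;
    for (size_t i = 0; i < len; i++) {
        h = h * 33 + (unsigned char)s[i];
    }
    return h;
}

/* One allocation holds header and bytes: a single pointer reaches both. */
mini_string *mini_string_new(const char *s, size_t len)
{
    assert(s != NULL);
    mini_string *str = malloc(offsetof(mini_string, val) + len + 1);
    if (str == NULL) {
        return NULL;
    }
    str->refcount = 1;
    str->type_info = 0;
    str->hash = times33_hash(s, len);
    str->len = len;
    memcpy(str->val, s, len);
    str->val[len] = '\0';
    return str;
}

/* Sharing a string bumps the counter; no bytes are duplicated. */
mini_string *mini_string_copy(mini_string *str)
{
    assert(str != NULL);
    str->refcount++;
    return str;
}

/* Drops one reference and frees the block when the last one goes. */
void mini_string_release(mini_string *str)
{
    assert(str != NULL && str->refcount > 0);
    if (--str->refcount == 0) {
        free(str);
    }
}
```

Compared with the PHP 5 picture, where the zval holds a length and a pointer to a separate buffer, reading the bytes here costs no extra pointer hop once you hold the `mini_string *`, and two variables can share the same string by holding the same pointer.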
Comparing VM times for array operations
[Figure: PHP 7 vs PHP 5]
Other VM operation comparisons
 Casts
[Figure: PHP 7 vs PHP 5]
 Concatenations
[Figure: PHP 7 vs PHP 5]
 @ usage (error suppression)
[Figure: PHP 7 vs PHP 5]
PHP 7 is released :-)
get it now <3
Thank you for listening
