Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

PHP data structures (and the impact of php 7 on them), phpDay Verona 2015, Italy

18,452 views

Published on

We all have certainly learned data structures at school: arrays, lists, sets, stacks, queues (LIFO/FIFO), heaps, associative arrays, trees, ... and what do we mostly use in PHP? The "array"! In most cases, we do everything and anything with it but we stumble upon it when profiling code. During this session, we'll learn again to use the structures appropriately, leaning closer on the way to employ arrays, the SPL and other structures from PHP extensions as well.The impact that PHP 7 should have on data structures will be introduced as well.

Published in: Technology
  • Be the first to comment

PHP data structures (and the impact of php 7 on them), phpDay Verona 2015, Italy

  1. 1. PHP Data Structures (and the impact of PHP 7 on them) Patrick Allaert phpDay Verona 2015, Italy
  2. 2. About me ● Patrick Allaert ● Founder of Libereco and co-founder of catchy.io ● Playing with PHP/Linux for +15 years ● eZ Publish core developer ● Author of the APM PHP extension ● @patrick_allaert ● patrickallaert@php.net ● http://github.com/patrickallaert/ ● http://patrickallaert.blogspot.com/
  3. 3. PHP native datatypes ● NULL ● Booleans ● Integers ● Floating point numbers ● Strings ● Arrays ● Objects ● Resources
  4. 4. Datatypes on Wikipedia ● 2-3-4 tree ● 2-3 heap ● 2-3 tree ● AA tree ● Abstract syntax tree ● (a,b)-tree ● Adaptive k-d tree ● Adjacency list ● Adjacency matrix ● AF-heap ● Alternating decision tree ● And-inverter graph ● And–or tree ● Array ● AVL tree ● Beap ● Bidirectional map ● Bin ● Binary decision diagram ● Binary heap ● Binary search tree ● Binary tree ● Binomial heap ● Bit array ● Bitboard ● Bit field ● Bitmap ● BK-tree ● Bloom filter ● Boolean ● Bounding interval hierarchy ● B sharp tree ● BSP tree ● B-tree ● B*-tree ● B+ tree ● B-trie ● Bx-tree ● Cartesian tree ● Char ● Circular buffer ● Compressed suffix array ● Container ● Control table ● Cover tree ● Ctrie ● Dancing tree ● D-ary heap ● Decision tree ● Deque ● Directed acyclic graph ● Directed graph ● Disjoint-set ● Distributed hash table ● Double ● Doubly connected edge list ● Doubly linked list ● Dynamic array ● Enfilade ● Enumerated type ● Expectiminimax tree ● Exponential tree ● Fenwick tree ● Fibonacci heap ● Finger tree ● Float ● FM-index ● Fusion tree ● Gap buffer ● Generalised suffix tree ● Graph ● Graph-structured stack ● Hash ● Hash array mapped trie ● Hashed array tree ● Hash list ● Hash table ● Hash tree ● Hash trie ● Heap ● Heightmap ● Hilbert R-tree ● Hypergraph ● Iliffe vector ● Image ● Implicit kd-tree ● Interval tree ● Int ● Judy array ● Kdb tree ● Kd-tree ● Koorde ● Leftist heap ● Lightmap ● Linear octree ● Link/cut tree ● Linked list ● Lookup table ● Map/Associative array/Dictionary ● Matrix ● Metric tree ● Minimax tree ● Min/max kd-tree ● M-tree ● Multigraph ● Multimap ● Multiset ● Octree ● Pagoda ● Pairing heap ● Parallel array ● Parse tree ● Plain old data structure ● Prefix hash tree ● Priority queue ● Propositional directed acyclic graph ● Quad-edge ● Quadtree ● Queap ● Queue ● Radix tree ● Randomized binary search tree ● Range tree ● Rapidly-exploring random tree ● Record (also called tuple or struct) ● Red-black tree ● Rope ● Routing table ● R-tree ● R* tree ● R+ tree ● Scapegoat tree ● Scene graph ● Segment tree ● Self-balancing binary search tree ● Self-organizing list ● Set ● Skew heap ● Skip list ● Soft heap ● Sorted array ● Spaghetti stack ● Sparse array ● Sparse matrix ● Splay tree ● SPQR-tree ● Stack ● String ● Suffix array ● Suffix tree ● Symbol table ● Syntax tree ● Tagged union (variant record, discriminated union, disjoint union) ● Tango tree ● Ternary heap ● Ternary search tree ● Threaded binary tree ● Top tree ● Treap ● Tree ● Trees ● Trie ● T-tree ● UB-tree ● Union ● Unrolled linked list ● Van Emde Boas tree ● Variable-length array ● VList ● VP-tree ● Weight-balanced tree ● Winged edge ● X-fast trie ● Xor linked list ● X-tree ● Y-fast trie ● Zero suppressed decision diagram ● Zipper ● Z-order
  5. 5. Game: Can you recognize some structures?
  6. 6. Array: PHP's untruthfulness PHP“Arrays”are not true Arrays!
  7. 7. Array: PHP's untruthfulness PHP“Arrays”are not true Arrays! An array typically looks like this: Data DataDataData Data Data 0 1 2 3 4 5
  8. 8. Array: PHP's untruthfulness PHP“Arrays”can dynamically grow and be iterated both directions (reset(), next(), prev(), end()), exclusively with O(1) operations.
  9. 9. Array: PHP's untruthfulness PHP“Arrays”can dynamically grow and be iterated both directions (reset(), next(), prev(), end()), exclusively with O(1) operations. Let's have a Doubly Linked List (DLL): Data Data Data Data Data Head Tail Enables Queue, Stack and Deque implementations
  10. 10. Array: PHP's untruthfulness PHP“Arrays”elements are always accessible using a key (index).
  11. 11. Array: PHP's untruthfulness PHP“Arrays”elements are always accessible using a key (index). Let's have an Hash Table: Data Data Data Data Data Head Tail Bucket Bucket Bucket Bucket Bucket Bucket pointers array Bucket * 0 Bucket * 1 Bucket * 2 Bucket * 3 Bucket * 4 Bucket * 5 ... Bucket * nTableSize -1
  12. 12. Array: PHP's untruthfulness http://php.net/manual/en/language.types.array.php: “This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more.”
  13. 13. Optimized for anything ≈ Optimized for nothing!
  14. 14. Optimized for anything ≈ Optimized for nothing!
  15. 15. Array: PHP's untruthfulness ● In C: 100 000 integers (using long on 64bits => 8 bytes) can be stored in 0.76 MiB. ● In PHP 5: ● it will take 13.97 MiB!≅ ● A variable (containing an integer) takes 48 bytes. ● The overhead for every“array”entries is about 96 bytes.
  16. 16. Array: PHP's untruthfulness ● In C: 100 000 integers (using long on 64bits => 8 bytes) can be stored in 0.76 MiB. ● In PHP 5 7: ● it will take ≅ 13.97 4 MiB! ● A variable (containing an integer) takes 48 16 bytes. ● The overhead for every“array”entries is about 96 20 bytes.
  17. 17. Data Structure
  18. 18. Structs (or records, tuples,...)
  19. 19. Structs (or records, tuples,...) ● A struct is a value containing other values which are typically accessed using a name. ● Example: Person => firstName / lastName ComplexNumber => realPart / imaginaryPart
  20. 20. Structs – Using array $person = [ "firstName" => "Patrick", "lastName" => "Allaert", ];
  21. 21. Structs – Using a class $person = new PersonStruct( "Patrick", "Allaert" );
  22. 22. Structs – Using a class (Implementation) class PersonStruct { public $firstName; public $lastName; public function __construct($firstName, $lastName) { $this->firstName = $firstName; $this->lastName = $lastName; } }
  23. 23. Structs – Using a class (Implementation) class PersonStruct { public $firstName; public $lastName; public function __construct($firstName, $lastName) { $this->firstName = $firstName; $this->lastName = $lastName; } public function __set($key, $value) { // a. Do nothing // b. trigger_error() // c. Throws an exception } }
  24. 24. Structs – Pros and Cons Creating 107 “person”structs array object 0 1000 2000 3000 4000 5000 6000 5621 41274098 1403 PHP 5.6 PHP 7 Memory(MiB) array object 0 0,5 1 1,5 2 2,5 1,52 2,26 0,5 0,9 PHP 5.6 PHP 7 Time(s)
  25. 25. Structs – Pros and Cons Using a class implementation + Type hinting possible + Rigid structure + More OO + Uses ~ 26% less memory - Slower to create by ~ 50% Starting PHP 7: + Uses ~ 66% less memory - Slower to create by a factor 2!
  26. 26. (true) Arrays
  27. 27. (true) Arrays ● An array is a fixed size collection where elements are each identified by a numeric index.
  28. 28. (true) Arrays ● An array is a fixed size collection where elements are each identified by a numeric index. Data DataDataData Data Data 0 1 2 3 4 5
  29. 29. (true) Arrays – Using SplFixedArray $array = new SplFixedArray(3); $array[0] = 1; // or $array->offsetSet() $array[1] = 2; // or $array->offsetSet() $array[2] = 3; // or $array->offsetSet() $array[0]; // gives 1 $array[1]; // gives 2 $array[2]; // gives 3
  30. 30. (true) Arrays – Pros and Cons Creating/iterating 104 arrays of 1000 elements array SplFixedArray 0 200 400 600 800 1000 1200 1400 1600 1378 539 353 159 PHP 5.6 PHP 7 Memory(MiB) array (create) array (iterate) SplFixedArray (create) SplFixedArray (iterate) 0 0,5 1 1,5 2 2,5 3 2,49 0,2 0,92 0,360,33 0,09 0,24 0,19 PHP 5.6 PHP 7 Time(s)
  31. 31. (true) Arrays – Pros and Cons Using SplFixedArray + Uses much less memory + Takes less time at creation - Takes a bit more time to iterate
  32. 32. Queues
  33. 33. Queues ● A queue is an ordered collection respecting First In, First Out (FIFO) order. ● Elements are inserted at one end and removed at the other.
  34. 34. Queues ● A queue is an ordered collection respecting First In, First Out (FIFO) order. ● Elements are inserted at one end and removed at the other. Data DataDataData Data Data Data Data Enqueue Dequeue
  35. 35. Queues – Using array $queue = []; $queue[] = 1; // or array_push() $queue[] = 2; // or array_push() $queue[] = 3; // or array_push() array_shift($queue); // gives 1 array_shift($queue); // gives 2 array_shift($queue); // gives 3
  36. 36. Queues – Using SplQueue $queue = new SplQueue(); $queue[] = 1; // or $queue->enqueue() $queue[] = 2; // or $queue->enqueue() $queue[] = 3; // or $queue->enqueue() $queue->dequeue(); // gives 1 $queue->dequeue(); // gives 2 $queue->dequeue(); // gives 3
  37. 37. Stacks
  38. 38. Stacks ● A stack is an ordered collection respecting Last In, First Out (LIFO) order. ● Elements are inserted and removed on the same end.
  39. 39. Stacks ● A stack is an ordered collection respecting Last In, First Out (LIFO) order. ● Elements are inserted and removed on the same end. Data DataDataData Data Data Data Data Push Pop
  40. 40. Stacks – Using array $stack = []; $stack[] = 1; // or array_push() $stack[] = 2; // or array_push() $stack[] = 3; // or array_push() array_pop($stack); // gives 3 array_pop($stack); // gives 2 array_pop($stack); // gives 1
  41. 41. Stacks – Using SplStack $stack = new SplStack(); $stack[] = 1; // or $stack->push() $stack[] = 2; // or $stack->push() $stack[] = 3; // or $stack->push() $stack->pop(); // gives 3 $stack->pop(); // gives 2 $stack->pop(); // gives 1
  42. 42. Stack/Queue – Pros and Cons Creating 104 stacks/queues of 103 elements array Spl(Queue|Stack) 0 200 400 600 800 1000 1200 1400 1600 1378 920 353 541 PHP 5.6 PHP 7 Memory(MiB) array (create) array (iterate) Spl(Stack|Queue) (create) Spl(Stack|Queue) (iterate) 0 0,2 0,4 0,6 0,8 1 1,2 1,4 1,6 1,8 0,57 0,2 1,62 0,27 0,17 0,09 1,22 0,18 PHP 5.6 PHP 7 Time(s)
  43. 43. Queues/Stacks – Pros and Cons SplQueue / SplStack + Uses less memory + Type hinting + More OO - A bit more cpu intensive Starting PHP 7 (comparatively to arrays): - Uses more memory - Much more cpu intensive => They haven't received as much attention as arrays did (yet?).
  44. 44. Sets People with strong views on the distinction between geeks and nerds Geeks Nerds
  45. 45. Sets ● A set is a collection with no particular ordering especially suited for testing the membership of a value against a collection or to perform union/intersection/complement operations between them.
  46. 46. Sets ● A set is a collection with no particular ordering especially suited for testing the membership of a value against a collection or to perform union/intersection/complement operations between them. Data Data Data Data Data
  47. 47. Sets – Using array $set = []; // Adding elements to a set $set[] = 1; $set[] = 2; $set[] = 3; // Checking presence in a set in_array(2, $set); // true in_array(5, $set); // false array_merge($set1, $set2); // union array_intersect($set1, $set2); // intersection array_diff($set1, $set2); // complement
  48. 48. Sets – Using array $set = []; // Adding elements to a set $set[] = 1; $set[] = 2; $set[] = 3; // Checking presence in a set in_array(2, $set); // true in_array(5, $set); // false array_merge($set1, $set2); // union array_intersect($set1, $set2); // intersection array_diff($set1, $set2); // complement True performance killers!
  49. 49. Sets – Mis-usage if (in_array($value, ["val1", "val2", "val3"])) { // ... }
  50. 50. Sets – Mis-usage if ($value === "val1" || $value === "val2" || $value === "val3"))) { // ... }
  51. 51. Sets – Mis-usage switch ($value) { case "val1": case "val2": case "val3": // ... }
  52. 52. Sets – Mis-usage Testing 5 * 107 membership against set of 3 elements in_array compare switch optimized way ;) 0 5 10 15 20 25 19,59 3,15 5,2 1,97 3,43 2,34 1,53 0,75 PHP 5.6 PHP 7 Time(s)
  53. 53. Sets – Using array (simple types) $set = []; // Adding elements to a set $set[1] = true; // Any dummy value $set[2] = true; // is good but NULL! $set[3] = true; // Checking presence in a set isset($set[2]); // true isset($set[5]); // false $set1 + $set2; // union array_intersect_key($set1, $set2); // intersection array_diff_key($set1, $set2); // complement
  54. 54. Sets – Using array (simple types) $set = []; // Adding elements to a set $set[1] = true; // Any dummy value $set[2] = true; // is good but NULL! $set[3] = true; // Checking presence in a set isset($set[2]); // true isset($set[5]); // false $set1 + $set2; // union array_intersect_key($set1, $set2); // intersection array_diff_key($set1, $set2); // complement ● Remember that PHP Array keys can be integers or strings only!
  55. 55. Sets – Using array (objects) $set = []; // Adding elements to a set $set[spl_object_hash($object1)] = $object1; $set[spl_object_hash($object2)] = $object2; $set[spl_object_hash($object3)] = $object3; // Checking presence in a set isset($set[spl_object_hash($object2)]); // true isset($set[spl_object_hash($object5)]); // false $set1 + $set2; // union array_intersect_key($set1, $set2); // intersection array_diff_key($set1, $set2); // complement
  56. 56. Sets – Using array (objects) $set = []; // Adding elements to a set $set[spl_object_hash($object1)] = $object1; $set[spl_object_hash($object2)] = $object2; $set[spl_object_hash($object3)] = $object3; // Checking presence in a set isset($set[spl_object_hash($object2)]); // true isset($set[spl_object_hash($object5)]); // false $set1 + $set2; // union array_intersect_key($set1, $set2); // intersection array_diff_key($set1, $set2); // complement Store a reference of the object!
  57. 57. Sets – Using SplObjectStorage (objects) $set = new SplObjectStorage(); // Adding elements to a set $set->attach($object1); // or $set[$object1] = null; $set->attach($object2); // or $set[$object2] = null; $set->attach($object3); // or $set[$object3] = null; // Checking presence in a set isset($set[$object2]); // true isset($set[$object5]); // false $set1->$addAll($set2); // union $set1->removeAllExcept($set2); // intersection $set1->removeAll($set2); // complement
  58. 58. Sets – Using QuickHash (int) ● No union/intersection/complement operations (yet?) ● Yummy features like (loadFrom|saveTo)(String|File) $set = new QuickHashIntSet(64,QuickHashIntSet::CHECK_FOR_DUPES); // Adding elements to a set $set->add(1); $set->add(2); $set->add(3); // Checking presence in a set $set->exists(2); // true $set->exists(5); // false isset($set[2]);
  59. 59. Sets – Using bitsets function remove( $path, $files = true, $dir = true, $links = true, $exec = true ) { if (!$files && is_file($path)) return false; if (!$dir && is_dir($path)) return false; if (!$links && is_link($path)) return false; if (!$exec && is_executable($path)) return false; // ... }
  60. 60. Sets – Using bitsets (example) remove("/tmp/removeMe", true, false, true, false);
  61. 61. Sets – Using bitsets (example) remove("/tmp/removeMe", true, false, true, false); // WTF ?!
  62. 62. Sets – Using bitsets
  63. 63. Sets – Using bitsets E_ERROR E_WARNING E_PARSE E_NOTICE
  64. 64. Sets – Using bitsets define("E_ERROR", 1); define("E_WARNING", 2); define("E_PARSE", 4); define("E_NOTICE", 8);
  65. 65. Sets – Using bitsets define("E_ERROR", 1); define("E_WARNING", 2); define("E_PARSE", 4); define("E_NOTICE", 8); E_ERROR E_PARSE E_NOTICE 10000000 00100000 00010000
  66. 66. Sets – Using bitsets define("E_ERROR", 1); define("E_WARNING", 2); define("E_PARSE", 4); define("E_NOTICE", 8); E_ERROR E_PARSE E_NOTICE E_ERROR | E_PARSE | E_NOTICE 10000000 00100000 00010000 10110000
  67. 67. Sets – Using bitsets define("E_ERROR", 1 << 0); define("E_WARNING", 1 << 1); define("E_PARSE", 1 << 2); define("E_NOTICE", 1 << 3); E_ERROR E_PARSE E_NOTICE E_ERROR | E_PARSE | E_NOTICE 10000000 00100000 00010000 10110000
  68. 68. Sets – Using bitsets define("E_ERROR", 1 << 0); define("E_WARNING", 1 << 1); define("E_PARSE", 1 << 2); define("E_NOTICE", 1 << 3); // Adding elements to a set $set = 0; $set |= E_ERROR; $set |= E_WARNING; $set |= E_PARSE; // Checking presence in a set $set & E_ERROR; // true $set & E_NOTICE; // false $set1 | $set2; // union $set1 & $set2; // intersection $set1 ^ $set2; // complement
  69. 69. Sets – Using bitsets (example) define("REMOVE_FILES", 1 << 0); define("REMOVE_DIRS", 1 << 1); define("REMOVE_LINKS", 1 << 2); define("REMOVE_EXEC", 1 << 3); define("REMOVE_ALL", ~0); // Setting all bits function remove($path, $options = REMOVE_ALL) { if (~$options & REMOVE_FILES && is_file($path)) return false; if (~$options & REMOVE_DIRS && is_dir($path)) return false; if (~$options & REMOVE_LINKS && is_link($path)) return false; if (~$options & REMOVE_EXEC && is_executable($path)) return false; // ... }
  70. 70. Sets – Using bitsets (example) remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS);
  71. 71. Sets – Using bitsets (example) remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)
  72. 72. Sets: Conclusions ● Use the key and not the value when using PHP Arrays. ● Use QuickHash for set of integers if possible. ● Use SplObjectStorage as soon as you are playing with objects. ● Use bitsets when playing with finite number of elements (and known in advance). ● Avoid array_unique() / in_array() at all price!
  73. 73. Maps ● A map is a collection of key/value pairs where all keys are unique.
  74. 74. Maps – Using array ● Don't use array_merge() on maps. $map = []; $map["ONE"] = 1; $map["TWO"] = 2; $map["THREE"] = 3; // Merging maps: array_merge($map1, $map2); // SLOW! $map2 + $map1; // Fast :)
  75. 75. Maps – Using array Testing 107 merges against 2 maps of 5 elements array_merge + 0 0,5 1 1,5 2 2,5 3 3,5 4 4,5 5 4,74 2,77 1,42 1,09 PHP 5.6 PHP 7 Time(s)
  76. 76. Multikey Maps – Using array $map = []; $map["ONE"] = 1; $map["UN"] =& $map["ONE"]; $map["UNO"] =& $map["ONE"]; $map["TWO"] = 2; $map["DEUX"] =& $map["TWO"]; $map["DUE"] =& $map["TWO"]; $map["UNO"] = "once"; $map["DEUX"] = "twice"; var_dump($map); /* array(6) { ["ONE"] => &string(4) "once" ["UN"] => &string(4) "once" ["UNO"] => &string(4) "once" ["TWO"] => &string(5) "twice" ["DEUX"] => &string(5) "twice" ["DUE"] => &string(5) "twice" } */
  77. 77. Heap ● A heap is a tree-based structure in which all elements are ordered with largest key at the top, and the smallest one as leafs.
  78. 78. Heap ● A heap is a tree-based structure in which all elements are ordered with largest key at the top, and the smallest one as leafs.
  79. 79. Heap – Using Spl(Min|Max)Heap $heap = new SplMinHeap; $heap->insert(30); $heap->insert(20); $heap->insert(25); var_dump($heap->top()); /* int(20) */
  80. 80. Heaps: Conclusions ● MUCH faster than having to re-sort() an array at every insertion. ● If you don't require a collection to be sorted at every single step and can insert all data at once and then sort(). Array is a much better/faster approach. ● SplPriorityQueue is very similar, consider it is the same as SplHeap but where the sorting is made on the key rather than the value.
  81. 81. Bloom filters ● A bloom filter is a space-efficient probabilistic data structure used to test whether an element is member of a set. ● False positives are possible, but false negatives are not!
  82. 82. Bloom filters – Using bloomy $bloom = new BloomFilter( 10000, // capacity 0,001 // (optional) error rate // (optional) random seed ); $bloom->add("An element"); $bloom->has("An element"); // true for sure $bloom->has("Foo"); // false, most probably
  83. 83. Other related projects ● SPL Types: Various types implemented as object: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types
  84. 84. Other related projects ● SPL Types: Various types implemented as object: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types ● Judy: Sparse dynamic arrays implementation http://pecl.php.net/package/Judy
  85. 85. Other related projects ● SPL Types: Various types implemented as object: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types ● Judy: Sparse dynamic arrays implementation http://pecl.php.net/package/Judy ● Weakref: Weak references implementation. Provides a gateway to an object without preventing that object from being collected by the garbage collector.
  86. 86. Conclusions ● Use appropriate data structure. It will keep your code clean and fast.
  87. 87. Conclusions ● Use appropriate data structure. It will keep your code clean and fast. ● Think about the time and space complexity involved by your algorithms.
  88. 88. Conclusions ● Use appropriate data structure. It will keep your code clean and fast. ● Think about the time and space complexity involved by your algorithms. ● Name your variables accordingly: use“Map”,“Set”, “List”,“Queue”,... to describe them instead of using something like: $ordersArray.
  89. 89. Questions?
  90. 90. Thanks Don't forget to rate this talk on https://joind.in/14535 Stay in touch! @patrick_allaert patrickallaert@php.net
  91. 91. Photo Credits ● Tuned car: http://www.flickr.com/photos/gioxxswall/5783867752 ● London Eye Structure: http://www.flickr.com/photos/photographygal123/4883546484 ● Heap structure: http://en.wikipedia.org/wiki/File:Max-Heap.svg ● Drawers: http://www.flickr.com/photos/jamesclay/2312912612 ● Stones stack: http://www.flickr.com/photos/silent_e/2282729987 ● Tree: http://www.flickr.com/photos/drewbandy/6002204996

×