Masterizing php data structure 102

We all learned data structures at school: arrays, lists, sets, stacks, queues (LIFO/FIFO), heaps, associative arrays, trees, … and what do we mostly use in PHP? The “array”! In most cases we do anything and everything with it, then stumble over it when profiling our code.
During this session, we’ll relearn how to use these structures appropriately, taking a closer look at how to use arrays, the SPL, and structures provided by other PHP extensions.

Published in: Technology, Business
Notes
  • slide #49:
    $set1 + $set2; //union is actually intersection if you use numeric keys

    array('1','2') + array('3','4') = array('1','2')

  1. 1. Masterizing PHP Data Structure 102
     Patrick Allaert
     PHPBenelux Conference, Antwerp 2012
  2. 2. About me
     ● Patrick Allaert
     ● Founder of Libereco
     ● Playing with PHP/Linux for +10 years
     ● eZ Publish core developer
     ● Author of the APM PHP extension
     ● @patrick_allaert
     ● patrickallaert@php.net
     ● http://github.com/patrickallaert/
     ● http://patrickallaert.blogspot.com/
  3. 3. Masterizing = Mastering + Rising
  4. 4. PHP native datatypes
     ● NULL (IS_NULL)
     ● Booleans (IS_BOOL)
     ● Integers (IS_LONG)
     ● Floating point numbers (IS_DOUBLE)
     ● Strings (IS_STRING)
     ● Arrays (IS_ARRAY, IS_CONSTANT_ARRAY)
     ● Objects (IS_OBJECT)
     ● Resources (IS_RESOURCE)
     ● Callable (IS_CALLABLE)
  5. 5. Wikipedia datatypes
     [Slide: a dense multi-column list of data structures, from “2-3-4 tree” to “Z-order”, including among many others: AVL tree, B-tree, Bloom filter, Doubly linked list, Hash table, Heap, Judy array, Linked list, Priority queue, Queue, Red-black tree, Set, Skip list, Stack and Trie.]
  6. 6. Game: Can you recognize some structures?
  7. 7. Array: PHP's untruthfulness
     PHP “Arrays” are not true Arrays!
  8. 8. Array: PHP's untruthfulness
     PHP “Arrays” are not true Arrays!
     An array is typically implemented like this:
     [ Data | Data | Data | Data | Data | Data ]
  9. 9. Array: PHP's untruthfulness
     PHP “Arrays” can be iterated in both directions (reset(), next(), prev(), end()), exclusively with O(1) operations.
  10. 10. Array: PHP's untruthfulness
     PHP “Arrays” can be iterated in both directions (reset(), next(), prev(), end()), exclusively with O(1) operations.
     Implementation based on a Doubly Linked List (DLL):
     Head -> [ Data <-> Data <-> Data <-> Data <-> Data ] <- Tail
     Enables List, Deque, Queue and Stack implementations.
  11. 11. Array: PHP's untruthfulness
     PHP “Array” elements are always accessible using a key (index).
  12. 12. Array: PHP's untruthfulness
     PHP “Array” elements are always accessible using a key (index).
     Implementation based on a Hash Table:
     [Diagram: Head and Tail pointers; a bucket pointers array indexed 0 … nTableSize - 1 whose slots are Bucket * pointers; each Bucket holds its Data.]
  13. 13. Array: PHP's untruthfulness
     http://php.net/manual/en/language.types.array.php:
     “This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more.”
  14. 14. Optimized for anything ≈ Optimized for nothing!
  15. 15. Array: PHP's untruthfulness
     ● In C: 100,000 integers (using long on 64 bits => 8 bytes each) can be stored in 0.76 MB.
     ● In PHP: it will take ≅ 13.97 MB!
     ● A PHP variable (containing an integer) takes 48 bytes.
     ● The bucket overhead for each “array” entry is about 96 bytes.
     ● More details: http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
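The figures on this slide date from the PHP versions current in 2012, and later engines lay arrays out differently, so it is worth measuring on your own setup. A minimal sketch, assuming `memory_get_usage()` is a good enough approximation for a rough per-element estimate (the variable names are illustrative):

```php
<?php
// Measurement sketch only: results vary with PHP version, platform and build.
$n = 100000;

$before = memory_get_usage();
$numbers = array();
for ($i = 0; $i < $n; $i++) {
    $numbers[] = $i;
}
$bytes = memory_get_usage() - $before;

printf(
    "%d integers: %.2f MB (about %d bytes per element)\n",
    $n,
    $bytes / 1048576,
    (int) round($bytes / $n)
);
```

Compare the printed per-element cost with the 8 bytes a C `long` needs on 64 bits.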
  16. 16. Data Structure
  17. 17. Structs (or records, tuples, ...)
     ● A struct is a value containing other values which are typically accessed using a name.
     ● Example:
       Person => firstName / lastName
       ComplexNumber => realPart / imaginaryPart
  18. 18. Structs – Using array
     $person = array(
         "firstName" => "Patrick",
         "lastName" => "Allaert"
     );
  19. 19. Structs – Using a class
     $person = new PersonStruct(
         "Patrick", "Allaert"
     );
  20. 20. Structs – Using a class (Implementation)
     class PersonStruct
     {
         public $firstName;
         public $lastName;
         public function __construct($firstName, $lastName)
         {
             $this->firstName = $firstName;
             $this->lastName = $lastName;
         }
     }
  21. 21. Structs – Using a class (Implementation)
     class PersonStruct
     {
         public $firstName;
         public $lastName;
         public function __construct($firstName, $lastName)
         {
             $this->firstName = $firstName;
             $this->lastName = $lastName;
         }
         public function __set($key, $value)
         {
             // a. Do nothing
             // b. trigger_error()
             // c. Throw an exception
         }
     }
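One way to fill in the `__set()` placeholder is option c. The sketch below assumes a `LogicException` is an acceptable way to reject unknown fields; that choice, the message text and the `nickname` property are illustrative and not taken from the talk.

```php
<?php
class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    // __set() is only invoked for undeclared or inaccessible properties,
    // so the two declared public fields remain freely writable.
    public function __set($key, $value)
    {
        throw new LogicException("Unknown property: $key");
    }
}

$person = new PersonStruct("Patrick", "Allaert");
$person->lastName = "A."; // declared property: allowed

try {
    $person->nickname = "Pat"; // undeclared property: rejected
    $rejected = false;
} catch (LogicException $e) {
    $rejected = true;
}
```

This is what makes the class variant a "rigid structure": a typo in a field name fails loudly instead of silently creating a new key.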
  22. 22. Structs – Pros and Cons
     Array                             Class
     + Uses less memory (PHP < 5.4)    - Uses more memory (PHP < 5.4)
     - Uses more memory (PHP = 5.4)    + Uses less memory (PHP = 5.4)
     - No type hinting                 + Type hinting possible
     - Flexible structure              + Rigid structure
     +|- Less OO                       +|- More OO
     + Slightly faster                 - Slightly slower
  23. 23. “true” Arrays
     ● An array is a fixed size collection where elements are each identified by a numeric index.
  24. 24. “true” Arrays
     ● An array is a fixed size collection where elements are each identified by a numeric index.
     0      1      2      3      4      5
     Data   Data   Data   Data   Data   Data
  25. 25. “true” Arrays – Using SplFixedArray
     $array = new SplFixedArray(3);
     $array[0] = 1; // or $array->offsetSet()
     $array[1] = 2; // or $array->offsetSet()
     $array[2] = 3; // or $array->offsetSet()
     $array[0]; // gives 1
     $array[1]; // gives 2
     $array[2]; // gives 3
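The "fixed size" part of the definition is enforced at runtime. A short sketch of what happens when you write past the end of an `SplFixedArray`, and how to grow it explicitly with `setSize()` (the variable names are illustrative):

```php
<?php
$array = new SplFixedArray(3);
$array[0] = 1;
$array[1] = 2;
$array[2] = 3;

// Writing outside the declared size raises a RuntimeException
// instead of silently growing the structure.
try {
    $array[3] = 4;
    $overflowed = false;
} catch (RuntimeException $e) {
    $overflowed = true;
}

// Growing is possible, but it has to be requested explicitly.
$array->setSize(4);
$array[3] = 4;

$sum = 0;
foreach ($array as $value) {
    $sum += $value;
}
```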
  26. 26. “true” Arrays – Pros and Cons
     Array                  SplFixedArray
     - Uses more memory     + Uses less memory
     +|- Less OO            +|- More OO
     + Slightly faster      - Slightly slower
  27. 27. Queues
     ● A queue is an ordered collection respecting First In, First Out (FIFO) order.
     ● Elements are inserted at one end and removed at the other.
  28. 28. Queues
     ● A queue is an ordered collection respecting First In, First Out (FIFO) order.
     ● Elements are inserted at one end and removed at the other.
     [Diagram: Data elements are enqueued at one end and dequeued at the other]
  29. 29. Queues – Using array
     $queue = array();
     $queue[] = 1; // or array_push()
     $queue[] = 2; // or array_push()
     $queue[] = 3; // or array_push()
     array_shift($queue); // gives 1
     array_shift($queue); // gives 2
     array_shift($queue); // gives 3
  30. 30. Queues – Using SplQueue
     $queue = new SplQueue();
     $queue[] = 1; // or $queue->enqueue()
     $queue[] = 2; // or $queue->enqueue()
     $queue[] = 3; // or $queue->enqueue()
     $queue->dequeue(); // gives 1
     $queue->dequeue(); // gives 2
     $queue->dequeue(); // gives 3
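In practice a queue is usually drained in a loop rather than with a fixed number of `dequeue()` calls. A small sketch of that pattern with `SplQueue`; the job names are made up for illustration:

```php
<?php
$queue = new SplQueue();
$queue->enqueue("parse");
$queue->enqueue("compile");
$queue->enqueue("deploy");

// Drain the queue: elements come out in insertion (FIFO) order.
$processed = array();
while (!$queue->isEmpty()) {
    $processed[] = $queue->dequeue();
}
```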
  31. 31. Queues – Pros and Cons
     Array                            SplQueue
     - Uses more memory               + Uses less memory
       (overhead / entry: 96 bytes)     (overhead / entry: 48 bytes)
     - No type hinting                + Type hinting possible
     +|- Less OO                      +|- More OO
  32. 32. Stacks
     ● A stack is an ordered collection respecting Last In, First Out (LIFO) order.
     ● Elements are inserted and removed on the same end.
  33. 33. Stacks
     ● A stack is an ordered collection respecting Last In, First Out (LIFO) order.
     ● Elements are inserted and removed on the same end.
     [Diagram: Data elements are pushed and popped at the same end]
  34. 34. Stacks – Using array
     $stack = array();
     $stack[] = 1; // or array_push()
     $stack[] = 2; // or array_push()
     $stack[] = 3; // or array_push()
     array_pop($stack); // gives 3
     array_pop($stack); // gives 2
     array_pop($stack); // gives 1
  35. 35. Stacks – Using SplStack
     $stack = new SplStack();
     $stack[] = 1; // or $stack->push()
     $stack[] = 2; // or $stack->push()
     $stack[] = 3; // or $stack->push()
     $stack->pop(); // gives 3
     $stack->pop(); // gives 2
     $stack->pop(); // gives 1
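A classic use of LIFO order is matching nested delimiters. The sketch below is not from the talk; `isBalanced()` is a hypothetical helper that uses `SplStack` to check that brackets in a string are properly nested.

```php
<?php
// Hypothetical helper: returns true when (), [] and {} are properly nested.
function isBalanced($text)
{
    $pairs = array(')' => '(', ']' => '[', '}' => '{');
    $stack = new SplStack();

    foreach (str_split($text) as $char) {
        if (in_array($char, $pairs, true)) {
            // Opening bracket: remember it.
            $stack->push($char);
        } elseif (isset($pairs[$char])) {
            // Closing bracket: it must match the most recent opener.
            if ($stack->isEmpty() || $stack->pop() !== $pairs[$char]) {
                return false;
            }
        }
    }

    // Any opener left on the stack was never closed.
    return $stack->isEmpty();
}
```

For example, `isBalanced("a[b(c)]{d}")` succeeds, while `isBalanced("(]")` fails on the mismatched closer.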
  36. 36. Stacks – Pros and Cons
     Array                            SplStack
     - Uses more memory               + Uses less memory
       (overhead / entry: 96 bytes)     (overhead / entry: 48 bytes)
     - No type hinting                + Type hinting possible
     +|- Less OO                      +|- More OO
  37. 37. Sets
     ● A set is a collection with no particular ordering, especially suited for testing the membership of a value against a collection or for performing union/intersection/complement operations between collections.
  38. 38. Sets
     ● A set is a collection with no particular ordering, especially suited for testing the membership of a value against a collection or for performing union/intersection/complement operations between collections.
     [Diagram: five unordered Data elements]
  39. 39. Sets – Using array
     $set = array();
     $set[] = 1;
     $set[] = 2;
     $set[] = 3;
     in_array(2, $set); // true
     in_array(5, $set); // false
     array_merge($set1, $set2); // union
     array_intersect($set1, $set2); // intersection
     array_diff($set1, $set2); // complement
  40. 40. Sets – Using array
     $set = array();
     $set[] = 1;
     $set[] = 2;
     $set[] = 3;
     in_array(2, $set); // true
     in_array(5, $set); // false
     array_merge($set1, $set2); // union
     array_intersect($set1, $set2); // intersection
     array_diff($set1, $set2); // complement
     The in_array() calls are true performance killers!
  41. 41. Sets – Using array (simple types)
     $set = array();
     $set[1] = true; // Any dummy value is good,
     $set[2] = true; // except NULL!
     $set[3] = true;
     isset($set[2]); // true
     isset($set[5]); // false
     $set1 + $set2; // union
     array_intersect_key($set1, $set2); // intersection
     array_diff_key($set1, $set2); // complement
  42. 42. Sets – Using array (simple types)
     $set = array();
     $set[1] = true; // Any dummy value is good,
     $set[2] = true; // except NULL!
     $set[3] = true;
     isset($set[2]); // true
     isset($set[5]); // false
     $set1 + $set2; // union
     array_intersect_key($set1, $set2); // intersection
     array_diff_key($set1, $set2); // complement
     ● Remember that PHP Array keys can be integers or strings only!
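The key-based operations only behave as set operations when the members are stored as keys. The following sketch (variable names are illustrative) runs them on two small sets and then shows what `+` does to list-style arrays, where the members are values and the keys are just 0, 1, …:

```php
<?php
// Members stored as keys; the value is a dummy.
$set1 = array_fill_keys(array(1, 2, 3), true);
$set2 = array_fill_keys(array(3, 4), true);

$union        = $set1 + $set2;                      // members 1, 2, 3, 4
$intersection = array_intersect_key($set1, $set2);  // member 3
$complement   = array_diff_key($set1, $set2);       // members 1, 2

// Pitfall: with list-style arrays both operands share the keys 0 and 1,
// so "+" keeps the left operand's entries and adds nothing.
$list1 = array('1', '2');
$list2 = array('3', '4');
$notUnion = $list1 + $list2; // array('1', '2')
```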
  43. 43. Sets – Using array (objects)
     $set = array();
     $set[spl_object_hash($object1)] = $object1;
     $set[spl_object_hash($object2)] = $object2;
     $set[spl_object_hash($object3)] = $object3;
     isset($set[spl_object_hash($object2)]); // true
     isset($set[spl_object_hash($object5)]); // false
     $set1 + $set2; // union
     array_intersect_key($set1, $set2); // intersection
     array_diff_key($set1, $set2); // complement
  44. 44. Sets – Using SplObjectStorage (objects)
     $set = new SplObjectStorage();
     $set->attach($object1); // or $set[$object1] = null;
     $set->attach($object2); // or $set[$object2] = null;
     $set->attach($object3); // or $set[$object3] = null;
     isset($set[$object2]); // true
     isset($set[$object5]); // false
     $set1->addAll($set2); // union
     $set1->removeAllExcept($set2); // intersection
     $set1->removeAll($set2); // complement
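Note that `addAll()`, `removeAllExcept()` and `removeAll()` modify the storage they are called on. A sketch of one way to keep the operands intact, assuming that cloning the storage first is acceptable for your data sizes (the objects and variable names are illustrative):

```php
<?php
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();

// The array syntax is the alternative to attach() shown on the slide.
$set1 = new SplObjectStorage();
$set1[$a] = null;
$set1[$b] = null;

$set2 = new SplObjectStorage();
$set2[$b] = null;
$set2[$c] = null;

// Work on copies so that $set1 and $set2 stay unchanged.
$union = clone $set1;
$union->addAll($set2);                  // a, b, c

$intersection = clone $set1;
$intersection->removeAllExcept($set2);  // b

$complement = clone $set1;
$complement->removeAll($set2);          // a
```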
  45. 45. Sets – Using QuickHash (int)
     $set = new QuickHashIntSet(64, QuickHashIntSet::CHECK_FOR_DUPES);
     $set->add(1);
     $set->add(2);
     $set->add(3);
     $set->exists(2); // true
     $set->exists(5); // false
     ● No union/intersection/complement operations (yet?)
     ● Yummy features like (loadFrom|saveTo)(String|File)
  46. 46. Sets – With finite possible values
     define("E_ERROR", 1); // or 1<<0
     define("E_WARNING", 2); // or 1<<1
     define("E_PARSE", 4); // or 1<<2
     define("E_NOTICE", 8); // or 1<<3
     $set = 0;
     $set |= E_ERROR;
     $set |= E_WARNING;
     $set |= E_PARSE;
     $set & E_ERROR; // non-zero (truthy)
     $set & E_NOTICE; // 0 (falsy)
     $set1 | $set2; // union
     $set1 & $set2; // intersection
     $set1 & ~$set2; // complement ($set1 ^ $set2 would be the symmetric difference)
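To see the bitwise set operations with concrete numbers, here is a small sketch using hypothetical `FLAG_*` constants (chosen so that they do not clash with PHP's built-in `E_*` constants):

```php
<?php
// Hypothetical flags, one bit each.
define("FLAG_READ", 1 << 0);   // 1
define("FLAG_WRITE", 1 << 1);  // 2
define("FLAG_EXEC", 1 << 2);   // 4

$set1 = FLAG_READ | FLAG_WRITE;  // 3
$set2 = FLAG_WRITE | FLAG_EXEC;  // 6

$union               = $set1 | $set2;   // 7: READ, WRITE, EXEC
$intersection        = $set1 & $set2;   // 2: WRITE
$complement          = $set1 & ~$set2;  // 1: READ (in $set1 but not in $set2)
$symmetricDifference = $set1 ^ $set2;   // 5: READ, EXEC

// Membership test: compare against 0 to get a real boolean.
$canWrite = ($set1 & FLAG_WRITE) !== 0;
$canExec  = ($set1 & FLAG_EXEC) !== 0;
```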
  47. 47. Sets – With finite possible values (function features)
     Instead of:
     function remove($path, $files = true, $directories = true, $links = true, $executable = true)
     {
         if (!$files && is_file($path)) return false;
         if (!$directories && is_dir($path)) return false;
         if (!$links && is_link($path)) return false;
         if (!$executable && is_executable($path)) return false;
         // ...
     }
     remove("/tmp/removeMe", true, false, true, false); // WTF ?!
  48. 48. Sets – With finite possible values (function features)
     Use:
     define("REMOVE_FILES", 1 << 0);
     define("REMOVE_DIRS", 1 << 1);
     define("REMOVE_LINKS", 1 << 2);
     define("REMOVE_EXEC", 1 << 3);
     define("REMOVE_ALL", ~0); // Setting all bits
     function remove($path, $options = REMOVE_ALL)
     {
         if (~$options & REMOVE_FILES && is_file($path)) return false;
         if (~$options & REMOVE_DIRS && is_dir($path)) return false;
         if (~$options & REMOVE_LINKS && is_link($path)) return false;
         if (~$options & REMOVE_EXEC && is_executable($path)) return false;
         // ...
     }
     remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)
  49. 49. Sets: Conclusions
     ● Use the key and not the value when using PHP Arrays.
     ● Use QuickHash for sets of integers if possible.
     ● Use SplObjectStorage as soon as you are playing with objects.
     ● Don't use array_unique() when you need a set!
  50. 50. Bloom filters
     ● A bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.
     ● False positives are possible, but false negatives are not!
  51. 51. Bloom filters – Using bloomy
     // BloomFilter::__construct(int capacity [, double error_rate [, int random_seed ] ])
     $bloomFilter = new BloomFilter(10000, 0.001);
     $bloomFilter->add("An element");
     $bloomFilter->has("An element"); // true for sure
     $bloomFilter->has("Foo"); // false, most probably
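To make the "no false negatives" property concrete, here is a minimal pure-PHP sketch of the idea. It is not how the bloomy extension is implemented: the `TinyBloomFilter` class, its default sizes and the md5-based hashing are all assumptions chosen for brevity.

```php
<?php
// Illustrative Bloom filter: a bit array plus k salted hash functions.
final class TinyBloomFilter
{
    private $bits;      // bit array stored as a binary string
    private $size;      // number of bits
    private $hashCount; // number of hash functions (k)

    public function __construct($size = 8192, $hashCount = 3)
    {
        $this->size = $size;
        $this->hashCount = $hashCount;
        $this->bits = str_repeat("\0", (int) ceil($size / 8));
    }

    // Derive k bit positions by salting md5 with the hash index.
    private function positions($value)
    {
        $positions = array();
        for ($i = 0; $i < $this->hashCount; $i++) {
            $positions[] = hexdec(substr(md5($i . ':' . $value), 0, 7)) % $this->size;
        }
        return $positions;
    }

    public function add($value)
    {
        foreach ($this->positions($value) as $p) {
            $byte = $p >> 3;
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($p & 7)));
        }
    }

    public function has($value)
    {
        foreach ($this->positions($value) as $p) {
            if ((ord($this->bits[$p >> 3]) & (1 << ($p & 7))) === 0) {
                return false; // at least one bit unset: definitely absent
            }
        }
        return true; // all bits set: present, or a false positive
    }
}

$filter = new TinyBloomFilter();
$filter->add("An element");
$known = $filter->has("An element"); // always true once added
```

Because `add()` only ever sets bits, an added element can never be reported as absent; an element that was never added is reported as present only if other additions happened to set all of its bits.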
  52. 52. Maps
     ● A map is a collection of key/value pairs where all keys are unique.
  53. 53. Maps – Using array
     $map = array();
     $map["ONE"] = 1;
     $map["TWO"] = 2;
     $map["THREE"] = 3;
     // Merging maps:
     array_merge($map1, $map2); // SLOW!
     $map2 + $map1; // Fast :)
     ● Don't use array_merge() on maps.
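The operand order matters when swapping `array_merge()` for `+`: `array_merge()` lets the later array win on duplicate keys, while `+` lets the left operand win, which is why the slide writes `$map2 + $map1`. A sketch with illustrative data, including the integer-key case where the two differ more visibly:

```php
<?php
$map1 = array("ONE" => 1, "TWO" => 2);
$map2 = array("TWO" => "deux", "THREE" => 3);

$merged = array_merge($map1, $map2); // later array wins on duplicate keys
$united = $map2 + $map1;             // left operand wins on duplicate keys

// Same key/value pairs, although the key order differs.
$sameContent = ($merged == $united);

// Integer keys behave differently: array_merge() renumbers them.
$byId1 = array(10 => "ten");
$byId2 = array(20 => "twenty");
$renumbered = array_merge($byId1, $byId2); // keys 0 and 1
$preserved  = $byId2 + $byId1;             // keys 20 and 10
```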
  54. 54. Multikey Maps – Using array
     $map = array();
     $map["ONE"] = 1;
     $map["UN"] =& $map["ONE"];
     $map["UNO"] =& $map["ONE"];
     $map["TWO"] = 2;
     $map["DEUX"] =& $map["TWO"];
     $map["DUE"] =& $map["TWO"];
     $map["UNO"] = "once";
     $map["DEUX"] = "twice";
     var_dump($map);
     /*
     array(6) {
     ["ONE"] => &string(4) "once"
     ["UN"] => &string(4) "once"
     ["UNO"] => &string(4) "once"
     ["TWO"] => &string(5) "twice"
     ["DEUX"] => &string(5) "twice"
     ["DUE"] => &string(5) "twice"
     }
     */
     ● Don't use array_merge() on maps.
  55. 55. Heap
     ● A heap is a tree-based structure in which all elements are ordered, with the largest key at the top and the smallest ones as leaves.
  56. 56. Heap
     ● A heap is a tree-based structure in which all elements are ordered, with the largest key at the top and the smallest ones as leaves.
  57. 57. Heap – Using array
     $heap = array();
     $heap[] = 3;
     sort($heap);
     $heap[] = 1;
     sort($heap);
     $heap[] = 2;
     sort($heap);
  58. 58. Heap – Using Spl(Min|Max)Heap
     $heap = new SplMinHeap;
     $heap->insert(3);
     $heap->insert(1);
     $heap->insert(2);
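The slide only inserts; reading the heap back is where the ordering shows. A short sketch (variable names are illustrative) that peeks at the smallest element with `top()` and then drains the heap with `extract()`:

```php
<?php
$heap = new SplMinHeap();
foreach (array(3, 1, 2) as $value) {
    $heap->insert($value);
}

$smallest = $heap->top(); // peek without removing

// extract() removes and returns the current minimum each time.
$sorted = array();
while (!$heap->isEmpty()) {
    $sorted[] = $heap->extract();
}
```

With `SplMaxHeap` the same loop would yield the values in descending order.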
  59. 59. Heaps: Conclusions
     ● MUCH faster than having to re-sort() an array at every insertion.
     ● If you don't require the collection to be sorted at every single step and can insert all data at once and then sort(), an array is a much better/faster approach.
     ● SplPriorityQueue is very similar: consider it the same as SplHeap, except that the sorting is done on the key (the priority) rather than on the value.
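The last point can be sketched as follows: with `SplPriorityQueue` each value is inserted together with a priority, and extraction order follows the priority, highest first. The task names and priorities below are made up for illustration.

```php
<?php
$queue = new SplPriorityQueue();
$queue->insert("write report", 1);
$queue->insert("fix outage", 10);
$queue->insert("review patch", 5);

// Values come out ordered by priority, not by insertion order or by value.
$order = array();
while (!$queue->isEmpty()) {
    $order[] = $queue->extract();
}
```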
  60. 60. Other related projects
     ● SPL Types: Various types implemented as objects: SplInt, SplFloat, SplEnum, SplBool and SplString
       http://pecl.php.net/package/SPL_Types
  61. 61. Other related projects
     ● SPL Types: Various types implemented as objects: SplInt, SplFloat, SplEnum, SplBool and SplString
       http://pecl.php.net/package/SPL_Types
     ● Judy: Sparse dynamic arrays implementation
       http://pecl.php.net/package/Judy
  62. 62. Other related projects
     ● SPL Types: Various types implemented as objects: SplInt, SplFloat, SplEnum, SplBool and SplString
       http://pecl.php.net/package/SPL_Types
     ● Judy: Sparse dynamic arrays implementation
       http://pecl.php.net/package/Judy
     ● Weakref: Weak references implementation. Provides a gateway to an object without preventing that object from being collected by the garbage collector.
  63. 63. Conclusions
     ● Use the appropriate data structure. It will keep your code clean and fast.
  64. 64. Conclusions
     ● Use the appropriate data structure. It will keep your code clean and fast.
     ● Think about the time and space complexity of your algorithms.
  65. 65. Conclusions
     ● Use the appropriate data structure. It will keep your code clean and fast.
     ● Think about the time and space complexity of your algorithms.
     ● Name your variables accordingly: use “Map”, “Set”, “List”, “Queue”, ... to describe them instead of using something like $ordersArray.
  66. 66. Questions?
  67. 67. Thanks
     ● Don't forget to rate this talk on https://joind.in/4753
  68. 68. Photo Credits
     ● Northstar Ski Jump: http://www.flickr.com/photos/renotahoe/5593248965
     ● Tuned car: http://www.flickr.com/photos/gioxxswall/5783867752
     ● London Eye Structure: http://www.flickr.com/photos/photographygal123/4883546484
     ● Cigarette: http://www.flickr.com/photos/superfantastic/166215927
     ● Heap structure: http://en.wikipedia.org/wiki/File:Max-Heap.svg
