Mastering PHP Data Structure 102 - phpDay 2012 Verona

We all have certainly learned data structures at school: arrays, lists, sets, stacks, queues (LIFO/FIFO), heaps, associative arrays, trees, ... and what do we mostly use in PHP? The "array"! In most cases, we do everything and anything with it but we stumble upon it when profiling code.

During this session, we'll relearn how to use these structures appropriately, looking more closely at how to employ arrays, the SPL, and other structures from PHP extensions.

Transcript

  • 1. Mastering PHP Data Structure 102 Patrick Allaert phpDay 2012 Verona, Italy
  • 2. About me
      ● Patrick Allaert
      ● Founder of Libereco
      ● Playing with PHP/Linux for 10+ years
      ● eZ Publish core developer
      ● Author of the APM PHP extension
      ● @patrick_allaert
      ● patrickallaert@php.net
      ● http://github.com/patrickallaert/
      ● http://patrickallaert.blogspot.com/
  • 3. APM
  • 5. PHP native datatypes
      ● NULL (IS_NULL)
      ● Booleans (IS_BOOL)
      ● Integers (IS_LONG)
      ● Floating point numbers (IS_DOUBLE)
      ● Strings (IS_STRING)
      ● Arrays (IS_ARRAY, IS_CONSTANT_ARRAY)
      ● Objects (IS_OBJECT)
      ● Resources (IS_RESOURCE)
      ● Callable (IS_CALLABLE)
  • 6. Wikipedia datatypes
      [Slide: a dense multi-column listing of data structure names, running alphabetically from "2-3-4 tree" to "Zipper" and including, among many others: AVL tree, B-tree, Bloom filter, Deque, Hash table, Heap, Judy array, Linked list, Priority queue, Queue, Red-black tree, Set, Stack, Trie.]
  • 7. Game: Can you recognize some structures?
  • 8-9. Array: PHP's untruthfulness
      PHP “Arrays” are not true arrays!
      An array typically looks like this:
        index:  0     1     2     3     4     5
        value:  Data  Data  Data  Data  Data  Data
  • 10-11. Array: PHP's untruthfulness
      PHP “Arrays” can grow dynamically and be iterated in both directions (reset(), next(), prev(), end()), exclusively with O(1) operations.
      Let's have a Doubly Linked List (DLL):
        Head <-> Data <-> Data <-> Data <-> Data <-> Data <-> Tail
      Enables List, Deque, Queue and Stack implementations.
  • 12-13. Array: PHP's untruthfulness
      Elements of PHP “Arrays” are always accessible using a key (index).
      Let's have a hash table:
        [Diagram: an array of bucket pointers (Bucket *), indexed 0 to nTableSize - 1, each pointing to a Bucket that holds Data; Head and Tail pointers mark the two ends of the list of buckets.]
  • 14. Array: PHP's untruthfulness
      http://php.net/manual/en/language.types.array.php:
      “This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more.”
  • 15. Optimized for anything ≈ Optimized for nothing!
  • 16. Array: PHP's untruthfulness
      ● In C: 100 000 integers (using long on 64 bits => 8 bytes) can be stored in 0.76 MB.
      ● In PHP: it will take ≅ 13.97 MB!
      ● A PHP variable (containing an integer) takes 48 bytes.
      ● The overhead of buckets for every “array” entry is about 96 bytes.
      ● More details: http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
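The figures on this slide can be checked on your own machine with a rough measurement like the sketch below. It only assumes the standard memory_get_usage() function; the variable names are illustrative. The slide's numbers date from the PHP 5.3/5.4 era on 64-bit builds, and later PHP versions store arrays differently, so expect different results.

```php
<?php
// Rough measurement only: memory_get_usage() reports the allocator's view,
// and the result varies with the PHP version and platform.
$count = 100000;

$before  = memory_get_usage();
$numbers = range(1, $count);              // ordinary PHP "array" of integers
$bytes   = memory_get_usage() - $before;

printf(
    "%d integers: %.2f MB, about %d bytes per entry\n",
    $count,
    $bytes / 1048576,
    (int) round($bytes / $count)
);
```

Comparing the bytes-per-entry figure with the 8 bytes a C long needs gives a feel for the overhead discussed above.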
  • 17. Data Structure
  • 18-19. Structs (or records, tuples, ...)
      ● A struct is a value containing other values which are typically accessed using a name.
      ● Examples:
          Person => firstName / lastName
          ComplexNumber => realPart / imaginaryPart
  • 20. Structs – Using array
        $person = array(
            "firstName" => "Patrick",
            "lastName" => "Allaert"
        );
  • 21. Structs – Using a class
        $person = new PersonStruct(
            "Patrick", "Allaert"
        );
  • 22-23. Structs – Using a class (Implementation)
        class PersonStruct
        {
            public $firstName;
            public $lastName;

            public function __construct($firstName, $lastName)
            {
                $this->firstName = $firstName;
                $this->lastName = $lastName;
            }

            public function __set($key, $value)
            {
                // a. Do nothing
                // b. trigger_error()
                // c. Throw an exception
            }
        }
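A minimal sketch of option (c) from the slide, to show what the __set() guard buys you. The choice of LogicException and the error message are assumptions made for illustration; the slide does not say which exception to throw.

```php
<?php
class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    // __set() only runs for inaccessible or undeclared properties,
    // so normal writes to $firstName / $lastName are unaffected.
    public function __set($key, $value)
    {
        throw new LogicException("PersonStruct has no property '$key'");
    }
}

$person = new PersonStruct("Patrick", "Allaert");
$person->lastName = "A.";           // fine: declared property

try {
    $person->lastname = "oops";     // typo in the property name
    $rejected = false;
} catch (LogicException $e) {
    $rejected = true;
}

var_dump($rejected);
```

With a plain array, the misspelled key would silently create a new entry; here the typo surfaces immediately.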
  • 24. Structs – Pros and Cons
        Array                               Class
        + Uses less memory (PHP < 5.4)      - Uses more memory (PHP < 5.4)
        - Uses more memory (PHP = 5.4)      + Uses less memory (PHP = 5.4)
        - No type hinting                   + Type hinting possible
        - Flexible structure                + Rigid structure
        +|- Less OO                         +|- More OO
        Slightly faster?                    Slightly slower?
  • 25. (true) Arrays
  • 26-27. (true) Arrays
      ● An array is a fixed-size collection in which each element is identified by a numeric index.
        index:  0     1     2     3     4     5
        value:  Data  Data  Data  Data  Data  Data
  • 28. (true) Arrays – Using SplFixedArray
        $array = new SplFixedArray(3);
        $array[0] = 1; // or $array->offsetSet()
        $array[1] = 2; // or $array->offsetSet()
        $array[2] = 3; // or $array->offsetSet()
        $array[0]; // gives 1
        $array[1]; // gives 2
        $array[2]; // gives 3
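The "fixed size" part is the main behavioural difference from a PHP array. The sketch below shows what happens when you write past the end, and that growing has to be requested explicitly with setSize(); the variable names are illustrative.

```php
<?php
$array = new SplFixedArray(3);
$array[0] = 1;
$array[1] = 2;
$array[2] = 3;

try {
    $array[3] = 4;                  // index outside the fixed size
    $outOfRange = false;
} catch (RuntimeException $e) {
    $outOfRange = true;             // SplFixedArray refuses to grow silently
}

$array->setSize(4);                 // growing must be explicit
$array[3] = 4;

var_dump($outOfRange, $array->getSize(), $array->toArray());
```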
  • 29. (true) Arrays – Pros and Cons
        Array                     SplFixedArray
        - Uses more memory        + Uses less memory
        +|- Less OO               +|- More OO
  • 30. Queues
  • 31-32. Queues
      ● A queue is an ordered collection respecting First In, First Out (FIFO) order.
      ● Elements are inserted at one end and removed at the other.
        [Diagram: a row of Data elements; Enqueue adds at one end, Dequeue removes from the other.]
  • 33. Queues – Using array
        $queue = array();
        $queue[] = 1; // or array_push()
        $queue[] = 2; // or array_push()
        $queue[] = 3; // or array_push()
        array_shift($queue); // gives 1
        array_shift($queue); // gives 2
        array_shift($queue); // gives 3
  • 34. Queues – Using SplQueue
        $queue = new SplQueue();
        $queue[] = 1; // or $queue->enqueue()
        $queue[] = 2; // or $queue->enqueue()
        $queue[] = 3; // or $queue->enqueue()
        $queue->dequeue(); // gives 1
        $queue->dequeue(); // gives 2
        $queue->dequeue(); // gives 3
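A small sketch contrasting the two approaches. It drains an SplQueue in FIFO order, then shows that array_shift() renumbers the remaining integer keys of a plain array, which is extra work on every dequeue. The element values are illustrative.

```php
<?php
$queue = new SplQueue();
$queue->enqueue("first");
$queue->enqueue("second");
$queue->enqueue("third");

$served = array();
while (!$queue->isEmpty()) {
    $served[] = $queue->dequeue();  // removes from the front
}

// Plain array: array_shift() removes the first entry and then
// re-indexes every remaining integer key.
$plain = array("first", "second", "third");
$shifted = array_shift($plain);

var_dump($served, $shifted, $plain);
```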
  • 35. Stacks
  • 36-37. Stacks
      ● A stack is an ordered collection respecting Last In, First Out (LIFO) order.
      ● Elements are inserted and removed at the same end.
        [Diagram: a pile of Data elements; Push and Pop both operate on the top.]
  • 38. Stacks – Using array
        $stack = array();
        $stack[] = 1; // or array_push()
        $stack[] = 2; // or array_push()
        $stack[] = 3; // or array_push()
        array_pop($stack); // gives 3
        array_pop($stack); // gives 2
        array_pop($stack); // gives 1
  • 39. Stacks – Using SplStack
        $stack = new SplStack();
        $stack[] = 1; // or $stack->push()
        $stack[] = 2; // or $stack->push()
        $stack[] = 3; // or $stack->push()
        $stack->pop(); // gives 3
        $stack->pop(); // gives 2
        $stack->pop(); // gives 1
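Beyond push() and pop(), SplStack also lets you peek at the top and iterate in LIFO order without consuming the stack. A short sketch, with illustrative variable names:

```php
<?php
$stack = new SplStack();
$stack->push(1);
$stack->push(2);
$stack->push(3);

$top = $stack->top();               // peek at the last pushed value

$iterated = array();
foreach ($stack as $value) {        // SplStack iterates last-in, first-out
    $iterated[] = $value;
}

$popped = array($stack->pop(), $stack->pop(), $stack->pop());

var_dump($top, $iterated, $popped, $stack->isEmpty());
```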
  • 40. Queues/Stacks – Pros and Cons
        Array                               SplQueue / SplStack
        - Uses more memory                  + Uses less memory
          (overhead / entry: 96 bytes)        (overhead / entry: 48 bytes)
        - No type hinting                   + Type hinting possible
        +|- Less OO                         +|- More OO
  • 41. Sets
      [Venn diagram labels: Geeks; Nerds; People with strong views on the distinction between geeks and nerds]
  • 42-43. Sets
      ● A set is a collection with no particular ordering, especially suited for testing the membership of a value against a collection or for performing union/intersection/complement operations between sets.
  • 44-45. Sets – Using array
        $set = array();
        // Adding elements to a set
        $set[] = 1;
        $set[] = 2;
        $set[] = 3;
        // Checking presence in a set
        in_array(2, $set); // true
        in_array(5, $set); // false
        array_merge($set1, $set2); // union
        array_intersect($set1, $set2); // intersection
        array_diff($set1, $set2); // complement
      True performance killers!
  • 46. Sets – Mis-usage
        if ($value === "val1" || $value === "val2" || $value === "val3")
        {
            // ...
        }
  • 47. Sets – Mis-usage
        if (in_array($value, array("val1", "val2", "val3")))
        {
            // ...
        }
  • 48. Sets – Mis-usage
        switch ($value)
        {
            case "val1":
            case "val2":
            case "val3":
                // ...
        }
  • 49-50. Sets – Using array (simple types)
        $set = array();
        // Adding elements to a set
        $set[1] = true; // Any dummy value is fine,
        $set[2] = true; // except NULL!
        $set[3] = true;
        // Checking presence in a set
        isset($set[2]); // true
        isset($set[5]); // false
        $set1 + $set2; // union
        array_intersect_key($set1, $set2); // intersection
        array_diff_key($set1, $set2); // complement
      ● Remember that PHP Array keys can be integers or strings only!
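The key-based approach can be tried end to end with the sketch below. The two example sets are made up for illustration. The last lines show why NULL is a poor dummy value: isset() reports false for a key whose value is NULL, while array_key_exists() still sees the key.

```php
<?php
$set1 = array(1 => true, 2 => true, 3 => true);
$set2 = array(3 => true, 4 => true);

// Membership is a hash lookup on the key
$hasTwo  = isset($set1[2]);
$hasFive = isset($set1[5]);

// Set operations work on keys; array_keys() lists the members
$union        = array_keys($set1 + $set2);
$intersection = array_keys(array_intersect_key($set1, $set2));
$complement   = array_keys(array_diff_key($set1, $set2));

// The NULL pitfall
$bad = array(7 => null);
$issetSeesIt     = isset($bad[7]);
$keyExistsSeesIt = array_key_exists(7, $bad);

var_dump($union, $intersection, $complement, $issetSeesIt, $keyExistsSeesIt);
```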
  • 51-52. Sets – Using array (objects)
        $set = array();
        // Adding elements to a set
        // (store a reference to the object as the value!)
        $set[spl_object_hash($object1)] = $object1;
        $set[spl_object_hash($object2)] = $object2;
        $set[spl_object_hash($object3)] = $object3;
        // Checking presence in a set
        isset($set[spl_object_hash($object2)]); // true
        isset($set[spl_object_hash($object5)]); // false
        $set1 + $set2; // union
        array_intersect_key($set1, $set2); // intersection
        array_diff_key($set1, $set2); // complement
  • 53. Sets – Using SplObjectStorage (objects)
        $set = new SplObjectStorage();
        // Adding elements to a set
        $set->attach($object1); // or $set[$object1] = null;
        $set->attach($object2); // or $set[$object2] = null;
        $set->attach($object3); // or $set[$object3] = null;
        // Checking presence in a set
        isset($set[$object2]); // true
        isset($set[$object5]); // false
        // The following methods modify $set1 in place
        $set1->addAll($set2); // union
        $set1->removeAllExcept($set2); // intersection
        $set1->removeAll($set2); // complement
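Because addAll(), removeAllExcept() and removeAll() change the storage they are called on, a non-destructive union or intersection needs a copy first. The sketch below uses clone for that and the array-access syntax from the slide's comments; the stdClass objects and the value true are stand-ins chosen for illustration.

```php
<?php
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();

$set1 = new SplObjectStorage();
$set1[$a] = true;
$set1[$b] = true;

$set2 = new SplObjectStorage();
$set2[$b] = true;
$set2[$c] = true;

$hasB = isset($set1[$b]);
$hasC = isset($set1[$c]);

// Work on clones so that $set1 itself stays untouched
$union = clone $set1;
$union->addAll($set2);

$intersection = clone $set1;
$intersection->removeAllExcept($set2);

$complement = clone $set1;
$complement->removeAll($set2);

var_dump(count($union), count($intersection), count($complement), count($set1));
```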
  • 54. Sets – Using QuickHash (int)
        $set = new QuickHashIntSet(64, QuickHashIntSet::CHECK_FOR_DUPES);
        // Adding elements to a set
        $set->add(1);
        $set->add(2);
        $set->add(3);
        // Checking presence in a set
        $set->exists(2); // true
        $set->exists(5); // false
        // Soonish: isset($set[2]);
      ● No union/intersection/complement operations (yet?)
      ● Yummy features like (loadFrom|saveTo)(String|File)
  • 55. Sets – Using bitsets
        // For illustration: PHP already predefines these constants with these values
        define("E_ERROR", 1); // or 1<<0
        define("E_WARNING", 2); // or 1<<1
        define("E_PARSE", 4); // or 1<<2
        define("E_NOTICE", 8); // or 1<<3
        // Adding elements to a set
        $set = 0;
        $set |= E_ERROR;
        $set |= E_WARNING;
        $set |= E_PARSE;
        // Checking presence in a set
        $set & E_ERROR; // non-zero (truthy)
        $set & E_NOTICE; // 0 (falsy)
        $set1 | $set2; // union
        $set1 & $set2; // intersection
        $set1 & ~$set2; // complement ($set1 ^ $set2 gives the symmetric difference)
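A runnable sketch of the same operations. The CAN_* flags are hypothetical names invented here to avoid clashing with PHP's predefined E_* constants. It also separates the relative complement (in the first set but not the second) from the symmetric difference that the ^ operator produces.

```php
<?php
// Hypothetical flags for illustration; each one owns a single bit.
const CAN_READ    = 1;   // 1 << 0
const CAN_WRITE   = 2;   // 1 << 1
const CAN_EXECUTE = 4;   // 1 << 2

$set1 = CAN_READ | CAN_WRITE;
$set2 = CAN_WRITE | CAN_EXECUTE;

// Membership test: compare against 0 to get a real boolean
$hasWrite   = ($set1 & CAN_WRITE) !== 0;
$hasExecute = ($set1 & CAN_EXECUTE) !== 0;

$union        = $set1 | $set2;
$intersection = $set1 & $set2;
$difference   = $set1 & ~$set2;   // in $set1 but not in $set2
$symmetric    = $set1 ^ $set2;    // in exactly one of the two sets

var_dump($union, $intersection, $difference, $symmetric);
```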
  • 56. Sets – Using bitsets (example)
      Instead of:
        function remove($path, $files = true, $directories = true, $links = true, $executable = true)
        {
            if (!$files && is_file($path)) return false;
            if (!$directories && is_dir($path)) return false;
            if (!$links && is_link($path)) return false;
            if (!$executable && is_executable($path)) return false;
            // ...
        }
        remove("/tmp/removeMe", true, false, true, false); // WTF ?!
  • 57. Sets – Using bitsets (example)
      Use:
        define("REMOVE_FILES", 1 << 0);
        define("REMOVE_DIRS", 1 << 1);
        define("REMOVE_LINKS", 1 << 2);
        define("REMOVE_EXEC", 1 << 3);
        define("REMOVE_ALL", ~0); // Setting all bits

        function remove($path, $options = REMOVE_ALL)
        {
            if ((~$options & REMOVE_FILES) && is_file($path)) return false;
            if ((~$options & REMOVE_DIRS) && is_dir($path)) return false;
            if ((~$options & REMOVE_LINKS) && is_link($path)) return false;
            if ((~$options & REMOVE_EXEC) && is_executable($path)) return false;
            // ...
        }
        remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)
  • 58. Sets: Conclusions
      ● Use the key and not the value when using PHP Arrays.
      ● Use QuickHash for sets of integers if possible.
      ● Use SplObjectStorage as soon as you are playing with objects.
      ● Don't use array_unique() when you need a set!
  • 59. Maps● A map is a collection of key/value pairs where all keys are unique.
  • 60. Maps – Using array
        $map = array();
        $map["ONE"] = 1;
        $map["TWO"] = 2;
        $map["THREE"] = 3;
        // Merging maps:
        array_merge($map1, $map2); // SLOW!
        $map2 + $map1; // Fast :)
      ● Don't use array_merge() on maps.
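Note the swapped operand order on the slide: array_merge() lets the later argument win, while + keeps the left operand's value for duplicate keys. The sketch below, with made-up example data, shows that the two give the same key/value pairs for string keys and diverge for integer keys, where array_merge() renumbers.

```php
<?php
$defaults  = array("host" => "localhost", "port" => 3306);
$overrides = array("port" => 3307);

$viaMerge = array_merge($defaults, $overrides);  // later argument wins
$viaUnion = $overrides + $defaults;              // left operand wins

// Integer keys: array_merge() renumbers from 0, + keeps the keys
$a = array(10 => "ten");
$b = array(10 => "dix", 20 => "twenty");
$merged = array_merge($a, $b);
$united = $a + $b;

var_dump($viaMerge == $viaUnion, $merged, $united);
```

The two string-keyed results hold the same pairs but in a different order, so compare them with == rather than ===.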
  • 61. Multikey Maps – Using array
        $map = array();
        $map["ONE"] = 1;
        $map["UN"] =& $map["ONE"];
        $map["UNO"] =& $map["ONE"];
        $map["TWO"] = 2;
        $map["DEUX"] =& $map["TWO"];
        $map["DUE"] =& $map["TWO"];
        $map["UNO"] = "once";
        $map["DEUX"] = "twice";
        var_dump($map);
        /*
        array(6) {
          ["ONE"] => &string(4) "once"
          ["UN"] => &string(4) "once"
          ["UNO"] => &string(4) "once"
          ["TWO"] => &string(5) "twice"
          ["DEUX"] => &string(5) "twice"
          ["DUE"] => &string(5) "twice"
        }
        */
  • 62-63. Heap
      ● A heap is a tree-based structure in which all elements are ordered, with the largest key at the top and the smallest ones as leaves (this describes a max-heap).
  • 64. Heap – Using array
        $heap = array();
        $heap[] = 3;
        sort($heap);
        $heap[] = 1;
        sort($heap);
        $heap[] = 2;
        sort($heap);
  • 65. Heap – Using Spl(Min|Max)Heap
        $heap = new SplMinHeap;
        $heap->insert(3);
        $heap->insert(1);
        $heap->insert(2);
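The slide only shows insertion; the ordering becomes visible when extracting. The sketch below drains an SplMinHeap and then defines a custom ordering by extending SplHeap. LongestFirstHeap is a hypothetical class invented for this example, and the `: int` return type assumes PHP 7 or later.

```php
<?php
$heap = new SplMinHeap();
foreach (array(3, 1, 2) as $value) {
    $heap->insert($value);
}

$smallest = $heap->top();           // peek at the root without removing it

$ordered = array();
while (!$heap->isEmpty()) {
    $ordered[] = $heap->extract();  // always removes the current root
}

// Custom ordering: the element for which compare() ranks highest is the root.
class LongestFirstHeap extends SplHeap
{
    protected function compare($a, $b): int
    {
        return strlen($a) - strlen($b);
    }
}

$words = new LongestFirstHeap();
$words->insert("kiwi");
$words->insert("banana");
$words->insert("fig");
$longest = $words->extract();

var_dump($smallest, $ordered, $longest);
```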
  • 66. Heaps: Conclusions
      ● MUCH faster than having to re-sort() an array at every insertion.
      ● If you don't require the collection to be sorted at every single step and can insert all data at once and then sort(), an array is a much better/faster approach.
      ● SplPriorityQueue is very similar: think of it as an SplHeap where the ordering is based on the key (the priority) rather than on the value.
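To make the last point concrete, here is a short SplPriorityQueue sketch. Each value is inserted together with a separate priority, and extraction returns the highest priority first. The task names and priority numbers are made up for illustration.

```php
<?php
$queue = new SplPriorityQueue();
$queue->insert("write report", 1);
$queue->insert("fix outage", 10);
$queue->insert("answer email", 5);

$order = array();
while (!$queue->isEmpty()) {
    $order[] = $queue->extract();   // highest priority comes out first
}

var_dump($order);
```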
  • 67. Bloom filters
      ● A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.
      ● False positives are possible, but false negatives are not!
  • 68. Bloom filters – Using bloomy
        // BloomFilter::__construct(int capacity [, double error_rate [, int random_seed ] ])
        $bloomFilter = new BloomFilter(10000, 0.001);
        $bloomFilter->add("An element");
        $bloomFilter->has("An element"); // true for sure
        $bloomFilter->has("Foo"); // false, most probably
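If the bloomy extension is not installed, the idea can still be explored with a toy version in plain PHP. The sketch below is not bloomy's implementation or API: ToyBloomFilter, its default sizes and the md5-based hashing are all assumptions made for illustration, and it stores one boolean per bit, so it has none of the space efficiency of a real Bloom filter.

```php
<?php
// Toy Bloom filter for illustration only; NOT the bloomy extension's API.
class ToyBloomFilter
{
    private $bits;
    private $size;
    private $hashCount;

    public function __construct($size = 1024, $hashCount = 3)
    {
        $this->size = $size;
        $this->hashCount = $hashCount;
        $this->bits = array_fill(0, $size, false);
    }

    // Derive $hashCount positions by salting the element before hashing.
    private function positions($element)
    {
        $positions = array();
        for ($i = 0; $i < $this->hashCount; $i++) {
            // 7 hex digits = 28 bits, so the value is a non-negative integer
            $hash = hexdec(substr(md5($i . ':' . $element), 0, 7));
            $positions[] = $hash % $this->size;
        }
        return $positions;
    }

    public function add($element)
    {
        foreach ($this->positions($element) as $position) {
            $this->bits[$position] = true;
        }
    }

    public function has($element)
    {
        foreach ($this->positions($element) as $position) {
            if (!$this->bits[$position]) {
                return false;       // definitely never added
            }
        }
        return true;                // probably added (may be a false positive)
    }
}

$filter = new ToyBloomFilter();
$filter->add("An element");
$filter->add("Another element");
var_dump($filter->has("An element"));
```

Every bit set by add() is still set when has() checks it, which is why a false negative cannot occur; a false positive happens when other elements happen to have set all of the checked bits.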
  • 69-71. Other related projects
      ● SPL Types: Various types implemented as objects: SplInt, SplFloat, SplEnum, SplBool and SplString
        http://pecl.php.net/package/SPL_Types
      ● Judy: Sparse dynamic arrays implementation
        http://pecl.php.net/package/Judy
      ● Weakref: Weak references implementation. Provides a gateway to an object without preventing that object from being collected by the garbage collector.
  • 72-74. Conclusions
      ● Use the appropriate data structure. It will keep your code clean and fast.
      ● Think about the time and space complexity of your algorithms.
      ● Name your variables accordingly: use “Map”, “Set”, “List”, “Queue”, ... to describe them instead of using something like: $ordersArray.
  • 75. Questions?
  • 76. Thanks
      ● Don't forget to rate this talk on http://joind.in/6371
  • 77. Photo Credits
      ● Tuned car: http://www.flickr.com/photos/gioxxswall/5783867752
      ● London Eye Structure: http://www.flickr.com/photos/photographygal123/4883546484
      ● Cigarette: http://www.flickr.com/photos/superfantastic/166215927
      ● Heap structure: http://en.wikipedia.org/wiki/File:Max-Heap.svg
      ● Drawers: http://www.flickr.com/photos/jamesclay/2312912612
      ● Stones stack: http://www.flickr.com/photos/silent_e/2282729987
      ● Tree: http://www.flickr.com/photos/drewbandy/6002204996