PHP Data Structures – Beyond SPL (online version)

Presentation on the Trie data structure, showing how it works, how it's used and what it can be used for, and an implementation of Tries in PHP... with occasional references to Rugby League

Example code to go with the slides can be found at https://github.com/MarkBaker/Tries and https://github.com/MarkBaker/QuadTrees

Published in: Technology

  1. PHP Data Structures – Beyond SPL
     (background image: A dreamscape made from random noise. Illustration: Google)
  2. Data Structures
     A data structure is a particular way of organizing data in a computer so that it can be used efficiently. Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks.
  3. Data Structures in PHP
     • Some basic data structures available in PHP’s SPL:
        • Stack
        • Queue
        • Heap
        • Doubly-Linked List
        • Fixed Array
        • SPL Object Storage
     • SPL is the Standard PHP Library
        • (Yet another recursive acronym)
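All six SPL structures listed on slide 3 ship with PHP itself, so they can be tried without installing anything. A minimal sketch touching each one. The values are arbitrary, and SplMinHeap stands in for the heap family (SPL also provides SplMaxHeap and SplPriorityQueue):

```php
<?php
// Stack: last in, first out
$stack = new SplStack();
$stack->push('a');
$stack->push('b');
echo $stack->pop(), PHP_EOL;        // b

// Queue: first in, first out
$queue = new SplQueue();
$queue->enqueue('a');
$queue->enqueue('b');
echo $queue->dequeue(), PHP_EOL;    // a

// Heap: SplMinHeap always yields the smallest value first
$heap = new SplMinHeap();
foreach ([5, 1, 3] as $n) {
    $heap->insert($n);
}
echo $heap->extract(), PHP_EOL;     // 1

// Doubly-linked list: cheap additions and removals at both ends
$list = new SplDoublyLinkedList();
$list->push(2);
$list->unshift(1);
echo implode(',', iterator_to_array($list)), PHP_EOL;  // 1,2

// Fixed array: integer indexes only, size set up front
$fixed = new SplFixedArray(3);
$fixed[0] = 'x';
echo $fixed->getSize(), PHP_EOL;    // 3

// Object storage: a set of objects, each with optional attached data
$storage = new SplObjectStorage();
$obj = new stdClass();
$storage[$obj] = 'metadata';
echo count($storage), PHP_EOL;      // 1
```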
  4. Data Structures
     • Some additional data structures that don’t exist in core PHP:
        • Tries
        • QuadTrees
  5. Tries
  6. Tries
     • A tree structure comprising a hierarchy of “indexed” nodes
     • Each node can contain:
        • A series of pointers (keys) to the next node in the hierarchy
        • A bucket for data values
           • This allows for multiple values with the same key
     • There are three basic types of Tries:
        • Tries
        • Radix Tries
        • Suffix Tries
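The node layout described on slide 6 can be sketched in a few lines of PHP. This is a minimal illustration, not code from the MarkBaker/Tries repository: `TrieNode` and `trieInsert()` are hypothetical names, children are keyed by a single byte, and the bucket is a plain array so that one key can hold several values:

```php
<?php
// Hypothetical node class for illustration; not code from the MarkBaker/Tries repository.
class TrieNode
{
    public array $children = [];  // pointers to the next level, keyed by one character (byte)
    public array $values = [];    // bucket for data values stored under this node's key
}

// Follow (and create where needed) one node per character, then add the value to the bucket.
function trieInsert(TrieNode $root, string $key, $value): void
{
    $node = $root;
    for ($i = 0, $n = strlen($key); $i < $n; $i++) {
        $char = $key[$i];
        if (!isset($node->children[$char])) {
            $node->children[$char] = new TrieNode();
        }
        $node = $node->children[$char];
    }
    $node->values[] = $value;     // appended, so one key can hold several values
}

$root = new TrieNode();
trieInsert($root, 'cat', 'first cat');
trieInsert($root, 'cat', 'second cat');

$catNode = $root->children['c']->children['a']->children['t'];
print_r($catNode->values);        // both values live in the bucket for "cat"
```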
  7. Tries – Purpose
     • Fast lookup with a partial key
     • Example implementation: https://github.com/MarkBaker/Tries
  8. Tries – Uses
     • Replacement for PHP arrays (hash maps)
        • No key collisions
        • Duplicate keys supported
        • No hashing function required
     • Partial key lookups
        • Predictive text
        • Autocomplete
        • Spell-checking
        • Hyphen-isation
  9. Tries – Methods
     • add($key, $value = null): adds new data to a Trie
     • search($prefix): finds data in a Trie
     • delete($key)
     • isNode($key)
     • isMember($key)
  10. Tries – Basic Trie
     • Node pointers comprise a single character or byte
  11. Tries – Basic Trie
        $trie = new Trie();
        $trie->add('cat', 'cat data');
     (diagram: a single path C → A → T)
  12. Tries – Basic Trie
        $trie = new Trie();
        $trie->add('cat', 'cat data');
        $trie->add('car', 'car data');
     (diagram: C → A, branching to T and R)
  13. Tries – Basic Trie
        $trie = new Trie();
        $trie->add('cat', 'cat data');
        $trie->add('car', 'car data');
        $trie->add('cart', 'cart data');
     (diagram: C → A, branching to T and R, with R → T)
  14. Tries – Basic Trie
        $trie = new Trie();
        $trie->add('cat', 'cat data');
        $trie->add('car', 'car data');
        $trie->add('cart', 'cart data');
        $trie->search('car');
     (diagram: the search follows the path C → A → R)
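The walkthrough on slides 11 to 14 can be reproduced with a small self-contained class. This is a sketch under stated assumptions rather than the repository's implementation: `SimpleTrie` is a hypothetical name, the method names follow slide 9, but the behaviour of `isNode()` (the path exists), `isMember()` (data is stored at exactly that key) and `delete()` (clear the bucket and prune empty nodes) is assumed. `search()` returns every key that begins with the prefix, rebuilding each key from the path as slide 15 describes:

```php
<?php
// Hypothetical classes for illustration only: method names follow slide 9, but the
// behaviour of isNode(), isMember() and delete() is an assumption.
final class SimpleTrieNode
{
    public array $children = [];  // one character (byte) => SimpleTrieNode
    public array $values = [];    // data bucket; several values may share a key
}

final class SimpleTrie
{
    private SimpleTrieNode $root;

    public function __construct()
    {
        $this->root = new SimpleTrieNode();
    }

    public function add(string $key, $value = null): void
    {
        $node = $this->root;
        for ($i = 0, $n = strlen($key); $i < $n; $i++) {
            if (!isset($node->children[$key[$i]])) {
                $node->children[$key[$i]] = new SimpleTrieNode();
            }
            $node = $node->children[$key[$i]];
        }
        $node->values[] = $value;
    }

    // Assumed: true if $key is a path in the trie (a stored key or a prefix of one).
    public function isNode(string $key): bool
    {
        return $this->findNode($key) !== null;
    }

    // Assumed: true only if data is stored under exactly $key.
    public function isMember(string $key): bool
    {
        $node = $this->findNode($key);
        return $node !== null && $node->values !== [];
    }

    // Every key beginning with $prefix, as [key => values].
    public function search(string $prefix): array
    {
        $results = [];
        $node = $this->findNode($prefix);
        if ($node !== null) {
            $this->collect($node, $prefix, $results);
        }
        return $results;
    }

    // Assumed: clear the bucket for $key and prune nodes left with no data and no children.
    public function delete(string $key): void
    {
        $this->prune($this->root, $key, 0);
    }

    private function findNode(string $key): ?SimpleTrieNode
    {
        $node = $this->root;
        for ($i = 0, $n = strlen($key); $i < $n && $node !== null; $i++) {
            $node = $node->children[$key[$i]] ?? null;
        }
        return $node;
    }

    // Depth-first walk; each key is rebuilt from the path, never stored in a node.
    private function collect(SimpleTrieNode $node, string $key, array &$results): void
    {
        if ($node->values !== []) {
            $results[$key] = $node->values;
        }
        foreach ($node->children as $char => $child) {
            $this->collect($child, $key . $char, $results);
        }
    }

    // Returns true when $node is now empty, so the caller can unlink it.
    private function prune(SimpleTrieNode $node, string $key, int $depth): bool
    {
        if ($depth === strlen($key)) {
            $node->values = [];
        } elseif (isset($node->children[$key[$depth]])
            && $this->prune($node->children[$key[$depth]], $key, $depth + 1)) {
            unset($node->children[$key[$depth]]);
        }
        return $node->values === [] && $node->children === [];
    }
}

$trie = new SimpleTrie();
$trie->add('cat', 'cat data');
$trie->add('car', 'car data');
$trie->add('cart', 'cart data');

print_r($trie->search('car'));    // the data for both "car" and "cart"
var_dump($trie->isNode('ca'));    // bool(true): "ca" is a path in the trie
var_dump($trie->isMember('ca'));  // bool(false): nothing is stored at "ca"
```

Because `search('car')` starts from the node reached by C → A → R, it never visits the T branch under C → A; the cost of a lookup depends on the prefix length and the size of the matching subtree, not on how many keys the Trie holds.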
  15. Tries – Basic Trie
     • The key to a data node is inherent in the path to that node, so it is not necessary to store the key
  16. Tries – Radix Trie
     • Node pointers comprise one or more characters or bytes
     • This means a Radix Trie can be more compact and memory-efficient than a basic Trie
     • It can add more overhead to building the Trie
     • It may be faster to search the Trie hierarchy
  17. Tries – Radix Trie
        $radixTrie = new RadixTrie();
        $radixTrie->add('cat', 'cat data');
     (diagram: a single node, CAT)
  18. Tries – Radix Trie
        $radixTrie = new RadixTrie();
        $radixTrie->add('cat', 'cat data');
        $radixTrie->add('car', 'car data');
     (diagram: CA, branching to T and R)
  19. Tries – Radix Trie
        $radixTrie = new RadixTrie();
        $radixTrie->add('cat', 'cat data');
        $radixTrie->add('car', 'car data');
        $radixTrie->add('cart', 'cart data');
     (diagram: CA, branching to T and R, with R → T)
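The node splitting on slides 17 to 19 is where a Radix Trie's extra build overhead comes from: adding “car” has to split the existing CAT edge into CA, with children T and R. A minimal sketch of that insertion, assuming byte-wise comparison; `RadixNode`, `radixInsert()` and `radixDump()` are hypothetical names, and `*` in the dump marks a node that holds data:

```php
<?php
// Hypothetical sketch for illustration; not the RadixTrie class from the repository.
final class RadixNode
{
    public array $edges = [];   // edge label (one or more characters) => RadixNode
    public array $values = [];  // data bucket for the key that ends at this node
}

// Number of leading bytes that $a and $b have in common.
function commonPrefixLength(string $a, string $b): int
{
    $max = min(strlen($a), strlen($b));
    $i = 0;
    while ($i < $max && $a[$i] === $b[$i]) {
        $i++;
    }
    return $i;
}

function radixInsert(RadixNode $node, string $key, $value): void
{
    if ($key === '') {
        $node->values[] = $value;           // the whole key has been consumed
        return;
    }
    foreach ($node->edges as $label => $child) {
        $label = (string) $label;           // PHP stores numeric-string keys as ints
        $common = commonPrefixLength($label, $key);
        if ($common === 0) {
            continue;                       // sibling edges never share a first character
        }
        if ($common < strlen($label)) {
            // Split the edge, e.g. "cat" + "car": "cat" becomes "ca" with child edge "t"
            $middle = new RadixNode();
            $middle->edges[substr($label, $common)] = $child;
            unset($node->edges[$label]);
            $node->edges[substr($label, 0, $common)] = $middle;
            $child = $middle;
        }
        radixInsert($child, substr($key, $common), $value);
        return;
    }
    $leaf = new RadixNode();                // no edge shares a prefix: add one new edge
    $leaf->values[] = $value;               // labelled with the rest of the key
    $node->edges[$key] = $leaf;
}

// Print each edge label, indented by depth; "*" marks a node that holds data.
function radixDump(RadixNode $node, string $indent = ''): string
{
    $out = '';
    foreach ($node->edges as $label => $child) {
        $out .= $indent . $label . ($child->values !== [] ? ' *' : '') . "\n";
        $out .= radixDump($child, $indent . '  ');
    }
    return $out;
}

$root = new RadixNode();
radixInsert($root, 'cat', 'cat data');
radixInsert($root, 'car', 'car data');
radixInsert($root, 'cart', 'cart data');
echo radixDump($root);
// ca
//   t *
//   r *
//     t *
```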
  20. Tries – Suffix Trie
        $suffixTrie = new SuffixTrie();
        $suffixTrie->add('cat', 'cat data');
     (diagram: C → A → T)
  21. Tries – Suffix Trie
        $suffixTrie = new SuffixTrie();
        $suffixTrie->add('cat', 'cat data');
     (diagram: the paths C → A → T, A → T and T, one for each suffix of “cat”)
  22. Tries – Suffix Trie
        $suffixTrie = new SuffixTrie();
        $suffixTrie->add('cat', 'cat data');
        $suffixTrie->search('at');
     (diagram: the search follows the path A → T)
  23. Tries – Suffix Tries
     • Memory hungry
        • Up to n + (n−1) + (n−2) + … + 2 + 1 nodes (where n is the key length) are used for every key/value stored in a Suffix Trie
     • Slow to populate
     • Can be used to search for “contains” rather than simply “begins with”
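The node count on slide 23 is a triangular number: a key of length n has suffixes of length n, n−1, …, 1, and in the worst case, when no two suffixes share any nodes, each one needs its own chain:

```latex
\sum_{k=1}^{n} k \;=\; n + (n-1) + (n-2) + \dots + 2 + 1 \;=\; \frac{n(n+1)}{2}
```

For “cat” (n = 3) that is 3 + 2 + 1 = 6 nodes, matching the C → A → T, A → T and T paths on slide 21, while a 10-character key may need up to 55, so memory grows quadratically with key length.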
  24. Tries – Suffix Tries
     • It is necessary to store the key with the data
     • A search can return duplicate values
        • e.g. “banana” if we search for “a” or “n” or even “ana”
     • Data should only be stored once for the “full word”, and subsequent sequences should only store a pointer to that data
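Slide 24's advice (store the data once per full word and let every suffix point back to it) can be sketched as follows. `SuffixNode` and `SuffixTrieSketch` are hypothetical names, this is not the repository's SuffixTrie, and it keeps one value per key for simplicity. Each suffix of a key ends at a node holding a pointer (the full key), and `search()` gathers pointers from the whole matching subtree into a set, so “banana” comes back once even though “a” occurs in it three times:

```php
<?php
// Hypothetical sketch for illustration; not the SuffixTrie class from the repository.
final class SuffixNode
{
    public array $children = [];  // one character (byte) => SuffixNode
    public array $keys = [];      // pointers to stored data: full key => true
}

final class SuffixTrieSketch
{
    private SuffixNode $root;
    private array $data = [];     // full key => data, stored once per word

    public function __construct()
    {
        $this->root = new SuffixNode();
    }

    public function add(string $key, $value = null): void
    {
        $this->data[$key] = $value;
        // Insert every suffix of the key: "cat", "at", "t"
        for ($start = 0, $n = strlen($key); $start < $n; $start++) {
            $node = $this->root;
            for ($i = $start; $i < $n; $i++) {
                if (!isset($node->children[$key[$i]])) {
                    $node->children[$key[$i]] = new SuffixNode();
                }
                $node = $node->children[$key[$i]];
            }
            $node->keys[$key] = true;  // the suffix's end node points back to the full key
        }
    }

    // "Contains" search: every stored key that has $pattern somewhere inside it.
    public function search(string $pattern): array
    {
        $node = $this->root;
        for ($i = 0, $n = strlen($pattern); $i < $n; $i++) {
            if (!isset($node->children[$pattern[$i]])) {
                return [];
            }
            $node = $node->children[$pattern[$i]];
        }
        $keys = [];
        $this->collectKeys($node, $keys);
        $results = [];
        foreach (array_keys($keys) as $key) {
            $results[$key] = $this->data[$key];  // each key's data is looked up once
        }
        return $results;
    }

    // Pointers from this node and all of its descendants; because the key is the array
    // index, "banana" is only recorded once however many of its suffixes match.
    private function collectKeys(SuffixNode $node, array &$keys): void
    {
        $keys += $node->keys;
        foreach ($node->children as $child) {
            $this->collectKeys($child, $keys);
        }
    }
}

$trie = new SuffixTrieSketch();
$trie->add('cat', 'cat data');
$trie->add('banana', 'banana data');

print_r($trie->search('at'));   // only "cat"
print_r($trie->search('ana'));  // "banana", once
```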
  25. QuadTrees
  26. QuadTrees
     • A tree structure that partitions a 2-dimensional space by recursively subdividing it into quadrants (or regions)
     • Each node can contain:
        • A series of pointers (keys) to the next node in the hierarchy
        • A bucket for data values
     • There are different types of QuadTrees:
        • Point QuadTrees
        • Region QuadTrees
        • Edge QuadTrees
        • Polygonal Map (PM) QuadTrees
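Each level of a QuadTree splits its region at the midpoint into four children, so the core step of both insert and search is choosing the child quadrant. A minimal sketch, assuming x is longitude and y is latitude, and that points lying exactly on a dividing line go north/east (an arbitrary tie-break); `quadrantFor()` is a hypothetical helper:

```php
<?php
// Hypothetical helper for illustration: given a node's bounding box, decide which of its
// four child quadrants a point belongs to. x grows to the east, y grows to the north.
function quadrantFor(float $x, float $y, float $minX, float $minY, float $maxX, float $maxY): string
{
    $midX = ($minX + $maxX) / 2;
    $midY = ($minY + $maxY) / 2;
    $vertical = $y >= $midY ? 'N' : 'S';    // points on a dividing line go north / east
    $horizontal = $x >= $midX ? 'E' : 'W';
    return $vertical . $horizontal;         // "NW", "NE", "SW" or "SE"
}

// The whole world as (longitude, latitude): the root node splits at (0, 0).
echo quadrantFor(-0.1275, 51.5072, -180, -90, 180, 90), PHP_EOL;  // NW (London)
echo quadrantFor(11.5667, 48.1333, -180, -90, 180, 90), PHP_EOL;  // NE (Munich)

// One level down, inside the north-east quadrant (0..180, 0..90), which splits at (90, 45):
echo quadrantFor(11.5667, 48.1333, 0, 0, 180, 90), PHP_EOL;       // NW (Munich)
```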
  27. QuadTrees – Purpose
     • Fast geo-spatial or graph lookup
     • Sparse data compression
     • Example implementation: https://github.com/MarkBaker/QuadTrees
  28. QuadTrees – Uses
     • Spatial indexing
     • Storing sparse data, e.g.
        • Spreadsheet format data
        • Pixel data in images
     • Collision detection
     • Points within a field of vision
  29. QuadTrees – Methods
     • insert($xyCoordinate, $value = null): adds new data to a QuadTree
     • search($boundingBox): finds data in a QuadTree
  30. QuadTrees – Point QuadTree
     • Used for spatial indexing
  31. QuadTrees – Spatial Indexing
        $quadTree = new QuadTree(
            -180, 90, 180, -90, // Dimensions
            3                   // Bucket size
        );
     (diagram: an empty world map, longitude -180 to 180 and latitude -90 to 90)
  32. QuadTrees – Spatial Indexing
        $quadTree = new QuadTree(
            -180, 90, 180, -90, // Dimensions
            3                   // Bucket size
        );
        $quadTree->add('London', 51.5072, -0.1275);
        $quadTree->add('New York', 40.7127, -74.0059);
        $quadTree->add('Paris', 48.8567, 2.3508);
     (diagram: the three cities plotted on the undivided world map)
  33. QuadTrees – Spatial Indexing
        $quadTree = new QuadTree(
            -180, 90, 180, -90, // Dimensions
            3                   // Bucket size
        );
        $quadTree->add('London', 51.5072, -0.1275);
        $quadTree->add('New York', 40.7127, -74.0059);
        $quadTree->add('Paris', 48.8567, 2.3508);
        $quadTree->add('Munich', 48.1333, 11.5667);
        $quadTree->add('Dublin', 53.3478, -6.2597);
        $quadTree->add('Rome', 41.9000, 12.5000);
        $quadTree->add('Athens', 37.9667, 23.7167);
     (diagram: the world map, now subdivided into quadrants)
  34. QuadTrees – Spatial Indexing
        $quadTree = new QuadTree(
            -180, 90, 180, -90, // Dimensions
            3                   // Bucket size
        );
        $quadTree->add('London', 51.5072, -0.1275);
        $quadTree->add('New York', 40.7127, -74.0059);
        $quadTree->add('Paris', 48.8567, 2.3508);
        $quadTree->add('Munich', 48.1333, 11.5667);
        $quadTree->add('Dublin', 53.3478, -6.2597);
        $quadTree->add('Rome', 41.9000, 12.5000);
        $quadTree->add('Athens', 37.9667, 23.7167);
        $quadTree->add('Amsterdam', 52.3667, 4.9000);
     (diagram: the quadrant map updated after adding Amsterdam)
  35. QuadTrees – Spatial Indexing
        $quadTree = new QuadTree(
            -180, 90, 180, -90, // Dimensions
            3                   // Bucket size
        );
        $quadTree->add('London', 51.5072, -0.1275);
        $quadTree->add('New York', 40.7127, -74.0059);
        $quadTree->add('Paris', 48.8567, 2.3508);
        $quadTree->add('Munich', 48.1333, 11.5667);
        $quadTree->add('Dublin', 53.3478, -6.2597);
        $quadTree->add('Rome', 41.9000, 12.5000);
        $quadTree->add('Athens', 37.9667, 23.7167);
        $quadTree->add('Amsterdam', 52.3667, 4.9000);
        …
        // Search QuadTree for Northern Europe
        $quadTree->find(
            -15.0, 60.0, 25.0, 45.0
        );
     (diagram: the Northern Europe search area overlaid on the subdivided quadrants)
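The city example on slides 31 to 35 can be reproduced with a small point QuadTree. This is a sketch, not the repository's QuadTree class: `PointQuadTree` is a hypothetical name; it uses the `insert()`/`search()` names from slide 29 rather than the `add()`/`find()` calls shown on the slides; it takes coordinates as (x = longitude, y = latitude) with boxes as [minX, minY, maxX, maxY]; it places Dublin west of Greenwich; and it caps subdivision at depth 16 as an arbitrary guard against many identical points. The Northern Europe box is the one from slide 35 (longitude -15 to 25, latitude 45 to 60):

```php
<?php
// Hypothetical point QuadTree for illustration; not the QuadTree class from the repository.
// Coordinates are (x = longitude, y = latitude); boxes are [minX, minY, maxX, maxY].
final class PointQuadTree
{
    private array $bounds;
    private int $bucketSize;
    private int $depth;
    private array $points = [];       // bucket of [x, y, value] entries
    private ?array $children = null;  // four child nodes once this node has split

    public function __construct(array $bounds, int $bucketSize, int $depth = 0)
    {
        $this->bounds = $bounds;
        $this->bucketSize = $bucketSize;
        $this->depth = $depth;
    }

    public function insert(float $x, float $y, $value = null): bool
    {
        [$minX, $minY, $maxX, $maxY] = $this->bounds;
        if ($x < $minX || $x > $maxX || $y < $minY || $y > $maxY) {
            return false;                 // outside this node's region
        }
        if ($this->children === null) {
            // Depth cap of 16 is an arbitrary guard against many identical points.
            if (count($this->points) < $this->bucketSize || $this->depth >= 16) {
                $this->points[] = [$x, $y, $value];
                return true;
            }
            $this->split();               // bucket full: subdivide, then fall through
        }
        foreach ($this->children as $child) {
            if ($child->insert($x, $y, $value)) {
                return true;
            }
        }
        return false;
    }

    // Every value whose point lies inside $box = [minX, minY, maxX, maxY].
    public function search(array $box): array
    {
        [$minX, $minY, $maxX, $maxY] = $this->bounds;
        if ($box[0] > $maxX || $box[2] < $minX || $box[1] > $maxY || $box[3] < $minY) {
            return [];                    // no overlap: skip this whole branch
        }
        $found = [];
        foreach ($this->points as [$x, $y, $value]) {
            if ($x >= $box[0] && $x <= $box[2] && $y >= $box[1] && $y <= $box[3]) {
                $found[] = $value;
            }
        }
        foreach ($this->children ?? [] as $child) {
            $found = array_merge($found, $child->search($box));
        }
        return $found;
    }

    private function split(): void
    {
        [$minX, $minY, $maxX, $maxY] = $this->bounds;
        $midX = ($minX + $maxX) / 2;
        $midY = ($minY + $maxY) / 2;
        $this->children = [];
        foreach ([[$minX, $midY, $midX, $maxY], [$midX, $midY, $maxX, $maxY],
                  [$minX, $minY, $midX, $midY], [$midX, $minY, $maxX, $midY]] as $box) {
            $this->children[] = new self($box, $this->bucketSize, $this->depth + 1);
        }
        $points = $this->points;          // move the bucket's points down a level
        $this->points = [];
        foreach ($points as [$x, $y, $value]) {
            $this->insert($x, $y, $value);
        }
    }
}

$quadTree = new PointQuadTree([-180, -90, 180, 90], 3);
$cities = [
    'London' => [-0.1275, 51.5072],  'New York' => [-74.0059, 40.7127],
    'Paris' => [2.3508, 48.8567],    'Munich' => [11.5667, 48.1333],
    'Dublin' => [-6.2597, 53.3478],  'Rome' => [12.5000, 41.9000],
    'Athens' => [23.7167, 37.9667],  'Amsterdam' => [4.9000, 52.3667],
];
foreach ($cities as $name => [$lon, $lat]) {
    $quadTree->insert($lon, $lat, $name);
}

// Northern Europe: longitude -15 to 25, latitude 45 to 60
$northernEurope = $quadTree->search([-15.0, 45.0, 25.0, 60.0]);
sort($northernEurope);
echo implode(', ', $northernEurope), PHP_EOL;  // Amsterdam, Dublin, London, Munich, Paris
```

The search skips any branch whose bounding box does not overlap the query box, so here the two southern-hemisphere quadrants are never visited at all.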
  36. QuadTrees – Spatial Indexing
     • The top-level node need not be limited to the maximum graph space (i.e. the whole world)
  37. QuadTrees – Spatial Indexing
  38. QuadTrees – Spatial Indexing
     • With a larger bucket size:
        • The QuadTree is smaller, with fewer nodes using less memory
        • More points need checking in each node
        • Faster to insert / slower to search
     • With a smaller bucket size:
        • The QuadTree uses more memory
        • Fewer points in each node to check
        • Slower to insert / faster to search
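The memory side of the bucket-size trade-off on slide 38 is easy to measure. A minimal sketch, assuming 1,000 uniformly random points over the world map and a hypothetical `countNodes()` helper that rebuilds the tree for each bucket size, with an arbitrary depth cap of 16 as a guard against duplicate points:

```php
<?php
// Hypothetical helper for illustration: build a point QuadTree over $points (each [x, y])
// and return how many nodes it needs; a node splits when it holds more than $bucketSize points.
function countNodes(array $points, array $bounds, int $bucketSize, int $depth = 0): int
{
    if (count($points) <= $bucketSize || $depth >= 16) {  // depth cap guards against duplicates
        return 1;                                         // a leaf: one node holding the bucket
    }
    [$minX, $minY, $maxX, $maxY] = $bounds;
    $midX = ($minX + $maxX) / 2;
    $midY = ($minY + $maxY) / 2;
    $quadrants = [[], [], [], []];                        // SW, SE, NW, NE
    foreach ($points as [$x, $y]) {
        $index = ($x >= $midX ? 1 : 0) + ($y >= $midY ? 2 : 0);
        $quadrants[$index][] = [$x, $y];
    }
    $childBounds = [
        [$minX, $minY, $midX, $midY], [$midX, $minY, $maxX, $midY],
        [$minX, $midY, $midX, $maxY], [$midX, $midY, $maxX, $maxY],
    ];
    $nodes = 1;                                           // this internal node
    foreach ($quadrants as $i => $quadrantPoints) {
        $nodes += countNodes($quadrantPoints, $childBounds[$i], $bucketSize, $depth + 1);
    }
    return $nodes;
}

mt_srand(42);  // fixed seed so every run uses the same points
$points = [];
for ($i = 0; $i < 1000; $i++) {
    $points[] = [mt_rand(-18000, 18000) / 100, mt_rand(-9000, 9000) / 100];
}
$world = [-180, -90, 180, 90];
foreach ([1, 4, 16, 64] as $bucketSize) {
    printf("bucket size %2d: %4d nodes\n", $bucketSize, countNodes($points, $world, $bucketSize));
}
```

Any node that has to split with a large bucket also splits with a smaller one, so the node count can only grow as the bucket shrinks; the price of the larger bucket is the longer scan of each leaf's points during a search.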
  39. QuadTrees – Region QuadTree
     • Used for sparse-data compression
     • Used for level-based aggregations
  40. QuadTrees – Image Compression
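A Region QuadTree compresses an image by storing one value for every square that is a single colour, and subdividing only where colours differ. A minimal sketch for a black-and-white image whose side is a power of two; `buildRegion()` and `countLeaves()` are hypothetical names, and pixels are assumed to be a plain array of rows of 0/1 values:

```php
<?php
// Hypothetical sketch for illustration: a uniform square becomes a single leaf (its colour);
// anything else is split into four quadrants, returned in the order NW, NE, SW, SE.
function buildRegion(array $pixels, int $row, int $col, int $size)
{
    $first = $pixels[$row][$col];
    $uniform = true;
    for ($r = $row; $r < $row + $size && $uniform; $r++) {
        for ($c = $col; $c < $col + $size; $c++) {
            if ($pixels[$r][$c] !== $first) {
                $uniform = false;
                break;
            }
        }
    }
    if ($uniform) {
        return $first;                                            // leaf: one colour
    }
    $half = intdiv($size, 2);
    return [
        buildRegion($pixels, $row, $col, $half),                  // NW
        buildRegion($pixels, $row, $col + $half, $half),          // NE
        buildRegion($pixels, $row + $half, $col, $half),          // SW
        buildRegion($pixels, $row + $half, $col + $half, $half),  // SE
    ];
}

// Number of leaves, i.e. how many colour values the compressed form stores.
function countLeaves($node): int
{
    return is_array($node) ? array_sum(array_map('countLeaves', $node)) : 1;
}

$image = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
];
$tree = buildRegion($image, 0, 0, 8);
echo countLeaves($tree), " leaves instead of 64 pixels", PHP_EOL;  // 7 leaves instead of 64 pixels
```

The more uniform the image, the fewer leaves are needed; a checkerboard needs one leaf per pixel plus all the internal nodes, so this kind of compression only pays off for sparse or blocky data.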
  41. QuadTrees
     • The same principles can be applied to 3-dimensional space using an Octree
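An Octree applies the QuadTree idea to three dimensions: each node splits its box at the centre along x, y and z, giving 2 × 2 × 2 = 8 children. Choosing the child for a point takes one comparison per axis, and each comparison can supply one bit of the child index. A minimal sketch; `octantIndex()` is a hypothetical helper and the bit order is an arbitrary choice:

```php
<?php
// Hypothetical helper for illustration: index (0 to 7) of the octant containing $point,
// relative to the node's centre. Points on a dividing plane go to the "+" side.
function octantIndex(array $point, array $centre): int
{
    [$x, $y, $z] = $point;
    [$cx, $cy, $cz] = $centre;
    return ($x >= $cx ? 1 : 0)    // bit 0: +x half
         | ($y >= $cy ? 2 : 0)    // bit 1: +y half
         | ($z >= $cz ? 4 : 0);   // bit 2: +z half
}

echo octantIndex([1, 1, 1], [0, 0, 0]), PHP_EOL;     // 7
echo octantIndex([-1, -1, -1], [0, 0, 0]), PHP_EOL;  // 0
echo octantIndex([5, -2, 3], [0, 0, 0]), PHP_EOL;    // 5
```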
  42. PHP Data Structures – Beyond SPL
     Questions?
     (background image: A dreamscape made from random noise. Illustration: Google)
  43. Who am I?
     Mark Baker
     Design and Development Manager, InnovEd (Innovative Solutions for Education) Learning Ltd
     Coordinator and Developer of:
        • Open Source PHPOffice library: PHPExcel, PHPWord, PHPPowerPoint, PHPProject, PHPVisio
     Minor contributor to PHP core
     Other small open source libraries available on GitHub
     @Mark_Baker
     https://github.com/MarkBaker
     http://uk.linkedin.com/pub/mark-baker/b/572/171
