A presentation on the trie data structure: how it works, what it can be used for, and an implementation of tries in PHP, with occasional references to Rugby League.
Example code to go with the slides can be found at https://github.com/MarkBaker/Tries
2. Who am I?
Mark Baker
Design and Development Manager
InnovEd (Innovative Solutions for Education) Ltd
Coordinator and Developer of:
Open Source PHPOffice library
PHPExcel, PHPWord, PHPPowerPoint, PHPProject, PHPVisio
Minor contributor to PHP core
@Mark_Baker
https://github.com/MarkBaker
http://uk.linkedin.com/pub/mark-baker/b/572/171
3. Tries – What is a Trie?
A trie is a key/value data structure that stores
the information about the key of each node
in the path from the root to the node, rather
than in the node itself.
Each path between the root and a leaf
represents a key.
Each transition between two nodes is
labelled with a single character from a key.
Typically (though not always) the keys will be
string values.
4. Tries – What is a Trie?
A special marker on each node indicates
whether or not it represents the end of a key.
An “end” node may still have child nodes.
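To make the idea of "the key lives in the path" concrete, here is a minimal sketch using plain nested arrays rather than the classes shown later in the slides. The function names (makeNode, insertKey, hasKey) and the 'end' / 'children' array layout are assumptions made for illustration only; keys are assumed to be non-empty strings.

```php
<?php
// Hypothetical illustration: each node is an array holding an end-of-key
// marker and its children, indexed by the next character of the key.
function makeNode(bool $end = false): array {
    return ['end' => $end, 'children' => []];
}

// Walk (and create) the path for $key, then mark the last node as an end node.
function insertKey(array &$root, string $key): void {
    $node = &$root;
    foreach (str_split($key) as $char) {
        if (!isset($node['children'][$char])) {
            $node['children'][$char] = makeNode();
        }
        $node = &$node['children'][$char];
    }
    $node['end'] = true;
}

// A key is present only if its path exists AND the final node is an end node.
function hasKey(array $root, string $key): bool {
    $node = $root;
    foreach (str_split($key) as $char) {
        if (!isset($node['children'][$char])) {
            return false;
        }
        $node = $node['children'][$char];
    }
    return $node['end'];
}

$root = makeNode();
insertKey($root, 'Hall');
insertKey($root, 'Halliwell');

var_dump(hasKey($root, 'Hall'));      // bool(true): an end node that still has children
var_dump(hasKey($root, 'Halli'));     // bool(false): the path exists, but no key ends here
var_dump(hasKey($root, 'Halliwell')); // bool(true): a leaf
```

Note that no node stores the string "Hall" anywhere; the key is recovered only by following the characters H, a, l, l from the root.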
5. Tries – What is a Trie?
Methods:
◦ Insert(key, value)
◦ Delete(key)
◦ Search(key)
6. Tries – What is a Trie?
Used for:
◦ Dictionary lookups
◦ Predictive text / Autocomplete
◦ Spell checkers
◦ DNA sequencing
◦ Burst Sort
7. Conversions – Tries in PHP
class TrieNode {
    /**
     * Array of child nodes indexed by next character
     *
     * @var TrieNode[]
     */
    public $children = array();

    /**
     * Flag indicating if this node is an end node
     *
     * @var boolean
     */
    public $valueNode = false;

    /**
     * Data value (empty unless this is an end node)
     *
     * @var mixed
     */
    public $value;
}
10. Conversions – Tries in PHP
class Trie {
    /**
     * Adds a new entry to the Trie
     * If the specified node already exists, then its value will be overwritten
     *
     * @param mixed $key Key for this node entry
     * @param mixed $value Data Value for this node entry
     * @return null
     */
    public function add($key, $value = null) {
        $trieNodeEntry = $this->getTrieNodeByKey($key, true);
        $trieNodeEntry->valueNode = true;
        $trieNodeEntry->value = $value;
    }
}
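The add() method relies on getTrieNodeByKey(), which is not shown on this slide. The following is one possible sketch of that helper, not necessarily the version in the repository: it assumes a private $root node, single-byte string keys, and a false return value when the path does not exist. The get() method is a hypothetical addition, included only so the example can be run and checked.

```php
<?php
class TrieNode {
    public $children = array();
    public $valueNode = false;
    public $value;
}

class Trie {
    /** @var TrieNode Assumed root node, representing the empty key */
    private $root;

    public function __construct() {
        $this->root = new TrieNode();
    }

    /**
     * Assumed helper: walk the key one character (byte) at a time.
     * With $create = true, missing nodes are created along the way;
     * otherwise false is returned as soon as the path runs out.
     */
    private function getTrieNodeByKey($key, $create = false) {
        $key = (string) $key;
        $node = $this->root;
        $length = strlen($key);
        for ($i = 0; $i < $length; ++$i) {
            $char = $key[$i];
            if (!isset($node->children[$char])) {
                if (!$create) {
                    return false;
                }
                $node->children[$char] = new TrieNode();
            }
            $node = $node->children[$char];
        }
        return $node;
    }

    public function add($key, $value = null) {
        $trieNodeEntry = $this->getTrieNodeByKey($key, true);
        $trieNodeEntry->valueNode = true;
        $trieNodeEntry->value = $value;
    }

    /** Hypothetical exact-match lookup, added only to make the sketch testable */
    public function get($key) {
        $node = $this->getTrieNodeByKey($key);
        return ($node && $node->valueNode) ? $node->value : null;
    }
}

$trie = new Trie();
$trie->add('Hall', 'first');
$trie->add('Hall', 'second');   // same key: the value is overwritten
var_dump($trie->get('Hall'));   // string(6) "second"
var_dump($trie->get('Hal'));    // NULL: the path exists but is not an end node
```

Because the walk is byte-based, a multi-byte UTF-8 character simply occupies several levels of the trie; lookups still work as long as keys are compared byte for byte.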
12. Conversions – Tries in PHP
class Trie {
    /**
     * Backtrack toward the root of the Trie, deleting as we go,
     * until we reach a node that we shouldn't delete
     *
     * @param TrieNode $trieNode This node entry
     * @param mixed $key The full key for this node entry
     * @return null
     */
    private function delete_backtrace(TrieNode $trieNode, $key) {
        $previousKey = substr($key, 0, -1);
        $thisChar = substr($key, -1);
        $previousTrieNode = $this->getTrieNodeByKey($previousKey);
        unset($previousTrieNode->children[$thisChar]);

        // Keep pruning while the parent is now useless (no children, not an end node),
        // but stop once the parent is the root: a key of length 1 hangs directly off it
        if ((strlen($key) > 1) &&
            (count($previousTrieNode->children) == 0) && (!$previousTrieNode->valueNode)) {
            $this->delete_backtrace($previousTrieNode, $previousKey);
        }
    }

    /**
     * Delete a node in the Trie
     *
     * @param mixed $key The key for the node that we want to delete
     * @return boolean Success or failure, false if the key wasn't stored
     */
    public function delete($key) {
        $trieNode = $this->getTrieNodeByKey($key);
        // A path that exists only as a prefix of longer keys is not a stored key
        if (!$trieNode || !$trieNode->valueNode) {
            return false;
        }

        if (!empty($trieNode->children)) {
            // Other keys pass through this node: just clear the end marker
            $trieNode->valueNode = false;
            $trieNode->value = null;
        } else {
            // Leaf node: remove it and any ancestors that are no longer needed
            $this->delete_backtrace($trieNode, $key);
        }
        return true;
    }
}
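The two deletion cases (clear the marker versus prune the branch) are easier to follow with a worked example. The sketch below is self-contained and rests on several assumptions: a getTrieNodeByKey() helper that returns false for a missing path (not shown on the slides), non-empty single-byte string keys, and a check that stops backtracking at the root. The has() and pathExists() methods are hypothetical, added only to inspect the result.

```php
<?php
class TrieNode {
    public $children = array();
    public $valueNode = false;
    public $value;
}

class Trie {
    private $root;

    public function __construct() {
        $this->root = new TrieNode();
    }

    // Assumed helper: returns the node for $key, or false if the path is missing
    private function getTrieNodeByKey($key, $create = false) {
        $key = (string) $key;
        $node = $this->root;
        for ($i = 0, $length = strlen($key); $i < $length; ++$i) {
            $char = $key[$i];
            if (!isset($node->children[$char])) {
                if (!$create) {
                    return false;
                }
                $node->children[$char] = new TrieNode();
            }
            $node = $node->children[$char];
        }
        return $node;
    }

    public function add($key, $value = null) {
        $node = $this->getTrieNodeByKey($key, true);
        $node->valueNode = true;
        $node->value = $value;
    }

    // Hypothetical inspection helpers for the demo
    public function has($key) {
        $node = $this->getTrieNodeByKey($key);
        return $node && $node->valueNode;
    }

    public function pathExists($key) {
        return $this->getTrieNodeByKey($key) !== false;
    }

    private function delete_backtrace($key) {
        $previousKey = substr($key, 0, -1);
        $thisChar = substr($key, -1);
        $previousTrieNode = $this->getTrieNodeByKey($previousKey);
        unset($previousTrieNode->children[$thisChar]);
        // Recurse only while the parent is not the root and is no longer needed
        if (strlen($key) > 1
            && count($previousTrieNode->children) == 0
            && !$previousTrieNode->valueNode) {
            $this->delete_backtrace($previousKey);
        }
    }

    public function delete($key) {
        $node = $this->getTrieNodeByKey($key);
        if (!$node || !$node->valueNode) {
            return false;
        }
        if (!empty($node->children)) {
            $node->valueNode = false;
            $node->value = null;
        } else {
            $this->delete_backtrace($key);
        }
        return true;
    }
}

$trie = new Trie();
$trie->add('Hall', 1);
$trie->add('Halliwell', 2);

// "Hall" still has children, so only its end marker is cleared
$trie->delete('Hall');
var_dump($trie->has('Hall'));        // bool(false)
var_dump($trie->has('Halliwell'));   // bool(true)

// "Halliwell" is a leaf: its node and every now-useless ancestor are pruned
$trie->delete('Halliwell');
var_dump($trie->pathExists('H'));    // bool(false)
```

Each backtracking step looks the parent up again from the root, so pruning a key of length n costs on the order of n² character steps. That is usually acceptable for short keys such as names; an alternative would be to record the path of nodes during the initial walk.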
13. Conversions – Tries in PHP
function buildTries($fileName) {
    $playerData = json_decode(
        file_get_contents($fileName)
    );

    $trie = new Trie();
    foreach($playerData as $player) {
        // Key each player as "Surname, Firstname" so that searches match on surname prefix
        $playerName = $player->surname . ', ' . $player->firstname;
        $trie->add($playerName, $player);
    }
    return $trie;
}

/* Populate the trie */
$tries = buildTries(__DIR__ . '/RugbyData.json');

/* Do some searches */
/* $searchName and $limit are assumed to be set earlier in the script (not shown on this slide) */
$searchResult = $tries->search($searchName);
if (empty($searchResult)) {
    echo 'No matches found', PHP_EOL;
} else {
    $players = array_slice($searchResult, 0, $limit);
    foreach($players as $player) {
        echo $player->surname, ', ', $player->firstname, PHP_EOL;
    }
}
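The search() method used above is not shown on the slides. One plausible way to implement a prefix search is to walk down to the node for the prefix and then collect every end node beneath it, depth-first. The sketch below uses free functions rather than the Trie class; insertKey, prefixSearch and collectValues are hypothetical names, and the sorted traversal order is an assumption chosen so that results come back alphabetically.

```php
<?php
class TrieNode {
    public $children = array();
    public $valueNode = false;
    public $value;
}

// Hypothetical helper: store $value under $key, creating nodes as needed
function insertKey(TrieNode $root, string $key, $value): void {
    $node = $root;
    for ($i = 0, $length = strlen($key); $i < $length; ++$i) {
        $char = $key[$i];
        if (!isset($node->children[$char])) {
            $node->children[$char] = new TrieNode();
        }
        $node = $node->children[$char];
    }
    $node->valueNode = true;
    $node->value = $value;
}

// Hypothetical helper: gather the values of all end nodes at or below $node,
// depth-first, visiting children in sorted character order
function collectValues(TrieNode $node, array &$results): void {
    if ($node->valueNode) {
        $results[] = $node->value;
    }
    ksort($node->children, SORT_STRING);
    foreach ($node->children as $child) {
        collectValues($child, $results);
    }
}

// Return the values of every key that starts with $prefix
function prefixSearch(TrieNode $root, string $prefix): array {
    $node = $root;
    for ($i = 0, $length = strlen($prefix); $i < $length; ++$i) {
        $char = $prefix[$i];
        if (!isset($node->children[$char])) {
            return array();   // no key starts with this prefix
        }
        $node = $node->children[$char];
    }
    $results = array();
    collectValues($node, $results);
    return $results;
}

$root = new TrieNode();
$names = array('Halliwell, Frank', 'Hall, Harry', 'Halliwell, Billy', 'Hall, Bill');
foreach ($names as $name) {
    insertKey($root, $name, $name);
}

print_r(prefixSearch($root, 'Halli'));
// Array ( [0] => Halliwell, Billy [1] => Halliwell, Frank )
```

The cost of the walk depends only on the length of the prefix, not on the number of keys stored; the collection step then visits just the subtree under the prefix.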
14. Conversions – Tries in PHP
/usr/mark/presentations/tries php trieSearch.php Hall
Load Time: 0.0221 s
Current Memory: 4499.16 k
Peak Memory: 4569.01 k
Hall, Bill
Hall, Harry
Hall, James
Hall, Martin
Halliwell, Billy
Halliwell, C
Halliwell, Frank
Halliwell, Jimmy
Search Time: 0.0045 s
Current Memory: 4500.70 k
Peak Memory: 4569.01 k