
Arranging the words of a text lexicographically with a Trie


How can you arrange the words of an input text lexicographically? We can do it using a brute-force method. But is there a better way? Yes, there is. One such way is to use the Trie, or Prefix Tree, data structure. This document shows how to do it with practical code.

Published in: Software

  1. SOM-ITSOLUTIONS ALGORITHM: Arranging the Words of an Input Text Lexicographically - Trie
     SOMENATH MUKHOPADHYAY, som-itsolutions
     #A2 1/13 South Purbachal Hospital Road, Kolkata 700078
     Mob: +91 9748185282
     Email: som.mukhopadhyay@som-itsolutions.com / som.mukhopadhyay@gmail.com
     Website: http://www.som-itsolutions.com/
     Blog: www.som-itsolutions.blogspot.com
  2. As I refresh my knowledge of data structures and algorithms, I am solving various practical, real-life problems. The real aim is to prepare my students for programming olympiads. While browsing the site of the Indian Association for Research in Computing Science, I came across this problem:

     "Problem: Word List

     In this problem the input will consist of a number of lines of English text consisting of the letters of the English alphabet, the punctuation marks ' (apostrophe), . (full stop), , (comma), ; (semicolon), : (colon) and white space characters (blank, newline). Your task is to print the words in the text in lexicographic order (that is, dictionary order). Each word should appear exactly once in your list. You can ignore the case (for instance, "The" and "the" are to be treated as the same word.) There should be no uppercase letters in the output.

     For example, consider the following candidate for the input text:

     This is a sample piece of text to illustrate this problem.

     The corresponding output would read as:

     a
     illustrate
     is
     of
     piece
     problem
     sample
     text
     this
     to"

     So I decided to use a Trie, or Prefix Tree, to come up with a solution. A Trie is a special kind of tree data structure that is very useful for dictionary-style applications. It has built-in support for prefix matching, neighbor search and so on. The original slide shows a diagram of the Trie data structure at this point.
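As a rough textual stand-in for a picture of the structure, the sketch below builds a small trie from three words of the sample text and renders it one node per line, indented by depth. The class name `TrieShape`, the `render` helper and the `*` marker for "a word ends here" are hypothetical names chosen for this illustration; they are not part of the solution in the slides.

```java
class TrieShape {
    static final int ALPHABET_SIZE = 26;

    // One node per letter; children[0] is 'a', children[25] is 'z'.
    static final class Node {
        final Node[] children = new Node[ALPHABET_SIZE];
        boolean isEndOfWord;
    }

    // Assumes the word contains only the letters 'a' to 'z'.
    static void insert(Node root, String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (node.children[index] == null) {
                node.children[index] = new Node();
            }
            node = node.children[index];
        }
        node.isEndOfWord = true;
    }

    // Renders each node on its own line, two spaces of indent per level.
    // A trailing '*' marks a node where an inserted word ends.
    static String render(Node node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ALPHABET_SIZE; i++) {
            Node child = node.children[i];
            if (child == null) {
                continue;
            }
            for (int d = 0; d < depth; d++) {
                sb.append("  ");
            }
            sb.append((char) ('a' + i));
            if (child.isEndOfWord) {
                sb.append('*');
            }
            sb.append('\n');
            sb.append(render(child, depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (String word : new String[] {"to", "text", "this"}) {
            insert(root, word);
        }
        System.out.print(render(root, 0));
    }
}
```

All three words share the single `t` node at the top level, and the branches below it (`e`, `h`, `o`) appear in alphabetical order. That shared-prefix, ordered-children layout is what makes a trie suitable for dictionary-style output.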
  3. With the given problem, we first remove the punctuation from the input text and convert it to lowercase. Then we tokenize the string to get the individual words and insert each word into a Trie. Each node keeps its children in alphabetical order, and a word that is a prefix of another word ends at a shallower node, so a preorder traversal of the Trie yields the words in lexicographic order. A repeated word ends at the same node every time, so each word is printed only once. The whole solution looks like the following.

     Class Trie

         package com.somitsolutions.java.training.wordsindictionaryform;

         public class Trie {

             // Alphabet size (# of symbols)
             static final int ALPHABET_SIZE = 26;

             // trie node
             static class TrieNode {
                 TrieNode[] children = new TrieNode[ALPHABET_SIZE];

                 // isEndOfWord is true if the node represents
                 // the end of a word
                 boolean isEndOfWord;

                 // The complete word, stored at the node where it ends
                 String key;

                 TrieNode() {
                     // The elements of children already default to null
                     key = null;
                     isEndOfWord = false;
                 }
             }

             static TrieNode root;

             // If not present, inserts key into the trie.
             // If the key is a prefix of an existing word,
             // just marks the node where it ends.
             // The key must contain only the letters 'a' to 'z'.
             static void insert(String key) {
                 TrieNode pCrawl = root;

                 for (int level = 0; level < key.length(); level++) {
                     int index = key.charAt(level) - 'a';
                     if (pCrawl.children[index] == null)
                         pCrawl.children[index] = new TrieNode();
                     pCrawl = pCrawl.children[index];
                 }

                 // mark the last node as the end of a word
                 pCrawl.isEndOfWord = true;
                 // store the key at that node
                 pCrawl.key = key;
             }

             // Prints the words in lexicographically sorted order.
             // To begin with, pass root as pCrawl.
             static void preorder(TrieNode pCrawl) {
                 // exit condition
                 if (pCrawl == null) {
                     return;
                 }
                 for (int index = 0; index < ALPHABET_SIZE; index++) {
                     if (pCrawl.children[index] != null) {
                         if (pCrawl.children[index].key != null) {
                             System.out.println(pCrawl.children[index].key);
                         }
                         preorder(pCrawl.children[index]);
                     }
                 }
             }
         }

     Class Main

         package com.somitsolutions.java.training.wordsindictionaryform;

         import java.util.StringTokenizer;

         public class Main {

             public static void main(String[] args) {
                 String inputStr = "This is a lamb. This is white in color.";
                 // Keep only letters and whitespace, so that every remaining
                 // character maps to one of the 26 child slots of a trie node
                 String refinedStr = inputStr.replaceAll("[^a-zA-Z\\s]", "").toLowerCase();
                 dictionary(refinedStr);
             }

             private static void dictionary(String in) {
                 // The default delimiters split on blanks and newlines
                 StringTokenizer st = new StringTokenizer(in);
                 Trie.root = new Trie.TrieNode();
                 while (st.hasMoreTokens()) {
                     Trie.insert(st.nextToken());
                 }
                 Trie.preorder(Trie.root);
             }
         }
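One way to sanity-check the trie output is to compare it with a simpler baseline built on the standard library. The sketch below is not the approach taken in the slides: it swaps the trie for a `java.util.TreeSet`, which keeps its elements sorted and discards duplicates. The class name `WordListBaseline` and the method `sortedWords` are hypothetical, and it assumes the same preprocessing as above, namely that punctuation is deleted and words are separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

class WordListBaseline {

    // Returns the distinct words of the text, lowercased, in dictionary order.
    static List<String> sortedWords(String text) {
        // Assumption: punctuation is deleted, not treated as a word separator.
        String cleaned = text.replaceAll("[^a-zA-Z\\s]", "").toLowerCase(Locale.ROOT);

        // A TreeSet keeps its elements sorted and drops duplicates.
        TreeSet<String> words = new TreeSet<>();
        for (String token : cleaned.split("\\s+")) {
            // Leading whitespace or empty input produces an empty token.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return new ArrayList<>(words);
    }

    public static void main(String[] args) {
        String sample = "This is a sample piece of text to illustrate this problem.";
        for (String word : sortedWords(sample)) {
            System.out.println(word);
        }
    }
}
```

For the sample text from the problem statement this prints the ten words a, illustrate, is, of, piece, problem, sample, text, this, to, one per line, which matches the expected output quoted in the problem. The trie version should print the same list for any input limited to the characters the problem allows.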
