2. Introduction
• A system to automate the company registration process
• Compares company names using string-matching algorithms
• Ranks names according to their similarity percentage
• Rejects a proposed name if its similarity score is 100%
8/30/2013
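The slide does not name the string-matching algorithm that was used. The sketch below assumes Python's standard-library `difflib.SequenceMatcher` as a stand-in. It converts the match ratio to a similarity percentage, ranks existing names by that percentage, and rejects a proposal that scores 100% against any existing name. The function names `similarity_percent`, `rank_names` and `is_rejected` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> float:
    """Similarity of two names as a percentage (assumed metric: SequenceMatcher ratio)."""
    # Lowercase both sides so that case differences do not lower the score.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def rank_names(proposed: str, existing: list[str]) -> list[tuple[str, float]]:
    """Return (name, score) pairs, most similar first."""
    scored = [(name, similarity_percent(proposed, name)) for name in existing]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def is_rejected(proposed: str, existing: list[str]) -> bool:
    """Reject the proposal if any existing name scores exactly 100%."""
    return any(score == 100.0 for _, score in rank_names(proposed, existing))
```

`ratio()` returns exactly 1.0 only when the two strings are identical, so a 100% score here means an exact match after lowercasing.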
3. Objectives
• To develop a system that resolves company naming conflicts
• To find existing names similar to the name proposed by the user
• To rank the existing names by their similarity to the proposed name
4. Methodology
• Building Base Dictionary
• Keyword Generation
• Finding Possible Matches
• Finding Duplicates
• Finding Ranks
16. Translation
• Conversion of the meaning of a source-language text by means of an equivalent target-language text
[Table: Stemmed Tokens → Translated Tokens; stemmed tokens shown: metal, center, nepal]
21. Database Query using Final Token List
•nepal medical centre pvt. ltd.
•nepal dhatu company
•metal nepal pvt. ltd.
•enter nepal
•nepal metal industries
•dhatu sankalan kendra
39. Dataset
• 111,160 registered company names
• 106,299 unique registration IDs / company names
• 16,326 words in the English-Nepali dictionary
• 144 British-American words for transformation
41. Limitations
• Stemming sometimes produces incorrect results if the input contains a Nepali word
• The English-Nepali dictionary does not contain enough words
• Tokenization is based on whitespace and hyphens only
• Name comparison is not phonetically based
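The tokenization rule in the limitations (split on whitespace and hyphens only) can be sketched as below. This is an illustrative reconstruction. The regular expression and the name `tokenize` are assumptions, not the project's code. The sketch shows the practical consequence: other punctuation stays attached to tokens.

```python
import re


def tokenize(name: str) -> list[str]:
    """Split a company name on runs of whitespace and hyphens only."""
    # Empty strings from leading/trailing separators are filtered out.
    return [token for token in re.split(r"[\s-]+", name) if token]
```

For example, `"pvt.ltd."` survives as a single token because periods are not separators.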
42. Future Enhancements
• Use of a taxonomy to classify tokens
• Use of weighting measures to assign weights to tokens
• Implementation of faster search methods
• Integration of phonetics-based similarity measures
43. Thank You
Gaurav Kumar Goyal 16214
Janardan Chaudhary 16216
Nimesh Mishra 16221
Sanat Maharjan 16230
Editor's Notes
Add Presentation Date
Downcasting is the act of converting text from uppercase to lowercase letters. It ensures that company names differing only in letter case are not treated as distinct, unique names.
Transformation is the conversion of British English words to their American English equivalents. It is done to avoid generating unwanted or conflicting keywords. Our dictionary consists of around 130 commonly used British English words, each of which is converted to its American English form when found.
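The transformation step is a dictionary lookup applied to each token. In the sketch below, the entries in `BRITISH_TO_AMERICAN` are hypothetical examples and not the project's actual word list. The helper name `transform` is also an assumption.

```python
# Hypothetical sample entries; the project's own British-American list is not reproduced here.
BRITISH_TO_AMERICAN = {
    "centre": "center",
    "colour": "color",
    "jewellery": "jewelry",
    "organisation": "organization",
    "programme": "program",
}


def transform(tokens: list[str]) -> list[str]:
    """Replace each British spelling with its American form; leave other tokens unchanged."""
    return [BRITISH_TO_AMERICAN.get(token, token) for token in tokens]
```

With this mapping, "nepal medical centre" and "nepal medical center" produce the same keywords.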
Removes the words that the Office of the Company Registrar considers similar or unimportant.
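This removal step is a set-membership filter over the tokens. The source does not list the Registrar's words, so the contents of `IGNORED_WORDS` below are hypothetical placeholders. The function name is also an assumption.

```python
# Hypothetical placeholder list; the Office of the Company Registrar's actual list is not given.
IGNORED_WORDS = {"pvt", "pvt.", "ltd", "ltd.", "private", "limited"}


def remove_ignored_words(tokens: list[str]) -> list[str]:
    """Drop tokens that are considered unimportant for name comparison."""
    return [token for token in tokens if token not in IGNORED_WORDS]
```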
Stemming is the process of reducing a word to its root, or simpler, form; here it mainly reduces plural forms to singular ones. Stemming is often used in text-processing applications. There are many different approaches to stemming, each with its own design goals; some are aggressive, reducing words to the smallest possible root. Here, stemming is done with the help of a morphological analyzer, so that the resulting roots are English dictionary words. For example, words like "services" and "metals" are reduced to the simpler singular forms "service" and "metal". We used stemming to obtain dictionary-based root words, which simplified the matching process.
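The sketch below reproduces the two examples above with simple suffix rules. These rules are a deliberately crude substitute for the morphological analyzer the project used, and the name `stem` is hypothetical.

```python
def stem(word: str) -> str:
    """Reduce simple English plurals to singular (rule-based stand-in, not a morphological analyzer)."""
    # "industries" -> "industry"
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    # "services" -> "service", "metals" -> "metal"; keep words ending in -ss, -us, -is
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word
```

Blind rules like these fail on words such as "news", which becomes "new". A dictionary-backed analyzer avoids that kind of error by returning only real dictionary words.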
Translation is the conversion of the meaning of a source-language text by means of an equivalent target-language text. In this process, the Nepali equivalents of the English keywords are obtained: each keyword is first matched against the English dictionary, and the matched words are then mapped using the English-Nepali dictionary provided by Madan Puraskar Pustakalaya. Unmatched words are carried over unchanged alongside the translated tokens. For example, the words "nepal" and "metal" are mapped through the dictionary to "नेपाल" and "धातु".
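The translation step amounts to a dictionary lookup that passes unmatched tokens through unchanged. `EN_NE` below is a hypothetical two-entry excerpt built from the example in the note. The real mappings come from the Madan Puraskar Pustakalaya dictionary. The sketch also collapses the two-stage lookup (English dictionary, then English-Nepali dictionary) into one step.

```python
# Hypothetical excerpt; only the two mappings given in the note are included.
EN_NE = {"nepal": "नेपाल", "metal": "धातु"}


def translate(tokens: list[str]) -> list[str]:
    """Map each token to its Nepali equivalent; tokens missing from EN_NE pass through unchanged."""
    return [EN_NE.get(token, token) for token in tokens]
```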
Transliteration is the conversion of text from one script to another. To transliterate a Nepali word into English, we used a dictionary that maps individual Nepali syllables to English letters. In the translation example above, the words "नेपाल" and "धातु" are transliterated to "nepal" and "dhatu" and then added to the pool of keywords for further processing.
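The note describes a syllable-level dictionary mapping but does not give the table. The sketch below is a simplified assumption built on small hypothetical tables for Devanagari consonants and vowel signs. A consonant followed by a vowel sign forms one syllable. A virama (्) suppresses the inherent "a", and the inherent "a" is also dropped at the end of a word. The table names and `transliterate` are illustrative only.

```python
# Hypothetical, minimal tables covering only a few letters.
CONSONANTS = {"क": "k", "त": "t", "द": "d", "ध": "dh", "न": "n",
              "प": "p", "म": "m", "र": "r", "ल": "l", "स": "s"}
VOWEL_SIGNS = {"ा": "a", "ि": "i", "ी": "i", "ु": "u", "ू": "u",
               "े": "e", "ै": "ai", "ो": "o", "ौ": "au"}
VIRAMA = "्"


def transliterate(word: str) -> str:
    """Romanize a Devanagari word syllable by syllable (simplified sketch)."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:        # consonant + vowel sign = one syllable
                out.append(VOWEL_SIGNS[nxt])
                i += 2
                continue
            if nxt == VIRAMA:             # virama: no inherent vowel
                i += 2
                continue
            if nxt:                       # inherent "a", dropped at word end
                out.append("a")
        else:
            out.append(ch)                # characters outside the tables pass through
        i += 1
    return "".join(out)
```

This reproduces "nepal" and "dhatu", but a word-final conjunct such as केन्द्र comes out as "kendr". A full syllable table would need to handle such cases explicitly.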
The final token list is obtained from the stemming and transliteration steps. Unique tokens are kept after a Double Metaphone comparison.
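Keeping unique tokens after a phonetic comparison means deduplicating by a phonetic key. Double Metaphone is too long to reproduce here. The sketch below therefore substitutes the simpler American Soundex code as the key. It illustrates the idea and is not the project's algorithm. `soundex` and `unique_by_sound` are hypothetical names.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """American Soundex code (stand-in for Double Metaphone)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    digits = []
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:         # skip repeats of the same code
            digits.append(code)
        if c not in "hw":                 # h/w do not separate equal codes; vowels do
            prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]


def unique_by_sound(tokens: list[str]) -> list[str]:
    """Keep the first token for each phonetic key, preserving order."""
    seen = set()
    unique = []
    for token in tokens:
        key = soundex(token)
        if key not in seen:
            seen.add(key)
            unique.append(token)
    return unique
```

Coarse phonetic keys can also merge genuinely different words (for example "metal" and "medal" share a code), so the choice of key directly affects which tokens reach the query.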
A database query is constructed from the final token list and matched using MySQL's built-in LIKE operator with '%token%' patterns.
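The query can be sketched as one `LIKE '%token%'` clause per token, combined with OR. To keep the example self-contained, it uses Python's built-in `sqlite3` with an in-memory database instead of MySQL. The table `companies`, its column `name` and the function `find_candidates` are hypothetical. The sample rows are taken from the query slide, plus one made-up non-matching row.

```python
import sqlite3


def find_candidates(conn: sqlite3.Connection, tokens: list[str]) -> list[str]:
    """Return company names containing any token as a substring."""
    if not tokens:
        return []
    where = " OR ".join(["name LIKE ?"] * len(tokens))
    params = [f"%{token}%" for token in tokens]      # bound as parameters, not spliced into SQL
    sql = f"SELECT name FROM companies WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT)")
conn.executemany(
    "INSERT INTO companies (name) VALUES (?)",
    [(n,) for n in [
        "nepal medical centre pvt. ltd.",
        "nepal dhatu company",
        "metal nepal pvt. ltd.",
        "nepal metal industries",
        "dhatu sankalan kendra",
        "himalayan tea estate",  # hypothetical non-matching row
    ]],
)
matches = find_candidates(conn, ["metal", "center", "nepal", "dhatu"])
```

Binding each pattern as a parameter keeps user input out of the SQL text. A token containing `%` or `_` would still act as a wildcard unless it is escaped. The case sensitivity of LIKE differs between engines (in MySQL it depends on the column collation). A pattern with a leading `%` generally cannot be served by an ordinary B-tree index, so each such query scans the table.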