Presented by:
Joyeeta Bagchi
Sneha Sarkar
Anasuya Paul
Koushik Dutta
Under the guidance of-
Mr. Alok Ranjan Pal
CONTENTS
• Introduction
• Difficulty of the language
• Overview of stemming
• Related Work
• Pictorial Representation of proposed approach
• Example of each step
• Module 1 (Suffix Stripping)
• Module 2 (Applying Rules)
• Explanation of module 2 with example
• Algorithm
• Partial view of input file
• List of suffixes
• Partial view of output file
• Efficiency and time complexity
• Conclusion and future work
INTRODUCTION
Stemming is an operation that splits a word into the
constituent root part and affix without doing
complete morphological analysis.
For example, Eating = Eat + -ing
Worked = Work + -ed
The main purpose of stemming is to reduce different
grammatical forms / word forms of a word like its
noun, adjective, verb, adverb etc. to its root form.
We can say that the goal of stemming is to reduce
inflectional forms and sometimes derivationally
related forms of a word to a common base form.
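As a minimal illustration of the split described above, the following Python sketch separates a word into a root and a suffix. The function name split_suffix and the two-entry suffix list are assumptions made for this example only; they are not part of the proposed approach.

```python
def split_suffix(word, suffixes=("ing", "ed")):
    """Return (root, suffix); suffix is '' when no listed suffix matches."""
    lowered = word.lower()
    for suffix in suffixes:
        # Require a non-empty root so a word equal to a suffix is left intact.
        if lowered.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)], suffix
    return word, ""


for w in ("Eating", "Worked"):
    root, suffix = split_suffix(w)
    print(f"{w} = {root} + -{suffix}")
```

No morphological analysis is involved: the split relies only on matching the end of the string against a known list.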
DIFFICULTY OF THE LANGUAGE
Bengali is one of the most morphologically rich languages,
and stemming Bengali verbs is the most problematic
area of stemming.
Sometimes, nearly 10x5 forms of a given verb in
Bengali may appear in different contexts.
OVERVIEW OF STEMMING
A typical simple stemmer algorithm involves
removing suffixes using a list of frequent suffixes,
while a more complex one would use morphological
knowledge to derive a stem from the words.
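The simple, list-based kind of stemmer can be sketched in a few lines of Python. This is an illustrative sketch only: the function name strip_longest_suffix, the min_stem_len guard and the English suffixes in the usage lines are assumptions, not taken from the slides.

```python
def strip_longest_suffix(word, suffixes, min_stem_len=2):
    """Remove the longest listed suffix that leaves a long-enough stem."""
    # Try longer suffixes first so "ings" wins over "s".
    for suffix in sorted(suffixes, key=len, reverse=True):
        stem_len = len(word) - len(suffix)
        if word.endswith(suffix) and stem_len >= min_stem_len:
            return word[:stem_len]
    return word  # no suffix matched: return the word unchanged


print(strip_longest_suffix("meetings", ["s", "ing", "ings"]))
print(strip_longest_suffix("boxes", ["s", "es"]))
```

Sorting by length is what makes this a longest-match stripper; a stemmer that uses morphological knowledge would replace the plain list lookup with language-specific analysis.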
RELATED WORK
• In 1980, Martin Porter developed the “Porter Stemmer”. It
uses the fact that English language suffixes are mostly a
combination of smaller and simpler suffixes.
• Ramanathan and Rao (2003) proposed a lightweight
stemmer for Hindi that uses a hand-crafted suffix list
and performs longest-match stripping.
• Dasgupta and Ng (2006) proposed unsupervised
morphological parsing of Bengali. When evaluated on a
set of 4,110 human-segmented Bengali words, the
algorithm achieves 83% success.
• Majgaonker and Siddiqui (2010) developed an
unsupervised approach for a Marathi stemmer. The
maximum accuracy observed is 82.5% for the statistical
suffix stripping approach.
• Suba et al. (2011) proposed two stemmers for Gujarati, with
an average accuracy of about 90.7%.
PICTORIAL REPRESENTATION OF PROPOSED APPROACH
EXPLANATION OF EACH STEP
MODULE 1 (SUFFIX STRIPPING)
MODULE 2 (APPLYING RULES)
EXPLANATION OF MODULE 2 USING EXAMPLE
ALGORITHM
STEP 1: Start of algorithm.
STEP 2: Create 4 new string[] arrays, namely splits1[ ], splits2[ ], splits3[ ] and input1[ ].
STEP 3: Read the contents of the doc files and split the words by space (‘ ’)
separator.
3.1. Store the words of each sentence in splits1[ ].
3.2. Store the inflexions in splits2[ ].
3.3. Store the desired root words in splits3[ ].
STEP 4: Declare and initialize variables l1 = length of splits1[ ], l2 = length of
splits2[ ].
STEP 5: Fetch the inflected verb forms into input1[ ] from splits1[i] if ‘/verb’ is
contained in the currently fetched word. This step is repeated l1
times.
5.1. Determine the subroot from input1[i] by repeating the steps
l2 times.
5.1.1. If splits2[j] is contained in input1[i] then,
5.1.1.a. Declare variable index which stores the index
of the last occurrence of splits2[j] in input1[i].
5.1.1.b. If index is greater than or equal to 2 then,
5.1.1.b.i. Store the substring of input1[i] from
begindex=0 to endindex=index in input1[i].
5.1.1.b.ii. Break the loop.
5.2. Determine the actual root input1[i] by repeating the steps l1 times.
5.2.1. Check the ending kar of input1[i].
5.2.1.a. If input1[i] ends with e-kar (‘ি’), o-kar (‘ো’),
a-kar (‘ে’) or aa-kar (‘া’) then, replace it
with aa-kar (‘া’).
5.2.1.b. If the length of input1[i] is less than 3, concatenate it
with aa-kar (‘া’).
5.2.2. Check the starting kar of input1[i].
5.2.2.a. If input1[i] starts with e-kar (‘ি’), then
replace it with a-kar (‘ে’).
5.2.2.b. If input1[i] starts with u-kar (‘ু’), then
replace it with o-kar (‘ো’).
5.2.2.c. If input1[i] starts with a-kar (‘ে’), then
replace it with aa-kar (‘া’).
STEP 6: Generate the output doc file by copying the contents of
splits1[] and concatenating it with their obtained root
words from input1[] wherever the word contains ‘/verb’.
STEP 7: Compare the obtained sentences in splits1[ ] with the
desired sentences in splits3[ ] and calculate the efficiency.
STEP 8: End of algorithm.
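Steps 5.1 and 5.2 above can be sketched in Python as follows. This is a hedged reading of the steps, not the presenters' implementation. The assumptions are: the function names (strip_suffix, apply_rules, stem_verb); the Unicode code points chosen for the kar glyphs, several of which are garbled or missing in the slide text; treating rules 5.2.1.a and 5.2.1.b as alternatives; reading the "starting kar" as the vowel sign attached to the first consonant (index 1 in Unicode order); and applying at most one of the 5.2.2 rules per word. The one-entry suffix list in the usage lines is hypothetical; the deck's own suffix list is not reproduced here.

```python
# Vowel signs (kars), named as on the slides. The code points are assumptions
# for glyphs that are garbled or missing in the slide text.
KAR_E = "\u09BF"   # called e-kar on the slides
KAR_A = "\u09C7"   # called a-kar on the slides
KAR_AA = "\u09BE"  # aa-kar
KAR_O = "\u09CB"   # o-kar
KAR_U = "\u09C1"   # u-kar


def strip_suffix(word, suffixes):
    """Step 5.1: cut the word at the last occurrence of the first usable suffix."""
    for suffix in suffixes:
        index = word.rfind(suffix)  # -1 when the suffix is absent
        if index >= 2:              # 5.1.1.b: keep at least two characters
            return word[:index]     # 5.1.1.b.i, then stop (5.1.1.b.ii)
    return word


def apply_rules(subroot):
    """Step 5.2: adjust the ending kar, then the starting kar."""
    word = subroot
    if not word:
        return word
    # 5.2.1: ending kar. Treating a and b as alternatives is an assumption.
    if word.endswith((KAR_E, KAR_O, KAR_A, KAR_AA)):
        word = word[:-1] + KAR_AA
    elif len(word) < 3:
        word = word + KAR_AA
    # 5.2.2: starting kar, read here as the sign on the first consonant.
    # A single dictionary lookup prevents one rule from feeding the next.
    start_rules = {KAR_E: KAR_A, KAR_U: KAR_O, KAR_A: KAR_AA}
    if len(word) > 1 and word[1] in start_rules:
        word = word[0] + start_rules[word[1]] + word[2:]
    return word


def stem_verb(word, suffixes):
    """Module 1 (suffix stripping) followed by Module 2 (rules)."""
    return apply_rules(strip_suffix(word, suffixes))


suffixes = ["ছি"]  # hypothetical suffix list for illustration
print(stem_verb("করছি", suffixes))
```

In this usage example the suffix is found at index 2, so stripping leaves a two-character subroot; rule 5.2.1.b then appends aa-kar because the subroot is shorter than 3 characters. The output shown by the sketch depends on the assumptions listed above and is not a result reported on the slides.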
PARTIAL VIEW OF INPUT FILE
LIST OF SUFFIXES
PARTIAL VIEW OF OUTPUT FILE
EFFICIENCY & TIME COMPLEXITY
EFFICIENCY:
Dealing with 450 inflections of 14 selected root verbs, the
proposed approach gives an efficiency of 99.36%.
TIME COMPLEXITY:
The time complexity of the proposed algorithm in the worst
case is O(n²).
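The comparison in Step 7 of the algorithm can be sketched as below, assuming that "efficiency" means the percentage of obtained roots that exactly match the desired roots; the slides do not define the term. The function name efficiency and the placeholder strings in the usage lines are hypothetical and are not data from the project.

```python
def efficiency(obtained, desired):
    """Percentage of positions where the obtained root equals the desired root."""
    if len(obtained) != len(desired):
        raise ValueError("both lists must have the same length")
    if not desired:
        return 0.0
    matches = sum(1 for got, want in zip(obtained, desired) if got == want)
    return 100.0 * matches / len(desired)


# Placeholder values: three of the four obtained roots match.
print(efficiency(["r1", "r2", "r3", "r4"], ["r1", "r2", "r3", "rX"]))
```

The quadratic bound is consistent with the nested loops of Step 5, where each of the l1 words is checked against up to l2 suffixes, giving on the order of l1 × l2 comparisons.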
CONCLUSION AND FUTURE WORK
In this project, we present a lightweight stemmer for
14 selected Bengali verbs that strips suffixes
using a predefined suffix list on a “longest match”
basis and then finds the root by applying a set of rules.
Except in a few cases, the results obtained from our
algorithm are satisfactory and match our
expectations.
We argue that a larger, better-populated learning set
would invariably yield better results. In future, we
plan to test our algorithm on more sets of Bengali
verbs.
