February 2024: Top 10 Cited Articles in
Natural Language Computing
International Journal on Natural Language
Computing (IJNLC)
https://airccse.org/journal/ijnlc/index.html
ISSN: 2278 - 1307 [Online]; 2319 - 4111 [Print]
Google Scholar
https://scholar.google.com/citations?user=A5tqIdoAAAAJ&hl=en
AN IMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULES
Mohammed Al-Maolegi1 , Bassam Arkok2
Computer Science, Jordan University of Science and Technology, Irbid, Jordan
ABSTRACT
Several algorithms exist for mining association rules. One of the most popular is Apriori,
which extracts frequent itemsets from a large database and derives association rules from them
for knowledge discovery. This paper identifies a limitation of the original Apriori algorithm:
it wastes time scanning the whole database when searching for frequent itemsets. It then
presents an improved Apriori that reduces this wasted time by scanning only some of the
transactions. Experiments with several groups of transactions and several minimum support
values, run on both the original Apriori and our implementation of the improved Apriori, show
that the improved version reduces the time consumed by 67.38% compared with the original,
making the algorithm more efficient and less time consuming.
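The idea of counting candidate support over only part of the database can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `apriori`, the transaction-id index `tids`, and the choice to scan only the transactions that contain a candidate's rarest item are assumptions made for the example, since the abstract states only that "some transactions" are scanned.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Index: item -> ids of the transactions that contain it (built in one pass).
    tids = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tids.setdefault(item, set()).add(tid)

    # Frequent 1-itemsets come straight from the index.
    current = {frozenset([i]): len(s) for i, s in tids.items() if len(s) >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }

        current = {}
        for c in candidates:
            # Assumed "scan only some transactions" step: look only at the
            # transactions that contain the candidate's rarest item.
            rarest = min(c, key=lambda i: len(tids[i]))
            count = sum(1 for tid in tids[rarest] if c <= transactions[tid])
            if count >= min_support:
                current[c] = count
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy database of five transactions.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_itemsets = apriori(baskets, min_support=3)
```

A candidate can only occur in transactions that contain every one of its items, so restricting the count to the transactions of its rarest item gives the same support as a full scan while touching fewer rows.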
KEYWORDS
Apriori, Improved Apriori, Frequent itemset, Support, Candidate itemset, Time consuming.
Full Text: https://airccse.org/journal/ijnlc/papers/3114ijnlc03.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol3.html
REFERENCES
[1] X. Wu, V. Kumar, J. Ross Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng,
B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg, “Top 10 algorithms in
data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, Dec. 2007.
[2] S. Rao, R. Gupta, “Implementing Improved Algorithm Over APRIORI Data Mining
Association Rule Algorithm”, International Journal of Computer Science And Technology, pp.
489-493, Mar. 2012
[3] H. H. O. Nasereddin, “Stream data mining,” International Journal of Web Applications, vol.
1, no. 4, pp. 183–190, 2009.
[4] F. Crespo and R. Weber, “A methodology for dynamic data mining based on fuzzy
clustering,” Fuzzy Sets and Systems, vol. 150, no. 2, pp. 267–284, Mar. 2005.
[5] R. Srikant, “Fast algorithms for mining association rules and sequential patterns,”
UNIVERSITY OF WISCONSIN, 1996.
[6] J. Han, M. Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers,
Book, 2000.
[7] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data mining to knowledge discovery
in databases,” AI magazine, vol. 17, no. 3, p. 37, 1996.
[8] F. H. AL-Zawaidah, Y. H. Jbara, and A. L. Marwan, “An Improved Algorithm for Mining
Association Rules in Large Databases,” Vol. 1, No. 7, 311-316, 2011
[9] T. C. Corporation, “Introduction to Data Mining and Knowledge Discovery”, Two Crows
Corporation, Book, 1999.
[10] R. Agrawal, T. Imieliński, and A. Swami, “Mining association rules between sets of items
in large databases,” in ACM SIGMOD Record, vol. 22, pp. 207–216, 1993
[11] M. Halkidi, “Quality assessment and uncertainty handling in data mining process,” in Proc,
EDBT Conference, Konstanz, Germany, 2000.
NAMED ENTITY RECOGNITION USING HIDDEN MARKOV MODEL (HMM)
Sudha Morwal 1 , Nusrat Jahan 2 and Deepti Chopra 3
1Associate Professor, Banasthali University, Jaipur, Rajasthan-302001
2 M.Tech (CS), Banasthali University, Jaipur, Rajasthan-302001
3 M. Tech (CS), Banasthali University, Jaipur, Rajasthan-302001
ABSTRACT
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP), which is
a branch of artificial intelligence. It has many applications, mainly in machine
translation, text-to-speech synthesis, natural language understanding, information extraction,
information retrieval and question answering. The aim of NER is to classify words into
predefined categories such as location name, person name, organization name, date and time.
In this paper we describe in detail a Hidden Markov Model (HMM) based machine learning
approach to identifying named entities. The main idea behind using an HMM to build an NER
system is that it is language independent, so the system can be applied to any language
domain. In our NER system the states are not fixed: they are dynamic in nature, and users can
define them according to their interest. The corpus used by our NER system is also not
domain specific.
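HMM-based tagging of this kind is usually decoded with the Viterbi algorithm, which finds the most probable sequence of states (entity tags) for a sentence. The sketch below is a generic illustration rather than the authors' system: the tag set, the probabilities and the smoothing constant `floor` are invented toy values, and the states are passed in as a parameter to reflect the point that they are not fixed.

```python
import math


def viterbi(words, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely state (tag) sequence for a list of words."""
    if not words:
        return []

    def lp(p):
        # Log-probability with a small floor so unseen events are not log(0).
        return math.log(p if p > 0 else floor)

    def emit(state, word):
        return lp(emit_p[state].get(word, 0.0))

    # best[t][s] = (log-prob of the best path ending in s at position t, previous state)
    best = [{s: (lp(start_p[s]) + emit(s, words[0]), None) for s in states}]
    for t in range(1, len(words)):
        row = {}
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p][0] + lp(trans_p[p][s]))
            score = best[t - 1][prev][0] + lp(trans_p[prev][s]) + emit(s, words[t])
            row[s] = (score, prev)
        best.append(row)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: PER = person, LOC = location, O = other.
states = ["PER", "LOC", "O"]
start_p = {"PER": 0.3, "LOC": 0.1, "O": 0.6}
trans_p = {
    "PER": {"PER": 0.3, "LOC": 0.05, "O": 0.65},
    "LOC": {"PER": 0.05, "LOC": 0.3, "O": 0.65},
    "O": {"PER": 0.2, "LOC": 0.3, "O": 0.5},
}
emit_p = {
    "PER": {"sudha": 0.5, "deepti": 0.5},
    "LOC": {"jaipur": 0.6, "rajasthan": 0.4},
    "O": {"lives": 0.4, "in": 0.4, "works": 0.2},
}
tags = viterbi(["sudha", "lives", "in", "jaipur"], states, start_p, trans_p, emit_p)
```

In a trained system the three probability tables would be estimated from an annotated corpus; changing the tag set only requires retraining the tables, not changing the decoder.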
KEYWORDS
Named Entity Recognition (NER), Natural Language processing (NLP), Hidden Markov
Model (HMM).
Full Text: http://airccse.org/journal/ijnlc/papers/1412ijnlc02.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol1.html
REFERENCES
[1] Pramod Kumar Gupta, Sunita Arora, “An Approach for Named Entity Recognition
System for Hindi: An Experimental Study”, in Proceedings of ASCNT – 2009, CDAC, Noida,
India, pp. 103 – 108.
[2] Shilpi Srivastava, Mukund Sanglikar & D.C Kothari, “Named Entity Recognition System
for Hindi Language: A Hybrid Approach”, International Journal of Computational Linguistics
(IJCL), Volume (2), Issue (1), 2011. Available at:
http://cscjournals.org/csc/manuscript/Journals/IJCL/volume2/Issue1/IJCL-19.pdf
[3] Padmaja Sharma, Utpal Sharma, Jugal Kalita, “Named Entity Recognition: A Survey for
the Indian Languages”, Language in India (www.languageinindia.com), 11:5, May 2011, Special
Volume: Problems of Parsing in Indian Languages. Available at:
http://www.languageinindia.com/may2011/padmajautpaljugal.pdf
[4] Lawrence R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications
in Speech Recognition”, in Proceedings of the IEEE, Vol. 77, No. 2, February 1989.
Available at: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
[5] Sujan Kumar Saha, Sudeshna Sarkar, Pabitra Mitra, “Gazetteer Preparation for Named
Entity Recognition in Indian Languages”, in the Proceedings of the 6th Workshop on Asian
Language Resources, 2008. Available at:
http://www.aclweb.org/anthology-new/I/I08/I08-7002.pdf
[6] B. Sasidhar, P. M. Yohan, Dr. A. Vinaya Babu, Dr. A. Govardhan, “A Survey on
Named Entity Recognition in Indian Languages with particular reference to Telugu”, in IJCSI
International Journal of Computer Science Issues, Vol. 8, Issue 2, March 2011. Available at:
http://www.ijcsi.org/papers/IJCSI-8-2-438-443.pdf
[7] GuoDong Zhou, Jian Su, “Named Entity Recognition using an HMM-based Chunk
Tagger”, in Proceedings of the 40th Annual Meeting of the Association for Computational
Linguistics (ACL), Philadelphia, July 2002, pp. 473-480.
[8] http://en.wikipedia.org/wiki/Forward–backward_algorithm
[9] http://en.wikipedia.org/wiki/Baum-Welch_algorithm
[10] Dan Shen, Jie Zhang, Guodong Zhou, Jian Su, Chew-Lim Tan, “Effective Adaptation of a
Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain”. Available
at: http://acl.ldc.upenn.edu/W/W03/W03-1307.pdf
SENTIMENT ANALYSIS FOR MODERN STANDARD ARABIC AND COLLOQUIAL
Hossam S. Ibrahim 1, Sherif M. Abdou2 and Mervat Gheith 1
1Computer Science Department, Institute of statistical studies and research (ISSR), Cairo
University, EGYPT
2 Information Technology Department, Faculty of Computers and information Cairo
University, EGYPT
ABSTRACT
The rise of social media such as blogs and social networks has fueled interest in sentiment
analysis. With the proliferation of reviews, ratings, recommendations and other forms of online
expression, online opinion has turned into a kind of virtual currency for businesses looking to
market their products, identify new opportunities and manage their reputations, and many are
therefore now looking to the field of sentiment analysis. In this paper, we present a
feature-based, sentence-level approach for Arabic sentiment analysis. Our approach uses a
lexicon of Arabic idioms and saying phrases as a key resource for improving the detection of
sentiment polarity in Arabic sentences, together with a novel and rich set of linguistically
motivated features (contextual intensifiers, contextual shifters and negation handling) and
syntactic features for conflicting phrases, which enhance the sentiment classification accuracy.
Furthermore, we introduce an automatically expandable, wide-coverage polarity lexicon of Arabic
sentiment words. The lexicon is built from a seed of gold-standard sentiment words that are
manually collected and annotated, and it expands automatically, detecting the sentiment
orientation of new sentiment words using a synset aggregation technique and free online Arabic
lexicons and thesauruses. Our data focus on modern standard Arabic (MSA) and Egyptian dialectal
Arabic tweets and microblogs (hotel reservations, product reviews, etc.). The experimental
results using our resources and techniques with an SVM classifier indicate high performance
levels, with accuracies of over 95%.
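To make the role of contextual intensifiers and negation concrete, the following sketch computes a single lexicon-based polarity score for a sentence. It is an illustration under stated assumptions, not the paper's feature extractor: the function name, the two-token context window, the weights, and the English stand-in words are all hypothetical, and a real system would use the Arabic lexicons described above and pass many such features to the classifier.

```python
def sentence_polarity(tokens, lexicon, negators, intensifiers, window=2):
    """Sum word polarities, adjusted by negators and intensifiers that appear
    in the `window` tokens immediately before each sentiment word."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue
        value = lexicon[tok]
        context = tokens[max(0, i - window):i]
        # Intensifiers scale the polarity of the word they precede.
        for c in context:
            if c in intensifiers:
                value *= intensifiers[c]
        # An odd number of negators in the context flips the polarity.
        if sum(c in negators for c in context) % 2 == 1:
            value = -value
        score += value
    return score


# Hypothetical toy resources (English stand-ins for Arabic entries).
lexicon = {"good": 1.0, "bad": -1.0, "clean": 0.5}
negators = {"not"}
intensifiers = {"very": 2.0}

negative = sentence_polarity("the room was not very clean".split(), lexicon, negators, intensifiers)
positive = sentence_polarity("service was very good".split(), lexicon, negators, intensifiers)
```

A score like this could serve as one numeric feature alongside idiom matches and syntactic features; the fixed window is a simplification of true scope detection.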
KEYWORDS
Sentiment Analysis, opinion mining, social network, sentiment lexicon, modern standard Arabic,
colloquial, natural language processing
Full Text: https://airccse.org/journal/ijnlc/papers/4215ijnlc07.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol4.html
REFERENCES
[1] A. Shoukry and A. Rafea, "Sentence-level Arabic sentiment analysis," in Collaboration
Technologies and Systems (CTS) International Conference, Denver, CO, USA, 2012, pp. 546-
550.
[2] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine
learning techniques," in Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP), 2002, pp. 79–86.
[3] D. Davidov, O. Tsur, and A. Rappoport, "Enhanced Sentiment Learning Using Twitter
Hashtags and Smileys," in Proceedings of the 23rd International Conference on Computational
Linguistics (Coling2010), Beijing, China, 2010, pp. 241–249.
[4] L. Barbosa and J. Feng, "Robust Sentiment Detection on Twitter from Biased and Noisy Data
" in Proceedings of the 23rd International Conference on Computational Linguistics (Coling),
2010.
[5] P. Turney, "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised
Classification of Reviews," in Proceedings of the 40th Annual Meeting on Association for
Computational Linguistics ACL '02, Stroudsburg, PA, USA, 2002, pp. 417-424.
[6] V. Hatzivassiloglou and K. R. McKeown, "Predicting the semantic orientation of adjectives,"
in Proceedings of the Joint ACL / EACL Conference, 1997, pp. 174–181.
[7] B. Pang and L. Lee, "Opinion mining and sentiment analysis," Foundations and Trends in
Information Retrieval vol. 2, pp. 1–135, 2008.
[8] M. Hu and B. Liu, "Mining and summarizing customer reviews " in Proceedings of the ACM
SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2004, pp. 168–177.
[9] B. Liu, "Sentiment Analysis and Subjectivity," in Handbook of Natural Language Processing,
Second ed: CRC Press, Taylor and Francis Group, 2010.
[10] A. Pak and P. Paroubek, "Twitter as a Corpus for Sentiment Analysis and Opinion
Mining," in Proceedings of the Seventh Conference on International Language Resources and
Evaluation (LREC'10), European Language Resources Association ELRA, Valletta, Malta, 2010.
[11] C. Scheible and H. Schütze, "Bootstrapping Sentiment Labels For Unannotated Documents
With Polarity PageRank," in Proceedings of the Eighth International Conference on Language
Resources and Evaluation (LREC 2012), Istanbul, Turkey, 2012.
[12] C. Manning and D. Klein, "Optimization, maxent models, and conditional estimation
without magic," in Proceedings of the 2003 Conference of the North American Chapter of the
Association for Computational Linguistics on Human Language Technology, 2003, p. 8.
[13] A. Abbasi, H. Chen, and A. Salem, "Sentiment Analysis in Multiple Languages: Feature
Selection for Opinion Classification in Web Forums," ACM Transactions on Information
Systems, vol. 26, 2008.
[14] E. Riloff and J. Wiebe, "Learning extraction patterns for subjective expressions," in
Proceedings of the Conference on Empirical Methods in Natural Language Processing
(EMNLP), 2003.
[15] E. Riloff, J. Wiebe, and T. Wilson, "Learning subjective nouns using extraction pattern
bootstrapping," in Proceedings of the Conference on Natural Language Learning (CoNLL),
2003, pp. 25–32.
[16] M. Abdul-Mageed and M. Diab, "Subjectivity and Sentiment Annotation of Modern
Standard Arabic Newswire," in Proceedings of the Fifth Law Workshop (LAW V), Association
for Computational Linguistics, Portland, Oregon, 2011, pp. 110–118.
[17] M. Abdul-Mageed, M. Diab, and M. Korayem, "Subjectivity and sentiment analysis of
modern standard Arabic," in Proceedings of the 49th Annual Meeting of the Association for
Computational Linguistics, 2011.
[18] M. Abdul-Mageed, K. Sandra, and M. Diab, "SAMAR: A System for Subjectivity and
Sentiment Analysis of Arabic Social Media," in Proceedings of the 3rd Workshop on
Computational Approaches to Subjectivity and Sentiment Analysis, Jeju,Republic of Korea,
2012, pp. 19–28.
[19] A. Mourad and K. Darwish, "Subjectivity and Sentiment Analysis of Modern Standard
Arabic and Arabic Microblogs," in Proceedings of the 4th Workshop on Computational
Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), Atlanta, Georgia,
2013, pp. 55–64.
[20] M. Korayem, D. Crandall, and M. Abdul-Mageed, "Subjectivity and Sentiment Analysis of
Arabic: A Survey," in Advanced Machine Learning Technologies and Applications,
Communications in Computer and Information Science series 322, (Springer), AMLTA, 2012,
pp. 128-139.
[21]M. Abdul-Mageed and M. Diab, "AWATIF: A multi-genre corpus for Arabic subjectivity
and sentiment analysis," in Proceedings of the 8th International Conference on Language
Resources and Evaluation (LREC), Istanbul, Turkey, 2012a.
[22] M. Rushdi-Saleh, M. Mart´ın-Valdivia, L. Ure˜na-L´opez, and J. Perea-Ortega, "Oca:
Opinion corpus for Arabic," Journal of the American Society for Information Science and
Technology, vol. 62, pp. 2045–2054, 2011.
[23] M. Elarnaoty, S. AbdelRahman, and A. Fahmy, "A Machine Learning Approach for
Opinion Holder Extraction Arabic Language," CoRR, abs/1206.1011, vol. 3, 2012.
[24] M. Abdul-Mageed and M. Diab, "SANA: A Large Scale Multi-Genre, Multi-Dialect
Lexicon for Arabic Subjectivity and Sentiment Analysis," in Proceedings of The 9th edition of
the Language Resources and Evaluation Conference (LREC ), Reykjavik, Iceland, 2014.
[25] E. Refaee and V. Rieser, "An Arabic Twitter Corpus for Subjectivity and Sentiment
Analysis," in Proceedings of The 9th edition of the Language Resources and Evaluation
Conference (LREC 2014), Reykjavik, Iceland, 2014.
[26] M. Elmahdy, G. Rainer, M. Wolfgang, and A. Slim, "Survey on common Arabic language
forms from a speech recognition point of view," in proceeding of International conference on
Acoustics (NAG-DAGA), Rotterdam, Netherlands, 2009, pp. 63-66.
[27] J. C. Carletta, "Assessing agreement on classification tasks: the KAPPA statistic "
Computational Linguistics, vol. 22, pp. 249- 254, 1996.
[28] B. Liu, Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.
[29] A. Basha, الأمثال العامية: مشروحة ومرتبة حسب الحرف الأول من المثل مع كشاف موضوعى
[Colloquial sayings: annotated and arranged by the first letter of the saying, with a topical
index]. Egypt: Al-Ahram Foundation - Al-Ahram Center for Translation and Publishing, 1986.
[30] A. Saalan, موسوعة الأمثال الشعبية المصرية [Encyclopedia of Egyptian popular sayings],
First ed. Egypt: Dar-alafkalarabia press, 2003.
[31] F. Husain, النوادر العربية، القصص الشعبية، الأمثال العامية، الفولكلور المصرى
[Arabic anecdotes, popular stories, colloquial sayings, Egyptian folklore]. Egypt: General
Egyptian Book Organization GEBO, 1984.
[32] G. Taher. (2006). موسوعة الأمثال الشعبية - دراسة علمية [Encyclopedia of public sayings -
a scientific study]. Available: http://books.google.com.eg/books?id=2CR_EKTjxRgC
[33] PROz. (2014). PROz website for Arabic Idioms/Maxims/Sayings (Jan 2014). Available:
http://www.proz.com/glossary-translations/
[34] M. Diab, "Towards an optimal POS tag set for Modern Standard Arabic processing," in
Proceedings of Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria,
2007.
[35] O. F. Zaidan and C. Callison-Burch, "Arabic dialect identification," Computational
Linguistics, vol. 40, pp. 171-202, March 2014.
[36] H. S. Ibrahim, S. M. Abdou, and M. Gheith, "Automatic expandable large-scale sentiment
lexicon of Modern Standard Arabic and Colloquial," in 16th International Conference on
Intelligent Text Processing and Computational Linguistics (CICLING), Cairo - Egypt, 2015.
[37] M. Sharifi and W. Cohen. (2008, May, 2014). "Finding domain specific polar words for
sentiment classification." Available: http://www.cs.cmu.edu/~mehrbod/polarity_08.pdf
[38] J. YI, T. NASUKAWA, R. BUNESCU, and W. NIBLACK, "Sentiment analyzer:
Extracting sentiments about a given topic using natural language processing techniques " in
Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM), 2003, pp. 427–
434.
[39] Z. Fei, J. LIU, and G. WU, "Sentiment classification using phrase patterns," in Proceedings
of the 4th IEEE International Conference on Computer Information Technology, 2004, pp.
1147–1152.
[40] T. Joachims. (2008, Jan-2013). SVM-light: Support vector machine. Available:
http://svmlight.joachims.org/
SURVEY OF MACHINE TRANSLATION SYSTEMS IN INDIA
G V Garje1 and G K Kharate2
1Department of Computer Engineering and Information Technology PVG’s College of
Engineering and Technology, Pune, India
2 Principal, Matoshri College of Engineering and Research Centre, Nashik, India
ABSTRACT
Work in the area of machine translation has been going on for the last few decades, but
promising translation work began in the early 1990s, driven by advances in Artificial
Intelligence and Computational Linguistics. India is a multilingual and multicultural country
with a population of over 1.25 billion and 22 constitutionally recognized languages, which
are written in 12 different scripts. This necessitates automated machine translation systems
from English to Indian languages and among Indian languages, so that people can exchange
information in their local language. Many usable machine translation systems have been
developed, and more are under development, in India and around the world. The paper focuses
on the different approaches used in the development of Machine Translation Systems and
briefly describes some of these systems along with their features, domains and limitations.
KEYWORDS
Machine Translation, Example-based MT, Transfer-based MT, Interlingua-based MT
Full Text: http://airccse.org/journal/ijnlc/papers/2513ijnlc04.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol2.html
REFERENCES
[1] Sitender & Seema Bawa, (2012) “Survey of Indian Machine Translation Systems”,
International Journal Computer Science and Technolgy, Vol. 3, Issue 1, pp. 286-290, ISSN :
0976-8491 (Online) | ISSN : 2229-4333 (Print)
[2] Sanjay Kumar Dwivedi & Pramod Premdas Sukhadeve, (2010) “Machine Translation
System in Indian Perspectives”, Journal of Computer Science 6 (10): 1082-1087, ISSN 1549-
3636, © 2010 Science
[3] John Hutchins, (2005) “Current commercial machine translation systems and computer-
based translation tools: system types and their uses”, International Journal of Translation
vol.17, no.1-2, pp.5-38.
[4] Vishal Goyal & Gurpreet Singh Lehal, (2009) “Advances in Machine Translation
Systems”, National Open Access Journal, Volume 9, ISSN 1930-2940
http://www.languageinindia.
[5] Latha R. Nair & David Peter S., (2012) “Machine Translation Systems for Indian
Languages”, International Journal of Computer Applications (0975 – 8887) Volume 39–
No.1
[6] Vishal Goyal & Gurpreet Singh Lehal, (2010) “Web Based Hindi to Punjabi Machine
Translation System”, International Journal of Emerging Technologies in Web Intelligence,
Vol. 2, no. 2, pp. 148-151, ACADEMY PUBLISHER
[7] Shachi Dave, Jignashu Parikh & Pushpak Bhattacharyya, (2002) “Interlingua-based
English-Hindi Machine Translation and Language Divergence”, Journal of Machine
Translation, pp. 251-304.
[8] Sudip Naskar & Shivaji Bandyopadhyay, (2005) “Use of Machine Translation in India:
Current status” AAMT Journal, pp. 25-31.
[9] Sneha Tripathi & Juran Krishna Sarkhel, (2010) “Approaches to Machine Translation”,
International journal of Annals of Library and Information Studies, Vol. 57, pp. 388-393
[10] Gurpreet Singh Josan & Jagroop Kaur, (2011) “Punjabi To Hindi Statistical Machine
Transliteration”, International Journal of Information Technology and Knowledge
Management , Volume 4, No. 2, pp. 459-463.
[11] S. Bandyopadhyay, (2004) "ANUBAAD - The Translator from English to Indian
Languages", in proceedings of the VIIth State Science and Technology Congress. Calcutta.
India. pp. 43-51
[12] R.M.K. Sinha & A. Jain, (2002) “AnglaHindi: An English to Hindi Machine-Aided
Translation System”, International Conference AMTA(Association of Machine Translation
in the Americas)
[13] Murthy. K, (2002) “MAT: A Machine Assisted Translation System”, In Proceedings of
Symposium on Translation Support System( STRANS-2002), IIT Kanpur. pp. 134-139.
[14] Lata Gore & Nishigandha Patil, (2002) “English to Hindi - Translation System”, In
proceedings of Symposium on Translation Support Systems. IIT Kanpur. pp. 178-184.
[15] Kommaluri Vijayanand, Sirajul Islam Choudhury & Pranab Ratna
“VAASAANUBAADA - Automatic Machine Translation of Bilingual Bengali-Assamese
News Texts”, in proceedings of Language Engineering Conference-2002, Hyderabad, India
© IEEE Computer Society.
[16] Bharati, R. Moona, P. Reddy, B. Sankar, D.M. Sharma & R. Sangal, (2003) “Machine
Translation: The Shakti Approach”, Pre-Conference Tutorial, ICON-2003.
[17] S. Mohanty & R. C. Balabantaray, (2004) “English to Oriya Translation System
(OMTrans)” cs.pitt.edu/chang/cpol/c087.pdf
[18] Ananthakrishnan R, Kavitha M, Jayprasad J Hegde, Chandra Shekhar, Ritesh Shah,
Sawani Bade & Sasikumar M., (2006) “MaTra: A Practical Approach to Fully- Automatic
Indicative EnglishHindi Machine Translation”, In the proceedings of MSPIL-06.
[19] G. S. Josan & G. S. Lehal, (2008) “A Punjabi to Hindi Machine Translation System”, in
proceedings of COLING-2008: Companion volume: Posters and Demonstrations,
Manchester, UK, pp. 157-160.
[20] Sanjay Chatterji, Devshri Roy, Sudeshna Sarkar & Anupam Basu, (2009) “A Hybrid
Approach for Bengali to Hindi Machine Translation”, In proceedings of ICON-2009, 7th
International Conference on Natural Language Processing, pp. 83-91.
[21] Vishal Goyal & Gurpreet Singh Lehal, (2011) “Hindi to Punjabi Machine Translation
System”, in proceedings of the ACL-HLT 2011 System Demonstrations, pages 1–6, Portland,
Oregon, USA, 21 June 2011.
[22] Ankit Kumar Srivastava, Rejwanul Haque, Sudip Kumar Naskar & Andy Way, (2008)
“The MATREX (Machine Translation using Example): The DCU Machine Translation
System for ICON 2008”, in Proceedings of ICON-2008: 6th International Conference on
Natural Language Processing, Macmillan Publishers, India,
http://ltrc.iiit.ac.in/proceedings/ICON-2008.
[23] hutchinsweb.me.uk/Nutshell-2005.pdf
[24] John Hutchins “Historical survey of machine translation in Eastern and Central Europe”,
Based on an unpublished presentation at the conference on Crosslingual Language
Technology in service of an integrated multilingual Europe, 4-5 May 2012, Hamburg,
Germany. (www.hutchinsweb.me.uk/Hamburg-2012.pdf)
[25] Sampark: Machine Translation System among Indian languages (2009)
http://tdildc.in/index.php?option=com_vertical&parentid=74, http://sampark.iiit.ac.in/
[26] Akshar Bharti, Chaitanya Vineet, Amba P. Kulkarni & Rajiv Sangal, (1997)
“ANUSAARAKA: Machine Translation in stages”, Vivek, a quarterly in Artificial
Intelligence, Vol. 10, No. 3, NCST Mumbai, pp. 22-25
[27] Akshar Bharti, Chaitanya Vineet, Amba P. Kulkarni & Rajiv Sangal, (2001)
“ANUSAARAKA: overcoming the language barrier in India”, published in Anuvad:
approaches to Translation
[28] Hemant Darabari, (1999) “Computer Assisted Translation System- An Indian
Perspective”, in proceedings of MT Summit VII, Thailand
[29] R. Mahesh K. Sinha & Anil Thakur, (2005) “Machine Translation of Bi-lingual
Hindi-English (Hinglish) Text”, in proceedings of 10th Machine Translation Summit
organized by Asia-Pacific Association for Machine Translation (AAMT), Phuket, Thailand
[30] Parameswari K, Sreenivasulu N.V., Uma Maheshwar Rao G & Christopher M, (2012)
“Development of Telugu-Tamil Bidirectional Machine Translation System: A special focus
on case divergence”, in proceedings of 11th International Tamil Internet conference, pp 180-
191
[31] Salil Badodekar, (2004) “Translation Resources, Services and Tools for Indian
Languages”, a report of Centre for Indian Language Technology, IITB,
http://www.cfilt.iitb.ac.in/Translationsurvey/survey.pdf
[32] Ananthakrishnan R, Kavitha M, Jayprasad J Hegde, Chandra Shekhar, Ritesh Shah,
Sawani Bade & Sasikumar M, (2006) “MaTra: A Practical Approach to Fully-Automatic
Indicative English-Hindi Machine Translation”, in proceedings of the first national
symposium on Modelling and shallow parsing of Indian languages (MSPIL-06) organized by
IIT Bombay, 202.141.152.9/clir/papers/matra_mspil06.pdf
[33] CDAC Mumbai, (2008) “MaTra: an English to Hindi Machine Translation System”, a
report by CDAC Mumbai formerly NCST.
[34] Sanjay Chatterji, Praveen Sonare, Sudeshna Sarkar & Anupam Basu, (2011) “Lattice
Based Lexical Transfer in Bengali Hindi Machine Translation Framework”, in Proceedings
of ICON2011: 9th International Conference on Natural Language Processing, Macmillan
Publishers, India. Also accessible from ltrc.iiit.ac.in/proceedings/ICON-2011.
[35] R. Ananthakrishnan, Jayprasad Hegde, Pushpak Bhattacharyya, Ritesh Shah & M.
Sasikumar, (2008) “Simple Syntactic and Morphological Processing Can Help English-Hindi
Statistical Machine Translation”, in proceedings of International Joint Conference on NLP
(IJCNLP08), Hyderabad, India.
[36] Yanjun Ma, John Tinsley, Hany Hassan, Jinhua Du & Andy Way, (2008) “Exploiting
Alignment Techniques in MATREX: the DCU Machine Translation System for IWSLT
2008”, in proceedings of IWSLT 2008, Hawaii, USA
[37] projects.uptuwatch.com/cs-it/anubharti-an-hybrid-example-based-approach-for-
machine-aidedtrapnslation/
[38] Sugata Sanyal & Rajdeep Borgohain, (2013) “Machine Translation Systems in India”,
Cornell University Library, arxiv.org/ftp/arxiv/papers/1304/1304.7728.pdf
[39] Antony P. J., (2013) “Machine Translation Approaches and Survey for Indian
Languages”, International journal of Computational Linguistics and Chinese Language
Processing Vol. 18, No. 1, pp. 47-78.
[40] Manoj Jain & Om P. Damani, (2009) “English to UNL (Interlingua) Enconversion”, in
proceedings of 4th Language and Translation Conference (LTC-09).
[41] Smriti Singh, Mrugank Dalal, Vishal Vachhani, Pushpak Bhattacharyya & Om P.
Damani, (2007) “Hindi Generation from Interlingua (UNL)”, in proceedings of MT Summit,
2007
[42] language.worldofcomputing.net
[43] sampark.iiit.ac.in
[44] www.cdacmumbai.in/xlit
[45] www.cdacmumbai.in/rupantar
[46] translationjournal.net/journal/29computers.htm
[47] www.cfilt.iitb.ac.in/resources/surveys/MT-Literature%20Survey-2012-Somya.pdf
[48] www.cdacmumbai.in/e-ilmt
[49] www.iiit.net/ltrc/Anusaaraka/anu_home.html
[50] cdac.in/html/aai/mantra.asp
[51] translate.google.com/about/intl/en_ALL/
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
Deepti Bhalla1 , Nisheeth Joshi2 and Iti Mathur3
1,2,3 Apaji Institute, Banasthali University, Rajasthan, India
ABSTRACT
Machine transliteration has emerged as a very important research area in the field of machine
translation. Transliteration aims to preserve the phonological structure of words. Proper
transliteration of named entities plays a significant role in improving the quality of machine
translation. In this paper we perform machine transliteration for the English-Punjabi language
pair using a rule-based approach. We have constructed rules for syllabification, the process of
extracting or separating the syllables of a word. We calculate probabilities for named entities
(proper names and locations). For words that do not fall under the category of named entities,
separate probabilities are calculated by relative frequency using a statistical machine
translation toolkit known as MOSES. Using these probabilities we transliterate the input text
from English to Punjabi.
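Relative-frequency estimation of the kind mentioned above amounts to P(target | source) = count(source, target) / count(source). The sketch below computes that table directly in Python and uses it to pick the most probable Punjabi unit for each English syllable. It does not call MOSES; the function names, the pre-syllabified input and the four aligned pairs are hypothetical toy data for illustration only.

```python
from collections import Counter, defaultdict


def relative_frequencies(aligned_pairs):
    """Estimate P(target | source) = count(source, target) / count(source)."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return dict(table)


def transliterate(syllables, table):
    """Pick the most probable target unit for each source syllable.
    Syllables missing from the table are copied through unchanged."""
    out = []
    for syl in syllables:
        options = table.get(syl)
        out.append(max(options, key=options.get) if options else syl)
    return "".join(out)


# Hypothetical aligned (English syllable, Punjabi unit) training pairs.
pairs = [("ra", "ਰਾ"), ("ra", "ਰ"), ("ra", "ਰਾ"), ("ja", "ਜਾ")]
table = relative_frequencies(pairs)
word = transliterate(["ra", "ja"], table)
```

Choosing each syllable independently is the simplest possible decoder; a fuller system would also weigh how well the chosen units fit together.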
KEYWORDS
Machine Translation, Machine Transliteration, Name entity recognition, Syllabification.
Full Text: https://airccse.org/journal/ijnlc/papers/2213ijnlc07.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol2.html
REFERENCES
[1] Kamal Deep and Vishal Goyal, (2011) ”Development of a Punjabi to English transliteration
system”. In International Journal of Computer Science and Communication Vol. 2, No. 2, pp.
521-526.
[2] Shubhangi Sharma, Neha Bora and Mitali Halder, (2012) “English-Hindi Transliteration
using Statistical Machine Translation in different Notation” International Conference on
Computing and Control Engineering (ICCCE 2012).
[3] Kamal Deep, Dr.Vishal Goyal, (2011) “Hybrid Approach for Punjabi to English
Transliteration System” International Journal of Computer Applications (0975 – 8887) Volume
28– No.1.
[4] Jasleen kaur Gurpreet Singh josan , (2011) “Statistical Approach to Transliteration from
English to Punjabi”, In Proceeding of International Journal on Computer Science and
Engineering (IJCSE), Vol. 3 Issue 4, p1518.
[5] Er. Sheilly Padda, Rupinderdeep Kaur, Er. Nidhi, (2012) “Punjabi Phonetic: Punjabi Text to
IPA Conversion” International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com ISSN 2250-2459, Volume 2, Issue 10.
[6] Gurpreet Singh Josan, Gurpreet Singh Lehal, (2010) “A Punjabi to Hindi Machine
Transliteration System” Computational Linguistics and Chinese Language Processing Vol. 15,
No. 2, pp. 77-102.
[7] Manikrao L Dhore, Shantanu K Dixit, Tushar D Sonwalkar, (2012) “Hindi to English
Machine Transliteration of Named Entities using Conditional Random Fields.” International
Journal of Computer Applications;6/15/2012, Vol. 48, p31.
[8] Musa, Hafiz, Rabith A.kadir, Azreen Azman, M.taufik Abadullah, (2011) "Syllabification
algorithm based on syllable rules matching for Malay language." Proceedings of the 10th
WSEAS international conference on Applied computer and applied computational science.
World Scientific and Engineering Academy and Society (WSEAS).
[9] To download IRSTLM toolkit http://www.statmt.org
[10] Jenny Rose Finkel, Trond Grenager, and Christopher Manning, (2005) “Incorporating
Non-local Information into Information Extraction Systems by Gibbs Sampling”. Proceedings of
the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp.
363-370.
[11] Daniel Jurafsky, James H. Martin Speech and Language processing An Introduction to
speech Recognition, natural language processing, and computational linguistics.
HYBRID PART-OF-SPEECH TAGGER FOR NON-VOCALIZED ARABIC TEXT
Meryeme Hadni1 , Said Alaoui Ouatik1 , Abdelmonaime Lachkar2 and Mohammed
Meknassi1
1FSDM, Sidi Mohamed Ben Abdellah University (USMBA), Morocco
2E.N.S.A, Sidi Mohamed Ben Abdellah University (USMBA), Morocco
ABSTRACT
Part-of-speech (POS) tagging plays a crucial role in many areas of natural language
processing (NLP), including speech recognition, natural language parsing, information
retrieval and multi-word term extraction. This paper proposes an efficient and accurate POS
tagging technique for Arabic based on a hybrid approach. Because of ambiguity, the Arabic
rule-based method suffers from misclassified and unanalyzed words. To overcome these two
problems, we propose a Hidden Markov Model (HMM) integrated with the Arabic rule-based
method. Our POS tagger generates a set of three POS tags: Noun, Verb, and Particle. The
proposed technique uses the contextual information of the words together with a variety of
features that help predict the POS classes. To evaluate its accuracy, the method was trained
and tested on two corpora: the Holy Quran Corpus and the Kalimat Corpus for undiacritized
Classical Arabic. The experimental results demonstrate the effectiveness of our method for
Arabic POS tagging. On the Holy Quran Corpus, the accuracies are 97.6%, 96.8% and 94.4% for
our Hybrid Tagger, the HMM Tagger and the Rule-Based Tagger, respectively. On the Kalimat
Corpus, they are 98%, 97.40% and 94.60% for the Hybrid Tagger, the HMM Tagger and the
Rule-Based Tagger, respectively.
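The abstract does not say how the rule-based stage and the HMM are combined, so the following is only a minimal sketch of one possible integration. It assumes that the rules are tried first and that HMM scores (add-one smoothed transition and emission counts) are used for words the rules leave unanalyzed. The fallback is a greedy left-to-right choice, not full HMM decoding. The toy corpus, the Latin transliterations, the rules in `rule_tag` and all function names are hypothetical.

```python
from collections import Counter, defaultdict

TAGS = ("Noun", "Verb", "Particle")

# Toy tagged corpus in Latin transliteration (invented for illustration).
CORPUS = [
    [("kataba", "Verb"), ("alwaladu", "Noun"), ("fi", "Particle"), ("alkitabi", "Noun")],
    [("qaraa", "Verb"), ("alwaladu", "Noun"), ("alkitaba", "Noun")],
    [("fi", "Particle"), ("albayti", "Noun")],
]

def train(corpus):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sentence in corpus:
        prev = "<s>"
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit

def prob(counter, key, n_outcomes):
    """Add-one smoothed relative frequency of key in counter."""
    return (counter[key] + 1) / (sum(counter.values()) + n_outcomes)

def rule_tag(word):
    """Hypothetical rule stage: returns a tag, or None for unanalyzed words."""
    if word in {"fi", "min", "ila"}:
        return "Particle"
    if word.startswith("al"):  # assumed rule: a definite article marks a noun
        return "Noun"
    return None

def hybrid_tag(words, trans, emit, vocab_size):
    """Use the rule tag when available, otherwise the best HMM-scored tag."""
    tags, prev = [], "<s>"
    for word in words:
        tag = rule_tag(word)
        if tag is None:
            # Greedy fallback: maximize P(tag | prev) * P(word | tag).
            tag = max(TAGS, key=lambda t: prob(trans[prev], t, len(TAGS))
                      * prob(emit[t], word, vocab_size))
        tags.append(tag)
        prev = tag
    return tags

trans, emit = train(CORPUS)
vocab_size = len({w for sentence in CORPUS for w, _ in sentence})
print(hybrid_tag(["kataba", "alwaladu"], trans, emit, vocab_size))
```

Here the first word is not covered by the rules and is tagged from the HMM counts, while the second is tagged directly by a rule.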
KEYWORDS
Part-Of-Speech Tagger, Natural Language Applications, Natural Language Parsing, Hidden
Markov Model, Multi Words Term Extraction, Speech Recognition.
Full Text: https://airccse.org/journal/ijnlc/papers/2613ijnlc01.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol2.html
REFERENCES
[1] http://en.wikipedia.org/wiki/Part-of-speech_tagging.
[2] L. Van Guilder, (1995) “Automated Part of Speech Tagging: A Brief Overview”, Handout for
LING361, Georgetown University.
[3] H. Halteren, J. Zavrel & Walter Daelemans (2001) “Improving Accuracy in NLP Through
Combination of Machine Learning Systems”, Computational Linguistics, 27(2): 199–229.
[4] Steven J. DeRose (1990) “Stochastic Methods for Resolution of Grammatical Category
Ambiguity in Inflected and Uninflected Languages”, PhD Dissertation, Providence, RI: Brown
University Department of Cognitive and Linguistic Sciences.
[5] N. kumar Kumar, Anikel Dalal & Uma Sawant (2006) “Hindi part of speech tagging and
chunking”, NLPAI machine learning contest.
[6] M. Mohseni, H. Motalebi, B. Minaei-bidgoli & M. Shokrollahi-far (2008) “A Farsi
part-of-speech tagger based on Markov”, In the Proceedings of ACM Symposium on Applied
Computing, Brazil.
[7] S. Jabbari & B. Allison (2007) “Persian Part of Speech Tagging”, In the Proceedings of
Workshop on Computational Approaches to Arabic Script-Based Languages (CAASL-2), USA.
[8] E. Brill (1995) “Transformation-Based Error-Driven Learning and Natural Language
Processing: A case Study in Part of Speech Tagging”, Computational Linguistics, USA.
[9] M. Hepple (2000) “Independence and Commitment: Assumptions for Rapid Training and
Execution of Rule-based Part-of-Speech Taggers”, In Proceedings of the 38th Annual Meeting of
the Association for Computational Linguistics (ACL), Hong Kong.
[10] T. Brants (2000) “TnT – a Statistical Part-of-Speech Tagger”, In the Proceedings of 6th
Conference on Applied Natural Language Processing (ANLP), USA.
[11] K. Megerdoomian (2004) “Developing a Persian part-of-speech tagger”, In the Proceedings
of first Workshop on Persian Language and Computer, Iran.
[12] Khoja, S. (2001) “APT: Arabic part-of-speech tagger”, Proceedings of the Student
Workshop at the 2nd Meeting of the NAACL (NAACL’01), Carnegie Mellon University,
Pennsylvania, pp. 1-6. http://zeus.cs.pacificu.edu/shereen/NAACL.pdf
[13] Freeman A (2001), “Brill’s POS tagger and a morphology parser for Arabic”, In ACL’01
Workshop on Arabic language processing.
[14] Maamouri M, Cieri C. (2002) “Resources for Arabic Natural Language Processing at the
LDC”, Proceedings of the International Symposium on the Processing of Arabic, Tunisia,
pp. 125-146.
[15] Diab M., Hacioglu K. and Jurafsky D. (2004) “Automatic Tagging of Arabic Text: From
Raw Text to Base Phrase Chunks”, Proc. of HLT-NAACL’04: 149–152.
[16] Banko M, Moore R. C. (2004) “Part of Speech Tagging in Context”, Proc. of the 20th
International Conference on Computational Linguistics, Switzerland.
[17] Tlili-Guiassa Y. (2006) “Hybrid Method for Tagging Arabic Text”, Journal of Computer
Science 2 (3): 245-248.
[18] L. Young-Suk, K. Papineni & S. Roukos (2003) “Language Model Based Arabic Word
Segmentation”, in Proceedings of the Annual Meeting on Association for Computational
Linguistics, Japan, pp. 399-406.
[19] A.T Al-Taani & S. Abu-Al-Rub (2009) “A rule-based approach for tagging non-vocalized
Arabic words”, The International Arab Journal of Information Technology, Volume 6 (3):
320-328.
[20] T. Brants (2000) “TnT: A statistical part of speech tagger”, Proceedings of the 6th
Conference on Applied Natural Language Processing, Apr. 29 - May 04, Association for
Computational Linguistics, Morristown, New Jersey, USA, pp. 224-231.
[21] NLTK, Natural Language Toolkit. http://www.nltk.org/Home
[22] Quranic Arabic Corpus: http://corpus.quran.com
[23] Quran Tagset: http://corpus.quran.com/documentation/tagset.jsp
[24] N. Habash & O. Rambow (2005), “Arabic Tokenization, Part-of-Speech Tagging and
Morphological Disambiguation in One Fell Swoop,” in Proceedings of the Annual Meeting on
Association for Computational Linguistics, Michigan, pp. 573-580.
[25] http://sibawayh.emi.ac.ma/web/s/?q=node/79
[26] http://bit.ly/16jO3Ks
[27] http://www.alwatan.com/
[28] F. Al Shamsi & A. Guessoum (2006) “A Hidden Markov Model–Based POS Tagger for
Arabic”, 8es Journées internationales d’Analyse statistique des Données Textuelles (JADT).
[29] M. Albared & O. Nazlia (2010) “Automatic Part of Speech Tagging for Arabic: An
Experiment Using Bigram Hidden Markov Model”, Springer-Verlag Berlin Heidelberg, LNAI
6401, pp. 361–370.
[30] Y.O. Mohamed Elhadj (2009) “Statistical Part-of-Speech Tagger for Traditional Arabic
Texts”, Journal of Computer Science 5 (11): 794-800.
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
P H Rathod1, M L Dhore2, R M Dhore3
1,2 Department of Computer Engineering, Vishwakarma Institute of Technology, Pune
3 Pune Vidhyarthi Griha’s College of Engineering and Technology, Pune
ABSTRACT
Language transliteration is an important area of NLP. Transliteration is very useful for
converting named entities (NEs) written in one script into another script in NLP applications
such as Cross Lingual Information Retrieval (CLIR), multilingual voice chat applications and
real-time Machine Translation (MT). The most important requirement of a transliteration
system is to preserve the phonetic properties of the source language after transliteration
into the target language. In this paper, we propose named entity transliteration for the
Hindi to English and Marathi to English language pairs using a Support Vector Machine (SVM).
In the proposed approach, the source named entity is segmented into transliteration units,
so the transliteration problem can be viewed as a sequence labeling problem. The phonetic
units are classified using the polynomial kernel function of the SVM. The proposed approach
uses two features for transliteration: the phonetics of the source language and n-grams.
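The sketch below illustrates the three ingredients named in the abstract: segmentation into transliteration units, n-gram context features, and a polynomial kernel. It is an assumption-laden toy, not the authors' method. The segmentation rule (a base character plus its following combining marks), the feature names and the kernel parameters `degree` and `coef0` are hypothetical, and no SVM is actually trained here.

```python
import unicodedata

def segment(word):
    """Split a Devanagari word into units: a base character plus any
    combining marks that follow it (vowel signs, virama, anusvara)."""
    units = []
    for ch in word:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch  # attach the mark to the preceding base character
        else:
            units.append(ch)
    return units

def ngram_features(units, i):
    """Binary context features for unit i: the unit, its two neighbours,
    and the bigram formed with the previous unit."""
    prev_u = units[i - 1] if i > 0 else "<s>"
    next_u = units[i + 1] if i + 1 < len(units) else "</s>"
    return {f"u0={units[i]}", f"u-1={prev_u}", f"u+1={next_u}",
            f"bi={prev_u}|{units[i]}"}

def poly_kernel(a, b, degree=2, coef0=1.0):
    """Polynomial kernel (a.b + coef0) ** degree on sparse binary feature
    sets, where the dot product is the number of shared features."""
    return (len(a & b) + coef0) ** degree

word = "\u0930\u093e\u092e"  # the name Ram in Devanagari, written with escapes
units = segment(word)        # two units: RA + vowel sign AA, then MA
x0 = ngram_features(units, 0)
x1 = ngram_features(units, 1)
print(len(units), poly_kernel(x0, x0), poly_kernel(x0, x1))
```

In a full system each unit's feature set would be passed to an SVM classifier that predicts the target-language unit; the kernel above is what such a classifier would evaluate between pairs of units.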
KEYWORDS
Machine Transliteration, n-gram, Support Vector Machine, Syllabification
Full Text: https://airccse.org/journal/ijnlc/papers/2413ijnlc04.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol2.html
REFERENCES
[1] Padariya Nilesh, Chinnakotla Manoj, Nagesh Ajay, Damani Om P.(2008) “Evaluation of
Hindi to English, Marathi to English and English to Hindi”, IIT Mumbai CLIR at FIRE.
[2] Saha Sujan Kumar, Ghosh P. S, Sarkar Sudeshna and Mitra Pabitra (2008) “Named entity
recognition in Hindi using maximum entropy and transliteration.”
[3] BIS (1991) “Indian standard code for information interchange (ISCII)”, Bureau of Indian
Standards, New Delhi.
[4] Joshi R K, Shroff Keyur and Mudur S P (2003) “A Phonemic code based scheme for
effective processing of Indian languages”, National Centre for Software Technology, Mumbai,
23rd Internationalization and Unicode Conference, Prague, Czech Republic, pp 1-17.
[5] Arbabi M, Fischthal S M, Cheng V C and Bart E (1994) “Algorithms for Arabic name
transliteration”, IBM Journal of Research and Development, pp 183-194.
[6] Knight Kevin and Graehl Jonathan (1997) “Machine transliteration”, In proceedings of the
35th annual meetings of the Association for Computational Linguistics, pp 128-135.
[7] Stalls Bonnie Glover and Kevin Knight (1998) “Translating names and technical terms in
Arabic text.”
[8] Al-Onaizan Y, Knight K (2002) “Machine translation of names in Arabic text”, Proceedings
of the ACL conference workshop on computational approaches to Semitic languages.
[9] Jaleel Nasreen Abdul and Larkey Leah S. (2003) “Statistical transliteration for English-
Arabic cross language information retrieval”, In Proceedings of the 12th international conference
on information and knowledge management, pp 139 – 146.
[10] Jung S. Y., Hong S., S., Paek E.(2003) “English to Korean transliteration model of extended
Markov window”, In Proceedings of the 18th Conference on Computational Linguistics, pp 383–
389.
[11] Ganapathiraju M., Balakrishnan M., Balakrishnan N., Reddy R. (2005) “OM: One Tool for
Many (Indian) Languages”, ICUDL: International Conference on Universal Digital Library,
Hangzhou.
[12] Malik M G A (2006) “Punjabi Machine Transliteration”, Proceedings of the 21st
International Conference on Computational Linguistics and the 44th annual meeting of the ACL,
pp 1137–1144.
[13] Sproat R.(2002) “Brahmi scripts, In Constraints on Spelling Changes”, Fifth International
Workshop on Writing Systems, Nijmegen, The Netherlands.
[14] Sproat R.(2003) “A formal computational analysis of Indic scripts”, In International
Symposium on Indic Scripts: Past and Future, Tokyo.
[15] Sproat R.(2004) “A computational theory of writing systems, In Constraints on Spelling
Changes”, Fifth International Workshop on Writing Systems, Nijmegen, The Netherlands.
[16] Kopytonenko M., Lyytinen K., and Kärkkäinen T. (2006) “Comparison of phonological
representations for the grapheme-to-phoneme mapping, In Constraints on Spelling Changes”,
Fifth International Workshop on Writing Systems, Nijmegen, The Netherlands.
[17] Ganesh S, Harsha S, Pingali P, and Verma V (2008) “Statistical transliteration for cross
language information retrieval using HMM alignment and CRF”, In Proceedings of the
Workshop on CLIA, Addressing the Needs of Multilingual Societies.
[18] Sumaja Sasidharan, Loganathan R, and Soman K P (2009) “English to Malayalam
Transliteration Using Sequence Labeling Approach” International Journal of Recent Trends in
Engineering, Vol. 1, No. 2, pp 170-172
[19] Oh Jong-Hoon, Kiyotaka Uchimoto, and Kentaro Torisawa (2009) “Machine transliteration
using target-language grapheme and phoneme: Multi-engine transliteration approach”,
Proceedings of the Named Entities Workshop ACL-IJCNLP Suntec, Singapore, AFNLP, pp 36–39
[20] Antony P.J, Soman K.P (2010) “Kernel Method for English to Kannada Transliteration”,
Conference on Machine Learning and Cybernetics, pp 11-14
[21] Ekbal A. and Bandyopadhyay S. (2007) “A Hidden Markov Model based named entity
recognition system: Bengali and Hindi as case studies”, Proceedings of 2nd International
conference in Pattern Recognition and Machine Intelligence, Kolkata, India, pp 545–552.
[22] Ekbal A. and Bandyopadhyay S. (2008) “Bengali named entity recognition using support
vector machine”, In Proceedings of the IJCNLP-08 Workshop on NER for South and South East
Asian languages, Hyderabad, India, pp 51–58.
[23] Ekbal A. and Bandyopadhyay S. (2008), “Development of Bengali named entity tagged
corpus and its use in NER system”, In Proceedings of the 6th Workshop on Asian Language
Resources.
[24] Ekbal A. and Bandyopadhyay S. (2008) “A web-based Bengali news corpus for named
entity recognition”, Language Resources & Evaluation, vol. 42, pp 173–182.
[25] Ekbal A. and Bandyopadhyay S.(2008) “Improving the performance of a NER system by
postprocessing and voting”, In Proceedings of Joint IAPR International Workshop on Structural
Syntactic and Statistical Pattern Recognition, Orlando, Florida, pp 831–841.
[26] Ekbal A. and Bandyopadhyay S.(2009) “Bengali Named Entity Recognition using Classifier
Combination”, In Proceedings of Seventh International Conference on Advances in Pattern
Recognition, pp 259–262.
[27] Ekbal A. and Bandyopadhyay S. (2009) “Voted NER system using appropriate unlabelled
data”, In Proceedings of the Named Entities Workshop, ACL-IJCNLP.
[28] Ekbal A. and Bandyopadhyay S. (2010) “ Named entity recognition using appropriate
unlabeled data, post-processing and voting”, In Informatica, Vol 34, No. 1, pp 55-76.
[29] Chinnakotla Manoj K., Damani Om P., and Satoskar Avijit (2010) “Transliteration for
Resource-Scarce Languages”, ACM Trans. Asian Lang. Inform, Article 14, pp 1-30.
[30] Kishorjit Nongmeikapam (2012) “Transliterated SVM Based Manipuri POS Tagging”,
Advances in Computer Science and Engineering and Applications, pp 989-999
[31] K.P. Soman, V. Ajay, R. Loganathan (2009) “Machine Learning with SVM and Other Kernel
Methods”, Machine Learning Book, PHI.
[32] Koul Omkar N. (2008) “Modern Hindi Grammar”, Dunwoody Press
[33] Walambe M. R. (1990) “Marathi Shuddalekhan”, Nitin Prakashan, Pune
[34] Walambe M. R. (1990) “Marathi Vyakran”, Nitin Prakashan, Pune
[35] Dhore M L, Dixit S K and Dhore R M (2012) “Hindi and Marathi to English NE
Transliteration Tool using Phonology and Stress Analysis”, 24th International Conference on
Computational Linguistics, Proceedings of COLING Demonstration Papers, at IIT Bombay, pp
111-118
HYBRID APPROACHES FOR AUTOMATIC VOWELIZATION OF ARABIC TEXTS
Mohamed Bebah1, Chennoufi Amine2, Mazroui Azzeddine3 and Lakhouaja Abdelhak4
1 Arab Center for Research and Policy Studies, Doha, Qatar
2 Faculty of Sciences/University Mohamed I, Oujda, Morocco
3 Faculty of Sciences/University Mohamed I, Oujda, Morocco
4 Faculty of Sciences/University Mohamed I, Oujda, Morocco
ABSTRACT
This article presents hybrid approaches for the automatic vowelization of Arabic texts. The
process consists of two modules. In the first, a morphological analysis of the words of the
text is performed using the open source morphological analyzer AlKhalil Morpho Sys. For each
word, analyzed out of context, the output is the set of its possible vowelizations.
Integrating this analyzer into our vowelization system required the addition of a lexical
database containing the most frequent words of the Arabic language. The second module aims
to eliminate the ambiguities using a statistical approach based on two hidden Markov models
(HMMs). In the first HMM, the unvowelized Arabic words are the observed states and the
vowelized words are the hidden states. The observed states of the second HMM are identical
to those of the first, but the hidden states are the lists of possible diacritics of the
word without its Arabic letters. Our system uses the Viterbi algorithm to select the optimal
path among the solutions proposed by AlKhalil Morpho Sys. Our approach opens an important way
to improve the performance of automatic vowelization of Arabic texts for other uses in
automatic natural language processing.
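The selection step for the first HMM can be sketched as a Viterbi search over a lattice in which each position holds the candidate vowelizations of one word. This is a minimal sketch under stated assumptions: the `CANDIDATES` table stands in for the analyzer's output, the bigram counts are invented, Latin transliteration replaces Arabic script, the start distribution is uniform, and add-one smoothing is an arbitrary choice. Emission probabilities are taken as 1 because each candidate maps back to exactly one unvowelized word.

```python
import math

# Hypothetical analyzer output: out-of-context vowelizations for each
# unvowelized word (Latin transliteration stands in for Arabic script).
CANDIDATES = {
    "ktb": ["kataba", "kutub", "kutiba"],
    "alwld": ["alwaladu", "alwalada"],
}

# Invented bigram counts over vowelized words, as if collected from a corpus.
BIGRAMS = {
    ("kataba", "alwaladu"): 8,
    ("kataba", "alwalada"): 1,
    ("kutiba", "alwaladu"): 1,
}
VOCAB = {form for forms in CANDIDATES.values() for form in forms}

def transition(prev, cur):
    """Add-one smoothed P(cur | prev) estimated from the bigram counts."""
    total = sum(c for (a, _), c in BIGRAMS.items() if a == prev)
    return (BIGRAMS.get((prev, cur), 0) + 1) / (total + len(VOCAB))

def vowelize(words):
    """Viterbi search over the lattice of candidate vowelizations."""
    if not words:
        return []
    # Words unknown to the analyzer are kept unchanged as their own candidate.
    lattice = [CANDIDATES.get(w, [w]) for w in words]
    scores = {c: 0.0 for c in lattice[0]}  # uniform start, in log space
    backptrs = []
    for column in lattice[1:]:
        new_scores, ptr = {}, {}
        for cur in column:
            best_prev = max(
                scores, key=lambda p: scores[p] + math.log(transition(p, cur)))
            new_scores[cur] = scores[best_prev] + math.log(transition(best_prev, cur))
            ptr[cur] = best_prev
        backptrs.append(ptr)
        scores = new_scores
    # Follow the back-pointers from the best final state.
    path = [max(scores, key=scores.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]

print(vowelize(["ktb", "alwld"]))
```

The second HMM described in the abstract would use the same search with diacritic lists as hidden states instead of full vowelized words.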
KEYWORDS
Arabic language, Automatic vowelization, morphological analysis, hidden Markov model,
corpus
Full Text: https://airccse.org/journal/ijnlc/papers/3414ijnlc04.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol3.html
REFERENCES
[1] Debili, Fathi & Hadhemi Achour (1998) Voyellation automatique de l’arabe. In Proceedings
of the workshop on Computation approaches to Semitic languages, COLING-ACL ’98, pages
42–49.
[2] Maamouri, Mohamed, Ann Bies, and Seth Kulick. (2006) Diacritization: a challenge to
Arabic treebank annotation and parsing. In Proceedings of the British Computer Society Arabic
NLP/MT Conference.
[3] Zitouni, Imed, Jeffrey S. Sorensen, and Ruhi Sarikaya. (2006) Maximum entropy based
restoration of arabic diacritics. In Proceedings of the 21st International Conference on
Computational Linguistics and 44th Annual Meeting of the Association for Computational
Linguistics. Workshop on Computational approaches to Semitic Languages, Sydney, Australia.
July 2006, pages 577–584.
[4] Vergyri, Dimitra & Katrin Kirchhoff. (2004) Automatic diacritization of arabic for acoustic
modeling in speech recognition. In Proceedings of the Workshop on Computational Approaches
to Arabic Script-based Languages. COLING, Geneva, pages 66–73.
[5] Messaoudi, Abdel, Lori Lamel, and Jean-Luc Gauvain. (2004) The limsi rt04 b arabic
system. In Proceedings DARPA RT04, Palisades NY.
[6] Elshafei, Moustafa, Husni Al-Muhtaseb, and Mansour Alghamdi. (2006) Machine generation
of arabic diacritical marks. In The 2006 World Congress in Computer Science Computer
Engineering, and Applied Computing. Las Vegas, USA., pages 128–133.
[7] Emam, Ossama and Volker Fischer. (2005) Hierarchical approach for the statistical
vowelization of arabic text. Technical report, IBM Corporation Intellectual Property Law,
Austin, TX, US.
[8] Schlippe, Tim, ThuyLinh Nguyen, and Stephan Vogel. (2008) Diacritization as a
machine translation problem and as a sequence labeling problem. In 8th AMTA conference,
Hawaii, pages 21–25.
[9] Gal, Yaakov. (2002) An hmm approach to vowel restoration in arabic and hebrew. In
Proceedings of the Workshop on Computational Approaches to Semitic Languages-
Philadelphia- Association for Computational Linguistics, pages 27–33.
[10] Nelken, Rani and Stuart M. Shieber. (2005) Arabic diacritization using weighted finite-state
transducers. In Proceedings of the ACL 2005 Workshop On Computational Approaches To
Semitic Languages, Ann Arbor, Michigan, USA,, pages 79–86.
[11] Habash, Nizar and Owen Rambow. (2007) Arabic diacritization through full morphological
tagging. In Proceeding NAACL-Short ’07 Human Language Technologies 2007: The
Conference of the North American Chapter of the Association for Computational Linguistics -
Companion Volume - Short Papers Rochester - New York- USA, pages 53–56.
[12] Bebah, Mohamed Ould Abdallahi Ould, Abdelouafi Meziane, Azzeddine Mazroui, and
Abdelhak Lakhouaja. (2012) Approche morpho-statistique pour la voyellation des texts arabes.
Journal of Computer Science and Engineering, 5(1).
[13] Bebah, Mohamed Ould Abdallahi Ould, Abdelouafi Meziane, Azzeddine Mazroui, and
Abdelhak Lakhouaja. (2011) Alkhalil morpho sys. In 7th International Computing Conference in
Arabic, May 31- June 2, 2011, Riyadh, Saudi Arabia.
[14] El-Sadany, T and M Hashish. (1988) Semi-automatic vowelization of arabic verbs. In 10th
NC Conference, Jeddah, Saudi Arabia.
[15] Manning, Chris and Hinrich Schutze. (1999) Foundations of statistical natural language
processing. Massachusetts Institute of Technology Press - Library of Congress Cataloging in
publication Information.
[16] Deltour, Amelie. (2003) Methodes statistiques pour la voyellisation des texts arabes.
Master’s thesis, ENSIMAG-Karlsruhe University.
[17] Mansour, Alghamdi, Muhammad Khursheed, Mustafa Elshafei, Fayz Alhargan, Muhammed
Alkanhal, Abu Aus Alshamsan, Saad Alqahtani, Syed Zeeshan Muzaffar, Yasser Altowim,
Adnan Yusuf, and Husni Almuhtasib. 2006. Automatic arabic text diacritizer-final report ci 25
02. Technical report, KING ABDUL AZIZ CITY FOR SCIENCE AND TECHNOLOGY
KACST.
[18] Rashwan, Mohsen, Mohammad Al-Badrashiny, Mohamed Attia, and Sherif M. Abdou.
2009. A hybrid system for automatic arabic diacritization. In Natural Language Processing and
Knowledge Engineering. NLP-KE 2009 Cairo, Egypt,, pages 1–8.
[19] Buckwalter, Tim. 2004. Arabic morphological analyzer version 2.0 - ldc2004l02. In
Linguistic Data Consortium, University of Pennsylvania, 2002. LDC Catalog No.:
LDC2004L02, ISBN 1-58563-324-0.
[20] Abbas, Mourad and Kamel Smaili. 2005. Comparison of topic identification methods for
Arabic language. In the International conference RANLP05 Recent Advances in Natural
Language Processing, Borovets, Bulgaria, pages 21–23.
[21] Rafalovitch, Alexandre and Robert Dale. 2009. United nations general assembly resolutions:
a six-language parallel corpus. In Proceedings of the MT Summit XII, Ottawa, Canada, pages
292–299.
[22] Atiyya, Muhammad, Khalid Choukri, and Mustafa Yaseen. 2005. Specifications of the
Arabic written corpus produced within the nemlar project. Technical report, NEMLAR, Center
for Sprogteknologi.
[23] Neuhoff, D.L. 1975. The viterbi algorithm as an aid in text recognition. IEEE Transaction
on Information Theory, pages 222–226.
[24] Hifni, Yasser. 2012. Smoothing techniques for arabic diacritics restoration. In Proceedings
of the Twelfth Conference on Language Engineering (ESOLEC’12).
AN UNSUPERVISED APPROACH TO DEVELOP STEMMER
Mohd. Shahid Husain
Department of Information Technology, Integral University, Lucknow
ABSTRACT
This paper presents an unsupervised approach to developing a stemmer, applied to the Urdu
and Marathi languages. In the last few years in particular, a wide range of information in
Indian regional languages has been made available on the web in the form of e-data. However,
access to these data repositories remains very low because few efficient search engines or
retrieval systems support these languages. Hence, automatic information processing and
retrieval has become an urgent requirement. The system is trained on datasets taken from
CRULP [22] and the Marathi corpus [23]. Two approaches are proposed for generating suffix
rules: frequency-based stripping and length-based stripping. The evaluation was carried out
on 1200 words extracted from the Emille corpus. The experimental results show that, for
Urdu, the frequency-based suffix generation approach gives a maximum accuracy of 85.36%,
whereas the length-based suffix stripping algorithm gives a maximum accuracy of 79.76%. For
Marathi, the system achieves 63.5% accuracy with frequency-based stripping and a maximum
accuracy of 82.5% with length-based suffix stripping.
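The abstract names the two stripping strategies without defining them, so the following is one plausible reading rather than the paper's procedure. Suffix rules are learned without supervision by counting every word ending in an unlabeled word list; a word is then stemmed by removing either the most frequent matching suffix or the longest one. English words stand in for Urdu and Marathi data, and the thresholds `min_stem` and `min_count` are hypothetical parameters.

```python
from collections import Counter

def learn_suffixes(words, min_stem=2, min_count=2):
    """Count every ending that leaves at least min_stem characters of stem,
    and keep the endings seen in at least min_count distinct words."""
    counts = Counter()
    for word in set(words):
        for i in range(min_stem, len(word)):
            counts[word[i:]] += 1
    return {suffix: n for suffix, n in counts.items() if n >= min_count}

def stem(word, suffixes, by="frequency", min_stem=2):
    """Strip one learned suffix, chosen by corpus frequency or by length."""
    matches = [s for s in suffixes
               if word.endswith(s) and len(word) - len(s) >= min_stem]
    if not matches:
        return word
    if by == "frequency":
        # Most frequent suffix; ties are broken in favour of the longer one.
        best = max(matches, key=lambda s: (suffixes[s], len(s)))
    else:
        best = max(matches, key=len)
    return word[: len(word) - len(best)]

# Toy unlabeled word list (English stand-in, invented for illustration).
words = ["walking", "jumping", "playing", "walked", "jumped", "played",
         "walks", "jumps", "plays"]
suffixes = learn_suffixes(words)
print(stem("walking", suffixes), stem("jumped", suffixes, by="length"))
```

On this small list both strategies agree; on a real corpus they can pick different suffixes whenever a short ending is more frequent than a longer one that also matches.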
KEYWORDS
Stemming, Morphology, Urdu stemmer, Marathi stemmer, Information retrieval
Full Text: https://airccse.org/journal/ijnlc/papers/1212ijnlc02.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol1.html
REFERENCES
[1] Rizvi, J et. al. “Modeling case marking system of Urdu-Hindi languages by using semantic
information”. Proceedings of the IEEE International Conference on Natural Language
Processing and Knowledge Engineering (IEEE NLP-KE '05). 2005.
[2] Butt, M. King, T. “Non-Nominative Subjects in Urdu: A Computational Analysis”.
Proceedings of the International Symposium on Non-nominative Subjects, Tokyo, December, pp.
525-548, 2001.
[3] Savoy, J. “Stemming of French words based on grammatical categories”. Journal of the
American Society for Information Science, 44(1), 1-9, 1993.
[4] Lovins Julie Beth: Development of a stemming algorithm. Mechanical Translation and
Computational Linguistics 11:22–31. (1968)
[5] Mokhtaripour, A., Jahanpour, S. “Introduction to a New Farsi Stemmer”. Proceedings of
CIKM Arlington VA, USA, 826-827, 2006.
[6] R. Wicentowski. "Multilingual Noise-Robust Supervised Morphological Analysis using the
Word Frame Model." In Proceedings of Seventh Meeting of the ACL Special Interest Group on
Computational Phonology (SIGPHON), pp. 70-77, 2004.
[7] Rizvi, Hussain M. “Analysis, Design and Implementation of Urdu Morphological Analyzer”.
SCONEST, 1-7, 2005.
[8] Krovetz, R. “View Morphology as an Inference Process”. In the Proceedings of 5th
International Conference on Research and Development in Information Retrieval, 1993.
[9] Porter, M. “An Algorithm for Suffix Stripping”. Program, 14(3): 130-137, 1980.
[10] Thabet, N. “Stemming the Qur’an”. In the Proceedings of the Workshop on Computational
Approaches to Arabic Script-based Languages, 2004.
[11] Paik, Pauri. “A Simple Stemmer for Inflectional Languages”. FIRE 2008.
[12] Sharifloo, A.A., Shamsfard M. “A Bottom up Approach to Persian Stemming”. IJCNLP, 2008
[13] Croft and Xu. “Corpus-Based Stemming Using Co occurrence of Word Variants”. ACM
Transactions on Information Systems (61-81), 1998.
[14] Kumar, A. and Siddiqui, T. “An Unsupervised Hindi Stemmer with Heuristics
Improvements”. In Proceedings of the Second Workshop on Analytics for Noisy Unstructured
Text Data, 2008.
[15] Kumar, M. S. and Murthy, K. N. “Corpus Based Statistical Approach for Stemming
Telugu”. Creation of Lexical Resources for Indian Language Computing and Processing (LRIL),
C-DAC, Mumbai, India, 2007.
[16] Qurat-ul-Ain Akram, Asma Naseer, Sarmad Hussain. “Assas-Band, an Affix-Exception-List
Based Urdu Stemmer”. Proceedings of ACL-IJCNLP 2009.
[17] http://en.wikipedia.org/wiki/Urdu
[18] http://www.bbc.co.uk/languages/other/guide/urdu/steps.shtml
[19] http://www.andaman.org/BOOK/reprints/weber/rep-weber.htm
[20] Natural Language processing and Information Retrieval by Tanveer Siddiqui, U S Tiwary.
[21] Information retrieval: data structure and algorithms by William B. Frakes, Ricardo Baeza-
Yates.
[22] http://www.crulp.org/software/ling_resources.htm
[23] Marathi Corpus, http://www.cfilt.iitb.ac.in/marathi_Corpus/ , IIT Powai, Mumbai
WORD SENSE DISAMBIGUATION USING WSD SPECIFIC WORDNET OF
POLYSEMY WORDS
Udaya Raj Dhungana1, Subarna Shakya2, Kabita Baral3 and Bharat Sharma4
1, 2, 4 Department of Electronics and Computer Engineering, Central Campus, IOE,
Tribhuvan University, Lalitpur, Nepal
3 Department of Computer Science, GBS, Lamachaur, Kaski, Nepal
ABSTRACT
This paper presents a new model of WordNet that is used to disambiguate the correct sense of
a polysemous word based on clue words. The clue words are the related words for each sense
of a polysemous word, as well as for each single-sense word. The conventional WordNet
organizes nouns, verbs, adjectives and adverbs into sets of synonyms called synsets, each
expressing a different concept. In contrast to this structure, we developed a new model of
WordNet that organizes the different senses of polysemous words, as well as the single-sense
words, based on the clue words. These clue words are used to disambiguate the correct
meaning of a polysemous word in a given context using knowledge-based Word Sense
Disambiguation (WSD) algorithms. A clue word can be a noun, verb, adjective or adverb.
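As a rough illustration of how clue words could drive a knowledge-based WSD algorithm, the sketch below scores each sense by the overlap between its clue words and the context, in the spirit of a simplified Lesk procedure. The `CLUES` table, the sense labels and the scoring rule are hypothetical; the abstract does not specify the structure of the WSD-specific WordNet or which algorithm is applied to it.

```python
# Hypothetical WSD-specific entries: each sense of a polysemous word is
# listed with its clue words (invented for illustration).
CLUES = {
    "bank": {
        "financial institution": {"money", "loan", "deposit", "account", "cash"},
        "river side": {"river", "water", "shore", "fishing", "boat"},
    },
}

def disambiguate(word, context):
    """Return the sense whose clue words overlap most with the context,
    or None when the word is unknown or no clue word occurs."""
    senses = CLUES.get(word)
    if not senses:
        return None
    context_words = {w.lower() for w in context}
    scores = {sense: len(clues & context_words) for sense, clues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

sentence = "he sat on the bank of the river fishing".split()
print(disambiguate("bank", sentence))
```

Because clue words may be nouns, verbs, adjectives or adverbs, no part-of-speech filtering is applied to the context in this sketch.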
KEYWORDS
Word Sense Disambiguation, WordNet, Polysemy Words, Synset, Hypernymy, Context word,
Clue Words
Full Text: https://airccse.org/journal/ijnlc/papers/3414ijnlc05.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol3.html
REFERENCES
[1] N. Ide and J. Véronis, “Word sense disambiguation: The state of the art,” Computational
Linguistics, pp. 1–40, 1998.
[2] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller, “Introduction to wordnet:
An on-line lexical database,” International Journal of Lexicography, 1998.
[3] U. R. Dhungana and S. Shakya, “Word sense disambiguation in nepali language,” in The
Fourth International Conference on Digital Information and Communication Technology and Its
Application (DICTAP2014), Bangkok, Thailand, 2014, pp. 46–50.
[4] M. E. Lesk, “Automatic sense disambiguation using machine readable dictionaries: How to
tell a pine cone from an ice cream cone,” in SIGDOC Conference, Toronto, Ontario, Canada,
1986.
[5] S. Banerjee and T. Pedersen, “An adapted lesk algorithm for word sense disambiguation
using wordnet,” in Third International Conference on Intelligent Text Processing and
Computational Linguistics, Gelbukh, 2002.
[6] M. Sinha, M. K. Reddy, P. Bhattacharyya, P. Pandey, and L. Kashyap, “Hindi word sense
disambiguation,” Master’s thesis, Indian Institute of Technology Bombay, Mumbai, India, 2004.
[7] N. Shrestha, A. V. H. Patrick, and S. K. Bista, “Resources for nepali word sense
disambiguation,” in IEEE International conference on Natural Language Processing and
Knowledge Engineering (IEEE NLP-KE’08), Beijing, China, 2008.
[8] P. Bhattacharyya, P. Pande, and L. Lupu, “Hindi wordnet,” Indian Institute of Technology
Bombay, Mumbai, India, Tech. Rep., 2008.
[9] N. Shrestha, A. V. H. Patrick, and S. K. Bista, “Nepali word sense disambiguation using lesk
algorithm,” Master’s thesis, Kathmandu University, Dhulikhel, Kavre, Nepal, 2004.

More Related Content

Similar to February 2024 - Top 10 cited articles.pdf

Semantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based SystemSemantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based Systemijcnes
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrievalunyil96
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningIOSR Journals
 
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYINTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGYcscpconf
 
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGESSTUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGESijistjournal
 
An Ontology Model for Knowledge Representation over User Profiles
An Ontology Model for Knowledge Representation over User ProfilesAn Ontology Model for Knowledge Representation over User Profiles
An Ontology Model for Knowledge Representation over User ProfilesIJMER
 
Extraction and Retrieval of Web based Content in Web Engineering
February 2024 - Top 10 cited articles.pdf

  • 4. NAMED ENTITY RECOGNITION USING HIDDEN MARKOV MODEL (HMM)
Sudha Morwal1, Nusrat Jahan2 and Deepti Chopra3
1 Associate Professor, Banasthali University, Jaipur, Rajasthan-302001
2 M.Tech (CS), Banasthali University, Jaipur, Rajasthan-302001
3 M.Tech (CS), Banasthali University, Jaipur, Rajasthan-302001
ABSTRACT
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP), which is a branch of artificial intelligence. It has many applications, mainly in machine translation, text-to-speech synthesis, natural language understanding, information extraction, information retrieval, question answering, etc. The aim of NER is to classify words into predefined categories such as location name, person name, organization name, date, and time. In this paper we describe in detail the Hidden Markov Model (HMM) based machine learning approach to identifying named entities. The main idea behind using an HMM to build an NER system is that it is language independent, so the system can be applied to any language domain. In our NER system the states are not fixed; they are dynamic, so one can define them according to one's interest. The corpus used by our NER system is also not domain specific.
KEYWORDS
Named Entity Recognition (NER), Natural Language Processing (NLP), Hidden Markov Model (HMM).
Full Text: http://airccse.org/journal/ijnlc/papers/1412ijnlc02.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol1.html
  • 5. REFERENCES
[1] Pramod Kumar Gupta, Sunita Arora, “An Approach for Named Entity Recognition System for Hindi: An Experimental Study,” in Proceedings of ASCNT – 2009, CDAC, Noida, India, pp. 103–108.
[2] Shilpi Srivastava, Mukund Sanglikar & D.C Kothari, “Named Entity Recognition System for Hindi Language: A Hybrid Approach,” International Journal of Computational Linguistics (IJCL), Volume (2): Issue (1): 2011. Available at: http://cscjournals.org/csc/manuscript/Journals/IJCL/volume2/Issue1/IJCL-19.pdf
[3] Padmaja Sharma, Utpal Sharma, Jugal Kalita, “Named Entity Recognition: A Survey for the Indian Languages,” Language in India (www.languageinindia.com), 11:5, May 2011, Special Volume: Problems of Parsing in Indian Languages. Available at: http://www.languageinindia.com/may2011/padmajautpaljugal.pdf
[4] Lawrence R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” in Proceedings of the IEEE, vol. 77, no. 2, February 1989. Available at: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
[5] Sujan Kumar Saha, Sudeshna Sarkar, Pabitra Mitra, “Gazetteer Preparation for Named Entity Recognition in Indian Languages,” in Proceedings of the 6th Workshop on Asian Language Resources, 2008. Available at: http://www.aclweb.org/anthology-new/I/I08/I08-7002.pdf
[6] B. Sasidhar, P. M. Yohan, Dr. A. Vinaya Babu, Dr. A. Govardhan, “A Survey on Named Entity Recognition in Indian Languages with particular reference to Telugu,” in IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 2, March 2011. Available at: http://www.ijcsi.org/papers/IJCSI-8-2-438-443.pdf
[7] GuoDong Zhou, Jian Su, “Named Entity Recognition using an HMM-based Chunk Tagger,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 473–480.
[8] http://en.wikipedia.org/wiki/Forward–backward_algorithm
[9] http://en.wikipedia.org/wiki/Baum-Welch_algorithm
[10] Dan Shen, Jie Zhang, Guodong Zhou, Jian Su, Chew-Lim Tan, “Effective Adaptation of a Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain.” Available at: http://acl.ldc.upenn.edu/W/W03/W03-1307.pdf
  • 6. SENTIMENT ANALYSIS FOR MODERN STANDARD ARABIC AND COLLOQUIAL
Hossam S. Ibrahim1, Sherif M. Abdou2 and Mervat Gheith1
1Computer Science Department, Institute of Statistical Studies and Research (ISSR), Cairo University, Egypt
2Information Technology Department, Faculty of Computers and Information, Cairo University, Egypt
ABSTRACT
The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations; many are therefore now looking to the field of sentiment analysis. In this paper, we present a feature-based, sentence-level approach for Arabic sentiment analysis. Our approach uses a lexicon of Arabic idioms and saying phrases as a key resource for improving the detection of sentiment polarity in Arabic sentences, together with a novel and rich set of linguistically motivated features (contextual intensifiers, contextual shifters and negation handling) and syntactic features for conflicting phrases, which enhance the sentiment classification accuracy. Furthermore, we introduce an automatically expandable, wide-coverage polarity lexicon of Arabic sentiment words. The lexicon is built from a seed of gold-standard sentiment words that are manually collected and annotated; it then expands automatically, detecting the sentiment orientation of new sentiment words using a synset aggregation technique and free online Arabic lexicons and thesauruses. Our data focus on Modern Standard Arabic (MSA) and Egyptian dialectal Arabic tweets and microblogs (hotel reservations, product reviews, etc.). The experimental results using our resources and techniques with an SVM classifier indicate high performance levels, with accuracies of over 95%.
KEYWORDS
Sentiment Analysis, opinion mining, social network, sentiment lexicon, modern standard Arabic, colloquial, natural language processing
Full Text: https://airccse.org/journal/ijnlc/papers/4215ijnlc07.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol4.html
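To make the contextual intensifier and negation features in the abstract above more concrete, here is a minimal sketch in Python. It is not the authors' system: the paper feeds such features to an SVM classifier over Arabic text, whereas this sketch only sums lexicon scores. The tiny English word lists and the names `POLARITY`, `INTENSIFIERS`, `NEGATORS`, `sentence_polarity` and `label` are assumptions made purely for illustration.

```python
# Toy polarity lexicon and modifier lists: illustrative English stand-ins,
# not entries from the authors' Arabic lexicon.
POLARITY = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
NEGATORS = {"not", "never"}


def sentence_polarity(tokens):
    """Sum lexicon scores, letting a preceding intensifier scale a
    sentiment word and a preceding negator flip its sign."""
    score = 0.0
    scale = 1.0      # pending intensifier weight
    negate = False   # pending negation flag
    for tok in tokens:
        word = tok.lower()
        if word in NEGATORS:
            negate = True
        elif word in INTENSIFIERS:
            scale *= INTENSIFIERS[word]
        elif word in POLARITY:
            value = POLARITY[word] * scale
            score += -value if negate else value
            scale, negate = 1.0, False  # modifiers apply to one word only
    return score


def label(tokens):
    """Map the numeric score to a coarse polarity label."""
    score = sentence_polarity(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Calling `label("the room was not very good".split())` combines the negator and the intensifier before the sentiment word is reached, which is the kind of contextual interaction the abstract's feature set is meant to capture.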
  • 7. REFERENCES [1] A. Shoukry and A. Rafea, "Sentence-level Arabic sentiment analysis," in Collaboration Technologies and Systems (CTS) International Conference, Denver, CO, USA, 2012, pp. 546- 550. [2] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002, pp. 79–86. [3] D. Davidiv, O. Tsur, and A. Rappoport, "Enhanced Sentiment Learning Using Twitter Hash- tags and Smileys," in Proceedings of the 23rd International Conference on Computational Linguistics (Coling2010), Beijing, China, 2010, pp. 241–249. [4] L. Barbosa and J. Feng, "Robust Sentiment Detection on Twitter from Biased and Noisy Data " in Proceedings of the 23rd International Conference on Computational Linguistics (Coling), 2010. [5] P. Turney, "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews," in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics ACL '02, Stroudsburg, PA, USA, 2002, pp. 417-424. [6] V. Hatzivassiloglou and K. R. McKeown, "Predicting the semantic orientation of adjectives," in Proceedings of the Joint ACL / EACL Conference, 1997, pp. 174–181. [7] B. Pang and L. Lee, "Opinion mining and sentiment analysis," Foundations and Trends in Information Retrieval vol. 2, pp. 1–135, 2008. [8] M. Hu and B. Liu, "Mining and summarizing customer reviews " in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2004, pp. 168–177. [9] B. Liu, "Sentiment Analysis and Subjectivity," in Handbook of Natural Language Processing, Second ed: CRC Press, Taylor and Francis Group, 2010. [10] P. Alexander and P. 
Patrick, "Twitter as a Corpus for Sentiment Analysis and Opinion Mining " in Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10), European Language Resources Association ELRA, Valletta, Malta, 2010. [11] C. Scheible and H. Schütze, "Bootstrapping Sentiment Labels For Unannotated Documents With Polarity PageRank," in Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC 2012), Istambol-Turki, 2012. [12] C. Manning and D. Klein, "Optimization, maxent models, and conditional estimation without magic," in Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, 2003, p. 8.
  • 8. [13] A. Abbasi, H. Chen, and A. Salem, "Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums," ACM Transactions on Information Systems, vol. 26, 2008. [14] E. Riloff and J. Wiebe, "Learning extraction patterns for subjective expressions," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2003. [15] E. Riloff, J. Wiebe, and T. Wilson, "Learning subjective nouns using extraction pattern bootstrapping," in Proceedings of the Conference on Natural Language Learning (CoNLL), 2003, pp. 25–32. [16] M. Abdul-Mageed and M. Diab, "Subjectivity and Sentiment Annotation of Modern Standard Arabic Newswire," in Proceedings of the Fifth Law Workshop (LAW V), Association for Computational Linguistics, Portland, Oregon, 2011, pp. 110–118. [17] M. Abdul-Mageed, M. Diab, and M. Korayem, "Subjectivity and sentiment analysis of modern standard Arabic," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 2011. [18] M. Abdul-Mageed, K. Sandra, and M. Diab, "SAMAR: A System for Subjectivity and Sentiment Analysis of Arabic Social Media," in Proceedings of the 3rd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, Jeju,Republic of Korea, 2012, pp. 19–28. [19] A. Mourad and K. Darwish, "Subjectivity and Sentiment Analysis of Modern Standard Arabic and Arabic Microblogs," in Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), Atlanta, Georgia, 2013, pp. 55–64. [20] M. Korayem, D. Crandall, and M. Abdul-Mageed, "Subjectivity and Sentiment Analysis of Arabic: A Survey," in Advanced Machine Learning Technologies and Applications, Communications in Computer and Information Science series 322, (Springer), AMLTA, 2012, pp. 128-139. [21]M. Abdul-Mageed and M. 
Diab, "AWATIF: A multi-genre corpus for Arabic subjectivity and sentiment analysis," in Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, 2012a. [22] M. Rushdi-Saleh, M. Mart´ın-Valdivia, L. Ure˜na-L´opez, and J. Perea-Ortega, "Oca: Opinion corpus for Arabic," Journal of the American Society for Information Science and Technology, vol. 62, pp. 2045–2054, 2011. [23] M. Elarnaoty, S. AbdelRahman, and A. Fahmy, "A Machine Learning Approach for Opinion Holder Extraction Arabic Language," CoRR, abs/1206.1011, vol. 3, 2012.
  • 9. [24] M. Abdul-Mageed and M. Diab, "SANA: A Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis," in Proceedings of The 9th edition of the Language Resources and Evaluation Conference (LREC ), Reykjavik, Iceland, 2014. [25] E. Refaee and V. Rieser, "An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis," in Proceedings of The 9th edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik, Iceland, 2014. [26] M. Elmahdy, G. Rainer, M. Wolfgang, and A. Slim, "Survey on common Arabic language forms from a speech recognition point of view," in proceeding of International conference on Acoustics (NAG-DAGA), Rotterdam, Netherlands, 2009, pp. 63-66. [27] J. C. Carletta, "Assessing agreement on classification tasks: the KAPPA statistic " Computational Linguistics, vol. 22, pp. 249- 254, 1996. [28] B. Liu, Sentiment Analysis and Opinion Mining Morgan &Claypool Publishers, 2012. :sayings Colloquial [‫ا‬B ‫العال‬ ‫مثال‬ ‫ا‬ ‫الحرف‬ ‫حسب‬ ‫ومرتبة‬ ‫مشروحة‬ :‫مية‬ B‫موضوعى‬ ‫كشاف‬ ‫مع‬ ‫المثل‬ ‫من‬ ‫ول‬ ,Basha. [29] A an annotated and arranged by the first letter of ideals with the Scout TOPICAL]. Egypt: Al-Ahram Foundation - Al-Ahram Center for Translation and Publishing, 1986. [30] A. Saalan, ‫مثال‬ ‫الشعبية‬ ‫المصرية‬B‫موسوعة‬ ‫]ا‬ Encyclopedia of Egyptian popular sayings], First ed. Egypt: Dar-alafkalarabia press, 2003. Egyptian, sayings Colloquial [ ,‫الشعبية‬ ‫القصص‬ ,‫العربية‬ ‫النوادر‬ ‫ا‬B‫المصرى‬ ‫الفولكلور‬ ,‫العامية‬ ‫مثال‬ ,Husain. F [31]F. Husain folklore]. Egypt: General Egyptian Book Organization GEBO, 1984. [32] G. Taher. (2006). ‫دراسة‬ ‫علمية‬ - ‫مثال‬ ‫الشعبية‬ P‫موسوعة‬ ‫]ا‬ Encyclopedia of public sayings - a scientific study]. Available: http://books.google.com.eg/books?id=2CR_EKTjxRgC [33] PROz. (2014). PROz website for Arabic Idioms/Maxims/Sayings (Jan 2014). Available: http://www.proz.com/glossary-translations/ [34] M. 
Diab, "Towards an optimal POS tag set for Modern Standard Arabic processing," in Proceedings of Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria, 2007. [35] O. F. Zaidan and C. Callison-Burch, "Arabic dialect identification," Computational Linguistics, vol. 40, pp. 171-202, March 2014 2012. [36] H. S. Ibrahim, S. M. Abdou, and M. Gheith, "Automatic expandable large-scale sentiment lexicon of Modern Standard Arabic and Colloquial," in 16th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING), Cairo - Egypt, 2015. [37] M. Sharifi and W. Cohen. (2008, May, 2014). “Finding domain specifc polar words for sentiment classification. Available: http://www.cs.cmu.edu/~mehrbod/polarity_08.pdf
  • 10. [38] J. YI, T. NASUKAWA, R. BUNESCU, and W. NIBLACK, "Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques " in Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM), 2003, pp. 427– 434. [39] Z. Fei, J. LIU, and G. WU, "Sentiment classification using phrase patterns," in Proceedings of the 4th IEEE International Conference on Computer Information Technology, 2004, pp. 1147–1152. [40] T. Joachims. (2008, Jan-2013). SVM-light: Support vector machine. Available: http://svmlight.joachims.org/
  • 11. SURVEY OF MACHINE TRANSLATION SYSTEMS IN INDIA
G V Garje1 and G K Kharate2
1Department of Computer Engineering and Information Technology, PVG’s College of Engineering and Technology, Pune, India
2Principal, Matoshri College of Engineering and Research Centre, Nashik, India
ABSTRACT
Work in the area of machine translation has been going on for the last few decades, but promising translation work began in the early 1990s owing to advanced research in Artificial Intelligence and Computational Linguistics. India is a multilingual and multicultural country with a population of over 1.25 billion and 22 constitutionally recognized languages, which are written in 12 different scripts. This necessitates automated machine translation systems for English to Indian languages, and among Indian languages, so that people can exchange information in their local language. Many usable machine translation systems have been developed, and more are under development, in India and around the world. The paper focuses on the different approaches used in the development of machine translation systems and briefly describes some of these systems along with their features, domains and limitations.
KEYWORDS
Machine Translation, Example-based MT, Transfer-based MT, Interlingua-based MT
Full Text: http://airccse.org/journal/ijnlc/papers/2513ijnlc04.pdf
Volume URL: http://airccse.org/journal/ijnlc/vol2.html
  • 12. REFERENCES [1] Sitender & Seema Bawa, (2012) “Survey of Indian Machine Translation Systems”, International Journal Computer Science and Technolgy, Vol. 3, Issue 1, pp. 286-290, ISSN : 0976-8491 (Online) | ISSN : 2229-4333 (Print) [2] Sanjay Kumar Dwivedi & Pramod Premdas Sukhadeve, (2010) “Machine Translation System in Indian Perspectives”, Journal of Computer Science 6 (10): 1082-1087, ISSN 1549- 3636, © 2010 Science [3] John Hutchins, (2005) “Current commercial machine translation systems and computer- based translation tools: system types and their uses”, International Journal of Translation vol.17, no.1-2, pp.5-38. [4] Vishal Goyal & Gurpreet Singh Lehal, (2009) “Advances in Machine Translation Systems”, National Open Access Journal, Volume 9, ISSN 1930-2940 http://www.languageinindia. [5] Latha R. Nair & David Peter S., (2012) “Machine Translation Systems for Indian Languages”, International Journal of Computer Applications (0975 – 8887) Volume 39– No.1 [6] Vishal Goyal & Gurpreet Singh Lehal, (2010) “Web Based Hindi to Punjabi Machine Translation System”, International Journal of Emerging Technologies in Web Intelligence, Vol. 2, no. 2, pp. 148-151, ACADEMY PUBLISHER [7] Shachi Dave, Jignashu Parikh & Pushpak Bhattacharyya, (2002) “Interlingua-based English-Hindi Machine Translation and Language Divergence”, Journal of Machine Translation, pp. 251-304. [8] Sudip Naskar & Shivaji Bandyopadhyay, (2005) “Use of Machine Translation in India: Current status” AAMT Journal, pp. 25-31. [9] Sneha Tripathi & Juran Krishna Sarkhel, (2010) “Approaches to Machine Translation”, International journal of Annals of Library and Information Studies, Vol. 57, pp. 388-393 [10] Gurpreet Singh Josan & Jagroop Kaur, (2011) “Punjabi To Hindi Statistical Machine Transliteration”, International Journal of Information Technology and Knowledge Management , Volume 4, No. 2, pp. 459-463. [11] S. 
Bandyopadhyay, (2004) "ANUBAAD - The Translator from English to Indian Languages", in proceedings of the VIIth State Science and Technology Congress. Calcutta. India. pp. 43-51 [12] R.M.K. Sinha & A. Jain, (2002) “AnglaHindi: An English to Hindi Machine-Aided Translation System”, International Conference AMTA(Association of Machine Translation
  • 13. in the Americas) [13] Murthy. K, (2002) “MAT: A Machine Assisted Translation System”, In Proceedings of Symposium on Translation Support System( STRANS-2002), IIT Kanpur. pp. 134-139. [14] Lata Gore & Nishigandha Patil, (2002) “English to Hindi - Translation System”, In proceedings of Symposium on Translation Support Systems. IIT Kanpur. pp. 178-184. [15] Kommaluri Vijayanand, Sirajul Islam Choudhury & Pranab Ratna “VAASAANUBAADA - Automatic Machine Translation of Bilingual Bengali-Assamese News Texts”, in proceedings of Language Engineering Conference-2002, Hyderabad, India © IEEE Computer Society. [16] Bharati, R. Moona, P. Reddy, B. Sankar, D.M. Sharma & R. Sangal, (2003) “Machine Translation: The Shakti Approach”, Pre-Conference Tutorial, ICON-2003. [17] S. Mohanty & R. C. Balabantaray, (2004) “English to Oriya Translation System (OMTrans)” cs.pitt.edu/chang/cpol/c087.pdf [18] Ananthakrishnan R, Kavitha M, Jayprasad J Hegde, Chandra Shekhar, Ritesh Shah, Sawani Bade & Sasikumar M., (2006) “MaTra: A Practical Approach to Fully- Automatic Indicative EnglishHindi Machine Translation”, In the proceedings of MSPIL-06. [19] G. S. Josan & G. S. Lehal, (2008) “A Punjabi to Hindi Machine Translation System”, in proceedings of COLING-2008: Companion volume: Posters and Demonstrations, Manchester, UK, pp. 157-160. [20] Sanjay Chatterji, Devshri Roy, Sudeshna Sarkar & Anupam Basu, (2009) “A Hybrid Approach for Bengali to Hindi Machine Translation”, In proceedings of ICON-2009, 7th International Conference on Natural Language Processing, pp. 83-91. [21] Vishal Goyal & Gurpreet Singh Lehal, (2011) “Hindi to Punjabi Machine Translation System”, in proceedings of the ACL-HLT 2011 System Demonstrations, pages 1–6, Portland, Oregon, USA, 21 June 2011. 
[22] Ankit Kumar Srivastava, Rejwanul Haque, Sudip Kumar Naskar & Andy Way, (2008) “The MATREX (Machine Translation using Example): The DCU Machine Translation System for ICON 2008”, in Proceedings of ICON-2008: 6th International Conference on Natural Language Processing, Macmillan Publishers, India, http://ltrc.iiit.ac.in/proceedings/ICON-2008. [23] hutchinsweb.me.uk/Nutshell-2005.pdf [24] John Hutchins “Historical survey of machine translation in Eastern and Central Europe”, Based on an unpublished presentation at the conference on Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, Hamburg,
  • 14. Germany. (www.hutchinsweb.me.uk/Hamburg-2012.pdf) [25] Sampark: Machine Translation System among Indian languages (2009) http://tdildc.in/index.php?option=com_vertical&parentid=74, http://sampark.iiit.ac.in/ [26] Akshar Bharti, Chaitanya Vineet, Amba P. Kulkarni & Rajiv Sangal, (1997) ”ANUSAARAKA: Machine Translation in stages’, Vivek, a quarterly in Artificial Intelligence, Vol. 10, No. 3, NCST Mumbai, pp. 22-25 [27] Akshar Bharti, Chaitanya Vineet, Amba P. Kulkarni & Rajiv Sangal, (2001) ”ANUSAARAKA: overcoming the language barrier in India”, published in Anuvad: approaches to Translation [28] Hemant Darabari, (1999) “Computer Assisted Translation System- An Indian Perspective”, in proceedings of MT Summit VII, Thialand [29] R. Mahesh K. Sinha & Anil Thakur, (2005) “Machine Translation of Bi-lingual Hindi-English (Hinglish) Text”, in proceedings of 10th Machine Translation Summit organized by Asia-Pacific Association for Machine Translation (AAMT), Phuket, Thailand [30] Parameswari K, Sreenivasulu N.V., Uma Maheshwar Rao G & Christopher M, (2012) “Development of Telugu-Tamil Bidirectional Machine Translation System: A special focus on case divergence”, in proceedings of 11th International Tamil Internet conference, pp 180- 191 [31] Salil Badodekar, (2004) “Translation Resources, Services and Tools for Indian Languages”, a report of Centre for Indian Language Technology, IITB, http://www.cfilt.iitb.ac.in/Translationsurvey/survey.pdf [32] Ananthakrishnan R, Kavitha M, Jayprasad J Hegde, Chandra Shekhar, Ritesh Shah, Sawani Bade & Sasikumar M, (2006) “MaTra: A Practical Approach to Fully-Automatic Indicative EnglishHindi Machine Translation”, in proceedings of the first national symposium on Modelling and shallow parsing of Indian languages (MSPIL-06) organized by IIT Bambay, 202.141.152.9/clir/papers/matra_mspil06.pdf [33] CDAC Mumbai, (2008) “MaTra: an English to Hindi Machine Translation System”, a report by CDAC Mumbai formerly NCST. 
[34] Sanjay Chatterji, Praveen Sonare, Sudeshna Sarkar & Anupam Basu, (2011) “Lattice Based Lexical Transfer in Bengali Hindi Machine Translation Framework”, in Proceedings of ICON2011: 9th International Conference on Natural Language Processing, Macmillan Publishers, India. Also accessible from ltrc.iiit.ac.in/proceedings/ICON-2011. [35] R. Ananthakrishnan, Jayprasad Hegde, Pushpak Bhattacharyya, Ritesh Shah & M. Sasikumar, (2008) “Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation”, in proceedings of International Joint Conference on NLP (IJCNLP08), Hyderabad, India.
  • 15. [36] Yanjun Ma, John Tinsley, Hany Hassan, Jinhua Du & Andy Way, (2008) “Exploiting Alignment Techniques in MATREX: the DCU Machine Translation System for IWSLT 2008’, in proceedings of IWSLT 2008, Hawaii, USA [37] projects.uptuwatch.com/cs-it/anubharti-an-hybrid-example-based-approach-for- machine-aidedtrapnslation/ [38] Sugata Sanyal & Rajdeep Borgohain, (2013) “Machine Translation Systems in India”, Cornel University Library, arxiv.org/ftp/arxiv/papers/1304/1304.7728.pdf [39] Antony P. J., (2013) “Machine Translation Approaches and Survey for Indian Languages”, International journal of Computational Linguistics and Chinese Language Processing Vol. 18, No. 1, pp. 47-78. [40] Manoj Jain & Om P. Damani, (2009) “English to UNL (Interlingua) Enconversion”, in proceedings of 4th Language and Translation Conference (LTC-09). [41] Smriti Singh, Mrugank Dalal, Vishal Vachhani, Pushpak Bhattacharyya & Om P. Damani, (2007) “Hindi Generation from Interlingua (UNL)”, in proceedings of MT Summit, 2007 [42] language.worldofcomputing.net [43] sampark.iiit.ac.in [44] www.cdacmumbai.in/xlit [ 45] www.cdacmumbai.in/rupantar [46] translationjournal.net/journal/29computers.htm [47] www.cfilt.iitb.ac.in/resources/surveys/MT-Literature%20Survey-2012-Somya.pdf [48] www.cdacmumbai.in/e-ilmt [49] www.iiit.net/ltrc/Anusaaraka/anu_home.html [50] cdac.in/html/aai/mantra.asp [51] translate.google.com/about/intl/en_ALL/
  • 16. RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
Deepti Bhalla1, Nisheeth Joshi2 and Iti Mathur3
1,2,3Apaji Institute, Banasthali University, Rajasthan, India
ABSTRACT
Machine transliteration has emerged as a very important research area in the field of machine translation. Transliteration aims to preserve the phonological structure of words. Proper transliteration of named entities plays a very significant role in improving the quality of machine translation. In this paper we perform machine transliteration for the English-Punjabi language pair using a rule-based approach. We have constructed rules for syllabification, the process of separating a word into its syllables. We calculate probabilities for named entities (proper names and locations). For words that are not named entities, separate probabilities are calculated by relative frequency using the statistical machine translation toolkit MOSES. Using these probabilities, we transliterate our input text from English to Punjabi.
KEYWORDS
Machine Translation, Machine Transliteration, Named entity recognition, Syllabification.
Full Text: https://airccse.org/journal/ijnlc/papers/2213ijnlc07.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol2.html
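The relative-frequency estimate mentioned in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the aligned unit pairs in `ALIGNED_UNITS` are invented toy data, the input is assumed to be already syllabified, and the helper names `relative_frequencies` and `transliterate` are hypothetical rather than part of MOSES or of the authors' system.

```python
from collections import Counter, defaultdict

# Hypothetical aligned (English unit, Punjabi unit) pairs; toy data only.
ALIGNED_UNITS = [
    ("ra", "ਰਾ"), ("ra", "ਰਾ"), ("ra", "ਰ"),
    ("m", "ਮ"), ("m", "ਮ"),
]


def relative_frequencies(pairs):
    """Estimate P(target | source) = count(source, target) / count(source)."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return table


def transliterate(units, table):
    """Pick the most probable target unit for each source unit;
    unknown units are passed through unchanged."""
    out = []
    for unit in units:
        options = table.get(unit)
        out.append(max(options, key=options.get) if options else unit)
    return "".join(out)
```

With this toy data, `relative_frequencies(ALIGNED_UNITS)["ra"]` assigns two thirds of the probability mass to one target unit and one third to the other, and `transliterate(["ra", "m"], table)` simply concatenates the most probable unit for each syllable.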
  • 18. HYBRID PART-OF-SPEECH TAGGER FOR NON-VOCALIZED ARABIC TEXT
Meryeme Hadni1, Said Alaoui Ouatik1, Abdelmonaime Lachkar2 and Mohammed Meknassi1
1FSDM, Sidi Mohamed Ben Abdellah University (USMBA), Morocco
2E.N.S.A, Sidi Mohamed Ben Abdellah University (USMBA), Morocco
ABSTRACT
Part-of-speech tagging (POS tagging) plays a crucial role in different fields of natural language processing (NLP), including Speech Recognition, Natural Language Parsing, Information Retrieval and Multi Words Term Extraction. This paper proposes an efficient and accurate POS tagging technique for the Arabic language using a hybrid approach. Owing to ambiguity, the Arabic rule-based method suffers from misclassified and unanalyzed words. To overcome these two problems, we propose a Hidden Markov Model (HMM) integrated with the Arabic rule-based method. Our POS tagger generates a set of three POS tags: Noun, Verb, and Particle. The proposed technique uses the contextual information of the words with a variety of features that help predict the various POS classes. To evaluate its accuracy, the proposed method has been trained and tested with two corpora, the Holy Quran Corpus and the Kalimat Corpus, for the undiacritized Classical Arabic language. The experimental results demonstrate the efficiency of our method for Arabic POS tagging. On the Holy Quran Corpus, the accuracy rates are 97.6% for our Hybrid Tagger, 96.8% for the HMM Tagger and 94.4% for the Rule-Based Tagger. On the Kalimat Corpus, they are 98% for our Hybrid Tagger, 97.40% for the HMM Tagger and 94.60% for the Rule-Based Tagger.
KEYWORDS
Part-Of-Speech Tagger, Natural Language Applications, Natural Language Parsing, Hidden Markov Model, Multi Words Term Extraction, Speech Recognition.
Full Text: https://airccse.org/journal/ijnlc/papers/2613ijnlc01.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol2.html
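One possible way to combine a rule-based component with an HMM, as the abstract above describes in general terms, is to let rule decisions override the HMM's emission probabilities and to use Viterbi decoding for everything else. The Python sketch below follows that assumption; it is not the authors' tagger. All probabilities are invented toy values, the romanized words are placeholders, and `RULE_TAGS`, `emission` and `viterbi` are hypothetical names.

```python
TAGS = ("Noun", "Verb", "Particle")

# Toy parameters, invented for illustration (not estimated from any corpus).
START = {"Noun": 0.5, "Verb": 0.3, "Particle": 0.2}
TRANS = {
    "Noun": {"Noun": 0.3, "Verb": 0.4, "Particle": 0.3},
    "Verb": {"Noun": 0.6, "Verb": 0.1, "Particle": 0.3},
    "Particle": {"Noun": 0.7, "Verb": 0.2, "Particle": 0.1},
}
EMIT = {
    "Noun": {"kitab": 0.6, "kataba": 0.1},
    "Verb": {"kataba": 0.7, "kitab": 0.1},
    "Particle": {"fi": 0.9},
}
# Hypothetical rule component: words whose tag a rule fixes outright.
RULE_TAGS = {"fi": "Particle"}
UNSEEN = 1e-6  # small emission probability for unknown word/tag pairs


def emission(tag, word):
    """Rule decisions override the HMM; otherwise use HMM emissions."""
    if word in RULE_TAGS:
        return 1.0 if RULE_TAGS[word] == tag else 0.0
    return EMIT[tag].get(word, UNSEEN)


def viterbi(words):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    # best[t] = (probability, tag path) of the best path ending in tag t
    best = {t: (START[t] * emission(t, words[0]), [t]) for t in TAGS}
    for word in words[1:]:
        step = {}
        for t in TAGS:
            prob, path = max(
                ((best[p][0] * TRANS[p][t], best[p][1]) for p in TAGS),
                key=lambda item: item[0],
            )
            step[t] = (prob * emission(t, word), path + [t])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]
```

In this sketch the rule table pins the tag of the middle word in `viterbi(["kataba", "fi", "kitab"])`, and the HMM's transition and emission probabilities then choose the tags of its neighbours. A real system would estimate the probabilities from a tagged corpus and work in log space to avoid underflow on long sentences.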
  • 19. REFERENCE [1] Lee, S.hyun. & Kim Mi Na, (2008) “This is my paper”, ABC Transactions on ECE, Vol. 10, No. 5, pp120-122. [2] Gizem, Aksahya & Ayese, Ozcan (2009) Comunications & Networks, Network Books, ABC Publishers. [1] http://en.wikipedia.org/wiki/Part-of-speech_tagging. [2] L.Van Guilder, (1995) “Automated Part of Speech Tagging: A Brief Overview” Handout for LING361, Georgetown University. [3] H. Halteren, J.Zavrel & Walter Daelemans (2001).Improving Accuracy in NLP Through Combination of Machine Learning Systems. Computational Linguistics. 27(2): 199–229. [4] DeRose & J.Steven (1990) "Stochastic Methods for Resolution of Grammatical Category Ambiguity in Inflected and Uninflected Languages." PhD.Dissertation. Providence, RI: Brown University Department of Cognitive and Linguistic Sciences. [5] N. kumar Kumar, Anikel Dalal &Uma Sawant (2006)”hindi part of speech tagging and chunking”, NLPAI machine learning contest. [6] M. Mohseni, H. Motalebi, B. Minaei-bidgoli & M. Shokrollahi-far (2008) “A farsi part-of- speech tagger based on markov”. In the proceedings of ACM symposium on Applied computing, Brazil. [7] S. Jabbari &B. Allison(2007)“Persian Part of Speech Tagging”, In the Proceedings of Workshop on Computational Approaches to Arabic Script-Based Languages (CAASL-2), USA. [8] E. Brill (1995) “Transformation-Based Error-Driven Learning and Natural Language Processing: A case Study in Part of Speech Tagging”, Computational Linguistics, USA. [9] M. Hepple (2000), ”Independence and Commitment: Assumptions for Rapid Training and Execution of Rule-based Part of-Speech Taggers”, In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL). Hong Kong. [10] T. Brants (200),“TNT – a Statistical Part-of-Speech Tagger”, In the Proceedings of 6th conference on applied natural language processing (ANLP), USA. [11] K. 
Megerdoomian (2004), “Developing a Persian part-of speech tagger”, In the Proceedings of first Workshop on Persian Language and computer, Iran . [12] Khoja, S.( 2001) “ APT: Arabic part-of-speech tagger”. Proceeding of the Student Workshop at the 2nd Meeting of the NAACL, (NAACL’01), Carnegie Mellon University, Pennsylvania, pp: 1- 6. http://zeus.cs.pacificu.edu/shereen/NAACL.pdf [13] Freeman A (2001), “Brill’s POS tagger and a morphology parser for Arabic”, In ACL’01 Workshop on Arabic language processing.
  • 20. [14] Maamouri M, Cieri C. (2002). “Resources for Arabic Natural Language Processing at the LDC”, Proceedings of the International Symposium on the Processing of Arabic,Tunisia, pp.125- 146. [15] Diab M., Hacioglu K. and Jurafsky D. (2004), “Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks”. proc. of HLTNAACL’04: 149–152. [16] Banko M, Moore R. C. (2004). “Part of Speech Tagging in Context”, Proc of the 20th international conference on Computational Linguistics, Switzerland. [17] Tlili-Guiassa Y. (2006) “Hybrid Method for Tagging Arabic Text”. Journal of Computer Science 2 (3): 245-248. [18] L. Young-Suk, K. Papineni & S. Roukos ( 2003), “Language Model Based Arabic Word Segmentation,” in Proceedings of the Annual Meeting on Association for Computational Linguistics, Japan, pp. 399- 406. [19] A.T Al-Taani & S. Abu-Al-Rub (2009),”A rule-based approaches for tagging non-vocalized Arabic words”. The International Arab Journal of Information Technology, Volume6 (3): 320- 328. [20] T. Brants (2000),” TnT: A statistical part of speech tagger”, Proceedings of the 6th Conference on Applied Natural Language Processing, Apr. 29- May 04, Association for Computational Linguistics Morristown, New Jersey, USA., pp: 224-231. [21] NLTK, Natural Language Toolkit. http://www.nltk.org/Home [22] Quranic Arabic Corpus: http://corpus.quran.com [23] Quran Tagset: http://corpus.quran.com/documentation/tagset.jsp [24] N. Habash & O. Rambow (2005), “Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop,” in Proceedings of the Annual Meeting on Association for Computational Linguistics, Michigan, pp. 573-580. [25] http://sibawayh.emi.ac.ma/web/s/?q=node/79 [26] http://bit.ly/16jO3Ks [27] http://www.alwatan.com/ [28] F. Al Shamsi & A.Guessoum(2006),” A Hidden Markov Model–Based POS Tagger for Arabic”, 8es Journées internationales d’Analyse statistique des Données Textuelles (JADT).
  • 21. [29] M. Albared & O.Nazlia(2010),” Automatic Part of Speech Tagging for Arabic: An Experiment Using Bigram Hidden Markov Model “,Springer-Verlag Berlin Heidelberg, LNAI 6401, pp. 361– 370. [30] Y.O. Mohamed Elhadj(2009),” Statistical Part-of-Speech Tagger for Traditional Arabic Texts”, Journal of Computer Science 5 (11): 794-800.
  • 22. HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
P H Rathod1, M L Dhore2, R M Dhore3
1,2Department of Computer Engineering, Vishwakarma Institute of Technology, Pune
3Pune Vidhyarthi Griha’s College of Engineering and Technology, Pune
ABSTRACT
Language transliteration is one of the important areas in NLP. Transliteration is very useful for converting named entities (NEs) written in one script into another script in NLP applications such as Cross Lingual Information Retrieval (CLIR), Multilingual Voice Chat Applications and Real Time Machine Translation (MT). The most important requirement of a transliteration system is to preserve the phonetic properties of the source language after transliteration into the target language. In this paper, we propose named entity transliteration for the Hindi to English and Marathi to English language pairs using a Support Vector Machine (SVM). In the proposed approach, the source named entity is segmented into transliteration units; hence the transliteration problem can be viewed as a sequence labeling problem. The phonetic units are classified using the polynomial kernel function of the SVM. The proposed approach uses the phonetics of the source language and n-grams as two features for transliteration.
KEYWORDS
Machine Transliteration, n-gram, Support Vector Machine, Syllabification
Full Text: https://airccse.org/journal/ijnlc/papers/2413ijnlc04.pdf
Volume URL: https://airccse.org/journal/ijnlc/vol2.html
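The two ingredients named in the abstract above, n-gram context features over transliteration units and a polynomial kernel, can be illustrated with a short Python sketch. It makes several assumptions that are not taken from the paper: the feature naming scheme, the padding symbols, the romanized example units, and a kernel of the form (a . b + coef0) ** degree with no extra scaling factor. The function names `ngram_features` and `poly_kernel` are hypothetical.

```python
def ngram_features(units, index, n=2):
    """Binary context features for the unit at `index`: the unit itself
    plus up to n-1 neighbours on each side, padded at the edges."""
    padded = ["<s>"] * (n - 1) + list(units) + ["</s>"] * (n - 1)
    center = index + n - 1
    feats = {f"u0={padded[center]}"}
    for k in range(1, n):
        feats.add(f"u-{k}={padded[center - k]}")
        feats.add(f"u+{k}={padded[center + k]}")
    return feats


def poly_kernel(a, b, degree=2, coef0=1.0):
    """Polynomial kernel (a . b + coef0) ** degree on binary feature sets,
    where the dot product is the number of shared features."""
    return (len(a & b) + coef0) ** degree
```

For a name segmented into the hypothetical units `["ra", "me", "sh"]`, `ngram_features(units, 0)` yields the unit plus its left padding and right neighbour, and `poly_kernel` then scores how similar two such contexts are. An SVM trained with this kernel would label each unit with its target-script counterpart, which is what turns transliteration into a sequence labeling task.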
  • 23. REFERENCES [1] Padariya Nilesh, Chinnakotla Manoj, Nagesh Ajay, Damani Om P.(2008) “Evaluation of Hindi to English, Marathi to English and English to Hindi”, IIT Mumbai CLIR at FIRE. [2] Saha Sujan Kumar, Ghosh P. S, Sarkar Sudeshna and Mitra Pabitra (2008) “Named entity recognition in Hindi using maximum entropy and transliteration.” [3] BIS (1991) “Indian standard code for information interchange (ISCII)”, Bureau of Indian Standards, New Delhi. [4] Joshi R K, Shroff Keyur and Mudur S P (2003) “A Phonemic code based scheme for effective processing of Indian languages”, National Centre for Software Technology, Mumbai, 23rd Internationalization and Unicode Conference, Prague, Czech Republic, pp 1-17. [5] Arbabi M, Fischthal S M, Cheng V C and Bart E (1994) “Algorithms for Arabic name transliteration”, IBM Journal of Research and Development, pp 183-194. [6] Knight Kevin and Graehl Jonathan (1997) “Machine transliteration”, In proceedings of the 35th annual meetings of the Association for Computational Linguistics, pp 128-135. [7] Stalls Bonnie Glover and Kevin Knight (1998) “Translating names and technical terms in Arabic text.” [8] Al-Onaizan Y, Knight K (2002) “Machine translation of names in Arabic text”, Proceedings of the ACL conference workshop on computational approaches to Semitic languages. [9] Jaleel Nasreen Abdul and Larkey Leah S. (2003) “Statistical transliteration for English- Arabic cross language information retrieval”, In Proceedings of the 12th international conference on information and knowledge management, pp 139 – 146. [10] Jung S. Y., Hong S., S., Paek E.(2003) “English to Korean transliteration model of extended Markov window”, In Proceedings of the 18th Conference on Computational Linguistics, pp 383– 389. [11] Ganapathiraju M., Balakrishnan M., Balakrishnan N., Reddy R. (2005) “OM: One Tool for Many (Indian) Languages”, ICUDL: International Conference on Universal Digital Library, Hangzhou. 
[12] Malik M G A (2006) “Punjabi Machine Transliteration”, Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL, pp 1137–1144. [13] Sproat R.(2002) “Brahmi scripts, In Constraints on Spelling Changes”, Fifth International Workshop on Writing Systems, Nijmegen, The Netherlands. [14] Sproat R.(2003) “A formal computational analysis of Indic scripts”, In International Symposium on Indic Scripts: Past and Future, Tokyo.
  • 24. [15] Sproat R.(2004) “A computational theory of writing systems, In Constraints on Spelling Changes”, Fifth International Workshop on Writing Systems, Nijmegen, The Netherlands. [16] Kopytonenko M. , Lyytinen K. , and Krkkinen T.(2006) “Comparison of phonological representations for the grapheme-to-phoneme mapping, In Constraints on Spelling Changes”, Fifth International Workshop on Writing Systems, Nijmegen, The Netherlands. [17] Ganesh S, Harsha S, Pingali P, and Verma V (2008) “Statistical transliteration for cross language information retrieval using HMM alignment and CRF”, In Proceedings of the Workshop on CLIA, Addressing the Needs of Multilingual Societies. [18] Sumaja Sasidharan, Loganathan R, and Soman K P (2009) “English to Malayalam Transliteration Using Sequence Labeling Approach” International Journal of Recent Trends in Engineering, Vol. 1, No. 2, pp 170-172 [19] Oh Jong-Hoon, Kiyotaka Uchimoto, and Kentaro Torisawa (2009) “Machine transliteration using target-language grapheme and phoneme: Multi-engine transliteration approach”, Proceedings of the Named Entities Workshop ACL-IJCNLP Suntec, Singapore,AFNLP, pp 36– 39 [20] Antony P.J, Soman K.P (2010) “Kernel Method for English to Kannada Transliteration”, Conference on Machine Learning and Cybernetics, pp 11-14 [21] Ekbal A. and Bandyopadhyay S. (2007) “A Hidden Markov Model based named entity recognition system: Bengali and Hindi as case studies”, Proceedings of 2nd International conference in Pattern Recognition and Machine Intelligence, Kolkata, India, pp 545–552. [22] Ekbal A. and Bandyopadhyay S. (2008) “Bengali named entity recognition using support vector machine”, In Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian languages, Hyderabad, India, pp 51–58. [23] Ekbal A. and Bandyopadhyay S. (2008), “Development of Bengali named entity tagged corpus and its use in NER system”, In Proceedings of the 6th Workshop on Asian Language Resources. [24] Ekbal A. 
and Bandyopadhyay S. (2008) “A web-based Bengali news corpus for named entity recognition”, Language Resources & Evaluation, vol. 42, pp 173–182. [25] Ekbal A. and Bandyopadhyay S.(2008) “Improving the performance of a NER system by postprocessing and voting”, In Proceedings of Joint IAPR International Workshop on Structural Syntactic and Statistical Pattern Recognition, Orlando, Florida, pp 831–841. [26] Ekbal A. and Bandyopadhyay S.(2009) “Bengali Named Entity Recognition using Classifier Combination”, In Proceedings of Seventh International Conference on Advances in Pattern Recognition, pp 259–262.
  • 25. [27] Ekbal A. and Bandyopadhyay S. (2009) “Voted NER system using appropriate unlabelled data”, In Proceedings of the Named Entities Workshop, ACL-IJCNLP. [28] Ekbal A. and Bandyopadhyay S. (2010) “Named entity recognition using appropriate unlabeled data, post-processing and voting”, In Informatica, Vol 34, No. 1, pp 55-76. [29] Chinnakotla Manoj K., Damani Om P., and Satoskar Avijit (2010) “Transliteration for Resource-Scarce Languages”, ACM Trans. Asian Lang. Inform, Article 14, pp 1-30. [30] Kishorjit Nongmeikapam (2012) “Transliterated SVM Based Manipuri POS Tagging”, Advances in Computer Science and Engineering and Applications, pp 989-999 [31] K.P.Sonam, V. Ajay, R. Laganatha.(2009) “Machine Learning with SVM and Other Kernel Methods”, Machine Learning Book, PHI. [32] Koul Omkar N. (2008) “Modern Hindi Grammar”, Dunwoody Press [33] Walambe M. R. (1990) “Marathi Shuddalekhan”, Nitin Prakashan, Pune [34] Walambe M. R. (1990) “Marathi Vyakran”, Nitin Prakashan, Pune [35] Dhore M L, Dixit S K and Dhore R M (2012) “Hindi and Marathi to English NE Transliteration Tool using Phonology and Stress Analysis”, 24th International Conference on Computational Linguistics, Proceedings of COLING Demonstration Papers, at IIT Bombay, pp 111-118
  • 26. HYBRID APPROACHES FOR AUTOMATIC VOWELIZATION OF ARABIC TEXTS Mohamed Bebah1, Chennoufi Amine2, Mazroui Azzeddine3 and Lakhouaja Abdelhak4 1Arab Center for Research and Policy Studies, Doha, Qatar 2 Faculty of Sciences/University Mohamed I, Oujda, Morocco 3 Faculty of Sciences/University Mohamed I, Oujda, Morocco 4 Faculty of Sciences/University Mohamed I, Oujda, Morocco ABSTRACT Hybrid approaches for automatic vowelization of Arabic texts are presented in this article. The process is made up of two modules. In the first one, a morphological analysis of the text words is performed using the open source morphological analyzer AlKhalil Morpho Sys. For each word, analyzed out of context, the output is its set of possible vowelizations. The integration of this analyzer in our vowelization system required the addition of a lexical database containing the most frequent words in the Arabic language. Using a statistical approach based on two hidden Markov models (HMM), the second module aims to eliminate the ambiguities. For the first HMM, the unvowelized Arabic words are the observed states and the vowelized words are the hidden states. The observed states of the second HMM are identical to those of the first, but the hidden states are the lists of possible diacritics of the word without its Arabic letters. Our system uses the Viterbi algorithm to select the optimal path among the solutions proposed by AlKhalil Morpho Sys. Our approach opens an important way to improve the performance of automatic vowelization of Arabic texts for other uses in automatic natural language processing. KEYWORDS Arabic language, Automatic vowelization, morphological analysis, hidden Markov model, corpus Full Text: https://airccse.org/journal/ijnlc/papers/3414ijnlc04.pdf Volume URL: https://airccse.org/journal/ijnlc/vol3.html
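The first HMM described above (unvowelized words observed, vowelized words hidden, Viterbi choosing among the analyzer's candidates) can be sketched roughly as follows. This is only an illustration under stated assumptions: the candidate lists, the romanized word forms and all probabilities are invented toy values, and the `floor` smoothing constant is a placeholder rather than the smoothing the authors used.

```python
import math


def viterbi(observations, candidates, trans, emit, start, floor=1e-6):
    """Pick the most probable sequence of hidden states (vowelized forms).

    candidates[obs] lists the hidden states allowed for an observed word,
    standing in for the output of a morphological analyzer.
    Unseen probabilities fall back to `floor` (an assumed smoothing value).
    """
    first = observations[0]
    layers = [{
        s: (math.log(start.get(s, floor)) + math.log(emit.get((s, first), floor)), None)
        for s in candidates[first]
    }]
    for obs in observations[1:]:
        layer = {}
        for s in candidates[obs]:
            # Best predecessor for state s in the previous layer.
            prev, score = max(
                ((p, sc + math.log(trans.get((p, s), floor)))
                 for p, (sc, _) in layers[-1].items()),
                key=lambda item: item[1],
            )
            layer[s] = (score + math.log(emit.get((s, obs), floor)), prev)
        layers.append(layer)
    # Backtrack from the best final state.
    state = max(layers[-1], key=lambda s: layers[-1][s][0])
    path = [state]
    for layer in reversed(layers[1:]):
        state = layer[state][1]
        path.append(state)
    return path[::-1]


# Toy romanized data: every value here is illustrative only.
observations = ["ktb", "drs"]
candidates = {"ktb": ["kataba", "kutub"], "drs": ["darsun", "darasa"]}
start = {"kataba": 0.6, "kutub": 0.4}
trans = {("kataba", "darsun"): 0.7, ("kataba", "darasa"): 0.3,
         ("kutub", "darsun"): 0.2, ("kutub", "darasa"): 0.8}
# A vowelized form always yields its own unvowelized skeleton.
emit = {(s, o): 1.0 for o, states in candidates.items() for s in states}

path = viterbi(observations, candidates, trans, emit, start)
```

Restricting each layer to the analyzer's candidates keeps the search small, since only plausible vowelizations of each word are scored.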
  • 27. REFERENCES [1] Debili, Fathi & Hadhemi Achour (1998) Voyellation automatique de l’arabe. In Proceedings of the workshop on Computation approaches to Semitic languages, COLING-ACL ’98, pages 42–49. [2] Maamouri, Mohamed, Ann Bies, and Seth Kulick. (2006) Diacritization: a challenge to Arabic treebank annotation and parsing. In Proceedings of the British Computer Society Arabic NLP/MT Conference. [3] Zitouni, Imed, Jefrey S. Sorensen, and Ruhi Sarikaya. (2006) Maximum entropy based restoration of arabic diacritics. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. Workshop on Computational approaches to Semitic Languages, Sydney, Australia. July 2006, pages 577–584. [4] Vergyri, Dimitra & Katrin Kirchhoff. (2004) Automatic diacritization of arabic for acoustic modeling in speech recognition. In Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages. COLING, Geneva, pages 66–73. [5] Messaoudi, Abdel, Lori Lamel, and Jean-Luc Gauvain. (2004) The limsi rt04 b arabic system. In Proceedings DARPA RT04, Palisades NY. [6] Elshafei, Moustafa, Husni Al-Muhtaseb, and Mansour Alghamdi. (2006) Machine generation of arabic diacritical marks. In The 2006 World Congress in Computer Science Computer Engineering, and Applied Computing. Las Vegas, USA, pages 128–133. [7] Emam, Ossama and Volker Fischer. (2005) Hierarchical approach for the statistical vowelization of arabic text. Technical report, IBM Corporation Intellectual Property Law, Austin, TX, US. [8] Schlippe, Tim, ThuyLinh Guyen, and ThuyLinh Vogel. (2008) Diacritization as a machine translation problem and as a sequence labeling problem. In 8th AMTA conference, Hawaii, pages 21–25. [9] Gal, Yaakov. (2002) An hmm approach to vowel restoration in arabic and hebrew. 
In Proceedings of the Workshop on Computational Approaches to Semitic Languages- Philadelphia- Association for Computational Linguistics, pages 27–33. [10] Nelken, Rani and Stuart M. Shieber. (2005) Arabic diacritization using weighted finite-state transducers. In Proceedings of the ACL 2005 Workshop On Computational Approaches To Semitic Languages, Ann Arbor, Michigan, USA,, pages 79–86. [11] Habash, Nizar and Owen Rambow. (2007) Arabic diacritization through full morphological tagging. In Proceeding NAACL-Short ’07 Human Language Technologies 2007: The
  • 28. Conference of the North American Chapter of the Association for Computational Linguistics - Companion Volume - Short Papers Rochester - New York- USA, pages 53–56. [12] Bebah, Mohamed Ould Abdallahi Ould, Abdelouafi Meziane, Azzeddine Mazroui, and Abdelhak Lakhouaja. (2012) Approche morpho-statistique pour la voyellation des texts arabes. Journal of Computer Science and Engineering, 5(1). [13] Bebah, Mohamed Ould Abdallahi Ould, Abdelouafi Meziane, Azzeddine Mazroui, and Abdelhak Lakhouaja. (2011) Alkhalil morpho sys. In 7th International Computing Conference in Arabic, May 31- June 2, 2011, Riyadh, Saudi Arabia. [14] El-Sadany, T and M Hashish. (1988) Semi-automatic vowelization of arabic verbs. In 10th NC Conference, Jeddah, Saudi Arabia. [15] Manning, Chris and Hinrich Schutze. (1999) Foundations of statistical natural language processing. Massachusetts Institute of Technology Press - Library of Congress Cataloging in publication Information. [16] Deltour, Amelie. (2003) Methodes statistiques pour la voyellisation des texts arabes. Master’s thesis, ENSIMAG-Karlsruhe University. [17] Mansour, Alghamdi, Muhammad Khursheed, Mustafa Elshafei, Fayz Alhargan, Muhammed Alkanhal, Abu Aus Alshamsan, Saad Alqahtani, Syed Zeeshan Muzaffar, Yasser Altowim, Adnan Yusuf, and Husni Almuhtasib. 2006. Automatic arabic text diacritizer-final report ci 25 02. Technical report, KING ABDUL AZIZ CITY FOR SCIENCE AND TECHNOLOGY KACST. [18] Rashwan, Mohsen, Mohammad Al-Badrashiny, Mohamed Attia, and Sherif M. Abdou. 2009. A hybrid system for automatic arabic diacritization. In Natural Language Processing and Knowledge Engineering. NLP-KE 2009 Cairo, Egypt,, pages 1–8. [19] Buckwalter, Tim. 2004. Arabic morphological analyzer version 2.0 - ldc2004l02. In Linguistic Data Consortium, University of Pennsylvania, 2002. LDC Cat alog No.: LDC2004L02, ISBN 1- 58563-324-0. [20] Abbas, Mourad and Kamel Smaili. 2005. Comparison of topic identification methods for Arabic language. 
In the International conference RANLP05 Recent Advances in Natural Language Processing, Borovets, Bulgaria, pages 21–23. [21] Rafalovitch, Alexandre and Robert Dale. 2009. United nations general assembly resolutions: a six-language parallel corpus. In Proceedings of the MT Summit XII, Ottawa, Canada, pages 292–299. [22] Atiyya, Muhammad, Khalid Choukri, and Mustafa Yaseen. 2005. Specifications of the Arabic written corpus produced within the nemlar project. Technical report, NEMLAR, Center for Sprogteknologi.
  • 29. [23] Neuhoff, D.L. 1975. The viterbi algorithm as an aid in text recognition. IEEE Transaction on Information Theory, pages 222–226. [24] Hifni, Yasser. 2012. Smoothing techniques for arabic diacritics restoration. In Proceedings of the Twelfth Conference on Language Engineering (ESOLEC’12).
  • 30. AN UNSUPERVISED APPROACH TO DEVELOP STEMMER Mohd. Shahid Husain Department of Information Technology, Integral University, Lucknow ABSTRACT This paper presents an unsupervised approach to developing a stemmer (for the Urdu and Marathi languages). In the last few years especially, a wide range of information in Indian regional languages has been made available on the web in the form of e-data. But access to these data repositories is very low because efficient search engines and retrieval systems supporting these languages are very limited. Hence automatic information processing and retrieval has become an urgent requirement. To train the system, training datasets taken from CRULP [22] and the Marathi corpus [23] are used. For generating suffix rules, two different approaches, namely frequency based stripping and length based stripping, have been proposed. The evaluation was made on 1200 words extracted from the Emille corpus. The experimental results show that for Urdu the frequency based suffix generation approach gives a maximum accuracy of 85.36%, whereas the length based suffix stripping algorithm gives a maximum accuracy of 79.76%. For Marathi the system gives 63.5% accuracy with frequency based stripping and achieves a maximum accuracy of 82.5% with the length based suffix stripping algorithm. KEYWORDS Stemming, Morphology, Urdu stemmer, Marathi stemmer, Information retrieval Full Text: https://airccse.org/journal/ijnlc/papers/1212ijnlc02.pdf Volume URL: https://airccse.org/journal/ijnlc/vol1.html
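The abstract names two ways of choosing a suffix rule, frequency based and length based, without giving the details. One plausible reading is sketched below: candidate suffixes are counted over an unlabeled word list, and a word is stemmed by removing either its most frequent or its longest matching suffix. The function names, the thresholds `min_stem` and `min_count`, and the English toy words (used instead of Urdu or Marathi data) are assumptions made for illustration.

```python
from collections import Counter


def learn_suffixes(words, min_stem=2, min_count=2):
    """Count every candidate suffix that leaves a stem of at least min_stem letters."""
    counts = Counter()
    for word in set(words):
        for i in range(min_stem, len(word)):
            counts[word[i:]] += 1
    # Keep only suffixes seen on at least min_count distinct words.
    return {s: c for s, c in counts.items() if c >= min_count}


def stem(word, suffixes, by="frequency", min_stem=2):
    """Strip one suffix, chosen by corpus frequency or by length."""
    matches = [s for s in suffixes
               if word.endswith(s) and len(word) - len(s) >= min_stem]
    if not matches:
        return word
    if by == "frequency":
        best = max(matches, key=lambda s: (suffixes[s], len(s)))
    else:  # length based: prefer the longest matching suffix
        best = max(matches, key=lambda s: (len(s), suffixes[s]))
    return word[:-len(best)]


# English toy corpus standing in for the training data.
corpus = ["walking", "talking", "jumping", "reading",
          "walked", "talked", "jumped",
          "walks", "talks", "jumps", "reads"]
suffixes = learn_suffixes(corpus)
```

On this toy corpus `stem("walking", suffixes)` picks the frequent suffix "ing", while `stem("walking", suffixes, by="length")` over-strips to "wa". The abstract reports that which strategy works better differs between Urdu and Marathi.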
  • 31. REFERENCES [1] Rizvi, J et. al. “Modeling case marking system of Urdu-Hindi languages by using semantic information”. Proceedings of the IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE '05). 2005. [2] Butt, M. King, T. “Non-Nominative Subjects in Urdu: A Computational Analysis”. Proceedings of the International Symposium on Non-nominative Subjects, Tokyo, December, pp. 525-548, 2001. [3] Savoy, J. “Stemming of French words based on grammatical categories”. Journal of the American Society for Information Science, 44(1), 1-9, 1993. [4] Lovins Julie Beth: Development of a stemming algorithm. Mechanical Translation and Computational Linguistics 11:22–31. (1968) [5] Mokhtaripour, A., Jahanpour, S. “Introduction to a New Farsi Stemmer”. Proceedings of CIKM Arlington VA, USA, 826-827, 2006. [6] R. Wicentowski. "Multilingual Noise-Robust Supervised Morphological Analysis using the Word Frame Model." In Proceedings of Seventh Meeting of the ACL Special Interest Group on Computational Phonology (SIGPHON), pp. 70-77, 2004. [7] Rizvi, Hussain M. “Analysis, Design and Implementation of Urdu Morphological Analyzer”. SCONEST, 1-7, 2005. [8] Krovetz, R. “View Morphology as an Inference Process”. In the Proceedings of 5th International Conference on Research and Development in Information Retrieval, 1993. [9] Porter, M. “An Algorithm for Suffix Stripping”. Program, 14(3): 130-137, 1980. [10] Thabet, N. “Stemming the Qur’an”. In the Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages, 2004. [11] Paik, Pauri. “A Simple Stemmer for Inflectional Languages”. FIRE 2008. [12] Sharifloo, A.A., Shamsfard M. “A Bottom up Approach to Persian Stemming”. IJCNLP, 2008 [13] Croft and Xu. “Corpus-Based Stemming Using Co occurrence of Word Variants”. ACM Transactions on Information Systems (61-81), 1998. [14] Kumar, A. and Siddiqui, T. “An Unsupervised Hindi Stemmer with Heuristics Improvements”. 
In Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data, 2008. [15] Kumar, M. S. and Murthy, K. N. “Corpus Based Statistical Approach for Stemming Telugu”. Creation of Lexical Resources for Indian Language Computing and Processing (LRIL), C-DAC, Mumbai, India, 2007.
  • 32. [16] Qurat-ul-Ain Akram, Asma Naseer, Sarmad Hussain. “Assas-Band, an Affix-Exception-List Based Urdu Stemmer”. Proceedings of ACL-IJCNLP 2009. [17] http://en.wikipedia.org/wiki/Urdu [18] http://www.bbc.co.uk/languages/other/guide/urdu/steps.shtml [19] http://www.andaman.org/BOOK/reprints/weber/rep-weber.htm [20] Natural Language processing and Information Retrieval by Tanveer Siddiqui, U S Tiwary. [21] Information retrieval: data structure and algorithms by William B. Frakes, Ricardo Baeza- Yates. [22] http://www.crulp.org/software/ling_resources.htm [23] Marathi Corpus, http://www.cfilt.iitb.ac.in/marathi_Corpus/ , IIT Powai, Mumbai
  • 33. WORD SENSE DISAMBIGUATION USING WSD SPECIFIC WORDNET OF POLYSEMY WORDS Udaya Raj Dhungana1, Subarna Shakya2 , Kabita Baral3 and Bharat Sharma4 1, 2, 4Department of Electronics and Computer Engineering, Central Campus, IOE, Tribhuvan University, Lalitpur, Nepal 3Department of Computer Science, GBS, Lamachaur, Kaski, Nepal ABSTRACT This paper presents a new model of WordNet that is used to disambiguate the correct sense of a polysemy word based on clue words. The related words for each sense of a polysemy word, as well as for a single sense word, are referred to as the clue words. The conventional WordNet organizes nouns, verbs, adjectives and adverbs into sets of synonyms called synsets, each expressing a different concept. In contrast to the structure of WordNet, we developed a new model of WordNet that organizes the different senses of polysemy words as well as the single sense words based on the clue words. These clue words for each sense of a polysemy word, as well as for a single sense word, are used to disambiguate the correct meaning of the polysemy word in the given context using knowledge based Word Sense Disambiguation (WSD) algorithms. A clue word can be a noun, verb, adjective or adverb. KEYWORDS Word Sense Disambiguation, WordNet, Polysemy Words, Synset, Hypernymy, Context word, Clue Words Full Text: https://airccse.org/journal/ijnlc/papers/3414ijnlc05.pdf Volume URL: https://airccse.org/journal/ijnlc/vol3.html
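The idea of picking a sense from its clue words can be illustrated with a simple overlap count in the spirit of Lesk-style knowledge based WSD. The sketch below is a simplification, not the paper's model: the `clue_index` structure, the sense labels and the clue words are hypothetical examples.

```python
def disambiguate(word, context, clue_index):
    """Return the sense whose clue words overlap most with the context.

    clue_index maps a word to {sense label: set of clue words}; this layout
    is an assumed stand-in for the WSD specific WordNet described above.
    Returns None when the word is unknown or no clue word appears in context.
    """
    senses = clue_index.get(word, {})
    if not senses:
        return None
    context_words = {w.lower() for w in context}
    scores = {sense: len(context_words & clues) for sense, clues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical clue words for two senses of "bank".
clue_index = {
    "bank": {
        "financial institution": {"money", "loan", "deposit", "account"},
        "river side": {"river", "water", "shore", "fishing"},
    }
}

sense = disambiguate("bank", "He sat on the bank of the river fishing".split(), clue_index)
```

Here the context shares "river" and "fishing" with the second sense, so that sense is chosen; a context mentioning "loan" or "deposit" would select the first.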
  • 34. REFERENCES [1] N. Ide and J. Véronis, “Word sense disambiguation: The state of the art,” Computational Linguistics, pp. 1–40, 1998. [2] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller, “Introduction to wordnet: An on-line lexical database,” International Journal of Lexicography, 1998. [3] U. R. Dhungana and S. Shakya, “Word sense disambiguation in nepali language,” in The Fourth International Conference on Digital Information and Communication Technology and Its Application (DICTAP2014), Bangkok, Thailand, 2014, pp. 46–50. [4] M. E. Lesk, “Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from a ice cream cone,” in SIGDOC Conference, Toronto, Ontario, Canada, 1986. [5] S. Banerjee and T. Pedersen, “An adapted lesk algorithm for word sense disambiguation using wordnet,” in Third International Conference on Intelligent Text Processing and Computational Linguistics, Gelbukh, 2002. [6] M. Sinha, M. K. Reddy, P. Bhattacharyya, P. Pandey, and L. Kashyap, “Hindi word sense disambiguation,” Master’s thesis, Indian Institute of Technology Bombay, Mumbai, India, 2004. [7] N. Shrestha, A. V. H. Patrick, and S. K. Bista, “Resources for nepali word sense disambiguation,” in IEEE International conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE’08), Beijing, China, 2008. [8] P. Bhattacharyya, P. Pande, and L. Lupu, “Hindi wordnet,” Indian Institute of Technology Bombay, Mumbai, India, Tech. Rep., 2008. [9] N. Shrestha, A. V. H. Patrick, and S. K. Bista, “Nepali word sense disambiguation using lesk algorithm,” Master’s thesis, Kathmandu University, Dhulikhel, Kavre, Nepal, 2004.