International Journal of Informatics and Communication Technology (IJ-ICT)
Vol.8, No.1, April 2019, pp. 25~28
ISSN: 2252-8776, DOI: 10.11591/ijict.v8i1.pp25-28
Journal homepage: http://iaescore.com/journals/index.php/IJICT
A survey of Arabic text classification models
Ahed M. F. Al Sbou
Department of Computer Science, Faculty of Information Technology, Al Hussein Bin Talal University, Jordan
Article Info
Article history:
Received Sep 30, 2018
Revised Dec 16, 2018
Accepted Jan 11, 2019
ABSTRACT
A huge amount of Arabic text is available online and needs to be organized. As a
result, many natural language processing (NLP) applications are concerned with text
organization. One of them is text classification (TC). TC makes unorganized text
easier to deal with by classifying it into suitable classes or labels. This paper
is a survey of Arabic text classification. It also compares different methods for
classifying Arabic texts; Arabic text is considered complex because of its
vocabulary. Arabic is one of the richest languages in the world and has many
linguistic bases, yet research on Arabic language processing is scarce compared to
English. These problems pose challenges for the classification and organization of
Arabic text. TC helps users reach documents or information that have already been
assigned to one or more classes or categories. In addition, classifying documents
helps a search engine reduce the number of documents to consider, which makes
searching and matching against queries easier.
Keywords:
Arabic language processing
Arabic text categorization
Arabic text mining
Classification algorithms
Clustering algorithms
Natural language processing
Text classification
Copyright © 2019 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Ahed M. F. Al Sbou,
Department of Computer Science, Faculty of Information Technology,
Al Hussein Bin Talal University,
Rawdat Al-Amir Rashid, Ma'an, Jordan.
Email: ahed_alsbou@ahu.edu.jo
1. INTRODUCTION
Arabic is one of the most widely spoken languages, with more than 420 million speakers
worldwide. Unlike English, Arabic has no upper-case letters. It also differs from other natural languages
in its use of diacritics, small marks that represent short vowels and related sounds, such as fatha, kasra,
damma, sukun, shadda, and tanween. The Arabic orthographic system relies on diacritics: different
diacritics on the same letters produce different words with different meanings. The language also has specific
letters known as the Arabic vowels (waw, yaa, alif) that require a special system of morphology and grammar.
Arabic is further distinguished by its very large vocabulary and number of concepts [1].
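To make the role of diacritics concrete, the following minimal Python sketch removes the diacritic marks from two vocalized words and shows that they share one consonant skeleton. The example words (kataba, "he wrote", and kutub, "books") and the helper name `strip_diacritics` are illustrative assumptions, not taken from the cited studies; the sketch simply treats every Unicode nonspacing mark as a diacritic.

```python
import unicodedata

def strip_diacritics(word):
    """Drop Unicode nonspacing marks (category Mn), which include the Arabic
    diacritics fatha, kasra, damma, sukun, shadda, and tanween."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")

# Hypothetical example words, written with escapes so the marks stay visible.
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf+fatha, ta+fatha, ba+fatha
kutub = "\u0643\u064F\u062A\u064F\u0628"         # kaf+damma, ta+damma, ba

# The vocalized forms differ, but the undiacritized skeletons are identical.
print(kataba == kutub)
print(strip_diacritics(kataba) == strip_diacritics(kutub))
```

Because most everyday Arabic text is written without diacritics, a classifier usually sees only the shared skeleton and has to rely on context to tell such words apart.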
Although Arabic texts are viewed as among the most difficult to process, there are few studies on the
processing of Arabic text, for reasons related to the linguistic characteristics of the language. Given
these strict linguistic characteristics and the limited number of studies on processing Arabic [2], this study
deals with NLP applications that have recently emerged to handle languages such as
Arabic, English, and Urdu. One of these applications is text classification (TC), which aims to produce a
structured set of documents from unstructured ones. This structured set includes a description of the content
of the documents. TC is the process of classifying textual documents into groups based on the similarity of
their subjects or other features [3].
This paper is organized as follows. The present section offers a brief introduction to the topic and
the design of the paper. The second section briefly describes related work in the area of Arabic text
classification. The third section presents the challenges of Arabic text. The fourth section briefly
explains the common techniques and algorithms used in Arabic text classification. The last section
provides a summary of the paper and suggests future work.
2. RESEARCH METHOD
Many classification algorithms have been applied to Arabic texts. El-Kourdi et al. applied the
Naïve Bayes (NB) algorithm to classify 1500 Arabic text documents into five major categories; the
reported accuracy was around 68.78% [4]. Sawaf et al. conducted another study using statistical
methods on data collected from the Arabic NEWSWIRE corpus. The reported result was 62.7% [5].
El-Halees et al. also classified 300 Arabic documents by applying different algorithms such as the vector
space model (VSM), K-Nearest Neighbor (KNN), and Naïve Bayes (NB). The classification accuracy was
74.41% [6]. The same accuracy was obtained in Al-Zoghby's study, which used the
CHARM algorithm to classify Arabic text documents from 5524 records [7].
Mesleh classified Arabic documents using Support Vector Machines (SVMs) with Chi-square feature
selection. He conducted an experimental study on a corpus of 1445 online Arabic documents drawn from
Al-Nahar, Al-Hayat, Al-Jazeera, Al-Ahram, and Al-Dostor, classified into 9
categories. The resulting F-measure was 88.11% [8].
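Chi-square feature selection scores how strongly a term is associated with a category. As an illustration only, the sketch below uses the 2x2 contingency-table form of the statistic that is common in the text categorization literature; the function name and the counts are hypothetical and are not taken from the study in [8].

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category from a 2x2 table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category without the term
    d: documents outside the category without the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # an empty row or column: no association can be measured
    return n * (a * d - c * b) ** 2 / denominator

# Hypothetical counts: the first term appears only inside the category,
# the second is spread evenly and therefore carries no information.
strong = chi_square(10, 0, 0, 10)
uninformative = chi_square(5, 5, 5, 5)
print(strong, uninformative)
```

Terms are then ranked by score and only the highest-scoring ones are kept as features for the classifier.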
Harrag et al. developed an Arabic text classifier using a hybrid approach in which a decision tree
algorithm selects the features. The data was collected from several Arabic scientific encyclopedias covering
many fields. The accuracy was 91% and 93% for the literary and scientific corpora, respectively [9].
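The studies above report their results as accuracy or F-measure. A minimal sketch of how these two figures are usually computed from true and predicted labels is given here; the label names and the tiny label lists are invented for illustration and do not reproduce any of the reported numbers.

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold labels and classifier output for four documents.
y_true = ["sport", "sport", "economy", "economy"]
y_pred = ["sport", "economy", "economy", "economy"]
print(accuracy(y_true, y_pred))
print(f_measure(y_true, y_pred, positive="sport"))
```

Accuracy counts every document equally, whereas the F-measure is computed per category, so the two figures are not directly comparable across studies.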
3. ARABIC TEXT CHALLENGES
Natural languages such as Arabic have been processed through different methods. Arabic
has several textual features and requires specific handling of its morphology,
concepts, and ontology. It is one of the most complex natural languages. Its alphabet comprises 28
characters [1]. The characters are written in different forms depending on their position in the
word: a character may appear at the beginning, in the middle, or at the end of a word [10]. TC seeks to
collect similar documents into specific categories, assign categories to Arabic texts, and work with the
related categories produced by other text classifications [11].
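The position-dependent letter forms can be seen in Unicode itself. The sketch below is an illustration rather than part of any surveyed system: it takes the four presentation forms of the letter beh from the Unicode "Arabic Presentation Forms-B" block and folds them back to the single base letter with NFKC normalization, a step a preprocessing pipeline might apply when its input contains such code points.

```python
import unicodedata

# Isolated, final, initial, and medial presentation forms of beh (U+0628).
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for form in BEH_FORMS:
    base = unicodedata.normalize("NFKC", form)
    # Print the official Unicode names of the shaped form and its base letter.
    print(unicodedata.name(form), "->", unicodedata.name(base))
```

Text is normally stored with base letters and shaped only when rendered, so this folding matters mainly for text extracted from PDFs or legacy sources.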
Retrieving information from the large amount of Arabic text accessible on the web is
very challenging. Retrieving all the documents relevant to a query is also very important to users.
TC therefore makes querying easier by giving access to the different categories, from which the
needed information can be obtained [12]. Further, Arabic texts raise some problematic issues due to the
nature of the language. To the best of my knowledge, studies on the Arabic language are very limited:
there is a lack of Arabic corpora, language tools, and comprehensive studies on preprocessing Arabic texts.
All these problems make it challenging to categorize specific Arabic textual data into a
closed category.
4. TEXT CLASSIFICATION
TC includes different phases. The first phase preprocesses the text by removing punctuation and
stop words and by normalizing it. The second and third phases are the classification itself and the
evaluation of the classified text [7, 11, 13]. TC is an effective mechanism for managing and organizing data.
It helps a machine access data categories and text labels using a predefined process [14, 15]. This mechanism
can be used to classify a group of documents into kinds of documents using features such as content, author,
or publisher [16]. The core goal of TC is to convert unstructured text into an organized, structured form that
can be used in different NLP applications such as summarization or retrieval [10, 17].
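The preprocessing phase described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the three-word stop list, the choice to unify alif variants, and the function name `preprocess` are hypothetical and are not prescribed by the cited works.

```python
import re

DIACRITICS = re.compile("[\u064B-\u0652]")        # tanween, fatha ... sukun
ALIF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alif with madda or hamza
PUNCTUATION = re.compile(r"[^\w\s]")              # anything not a letter, digit, or space

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {
    "\u0641\u064A",        # fi  ("in")
    "\u0645\u0646",        # min ("from")
    "\u0639\u0644\u0649",  # ala ("on")
}

def preprocess(text):
    """Normalize the text, remove punctuation, and drop stop words."""
    # Diacritics go first: they are not word characters, so the
    # punctuation step would otherwise split words at every mark.
    text = DIACRITICS.sub("", text)
    text = ALIF_VARIANTS.sub("\u0627", text)  # unify to bare alif
    text = PUNCTUATION.sub(" ", text)
    return [token for token in text.split() if token not in STOP_WORDS]

# "fi al-bayt, Ahmad." with diacritics, an Arabic comma, and a full stop.
sample = "\u0641\u064A \u0627\u0644\u0628\u064E\u064A\u0652\u062A\u060C \u0623\u062D\u0645\u062F."
print(preprocess(sample))
```

The resulting token list is what the classification phase consumes; real systems typically add stemming and a much larger stop list.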
Two approaches are used in TC: machine learning, in which the text is classified using
a set of training documents, and rule-based TC, which relies on the knowledge of experts or engineers to
classify the text [18]. Furthermore, TC can be used in several computer science applications, such as
spam or e-mail filtering, or as a tool for finding information of interest in particular documents [4, 9].
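To contrast the two approaches, a rule-based classifier can be as simple as a hand-written keyword table. The sketch below is purely illustrative: the categories, the transliterated keywords, and the function name are hypothetical stand-ins for rules an expert would write.

```python
# Hypothetical expert rules: category -> transliterated keywords.
RULES = {
    "sport": {"kura", "fariq", "mubarat"},
    "economy": {"bank", "suq", "asham"},
}

def rule_based_classify(tokens, rules=RULES, default="unknown"):
    """Return the category whose keyword set matches the most tokens."""
    scores = {label: sum(token in keywords for token in tokens)
              for label, keywords in rules.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default label when no rule fires at all.
    return best if scores[best] > 0 else default

print(rule_based_classify(["fariq", "kura", "jadid"]))
print(rule_based_classify(["kitab"]))
```

No training documents are needed here, but every new category or keyword has to be added by hand, which is the trade-off against the machine learning approach.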
4.1. Common models and algorithms of Arabic text classification
Different algorithms are used to classify Arabic documents. In this section, we focus on the
following models: the Naïve Bayesian algorithm (NB), the K-Nearest Neighbor algorithm (KNN), the Vector
Space Model (VSM), and the Artificial Neural Network (ANN).
4.1.1 Naïve Bayesian algorithm
NB is a machine learning technique used to classify text into predefined categories based on
shared features. NB has been applied to improve the processing and handling of texts or information
from different sources. The algorithm is a probabilistic method: the NB classifier
assumes that, given the class, the presence or absence of one feature is unrelated to the presence or
absence of any other feature. NB is commonly used to classify documents because it performs well in
classification. It computes the probability that a document belongs to each class and then assigns the
document to the class with the highest probability [19].
Like many other models, NB has several advantages. It is generally considered one of the most powerful
models used in this field, and it is easy to understand and very simple to implement. As for the
disadvantages, NB depends on probabilities, which are usually estimated from frequencies, so it needs
each feature to occur with each class in the training data.
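The probability computation described above can be sketched as a small multinomial Naïve Bayes classifier. This is an illustrative sketch, not the implementation used in the cited studies: the transliterated toy documents are invented, and add-one (Laplace) smoothing is an assumption introduced here so that a term never seen with a class does not force the probability to zero.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs is a list of (tokens, label) pairs; returns the counted model."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log-probability for the tokens."""
    class_counts, term_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n_docs)  # log prior
        total = sum(term_counts[label].values())
        for token in tokens:
            if token in vocab:  # terms never seen in training are ignored
                # Add-one smoothing (an assumption of this sketch).
                score += math.log((term_counts[label][token] + 1)
                                  / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical transliterated training documents.
training = [
    (["kura", "malab", "fariq"], "sport"),
    (["fariq", "mubarat"], "sport"),
    (["bank", "suq", "mal"], "economy"),
    (["suq", "asham"], "economy"),
]
model = train_nb(training)
print(predict_nb(model, ["fariq", "kura"]))
```

Working with log-probabilities avoids numeric underflow when documents contain many terms.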
4.1.2 K-Nearest neighbor algorithms
K-Nearest Neighbor (KNN) is another machine learning algorithm used for TC. It is a non-parametric
technique that classifies documents or objects according to the closest training examples. It takes a
parameter k, which is always a positive integer, and assigns an object to the class of its nearest
neighbors: the object receives the class that gets the most votes among its k neighbors [16].
KNN can be applied when the training data is large or very large. Yet KNN has some disadvantages,
including the need to choose the value of the parameter k, where k is the number of nearest neighbors.
Also, this model is computationally expensive in comparison with other algorithms [20].
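The voting procedure described above can be sketched as follows. The choice of cosine similarity over raw term counts, the default k = 3, and the transliterated toy documents are assumptions made for illustration; the cited studies may use other distance measures and weights.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(training, tokens, k=3):
    """Label a document by majority vote among its k most similar neighbors."""
    query = Counter(tokens)
    # Every prediction compares the query with all training documents,
    # which is why KNN becomes expensive on large collections.
    ranked = sorted(training,
                    key=lambda doc: cosine(Counter(doc[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical transliterated training documents.
training = [
    (["kura", "malab", "fariq"], "sport"),
    (["fariq", "mubarat"], "sport"),
    (["kura", "hadaf"], "sport"),
    (["bank", "suq", "mal"], "economy"),
    (["suq", "asham"], "economy"),
    (["bank", "qard"], "economy"),
]
print(knn_predict(training, ["fariq", "kura"]))
```

Choosing an odd k is a common way to reduce tied votes in two-class problems.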
4.1.3 Vector space model (VSM)
VSM is one of the models that have been applied to TC. It represents
objects and documents as vectors in a finite-dimensional space. VSM is also used to analyze data, texts, and
documents in order to compute the similarity among them [21].
VSM has several helpful aspects as an important model in computer science. First, the
method depends on linear algebra and does not involve any complex algebraic equations [8]. Another
advantage is the efficiency of the weights assigned to concepts or terms. The model is also easy to use
in comparison with other methods, since it lets the machine compute the similarity among documents [22].
However, VSM has some limitations that discourage some researchers from using it. Handling
synonyms in Arabic is a major challenge, because Arabic has many synonyms for
each word or concept. Another limitation is that VSM assumes that terms are statistically independent, while
most Arabic terms are strongly related to other terms.
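A minimal sketch of the vector space model follows: each document becomes a vector of term weights, and similarity is the cosine of the angle between two vectors. The TF-IDF weighting scheme with a plain logarithmic IDF and the transliterated toy documents are assumptions chosen for illustration, not details given in the cited works.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (dicts)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({term: term_freq[term] * math.log(n_docs / doc_freq[term])
                        for term in term_freq})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical transliterated documents.
docs = [
    ["suq", "bank", "mal"],
    ["suq", "asham", "mal"],
    ["fariq", "kura"],
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]))
print(cosine(vectors[0], vectors[2]))
```

Documents that share no surface term get a similarity of exactly zero, even if their words were synonyms; this is the term-independence limitation noted above.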
4.1.4 Artificial Neural Network
ANN is a machine learning model that processes information in a way inspired by the human brain. It has
been applied in different areas of computing such as classification and pattern recognition. It consists of a
set of inputs with adaptive weights, a non-linear function, and outputs [23].
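The components listed above (inputs, adaptive weights, a non-linear function, and an output) can be illustrated with a single artificial neuron. The sigmoid activation, the learning rate, and the two-feature input are assumptions made for this sketch; practical networks for TC combine many such units in layers.

```python
import math

def sigmoid(z):
    """Non-linear activation that squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weights, bias):
    """Weighted sum of the inputs followed by the non-linear function."""
    z = sum(xi * wi for xi, wi in zip(x, weights)) + bias
    return sigmoid(z)

def train_step(x, target, weights, bias, lr=0.5):
    """One gradient step on the cross-entropy loss for a single example."""
    error = forward(x, weights, bias) - target
    new_weights = [wi - lr * error * xi for wi, xi in zip(weights, x)]
    return new_weights, bias - lr * error

# Hypothetical input: two term features, with target class 1.
x, target = [1.0, 0.0], 1.0
weights, bias = [0.0, 0.0], 0.0
before = forward(x, weights, bias)
for _ in range(20):
    weights, bias = train_step(x, target, weights, bias)
after = forward(x, weights, bias)
print(before, after)
```

The weights are called adaptive because each training step moves them so that the output gets closer to the target; the weight of the inactive second feature stays unchanged.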
Several advantages and disadvantages of artificial neural networks are worth mentioning. ANN is
one of the easier models to use, and it is usually appropriate for complex problems or large texts.
Its disadvantages are that it is less recommended for simpler problems or small
texts, and that it needs the training data to be loaded.
5. CONCLUSION AND FUTURE WORK
This paper is a survey of the importance of TC and of the current methods used in the NLP field.
We have discussed the traditional TC models that are used to classify Arabic texts, corpora,
and documents into different categories.
Future work needs more effort to build and develop a new standard model for Arabic TC. This
model must be more efficient than the current traditional methods. Another important aspect that needs
improvement is the handling of language dialects: because Arabic has different dialects, the model
must be compatible with them. It should also be applicable to any Arabic text.
ACKNOWLEDGEMENTS
We would like to thank Al Hussein bin Talal University (AHU) for providing us with a good scientific
environment in which to produce this modest work.
REFERENCES
[1] Duwairi, R. M. Arabic text categorization. Int. Arab J. Inf. Technol, 2007. 4(2), 125-132.
[2] Fauzi, M. A., Arifin, A. Z., & Yuniarti, A. Arabic Book Retrieval using Class and Book Index Based Term
Weighting. International Journal of Electrical and Computer Engineering (IJECE), 2017. 7(6), 3705-3711.
[3] El-Halees, A. Mining Arabic association rules for text classification. Paper presented at the Proceedings of the first
international conference on Mathematical Sciences. Al-Azhar University of Gaza, Palestine. 2006.
[4] Caballero, Y., Bello, R., Alvarez, D., & Garcia, M. M. Two new feature selection algorithms with Rough Sets
Theory. Paper presented at the IFIP International Conference on Artificial Intelligence in Theory and Practice. 2006.
[5] El Kourdi, M., Bensaid, A., & Rachidi, T.-e. Automatic Arabic document categorization based on the Naïve Bayes
algorithm. Paper presented at the Proceedings of the Workshop on Computational Approaches to Arabic
Script-based Languages. 2004.
[6] Al-Zoghby, A., Eldin, A. S., Ismail, N. A., & Hamza, T. Mining Arabic text using soft-matching association rules.
Paper presented at the Computer Engineering & Systems, 2007. ICCES International Conference on. 2007.
[7] Al-Harbi, S., Almuhareb, A., Al-Thubaity, A., Khorsheed, M., & Al-Rajeh, A. Automatic Arabic text classification.
2008.
[8] Mesleh, A. Support vector machines based Arabic language text classification system: feature selection comparative
study. Advances in Computer and Information Sciences and Engineering. 2008. (pp. 11-16): Springer.
[9] Dharmadhikari, S. C., Ingle, M., & Kulkarni, P. Empirical studies on machine learning based text classification
algorithms. Advanced Computing, 2011. 2(6), 161.
[10] Khan, A., Baharudin, B., Lee, L. H., & Khan, K. A review of machine learning algorithms for text-documents
classification. Journal of advances in information technology, 2010. 1(1), 4-20.
[11] Ababneh, J., Almomani, O., Hadi, W., El-Omari, N. K. T., & Al-Ibrahim, A. Vector space models to classify Arabic
text. International Journal of Computer Trends and Technology (IJCTT), 2014. 7(4), 219-223.
[12] Mesleh, A. Chi square feature extraction based svms Arabic language text categorization system. Journal of
Computer Science, 2007. 3(6), 430-435.
[13] Khorsheed, M. S., & Al-Thubaity, A. O. Comparative evaluation of text classification techniques using a large
diverse Arabic dataset. Language resources and evaluation, 2013. 47(2), 513-538.
[14] Sebastiani, F. Machine learning in automated text categorization. ACM computing surveys (CSUR), 2002. 34(1), 1-47.
[15] Khreisat, L. A machine learning approach for Arabic text classification using N-gram frequency statistics. Journal
of Informetrics, 2009. 3(1), 72-77.
[16] Alaa, E. A comparative study on Arabic text classification. Egypt. Comput. Sci. J, 2008. 2.
[17] Mesleh, A. Support Vector Machine Text Classifier for Arabic Articles: Ant Colony Optimization-Based Feature
Subset Selection. The Arab Academy for Banking and Financial Sciences. 2008.
[18] Sebastiani, F. Text categorization. Encyclopedia of Database Technologies and Applications, 2005. (pp. 683-687):
IGI Global.
[19] Abu-Errub, A. Arabic Text Classification Algorithm using TFIDF and Chi Square Measurements. International
Journal of Computer Applications, 2014. 93(6).
[20] Al-Shalabi, R., Kanaan, G., & Gharaibeh, M. Arabic text categorization using KNN algorithm. Paper presented at
the Proc. of Int. multi conf. on computer science and information technology CSIT06. 2006.
[21] Jackson, P., & Moulinier, I. Natural language processing for online applications: Text retrieval, extraction and
categorization. 2007. (Vol. 5): John Benjamins Publishing.
[22] Sawaf, H., Zaplo, J., & Ney, H. Statistical classification methods for Arabic news articles. Natural Language
Processing in ACL 2001, Toulouse, France. 2001.
[23] Harrag, F., El-Qawasmeh, E., & Pichappan, P. Improving Arabic text categorization using decision trees. Paper
presented at the Networked Digital Technologies, 2009. NDT'09. First International Conference on. 2009.
BIOGRAPHY OF AUTHOR
Ahed Al-Sbou is a lecturer in the Information Technology School of Computer Science at
Al-Hussein Bin Talal University, where he has been a faculty member since 2014. He holds a
master's degree in computer science.
Ahed completed his master's degree at Al-Balqa Applied University, Salt, Jordan in 2012 and
his B.S. degree in computer science at Al-Hussein Bin Talal University, Ma'an, Jordan in 2006.
His research interests in computer science lie in the area of programming languages, ranging
from theory to design to implementation, as well as databases, data mining, natural language
processing (NLP), and information systems.
Ahed worked as a computer lab supervisor (2006-2014) at Al-Hussein Bin Talal University.

 
DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)
DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)
DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)
 
Arabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodArabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation method
 
Car-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of EvacuationCar-Following Parameters by Means of Cellular Automata in the Case of Evacuation
Car-Following Parameters by Means of Cellular Automata in the Case of Evacuation
 
Diacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval SystemDiacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval System
 
Developemnt and evaluation of a web based question answering system for arabi...
Developemnt and evaluation of a web based question answering system for arabi...Developemnt and evaluation of a web based question answering system for arabi...
Developemnt and evaluation of a web based question answering system for arabi...
 
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
Different valuable tools for Arabic sentiment analysis: a comparative evaluat...
 
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICONFURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
 
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICONFURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
 
Building of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert SystemBuilding of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert System
 
STANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORM
STANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORMSTANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORM
STANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORM
 
A GRAMMATICALLY AND STRUCTURALLY BASED PART OF SPEECH (POS) TAGGER FOR ARABIC...
A GRAMMATICALLY AND STRUCTURALLY BASED PART OF SPEECH (POS) TAGGER FOR ARABIC...A GRAMMATICALLY AND STRUCTURALLY BASED PART OF SPEECH (POS) TAGGER FOR ARABIC...
A GRAMMATICALLY AND STRUCTURALLY BASED PART OF SPEECH (POS) TAGGER FOR ARABIC...
 
A GRAMMATICALLY AND STRUCTURALLY BASED PART OF SPEECH (POS) TAGGER FOR ARABIC...
A GRAMMATICALLY AND STRUCTURALLY BASED PART OF SPEECH (POS) TAGGER FOR ARABIC...A GRAMMATICALLY AND STRUCTURALLY BASED PART OF SPEECH (POS) TAGGER FOR ARABIC...
A GRAMMATICALLY AND STRUCTURALLY BASED PART OF SPEECH (POS) TAGGER FOR ARABIC...
 
almisbarIEEE-1
almisbarIEEE-1almisbarIEEE-1
almisbarIEEE-1
 
A systematic review of text classification research based on deep learning mo...
A systematic review of text classification research based on deep learning mo...A systematic review of text classification research based on deep learning mo...
A systematic review of text classification research based on deep learning mo...
 
02 15034 neural network
02 15034 neural network02 15034 neural network
02 15034 neural network
 

More from IAESIJEECS

08 20314 electronic doorbell...
08 20314 electronic doorbell...08 20314 electronic doorbell...
08 20314 electronic doorbell...IAESIJEECS
 
07 20278 augmented reality...
07 20278 augmented reality...07 20278 augmented reality...
07 20278 augmented reality...IAESIJEECS
 
06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...IAESIJEECS
 
05 20275 computational solution...
05 20275 computational solution...05 20275 computational solution...
05 20275 computational solution...IAESIJEECS
 
04 20268 power loss reduction ...
04 20268 power loss reduction ...04 20268 power loss reduction ...
04 20268 power loss reduction ...IAESIJEECS
 
03 20237 arduino based gas-
03 20237 arduino based gas-03 20237 arduino based gas-
03 20237 arduino based gas-IAESIJEECS
 
02 20274 improved ichi square...
02 20274 improved ichi square...02 20274 improved ichi square...
02 20274 improved ichi square...IAESIJEECS
 
01 20264 diminution of real power...
01 20264 diminution of real power...01 20264 diminution of real power...
01 20264 diminution of real power...IAESIJEECS
 
08 20272 academic insight on application
08 20272 academic insight on application08 20272 academic insight on application
08 20272 academic insight on applicationIAESIJEECS
 
07 20252 cloud computing survey
07 20252 cloud computing survey07 20252 cloud computing survey
07 20252 cloud computing surveyIAESIJEECS
 
06 20273 37746-1-ed
06 20273 37746-1-ed06 20273 37746-1-ed
06 20273 37746-1-edIAESIJEECS
 
05 20261 real power loss reduction
05 20261 real power loss reduction05 20261 real power loss reduction
05 20261 real power loss reductionIAESIJEECS
 
04 20259 real power loss
04 20259 real power loss04 20259 real power loss
04 20259 real power lossIAESIJEECS
 
03 20270 true power loss reduction
03 20270 true power loss reduction03 20270 true power loss reduction
03 20270 true power loss reductionIAESIJEECS
 
01 8445 speech enhancement
01 8445 speech enhancement01 8445 speech enhancement
01 8445 speech enhancementIAESIJEECS
 
08 17079 ijict
08 17079 ijict08 17079 ijict
08 17079 ijictIAESIJEECS
 
07 20269 ijict
07 20269 ijict07 20269 ijict
07 20269 ijictIAESIJEECS
 
06 10154 ijict
06 10154 ijict06 10154 ijict
06 10154 ijictIAESIJEECS
 
05 20255 ijict
05 20255 ijict05 20255 ijict
05 20255 ijictIAESIJEECS
 
04 18696 ijict
04 18696 ijict04 18696 ijict
04 18696 ijictIAESIJEECS
 

More from IAESIJEECS (20)

08 20314 electronic doorbell...
08 20314 electronic doorbell...08 20314 electronic doorbell...
08 20314 electronic doorbell...
 
07 20278 augmented reality...
07 20278 augmented reality...07 20278 augmented reality...
07 20278 augmented reality...
 
06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...06 17443 an neuro fuzzy...
06 17443 an neuro fuzzy...
 
05 20275 computational solution...
05 20275 computational solution...05 20275 computational solution...
05 20275 computational solution...
 
04 20268 power loss reduction ...
04 20268 power loss reduction ...04 20268 power loss reduction ...
04 20268 power loss reduction ...
 
03 20237 arduino based gas-
03 20237 arduino based gas-03 20237 arduino based gas-
03 20237 arduino based gas-
 
02 20274 improved ichi square...
02 20274 improved ichi square...02 20274 improved ichi square...
02 20274 improved ichi square...
 
01 20264 diminution of real power...
01 20264 diminution of real power...01 20264 diminution of real power...
01 20264 diminution of real power...
 
08 20272 academic insight on application
08 20272 academic insight on application08 20272 academic insight on application
08 20272 academic insight on application
 
07 20252 cloud computing survey
07 20252 cloud computing survey07 20252 cloud computing survey
07 20252 cloud computing survey
 
06 20273 37746-1-ed
06 20273 37746-1-ed06 20273 37746-1-ed
06 20273 37746-1-ed
 
05 20261 real power loss reduction
05 20261 real power loss reduction05 20261 real power loss reduction
05 20261 real power loss reduction
 
04 20259 real power loss
04 20259 real power loss04 20259 real power loss
04 20259 real power loss
 
03 20270 true power loss reduction
03 20270 true power loss reduction03 20270 true power loss reduction
03 20270 true power loss reduction
 
01 8445 speech enhancement
01 8445 speech enhancement01 8445 speech enhancement
01 8445 speech enhancement
 
08 17079 ijict
08 17079 ijict08 17079 ijict
08 17079 ijict
 
07 20269 ijict
07 20269 ijict07 20269 ijict
07 20269 ijict
 
06 10154 ijict
06 10154 ijict06 10154 ijict
06 10154 ijict
 
05 20255 ijict
05 20255 ijict05 20255 ijict
05 20255 ijict
 
04 18696 ijict
04 18696 ijict04 18696 ijict
04 18696 ijict
 

Recently uploaded

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 

Recently uploaded (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

Survey of Arabic Text Classification Models Using Machine Learning

International Journal of Informatics and Communication Technology (IJ-ICT)
Vol. 8, No. 1, April 2019, pp. 25~28
ISSN: 2252-8776, DOI: 10.11591/ijict.v8i1.pp25-28
Journal homepage: http://iaescore.com/journals/index.php/IJICT

A survey of Arabic text classification models

Ahed M. F. Al Sbou
Department of Computer Science, Faculty of Information Technology, Al Hussein Bin Talal University, Jordan

Article history: Received Sep 30, 2018; Revised Dec 16, 2018; Accepted Jan 11, 2019

ABSTRACT
A huge amount of Arabic text is available online and needs to be organized. As a result, many natural language processing (NLP) applications are concerned with text organization. One of them is text classification (TC). TC makes unorganized text easier to deal with by classifying it into suitable classes or labels. This paper is a survey of Arabic text classification. It also compares different methods for classifying Arabic texts, which are considered complex because of their vocabulary. Arabic is one of the richest languages in the world and has many linguistic bases. Research on Arabic language processing is scarce compared to English. As a result, these problems pose challenges for the classification and organization of specific Arabic texts. TC helps to access documents or information that have already been classified into one or more classes or categories. In addition, classifying documents helps a search engine reduce the number of documents it has to consider, which makes searching and matching against queries easier.

Keywords: Arabic language processing, Arabic text categorization, Arabic text mining, Classification algorithms, Clustering algorithms, Natural languages processing, Text classification

Copyright © 2019 Institute of Advanced Engineering and Science.
All rights reserved.

Corresponding Author: Ahed M. F. Al Sbou, Department of Computer Science, Faculty of Information Technology, Al Hussein Bin Talal University, Rawdat Al-Amir Rashid, Ma'an, Jordan. Email: ahed_alsbou@ahu.edu.jo

1. INTRODUCTION
Arabic is one of the most widely spoken languages, with more than 420 million speakers around the world. Unlike English, Arabic has no upper-case letters. It also differs from other natural languages in its use of diacritics, small marks for short vowels and related sounds such as fatha, kasra, damma, sukun, shadda, and tanween. The orthographic system of Arabic depends on diacritics: different diacritics on the same letters produce different words with different meanings. The language also has specific letters known as the Arabic vowels (waw, yaa, alif) that require a special system of morphology and grammar. Arabic is further distinguished by its huge vocabulary and range of concepts [1]. Although Arabic texts are viewed as among the most difficult to process, there are few studies on processing them, for reasons related to the linguistic characteristics of the language.

Due to the strict linguistic characteristics of Arabic texts and the limited number of studies on processing them [2], this study deals with a variety of NLP applications that have recently emerged to manipulate languages such as Arabic, English, and Urdu. One of these applications is text classification (TC), which aims to turn a collection of unstructured documents into a structured set. This structured set includes a description of the content of the documents. TC is the process of classifying textual documents into groups based on subject similarity or other features [3].

This paper is organized as follows. The present section offers a brief introduction to the topic and the design of the paper. The second section briefly describes related work in the area of Arabic text classification.
The third section presents the challenges of Arabic text. The fourth section briefly explains the common techniques and algorithms used in Arabic text classification. The last section summarizes the paper and suggests future work.

2. RESEARCH METHOD
Many classification algorithms have been applied to Arabic texts. El-Kourdi et al. applied the Naïve Bayes (NB) algorithm to classify 1500 Arabic text documents into five major categories, with an accuracy of around 68.78% [4]. Sawaf et al. conducted another study on data collected from the Arabic NEWSWIRE corpus using statistical methods. The result was 62.7% [5]. El-Halees et al. classified 300 Arabic documents by applying different algorithms such as the vector space model (VSM), K-Nearest Neighbor (KNN), and Naïve Bayes (NB). The accuracy of the classification was 74.41% [6]. The same accuracy was obtained in Al-Zoghby's study, which used the CHARM algorithm to classify Arabic text documents from 5524 records [7]. Mesleh classified Arabic documents using Support Vector Machines (SVMs) with Chi-square feature selection. He conducted an experimental study on an online Arabic corpus of 1445 documents drawn from Al-Nahar, Al-Hayat, Al-Jazeera, Al-Ahram, and Al-Dostor, classified into 9 categories. The F-measure was 88.11% [8]. Harrag et al. developed an Arabic TC system using a hybrid approach with a tree algorithm to select the features. The data was collected from several Arabic scientific encyclopedias covering many fields. The accuracy was 91% and 93% for the literary and scientific corpora, respectively [9].

3. ARABIC TEXT CHALLENGES
Natural languages such as Arabic have been processed using different methods. Arabic has several textual features and requires specific treatment of its morphology, concepts, and ontology. It is one of the most complex natural languages.
It comprises 28 characters [1]. The characters are written in different forms depending on their position in the word: at the beginning, in the middle, or at the end [10]. TC seeks to group similar documents into specific categories, assigning categories to Arabic texts and making use of related categories produced by other text classifications [11]. Retrieving information from the large amount of Arabic text accessible on the web is very challenging. Retrieving all documents relevant to a query is also very important to users. TC therefore makes querying easier by giving access to the different categories, from which the needed information can then be obtained [12]. Further, Arabic texts raise some problematic issues due to the nature of the language. To the best of my knowledge, studies on the Arabic language are very limited, and there is a lack of Arabic corpora, language tools, and comprehensive studies on preprocessing Arabic texts. All these problems make it challenging to categorize specific Arabic textual data into a closed set of categories.

4. TEXT CLASSIFICATION
TC includes different phases. The first phase preprocesses the text by removing punctuation and stop words and by normalizing the text. The second and third phases are classifying the text and evaluating the classified text [7, 11, 13]. TC is an effective mechanism for managing and organizing data. It helps a machine access data categories and text labels through a predefined process [14, 15]. This mechanism can be used to classify a group of documents into types using features such as content, author, or publisher [16]. The core goal of TC is to convert unstructured text into an organized, structured form that can be used in different NLP applications such as summarization or retrieval [10, 17].
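The preprocessing phase described above (removing punctuation and stop words, and normalizing the text) can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline of any surveyed study: the stop-word list is a tiny hypothetical sample, and the normalization rules (stripping diacritics and tatweel, unifying alif variants) are common choices that are assumed here, not taken from the paper.

```python
import re

# Arabic diacritics (tanween, fatha, damma, kasra, shadda, sukun): U+064B to U+0652.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Any character that is neither a word character nor whitespace counts as punctuation.
PUNCTUATION = re.compile(r"[^\w\s]")


def normalize(text):
    """Strip diacritics and tatweel, and map alif variants to bare alif (assumed rules)."""
    text = DIACRITICS.sub("", text)
    text = text.replace("\u0640", "")  # tatweel (elongation character)
    return re.sub("[أإآ]", "ا", text)


# Hypothetical, deliberately tiny stop-word list; a real system would use a curated one.
# The entries are normalized so that they match the normalized tokens.
STOP_WORDS = {normalize(w) for w in ("في", "من", "على", "إلى", "عن")}


def preprocess(text):
    """Normalize, replace punctuation with spaces, tokenize on whitespace, drop stop words."""
    text = PUNCTUATION.sub(" ", normalize(text))
    return [token for token in text.split() if token not in STOP_WORDS]
```

Calling `preprocess` on a raw sentence returns the list of tokens that a classifier would then receive. Diacritics are removed before punctuation so that a mark inside a word does not split the word in two.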
There are two approaches to TC: machine learning, in which text is classified using a set of training documents, and rule-based TC, which relies on the knowledge of experts or engineers to classify the text [18]. Furthermore, TC can be used in several computer science applications, such as spam or e-mail filtering, or as a tool for finding information of interest in particular documents [4, 9].

4.1. Common models and algorithms of Arabic text classification
Different algorithms are used to classify Arabic documents. In this section, we focus on the following models: the Naïve Bayesian algorithm (NB), the K-Nearest Neighbor algorithm (KNN), the Vector Space Model (VSM), and the Artificial Neural Network (ANN).
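Before turning to the individual models, the distinction drawn above between rule-based TC and machine-learning TC can be made concrete with a small sketch. Everything in it is hypothetical: the keyword rules, the four training documents, and the labels are invented for illustration, and the learned classifier is a plain term-frequency scorer chosen for brevity, not one of the four models surveyed in this section.

```python
from collections import Counter, defaultdict

# Rule-based TC: keyword lists written by hand by a (hypothetical) domain expert.
EXPERT_RULES = {
    "sports": {"مباراة", "فريق", "هدف"},
    "economy": {"بنك", "سوق", "اسعار"},
}


def classify_with_rules(tokens, rules=EXPERT_RULES):
    """Return the label whose keyword list matches the most tokens, or None if none match."""
    hits = {label: sum(tok in keywords for tok in tokens)
            for label, keywords in rules.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def train(labelled_docs):
    """Machine-learning TC: learn per-label term counts from (tokens, label) pairs."""
    counts = defaultdict(Counter)
    for tokens, label in labelled_docs:
        counts[label].update(tokens)
    return counts


def classify_with_model(tokens, counts):
    """Pick the label whose training documents contained the document's terms most often.

    Ties, including the case where no term was seen in training, go to the first label.
    """
    scores = {label: sum(term_counts[tok] for tok in tokens)
              for label, term_counts in counts.items()}
    return max(scores, key=scores.get)


# Hypothetical labelled training documents, already tokenized.
TRAINING_DOCS = [
    (["فاز", "فريق", "مباراة"], "sports"),
    (["سجل", "لاعب", "هدف"], "sports"),
    (["ارتفعت", "اسعار", "سوق"], "economy"),
    (["اعلن", "بنك", "فائدة"], "economy"),
]
model = train(TRAINING_DOCS)
```

For a document such as `["لاعب", "جديد"]`, the rule-based classifier returns `None` because the expert never listed "لاعب", while the trained scorer returns `"sports"` because that term appeared in a sports training document. The trade-off is that the rules are transparent but need manual upkeep, whereas the learned scorer depends entirely on the coverage of its training set.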
4.1.1 Naïve Bayesian algorithm
NB is a machine learning technique used to classify text into predefined categories based on similar features. It has been applied to improve the processing and manipulation of texts or information from different sources. The algorithm is probabilistic: the NB classifier assumes that, given the class, the presence or absence of one feature is unrelated to the presence or absence of any other feature. NB is commonly used to classify documents because it performs well in classification. It computes the probability that a document belongs to each class and then assigns the document to the class with the highest probability [19]. Like many other models, NB has numerous advantages. It is generally considered one of the most powerful models used in this field, and it is easy to understand and very simple to implement. As for disadvantages, NB has some limitations: because it relies on probabilities that are usually estimated from frequencies, it needs terms to have occurred with a class in the training data.

4.1.2 K-Nearest neighbor algorithm
K-Nearest neighbor (KNN) is another machine learning algorithm for TC. It is a non-parametric technique that classifies documents or objects based on the closest training examples. It takes a value k, which is always a positive integer. An object is assigned to the class most common among its k nearest neighbors, that is, by a majority vote of its neighbors [16]. KNN remains applicable when the training data is large. Yet KNN has some disadvantages, including the need to choose the value of the parameter k, which is the number of nearest neighbors. The model is also computationally expensive in comparison with other algorithms [20].

4.1.3 Vector space model (VSM)
VSM is one of the models that have been applied to TC.
It places the different objects and documents in a finite-dimensional space. VSM is also used to analyze data, texts, and documents in order to compute the similarity among them [21]. VSM has several helpful aspects as an important model in computer science. First, it depends on linear algebra and does not involve any complex algebraic equations [8]. Another advantage is the efficiency of the weights assigned to concepts or terms. The model is also easy to use in comparison with other methods, and it lets the machine compute the similarity among documents [22]. However, VSM has some limitations that discourage some researchers from using it. Handling synonyms is a major challenge in Arabic, which has many synonyms for each word or concept. Another limitation is that VSM assumes that terms are statistically independent, while most Arabic terms are strongly related to other terms.

4.1.4 Artificial Neural Network
ANN is a machine learning approach to information processing that is modeled on the human brain. It has been applied in different areas of computing, such as classification and pattern recognition. It consists of a set of inputs with adaptive weights, a non-linear function, and outputs [23]. ANNs have advantages and disadvantages worth mentioning. An ANN is one of the easier models to use, and it is usually appropriate for complex problems or large texts. Its disadvantages are that it is less recommended for simpler problems or small texts, and that it requires loading the training data.

5. CONCLUSION AND FUTURE WORK
This paper is a survey of the importance of TC and of the current methods used in the NLP field. We have discussed the traditional TC models that are used to classify Arabic texts, corpora, and documents into different categories.
Future work requires more effort to build and develop a new standard model for Arabic TC. This model must be more efficient than the current traditional methods. Another important aspect that needs improvement is the handling of language dialects: because Arabic has many dialects, the model must be compatible with them. It should also be applicable to any Arabic text.

ACKNOWLEDGEMENTS
We would like to thank Al Hussein Bin Talal University (AHU) for providing a good scientific environment in which to produce this modest work.
  • 4. REFERENCES
[1] Duwairi, R. M. Arabic text categorization. Int. Arab J. Inf. Technol, 2007. 4(2), 125-132.
[2] Fauzi, M. A., Arifin, A. Z., & Yuniarti, A. Arabic Book Retrieval using Class and Book Index Based Term Weighting. International Journal of Electrical and Computer Engineering (IJECE), 2017. 7(6), 3705-3711.
[3] El-Halees, A. Mining Arabic association rules for text classification. Paper presented at the Proceedings of the first international conference on Mathematical Sciences. Al-Azhar University of Gaza, Palestine. 2006.
[4] Caballero, Y., Bello, R., Alvarez, D., & Garcia, M. M. Two new feature selection algorithms with Rough Sets Theory. Paper presented at the IFIP International Conference on Artificial Intelligence in Theory and Practice. 2006.
[5] El Kourdi, M., Bensaid, A., & Rachidi, T.-e. Automatic Arabic document categorization based on the Naïve Bayes algorithm. Paper presented at the Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages. 2004.
[6] Al-Zoghby, A., Eldin, A. S., Ismail, N. A., & Hamza, T. Mining Arabic text using soft-matching association rules. Paper presented at the Computer Engineering & Systems, 2007. ICCES International Conference on. 2007.
[7] Al-Harbi, S., Almuhareb, A., Al-Thubaity, A., Khorsheed, M., & Al-Rajeh, A. Automatic Arabic text classification. 2008.
[8] Mesleh, A. Support vector machines based Arabic language text classification system: feature selection comparative study. Advances in Computer and Information Sciences and Engineering, 2008. (pp. 11-16): Springer.
[9] Dharmadhikari, S. C., Ingle, M., & Kulkarni, P. Empirical studies on machine learning based text classification algorithms. Advanced Computing, 2011. 2(6), 161.
[10] Khan, A., Baharudin, B., Lee, L. H., & Khan, K. A review of machine learning algorithms for text-documents classification. Journal of advances in information technology, 2010. 1(1), 4-20.
[11] Ababneh, J., Almomani, O., Hadi, W., El-Omari, N. K. T., & Al-Ibrahim, A. Vector space models to classify Arabic text. International Journal of Computer Trends and Technology (IJCTT), 2014. 7(4), 219-223.
[12] Mesleh, A. Chi square feature extraction based svms Arabic language text categorization system. Journal of Computer Science, 2007. 3(6), 430-435.
[13] Khorsheed, M. S., & Al-Thubaity, A. O. Comparative evaluation of text classification techniques using a large diverse Arabic dataset. Language resources and evaluation, 2013. 47(2), 513-538.
[14] Sebastiani, F. Machine learning in automated text categorization. ACM computing surveys (CSUR), 2002. 34(1), 1-47.
[15] Khreisat, L. A machine learning approach for Arabic text classification using N-gram frequency statistics. Journal of Informetrics, 2009. 3(1), 72-77.
[16] Alaa, E. A comparative study on Arabic text classification. Egypt. Comput. Sci. J, 2008. 2.
[17] Mesleh, A. Support Vector Machine Text Classifier for Arabic Articles: Ant Colony Optimization-Based Feature Subset Selection. The Arab Academy for Banking and Financial Sciences. 2008.
[18] Sebastiani, F. Text categorization. Encyclopedia of Database Technologies and Applications, 2005. (pp. 683-687): IGI Global.
[19] Abu-Errub, A. Arabic Text Classification Algorithm using TFIDF and Chi Square Measurements. International Journal of Computer Applications, 2014. 93(6).
[20] Al-Shalabi, R., Kanaan, G., & Gharaibeh, M. Arabic text categorization using KNN algorithm. Paper presented at the Proc. of Int. multi conf. on computer science and information technology CSIT06. 2006.
[21] Jackson, P., & Moulinier, I. Natural language processing for online applications: Text retrieval, extraction and categorization. 2007. (Vol. 5): John Benjamins Publishing.
[22] Sawaf, H., Zaplo, J., & Ney, H. Statistical classification methods for Arabic news articles. Natural Language Processing in ACL 2001, Toulouse, France. 2001.
[23] Harrag, F., El-Qawasmeh, E., & Pichappan, P. Improving Arabic text categorization using decision trees. Paper presented at the Networked Digital Technologies, 2009. NDT'09. First International Conference on. 2009.

BIOGRAPHY OF AUTHOR
Ahed Al-Sbou is a lecturer in the Department of Computer Science, Faculty of Information Technology, at Al-Hussein Bin Talal University, where he has been a faculty member since 2014. He holds a master's degree in computer science from Al-Balqa Applied University, Salt, Jordan (2012) and a B.S. degree in computer science from Al-Hussein Bin Talal University, Ma'an, Jordan (2006). His research interests in computer science lie in the area of programming languages, ranging from theory to design to implementation, as well as databases, data mining, natural language processing (NLP), and information systems. Ahed worked as a computer lab supervisor (2006-2014) at Al-Hussein Bin Talal University.