Balochi Language Text Classification Using
Deep Learning
Presenter: Shahab Qadir
CMS ID: 48468
Session: Spring 2019
Department: Computer Science
Supervisor: Dr. Sibghat Ullah Bazai
Co-Supervisor: Engr. Syed Ali Asghar,
Shah, Mr.Sikander Khan,
Adnan Ali
Dept of Computer Science, FICT, BUITEMS
Contents
› Introduction
› Motivation
› Problem Statement
› Objectives
› Literature Review
› Methodology
› Gantt Chart
› References
Introduction
› Text classification can be defined as the categorization of any kind of textual
query, sentence, paragraph, or any other document.
› Text classification applications include sentiment analysis, question
answering, user reviews, content moderation, spam detection, news
categorization, and many others.
› Text classification is a process of utilizing machine learning and deep learning
techniques to sort unstructured text into predefined categories.
› Deep learning models for text classification are a subtype of neural network
models that are specifically designed to classify text data.
(Cont’d)…
› Convolutional Neural Networks CNNs are commonly used for text classification
tasks. They are particularly effective for short text classification, such as sentiment
analysis.
› Recurrent Neural Networks and their variants such as LSTM and GRU: RNNs
are suitable for text classification tasks that require the analysis of sequential data,
such as language modeling and text generation.
› Transformer models such as BERT and GPT-2, are particularly effective for
natural language processing tasks, including text classification. They are based on
the attention mechanism, which allows the model to focus on specific parts of the
input sequence.
› these models can be further fine-tuned for specific text classification tasks, such as
sentiment analysis or topic classification
(Cont’d)…
› Deep Learning Architectures:
› LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN)
architecture designed to handle sequential data with long-term dependencies. It
uses a memory cell and gates to selectively store and retrieve information from the
memory cell.
› GRU (Gated Recurrent Unit) is another type of RNN architecture that aims to
address the limitations of traditional RNNs and LSTMs. It uses a simpler structure
than LSTM, with two gates (update and reset gates) to control the flow of information
› Machine Learning Algorithms:
› SVM is a supervised machine learning algorithm that can be applied to classification
and regression problems. It finds the optimal boundary, called a "hyperplane," that
separates the different classes in the data by maximizing the distance, or margin,
between the boundary and the closest data points from each class.
› KNN is a supervised machine learning algorithm that can be used for classification
and regression tasks. It assigns a class label to an unknown data point by identifying
the majority class among its k-nearest neighbors in the feature space.
Motivation
› Developing a deep learning model for Balochi text classification that can accurately
identify the nature of a given text.
› Examining the effectiveness of utilizing pre-trained language models for classifying
Balochi text, and comparing the outcomes to those obtained using traditional
machine learning techniques.
› Pre-trained language models have shown to be very useful in natural language
processing tasks, therefore, investigating their use in Balochi text classification can
help to improve the performance of the models.
› The classification of Balochi text can play a role in understanding public opinion in
Balochi-speaking communities and the region, through sentiment analysis
Problem Statement
› Balochi is a relatively low-resource language and there is limited research on it,
including in the field of text classification sentiment analysis.
› There are limited labeled datasets available for training machine learning models.
› Previous research on the sentiment analysis of Balochi tweets has relied on
traditional machine learning techniques such as SVM, Naive Bayes, and Decision
Trees to classify sentiments, however, Deep Learning models were not used.
› As deep learning techniques continue to advance, it is likely that the application of
these methods to text classification in the Balochi language could lead to more
promising results.
Problem Statement
› The field of text classification in the Balochi language is under-researched due to a
scarcity of resources. Previous studies that have been conducted mainly focus on
categorizing Balochi text into various genres and subjects like news articles, poetry,
and fiction, using traditional machine learning techniques such as SVM, Decision
Tree, and Random Forest, without incorporating deep learning models.
› Classifying Balochi text can be challenging due to its script having characteristics of
Urdu, Arabic, and Persian languages
› Due to the lack of resources, text classification in the Balochi language presents a
significant challenge. With limited labeled datasets available for models training.
Objectives
› To gather resources for constructing a sentiment analysis system for the Balochi
language.
› Creating a sentiment analysis system for the Balochi language by implementing
sentiment classification and word embedding techniques.
› To evaluate the sentiment analysis system developed for the Balochi language.
Literature Review
S.No Paper Title Objectives of the Study Outcome
1. Classifying Arabic text
using deep learning[1]
To classify a Arabic text
using deep learning(CNN)
model and their own
Gstem algorithm
Authors first experimented with
CNN without using Gstem and
then used CNN with the Gstem
algorithm which showed that
the accuracy of the CNN model
is improved by using Gstem
2. Balochi non cursive
isolated character
recognition using deep
neural network[2]
Proposed a deep learning
technique based on
computer vision for the
real-time classification
and recognition of Balochi
characters
The proposed method show
96% accuracy and was more
faster then basic method.
Literature Review
S.No Paper Title Objectives of the Study Outcomes
3. Efficient English text
classification using
selected Machine
Learning
Techniques[4]
text classification using
Machine Learning
Techniques for the
English language using
svm
From the simulations,
very clear that the SVM
outperforms the rest of the
machine
learning techniques for the
dataset
4. A Unified
Understanding of
Deep NLP Models for
Text Classification[6]
(NLP) model for text
classification to employ
techniques such as
explainable AI, visual
debugging, visual
analytics, and information
interpretation to enhance
its performance.
DeepNLPVis to comprehend
and analyze the model's
behavior in text classification
tasks and investigate the
reasons behind successful and
unsuccessful cases.
Methodology
Fig.1 Proposed solution for Balochi Text classification
Data Collection: Data will be collected online from websites with web scrapping, and manually for
datasets. If data
Text Processing: After collecting the data for model training, it requires some pre-processing before
it can be used for classification, in this step the unnecessary words, digits, punctuation marks,
duplicate spaces, and other unrelated words and letters would be cleaned up.
Converting Text to Vector: In this step, Text (raw data) inputs will be converted to vectors. In this
step text data is converted into vector space using the Word2Vec technique, this technique represents
the presentation of the nearby Word2Vec words with almost the same meaning words.
Training the model: after converting word 2Vec the last step would be to train the data. This is the
phase where the Deep Learning algorithm is trained by feeding the dataset. Learning takes place at
this stage. Consistent training can significantly improve the prediction rate of Deep Learning
models. The model weights should be randomly initialized. This way the algorithm learns to adjust
the weights accordingly
Model Evaluation: Evaluate the performance of the training models to check if they classify Texts
into the correct category of the given dataset.
Text Classification Deployment: After the evaluation of the texts, if they fall into the defined
category, the model will deploy the classified texts.
Gantt Chart
References
[1] M. Galal, M. M. Madbouly, and A. El-Zoghby, “Classifying Arabic text using deep
learning,” J. Theor. Appl. Inf. Technol., vol. 97, no. 23, pp. 3412–3422, 2019.
[2] G. J. Naseer, A. Basit, I. Ali, and A. Iqbal, “Balochi non cursive isolated character
recognition using deep neural network,” Int. J. Adv. Comput. Sci. Appl., vol. 11, no. 4, pp.
717–722, 2020, doi: 10.14569/IJACSA.2020.0110492.
[4] X. Luo, “Efficient English text classification using selected Machine Learning
Techniques,” Alexandria Eng. J., vol. 60, no. 3, pp. 3401–3409, 2021, doi:
10.1016/j.aej.2021.02.009.
[6] Z. Li et al., “A Unified Understanding of Deep NLP Models for Text Classification,” IEEE
Trans. Vis. Comput. Graph., pp. 1–15, 2022, doi: 10.1109/TVCG.2022.3184186.
Q&A
Thanks 16

Balochi Language Text Classification Using Deep Learning 1.pptx

  • 1.
    Balochi Language TextClassification Using Deep Learning Presenter: Shahab Qadir CMS ID: 48468 Session: Spring 2019 Department: Computer Science Supervisor: Dr. Sibghat Ullah Bazai Co-Supervisor: Engr. Syed Ali Asghar, Shah, Mr.Sikander Khan, Adnan Ali Dept of Computer Science, FICT, BUITEMS
  • 2.
    Contents › Introduction › Motivation ›Problem Statement › Objectives › Literature Review › Methodology › Gantt Chart › References
  • 3.
    Introduction › Text classificationcan be defined as the categorization of any kind of textual query, sentence, paragraph, or any other document. › Text classification applications include sentiment analysis, question answering, user reviews, content moderation, spam detection, news categorization, and many others. › Text classification is a process of utilizing machine learning and deep learning techniques to sort unstructured text into predefined categories. › Deep learning models for text classification are a subtype of neural network models that are specifically designed to classify text data.
  • 4.
    (Cont’d)… › Convolutional NeuralNetworks CNNs are commonly used for text classification tasks. They are particularly effective for short text classification, such as sentiment analysis. › Recurrent Neural Networks and their variants such as LSTM and GRU: RNNs are suitable for text classification tasks that require the analysis of sequential data, such as language modeling and text generation. › Transformer models such as BERT and GPT-2, are particularly effective for natural language processing tasks, including text classification. They are based on the attention mechanism, which allows the model to focus on specific parts of the input sequence. › these models can be further fine-tuned for specific text classification tasks, such as sentiment analysis or topic classification
  • 5.
    (Cont’d)… › Deep LearningArchitectures: › LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) architecture designed to handle sequential data with long-term dependencies. It uses a memory cell and gates to selectively store and retrieve information from the memory cell. › GRU (Gated Recurrent Unit) is another type of RNN architecture that aims to address the limitations of traditional RNNs and LSTMs. It uses a simpler structure than LSTM, with two gates (update and reset gates) to control the flow of information › Machine Learning Algorithms: › SVM is a supervised machine learning algorithm that can be applied to classification and regression problems. It finds the optimal boundary, called a "hyperplane," that separates the different classes in the data by maximizing the distance, or margin, between the boundary and the closest data points from each class. › KNN is a supervised machine learning algorithm that can be used for classification and regression tasks. It assigns a class label to an unknown data point by identifying the majority class among its k-nearest neighbors in the feature space.
  • 6.
    Motivation › Developing adeep learning model for Balochi text classification that can accurately identify the nature of a given text. › Examining the effectiveness of utilizing pre-trained language models for classifying Balochi text, and comparing the outcomes to those obtained using traditional machine learning techniques. › Pre-trained language models have shown to be very useful in natural language processing tasks, therefore, investigating their use in Balochi text classification can help to improve the performance of the models. › The classification of Balochi text can play a role in understanding public opinion in Balochi-speaking communities and the region, through sentiment analysis
  • 7.
    Problem Statement › Balochiis a relatively low-resource language and there is limited research on it, including in the field of text classification sentiment analysis. › There are limited labeled datasets available for training machine learning models. › Previous research on the sentiment analysis of Balochi tweets has relied on traditional machine learning techniques such as SVM, Naive Bayes, and Decision Trees to classify sentiments, however, Deep Learning models were not used. › As deep learning techniques continue to advance, it is likely that the application of these methods to text classification in the Balochi language could lead to more promising results.
  • 8.
    Problem Statement › Thefield of text classification in the Balochi language is under-researched due to a scarcity of resources. Previous studies that have been conducted mainly focus on categorizing Balochi text into various genres and subjects like news articles, poetry, and fiction, using traditional machine learning techniques such as SVM, Decision Tree, and Random Forest, without incorporating deep learning models. › Classifying Balochi text can be challenging due to its script having characteristics of Urdu, Arabic, and Persian languages › Due to the lack of resources, text classification in the Balochi language presents a significant challenge. With limited labeled datasets available for models training.
  • 9.
    Objectives › To gatherresources for constructing a sentiment analysis system for the Balochi language. › Creating a sentiment analysis system for the Balochi language by implementing sentiment classification and word embedding techniques. › To evaluate the sentiment analysis system developed for the Balochi language.
  • 10.
    Literature Review S.No PaperTitle Objectives of the Study Outcome 1. Classifying Arabic text using deep learning[1] To classify a Arabic text using deep learning(CNN) model and their own Gstem algorithm Authors first experimented with CNN without using Gstem and then used CNN with the Gstem algorithm which showed that the accuracy of the CNN model is improved by using Gstem 2. Balochi non cursive isolated character recognition using deep neural network[2] Proposed a deep learning technique based on computer vision for the real-time classification and recognition of Balochi characters The proposed method show 96% accuracy and was more faster then basic method.
  • 11.
    Literature Review S.No PaperTitle Objectives of the Study Outcomes 3. Efficient English text classification using selected Machine Learning Techniques[4] text classification using Machine Learning Techniques for the English language using svm From the simulations, very clear that the SVM outperforms the rest of the machine learning techniques for the dataset 4. A Unified Understanding of Deep NLP Models for Text Classification[6] (NLP) model for text classification to employ techniques such as explainable AI, visual debugging, visual analytics, and information interpretation to enhance its performance. DeepNLPVis to comprehend and analyze the model's behavior in text classification tasks and investigate the reasons behind successful and unsuccessful cases.
  • 12.
    Methodology Fig.1 Proposed solutionfor Balochi Text classification
  • 13.
    Data Collection: Datawill be collected online from websites with web scrapping, and manually for datasets. If data Text Processing: After collecting the data for model training, it requires some pre-processing before it can be used for classification, in this step the unnecessary words, digits, punctuation marks, duplicate spaces, and other unrelated words and letters would be cleaned up. Converting Text to Vector: In this step, Text (raw data) inputs will be converted to vectors. In this step text data is converted into vector space using the Word2Vec technique, this technique represents the presentation of the nearby Word2Vec words with almost the same meaning words. Training the model: after converting word 2Vec the last step would be to train the data. This is the phase where the Deep Learning algorithm is trained by feeding the dataset. Learning takes place at this stage. Consistent training can significantly improve the prediction rate of Deep Learning models. The model weights should be randomly initialized. This way the algorithm learns to adjust the weights accordingly Model Evaluation: Evaluate the performance of the training models to check if they classify Texts into the correct category of the given dataset. Text Classification Deployment: After the evaluation of the texts, if they fall into the defined category, the model will deploy the classified texts.
  • 14.
  • 15.
    References [1] M. Galal,M. M. Madbouly, and A. El-Zoghby, “Classifying Arabic text using deep learning,” J. Theor. Appl. Inf. Technol., vol. 97, no. 23, pp. 3412–3422, 2019. [2] G. J. Naseer, A. Basit, I. Ali, and A. Iqbal, “Balochi non cursive isolated character recognition using deep neural network,” Int. J. Adv. Comput. Sci. Appl., vol. 11, no. 4, pp. 717–722, 2020, doi: 10.14569/IJACSA.2020.0110492. [4] X. Luo, “Efficient English text classification using selected Machine Learning Techniques,” Alexandria Eng. J., vol. 60, no. 3, pp. 3401–3409, 2021, doi: 10.1016/j.aej.2021.02.009. [6] Z. Li et al., “A Unified Understanding of Deep NLP Models for Text Classification,” IEEE Trans. Vis. Comput. Graph., pp. 1–15, 2022, doi: 10.1109/TVCG.2022.3184186.
  • 16.