THE NAÏVE BAYES ALGORITHM
FOR LEARNING DATA ANALYTICS
Tran Ngoc Viet
Faculty of Information Technology, Van Lang University
Ho Chi Minh City, Vietnam
viet.tn@vlu.edu.vn
Hoang Le Minh
Faculty of Information Technology, Van Lang University
Ho Chi Minh City, Vietnam
minh.hl@vlu.edu.vn
Le Cong Hieu
Faculty of Information Technology, Van Lang University
Ho Chi Minh City, Vietnam
hieu.lc@vlu.edu.vn
Tong Hung Anh
Faculty of Information Technology, Van Lang University
Ho Chi Minh City, Vietnam
anh.th@vlu.edu.vn
Abstract
The field of data science analysis in artificial intelligence is attracting deep interest from many scientists around the world, and many techniques have been proposed to improve the accuracy of the Naive Bayes model by reducing the problem of interdependence between its attributes. The new contribution of this paper, presented step by step using the Naive Bayes (NB) method, is a way of applying NB to a new set of attributes. Learning data analytics deserves this increased attention because it can provide predicted learning results for learners, offer better solutions to help schools strengthen educational measures, support optimal plans to increase the retention rate of students studying at the school, and help students succeed in their studies. Data related to student management and training activities are collected from software systems that are already operating in practice, such as the Edusoft.Net student affairs software, the Moodle and Microsoft Teams learning management systems, and a FaceID face-recognition attendance system… We research, evaluate and select a number of new data technologies for building the student digital profile, including document storage functions and, specifically, intelligent functions for creating, processing and storing documents.
Keywords: Machine learning; Naïve Bayes model; learning data analytics; algorithm; computational complexity.
1. Introduction
In this article, we define learning data analytics using the Naïve Bayes model (LDAB) with a new algorithm. The result is a learning data analysis tool that lecturers can use with the data in their courses to track students and predict those with high performance. Finally, we discuss and clarify issues and concerns with the use of learning data analytics in higher education.
We are solving the problem of classifying data classes and creating datasets by automatically collecting and pre-processing data from learning management systems such as Microsoft Teams, Moodle and E-learning at Van Lang University, Viet Nam, for students' learning data analytics. For this classification problem we generate a set of hypotheses, create features and discuss the importance of the variables. The training data set contains hundreds of thousands of data points and quite a few variables, so we use the Naive Bayes model, which can be extremely fast relative to other classification algorithms. It applies Bayes' theorem of probability to predict the class of unknown data sets; once the basics of this algorithm are understood, it can be brought into action whenever large data sets are encountered. Learning data analytics is defined as the measurement, collection, analysis and reporting of data about learners and their contexts, for the purposes of understanding and optimizing learning and the environments in which it occurs. Learning data analytics offers promise for predicting and improving student success and retention [13], [16], in part because it allows faculty, institutions and students to make data-driven decisions about student success and retention.
Previous studies have shown how spam and non-spam messages can be detected using the Naïve Bayes algorithm [3], [4]. The algorithm handles a large number of features, and the Naïve Bayes model copes easily with a large vocabulary of words. In those studies the data set followed the standard machine learning process for building a model: the data were acquired from the Kaggle site, pre-processed with several methods including bag-of-words, split into training and test sets, and the model was then built and evaluated. In the same spirit, we hypothesize, create features and discuss the importance of variables from a big data set containing hundreds of thousands of data points and quite a few variables of training data. We use the Naive Bayes model, which is extremely fast compared with other categorical data analysis algorithms; it applies Bayes' theorem of probability to train on the data and predict the class of an unknown set of data. The purpose of this paper is to provide a brief overview of learning data analytics and of machine learning algorithms for learning data analytics. We will also explore its usage, applications, goals and examples.
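As a hedged illustration of that pipeline (our own sketch, not the code of the cited study; the file name spam.csv and the column names text and label are assumptions), a Kaggle-style message dataset could be turned into bag-of-words counts and classified with a multinomial Naïve Bayes model as follows.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Load the messages; the CSV path and column names are illustrative assumptions.
df = pd.read_csv('spam.csv')
X_text, y = df['text'], df['label']

# Bag-of-words pre-processing: each message becomes a vector of word counts.
X = CountVectorizer().fit_transform(X_text)

# Split into training and test sets, build the model and evaluate it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = MultinomialNB(alpha=1.0).fit(X_train, y_train)
print('Test accuracy:', accuracy_score(y_test, model.predict(X_test)))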
2. Naïve Bayes classifier
Naive Bayes is a classification algorithm for multiclass classification problems. It is called naive because the calculation of the probabilities for each class is simplified to make it tractable.
Naive Bayes classifiers are built on Bayesian classification methods [4]. These rely on Bayes' theorem, an equation describing the relationship between the conditional probabilities of statistical quantities. In Bayesian classification, we are interested in finding the probability of a label L given some observed features. Bayes' theorem tells us how to express this in terms of quantities we can compute more directly:

P(L | features) = P(features | L) P(L) / P(features)    (1)

The Naïve Bayes classifier assigns to each instance the class value having the highest conditional probability. Given a feature vector X = (x1, ..., xn) and a class variable y, Bayes' theorem states that

P(y | X) = P(X | y) P(y) / P(X)    (2)

where
P(y | X) is the conditional probability that event y occurs given that X has occurred, also known as the posterior probability;
P(X | y) is the conditional probability that event X occurs given that y has occurred (the likelihood);
P(y) is the prior probability of class y;
P(X) is the prior probability of the predictor X.
Under the naive assumption of conditional independence between the features, the likelihood can be decomposed as

P(X | y) = P(x1 | y) P(x2 | y) ... P(xn | y) = ∏ i=1..n P(xi | y)    (3)

Multinomial Naive Bayes
Parameters: a word distribution λvw = P(w | v) for each document category v and word type w, and a prior distribution P(v) over document categories v.
Learning: estimate the parameters as (Laplace-smoothed) frequency ratios,

λvw = P(w | v) = (count(w, v) + 1) / (Nv + |V|)    (4)

where count(w, v) is the number of occurrences of word type w in documents of category v, Nv is the total word count of category v, and |V| is the vocabulary size.
Predictions: predict the class

v* = argmax v [ P(v) ∏ w∈V P(w | v)^count(w, d) ]    (5)

where count(w, d) is the number of occurrences of word w in the document d to be classified.
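To make the parameter estimation in (4) and the prediction rule in (5) concrete, the following short Python sketch (our own illustration, not code from the paper; all names in it are hypothetical) estimates the smoothed frequency ratios λvw and the class priors from word-count vectors and predicts the class with the largest posterior score. Applied to the toy data in Section 6.1 it reproduces the λ values and the prediction of class Y for e5.

import numpy as np

def fit_multinomial_nb(X, y):
    # X: word-count vectors (one row per document), y: class labels.
    classes = np.unique(y)
    priors, likelihoods = {}, {}
    for c in classes:
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)                                   # P(v)
        counts = Xc.sum(axis=0)                                        # count(w, v) for each word type
        likelihoods[c] = (counts + 1) / (counts.sum() + X.shape[1])    # Laplace-smoothed lambda_vw, eq. (4)
    return priors, likelihoods

def predict(x, priors, likelihoods):
    # Log-space version of eq. (5): argmax_v log P(v) + sum_w count(w, d) * log P(w | v)
    scores = {c: np.log(priors[c]) + np.dot(x, np.log(likelihoods[c])) for c in priors}
    return max(scores, key=scores.get)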
3. Process Model
Fig. 1. A dataset is split into training and testing sets
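A hedged sketch of the split shown in Fig. 1, using scikit-learn (the feature matrix X and label vector y below are placeholders, not the paper's actual data):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randint(0, 3, size=(100, 6))   # placeholder matrix of count features
y = np.random.choice(['Y', 'N'], size=100)   # placeholder class labels

# Hold out 20% of the examples as the testing set; the split ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)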
4. Steps of the LDAB Algorithm
Data related to student management and training activities are collected from the student affairs software and learning management systems that are already operating in practice. New data technologies are used to build the student digital profile, including document storage functions and, specifically, intelligent functions for creating, processing and storing documents.
The collected data are processed and extracted according to a standard form.
Building a basic Naive Bayes model, and thus learning an LDAB, is quite simple and consists mainly of estimating the parameters of the LDAB from the training data. The learning algorithm for LDAB is a Naive Bayes classifier built in four easy steps; we develop each piece of the algorithm in this section and then tie all of the elements together into a working implementation applied to a real dataset in the next section.
This Naive Bayes classifier is broken down into four parts.
Input: a set Data of training examples.
Output: a Naive Bayes model for Data, predictions for a new set of data, and an evaluation of the accuracy.
From the student digital profile, probability results are calculated and, depending on their level, a recommendation or warning about each student's learning results is issued.
Step 1: Gather the dataset / build a data frame.
Collect data by extracting it from existing data stores such as MS Teams, FaceID or EduSoft.Net… Manage and organize the data, including deleting and removing unnecessary records.
Step 2: Create the LDAB model in Python.
Exploit and process the data, and present the issues that need in-depth analysis and research.
Step 3: Predict using the test dataset.
Analyze and train the Naïve Bayes multi-class probability model on the data and predict learning outcomes for the next learning period with an accurate probability calculation.
Step 4: Predict with a new dataset and evaluate the accuracy.
Report, transform and store the research results; a sketch of the whole four-step pipeline is given below.
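A hedged end-to-end sketch of these four steps (the file name student_profiles.csv, its columns and the outcome label are illustrative assumptions, not the university's actual schema):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Step 1: gather the dataset and build a data frame (columns are assumed for illustration).
df = pd.read_csv('student_profiles.csv')
features = ['attendance', 'assignments_submitted', 'quiz_score', 'forum_posts', 'teams_activity']
X, y = df[features], df['outcome']           # 'outcome' holds the class label, e.g. pass/fail

# Step 2: create the LDAB model (a multinomial Naive Bayes classifier).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = MultinomialNB(alpha=1)

# Step 3: train the model and predict using the test dataset.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Step 4: evaluate the accuracy; the fitted model can then predict for a new set of data.
print('Accuracy on the held-out test set:', accuracy_score(y_test, y_pred))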
5. The Complexity of the Algorithm
Because of the independence assumption, Naive Bayes classifiers can quickly learn to use high-dimensional features with limited training data compared to more sophisticated methods.
Let n be the number of training examples, v* the dimensionality of the features, and k the number of classes.
All training needs to do is compute the frequency of every feature value v* for each class, but the space complexity of training is O(k · v* · n), since the data must be stored, which also takes time.
The computational efficiency of Naive Bayes lies in the fact that the runtime complexity of the Naive Bayes classifier is O(k · v* · n).
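As a quick hedged illustration of this bound with the toy example of Section 6 (n = 4 training examples, v* = 6 word types, k = 2 classes):

# Order-of-magnitude illustration of the O(k · v* · n) bound from Section 5.
n, v_star, k = 4, 6, 2                    # training examples, feature dimensionality, classes
print('k * v* * n =', k * v_star * n)     # 48 elementary count/storage operations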
6. Applications of LDAB Algorithm
6.1. Computational
TRAINING. The training set consists of four examples e1–e4 over a vocabulary of d = |V| = 6 word types; e1, e2 and e3 belong to class Y and e4 to class N. Each row gives the count of each word type:
e1: x1 = [2, 1, 2, 2, 2, 2]   (class Y)
e2: x2 = [2, 2, 2, 1, 2, 2]   (class Y)
e3: x3 = [2, 2, 2, 2, 2, 2]   (class Y)
Total for class Y: [6, 5, 6, 5, 6, 6]  =>  NY = 34
=> λY = [7/40, 6/40, 7/40, 6/40, 7/40, 7/40],  where 40 = NY + |V|
e4: x4 = [0, 0, 1, 0, 2, 1]   (class N)  =>  NN = 4
=> λN = [1/10, 1/10, 2/10, 1/10, 3/10, 2/10],  where 10 = NN + |V|
TEST. The test example is e5: x5 = [1, 1, 2, 1, 2, 1]. Using the priors P(Y) = 3/4 and P(N) = 1/4,
P(Y|e5) ∝ (3/4) × (7/40) × (6/40) × (7/40)^2 × (6/40) × (7/40)^2 × (7/40) ≈ 4.8 × 10^-7
P(N|e5) ∝ (1/4) × (1/10) × (1/10) × (2/10)^2 × (1/10) × (3/10)^2 × (2/10) ≈ 1.8 × 10^-7
Since P(Y|e5) > P(N|e5), the test example e5 is assigned to class Y.
6.2. Probability
Normalizing the two scores gives the posterior probability
P(Y|e5) = 4.8 × 10^-7 / (4.8 × 10^-7 + 1.8 × 10^-7) ≈ 0.7273
=> P(N|e5) = 1 − P(Y|e5) ≈ 0.2727, and e5 is therefore classified as Y.
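A hedged numerical check of this hand computation (our own illustration, using the λ vectors and priors from Section 6.1):

import numpy as np

lam_Y = np.array([7, 6, 7, 6, 7, 7]) / 40       # smoothed word probabilities for class Y
lam_N = np.array([1, 1, 2, 1, 3, 2]) / 10       # smoothed word probabilities for class N
e5 = np.array([1, 1, 2, 1, 2, 1])               # word counts of the test example

score_Y = (3/4) * np.prod(lam_Y ** e5)          # about 4.8 x 10^-7
score_N = (1/4) * np.prod(lam_N ** e5)          # about 1.8 x 10^-7
print('P(Y|e5) =', score_Y / (score_Y + score_N))   # about 0.729, matching the sklearn output in Section 6.3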
6.3. Using the Naïve Bayes model in Python
Supervised learning lets us make predictions based on the data that we see and thus apply generalisations.
>>> from sklearn.naive_bayes import MultinomialNB
>>> import numpy as np
>>> e1 = [2, 1, 2, 2, 2, 2] #input – training data
>>> e2 = [2, 2, 2, 1, 2, 2]
>>> e3 = [2, 2, 2, 2, 2, 2]
>>> e4 = [0, 0, 1, 0, 2, 1]
>>> training_data = np.array([e1, e2, e3, e4])
>>> result_data = np.array(['Y', 'Y', 'Y', 'N'])
>>> e5 = np.array([[1, 1, 2, 1, 2, 1]]) #test data
>>> ml = MultinomialNB(alpha=1) #call MultinomialNB
>>> ml.fit(training_data, result_data) #process model
MultinomialNB(alpha=1, class_prior=None, fit_prior=True)
>>> print('Probability of e5:', ml.predict_proba(e5))
Probability of e5: [[0.27079929 0.72920071]]
#output - new set of data and evaluate the accuracy
>>> print('Predicting class of e5:', str(ml.predict(e5)[0]))
Predicting class of e5: Y
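If ground-truth labels were available for the new examples, the accuracy mentioned in Step 4 could be evaluated directly; a hedged continuation of the session above (the label 'Y' assumed for e5 is only for illustration):
>>> from sklearn.metrics import accuracy_score
>>> true_labels = np.array(['Y'])  #assumed ground truth for e5
>>> print('Accuracy:', accuracy_score(true_labels, ml.predict(e5)))
Accuracy: 1.0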
7. Conclusion
In this article, we built a model for learning data analytics by applying the Naïve Bayes probability formula, with the purpose of modeling real problems so that it can be applied accurately and effectively. Next, we proposed an algorithm for learning data analytics whose data needs to be tested correctly. Finally, a specific example was presented in detail to illustrate the LDAB algorithm as well as the calculation of its complexity.
Acknowledgements
The authors would like to thank the Ministry of Technology - Higher Education and the President of Van Lang University for financial support for this research.
References
[1] Webb, G.I., Boughton, J., Wang, Z. (2005). Not so naive Bayes: Aggregating one-dependence estimators. Machine Learning, 5–24.
[2] Witten, I.H., Frank, E (2000). Data mining: practical machine learning tools and techniques with Java implementations. San
Francisco, CA: Morgan Kaufmann.
[3] A. S. Thanuja Nishadi (2019). Text Analysis: Naïve Bayes Algorithm using Python JupyterLab, International Journal of Scientific and
Research Publications, Issue 11.
[4] Jake VanderPlas (2014). Frequentism and Bayesianism: A Python-driven Primer.
[5] Bhargava, K. Tara Phani (2018). Analysis and Design of Visualization of Educational Institution Database using Power BI Tool.
[6] Abdous, M., He, W., & Yen, C (2012). Using data mining for predicting relationships between online question theme and final grade.
Educational Technology & Society, 15(3), 77-88.
[7] Brandon, K. (2009). Investing in Education: The American Graduation Initiative. Office of Social Innovation and Civic Participation.
[8] Campbell, J. P., DeBlois, P. B., & Oblinger (2007). Academic analytics: A new tool for a new era. EDUCAUSE Review, 42(4), 40-57.
[9] Dietz-Uhler, B., Hurn, J.E., & Hurn (2012). Making use of data in an LMS to predict student performance: A learning analytics
investigation. Unpublished manuscript.
[10] Dringus, L. P (2012). Learning analytics considered harmful. Journal of Asynchronous Learning Networks, 16(3), 87-100.
[11] Dyckhoff, A. L., Zielke, D., Bultmann, M., Chatti, M. A., & Schroeder (2012). Design and implementation of a learning analytics
toolkit for teachers. Educational Technology & Society, 15(3), 58-76.
[12] Brain, D., Webb, G.I. (2002). The need for low bias algorithms in classification learning from large data sets. In: Proc. 16th European
Conf. Principles of Data Mining and Knowledge Discovery (PKDD2002), Berlin:Springer-Verlag, 62–73.
[13] Dziuban, C., Moskal, P., Cavanaugh, T., & Watts, A (2012). Analytics that inform the university: Using data you already have. Journal
of Asynchronous Learning Networks, 16(3), 21-38.
[14] Greller, W. & Drachslrer (2012). Translating learning into numbers: A generic framework for learning analytics.
[15] Ice, P., Diaz, S., Swan, K., Burgess, M., Sharkey, M., Sherrill, J., Huston, D., & Okimoto, H (2012). The PAR framework proof of
concept: Initial findings from a multi-institutional analysis of federated postsecondary data. Journal of Asynchronous Learning
Networks, 16(3), 63-86.
[16] Long, P. & Siemens, G (2011). Penetrating the fog: Analytics in Learning and Education.
[17] Webb, G.I. (2001). Candidate elimination criteria for lazy Bayesian rules. In: Proc. Fourteenth Australian Joint Conf. Artificial
Intelligence. Volume 2256., Berlin:Springer, 545–556.
[18] Xie, Z., Hsu, W., Liu, Z., Lee, M.L.(2002). Snnb: A selective neighborhood based naive Bayes for lazy learning. In: Advances in
Knowledge Discovery and Data Mining, Proc. Pacific-Asia Conference, Berlin:Springer, 104–114.
[19] Zhang, N. L (2004). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research 5:697–723.
[20] Fayyad, U.M., Irani, K.B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In: Proc. 13th
Int. Joint Conf. Artificial Intelligence (IJCAI-93), Morgan Kaufmann, 1022–1029.
[21] Jones, S. J (2012). Technology review: The possibilities of learning analytics to improve learner centered decision making. The
Community College Enterprise, Spring, 89-92.
[22] Zheng, Z., and Webb, G. I (2000). Lazy learning of bayesian rules. Journal of Machine Learning 41(1):53–84.
[23] Koller, D (2013). Probabilistic Graphical Models. Coursera Stanford – Lectures. Retrieved December 6, from
https://class.coursera.org/pgm/lecture
[24] Koller, D., & Friedman, N. (2009). Probabilistic graphical models: Principles and techniques. MIT Press. ISBN 978-0262013192.
[25] Koller, D., Friedman, N., Getoor, L., & Taskar, B. (2007). Graphical models in a nutshell. In L. Getoor, & B. Taskar (Eds.), An introduction to statistical relational learning. MIT Press.
[26] Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers.
[27] Jurafsky, Dan and Manning, Christopher (2014). Natural Language Processing. Coursera Stanford - Lectures. Retrieved April 5, from
https://class.coursera.org/nlp/lecture/28
[28] Naive Bayes for classifying Text (2014). Retrieved April 5 , from http://www.cs.nyu.edu/faculty/davise/ai/BayesText.html
[29] Chow, C.K. & C.N. Liu (1968). Approximating discrete probability distributions with dependence trees. IEEE Trans. on Info. Theory,
14, 462- 467.
Authors Profile
Dr. Tran Ngoc Viet was born in 1973 in Thanh Khe District, Da Nang City, Vietnam. He graduated from the Maths_IT faculty of the Da Nang University of Science, received a master of science degree at the University of Danang, and obtained his Ph.D. degree in 2017 at the Danang University of Technology. His main interests: applied mathematics in transport, parallel and distributed processing, data science, machine learning, graph theory and distributed programming.
Dr. Hoang Le Minh was born in 1957 in Ha Noi, Vietnam. He received his Ph.D. degree in 1984 at the Moscow University of Science. He is currently a lecturer at the Faculty of Information Technology at Van Lang University. His main interests: graph theory, data science, machine learning, big data analytics, data pre-processing and analysis, distributed programming.
Le Cong Hieu was born in 1979. He graduated from the information technology faculty of the Ho Chi Minh University of Science and received a master's degree in computer science from Hue University in 2014. He is currently a lecturer at the Faculty of Information Technology at Van Lang University. His main interests: discrete mathematics, graph theory, machine learning and distributed programming.
Tong Hung Anh was born in 1964 in Hue, Vietnam. He received a master's degree in computer networking from University Paris VI in 2011. He is currently a lecturer at the Faculty of Information Technology at Van Lang University. His main interests: Python for machine learning, distributed programming, mathematics and statistics for data science, data pre-processing and analysis.