Rubric Detail
Name: ITS836 (8 Week) Research Paper Rubric
Description: Please use this rubric for grading research papers
Requirements
- No Evidence, 0 (0.00%) points: No requirements are met.
- Limited Evidence, 3 (3.00%) points: Includes a few of the required components as specified in the assignment.
- Below Expectations, 7 (7.00%) points: Includes some of the required components as specified in the assignment.
- Approaches Expectations, 11 (11.00%) points: Includes most of the required components as specified in the assignment.
- Meets Expectations, 15 (15.00%) points: Includes all of the required components as specified in the assignment.
Content
- No Evidence, 0 (0.00%) points: Fails to provide enough content to show a demonstration of knowledge.
- Limited Evidence, 3 (3.00%) points: Major errors or omissions in demonstration of knowledge.
- Below Expectations, 7 (7.00%) points: Some significant but not major errors or omissions in demonstration of knowledge.
- Approaches Expectations, 11 (11.00%) points: A few errors or omissions in demonstration of knowledge.
- Meets Expectations, 15 (15.00%) points: Demonstrates strong or adequate knowledge of the materials; correctly represents knowledge from the readings and sources.
Critical Analysis
- No Evidence, 0 (0.00%) points: Fails to provide a critical thinking analysis and interpretation.
- Limited Evidence, 5 (5.00%) points: Major errors or omissions in analysis and interpretation.
- Below Expectations, 10 (10.00%) points: Some significant but not major errors or omissions in analysis and interpretation.
- Approaches Expectations, 15 (15.00%) points: A few errors or omissions in analysis and interpretation.
- Meets Expectations, 20 (20.00%) points: Provides a strong critical analysis and interpretation of the information given.
Problem Solving
- No Evidence, 0 (0.00%) points: Fails to demonstrate problem solving.
- Limited Evidence, 5 (5.00%) points: Major errors or omissions in problem solving.
- Below Expectations, 10 (10.00%) points: Some significant but not major errors or omissions in problem solving.
- Approaches Expectations, 15 (15.00%) points: A few errors or omissions in problem solving.
- Meets Expectations, 20 (20.00%) points: Demonstrates strong or adequate thought and insight in problem solving.
Sources/Examples
- No Evidence, 0 (0.00%) points: Source or example selection and integration of knowledge from the course is clearly deficient.
- Limited Evidence, 2 (2.00%) points: Sources or examples meet required criteria but are poorly chosen to provide substance and perspectives on the issue under examination.
- Below Expectations, 4 (4.00%) points: Sources or examples meet required criteria but are less than adequately chosen to provide substance and perspectives on the issue under examination.
- Approaches Expectations, 7 (7.00%) points: Sources or examples meet required criteria but are less than adequately chosen to provide substance and perspectives on the issue under examination.
- Meets Expectations, 10 (10.00%) points: Sources or examples meet required criteria and are well chosen to provide substance and perspectives on the issue under examination.
Organization, Grammar, Style
- No Evidence, 0 (0.00%) points: Project is not organized or well written, and is not in proper paper format. Poor-quality work; unacceptable in terms of grammar and spelling.
- Limited Evidence, 2 (2.00%) points: Project is poorly organized; does not follow proper paper format. Inconsistent to inadequate sentence and paragraph development; numerous errors in grammar and spelling.
- Below Expectations, 4 (4.00%) points: Project is adequately organized and written, and is in proper format as outlined in the assignment. Reasonably good sentence and paragraph structure; significant number of errors in grammar and spelling.
- Approaches Expectations, 7 (7.00%) points: Project is fairly well organized and written, and is in proper format as outlined in the assignment. Reasonably good sentence and paragraph structure; significant number of errors in grammar and spelling.
- Meets Expectations, 10 (10.00%) points.
Proper use of APA formatting
- No Evidence, 0 (0.00%) points: Numerous errors in APA formatting, with more than eight significant errors.
- Limited Evidence, 2 (2.00%) points: Numerous errors in APA formatting, with more than five significant errors.
- Below Expectations, 4 (4.00%) points: Significant errors in APA formatting, with four to five significant errors.
- Approaches Expectations, 7 (7.00%) points.
- Meets Expectations, 10 (10.00%) points.
Learning Analytics or Educational Data Mining? This is the
Question...
Daniela Marcu
Ștefan cel Mare University of Suceava
Str. Universității 13, Suceava 720229
Phone: 0230 216 147
[email protected]
Mirela Danubianu
Ștefan cel Mare University of Suceava
Str. Universității 13, Suceava 720229
Phone: 0230 216 147
[email protected]
Abstract
Education, a vital and rapidly expanding area, could not remain indifferent to the use of information and communication technology. Over the past two decades we have witnessed the emergence and development of e-learning systems, the proliferation of MOOCs, and generally the rise of Technology Enhanced Education. All of these have contributed to the generation and storage of unprecedented volumes of data concerning all areas of learning.
At the same time, domains such as data mining and big data
analytics have emerged and
developed. Their applications in education have spawned new
areas of research such as educational
data mining or learning analytics.
As an interdisciplinary research area, Educational Data Mining (EDM) aims to explore data from the educational environment in order to build models through which students' behavior and results can be better understood. In fact, EDM is a complex process consisting of several steps grouped in three stages: data preprocessing, modelling and postprocessing. It transforms raw data from educational environments into useful information that can positively influence the educational process.
According to the Society for Learning Analytics Research (SoLAR), which took over the wording of the first International Conference on Learning Analytics and Knowledge, learning analytics is "the measurement, collection, analysis and reporting of data about learners and their contexts for purposes of understanding and optimizing learning and the environments in which it occurs" (Siemens, 2011).
This paper proposes a comparative study of the two concepts: EDM and learning analytics. Because some voices in the scientific community claim that the two terms refer to the same thing, we want to emphasize the similarities and differences between them, and how each one can serve to raise the quality of educational processes.
Keywords: EDM; LA; Data Mining; Education.
1. Introduction
The educational community has an interest in the great potential of educational data. Why are researchers so enthusiastic about this? The answer is simple. Having seen the impact of applying data mining to exploit and analyze large data volumes in areas such as business, social media and other scientific fields, we can anticipate the benefits for the education system. If we could adapt the model-finding methods used for analyzing the online activity of customers and social media users to the educational environment, we could obtain evidence much closer to reality about the activities of the training system.
The widespread use of computers in pre-university learning and the development of Web-based courses are additional reasons for EDM and LA research.
Designing educational policies based on practical evidence
provided by researchers can
bring benefits to the educational system.
The exploitation of large volumes of data from different domains is done using specific techniques and methods, and it drives the development of tools that facilitate progress in these areas.
The science of extracting useful information from large volumes
of data is called Data
Mining (DM) (Hand, Mannila & Smyth, 2001).
The concept is based on three key areas: statistics, artificial
intelligence and machine
learning (Figure 1).
Figure 1. Data Mining
Initially, DM used statistical algorithms. Specific techniques
such as decision trees,
association rules, clustering, artificial neural networks, and
others have been developed (Șușnea,
2012).
Applying such exploitation methods to educational system data, in order to build models that better explain students' behavior and outcomes, is called Educational Data Mining (EDM). Since the data and the problems in education differ from those in other areas, classical DM methods have been improved and supplemented with EDM-specific methods (Romero & Ventura, 2007). According to some authors, there are four areas of application of EDM, aimed at improving student modeling and domain modeling, e-learning, and scientific research (Baker, 2012).
In order to better understand learning, data about pupils and about the educational environment is measured, collected and analyzed. This is learning analytics, a field related to EDM. Several Learning Analytics (LA) methods have been catalogued in the literature (Buckingham Shum, 2012).
In the following sections we detail relevant aspects of EDM and LA in order to provide viable arguments for a comparative study of the two concepts.
2. Educational Data Mining
Over the past 10 years, the field of research aimed at exploiting the unique types of data produced by education has developed considerably at the international level. In 2011, in Massachusetts, USA, the International EDM Working Group (established in 2007) created the International Society for EDM (online: http://educationaldatamining.org/about/). Romania, however, is at a pioneering stage in EDM.
There is currently a growing interest in using computers in learning and in Web-based training. With the rapid increase in the volume of learning software resources, the Romanian educational system also accumulates huge amounts of data about students, teachers, parents, libraries, secretariats, etc. Getting the information needed to build models that improve the quality of managerial decisions becomes one of the greatest challenges of the present.
Traditional research in the field of education is time-consuming and often wasteful of material resources. Developing an experimental study, for example on combating school absenteeism, involves first the selection of schools, teachers and pupils. There follows the definition of strategies for identifying sources of school stress, increasing the
motivation of students to attend classes, building trust in school and family, and so on. However, such studies depend on context, class, geography, economic development, and teacher-student relationships. Changing any parameter can lead to very different conclusions. New factors in students' demotivation towards school may soon appear that could not be taken into consideration earlier. Conducting new traditional studies on this topic requires significant time resources.
By comparison, EDM proves to be more efficient. The analysis of existing data in the educational system using specific EDM methods allows the identification of new models for new contexts. An enormous advantage is that the same methods can be applied to different data, generating specific results, without the need for new analysis strategies.
More specifically, let us take the example of a course designed for web-based training (Romero, Ventura & De Bra, 2004). Traditionally, evaluating the effectiveness of a course is done by analyzing the results obtained by students upon completion, which does not necessarily lead to improvements in the material, methods and teaching tools used for future course versions. In fact, in the Romanian pre-university system, the updating of educational programs and resources does not keep the pace expected by society.
What would this look like with EDM-based knowledge exploitation? EDM methods aim at discovering correlation rules between course components (content, questions, various activities) and student activities. In "Knowledge Discovery with Genetic Programming for Providing Feedback to Courseware Authors", C. Romero, S. Ventura and P. De Bra describe the four main steps in building software based on EDM (Romero, Ventura & De Bra, 2004): development, use, knowledge discovery, and improvement.
Another classification has three stages: preprocessing, data exploitation and postprocessing [3]. The cycle of these steps is illustrated in Figure 2.
Figure 2. Stages of the process of converting data into
information
If we refer again to the analysis of the efficiency of a course, in the first stage, preprocessing, various operations are performed, such as gathering information on pedagogical and methodological aspects and bringing raw records (the time spent in the course, the sections visited, the scores obtained and other interactions) into a form appropriate for processing.
In the next step, EDM-specific algorithms are applied to obtain different correlation rules. The models will provide information in different formats for analysis: numerical values of coefficients, tables, diagrams, and correlation matrices (an example is illustrated in Appendix 1, a correlation matrix obtained with the DataLab application based on the results of the computer science Olympiad); a small correlation-matrix sketch follows.
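For illustration, a correlation matrix of this kind can be computed with pandas in a few lines; the indicator names below are hypothetical stand-ins, not the actual DataLab fields.

```python
import pandas as pd

# Hypothetical per-student records (stand-ins for the Olympiad data)
df = pd.DataFrame({
    "time_in_course": [12.5, 3.0, 8.2, 15.1, 6.4],
    "sections_visited": [30, 5, 18, 34, 12],
    "quiz_score": [88, 41, 67, 93, 55],
    "final_score": [90, 38, 70, 95, 60],
})

# Pearson correlation matrix between all pairs of indicators
print(df.corr().round(2))
```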
One of the most important forms for expressing discovered knowledge is the if-then rule. Several such rule types can be defined in EDM: association, classification and prediction rules (Klosgen & Zytkow, 2002).
The teacher will analyze the results of these analyses and study the degree to which the initial goals were achieved. Depending on the conclusions, they may decide to improve the course and resume its evaluation. This may prove to be a difficult process, because opinions about the material and about the way the course interacts with the student can differ significantly from one teacher to another.
3. Methods of data exploitation
There is currently a wide variety of methods for exploiting data in the education system. These can be grouped into two broad categories, according to the way the objectives are achieved: prediction-oriented methods (classification, regression, outlier detection) and methods for the discovery of data for human judgment (Sasu, 2014).
Many of these are general DM methods: prediction, classification, clustering, text mining and others. But there are also EDM-specific methods such as non-negative matrix factorization and knowledge tracing (KT) (Romero & Ventura, 2012). Here are some of these methods:
Prediction
The method can be used in education to predict students' behavior and outcomes. It is based on the creation of predictive models. In the training phase, these models learn to make predictions about a set of variables called predictors by analyzing them in combination with other variables. Once the training phase is completed, the models can be applied to the data sets for which the prediction is to be made. A well-known study is that of Baker, Gowda and Corbett, "Automatically detecting the student's preparation for future learning: help use is key" (Baker, Gowda & Corbett, 2011). The authors create a tool for automatically predicting a student's future performance on the basis of positive or negative correlations between various features such as student test results, time spent in response, time elapsed between receiving a hint and typing the answer, and others. The model is trained on one group of students, then applied to another group. The results are then compared to those obtained using the Bayesian Knowledge Tracing (BKT) model.
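A minimal sketch of such a predictive model using scikit-learn; the features echo those named above, but the data and the choice of logistic regression are illustrative assumptions, not the study's actual setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Invented predictors: [test result, response time (s), hint-to-answer delay (s)]
X = np.array([[80, 30, 5], [45, 90, 40], [70, 40, 10], [30, 120, 60],
              [90, 25, 3], [55, 70, 30], [85, 35, 8], [40, 100, 50]])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = good future performance

# Train on one group of students, evaluate on another, mirroring the study's design
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```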
Classification
The method involves building a predictive model. The data in the training set is characterized by certain attributes, and the model must identify membership of a class based on that set of attributes. Suppose we built educational software as an interactive game for a given theme. Based on user attributes such as age, gender, geographic area, time until the game is completed, and number of attempts, we can build a classifier and determine the user's membership of a specific class. The model will learn to identify students, and the analyses can provide information on the need to use this educational method for certain age groups, interests and education levels.
Methods that use classification include decision trees, neural networks, Bayesian classifiers, and others.
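A sketch of the interactive-game classifier described above, assuming a decision tree; the attributes and class labels are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Invented attributes: [age, minutes to finish the game, number of attempts]
X = [[10, 25, 3], [12, 15, 1], [15, 10, 1], [9, 40, 6], [14, 12, 2], [8, 45, 7]]
y = ["beginner", "intermediate", "advanced", "beginner", "advanced", "beginner"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Predict the class of a new user of the educational game
print(clf.predict([[11, 20, 2]]))
```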
Clustering
The method involves building models that identify groupings of the data according to certain similarities. For a model to provide quality predictions, similarity within a cluster must be maximized and similarity between clusters minimized.
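A minimal k-means sketch of this idea, grouping students by behavioral features; the features and values are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical minutes spent on [audio, video, hands-on] resources per pupil
X = np.array([[50, 10, 5], [45, 12, 8], [8, 55, 6], [10, 60, 4],
              [5, 8, 70], [7, 12, 65]])

# Three clusters; within-cluster similarity is maximized by the algorithm
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # cluster assignment for each pupil
```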
The use of this method in Romanian high-school education could aim at grouping pupils according to their learning style (auditory, visual, practical/kinesthetic), based on the analysis of their behavior in relation to certain educational products and of the pupils' characteristics. The predictions of such a model could lead to effective recommendations on how to learn educational content. Thus, the instructional process could be carried out efficiently in relation to the learning particularities of each student. At present, there are attempts to conduct lessons in a way appropriate to the
students' learning styles, but the reality is that the identification of learning styles is superficial. The results of the questionnaires are attached to the class catalog, but in most cases this does not lead to improved teaching methods and techniques in the lesson. In the absence of clear alternatives, the teacher has to improvise.
The method is also successfully used in plagiarism detection (text mining) and is likewise applied in the educational sphere.
Outlier Detection
The method involves creating models that detect data whose features differ from the rest. In Romanian education, this method could be used to detect students with content assimilation problems, or those with atypical behavior.
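A small sketch using scikit-learn's IsolationForest to flag unusual student records; the data is invented for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Invented [lessons attended, average test score] per student
X = np.array([[28, 75], [30, 82], [27, 70], [29, 78], [5, 20], [30, 80]])

# fit_predict returns -1 for outliers, 1 for inliers
flags = IsolationForest(contamination=0.2, random_state=0).fit_predict(X)
print(flags)  # the [5, 20] student should be flagged with -1
```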
In general, more than one EDM method is used in case studies. Outlier detection methods can be combined, for example, with data clustering techniques and decision tree classification, as presented in the study by Ajith, Sai and Tejaswi, "Evaluation of student performance: an outlier detection perspective" (Ajith, Sai & Tejaswi, 2013). The study aims to identify learners with special learning needs in order to reduce the school failure rate. Input data are collected from students' participation in lessons, tests, and marks on initial tests. To achieve the proposed objective, the authors try to find models for classifying students that will be helpful in setting up study groups.
At present, in Romania, students in state high-school education do not have the opportunity to follow the course material in groups other than the classes to which they belong. Moreover, pupils diagnosed with special educational needs participate in classes together with their colleagues. Teachers create special programs for them, and the courses are then held under the guidance of a single teacher who may have no pedagogical and methodological experience related to that learning situation. There are special requirements for conducting such an educational process, based on grouping students within the same educational space and the same timeframe to go through different course materials. In the absence of a proper classification, of alternative methods and means, and of teachers with such experience, things rarely unfold in a manner that leads to the best results.
Discovery with Models
Discovery with Models is the fifth category presented in Baker's taxonomy (Baker, 2012). It is also one of the most widely used methods of data exploitation in the field of education. It is based on the use of a previously validated model as a component in analyses that use prediction or relationship mining in new contexts (Baker & Yacef, 2009). In this way, information can be obtained about which educational materials contribute most to educational progress. A study carried out by Beck and Mostow in 2008, "How who should practice: Using learning decomposition to evaluate the efficacy of different types of practice for different types of students" (Beck & Mostow, 2008), on the analysis of different types of learners, demonstrates that the method supports identifying relationships between student behavior and the characteristics of the variables used.
Nonnegative Matrix Factorization (or Decomposition)
There are several algorithms for factorizing a non-negative matrix. The factorization transforms (decomposes) a matrix V into two matrices, W and H, with the property that all three have non-negative elements. This is very useful in applications such as determining the effectiveness of an evaluation system in which the matrices contain elements related to exams, abilities, and items. Matrix V is approximated by the product of the two smaller matrices, as can be seen in Figure 3 ("Non-negative matrix factorization", 2019).
Figure 3. Illustration of approximate non-negative matrix factorization. Source: wikipedia.org
We propose to study the evaluation of two specific abilities, defined on the columns of matrix W, for four work requirements (items), defined on the rows of W. Matrix H will contain two rows, representing the two abilities, and six columns, representing the assessed students. The result is recorded in matrix V, which has four rows (one per item) and six columns (one per student). A value of 1 in the W matrix indicates the need for a certain skill (Figure 4) (Desmarais, 2012).
W (items × skills):
0 1
1 0
1 0
1 1

H (skills × students):
1 1 1 0 1 1
0 0 1 1 0 0

V ≈ W × H (items × students):
0 0 1 1 0 0
1 1 1 0 1 1
1 1 1 0 1 1
1 1 2 1 1 1

Figure 4. Non-negative matrix factorization - example
The first item requires skill 2 (W[1][2] = 1). Only students 3 and 4 possess skill 2, so item 1 will not be passed by students 1, 2, 5 and 6. To pass item 4, both skills are required; only one of the students (student 3, with a score of 2 in V) passes this item with the maximum score.
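The factorization in Figure 4 can be verified numerically, and scikit-learn's NMF can recover an approximate two-factor decomposition from V alone; a minimal sketch (the library choice is ours, not the authors'):

```python
import numpy as np
from sklearn.decomposition import NMF

W = np.array([[0, 1], [1, 0], [1, 0], [1, 1]])          # items x skills
H = np.array([[1, 1, 1, 0, 1, 1], [0, 0, 1, 1, 0, 0]])  # skills x students

V = W @ H  # items x students; matches matrix V in Figure 4
print(V)

# Recover an approximate two-factor decomposition from V alone
model = NMF(n_components=2, init="random", random_state=0, max_iter=1000)
W_hat = model.fit_transform(V.astype(float))
H_hat = model.components_
print(np.round(W_hat @ H_hat, 2))  # approximately reconstructs V
```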
Using computerized analysis methods, interpretations can be obtained in a much shorter time and with greater accuracy, because machines are faster and more precise than humans.
4. Learning Analytics (LA)
Learning is the product of an interaction between learners and the learning environment, and among students, educators, teachers and others (Elias & Lias, 2011).
The evaluation of learning, in the traditional sense, is based on the evaluation of student or pupil outcomes. This involves assessing knowledge, but also trying to answer questions such as: what does this student need, how can learning be improved, how should the course interface change to become more accessible. At present, especially in the pre-university system, learning evaluation is based on questionnaires. Obtaining feedback is slow, because manual data processing takes time and the analysis possibilities are quite limited.
The desire to improve the quality of learning and assessment in the educational system is increasing at the international level, and in our country as well. Traditional systems are confronted with huge amounts of data and with their diversity. Learning Analytics (LA) attempts to answer questions about how this data can be used, and how it can be transformed and analyzed to provide useful information that adds value to the learning process (Liu & Fan, 2014).
In 2011, at the first International Conference on Learning Analytics and Knowledge (LAK 2011), the definition of the new research area, LA, was adopted: learning analytics is "the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs" (Siemens, 2011).
Data analytics was first used in sales, under the name Business Intelligence. This branch of research uses computer techniques to synthesize huge amounts of data and turn them into powerful tools for making the best marketing decisions.
With the development of Web technologies, a branch of data analysis research called Web Analytics has developed. Web Analytics tools collect data about the users of a site and report on their behavior. This leads to a better understanding of customers and to better decisions for improving the browsing experience and retaining visitors to the site.
Learning Analytics borrows tools and methods used in Business Intelligence and Web Analytics to analyze educational data.
At present, many universities, companies, and organizations are developing learning platforms for both students and lifelong learners. An enormous advantage of these platforms is that they can personalize the learning experience and adapt it to learners' physical impairments.
In research conducted by the New Media Consortium and the EDUCAUSE Learning Initiative in 2016, areas that will have a particular impact on university education globally by 2020 were identified. One of these is Learning Analytics. In the research report, LA is defined as an application of Web Analytics to the educational field, focused on the collection and detailed analysis of student interactions with online learning platforms (Johnson, Adams Becker & Cummins, 2016).
A free example of a Web Analytics tool is provided by Google and is called Google Analytics. It provides sophisticated reports on user behavior on a website, telling administrators which pages are visited, how long users stay, where they come from, and how many of them are new visitors. With these reports, administrators can create additional features, add more interesting content, enhance interactivity, and customize the interface of the application based on the devices used for viewing.
The following figures (5, 6, 7) illustrate sections of various reports provided by this tool for the site https://www.modinfo.ro, a site dedicated to preparing Romanian high-school students in computer science.
Figure 5 provides a diagram of the number of visitors per page of the site. We note that students are looking for baccalaureate content (bac.php), faculty admission (admission.php) and additional training for performance (cex.php).
Figure 5. User preferred content
Figure 6 represents the percentage of visitors to the site over a fixed period, by age category. It can be seen that most users are aged between 25 and 34 years. For administrators, given the period under review, this reveals the users' preoccupation with preparing for the computer programming exam.
Figure 6. Demographics and interest categories - Age of users
Figure 7 provides information on analyzing the active presence
of a specific user on a site
within a selected time interval.
Figure 7. Behavior of a user on the site within a selected time
range
Choosing how to use and construct analytics tools starts from the choice of quantifiable indicators, which have to be defined according to the proposed objectives. Examples of such indicators for the educational environment include the number of accesses of a given tool within the course, and others; a short sketch of computing such indicators follows.
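As a minimal sketch (with invented log fields), such indicators can be computed from a raw activity log:

```python
import pandas as pd

# Hypothetical activity log: one row per interaction with the course
log = pd.DataFrame({
    "student": ["ana", "ana", "dan", "dan", "dan", "ioana"],
    "tool":    ["quiz", "forum", "quiz", "quiz", "video", "forum"],
    "minutes": [12, 5, 8, 15, 22, 7],
})

# Indicator 1: number of accesses of each tool within the course
print(log.groupby("tool").size())

# Indicator 2: total time each student spent in the course
print(log.groupby("student")["minutes"].sum())
```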
4.1. Learning Analytics methods
Methods used for learning analytics include: discourse analysis, in which the quality of expression is analyzed; analysis of student activity in relation to learning (students interested in the topic will ask questions and access links to supplementary resources); and analysis of motivational learning.
LA uses some of the same data mining methods as EDM. These can be classified into: prediction, clustering, relationship mining, discovery with models, and distillation of data for human judgment (Nunn, Avella, Kanai, & Kebritchi, 2016).
We will briefly describe the methods that have not already been
presented in the previous
section.
Relationship mining
It is a method that uses association rule algorithms to detect, for example, mistakes made by students when solving a set of exercises. Based on the associations found, one can predict a certain behavior of the student depending on the hypothesis from which they start in solving the problem. Thus, the teacher or course manager can intervene so that the pupil or student avoids the mistake. One can find, for example, relationships between a student's other activities (playing on the computer, talking to a colleague in a chat room) while solving their work tasks and erroneous answers (Baker, Corbett, Koedinger & Wagner, 2004).
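A toy sketch in the spirit of this method, computing the support and confidence of a single hypothetical rule ("off-task activity -> wrong answer") from an invented event log:

```python
# Invented per-exercise log: (off-task activity observed?, answer wrong?)
log = [(True, True), (True, True), (False, False), (True, False),
       (False, False), (True, True), (False, True), (False, False)]

n = len(log)
both = sum(1 for off_task, wrong in log if off_task and wrong)
off_task_total = sum(1 for off_task, _ in log if off_task)

# Rule "off-task -> wrong answer": support and confidence
support = both / n
confidence = both / off_task_total
print(f"support={support:.2f}, confidence={confidence:.2f}")
```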
Distillation of Data for Human Judgment
This method includes statistics and visualization techniques that help people understand data analytics. It is the basis for many useful tools that provide clear analyses, quickly understood even by non-specialist users.
An example is a map that groups learners by the amount of heat emanating from their bodies while studying the instructional material, collected with body-mounted sensors. The analysis provides real-time information about learning performance indicators (Merceron, 2015).
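A minimal visualization sketch in this spirit, rendering an invented students × indicators matrix as a heat map with matplotlib:

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented engagement indicator values for 4 students x 3 indicators
data = np.array([[0.9, 0.7, 0.8], [0.2, 0.3, 0.1],
                 [0.6, 0.9, 0.7], [0.4, 0.5, 0.3]])

fig, ax = plt.subplots()
im = ax.imshow(data, cmap="hot")
ax.set_xticks(range(3), labels=["quizzes", "forum", "videos"])
ax.set_yticks(range(4), labels=["s1", "s2", "s3", "s4"])
fig.colorbar(im, ax=ax, label="engagement")
plt.show()
```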
5. Learning Analytics or Educational Data Mining?
Educational Data Mining is a new field of research. It is based on the models, methods and algorithms built for DM, although there are also specific methods of applying DM in education. The main purpose of EDM is to explore large sets of data from the educational system and to create models for extracting knowledge from the data. The main objective is to provide education decision makers with useful information about correlations existing between sets of data, enabling a deeper understanding of the educational needs of students and of the system as a whole (de Almeida Neto & Castro, 2017).
Learning Analytics is a newer field of research. It is based on
data analysis techniques in
Business Intelligence. LA uses highly sophisticated analysis
tools and predictive models to improve
learning. Most applications using LA have been created for the
university system and are dedicated
to early detection of concrete problems such as the risk of
abandoning a course by certain students.
LA also uses the expertise of other research areas, such as EDM
and Web Analytics, with the same
objectives of predicting learning outcomes and providing useful
information for improving the
quality of the learning process (Elias & Lias, 2011).
EDM is at the intersection of areas such as artificial
intelligence, machine learning,
education, and statistics.
Figure 8 shows the LA as an interdisciplinary subdomain of
Business Intelligence, Statistics
and Education.
Figure 8. Educational Data Mining and Learning Analytics
The two new areas of research are quite similar in terms of the aims pursued and the methods used, but there are also some significant differences between them. Some of the most important resemblances and differences between EDM and LA are shown in Tables 1 and 2.
Table 1. Similarities between EDM and LA
- Both areas contribute to improving the quality of education and education policies in schools and universities, but in alternative education systems as well.
- EDM: It is a new field of research; in 2011, in Massachusetts, USA, the International Working Group on EDM (established in 2007) created the International Society for EDM. LA: The definition of this new field of research was adopted in 2011 at the first International Conference on Learning Analytics (LAK 2011).
- EDM: It is based on the exploitation of large data collections. LA: It is based on the analysis of large data collections.
- It is based on the formulation of specific research …
Uncertainty in big data analytics: survey,
opportunities, and challenges
Reihaneh H. Hariri*, Erik M. Fredericks and Kate M. Bowers
Oakland University, Rochester, MI, USA
Hariri et al. J Big Data (2019) 6:44, https://doi.org/10.1186/s40537-019-0206-3
Abstract
Big data analytics has gained wide attention from both academia and industry as the demand for understanding trends in massive datasets increases. Recent developments in sensor networks, cyber-physical systems, and the ubiquity of the Internet of Things (IoT) have increased the collection of data (including health care, social media, smart cities, agriculture, finance, education, and more) to an enormous scale. However, the data collected from sensors, social media, financial records, etc. is inherently uncertain due to noise, incompleteness, and inconsistency. The analysis of such massive amounts of data requires advanced analytical techniques for efficiently reviewing and/or predicting future courses of action with high precision and advanced decision-making strategies. As the amount, variety, and speed of data increases, so too does the uncertainty inherent within, leading to a lack of confidence in the resulting analytics process and decisions made thereof. In comparison to traditional data techniques and platforms, artificial intelligence techniques (including machine learning, natural language processing, and computational intelligence) provide more accurate, faster, and scalable results in big data analytics. Previous research and surveys conducted on big data analytics tend to focus on one or two techniques or specific application domains. However, little work has been done in the field of uncertainty when applied to big data analytics as well as in the artificial intelligence techniques applied to the datasets. This article reviews previous work in big data analytics and presents a discussion of open challenges and future directions for recognizing and mitigating uncertainty in this domain.
Keywords: Big data, Uncertainty, Big data analytics, Artificial intelligence
Introduction
According to the National Security Agency, the Internet processes 1826 petabytes (PB) of data per day [1]. In 2018, the amount of data produced every day was 2.5 quintillion bytes [2]. Previously, the International Data Corporation (IDC) estimated that the amount of generated data would double every 2 years [3]; however, 90% of all data in the world was generated over the last 2 years, and Google now processes more than 40,000 searches every second, or 3.5 billion searches per day [2]. Facebook users upload 300 million photos, 510,000 comments, and 293,000 status updates per day [2, 4]. Needless to say, the amount of data generated on a daily basis is staggering. As a result, techniques are required to analyze and understand this massive amount of data, as it is a great source from which to derive useful information.
Advanced data analysis techniques can be used to transform big
data into smart data
for the purposes of obtaining critical information regarding
large datasets [5, 6]. As such,
smart data provides actionable information and improves
decision-making capabilities
for organizations and companies. For example, in the field of
health care, analytics per-
formed upon big datasets (provided by applications such as
Electronic Health Records
and Clinical Decision Systems) may enable health care
practitioners to deliver effective
and affordable solutions for patients by examining trends in the
overall history of the
patient, in comparison to relying on evidence provided with
strictly localized or current
data. Big data analysis is difficult to perform using traditional
data analytics [7] as they
can lose effectiveness due to the five V’s characteristics of big
data: high volume, low
veracity, high velocity, high variety, and high value [7–9].
Moreover, many other charac-
teristics exist for big data, such as variability, viscosity,
validity, and viability [10]. Several
artificial intelligence (AI) techniques, such as machine learning
(ML), natural language
processing (NLP), computational intelligence (CI), and data
mining were designed to
provide big data analytic solutions as they can be faster, more
accurate, and more pre-
cise for massive volumes of data [8]. The aim of these advanced
analytic techniques is
to discover information, hidden patterns, and unknown
correlations in massive datasets
[7]. For instance, a detailed analysis of historical patient data
could lead to the detection
of destructive disease at an early stage, thereby enabling either
a cure or more optimal
treatment plan [11, 12]. Additionally, risky business decisions
(e.g., entering a new mar-
ket or launching a new product) can profit from simulations that
have better decision-
making skills [13].
While big data analytics using AI holds a lot of promise, a wide
range of challenges
are introduced when such techniques are subjected to
uncertainty. For instance, each of
the V characteristics introduce numerous sources of uncertainty,
such as unstructured,
incomplete, or noisy data. Furthermore, uncertainty can be
embedded in the entire ana-
lytics process (e.g., collecting, organizing, and analyzing big
data). For example, dealing
with incomplete and imprecise information is a critical
challenge for most data mining
and ML techniques. In addition, an ML algorithm may not
obtain the optimal result if
the training data is biased in any way [14, 15]. Wang et al. [16]
introduced six main chal-
lenges in big data analytics, including uncertainty. They focus
mainly on how uncertainty
impacts the performance of learning from big data, whereas a
separate concern lies in
mitigating uncertainty inherent within a massive dataset. These challenges are normally present in data mining and ML techniques. Scaling these concerns
up to the big data level
will effectively compound any errors or shortcomings of the
entire analytics process.
Therefore, mitigating uncertainty in big data analytics must be
at the forefront of any
automated technique, as uncertainty can have a significant
influence on the accuracy of
its results.
Based on our examination of existing research, little work has
been done in terms of
how uncertainty significantly impacts the confluence of big data
and the analytics tech-
niques in use. To address this shortcoming, this article presents
an overview of the
existing AI techniques for big data analytics, including ML,
NLP, and CI from the per-
spective of uncertainty challenges, as well as suitable directions
for future research in
these domains. The contributions of this work are as follows.
First, we consider uncer-
tainty challenges in each of the 5 V’s big data characteristics.
Second, we review several big data analytics techniques together with the impact of uncertainty on each. Third, we discuss available strategies to handle each challenge presented by uncertainty.
To the best of our knowledge, this is the first article surveying
uncertainty in big data
analytics. The remainder of the paper is organized as follows.
“Background” section pre-
sents background information on big data, uncertainty, and big
data analytics. “Uncer-
tainty perspective of big data analytics” section considers
challenges and opportunities
regarding uncertainty in different AI techniques for big data
analytics. “Summary of mit-
igation strategies” section correlates the surveyed works with
their respective uncertain-
ties. Lastly, “Discussion” section summarizes this paper and
presents future directions of
research.
Background
This section reviews background information on the main
characteristics of big data,
uncertainty, and the analytics processes that address the
uncertainty inherent in big data.
Big data
In May 2011, big data was announced as the next frontier for
productivity, innovation,
and competition [11]. In 2018, the number of Internet users
grew 7.5% from 2016 to over
3.7 billion people [2]. In 2010, over 1 zettabyte (ZB) of data
was generated worldwide
and rose to 7 ZB by 2014 [17]. In 2001, the emerging
characteristics of big data were
defined with three V’s (Volume, Velocity, and Variety) [18].
Similarly, IDC defined big
data using four V’s (Volume, Variety, Velocity, and Value) in
2011 [19]. In 2012, Veracity
was introduced as a fifth characteristic of big data [20–22].
While many other V’s exist
[10], we focus on the five most common characteristics of big
data, as next illustrated in
Fig. 1.
Volume refers to the massive amount of data generated every
second and applies to the
size and scale of a dataset. It is impractical to define a universal
threshold for big data
volume (i.e., what constitutes a ‘big dataset’) because the time
and type of data can influ-
ence its definition [23]. Currently, datasets that reside in the
exabyte (EB) or ZB ranges
are generally considered as big data [8, 24], however challenges
still exist for datasets in
smaller size ranges. For example, Walmart collects 2.5 PB from
over a million custom-
ers every hour [25]. Such huge volumes of data can introduce
scalability and uncertainty
problems (e.g., a database tool may not be able to accommodate
infinitely large datasets).
Many existing data analysis techniques are not designed for
large-scale databases and
can fall short when trying to scan and understand the data at
scale [8, 15].
Variety refers to the different forms of data in a dataset
including structured data,
semi-structured data, and unstructured data. Structured data
(e.g., stored in a rela-
tional database) is mostly well-organized and easily sorted, but
unstructured data
(e.g., text and multimedia content) is random and difficult to
analyze. Semi-structured
data (e.g., NoSQL databases) contains tags to separate data
elements [23, 26], but
enforcing this structure is left to the database user. Uncertainty
can manifest when
converting between different data types (e.g., from unstructured
to structured data),
in representing data of mixed data types, and in changes to the
underlying struc-
ture of the dataset at run time. From the point of view of
variety, traditional big data
analytics algorithms face challenges for handling multi-modal,
incomplete and noisy
data. Because such techniques (e.g., data mining algorithms) are
designed to consider
well-formatted input data, they may not be able to deal with
incomplete and/or dif-
ferent formats of input data [7]. This paper focuses on
uncertainty with regard to big
data analytics, however uncertainty can impact the dataset itself
as well.
Efficiently analysing unstructured and semi-structured data can
be challenging,
as the data under observation comes from heterogeneous sources
with a variety of
data types and representations. For example, real-world
databases are negatively
influenced by inconsistent, incomplete, and noisy data.
Therefore, a number of data preprocessing techniques, including data cleaning, data integration, and data transformation, are used to remove noise from data [27]. Data cleaning
techniques address data
quality and uncertainty problems resulting from variety in big
data (e.g., noise and
inconsistent data). Such techniques for removing noisy objects
during the analysis
process can significantly enhance the performance of data
analysis. For example, data
cleaning for error detection and correction is facilitated by
identifying and eliminat-
ing mislabeled training samples, ideally resulting in an
improvement in classification
accuracy in ML [28].
Velocity comprises the speed (represented in terms of batch,
near-real time, real time,
and streaming) of data processing, emphasizing that the speed
with which the data is
processed must meet the speed with which the data is produced
[8]. For example, Inter-
net of Things (IoT) devices continuously produce large amounts
of sensor data. If the
device monitors medical information, any delays in processing
the data and sending the
results to clinicians may result in patient injury or death (e.g., a
pacemaker that reports
emergencies to a doctor or facility) [20]. Similarly, devices in
the cyber-physical domain
often rely on real-time operating systems enforcing strict timing
standards on execution,
and as such, may encounter problems when data provided from a
big data application
fails to be delivered on time.
Veracity represents the quality of the data (e.g., uncertain or
imprecise data). For
example, IBM estimates that poor data quality costs the US
economy $3.1 trillion per
year [21]. Because data can be inconsistent, noisy, ambiguous,
or incomplete, data verac-
ity is categorized as good, bad, and undefined. Due to the
increasingly diverse sources
and variety of data, accuracy and trust become more difficult to
establish in big data
analytics. For example, an employee may use Twitter to share
official corporate informa-
tion but at other times use the same account to express personal
opinions, causing prob-
lems with any techniques designed to work on the Twitter
dataset. As another example,
when analyzing millions of health care records to determine or
detect disease trends,
for instance to mitigate an outbreak that could impact many
people, any ambiguities or
inconsistencies in the dataset can interfere or decrease the
precision of the analytics pro-
cess [21].
Value represents the context and usefulness of data for decision
making, whereas the
prior V’s focus more on representing challenges in big data. For
example, Facebook,
Google, and Amazon have leveraged the value of big data via
analytics in their respective
products. Amazon analyzes large datasets of users and their
purchases to provide prod-
uct recommendations, thereby increasing sales and user
participation. Google collects
location data from Android users to improve location services in
Google Maps. Face-
book monitors users’ activities to provide targeted advertising
and friend recommenda-
tions. These three companies have each become massive by
examining large sets of raw
data and drawing and retrieving useful insight to make better
business decisions [29].
Uncertainty
Generally, “uncertainty is a situation which involves unknown
or imperfect information”
[30]. Uncertainty exists in every phase of big data learning [7]
and comes from many dif-
ferent sources, such as data collection (e.g., variance in
environmental conditions and
issues related to sampling), concept variance (e.g., the aims of
analytics do not present
similarly) and multimodality (e.g., the complexity and noise
introduced with patient
health records from multiple sensors include numerical, textual,
and image data). For
instance, most of the attribute values relating to the timing of
big data (e.g., when events
occur/have occurred) are missing due to noise and
incompleteness. Furthermore, the
number of missing links between data points in social networks
is approximately 80% to
90% and the number of missing attribute values within patient
reports transcribed from
doctor diagnoses are more than 90% [31]. Based on IBM
research in 2014, industry ana-
lysts believe that, by 2015, 80% of the world’s data will be
uncertain [32].
Various forms of uncertainty exist in big data and big data
analytics that may nega-
tively impact the effectiveness and accuracy of the results. For
example, if training
data is biased in any way, incomplete, or obtained through
inaccurate sampling, the
learning algorithm using corrupted training data will likely
output inaccurate results.
Therefore, it is critical to augment big data analytic techniques
to handle uncertainty.
Recently, meta-analysis studies that integrate uncertainty and
learning from data
have seen a sharp increase [33–35]. The handling of the
uncertainty embedded in the
entire process of data analytics has a significant effect on the
performance of learning
from big data [16]. Other research also points to two further features of big data: multimodality (very complex types of data) and the fact that modeling and measuring uncertainty for big data is remarkably different than for small-size data. There is also a positive correlation between the size of a dataset and the uncertainty of the data itself and of data processing [34].
example, fuzzy sets may
be applied to model uncertainty in big data to combat vague or
incorrect information
[36]. Moreover, and because the data may contain hidden
relationships, the uncer-
tainty is further increased.
Therefore, it is not an easy task to evaluate uncertainty in big
data, especially when
the data may have been collected in a manner that creates bias.
To combat the many
types of uncertainty that exist, many theories and techniques
have been developed to
model its various forms. We next describe several common
techniques.
Bayesian theory assumes a subjective interpretation of the
probability based on past
event/prior knowledge. In this interpretation the probability is
defined as an expres-
sion of a rational agent’s degrees of belief about uncertain
propositions [37]. Belief
function theory is a framework for aggregating imperfect data
through an informa-
tion fusion process when under uncertainty [38]. Probability
theory incorporates
randomness and generally deals with the statistical
characteristics of the input data
[34]. Classification entropy measures ambiguity between classes
to provide an index
of confidence when classifying. Entropy varies on a scale from
zero to one, where val-
ues closer to zero indicate more complete classification in a
single class, while values
closer to one indicate membership among several different
classes [39]. Fuzziness is
used to measure uncertainty in classes, notably in human
language (e.g., good and
bad) [16, 33, 40]. Fuzzy logic then handles the uncertainty
associated with human
perception by creating an approximate reasoning mechanism
[41, 42]. The method-
ology was intended to imitate human reasoning to better handle
uncertainty in the
real world [43]. Shannon’s entropy quantifies the amount of
information in a variable
to determine the amount of missing information on average in a
random source [44,
45]. The concept of entropy in statistics was introduced into the
theory of communi-
cation and transmission of information by Shannon [46].
Shannon entropy provides
a method of information quantification when it is not possible
to measure crite-
ria weights using a decision–maker. Rough set theory provides a
mathematical tool
for reasoning on vague, uncertain or incomplete information.
With the rough set
approach, concepts are described by two approximations (upper
and lower) instead of
one precise concept [47], making such methods invaluable to
dealing with uncertain
information systems [48]. Probabilistic theory and Shannon’s
entropy are often used
to model imprecise, incomplete, and inaccurate data. Moreover,
fuzzy set and rough
theory are used for modeling vague or ambiguous data [49], as
shown in Fig. 2.
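As a concrete illustration of two of these measures, the following sketch computes Shannon's entropy of a predicted class distribution and normalizes it to the [0, 1] scale described above for classification entropy; the probability vectors are invented:

```python
import numpy as np

def shannon_entropy(p):
    """Entropy in bits of a discrete distribution p (zero entries are ignored)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# A confident classification vs. an ambiguous one (invented class probabilities)
confident = [0.97, 0.02, 0.01]   # near-complete membership in one class
ambiguous = [0.40, 0.35, 0.25]   # membership spread among several classes

# Normalized to [0, 1] by dividing by log2(number of classes)
for p in (confident, ambiguous):
    print(shannon_entropy(p) / np.log2(len(p)))
```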
Evaluating the level of uncertainty is a critical step in big data
analytics. Although
a variety of techniques exist to analyze big data, the accuracy of
the analysis may be
negatively affected if uncertainty in the data or the technique
itself is ignored. Uncer-
tainty models such as probability theory, fuzziness, rough set
theory, etc. can be used
to augment big data analytic techniques to provide more
accurate and more mean-
ingful results. Based on the previous research, Bayesian model
and fuzzy set theory
are common for modeling uncertainty and decision-making.
Table 1 compares and
summarizes the techniques we have identified as relevant,
including a comparison
between different uncertainty strategies, focusing on
probabilistic theory, Shannon’s
entropy, fuzzy set theory, and rough set theory.
Big data analytics
Big data analytics describe the process of analyzing massive
datasets to discover pat-
terns, unknown correlations, market trends, user preferences,
and other valuable
information that previously could not be analyzed with
traditional tools [52]. With
the formalization of the big data’s five V characteristics,
analysis techniques needed
to be reevaluated to overcome their limitations on processing in
terms of time and
space [29]. Opportunities for utilizing big data are growing in the modern world of digital data. The global annual growth rate of big data technologies and services is predicted to increase about 36% between 2014 and 2019, with the global income for big data and business analytics anticipated to increase more than 60% [53].

Fig. 2 Measuring uncertainty in big data: probability theory and Shannon's entropy address imprecise, inaccurate, and incomplete data; fuzzy set theory and rough set theory address vague or ambiguous data

Table 1 Comparison of uncertainty strategies
- Probability theory / Bayesian theory / Shannon's entropy: powerful for handling randomness and subjective uncertainty where precision is required; capable of handling complex data [50].
- Fuzziness: handles vague and imprecise information in systems that are difficult to model; precision not guaranteed; easy to implement and interpret [50].
- Belief function: handles situations with some degree of ignorance; combines distinct evidence from several sources to compute the probability of specific hypotheses; considers all evidence available for the hypothesis; ideal for incomplete and highly complex data; mathematically complex, but improves uncertainty reduction [50].
- Rough set theory: provides an objective form of analysis [47]; deals with vagueness in data; needs only minimal information to determine set membership; uses only the information presented within the given data [51].
- Classification entropy: handles ambiguity between the classes [39].
Several advanced data analysis techniques (i.e., ML, data mining, NLP, and CI) and potential strategies such as parallelization, divide-and-conquer, incremental learning, sampling, granular computing, feature selection [16], and instance selection [34] can convert big problems into smaller ones and can be used to make better decisions, reduce costs, and enable more efficient processing.
With respect to big data analytics, parallelization reduces computation time by splitting a large problem into smaller instances of itself and performing the smaller tasks simultaneously (e.g., distributing the smaller tasks across multiple threads, cores, or processors). Parallelization does not decrease the amount of work performed but rather reduces computation time, as the small tasks are completed at the same point in time instead of sequentially, one after another [16].
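A minimal Python sketch of this idea, using the standard-library process pool; the chunking scheme and the per-chunk computation are our own illustrative stand-ins:

    from concurrent.futures import ProcessPoolExecutor

    def process_chunk(chunk):
        # Stand-in for any per-partition analytic, e.g. a partial aggregate.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = [data[i:i + 250_000] for i in range(0, len(data), 250_000)]
        with ProcessPoolExecutor() as pool:
            partial_sums = list(pool.map(process_chunk, chunks))
        print(sum(partial_sums))  # the same total work, finished concurrently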
The divide-and-conquer strategy plays an important role in processing big data. Divide-and-conquer consists of three phases: (1) reduce one large problem into several smaller problems, (2) solve the smaller problems, where the solution of each small problem contributes to the solution of the large problem, and (3) combine the solutions of the smaller problems into one large solution such that the large problem is considered solved. For many years the divide-and-conquer strategy has been used on massive databases to manipulate records in groups rather than all the data at once [54].
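The three phases map directly onto a recursive function. The sketch below uses summation as a stand-in base computation (our choice, purely illustrative):

    def solve(problem, threshold=1_000):
        if len(problem) <= threshold:   # phase 2: small enough to solve directly
            return sum(problem)
        mid = len(problem) // 2         # phase 1: reduce into smaller problems
        # phase 3: combine the sub-solutions into one large solution
        return solve(problem[:mid], threshold) + solve(problem[mid:], threshold)

    print(solve(list(range(10_000))))   # 49995000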
Incremental learning refers to learning algorithms, popular with streaming data, that are trained continuously on newly arriving data rather than being trained only once on existing data. Incremental learning adjusts the parameters of the learning algorithm over time according to each new input, and each input is used for training only once [16].
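scikit-learn exposes this pattern through partial_fit; a minimal sketch follows, in which the synthetic stream, the batch size, and the labeling rule are our own assumptions:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    model = SGDClassifier()        # a linear model that supports incremental updates
    classes = np.array([0, 1])     # all classes must be declared on the first call

    rng = np.random.default_rng(0)
    for _ in range(100):           # each iteration simulates a newly arrived batch
        X_batch = rng.normal(size=(32, 5))
        y_batch = (X_batch[:, 0] > 0).astype(int)
        model.partial_fit(X_batch, y_batch, classes=classes)  # each input trains once

    print(model.predict(rng.normal(size=(3, 5))))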
Sampling can be used as a data reduction method in big data analytics, deriving patterns from large data sets by choosing, manipulating, and analyzing a subset of the data [16, 55]. Some research indicates that obtaining effective results from sampling depends on the sampling criteria used [56].
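Reservoir sampling is one classic criterion suited to big data, since it draws a uniform sample from a stream whose length is not known in advance. A minimal sketch:

    import random

    def reservoir_sample(stream, k):
        # Keeps a uniform random sample of k items from a stream of unknown length.
        sample = []
        for i, item in enumerate(stream):
            if i < k:
                sample.append(item)
            else:
                j = random.randrange(i + 1)  # replacement probability shrinks over time
                if j < k:
                    sample[j] = item
        return sample

    print(reservoir_sample(range(1_000_000), 5))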
Granular computing groups elements from a large space into subsets, or granules, to simplify them [57, 58]. Granular computing is an effective approach to defining the uncertainty of objects in the search space, as it reduces large objects to a smaller search space [59].
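A simple form of granulation is binning continuous values into a handful of intervals and reasoning over the granule summaries instead of the raw points. The boundaries below are our own illustrative choice:

    import numpy as np

    values = np.random.default_rng(1).uniform(0, 100, size=10_000)
    edges = np.array([25, 50, 75])         # granule boundaries
    granules = np.digitize(values, edges)  # assign every point to one of 4 granules

    # Analyze 4 granule summaries instead of 10,000 raw points.
    for g in range(4):
        members = values[granules == g]
        print(g, len(members), round(members.mean(), 2))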
Feature selection is a conventional approach to handling big data with the purpose of choosing a subset of relevant features for an aggregated but more precise data representation [60, 61]. Feature selection is a very useful strategy in data mining for preparing large-scale data [60].
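One widely used off-the-shelf route is scikit-learn's univariate selection. A minimal sketch, where the dataset sizes and the choice of k are our own assumptions:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = make_classification(n_samples=500, n_features=50,
                               n_informative=5, random_state=0)
    selector = SelectKBest(score_func=f_classif, k=5)  # keep the 5 most relevant features
    X_reduced = selector.fit_transform(X, y)
    print(X.shape, "->", X_reduced.shape)              # (500, 50) -> (500, 5)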
Instance selection is practical in many ML or data mining tasks as a major step in data pre-processing. By utilizing instance selection, it is possible to reduce training sets and runtime in the classification or training phases [62].
The costs of uncertainty (both monetary and computational) and the challenges of generating effective models for uncertainties in big data analytics have become key to obtaining robust and performant systems. As such, we examine several open issues concerning the impact of uncertainty on big data analytics in the next section.
Uncertainty perspective of big data analytics

This section examines the impact of uncertainty on three AI techniques for big data analytics. Specifically, we focus on ML, NLP, and CI, although many other analytics techniques exist. For each presented technique, we examine the inherent uncertainties and discuss methods and strategies for their mitigation.
Machine learning and big data

When dealing with data analytics, ML is generally used to create models for prediction and knowledge discovery to enable data-driven decision-making. Traditional ML methods are not computationally efficient or scalable enough to handle both the characteristics of big data (e.g., large volumes, high speeds, varying types, low value density, incompleteness) and uncertainty (e.g., biased training data, unexpected data types, etc.). Several commonly used advanced ML techniques proposed for big data analysis include feature learning, deep learning, transfer learning, distributed learning, and active learning.

Feature learning includes a set of techniques that enable a system to automatically discover, from raw data, the representations needed for feature detection or classification; the performance of ML algorithms is strongly influenced by the choice of data representation. Deep learning algorithms are designed for analyzing and extracting valuable knowledge from massive amounts of data collected from various sources (e.g., separate variations within an image, such as lighting, materials, and shapes) [56]; however, current deep learning models incur a high computational cost. Distributed learning can be used to mitigate the scalability problem of traditional ML by carrying out calculations on data sets distributed among several workstations to scale up the learning process [63]. Transfer learning is the ability to apply knowledge learned in one context to new contexts, effectively improving a learner in one domain by transferring information from a related domain [64]. Active learning refers to algorithms that employ adaptive data collection [65] (i.e., processes that automatically adjust parameters to collect the most useful data as quickly as possible) in order to accelerate ML activities and overcome labeling problems.

The uncertainty challenges of ML techniques can be mainly attributed to learning from data with low veracity (i.e., uncertain and incomplete data) and data with low value (i.e., unrelated to the current problem). We found that, among the ML techniques, active learning, deep learning, and fuzzy logic theory are uniquely suited to supporting the challenge of reducing uncertainty, as shown in Fig. 3. Uncertainty can impact ML in terms of incomplete or imprecise training samples, unclear classification boundaries, and rough knowledge of the target data. In some cases, the data is represented without labels, which can become a challenge. Manually labeling large data collections can be an expensive and strenuous task, yet learning from unlabeled data is very …
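Since manual labeling is the bottleneck the excerpt describes, uncertainty sampling, the simplest active learning query strategy, illustrates how a model can choose which unlabeled instances to send to a human annotator. The model choice, seed size, and least-confident criterion below are our own illustrative assumptions, not the survey's prescription:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2_000, random_state=0)
    labeled = np.zeros(len(X), dtype=bool)
    labeled[:20] = True                      # small seed set of labeled data

    model = LogisticRegression().fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[~labeled])
    uncertainty = 1 - proba.max(axis=1)      # least-confident query strategy
    query = np.argsort(uncertainty)[-10:]    # indices within the unlabeled pool
    print("send these unlabeled instances to the annotator:", query)

Labeling only the instances the current model is least sure about typically reduces the number of manual labels needed to reach a given accuracy.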
Research Paper – Data Science & Big Data Analytics
While this week's topic highlighted the uncertainty of big data, the authors identified the following areas for future research. Pick one of them for your research paper.
· Additional study must be performed on the interactions between each big data characteristic, as they do not exist separately but naturally interact in the real world.
· The scalability and efficacy of existing analytics techniques being applied to big data must be empirically examined.
· New techniques and algorithms must be developed in ML and NLP to handle the real-time needs for decisions made on the basis of enormous amounts of data.
· More work is necessary on how to efficiently model uncertainty in ML and NLP, as well as how to represent the uncertainty resulting from big data analytics.
· Since CI algorithms are able to find an approximate solution within a reasonable time, they have been used in recent years to tackle ML problems and uncertainty challenges in data analytics.
Your paper should meet the following requirements:
• Be approximately 3-5 pages in length, not including the required cover page and reference page.
• Follow APA guidelines. Your paper should include an introduction, a body with fully developed content, and a conclusion.
• Support your response with the readings from the course and at least five peer-reviewed articles or scholarly journals to support your positions, claims, and observations. The UC Library is a great place to find resources.
• Be clear, concise, and well written, using excellent grammar and style techniques. You are being graded in part on the quality of your writing.
References:

Marcu, D., & Danubianu, M. (2019). Learning analytics or educational data mining? This is the question. BRAIN: Broad Research in Artificial Intelligence & Neuroscience, 10, 1–14. http://search.ebscohost.com/login.aspx?direct=true&AuthType=shib&db=a9h&AN=139367236&site=eds-live

Hariri, R. H., Fredericks, E. M., & Bowers, K. M. (2019). Uncertainty in big data analytics: Survey, opportunities, and challenges. Journal of Big Data, 6, 44. https://doi.org/10.1186/s40537-019-0206-3
  • 8. 5/25/2020 Rubric Detail – 31228.202030 https://ucumberlands.blackboard.com/webapps/rubric/do/course/ gradeRubric?mode=grid&isPopup=true&rubricCount=1&prefix= _843783_1&course_i… 4/4 1 Learning Analytics or Educational Data Mining? This is the Question... Daniela Marcu Ștefan cel Mare University of Suceava Str. Universității 13, Suceava 720229 Phone: 0230 216 147 [email protected] Mirela Danubianu Ștefan cel Mare University of Suceava Str. Universității 13, Suceava 720229 Phone: 0230 216 147 [email protected] Abstract In full expansion, a vital area such as education could not remain indifferent to the use of information and communication technology. Over the past two decades we have witnessed the
  • 9. emergence and development of e-learning systems, the proliferation of MOOCs, and generally the rise of Technology Enhanced Education. All of these contributed to generation and storage of unprecedented volumes of data concerning all areas of learning. At the same time, domains such as data mining and big data analytics have emerged and developed. Their applications in education have spawned new areas of research such as educational data mining or learning analytics. As an interdisciplinary research area Educational Data Mining (EDM) aims to explore data from educational environment to build models based on which students' behavior and results are better understood. In fact, EDM is a complex process that consists of a few steps grouped in three stages: data preprocessing, modelling and postprocessing. It transforms raw data from educational environments in useful information that could influence in a positive way the educational process. According to Society for Learning Analytics Research (SoLAR) which took over the wording of the first International Conference on Learning Analytics and Knowledge, learning analytics is ”the measurement, collection, analysis and reporting of data about learners and their contexts for purposes of understanding and optimizing learning and the environments in which it occurs” (Siemens, 2011). This paper proposes a comparative study of the two concepts: EDM and learning analytics. Due to certain voices in the scientific environment that claim
  • 10. that the two terms refer to the same thing, we want to emphasize the similarities and differences between them, and how each one can serve to raise the quality in educational processes. Keywords : EDM; LA; Data Mining; Education. 1. Introduction The educational community has an interest in the great potential of education. Why are researchers so enthusiastic about this? The answer is simple. Seeing the impact of applying data mining to exploiting large data volumes and analyzing data from areas such as the business environment, social media, and other scientific areas, we can think of the benefits for the education system. If we could adapt the methods of finding models in the data, used for analyzing the online activity of clients and social media users for the educational environment, we could get closer evidence of reality on the activities of the training system. The widespread use of computer-based pre-university learning, the development of Web- based courses, are additional reasons for EDM and LA research. Designing educational policies based on practical evidence provided by researchers can bring benefits to the educational system.
  • 11. BRAIN – Broad Research in Artificial Intelligence and Neuroscience Volume 10, Special Issue 2 (October, 2019), ISSN 2067-3957 2 The exploitation of large volumes of data from different domains is done using specific techniques and methods. It helps to develop tools to facilitate progress in these areas. The science of extracting useful information from large volumes of data is called Data Mining (DM) (Hand, Mannila & Smyth, 2001). The concept is based on three key areas: statistics, artificial intelligence and machine learning (Figure 1). Figure 1. Data Mining Initially, DM used statistical algorithms. Specific techniques such as decision trees, association rules, clustering, artificial neural networks, and others have been developed (Șușnea, 2012). Applying exploitation methods for educational system data to build models to better understand students' behavior and outcomes is named Educational Data Mining (EDM). Since data and education issues are different from those in other areas, classical DM methods have been
  • 12. improved and supplemented with EDM specific methods (Romero & Ventura, 2007). According to some authors, there are four areas of application of EDM aimed at: improving student modeling and domain modeling, e-learning and scientific research (Baker, 2012). In order to better understand learning, data from pupils and from the educational environment is measured, collected and analyzed. This is the learning analysis and is a related field of EDM. Among the Learning Analytics (LA) methods we can list: Buckingham Shum, 2012). In the following sections we propose to detail relevant aspects about EDM and LA in order to provide viable arguments in a comparative study of the two concepts. 2. Educational Data Mining Over the past 10 years, the field of research aimed to exploit the unique types of data from education has developed quite internationally. In 2011, in Massachusetts USA, the International EDM Working Group (established in 2007) created the International Society for EDM (online: http://educationaldatamining.org/about/). Romania is, however, at a pioneering stage in EDM. There is currently a growing interest in using computers in
  • 13. learning and Web-based training. With the rapid increase in the volume of learning software resources, the Romanian educational system also accumulates huge amounts of data from students, teachers, parents, libraries, secretariats, etc. Getting the information needed to build models to improve the quality of managerial decisions becomes one of the greatest challenges of the present. Traditional research in the field of education is time-consuming and often non-ecological through the waste of material resources. Developing an experimental study, such as combating school absenteeism, involves firstly the selection of schools, teachers and pupils. It follows the definition of strategies that lead to the identification of sources of school stress, increasing the D. Marcu, M. Danubianu - Learning Analytics or Educational Data Mining? This is the Question... 3 motivation of students to attend classes, trust in school, family, and so on. However, the studies depend on context, class, geography, economic development, teacher-student relationships. Changing any parameter can lead to very different conclusions. Soon there may be new factors that could not be taken into consideration earlier in the demotivation of students towards school. Making traditional new studies for this topic involves the use of important temporal resources.
  • 14. By comparison, EDM proves to be more efficient. The analysis of existing data in the educational system through the use of specific EDM methods allows the identification of new models for new contexts. An enormous advantage is that the same methods can be applied to different data generating specific results without the need for new analysis strategies. More specifically, let's take the example of a course designed for web-based training (Romero, Ventura, De Bra, 2004). Traditionally, evaluating the effectiveness of a course is done by analyzing the results obtained by the student upon completion of the course, which does not necessarily lead to the improvement of the material or methods and teaching tools used for the future course versions. In fact, in the Romanian pre-university system, the updating of educational programs and educational resources does not present the periodicity expected by the society. What would it be like the knowledge of EDM data exploitation? EDM methods aim at discovering correlation rules between course components (content, questions, various activities) and student activities. In the Knowledge Discovery with Genetic Programming for providing feedback to the courseware author, C. Romero, S. Ventura and P. Bra describe the four main steps in building a software based on EDM (Romero, Ventura, De Bra, 2004): development, use, discovering knowledge, improving Other classification has three stages: preprocessing, data exploitation and post processing
  • 15. [3]. The cycle of these steps is illustrated in Figure 2. Figure 2. Stages of the process of converting data into information If we refer again to the analysis of the efficiency of a course, in the first stage, the preprocessing is performed various operations such as: formation on pedagogical and methodological aspects time spent in the course, the sections visited, the scores obtained and other interactions appropriate for processing. In the next step, EDM-specific algorithms are applied to obtain different correlation rules. The models will provide information in different formats for analysis: numerical results of the coefficients, tables, diagrams, correlation matrices (an example is illustrated in Appendix 1 - Correlation matrix obtained with the DataLab application based on the results of the Olympiad of computer science). One of the most important rules for discovering knowledge is if-else. Several such rules can
  • 16. be defined in EDM: Association, Classification and Prediction (Klosgen & Zytkow, 2002). BRAIN – Broad Research in Artificial Intelligence and Neuroscience Volume 10, Special Issue 2 (October, 2019), ISSN 2067-3957 4 The teacher will analyze the results of the analyzes and study the degree of achievement of the initial goals. Depending on the conclusions, it may take the decision to improve the course and resume its evaluation process. This may prove to be a difficult process because opinions can differ significantly from one teacher to another in relation to the material and the way of interaction with the student the course offers. 3. Methods of data exploitation There are currently a wide variety of methods of exploiting data in the education system. These can be categorized into two broad categories according to the ways to achieve the objectives: ification, Regression, Outlier Detecting Discovery of data for human judgment (Sasu, 2014).
  • 17. Many of these are general DM methods: prediction, classification, grouping, exploitation of texts and others. But there are also specific EDM methods such as nonnegative matrix factorization and Knowledge tracing (KT) (Romero & Ventura, 2012). Here are some of these: Prediction The method can be used in education to predict students' behavior and outcomes. It is based on the creation of predictive models. In the training phase, they learn to make predictions about a set of variables called predictors by analyzing them in combination with other variables. Once the enrollment phase is completed, the patterns can be applied to the data sets for which the prediction is to be applied. It is known the study by Baker, Gowda, Corbett - Automatically detecting the student's preparation for future learning: help use is key (Baker, Gowda & Corbett, 2011). The authors create a tool for automatically predicting a student's future performance on the basis of establishing positive or negative correlations between various features such as: student test results, time spent in response, time elapsed between receiving a clue and typing the answer, and others. It is experienced on a group of students, and then applied to another group. The results are then compared to those obtained using the Bayesian Knowledge Tracing (BKT) model. Classification
  • 18. The method involves building a predictive model. The data in the training set is characterized by certain attributes. The model must identify belonging to a class based on the set of attributes. Suppose we built an educational software as an interactive game for a given theme. Based on user attributes such as age, gender, geographic area, duration until the game is completed, number of attempts we can build a classifier, and determine the user's belonging to a specific class. The model will learn to identify students. The analyzes can provide information on the need to use this educational method for certain age groups, interests and education. Methods that use the classification are: decision trees, neural networks, bayesian classifications, and others. Clustering The method involves building patterns that identify data clustering after certain similarities. For the model to provide quality predictions, the similarities inside class must be maximized and similarities between classes minimized. The use of this method in Romanian high school education could aim at grouping pupils according to the pupil's learning style (auditory, visual, practical - kinesthesis) based on the analysis of behavior in relation to certain educational products and pupils' characteristics. The prediction of such a model could lead to an effective recommendation of how
  • 19. to learn educational content. Thus, the instructional process could be carried out efficiently in relation to the learning particularities of each student. At present, there is an attempt to unfold the lessons in a way appropriate to the D. Marcu, M. Danubianu - Learning Analytics or Educational Data Mining? This is the Question... 5 students' learning styles, but the reality is that identifying learning styles is superficial. The results of the questionnaires are attached to the class catalog, but this does not lead, in most cases, to the improve teaching methods and techniques used in the lesson. In the absence of clear alternatives, the teacher has to improvise. The method is successfully used in the detection of plagiarism (Text Mining) and is also applied in the educational sphere. Outlier Detection The method involves creating patterns that detect data that have different features than others. In Romanian education, this method could be used to detect students with content assimilation problems, or those with aberrant behavior. In general, not only one EDM method is used in case studies. Outlier Detection methods can
  • 20. be used, for example, with data clustering techniques and decision tree classification as presented in the study by Ajith, Sai and Tejaswi (2013) - Evaluation of student performance: an outlier detection perspective (Ajith, Sai & Tejaswi, 2013). The study aims to identify learners with special learning needs to reduce the school failure rate. Input data are collected from: participation in student lessons, tests, notes on initial tests. In order to achieve the proposed objective, they try to find models for classifying students who will be helpful in setting up study groups. At present, in Romania, students in the high school education of state do not have the opportunity to trace the course matter in other groups than the classes they belong to. Moreover, pupils diagnosed as having special educational needs participate in classes with other colleagues. The teachers create for them specially programs. Then the courses are held by under the guidance of a single teacher who does not have any pedagogical and methodical experience related to the learning situation! There are special requirements for conducting the educational process. This based on grouping students within the same educational space within the same timeframe to go through different course materials. In the absence of a proper classification, alternative methods and means, and teachers with such experience, things happen more or less in a manner that leads to the best results. Discovery with Models Discovery with Models is the fifth category presented in Baker's
  • 21. Taxonomy (Baker, 2012). It is also one of the most widely used methods of data exploitation in the field of education. It is based on the use of a previously validated model as a component in analyzes that use prediction or exploitation of relationships in new contexts (Baker & Yacef, 2009). In this way information on educational materials that contribute most to educational progress can be obtained. A study carried out by Beck and Mostow in 2008 - How who should practice: Using learning decomposition to evaluate the efficacy of different types of practice for different types of students (Beck & Mostow, 2008) - on the analysis of different types of learners demonstrates that the method supports identifying relationships between student behavior and characteristics of variables used. Nonnegative Matrix Factorization (or Decomposition) There are several algorithms used for factoring the nonnegative matrix. This transforms (decomposes, factorizes) a matrix V into two W and H matrices with the property that they all have non-negative elements. This is very useful in applications such as determining the effectiveness of an evaluation system in which matrices contain elements related to: exams, abilities, and items. Matrix V is obtained from the product of the two smaller matrices as can be seen in Figure 3. ("Non-negative matrix factorization", 2019).
• 22. Figure 3. Illustration of approximate non-negative matrix factorization. Source: wikipedia.org
We propose to study the evaluation of two specific skills for 4 work requirements (items). Matrix W has 4 rows, one per item, and 2 columns, one per skill; a value of 1 in W indicates that the item requires that skill. Matrix H has 2 rows representing the two skills and 6 columns representing the assessed students. The result is recorded in matrix V, with 4 rows for the 4 items and 6 columns for the 6 students (Figure 4) (Desmarais, 2012).
• 23. W (items × skills):
[0 1]
[1 0]
[1 0]
[1 1]
H (skills × students):
[1 1 1 0 1 1]
[0 0 1 1 0 0]
• 24. V = W × H (items × students):
[0 0 1 1 0 0]
[1 1 1 0 1 1]
[1 1 1 0 1 1]
[1 1 2 1 1 1]
Figure 4. Non-negative matrix factorization - example
The first item requires skill 2 (W[1][2] = 1). Only students 3 and 4 have skill 2, so item 1 will not be passed by students 1, 2, 5 and 6. Item 4 requires both skills; only one of the six students (student 3, with V[4][3] = 2) has both, so only that student will pass this item with the maximum score. Using computerized analysis methods, such interpretations can be obtained in a much shorter time and with high accuracy, because machines are faster and more precise than humans.
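The example can be checked in a few lines of Python. The sketch below reproduces the matrices of Figure 4 with NumPy and then factors V back with scikit-learn's NMF, one of the several available algorithms mentioned above; since the factorization is only approximate and not unique, the recovered factors need not equal the original W and H.

```python
# Reproduces the Figure 4 example: V (items x students) = W (items x skills) x H (skills x students).
import numpy as np
from sklearn.decomposition import NMF

W = np.array([[0, 1],
              [1, 0],
              [1, 0],
              [1, 1]])              # 1 = the item requires that skill
H = np.array([[1, 1, 1, 0, 1, 1],
              [0, 0, 1, 1, 0, 0]])  # 1 = the student has that skill

V = W @ H
print(V)
# [[0 0 1 1 0 0]
#  [1 1 1 0 1 1]
#  [1 1 1 0 1 1]
#  [1 1 2 1 1 1]]  -> only student 3 scores 2 on item 4 (both skills)

# Going the other way: approximately factor V into two rank-2
# non-negative matrices. NMF is not unique, so the recovered
# factors need not match the original W and H exactly.
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
W_hat = model.fit_transform(V.astype(float))
H_hat = model.components_
print(np.round(W_hat @ H_hat, 2))   # close to V
```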
• 25. 4. Learning Analytics (LA)
Learning is the product of an interaction between learners and the learning environment, and among students, educators, teachers and others (Elias & Lias, 2011). The evaluation of learning, in the traditional sense, is based on evaluating student outcomes. This involves assessing knowledge, but also trying to answer questions such as: what does this student need, how can the course be improved, how should the course interface be changed to make it more accessible? At present, especially in the pre-university system, learning evaluation is based on questionnaires. Obtaining feedback is slow, because non-automatic data processing takes time and the analysis possibilities are quite limited. The desire to improve the quality of learning and assessment in the educational system is increasing at the international level, and in our country as well. Traditional systems are confronted with huge amounts of highly diverse data. Learning Analytics (LA) attempts to answer questions about how these data can be used, and how they can be transformed and analyzed to provide useful information that adds value to the learning process (Liu & Fan, 2014). In 2011, at the first International Conference on Learning Analytics and Knowledge (LAK 2011), the definition of the new research area was adopted: "learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs" (Siemens, 2011). Data analytics was first used in sales, where it is also called Business Intelligence. This branch of research uses computing techniques to synthesize huge amounts of data and turn them into powerful tools for making the best marketing decisions. With the development of Web technologies, a branch of data analysis research, Web Analytics, has emerged. Web Analytics tools collect data about the users of a site and report on their behavior. This leads to a better understanding of customers and making
• 26. the best decisions for improving the browsing experience and keeping visitors on the site. Learning Analytics borrows tools and methods used in Business Intelligence and Web Analytics to analyze educational data. At present, many universities, companies, and organizations are developing learning platforms both for students and for lifelong learning. An enormous advantage of these platforms is that they can personalize the learning experience and adapt it to the needs, including the physical impairments, of learners. Research conducted by the New Media Consortium and the EDUCAUSE Learning Initiative in 2016 identified areas that will have a particular impact on university education globally by 2020; one of these is Learning Analytics. In the research report, LA is defined as an application of Web Analytics to the educational field, focused on the collection and detailed analysis of student interactions with online learning platforms (Johnson, Adams Becker & Cummins, 2016). A free example of a Web Analytics tool is provided by Google and is called Google Analytics. It provides sophisticated analyses of user behavior on a website and gives administrators reports about, for example:
• 27. how many visitors reach the site and how many of them are new visitors; the content they prefer; and the devices they use. With these reports, administrators can create additional features, add more interesting content, enhance interactivity, and customize the application interface according to the devices used for viewing. The following figures (5, 6, 7) illustrate sections of various reports provided by this tool for the site https://www.modinfo.ro, a site dedicated to preparing Romanian high school students in computer science. Figure 5 gives a diagram of the number of visitors per page of the site. We note that students are looking for baccalaureate content (bac.php), faculty admission (admission.php) and additional training for performance (cex.php). Figure 5. User preferred content. Figure 6 represents the percentage of visitors to the site over a fixed period, by age category.
• 28. It can be seen that most users are aged between 25 and 34 years. For administrators, given the period under review, this reveals the users' preoccupation with preparing for the computer programming exam. Figure 6. Demographics and interest categories - Age of users. Figure 7 provides information for analyzing the active presence of a specific user on the site within a selected time interval. Figure 7. Behavior of a user on the site within a selected time range. Choosing how to use and how to build analytics tools starts from the choice of quantifiable indicators, which have to be defined according to the proposed objectives. Examples of such indicators for the educational environment: the number of course accesses, the time spent on learning materials, the frequency of use of each
• 29. tool within the course, and others; a short code sketch below illustrates computing such indicators. 4.1. Learning Analytics methods
Methods used for learning analytics include, for example, the analysis of students' written contributions, in which the quality of expression is analyzed, and the analysis of activity in relation to learning: students interested in the topic will ask questions and access links to supplementary resources, which indicates motivated learning. LA uses some of the same data mining methods as EDM. They can be classified into: Prediction, Clustering, Relationship mining, Discovery with models, and Distillation of Data for Human Judgment (Nunn, Avella, Kanai, & Kebritchi, 2016). We will briefly describe the methods that have not already been presented in the previous section.
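As a toy illustration of such quantifiable indicators, the following sketch (Python with pandas) computes total study time and resource coverage per student from a hypothetical interaction log; the column names and values are invented.

```python
# Sketch: computing two simple LA indicators from a hypothetical
# interaction log. Columns and values are invented for illustration.
import pandas as pd

log = pd.DataFrame({
    "student":  ["ana", "ana", "dan", "dan", "ioana"],
    "resource": ["lesson1", "quiz1", "lesson1", "lesson2", "lesson1"],
    "minutes":  [12, 7, 25, 14, 3],
})

# Indicator 1: total study time per student.
time_per_student = log.groupby("student")["minutes"].sum()

# Indicator 2: how many distinct resources each student accessed.
coverage = log.groupby("student")["resource"].nunique()

print(pd.DataFrame({"total_minutes": time_per_student,
                    "resources_used": coverage}))
```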
• 30. Relationship mining
This method uses algorithms that find association rules in order to detect, for example, mistakes made by students when solving a set of exercises. Based on the associations found, one can predict a certain behavior of the student depending on the solution approach from which he or she starts. Thus, the teacher or course manager can intervene so that the pupil or student avoids the mistake. One can find, for example, relationships between a student's other activities while solving work tasks (playing on the computer, talking to a colleague in a chat room) and erroneous answers (Baker, Corbett, Koedinger & Wagner, 2004). Distillation of Data for Human Judgment
This method comprises statistics and visualization techniques that help people understand data analytics. It is the basis for many useful tools that provide clear analyses which can be quickly understood by non-specialist users. An example is building a map that groups learners by the amount of heat emanating from their bodies while they study the instructional material, which can be measured with body-mounted sensors. The analysis provides real-time information about learning performance indicators (Merceron, 2015).
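A minimal sketch of this kind of relationship mining, using the apriori implementation from the third-party mlxtend library on an invented table of problem-solving sessions, might look as follows; the events, data, and thresholds are all assumptions for illustration.

```python
# Sketch of relationship (association rule) mining with mlxtend.
# Each row is one problem-solving session; True marks that the event
# was observed in that session. Events and data are hypothetical.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

sessions = pd.DataFrame({
    "chatted_during_task": [True,  True,  False, True,  False, False],
    "played_game_first":   [True,  False, False, True,  False, True ],
    "wrong_answer":        [True,  True,  False, True,  False, True ],
    "asked_for_hint":      [False, True,  True,  False, True,  False],
})

# Frequent event sets occurring in at least half of the sessions.
frequent = apriori(sessions, min_support=0.5, use_colnames=True)

# Rules such as {chatted_during_task} -> {wrong_answer}.
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```

On real course data, rules like these would only flag candidate relationships for the teacher to inspect, not proven causes of errors.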
• 31. 5. Learning Analytics or Educational Data Mining?
Educational Data Mining is a new field of research. It is based on the models, methods and algorithms built for DM, although there are also methods specific to applying DM in education. The main purpose of EDM is to explore large sets of data from the educational system in order to create knowledge-extraction models from the data. The main objective is to provide education decision makers with useful information about existing correlations between sets of data, giving a deeper understanding of the educational needs of students and of the system as a whole (de Almeida Neto & Castro, 2017). Learning Analytics is a newer field of research. It is based on the data analysis techniques of Business Intelligence. LA uses highly sophisticated analysis tools and predictive models to improve learning. Most applications using LA have been created for the university system and are dedicated to the early detection of concrete problems, such as the risk that certain students will abandon a course. LA also uses the expertise of other research areas, such as EDM and Web Analytics, with the same objectives of predicting learning outcomes and providing useful information for improving the quality of the learning process (Elias & Lias, 2011). EDM is at the intersection of areas such as artificial intelligence, machine learning, education, and statistics. Figure 8 shows LA as an interdisciplinary subdomain of
• 32. Business Intelligence, Statistics and Education. Figure 8. Educational Data Mining and Learning Analytics. The two new areas of research are quite similar in terms of the aims pursued and the methods used, but there are also some significant differences between them. Some of the most important resemblances and differences between EDM and LA are shown in Tables 1 and 2.
Table 1. Similarities between EDM and LA
Both areas contribute to improving the quality of education and of education policies in schools and universities, as well as in alternative education systems.
EDM: It is a new field of research; in 2011, in Massachusetts, USA, the International Working Group on EDM (established in 2007) created the International Society for EDM. | LA: It is a new field of research; its definition was adopted in 2011 at the first International Conference
• 33. on Learning Analytics and Knowledge (LAK 2011).
EDM: It is based on the exploitation of large data collections. | LA: It is based on the analysis of large data collections.
EDM: It is based on the formulation of specific research …

Uncertainty in big data analytics: survey, opportunities, and challenges
Reihaneh H. Hariri, Erik M. Fredericks and Kate M. Bowers
Hariri et al. J Big Data (2019) 6:44, https://doi.org/10.1186/s40537-019-0206-3

Abstract
• 34. Big data analytics has gained wide attention from both academia and industry as the demand for understanding trends in massive datasets increases. Recent developments in sensor networks, cyber-physical systems, and the ubiquity of the Internet of Things (IoT) have increased the collection of data (including health care, social media, smart cities, agriculture, finance, education, and more) to an enormous scale. However, the data collected from sensors, social media, financial records, etc. is inherently uncertain due to noise, incompleteness, and inconsistency. The analysis of such massive amounts of data requires advanced analytical techniques for efficiently reviewing and/or predicting future courses of action with high precision and advanced decision-making strategies. As the amount, variety, and speed of data increases, so too does the uncertainty inherent within, leading to a lack of confidence in the resulting analytics process and decisions made thereof. In comparison to traditional data techniques and platforms, artificial intelligence techniques (including machine learning, natural language processing, and computational intelligence) provide more accurate, faster, and scalable results in big data analytics. Previous research and surveys conducted on big data analytics tend to focus on one or two techniques or specific application domains. However, little work has been done in the field of uncertainty when applied to big data analytics as well as in the artificial intelligence techniques applied to the datasets. This article reviews previous work in big data analytics and presents a discussion of open
• 35. challenges and future directions for recognizing and mitigating uncertainty in this domain.
Keywords: Big data, Uncertainty, Big data analytics, Artificial intelligence

Introduction
According to the National Security Agency, the Internet processes 1826 petabytes (PB) of data per day [1]. In 2018, the amount of data produced every day was 2.5 quintillion bytes [2]. Previously, the International Data Corporation (IDC) estimated that the amount of generated data would double every 2 years [3]; however, 90% of all data in the world was generated over the last 2 years, and moreover Google now processes more than 40,000 searches every second, or 3.5 billion searches per day [2]. Facebook users upload 300 million photos, 510,000 comments, and 293,000 status updates per day [2, 4]. Needless to say, the amount of data generated on a daily basis is staggering. As a result, techniques are required to analyze and understand this massive amount of data, as it is a great source from which to derive useful information.
• 36. Advanced data analysis techniques can be used to transform big data into smart data for the purposes of obtaining critical information regarding large datasets [5, 6]. As such, smart data provides actionable information and improves decision-making capabilities for organizations and companies. For example, in the field of health care, analytics performed upon big datasets (provided by applications such as Electronic Health Records and Clinical Decision Systems) may enable health care practitioners to deliver effective and affordable solutions for patients by examining trends in the overall history of the patient, in comparison to relying on evidence provided with strictly localized or current data. Big data analysis is difficult to perform using traditional data analytics [7], as they can lose effectiveness due to the five V's characteristics of big data: high volume, low veracity, high velocity, high variety, and high value [7–9]. Moreover, many other characteristics exist for big data, such as variability, viscosity, validity, and viability [10]. Several artificial intelligence (AI) techniques, such as machine learning (ML), natural language processing (NLP), computational intelligence (CI), and data mining, were designed to provide big data analytic solutions, as they can be faster, more accurate, and more precise for massive volumes of data [8]. The aim of these advanced analytic techniques is to discover information, hidden patterns, and unknown correlations in massive datasets
• 37. [7]. For instance, a detailed analysis of historical patient data could lead to the detection of destructive disease at an early stage, thereby enabling either a cure or a more optimal treatment plan [11, 12]. Additionally, risky business decisions (e.g., entering a new market or launching a new product) can profit from simulations that have better decision-making skills [13]. While big data analytics using AI holds a lot of promise, a wide range of challenges are introduced when such techniques are subjected to uncertainty. For instance, each of the V characteristics introduces numerous sources of uncertainty, such as unstructured, incomplete, or noisy data. Furthermore, uncertainty can be embedded in the entire analytics process (e.g., collecting, organizing, and analyzing big data). For example, dealing with incomplete and imprecise information is a critical challenge for most data mining and ML techniques. In addition, an ML algorithm may not obtain the optimal result if the training data is biased in any way [14, 15]. Wang et al. [16] introduced six main challenges in big data analytics, including uncertainty. They focus mainly on how uncertainty impacts the performance of learning from big data, whereas a separate concern lies in mitigating the uncertainty inherent within a massive dataset. These challenges are normally present in data mining and ML techniques. Scaling these concerns up to the big data level will effectively compound any errors or shortcomings of the entire analytics process.
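The effect of biased training data mentioned above can be illustrated with a small synthetic experiment (scikit-learn; the dataset, bias level, and model are arbitrary choices, and exact scores will vary from run to run).

```python
# Toy illustration: a classifier trained on a biased sample (one class
# heavily under-represented) versus the sample as drawn. Synthetic data;
# the accuracy gap, not the exact numbers, is the point.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unbiased: train on the sample as drawn.
fair = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Biased: keep only ~5% of class-1 training examples.
keep = (y_train == 0) | (np.random.RandomState(0).rand(len(y_train)) < 0.05)
biased = LogisticRegression(max_iter=1000).fit(X_train[keep], y_train[keep])

print("unbiased accuracy:", fair.score(X_test, y_test))
print("biased accuracy:  ", biased.score(X_test, y_test))
```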
• 38. Therefore, mitigating uncertainty in big data analytics must be at the forefront of any automated technique, as uncertainty can have a significant influence on the accuracy of its results. Based on our examination of existing research, little work has been done in terms of how uncertainty significantly impacts the confluence of big data and the analytics techniques in use. To address this shortcoming, this article presents an overview of the existing AI techniques for big data analytics, including ML, NLP, and CI from the perspective of uncertainty challenges, as well as suitable directions for future research in these domains. The contributions of this work are as follows. First, we consider uncertainty challenges in each of the 5 V's big data characteristics. Second, we review several techniques on big data analytics with the impact of uncertainty for each technique, and also review the impact of uncertainty on several big data analytic techniques. Third, we discuss available strategies to handle each challenge presented by uncertainty. To the best of our knowledge, this is the first article surveying uncertainty in big data analytics. The remainder of the paper is organized as follows.
• 39. "Background" section presents background information on big data, uncertainty, and big data analytics. "Uncertainty perspective of big data analytics" section considers challenges and opportunities regarding uncertainty in different AI techniques for big data analytics. "Summary of mitigation strategies" section correlates the surveyed works with their respective uncertainties. Lastly, "Discussion" section summarizes this paper and presents future directions of research.
Background
This section reviews background information on the main characteristics of big data, uncertainty, and the analytics processes that address the uncertainty inherent in big data.
Big data
In May 2011, big data was announced as the next frontier for productivity, innovation, and competition [11]. In 2018, the number of Internet users grew 7.5% from 2016 to over 3.7 billion people [2]. In 2010, over 1 zettabyte (ZB) of data was generated worldwide; this rose to 7 ZB by 2014 [17]. In 2001, the emerging characteristics of big data were defined with three V's (Volume, Velocity, and Variety) [18]. Similarly, IDC defined big data using four V's (Volume, Variety, Velocity, and Value) in 2011 [19]. In 2012, Veracity was introduced as a fifth characteristic of big data [20–22]. While many other V's exist [10], we focus on the five most common characteristics of big data, as next illustrated in
• 40. Fig. 1 (Common big data characteristics). Volume refers to the massive amount of data generated every second and applies to the size and scale of a dataset. It is impractical to define a universal threshold for big data volume (i.e., what constitutes a 'big dataset'), because the time and type of data can influence its definition [23]. Currently, datasets that reside in the exabyte (EB) or ZB ranges are generally considered as big data [8, 24]; however, challenges still exist for datasets in smaller size ranges. For example, Walmart collects 2.5 PB from over a million customers every hour [25]. Such huge volumes of data can introduce scalability and uncertainty problems (e.g., a database tool may not be able to accommodate infinitely large datasets). Many existing data analysis techniques are not designed for large-scale databases and can fall short when trying to scan and understand the data at scale [8, 15]. Variety refers to the different forms of data in a dataset, including structured data, semi-structured data, and unstructured data. Structured data (e.g., stored in a relational database) is mostly well-organized and easily sorted, but unstructured data (e.g., text and multimedia content) is random and difficult to analyze. Semi-structured data (e.g., NoSQL databases) contains tags to separate data elements [23, 26], but enforcing this structure is left to the database user. Uncertainty can manifest when converting between different data types (e.g., from unstructured
• 41. to structured data), in representing data of mixed data types, and in changes to the underlying structure of the dataset at run time. From the point of view of variety, traditional big data analytics algorithms face challenges in handling multi-modal, incomplete and noisy data. Because such techniques (e.g., data mining algorithms) are designed to consider well-formatted input data, they may not be able to deal with incomplete and/or different formats of input data [7]. This paper focuses on uncertainty with regard to big data analytics; however, uncertainty can impact the dataset itself as well. Efficiently analyzing unstructured and semi-structured data can be challenging, as the data under observation comes from heterogeneous sources with a variety of data types and representations. For example, real-world databases are negatively influenced by inconsistent, incomplete, and noisy data. Therefore, a number of data preprocessing techniques, including data cleaning, data integration, and data transformation, are used to remove noise from data [27]. Data cleaning techniques address data quality and uncertainty problems resulting from variety in big data (e.g., noise and inconsistent data). Such techniques for removing noisy objects
• 42. during the analysis process can significantly enhance the performance of data analysis. For example, data cleaning for error detection and correction is facilitated by identifying and eliminating mislabeled training samples, ideally resulting in an improvement in classification accuracy in ML [28]. Velocity comprises the speed (represented in terms of batch, near-real time, real time, and streaming) of data processing, emphasizing that the speed with which the data is processed must meet the speed with which the data is produced [8]. For example, Internet of Things (IoT) devices continuously produce large amounts of sensor data. If the device monitors medical information, any delays in processing the data and sending the results to clinicians may result in patient injury or death (e.g., a pacemaker that reports emergencies to a doctor or facility) [20]. Similarly, devices in the cyber-physical domain often rely on real-time operating systems enforcing strict timing standards on execution and, as such, may encounter problems when data provided from a big data application fails to be delivered on time.
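As a concrete, hypothetical illustration of the data cleaning step discussed above under Variety, the following pandas sketch removes duplicates, normalizes inconsistent labels, drops implausible values, and imputes missing ones; the records are invented.

```python
# Minimal data cleaning sketch: duplicates, inconsistent labels,
# implausible and missing values. All records are hypothetical.
import pandas as pd

raw = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4],
    "age":        [34, 51, 51, None, 290],          # missing and implausible values
    "diagnosis":  ["flu", "Flu", "Flu", "cold", "flu"],
})

clean = raw.drop_duplicates(subset="patient_id")                   # duplicate records
clean = clean.assign(diagnosis=clean["diagnosis"].str.lower())     # inconsistent labels
clean = clean[clean["age"].between(0, 120) | clean["age"].isna()]  # implausible ages
clean = clean.assign(age=clean["age"].fillna(clean["age"].median()))  # impute missing
print(clean)
```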
• 43. Veracity represents the quality of the data (e.g., uncertain or imprecise data). For example, IBM estimates that poor data quality costs the US economy $3.1 trillion per year [21]. Because data can be inconsistent, noisy, ambiguous, or incomplete, data veracity is categorized as good, bad, and undefined. Due to the increasingly diverse sources and variety of data, accuracy and trust become more difficult to establish in big data analytics. For example, an employee may use Twitter to share official corporate information but at other times use the same account to express personal opinions, causing problems with any techniques designed to work on the Twitter dataset. As another example, when analyzing millions of health care records to determine or detect disease trends, for instance to mitigate an outbreak that could impact many people, any ambiguities or inconsistencies in the dataset can interfere with or decrease the precision of the analytics process [21]. Value represents the context and usefulness of data for decision making, whereas the prior V's focus more on representing challenges in big data. For example, Facebook, Google, and Amazon have leveraged the value of big data via analytics in their respective products. Amazon analyzes large datasets of users and their purchases to provide product recommendations, thereby increasing sales and user participation. Google collects location data from Android users to improve location services in Google Maps. Facebook monitors users' activities to provide targeted advertising and friend recommendations. These three companies have each become massive by examining large sets of raw data and drawing and retrieving useful insight to make better business decisions [29].
• 44. Uncertainty
Generally, "uncertainty is a situation which involves unknown or imperfect information" [30]. Uncertainty exists in every phase of big data learning [7] and comes from many different sources, such as data collection (e.g., variance in environmental conditions and issues related to sampling), concept variance (e.g., the aims of analytics do not present similarly) and multimodality (e.g., the complexity and noise introduced with patient health records from multiple sensors, which include numerical, textual, and image data). For instance, most of the attribute values relating to the timing of big data (e.g., when events occur/have occurred) are missing due to noise and incompleteness. Furthermore, the number of missing links between data points in social networks is approximately 80% to 90%, and the number of missing attribute values within patient reports transcribed from doctor diagnoses is more than 90% [31]. Based on IBM research in 2014, industry analysts believe that, by 2015, 80% of the world's data will be uncertain [32].
• 45. Various forms of uncertainty exist in big data and big data analytics that may negatively impact the effectiveness and accuracy of the results. For example, if training data is biased in any way, incomplete, or obtained through inaccurate sampling, the learning algorithm using corrupted training data will likely output inaccurate results. Therefore, it is critical to augment big data analytic techniques to handle uncertainty. Recently, meta-analysis studies that integrate uncertainty and learning from data have seen a sharp increase [33–35]. The handling of the uncertainty embedded in the entire process of data analytics has a significant effect on the performance of learning
  • 46. data, especially when the data may have been collected in a manner that creates bias. To combat the many types of uncertainty that exist, many theories and techniques have been developed to model its various forms. We next describe several common techniques. Bayesian theory assumes a subjective interpretation of the probability based on past event/prior knowledge. In this interpretation the probability is defined as an expres- sion of a rational agent’s degrees of belief about uncertain propositions [37]. Belief function theory is a framework for aggregating imperfect data through an informa- tion fusion process when under uncertainty [38]. Probability theory incorporates randomness and generally deals with the statistical characteristics of the input data [34]. Classification entropy measures ambiguity between classes to provide an index of confidence when classifying. Entropy varies on a scale from zero to one, where val- ues closer to zero indicate more complete classification in a single class, while values closer to one indicate membership among several different classes [39]. Fuzziness is used to measure uncertainty in classes, notably in human language (e.g., good and bad) [16, 33, 40]. Fuzzy logic then handles the uncertainty associated with human perception by creating an approximate reasoning mechanism [41, 42]. The method- ology was intended to imitate human reasoning to better handle uncertainty in the
• 47. real world [43]. Shannon's entropy quantifies the amount of information in a variable to determine the amount of missing information, on average, in a random source [44, 45]. The concept of entropy in statistics was introduced into the theory of communication and transmission of information by Shannon [46]. Shannon entropy provides a method of information quantification when it is not possible to measure criteria weights using a decision-maker. Rough set theory provides a mathematical tool for reasoning on vague, uncertain or incomplete information. With the rough set approach, concepts are described by two approximations (upper and lower) instead of one precise concept [47], making such methods invaluable for dealing with uncertain information systems [48]. Probabilistic theory and Shannon's entropy are often used to model imprecise, incomplete, and inaccurate data, while fuzzy set and rough set theory are used for modeling vague or ambiguous data [49], as shown in Fig. 2.
Fig. 2. Measuring uncertainty in big data: probability theory and Shannon's entropy address imprecise, inaccurate, and incomplete data; fuzzy set theory and rough set theory address vague or ambiguous data.
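Shannon's entropy in particular is easy to state in code. The following minimal sketch (plain Python; the example distributions are invented) computes H(X) = -Σ p·log2(p):

```python
# Shannon's entropy H(X) = -sum(p * log2(p)) as a simple uncertainty
# measure: higher entropy means a less predictable source.
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([1.0]))              # 0.0   -> no uncertainty
print(shannon_entropy([0.5, 0.5]))         # 1.0   -> maximal for two outcomes
print(shannon_entropy([0.9, 0.05, 0.05]))  # ~0.57 -> mostly predictable
```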
• 48. Evaluating the level of uncertainty is a critical step in big data analytics. Although a variety of techniques exist to analyze big data, the accuracy of the analysis may be negatively affected if uncertainty in the data or the technique itself is ignored. Uncertainty models such as probability theory, fuzziness, rough set theory, etc. can be used to augment big data analytic techniques to provide more accurate and more meaningful results. Based on the previous research, Bayesian models and fuzzy set theory are common for modeling uncertainty and decision-making. Table 1 compares and summarizes the techniques we have identified as relevant, including a comparison between different uncertainty strategies, focusing on probabilistic theory, Shannon's entropy, fuzzy set theory, and rough set theory.
Table 1. Comparison of uncertainty strategies
Probability theory / Bayesian theory / Shannon's entropy: powerful for handling randomness and subjective uncertainty where precision is required; capable of handling complex data [50].
Fuzziness: handles vague and imprecise information in systems that are difficult to model; precision not guaranteed; easy to implement and interpret [50].
Belief function: handles situations with some degree of ignorance; combines distinct evidence from several sources to compute the probability of specific hypotheses; considers all evidence available for the hypothesis; ideal for incomplete and highly complex data; mathematically complex but improves uncertainty reduction [50].
Rough set theory: provides an objective form of analysis [47]; deals with vagueness in data; minimal information is necessary to determine set membership; only uses the information presented within the given data [51].
Classification entropy: handles ambiguity between the classes [39].
• 49. Big data analytics
Big data analytics describes the process of analyzing massive datasets to discover patterns, unknown correlations, market trends, user preferences, and other valuable information that previously could not be analyzed with traditional tools [52]. With the formalization of big data's five V characteristics, analysis techniques needed to be reevaluated to overcome their limitations on processing in terms of time and space [29]. Opportunities for utilizing big data are growing in the modern world of digital data. The global annual growth rate of big data technologies and services is predicted to increase about 36% between 2014 and 2019, with the global income for big data and business analytics anticipated to increase more than 60% [53].
• 50. Several advanced data analysis techniques (i.e., ML, data mining, NLP, and CI) and potential strategies such as parallelization, divide-and-conquer, incremental learning, sampling, granular computing, feature selection [16], and instance selection [34] can convert big problems to small problems and can be used to make better decisions, reduce costs, and enable more efficient processing. With respect to big data analytics, parallelization reduces
• 51. computation time by splitting large problems into smaller instances of itself and performing the smaller tasks simultaneously (e.g., distributing the smaller tasks across multiple threads, cores, or processors). Parallelization does not decrease the amount of work performed but rather reduces computation time, as the small tasks are completed at the same point in time instead of one after another sequentially [16]. The divide-and-conquer strategy plays an important role in processing big data. Divide-and-conquer consists of three phases: (1) reduce one large problem into several smaller problems, (2) complete the smaller problems, where the solving of each small problem contributes to the solving of the large problem, and (3) incorporate the solutions of the smaller problems into one large solution such that the large problem is considered solved. For many years the divide-and-conquer strategy has been used in very massive databases to manipulate records in groups rather than all the data at once [54]. Incremental learning is a learning algorithm popularly used with streaming data that is trained only with new data rather than only training with existing data. Incremental learning adjusts the parameters in the learning algorithm over time according to each new input, and each input is used for training only once [16].
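As an illustration of incremental learning, the following sketch trains scikit-learn's SGDClassifier one mini-batch at a time with partial_fit on synthetic streaming data; the dataset, batch size, and model choice are arbitrary assumptions.

```python
# Sketch of incremental learning: a linear classifier updated one
# mini-batch at a time, so each input is used for training only once.
# Data is synthetic, purely for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
model = SGDClassifier(random_state=0)
classes = np.unique(y)  # must be declared on the first partial_fit call

# Simulate a stream arriving in batches of 300 examples.
for start in range(0, len(X), 300):
    X_batch, y_batch = X[start:start + 300], y[start:start + 300]
    model.partial_fit(X_batch, y_batch, classes=classes)

print("accuracy on the full set:", model.score(X, y))
```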
• 52. Sampling can be used as a data reduction method for big data analytics, deriving patterns in large data sets by choosing, manipulating, and analyzing a subset of the data [16, 55]. Some research indicates that obtaining effective results using sampling depends on the data sampling criteria used [56]. Granular computing groups elements from a large space to simplify the elements into subsets, or granules [57, 58]. Granular computing is an effective approach for defining the uncertainty of objects in the search space, as it reduces large objects to a smaller search space [59]. Feature selection is a conventional approach to handling big data, with the purpose of choosing a subset of relevant features for an aggregate but more precise data representation [60, 61]. Feature selection is a very useful strategy in data mining for preparing high-scale data [60]. Instance selection is practical in many ML or data mining tasks as a major feature in data pre-processing. By utilizing instance selection, it is possible to reduce training sets and runtime in the classification or training phases [62]. The costs of uncertainty (both monetarily and computationally) and challenges in generating effective models for uncertainties in big data analytics have become
• 53. key to obtaining robust and performant systems. As such, we examine several open issues of the impacts of uncertainty on big data analytics in the next section.
Uncertainty perspective of big data analytics
This section examines the impact of uncertainty on three AI techniques for big data analytics. Specifically, we focus on ML, NLP, and CI, although many other analytics techniques exist. For each presented technique, we examine the inherent uncertainties and discuss methods and strategies for their mitigation.
Machine learning and big data
When dealing with data analytics, ML is generally used to create models for prediction and knowledge discovery to enable data-driven decision-making. Traditional ML methods are not computationally efficient or scalable enough to handle both the characteristics of big data (e.g., large volumes, high speeds, varying types, low value density, incompleteness) and uncertainty (e.g., biased training data, unexpected data types, etc.). Several commonly used advanced ML techniques proposed for big data analysis include feature learning, deep learning, transfer learning, distributed learning, and active learning. Feature learning includes a set of techniques that enables a system to automatically
• 54. discover the representations needed for feature detection or classification from raw data. The performance of ML algorithms is strongly influenced by the selection of data representation. Deep learning algorithms are designed for analyzing and extracting valuable knowledge from massive amounts of data and data collected from various sources (e.g., separate variations within an image, such as light, various materials, and shapes) [56]; however, current deep learning models incur a high computational cost. Distributed learning can be used to mitigate the scalability problem of traditional ML by carrying out calculations on data sets distributed among several workstations to scale up the learning process [63]. Transfer learning is the ability to apply knowledge learned in one context to new contexts, effectively improving a learner from one domain by transferring information from a related domain [64]. Active learning refers to algorithms that employ adaptive data collection [65] (i.e., processes that automatically adjust parameters to collect the most useful data as quickly as possible) in order to accelerate ML activities and overcome labeling problems. The uncertainty challenges of ML techniques can be mainly attributed to learning from data with low veracity (i.e., uncertain and incomplete data) and data with low value (i.e., unrelated to the current problem). We found that, among the ML techniques, active learning, deep learning, and fuzzy logic theory are uniquely suited to support the challenge of reducing uncertainty, as shown
• 55. in Fig. 3. Uncertainty can impact ML in terms of incomplete or imprecise training samples, unclear classification boundaries, and rough knowledge of the target data. In some cases, the data is represented without labels, which can become a challenge. Manually labeling large data collections can be an expensive and strenuous task, yet learning from unlabeled data is very …
Research Paper – Data Science & Big Data Analytics
While this week's topic highlighted the uncertainty of Big Data, the author identified the following as areas for future research. Pick one of the following for your research paper.
· Additional study must be performed on the interactions between each big data characteristic, as they do not exist separately but naturally interact in the real world.
· The scalability and efficacy of existing analytics techniques being applied to big data must be empirically examined.
· New techniques and algorithms must be developed in ML and NLP to handle the real-time needs for decisions made based on enormous amounts of data.
· More work is necessary on how to efficiently model uncertainty in ML and NLP, as well as how to represent the uncertainty resulting from big data analytics.
· Since CI algorithms are able to find an approximate solution within a reasonable time, they have been used to tackle ML problems and uncertainty challenges in data analytics in recent years.
Your paper should meet the following requirements:
• Be approximately 3-5 pages in length, not including the required cover page and reference page.
• Follow APA guidelines. Your paper should include an introduction, a body with fully developed content, and a conclusion.
• 56. • Support your response with the readings from the course and at least five peer-reviewed articles or scholarly journals to support your positions, claims, and observations. The UC Library is a great place to find resources.
• Be clear, well-written, and concise, using excellent grammar and style techniques. You are being graded in part on the quality of your writing.
References:
Marcu, D., & Danubianu, M. (2019). Learning Analytics or Educational Data Mining? This is the Question. BRAIN: Broad Research in Artificial Intelligence & Neuroscience, 10(Special Issue 2), 1–14. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&AuthType=shib&db=a9h&AN=139367236&site=eds-live
Hariri, R. H., Fredericks, E. M., & Bowers, K. M. (2019). Uncertainty in big data analytics: survey, opportunities, and challenges. Journal of Big Data, 6, 44. https://doi.org/10.1186/s40537-019-0206-3