HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...ijseajournal
Â
With the emergence of XML as de facto format for storing and exchanging information over the Internet, the search for ever more innovative and effective techniques for their querying is a major and current concern of the XML database community. Several studies carried out to help solve this problem are mostly oriented towards the evaluation of so-called exact queries which, unfortunately, are likely (especially in the case of semi-structured documents) to yield abundant results (in the case of vague queries) or empty results (in the case of very precise queries). From the observation that users who make requests are not necessarily interested in all possible solutions, but rather in those that are closest to their needs, an important field of research has been opened on the evaluation of preferences queries. In this paper, we propose an approach for the evaluation of such queries, in case the preferences concern the structure of the document. The solution investigated revolves around the proposal of an evaluation plan in three phases: rewriting-evaluation-merge. The rewriting phase makes it possible to obtain, from a partitioningtransformation operation of the initial query, a hierarchical set of preferences path queries which are holistically evaluated in the second phase by an instrumented version of the algorithm TwigStack. The merge phase is the synthesis of the best results.
RANDOM TESTS COMBINING MATHEMATICA PACKAGE AND LATEX COMPILERijseajournal
Â
This paper presents a competent and useful way to elaborate random exams by using Mathematica and
LATEX. With these two tools, the authors suggest how to generate, in an easy way, different PDF
documents containing different models of exams. The main idea is to provide a support to professors who
have to manage groups of large number of students that should take different exams along the term, or even
though not being groups of numerous students, it may be useful when different models of exams want to be
provided to the students. The underlying advantage in this paper is the use of the Mathematica package for
this purpose in a simple way, similarly as it has been done with alternative software. We present in this
paper, some models of exams produced in the context in which the authors work.
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...Computer Science Journals
Â
With the increased number of web databases, major part of deep web is one of the bases of database. In several search engines, encoded data in the returned resultant pages from the web often comes from structured databases which are referred as Web databases (WDB).
Unsupervised Learning of an Extensive and Usable Taxonomy for DBpediaMarco Fossati
Â
Talk given by fellow Claus Stadler at the 11th International Conference on Semantic Systems - SEMANTiCS 2015
Paper available here: http://jens-lehmann.org/files/2015/semantics_dbtax.pdf
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...ijseajournal
Â
With the emergence of XML as de facto format for storing and exchanging information over the Internet, the search for ever more innovative and effective techniques for their querying is a major and current concern of the XML database community. Several studies carried out to help solve this problem are mostly oriented towards the evaluation of so-called exact queries which, unfortunately, are likely (especially in the case of semi-structured documents) to yield abundant results (in the case of vague queries) or empty results (in the case of very precise queries). From the observation that users who make requests are not necessarily interested in all possible solutions, but rather in those that are closest to their needs, an important field of research has been opened on the evaluation of preferences queries. In this paper, we propose an approach for the evaluation of such queries, in case the preferences concern the structure of the document. The solution investigated revolves around the proposal of an evaluation plan in three phases: rewriting-evaluation-merge. The rewriting phase makes it possible to obtain, from a partitioningtransformation operation of the initial query, a hierarchical set of preferences path queries which are holistically evaluated in the second phase by an instrumented version of the algorithm TwigStack. The merge phase is the synthesis of the best results.
RANDOM TESTS COMBINING MATHEMATICA PACKAGE AND LATEX COMPILERijseajournal
Â
This paper presents a competent and useful way to elaborate random exams by using Mathematica and
LATEX. With these two tools, the authors suggest how to generate, in an easy way, different PDF
documents containing different models of exams. The main idea is to provide a support to professors who
have to manage groups of large number of students that should take different exams along the term, or even
though not being groups of numerous students, it may be useful when different models of exams want to be
provided to the students. The underlying advantage in this paper is the use of the Mathematica package for
this purpose in a simple way, similarly as it has been done with alternative software. We present in this
paper, some models of exams produced in the context in which the authors work.
An Efficient Annotation of Search Results Based on Feature Ranking Approach f...Computer Science Journals
Â
With the increased number of web databases, major part of deep web is one of the bases of database. In several search engines, encoded data in the returned resultant pages from the web often comes from structured databases which are referred as Web databases (WDB).
Unsupervised Learning of an Extensive and Usable Taxonomy for DBpediaMarco Fossati
Â
Talk given by fellow Claus Stadler at the 11th International Conference on Semantic Systems - SEMANTiCS 2015
Paper available here: http://jens-lehmann.org/files/2015/semantics_dbtax.pdf
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The D-basis Algorithm for Association Rules of High ConfidenceITIIIndustries
Â
We develop a new approach for distributed computing of the association rules of high confidence on the attributes/columns of a binary table. It is derived from the D-basis algorithm developed by K.Adaricheva and J.B.Nation (Theoretical Computer Science, 2017), which runs multiple times on sub-tables of a given binary table, obtained by removing one or more rows. The sets of rules retrieved at these runs are then aggregated. This allows us to obtain a basis of association rules of high confidence, which can be used for ranking all attributes of the table with respect to a given fixed attribute. This paper focuses on some algorithmic details and the technical implementation of the new algorithm. Results are given for tests performed on random, synthetic and real data
Day by day data is increasing, and most of the data stored in a database after manual transformations and derivations. Scientists can facilitate data intensive applications to study and understand the behaviour of a complex system. In a data intensive application, a scientific model facilitates raw data products to produce new data products and that data is collected from various sources such as physical, geological, environmental, chemical and biological etc. Based on the generated output, it is important to have the ability of tracing an output data product back to its source values if that particular output seems to have an unexpected value. Data provenance helps scientists to investigate the origin of an unexpected value. In this paper our aim is to find a reason behind the unexpected value from a database using query inversion and we are going to propose some hypothesis to make an inverse query for complex aggregation function and multiple relationship (join, set operation) function.
AbstractâSince the demand for information retrieval increases quickly, indexing structures became an important issue to support fast information retrieval. According to the work in this paper, a new data structure called Dynamic Ordered Multi-field Index (DOMI) for information retrieval has been introduced. It is based on radix trees organized in segments in addition to a hash table to point to the roots of each segment, where each segment is dedicated to store the values of a single field. The hash table is used to access the needed segments directly without traversing the upper segments. So, DOMI improves look-up performance for queries addressing to a single field. In the case of multiple queries addressing, each segment of the radix tree is traversed sequentially without visiting the unrelated branches. The use of segmentation for the proposed DOMI provides flexibility for minimizing communication overhead in the distributed system. Every field in the radix tree is represented by one segment, where each segment can be stored as one block.
In addition to, the proposed DOMI consumes less space comparing to indexes which are built using B or B+ trees. Hence, it is more suitable for intensive-data such as Big Data.
A SURVEY OF EMPLOYERSâ NEEDS FOR TECHNICAL AND SOFT SKILLS AMONG NEW GRADUATESijcseit
Â
ABSTRACT
Motivated by concern about the ability of graduates to succeed in the workforce, universities frequently conduct surveys of local and regional employers, to understand those companiesâ expectations. These can uncover specific needs not being addressed. Following a similar line of inquiry, prior research at Oregon State University interviewed employers, with the aim of identifying skills of concern. The current paper takes this research another step further by presenting a survey-based study aimed at quantifying the prevalence and level of employersâ desire for workers who have these identified skills. Although all skills were rated as moderately useful or better, most soft skills scored higher than most technical skills. Nonetheless, three technical skills (source code versioning, testing and agile methods) scored approximately as well as the soft skills; these three technical skills, like soft skills, were cross-cutting and applicable to more than one software development context. Further survey questions revealed that employers preferred that, to the extent that students focus on building technical skill, these learning experiences ideally should involve creating software that students can use as evidence of their qualifications.
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELijcsit
Â
Predicting the student performance is a great concern to the higher education managements.This
prediction helps to identify and to improve students' performance.Several factors may improve this
performance.In the present study, we employ the data mining processes, particularly classification, to
enhance the quality of the higher educational system. Recently, a new direction is used for the improvement
of the classification accuracy by combining classifiers.In thispaper, we design and evaluate a fastlearning
algorithm using AdaBoost ensemble with a simple genetic algorithmcalled âAda-GAâ where the genetic
algorithm is demonstrated to successfully improve the accuracy of the combined classifier performance.
The Ada-GA algorithm proved to be of considerable usefulness in identifying the students at risk early,
especially in very large classes. This early prediction allows the instructor to provide appropriate advising
to those students. The Ada/GA algorithm is implemented and tested on ASSISTments dataset, the results
showed that this algorithm hassuccessfully improved the detection accuracy as well as it reduces the
complexity of computation.
Empirical Study on Classification Algorithm For Evaluation of Students Academ...iosrjce
Â
Data mining techniques (DMT) are extensively used in educational field to find new hidden patterns
from studentâs data. In recent years, the greatest issues that educational institutions are facing the unstable
expansion of educational data and to utilize this information data to progress the quality of managerial
decisions. Educational institutions are playing a prominent role in the public and also playing an essential role
for enlargement and progress of nation. The idea is predicting the paths of students, thus identifying the student
achievement. The data mining methods are very useful in predicting the educational database. Educational data
mining is concerns with improving techniques for determining knowledge from data which comes from the
educational database. However it has issue with accuracy of classification algorithms. To overcome this
problem the higher accuracy of the classification J48 algorithm is used. This work takes consideration with the
locality and the performance of the student in education in order to analyse the student achievement is high over
schooling or in graduation
Educational Data Mining is a growing trend in case of higher education. The quality of the Educational
Institute may be enhanced through discovering hidden knowledge from the student databases/ data
warehouses. Present paper is designed to carry out a comparative study with the TDC (Three Year Degree)
Course students of different colleges affiliated to Dibrugarh University. The study is conducted with major
subject wise, gender wise and category/caste wise. The experimental results may be visualized with
Scatterplot3D, Bubble Plot, Fit Y by X, Run Chart, Control Chart etc. of the SAS JMP Software.
Data mining or Knowledge discovery (KDD) is
extracting unknown (hidden) and useful knowledge from data.
Data mining is widely used in many areas like retail, sales, ecommerce,
remote sensing, bioinformatics etc. Studentâs
performance has become one of the most complex puzzle for
universities and colleges in recent past with the tremendous
growth. In this paper, authors deployed data mining techniques
like classification, association rule, chi-square etc. for knowledge
discovery. For this study, authors have used data set containing
Approx. 180 MCA (post graduate) students results data of 3
colleges. Study found that one can apply data mining
functionalities like Chi-square, Association rule and Lift in
Education and discover areas of improvement.
A SURVEY OF EMPLOYERSâ NEEDS FOR TECHNICAL AND SOFT SKILLS AMONG NEW GRADUATES ijcseit
Â
Motivated by concern about the ability of graduates to succeed in the workforce, universities frequently
conduct surveys of local and regional employers, to understand those companiesâ expectations. These can
uncover specific needs not being addressed. Following a similar line of inquiry, prior research at Oregon
State University interviewed employers, with the aim of identifying skills of concern. The current paper
takes this research another step further by presenting a survey-based study aimed at quantifying the
prevalence and level of employersâ desire for workers who have these identified skills. Although all skills
were rated as moderately useful or better, most soft skills scored higher than most technical skills.
Nonetheless, three technical skills (source code versioning, testing and agile methods) scored
approximately as well as the soft skills; these three technical skills, like soft skills, were cross-cutting and
applicable to more than one software development context. Further survey questions revealed that
employers preferred that, to the extent that students focus on building technical skill, these learning
experiences ideally should involve creating software that students can use as evidence of their
qualifications.
The D-basis Algorithm for Association Rules of High ConfidenceITIIIndustries
Â
We develop a new approach for distributed computing of the association rules of high confidence on the attributes/columns of a binary table. It is derived from the D-basis algorithm developed by K.Adaricheva and J.B.Nation (Theoretical Computer Science, 2017), which runs multiple times on sub-tables of a given binary table, obtained by removing one or more rows. The sets of rules retrieved at these runs are then aggregated. This allows us to obtain a basis of association rules of high confidence, which can be used for ranking all attributes of the table with respect to a given fixed attribute. This paper focuses on some algorithmic details and the technical implementation of the new algorithm. Results are given for tests performed on random, synthetic and real data
Day by day data is increasing, and most of the data stored in a database after manual transformations and derivations. Scientists can facilitate data intensive applications to study and understand the behaviour of a complex system. In a data intensive application, a scientific model facilitates raw data products to produce new data products and that data is collected from various sources such as physical, geological, environmental, chemical and biological etc. Based on the generated output, it is important to have the ability of tracing an output data product back to its source values if that particular output seems to have an unexpected value. Data provenance helps scientists to investigate the origin of an unexpected value. In this paper our aim is to find a reason behind the unexpected value from a database using query inversion and we are going to propose some hypothesis to make an inverse query for complex aggregation function and multiple relationship (join, set operation) function.
AbstractâSince the demand for information retrieval increases quickly, indexing structures became an important issue to support fast information retrieval. According to the work in this paper, a new data structure called Dynamic Ordered Multi-field Index (DOMI) for information retrieval has been introduced. It is based on radix trees organized in segments in addition to a hash table to point to the roots of each segment, where each segment is dedicated to store the values of a single field. The hash table is used to access the needed segments directly without traversing the upper segments. So, DOMI improves look-up performance for queries addressing to a single field. In the case of multiple queries addressing, each segment of the radix tree is traversed sequentially without visiting the unrelated branches. The use of segmentation for the proposed DOMI provides flexibility for minimizing communication overhead in the distributed system. Every field in the radix tree is represented by one segment, where each segment can be stored as one block.
In addition to, the proposed DOMI consumes less space comparing to indexes which are built using B or B+ trees. Hence, it is more suitable for intensive-data such as Big Data.
A SURVEY OF EMPLOYERSâ NEEDS FOR TECHNICAL AND SOFT SKILLS AMONG NEW GRADUATESijcseit
Â
ABSTRACT
Motivated by concern about the ability of graduates to succeed in the workforce, universities frequently conduct surveys of local and regional employers, to understand those companiesâ expectations. These can uncover specific needs not being addressed. Following a similar line of inquiry, prior research at Oregon State University interviewed employers, with the aim of identifying skills of concern. The current paper takes this research another step further by presenting a survey-based study aimed at quantifying the prevalence and level of employersâ desire for workers who have these identified skills. Although all skills were rated as moderately useful or better, most soft skills scored higher than most technical skills. Nonetheless, three technical skills (source code versioning, testing and agile methods) scored approximately as well as the soft skills; these three technical skills, like soft skills, were cross-cutting and applicable to more than one software development context. Further survey questions revealed that employers preferred that, to the extent that students focus on building technical skill, these learning experiences ideally should involve creating software that students can use as evidence of their qualifications.
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELijcsit
Â
Predicting the student performance is a great concern to the higher education managements.This
prediction helps to identify and to improve students' performance.Several factors may improve this
performance.In the present study, we employ the data mining processes, particularly classification, to
enhance the quality of the higher educational system. Recently, a new direction is used for the improvement
of the classification accuracy by combining classifiers.In thispaper, we design and evaluate a fastlearning
algorithm using AdaBoost ensemble with a simple genetic algorithmcalled âAda-GAâ where the genetic
algorithm is demonstrated to successfully improve the accuracy of the combined classifier performance.
The Ada-GA algorithm proved to be of considerable usefulness in identifying the students at risk early,
especially in very large classes. This early prediction allows the instructor to provide appropriate advising
to those students. The Ada/GA algorithm is implemented and tested on ASSISTments dataset, the results
showed that this algorithm hassuccessfully improved the detection accuracy as well as it reduces the
complexity of computation.
Empirical Study on Classification Algorithm For Evaluation of Students Academ...iosrjce
Â
Data mining techniques (DMT) are extensively used in educational field to find new hidden patterns
from studentâs data. In recent years, the greatest issues that educational institutions are facing the unstable
expansion of educational data and to utilize this information data to progress the quality of managerial
decisions. Educational institutions are playing a prominent role in the public and also playing an essential role
for enlargement and progress of nation. The idea is predicting the paths of students, thus identifying the student
achievement. The data mining methods are very useful in predicting the educational database. Educational data
mining is concerns with improving techniques for determining knowledge from data which comes from the
educational database. However it has issue with accuracy of classification algorithms. To overcome this
problem the higher accuracy of the classification J48 algorithm is used. This work takes consideration with the
locality and the performance of the student in education in order to analyse the student achievement is high over
schooling or in graduation
Educational Data Mining is a growing trend in case of higher education. The quality of the Educational
Institute may be enhanced through discovering hidden knowledge from the student databases/ data
warehouses. Present paper is designed to carry out a comparative study with the TDC (Three Year Degree)
Course students of different colleges affiliated to Dibrugarh University. The study is conducted with major
subject wise, gender wise and category/caste wise. The experimental results may be visualized with
Scatterplot3D, Bubble Plot, Fit Y by X, Run Chart, Control Chart etc. of the SAS JMP Software.
Data mining or Knowledge discovery (KDD) is
extracting unknown (hidden) and useful knowledge from data.
Data mining is widely used in many areas like retail, sales, ecommerce,
remote sensing, bioinformatics etc. Studentâs
performance has become one of the most complex puzzle for
universities and colleges in recent past with the tremendous
growth. In this paper, authors deployed data mining techniques
like classification, association rule, chi-square etc. for knowledge
discovery. For this study, authors have used data set containing
Approx. 180 MCA (post graduate) students results data of 3
colleges. Study found that one can apply data mining
functionalities like Chi-square, Association rule and Lift in
Education and discover areas of improvement.
A SURVEY OF EMPLOYERSâ NEEDS FOR TECHNICAL AND SOFT SKILLS AMONG NEW GRADUATES ijcseit
Â
Motivated by concern about the ability of graduates to succeed in the workforce, universities frequently
conduct surveys of local and regional employers, to understand those companiesâ expectations. These can
uncover specific needs not being addressed. Following a similar line of inquiry, prior research at Oregon
State University interviewed employers, with the aim of identifying skills of concern. The current paper
takes this research another step further by presenting a survey-based study aimed at quantifying the
prevalence and level of employersâ desire for workers who have these identified skills. Although all skills
were rated as moderately useful or better, most soft skills scored higher than most technical skills.
Nonetheless, three technical skills (source code versioning, testing and agile methods) scored
approximately as well as the soft skills; these three technical skills, like soft skills, were cross-cutting and
applicable to more than one software development context. Further survey questions revealed that
employers preferred that, to the extent that students focus on building technical skill, these learning
experiences ideally should involve creating software that students can use as evidence of their
qualifications.
A SURVEY OF EMPLOYERSâ NEEDS FOR TECHNICAL AND SOFT SKILLS AMONG NEW GRADUATES ijcseit
Â
Motivated by concern about the ability of graduates to succeed in the workforce, universities frequently
conduct surveys of local and regional employers, to understand those companiesâ expectations. These can
uncover specific needs not being addressed. Following a similar line of inquiry, prior research at Oregon
State University interviewed employers, with the aim of identifying skills of concern. The current paper
takes this research another step further by presenting a survey-based study aimed at quantifying the
prevalence and level of employersâ desire for workers who have these identified skills. Although all skills
were rated as moderately useful or better, most soft skills scored higher than most technical skills.
Nonetheless, three technical skills (source code versioning, testing and agile methods) scored
approximately as well as the soft skills; these three technical skills, like soft skills, were cross-cutting and
applicable to more than one software development context. Further survey questions revealed that
employers preferred that, to the extent that students focus on building technical skill, these learning
experiences ideally should involve creating software that students can use as evidence of their
qualifications.
A SURVEY OF EMPLOYERSâ NEEDS FOR TECHNICAL AND SOFT SKILLS AMONG NEW GRADUATES ijcseit
Â
Motivated by concern about the ability of graduates to succeed in the workforce, universities frequently
conduct surveys of local and regional employers, to understand those companiesâ expectations. These can
uncover specific needs not being addressed. Following a similar line of inquiry, prior research at Oregon
State University interviewed employers, with the aim of identifying skills of concern. The current paper
takes this research another step further by presenting a survey-based study aimed at quantifying the
prevalence and level of employersâ desire for workers who have these identified skills. Although all skills
were rated as moderately useful or better, most soft skills scored higher than most technical skills.
Nonetheless, three technical skills (source code versioning, testing and agile methods) scored
approximately as well as the soft skills; these three technical skills, like soft skills, were cross-cutting and
applicable to more than one software development context. Further survey questions revealed that
employers preferred that, to the extent that students focus on building technical skill, these learning
experiences ideally should involve creating software that students can use as evidence of their
qualifications.
A SURVEY OF EMPLOYERSâ NEEDS FOR TECHNICAL AND SOFT SKILLS AMONG NEW GRADUATES ijcseit
Â
Motivated by concern about the ability of graduates to succeed in the workforce, universities frequently
conduct surveys of local and regional employers, to understand those companiesâ expectations. These can
uncover specific needs not being addressed. Following a similar line of inquiry, prior research at Oregon
State University interviewed employers, with the aim of identifying skills of concern. The current paper
takes this research another step further by presenting a survey-based study aimed at quantifying the
prevalence and level of employersâ desire for workers who have these identified skills. Although all skills
were rated as moderately useful or better, most soft skills scored higher than most technical skills.
Nonetheless, three technical skills (source code versioning, testing and agile methods) scored
approximately as well as the soft skills; these three technical skills, like soft skills, were cross-cutting and
applicable to more than one software development context. Further survey questions revealed that
employers preferred that, to the extent that students focus on building technical skill, these learning
experiences ideally should involve creating software that students can use as evidence of their
qualifications.
A Model for Predicting Studentsâ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Â
Higher learning institutions nowadays operate in a more complex and competitive due to a high demand from prospective
students and an emerging increase of universities both public and private. Management of Universities face challenges and concerns of
predicting studentsâ academic performance in to put mechanisms in place prior enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms of alleviating student dropout rates and improve on performance. In Kenya for example, there has been
witnessed an increase student enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development and vision 2030. Backlog of students not finishing their studies in stipulated time due to poor performance is another
issue that can be addressed from the results of this research since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses.
Previous studies have been done Educational Data Mining mostly focusing on factors affecting studentsâ performance and also used
different algorithms in predicting studentsâ performance. In all these researches, accuracy of prediction is key and what researchers
look forward to try and improve.
A Model for Predicting Studentsâ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Â
Higher learning institutions nowadays operate in a more complex and competitive due to a high demand from prospective
students and an emerging increase of universities both public and private. Management of Universities face challenges and concerns of
predicting studentsâ academic performance in to put mechanisms in place prior enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms of alleviating student dropout rates and improve on performance. In Kenya for example, there has been
witnessed an increase student enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development and vision 2030. Backlog of students not finishing their studies in stipulated time due to poor performance is another
issue that can be addressed from the results of this research since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses.
Previous studies have been done Educational Data Mining mostly focusing on factors affecting studentsâ performance and also used
different algorithms in predicting studentsâ performance. In all these researches, accuracy of prediction is key and what researchers
look forward to try and improve.
Analyzing undergraduate studentsâ performance in various perspectives using d...
Â
Karen Tran - ENGE 4994 Paper
1. Tran 1
Karen Tran
Dr. Jacob R. Grohs
ENGE 4994
8 May 2016
The Effects of Statics Credit on Future Mechanics Courses
Objective
The goal for this course (ENGE 4994) during the Spring 2016 semester was to hard-code
raw data, provided by Virginia Polytechnic Institute and State University, containing a large
subset of students taking statics and mechanics courses for a select portion of time (a few years).
The data obtained was 165,543 rows of de-identified transcripts, with each student identified by
a unique number. The raw data was then converted and coded to a Statistical Package for the
Social Sciences (SPSS) file by Dr. Jacob R. Grohs. The true objective was to run different
statistical analysis tests on the data to observe and quantify how transferring statics credit may
affect the performance of a student in future mechanics courses. However, because the data was
quite large, SPSS crashed multiple times trying to run certain tests. This resulted into
restructuring the data into a much more powerful and stronger program that could handle these
statistical tests: R.
Restructuring the data in another programming language (R) became the initial and
prominent objective of the course. Of course, the end goal remained to investigate and
understand how transferring statics credit affected future mechanics courses such as deforms and
dynamics. Other factors that were to be considered were the amount of times a student took a
course, how many other credits were they taking during the same semester, their GPA, their
2. Tran 2
major, and much more. There was an endless amount of questions to be proposed and answered
using statistical analysis.
Overarching Challenge
The real overarching challenge was learning how to code in R. Throughout the
experience of a typical engineering student at Virginia Polytechnic Institute and State University,
the most programming and coding knowledge through degree courses is standard MATLAB
(Mathematica) and basic Java. Unless the student is a Computer Science or Computer
Engineering major, it poses a slight disadvantage for students of other majors. Restructuring the
data into R became task that constantly needed research, Google, manuals, online guides, and
YouTube tutorials. The goal was to format each row of data to a unique student number
identified as the following:
{student_id, student_admit_type, degree1, degree2, degree3, degree4, class1, class2, etc.1
}.
Ideally, the goal was to be able to simply call out a unique student or specific category
and retrieve all of the information necessary and needed for statistical analysis. The desired
statistical analysis tests such as comparisons, hypothesis testing (p-values), and t-tests were
going to be the stepping stones to understanding performance in future courses after taking
statics (at Virginia Tech or somewhere else).
1
There were 26 different classes.
3. Tran 3
Key Progress
While there were many roadblocks and struggles, there were also many movements of
progression during the semester. I was able to sort and understand the raw data in a short amount
of time (a few weeks) as well as organize and manipulate the data to structure it aesthetically.
The original data frame (mydata), at 165,543 rows, was condensed to unique student identifiers
in a new data frame (myfinaldata) at 23,364 rows. The next few columns of myfinaldata were
built from left to right and contained headers labeled: student_admit_type, degree1, degree2,
degree3, and degree4, respectfully. With the header âstudent_admit_type,â it contained a list of
three (freshman or transfer, first term attended, last term attended). The degree headers contained
lists of three as well (major, GPA, graduating year). Creating lists within lists was necessary to
be able to call out a certain element from a specific header for a distinct student.
R: Programming Language
Although the raw data was not completely restructured enough to run statistical tests, I
learned a great deal about R coding. I was able to code and restructure a good amount of raw
data using built in functions and logic. I found it extremely similar to MATLAB, however the
syntax was very different and building a data frame was much more extensive than solving a
mathematics problem. By learning how to use functions such as naming and assigning variables,
âunique,â âstr,â and âas.list,â I was able to make lists within each column and could call out
pieces (elements) of data for a certain student. This was helpful in a sense that it was possible to
4. Tran 4
isolate certain variables (students, degrees, freshman/transfer) and manipulate them for future
analyses.2
Remaining Challenges
Still, there are many hurdles to overcome. For students who only had one degree, the
placeholders for degree2, degree3, and degree4 were replaced with âNAâ for each element (9
NAâs). This created an immense amount of unnecessary space in the data frame which could
have been easily fixed by creating a list that could have an unlimited amount of elements within
the same column (appending to a list).
For example, taken from myfinaldata, there are two students (Student 1 and Student 17)
who have a different number of degrees. The data currently looks like this:
⢠[Student1], [Freshman,199909,2015012], [MATH,2.18174603,199907,NA,NA,NA,NA,NA,NA,NA,NA,NA]
⢠[Student17], [Freshman,199809,201401], [CE,2.89947644,200301,ME,2.89947644,201201, NA,NA,NA,NA,NA,NA].
Ideally the rows should project and display like this:
⢠[Student1], [Freshman,199909,2015012], [MATH,2.18174603,199907]
⢠[Student17], [Freshman,199809,201401], [CE,2.89947644,200301,ME,2.89947644,201201].
The list should close and end if there is not anymore information applicable to the unique
student. If a student only had one degree, the third header should only contain a single list of
three. If a student only had two degrees, the third header should contain two lists of three (a total
2
The code for myfinaldata and screenshot demonstrating how to call out a unique student is
provided at the end of this document.
5. Tran 5
of six elements in the degree category). And the pattern should continue for students who had
three and four degrees (nine and 12 elements, respectively).
Another major challenge that needs to be tackled is a way to enter the data into R so that
it can read it line by line. There needs to be a method (while or for loop) to only enter
information for each unique student if they have taken a certain class. Because there are 26
classes, it would be redundant to have NA for 25 classes if a student only took one class.
9. Tran 9
Figure 2. mydata - Raw Data Output
Figure 3. myfinaldata - Manipulated Data Output
10. Tran 10
Figure 4. myfinaldata - Calling Out Select Students and Categories
Analysis of Figure 4
⢠x = student_admit_type
⢠x2 = degree1
⢠x3 = degree2
⢠x4 = degree3
⢠x5 = degree4
Interpretation:
⢠x2[[2]] = retrieve all information of degree1 for student number two
⢠x2[[2]][[1]] = retrieve the first element of degree1 for student number two
⢠x2[[3]] = retrieve all information of degree1 for student number three
⢠x2[[3]][[2]] = retrieve the second element of degree1 for student number three
⢠x3[[3]] = retrieve all information of degree2 for student number three
⢠x[[5]] = retrieve all information of the student_admit_type for student number five
⢠x[[5]][[1]] = retrieve the first element of the student_admit_type for student number five