Making a wise career decision is crucial for anyone, and modern decision-support tools such as data mining can help people make informed choices. This paper is an attempt to help prospective students make wise career decisions using technologies such as data mining. In India, technical manpower analysis is carried out by the National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. NTMIS comprises a lead centre at the IAMR, New Delhi, and 21 nodal centres located in different parts of the country; the Kerala State Nodal Centre is located at the Cochin University of Science and Technology (CUSAT, Kochi, India). Information covering the last four years was obtained from this nodal centre, which collects records of all students graduating from technical colleges in Kerala State through postal questionnaires. The analysis is based on entrance rank, branch, gender (M/F), sector (rural/urban), and reservation category (OBC/SC/ST/GEN).
A Prediction Model for Taiwan Tourism Industry Stock Index (ijcsit)
Investors and scholars pay continuous attention to the stock market, and each day many investors attempt to predict stock price trends with different methods. However, because stock prices are affected by economics, politics, domestic and foreign conditions, emergencies, human factors, and other unknown factors, it is difficult to establish an accurate prediction model. This study used a back-propagation neural network (BPN) as the research approach, with 29 input variables, such as international exchange rates, indices of international stock markets, Taiwan stock market analysis indicators, and overall economic indicators, to predict Taiwan's monthly tourism industry stock index. The empirical findings show that the BPN prediction model achieves good predictive accuracy, with an absolute relative error of 0.090058 and a correlation coefficient of 0.944263. The model has low error and high correlation, and can serve as a reference for investors and relevant industries.
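The BPN approach described above learns by propagating the prediction error backwards through the network and nudging each weight downhill. A minimal single-hidden-layer sketch in Python (the two scaled inputs and four toy samples are invented stand-ins for the paper's 29 indicator variables, not the study's actual configuration):

```python
import math
import random

random.seed(0)

def train_bpn(samples, n_hidden=4, lr=0.5, epochs=3000):
    """Train a minimal one-hidden-layer back-propagation network
    on (inputs, target) pairs with targets scaled to [0, 1]."""
    n_in = len(samples[0][0])
    # small random initial weights for hidden layer and output neuron
    w_h = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [0.0] * n_hidden
    w_o = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = 0.0

    def sig(z):
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        for x, t in samples:
            # forward pass
            h = [sig(sum(w * xi for w, xi in zip(w_h[j], x)) + b_h[j])
                 for j in range(n_hidden)]
            y = sig(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
            # backward pass: output delta, then hidden deltas
            d_o = (y - t) * y * (1 - y)
            d_h = [d_o * w_o[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # gradient-descent weight updates
            for j in range(n_hidden):
                w_o[j] -= lr * d_o * h[j]
                for i in range(n_in):
                    w_h[j][i] -= lr * d_h[j] * x[i]
                b_h[j] -= lr * d_h[j]
            b_o -= lr * d_o

    def predict(x):
        h = [sig(sum(w * xi for w, xi in zip(w_h[j], x)) + b_h[j])
             for j in range(n_hidden)]
        return sig(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
    return predict

# toy data: predict a scaled "index" from two scaled indicator values
data = [([0.1, 0.2], 0.2), ([0.4, 0.4], 0.4), ([0.8, 0.6], 0.7), ([0.9, 0.9], 0.9)]
model = train_bpn(data)
```

The sigmoid output keeps predictions in [0, 1], which is why targets must be rescaled before training and unscaled afterwards.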
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni... (theijes)
Feature selection is considered a problem of global combinatorial optimization in machine learning: it reduces the number of features and removes irrelevant, noisy, and redundant data. However, identifying useful features from hundreds or even thousands of related features is not an easy task. Selecting relevant genes from microarray data becomes even more challenging owing to the high dimensionality of the features, the multiclass categories involved, and the usually small sample size. To improve prediction accuracy and avoid the incomprehensibility caused by a large number of features, different feature selection techniques can be implemented. This survey classifies and analyzes the different approaches, aiming not only to provide a comprehensive presentation but also to discuss challenges and various performance parameters. The techniques are generally classified into three categories: filter, wrapper, and hybrid.
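The filter/wrapper distinction can be made concrete with a short sketch: a filter scores each feature independently of any classifier (here, absolute Pearson correlation with the label), while a wrapper searches feature subsets by repeatedly evaluating a classifier (here, greedy forward selection scored by leave-one-out 1-NN accuracy). The data set and both scoring choices are illustrative assumptions, not taken from the survey:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_rank(X, y):
    """Filter approach: rank features by a classifier-independent score."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of 1-NN restricted to the chosen features."""
    correct = 0
    for i in range(len(X)):
        dists = [(sum((X[i][j] - X[k][j]) ** 2 for j in feats), y[k])
                 for k in range(len(X)) if k != i]
        correct += min(dists)[1] == y[i]
    return correct / len(X)

def wrapper_forward(X, y):
    """Wrapper approach: greedily add the feature that most improves
    the classifier's evaluation score."""
    chosen, best = [], 0.0
    while True:
        gains = [(loo_accuracy(X, y, chosen + [j]), j)
                 for j in range(len(X[0])) if j not in chosen]
        acc, j = max(gains)
        if acc <= best:
            return chosen
        chosen.append(j)
        best = acc

# toy data: feature 0 is informative, feature 1 is noise
X = [[0, 5], [1, 3], [0, 4], [1, 6], [0, 2], [1, 7]]
y = [0, 1, 0, 1, 0, 1]
```

On this data the filter ranks the informative feature first, and the wrapper stops after selecting it alone, since adding the noisy feature cannot improve accuracy.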
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE... (ijcsa)
The study placed a particular emphasis on the so-called data mining algorithms, but focuses the bulk of its attention on the C4.5 algorithm. Each educational institution, in general, aims to deliver a high quality of education, which depends on identifying students with poor results before they enter the final examination. Data mining techniques offer many tasks that can be used to investigate students' performance. The main objective of this paper is to build a classification model that can be used to improve students' academic records in the Faculty of Mathematical Science and Statistics. The model was built using the C4.5 algorithm, as it is a well-known, commonly used data mining technique. The importance of this study is that predicting student performance is useful in many different settings. Data from previous students' academic records in the faculty were used to illustrate the considered algorithm and build the classification model.
Data mining, or knowledge discovery in databases (KDD), is the extraction of hidden, previously unknown, and useful knowledge from data. It is widely used in areas such as retail, sales, e-commerce, remote sensing, and bioinformatics. With the tremendous recent growth of universities and colleges, student performance has become one of their most complex puzzles. In this paper, the authors deploy data mining techniques such as classification, association rules, and the chi-square test for knowledge discovery. The study uses a data set containing the results of approximately 180 MCA (postgraduate) students from three colleges, and finds that data mining functionalities such as chi-square, association rules, and lift can be applied in education to discover areas of improvement.
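The chi-square functionality mentioned above tests whether two categorical attributes are independent by comparing observed counts against the counts expected under independence. A minimal sketch on a hypothetical college-versus-result contingency table (the counts are invented for illustration, not taken from the study's 180-student data set):

```python
def chi_square(table):
    """Chi-square statistic for independence on a 2-D contingency table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # expected count under the independence hypothesis
            exp = row_tot[i] * col_tot[j] / total
            stat += (obs - exp) ** 2 / exp
    return stat

# hypothetical pass/fail counts for three colleges (one row per college)
table = [[40, 20], [30, 30], [20, 40]]
stat = chi_square(table)
# degrees of freedom = (rows-1)*(cols-1) = 2; 5% critical value ~ 5.991
dependent = stat > 5.991
```

Here the statistic (about 13.3) exceeds the critical value, so the test would reject independence, suggesting the pass rate varies by college.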
Medical informatics has grown considerably in recent years. Advances in different medical fields help detect various critical diseases and provide guidelines for their cure. This has been possible only because of well-maintained medical databases and the automation of the data analysis process. This analysis requires considerable learning and intelligence, for which data mining techniques provide the basis; the available techniques include decision tree induction, rule-based classification and mining, support vector machines, stochastic classification, logistic regression, naïve Bayes, artificial neural networks, fuzzy logic, and genetic algorithms. This paper presents the basics of data mining and the techniques available in the medical sciences, and reviews the work done on medical databases using data mining techniques for human disease diagnosis.
A SURVEY ON DATA MINING IN STEEL INDUSTRIES (IJCSES Journal)
In industrial environments, a huge amount of data is generated and collected in databases and data warehouses from all involved areas, such as planning, process design, materials, assembly, production, quality, process control, scheduling, fault detection, shutdown, and customer relationship management. Data mining has become a useful tool for knowledge acquisition in the industrial processes of iron and steel making. With the rapid growth of data mining, various industries have started using it to search for hidden patterns, which can feed new knowledge back into the system and inform new models that enhance production quality, productivity, cost optimisation, and maintenance. The continuous improvement of all steel production processes, avoiding quality deficiencies and thereby improving production yield, is an essential task for a steel producer; hence the zero-defect strategy is popular today, and several quality assurance techniques are used to maintain it. The present report explains the methods of data mining and describes their application in the industrial environment, especially the steel industry.
IRJET - Student Pass Percentage Detection using Ensemble Learning (IRJET Journal)
This document discusses using ensemble learning methods to predict student pass rates. It begins with an abstract describing ensemble learning and its applications. It then provides background on strengthening the STEM workforce and using prediction modeling in educational data mining. The methodology section describes using decision trees, logistic regression, nearest neighbors, neural networks, naive Bayes, and support vector machines as base classifiers in an ensemble model to predict student enrollment in STEM courses. The results show the J48 decision tree algorithm correctly classified 84% of instances, outperforming naive Bayes and CART. The conclusion is that ensemble models can better categorize factors affecting student choice to enroll in STEM by combining multiple classification techniques.
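The core idea of such an ensemble, combining several base classifiers so their errors cancel, can be sketched with a simple majority vote. The three "base classifiers" below are hypothetical stand-ins for trained models such as J48 trees or naive Bayes, not the study's actual models or features:

```python
def majority_vote(classifiers, x):
    """Combine base classifiers by simple majority vote (a basic ensemble)."""
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)

# three hypothetical base classifiers over a (grade, attendance) record;
# each stands in for a trained model such as a decision tree or naive Bayes
by_grade = lambda x: 1 if x[0] >= 50 else 0
by_attendance = lambda x: 1 if x[1] >= 0.75 else 0
optimist = lambda x: 1  # deliberately weak: always predicts "pass"

ensemble = [by_grade, by_attendance, optimist]
```

Even with one deliberately weak member, the vote follows the two sensible classifiers, which is the intuition behind combining multiple classification techniques.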
This document discusses machine learning algorithms and their applications. It begins with an abstract discussing supervised, unsupervised, and reinforcement learning techniques. It then discusses machine learning in more detail, explaining that machine learning algorithms represent data instances with a set of features and classify instances based on their labels. The main focus is on supervised and unsupervised learning techniques and their performance parameters. It provides an overview of support vector machines, neural networks, and other machine learning algorithms. In summary, the document provides a survey of different machine learning techniques, how they work, and their applications.
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction (ijtsrd)
Data mining techniques play an important role in data analysis. In this research, a decision tree algorithm combined with data mining techniques is used to construct a classification model that can predict the performance of students, particularly in engineering branches. A number of factors may affect students' performance, and data mining technology can relate them to student grades; classification algorithms are also used for prediction. In this paper, educational data mining is used to predict students' final grades based on their performance, and a student data classification using the ID3 (Iterative Dichotomiser 3) decision tree algorithm is proposed. Khin Khin Lay | San San Nwe, "Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction", International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-5, August 2019. URL: https://www.ijtsrd.com/papers/ijtsrd26545.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26545/using-id3-decision-tree-algorithm-to-the-student-grade-analysis-and-prediction/khin-khin-lay
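ID3 grows its tree by repeatedly splitting on the attribute with the highest information gain, i.e. the largest reduction in label entropy. A minimal sketch of that attribute-selection step on invented student records (the attribute names and labels are illustrative, not the paper's data):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, as used by ID3."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on attribute index `attr`."""
    n = len(rows)
    split = {}
    for row, lab in zip(rows, labels):
        split.setdefault(row[attr], []).append(lab)
    # weighted entropy of the partitions induced by the split
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - remainder

# hypothetical records: (attendance, assignment) -> final result
rows = [("high", "done"), ("high", "missed"), ("low", "done"),
        ("low", "missed"), ("high", "done")]
labels = ["pass", "pass", "fail", "fail", "pass"]
best = max(range(2), key=lambda a: information_gain(rows, labels, a))
```

ID3 would place the winning attribute (here, attendance, which splits the labels perfectly) at the root and then recurse on each branch with the remaining attributes.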
PARTICIPATION ANTICIPATING IN ELECTIONS USING DATA MINING METHODS (IJCI JOURNAL)
Anticipating the political behaviour of people is of considerable help to election candidates in assessing their chances of success and understanding the public's motivations for selecting them. In this paper, we provide a general schematic of the architecture of a participation-anticipating system for presidential elections, using KNN, classification trees, and naïve Bayes with the Orange tool, based on CRISP, which yielded promising output. To test and assess the proposed model, we conducted a case study with 100 qualified persons who participated in the 11th presidential election of the Islamic Republic of Iran, anticipating their participation in Kohkiloye & Boyerahmad. We show that KNN performs the anticipation and classification processes with higher accuracy than the other two algorithms.
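The KNN step above classifies a new case by a majority vote among its k nearest training examples. A minimal sketch on hypothetical voter records (the two scaled features and the labels are invented; the paper's actual attributes are not given in the abstract):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority label among its k nearest neighbours."""
    # sort training points by Euclidean distance to the query
    nearest = sorted(train, key=lambda pt: math.dist(pt[0], query))
    top_labels = [label for _, label in nearest[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# hypothetical voters: (age_scaled, interest_scaled) -> participated?
train = [((0.2, 0.1), "no"), ((0.3, 0.2), "no"), ((0.8, 0.9), "yes"),
         ((0.7, 0.8), "yes"), ((0.9, 0.7), "yes"), ((0.1, 0.3), "no")]

print(knn_predict(train, (0.75, 0.85)))  # prints "yes"
```

Because KNN stores the training set and votes at query time, it needs no training phase, which makes it a convenient baseline for small case studies like the 100-person sample described here.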
An Algorithm Analysis on Data Mining-396 (Nida Rashid)
This document discusses and analyzes six major data mining algorithms: C4.5, k-Means, SVM, Apriori, EM, and PageRank. It provides a brief description of each algorithm, discusses their impact, and reviews current and future research on each one. These six algorithms cover important data mining topics like classification, clustering, statistical learning, association analysis, and link mining.
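As one example from that list, k-means clustering alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points, until the assignment stabilises. A minimal sketch with invented 2-D points and initial centroids picked from the data (a common, though not the only, seeding choice):

```python
import math

def k_means(points, init, iters=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster."""
    centroids = list(init)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # assignment step: nearest centroid by Euclidean distance
            idx = min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
            clusters[idx].append(p)
        # update step: each centroid becomes the mean of its cluster
        centroids = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl
                     else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# two invented groups of 2-D points
points = [(1.0, 1.2), (0.8, 1.1), (1.1, 0.9), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centroids, clusters = k_means(points, init=[points[0], points[3]])
```

On this data the two centroids settle near (1, 1) and (5, 5), recovering the two groups; in practice, k-means is sensitive to the initial centroids, which is part of the ongoing research the survey mentions.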
The main objective of this paper is to develop a basic prototype model that can determine and extract unknown knowledge (patterns, concepts, and relations) involving multiple factors from the past database records of specific students. Data mining is the science and engineering of extracting previously undiscovered patterns from a huge set of data; its techniques are helpful for decision making as well as for discovering patterns in data. In this paper, a student eligibility prediction system using rule-based classification is proposed to predict the eligibility of students, based on their details, with high prediction accuracy. Educational institutes generate a tremendous amount of data, and this paper outlines the idea of predicting a particular student's placement eligibility by operating on the stored data. An efficient algorithm using the fuzzy technique for prediction is proposed.
This document provides an overview of the book "Fuzzy Multi-Criteria Decision Making: Theory and Applications with Recent Developments". It is edited by Cengiz Kahraman and contains 22 chapters on fuzzy multi-criteria decision making (MCDM) methods and applications. The book is divided into two parts, with the first focusing on fuzzy multiple-attribute decision making (MADM) techniques and applications, and the second on fuzzy multiple-objective decision making (MODM) methods. Some of the key MADM and MODM techniques covered include fuzzy analytic hierarchy process, fuzzy TOPSIS, fuzzy outranking methods, fuzzy multi-objective linear programming, and fuzzy multi-objective integer goal programming.
Analysis of Media Graphs and Significant Sense (AM Publications)
This paper discusses aspects of the interpretation of graphs, including those related to the interface between different contexts of interpretation. In particular, it discusses the notion of Critical Sense as the skill of analysing data and its interrelations rather than simply accepting the initial impression given by a graph. We suggest that Critical Sense in graphing can help readers balance the several aspects involved in interpreting media graphs in certain contexts. The paper also presents initial findings from our analyses of a study with student teachers interpreting media graphs. We study Critical Sense in student teachers as a way of helping us, and them, think about teaching and learning graphing in ways that will support the development of Critical Sense.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Linking Behavioral Patterns to Personal Attributes through Data Re-Mining (ertekg)
Download Link >https://ertekprojects.com/gurdal-ertek-publications/blog/linking-behavioral-patterns-to-personal-attributes-through-data-re-mining/
A fundamental challenge in behavioral informatics is the development of methodologies and systems that can achieve its goals and tasks, including behavior pattern analysis. This study presents such a methodology, which can be converted into a decision support system through the appropriate integration of existing tools for association mining and graph visualization. The methodology enables the linking of behavioral patterns to personal attributes through the re-mining of colored association graphs that represent item associations. The methodology is described and mathematically formalized, and is demonstrated in a case study related to the retail industry.
Tips on Setting Up Excel Spreadsheets - Nancy Buderer
The document provides tips for setting up an Excel spreadsheet for research projects in order to facilitate exporting the data into statistical analysis software like SAS or SPSS. Some key tips include: having one row per subject, one column per variable, using a unique subject ID as the first column, using standard variable naming conventions, formatting columns by data type (e.g. numeric, date), entering data as numbers rather than text where possible, always including a code sheet to document meaning of codes, and carefully handling missing data. The document recommends consulting with a statistician before finalizing the spreadsheet design.
Correlation of artificial neural network classification and nfrs attribute fi... (eSAT Journals)
Abstract
About 5 to 15% of women of reproductive age face Polycystic Ovarian Syndrome (PCOS), a multifaceted, heterogeneous, and complex disease. Polycystic ovaries, chronic anovulation, and hyperandrogenism, characterized by insulin resistance, lead to long-term consequences such as endometrial hyperplasia, type 2 diabetes mellitus, and coronary disease; hypertension, abdominal obesity, dyslipidemia, and hyperinsulinemia form the metabolic syndrome (frequent metabolic traits). Together these cause the common condition of anovulatory infertility. Computer-based information along with advanced data mining techniques is used to obtain appropriate results. Classification is a classic data mining task, with roots in machine learning; naïve Bayes, artificial neural networks, decision trees, and support vector machines are classification techniques in data mining. Feature selection methods involve generation of subsets, evaluation of each subset, criteria for stopping the search, and validation procedures; the characteristics of the search method used are important for the time efficiency of feature selection. PCA (Principal Component Analysis), information gain subset evaluation, fuzzy rough set evaluation, and Correlation-based Feature Selection (CFS) are some of the feature selection techniques, while greedy first search, ranker, etc. are search algorithms used in feature selection. In this paper, a new algorithm based on fuzzy-neural subset evaluation and an artificial neural network is proposed, which avoids performing classification and feature selection as separate tasks. The algorithm combines neural fuzzy rough subset evaluation and an artificial neural network for better performance than carrying out the tasks separately.
Keywords: ANN, SVM, PCA, CFS
Choosing Optimization Process In The Event Of Flight Plan Interruption With Th... (IJESM JOURNAL)
In today's world, a superior industry is one that balances its capabilities with the demands of its customers, which requires the organization's knowledge of its customers and their demands. In this research, a process-oriented approach was proposed for aviation-industry managers to choose an optimized process in the event of an interruption of a flight plan. First, the factors interrupting flight plans were described together with solutions to reduce them. Then, the effect of each of these factors on the interruption, as well as on the available options, was collected from experts using paired-comparison questionnaires, combined by geometric mean, and the outcomes were analysed using the analytic network process. The results include the effect of each factor on the interruption of the flight plan, together with the weight of each decision-making option. Finally, based on the obtained results, proper decisions to adopt in the event of an interruption were proposed. This research shows that technical defects and delayed arrival are among the most important factors interrupting flight plans, and that notice of delay, cancellation, and replacement of route are among the most important decisions faced by managers in the aviation industry in the event of an interruption.
An approach for discrimination prevention in data mining (eSAT Journals)
Abstract: In the age of database technologies, a large amount of data is collected and analyzed using data mining techniques. However, the main issues in data mining are potential privacy invasion and potential discrimination. Classification is one of the techniques used in data mining for decision making; if the dataset is biased, discriminatory decisions may result. In this paper, we therefore review recent state-of-the-art anti-discrimination techniques, focusing on discrimination discovery and prevention in data mining, and we also study a theoretical proposal for enhancing data quality results. Keywords: anti-discrimination, data mining, direct and indirect discrimination prevention, rule protection, rule generalization, privacy.
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATION (ijaia)
Process Mining (PM) emerged from business process management but has recently been applied to
educational data and has been found to facilitate the understanding of the educational process.
Educational Process Mining (EPM) bridges the gap between process analysis and data analysis, based on
the techniques of model discovery, conformance checking and extension of existing process models. We
present a systematic review of the recent and current status of research in the EPM domain, focusing on
application domains, techniques, tools and models, to highlight the use of EPM in comprehending and
improving educational processes.
Regression, Theil's and MLP forecasting models of stock index (iaemedu)
This document compares different forecasting models for daily stock prices: linear regression, Theil's incomplete method, and multilayer perceptron (MLP). Principal component analysis was used to reduce the input variables to a single component. Linear regression and Theil's method had similar error rates that were lower than MLP based on MAE, MAPE, and SMAPE. The linear regression and Theil's method models had R-squared values near 1, indicating close fit to the data. Overall, the linear and Theil's models provided more accurate short-term forecasts of daily stock prices than the MLP based on error and fit metrics.
Regression, Theil's and MLP forecasting models of stock index (IAEME Publication)
This document compares different forecasting models for daily stock prices: linear regression, Theil's incomplete method, and multilayer perceptron (MLP). Principal component analysis was used to reduce 4 stock price variables to 1 principal component, which was then used to predict closing prices. Linear regression and Theil's method produced similar results, with MAE around 110 and R-squared over 0.99. MLP had slightly higher error at 118 MAE. Overall, linear regression and Theil's method provided the best forecasts of closing stock prices based on this analysis of models and error metrics.
Machine Learning Regression Analysis of EDX 2012-13 Data for Identifying the ... (IJITE)
Predictive models are able to predict edX student grades with an accuracy error of 0.1 (10%, about one letter grade standard deviation), based on participation data. Student background variables are not useful for predicting grades. By using a combination of segmentation, random forest regression, linear transformation and application beyond the segmented data, it is possible to determine the population of the Auditors student use case, a population larger than those students completing courses with grades.
This document discusses data analysis and various techniques used in data analysis such as data editing, coding, classification, tabulation, and statistical analysis. It describes different types of statistical tests like z-test, t-test, chi-square test, and their uses. It also discusses various types of tables, diagrams, and graphical representations that are used to present statistical data in a meaningful way. Key types of diagrams mentioned include bar charts, pie charts, histograms and scatter plots. Rules for properly constructing tables and graphs are also provided.
IRJET - Student Placement Prediction using Machine Learning (IRJET Journal)
This document describes a study that uses machine learning algorithms to predict whether students will be placed in jobs after graduating. Specifically, it uses Naive Bayes and K-Nearest Neighbors classifiers to analyze historical student data and predict placements. The algorithms consider parameters like academic results, skills, and previous placement data to make predictions. This system aims to help institutions increase placement percentages by identifying students' strengths and areas for improvement. It is intended to benefit both students in preparing for careers and placement cells in targeting support. Accurately predicting placements could boost a school's reputation by demonstrating career outcomes.
This document describes a student comprehensive evaluation system based on data mining methods. The system constructs a student data warehouse containing information on student performance, course selections, and employment outcomes. It then builds a multi-dimensional OLAP model and uses an ID3 classification algorithm to construct a decision tree that can evaluate students' comprehensive quality based on their moral, academic, and extracurricular activity levels. A prototype of the system was developed with interfaces for searching student records and viewing evaluation results. The system aims to provide useful information to help improve student management and guide employment.
Developing Competitive Strategies in Higher Education through Visual Data MiningGurdal Ertek
Information visualization is a growing field of computer science that aims at visually mining data for knowledge discovery. In this paper, a data mining framework and a novel information visualization scheme are developed and applied to the domain of higher education. The presented framework consists of three main types of visual data analysis: discovering general insights, carrying out competitive benchmarking, and planning for High School Relationship Management (HSRM). The framework and the square tiles visualization scheme are described, and an application at a private university in Turkey with the goal of attracting the brightest students is demonstrated.
http://research.sabanciuniv.edu.
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Predictionijtsrd
Data mining techniques play an important role in data analysis. In this research, a decision tree algorithm was used to build a classification model that can predict the performance of students, particularly in engineering branches. A number of factors may affect student performance, and classification algorithms can relate these factors to student grades for prediction. In this paper, educational data mining is used to predict students' final grades based on their performance, with student data classified using the ID3 (Iterative Dichotomiser 3) decision tree algorithm. Khin Khin Lay | San San Nwe, "Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-5, August 2019. URL: https://www.ijtsrd.com/papers/ijtsrd26545.pdf | Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26545/using-id3-decision-tree-algorithm-to-the-student-grade-analysis-and-prediction/khin-khin-lay
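ID3 grows the tree by choosing, at each node, the attribute with the highest information gain. A minimal sketch of that selection step on a toy grade dataset (the attribute names are assumptions, not the paper's features):

```python
# Information gain = entropy of the labels minus the weighted entropy
# remaining after splitting on one attribute.
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_idx, labels):
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[attr_idx], []).append(lab)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder

# toy data: (attendance, assignment) -> final result pass/fail
rows = [("high", "done"), ("high", "done"), ("low", "missed"),
        ("low", "done"), ("high", "missed"), ("low", "missed")]
labels = ["pass", "pass", "fail", "fail", "pass", "fail"]

print(round(info_gain(rows, 0, labels), 3))   # attendance is the better split
```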
PARTICIPATION ANTICIPATING IN ELECTIONS USING DATA MINING METHODSIJCI JOURNAL
Anticipating the political behavior of people would be of considerable help to election candidates in assessing the possibility of their success and understanding the public's motivations for selecting them. In this paper, we provide a general schematic of the architecture of a participation-anticipating system for a presidential election, using KNN, Classification Tree, and Naïve Bayes in the Orange toolkit, based on the CRISP process, which produced promising output. To test and assess the proposed model, we conducted a case study, selecting 100 qualified persons who took part in the 11th presidential election of the Islamic Republic of Iran and anticipating their participation in Kohkiloye & Boyerahmad. We show that KNN performs the anticipation and classification with higher accuracy than the other two algorithms.
An Algorithm Analysis on Data Mining-396Nida Rashid
This document discusses and analyzes six major data mining algorithms: C4.5, k-Means, SVM, Apriori, EM, and PageRank. It provides a brief description of each algorithm, discusses their impact, and reviews current and future research on each one. These six algorithms cover important data mining topics like classification, clustering, statistical learning, association analysis, and link mining.
The main objective of this paper is to develop a basic prototype model that can determine and extract unknown knowledge (patterns, concepts, and relations) involving multiple factors from past database records of specific students. Data mining is the science and engineering of extracting previously undiscovered patterns from a huge set of data. Data mining techniques are helpful for decision making as well as for discovering patterns in data. In this paper, a student eligibility prediction system using rule-based classification is proposed to predict the eligibility of students based on their details with high prediction accuracy. Educational institutes generate a tremendous amount of data, and this paper outlines the idea of predicting a particular student's placement eligibility by performing operations on the stored data. An efficient algorithm using a fuzzy technique is proposed for the prediction.
This document provides an overview of the book "Fuzzy Multi-Criteria Decision Making: Theory and Applications with Recent Developments". It is edited by Cengiz Kahraman and contains 22 chapters on fuzzy multi-criteria decision making (MCDM) methods and applications. The book is divided into two parts, with the first focusing on fuzzy multiple-attribute decision making (MADM) techniques and applications, and the second on fuzzy multiple-objective decision making (MODM) methods. Some of the key MADM and MODM techniques covered include fuzzy analytic hierarchy process, fuzzy TOPSIS, fuzzy outranking methods, fuzzy multi-objective linear programming, and fuzzy multi-objective integer goal programming.
Analysis of Media Graphs and Significant SenseAM Publications
This paper discusses aspects of the interpretation of graphs, including those related to the interface between different contexts of interpretation. In particular, it discusses the notion of Critical Sense as a skill for analysing data and its interrelations rather than simply accepting the initial impression given by a graph. We suggest that Critical Sense in graphing can help readers balance the several aspects involved in interpreting media graphs in particular contexts. The paper also presents initial findings from our analyses of a study of student teachers interpreting media graphs. We are concerned to study Critical Sense in student teachers as a way of helping us, and them, think about teaching and learning graphing in ways that will support the development of Critical Sense.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Linking Behavioral Patterns to Personal Attributes through Data Re-Miningertekg
Download Link >https://ertekprojects.com/gurdal-ertek-publications/blog/linking-behavioral-patterns-to-personal-attributes-through-data-re-mining/
A fundamental challenge in behavioral informatics is the development of methodologies and systems that can achieve its goals and tasks, including behavior pattern analysis. This study presents such a methodology, which can be converted into a decision support system through the appropriate integration of existing tools for association mining and graph visualization. The methodology enables the linking of behavioral patterns to personal attributes through the re-mining of colored association graphs that represent item associations. The methodology is described and mathematically formalized, and is demonstrated in a case study related to the retail industry.
Tips on Setting Up Excel Spreadsheets - Nancy BudererNancy Buderer
The document provides tips for setting up an Excel spreadsheet for research projects in order to facilitate exporting the data into statistical analysis software like SAS or SPSS. Some key tips include: having one row per subject, one column per variable, using a unique subject ID as the first column, using standard variable naming conventions, formatting columns by data type (e.g. numeric, date), entering data as numbers rather than text where possible, always including a code sheet to document meaning of codes, and carefully handling missing data. The document recommends consulting with a statistician before finalizing the spreadsheet design.
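The layout these tips describe — one row per subject, one column per variable, a unique subject ID first, and a separate code sheet giving meaning to numeric codes — can be sketched with the standard csv module; the column names and codes below are illustrative, not from the document:

```python
# One row per subject, one column per variable; a code sheet maps the
# numeric codes stored in the spreadsheet back to their meanings.
import csv, io

data = io.StringIO(
    "subject_id,sex,age,group\n"
    "101,1,34,2\n"
    "102,2,41,1\n"
)
code_sheet = {"sex": {1: "male", 2: "female"},
              "group": {1: "control", 2: "treatment"}}

rows = list(csv.DictReader(data))
decoded = [{k: (code_sheet[k].get(int(v), v) if k in code_sheet else v)
            for k, v in row.items()} for row in rows]
print(decoded[0]["sex"], decoded[0]["group"])
```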
Correlation of artificial neural network classification and nfrs attribute fi...eSAT Journals
Abstract
Roughly 5 to 15% of women of reproductive age face Polycystic Ovarian Syndrome (PCOS), a multifaceted, heterogeneous, and complex disease. Polycystic ovaries, chronic anovulation, and hyperandrogenism, characterized by insulin resistance, cause long-term consequences such as endometrial hyperplasia, type 2 diabetes mellitus, and coronary disease; hypertension, abdominal obesity, dyslipidemia, and hyperinsulinemia together constitute the metabolic syndrome (frequent metabolic traits). These conditions lead to the common problem of anovulatory infertility. Computer-based information along with advanced data mining techniques is used to obtain appropriate results. Classification is a classic data mining task, with roots in machine learning; Naïve Bayes, artificial neural networks, decision trees, and support vector machines are typical classification methods. Feature selection methods involve generation of subsets, evaluation of each subset, criteria for stopping the search, and validation procedures, and the characteristics of the search method used are important to their time efficiency. PCA (Principal Component Analysis), information gain subset evaluation, fuzzy rough set evaluation, and Correlation-based Feature Selection (CFS) are some of the feature selection techniques, while greedy first search, ranker, etc. are search algorithms used in feature selection. In this paper, a new algorithm based on fuzzy-neural subset evaluation and an artificial neural network is proposed, which avoids performing classification and feature selection as separate tasks. The algorithm combines neural fuzzy rough subset evaluation and an artificial neural network for better performance than carrying out the tasks separately.
Keywords: ANN, SVM, PCA, CFS
Choosing Optimization Process In The Event Of Fligh Plan Interruption With Th...IJESM JOURNAL
In the world today, a superior industry is one that balances its capabilities with the demands of its customers. This requires the organization's knowledge of customers and their demands. In this research, a process-oriented approach was proposed for choosing an optimized process by managers in the aviation industry in the event of an interruption of a flight plan. In the first stage, the interrupting factors of a flight plan were described together with solutions to reduce them. Then, the effect of each of these factors on interruption, as well as on the chosen options, was collected from experts using paired-comparison questionnaires, combined by geometric averaging, and the outcomes were analyzed using the network analysis process. The results included the effect of each factor on the interruption of the flight plan together with the weight of each option for decision making. Finally, based on the obtained results, proper decisions to be adopted in the event of an interruption were proposed. This research shows that technical defects and delayed arrival are among the most important interrupting factors of a flight plan, and that notice of delay, cancellation, and replacement of route were among the most important decisions faced by managers in the aviation industry in the event of an interruption of flight.
An approach for discrimination prevention in data miningeSAT Journals
Abstract In the age of database technologies, a large amount of data is collected and analyzed using data mining techniques. However, the main issues in data mining are potential privacy invasion and potential discrimination. One of the techniques used in data mining for decision making is classification; if the dataset is biased, discriminatory decisions may occur. Therefore, in this paper we review recent state-of-the-art approaches to antidiscrimination techniques, focusing on discrimination discovery and prevention in data mining. We also study a theoretical proposal for enhancing the quality of the resulting data. Keywords- Antidiscrimination, data mining, direct and indirect discrimination prevention, rule protection, rule generalization, privacy.
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATIONijaia
Process Mining (PM) emerged from business process management but has recently been applied to
educational data and has been found to facilitate the understanding of the educational process.
Educational Process Mining (EPM) bridges the gap between process analysis and data analysis, based on
the techniques of model discovery, conformance checking and extension of existing process models. We
present a systematic review of the recent and current status of research in the EPM domain, focusing on
application domains, techniques, tools and models, to highlight the use of EPM in comprehending and
improving educational processes.
Regression, theil’s and mlp forecasting models of stock indexiaemedu
This document compares different forecasting models for daily stock prices: linear regression, Theil's incomplete method, and multilayer perceptron (MLP). Principal component analysis was used to reduce the input variables to a single component. Linear regression and Theil's method had similar error rates that were lower than MLP based on MAE, MAPE, and SMAPE. The linear regression and Theil's method models had R-squared values near 1, indicating close fit to the data. Overall, the linear and Theil's models provided more accurate short-term forecasts of daily stock prices than the MLP based on error and fit metrics.
Main points of this slide presentation:
1. What is statistics?
2. Applications
3. Application of statistics in Computer Science and Engineering
4. Machine learning's relation to statistics
5. Application of statistics in data mining
6. Data mining's relation with statistics
7. Outline of applications
8. Some details of the applications are given below
Thank you
Analysis Of Data Mining Model For Successful Implementation Of Data Warehouse...Scott Bou
This document discusses a proposed data mining model for analyzing data from tertiary institutions in order to successfully implement a data warehouse. It begins by introducing data mining and knowledge discovery in databases. Then, it describes using object oriented analysis and design methodology to analyze the current system at Ibrahim Badamasi Babangida University in Nigeria and identify problems. This includes examining the admissions process and how exam records are collected. The goals of the proposed model are then stated as finding relationships between admission results and procedures/lecturers, and allowing predictions with little user intervention.
DIFFERENCE OF PROBABILITY AND INFORMATION ENTROPY FOR SKILLS CLASSIFICATION A...ijaia
The probability of an event is in the range [0, 1]. In a sample space S, the value of probability determines whether an outcome is true or false: the probability of an event A that will never occur is Pr(A) = 0, and the probability of an event B that will certainly occur is Pr(B) = 1, making both events a certainty. Furthermore, the sum of probabilities Pr(E1) + Pr(E2) + … + Pr(En) of a finite set of events in a given sample space S is 1. Conversely, the difference of two probabilities that will each certainly occur is 0. This paper first discusses Bayes' theorem, then the complement of probability and the difference of probability for occurrences of learning events, before applying these to the prediction of learning objects in student learning. Given the sum total of 1, to make recommendations for student learning, this paper submits that the difference between argMaxPr(S) and the probability of student performance quantifies the weight of learning objects for students. Using a skill-set dataset, the computational procedure demonstrates: i) the probability of skill-set events that have occurred, which would lead to higher-level learning; ii) the probability of events that have not occurred, which require subject-matter relearning; iii) the accuracy of a decision tree in predicting student performance into class labels; and iv) information entropy of the skill-set data and its implications for student cognitive performance and the recommendation of learning [1].
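The Bayes' theorem step the paper builds on can be checked numerically; the skill-mastery probabilities below are illustrative assumptions, not values from the dataset:

```python
# P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|~A)P(~A)]
# Here: A = "student has mastered the skill", B = "student answers correctly".

def bayes(prior_a, p_b_given_a, p_b_given_not_a):
    num = p_b_given_a * prior_a
    den = num + p_b_given_not_a * (1 - prior_a)
    return num / den

# P(mastered) = 0.3, P(correct | mastered) = 0.9, P(correct | not mastered) = 0.2
posterior = bayes(0.3, 0.9, 0.2)
print(round(posterior, 4))
```

Note that the prior and its complement sum to 1, matching the complement-of-probability identity the paper uses.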
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryIJERA Editor
The immense volumes of data are populated into repositories from various applications. In order to find desired information and knowledge in large datasets, data mining techniques are very helpful. Classification is one of the knowledge discovery techniques; within classification, decision trees are very popular in the research community due to their simplicity and easy comprehensibility. This paper presents an updated review of recent developments in the field of decision trees.
FAMILY OF 2-SIMPLEX COGNITIVE TOOLS AND THEIR APPLICATIONS FOR DECISION-MAKIN...cscpconf
The urgency of the application and development of cognitive graphic tools for use in intelligent systems for data analysis, decision making, and its justification is discussed. The cognitive graphic tool "2-simplex prism" and examples of its usage are presented. The specifics of the program realization of cognitive graphics tools invariant to problem areas are described. The most significant results are given and discussed. Future investigations concern a new approach to rendering, cross-platform realization, improving cognitive features, and expanding the n-simplex family.
Application of Exponential Gamma Distribution in Modeling Queuing Dataijtsrd
There are many events in daily life where a queue is formed. Queuing theory is the study of waiting lines, and it is crucial in analyzing queuing in the daily life of human beings. Queuing theory applies not only to day-to-day life but also to computer programming, networks, the medical field, banking sectors, etc. Researchers have applied many statistical distributions in analyzing queuing data. In this study, we apply a new distribution, named the Exponential-Gamma distribution, in fitting data on the waiting time of bank customers before service is rendered. We compared the adequacy and performance of the results with other existing statistical distributions. The result shows that the Exponential-Gamma distribution is adequate and also performed better than the existing distributions. Ayeni Taiwo Michael | Ogunwale Olukunle Daniel | Adewusi Oluwasesan Adeoye, "Application of Exponential-Gamma Distribution in Modeling Queuing Data", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-2, February 2020,
URL: https://www.ijtsrd.com/papers/ijtsrd30097.pdf
Paper Url : https://www.ijtsrd.com/mathemetics/statistics/30097/application-of-exponential-gamma-distribution-in-modeling-queuing-data/ayeni-taiwo-michael
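The fitting-and-comparison idea can be sketched with a plain exponential model scored by log-likelihood; the paper's Exponential-Gamma distribution itself is not implemented here, and the waiting times are illustrative:

```python
# Maximum-likelihood fit of an exponential model to waiting times.
# Log-likelihood serves as the adequacy score for comparing candidate
# distributions; higher is better.
import math

def exp_mle_rate(waits):
    return len(waits) / sum(waits)          # MLE: lambda = 1 / mean wait

def exp_loglik(waits, rate):
    return sum(math.log(rate) - rate * w for w in waits)

waits = [2.1, 0.7, 3.4, 1.2, 0.5, 2.8, 1.9, 0.9]   # minutes, illustrative
lam = exp_mle_rate(waits)
print(round(lam, 4), round(exp_loglik(waits, lam), 4))
```

Any alternative distribution would be fitted the same way and compared by its log-likelihood (or AIC) on the same data.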
Fuzzy Association Rule Mining based Model to Predict Students’ Performance IJECEIAES
The major intention of higher education institutions is to supply quality education to their students. One approach to achieving the maximum level of quality in a higher education system is discovering knowledge for prediction regarding internal assessment and end-semester examinations. The projected work approaches this objective by taking advantage of fuzzy inference to classify student score data according to the level of performance. In this paper, students' performance is evaluated using fuzzy association rule mining, which predicts the performance of students at the end of the semester on the basis of previous records such as attendance, mid-semester marks, previous semester marks, and previous academic records collected from the students' database, in order to identify those students who need individual attention, decrease the failure ratio, and take suitable action before the next semester examination.
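The fuzzification step such rule mining relies on can be sketched with triangular membership functions that turn a crisp mark into degrees of "low"/"medium"/"high" performance; the breakpoints below are illustrative assumptions:

```python
# Triangular membership: 0 outside [a, c], rising linearly to 1 at the
# peak b, then falling back to 0. A mark can belong partially to several
# performance categories at once.

def tri(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(mark):
    return {
        "low": tri(mark, -1, 0, 50),
        "medium": tri(mark, 30, 55, 80),
        "high": tri(mark, 60, 100, 101),
    }

m = fuzzify(65)
print(m)   # a mark of 65 is mostly "medium", partly "high"
```

Association rules are then mined over these membership degrees instead of crisp categories.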
Educational data mining is a growing trend in higher education. The quality of an educational institute may be enhanced by discovering hidden knowledge in student databases/data warehouses. The present paper carries out a comparative study of the TDC (Three Year Degree Course) students of different colleges affiliated to Dibrugarh University. The study is conducted major-subject wise, gender wise, and category/caste wise. The experimental results may be visualized with Scatterplot3D, Bubble Plot, Fit Y by X, Run Chart, Control Chart, etc. of the SAS JMP software.
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
This document compares three classification methods - artificial neural networks, decision trees, and logistic regression - for predicting malignancy in thyroid tumor patients using a clinical dataset. It describes each method and applies them to a dataset of 259 thyroid tumor patients. The artificial neural network achieved 98% accuracy on the training set and 92% on the validation set. The decision tree method used 150 cases to build a model and achieved 86% accuracy. Logistic regression analysis resulted in 88% accuracy. The artificial neural network was able to accurately predict malignancy and identified important attributes like multiple nodules and family cancer history.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive environment due to high demand from prospective students and an increase in universities, both public and private. University management faces the challenge of predicting students' academic performance early enough to put mechanisms in place for their improvement. This research aims at employing decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance, so as to devise mechanisms for alleviating student dropout rates and improving performance. In Kenya, for example, there has been an increase in students enrolling in universities since the Government introduced free primary education. The Government therefore expects an increased workforce of professionals from these institutions, without compromising quality, so as to achieve its millennium development goals and Vision 2030. A backlog of students not finishing their studies in the stipulated time due to poor performance is another issue that can be addressed by the results of this research, since predicting student performance in advance will enable university management to devise ways of assisting weak students and to make better decisions on how to select students for particular courses. Previous studies in educational data mining have mostly focused on factors affecting students' performance and have used different algorithms to predict it; in all of these, prediction accuracy is key and is what researchers seek to improve.
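The clustering half of the proposed hybrid can be sketched with a tiny Lloyd's k-means on one-dimensional average scores, grouping students into performance bands before a decision tree would be trained; the scores and initial centers below are illustrative:

```python
# Lloyd's algorithm in 1-D: assign each score to the nearest center,
# then move each center to the mean of its group, and repeat.

def kmeans_1d(xs, centers, iters=10):
    for _ in range(iters):
        groups = {c: [] for c in centers}
        for x in xs:
            nearest = min(centers, key=lambda c: abs(c - x))
            groups[nearest].append(x)
        centers = [sum(g) / len(g) if g else c for c, g in groups.items()]
    return sorted(centers)

scores = [35, 40, 42, 55, 58, 62, 80, 85, 88]       # average student scores
bands = kmeans_1d(scores, [30.0, 60.0, 90.0])       # weak / average / strong
print(bands)
```

In the hybrid approach, each resulting band would then feed a decision tree that predicts individual outcomes within it.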
Abstract In this paper, the concept of data mining is summarized and its significance for the associated methodologies is illustrated. Data mining based on neural networks and genetic algorithms is researched in detail, and the key technologies and ways to achieve data mining with neural networks and genetic algorithms are surveyed. The paper also conducts a formal review of the area of rule extraction from ANNs and GAs. Keywords: Data Mining, Neural Network, Genetic Algorithm, Rule Extraction.
This paper discusses the several research methodologies that can be used in Computer Science (CS) and Information Systems (IS). Research methods vary according to the science domain and project field; however, only a few research methodologies are suitable for both Computer Science and Information Systems.
An overlapping conscious relief-based feature subset selection methodIJECEIAES
Feature selection is considered a fundamental preprocessing step in various data mining and machine learning works. The quality of features is essential to achieving good classification performance and a better data analysis experience. Among several feature selection methods, distance-based methods are gaining popularity because of their ability to capture feature interdependency and relevancy with the endpoints. However, most distance-based methods only rank the features and ignore class overlapping issues; features with class-overlapping data act as an obstacle during classification. Therefore, the objective of this research work is to propose a method named overlapping-conscious MultiSURF (OMsurf) to handle data overlapping and select a subset of informative features while discarding noisy ones. Experimental results over 20 benchmark datasets demonstrate the superiority of OMsurf over six existing state-of-the-art methods.
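The distance-based Relief-style scoring that OMsurf builds on can be sketched as follows: a feature's weight rises when it differs at an instance's nearest miss (opposite class) and falls when it differs at the nearest hit (same class). The dataset is illustrative, and this is plain Relief, not OMsurf itself:

```python
# Relief scoring: for each instance, find its nearest same-class neighbour
# (hit) and nearest other-class neighbour (miss), and update each feature's
# weight by |x - miss| - |x - hit| in that feature.
import math

def relief(data, labels):
    n_feat = len(data[0])
    w = [0.0] * n_feat
    for i, x in enumerate(data):
        others = [(math.dist(x, y), y, lab)
                  for j, (y, lab) in enumerate(zip(data, labels)) if j != i]
        hit = min((d, y) for d, y, lab in others if lab == labels[i])[1]
        miss = min((d, y) for d, y, lab in others if lab != labels[i])[1]
        for f in range(n_feat):
            w[f] += abs(x[f] - miss[f]) - abs(x[f] - hit[f])
    return w

data = [(0.1, 0.9), (0.2, 0.1), (0.15, 0.85),
        (0.9, 0.2), (0.85, 0.8), (0.95, 0.15)]
labels = ["a", "a", "a", "b", "b", "b"]
w = relief(data, labels)
print(w[0] > w[1])   # feature 0 separates the classes; feature 1 is noisy
```

OMsurf extends this kind of scoring with multiple neighbours and an explicit treatment of class-overlapping regions.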
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...Editor IJCATR
In this paper we focus on some techniques for solving data mining tasks such as: Statistics, Decision Trees and Neural
Networks. The new approach has succeed in defining some new criteria for the evaluation process, and it has obtained valuable results
based on what the technique is, the environment of using each techniques, the advantages and disadvantages of each technique, the
consequences of choosing any of these techniques to extract hidden predictive information from large databases, and the methods of
implementation of each technique. Finally, the paper has presented some valuable recommendations in this field.
Analyzing undergraduate students’ performance in various perspectives using d...Alexander Decker
This document discusses analyzing undergraduate student performance using data mining techniques. It analyzes student performance in two perspectives: 1) supervised vs unsupervised assessment instruments and 2) performance in mathematics, English, and programming courses. The study uses association rule mining with the Apriori algorithm to discover patterns in student performance data from both analyses. The goal is to identify useful insights that can help improve assessment methods, curriculum structure, and course prerequisites.
Similar to Applying statistical dependency analysis techniques In a Data mining Domain (20)
The Use of Java Swing’s Components to Develop a WidgetWaqas Tariq
Widget is a kind of application provides a single service such as a map, news feed, simple clock, battery-life indicators, etc. This kind of interactive software object has been developed to facilitate user interface (UI) design. A user interface (UI) function may be implemented using different widgets with the same function. In this article, we present the widget as a platform that is generally used in various applications, such as in desktop, web browser, and mobile phone. We also describe a visual menu of Java Swing’s components that will be used to establish widget. It will assume that we have successfully compiled and run a program that uses Swing components.
3D Human Hand Posture Reconstruction Using a Single 2D ImageWaqas Tariq
Passive sensing of the 3D geometric posture of the human hand has been studied extensively over the past decade. However, these research efforts have been hampered by the computational complexity caused by inverse kinematics and 3D reconstruction. In this paper, our objective focuses on 3D hand posture estimation based on a single 2D image with aim of robotic applications. We introduce the human hand model with 27 degrees of freedom (DOFs) and analyze some of its constraints to reduce the DOFs without any significant degradation of performance. A novel algorithm to estimate the 3D hand posture from eight 2D projected feature points is proposed. Experimental results using real images confirm that our algorithm gives good estimates of the 3D hand pose. Keywords: 3D hand posture estimation; Model-based approach; Gesture recognition; human- computer interface; machine vision.
Camera as Mouse and Keyboard for Handicap Person with Troubleshooting Ability...Waqas Tariq
Camera mouse has been widely used for handicap person to interact with computer. The utmost important of the use of camera mouse is must be able to replace all roles of typical mouse and keyboard. It must be able to provide all mouse click events and keyboard functions (include all shortcut keys) when it is used by handicap person. Also, the use of camera mouse must allow users troubleshooting by themselves. Moreover, it must be able to eliminate neck fatigue effect when it is used during long period. In this paper, we propose camera mouse system with timer as left click event and blinking as right click event. Also, we modify original screen keyboard layout by add two additional buttons (button “drag/ drop” is used to do drag and drop of mouse events and another button is used to call task manager (for troubleshooting)) and change behavior of CTRL, ALT, SHIFT, and CAPS LOCK keys in order to provide shortcut keys of keyboard. Also, we develop recovery method which allows users go from camera and then come back again in order to eliminate neck fatigue effect. The experiments which involve several users have been done in our laboratory. The results show that the use of our camera mouse able to allow users do typing, left and right click events, drag and drop events, and troubleshooting without hand. By implement this system, handicap person can use computer more comfortable and reduce the dryness of eyes.
A Proposed Web Accessibility Framework for the Arab DisabledWaqas Tariq
The Web is providing unprecedented access to information and interaction for people with disabilities. This paper presents a Web accessibility framework which offers the ease of the Web accessing for the disabled Arab users and facilitates their lifelong learning as well. The proposed framework system provides the disabled Arab user with an easy means of access using their mother language so they don’t have to overcome the barrier of learning the target-spoken language. This framework is based on analyzing the web page meta-language, extracting its content and reformulating it in a suitable format for the disabled users. The basic objective of this framework is supporting the equal rights of the Arab disabled people for their access to the education and training with non disabled people. Key Words : Arabic Moon code, Arabic Sign Language, Deaf, Deaf-blind, E-learning Interactivity, Moon code, Web accessibility , Web framework , Web System, WWW.
Real Time Blinking Detection Based on Gabor FilterWaqas Tariq
The document proposes a new method for real-time blinking detection based on Gabor filters. It begins by reviewing existing methods and their limitations in dealing with noise, variations in eye shape, and blinking speed. The proposed method uses a Gabor filter to extract the top and bottom arcs of the eye from an image. It then measures the distance between these arcs and compares it to a threshold: a distance below the threshold indicates a closed eye, while a distance above indicates an open eye. The document claims this Gabor filter-based approach is robust to noise, variations in eye shape and blinking speed. It presents experimental results showing the method can accurately detect blinking across different users.
Computer Input with Human Eyes-Only Using Two Purkinje Images Which Works in ...Waqas Tariq
A method for computer input with human eyes-only using two Purkinje images which works in a real time basis without calibration is proposed. Experimental results shows that cornea curvature can be estimated by using two light sources derived Purkinje images so that no calibration for reducing person-to-person difference of cornea curvature. It is found that the proposed system allows usersf movements of 30 degrees in roll direction and 15 degrees in pitch direction utilizing detected face attitude which is derived from the face plane consisting three feature points on the face, two eyes and nose or mouth. Also it is found that the proposed system does work in a real time basis.
Toward a More Robust Usability concept with Perceived Enjoyment in the contex...Waqas Tariq
Mobile multimedia service is relatively new but has quickly dominated people¡¯s lives, especially among young people. To explain this popularity, this study applies and modifies the Technology Acceptance Model (TAM) to propose a research model and conduct an empirical study. The goal of study is to examine the role of Perceived Enjoyment (PE) and what determinants can contribute to PE in the context of using mobile multimedia service. The result indicates that PE is influencing on Perceived Usefulness (PU) and Perceived Ease of Use (PEOU) and directly Behavior Intention (BI). Aesthetics and flow are key determinants to explain Perceived Enjoyment (PE) in mobile multimedia usage.
Collaborative Learning of Organisational KnolwedgeWaqas Tariq
This paper presents recent research into methods used in Australian Indigenous Knowledge sharing and looks at how these can support the creation of suitable collaborative envi- ronments for timely organisational learning. The protocols and practices as used today and in the past by Indigenous communities are presented and discussed in relation to their relevance to a personalised system of knowledge sharing in modern organisational cultures. This research focuses on user models, knowledge acquisition and integration of data for constructivist learning in a networked repository of or- ganisational knowledge. The data collected in the repository is searched to provide collections of up-to-date and relevant material for training in a work environment. The aim is to improve knowledge collection and sharing in a team envi- ronment. This knowledge can then be collated into a story or workflow that represents the present knowledge in the organisation.
Our research aims to propose a global approach for specification, design and verification of context awareness Human Computer Interface (HCI). This is a Model Based Design approach (MBD). This methodology describes the ubiquitous environment by ontologies. OWL is the standard used for this purpose. The specification and modeling of Human-Computer Interaction are based on Petri nets (PN). This raises the question of representation of Petri nets with XML. We use for this purpose, the standard of modeling PNML. In this paper, we propose an extension of this standard for specification, generation and verification of HCI. This extension is a methodological approach for the construction of PNML with Petri nets. The design principle uses the concept of composition of elementary structures of Petri nets as PNML Modular. The objective is to obtain a valid interface through verification of properties of elementary Petri nets represented with PNML.
Development of Sign Signal Translation System Based on Altera’s FPGA DE2 BoardWaqas Tariq
The main aim of this paper is to build a system that is capable of detecting and recognizing the hand gesture in an image captured by using a camera. The system is built based on Altera’s FPGA DE2 board, which contains a Nios II soft core processor. Image processing techniques and a simple but effective algorithm are implemented to achieve this purpose. Image processing techniques are used to smooth the image in order to ease the subsequent processes in translating the hand sign signal. The algorithm is built for translating the numerical hand sign signal and the result are displayed on the seven segment display. Altera’s Quartus II, SOPC Builder and Nios II EDS software are used to construct the system. By using SOPC Builder, the related components on the DE2 board can be interconnected easily and orderly compared to traditional method that requires lengthy source code and time consuming. Quartus II is used to compile and download the design to the DE2 board. Then, under Nios II EDS, C programming language is used to code the hand sign translation algorithm. Being able to recognize the hand sign signal from images can helps human in controlling a robot and other applications which require only a simple set of instructions provided a CMOS sensor is included in the system.
An overview on Advanced Research Works on Brain-Computer InterfaceWaqas Tariq
A brain–computer interface (BCI) is a proficient result in the research field of human- computer synergy, where direct articulation between brain and an external device occurs resulting in augmenting, assisting and repairing human cognitive. Advanced works like generating brain-computer interface switch technologies for intermittent (or asynchronous) control in natural environments or developing brain-computer interface by Fuzzy logic Systems or by implementing wavelet theory to drive its efficacies are still going on and some useful results has also been found out. The requirements to develop this brain machine interface is also growing day by day i.e. like neuropsychological rehabilitation, emotion control, etc. An overview on the control theory and some advanced works on the field of brain machine interface are shown in this paper.
Exploring the Relationship Between Mobile Phone and Senior Citizens: A Malays...Waqas Tariq
There is growing ageing phenomena with the rise of ageing population throughout the world. According to the World Health Organization (2002), the growing ageing population indicates 694 million, or 223% is expected for people aged 60 and over, since 1970 and 2025.The growth is especially significant in some advanced countries such as North America, Japan, Italy, Germany, United Kingdom and so forth. This growing older adult population has significantly impact the social-culture, lifestyle, healthcare system, economy, infrastructure and government policy of a nation. However, there are limited research studies on the perception and usage of a mobile phone and its service for senior citizens in a developing nation like Malaysia. This paper explores the relationship between mobile phones and senior citizens in Malaysia from the perspective of a developing country. We conducted an exploratory study using contextual interviews with 5 senior citizens of how they perceive their mobile phones. This paper reveals 4 interesting themes from this preliminary study, in addition to the findings of the desirable mobile requirements for local senior citizens with respect of health, safety and communication purposes. The findings of this study bring interesting insight to local telecommunication industries as a whole, and will also serve as groundwork for more in-depth study in the future.
Principles of Good Screen Design in WebsitesWaqas Tariq
Visual techniques for proper arrangement of the elements on the user screen have helped the designers to make the screen look good and attractive. Several visual techniques emphasize the arrangement and ordering of the screen elements based on particular criteria for best appearance of the screen. This paper investigates few significant visual techniques in various web user interfaces and showcases the results for better understanding and their presence.
This document discusses the progress of virtual teams in Albania. It provides context on virtual teams and how they differ from traditional teams in their reliance on technology for communication across distances. The document then examines the use of virtual teams in Albania, noting the growing infrastructure and technology usage that enables virtual collaboration. It highlights some virtual team examples in Albanian government and academic projects.
Cognitive Approach Towards the Maintenance of Web-Sites Through Quality Evalu...Waqas Tariq
It is a well established fact that the Web-Applications require frequent maintenance because of cutting– edge business competitions. The authors have worked on quality evaluation of web-site of Indian ecommerce domain. As a result of that work they have made a quality-wise ranking of these sites. According to their work and also the survey done by various other groups Futurebazaar web-site is considered to be one of the best Indian e-shopping sites. In this research paper the authors are assessing the maintenance of the same site by incorporating the problems incurred during this evaluation. This exercise gives a real world maintainability problem of web-sites. This work will give a clear picture of all the quality metrics which are directly or indirectly related with the maintainability of the web-site.
USEFul: A Framework to Mainstream Web Site Usability through Automated Evalua...Waqas Tariq
A paradox has been observed whereby web site usability is proven to be an essential element in a web site, yet at the same time there exist an abundance of web pages with poor usability. This discrepancy is the result of limitations that are currently preventing web developers in the commercial sector from producing usable web sites. In this paper we propose a framework whose objective is to alleviate this problem by automating certain aspects of the usability evaluation process. Mainstreaming comes as a result of automation, therefore enabling a non-expert in the field of usability to conduct the evaluation. This results in reducing the costs associated with such evaluation. Additionally, the framework allows the flexibility of adding, modifying or deleting guidelines without altering the code that references them since the guidelines and the code are two separate components. A comparison of the evaluation results carried out using the framework against published evaluations of web sites carried out by web site usability professionals reveals that the framework is able to automatically identify the majority of usability violations. Due to the consistency with which it evaluates, it identified additional guideline-related violations that were not identified by the human evaluators.
Robot Arm Utilized Having Meal Support System Based on Computer Input by Huma...Waqas Tariq
A robot arm utilized having meal support system based on computer input by human eyes only is proposed. The proposed system is developed for handicap/disabled persons as well as elderly persons and tested with able persons with several shapes and size of eyes under a variety of illumination conditions. The test results with normal persons show the proposed system does work well for selection of the desired foods and for retrieve the foods as appropriate as usersf requirements. It is found that the proposed system is 21% much faster than the manually controlled robotics.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
In recent decades speech interactive systems have gained increasing importance. Performance of an ASR system mainly depends on the availability of large corpus of speech. The conventional method of building a large vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires large speech corpus with sentence or phoneme level transcription of the speech utterances. The transcriptions must also include different speech order so that the recognizer can build models for all the sounds present. But, for Telugu language, because of its complex nature, a very large, well annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands and millions of word forms. A significant part of grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu. Phrases including several words (that is, tokens) in English would be mapped on to a single word in Telugu.Telugu language is phonetic in nature in addition to rich in morphology. That is why the speech technology developed for English cannot be applied to Telugu language. This paper highlights the work carried out in an attempt to build a voice enabled text editor with capability of automatic term suggestion. Main claim of the paper is the recognition enhancement process developed by us for suitability of highly inflecting, rich morphological languages. This method results in increased speech recognition accuracy with very much reduction in corpus size. It also adapts Telugu words to the database dynamically, resulting in growth of the corpus.
An Improved Approach for Word Ambiguity RemovalWaqas Tariq
Word ambiguity removal is a task of removing ambiguity from a word, i.e. correct sense of word is identified from ambiguous sentences. This paper describes a model that uses Part of Speech tagger and three categories for word sense disambiguation (WSD). Human Computer Interaction is very needful to improve interactions between users and computers. For this, the Supervised and Unsupervised methods are combined. The WSD algorithm is used to find the efficient and accurate sense of a word based on domain information. The accuracy of this work is evaluated with the aim of finding best suitable domain of word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word sense disambiguation
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Waqas Tariq
From the existing research it has been observed that many techniques and methodologies are available for performing every step of Automatic Speech Recognition (ASR) system, but the performance (Minimization of Word Error Recognition-WER and Maximization of Word Accuracy Rate- WAR) of the methodology is not dependent on the only technique applied in that method. The research work indicates that, performance mainly depends on the category of the noise, the level of the noise and the variable size of the window, frame, frame overlap etc is considered in the existing methods. The main aim of the work presented in this paper is to use variable size of parameters like window size, frame size and frame overlap percentage to observe the performance of algorithms for various categories of noise with different levels and also train the system for all size of parameters and category of real world noisy environment to improve the performance of the speech recognition system. This paper presents the results of Signal-to-Noise Ratio (SNR) and Accuracy test by applying variable size of parameters. It is observed that, it is really very hard to evaluate test results and decide parameter size for ASR performance improvement for its resultant optimization. Hence, this study further suggests the feasible and optimum parameter size using Fuzzy Inference System (FIS) for enhancing resultant accuracy in adverse real world noisy environmental conditions. This work will be helpful to give discriminative training of ubiquitous ASR system for better Human Computer Interaction (HCI). Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, ubiquitous ASR system, Human Computer Interaction (HCI)
Leveraging Generative AI to Drive Nonprofit InnovationTechSoup
In this webinar, participants learned how to utilize Generative AI to streamline operations and elevate member engagement. Amazon Web Service experts provided a customer specific use cases and dived into low/no-code tools that are quick and easy to deploy through Amazon Web Service (AWS.)
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organisation by the Excellence Foundation for South Sudan on 08th and 09th June 2024 from 1 PM to 3 PM on each day.
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumMJDuyan
(𝐓𝐋𝐄 𝟏𝟎𝟎) (𝐋𝐞𝐬𝐬𝐨𝐧 𝟏)-𝐏𝐫𝐞𝐥𝐢𝐦𝐬
𝐃𝐢𝐬𝐜𝐮𝐬𝐬 𝐭𝐡𝐞 𝐄𝐏𝐏 𝐂𝐮𝐫𝐫𝐢𝐜𝐮𝐥𝐮𝐦 𝐢𝐧 𝐭𝐡𝐞 𝐏𝐡𝐢𝐥𝐢𝐩𝐩𝐢𝐧𝐞𝐬:
- Understand the goals and objectives of the Edukasyong Pantahanan at Pangkabuhayan (EPP) curriculum, recognizing its importance in fostering practical life skills and values among students. Students will also be able to identify the key components and subjects covered, such as agriculture, home economics, industrial arts, and information and communication technology.
𝐄𝐱𝐩𝐥𝐚𝐢𝐧 𝐭𝐡𝐞 𝐍𝐚𝐭𝐮𝐫𝐞 𝐚𝐧𝐝 𝐒𝐜𝐨𝐩𝐞 𝐨𝐟 𝐚𝐧 𝐄𝐧𝐭𝐫𝐞𝐩𝐫𝐞𝐧𝐞𝐮𝐫:
-Define entrepreneurship, distinguishing it from general business activities by emphasizing its focus on innovation, risk-taking, and value creation. Students will describe the characteristics and traits of successful entrepreneurs, including their roles and responsibilities, and discuss the broader economic and social impacts of entrepreneurial activities on both local and global scales.
How to Setup Warehouse & Location in Odoo 17 InventoryCeline George
In this slide, we'll explore how to set up warehouses and locations in Odoo 17 Inventory. This will help us manage our stock effectively, track inventory levels, and streamline warehouse operations.
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...PECB
Denis is a dynamic and results-driven Chief Information Officer (CIO) with a distinguished career spanning information systems analysis and technical project management. With a proven track record of spearheading the design and delivery of cutting-edge Information Management solutions, he has consistently elevated business operations, streamlined reporting functions, and maximized process efficiency.
Certified as an ISO/IEC 27001: Information Security Management Systems (ISMS) Lead Implementer, Data Protection Officer, and Cyber Risks Analyst, Denis brings a heightened focus on data security, privacy, and cyber resilience to every endeavor.
His expertise extends across a diverse spectrum of reporting, database, and web development applications, underpinned by an exceptional grasp of data storage and virtualization technologies. His proficiency in application testing, database administration, and data cleansing ensures seamless execution of complex projects.
What sets Denis apart is his comprehensive understanding of Business and Systems Analysis technologies, honed through involvement in all phases of the Software Development Lifecycle (SDLC). From meticulous requirements gathering to precise analysis, innovative design, rigorous development, thorough testing, and successful implementation, he has consistently delivered exceptional results.
Throughout his career, he has taken on multifaceted roles, from leading technical project management teams to owning solutions that drive operational excellence. His conscientious and proactive approach is unwavering, whether he is working independently or collaboratively within a team. His ability to connect with colleagues on a personal level underscores his commitment to fostering a harmonious and productive workplace environment.
Date: May 29, 2024
Tags: Information Security, ISO/IEC 27001, ISO/IEC 42001, Artificial Intelligence, GDPR
-------------------------------------------------------------------------------
Find out more about ISO training and certification services
Training: ISO/IEC 27001 Information Security Management System - EN | PECB
ISO/IEC 42001 Artificial Intelligence Management System - EN | PECB
General Data Protection Regulation (GDPR) - Training Courses - EN | PECB
Webinars: https://pecb.com/webinars
Article: https://pecb.com/article
-------------------------------------------------------------------------------
For more information about PECB:
Website: https://pecb.com/
LinkedIn: https://www.linkedin.com/company/pecb/
Facebook: https://www.facebook.com/PECBInternational/
Slideshare: http://www.slideshare.net/PECBCERTIFICATION
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPRAHUL
This Dissertation explores the particular circumstances of Mirzapur, a region located in the
core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal
environment for investigating the changes in vegetation cover dynamics. Our study utilizes
advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to
analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and worry. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet.Land serves as the foundation for all human activities and provides the necessary materials for
these activities. As the most crucial natural resource, its utilization by humans results in different
'Land uses,' which are determined by both human activities and the physical characteristics of the
land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, evolving its structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information regarding land use and
cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, policy purposes, and
diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of Advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels. Advanced technologies like
Remote Sensing and Geographic Information Systems
9
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur natural.
it describes the bony anatomy including the femoral head , acetabulum, labrum . also discusses the capsule , ligaments . muscle that act on the hip joint and the range of motion are outlined. factors affecting hip joint stability and weight transmission through the joint are summarized.
How to Make a Field Mandatory in Odoo 17Celine George
In Odoo, making a field required can be done through both Python code and XML views. When you set the required attribute to True in Python code, it makes the field required across all views where it's used. Conversely, when you set the required attribute in XML views, it makes the field required only in the context of that particular view.
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxEduSkills OECD
Iván Bornacelly, Policy Analyst at the OECD Centre for Skills, OECD, presents at the webinar 'Tackling job market gaps with a skills-first approach' on 12 June 2024
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Applying Statistical Dependency Analysis Techniques in a Data Mining Domain
Sudheep Elayidom.M, Sumam Mary Idikkula & Joseph Alexander
International Journal Of Data Engineering (IJDE), Volume (1): Issue (2) 14
Applying Statistical Dependency Analysis Techniques
In a Data Mining Domain
Sudheep Elayidom sudheepelayidom@hotmail.com
Computer Science and Engineering Division
School of Engineering
Cochin University of Science and Technology
Kochi, 682022, India
Sumam Mary Idikkula sumam@cusat.ac.in
Department of Computer Science
Cochin University of Science and Technology
Kochi, 682022, India
Joseph Alexander josephalexander@cusat.ac.in
Project Officer, Nodal Centre
Cochin University of Science and Technology
Kochi, 682022, India
Abstract
Making a wise career decision is crucial for everyone, and modern decision-support
tools such as data mining can help people make well-informed choices. This paper
is an attempt to help prospective students make wise career decisions using data
mining techniques. In India, technical manpower analysis is carried out by the
National Technical Manpower Information System (NTMIS), established in 1983-84
by India's Ministry of Education & Culture. The NTMIS comprises a lead centre
at the IAMR, New Delhi, and 21 nodal centres located in different parts of the
country; the Kerala State Nodal Centre is located at the Cochin University of
Science and Technology. Information covering the last four years was obtained
from this Nodal Centre (located at CUSAT, Kochi, India), which collects records
of all students passing out from the various technical colleges in Kerala State
through postal questionnaires. The analysis is based on Entrance Rank, Branch,
Gender (M/F), Sector (rural/urban) and Reservation (OBC/SC/ST/GEN).
Key Words: Confusion matrix, Data Mining, Decision tree, Neural Network, Chi-square
1. INTRODUCTION
The popularity of science and engineering subjects in colleges around the world depends to a
large extent on the viability of securing a job in the corresponding field of study. Appropriating
funding for students from various sections of society is a major decision-making hurdle, particu-
larly in developing countries. An educational institution holds a large number of student records.
This data is a wealth of information, but is too large for any one person to understand in its
entirety. Finding patterns and characteristics in this data is an essential task in education
research, and is part of the larger task of developing programs that increase student learning.
This type of data is presented to decision makers in the state government in the form of tables or
charts, and without any substantive analysis, most interpretation of the data is done according to
individual intuition, or is based on prior research. This paper analyzes the trends of placements in
the colleges, taking account of details like rank, sex, category and location, using models such as
the naive Bayes classifier, decision trees and neural networks. The chi-square test, often short-
hand for Pearson's chi-square test, is a statistical hypothesis test. SPSS (Statistical Package for
the Social Sciences) is among the most widely used programs for statistical analysis. The data
preprocessing for this problem has been described in detail in articles [1] and [9], which are
papers published by the same authors. The problem of placement chance prediction may be
implemented using decision trees. [4] surveys work on decision tree construction, attempting to
identify the important issues involved, the directions the work has taken, and the current state of
the art. Studies have been conducted in similar areas, such as understanding student data as in
[2], where a decision tree algorithm is applied and evaluated on university records, producing
graphs that are useful both for predicting graduation and for finding factors that lead to
graduation. There has always been an active debate over which engineering branch is in
demand, and this work gives a scientific approach to answering that question. Article [3] provides
an overview of this emerging field, clarifying how data mining and knowledge discovery in
databases are related both to each other and to related fields, such as machine learning,
statistics, and databases.
[5] suggests methods to classify objects or predict outcomes by selecting, from a large number of
variables, the most important ones in determining the outcome variable. The method in [6] is used
for performance evaluation of the system using a confusion matrix, which contains information
about the actual and predicted classifications produced by a classification system. [7] and [8]
suggest further improvements in obtaining the various measures for evaluating a classification
model. [10] suggests a data mining approach to student results data, while [11] and [12] present
association rule techniques in the data mining domain.
2. DATA
The data used in this project was supplied by the National Technical Manpower Information
System (NTMIS) via its Nodal Centre. It was compiled from feedback given by graduates, post-
graduates and diploma holders in engineering from the various engineering colleges and poly-
technics located within the state during the years 2000-2003. This survey of technical manpower
information was originally done by the Board of Apprenticeship Training (BOAT) for various
individual establishments. A prediction model is prepared from the data of the years 2000-2002
and tested with data from the year 2003.
3. PROBLEM STATEMENT
To prepare data mining models and predict placement chances for students, taking account of
input details such as Rank, Gender, Branch, Category, Reservation and Sector. Statistical
dependency analysis techniques are to be applied to the input attributes to determine which
attributes the placement chances depend on. The performances of the models are also to be
compared.
4. CONCEPTS USED
4.1. Data Mining
Data mining is the principle of searching through large amounts of data and picking out interesting
patterns. It is usually used by business intelligence organizations and financial analysts, but it
is increasingly used in the sciences to extract information from the enormous data sets generated
by modern experimental and observational methods. It has been described as "the nontrivial
extraction of implicit, previously unknown, and potentially useful information from data" and "the
science of extracting useful information from large data sets or databases".
A typical example for a data mining scenario may be “In a mining analysis if it is observed that
people who buy butter tend to buy bread too then for better business results the seller can place
butter and bread together.”
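The butter-and-bread observation can be quantified with the usual support and confidence measures of association mining. A minimal sketch, using invented toy transactions (the data below is purely illustrative):

```python
# Toy market-basket data (hypothetical) illustrating the butter -> bread rule.
transactions = [
    {"butter", "bread", "milk"},
    {"butter", "bread"},
    {"bread", "eggs"},
    {"butter", "bread", "eggs"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"butter", "bread"}, transactions))      # 3/5 = 0.6
print(confidence({"butter"}, {"bread"}, transactions)) # 3/3 = 1.0
```

A high-confidence rule such as butter -> bread is exactly the kind of pattern that would prompt the seller to place the two items together.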
4.2 Cross Tabulation

Chance:       1     2     3     4   Total
Sector 1    324   194    68   168     754
Sector 2    416   114    69   192     791
Total       740   308   137   360    1545

TABLE 1: A Sample Cross tabulation
If the row and column variables are independent, or unrelated, one would expect the values in the
cells of the table to be balanced. To see what is meant by balanced, consider a simple example
with two variables, say sector and chance. It is to be decided whether there is a relation between
sector (sector-1/sector-2) and chance (chance-1/chance-2), or whether the two variables are
independent of each other. An experiment is conducted by constructing a crosstab table as in
Table 2.
            chance-1  chance-2  Totals
sector-1        22        18       40
sector-2        26        34       60
Totals          48        52      100

TABLE 2: Cross tabulation table 2
Here sector-1 and sector-2 are divided fairly evenly between chance-1 and chance-2, suggesting
perhaps that the two variables are independent of each other. Suppose the experiment is
conducted again with a new random sample, but only the totals for each variable separately are
considered, for example:
Number of sector-1 is 30; number of sector-2 is 70
Number of chance-1 is 40; number of chance-2 is 60
Total number of data values (subjects) is 100
With this information a crosstabs table can be constructed as in Table 3.
TABLE 3: Cross tabulation table 3
Now consider what kind of distribution in the various cells one would expect if the two variables
were independent. It is known that 30 of 100 (30%) are sector-1, and that there are 40 of
chance-1 and 60 of chance-2. If chance had nothing to do with sector (the variables were
independent), one would expect 30% of the 40 chance-1 cases to be sector-1, and 30% of the 60
chance-2 cases to be sector-1. The same reasoning applies to sector-2.
Under the assumption of independence one can expect the table to look as in Table 4:
            chance-1           chance-2           Totals
sector-1    30/100 * 40 = 12   30/100 * 60 = 18     30
sector-2    70/100 * 40 = 28   70/100 * 60 = 42     70
Totals      40                 60                  100

TABLE 4: Cross tabulation table 4
In other words, if a crosstabs table with 2 rows and 2 columns has row totals r1 and r2,
respectively, and column totals c1 and c2, then if the two variables were indeed independent one
would expect the complete table to look as follows:

          X                  Y                  Totals
A         r1 * c1 / total    r1 * c2 / total    r1
B         r2 * c1 / total    r2 * c2 / total    r2
Totals    c1                 c2                 total

TABLE 5: Cross tabulation table 5
The procedure to test whether two variables are independent is as follows:
1. Create a crosstabs table as usual; its entries are the actual or observed values (counts, not
percentages).
2. Create a second crosstabs table that keeps the row and column totals but leaves the individual
cells empty.
3. If the two variables were independent, the entry in the i-th row and j-th column is expected to
be (TotalOfRow i) * (TotalOfColumn j) / (overallTotal).
4. Fill in all cells in this way and call the resulting crosstabs table the expected values table.
            chance-1  chance-2  Totals
sector-1                           30
sector-2                           70
Totals          40        60      100
The important point to note is that if the actual values are very different from the expected values,
the conclusion is that the variables cannot be independent after all (because if they were
independent the actual values should look similar to the expected values).
The only question left to answer is "how different is very different": when can it be decided that
actual and expected values are sufficiently different to conclude that the variables are not
independent? The answer to this question is the Chi-Square test.
4.3 The Chi-Square Test
The Chi-Square test computes the sum of the differences between actual and expected values
(to be precise, the sum of the squared differences, each divided by the expected value) and
assigns a probability value to that number depending on the size of the difference and the
number of rows and columns of the crosstabs table.
If the probability value p computed by the Chi-Square test is very small, differences between
actual and expected values are judged to be significant (large), and therefore you conclude that
the assumption of independence is invalid and there must be a relation between the variables.
The error you commit by rejecting the independence assumption is given by this value of p.
If the probability value p computed by the Chi-Square test is large, differences between actual
and expected values are not significant (small) and you do not reject the assumption of inde-
pendence, i.e. it is likely that the variables are indeed independent.
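As a cross-check of the SPSS output below, the Pearson chi-square statistic for the Sector versus Chance counts of Table 1 can be reproduced in a few lines of pure Python:

```python
# Observed Sector x Chance counts from Table 1.
observed = [
    [324, 194, 68, 168],   # sector 1
    [416, 114, 69, 192],   # sector 2
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (TotalOfRow i) * (TotalOfColumn j) / (overallTotal).
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 3), df)  # agrees with the SPSS Pearson chi-square of 32.957, df = 3
```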
                                Value    df   Asymp. Sig. (2-sided)
Pearson chi-square             32.957     3   .000
Likelihood ratio               33.209     3   .000
Linear-by-linear association     .909     1   .340
N of valid cases                 1545

TABLE 6: Sample Chi-Square output from SPSS
4.4 SPSS
SPSS is among the most widely used software packages for statistical analysis in social science
problems. It is used by market researchers, health researchers, survey companies, governments,
education researchers, marketing organizations and others. The original SPSS manual (Nie,
Bent & Hull, 1970) has been described as one of "sociology's most influential books". In addition
to statistical analysis, data management (case selection, file reshaping, creating derived data)
and data documentation (a metadata dictionary is stored in the data file) are features of the base
software. SPSS can read and write data from ASCII text files (including hierarchical files), other
statistics packages, spreadsheets and databases, and can read and write external relational
database tables via ODBC and SQL. In this project, the data was fed in as MS Excel
spreadsheets.
4.5 Results of the Chi-Square Test
An "observation" consists of the values of two outcomes, and the null hypothesis is that the
occurrence of these outcomes is statistically independent. Each observation is allocated to one
cell of a two-dimensional array of cells (called a contingency table) according to the values of the
two outcomes. If there are r rows and c columns in the table, the "theoretical frequency" for the
cell in row i and column j, given the hypothesis of independence, is
E(i, j) = (TotalOfRow i) * (TotalOfColumn j) / (overallTotal)
Fitting the model of "independence" reduces the number of degrees of freedom by p = r + c − 1.
The value of the test statistic is the sum, over all cells, of (Observed − Expected)^2 / Expected.
The number of degrees of freedom is equal to the number of cells rc, minus the reduction in
degrees of freedom p, which reduces to (r − 1)(c − 1). For the test of independence, a chi-square
probability of less than or equal to 0.05 (or the chi-square statistic being at or above the 0.05
critical point) is commonly interpreted by applied workers as justification for rejecting the null
hypothesis that the row variable is unrelated (that is, only randomly related) to the column
variable.
In our analysis, the pairs of attributes reservation, sex, sector and rank versus the placement
chance all had Pearson chi-square probabilities less than 0.05. It could therefore be concluded
that each of these pairs of attributes is dependent; in other words, placement chance shows a
dependency on reservation, sector, sex and rank.
5. DATA PRE PROCESSING
The individual database files (DBF format) for the years 2000-2003 were obtained, and two data
sets were created: one containing records of students from the years 2000-2002 and another for
the year 2003.
List of attributes extracted:
RANK: Rank secured by the candidate in the engineering entrance exam.
CATEGORY: Social background.
Range: {General, Scheduled Caste, Scheduled Tribe, Other Backward Class}
SEX: Range {Male, Female}
SECTOR: Range {Urban, Rural}
BRANCH: Range {A-J}
PLACEMENT: Indicator of whether the candidate was placed.
The data mining models were built using data from the years 2000-2002 and tested using data of
the year 2003.
6. IMPLEMENTATION LOGIC
6.1. Data Preparation
The implementation begins by extracting the attributes RANK, SEX, CATEGORY, SECTOR, and
BRANCH from the master database for the years 2000-2003 at the Nodal Centre. The database
was not intended to be used for any purpose other than maintaining records of students, hence
there were several inconsistencies in the database structure. The database was cleaned by
effective pruning.
A new table is created which reduces individual ranks to rank classes and limits the number of
cases. All queries will then belong to a fixed set of known cases, such as:
RANK (A) SECTOR (U) SEX (M)
CATEGORY (GEN) BRANCH (A)
With this knowledge the chance for a case is calculated from the probability of placement for that
case:
Probability (P) = Number Placed / Total Number
The chance is obtained by the following rules (with P expressed as a percentage):
If P >= 95, Chance = 'E'
If P >= 85 and P < 95, Chance = 'G'
If P >= 50 and P < 75, Chance = 'A'
Else Chance = 'P'
where E, G, A and P stand for Excellent, Good, Average and Poor respectively.
The important point is that for each different combination of input attribute values we compute the
placement chance as shown in the above conditions and prepare another table, which is the input
for building the data mining models.
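The binning rules above can be written as a small function. This is a sketch: the thresholds are taken verbatim from the rules, and the band 75 <= P < 85, which the rules do not name explicitly, falls through to 'P' exactly as written.

```python
def chance(placed, total):
    """Map a placement count to the chance classes E/G/A/P.

    Thresholds follow the rules in the text verbatim; the band
    75 <= P < 85 is not named there and so falls to 'P' via the
    final else, exactly as written.
    """
    p = 100.0 * placed / total  # placement probability as a percentage
    if p >= 95:
        return 'E'  # Excellent
    if 85 <= p < 95:
        return 'G'  # Good
    if 50 <= p < 75:
        return 'A'  # Average
    return 'P'      # Poor

print(chance(96, 100), chance(90, 100), chance(60, 100), chance(10, 100))
# E G A P
```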
7 DATA MINING PROCESSES APPLIED TO THE PROBLEM
7.1 Using Naive Bayes Classifier
In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular
feature of a class is unrelated to the presence (or absence) of any other feature.
The classifier is based on Bayes' theorem, which is stated as:
P(A|B) = P(B|A) * P(A) / P(B)
Each term in Bayes' theorem has a conventional name:
* P(A) is the prior probability or marginal probability of A. It is "prior" in the sense that it does not
take into account any information about B.
* P(A|B) is the conditional probability of A given B. It is also called the posterior probability
because it is derived from, or depends upon, the specified value of B.
* P(B|A) is the conditional probability of B given A.
* P(B) is the prior or marginal probability of B, and acts as a normalizing constant.
Bayes' theorem in this form gives a mathematical representation of how the conditional
probability of event A given B is related to the converse conditional probability of B given A.
Confusion Matrix (rows: actual, columns: predicted)

          E     P     A     G
E       496    10    13     0
P        60    97    12     1
A        30    18   248     0
G        34    19    22     3

TABLE 7: Confusion Matrix (student data)
For training we used the records of 2000-2002, and for testing the records of the year 2003. We
compared the predictions of the model with the records in the test set, whose actual outcomes
are already available for comparison.
The results of the test are modelled as the confusion matrix shown above, as this matrix is the
form usually used to describe test results in data mining research. From this confusion matrix,
ACCURACY = 844/1063 = 0.7939
To obtain further accuracy measures, we club the fields Excellent and Good as positive and
Average and Poor as negative. In this case we got an accuracy of 83.0%. The modified confusion
matrix obtained is as follows:
                  Predicted
                  Negative  Positive
Actual Negative     365       101
Actual Positive      57       540

TABLE 8: Modified Confusion Matrix (student data)

TP = 0.90, FP = 0.22, TN = 0.78, FN = 0.09.
We used WEKA, the data mining package, for testing the naive Bayes classifier.
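The four rates above can be recomputed directly from the 2x2 counts; the results agree with the reported figures to within rounding:

```python
# Modified 2x2 confusion matrix counts (naive Bayes, Table 8).
tn, fp = 365, 101   # actual negative: predicted negative / positive
fn, tp = 57, 540    # actual positive: predicted negative / positive

tp_rate = tp / (tp + fn)   # true positive rate (sensitivity / recall)
fp_rate = fp / (fp + tn)   # false positive rate
tn_rate = tn / (tn + fp)   # true negative rate (specificity)
fn_rate = fn / (fn + tp)   # false negative rate

print(tp_rate, fp_rate, tn_rate, fn_rate)
```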
7.2 Decision Tree
A decision tree is a popular classification method that results in a tree-like structure where each
node denotes a test on an attribute value and each branch represents an outcome of the test.
The tree leaves represent the classes. A decision tree is a model that is both predictive and
descriptive. In data mining and machine learning, a decision tree is a predictive model; more
descriptive names for such tree models are classification tree (discrete outcome) and regression
tree (continuous outcome). The machine learning technique for inducing a decision tree from
data is called decision tree learning. For simulation/evaluation we use the data of the year 2003
obtained from NTMIS. The knowledge discovered is expressed in the form of a confusion matrix.
Confusion Matrix (rows: actual, columns: predicted)

          P     A     G     E
P        30     1     3    86
A         7   404     4    11
G         2     1     4     7
E        74     6     7   416

TABLE 9: Confusion Matrix (student data)
The negative cases here are those where the prediction was Poor/Average while the
corresponding observed value was Excellent/Good, and vice versa. The accuracy was therefore
AC = 854/1063 = 0.803. To obtain further accuracy measures, we club the fields Excellent and
Good as positive and Average and Poor as negative; the observed accuracy was then 82.4%.
TP = 0.84, FP = 0.19, TN = 0.81, FN = 0.16. We implemented and tested the decision tree
concept in a web site on a PHP/MySQL platform, using a data structure called an adjacency list to
implement the decision tree.
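The adjacency-list idea mentioned above can be sketched as follows; the node numbering, attributes and leaf labels are hypothetical, not the production PHP/MySQL schema.

```python
# Each internal node stores the attribute it tests and an adjacency map
# from attribute value to child node id; leaves carry the predicted
# chance class. The layout below is illustrative only.
tree = {
    0: ("RANK", {"A": 1, "B": 2}),     # root tests RANK
    1: ("SECTOR", {"U": 3, "R": 4}),   # internal node tests SECTOR
    2: ("LEAF", "A"),
    3: ("LEAF", "E"),
    4: ("LEAF", "G"),
}

def predict(tree, example, node=0):
    """Walk the adjacency list from the root until a leaf is reached."""
    label, payload = tree[node]
    if label == "LEAF":
        return payload
    return predict(tree, example, payload[example[label]])

print(predict(tree, {"RANK": "A", "SECTOR": "R"}))  # G
```

Storing the tree as (node, attribute, value, child) rows is what makes a relational table such as a MySQL table a natural fit for this structure.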
7.3 Neural Network
A neural network has the ability to realize pattern recognition and derive meaning from
complicated or imprecise data that are too complex to be noticed by either humans or other
computer techniques. For simulation/evaluation we use the data of the year 2003 obtained from
NTMIS. The knowledge discovered is expressed in the form of a confusion matrix.
Confusion Matrix (rows: actual, columns: predicted)

          P     A     G     E
P        31     4     1    91
A         5   410     2     9
G         1     1     6    12
E        72    13     4   401

TABLE 10: Confusion Matrix (student data)
The negative cases here are those where the prediction was Poor/Average while the
corresponding observed value was Excellent/Good, and vice versa. The accuracy is therefore
given by
AC = 848/1063 = 0.797
To obtain further accuracy measures, we club the fields Excellent and Good as positive and
Average and Poor as negative; the observed accuracy was then 82.1%.
TP = 0.83, FP = 0.17, TN = 0.81, FN = 0.17.
We used MATLAB to implement and test the neural network concept.
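The MATLAB model itself is not reproduced here, but the backpropagation idea can be sketched in pure Python on a toy, linearly separable task; the network size, learning rate and data below are all illustrative, not taken from the original experiment.

```python
import math, random

# Minimal one-hidden-layer network trained by backpropagation on a toy
# binary task (logical OR); a stand-in sketch, not the MATLAB model.
random.seed(0)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

H = 3  # hidden units (arbitrary choice)
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(w1, b1)]
    return h, sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)

def loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

before = loss()
lr = 0.5
for _ in range(3000):
    for x, y in data:
        h, out = forward(x)
        d_out = 2 * (out - y) * out * (1 - out)   # dE/dz at the output
        for j in range(H):
            d_h = d_out * w2[j] * h[j] * (1 - h[j])  # dE/dz at hidden j
            w2[j] -= lr * d_out * h[j]
            for i in range(2):
                w1[j][i] -= lr * d_h * x[i]
            b1[j] -= lr * d_h
        b2 -= lr * d_out
after = loss()
print(round(before, 3), round(after, 3))  # squared-error loss before and after training
```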
8. CONCLUSION
Choosing the right career is important for anyone's success, and doing so may require a lot of
historical data analysis, experience-based assessment and so on. Nowadays technologies like
data mining, using concepts such as naive Bayes prediction, can support such logical decisions.
This paper demonstrates how a Chi-Square based test can be used to evaluate attribute
dependencies. This work is thus an attempt to demonstrate how technology can be used to make
wise decisions for a prospective career. The methodology has been verified for its correctness
and may be extended to cover careers other than engineering branches. It could efficiently be
implemented by governments to help students make career decisions. It was observed that the
performances of all three models were comparable in the domain of placement chance
prediction.
9. ACKNOWLEDGEMENT
I would like to acknowledge the technical contributions of Sunny (sunny@hotmail.co.in), Vikash
Kumar (vikashhotice2006@gmail.com), Vinesh B. (vineshbalan@gmail.com), Amit N.
(amitnanda@hotmail.com) and Vikash Agarwal (vikash.vicky007@gmail.com), Division of
Computer Engineering, Cochin University of Science and Technology, India.
10. REFERENCES
[1] Sudheep Elayidom M., Sumam Mary Idikkula, Joseph Alexander. “Applying data mining using statistical
techniques for career selection”. IJRTE, 1(1):446-449, 2009
[2] Elizabeth Murray. “Using Decision Trees to Understand Student Data”. In Proceedings of the 22nd Inter-
national Conference on Machine Learning, Bonn, Germany, 2005
[3] Fayyad, R. Uthurusamy. “From Data mining to knowledge discovery”, Advances in data mining and
knowledge discovery Cambridge, MA: MIT Press., pp. 1-34, (1996)
[4] Sreerama K. Murthy, Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey,
Data Mining and Knowledge Discovery, pp. 345-389, 1998
[5] L. Breiman, J. Friedman, R. Olshen, and C. Stone. “Classification and Regression Trees”, Wadsworth
Inc., Chapter 3, (1984.)
[6] Kohavi R. and F. Provost. Editorial for the Special Issue on applications of machine learning and the
knowledge discovery process, Machine Learning, 30: 271-274, 1998
[7] M. Kubat, S. Matwin. “Addressing the Curse of Imbalanced Training Sets: One-Sided Selection”. In
Proceedings of the 14th International Conference on Machine Learning, ICML, Nashville, Tennessee, USA,
1997
[8] Lewis D. D. & Gale W. A. “A sequential algorithm for training text classifiers”. In proceedings of SIGIR,
Dublin, Ireland, 1994
[9] Sudheep Elayidom M., Sumam Mary Idikkula, Joseph Alexander. “Applying Data mining techniques for
placement chance prediction”. In Proceedings of the international conference on advances in computing,
control and telecommunication technologies, India, 2009
[10] Oyelade, Oladipupo, Olufunke. “Knowledge Discovery from students result repository: Association
Rules mining approach”. CSC- IJCSS, 4(2):199-207, 2010
[11] Anandavalli, Ghose, Gauthaman. “Mining spatial gene expression data using association rules”. CSC -
IJCSS, 3(5): 351-357, 2009
[12] Anyanwn, Shiva. “Comparative analysis of serial decision tree classification algorithms”. CSC- IJCSS,
3(3):230-240, 2009