The main process of data mining is to collect, extract, and store valuable information, and many enterprises
now do this actively. Predictive analytics is a branch of advanced analytics that is mainly used to make
predictions about unknown future events. It uses various techniques from machine learning, statistics,
data mining, modeling, and artificial intelligence to analyze current data and make predictions about the
future. The two main objectives of predictive analytics are regression and classification. It comprises
various analytical and statistical techniques for developing models that predict future occurrences,
probabilities, or events. Predictive analytics deals with both continuous and discontinuous changes. It
provides a predictive score for each individual (healthcare patient, product SKU, customer, component,
machine, or other organizational unit) to determine or influence organizational processes that span huge
numbers of individuals, as in fraud detection, manufacturing, credit risk assessment, marketing, and
government operations including law enforcement.
IRJET- Probability based Missing Value Imputation Method and its Analysis | IRJET Journal
This document describes a probability-based method for imputing missing data. It begins with an abstract that outlines the goal of developing an application to identify and replace missing values in a dataset using a probability approach. It then provides background on missing data issues and different imputation techniques. The proposed method uses a probability approach to calculate possible values for missing data based on attributes of known values, stores this information separately, and then imputes values based on probability calculations. It claims this map-reduce approach reduces processing time for large datasets compared to existing methods. The method and imputed dataset will be analyzed using clustering algorithms to examine changes from the original missing data.
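The summary above does not give the exact probability calculation, so the following is only a minimal sketch of one way such an imputation could work: estimate the conditional probability of each known value of a target attribute given another attribute, store that table separately, and fill each missing entry with the most probable value. The function names, the dictionary-based row format, and the choice of a single conditioning attribute are all assumptions for illustration, and the map-reduce aspect is not reproduced.

```python
from collections import Counter, defaultdict


def build_probability_table(rows, target, given):
    """Estimate P(target value | given value) from rows where both are known."""
    counts = defaultdict(Counter)
    for row in rows:
        if row[target] is not None and row[given] is not None:
            counts[row[given]][row[target]] += 1
    table = {}
    for given_value, counter in counts.items():
        total = sum(counter.values())
        table[given_value] = {v: c / total for v, c in counter.items()}
    return table


def impute(rows, target, given):
    """Return copies of rows with missing target values replaced by the most probable one."""
    table = build_probability_table(rows, target, given)
    completed = []
    for row in rows:
        new_row = dict(row)
        if new_row[target] is None and new_row[given] in table:
            probabilities = table[new_row[given]]
            new_row[target] = max(probabilities, key=probabilities.get)
        completed.append(new_row)
    return completed


# Hypothetical toy data: None marks a missing value.
rows = [
    {"city": "A", "plan": "gold"},
    {"city": "A", "plan": "gold"},
    {"city": "A", "plan": "basic"},
    {"city": "B", "plan": "basic"},
    {"city": "A", "plan": None},
]
filled = impute(rows, target="plan", given="city")
```

Rows whose conditioning attribute was never seen with a known target value are left unchanged rather than guessed.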
Efficiency of Prediction Algorithms for Mining Biological Databases | IOSR Journals
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
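The evaluation measures named above all derive from the four cells of a binary confusion matrix. The sketch below computes them with plain Python and includes a ZeroR-style baseline that always predicts the most frequent training label. The function names and the toy labels are assumptions for illustration and are not taken from the study.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, PPV and NPV for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(numerator, denominator):
        # A measure is undefined (None) when its denominator is zero.
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def zero_r(y_train):
    """ZeroR baseline: always predict the most frequent training label."""
    return Counter(y_train).most_common(1)[0][0]


# Hypothetical toy labels (1 = positive class).
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
baseline_label = zero_r(y_true)
baseline_metrics = binary_metrics(y_true, [baseline_label] * len(y_true))
```

Because a single-class baseline never predicts the other class, some of its measures sit at 0 or 1 or are undefined, which is worth keeping in mind when comparing it with rule learners on accuracy alone.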
A Comprehensive review of Conversational Agent and its prediction algorithm | vivatechijri
The use of conversational bots is increasing exponentially. A conversational bot can be
described as a platform that can chat with people using artificial intelligence. Recent advances have
made AI capable of learning from data and producing an output. This learning can be performed using
various machine learning algorithms. Machine learning techniques involve constructing algorithms that can
learn from data and predict an outcome. This paper reviews the efficiency of the different machine learning
algorithms used in conversational bots.
Comparative Analysis: Effective Information Retrieval Using Different Learnin... | RSIS International
Information retrieval is the activity of searching for meaningful information in a collection of information resources such as documents, relational databases, and the World Wide Web. An information retrieval system mainly consists of two phases: storing indexed documents and retrieving relevant results. Retrieving information effectively from huge data stores requires machine learning. The objective of machine learning is to instruct computers to use data or past experience to solve a given problem. Machine learning has a number of applications, including classifiers trained on email messages to distinguish between spam and non-spam messages, systems that analyze past sales data to predict customer buying behavior, and fraud detection. Machine learning can be applied as association analysis through supervised learning, unsupervised learning, and reinforcement learning. The goal of these three kinds of learning is to provide an effective way of retrieving information from a data warehouse while avoiding problems such as ambiguity. This study compares the strengths and weaknesses of these learning approaches.
Software Effort Estimation using Neuro Fuzzy Inference System: Past and Present | rahulmonikasharma
The most important reason for project failure is poor effort estimation. Software development effort estimation is needed for assigning appropriate team members to development, allocating resources for software development, binding, etc. Inaccurate software estimation may lead to project delays, budget overruns, or cancellation of the project. However, existing effort estimation models are not very efficient. In this paper, we analyze a new approach to estimation, the Neuro Fuzzy Inference System (NFIS). It is a hybrid model that combines the components of an artificial neural network with fuzzy logic to give a better estimate.
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect... | ijtsrd
Networks have brought convenience to the world by permitting the flexible transfer of information, but they also expose a large number of vulnerabilities. A Network Intrusion Detection System (IDS) helps network administrators and systems detect network security violations in their organizations. Identifying unknown and new attacks is one of the leading challenges in intrusion detection research. Deep learning, a subfield of machine learning, is concerned with algorithms based on the structure and function of the brain, known as artificial neural networks. Improvements in such learning algorithms would increase the probability of detection by an IDS and the detection rate of unknown attacks. In this paper, we propose a deep learning approach to implement an enhanced and efficient IDS. Priya N | Ishita Popli "Comparative Study on Machine Learning Algorithms for Network Intrusion Detection System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-1 , December 2020, URL: https://www.ijtsrd.com/papers/ijtsrd38175.pdf Paper URL : https://www.ijtsrd.com/computer-science/computer-network/38175/comparative-study-on-machine-learning-algorithms-for-network-intrusion-detection-system/priya-n
An efficient feature selection algorithm for health care data analysis | journalBEEI
Diabetes is a silent killer that will slowly kill a person if it goes undetected. The existing systems, which use the F-score method and K-means clustering to check whether a person has diabetes, are not 100% accurate, and anything less than 100% is not acceptable in the medical field, as it could cost the lives of many people. Our proposed system aims to take some of the best features of the existing algorithms for predicting diabetes and combine them into a novel algorithm that is intended to be 100% accurate in its prediction. With the surge in technological advancements, we can use data mining to predict when a person would be diagnosed with diabetes. Specifically, we analyze the best features of the chi-square algorithm and the advanced clustering algorithm (ACA). This research work uses the Pima Indians Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification theorems and methods, we can consider different factors such as age, BMI, and blood pressure, weigh the importance given to these attributes overall, single them out, and use them for the prediction of diabetes.
Software Bug Detection Algorithm using Data mining Techniques | AM Publications
The main aim of software development is to develop high-quality software, and high-quality software is
developed using an enormous amount of software engineering data. Software engineering data can be used to gain an
empirically based understanding of software development. Meaningful information can be extracted using
various data mining techniques. As data mining for secure software engineering improves software productivity and
quality, software engineers are increasingly applying data mining algorithms to various software engineering tasks.
However, mining software engineering data poses several challenges, requiring various algorithms to effectively mine
sequences, graphs, and text from such data. Software engineering data includes code bases, execution traces,
historical code changes, mailing lists, and bug databases. They contain a wealth of information about a project's status,
progress, and evolution. Using well-established data mining techniques, practitioners and researchers can
explore the potential of this valuable data to better manage their projects and produce higher-quality
software systems that are delivered on time and within budget.
For the agriculture sector, detecting and identifying plant diseases at an early stage is extremely important and
still very challenging. Machine learning is an application of AI that helps achieve this purpose effectively. It
uses a group of algorithms to analyze and interpret data and learn from it, so that smart decisions can be
made. For this project, a dataset containing images of healthy and diseased plant leaves is
used, and image processing is applied to extract features from each image. We then model this dataset with
different machine learning algorithms such as Random Forest, Support Vector Machine, and Naïve Bayes. The aim is
to carry out a comparative study to identify which of these algorithms can predict diseases with the utmost
accuracy. We compare factors such as precision, accuracy, error rate, and prediction time across the different
machine learning algorithms. From these comparisons, valuable conclusions can be drawn for this project.
IRJET- Error Reduction in Data Prediction using Least Square Regression Method | IRJET Journal
This document proposes a modification to the least squares regression method to reduce errors in data prediction. It divides the original data set into three parts, uses the first part to make predictions with least squares regression and fits those predictions to the second part of the data to minimize errors. It then validates the model on the third part of data and compares errors to the original least squares method. The proposed method shows reduced errors in prediction based on mean absolute error, mean relative error and root mean square error metrics in most test ranges of the validation data.
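The summary above names the baseline (least squares regression) and the three error metrics but not the details of the proposed correction step, so the sketch below covers only the parts that are stated: an ordinary least squares line fit and the mean absolute error, mean relative error, and root mean square error used for comparison. The function names are assumptions for illustration, and the mean relative error is assumed to be the average of absolute error divided by the absolute actual value.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def error_metrics(actual, predicted):
    """Return (MAE, MRE, RMSE); MRE assumes no actual value is zero."""
    n = len(actual)
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mae = sum(abs_errors) / n
    mre = sum(e / abs(a) for e, a in zip(abs_errors, actual)) / n
    rmse = math.sqrt(sum(e * e for e in abs_errors) / n)
    return mae, mre, rmse


# Hypothetical usage: fit on one part of the data, score on a held-out part.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
held_out_x, held_out_y = [5, 6], [11, 14]
predictions = [slope * x + intercept for x in held_out_x]
mae, mre, rmse = error_metrics(held_out_y, predictions)
```

Scoring on data that was not used for fitting mirrors the role the third part of the data plays in the description above.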
This document summarizes a research paper that proposes a new inventory prediction method for supply chain management called BP-GA chaos prediction algorithm. The method uses a backpropagation neural network combined with a genetic algorithm to forecast inventory levels based on chaotic time series analysis. It aims to overcome limitations of traditional chaos prediction approaches. The paper reviews other inventory forecasting research and chaotic prediction methods. It then describes the new hybrid BP-GA method in detail, which establishes a chaotic neural network model optimized through a genetic algorithm. An experiment applying this method to inventory prediction is said to achieve good results.
IRJET- Road Accident Prediction using Machine Learning Algorithm | IRJET Journal
This document summarizes a research paper that predicts road accidents using machine learning algorithms. It discusses how large datasets have enabled data mining techniques to discover useful information. The paper aims to determine the most suitable machine learning classification technique for road accident prediction. It uses logistic regression, an algorithm that predicts a binary outcome (yes/no). The researchers clean the data, divide it into training and testing sets, and use logistic regression in Jupyter notebooks with the Python programming language. It provides percentage predictions of accident likelihood to users through a website interface. The results show logistic regression can accurately predict accidents for numerical data but has limitations for non-numerical text data.
IRJET- GDPS - General Disease Prediction System | IRJET Journal
The document describes a General Disease Prediction System (GDPS) that uses machine learning and data mining techniques to predict diseases based on patient symptoms.
The GDPS first collects patient data, preprocesses it, and extracts relevant features. It then implements the ID3 decision tree algorithm to generate a predictive model and classify diseases. As an admin, one can train the model using sample data. As a user, one can enter symptoms and the trained model will predict the likely disease and recommend precautions.
The GDPS was tested on a dataset of 120 patients and achieved 86.67% accuracy in disease prediction. The system currently covers common diseases but future work involves expanding it to predict more serious or fatal diseases like various cancers.
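ID3 builds its tree by repeatedly splitting on the attribute with the highest information gain. The following sketch shows only that selection step on a tiny hypothetical symptom table. The attribute names, the labels, and the function names are assumptions for illustration and do not come from the GDPS dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute ID3 would split on first: the one with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: symptoms as attributes, disease as label.
rows = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
labels = ["flu", "flu", "cold", "cold"]
root = best_attribute(rows, labels, ["fever", "cough"])
```

A full ID3 implementation would recurse on each resulting subset until the labels are pure or no attributes remain.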
A Clustering Method for Weak Signals to Support Anticipative Intelligence | CSCJournals
This document proposes a clustering method to analyze weak signals, which are short texts that may indicate future trends when analyzed together. The method involves preprocessing weak signals by removing stop words, stemming words, and identifying synonyms. It then clusters the weak signals using the K-medoids algorithm based on the number of similar words between signals, including identical words, stemmed words, and synonyms. The method was tested on a database of weak signals related to bioenergy. The clustering is intended to group similar weak signals to help form hypotheses about potential future changes or opportunities.
IRJET- Weather Prediction for Tourism Application using ARIMA | IRJET Journal
This document discusses using an ARIMA model to predict weather patterns for tourism applications. It begins with an introduction to weather forecasting and its importance for the tourism industry. It then reviews related work on weather prediction using machine learning methods. The proposed method involves collecting weather data, preprocessing it, converting it to a stationary time series, analyzing it using an ARIMA model, and concluding that ARIMA can accurately predict weather patterns to help tourists plan trips based on the forecast.
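Converting a series to a stationary one, as mentioned above, is commonly done by differencing, which is the "I" (integrated) part of ARIMA. The sketch below shows first-order differencing and its inverse. The summary does not say which differencing order or lag was used, so the lag of 1, the function names, and the temperature values are assumptions for illustration.

```python
def difference(series, lag=1):
    """Return the differenced series: series[i] - series[i - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def undifference(first_values, diffs, lag=1):
    """Rebuild the original series from its first `lag` values and the differences."""
    restored = list(first_values)
    for d in diffs:
        # restored[-lag] is the value `lag` steps before the one being rebuilt.
        restored.append(restored[-lag] + d)
    return restored


# Hypothetical daily temperatures with an upward trend.
temperatures = [20, 22, 25, 29, 34]
diffs = difference(temperatures)
restored = undifference(temperatures[:1], diffs)
```

Forecasts made on the differenced scale have to be mapped back with the inverse step before they are shown to a user.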
Because lung cancer is difficult to diagnose, it has become the most dangerous cancer seen in humans. Early diagnosis increases the survival rate. Predicting lung cancer is among the most challenging cancer problems because of the structure of the cells involved, in which most tissues or cells overlap one another. Nowadays, image processing techniques are increasingly used in the medical field for disease diagnosis, where time plays an important role. Detecting cancer in time increases the survival rate of patients. Many radiologists still use MRI only for the assessment of superior sulcus tumors and in cases where invasion of the spinal cord canal is suspected. MRI can detect and stage lung cancer, and this method would be excellent for lung malignancies and other diseases.
Unsupervised Distance Based Detection of Outliers by using Anti-hubs | IRJET Journal
This document summarizes research on using anti-hubs for unsupervised outlier detection in high-dimensional data. It discusses how existing distance-based outlier detection methods struggle with high-dimensional data as distances become less meaningful. Anti-hubs, which are points that are infrequently in the k-nearest neighbor lists of other points, have been used for outlier detection. However, calculating anti-hubs is computationally expensive for high-dimensional data. The document proposes applying feature selection before calculating anti-hubs to reduce dimensionality and computational cost, thereby extending anti-hub based outlier detection to high-dimensional data more efficiently.
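The anti-hub idea described above can be illustrated by counting, for every point, how many other points include it among their k nearest neighbours. Points with the lowest counts are the anti-hubs and are treated as outlier candidates. This is a brute-force sketch with Euclidean distance. The function names and sample points are assumptions for illustration, and the feature selection step proposed in the document is not reproduced.

```python
import math


def k_occurrences(points, k):
    """Count how often each point appears in the k-nearest-neighbour lists of the others."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts


def most_outlying(points, k):
    """Index of the point with the fewest k-occurrences (the strongest anti-hub)."""
    counts = k_occurrences(points, k)
    return min(range(len(points)), key=lambda i: counts[i])


# Hypothetical 2-D data: four clustered points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
counts = k_occurrences(points, k=2)
outlier_index = most_outlying(points, k=2)
```

The pairwise distance computation here grows quadratically with the number of points, which is part of the cost concern the summary raises.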
Comparison of Data Mining Techniques used in Anomaly Based IDS | IRJET Journal
This document discusses anomaly-based intrusion detection systems and compares various data mining techniques used in these systems. It begins by defining intrusion detection systems and the two main categories of misuse detection and anomaly detection. Anomaly detection involves learning normal patterns from data and detecting deviations from these patterns as potential anomalies or intrusions.
The document then examines several data mining techniques used for anomaly detection, including statistical-based approaches like chi-square statistics, and clustering algorithms like k-means, k-medoids, and EM clustering. It notes that these techniques can be applied to intrusion detection to analyze data and detect anomalies representing potential malicious activity. The methodology of anomaly detection is also summarized as involving parameterization of data,
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy... | Bang Xiang Yong
Presented at MET4FOF Workshop, JULY 2020
I talk about our recent work of combining Bayesian Deep learning with Explainable Artificial Intelligence (XAI) methods. In particular, we look at Bayesian Autoencoders.
A comprehensive study on disease risk predictions in machine learning | IJECEIAES
Over recent years, multiple disease risk prediction models have been developed. These models use various patient characteristics to estimate the probability of outcomes over a certain period of time and hold the potential to improve decision making and individualize care. Discovering hidden patterns and interactions in medical databases, along with growing evaluation of disease prediction models, has become crucial. Traditional clinical findings require many trials, which can complicate disease prediction. This paper presents a comprehensive study of the different strategies used to predict disease. Applying these techniques to healthcare data has improved risk prediction models for identifying patients who would benefit from disease management programs, with the aim of reducing hospital readmission and healthcare cost, but the results of these endeavors have been mixed.
Data mining and machine learning have become a vital part of crime detection and prevention. In this
research, we use WEKA, an open source data mining software, to conduct a comparative study between the
violent crime patterns from the Communities and Crime Unnormalized Dataset provided by the University
of California-Irvine repository and actual crime statistical data for the state of Mississippi that has been
provided by neighborhoodscout.com. We implemented the Linear Regression, Additive Regression, and
Decision Stump algorithms using the same finite set of features, on the Communities and Crime Dataset.
Overall, the linear regression algorithm performed the best among the three selected algorithms. The scope
of this project is to prove how effective and accurate the machine learning algorithms used in data mining
analysis can be at predicting violent crime patterns.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif... | ahmad abdelhafeez
The goal of this paper is to compare different classifiers and multi-classifier fusion with respect to accuracy in detecting breast cancer on four different datasets. We present an implementation of various classification techniques, representing the best-known algorithms in this field, on four different breast cancer datasets: two for diagnosis and two for prognosis. We present a fusion of classifiers to obtain the best multi-classifier fusion approach for each dataset individually. Classification accuracy is obtained from the confusion matrix under 10-fold cross-validation, and fusion uses majority voting (the mode of the classifier outputs). The experimental results show that no single classification technique is best for all datasets, since the classification task is affected by the type of dataset. With multi-classifier fusion, accuracy improved on three of the four datasets.
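Majority-voting fusion, as described above, takes the mode of the individual classifier outputs for each sample. A minimal sketch follows. The function name and the example predictions are assumptions for illustration, and ties are broken in favour of the label seen first among the votes, which is a choice the paper summary does not specify.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Fuse per-classifier prediction lists by taking the mode for each sample."""
    fused = []
    for votes in zip(*predictions_per_classifier):
        # most_common(1) returns the most frequent label; ties keep first-seen order.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Hypothetical outputs of three classifiers on three samples
# ("M" = malignant, "B" = benign).
classifier_outputs = [
    ["M", "B", "B"],
    ["M", "M", "B"],
    ["B", "M", "B"],
]
fused = majority_vote(classifier_outputs)
```

Using an odd number of classifiers avoids ties in the binary case.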
Adaptive Real Time Data Mining Methodology for Wireless Body Area Network Bas... | acijjournal
This document discusses adaptive real-time data mining techniques for wireless body area networks used in healthcare applications. It presents an innovative framework called Wireless Mobile Real-time Health care Monitoring (WMRHM) that applies data mining to physiological signals acquired through wireless sensors to predict a patient's health risk. Key challenges addressed include the continuous and changing nature of real-time data streams, which require efficient concept-adapting algorithms to handle concept drift. The paper reviews state-of-the-art approaches and introduces five algorithms for tasks like ensemble classification, concept drift detection and adaptation that are suitable for mining real-time physiological signals to support healthcare predictions and decisions.
This document summarizes research on intrusion detection systems using data mining techniques. It first describes the architecture of a data mining-based IDS, including sensors to collect data, detectors to evaluate the data using models, a data warehouse to store data and models, and a model generator to develop and distribute new models. It then discusses supervised and unsupervised learning approaches for intrusion detection. The document concludes by summarizing several papers on intrusion detection using techniques like neural networks, decision trees, clustering, and ensemble methods.
In the present paper, the applicability and
capability of AI techniques for effort estimation prediction have
been investigated. Neuro-fuzzy models are seen to be very
robust, characterized by fast computation, and capable of handling
distorted data. Because the data is non-linear, such a model is
an efficient quantitative tool for predicting effort. A
one-hidden-layer network, named OHLANFIS, has been developed
using the MATLAB simulation environment.
The initial parameters of OHLANFIS are
identified using the subtractive clustering method. The parameters of
the Gaussian membership function are optimally determined
using the hybrid learning algorithm. The analysis shows
that the effort estimation prediction model developed using the
OHLANFIS technique performs well relative to the normal
ANFIS model.
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING | IRJET Journal
This document describes a study on analyzing crime data and predicting crimes using machine learning techniques. The study uses an Indian crime dataset to analyze past crimes and identify patterns. Regression, k-means clustering, and decision tree algorithms are implemented to predict the type of future crimes based on conditions. The algorithms can identify crime-prone areas and anticipate crimes. The proposed system aims to conduct criminal analysis, identify trends, disseminate knowledge to support crime prevention measures, and recognize recurring crime patterns to prevent future incidents.
SCCAI- A Student Career Counselling Artificial Intelligence | vivatechijri
As education grows day by day, competition has created a need for students to
understand more about the educational field. Counselors are not always available, and
they sometimes lack proper knowledge of a particular educational field, which leads to
misconceptions about that field. This makes it difficult for students to decide on a suitable educational
trajectory, and guidance is not always useful. The proposed paper aims to overcome these problems using machine
learning algorithms. Various algorithms were considered, and the most suitable for the project are used
here. The three major problems encountered are solved using Random Forest, Linear
Regression, and a searching algorithm using the Google API. First, the searching algorithm solves the problem of
location by segregating colleges location-wise; then Random Forest provides the list of colleges using
stream and percentage range; and finally Linear Regression predicts the current cutoff using previous years'
data. In addition, the proposed system provides information on all fields of education, helping
students understand their field of interest better. The authors describe the idea as entirely new, with
no existing projects of a similar kind. This project will help guide students throughout.
A Biometric Fusion Based on Face and Fingerprint Recognition using ANN rahulmonikasharma
This document presents a biometric fusion system based on face and fingerprint recognition using artificial neural networks. The system first applies pre-processing to input images. It then extracts features from faces using extended local binary patterns and from fingerprints using minutia extraction. A genetic algorithm is used to optimize the extracted features. An artificial neural network is trained on the optimized features to classify images. The system fuses the face and fingerprint recognition results. Performance is evaluated based on false acceptance rate, false rejection rate and accuracy, with the proposed system achieving over 94% accuracy.
Csit65111 ASSOCIATIVE REGRESSIVE DECISION RULE MINING FOR ASSOCIATIVE REGRESSI... cscpconf
Opinion mining, also known as sentiment analysis, involves customer satisfaction patterns, sentiments and attitudes toward entities, products, services and their attributes. With the rapid development of the Internet, potential customers provide product/service reviews that indicate their level of satisfaction. The high volume of customer reviews was handled through taxonomy-aware processing, but it was difficult to identify the best reviews. In this paper, an Associative Regression Decision Rule Mining (ARDRM) technique is developed to predict patterns for the service provider and to improve customer satisfaction based on the review comments. Associative Regression based Decision Rule Mining performs two steps to improve the customer satisfaction level. Initially, the Machine Learning Bayes Sentiment Classifier (MLBSC) is used to assign class labels to each service review. After that, the regressive factor of the opinion words and the class labels are checked for association between the words using various probabilistic rules. Based on the probabilistic rules, the effect of opinions and sentiments on customer reviews is analyzed to arrive at the specific set of services preferred by the customers, together with their review comments. The Associative Regressive Decision Rule helps the service provider decide how to improve the customer satisfaction level. The experimental results reveal that the ARDRM technique improved performance in terms of true positive rate, Associative Regression factor, Regressive Decision Rule generation time and Review Detection Accuracy of similar patterns.
Associative Regressive Decision Rule Mining for Predicting Customer Satisfact... csandit
Opinion mining, also known as sentiment analysis, involves customer satisfaction patterns,
sentiments and attitudes toward entities, products, services and their attributes. With the rapid
development of the Internet, potential customers provide product/service reviews that indicate
their level of satisfaction. The high volume of customer reviews was handled through
taxonomy-aware processing, but it was difficult to identify the best
reviews. In this paper, an Associative Regression Decision Rule Mining (ARDRM) technique is
developed to predict patterns for the service provider and to improve customer satisfaction based
on the review comments. Associative Regression based Decision Rule Mining performs two steps
to improve the customer satisfaction level. Initially, the Machine Learning Bayes
Sentiment Classifier (MLBSC) is used to assign class labels to each service review. After
that, the regressive factor of the opinion words and the class labels are checked for association
between the words using various probabilistic rules. Based on the probabilistic rules, the
effect of opinions and sentiments on customer reviews is analyzed to arrive at the specific set of
services preferred by the customers, together with their review comments. The Associative Regressive
Decision Rule helps the service provider decide how to improve the customer satisfaction
level. The experimental results reveal that the ARDRM technique improved performance
in terms of true positive rate, Associative
Regression factor, Regressive Decision Rule generation time and Review Detection Accuracy of
similar patterns.
For the agriculture sector, detecting and identifying plant diseases at an early stage is extremely important and
still very challenging. Machine learning is an application of AI that helps us achieve this purpose effectively. It
uses a group of algorithms to analyze and interpret data and learn from it, so that smart decisions can be
made. For this project, a dataset containing healthy and diseased plant leaf images is
used, and the features of each image are extracted using image processing. We then model this dataset with
different machine learning algorithms such as Random Forest, Support Vector Machine and Naïve Bayes. The aim is
to carry out a comparative study to identify which of these algorithms can predict diseases with the highest
accuracy. We compare factors such as precision, accuracy, error rate and prediction time across the
machine learning algorithms. From these comparisons, valuable conclusions can be drawn for this project.
IRJET- Error Reduction in Data Prediction using Least Square Regression Method IRJET Journal
This document proposes a modification to the least squares regression method to reduce errors in data prediction. It divides the original data set into three parts, uses the first part to make predictions with least squares regression and fits those predictions to the second part of the data to minimize errors. It then validates the model on the third part of data and compares errors to the original least squares method. The proposed method shows reduced errors in prediction based on mean absolute error, mean relative error and root mean square error metrics in most test ranges of the validation data.
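The three-part scheme described above can be sketched as follows. This is only one possible reading of the summary, not the paper's actual procedure: it assumes a single input variable, equal-sized consecutive parts, and a second straight-line fit as the "correction" step. The function names `fit_line` and `three_part_fit` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


def three_part_fit(xs, ys):
    """Fit on part 1, correct on part 2, report validation errors on part 3."""
    n = len(xs) // 3
    x1, y1 = xs[:n], ys[:n]
    x2, y2 = xs[n:2 * n], ys[n:2 * n]
    x3, y3 = xs[2 * n:], ys[2 * n:]

    # Step 1: plain least squares on the first part.
    a, b = fit_line(x1, y1)

    # Step 2 (assumed form of the correction): regress the actual values of
    # part 2 on the step-1 predictions for part 2.
    pred2 = [a + b * x for x in x2]
    c, d = fit_line(pred2, y2)

    # Step 3: compare plain and corrected predictions on the third part.
    plain3 = [a + b * x for x in x3]
    corrected3 = [c + d * p for p in plain3]
    return {
        "plain_mae": mean_absolute_error(y3, plain3),
        "corrected_mae": mean_absolute_error(y3, corrected3),
        "plain_rmse": root_mean_square_error(y3, plain3),
        "corrected_rmse": root_mean_square_error(y3, corrected3),
    }


if __name__ == "__main__":
    xs = [float(x) for x in range(12)]
    ys = [x * x for x in xs]  # illustrative non-linear data, not from the paper
    print(three_part_fit(xs, ys))
```

Whether the corrected fit beats the plain one depends on the data; the summary itself reports an improvement only "in most test ranges".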
This document summarizes a research paper that proposes a new inventory prediction method for supply chain management called BP-GA chaos prediction algorithm. The method uses a backpropagation neural network combined with a genetic algorithm to forecast inventory levels based on chaotic time series analysis. It aims to overcome limitations of traditional chaos prediction approaches. The paper reviews other inventory forecasting research and chaotic prediction methods. It then describes the new hybrid BP-GA method in detail, which establishes a chaotic neural network model optimized through a genetic algorithm. An experiment applying this method to inventory prediction is said to achieve good results.
IRJET- Road Accident Prediction using Machine Learning Algorithm IRJET Journal
This document summarizes a research paper that predicts road accidents using machine learning algorithms. It discusses how large datasets have enabled data mining techniques to discover useful information. The paper aims to determine the most suitable machine learning classification technique for road accident prediction. It uses logistic regression, an algorithm that predicts a binary outcome (yes/no). The researchers clean the data, divide it into training and testing sets, and use logistic regression in Jupyter notebooks with the Python programming language. It provides percentage predictions of accident likelihood to users through a website interface. The results show logistic regression can accurately predict accidents for numerical data but has limitations for non-numerical text data.
IRJET- GDPS - General Disease Prediction System IRJET Journal
The document describes a General Disease Prediction System (GDPS) that uses machine learning and data mining techniques to predict diseases based on patient symptoms.
The GDPS first collects patient data, preprocesses it, and extracts relevant features. It then implements the ID3 decision tree algorithm to generate a predictive model and classify diseases. As an admin, one can train the model using sample data. As a user, one can enter symptoms and the trained model will predict the likely disease and recommend precautions.
The GDPS was tested on a dataset of 120 patients and achieved 86.67% accuracy in disease prediction. The system currently covers common diseases but future work involves expanding it to predict more serious or fatal diseases like various cancers
A Clustering Method for Weak Signals to Support Anticipative Intelligence CSCJournals
This document proposes a clustering method to analyze weak signals, which are short texts that may indicate future trends when analyzed together. The method involves preprocessing weak signals by removing stop words, stemming words, and identifying synonyms. It then clusters the weak signals using the K-medoids algorithm based on the number of similar words between signals, including identical words, stemmed words, and synonyms. The method was tested on a database of weak signals related to bioenergy. The clustering is intended to group similar weak signals to help form hypotheses about potential future changes or opportunities.
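A minimal sketch of clustering short texts by shared words with a K-medoids style loop is given below. It is an illustration under stated assumptions, not the method of the paper: stemming and synonym matching are left out, the stop-word list and the example signals are made up, and the initial medoids are passed in by hand.

```python
# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "of", "in", "for", "and", "to", "is"}


def tokens(text):
    """Lower-cased word set with stop words removed (no stemming or synonyms)."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def shared_words(a, b):
    """Similarity = number of identical words the two signals share."""
    return len(tokens(a) & tokens(b))


def k_medoids(signals, initial_medoids, max_iter=20):
    """Return {medoid index: [member indices]} for the given signals."""
    medoids = list(initial_medoids)
    clusters = {}
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i, text in enumerate(signals):
            if i in clusters:
                # A medoid always stays in its own cluster.
                clusters[i].append(i)
                continue
            best = max(medoids, key=lambda m: shared_words(text, signals[m]))
            clusters[best].append(i)
        # New medoid of a cluster: the member most similar to all its members.
        new_medoids = [
            max(members, key=lambda c: sum(shared_words(signals[c], signals[o])
                                           for o in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


if __name__ == "__main__":
    signals = [
        "algae biofuel production cost",
        "biofuel production from algae",
        "wind turbine blade design",
        "offshore wind turbine design",
    ]
    print(k_medoids(signals, initial_medoids=[0, 2]))
```

Ties between equally similar medoids go to the one listed first, so the result can depend on the initial choice.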
IRJET- Weather Prediction for Tourism Application using ARIMA IRJET Journal
This document discusses using an ARIMA model to predict weather patterns for tourism applications. It begins with an introduction to weather forecasting and its importance for the tourism industry. It then reviews related work on weather prediction using machine learning methods. The proposed method involves collecting weather data, preprocessing it, converting it to a stationary time series, analyzing it using an ARIMA model, and concluding that ARIMA can accurately predict weather patterns to help tourists plan trips based on the forecast.
Because lung cancer is difficult to diagnose, it has become the most dangerous cancer seen in humans. Early diagnosis increases the survival rate. Predicting lung cancer is one of the most challenging problems because of the structure of the cells in the human body, in which most tissues or cells overlap one another. Nowadays, image processing techniques are increasingly used in the medical field for disease diagnosis, where time plays an important role. Detecting cancer in time increases the survival rate of patients. Many radiologists still use MRI only for assessment of superior sulcus tumors and in cases where invasion of the spinal cord canal is suspected. MRI can detect and stage lung cancer, and this method would be excellent for lung malignancies and other diseases.
Unsupervised Distance Based Detection of Outliers by using Anti-hubs IRJET Journal
This document summarizes research on using anti-hubs for unsupervised outlier detection in high-dimensional data. It discusses how existing distance-based outlier detection methods struggle with high-dimensional data as distances become less meaningful. Anti-hubs, which are points that are infrequently in the k-nearest neighbor lists of other points, have been used for outlier detection. However, calculating anti-hubs is computationally expensive for high-dimensional data. The document proposes applying feature selection before calculating anti-hubs to reduce dimensionality and computational cost, thereby extending anti-hub based outlier detection to high-dimensional data more efficiently.
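The anti-hub idea can be illustrated with a small brute-force sketch. It counts how often each point appears in the k-nearest-neighbor lists of the other points and returns the points with the lowest counts. The function names, the Euclidean distance and the toy data are assumptions for illustration; the feature-selection step proposed in the document is not shown.

```python
import math


def k_occurrences(points, k):
    """For each point, count how many other points list it among their k nearest."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank every other point by Euclidean distance from point i.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in neighbours[:k]:
            counts[j] += 1
    return counts


def anti_hubs(points, k, m):
    """Indices of the m points with the lowest k-occurrence counts."""
    counts = k_occurrences(points, k)
    return sorted(range(len(points)), key=lambda i: counts[i])[:m]


if __name__ == "__main__":
    # Four points in a tight square plus one distant point (made-up data).
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(k_occurrences(points, k=2))
    print(anti_hubs(points, k=2, m=1))
```

This version compares every pair of points, so its cost grows quadratically with the number of points, and each distance grows with the number of dimensions, which is the expense the document aims to reduce.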
Comparison of Data Mining Techniques used in Anomaly Based IDS IRJET Journal
This document discusses anomaly-based intrusion detection systems and compares various data mining techniques used in these systems. It begins by defining intrusion detection systems and the two main categories of misuse detection and anomaly detection. Anomaly detection involves learning normal patterns from data and detecting deviations from these patterns as potential anomalies or intrusions.
The document then examines several data mining techniques used for anomaly detection, including statistical-based approaches like chi-square statistics, and clustering algorithms like k-means, k-medoids, and EM clustering. It notes that these techniques can be applied to intrusion detection to analyze data and detect anomalies representing potential malicious activity. The methodology of anomaly detection is also summarized as involving parameterization of data,
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy... Bang Xiang Yong
Presented at MET4FOF Workshop, JULY 2020
I talk about our recent work of combining Bayesian Deep learning with Explainable Artificial Intelligence (XAI) methods. In particular, we look at Bayesian Autoencoders.
A comprehensive study on disease risk predictions in machine learning IJECEIAES
Over recent years, multiple disease risk prediction models have been developed. These models use various patient characteristics to estimate the probability of outcomes over a certain period of time and hold the potential to improve decision making and individualize care. Discovering hidden patterns and interactions from medical databases with growing evaluation of the disease prediction model has become crucial. It needs many trials in traditional clinical findings that could complicate disease prediction. A comprehensive study of the different strategies used to predict disease is presented in this paper. Applying these techniques to healthcare data has improved risk prediction models for identifying patients who would benefit from disease management programs to reduce hospital readmission and healthcare cost, but the results of these endeavors have been shifted.
Data mining and machine learning have become a vital part of crime detection and prevention. In this
research, we use WEKA, an open source data mining software, to conduct a comparative study between the
violent crime patterns from the Communities and Crime Unnormalized Dataset provided by the University
of California-Irvine repository and actual crime statistical data for the state of Mississippi that has been
provided by neighborhoodscout.com. We implemented the Linear Regression, Additive Regression, and
Decision Stump algorithms using the same finite set of features, on the Communities and Crime Dataset.
Overall, the linear regression algorithm performed the best among the three selected algorithms. The scope
of this project is to prove how effective and accurate the machine learning algorithms used in data mining
analysis can be at predicting violent crime patterns.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif... ahmad abdelhafeez
The goal of this paper is to compare different classifiers and multi-classifier fusion with respect to accuracy in detecting breast cancer on four different datasets. We present an implementation of various classification techniques, representing the best-known algorithms in this field, on four different breast cancer datasets, two for diagnosis and two for prognosis. We present a fusion of classifiers to find the best multi-classifier fusion approach for each dataset individually. Classification accuracy is obtained from the confusion matrix under 10-fold cross-validation. Fusion is performed by majority voting (the mode of the classifier outputs). The experimental results show that no single classification technique is better than the others across all datasets, since the classification task is affected by the type of dataset. With multi-classifier fusion, the results show that accuracy improved in three datasets out of four.
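Majority-voting fusion, as named in the abstract, takes the mode of the classifier outputs for each sample. The sketch below shows only that step; the function names, labels and predictions are invented for illustration and are not results from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-classifier label lists by taking the mode at each position.

    `predictions` is a list of equal-length label lists, one per classifier.
    On a tie, the label seen first among the classifiers wins.
    """
    fused = []
    for labels in zip(*predictions):
        fused.append(Counter(labels).most_common(1)[0][0])
    return fused


def accuracy(actual, predicted):
    """Fraction of positions where the predicted label equals the actual one."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


if __name__ == "__main__":
    # Made-up outputs of three classifiers on four samples (M/B labels).
    truth = ["M", "B", "B", "M"]
    clf_a = ["M", "B", "M", "M"]
    clf_b = ["M", "M", "B", "B"]
    clf_c = ["B", "B", "B", "M"]
    fused = majority_vote([clf_a, clf_b, clf_c])
    for name, preds in [("a", clf_a), ("b", clf_b), ("c", clf_c), ("fused", fused)]:
        print(name, accuracy(truth, preds))
```

Using an odd number of classifiers avoids ties when there are only two classes.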
Adaptive Real Time Data Mining Methodology for Wireless Body Area Network Bas... acijjournal
This document discusses adaptive real-time data mining techniques for wireless body area networks used in healthcare applications. It presents an innovative framework called Wireless Mobile Real-time Health care Monitoring (WMRHM) that applies data mining to physiological signals acquired through wireless sensors to predict a patient's health risk. Key challenges addressed include the continuous and changing nature of real-time data streams, which require efficient concept-adapting algorithms to handle concept drift. The paper reviews state-of-the-art approaches and introduces five algorithms for tasks like ensemble classification, concept drift detection and adaptation that are suitable for mining real-time physiological signals to support healthcare predictions and decisions.
Predictive Modeling for Topographical Analysis of Crime Rate IRJET Journal
1) The document discusses using machine learning techniques to predict crime patterns and types based on historical crime data.
2) It proposes using a random forest classification algorithm to analyze crime data and predict the type of crime that may occur in a particular area.
3) The random forest algorithm would be trained on a dataset containing information about past crimes like date, location, and crime type to make predictions about future crimes.
IRJET- Spot Me - A Smart Attendance System based on Face Recognition IRJET Journal
The article discusses international issues. It mentions that globalization has increased economic interdependence between nations while also raising tensions over immigration and trade. Solutions will require cooperation and compromise and a recognition that isolationism is not a viable strategy in an interconnected world.
Face Recognition Smart Attendance System: (InClass System) IRJET Journal
- The document describes a face recognition system called "InClass" to automate student attendance tracking. It aims to address issues with traditional manual attendance systems like being inaccurate, time-consuming, and difficult to maintain.
- The InClass system uses a CNN face detector to detect and identify students' faces from images captured with a camera. It can handle variations in lighting, angles, and occlusions. Matching faces to a database allows for automated attendance marking.
- The system aims to simplify the attendance process, reduce time and errors compared to existing biometric systems, and make attendance records easily accessible and storable digitally rather than on paper.
Significant Role of Statistics in Computational Sciences Editor IJCATR
This paper is focused on the issues related to optimizing statistical approaches in the emerging fields of Computer Science
and Information Technology. More emphasis has been given to the role of statistical techniques in modern data mining. Statistics is
the science of learning from data and of measuring, controlling, and communicating uncertainty. Statistical approaches can play a vital
role in making significant contributions to software engineering, neural networks, data mining, bioinformatics and other
allied fields. Statistical techniques not only help build scientific models but also quantify the reliability, reproducibility and general
uncertainty associated with these models. In the current scenario, large amounts of data are automatically recorded by computers and
managed with database management systems (DBMS) for storage and fast retrieval. The practice of examining large pre-existing
databases in order to generate new information is known as data mining. Presently, data mining has attracted substantial
attention in the research and commercial arena, and it involves applications of a variety of statistical techniques. Twenty years ago
most data was collected manually and data sets were simple in form, but at present there have been considerable changes in
the nature of data. Statistical techniques and computer applications can be utilized to obtain maximum information with the fewest
possible measurements to reduce the cost of data collection.
IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach... IRJET Journal
This document presents a study that uses machine learning techniques to predict crime rates. Specifically, it aims to analyze crime data using supervised machine learning classification algorithms like decision trees, support vector machines, logistic regression, k-nearest neighbors, and random forests. The document outlines collecting and preprocessing crime data, selecting relevant features, training models on a portion of the data and testing them on the remaining data. It finds that random forest achieved the best prediction accuracy compared to other algorithms tested. The goal is to help law enforcement agencies better predict and reduce crime rates by analyzing historical crime data patterns.
IRJET-Survey on Data Mining Techniques for Disease Prediction IRJET Journal
This document discusses using data mining techniques to predict disease, specifically focusing on heart disease. It provides an overview of different classification algorithms that can be used for disease prediction, including decision trees, Bayesian classifiers, multilayer perceptrons, and ensemble techniques. These algorithms are analyzed based on their accuracy, time efficiency, and area under the ROC curve. The document also reviews related literature applying various data mining methods like decision trees, KNN, and support vector machines to heart disease prediction. Overall, the document examines using classification algorithms and data mining to extract patterns from medical data that can help predict heart disease and other illnesses.
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio... IRJET Journal
This document presents a novel approach for software defect prediction using dimensionality reduction techniques. The proposed approach uses an artificial neural network to extract features from initial change measures, and then trains a classifier on the extracted features. This is compared to other dimensionality reduction techniques like principal component analysis, linear discriminant analysis, and kernel principal component analysis. Five open source datasets from NASA are used to evaluate the different techniques based on accuracy, F1 score, and area under the receiver operating characteristic curve. The results show that the artificial neural network approach outperforms the other dimensionality reduction techniques, and kernel principal component analysis performs best among those techniques. The document also discusses related work on using machine learning for software defect prediction.
1. Data mining is the process of obtaining meaningful information from large amounts of data through techniques like cluster analysis, regression analysis, social network analysis, and time series analysis. These techniques are used to discover patterns and relationships in data to help organizations make predictions and decisions.
2. Common data mining techniques include cluster analysis, which groups similar objects together; regression analysis, which estimates relationships between variables; social network analysis, which analyzes relationships between individuals; and time series analysis, which analyzes time-based data to make predictions.
3. Data mining is increasingly important for obtaining relevant information from large datasets and can be used across many fields for applications such as market segmentation, fraud detection, weather prediction, and intrusion
This document summarizes several research papers on human face recognition using feature extraction and measurements. It discusses using face recognition for applications like surveillance, access control, and banking validation. Key steps in face recognition systems include extracting features from captured images, comparing them to known images in a training database, and identifying errors like false acceptance and false rejection rates. Methods discussed for feature extraction and dimensionality reduction include Linear Discriminant Analysis and Principal Component Analysis. The document also examines factors that affect face recognition performance like illumination changes, aging, and expressions. Quantifying uncertainty in face recognition algorithms is identified as important for evaluating system performance.
Concept drift and machine learning model for detecting fraudulent transaction...IJECEIAES
The document presents a machine learning model for detecting fraudulent transactions in a streaming environment that addresses concept drift. The proposed approach uses the extreme gradient boosting (XGBoost) algorithm and employs four algorithms to continuously detect concept drift in data streams. The approach is evaluated on credit card and Twitter fraud datasets and is shown to outperform traditional machine learning models in terms accuracy, precision, and recall, and is more robust to concept drift. The proposed approach can be utilized as a real-time fraud detection system across different industries.
Predictive Modeling for Topographical Analysis of Crime RateIRJET Journal
This document describes a proposed system to use machine learning methods to predict crime rates and types of crimes in specific areas based on historical crime data. The system would analyze crime data collected from websites including date, location, and crime type to identify patterns. Machine learning algorithms would be trained on the data to build predictive models. The goal is to help law enforcement agencies more quickly detect, resolve, and prevent crimes by predicting where and what types of crimes may occur based on the characteristics of past crimes.
As we know the fingerprint is unique of every living objects. It is quite difficult to find out the prints.
Usually the Forensics use Fine powder and duct tapes to identify the prints of living object. As powder is
exceptionally muddled, so such molecule can cause loss of information after that examination the information is
coordinated with the system. The proposed system consists of an embedded device in which it consists of ultra
light to glow the fingerprints details. After that we can detect the fingerprint, analysis and it will checks on the
database, and it will return the output after matching. For matching and analysis of the Fingerprint, we will be
using the Algorithm for matching.
Comparative Study of Enchancement of Automated Student Attendance System Usin...IRJET Journal
This document discusses developing an automated student attendance system using facial recognition and deep learning algorithms. It begins with an overview of how facial recognition can be used to take attendance accurately and efficiently. It then describes the methodology, which involves using a convolutional neural network (CNN) to detect and recognize faces. Dimensionality reduction techniques like principal component analysis (PCA) and linear discriminant analysis (LDA) are also used to improve recognition accuracy. The goal is to build a system that can identify students in real-time with a high degree of accuracy, even in varying lighting conditions. It aims to automate the entire attendance tracking process for both students and teachers.
Face Recognition Smart Attendance System- A SurveyIRJET Journal
This document surveys 15 research papers on face recognition smart attendance systems. It summarizes each paper's methodology, including the databases and images used, feature extraction and matching algorithms like PCA, LDA, CNN, techniques for addressing issues like lighting and pose variations, and the accuracy and limitations of each system. Overall, the papers presented a variety of approaches to developing face recognition systems for automated student attendance, comparing methods like PCA, LDA, HOG, and deep learning algorithms and evaluating factors like recognition rate, robustness, and speed.
Similar to A REVIEW ON PREDICTIVE ANALYTICS IN DATA MINING (20)
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS) Vol.5, No.1/2/3, September 2016
DOI:10.5121/ijccms.2016.5301
A REVIEW ON PREDICTIVE ANALYTICS IN DATA MINING
Kavya.V¹, Arumugam.S²
¹M.E. Scholar, Department of Computer Science & Engineering, Nandha Engineering College, Erode-638052, Tamil Nadu, India
²Professor, Department of Computer Science & Engineering, Nandha Engineering College, Erode-638052, Tamil Nadu, India
ABSTRACT
The main process of data mining is to collect, extract and store valuable information, and nowadays many
enterprises do so actively. Predictive analytics is the branch of advanced analytics that is mainly used to
make predictions about unknown future events. It uses various techniques from machine learning, statistics,
data mining, modelling and artificial intelligence to analyse current data and make predictions about the
future. The two main objectives of predictive analytics are regression and classification. It is composed of
various analytical and statistical techniques used for developing models that predict future occurrences,
probabilities or events. Predictive analytics deals with both continuous and discontinuous changes. It
provides a predictive score for each individual (healthcare patient, product SKU, customer, component,
machine or other organizational unit) in order to determine or influence organizational processes that span
huge numbers of individuals, as in fraud detection, manufacturing, credit risk assessment, marketing, and
government operations including law enforcement.
KEYWORDS
Predictive analytics, Credit history, forecasting, Regression techniques
1. INTRODUCTION
The large amount of data available in information databases goes to waste until useful information is
extracted from it. Predictive analytics is the part of advanced analytics that predicts future events. It
comprises data collection and modelling, statistics and deployment. Figure 1 gives the basis of predictive
analytics. The success of data mining relies on the availability of high-quality data and on effective
sharing.
Figure 1. The basis of predictive analytics.
Predictive analytics allows business users to discover predictive intelligence by uncovering patterns and
relationships in both structured and unstructured data through data mining and text analytics, along with
statistics. Structured data include attributes such as gender, age and income. Unstructured data, such as
social media content, are extracted and used in the model building process. Figure 2 explains the cycle of
predictive analytics.
Figure 2. The cycle of predictive analytics.
2. OVERVIEW
PREDICTIVE ANALYTICS
Predictive analytics encompasses statistical techniques from predictive modelling, data mining and machine
learning that analyse current and historical facts to make predictions about the future [15]. In business,
predictive analytics is used to identify risks and opportunities. It is used in various fields such as
marketing, finance, travel and health care [10].
[Figure 1 labels: Data collection and modelling; Statistics; Deployment; Predictive analytics]
[Figure 2 labels: Understand the data; Prepare the data; Model; Evaluate; Deploy; Monitor]
REGRESSION TECHNIQUES
Regression is a data mining function that predicts a number. Regression techniques are used to predict
quantities such as age, weight, distance and temperature [7]. A regression task starts with a dataset in
which the target values are known. Common applications of regression are trend analysis and biomedical and
financial forecasting [4]. Regression algorithms include generalized linear models and support vector
machines [13].
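As an illustration of a regression task that starts from known target values, the following minimal sketch fits a straight line by ordinary least squares and predicts a number for a new input. The height and weight values are hypothetical and are not taken from any of the surveyed papers; the function names are chosen for this example only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict the numeric target for a new input x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data in which the target values (weights) are known.
heights_cm = [150.0, 160.0, 170.0, 180.0]
weights_kg = [50.0, 58.0, 66.0, 74.0]

model = fit_line(heights_cm, weights_kg)
predicted_weight = predict(model, 175.0)
```

A generalized linear model or a support vector machine would replace `fit_line` in practice; the simple line is used here only because it keeps the fit-then-predict structure visible.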
FORECASTING
Forecasting focuses on predicting the future based on past and present data [3, 14]. Two terms central to
forecasting are risk and uncertainty.
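A minimal sketch of forecasting from past and present data is given below. It assumes simple exponential smoothing, which is one common technique and is not prescribed by the sources cited here. The mean absolute one-step error is returned alongside the forecast as a rough indication of uncertainty. The series values and the smoothing factor are hypothetical.

```python
def smooth_and_forecast(series, alpha=0.5):
    """Simple exponential smoothing.

    Returns the one-step-ahead forecast and the mean absolute error of the
    one-step forecasts made while walking through the series.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    abs_errors = []
    for value in series[1:]:
        # Error of the forecast that was made before observing this value.
        abs_errors.append(abs(value - level))
        # Blend the new observation into the running level.
        level = alpha * value + (1 - alpha) * level
    mae = sum(abs_errors) / len(abs_errors) if abs_errors else 0.0
    return level, mae


# Hypothetical daily counts.
forecast, mean_abs_error = smooth_and_forecast([10.0, 12.0, 14.0], alpha=0.5)
```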
3. LITERATURE SURVEY
Carlos Márquez-Vera et al. [1] propose a genetic programming algorithm and different data mining approaches
for predicting student failure, a problem made challenging by the high number of factors that can affect low
student performance and by the imbalanced nature of the data. The methodology consists of data gathering,
pre-processing, data mining and interpretation. The Interpretable Classification Rule Mining (ICRM) and SMOTE
(Synthetic Minority Over-sampling Technique) algorithms are used. The data are collected from a student data
set, and the WEKA tool is used. Accuracy, true positive rate, true negative rate and geometric mean are the
parameters for performance measurement. The approach yields accurate and comprehensible classification rules
and achieves the best predictions of student failure (98.7 %).
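The four performance parameters named above can be computed from the confusion matrix of a binary classifier. The sketch below assumes the usual definitions (geometric mean taken as the square root of the product of the true positive and true negative rates); the labels are hypothetical and do not reproduce the results reported in [1].

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy, true positive rate, true negative rate and geometric mean."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "tnr": tnr,
        "gmean": math.sqrt(tpr * tnr),
    }


# Hypothetical imbalanced data: 2 failing students (label 1) out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = imbalance_metrics(y_true, y_pred)
```

In this example accuracy is 0.8 even though only half of the minority class is found, which is why the geometric mean is reported alongside accuracy for imbalanced data.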
Branko G. Celler et al. [2] discuss the telemonitoring of vital signs from the home for the management of
patients with chronic conditions. New measurement modalities and signal processing techniques are proposed
for increasing the quality and value of vital signs monitoring. An automated risk stratification algorithm is
used. QRS height and width (ECG), PR (Pressure Rate), RR (Respiratory Rate), QT (Abnormal heart rate) and
body temperature are the parameters. The jBoss tool is used. Data are taken from the Department of Human
Services Medicare databases and telemonitoring system data. The work identifies the key technical performance
characteristics of at-home telemonitoring systems.
Hao Wang, Hao-Tsung Yang et al. [3] use a neural network model to assess the value of playing style
information in predicting match quality. A mix of Sternberg’s thinking style theory and individual histories
is used to categorize League of Legends (LoL) players. The data are collected from the LoLBase website. The
algorithms used are a feed-forward/feedback propagation algorithm and a matchmaking algorithm. Win rate,
match duration and group number (ELO) are the parameters for performance measurement. Python is used. The
study finds that the presence of global-liberal (G-L) style players is positively correlated with match
enjoyment.
Jacob R. Scanlon et al. [4] forecast the daily level of cyber-recruitment activity of violent extremist (VE)
groups. Using LDA-based topics as predictors within time series models reduces forecast error. The Latent
Dirichlet Allocation (LDA) algorithm is used in forecasting. A Western jihadist discussion forum is used as
the dataset. The number of posts and the percentage of recruitment posts (per day) are the parameters for
measurement. The RTextTools and tm text mining packages in R are used. The work achieves automatic
forecasting of VE cyber-recruitment using natural language processing, supervised machine learning and time
series analysis.
Quanzeng You, Liangliang Cao et al. [5] build a reliable forecasting system for elections, modelled to
capture the interrelationship between image-centric social multimedia and real-world entities. The
Competitive Vector Auto Regression (CVAR) algorithm is used. Through its competition mechanism, CVAR compares
the popularity of multiple competing candidates. Data are taken from Flickr. The number of images, number of
users, images uploaded per day (IPD) and users uploading images per day (UPD) are the parameters for
measurement. The OpenCV tool is used. As a result, CVAR is able to take prior knowledge into account, which
leads to better performance in terms of prediction accuracy.
Sean M. Arietta et al. [6] present a method that automatically finds and evaluates predictive relationships
between the visual appearance of a city and its non-visual attributes. A scalable distributed processing
framework is implemented that speeds up the main computational bottleneck by an order of magnitude. The
algorithms used are a hard negative mining algorithm, a classification and recognition algorithm, and
Dijkstra’s shortest path algorithm. The training dataset consists of 10,000 Google StreetView panorama
projections (2,000 positives, 8,000 negatives). The parameters for performance measurement are the number of
detections and the percentage of detections from the positive set. MATLAB is used. The method is used to
define the visual boundary of city neighbourhoods, generate walking directions that avoid or seek exposure to
city attributes, and validate user-specified visual elements for prediction.
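The idea of walking directions that avoid exposure to a city attribute can be sketched with Dijkstra’s shortest path algorithm by adding a penalty to the cost of edges that lead into high-exposure areas. The graph, the penalty value and the node names below are hypothetical, and the cost model is an assumption made for illustration rather than the formulation used in [6].

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights.

    Returns (path, cost), or (None, inf) if the target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale queue entry
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]


# Hypothetical street graph. Edge cost = walking distance + exposure penalty.
# The edge into B carries a penalty of 5 because B is a high-exposure block.
EXPOSURE_PENALTY = 5
graph = {
    "A": {"B": 1 + EXPOSURE_PENALTY, "C": 2},
    "B": {"D": 1},
    "C": {"D": 2},
    "D": {},
}
path, cost = shortest_path(graph, "A", "D")
```

Without the penalty the route through B would be shorter (cost 2); with it the route through C is preferred. Using a negative adjustment to seek exposure would break the non-negative weight requirement of Dijkstra’s algorithm, so that case would need a different cost design.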
Abish Malik, Ross Maciejewski et al. [7] present a visual analytics approach that provides decision makers
with a proactive and predictive environment that helps them make effective resource allocation and
deployment decisions. Analysts are provided with a suite of natural scale templates and methods that enable
them to focus and drill down to appropriate geospatial and temporal resolution levels. A prediction algorithm
is used. Accuracy, average, count and time are the parameters for performance measurement. The methodology is
applied to Criminal, Traffic and Civil (CTC) incident datasets. The templates support analysis at multiple
spatiotemporal granularity levels.
Ronaldo C. Prati et al. [8] survey graphical performance evaluation methods, which are increasingly drawing
the attention of the data mining community because of their ability to depict the trade-offs between
evaluation aspects in a multidimensional space. The graphical evaluation methods apply to binary
classification problems, with predictive models deployed for classification, ranking and probability
estimation. The parameters for performance measurement are true positive and false positive rates, precision
and recall. The Insurance Company (TIC) benchmark data set, which includes 86 variables, is used, and the
tool used is WEKA version 3.5.8. ROC graphs, ROC curves, cost lines, cost curves, precision-recall curves,
lift curves, reliability diagrams, ROI curves, discrimination diagrams and attribute diagrams are the
different graphical evaluation methods. The survey helps in deciding which method is best suited to a given
situation.
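To make the ROC curve concrete, the sketch below computes the (false positive rate, true positive rate) points of a binary scorer by sweeping the decision threshold from high to low, then estimates the area under the curve with the trapezoidal rule. The labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR), starting at (0, 0) and ending at (1, 1)."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all examples that share this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical labels (1 = positive) and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
points = roc_points(y_true, scores)
area = auc(points)
```

Here 8 of the 9 positive-negative pairs are ranked correctly, so the area is 8/9. A precision-recall curve can be built from the same threshold sweep by recording tp / (tp + fp) against the true positive rate.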
Gang Fang, Gaurav Pandey et al. [9] observe that discriminative patterns can provide valuable insights into
data sets with class labels. Low-support patterns can be discovered using SupMaxPair. Per-pattern precision,
density, dimension, count and frequency are the parameters for performance measurement. A frequent pattern
mining algorithm and discriminative pattern mining algorithms are used. Synthetic and cancer gene expression
data sets are used for prediction. The approach discovers discriminative patterns with relatively low support
from dense and high-dimensional data sets, which other approaches fail to discover within a desired amount of
time.
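A discriminative pattern is an itemset whose support differs markedly between classes. The sketch below scores a pattern by the plain difference in support between two classes. This is a deliberately simple stand-in and is not the SupMaxPair measure of [9]; the gene names and transactions are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def support_difference(itemset, class_a, class_b):
    """Support in class A minus support in class B."""
    return support(itemset, class_a) - support(itemset, class_b)


# Hypothetical binarized expression data: each set lists the active genes.
cases = [{"g1", "g2"}, {"g1", "g2", "g3"}, {"g3"}, {"g2"}]
controls = [{"g1"}, {"g2"}, {"g3"}, {"g1", "g3"}]

pair_score = support_difference({"g1", "g2"}, cases, controls)
single_score = support_difference({"g1"}, cases, controls)
```

In this toy data the single item g1 does not separate the classes (score 0.0), while the pair {g1, g2} does (score 0.5) despite covering only half of the cases, which illustrates why low-support combinations are worth mining.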
Nanlin Jin, Peter Flach et al. [10] apply data mining methods to explore interesting consumption patterns and
their associated descriptive models in smart electricity meter data. Target concept, target type, double
regression, coverage and strategy type are the parameters for performance measurement. Subgroup discovery
algorithms and S-Transform algorithms are used. The data were collected by the Energy Demand Research Project
(EDRP). The Cortana tool is used. This approach outperforms more conventional data mining methods in terms of
predictive power and classification accuracy, while consuming similar computational resources.
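Subgroup discovery ranks candidate subgroups by a quality measure that balances coverage against how unusual the target distribution is inside the subgroup. The sketch below assumes weighted relative accuracy (WRAcc), a widely used measure for binary targets; the source does not state which measure was used in [10], and the household records are hypothetical.

```python
def wracc(rows, condition, is_target):
    """Weighted relative accuracy: P(cond) * (P(target | cond) - P(target))."""
    n = len(rows)
    covered = [row for row in rows if condition(row)]
    if n == 0 or not covered:
        return 0.0
    p_cond = len(covered) / n
    p_target = sum(1 for row in rows if is_target(row)) / n
    p_target_given_cond = sum(1 for row in covered if is_target(row)) / len(covered)
    return p_cond * (p_target_given_cond - p_target)


# Hypothetical households: heating type and whether evening use is high.
households = [
    {"heating": "electric", "high_evening_use": True},
    {"heating": "electric", "high_evening_use": True},
    {"heating": "electric", "high_evening_use": True},
    {"heating": "electric", "high_evening_use": False},
    {"heating": "gas", "high_evening_use": True},
    {"heating": "gas", "high_evening_use": False},
    {"heating": "gas", "high_evening_use": False},
    {"heating": "gas", "high_evening_use": False},
]

quality = wracc(
    households,
    condition=lambda row: row["heating"] == "electric",
    is_target=lambda row: row["high_evening_use"],
)
```

The subgroup of electrically heated households covers half of the data and has a target rate of 0.75 against an overall rate of 0.5, giving a quality of 0.125.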
4. COMPARISONS ON DIFFERENT PREDICTIVE ANALYTIC TECHNIQUES
| Title | Techniques and Algorithms | Datasets | Parameters | Conclusion |
|---|---|---|---|---|
| Predicting Student Failure at School Using Genetic Programming and Different Data Mining Approaches with High Dimensional and Imbalanced Data | Data gathering, pre-processing, data mining and interpretation; Interpretable Classification Rule Mining (ICRM) and SMOTE (Synthetic Minority Over-sampling Technique) algorithms | Student’s data set | Accuracy, true positive rate, true negative rate and geometric mean | Accurate and comprehensible classification rules |
| Home Telemonitoring of Vital Signs Technical Challenges and Future Directions | Automated risk stratification algorithm | Department of Human Services Medicare databases and telemonitoring system data | QRS height and width (ECG), PR (Pressure Rate), RR (Respiratory Rate), QT (Abnormal heart rate) and body temperature | Identifies the key technical performance characteristics of at-home telemonitoring systems |
| Thinking Style and Team Competition Game Performance and Enjoyment | Feed-forward/feedback propagation algorithm and matchmaking algorithm | LoLBase website | Win rate, match duration, group number (ELO) | Finds that the presence of global-liberal (G-L) style players is positively correlated with match enjoyment |
| Forecasting Violent Extremist Cyber Recruitment | Latent Dirichlet Allocation (LDA) algorithm | Western jihadist discussion forum | Number of posts and % of recruitment posts (per day) | Achieves automatic forecasting of VE cyber recruitment using natural language processing, supervised machine learning and time series analysis |
| A Multifaceted Approach to Social Multimedia-Based Prediction of Elections | Competitive Vector Auto Regression (CVAR) algorithm | Flickr | Number of images, number of users, images uploaded per day (IPD) and users uploading images per day (UPD) | Better performance in terms of prediction accuracy |
| City Forensics: Using Visual Elements to Predict Non-Visual City Attributes | Hard negative mining algorithm, classification and recognition algorithm, and Dijkstra’s shortest path algorithm | Google StreetView panorama projections (training dataset) | Number of detections, % of detections from positive set | Defines the visual boundary of city neighbourhoods, generates walking directions that avoid or seek exposure to city attributes, and validates user-specified visual elements for prediction |
| Proactive Spatiotemporal Resource Allocation and Predictive Visual Analytics for Community Policing and Law Enforcement | Prediction algorithm | Criminal, Traffic and Civil (CTC) incident datasets | Accuracy, average, count and time | Provides users with a suite of natural scale templates that support analysis at multiple spatiotemporal granularity levels |
| A Survey on Graphical Methods for Classification Predictive Performance Evaluation | Graphical evaluation methods | The Insurance Company (TIC) benchmark data set | True positive and false positive rate, precision and recall | Helps in deciding which method is best suited to a given situation |
| Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data | Frequent pattern mining algorithm and discriminative pattern mining algorithms | Synthetic and cancer gene expression data sets | Per-pattern precision, density, dimension, count and frequency | Discovers discriminative patterns with relatively low support from dense and high-dimensional data sets, which other approaches fail to discover within a desired amount of time |
| Subgroup Discovery in Smart Electricity Meter Data | Subgroup discovery algorithms and S-Transform algorithms | Energy Demand Research Project (EDRP) | Target concept, target type, double regression, coverage and strategy type | Outperforms more conventional data mining methods in terms of predictive power and classification accuracy, while consuming similar computational resources |
5. CONCLUSION
Predictive analytics is the future of data mining. This study focuses on predictive analytics, regression
techniques and forecasting in the knowledge discovery domain. Business intelligence is used in predictive
analytics for modelling and forecasting. Predictive analytics makes the choice of marketing methods more
efficient and is helpful in social media analytics.
REFERENCES
[1] Carlos Márquez-Vera, Alberto Cano, Cristóbal Romero, and Sebastián Ventura, “Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data”, Springer Science+Business Media, LLC 2012.
[2] Branko G. Celler and Ross S. Sparks, “Home Telemonitoring of Vital Signs Technical Challenges and Future Directions”, IEEE Journal of Biomedical and Health Informatics, Vol. 19, No. 1, January 2015.
[3] Hao Wang, Hao-Tsung Yang, and Chuen-Tsai Sun, “Thinking Style and Team Competition Game Performance and Enjoyment”, IEEE Transactions on Computational Intelligence and AI in Games, Vol. 7, No. 3, September 2015.
[4] Jacob R. Scanlon and Matthew S. Gerber, “Forecasting Violent Extremist Cyber Recruitment”, IEEE Transactions on Information Forensics and Security, Vol. 10, No. 11, November 2015.
[5] Quanzeng You, Liangliang Cao, Yang Cong, Xianchao Zhang, and Jiebo Luo, “A Multifaceted Approach to Social Multimedia-Based Prediction of Elections”, IEEE Transactions on Multimedia, Vol. 17, No. 12, December 2015.
[6] Sean M. Arietta, Alexei A. Efros, Ravi Ramamoorthi, and Maneesh Agrawala, “City Forensics: Using Visual Elements to Predict Non-Visual City Attributes”, IEEE Transactions on Visualization and Computer Graphics, Vol. 20, No. 12, December 2014.
[7] Abish Malik, Ross Maciejewski, Sherry Towers, Sean McCullough, and David S. Ebert, “Proactive Spatiotemporal Resource Allocation and Predictive Visual Analytics for Community Policing and Law Enforcement”, IEEE Transactions on Visualization and Computer Graphics, Vol. 20, No. 12, December 2014.
[8] Ronaldo C. Prati, Gustavo E.A.P.A. Batista, and Maria Carolina Monard, “A Survey on Graphical Methods for Classification Predictive Performance Evaluation”, IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 11, November 2011.
[9] Gang Fang, Gaurav Pandey, Wen Wang, Manish Gupta, Michael Steinbach, and Vipin Kumar, “Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data”, IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 2, February 2012.
[10] Nanlin Jin, Peter Flach, Tom Wilcox, Royston Sellman, Joshua Thumim, and Arno Knobbe, “Subgroup Discovery in Smart Electricity Meter Data”, IEEE Transactions on Industrial Informatics, Vol. 10, No. 2, May 2014.
[11] Hao Wang, Hao-Tsung Yang, and Chuen-Tsai Sun, “Thinking Style and Team Competition Game Performance and Enjoyment”, IEEE Transactions on Computational Intelligence and AI in Games, Vol. 7, No. 3, September 2015.
[12] Quanzeng You, Liangliang Cao, Yang Cong, Xianchao Zhang, and Jiebo Luo, “A Multifaceted Approach to Social Multimedia-Based Prediction of Elections”, IEEE Transactions on Multimedia, Vol. 17, No. 12, December 2015.
[13] Yun Wang and Sudha Ram, “Predicting Location-Based Sequential Purchasing Events by Using Spatial, Temporal, and Social Patterns”, IEEE Intelligent Systems, May/June 2015.
[14] Jesse Rio Russell, “Predictive analytics and child protection: Constraints and Opportunities”, Child Abuse & Neglect 46 (2015) 182–189, Elsevier.
[15] Karel Dejaeger, Wouter Verbeke, David Martens, and Bart Baesens, “Data Mining Techniques for Software Effort Estimation: A Comparative Study”, IEEE Transactions on Software Engineering, Vol. 38, No. 2, March/April 2012.
[16] Leonardo Feltrin, “KNIME an Open Source Solution for Predictive Analytics in the Geosciences”, IEEE Geoscience and Remote Sensing Magazine, December 2015.
[17] Josep Ll. Berral, Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, and Daron Green, “ALOJA: A Framework for Benchmarking and Predictive Analytics in Big Data Deployments”, IEEE Transactions on Emerging Topics in Computing, November 2015.
[18] Minghui Zhou and Audris Mockus, “Who Will Stay in the FLOSS Community? Modeling Participant’s Initial Behavior”, IEEE Transactions on Software Engineering, Vol. 41, No. 1, January 2015.
[19] Sean M. Arietta, Alexei A. Efros, Ravi Ramamoorthi, and Maneesh Agrawala, “City Forensics: Using Visual Elements to Predict Non-Visual City Attributes”, IEEE Transactions on Visualization and Computer Graphics, Vol. 20, No. 12, December 2014.
[20] Francisco C. Pereira, Filipe Rodrigues, Evgheni Polisciuc, and Moshe Ben-Akiva, “Why so many people? Explaining Nonhabitual Transport Overcrowding With Internet Data”, IEEE Transactions on Intelligent Transportation Systems, Vol. 16, No. 3, June 2015.