The main process of data mining is to collect, extract, and store valuable information, and nowadays many enterprises do this actively. Predictive analytics is a branch of advanced analytics that is mainly used to make predictions about unknown future events. It uses various techniques from machine learning, statistics, data mining, modeling, and artificial intelligence to analyze current data and make predictions about the future. The two main tasks of predictive analytics are regression and classification. It comprises various analytical and statistical techniques for developing models that predict future occurrences, probabilities, or events. Predictive analytics deals with both continuous and discontinuous changes. It provides a predictive score for each individual (healthcare patient, product SKU, customer, component, machine, or other organizational unit) to determine or influence organizational processes that pertain across large numbers of individuals, as in fraud detection, manufacturing, credit risk assessment, marketing, and government operations including law enforcement.
IRJET- Probability based Missing Value Imputation Method and its Analysis (IRJET Journal)
This document describes a probability-based method for imputing missing data. It begins with an abstract that outlines the goal of developing an application to identify and replace missing values in a dataset using a probability approach. It then provides background on missing data issues and different imputation techniques. The proposed method uses a probability approach to calculate possible values for missing data based on attributes of known values, stores this information separately, and then imputes values based on probability calculations. It claims this map-reduce approach reduces processing time for large datasets compared to existing methods. The method and imputed dataset will be analyzed using clustering algorithms to examine changes from the original missing data.
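As a rough illustration of imputing by probability, the sketch below fills a missing categorical value with the value that is most frequent among complete records agreeing on the known attributes, and falls back to the overall most frequent value when no record matches. This is an assumed, simplified reading of the idea: the function `impute_most_probable` and the sample records are hypothetical, and the paper's MapReduce processing and separate probability store are not reproduced.

```python
from collections import Counter


def impute_most_probable(records, target):
    """Return copies of `records` with None values of `target` filled in.

    The fill value is the most probable (most frequent) value of `target`
    among complete records that match all known attributes of the incomplete
    record; if none match, the overall most frequent value is used.
    """
    complete = [r for r in records if r[target] is not None]
    overall = Counter(r[target] for r in complete).most_common(1)[0][0]
    filled = []
    for r in records:
        if r[target] is not None:
            filled.append(dict(r))
            continue
        known = {k: v for k, v in r.items() if k != target and v is not None}
        matches = Counter(
            c[target] for c in complete
            if all(c.get(k) == v for k, v in known.items())
        )
        value = matches.most_common(1)[0][0] if matches else overall
        filled.append({**r, target: value})
    return filled


# Hypothetical toy records; the last one is missing "buys".
records = [
    {"age": "young", "income": "low", "buys": "no"},
    {"age": "young", "income": "low", "buys": "no"},
    {"age": "young", "income": "high", "buys": "yes"},
    {"age": "old", "income": "high", "buys": "yes"},
    {"age": "young", "income": "low", "buys": None},
]
print(impute_most_probable(records, "buys")[-1])
```

The input list is left untouched and copies are returned, so the original dataset with missing values remains available for the kind of before/after clustering comparison the summary mentions.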
Efficiency of Prediction Algorithms for Mining Biological Databases (IOSR Journals)
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
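The evaluation measures named above all come from the binary confusion matrix. The minimal sketch below computes them and includes a ZeroR baseline (always predict the majority class) to show why ZeroR can look odd on some measures. The labels are hypothetical, not the breast cancer data used in the study.

```python
from collections import Counter


def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, PPV and NPV for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (e.g. PPV when nothing is predicted positive) become NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def zero_r(train_labels, n):
    """ZeroR baseline: always predict the majority class of the training labels."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n


y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]  # hypothetical labels, 1 = positive class
print(confusion_metrics(y_true, zero_r(y_true, len(y_true))))
```

Because ZeroR never varies its prediction, its sensitivity, specificity and predictive values depend entirely on which class is the majority and which class is labelled positive.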
A Comprehensive review of Conversational Agent and its prediction algorithm (vivatechijri)
There is an exponential increase in the use of conversational bots. A conversational bot can be described as a platform that chats with people using artificial intelligence. Recent advances have made AI capable of learning from data and producing an output. This learning can be performed using various machine learning algorithms; machine learning techniques involve constructing algorithms that learn from data and predict outcomes. This paper reviews the efficiency of different machine learning algorithms used in conversational bots.
An efficient feature selection algorithm for health care data analysis (journalBEEI)
Diabetes is a silent killer that slowly kills a person if it goes undetected. The existing systems, which use the F-score method and K-means clustering to check whether a person has diabetes, are not 100% accurate, and anything short of 100% is not acceptable in the medical field, as it could cost many lives. Our proposed system aims to take some of the best features of the existing algorithms for predicting diabetes and combine them into a novel algorithm that will be 100% accurate in its prediction. With the surge in technological advancements, we can use data mining to predict when a person would be diagnosed with diabetes. Specifically, we analyze the best features of the chi-square algorithm and the advanced clustering algorithm (ACA). This research work uses the Pima Indian Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification theorems and methods, we can consider different factors such as age, BMI, and blood pressure, weigh the overall importance of these attributes, single out the most important ones, and use them to predict diabetes.
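The chi-square side of this feature selection can be pictured with the standard chi-square test of independence between a discretized attribute and the class label: attributes whose values are more strongly associated with the diabetic/non-diabetic label receive higher scores. This is a generic sketch, not the paper's combined chi-square/ACA algorithm, and the binned BMI and blood-pressure values below are made up rather than taken from the Pima dataset.

```python
def chi_square(feature, labels):
    """Chi-square statistic of independence between a discrete feature and a class label."""
    n = len(labels)
    f_vals, c_vals = sorted(set(feature)), sorted(set(labels))
    observed = {(f, c): 0 for f in f_vals for c in c_vals}
    for f, c in zip(feature, labels):
        observed[(f, c)] += 1
    f_tot = {f: sum(observed[(f, c)] for c in c_vals) for f in f_vals}
    c_tot = {c: sum(observed[(f, c)] for f in f_vals) for c in c_vals}
    stat = 0.0
    for f in f_vals:
        for c in c_vals:
            expected = f_tot[f] * c_tot[c] / n  # count expected under independence
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


# Hypothetical binned attributes for six patients (1 = diabetic).
diabetic = [1, 1, 1, 0, 0, 0]
bmi = ["high", "high", "high", "normal", "normal", "normal"]
bp = ["a", "b", "a", "b", "a", "b"]
ranking = sorted({"bmi": bmi, "bp": bp}.items(),
                 key=lambda kv: chi_square(kv[1], diabetic), reverse=True)
print([name for name, _ in ranking])
```

Ranking attributes by this score and keeping the top ones is one simple way to "single out" attributes before training a classifier.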
Software Bug Detection Algorithm using Data mining Techniques (AM Publications)
The main aim of software development is to develop high-quality software, and high-quality software is developed using an enormous amount of software engineering data. This data can be used to gain an empirically based understanding of software development, and meaningful information can be extracted from it using various data mining techniques. As data mining for secure software engineering improves software productivity and quality, software engineers are increasingly applying data mining algorithms to various software engineering tasks. However, mining software engineering data poses several challenges, requiring various algorithms to effectively mine sequences, graphs, and text from such data. Software engineering data includes code bases, execution traces, historical code changes, mailing lists, and bug databases. These contain a wealth of information about a project's status, progress, and evolution. Using well-established data mining techniques, practitioners and researchers can explore the potential of this valuable data to better manage their projects and produce higher-quality software systems that are delivered on time and within budget.
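A very small example of mining historical code changes is ranking files by how often they are touched by bug-fix commits. The sketch below assumes commits are available as (message, changed files) pairs and uses a hypothetical keyword heuristic to recognize bug fixes; it illustrates the general idea and is not the algorithm from the paper.

```python
import re
from collections import Counter

# Hypothetical heuristic: a commit is treated as a bug fix if its message
# mentions "fix", "fixes", "fixed", "bug" or "defect" as a whole word.
FIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bug|defect)\b", re.IGNORECASE)


def bug_prone_files(commits):
    """commits: iterable of (message, changed_files). Returns (file, fix_count) pairs, most fixed first."""
    counts = Counter()
    for message, files in commits:
        if FIX_PATTERN.search(message):
            counts.update(set(files))  # count each file once per fixing commit
    return counts.most_common()


history = [
    ("Fix null pointer in parser", ["parser.py", "util.py"]),
    ("Add export feature", ["export.py"]),
    ("Fixed crash when parsing empty file", ["parser.py"]),
    ("Refactor util helpers", ["util.py"]),
]
print(bug_prone_files(history))
```

Keyword matching on commit messages is noisy; linking commits to entries in a bug database is usually more reliable when that data is available.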
IRJET- Error Reduction in Data Prediction using Least Square Regression Method (IRJET Journal)
This document proposes a modification to the least squares regression method to reduce errors in data prediction. It divides the original data set into three parts, uses the first part to make predictions with least squares regression and fits those predictions to the second part of the data to minimize errors. It then validates the model on the third part of data and compares errors to the original least squares method. The proposed method shows reduced errors in prediction based on mean absolute error, mean relative error and root mean square error metrics in most test ranges of the validation data.
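One possible reading of the three-part scheme is sketched below: an ordinary least-squares line is fitted on part 1, a second least-squares fit maps those predictions onto the observed values of part 2 (a linear correction), and both versions are scored on part 3 with MAE and RMSE. The exact correction used in the paper is not specified here, so this interpretation, the function names and the synthetic series are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def rmse(pred, actual):
    return (sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)) ** 0.5


def three_part_least_squares(xs, ys):
    """Fit on part 1, correct on part 2, validate on part 3."""
    i, j = len(xs) // 3, 2 * len(xs) // 3
    a0, b0 = fit_line(xs[:i], ys[:i])  # part 1: plain least squares

    def base(x):
        return a0 + b0 * x

    # Part 2: least-squares correction from base predictions to observed values.
    c0, c1 = fit_line([base(x) for x in xs[i:j]], ys[i:j])

    def corrected(x):
        return c0 + c1 * base(x)

    xv, yv = xs[j:], ys[j:]  # part 3: validation only
    base_pred = [base(x) for x in xv]
    corr_pred = [corrected(x) for x in xv]
    return {
        "base_mae": mae(base_pred, yv), "corrected_mae": mae(corr_pred, yv),
        "base_rmse": rmse(base_pred, yv), "corrected_rmse": rmse(corr_pred, yv),
    }


xs = list(range(30))
ys = [x + 0.1 * x * x for x in xs]  # hypothetical series with curvature
print(three_part_least_squares(xs, ys))
```

On a series with drift, the correction step mainly helps because it is fitted on data closer in time to the validation part.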
This document summarizes a research paper that proposes a new inventory prediction method for supply chain management called BP-GA chaos prediction algorithm. The method uses a backpropagation neural network combined with a genetic algorithm to forecast inventory levels based on chaotic time series analysis. It aims to overcome limitations of traditional chaos prediction approaches. The paper reviews other inventory forecasting research and chaotic prediction methods. It then describes the new hybrid BP-GA method in detail, which establishes a chaotic neural network model optimized through a genetic algorithm. An experiment applying this method to inventory prediction is said to achieve good results.
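The general BP-GA pattern (a genetic algorithm searches for good initial weights, then backpropagation-style gradient descent refines them) can be sketched on a delay-embedded chaotic series. To keep the example short, several assumptions replace the paper's details: a single linear neuron stands in for the BP network, the logistic map stands in for inventory data, and all names and hyperparameters are hypothetical.

```python
import random


def delay_embed(series, dim, tau=1):
    """Phase-space reconstruction: [x_t, x_{t-tau}, ..., x_{t-(dim-1)tau}] -> x_{t+1}."""
    span = (dim - 1) * tau
    inputs, targets = [], []
    for t in range(span, len(series) - 1):
        inputs.append([series[t - k * tau] for k in range(dim)])
        targets.append(series[t + 1])
    return inputs, targets


def predict(w, x):
    # w[0] is the bias; a single linear neuron stands in for the BP network.
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def mse(w, X, y):
    return sum((predict(w, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def ga_search(X, y, n_weights, pop_size=30, generations=40, sigma=0.1, seed=0):
    """Genetic search: truncation selection, uniform crossover, Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: mse(w, X, y))
        parents = pop[: pop_size // 2]  # the better half survives unchanged (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append([rng.choice(genes) + rng.gauss(0, sigma) for genes in zip(a, b)])
        pop = parents + children
    return min(pop, key=lambda w: mse(w, X, y))


def backprop_refine(w, X, y, lr=0.05, epochs=200):
    """Batch gradient descent on the MSE, starting from the GA solution."""
    w = list(w)
    n = len(y)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict(w, x) - t
            grad[0] += 2 * err / n
            for i, xi in enumerate(x):
                grad[i + 1] += 2 * err * xi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Logistic map as a stand-in chaotic series (not inventory data).
series = [0.3]
for _ in range(199):
    series.append(4.0 * series[-1] * (1.0 - series[-1]))

X, y = delay_embed(series, dim=3)
w_ga = ga_search(X, y, n_weights=4)
w_bp = backprop_refine(w_ga, X, y)
print(mse(w_ga, X, y), mse(w_bp, X, y))
```

The GA explores the weight space globally, and the gradient step then polishes the best candidate locally; a real BP network would use hidden nonlinear units, which a linear neuron cannot capture for this series.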
IRJET- Road Accident Prediction using Machine Learning Algorithm (IRJET Journal)
This document summarizes a research paper that predicts road accidents using machine learning algorithms. It discusses how large datasets have enabled data mining techniques to discover useful information. The paper aims to determine the most suitable machine learning classification technique for road accident prediction. It uses logistic regression, an algorithm that predicts a binary outcome (yes/no). The researchers clean the data, divide it into training and testing sets, and use logistic regression in Jupyter notebooks with the Python programming language. It provides percentage predictions of accident likelihood to users through a website interface. The results show logistic regression can accurately predict accidents for numerical data but has limitations for non-numerical text data.
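The logistic regression workflow described above (split the data, train on one part, test on the other, report a percentage likelihood) can be sketched without any libraries. The two numeric features and the labelling rule below are synthetic and hypothetical; the paper's real dataset, features and notebook code are not shown here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Stochastic gradient descent on the log-loss; w[0] is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) - t
            w[0] -= lr * err
            for i, xi in enumerate(x):
                w[i + 1] -= lr * err * xi
    return w


def accident_probability(w, x):
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


# Hypothetical numeric features scaled to [0, 1], e.g. [speed_ratio, road_wetness].
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if a + b > 1.0 else 0 for a, b in X]
X_train, y_train, X_test, y_test = X[:160], y[:160], X[160:], y[160:]

w = train_logistic(X_train, y_train)
correct = sum((accident_probability(w, x) >= 0.5) == bool(t) for x, t in zip(X_test, y_test))
accuracy = correct / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
print(f"accident likelihood for {X_test[0]}: {100 * accident_probability(w, X_test[0]):.1f}%")
```

The model only accepts numbers, which matches the limitation noted above: text fields would first need to be encoded numerically (for example one-hot encoding) before they could be used.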
A Clustering Method for Weak Signals to Support Anticipative Intelligence (CSCJournals)
This document proposes a clustering method to analyze weak signals, which are short texts that may indicate future trends when analyzed together. The method involves preprocessing weak signals by removing stop words, stemming words, and identifying synonyms. It then clusters the weak signals using the K-medoids algorithm based on the number of similar words between signals, including identical words, stemmed words, and synonyms. The method was tested on a database of weak signals related to bioenergy. The clustering is intended to group similar weak signals to help form hypotheses about potential future changes or opportunities.
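The pipeline above (stop-word removal, stemming, synonym mapping, then K-medoids on a shared-word similarity) can be sketched as follows. The stop-word list, the crude suffix stemmer, the synonym table, the distance `1 / (1 + shared terms)` and the sample signals are all hypothetical stand-ins for the paper's resources.

```python
STOP_WORDS = {"the", "a", "an", "of", "in", "for", "and", "to", "on"}
SYNONYMS = {"trial": "test"}  # hypothetical synonym table: word -> canonical term


def stem(word):
    # Crude suffix stripping; a real system would use a proper stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    terms = set()
    for raw in text.split():
        word = raw.strip(".,;:!?").lower()
        if word and word not in STOP_WORDS:
            stemmed = stem(word)
            terms.add(SYNONYMS.get(stemmed, stemmed))
    return terms


def distance(a, b):
    # More shared (identical, stemmed or synonymous) terms -> smaller distance.
    return 1.0 / (1.0 + len(a & b))


def k_medoids(items, k, max_iter=20):
    medoids = list(range(k))  # simple deterministic initialisation
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i in range(len(items)):
            nearest = i if i in medoids else min(medoids, key=lambda m: distance(items[i], items[m]))
            clusters[nearest].append(i)
        # New medoid: the member with the smallest total distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(distance(items[c], items[o]) for o in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


signals = [
    "Brazil opens new ethanol plants",
    "Ethanol plants expanding across Brazil",
    "Algae farms tested for biodiesel",
    "Trials of algae strains for biodiesel",
]
terms = [preprocess(s) for s in signals]
clusters = k_medoids(terms, k=2)
print(sorted(sorted(c) for c in clusters.values()))
```

K-medoids suits this setting because it only needs pairwise distances between signals, not an average "centroid" text.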
IRJET- GDPS - General Disease Prediction System (IRJET Journal)
The document describes a General Disease Prediction System (GDPS) that uses machine learning and data mining techniques to predict diseases based on patient symptoms.
The GDPS first collects patient data, preprocesses it, and extracts relevant features. It then implements the ID3 decision tree algorithm to generate a predictive model and classify diseases. As an admin, one can train the model using sample data. As a user, one can enter symptoms and the trained model will predict the likely disease and recommend precautions.
The GDPS was tested on a dataset of 120 patients and achieved 86.67% accuracy in disease prediction. The system currently covers common diseases, but future work involves expanding it to predict more serious or fatal diseases, such as various cancers.
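The core of ID3 is choosing, at each node, the attribute with the highest information gain (reduction in label entropy). A compact sketch on made-up symptom records is shown below; the symptoms, diseases and function names are hypothetical and not taken from the GDPS dataset.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    n = len(labels)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Build a nested-dict decision tree {attribute: {value: subtree_or_label}}."""
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree


def predict_disease(tree, symptoms):
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(symptoms[attr])  # None for unseen symptom values
    return tree


rows = [
    {"fever": "yes", "cough": "yes", "rash": "no"},
    {"fever": "yes", "cough": "no", "rash": "yes"},
    {"fever": "no", "cough": "yes", "rash": "no"},
    {"fever": "yes", "cough": "yes", "rash": "no"},
    {"fever": "no", "cough": "no", "rash": "yes"},
]
labels = ["flu", "measles", "cold", "flu", "allergy"]
tree = id3(rows, labels, ["fever", "cough", "rash"])
print(predict_disease(tree, {"fever": "no", "cough": "yes", "rash": "no"}))
```

In the admin/user split described above, `id3` would correspond to the training step and `predict_disease` to the user-facing lookup.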
IRJET- Weather Prediction for Tourism Application using ARIMA (IRJET Journal)
This document discusses using an ARIMA model to predict weather patterns for tourism applications. It begins with an introduction to weather forecasting and its importance for the tourism industry. It then reviews related work on weather prediction using machine learning methods. The proposed method involves collecting weather data, preprocessing it, converting it to a stationary time series, analyzing it using an ARIMA model, and concluding that ARIMA can accurately predict weather patterns to help tourists plan trips based on the forecast.
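The "convert to a stationary series, then model it" step can be illustrated with the simplest non-trivial case, an ARIMA(1,1,0)-style model: difference once, fit an AR(1) to the differences by least squares, forecast the differences, and add them back to the last level. The fixed order (1,1,0) and the temperature values are assumptions for illustration; the paper's model order is not stated here.

```python
def difference(series):
    """First difference (the 'I' in ARIMA with d = 1)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares estimate of c and phi in x_t = c + phi * x_{t-1} + e_t."""
    prev, nxt = series[:-1], series[1:]
    n = len(prev)
    mp, mn = sum(prev) / n, sum(nxt) / n
    phi = (sum((p - mp) * (q - mn) for p, q in zip(prev, nxt))
           / sum((p - mp) ** 2 for p in prev))
    return mn - phi * mp, phi


def forecast_arima_110(series, steps):
    """Forecast with an ARIMA(1,1,0)-style model: AR(1) on differences, then integrate back."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        diff = c + phi * diff
        level += diff
        forecasts.append(level)
    return forecasts


temps = [20.0, 20.0, 21.0, 22.5, 24.25, 26.125]  # hypothetical daily temperatures
print(forecast_arima_110(temps, 3))
```

For real weather data, a library implementation such as `statsmodels.tsa.arima.model.ARIMA(series, order=(p, d, q))` would normally be used, with the order chosen from the data rather than fixed in advance.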
Comparison of Data Mining Techniques used in Anomaly Based IDS (IRJET Journal)
This document discusses anomaly-based intrusion detection systems and compares various data mining techniques used in these systems. It begins by defining intrusion detection systems and the two main categories of misuse detection and anomaly detection. Anomaly detection involves learning normal patterns from data and detecting deviations from these patterns as potential anomalies or intrusions.
The document then examines several data mining techniques used for anomaly detection, including statistical approaches such as chi-square statistics and clustering algorithms such as k-means, k-medoids, and EM clustering. It notes that these techniques can be applied to intrusion detection to analyze data and detect anomalies representing potential malicious activity. The methodology of anomaly detection is also summarized as involving parameterization of data, …
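One common way to use chi-square statistics for anomaly detection is to build a profile of expected feature values from normal traffic and score each new event by its chi-square distance, X² = Σ (Xᵢ − Eᵢ)² / Eᵢ, from that profile. The sketch below assumes count-valued features with positive means, and its mean-plus-three-standard-deviations alarm threshold and sample data are hypothetical choices.

```python
def chi_square_score(event, expected):
    """Chi-square distance of an observed event vector from the normal profile."""
    return sum((x - e) ** 2 / e for x, e in zip(event, expected))


def train_profile(normal_events, n_sigma=3.0):
    """Expected values are per-feature means of normal data; the threshold is mean + n_sigma * std of training scores."""
    n = len(normal_events)
    expected = [sum(col) / n for col in zip(*normal_events)]
    scores = [chi_square_score(ev, expected) for ev in normal_events]
    mean = sum(scores) / n
    std = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    return expected, mean + n_sigma * std


# Hypothetical per-minute counts: [logins, file_reads, outbound_connections].
normal = [[2, 30, 5], [3, 28, 6], [2, 32, 4], [3, 31, 5], [2, 29, 5]]
expected, threshold = train_profile(normal)
for event in ([3, 30, 5], [2, 30, 60]):
    score = chi_square_score(event, expected)
    print(event, round(score, 2), "ANOMALY" if score > threshold else "normal")
```

The detector learns only what "normal" looks like, which is what allows anomaly-based systems to flag previously unseen attacks, at the cost of false alarms when normal behaviour drifts.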
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy... (Bang Xiang Yong)
Presented at MET4FOF Workshop, July 2020
I talk about our recent work of combining Bayesian Deep learning with Explainable Artificial Intelligence (XAI) methods. In particular, we look at Bayesian Autoencoders.
Unsupervised Distance Based Detection of Outliers by using Anti-hubs (IRJET Journal)
This document summarizes research on using anti-hubs for unsupervised outlier detection in high-dimensional data. It discusses how existing distance-based outlier detection methods struggle with high-dimensional data as distances become less meaningful. Anti-hubs, which are points that are infrequently in the k-nearest neighbor lists of other points, have been used for outlier detection. However, calculating anti-hubs is computationally expensive for high-dimensional data. The document proposes applying feature selection before calculating anti-hubs to reduce dimensionality and computational cost, thereby extending anti-hub based outlier detection to high-dimensional data more efficiently.
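The anti-hub idea can be sketched directly: count how often each point appears in other points' k-nearest-neighbour lists (its k-occurrence), and report the points with the lowest counts as outlier candidates. The feature selection step here is a hypothetical variance-based filter, since the summary does not say which selection method is used; the data is made up.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def select_high_variance(points, n_features):
    """Hypothetical feature selection: keep the n_features columns with the largest variance."""
    n = len(points)
    columns = list(zip(*points))
    variances = []
    for col in columns:
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    keep = sorted(sorted(range(len(columns)), key=lambda f: variances[f], reverse=True)[:n_features])
    return [[p[f] for f in keep] for p in points]


def k_occurrences(points, k):
    """N_k(x): how many times each point appears in other points' k-nearest-neighbour lists."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        for j in sorted(others, key=lambda o: squared_distance(p, points[o]))[:k]:
            counts[j] += 1
    return counts


def anti_hub_outliers(points, k, n_outliers, n_features):
    reduced = select_high_variance(points, n_features)
    counts = k_occurrences(reduced, k)
    # Anti-hubs (lowest k-occurrence) are reported as outlier candidates.
    return sorted(range(len(points)), key=lambda i: counts[i])[:n_outliers]


# Hypothetical data: five clustered points, one far point, and a constant third feature.
data = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1], [0.5, 0.5, 1], [10, 10, 1]]
print(anti_hub_outliers(data, k=3, n_outliers=1, n_features=2))
```

The brute-force neighbour search is quadratic in the number of points, and its per-pair cost grows with dimensionality, which is why reducing the number of features first lowers the computational cost.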
A comprehensive study on disease risk predictions in machine learning (IJECEIAES)
Over recent years, multiple disease risk prediction models have been developed. These models use various patient characteristics to estimate the probability of outcomes over a certain period of time and have the potential to improve decision making and individualize care. As disease prediction models are increasingly evaluated, discovering hidden patterns and interactions in medical databases has become crucial. Traditional clinical findings require many trials, which can complicate disease prediction. This paper presents a comprehensive study of the different strategies used to predict disease. Applying these techniques to healthcare data has improved risk prediction models that identify patients who would benefit from disease management programs, which can reduce hospital readmissions and healthcare costs, but the results of these efforts have varied.
Data mining and machine learning have become a vital part of crime detection and prevention. In this research, we use WEKA, an open-source data mining software, to conduct a comparative study between the violent crime patterns in the Communities and Crime Unnormalized Dataset provided by the University of California-Irvine repository and actual crime statistics for the state of Mississippi provided by neighborhoodscout.com. We implemented the Linear Regression, Additive Regression, and Decision Stump algorithms on the Communities and Crime Dataset using the same finite set of features. Overall, the linear regression algorithm performed best among the three selected algorithms. The scope of this project is to demonstrate how effective and accurate the machine learning algorithms used in data mining analysis can be at predicting violent crime patterns.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif... (ahmad abdelhafeez)
The goal of this paper is to compare different classifiers and multi-classifier fusion with respect to accuracy in detecting breast cancer on four different datasets. We implement various classification techniques, representing the best-known algorithms in this field, on four breast cancer datasets: two for diagnosis and two for prognosis. We also fuse classifiers to find the best multi-classifier fusion approach for each dataset individually. Classification accuracy is obtained from the confusion matrix using 10-fold cross-validation, and fusion uses majority voting (the mode of the classifiers' outputs). The experimental results show that no single classification technique is better than the others on all datasets, since the classification task is affected by the type of dataset. Multi-classifier fusion improved accuracy on three of the four datasets.
This document summarizes research on intrusion detection systems using data mining techniques. It first describes the architecture of a data mining-based IDS, including sensors to collect data, detectors to evaluate the data using models, a data warehouse to store data and models, and a model generator to develop and distribute new models. It then discusses supervised and unsupervised learning approaches for intrusion detection. The document concludes by summarizing several papers on intrusion detection using techniques like neural networks, decision trees, clustering, and ensemble methods.
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION (ijaia)
Data augmentation has been broadly applied in training deep-learning models to increase the diversity of data. This study investigates the effectiveness of different data augmentation methods for deep-learning-based human intention prediction when only limited training data is available. In our experiment, a human participant pitches a ball to nine potential targets, and we aim to predict which target the participant pitches the ball to. First, the effectiveness of 10 data augmentation groups is evaluated on a single-participant data set using RGB images. Second, the best data augmentation method on the single-participant data set (i.e., random cropping) is further evaluated on a multi-participant data set to assess its generalization ability. Finally, the effectiveness of random cropping on fusion data of RGB images and optical flow is evaluated on both single- and multi-participant data sets. Experiment results show that: 1) data augmentation methods that crop or deform images can improve prediction performance; 2) random cropping generalizes to the multi-participant data set (prediction accuracy improves from 50% to 57.4%); and 3) random cropping with fusion data of RGB images and optical flow further improves prediction accuracy from 57.4% to 63.9% on the multi-participant data set.
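Random cropping, the augmentation that performed best in this study, takes a fixed-size window at a random position in each training image so the model sees slightly shifted views of the same scene. The minimal sketch below works on a single-channel image stored as a list of rows; the image and crop sizes are hypothetical, and the study's actual crop parameters are not given here.

```python
import random


def random_crop(image, crop_h, crop_w, rng=random):
    """Return a crop_h x crop_w window taken at a random position in a 2-D image (list of rows)."""
    h, w = len(image), len(image[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, h - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


image = [[10 * r + c for c in range(5)] for r in range(4)]  # hypothetical 4x5 single-channel frame
crop = random_crop(image, 2, 3, rng=random.Random(0))
print(crop)
```

Because a new position is drawn each time the function is called, applying it on every training epoch yields many different views from a limited data set.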
Trends in Advanced Computing in 2020 - Advanced Computing: An International J... (acijjournal)
Advanced Computing: An International Journal (ACIJ) is a bimonthly, open-access, peer-reviewed journal that publishes articles contributing new results in all areas of advanced computing. The journal focuses on all technical and practical aspects of high-performance computing, green computing, pervasive computing, cloud computing, etc. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on understanding advances in computing and establishing new collaborations in these areas.
MACHINE LEARNING TECHNIQUES FOR ANALYSIS OF EGYPTIAN FLIGHT DELAY (IJDKP)
Flight delay has been a fiendish problem for the world's aviation industry, so research on computer systems that predict flight delay propagation is highly significant. Extracting hidden information from large datasets of raw data is one way to build a predictive model. This paper describes the application of classification techniques to analyze the flight delay pattern in Egypt Airline's flight dataset. In this work, four decision tree classifiers were evaluated, and the results show that REPTree has the best accuracy (80.3%) compared with Forest, Stump, and J48. In addition, four rule-based classifiers were compared, and the results show that PART provides the best accuracy among the studied rule-based classifiers, at 83.1%. By analyzing the running time of all classifiers, the current work concludes that REPTree is the most efficient classifier with respect to accuracy and running time. The current work is also extended to apply the Apriori association technique to extract important information about flight delay. Association rules are presented, and the association technique is evaluated.
Associative Regressive Decision Rule Mining for Predicting Customer Satisfact... (csandit)
Opinion mining, also known as sentiment analysis, involves customer satisfaction patterns, sentiments, and attitudes toward entities, products, services, and their attributes. With the rapid development of the Internet, potential customers provide product and service reviews that express their level of satisfaction. The high volume of customer reviews has been processed through taxonomy-aware processing, but it was difficult to identify the best reviews. In this paper, an Associative Regression Decision Rule Mining (ARDRM) technique is developed to predict patterns for service providers and to improve customer satisfaction based on review comments. ARDRM performs two steps to improve the customer satisfaction level. First, the Machine Learning Bayes Sentiment Classifier (MLBSC) is used to classify the class label of each service review. Then, the regressive factor of the opinion words and class labels is checked for association between the words using various probabilistic rules. Based on these probabilistic rules, the effect of opinions and sentiments on customer reviews is analyzed to arrive at a specific set of services preferred by customers, together with their review comments. The associative regressive decision rule helps the service provider decide how to improve the customer satisfaction level. The experimental results reveal that the ARDRM technique improves performance in terms of true positive rate, associative regression factor, regressive decision rule generation time, and review detection accuracy of similar patterns.
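The first ARDRM step labels each review with a Bayes sentiment classifier. The sketch below assumes that MLBSC behaves like a standard multinomial naive Bayes classifier with add-one smoothing, which is an assumption rather than the paper's exact formulation; the association and regressive-factor step is not shown, and the reviews are made up.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(reviews):
    """reviews: list of (text, label). Multinomial naive Bayes with add-one smoothing."""
    doc_counts = Counter(label for _, label in reviews)
    word_counts = defaultdict(Counter)
    for text, label in reviews:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocab, len(reviews)


def classify_review(model, text):
    doc_counts, word_counts, vocab, n_docs = model
    best_label, best_score = None, -math.inf
    for label, n_label in doc_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)  # log prior
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("great service friendly staff", "positive"),
    ("friendly quick helpful", "positive"),
    ("slow service rude staff", "negative"),
    ("rude and unhelpful", "negative"),
]
model = train_naive_bayes(training)
print(classify_review(model, "friendly helpful staff"), classify_review(model, "rude slow service"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.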
Supervised Multi Attribute Gene Manipulation For Cancer (paperpublications3)
Abstract: Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems.
They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery.
A Clustering Method for Weak Signals to Support Anticipative Intelligence (CSCJournals)
This document proposes a clustering method to analyze weak signals, which are short texts that may indicate future trends when analyzed together. The method involves preprocessing weak signals by removing stop words, stemming words, and identifying synonyms. It then clusters the weak signals using the K-medoids algorithm based on the number of similar words between signals, including identical words, stemmed words, and synonyms. The method was tested on a database of weak signals related to bioenergy. The clustering is intended to group similar weak signals to help form hypotheses about potential future changes or opportunities.
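A minimal sketch of the preprocessing and clustering steps, assuming a tiny hypothetical stop-word list, a crude plural-stripping rule in place of a real stemmer, no synonym lookup, and a distance of 1 / (1 + number of shared words). It uses a simple alternating (Voronoi-iteration) variant of K-medoids rather than full PAM and is not the authors' implementation.

```python
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}  # hypothetical list


def preprocess(text):
    """Lower-case, drop stop words and strip a trailing plural 's' (crude stand-in for stemming)."""
    words = set()
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")
        if not word or word in STOP_WORDS:
            continue
        if word.endswith("s") and len(word) > 3:
            word = word[:-1]
        words.add(word)
    return words


def distance(a, b):
    """More shared words -> smaller distance."""
    return 1.0 / (1 + len(a & b))


def k_medoids(items, initial_medoids, max_iter=100):
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        # Assignment: each medoid keeps itself; other items join their closest medoid.
        clusters = {m: [m] for m in medoids}
        for i, x in enumerate(items):
            if i not in clusters:
                closest = min(medoids, key=lambda m: distance(x, items[m]))
                clusters[closest].append(i)
        # Update: the new medoid minimises the total distance to its cluster members.
        new_medoids = [
            min(members, key=lambda c: sum(distance(items[c], items[j]) for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    labels = [
        i if i in medoids else min(medoids, key=lambda m: distance(x, items[m]))
        for i, x in enumerate(items)
    ]
    return medoids, labels


signals = [  # hypothetical bioenergy weak signals
    "ethanol plants in rural farms",
    "new ethanol refinery near farms",
    "algae biofuel yields improve",
    "algae biofuel startups raise funding",
]
items = [preprocess(s) for s in signals]
medoids, labels = k_medoids(items, initial_medoids=[0, 2])
print(labels)  # [1, 1, 3, 3]
```

Each label is the index of the medoid signal representing that cluster, so the two ethanol signals and the two algae signals end up grouped together.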
IRJET- GDPS - General Disease Prediction System (IRJET Journal)
The document describes a General Disease Prediction System (GDPS) that uses machine learning and data mining techniques to predict diseases based on patient symptoms.
The GDPS first collects patient data, preprocesses it, and extracts relevant features. It then implements the ID3 decision tree algorithm to generate a predictive model and classify diseases. As an admin, one can train the model using sample data. As a user, one can enter symptoms and the trained model will predict the likely disease and recommend precautions.
The GDPS was tested on a dataset of 120 patients and achieved 86.67% accuracy in disease prediction. The system currently covers common diseases, but future work involves expanding it to predict more serious or fatal diseases, such as various cancers.
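At each node, ID3 chooses the attribute with the highest information gain. The sketch below shows this on hypothetical yes/no symptom data; the symptoms, diseases and helper names are illustrative and are not taken from the GDPS paper.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    if len(set(labels)) == 1:
        return labels[0]                                  # pure node -> leaf
    if not attrs:
        return Counter(labels).most_common(1)[0][0]       # no attributes left -> majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree


def predict(tree, row):
    """Walk the tree; returns None for attribute values never seen in training."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(row[attr])
    return tree


rows = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
labels = ["flu", "flu", "cold", "healthy"]
tree = id3(rows, labels, ["fever", "cough"])
print(predict(tree, {"fever": "no", "cough": "yes"}))  # cold
```

Here "fever" has a gain of 1.0 bit against 0.5 for "cough", so it becomes the root. A real system would also need to handle symptom values never seen during training.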
IRJET- Weather Prediction for Tourism Application using ARIMA (IRJET Journal)
This document discusses using an ARIMA model to predict weather patterns for tourism applications. It begins with an introduction to weather forecasting and its importance for the tourism industry. It then reviews related work on weather prediction using machine learning methods. The proposed method involves collecting weather data, preprocessing it, converting it to a stationary time series, analyzing it using an ARIMA model, and concluding that ARIMA can accurately predict weather patterns to help tourists plan trips based on the forecast.
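A hand-rolled sketch of the simplest case, ARIMA(1,1,0), written under stated assumptions. Differencing once makes the series stationary, an AR(1) model is fitted to the differences by least squares, and the forecasts are integrated back. The temperature series is synthetic, built so that its differences follow an exact AR(1). In practice one would use a library such as `statsmodels.tsa.arima.model.ARIMA` with an `order=(p, d, q)` chosen from the data.

```python
def difference(series):
    """First differences: the d = 1 step that removes a linear trend."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(x):
    """Ordinary least squares for x[t] = c + phi * x[t-1]."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    if sxx == 0:
        raise ValueError("differenced series is constant; AR(1) slope is undefined")
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    phi = sxy / sxx
    return mean_curr - phi * mean_prev, phi


def forecast_arima_110(series, steps):
    """Iterate the AR(1) fit on the differences and add each predicted change
    back onto the last observed level."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, change = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        change = c + phi * change
        level += change
        forecasts.append(level)
    return forecasts


temps = [20.0, 20.0, 21.0, 22.5, 24.25, 26.125]  # synthetic daily temperatures
print(forecast_arima_110(temps, 2))  # [28.0625, 30.03125]
```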
Comparison of Data Mining Techniques used in Anomaly Based IDS (IRJET Journal)
This document discusses anomaly-based intrusion detection systems and compares various data mining techniques used in these systems. It begins by defining intrusion detection systems and the two main categories of misuse detection and anomaly detection. Anomaly detection involves learning normal patterns from data and detecting deviations from these patterns as potential anomalies or intrusions.
The document then examines several data mining techniques used for anomaly detection, including statistical-based approaches like chi-square statistics, and clustering algorithms like k-means, k-medoids, and EM clustering. It notes that these techniques can be applied to intrusion detection to analyze data and detect anomalies representing potential malicious activity. The methodology of anomaly detection is also summarized as involving parameterization of data,
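One common chi-square formulation for anomaly detection scores an event vector X against the expected vector E learned from normal traffic as chi2 = sum over i of (X_i - E_i)**2 / E_i. The sketch below assumes hypothetical per-window counts of three event types and a mean + 3 * standard deviation threshold over the normal scores. Neither choice comes from the document.

```python
def fit_expected(normal_events):
    """Mean count of each event type over the normal (attack-free) windows."""
    n = len(normal_events)
    return [sum(e[i] for e in normal_events) / n for i in range(len(normal_events[0]))]


def chi_square_score(event, expected):
    return sum((x - e) ** 2 / e for x, e in zip(event, expected) if e > 0)


def threshold(normal_events, expected, k=3.0):
    """Flag events whose score exceeds mean + k * std of the normal scores."""
    scores = [chi_square_score(e, expected) for e in normal_events]
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    return mean + k * std


# Hypothetical counts per window: [logins, file reads, outbound connections]
normal = [[10, 5, 2], [12, 4, 3], [11, 6, 2], [9, 5, 3]]
expected = fit_expected(normal)
limit = threshold(normal, expected)
for event in ([11, 5, 2], [10, 5, 40]):
    print(event, chi_square_score(event, expected) > limit)  # False, then True
```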
Uncertainty Quantification with Unsupervised Deep learning and Multi Agent Sy... (Bang Xiang Yong)
Presented at MET4FOF Workshop, July 2020
I talk about our recent work on combining Bayesian deep learning with Explainable Artificial Intelligence (XAI) methods. In particular, we look at Bayesian autoencoders.
Unsupervised Distance Based Detection of Outliers by using Anti-hubs (IRJET Journal)
This document summarizes research on using anti-hubs for unsupervised outlier detection in high-dimensional data. It discusses how existing distance-based outlier detection methods struggle with high-dimensional data as distances become less meaningful. Anti-hubs, which are points that are infrequently in the k-nearest neighbor lists of other points, have been used for outlier detection. However, calculating anti-hubs is computationally expensive for high-dimensional data. The document proposes applying feature selection before calculating anti-hubs to reduce dimensionality and computational cost, thereby extending anti-hub based outlier detection to high-dimensional data more efficiently.
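Anti-hubs are identified by their k-occurrence N_k(x), which counts how often x appears in other points' k-nearest-neighbour lists. A minimal sketch, assuming Euclidean distance, ties broken by index, and a simple score of 1 / (1 + N_k). The feature-selection step the document proposes is omitted here.

```python
import math


def k_occurrences(points, k):
    """N_k(x): how many times each point appears in other points' k-NN lists."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        neighbours = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for _, j in neighbours:
            counts[j] += 1
    return counts


def antihub_scores(points, k):
    """Outlier score that grows as N_k shrinks; N_k = 0 marks the strongest anti-hubs."""
    return [1.0 / (1 + c) for c in k_occurrences(points, k)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(k_occurrences(points, k=3))  # [3, 4, 3, 3, 5, 0]
```

The isolated point (10, 10) never appears in anyone's neighbour list, so it gets the highest score. The nested loop is O(n²) distance computations, which is the cost that reducing dimensionality first is meant to lower.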
A comprehensive study on disease risk predictions in machine learning (IJECEIAES)
Over recent years, multiple disease risk prediction models have been developed. These models use various patient characteristics to estimate the probability of outcomes over a certain period of time and have the potential to improve decision making and individualize care. As disease prediction models are increasingly evaluated, discovering hidden patterns and interactions in medical databases has become crucial. Traditional clinical findings require many trials, which can complicate disease prediction. This paper presents a comprehensive study of the different strategies used to predict disease. Applying these techniques to healthcare data has improved risk prediction models that identify patients who would benefit from disease management programs aimed at reducing hospital readmissions and healthcare costs, but the results of these efforts have varied.
Data mining and machine learning have become a vital part of crime detection and prevention. In this
research, we use WEKA, an open source data mining software, to conduct a comparative study between the
violent crime patterns from the Communities and Crime Unnormalized Dataset provided by the University
of California-Irvine repository and actual crime statistical data for the state of Mississippi that has been
provided by neighborhoodscout.com. We implemented the Linear Regression, Additive Regression, and
Decision Stump algorithms using the same finite set of features, on the Communities and Crime Dataset.
Overall, the linear regression algorithm performed best among the three selected algorithms. The aim
of this project is to show how effective and accurate the machine learning algorithms used in data mining
analysis can be at predicting violent crime patterns.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif... (ahmad abdelhafeez)
The goal of this paper is to compare different classifiers and multi-classifier fusion with respect to accuracy in detecting breast cancer on four different datasets. We implement various classification techniques, representing the best-known algorithms in this field, on four breast cancer datasets: two for diagnosis and two for prognosis. We also fuse classifiers to find the best multi-classifier fusion approach for each dataset individually. Classification accuracy is obtained from the confusion matrix using 10-fold cross-validation, and fusion uses majority voting (the mode of the classifier outputs). The experimental results show that no classification technique is better than the others across all datasets, since the classification task is affected by the type of dataset. With multi-classifier fusion, accuracy improved in three of the four datasets.
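Majority-vote fusion takes the mode of the individual classifiers' outputs for each sample. A minimal sketch with hypothetical predictions follows. It also shows accuracy as the confusion-matrix diagonal divided by the sample count, and a round-robin split of indices into 10 folds. Retraining the classifiers on each fold is left out.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: one list of labels per classifier. Returns the mode for each sample;
    ties go to the label encountered first (Counter.most_common keeps insertion order)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def accuracy(y_true, y_pred):
    """Correct predictions (confusion-matrix diagonal) divided by the number of samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k=10):
    """Split sample indices 0..n-1 into k disjoint folds (round-robin)."""
    return [list(range(start, n, k)) for start in range(k)]


# Hypothetical outputs of three classifiers on four cases (M = malignant, B = benign).
y_true = ["M", "B", "B", "M"]
clf_a = ["M", "B", "B", "B"]
clf_b = ["M", "M", "B", "M"]
clf_c = ["B", "B", "B", "M"]
fused = majority_vote([clf_a, clf_b, clf_c])
print([accuracy(y_true, p) for p in (clf_a, clf_b, clf_c)], accuracy(y_true, fused))
# [0.75, 0.75, 0.75] 1.0
```

Each classifier makes a different single mistake, so the vote corrects all of them. This is the kind of complementarity that lets fusion help on some datasets but not on others.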
This document summarizes research on intrusion detection systems using data mining techniques. It first describes the architecture of a data mining-based IDS, including sensors to collect data, detectors to evaluate the data using models, a data warehouse to store data and models, and a model generator to develop and distribute new models. It then discusses supervised and unsupervised learning approaches for intrusion detection. The document concludes by summarizing several papers on intrusion detection using techniques like neural networks, decision trees, clustering, and ensemble methods.
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION (ijaia)
Data augmentation has been broadly applied in training deep-learning models to increase the diversity of
data. This study investigates the effectiveness of different data augmentation methods for deep-learning-based human intention prediction when only limited training data is available. A human participant pitches
a ball to nine potential targets in our experiment. We expect to predict which target the participant pitches
the ball to. Firstly, the effectiveness of 10 data augmentation groups is evaluated on a single-participant
data set using RGB images. Secondly, the best data augmentation method (i.e., random cropping) on the
single-participant data set is further evaluated on a multi-participant data set to assess its generalization
ability. Finally, the effectiveness of random cropping on fusion data of RGB images and optical flow is
evaluated on both single- and multi-participant data sets. Experiment results show that: 1) Data
augmentation methods that crop or deform images can improve the prediction performance; 2) Random
cropping can be generalized to the multi-participant data set (prediction accuracy is improved from 50%
to 57.4%); and 3) Random cropping with fusion data of RGB images and optical flow can further improve
the prediction accuracy from 57.4% to 63.9% on the multi-participant data set.
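Random cropping, the best method in the study, takes a fixed-size window at a random position of each training image. A minimal sketch on an image stored as nested lists (pixels may be numbers or RGB tuples). The crop size is a hypothetical parameter, since the study's setting is not given here, and in practice the crop is usually resized back to the network's input size.

```python
import random


def random_crop(image, crop_h, crop_w, rng=random):
    """Return a crop_h x crop_w window taken at a random position of an image stored
    as a list of rows (each pixel may be a number or an (R, G, B) tuple)."""
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, height - crop_h)    # randint is inclusive at both ends
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


image = [[10 * r + c for c in range(8)] for r in range(6)]  # toy 6x8 "image"
crop = random_crop(image, 4, 5, rng=random.Random(0))
for row in crop:
    print(row)
```

Passing a seeded `random.Random` makes the augmentation reproducible. Drawing a new crop every epoch is what adds diversity to a small training set.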
Trends in Advanced Computing in 2020 - Advanced Computing: An International J... (acijjournal)
Advanced Computing: An International Journal (ACIJ) is a bimonthly, open-access, peer-reviewed journal that publishes articles contributing new results in all areas of advanced computing. The journal focuses on all technical and practical aspects of high-performance computing, green computing, pervasive computing, cloud computing, etc. Its goal is to bring together researchers and practitioners from academia and industry to understand advances in computing and establish new collaborations in these areas.
MACHINE LEARNING TECHNIQUES FOR ANALYSIS OF EGYPTIAN FLIGHT DELAY (IJDKP)
Flight delay is a persistent problem for the world's aviation industry, so research into computer
systems that predict flight delay propagation is highly significant. Extracting hidden
information from large datasets of raw data could be one way to build a predictive model. This
paper describes the application of classification techniques to analyse the flight delay pattern in the Egypt
Airline flight dataset. In this work, four decision tree classifiers were evaluated, and the results show that
REPTree has the best accuracy (80.3%) compared with Forest, Stump and J48. In addition, four rule-based
classifiers were compared, and PART provides the best accuracy among the studied rule-based
classifiers, at 83.1%. By analysing the running time of all classifiers, the current work
concluded that REPTree is the most efficient classifier with respect to accuracy and running time. The
current work is also extended to apply the Apriori association technique to extract important information
about flight delay. Association rules are presented and the association technique is evaluated.
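Apriori grows frequent itemsets level by level. It joins frequent (k-1)-itemsets, prunes candidates with an infrequent subset, and keeps those meeting minimum support. Rules X -> Y are then scored by confidence = support(X u Y) / support(X). A minimal sketch on hypothetical, already-discretised flight records. The attribute names and thresholds are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        # Join step: merge frequent (size-1)-itemsets into size-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) tuples meeting min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


flights = [  # hypothetical discretised flight records
    {"evening", "winter", "delayed"},
    {"evening", "winter", "delayed"},
    {"morning", "summer", "on_time"},
    {"evening", "summer", "delayed"},
    {"morning", "winter", "on_time"},
]
frequent = apriori(flights, min_support=0.4)
for lhs, rhs, conf in association_rules(frequent, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```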
Associative Regressive Decision Rule Mining for Predicting Customer Satisfact... (csandit)
Opinion mining, also known as sentiment analysis, involves customer satisfaction patterns,
sentiments and attitudes toward entities, products, services and their attributes. With the rapid
development of the Internet, potential customers provide reviews of products and services that reflect
their level of satisfaction. A high volume of these reviews has been handled
through taxonomy-aware processing, but it was difficult to identify the best
reviews. In this paper, an Associative Regression Decision Rule Mining (ARDRM) technique is
developed to predict patterns for the service provider and to improve customer satisfaction based
on review comments. Associative Regression based Decision Rule Mining performs two steps
to improve the customer satisfaction level. First, the Machine Learning Bayes
Sentiment Classifier (MLBSC) is used to assign a class label to each service review. After
that, the regressive factor of the opinion words and the class labels are checked for association
between the words using various probabilistic rules. Based on these probabilistic rules, the effect of
opinions and sentiments on customer reviews is analyzed to arrive at the specific set of
services preferred by customers, together with their review comments. The Associative Regressive
Decision Rule helps the service provider make decisions that improve the customer satisfaction
level. The experimental results show that the Associative Regression Decision Rule Mining
(ARDRM) technique improved performance in terms of true positive rate, Associative
Regression factor, Regressive Decision Rule Generation time and Review Detection Accuracy of
similar patterns.
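The first ARDRM step labels each review with a Bayes sentiment classifier. The details of MLBSC are not given here, so the sketch below swaps in a generic multinomial Naive Bayes with add-one smoothing, trained on a few hypothetical reviews. It is not the authors' classifier.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over whitespace tokens."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(class) + sum of log P(word | class); unseen words are skipped.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in doc.lower().split():
                if word in self.vocab:
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


reviews = [  # hypothetical labelled service reviews
    "great service friendly staff",
    "excellent food great value",
    "slow service rude staff",
    "terrible food cold and slow",
]
model = NaiveBayesSentiment().fit(reviews, ["pos", "pos", "neg", "neg"])
print(model.predict("great friendly staff"), model.predict("rude and slow"))  # pos neg
```

Working in log space avoids underflow when reviews are long. The smoothing keeps a word that is absent from one class from zeroing out that class's probability.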
Supervised Multi Attribute Gene Manipulation For Cancer (paperpublications3)
Abstract: Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems.
They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery.
The British royal family, including Prince William and Kate Middleton, visited Mumbai in April 2016. They attended the premiere of the film "Lion" and met with Bollywood stars like Amitabh Bachchan and Shah Rukh Khan. The royals received a warm welcome from Bollywood and their visit helped strengthen cultural ties between India and Britain.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise boosts blood flow, releases endorphins, and promotes changes in the brain which help enhance one's emotional well-being and mental clarity.
Hikayat Bayan Budiman tells the story of Khojan Maimun, who entrusts his wife, Bibi Zainab, with looking after the house and honouring his trust while he is away trading far from home. However, Bibi Zainab is negligent and wishes to commit adultery with a king's son. The wise Bayan Budiman prevents this by telling 24 stories over 24 nights until Bibi Zainab repents.
This document provides a list of gift ideas for weddings, including champagne, a music system, pearl necklace set for the bride, cufflinks and pen for the groom, romantic paintings, couple watches, key rings, wallets, perfumes for both the bride and groom, and belts for both the bride and groom.
In the present paper, the applicability and
capability of AI techniques for effort estimation prediction have
been investigated. Neuro-fuzzy models are found to be very
robust, computationally fast, and capable of handling
distorted data. Because the data are non-linear, they are
an efficient quantitative tool for predicting effort.
A one-hidden-layer network, named OHLANFIS, has been developed
in the MATLAB simulation environment.
Here the initial parameters of the OHLANFIS are
identified using the subtractive clustering method. Parameters of
the Gaussian membership function are optimally determined
using the hybrid learning algorithm. The analysis shows
that the effort estimation prediction model developed using
the OHLANFIS technique performs better than the normal
ANFIS model.
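Two building blocks named above can be sketched directly. One is a Gaussian membership function mu(x) = exp(-(x - c)**2 / (2 * sigma**2)). The other is subtractive clustering, which gives every point a potential P_i = sum_j exp(-alpha * ||x_i - x_j||**2) with alpha = 4 / ra**2. It repeatedly takes the highest-potential point as a centre and subtracts its influence using rb = 1.5 * ra. This is a simplified version with a single stopping ratio and data assumed normalised to [0, 1]. It is not the authors' MATLAB code, and the hybrid learning step is not shown.

```python
import math


def gaussian_mf(x, c, sigma):
    """Gaussian membership function centred at c with spread sigma."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))


def subtractive_clustering(points, ra=0.5, eps=0.15, max_centers=10):
    """Simplified subtractive clustering: stop when the best remaining potential
    falls below eps times the first centre's potential."""
    alpha = 4 / ra ** 2
    beta = 4 / (1.5 * ra) ** 2
    potential = [sum(math.exp(-alpha * math.dist(p, q) ** 2) for q in points) for p in points]
    first = max(potential)
    centers = []
    while len(centers) < max_centers:
        k = max(range(len(points)), key=lambda i: potential[i])
        pk = potential[k]
        if pk < eps * first:
            break
        centers.append(points[k])
        # Reduce the potential of points near the new centre.
        potential = [
            pi - pk * math.exp(-beta * math.dist(points[i], points[k]) ** 2)
            for i, pi in enumerate(potential)
        ]
    return centers


pts = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12), (0.9, 0.9), (0.88, 0.9), (0.9, 0.88)]
print(subtractive_clustering(pts))  # one centre per clump
```

Each centre found this way can seed one fuzzy rule, with Gaussian membership functions centred on its coordinates. This is how subtractive clustering is typically used to initialise an ANFIS structure.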
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING (IRJET Journal)
This document describes a study on analyzing crime data and predicting crimes using machine learning techniques. The study uses an Indian crime dataset to analyze past crimes and identify patterns. Regression, k-means clustering, and decision tree algorithms are implemented to predict the type of future crimes based on conditions. The algorithms can identify crime-prone areas and anticipate crimes. The proposed system aims to conduct criminal analysis, identify trends, disseminate knowledge to support crime prevention measures, and recognize recurring crime patterns to prevent future incidents.
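k-means can identify crime-prone areas by clustering incident locations so that dense clusters stand out as hotspots. A minimal sketch of Lloyd's algorithm on hypothetical 2-D incident coordinates with hand-picked initial centres. It is not the study's code.

```python
import math


def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's algorithm: assign points to the nearest centre, move each centre to
    the mean of its points, repeat until the centres stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


incidents = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]  # hypothetical locations
centers, clusters = kmeans(incidents, initial_centers=[(0, 0), (10, 10)])
print(centers)  # [(1.5, 1.5), (8.5, 8.5)]
```

Results depend on the initial centres, so real use usually runs several random initialisations or k-means++ and keeps the lowest within-cluster error.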
SCCAI- A Student Career Counselling Artificial Intelligence (vivatechijri)
As education grows day by day, competition has created a need for students to
understand more about the educational field. Counselors are often unavailable, and
sometimes they lack proper knowledge about a particular educational field, which leads to
misconceptions about that field. This makes it difficult for students to choose a suitable educational
trajectory, and the guidance they receive is not always useful. The proposed paper addresses these
problems using machine learning algorithms. Various algorithms were considered, and the most suitable
ones for the project are used here. Three major problems are solved using Random Forest, Linear
Regression, and a searching algorithm based on the Google API. First, the searching algorithm solves
the problem of location by grouping colleges by location; then Random Forest provides the list of
colleges based on stream and percentage range; finally, Linear Regression predicts the current cutoff
using previous years' data. In addition, the proposed system provides information about all fields of
education, helping students better understand their field of interest. The idea is entirely new, with
no existing projects of a similar kind. This project will help guide students throughout their education.
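The cutoff-prediction step can be sketched as an ordinary least-squares line fitted to previous years' cutoffs and extrapolated one year ahead. The years and cutoff percentages below are hypothetical, and a real model would likely use more features than the year alone.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


years = [2016, 2017, 2018, 2019]
cutoffs = [88.0, 89.5, 90.0, 91.5]  # hypothetical cutoff percentages for one college and stream
slope, intercept = fit_line(years, cutoffs)
print(round(slope * 2020 + intercept, 2))  # 92.5
```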
A Biometric Fusion Based on Face and Fingerprint Recognition using ANN (rahulmonikasharma)
This document presents a biometric fusion system based on face and fingerprint recognition using artificial neural networks. The system first applies pre-processing to input images. It then extracts features from faces using extended local binary patterns and from fingerprints using minutia extraction. A genetic algorithm is used to optimize the extracted features. An artificial neural network is trained on the optimized features to classify images. The system fuses the face and fingerprint recognition results. Performance is evaluated based on false acceptance rate, false rejection rate and accuracy, with the proposed system achieving over 94% accuracy.
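False acceptance rate (FAR) is the fraction of impostor attempts accepted, and false rejection rate (FRR) is the fraction of genuine attempts rejected, both at a given decision threshold. A minimal sketch with hypothetical match scores in [0, 1] follows. The weighted-sum score fusion is an assumed, commonly used scheme, since the document does not say how the face and fingerprint results are combined.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Accept when score >= threshold; return (FAR, FRR)."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def fuse_scores(face_scores, finger_scores, w=0.5):
    """Hypothetical weighted-sum fusion of two matchers' scores."""
    return [w * a + (1 - w) * b for a, b in zip(face_scores, finger_scores)]


genuine = [0.9, 0.8, 0.75, 0.4]   # hypothetical scores for true matches
impostor = [0.1, 0.3, 0.6, 0.2]   # hypothetical scores for non-matches
print(far_frr(genuine, impostor, threshold=0.5))  # (0.25, 0.25)
```

Raising the threshold lowers FAR at the cost of a higher FRR. Sweeping it traces the trade-off curve used to choose an operating point.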
Predictive Modeling for Topographical Analysis of Crime Rate (IRJET Journal)
1) The document discusses using machine learning techniques to predict crime patterns and types based on historical crime data.
2) It proposes using a random forest classification algorithm to analyze crime data and predict the type of crime that may occur in a particular area.
3) The random forest algorithm would be trained on a dataset containing information about past crimes like date, location, and crime type to make predictions about future crimes.
IRJET- Spot Me - A Smart Attendance System based on Face Recognition (IRJET Journal)
The article discusses international issues. It mentions that globalization has increased economic interdependence between nations while also raising tensions over immigration and trade. Solutions will require cooperation and compromise and a recognition that isolationism is not a viable strategy in an interconnected world.
Face Recognition Smart Attendance System: (InClass System) (IRJET Journal)
- The document describes a face recognition system called "InClass" to automate student attendance tracking. It aims to address issues with traditional manual attendance systems like being inaccurate, time-consuming, and difficult to maintain.
- The InClass system uses a CNN face detector to detect and identify students' faces from images captured with a camera. It can handle variations in lighting, angles, and occlusions. Matching faces to a database allows for automated attendance marking.
- The system aims to simplify the attendance process, reduce time and errors compared to existing biometric systems, and make attendance records easily accessible and storable digitally rather than on paper.
Significant Role of Statistics in Computational Sciences (Editor IJCATR)
This paper focuses on issues related to optimizing statistical approaches in the emerging fields of Computer Science
and Information Technology, with particular emphasis on the role of statistical techniques in modern data mining. Statistics is
the science of learning from data and of measuring, controlling, and communicating uncertainty. Statistical approaches can play a vital
role by making significant contributions to software engineering, neural networks, data mining, bioinformatics and other
allied fields. Statistical techniques not only help build scientific models but also quantify the reliability, reproducibility and general
uncertainty associated with these models. Today, large amounts of data are recorded automatically by computers and
managed with database management systems (DBMS) for storage and fast retrieval. The practice of examining large pre-existing
databases in order to generate new information is known as data mining. Data mining has attracted substantial
attention in research and commercial arenas, and it involves applying a variety of statistical techniques. Twenty years ago,
most data was collected manually and data sets were simple, but the nature of data has since changed considerably.
Statistical techniques and computer applications can be used to obtain maximum information from the fewest
possible measurements, reducing the cost of data collection.
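A small example of statistics quantifying uncertainty is a confidence interval for a mean estimated from a sample. The sketch below uses the normal approximation (mean +/- 1.96 standard errors) on a hypothetical sample. For samples this small, a Student t critical value would be more appropriate than 1.96.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% interval: mean +/- z * (sample standard deviation / sqrt(n))."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se


measurements = [10, 12, 11, 13, 9, 11]  # hypothetical repeated measurements
low, high = mean_confidence_interval(measurements)
print(round(low, 3), round(high, 3))
```

The interval narrows as the square root of the sample size grows. This is the trade-off behind getting maximum information from the fewest possible measurements.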
IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach... (IRJET Journal)
This document presents a study that uses machine learning techniques to predict crime rates. Specifically, it aims to analyze crime data using supervised machine learning classification algorithms like decision trees, support vector machines, logistic regression, k-nearest neighbors, and random forests. The document outlines collecting and preprocessing crime data, selecting relevant features, training models on a portion of the data and testing them on the remaining data. It finds that random forest achieved the best prediction accuracy compared to other algorithms tested. The goal is to help law enforcement agencies better predict and reduce crime rates by analyzing historical crime data patterns.
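The train-then-test workflow can be illustrated with the simplest of the listed classifiers, k-nearest neighbours. A minimal sketch with hypothetical per-area features and a fixed held-out test set. It does not reproduce the study's data or its finding that random forest was best.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x with the most common label among its k nearest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical features per area: (incidents last month, % of incidents at night).
train_X = [(2, 10), (3, 15), (4, 12), (20, 60), (22, 55), (25, 70)]
train_y = ["low", "low", "low", "high", "high", "high"]
test_X = [(5, 14), (21, 65)]
test_y = ["low", "high"]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)  # ['low', 'high'] 1.0
```

Because k-NN relies on raw distances, features on very different scales are usually standardised first.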
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio... (IRJET Journal)
This document presents a novel approach for software defect prediction using dimensionality reduction techniques. The proposed approach uses an artificial neural network to extract features from initial change measures, and then trains a classifier on the extracted features. This is compared to other dimensionality reduction techniques like principal component analysis, linear discriminant analysis, and kernel principal component analysis. Five open source datasets from NASA are used to evaluate the different techniques based on accuracy, F1 score, and area under the receiver operating characteristic curve. The results show that the artificial neural network approach outperforms the other dimensionality reduction techniques, and kernel principal component analysis performs best among those techniques. The document also discusses related work on using machine learning for software defect prediction.
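Principal component analysis, one of the baselines compared above, projects the data onto the directions of greatest variance. A minimal pure-Python sketch that finds the first principal component by power iteration on the sample covariance matrix follows. The data are hypothetical two-metric points, and the ANN-based feature extractor from the paper is not shown.

```python
import math


def first_principal_component(data, iters=100):
    """Centre the data, build the sample covariance matrix, and run power iteration
    to get its leading eigenvector; also return each row's projection onto it."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # assumes this start is not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, projected


metrics = [(1, 1.1), (2, 1.9), (3, 3.05), (4, 4.0), (5, 4.95)]  # hypothetical correlated metrics
component, projected = first_principal_component(metrics)
print([round(x, 3) for x in component])
```

The two metrics are almost perfectly correlated, so one component carries nearly all the variance. The 2-D data can therefore be replaced by the single projected column.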
IRJET-Survey on Data Mining Techniques for Disease Prediction (IRJET Journal)
This document discusses using data mining techniques to predict disease, specifically focusing on heart disease. It provides an overview of different classification algorithms that can be used for disease prediction, including decision trees, Bayesian classifiers, multilayer perceptrons, and ensemble techniques. These algorithms are analyzed based on their accuracy, time efficiency, and area under the ROC curve. The document also reviews related literature applying various data mining methods like decision trees, KNN, and support vector machines to heart disease prediction. Overall, the document examines using classification algorithms and data mining to extract patterns from medical data that can help predict heart disease and other illnesses.
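The area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting half. A minimal sketch computing it directly from that definition on hypothetical classifier scores:

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical heart-disease labels (1 = disease) and classifier scores.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop is O(P x N), which is fine for illustration. Sorting-based implementations compute the same value in O(n log n).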
1. Data mining is the process of obtaining meaningful information from large amounts of data through techniques like cluster analysis, regression analysis, social network analysis, and time series analysis. These techniques are used to discover patterns and relationships in data to help organizations make predictions and decisions.
2. Common data mining techniques include cluster analysis, which groups similar objects together; regression analysis, which estimates relationships between variables; social network analysis, which analyzes relationships between individuals; and time series analysis, which analyzes time-based data to make predictions.
3. Data mining is increasingly important for obtaining relevant information from large datasets and can be used across many fields for applications such as market segmentation, fraud detection, weather prediction, and intrusion
This document summarizes several research papers on human face recognition using feature extraction and measurements. It discusses using face recognition for applications like surveillance, access control, and banking validation. Key steps in face recognition systems include extracting features from captured images, comparing them to known images in a training database, and identifying errors like false acceptance and false rejection rates. Methods discussed for feature extraction and dimensionality reduction include Linear Discriminant Analysis and Principal Component Analysis. The document also examines factors that affect face recognition performance like illumination changes, aging, and expressions. Quantifying uncertainty in face recognition algorithms is identified as important for evaluating system performance.
Concept drift and machine learning model for detecting fraudulent transaction... (IJECEIAES)
The document presents a machine learning model for detecting fraudulent transactions in a streaming environment that addresses concept drift. The proposed approach uses the extreme gradient boosting (XGBoost) algorithm and employs four algorithms to continuously detect concept drift in data streams. The approach is evaluated on credit card and Twitter fraud datasets and is shown to outperform traditional machine learning models in terms of accuracy, precision, and recall, and to be more robust to concept drift. The proposed approach can be used as a real-time fraud detection system across different industries.
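The four drift detectors are not named here, so the sketch below uses a well-known stand-in, the Drift Detection Method (DDM). It monitors the streaming classifier's error rate p and its standard deviation s = sqrt(p(1 - p)/n), remembers the lowest p + s seen, and signals a warning or drift when p + s rises 2 or 3 of those minimum standard deviations above the minimum error. The error stream is synthetic.

```python
import math


class DDM:
    """Drift Detection Method over a stream of 0/1 prediction errors."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0
        self.s = 0.0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        self.n += 1
        self.p += (error - self.p) / self.n            # running error rate
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + 3 * self.s_min:
            self.reset()                               # start learning the new concept
            return "drift"
        if self.p + self.s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"


# Synthetic stream: about 10% errors, then the model suddenly fails on every transaction.
errors = [1 if i % 10 == 0 else 0 for i in range(200)] + [1] * 100
detector = DDM()
statuses = [detector.update(e) for e in errors]
print(statuses.index("drift"))  # first drift index, shortly after position 200
```

On a drift signal, a fraud system would typically retrain or replace the XGBoost model on recent data. The reset only restarts the detector's statistics.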
Predictive Modeling for Topographical Analysis of Crime Rate (IRJET Journal)
This document describes a proposed system to use machine learning methods to predict crime rates and types of crimes in specific areas based on historical crime data. The system would analyze crime data collected from websites including date, location, and crime type to identify patterns. Machine learning algorithms would be trained on the data to build predictive models. The goal is to help law enforcement agencies more quickly detect, resolve, and prevent crimes by predicting where and what types of crimes may occur based on the characteristics of past crimes.
Every individual's fingerprint is unique, but latent prints are difficult to find.
Forensic investigators usually use fine powder and tape to lift prints. Because the powder is
extremely messy, its particles can cause loss of information before the print is
matched against the system. The proposed system is an embedded device with an ultraviolet
light that makes fingerprint details glow. The fingerprint is then detected and analysed, checked against the
database, and the output is returned after matching. A matching algorithm is
used for fingerprint matching and analysis.
Comparative Study of Enchancement of Automated Student Attendance System Usin...IRJET Journal
This document discusses developing an automated student attendance system using facial recognition and deep learning algorithms. It begins with an overview of how facial recognition can be used to take attendance accurately and efficiently. It then describes the methodology, which involves using a convolutional neural network (CNN) to detect and recognize faces. Dimensionality reduction techniques like principal component analysis (PCA) and linear discriminant analysis (LDA) are also used to improve recognition accuracy. The goal is to build a system that can identify students in real-time with a high degree of accuracy, even in varying lighting conditions. It aims to automate the entire attendance tracking process for both students and teachers.
Face Recognition Smart Attendance System- A SurveyIRJET Journal
This document surveys 15 research papers on face recognition smart attendance systems. It summarizes each paper's methodology, including the databases and images used, feature extraction and matching algorithms like PCA, LDA, CNN, techniques for addressing issues like lighting and pose variations, and the accuracy and limitations of each system. Overall, the papers presented a variety of approaches to developing face recognition systems for automated student attendance, comparing methods like PCA, LDA, HOG, and deep learning algorithms and evaluating factors like recognition rate, robustness, and speed.
Mental Illness Prediction using Machine Learning AlgorithmsIRJET Journal
This document presents research on predicting mental illness like depression, anxiety, and stress using machine learning algorithms. The researchers used the Depression Anxiety Stress Scale questionnaire (DASS-21) to collect data on depression, anxiety, and stress levels. They then trained and tested various machine learning classification algorithms like support vector machine (SVM), random forest, naive bayes, etc. on the data to predict mental illness. SVM achieved the highest accuracy among the algorithms. The researchers then used AdaBoost, an ensemble learning method, to boost the accuracy of SVM, achieving even higher prediction performance. The goal of the research was to develop an effective machine learning model for predicting mental illness levels based on the DASS-21 questionnaire.
Similar to A REVIEW ON PREDICTIVE ANALYTICS IN DATA MINING (20)
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on Binary Heap data structure. It is similar to the selection sort where we first find the minimum element and place the minimum element at the beginning. Repeat the same process for the remaining elements.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
A REVIEW ON PREDICTIVE ANALYTICS IN DATA MINING
International Journal of Chaos, Control, Modelling and Simulation (IJCCMS) Vol.5, No.1/2/3, September 2016
DOI:10.5121/ijccms.2016.5301
A REVIEW ON PREDICTIVE ANALYTICS IN DATA MINING
Kavya.V¹, Arumugam.S²
¹M.E. Scholar, Department of Computer Science & Engineering, Nandha Engineering College, Erode-638052, Tamil Nadu, India
²Professor, Department of Computer Science & Engineering, Nandha Engineering College, Erode-638052, Tamil Nadu, India
ABSTRACT
Data mining is the process of collecting, extracting, and storing valuable information, and nowadays many enterprises practise it actively. Predictive analytics is the branch of advanced analytics that is mainly used to make predictions about unknown future events. It uses techniques from machine learning, statistics, data mining, modelling, and artificial intelligence to analyse current data and make predictions about the future. The two main tasks of predictive analytics are regression and classification. It comprises various analytical and statistical techniques used to develop models that predict future occurrences, probabilities, or events. Predictive analytics deals with both continuous and discontinuous changes. It provides a predictive score for each individual (a healthcare patient, product SKU, customer, component, machine, or other organizational unit) in order to determine or influence organizational processes that apply across large numbers of individuals, as in fraud detection, manufacturing, credit risk assessment, marketing, and government operations including law enforcement.
KEYWORDS
Predictive analytics, credit history, forecasting, regression techniques
1. INTRODUCTION
Large amounts of data stored in information databases go to waste until useful information is extracted from them. Predictive analytics is the branch of advanced analytics that predicts future events. It encompasses data collection and modelling, statistics, and deployment. Figure 1 shows the basis of predictive analytics. The success of data mining depends on the availability of high-quality data and on effective data sharing.
Figure 1. The basis of predictive analytics: data collection and modelling, statistics, and deployment.
Predictive analytics allows business users to discover predictive intelligence by uncovering patterns and relationships in both structured and unstructured data through data mining, text analytics, and statistics. Structured data include attributes such as gender, age, and income. Unstructured data include social media content, from which data are extracted for use in the model-building process. Figure 2 shows the cycle of predictive analytics.
Figure 2. The cycle of predictive analytics: understand the data, prepare the data, model, evaluate, deploy, and monitor.
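To make the role of unstructured data more concrete, the following Python sketch shows one simple way free text can be turned into numeric features and joined with structured attributes before model building. The vocabulary, the field names, and the binary gender encoding are illustrative assumptions, not part of the reviewed work; practical systems would normally use richer text processing.

```python
import re
from collections import Counter

# Hypothetical vocabulary of terms to count in free text (an assumption).
VOCAB = ["price", "love", "cancel", "refund"]


def text_features(post: str) -> list[int]:
    """Bag-of-words counts of the VOCAB terms in an unstructured text post."""
    tokens = re.findall(r"[a-z]+", post.lower())
    counts = Counter(tokens)
    return [counts[term] for term in VOCAB]


def build_row(age: int, income: float, gender: str, post: str) -> list[float]:
    """Join structured attributes with features extracted from unstructured text."""
    structured = [float(age), income, 1.0 if gender == "F" else 0.0]
    return structured + [float(c) for c in text_features(post)]


row = build_row(34, 52000.0, "F", "I love the product but the price is high. Refund?")
print(row)  # [34.0, 52000.0, 1.0, 1.0, 1.0, 0.0, 1.0]
```

Each resulting row has a fixed length, so structured and text-derived columns can be fed together into any of the regression or classification models discussed below.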
2. OVERVIEW
PREDICTIVE ANALYTICS
Predictive analytics encompasses statistical techniques from predictive modelling, data mining, and machine learning that analyse current and historical facts to make predictions about the future [15]. In business, predictive analytics is used to identify risks and opportunities. It is applied in fields such as marketing, finance, travel, and health care [10].
REGRESSION TECHNIQUES
Regression is a data mining function that predicts a numeric value. Regression techniques are used to predict quantities such as age, weight, distance, and temperature [7]. A regression task starts with a dataset in which the target values are known. Common applications of regression include trend analysis, biomedical applications, and financial forecasting [4]. Regression algorithms include generalized linear models and support vector machines [13].
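As a minimal illustration of a regression task on a dataset whose target values are known, the sketch below fits a simple linear regression y = a + b*x by ordinary least squares, the simplest instance of a generalized linear model. The temperature data are made up for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through the means
    return a, b


def predict(a, b, x):
    """Predict a numeric target for a new input x."""
    return a + b * x


# Illustrative (made-up) data: temperature readings over five days.
days = [1, 2, 3, 4, 5]
temps = [20.0, 22.0, 24.0, 26.0, 28.0]
a, b = fit_simple_linear_regression(days, temps)
print(a, b, predict(a, b, 6))  # 18.0 2.0 30.0
```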
FORECASTING
Forecasting focuses on predicting the future from past and present data [3, 14]. Two central concerns in forecasting are risk and uncertainty.
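One of the simplest forecasting methods built on past and present observations is simple exponential smoothing. It is shown here as an illustrative choice, not as a method prescribed by the cited works; the smoothing factor alpha and the demand values are assumptions. The spread of past one-step-ahead errors gives a rough indication of forecast uncertainty.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after smoothing the whole series.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from level_0 = y_0.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def one_step_errors(series, alpha):
    """Errors of the forecasts made for each observation from the data before it."""
    level = series[0]
    errors = []
    for y in series[1:]:
        errors.append(y - level)  # the forecast for y is the current level
        level = alpha * y + (1 - alpha) * level
    return errors


demand = [10, 12, 14]
print(simple_exponential_smoothing(demand, 0.5))  # 12.5
errs = one_step_errors(demand, 0.5)
print(sum(abs(e) for e in errs) / len(errs))  # 2.5 (mean absolute error)
```

A larger alpha reacts faster to recent changes; a smaller one smooths out noise at the cost of lagging behind trends.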
3. LITERATURE SURVEY
Carlos Márquez-Vera et al. [1] propose a genetic programming algorithm and several data mining approaches to predict student failure, a problem made difficult by the large number of factors that can affect student performance and by the imbalanced nature of the data. The methodology consists of data gathering, pre-processing, data mining, and interpretation. The Interpretable Classification Rule Mining (ICRM) and SMOTE (Synthetic Minority Over-sampling Technique) algorithms are used. Data were collected from a student dataset, and the WEKA tool was used. Accuracy, true positive rate, true negative rate, and geometric mean are the performance measures. The approach produces accurate and comprehensible classification rules and achieved the best prediction of student failure (98.7%).
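Two ideas from [1] are easy to show in code: SMOTE oversampling of the minority class and the geometric mean metric. The following is a simplified sketch, assuming numeric feature tuples and Euclidean distance. It picks base samples at random rather than following the per-sample oversampling rate of the original SMOTE, and it is not the implementation used in [1], which relied on WEKA.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: each synthetic point lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, nb)))
    return synthetic


def geometric_mean(tp, fn, tn, fp):
    """G-mean = sqrt(TPR * TNR); it stays low if either class is poorly predicted."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return math.sqrt(tpr * tnr)


failing = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]  # made-up minority samples
new_points = smote(failing, n_new=5)
print(len(new_points))  # 5
print(round(geometric_mean(tp=90, fn=10, tn=80, fp=20), 4))  # 0.8485
```

Unlike plain accuracy, the geometric mean drops to zero when a classifier ignores one class entirely, which is why it suits imbalanced data such as student failure records.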
Branko G. Celler et al. [2] study the telemonitoring of vital signs from the home for the management of patients with chronic conditions. They propose new measurement modalities and signal processing techniques to increase the quality and value of vital signs monitoring. An automated risk stratification algorithm is used. The parameters are the ECG QRS height and width, the PR and QT intervals, RR (respiratory rate), and body temperature. The JBoss tool is used. Data are taken from Department of Human Services Medicare databases and telemonitoring system data. The study identifies the key technical performance characteristics of at-home telemonitoring systems.
Hao Wang et al. [3] use a neural network model to estimate the value of playing-style information in predicting match quality. A combination of Sternberg's thinking style theory and individual histories is used to categorize League of Legends (LoL) players. The data are collected from the LoLBase website. The algorithms used are a feed-forward/feedback propagation algorithm and a matchmaking algorithm. Win rate, match duration, and group number (ELO) are the performance measures. Python is used. The study finds that the presence of global-liberal (G-L) style players is positively correlated with match enjoyment.
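The parameter "group number (ELO)" in [3] suggests that players are grouped by rating. Assuming this follows the standard Elo rating system, the sketch below shows the Elo expected-score formula, a rating update, and a naive matchmaking step that pairs players with similar ratings. The K-factor of 32 and the pairing rule are illustrative assumptions; the actual League of Legends matchmaking procedure is not described in the source.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a, rating_b, score_a, k=32):
    """A's new rating; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating_a + k * (score_a - elo_expected(rating_a, rating_b))


def pair_by_rating(players):
    """Naive matchmaking: sort (name, rating) pairs and match neighbours.
    With an odd number of players the highest-rated one is left unmatched."""
    ordered = sorted(players, key=lambda p: p[1])
    return [(ordered[i][0], ordered[i + 1][0]) for i in range(0, len(ordered) - 1, 2)]


print(elo_update(1500, 1500, 1))  # 1516.0
print(pair_by_rating([("ann", 1600), ("bo", 1400), ("cy", 1590), ("di", 1410)]))
# [('bo', 'di'), ('cy', 'ann')]
```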
Jacob R. Scanlon et al. [4] forecast the daily level of cyber-recruitment activity of violent extremist (VE) groups. Using LDA-based topics as predictors within time series models reduces forecast error; the latent Dirichlet allocation (LDA) algorithm is used to derive these topics. A Western jihadist discussion forum is used as the dataset. The number of posts and the percentage of recruitment posts per day are the measured parameters. The RTextTools and tm text mining packages in R are used. The approach achieves automatic forecasting of VE cyber recruitment using natural language processing, supervised machine learning, and time series analysis.
Quanzeng You, Liangliang Cao et al. [5] build a reliable forecasting system for elections, modelling the interrelationship between image-centric social multimedia and real-world entities. The Competitive Vector Autoregression (CVAR) algorithm is used; through its competition mechanism, CVAR compares the popularity of multiple competing candidates. Data are taken from Flickr. The number of images, the number of users, images uploaded per day (IPD), and users uploading images per day (UPD) are the measured parameters. The OpenCV tool is used. As a result, CVAR is able to incorporate prior knowledge, which leads to better prediction accuracy.
Sean M. Arietta et al. [6] present a method that automatically discovers and evaluates predictive relationships between a city's visual appearance and its non-visual attributes. A scalable distributed processing framework is implemented that speeds up the main computational bottleneck by an order of magnitude. The algorithms used are a hard negative mining algorithm, a classification and recognition algorithm, and Dijkstra's shortest path algorithm. The training dataset consists of 10,000 Google Street View panorama projections (2,000 positives and 8,000 negatives). The performance measures are the number of detections and the percentage of detections from the positive set. MATLAB is used. The method is used to define the visual boundaries of city neighbourhoods, to generate walking directions that avoid or seek exposure to city attributes, and to validate user-specified visual elements for prediction.
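Of the algorithms listed for [6], Dijkstra's shortest path algorithm is the one behind the walking directions. The sketch below assumes a street graph whose edge costs combine street length with a predicted exposure score in [0, 1] for some city attribute. The exposure_weight cost function and the toy street data are hypothetical, intended only to show how a positive penalty steers the route away from exposed streets.

```python
import heapq


def dijkstra(graph, source, target):
    """Shortest path with non-negative edge weights.
    graph: dict mapping node -> list of (neighbour, weight). Returns (cost, path)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nb, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                prev[nb] = node
                heapq.heappush(heap, (nd, nb))
    if target not in visited:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


def exposure_weight(length, exposure, penalty):
    """Hypothetical edge cost: length inflated by predicted exposure in [0, 1].
    penalty > 0 avoids the attribute; -1 <= penalty < 0 seeks it (keeps weights >= 0)."""
    return length * (1 + penalty * exposure)


def build_graph(streets, penalty):
    """Undirected street graph from (u, v, length, exposure) tuples."""
    graph = {}
    for u, v, length, exposure in streets:
        w = exposure_weight(length, exposure, penalty)
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


streets = [("A", "B", 1.0, 0.9), ("B", "D", 1.0, 0.9), ("A", "C", 1.5, 0.0), ("C", "D", 1.5, 0.0)]
print(dijkstra(build_graph(streets, 0.0), "A", "D"))  # (2.0, ['A', 'B', 'D'])
print(dijkstra(build_graph(streets, 2.0), "A", "D"))  # (3.0, ['A', 'C', 'D'])
```

Without a penalty the shorter route through B wins; with a penalty the longer but unexposed route through C becomes cheaper.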
Abish Malik, Ross Maciejewski et al. [7] present a visual analytics approach that provides decision makers with a proactive and predictive environment to help them make effective resource allocation and deployment decisions. Analysts are given a suite of natural scale templates and methods that enable them to focus and drill down to appropriate geospatial and temporal resolution levels. A prediction algorithm is used. Accuracy, average, count, and time are the performance measures. The methodology is applied to Criminal, Traffic and Civil (CTC) incident datasets, and its templates support analysis at multiple spatiotemporal granularity levels.
Ronaldo C. Prati et al. [8] survey graphical performance evaluation methods, which are attracting increasing attention in data mining because they can depict the trade-offs between evaluation aspects in a multidimensional space. These methods apply to binary classification problems, and the predictive models considered cover classification, ranking, and probability estimation. The performance measures are the true positive and false positive rates, precision, and recall. The Insurance Company (TIC) benchmark dataset, with 86 variables, is used, and the tool is WEKA version 3.5.8. The graphical evaluation methods covered include ROC graphs, ROC curves, cost lines, cost curves, precision-recall curves, lift curves, reliability diagrams, ROI curves, discrimination diagrams, and attribute diagrams. The survey helps in deciding which method is best suited to a given situation.
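Several quantities surveyed in [8] can be computed directly: the points of a ROC curve (false positive rate against true positive rate as the decision threshold is lowered), the area under that curve, and precision and recall from confusion-matrix counts. A minimal sketch, assuming binary labels coded as 1 and 0 and at least one example of each class; the scores are made up.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) from sweeping a threshold down through the scores.
    labels: 1 = positive, 0 = negative. Tied scores form a single step."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall (= TPR) = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


points = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```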
Gang Fang, Gaurav Pandey et al. [9] note that discriminative patterns can provide valuable insights into datasets with class labels, and they use SupMaxPair to discover low-support discriminative patterns. Per-pattern precision, density, dimension, count, and frequency are the performance measures. Frequent pattern mining and discriminative pattern mining algorithms are used, with synthetic and cancer gene expression datasets. The approach discovers discriminative patterns with relatively low support from dense and high-dimensional datasets, which other approaches fail to explore within an acceptable amount of time.
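Discriminative patterns are judged by how differently they occur across classes. The sketch below computes the support of an itemset in each class and uses the difference in support as its discriminative power. This measure is a common choice assumed here for illustration; it is not the SupMaxPair algorithm of [9], and the gene names and transactions are made up.

```python
def support(pattern, transactions):
    """Fraction of transactions (sets of items) containing every item of pattern."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if pattern <= t) / len(transactions)


def discriminative_power(pattern, positives, negatives):
    """Support in the positive class minus support in the negative class."""
    return support(pattern, positives) - support(pattern, negatives)


# Made-up "gene expressed" transactions for two classes of samples.
tumour = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g2", "g3"}, {"g1", "g2", "g4"}]
normal = [{"g3"}, {"g1", "g3"}, {"g2", "g4"}, {"g4"}]
pattern = {"g1", "g2"}
print(support(pattern, tumour + normal))  # 0.375 (low overall support)
print(discriminative_power(pattern, tumour, normal))  # 0.75
```

The example pattern appears in fewer than half of all samples yet separates the two classes well, which is the kind of low-support pattern that frequency-based mining with a high support threshold would miss.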
Nanlin Jin, Peter Flach et al. [10] apply data mining methods to explore interesting consumption patterns and their associated descriptive models from smart electricity meter data. Target concept, target type, double regression, coverage, and strategy type are the performance parameters. Subgroup discovery algorithms and S-Transform algorithms are used. The data were collected by the Energy Demand Research Project (EDRP), and the Cortana tool is used. The approach outperforms more conventional data mining methods in predictive power and classification accuracy while consuming similar computational resources.
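Subgroup discovery searches for descriptions of subsets of the data whose target distribution differs from that of the whole population. A widely used quality measure is weighted relative accuracy (WRAcc); the source does not state whether [10] used this exact measure, so treat it as an assumption. The sketch scores every single attribute=value condition on a small, made-up set of households and returns the best one.

```python
def wracc(subgroup_mask, target):
    """Weighted relative accuracy for a binary target:
    WRAcc = coverage * (P(target | subgroup) - P(target))."""
    n = len(target)
    covered = [t for m, t in zip(subgroup_mask, target) if m]
    if not covered:
        return 0.0
    coverage = len(covered) / n
    p_sub = sum(covered) / len(covered)
    p_all = sum(target) / n
    return coverage * (p_sub - p_all)


def best_single_condition(records, target_key):
    """Exhaustively score every attribute=value condition; return (score, key, value)."""
    target = [r[target_key] for r in records]
    best = None
    for key in records[0]:
        if key == target_key:
            continue
        for value in {r[key] for r in records}:
            mask = [r[key] == value for r in records]
            score = wracc(mask, target)
            if best is None or score > best[0]:
                best = (score, key, value)
    return best


# Made-up households; high_evening = 1 marks high evening consumption.
households = [
    {"heating": "electric", "occupants": "many", "high_evening": 1},
    {"heating": "electric", "occupants": "few", "high_evening": 1},
    {"heating": "electric", "occupants": "few", "high_evening": 0},
    {"heating": "gas", "occupants": "many", "high_evening": 1},
    {"heating": "gas", "occupants": "few", "high_evening": 0},
    {"heating": "gas", "occupants": "few", "high_evening": 0},
    {"heating": "gas", "occupants": "few", "high_evening": 0},
    {"heating": "gas", "occupants": "few", "high_evening": 0},
]
print(best_single_condition(households, "high_evening"))  # (0.15625, 'occupants', 'many')
```

WRAcc rewards subgroups that are both large and unusual; real subgroup discovery tools extend this search to conjunctions of conditions and to numeric targets.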
4. COMPARISONS ON DIFFERENT PREDICTIVE ANALYTIC TECHNIQUES
| Title | Techniques and Algorithms | Datasets | Parameters | Conclusion |
| --- | --- | --- | --- | --- |
| Predicting Student Failure at School Using Genetic Programming and Different Data Mining Approaches with High Dimensional and Imbalanced Data | Data gathering, pre-processing, data mining, and interpretation; Interpretable Classification Rule Mining (ICRM) and SMOTE (Synthetic Minority Over-sampling Technique) algorithms | Student dataset | Accuracy, true positive rate, true negative rate, and geometric mean | Accurate and comprehensible classification rules |
| Home Telemonitoring of Vital Signs: Technical Challenges and Future Directions | Automated risk stratification algorithm | Department of Human Services Medicare databases and telemonitoring system data | ECG QRS height and width, PR and QT intervals, RR (respiratory rate), and body temperature | Identifies the key technical performance characteristics of at-home telemonitoring systems |
| Thinking Style and Team Competition Game Performance and Enjoyment | Feed-forward/feedback propagation algorithm and matchmaking algorithm | LoLBase website | Win rate, match duration, and group number (ELO) | The presence of global-liberal (G-L) style players is positively correlated with match enjoyment |
| Forecasting Violent Extremist Cyber Recruitment | Latent Dirichlet allocation (LDA) algorithm | Western jihadist discussion forum | Number of posts and % of recruitment posts (per day) | Achieves automatic forecasting of VE cyber recruitment using natural language processing, supervised machine learning, and time series analysis |
| A Multifaceted Approach to Social Multimedia-Based Prediction of Elections | Competitive Vector Autoregression (CVAR) algorithm | Flickr | Number of images, number of users, images uploaded per day (IPD), and users uploading images per day (UPD) | Better performance in terms of prediction accuracy |
| City Forensics: Using Visual Elements to Predict Non-Visual City Attributes | Hard negative mining algorithm, classification and recognition algorithm, and Dijkstra's shortest path algorithm | Google Street View panorama projections (training) | Number of detections and % of detections from the positive set | Defines the visual boundaries of city neighbourhoods, generates walking directions that avoid or seek exposure to city attributes, and validates user-specified visual elements for prediction |
| Proactive Spatiotemporal Resource Allocation and Predictive Visual Analytics for Community Policing and Law Enforcement | Prediction algorithm | Criminal, Traffic and Civil (CTC) incident datasets | Accuracy, average, count, and time | Provides users with a suite of natural scale templates that support analysis at multiple spatiotemporal granularity levels |
| A Survey on Graphical Methods for Classification Predictive Performance Evaluation | Graphical evaluation methods | The Insurance Company (TIC) benchmark dataset | True positive and false positive rates, precision, and recall | Helps in deciding which methods are best suited to a given situation |
| Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data | Frequent pattern mining algorithm and discriminative pattern mining algorithms | Synthetic and cancer gene expression datasets | Per-pattern precision, density, dimension, count, and frequency | Discovers discriminative patterns with relatively low support from dense and high-dimensional datasets, which other approaches fail to explore within an acceptable amount of time |
| Subgroup Discovery in Smart Electricity Meter Data | Subgroup discovery algorithms and S-Transform algorithms | Energy Demand Research Project (EDRP) | Target concept, target type, double regression, coverage, and strategy type | Outperforms more conventional data mining methods in predictive power and classification accuracy while consuming similar computational resources |
5. CONCLUSION
Predictive analytics is the future of data mining. This study focuses on predictive analytics, regression techniques, and forecasting in the knowledge discovery domain. Business intelligence is used in predictive analytics for modelling and forecasting. Predictive analytics makes the choice of marketing methods more efficient and is helpful in social media analytics.
REFERENCES
[1] Carlos Márquez-Vera, Alberto Cano, Cristóbal Romero, and Sebastián Ventura, "Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data", Springer Science+Business Media, LLC, 2012.
[2] Branko G. Celler and Ross S. Sparks, "Home Telemonitoring of Vital Signs: Technical Challenges and Future Directions", IEEE Journal of Biomedical and Health Informatics, Vol. 19, No. 1, January 2015.
[3] Hao Wang, Hao-Tsung Yang, and Chuen-Tsai Sun, "Thinking Style and Team Competition Game Performance and Enjoyment", IEEE Transactions on Computational Intelligence and AI in Games, Vol. 7, No. 3, September 2015.
[4] Jacob R. Scanlon and Matthew S. Gerber, "Forecasting Violent Extremist Cyber Recruitment", IEEE Transactions on Information Forensics and Security, Vol. 10, No. 11, November 2015.
[5] Quanzeng You, Liangliang Cao, Yang Cong, Xianchao Zhang, and Jiebo Luo, "A Multifaceted Approach to Social Multimedia-Based Prediction of Elections", IEEE Transactions on Multimedia, Vol. 17, No. 12, December 2015.
[6] Sean M. Arietta, Alexei A. Efros, Ravi Ramamoorthi, and Maneesh Agrawala, "City Forensics: Using Visual Elements to Predict Non-Visual City Attributes", IEEE Transactions on Visualization and Computer Graphics, Vol. 20, No. 12, December 2014.
[7] Abish Malik, Ross Maciejewski, Sherry Towers, Sean McCullough, and David S. Ebert, "Proactive Spatiotemporal Resource Allocation and Predictive Visual Analytics for Community Policing and Law Enforcement", IEEE Transactions on Visualization and Computer Graphics, Vol. 20, No. 12, December 2014.
[8] Ronaldo C. Prati, Gustavo E.A.P.A. Batista, and Maria Carolina Monard, "A Survey on Graphical Methods for Classification Predictive Performance Evaluation", IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 11, November 2011.
[9] Gang Fang, Gaurav Pandey, Wen Wang, Manish Gupta, Michael Steinbach, and Vipin Kumar, "Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data", IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 2, February 2012.
[10] Nanlin Jin, Peter Flach, Tom Wilcox, Royston Sellman, Joshua Thumim, and Arno Knobbe, "Subgroup Discovery in Smart Electricity Meter Data", IEEE Transactions on Industrial Informatics, Vol. 10, No. 2, May 2014.
[11] Hao Wang, Hao-Tsung Yang, and Chuen-Tsai Sun, "Thinking Style and Team Competition Game Performance and Enjoyment", IEEE Transactions on Computational Intelligence and AI in Games, Vol. 7, No. 3, September 2015.
[12] Quanzeng You, Liangliang Cao, Yang Cong, Xianchao Zhang, and Jiebo Luo, "A Multifaceted Approach to Social Multimedia-Based Prediction of Elections", IEEE Transactions on Multimedia, Vol. 17, No. 12, December 2015.
[13] Yun Wang and Sudha Ram, "Predicting Location-Based Sequential Purchasing Events by Using Spatial, Temporal, and Social Patterns", IEEE Intelligent Systems, May/June 2015.
[14] Jesse Rio Russell, "Predictive analytics and child protection: Constraints and Opportunities", Child Abuse & Neglect, Vol. 46, pp. 182-189, Elsevier, 2015.
[15] Karel Dejaeger, Wouter Verbeke, David Martens, and Bart Baesens, "Data Mining Techniques for Software Effort Estimation: A Comparative Study", IEEE Transactions on Software Engineering, Vol. 38, No. 2, March/April 2012.
[16] Leonardo Feltrin, "KNIME an Open Source Solution for Predictive Analytics in the Geosciences", IEEE Geoscience and Remote Sensing Magazine, December 2015.
[17] Josep Ll. Berral, Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, and Daron Green, "ALOJA: A Framework for Benchmarking and Predictive Analytics in Big Data Deployments", IEEE Transactions on Emerging Topics in Computing, November 2015.
[18] Minghui Zhou and Audris Mockus, "Who Will Stay in the FLOSS Community? Modeling Participant's Initial Behavior", IEEE Transactions on Software Engineering, Vol. 41, No. 1, January 2015.
[19] Sean M. Arietta, Alexei A. Efros, Ravi Ramamoorthi, and Maneesh Agrawala, "City Forensics: Using Visual Elements to Predict Non-Visual City Attributes", IEEE Transactions on Visualization and Computer Graphics, Vol. 20, No. 12, December 2014.
[20] Francisco C. Pereira, Filipe Rodrigues, Evgheni Polisciuc, and Moshe Ben-Akiva, "Why so many people? Explaining Nonhabitual Transport Overcrowding With Internet Data", IEEE Transactions on Intelligent Transportation Systems, Vol. 16, No. 3, June 2015.